jiaqiwu1999/journal_scrapper

journal_scraping_program

Goal: to help researchers search journals efficiently and automate article downloads.

Tools: BeautifulSoup, HTML, XML, FTP, Selenium

Prerequisite: Chrome

Procedures:

  1. The user enters the search criteria and the result size.
  2. PubMed Central is searched using a query string.
  3. BeautifulSoup scrapes the journal IDs from the results page.
  4. Each ID is passed to the PMC API.
  5. The FTP URL is read from the API's XML page.
  6. An FTP request is made and the content is written to a PDF file.
  7. Selenium is used to change pages and obtain more results.
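The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the two endpoint URLs, the `<link format="pdf" href="...">` shape of the API response, and all function names are assumptions. A regular expression stands in for BeautifulSoup so the sketch needs only the standard library.

```python
import re
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Optional

# Assumed endpoints; the README does not spell them out.
SEARCH_URL = "https://www.ncbi.nlm.nih.gov/pmc/"
OA_API_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_search_url(term: str) -> str:
    """Step 2: build a PubMed Central search URL from the user's query."""
    return SEARCH_URL + "?" + urllib.parse.urlencode({"term": term})


def extract_pmc_ids(html: str) -> List[str]:
    """Step 3: collect unique PMC IDs from a results page, in page order.

    A regex replaces BeautifulSoup here to avoid third-party dependencies.
    """
    return list(dict.fromkeys(re.findall(r"PMC\d+", html)))


def build_oa_url(pmc_id: str) -> str:
    """Step 4: build the API request URL for one article ID."""
    return OA_API_URL + "?" + urllib.parse.urlencode({"id": pmc_id})


def find_pdf_link(oa_xml: str) -> Optional[str]:
    """Step 5: return the href of the first PDF link in the XML, if any.

    Assumes the response contains <link format="pdf" href="..."/> elements.
    """
    root = ET.fromstring(oa_xml)
    for link in root.iter("link"):
        if link.get("format") == "pdf":
            return link.get("href")
    return None


def download_pdf(url: str, dest_path: str) -> None:
    """Step 6: fetch the file and write it to disk (urllib accepts ftp:// URLs)."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

Only `download_pdf` touches the network when called; the other helpers are pure functions, so they can be tried on a saved results page or a saved XML response. Step 7 (paging with Selenium) is left out of this sketch.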

Downloading Chrome WebDriver

  1. Create a new folder to hold the driver.

     [Screenshot: Pasted Graphic]

  2. Copy the path of this folder and add it to the system path.

     [Screenshots: Pasted Graphic 1 to Pasted Graphic 5]

     Paste it here:

     [Screenshots: Pasted Graphic 6, Pasted Graphic 7]

  3. Download chromedriver from https://chromedriver.storage.googleapis.com/index.html?path=94.0.4606.61/ and get the "win32" build.
  4. Move the download to the folder created in step 1.
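To check that the setup worked, a short script can look for the driver on the system path. This is a hedged sketch: the helper name `find_chromedriver` is hypothetical, and it assumes the executable keeps its default name `chromedriver`.

```python
import shutil
from typing import Optional


def find_chromedriver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of the driver if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found on PATH; re-check step 2")
    else:
        print(f"chromedriver found at {path}")
```

A new terminal window is usually needed before a changed path is visible to the script.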

Problems to be considered:

  1. Ranking journals by number of citations or by recency
  2. More efficient way to avoid repetition of journal downloads
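For the second problem, one possible approach (not part of the current program) is to filter the scraped IDs against the files already on disk before any request is made. The sketch below assumes each download is saved as `<ID>.pdf`; the function name `pending_ids` is hypothetical.

```python
from typing import Iterable, List


def pending_ids(pmc_ids: Iterable[str], existing_files: Iterable[str]) -> List[str]:
    """Return IDs that still need downloading, in their original order.

    Skips IDs whose '<ID>.pdf' is already present and drops repeats in the input.
    """
    existing = set(existing_files)
    seen = set()
    result = []
    for pmc_id in pmc_ids:
        if pmc_id in seen or f"{pmc_id}.pdf" in existing:
            continue
        seen.add(pmc_id)
        result.append(pmc_id)
    return result
```

In practice `existing_files` could be the output of `os.listdir()` on the download folder, so the check costs one directory listing per run rather than one lookup per article.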
