journal_scraping_program

Goal: to help researchers search journals efficiently and automate downloads.

Technologies: BeautifulSoup, HTML, XML, FTP, Selenium

Prerequisite: Chrome

Procedures:

  1. The user enters the search criteria and the desired number of results (size)
  2. The program searches PubMed Central using a query string
  3. BeautifulSoup scrapes the journal IDs from the results page
  4. Each ID is passed to the PMC API
  5. The FTP URL is read from the XML page the API returns
  6. An FTP request is made and the content is written to a PDF file
  7. Selenium is used to move to the next results page and obtain more results
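The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the repository's actual code: the function names are hypothetical, the two NCBI URLs are assumptions that should be checked against current NCBI documentation, and a regular expression stands in for the BeautifulSoup scraping in step 3 so that the sketch runs with the standard library only. The XML layout assumed in step 5 (`link` elements with `format` and `href` attributes) should also be confirmed against a real API response.

```python
from __future__ import annotations

import re
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoints (not taken from this README); verify before use.
PMC_SEARCH_URL = "https://www.ncbi.nlm.nih.gov/pmc/"
PMC_OA_API_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_search_url(term: str) -> str:
    """Step 2: encode the user's search criteria as a query string."""
    return f"{PMC_SEARCH_URL}?{urlencode({'term': term})}"


def extract_pmc_ids(html: str, limit: int) -> list[str]:
    """Step 3 (regex stand-in for BeautifulSoup): unique IDs in page order."""
    ids: list[str] = []
    for pmc_id in re.findall(r"PMC\d+", html):
        if len(ids) >= limit:
            break
        if pmc_id not in ids:
            ids.append(pmc_id)
    return ids


def build_oa_url(pmc_id: str) -> str:
    """Step 4: build the API request URL for one ID."""
    return f"{PMC_OA_API_URL}?{urlencode({'id': pmc_id})}"


def find_pdf_link(oa_xml: str) -> str | None:
    """Step 5: return the href of the first PDF link in the XML, if any."""
    root = ET.fromstring(oa_xml)
    for link in root.iter("link"):
        if link.get("format") == "pdf":
            return link.get("href")
    return None


def download(url: str, out_path: str) -> None:
    """Step 6: fetch the URL (urllib accepts ftp://) and write it to a file."""
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

A caller would chain these per ID: `build_oa_url`, fetch the XML, `find_pdf_link`, then `download` when a link is returned. Step 7 (paging with Selenium) is left out because it depends on the live page layout.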

Downloading the Chrome WebDriver

  1. Create a new folder to hold chromedriver


  2. Copy the path of this folder and add it to the system PATH environment variable

[Screenshots: opening the system environment variable settings and pasting the copied folder path there]

  3. Download chromedriver from https://chromedriver.storage.googleapis.com/index.html?path=94.0.4606.61/ and get the "win32" build
  4. Move the download to the folder created in step 1
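To confirm the setup worked, a quick check along these lines can help. It is only a sketch: `find_chromedriver` is a hypothetical helper name, and it relies on the standard library's `shutil.which`, which searches the system PATH the same way the shell does.

```python
import shutil


def find_chromedriver(folder=None):
    """Return the full path to chromedriver, or None if it is not found.

    With `folder` given, only that folder is searched; otherwise the
    system PATH is used, which is what step 2 configures.
    """
    return shutil.which("chromedriver", path=folder)


if __name__ == "__main__":
    location = find_chromedriver()
    if location is None:
        print("chromedriver not found on PATH; recheck steps 1-4")
    else:
        print(f"chromedriver found at {location}")
```

If the check fails right after editing PATH, opening a new terminal usually helps, since running programs keep the old environment.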

Problems to be considered:

  1. Ranking journals by number of citations or by recency
  2. Finding a more efficient way to avoid downloading the same journal more than once
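One possible approach to the second problem, offered only as a sketch: skip any ID whose PDF already exists in the output folder. The helper name `pending_ids` and the `<ID>.pdf` file naming are assumptions for illustration, not what the program currently does.

```python
from pathlib import Path


def pending_ids(pmc_ids, out_dir):
    """Return IDs with no saved PDF yet, keeping order and dropping repeats."""
    out = Path(out_dir)
    seen = set()
    pending = []
    for pmc_id in pmc_ids:
        if pmc_id in seen:
            continue  # same ID listed twice in this run
        seen.add(pmc_id)
        # Assumed naming scheme: one file per article, named <ID>.pdf
        if not (out / f"{pmc_id}.pdf").exists():
            pending.append(pmc_id)
    return pending
```

Checking the file system keeps the logic stateless across runs; a persistent index file would be an alternative if files may be renamed or moved.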