journal_scraping_program

Goal: to help researchers search journals efficiently and automate downloads

Tools: BeautifulSoup, HTML, XML, FTP, Selenium

Prerequisite: Chrome

Procedures:

The user enters the search criteria and the result size
PubMed Central is searched using a query string
BeautifulSoup scrapes the journal IDs from the results page
Each ID is passed to the PMC API
The FTP URL is read from the API's XML response
An FTP request is made and the content is written to a PDF file
Selenium is used to change pages and collect more results
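
The steps above can be sketched roughly as follows. This is not the program's actual code: it assumes the "PMC API" is the PMC Open Access web service (`oa.fcgi`) and that its XML response carries `<link format="pdf" href="ftp://...">` elements. Both URLs and all function names are assumptions for illustration. To stay standard-library only, the sketch pulls IDs out of the page with a regular expression instead of BeautifulSoup.

```python
import re
import shutil
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoints, not taken from the program itself.
SEARCH_URL = "https://www.ncbi.nlm.nih.gov/pmc/"
OA_API_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_search_url(term):
    """Turn the user's search criteria into a query-string URL."""
    return SEARCH_URL + "?" + urlencode({"term": term})


def extract_pmc_ids(html):
    """Collect PMC IDs from a results page, first occurrence first."""
    ids = []
    for pmc_id in re.findall(r"PMC\d+", html):
        if pmc_id not in ids:
            ids.append(pmc_id)
    return ids


def build_api_url(pmc_id):
    """Build the API request for one journal ID."""
    return OA_API_URL + "?" + urlencode({"id": pmc_id})


def find_pdf_link(api_xml):
    """Return the PDF href from the API's XML, or None if there is none."""
    root = ET.fromstring(api_xml)
    for link in root.iter("link"):
        if link.get("format") == "pdf":
            return link.get("href")
    return None


def download(url, path):
    """Fetch a URL (urllib handles ftp:// as well) and write it to a file."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

A run would chain these: fetch `build_search_url(term)`, pass the HTML to `extract_pmc_ids`, fetch `build_api_url(pmc_id)` for each ID, and hand the result of `find_pdf_link` to `download`. Articles outside the open access subset may return no PDF link, hence the `None` case.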

Downloading Chrome WebDriver

1. Create a new folder to hold chromedriver

2. Copy the path of this folder and add it to the system PATH environment variable, pasting it in as a new entry

3. Download chromedriver from https://chromedriver.storage.googleapis.com/index.html?path=94.0.4606.61/ and get the "win32" build

4. Move the download to the folder created in step 1
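
Once the folder is on the system path, a quick check like the one below can confirm that chromedriver is discoverable before Selenium is started. This is a minimal sketch: `find_chromedriver`, `start_browser` and the `search_path` parameter are hypothetical names, and the Selenium import is done lazily so the check itself runs without Selenium installed.

```python
import shutil


def find_chromedriver(search_path=None):
    """Return the full path of chromedriver if it can be found, else None.

    search_path is a hypothetical override for testing; by default the
    system PATH is searched.
    """
    return shutil.which("chromedriver", path=search_path)


def start_browser():
    """Open Chrome through Selenium, relying on chromedriver being on PATH."""
    # Third-party import kept inside the function on purpose.
    from selenium import webdriver

    if find_chromedriver() is None:
        raise RuntimeError("chromedriver not found on PATH; recheck the steps above")
    return webdriver.Chrome()
```

If `find_chromedriver()` returns `None`, the usual causes are a new terminal not having been opened after editing the path, or the driver file sitting in a different folder than the one added.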

Problems to be considered:

Ranking journals by number of citations or by recency
A more efficient way to avoid downloading the same journal more than once
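
One possible approach to the repeated-download problem, offered only as a sketch: skip any ID whose PDF is already in the download folder. It assumes each file is saved as `<ID>.pdf`, which is a hypothetical naming convention, and `pending_ids` is a hypothetical helper name.

```python
from pathlib import Path


def pending_ids(pmc_ids, download_dir):
    """Return the IDs that still need downloading, in their original order."""
    folder = Path(download_dir)
    todo = []
    # dict.fromkeys drops duplicate IDs while keeping first-seen order.
    for pmc_id in dict.fromkeys(pmc_ids):
        if not (folder / f"{pmc_id}.pdf").exists():
            todo.append(pmc_id)
    return todo
```

Checking the folder keeps the state on disk, so the check still works after the program is restarted.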