This scraper loads an image from Facebook, downloads it in full size, and steps through the gallery until it arrives back at the starting image.
It assumes that:
- the user-provided link points to an opened image, not just a gallery
- certain style classes appear in these Facebook pages
- the user provides valid login information
Disclaimer: the password, which is entered at a hidden prompt, is only used to pass it to Selenium on the login page.
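
The traversal described at the top (start at one image, keep moving to the next, stop when the starting image comes up again) can be sketched without a browser. This is only an illustration, not the code in `traverse_gallery.py`: the function name, the `next_url` callback, and the `max_steps` guard are hypothetical, and a real run may need to compare a normalized image identifier instead of the raw URL.

```python
from typing import Callable, Iterator


def traverse_gallery(
    start_url: str,
    next_url: Callable[[str], str],
    max_steps: int = 10_000,
) -> Iterator[str]:
    """Yield each image URL once, until the gallery wraps around to the start."""
    current = start_url
    for _ in range(max_steps):
        yield current
        # next_url is a hypothetical stand-in for clicking "next" in the browser.
        current = next_url(current)
        if current == start_url:
            return
    # Guard against galleries that never return to the starting image.
    raise RuntimeError("gallery did not wrap around within max_steps")


# A fake three-image gallery standing in for the real pages.
fake_gallery = {"a": "b", "b": "c", "c": "a"}
visited = list(traverse_gallery("a", fake_gallery.__getitem__))
print(visited)  # ['a', 'b', 'c']
```

The step limit is a design choice for the sketch: if a page never leads back to the first image, the loop fails loudly instead of running forever.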
- Python 3.6
- Selenium ChromeDriver. The Windows version is included in the repository; other versions and more information are available on the ChromeDriver site.
- Google Chrome must be installed on your computer
- pip
- Create the virtual environment: `virtualenv -p c:\Python36\python.exe .ve`
- Run `.ve.bat` to initialize the working directory by typing `.ve`:
  - the webdriver will be added to the path
  - the virtualenv will be loaded
- Install the requirements: `pip install -r requirements.txt`
- Create the `options.json` file from `options_template.json`. It will be ignored by git.
- To leave the virtualenv, type `deactivate`
- Update `options.json` appropriately
- Run `python traverse_gallery.py`
- Enter your password at the prompt
After the first successful login, you can save the printed cookie value to the options file and set the `force_login` field to false. You then do not have to provide your data again as long as your tokens are valid.
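
The rule for when a login prompt is needed can be written as a small predicate. This is a minimal sketch of the documented behaviour, not the project's actual code: the function name `needs_login` is hypothetical, and treating a missing `force_login` key as true is an assumption.

```python
def needs_login(options: dict) -> bool:
    """Return True when the user has to sign in interactively."""
    # Assumption: a missing force_login key is treated as true.
    if options.get("force_login", True):
        return True
    # Without saved cookies there is nothing to authenticate with,
    # so the user has to sign in even if force_login is false.
    return not options.get("cookies")


print(needs_login({"force_login": False, "cookies": [{"name": "c", "value": "1"}]}))  # False
print(needs_login({"force_login": False, "cookies": []}))  # True
```

With Selenium, saved cookies are typically restored by first opening a page on the cookie's domain and then calling `driver.add_cookie` for each cookie. Whether the project restores them exactly this way is not confirmed here.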
Name | Description |
---|---|
loginURL | URL of the domain used for the session cookies. The page also contains the login fields. |
start_images | Array holding the full URL (with parameters) of one image from each gallery to download. |
max_workers | Number of parallel image-saving processes. |
username | The email address used as the login credential. If it is not present, it will be asked for. It won't be stored anywhere, only sent to Selenium. |
cookies | The cookies Facebook uses to authenticate the user. After login I write the current cookies to the console. I recommend filling this field with that value; cookies from another source might not work as intended. |
force_login | If `true`, the login data will be requested and the password has to be typed at a hidden prompt. If `false`, the provided cookies will be used to authenticate the requests; if no cookies are provided, the user will still be required to sign in. |
save_image_index | If `true`, images are saved in their order of appearance by adding a number before their names. |
destination_dir | The destination directory of the result. It should be empty. The path does not need a trailing slash. |
unique_galleries | If `true`, a timestamp is added to the start of the gallery folder name. Without this, there is no guarantee that two galleries will be saved under different names. |
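
To make the effect of `unique_galleries` and `save_image_index` concrete, here is a minimal sketch of how such names could be built. The function names, the timestamp format, the underscore separator, and the four-digit padding are all assumptions for illustration; the names the program really produces may look different.

```python
from datetime import datetime


def gallery_dir_name(album: str, unique_galleries: bool, now: datetime) -> str:
    """Prefix the album name with a timestamp when unique_galleries is set."""
    if unique_galleries:
        # Assumed format: YYYYMMDD-HHMMSS followed by an underscore.
        return f"{now:%Y%m%d-%H%M%S}_{album}"
    return album


def image_file_name(name: str, index: int, save_image_index: bool, width: int = 4) -> str:
    """Prefix the file name with a zero-padded index when save_image_index is set."""
    if save_image_index:
        # Zero padding keeps the files sorted in order of appearance.
        return f"{index:0{width}d}_{name}"
    return name


now = datetime(2018, 5, 1, 12, 30, 0)
print(gallery_dir_name("Summer camp", True, now))  # 20180501-123000_Summer camp
print(image_file_name("photo.jpg", 7, True))       # 0007_photo.jpg
```

Two runs over an album with the same name then end up in different folders, as long as they do not start within the same second.
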
The galleries will be saved to directories named after the albums. Each directory contains the images of its album, plus three files:
- `captions.txt`: the captions of the images
- `data.json`: all of the extracted data for further use
- `urls.txt`: URLs of the saved images, in case of corrupted or missing downloads
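
One possible way to use `urls.txt` after an interrupted run is to list the URLs that have no matching file in the gallery directory. The sketch below rests on several assumptions that the source does not confirm: the helper name `missing_downloads` is hypothetical, the last path segment of each URL is taken to be the saved file name, and a saved file may carry a numeric prefix. It does not detect corrupted files.

```python
import os
import tempfile
from urllib.parse import urlparse


def missing_downloads(urls, gallery_dir):
    """Return the URLs whose file name matches no file in gallery_dir."""
    existing = os.listdir(gallery_dir)
    missing = []
    for url in urls:
        # Assumption: the last path segment of the URL is the image file name.
        file_name = os.path.basename(urlparse(url).path)
        # endswith() also accepts saved names that carry a numeric prefix.
        if not file_name or not any(name.endswith(file_name) for name in existing):
            missing.append(url)
    return missing


# Demo against a throwaway directory holding one already saved image.
with tempfile.TemporaryDirectory() as gallery_dir:
    open(os.path.join(gallery_dir, "0001_a.jpg"), "w").close()
    urls = [
        "https://example.com/photos/a.jpg?size=full",
        "https://example.com/photos/b.jpg",
    ]
    still_missing = missing_downloads(urls, gallery_dir)

print(still_missing)  # ['https://example.com/photos/b.jpg']
```

If `urls.txt` holds one URL per line (also an assumption), the input list can be built by reading the file and stripping each non-empty line.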
In case of errors, see the log generated by the program; it might contain information about them.
See the `todo.md` file.
This project came to life because I needed to collect the images uploaded to our group to fill the galleries on our public site, with the original image captions included. I searched for existing solutions, but I did not find exactly what I was looking for. I found seeya's Facebook Gallery Downloader, but I couldn't make it work, because Facebook has changed since its latest commits. So I tried a more general approach, close to what I would do if I had to do it manually. That project inspired me to use Selenium; I updated the code to Python 3 and added my own tweaks.