Biblio is a command-line tool for extracting metadata from academic PDFs and renaming them according to a customisable format. It leverages the Google Gemini API for metadata extraction and provides a flexible way to manage your academic or research PDF library.
- Extracts metadata (authors, title, and year) from PDFs.
- Uses a Google Gemini LLM to perform the extraction.
- Renames PDFs in a customisable format.
- Processes multiple files in batches.
- Skips files with missing metadata.
- Prevents filename conflicts by sanitizing invalid characters.
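The README does not say which characters count as invalid or what they are replaced with. The Rust sketch below assumes that the characters Windows forbids in file names (`< > : " / \ | ? *` plus control characters) are simply dropped, and that trailing spaces and dots are trimmed. `sanitize_filename` is a hypothetical name, not Biblio's actual function.

```rust
/// Characters Windows forbids in file names (Unix file systems only forbid
/// `/` and NUL, so this set is the stricter, portable choice).
const INVALID_CHARS: [char; 9] = ['<', '>', ':', '"', '/', '\\', '|', '?', '*'];

/// Hypothetical helper: drop characters that would make the file name invalid.
fn sanitize_filename(name: &str) -> String {
    let cleaned: String = name
        .chars()
        .filter(|c| !INVALID_CHARS.contains(c) && !c.is_control())
        .collect();
    // Windows also rejects names ending in a space or dot, so trim those,
    // plus any leading whitespace left behind by removed characters.
    cleaned
        .trim_end_matches(|c: char| c == ' ' || c == '.')
        .trim_start()
        .to_string()
}

fn main() {
    let title = "Smith, J. (2020). What Is AI? A Study: Part 1/2";
    // Prints: Smith, J. (2020). What Is AI A Study Part 12
    println!("{}", sanitize_filename(title));
}
```

Dropping characters rather than substituting a placeholder keeps names readable, but two titles that differ only in forbidden characters would map to the same file name.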
- Clone the Repository:

    git clone <repository_url>
    cd biblio
- Create the .env File: Create a .env file in the root directory of the project and add the following content, replacing the placeholders with your actual values:

    MODEL = gemini-2.0-flash-lite
    API_KEY = YOUR_GEMINI_API_KEY
    FORMAT = "{authors} ({year}). {title}"
  - MODEL: The name of the Gemini model to use (e.g., gemini-2.0-flash-lite).
  - API_KEY: Your Google Gemini API key.
  - FORMAT_STR: The format string for renaming files. You can use the following placeholders:
    - {authors}: The authors of the document.
    - {title}: The title of the document.
    - {year}: The year of the document.
    - If these properties are not found, default values are used. For example, {authors} defaults to Unknown Author.
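The .env format and placeholder substitution described above can be sketched in plain Rust using only the standard library. This is an illustration under stated assumptions, not Biblio's code: the project may use a dotenv crate instead, the parsing rules (skipping blank lines and `#` comments, stripping one pair of double quotes) are guesses, and only the `Unknown Author` default comes from this README; `Untitled` and `n.d.` are hypothetical. The `FORMAT` key is read because that is the name used in the sample .env content.

```rust
use std::collections::HashMap;

/// Parse `KEY = VALUE` lines. Assumed rules: blank lines and `#` comments are
/// skipped, whitespace around `=` is trimmed, one pair of `"` quotes is removed.
fn parse_env(text: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split at the first `=` only, so values may themselves contain `=`.
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim();
            let value = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .unwrap_or(value);
            vars.insert(key.trim().to_string(), value.to_string());
        }
    }
    vars
}

/// Fill the placeholders, falling back to defaults for missing fields.
/// "Unknown Author" is from the README; "Untitled" and "n.d." are hypothetical.
/// Uses sequential `replace` calls, so a field value that itself contains a
/// placeholder such as `{title}` would be substituted again.
fn render(format: &str, authors: Option<&str>, title: Option<&str>, year: Option<&str>) -> String {
    format
        .replace("{authors}", authors.unwrap_or("Unknown Author"))
        .replace("{title}", title.unwrap_or("Untitled"))
        .replace("{year}", year.unwrap_or("n.d."))
}

fn main() {
    let sample = "MODEL = gemini-2.0-flash-lite\n\
                  API_KEY = YOUR_GEMINI_API_KEY\n\
                  FORMAT = \"{authors} ({year}). {title}\"\n";
    let config = parse_env(sample);
    let format = &config["FORMAT"];

    // Prints: Smith, J. (2020). Research Study
    println!("{}", render(format, Some("Smith, J."), Some("Research Study"), Some("2020")));
    // Prints: Unknown Author (2020). Research Study
    println!("{}", render(format, None, Some("Research Study"), Some("2020")));
}
```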
- Build the Project:

    cargo build --release
Run Biblio on one or more PDF files:

    biblio file1.pdf file2.pdf ...
- Processes multiple PDFs at once.
- Extracted metadata is used to rename the files automatically.
- Example output:

    > biblio paper1.pdf paper2.pdf
    - Loaded config. Model: gemini-2.0-flash-lite
    - Processing 2 file(s)
    - Processing batch: #1
    - File: "paper1.pdf"
    - Renamed "paper1.pdf" to "Smith, J. (2020). Research Study"
    - File: "paper2.pdf"
    - Renamed "paper2.pdf" to "Doe, J., & Brown, A. (2018). AI in Healthcare.pdf"
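The batch flow described above (group the input files, skip any file whose metadata could not be extracted, rename the rest) might be sketched as follows. Everything here is hypothetical: the README does not document the batch size, so `BATCH_SIZE = 5` is an assumption; `target_path` and `plan_renames` are illustrative names; the Gemini call is replaced by stand-in values; and the sketch only prints the planned renames rather than calling `std::fs::rename`.

```rust
use std::path::{Path, PathBuf};

// Hypothetical batch size: the README does not say how files are grouped.
const BATCH_SIZE: usize = 5;

/// Destination path: same directory as `source`, new stem, `.pdf` extension.
fn target_path(source: &Path, new_stem: &str) -> PathBuf {
    source.with_file_name(format!("{new_stem}.pdf"))
}

/// Pair each file with its destination, skipping files whose metadata
/// lookup produced no name (`None`).
fn plan_renames(files: &[(PathBuf, Option<String>)]) -> Vec<(PathBuf, PathBuf)> {
    files
        .iter()
        .filter_map(|(src, stem)| stem.as_ref().map(|s| (src.clone(), target_path(src, s))))
        .collect()
}

fn main() {
    // Stand-in results; in the real tool these would come from the Gemini API.
    let extracted = vec![
        (PathBuf::from("paper1.pdf"), Some("Smith, J. (2020). Research Study".to_string())),
        (PathBuf::from("paper2.pdf"), None), // missing metadata: skipped
        (PathBuf::from("papers/paper3.pdf"), Some("Doe, J. (2018). AI".to_string())),
    ];
    for (n, batch) in extracted.chunks(BATCH_SIZE).enumerate() {
        println!("batch #{}", n + 1);
        for (src, dst) in plan_renames(batch) {
            // A real run would call std::fs::rename(&src, &dst) here.
            println!("{} -> {}", src.display(), dst.display());
        }
    }
}
```

Using `Path::with_file_name` keeps each renamed file in its original directory; the stem should already be sanitized, since a `/` left in a title would otherwise be read as a path separator.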
MIT License. See the LICENSE file for details.