PDF & Images Lab

A project to explore libraries for extracting text from PDFs, such as pdfminer, pdfplumber, PyMuPDF, PyPDF2, pypdfium2 and Mathpix.
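Comparing these libraries mostly means running each one on the same file and looking at what comes out. The sketch below is one way to do that. It is not taken from the repository's notebooks. The wrapper names (extract_pdfminer, compare_extractors, and so on) are hypothetical, and only four of the libraries are wrapped. Each wrapper imports its library lazily, so the harness still loads when some of them are missing.

```python
import time
from typing import Callable, Dict

# Hypothetical wrappers: each one imports its library lazily, so a missing
# package only fails that extractor instead of the whole script.

def extract_pdfminer(path: str) -> str:
    from pdfminer.high_level import extract_text  # provided by pdfminer.six
    return extract_text(path)

def extract_pdfplumber(path: str) -> str:
    import pdfplumber
    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

def extract_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF's import name
    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)

def extract_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "pdfminer": extract_pdfminer,
    "pdfplumber": extract_pdfplumber,
    "pymupdf": extract_pymupdf,
    "pypdf2": extract_pypdf2,
}

def compare_extractors(path: str, extractors: Dict[str, Callable[[str], str]]) -> Dict[str, dict]:
    """Run every extractor on the same PDF and record output size and timing."""
    results: Dict[str, dict] = {}
    for name, extract in extractors.items():
        start = time.perf_counter()
        try:
            text = extract(path)
            results[name] = {"ok": True, "chars": len(text),
                             "seconds": time.perf_counter() - start}
        except Exception as exc:  # missing library, unreadable file, ...
            results[name] = {"ok": False, "error": repr(exc),
                             "seconds": time.perf_counter() - start}
    return results
```

For example, compare_extractors("sample.pdf", EXTRACTORS) runs all four wrappers, where "sample.pdf" stands in for any local file. Character counts are only a rough first signal. Reading the extracted text itself is still needed to judge layout handling.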

I also explore libraries for extracting text from images, such as EasyOCR, pytesseract, Hugging Face transformers and Donut.
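The two classic OCR engines can sit behind a single entry point. The sketch below assumes a hypothetical helper, ocr_image, which is not part of the repository. It covers only pytesseract and EasyOCR. The transformers and Donut approaches are left out because they also need a model checkpoint and a processor. pytesseract is a wrapper around the Tesseract binary, so Tesseract has to be installed on the system as well.

```python
from typing import List, Optional

SUPPORTED_ENGINES = ("pytesseract", "easyocr")

def ocr_image(path: str, engine: str = "pytesseract",
              languages: Optional[List[str]] = None) -> str:
    """Hypothetical helper: return the text recognised in an image file."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine {engine!r}; expected one of {SUPPORTED_ENGINES}")
    if engine == "pytesseract":
        import pytesseract  # requires the Tesseract executable on the PATH
        from PIL import Image
        with Image.open(path) as img:
            return pytesseract.image_to_string(img)
    import easyocr
    # EasyOCR loads its detection and recognition models when the Reader is built
    reader = easyocr.Reader(languages or ["en"])
    # detail=0 returns only the recognised strings, without boxes or confidences
    return "\n".join(reader.readtext(path, detail=0))
```

The engine name is validated before anything is imported. As a result, a typo fails immediately with a clear message instead of surfacing as an unrelated import error.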

Setup

Step 1. Navigate to the root directory of the repository and create a new conda environment for development:

conda create -n <your_env_name> python=3.12 -y && conda activate <your_env_name>

Step 2. Install Poetry:

conda install -c conda-forge poetry

Step 3. Install ipywidgets so that interactive widgets render in the notebooks:

conda install -n <your_env_name> ipywidgets

Step 4. Install all the dependencies listed in pyproject.toml (the --no-root flag skips installing the project itself as a package):

poetry install --no-root

Open the notebooks and select your conda environment as the kernel to run the cells.
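Before opening the notebooks, a quick check can confirm that the active environment actually contains the libraries. The sketch below uses only the standard library. The module list is an assumption inferred from the folder names, not read from pyproject.toml. The names are import names, which differ from package names in some cases: PyMuPDF imports as fitz, and pdfminer.six imports as pdfminer.

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Assumed import names, inferred from the repository's folder names
EXPECTED_MODULES = ("pdfminer", "pdfplumber", "fitz", "PyPDF2", "pypdfium2",
                    "easyocr", "pytesseract", "transformers", "ipywidgets")

def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which modules can be found, without actually importing them."""
    status: Dict[str, bool] = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status

if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} (the setup targets 3.12)")
    for name, found in check_modules(EXPECTED_MODULES).items():
        print(f"{'ok     ' if found else 'MISSING'} {name}")
```

Using find_spec instead of import keeps the check fast. Heavy packages such as transformers are located on disk but not loaded.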

Repository structure

donut
img_easyocr
img_pytesseract
img_transformers
pdf_mathpix
pdfminer
pdfplumber
pyMuPDF
pyPDF2
pypdfium2
.gitignore
README.md
poetry.lock
pyproject.toml