PDF & images lab

A project exploring libraries for extracting text from PDFs, such as:

  • pdfminer
  • PyMuPDF
  • PyPDF2
  • pypdfium2
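
As a rough illustration of what PDF text extraction with one of these libraries can look like, here is a minimal sketch using PyMuPDF. It assumes the package is installed and importable as `fitz`. The function name `extract_text_pymupdf` is hypothetical and is not taken from this repository's notebook.

```python
def extract_text_pymupdf(pdf_path: str) -> str:
    """Return the plain text of every page in a PDF, joined by newlines."""
    # Imported lazily so the snippet can be loaded even where PyMuPDF is absent.
    import fitz  # PyMuPDF's import name (assumption: installed in the environment)

    with fitz.open(pdf_path) as doc:
        # get_text() with no arguments returns the page's plain text.
        return "\n".join(page.get_text() for page in doc)
```

A call such as `extract_text_pymupdf("sample.pdf")` would return the document's text as one string, where `sample.pdf` is a placeholder for any PDF on disk. Scanned PDFs without a text layer will likely come back empty, which is where the OCR libraries come in.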

I also explore libraries for extracting text from images, such as:

  • pytesseract
  • easyocr
  • transformers models from Hugging Face
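
For the image side, a comparable sketch with pytesseract might look like the following. It assumes `pytesseract` and Pillow are installed and that the Tesseract OCR engine itself is available on the system, since pytesseract only wraps it. The function name `ocr_image_pytesseract` is hypothetical.

```python
def ocr_image_pytesseract(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract OCR on an image file and return the recognized text."""
    # Imported lazily so the snippet can be loaded even where the packages are absent.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        # lang selects the Tesseract language data to use ("eng" = English).
        return pytesseract.image_to_string(img, lang=lang)
```

Calling `ocr_image_pytesseract("scan.png")` would return the recognized text, where `scan.png` is a placeholder path. Recognition quality depends heavily on image resolution and contrast, which is one reason to compare it against easyocr and transformer-based models.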

Setup

Step 1. Navigate to the root directory of the repository and create a new conda environment for development:

conda create -n <your_env_name> python=3.12 -y && conda activate <your_env_name>

Step 2. Install Poetry:

conda install -c conda-forge poetry

Step 3. Install ipywidgets:

conda install -n <your_env_name> ipywidgets

Step 4. Install all the dependencies:

poetry install --no-root
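
To confirm the setup worked, a small check like the one below can be run inside the activated environment. It is only a sketch: the import names are assumptions based on each package's usual top-level module, and whether every one of them is declared in this project's dependencies is not confirmed here. `check_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Display name -> top-level import name (assumed; adjust to the project's actual dependencies).
PACKAGES = {
    "pdfminer": "pdfminer",
    "PyMuPDF": "fitz",
    "PyPDF2": "PyPDF2",
    "pypdfium2": "pypdfium2",
    "pytesseract": "pytesseract",
    "easyocr": "easyocr",
    "transformers": "transformers",
    "ipywidgets": "ipywidgets",
}


def check_packages(packages=PACKAGES):
    """Map each display name to True if its module can be located, else False."""
    return {name: find_spec(module) is not None for name, module in packages.items()}


# Print one status line per package without actually importing it.
for name, available in check_packages().items():
    print(f"{name:<14}{'ok' if available else 'missing'}")
```

Any package reported as missing was either not installed by `poetry install --no-root` or is imported under a different name than assumed here.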

Usage

Open the notebook, select the conda environment you created as its kernel, and run the cells.