Welcome to PDF2TEXTO✨,
This is an open-source project that deploys a web page where you can upload a PDF file and download its text content as a .txt file.
There are 2 versions available:
- PDF2texto.v.1: only works for PDFs with selectable text (the code is in this repo).
- PDF2texto.v.2: works for any type of PDF file (OCR implemented; the code is in the Streamlit_Colaboratory folder).
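For orientation, here is a minimal sketch of what the selectable-text path (v.1) could look like. It is not the code from this repo: it assumes the `pypdf` library for extraction, and the names `txt_name`, `extract_text` and `run_app` are hypothetical. Scanned PDFs without a text layer would come out empty here, which is the case v.2 covers with OCR.

```python
from pathlib import Path


def txt_name(pdf_name: str) -> str:
    """Derive the download file name, e.g. 'report.pdf' -> 'report.txt'."""
    return Path(pdf_name).with_suffix(".txt").name


def extract_text(pdf_file) -> str:
    """Concatenate the selectable text of every page.

    Assumption: pypdf is installed (pip install pypdf). `pdf_file` can be a
    path or a binary file-like object, such as a Streamlit upload.
    """
    from pypdf import PdfReader  # imported lazily so the helper above works without it

    reader = PdfReader(pdf_file)
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def run_app() -> None:
    """Hypothetical Streamlit page: upload a PDF, download its text."""
    import streamlit as st

    uploaded = st.file_uploader("Upload a PDF file", type="pdf")
    if uploaded is not None:
        text = extract_text(uploaded)
        st.download_button(
            "Download .txt",
            data=text,
            file_name=txt_name(uploaded.name),
            mime="text/plain",
        )
```

To try it, one option is to add a call to `run_app()` at the bottom of the script (say `app.py`, a placeholder name) and start it with `streamlit run app.py`.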
How does it work?
How to use it?
- Run the Colaboratory notebook cell by cell.
- The last cell will return an output similar to this one:
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://123.45.6.78:8501
External URL: http://12.345.678.90:8501
npx: installed 22 in 2.441s
your url is: https://nice-forks-start.loca.lt
- In 'External URL', copy the IP address that appears between http:// and the port (:8501). In this example: 12.345.678.90
- Click the link shown after 'your url is:'. A new window will open.
- Paste the IP address you copied in the previous step and click 'Submit'. The app will start running ✨✨✨
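If you would rather not pick the address out by eye, the copy step can be sketched in Python. This is only an illustration: the helper name `external_ip` is made up, and it assumes the notebook output has the same shape as the example above.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def external_ip(streamlit_output: str) -> Optional[str]:
    """Return the host of the 'External URL' line, or None if there is none."""
    match = re.search(r"External URL:\s*(\S+)", streamlit_output)
    if match is None:
        return None
    # urlparse splits 'http://host:port' and .hostname drops the port
    return urlparse(match.group(1)).hostname


sample = """\
Local URL: http://localhost:8501
Network URL: http://123.45.6.78:8501
External URL: http://12.345.678.90:8501
"""

print(external_ip(sample))  # prints 12.345.678.90
```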
I used this tutorial as a starting point for building the website with Streamlit.
This is the second part of the project. You can upload a .txt file, select the source and target languages, and translate it automatically! Type each language as it appears in this list. The models used are found here, in Helsinki-NLP/Opus-MT.
Not all languages are supported, as this is a prototype. If you want more, you can add them in the code! ✨
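As a rough idea of how a language pair maps to a model, here is a sketch that assumes the Hugging Face `transformers` library and the Opus-MT naming pattern `Helsinki-NLP/opus-mt-<source>-<target>`. The `LANGUAGE_CODES` table and the function names are hypothetical and not taken from this project; adding a language would mean adding an entry to such a table, provided a model exists for that pair on the Hub.

```python
# Hypothetical table mapping the language a user types to its language code.
LANGUAGE_CODES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
}


def opus_mt_model_name(source: str, target: str) -> str:
    """Build the Hub model id for a pair of language names."""
    try:
        src, tgt = LANGUAGE_CODES[source], LANGUAGE_CODES[target]
    except KeyError as missing:
        raise ValueError(f"Unsupported language: {missing}") from None
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(text: str, source: str, target: str) -> str:
    """Translate text with the matching Opus-MT model.

    Assumption: transformers and a backend such as PyTorch are installed.
    The model is downloaded on first use.
    """
    from transformers import pipeline  # lazy import: only needed when translating

    translator = pipeline("translation", model=opus_mt_model_name(source, target))
    return translator(text)[0]["translation_text"]
```

For example, `translate("Hello, world!", "English", "Spanish")` would load `Helsinki-NLP/opus-mt-en-es`. Long files would likely need to be split into smaller chunks first, since these models have a limited input length.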
You may need an account on Hugging Face in order to get your own Access Token.