Enable manually specifying the desired OPUS model? #74

Open
MoritzLaurer opened this issue Jul 31, 2022 · 4 comments

@MoritzLaurer

I really like the library, great work!
Is there a way to manually specify which OPUS model to use? For example, EasyNMT with OPUS currently does not support English as the source and Portuguese as the target language, because by default it tries to download 'opus-mt-en-pt', which does not exist.
There is, however, an en-to-pt model on the hub now (https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-pt) with a slightly different name. I don't know how to tell EasyNMT to use this specific model instead of throwing the following error:

OSError: Helsinki-NLP/opus-mt-en-pt is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

@lucasfariaslf
lucasfariaslf commented Nov 10, 2022

For some sentences, model.translate also seems to construct a non-existent model identifier. For example, for some Dutch sentences, instead of the correct Helsinki-NLP/opus-mt-nl-en it looks for Helsinki-NLP/opus-mt-nds-en, which does not exist, and then throws the same error @MoritzLaurer mentioned.

@glowinthedark
glowinthedark commented Dec 12, 2022

You can bypass EasyNMT entirely and load the model directly, for example with transformers (pip install transformers):

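from transformers import pipeline

# Load the en->pt checkpoint directly by its full name on the hub
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-pt")
# The pipeline returns a list of dicts, one per input text
print(pipe("Nobody Expects the Spanish Inquisition")[0]["translation_text"])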

In this case, however, you need to handle sentence splitting yourself, so it is not as convenient as EasyNMT. Alternatively, you can use EasyNMT.sentence_splitting(): https://github.com/UKPLab/EasyNMT/blob/main/easynmt/EasyNMT.py#L444
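
Building on the workaround above, here is a minimal sketch of choosing the checkpoint name per language pair before handing it to transformers. The default pattern Helsinki-NLP/opus-mt-{source}-{target} is taken from the error messages in this thread. The override table and the helper names (OPUS_OVERRIDES, resolve_opus_model, translate_text) are hypothetical and are not part of EasyNMT or transformers.

```python
from typing import Dict, Optional, Tuple

# Hypothetical override table: language pairs whose checkpoint on the hub
# does not follow the default "opus-mt-{source}-{target}" naming pattern.
OPUS_OVERRIDES: Dict[Tuple[str, str], str] = {
    ("en", "pt"): "Helsinki-NLP/opus-mt-tc-big-en-pt",
}


def resolve_opus_model(
    source_lang: str,
    target_lang: str,
    overrides: Optional[Dict[Tuple[str, str], str]] = None,
) -> str:
    """Return the override for a language pair, else the default-pattern name."""
    table = OPUS_OVERRIDES if overrides is None else overrides
    default_name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    return table.get((source_lang, target_lang), default_name)


def translate_text(text: str, source_lang: str, target_lang: str) -> str:
    """Translate one text with the resolved checkpoint (downloads the model)."""
    # Imported lazily so the name resolution above works without transformers.
    from transformers import pipeline

    pipe = pipeline("translation", model=resolve_opus_model(source_lang, target_lang))
    return pipe(text)[0]["translation_text"]
```

Calling translate_text("Nobody Expects the Spanish Inquisition", "en", "pt") would load the tc-big checkpoint rather than the missing opus-mt-en-pt. Sentence splitting still has to be handled separately, as noted above.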

@tansaku
tansaku commented Feb 20, 2023

I'm seeing this same problem:

Helsinki-NLP/opus-mt-pt-en is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

This is a bit of a blocker for me, as I'm trying to translate several different languages and it would be nice if EasyNMT handled them all correctly. Does anyone know how to go about fixing this?

@wasifferoze

Explicitly passing the source_lang argument resolved this problem for me.
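
A minimal sketch of that fix, assuming the wrong identifiers (such as opus-mt-nds-en for Dutch input) come from automatic language detection when no source language is given. The helper name translate_with_known_source and the sample sentence are illustrative only; model.translate with its source_lang and target_lang arguments is the EasyNMT call discussed in this thread.

```python
from typing import Any, List


def translate_with_known_source(
    model: Any, sentences: List[str], source_lang: str, target_lang: str
) -> List[str]:
    """Call model.translate with an explicit source language.

    `model` is expected to be an EasyNMT instance (or any object exposing the
    same translate method). Passing source_lang means no language detection
    is needed to pick the model for this call.
    """
    return model.translate(
        sentences, source_lang=source_lang, target_lang=target_lang
    )


def example() -> None:
    """Illustrative usage; needs `pip install easynmt` and downloads a model."""
    from easynmt import EasyNMT

    model = EasyNMT("opus-mt")
    print(translate_with_known_source(model, ["Dit is een zin."], "nl", "en"))
```

This only helps when the source language is known in advance. For mixed-language input, the detected language would still need to be checked before translating.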
