Unable to process newer mrd/gramtab files #27

Open
Evengard opened this issue Jul 24, 2022 · 4 comments

@Evengard

I tried to generate a new morph.info file from the newer mrd/gramtab files at https://github.com/sokirko74/morph_dict/tree/master/data/Russian. Unfortunately, it failed: the resulting morph.info was far too small and contained nothing useful.
Could you please fix the parsing of these files so that a fresher, more complete dictionary can be used?

@Evengard
Author

Welp, the fix is actually quite easy. The newer gramtab/mrd files are in UTF-8 instead of Win1251. Changing the encoding resulted in successful generation of a newer morph.info.
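
To illustrate why the wrong encoding produces an unusable result, here is a minimal Python sketch (not the project's actual code, which is .NET). The helper names `detect_encoding` and `read_dictionary_text` and the sample word are hypothetical and used only for illustration. It assumes a file is either valid UTF-8 or Windows-1251, which is a heuristic rather than a guarantee.

```python
def detect_encoding(raw: bytes) -> str:
    """Guess the encoding: 'utf-8' if the bytes decode cleanly, else 'cp1251'.

    Assumption: the input is one of these two encodings. Cyrillic text in
    Windows-1251 is almost never valid UTF-8, so a failed UTF-8 decode is
    treated as a sign of the older encoding.
    """
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1251"


def read_dictionary_text(raw: bytes) -> str:
    """Decode dictionary bytes using the guessed encoding."""
    return raw.decode(detect_encoding(raw))


# Hypothetical sample word, not taken from the real mrd/gramtab files.
sample = "КОТ"
utf8_bytes = sample.encode("utf-8")
cp1251_bytes = sample.encode("cp1251")

# Reading UTF-8 bytes as Windows-1251 does not raise here; it silently
# yields different characters, so a parser would find no known entries.
mojibake = utf8_bytes.decode("cp1251")
```

Because the mis-decoded text differs from the original without raising an error, a generator that assumes Win1251 may run to completion and still emit a nearly empty morph.info, which would be consistent with the symptom described above.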

@ermakovm

> Welp, the fix is actually quite easy. The newer gramtab/mrd files are in UTF-8 instead of Win1251. Changing the encoding resulted in successful generation of a newer morph.info.

Could you explain which files need to be updated to get a newer version of the dictionaries?

@Evengard
Author

Changing these two to "utf-8" should be enough.

@Evengard
Author

Evengard commented Dec 17, 2022

Or just use this one: https://github.com/Evengard/LuceneNetRussianMorphologyNetCore/blob/master/LuceneNetRussianMorphology/Resources/ru_morph.info - which I already generated myself 5 months ago or so.
