More recent Tesseract build #7

Closed
StephenRUK opened this issue Apr 16, 2019 · 10 comments

Comments

@StephenRUK

Hi! The version of Tesseract in this package has a bug that affects our software. I tried compiling the most recent Tesseract 4 DLL with cppan and replacing the one in this package, but without success.

It would be great to have a new build with the latest Tesseract release, or alternatively instructions on how to build your conda package against a newer release ourselves.

Thanks!

@simonflueckiger
Owner

Hi @StephenRUK! I will look into it today or tomorrow if I find some time. Could you let me know which version I should build for you to test? Python 3.7, 64-bit? And, just out of curiosity, is there already an issue about this bug on https://github.com/tesseract-ocr/tesseract where I could find out more? :)

@StephenRUK
Author

We are working with symbol-level bounding boxes (to do something like this), and they are sometimes inaccurate with the new Tesseract 4 engine. I found a few issues about it: in tesseract-ocr/tesseract#2240 the bounding boxes work in the 4.1 RC, whereas tesseract-ocr/tesseract#2024, which describes the same problem, is still open.
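For context, here is a minimal sketch of how symbol-level boxes can be read with tesserocr, assuming Pillow is installed and using `PyTessBaseAPI.GetComponentImages` with `RIL.SYMBOL`. The `overlapping_neighbours` check is a hypothetical heuristic added for illustration, not part of tesserocr or Tesseract: on a line of well-separated characters, consecutive symbol boxes should not intersect, so overlaps are one cheap way to flag suspicious output.

```python
from typing import Dict, List, Tuple

# A box as returned by tesserocr: a dict with "x", "y", "w" and "h" keys.
Box = Dict[str, int]


def symbol_boxes(image_path: str) -> List[Box]:
    """Return one bounding box per recognised symbol (character)."""
    # Imported lazily so the pure helper below works without tesserocr/Pillow.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI() as api:
        api.SetImage(Image.open(image_path))
        # Each component is a tuple (image, box, block_id, paragraph_id).
        components = api.GetComponentImages(RIL.SYMBOL, True)
        return [component[1] for component in components]


def overlapping_neighbours(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Hypothetical sanity check: index pairs of consecutive boxes that overlap.

    Boxes that merely touch (zero-width overlap) are not reported.
    """
    pairs = []
    for i in range(len(boxes) - 1):
        a, b = boxes[i], boxes[i + 1]
        # Width and height of the intersection rectangle (<= 0 means none).
        overlap_x = min(a["x"] + a["w"], b["x"] + b["w"]) - max(a["x"], b["x"])
        overlap_y = min(a["y"] + a["h"], b["y"] + b["h"]) - max(a["y"], b["y"])
        if overlap_x > 0 and overlap_y > 0:
            pairs.append((i, i + 1))
    return pairs
```

For example, `overlapping_neighbours(symbol_boxes("input.png"))` returns the index pairs worth inspecting; `input.png` is a placeholder file name.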

So I am pinning my hopes on 4.1.0-rc1, which is tagged here: https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.0-rc1. Target: Python 3.7, 64-bit, Windows.

@simonflueckiger
Owner

Alright, here is a wheel bundled with Tesseract 4.1.0-rc1. Would you mind giving it a try and reporting back whether it works as intended?
tesserocr-2.4.0-cp37-cp37m-win_amd64.zip
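pip only installs a wheel whose compatibility tags match the running interpreter, which is why the Python version and bitness matter for these builds. The sketch below uses only the standard library and hypothetical helper names: it splits a wheel file name into its PEP 427 tags and builds the tags a Windows CPython interpreter would need. It ignores the file extension, so the `.zip` attachment name above parses too. For anything beyond a quick check, `packaging.tags` is the more robust choice.

```python
import os
import struct
import sys
from typing import NamedTuple


class WheelTags(NamedTuple):
    python: str
    abi: str
    platform: str


def wheel_tags(filename: str) -> WheelTags:
    """Return the python/abi/platform tags from a wheel file name (PEP 427)."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    parts = stem.split("-")
    # name-version[-build]-python-abi-platform: the tags are the last three parts.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    python, abi, plat = parts[-3:]
    return WheelTags(python, abi, plat)


def windows_cpython_tags(major: int, minor: int, pointer_bits: int) -> WheelTags:
    """Tags a Windows CPython wheel would need for the given interpreter."""
    python = f"cp{major}{minor}"
    # CPython before 3.8 used an "m" (pymalloc) suffix in its ABI tag.
    abi = python + ("m" if (major, minor) < (3, 8) else "")
    plat = "win_amd64" if pointer_bits == 64 else "win32"
    return WheelTags(python, abi, plat)


if __name__ == "__main__":
    running = windows_cpython_tags(
        sys.version_info.major, sys.version_info.minor, struct.calcsize("P") * 8
    )
    wheel = wheel_tags("tesserocr-2.4.0-cp37-cp37m-win_amd64.whl")
    print("match" if running == wheel else f"mismatch: {running} vs {wheel}")
```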

@StephenRUK
Author

Thanks for the new build, Simon. Installation of the whl works fine.

As for the Tesseract issue, it looks like the symbol bounding boxes have changed (I got some improvements) but are not yet fixed. If you're interested, see my example below:

Input:
somecharacters

Output symbol bounding box image:

[Image: output symbol bounding boxes]

@simonflueckiger
Owner

I'm glad to hear that the new build made things a bit better, though it's unfortunate that the issue is still around. Is there anything I can do to help at this point?

@StephenRUK
Author

If it's simple enough, could you explain how to build a new .whl, or how to replace the Tesseract DLL with an updated one after installation? I noticed your Tesseract DLL has a different name from the original one. Otherwise, I'll check back here when there's a new release 😄

@simonflueckiger
Owner

Here is the most recent build of the Windows tesserocr wheel on my AppVeyor: https://ci.appveyor.com/project/simonflueckiger/tesserocr-windows-build-8c9xt/builds/23914383. If all the dependencies are met on your system, it's as simple as calling python setup.py bdist_wheel. Unfortunately, tesserocr.pyd (compiled from tesserocr.pyx) generally cannot interface with a Tesseract DLL other than the one it was built against [1], so it's not as simple as replacing the DLL file in the site-packages/tesserocr folder.
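Since the compiled extension is tied to the Tesseract library it was built with, one way to confirm which Tesseract a given wheel actually ships is to ask tesserocr itself via `tesserocr.tesseract_version()`. The parser below is a sketch that assumes the banner's first line looks like `tesseract 4.1.0-rc1`; the helper names are hypothetical, and the suffix is returned as a plain string, so pre-releases such as `rc1` are not ordered against final releases.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int, str]  # (major, minor, patch, suffix)


def parse_tesseract_version(banner: str) -> Optional[Version]:
    """Extract (major, minor, patch, suffix) from a Tesseract version banner.

    Assumes a line like "tesseract 4.1.0-rc1"; a "v" before the number is
    tolerated. Returns None if nothing matches.
    """
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)(?:-(\S+))?", banner)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.group(1, 2, 3))
    return major, minor, patch, match.group(4) or ""


def installed_tesseract_version() -> Optional[Version]:
    """Version of the Tesseract library the installed tesserocr reports."""
    import tesserocr  # imported lazily; requires the tesserocr wheel

    return parse_tesseract_version(tesserocr.tesseract_version())
```

After installing a new wheel, `installed_tesseract_version()` would be expected to return `(4, 1, 0, 'rc1')` for the 4.1.0-rc1 build, provided its banner follows the assumed format.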

I'm sorry I can't give you a more satisfactory answer, but feel free to reopen this issue if you want me to build against a later Tesseract commit that fixes the bug.

@StephenRUK
Author

Perfect, I'll give the build a try when it's time. Thanks again!

@StephenRUK
Author

Hi @simonflueckiger, in the meantime a fix for the character bounding boxes has landed. It didn't make it into the 4.1 release, but it was merged two days ago: tesseract-ocr/tesseract#2576. A new build would be great whenever you get around to it 👍 (still x64, Python 3.7)

@simonflueckiger
Owner

Hi @StephenRUK, just letting you know that I'm currently looking into this. I'm running into some build issues, however. Stay tuned.
