Chat Nio: possible causes of a network error when configuring the custom file parsing service #6

dqzboy · 2024-05-17T15:48:30Z

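After deploying chatnio-blob-service, uploading images through its own front end works fine. However, once the service address is configured in the chatnio admin panel, uploading an image fails immediately with an error, and the logs show no related output.

chatnio-blob-service is deployed with Docker.
chatnio reports a network error whether it is deployed with Docker or built from source.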

zmh-program · 2024-05-29T09:19:31Z

Please provide your browser logs. The possible causes of this issue are:

Mixed Content error: browsers do not allow HTTPS sites to initiate HTTP requests.
CORS issue: this is not a problem by default, but a misconfigured cross-origin resource sharing setup can cause it.
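
Both causes can be told apart from the two URLs alone. The following is a minimal sketch, not part of chatnio or blob-service: `origin` and `diagnose` are hypothetical helper names, and the URLs in the usage note are placeholders.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that browsers compare."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def diagnose(page_url: str, service_url: str) -> list:
    """List the browser checks a request from page_url to service_url runs into.

    Simplification: some browsers exempt loopback addresses such as
    localhost from mixed-content blocking; that is not modeled here.
    """
    findings = []
    page, service = origin(page_url), origin(service_url)
    if page[0] == "https" and service[0] == "http":
        findings.append("mixed-content")  # HTTPS page calling an HTTP endpoint
    if page != service:
        findings.append("cross-origin")  # CORS headers are required
    return findings
```

For example, `diagnose("https://chat.example.com/admin", "http://192.168.1.10:8000/upload")` reports both problems, while a service proxied under the same HTTPS origin as the page reports none.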

dqzboy · 2024-05-30T13:35:43Z

Thanks, it's resolved.

zmh-program · 2024-05-30T13:42:09Z

Marked as a pinned issue.

kitaev-chen · 2024-12-03T11:46:24Z

> Thanks, it's resolved.

??? How did you solve it?

zmh-program · 2024-12-03T11:58:25Z

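> Thanks, it's resolved.
>
> ??? How did you solve it?

I'd suggest sharing the details of your setup; as it stands, there is no useful information to go on.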

dqzboy · 2024-12-03T12:00:20Z

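> Thanks, it's resolved.
>
> ??? How did you solve it?

It was a cross-origin (CORS) problem. Either configure the service with its public address directly, or put it behind an nginx proxy.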

kitaev-chen · 2024-12-03T12:43:23Z

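> Thanks, it's resolved.
>
> ??? How did you solve it?
>
> It was a cross-origin (CORS) problem. Either configure the service with its public address directly, or put it behind an nginx proxy.

Thanks. If I only want to experiment on localhost, is there a way to make it work? I merged the blob service into the searxng compose file (which includes caddy) and set - CORS_ALLOW_ORIGINS=*, but it still does not work.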

zmh-program · 2024-12-03T12:59:30Z

Press F12 and check what error the request reports.
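
Outside the browser, the CORS part of that check can be approximated by sending a preflight request and reading the response headers. This is a rough sketch using only the Python standard library; `build_preflight`, `allows_origin`, and `probe` are hypothetical names, and the upload path and origins are placeholders for your own values.

```python
import urllib.error
import urllib.request


def build_preflight(service_url: str, page_origin: str, method: str = "POST"):
    """Build the OPTIONS request a browser sends before a cross-origin upload."""
    return urllib.request.Request(
        service_url,
        method="OPTIONS",
        headers={
            "Origin": page_origin,
            "Access-Control-Request-Method": method,
        },
    )


def allows_origin(response_headers: dict, page_origin: str) -> bool:
    """True if Access-Control-Allow-Origin is '*' or echoes the page origin.

    Simplification: credentialed requests do not accept '*'; not modeled here.
    """
    lowered = {key.lower(): value for key, value in response_headers.items()}
    return lowered.get("access-control-allow-origin") in ("*", page_origin)


def probe(service_url: str, page_origin: str) -> bool:
    """Send the preflight; connection failures propagate as URLError."""
    request = build_preflight(service_url, page_origin)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return allows_origin(dict(response.headers.items()), page_origin)
    except urllib.error.HTTPError as err:
        # Error responses still carry headers worth inspecting.
        return allows_origin(dict(err.headers.items()), page_origin)
```

A call such as `probe("http://192.168.1.10:8000/upload", "http://localhost:3000")` raising `URLError` points at addressing or networking rather than CORS, since the service was never reached.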

kitaev-chen · 2024-12-03T13:24:33Z

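> Press F12 and check what error the request reports.

Thanks for the tip, it was very helpful. It was a host.docker.internal kind of problem. I had in fact already switched the compose file to the LAN IP and restarted, but surprisingly an F5 refresh was not enough; I had to close the tab and open it again.

Small files mostly work now, but a 9-page paper still fails: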
400 Client Error: Bad Request for url: http://192.168.xxx:xxx/ocr/predict-by-file

zmh-program · 2024-12-03T13:28:02Z

The 400 Client Error is probably a bug in paddleocr-api.

kitaev-chen · 2024-12-03T13:30:48Z

Possibly. I wonder whether paddleocr could be swapped for zerox.

zmh-program · 2024-12-03T13:33:18Z

Please open a new issue for that. PRs are welcome too.

kitaev-chen · 2024-12-03T22:53:00Z

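> The 400 Client Error is probably a bug in paddleocr-api.

Figured it out. The images extracted from my test PDF come in various formats, such as jpeg and jb2. paddleocr-api does not support the former, and blob-service decides the latter is not an image. No wonder there were so many problems.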

zmh-program · 2024-12-04T00:36:01Z

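> The 400 Client Error is probably a bug in paddleocr-api.
>
> Figured it out. The images extracted from my test PDF come in various formats, such as jpeg and jb2. paddleocr-api does not support the former, and blob-service decides the latter is not an image. No wonder there were so many problems.

I learned something new. I'll fix the latter.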

zmh-program · 2024-12-04T01:05:20Z

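No, wait, you got me all turned around, haha. This problem is probably not caused by blob's detector. (I will still add support for these formats, though, since direct uploads are affected.)

I went back and looked at the logic that extracts images from a PDF: no check is made after calling get_images.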

blob-service/handlers/pdf.py

Lines 30 to 49 in 726ad05

    for image_instance in page.get_images(full=True):  # get all images on the page
        cursor += 1
        xref = image_instance[0]  # get the xref of the image
        image = doc.extract_image(xref)  # extract the image
        data = image['image']  # get the image data
        suffix = image.get('ext', '')  # get the image extension
        image_name = f"{filename}_extracted_{cursor}.{suffix}"  # create a name for the image
        io = BytesIO(data)
        io.name = image_name
        io.seek(0)
        # create a file-like object for the image
        image_file = UploadFile(io, filename=image_name)
        stack.append(await process_image(image_file, enable_ocr=enable_ocr, enable_vision=enable_vision, not_raise=True))
        print(f"[pdf] extracted image: {image_name} (page: {page.number}, cursor: {cursor}, max: {PDF_MAX_IMAGES})")
        if PDF_MAX_IMAGES != -1 and cursor >= PDF_MAX_IMAGES:
            break
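
One way to add the missing check is to look at the `ext` value of each extracted image before handing it to OCR. The sketch below is an illustration only, not the fix that was merged: `OCR_SUPPORTED_EXTS` is a placeholder set (the formats the OCR backend really accepts would need to be confirmed), and `split_by_support` is a hypothetical helper operating on dicts shaped like the `extract_image` results in the excerpt above.

```python
# Placeholder set, assumed for illustration; confirm against the OCR backend.
OCR_SUPPORTED_EXTS = {"png", "bmp"}


def split_by_support(images, supported=OCR_SUPPORTED_EXTS):
    """Partition extracted images into (accepted, skipped) by their 'ext' field."""
    accepted, skipped = [], []
    for image in images:
        ext = image.get("ext", "").lower()  # a missing extension counts as unsupported
        (accepted if ext in supported else skipped).append(image)
    return accepted, skipped
```

Skipped images could then be logged or converted instead of producing a 400 response from the OCR service.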

image.py implements the process logic, and it contains no is_image check.

blob-service/handlers/image.py

Lines 21 to 32 in 726ad05

    async def process(file: UploadFile, enable_ocr: bool, enable_vision: bool, not_raise: bool = False):
        """Process image."""
        if enable_ocr:
            return create_ocr_task(file)
        if not enable_vision:
            if not not_raise:
                return ""
            raise ValueError("Trying to upload image with Vision disabled.")
        return await process_image(file)

As for where the image suffix checker lives: it should exist only in the switch-style branch in processor.py; process itself does no check.

blob-service/handlers/processor.py

Lines 57 to 62 in 726ad05

    elif image.is_image(filename):
        return "image", await image.process(
            file,
            enable_ocr=enable_ocr,
            enable_vision=enable_vision,
        )
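
To illustrate what a suffix check of this kind looks like, and why images extracted from a PDF bypass it when they are passed to process directly, here is a minimal sketch. The extension set and both function names are assumptions; the real `image.is_image` implementation is not shown in this thread.

```python
import os

# Assumed list for illustration; the real checker's extensions are not shown here.
ASSUMED_IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".webp"}


def has_image_suffix(filename: str) -> bool:
    """Case-insensitive check of the final extension against the assumed set."""
    return os.path.splitext(filename)[1].lower() in ASSUMED_IMAGE_EXTS


def guard_image(filename: str) -> str:
    """Raise early instead of handing an unrecognized file to OCR or vision."""
    if not has_image_suffix(filename):
        raise ValueError(f"not a recognized image: {filename!r}")
    return filename
```

With this set, a name such as `paper_extracted_3.jb2` is rejected, and so is a name ending in a bare dot, which is what the naming pattern in pdf.py produces when the extracted image has no `ext` value.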

kitaev-chen · 2024-12-05T08:08:43Z

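Oh, thanks a lot! I only took a quick look myself, but more complete format coverage is a good thing anyway. I don't know how many formats paddleocr supports; I'll go back and look at changing PaddleOCRFastAPI as well.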

dqzboy closed this as completed May 18, 2024

dqzboy reopened this May 18, 2024

dqzboy closed this as completed May 30, 2024

zmh-program changed the title ~~Configuring this project's address in the chatnio admin panel immediately gives a network error~~ Chat Nio: possible causes of a network error when configuring the custom file parsing service May 30, 2024

zmh-program pinned this issue May 30, 2024

zmh-program reopened this Dec 4, 2024

zmh-program mentioned this issue Dec 4, 2024

Add support for JBIG2 and PJP images #13

Merged

zmh-program closed this as completed in #13 Dec 4, 2024

zmh-program closed this as completed in 3de3bca Dec 4, 2024
