
Issue streaming Chinese characters using azure-ai-inference SDK #39565

Closed
bombert opened this issue Feb 5, 2025 · 12 comments
Assignees
Labels
AI Model Inference Issues related to the client library for Azure AI Model Inference (\sdk\ai\azure-ai-inference) bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@bombert

bombert commented Feb 5, 2025

Type of issue

Code doesn't work

Description

The Azure AI Foundry SDK (azure-ai-inference) cannot handle streaming responses that contain Chinese characters.

Bug Recreation

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
key = "<your_key>"
endpoint = "<end_point>"
model_name = "<model_name>"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key),
)

response = client.complete(
    stream=True,
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="9.11 and 9.8, which is greater? Please response in Chinese!!!")
    ],
    max_tokens=2048,
    model=model_name
)

for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")

FATAL

In azure/ai/inference/models/_patch.py, line 342:

line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))

throws a UnicodeDecodeError when element ends partway through a multi-byte UTF-8 character, e.g. b'\xe6\x8e\xa8\xe8\x8d' (the last two bytes are an incomplete character).
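A minimal reproduction of just the decode failure, independent of the SDK. The byte string below is '推' (b'\xe6\x8e\xa8') followed by the first two bytes of the three-byte encoding of '荐' (b'\xe8\x8d\x90'), i.e. a chunk boundary that falls mid-character:

```python
# '推' encodes to b'\xe6\x8e\xa8'; '荐' encodes to b'\xe8\x8d\x90'.
# A stream chunk cut after the first two bytes of '荐' cannot be
# decoded on its own:
partial = b"\xe6\x8e\xa8\xe8\x8d"

try:
    partial.decode("utf-8")
except UnicodeDecodeError as e:
    print(e.reason)  # → unexpected end of data
```

This is exactly the error shown in the traceback further down the thread: the server is free to split SSE chunks at arbitrary byte offsets, so any multi-byte script (Chinese, Japanese, emoji, …) can trigger it.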

Fixing Suggestion

Catch the exception, buffer the incomplete bytes, and prepend them to the next chunk.

try:
    if self._incomplete_element:
        element = self._incomplete_element + element
        self._incomplete_element = b""
    line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))
except UnicodeDecodeError:
    self._incomplete_element = element
    return False
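The same buffering can be expressed with the standard library's incremental decoder, which holds back a trailing partial multi-byte sequence instead of raising. This is a sketch of the general technique, not the SDK's actual patch:

```python
import codecs

# An incremental UTF-8 decoder buffers incomplete trailing bytes
# across calls instead of raising UnicodeDecodeError.
decoder = codecs.getincrementaldecoder("utf-8")()

# Simulate a stream split mid-character: the first chunk ends two
# bytes into the three-byte encoding of '荐'.
chunk1 = b"\xe6\x8e\xa8\xe8\x8d"   # '推' + first 2 bytes of '荐'
chunk2 = b"\x90"                   # final byte of '荐'

part1 = decoder.decode(chunk1)     # '推' — partial bytes are held back
part2 = decoder.decode(chunk2)     # '荐' — completed on the next chunk
print(part1 + part2)               # → 推荐
```

Compared with catching UnicodeDecodeError, the incremental decoder also handles the case where the complete prefix of a chunk is usable immediately, rather than deferring the whole chunk to the next read.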
@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 5, 2025
@bombert bombert changed the title azure ai foundry SDK inference cannot handle steam contains Chinese character azure ai foundry SDK inference cannot handle steaming contains Chinese character Feb 5, 2025
@bombert bombert changed the title azure ai foundry SDK inference cannot handle steaming contains Chinese character azure ai foundry SDK inference cannot handle the steamings containing Chinese character Feb 5, 2025
@xiangyan99 xiangyan99 added bug This issue requires a change to an existing behavior in the product in order to be resolved. Service Attention Workflow: This issue is responsible by Azure service team. AI and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Feb 5, 2025
@github-actions github-actions bot added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Feb 5, 2025

github-actions bot commented Feb 5, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @achauhan-scc @kingernupur @luigiw @needuv @paulshealy1 @singankit.

@kristapratico kristapratico added the Client This issue points to a problem in the data-plane of the library. label Feb 5, 2025
@kingernupur
Member

Tagging @jhakulin

@dargilco dargilco self-assigned this Feb 10, 2025
@dargilco dargilco added AI Model Inference Issues related to the client library for Azure AI Model Inference (\sdk\ai\azure-ai-inference) and removed AI labels Feb 10, 2025
@dargilco
Member

@bombert Thanks for reporting this and suggesting a fix! I'm looking into it.

@dargilco
Member

@bombert which AI model did you use?

@dargilco
Member

@bombert I've tried several models but was not able to reproduce the issue. I'll need to see the issue for myself in order to fix it. Please share the model name and how the model was deployed. What is the endpoint value above? Feel free to redact the personal resource name if relevant. Were you using an Azure OpenAI endpoint, the GitHub Models endpoint https://models.inference.ai.azure.com, a Serverless API endpoint, a Managed Compute endpoint, or a default deployment via an AI Foundry Project into an AI Services resource?

@howieleung
Member

@dargilco I think this is the line of code to be fixed.

(screenshot of the code in question omitted)

I think the HTTP header on the response carries the encoding. BaseAsyncAgentEventHandler has an initialize function. See if you can add the response header as an argument, pass it in from the caller, and consume it in the decode function.
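A hedged sketch of that idea (the helper below is an illustration, not the SDK's actual API): pull the charset parameter out of the response's Content-Type header and fall back to UTF-8 when it is absent, then use that name when decoding chunks.

```python
def charset_from_content_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type header value,
    or the default when no charset parameter is present."""
    for part in content_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        if key.lower() == "charset" and value:
            return value.strip('"')
    return default

print(charset_from_content_type("text/event-stream; charset=utf-8"))  # → utf-8
print(charset_from_content_type("application/json"))                  # → utf-8
```

Note that honoring the header only fixes a wrong codec name; it does not by itself fix chunks split mid-character, which still needs buffering or an incremental decoder.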

@dargilco
Member

@howieleung the issue was reported on the Inference SDK, not the AI Projects SDK. But of course, we need to make sure both can handle this rare case. I'll follow up with you 1:1.

@dargilco dargilco changed the title azure ai foundry SDK inference cannot handle the steamings containing Chinese character Issue streaming Chinese characters using azure-ai-inference SDK (was: "azure ai foundry SDK inference cannot handle the steamings containing Chinese character") Feb 11, 2025
@dargilco dargilco changed the title Issue streaming Chinese characters using azure-ai-inference SDK (was: "azure ai foundry SDK inference cannot handle the steamings containing Chinese character") Issue streaming Chinese characters using azure-ai-inference SDK Feb 11, 2025
@TshkOtsk

I'm seeing the same issue.

AI model: DeepSeek-R1
Endpoint: Azure AI model inference endpoint (https://<DEPLOY_NAME>.services.ai.azure.com/models)

Code:

client = ChatCompletionsClient(
    endpoint=AZUREAI_ENDPOINT_URL,
    credential=AzureKeyCredential(AZUREAI_ENDPOINT_KEY)
)
deploy_name = AZUREAI_DEPLOYMENT_NAME

@app.get("/chat")
async def chat_endpoint(message: str):
    conversation_history.append({"role": "user", "content": message})
    
    messages = [SystemMessage(content="You are a helpful assistant.")]
    for entry in conversation_history:
        if entry["role"] == "user":
            messages.append(UserMessage(content=entry["content"]))
        else:
            # assistant turns should be AssistantMessage, not UserMessage
            # (requires: from azure.ai.inference.models import AssistantMessage)
            messages.append(AssistantMessage(content=entry["content"]))

    response = client.complete(
        stream=True,
        messages=messages,
        max_tokens=4096,
        model=deploy_name,
    )
    
    def event_generator():
        assistant_response = ""
        for chunk in response:
            if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
                delta = chunk.choices[0].delta.content
                print(delta)
                assistant_response += delta
                yield f"data: {delta}\n\n"
        conversation_history.append({"role": "assistant", "content": assistant_response})
        yield "data: [DONE]\n\n"
    
    return StreamingResponse(event_generator(), media_type="text/event-stream")

Error messages:

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/responses.py", line 268, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/responses.py", line 264, in wrap
    await func()
  File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/responses.py", line 233, in listen_for_disconnect
    message = await receive()
  File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 531, in receive
    await self.message_event.wait()
  File "/Users/.pyenv/versions/3.10.4/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 10c300d30

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 403, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
  |     return await self.app(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/applications.py", line 112, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
  |     raise exc
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     raise exc
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
  |     await route.handle(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
  |     await self.app(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     raise exc
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
  |     await response(scope, receive, send)
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/responses.py", line 261, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 767, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/responses.py", line 264, in wrap
    |     await func()
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/responses.py", line 245, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/concurrency.py", line 60, in iterate_in_threadpool
    |     yield await anyio.to_thread.run_sync(_next, as_iterator)
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
    |     return await future
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 962, in run
    |     result = context.run(func, *args)
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/starlette/concurrency.py", line 49, in _next
    |     return next(iterator)
    |   File "/Users/deepseek-chat/deepseekChat.py", line 179, in event_generator
    |     for chunk in response:
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/azure/ai/inference/models/_patch.py", line 413, in __next__
    |     self._done = self._read_next_block()
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/azure/ai/inference/models/_patch.py", line 426, in _read_next_block
    |     return self._deserialize_and_add_to_queue(element)
    |   File "/Users/.pyenv/versions/3.10.4/lib/python3.10/site-packages/azure/ai/inference/models/_patch.py", line 342, in _deserialize_and_add_to_queue
    |     line_list: List[str] = re.split(r"(?<=\n)", element.decode("utf-8"))
    | UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 4088-4089: unexpected end of data
    +------------------------------------

@dargilco
Member

Thank you @TshkOtsk, let me try with DeepSeek-R1

@dargilco
Member

@TshkOtsk Unfortunately I cannot reproduce this with DeepSeek-R1 model either (deployed to my AI Foundry project). I tried with and without Content Safety (which affects the chunking of the streamed response). Did you use exactly the System and User messages specified by @bombert above, or something else?

@dargilco
Member

Good news @TshkOtsk, I was able to reproduce this once and I have SDK logs. Working on a proper fix and unit-tests. Will update here when I have an update.

@dargilco
Member

@TshkOtsk @bombert I just released azure-ai-inference version 1.0.0b9 with a fix for this issue. Please give it a try. Thank you for reporting this!
