Fix for Exception raised while parsing Chat Completions streaming response, in some rare cases #39741
Description
I finally got a repro of the GitHub issue while capturing SDK logs and using a DeepSeek model. That confirmed what I suspected. But I could not understand why the unit test I added a couple of days ago, which streams Chinese characters broken across lines, did not exhibit the same issue. That unit test passed.
After investigation, it turns out I had a bug in the unit test: a missing comma after one of the SSE lines in the input array. That miraculously made the unit test pass, when it should have failed exactly as reported in the GitHub issue.
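Assuming the test is written in Python (an assumption; the names below are illustrative, not the actual test data), a missing comma is an easy bug to miss because adjacent string literals are implicitly concatenated, so two intended SSE chunks silently become one element. The merged chunk never splits the multi-byte character across a boundary, which is exactly how the test could pass:

```python
# "你" is b"\xe4\xbd\xa0" in UTF-8; the test intends to split it mid-character.
buggy = [
    b'data: {"content": "\xe4\xbd'  # <- missing comma: Python concatenates
    b'\xa0"}\n',                    #    the two literals into ONE element
]
fixed = [
    b'data: {"content": "\xe4\xbd',  # chunk ends mid-character...
    b'\xa0"}\n',                     # ...and completes in the next chunk
]
print(len(buggy), len(fixed))  # 1 2
```

With `buggy`, the parser receives the whole line in one piece, so a premature UTF-8 decode never encounters a partial character and the bug stays hidden.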
Once I discovered that, the fix was easy to implement. I updated the SSE parsing logic so that UTF-8 decoding happens further down, at the point where we are guaranteed to have a complete line of JSON, just before deserializing it into the output chunk object. Until that point (including when caching the previous incomplete line), the input is handled as a bytes object.
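A minimal sketch of that approach, assuming a Python SDK (this is not the actual SDK code; the function name and simplified SSE handling are mine). Incomplete tails stay as bytes in the buffer, and decoding happens only once a full line is available:

```python
import json

def parse_sse_bytes(chunks):
    """Accumulate raw bytes and UTF-8 decode only complete SSE lines,
    so a multi-byte character split across network chunks still decodes."""
    buffer = b""  # incomplete tail is cached as bytes, never decoded early
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            line = line.strip()
            if not line.startswith(b"data: "):
                continue
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                return
            # Decode only now, when the JSON line is guaranteed complete.
            yield json.loads(payload.decode("utf-8"))

# "你" (b"\xe4\xbd\xa0") arrives split across two network chunks:
chunks = [b'data: {"content": "\xe4\xbd', b'\xa0"}\n']
print(list(parse_sse_bytes(chunks)))  # [{'content': '你'}]
```

Decoding each chunk eagerly would raise `UnicodeDecodeError` on the first chunk here, since it ends mid-character; deferring the decode sidesteps that entirely.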
I also updated all streaming samples with extra checks when printing the streaming response, and printed the token usage as well. We had already done that in the GitHub samples.