Log LLM tool call for streamed response #545
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff            @@
##              main      #545   +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files          133       133
  Lines        10263     10308    +45
  Branches      1399      1405     +6
=========================================
+ Hits         10263     10308    +45

☔ View full report in Codecov by Sentry.
'message': final_completion.choices[0].message if final_completion.choices else None,
'usage': final_completion.usage,
I worry about this having a significantly different shape from the other response data dicts.
This is the same shape as the existing non-streamed chat completion, which is why it displays nicely in the UI.
logfire/logfire/_internal/integrations/llm_providers/openai.py
Lines 81 to 91 in af707e2
def on_response(response: ResponseT, span: LogfireSpan) -> ResponseT:
    """Updates the span based on the type of response."""
    if isinstance(response, LegacyAPIResponse):  # pragma: no cover
        on_response(response.parse(), span)  # type: ignore
        return cast('ResponseT', response)
    if isinstance(response, ChatCompletion):
        span.set_attribute(
            'response_data',
            {'message': response.choices[0].message, 'usage': response.usage},
        )
I'll admit it's not exactly the same: the message object here is a ParsedChatCompletionMessage, a subclass of the non-streamed chat completion message ChatCompletionMessage. It has the "parsed" and tool call "parsed_arguments" fields added. These "ParsedX" classes are the ones returned by the client.beta.chat.completions.parse method.
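For reference, here's a minimal sketch of where these "ParsedX" objects come from (the CityInfo model and prompt below are illustrative, not from this PR):

from pydantic import BaseModel
from openai import OpenAI

class CityInfo(BaseModel):
    name: str
    population: int

client = OpenAI()
# The beta parse method returns a ParsedChatCompletion whose message is a
# ParsedChatCompletionMessage with the extra "parsed" field populated.
completion = client.beta.chat.completions.parse(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Tell me about Paris.'}],
    response_format=CityInfo,
)
message = completion.choices[0].message  # ParsedChatCompletionMessage
print(message.parsed)  # a CityInfo instance (or None if the model refused)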
Aside: it would be handy to have the response dicts that get special handling on the frontend documented or codified (e.g. using TypedDicts), ideally with the order of precedence. For example, I found that "combined_chunk_content" takes precedence over "message", which is why I excluded it.
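Something like this, perhaps (a rough sketch only; these TypedDicts don't exist in the codebase today, and the field list just mirrors the attributes discussed in this thread):

from typing import Any, TypedDict

class ChatResponseData(TypedDict, total=False):
    # Shape set by on_response for (non-streamed) chat completions.
    message: Any  # ChatCompletionMessage or ParsedChatCompletionMessage
    usage: Any  # token usage stats
    # Shape set for streamed responses; per the comment above, the frontend
    # gives this precedence over "message" when both are present.
    combined_chunk_content: str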
ah, i didn't realise we were already using this shape. yes, creating some types sounds good. @dmontagu any thoughts? should we consider these shapes stable?
Thanks! Are you happy for me to merge?
cc @willbakst in case you want to take a look, but I won't wait to merge.
Yes, happy to merge! Thanks
The current LLM streamed-response parsing collects only the message content/text. This PR generalizes it to allow collecting any state needed to generate the response_data. Streamed OpenAI chat completions now include tool calls in the Logfire UI. Behaviour for Anthropic and for OpenAI completions (non-chat) remains the same.
Fixes #542
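For anyone skimming, the gist of the generalization (a simplified sketch, not the PR's actual implementation; the class and method names here are made up):

from dataclasses import dataclass, field
from typing import Any

@dataclass
class StreamState:
    """Accumulates streamed chunks and builds the final response_data."""

    content_chunks: list[str] = field(default_factory=list)
    tool_calls: dict[int, dict[str, Any]] = field(default_factory=dict)

    def record_chunk(self, chunk: Any) -> None:
        delta = chunk.choices[0].delta if chunk.choices else None
        if delta is None:
            return
        if delta.content:
            self.content_chunks.append(delta.content)
        for tc in delta.tool_calls or []:
            # Tool call deltas arrive in pieces keyed by index: the function
            # name comes once, the JSON arguments as concatenated fragments.
            call = self.tool_calls.setdefault(tc.index, {'name': '', 'arguments': ''})
            if tc.function and tc.function.name:
                call['name'] = tc.function.name
            if tc.function and tc.function.arguments:
                call['arguments'] += tc.function.arguments

    def get_response_data(self) -> dict[str, Any]:
        return {
            'combined_chunk_content': ''.join(self.content_chunks),
            'tool_calls': list(self.tool_calls.values()) or None,
        }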