Describe the bug
I deployed 3 different models to an Azure AI (Machine Learning Studio) serverless endpoint:
Mistral-Nemo
Llama-3.3-70B-Instruct
Llama-3.2-90B-Vision-Instruct
I used the SemanticKernel package 1.33.0 + Microsoft.SemanticKernel.Connectors.AzureAIInference 1.33.0-beta to test. The Mistral-Nemo model works perfectly, but the Llama-series models do not.
To Reproduce
Steps to reproduce the behavior:
Deploy an open-source model (e.g. Llama-3.3-70B) to a serverless endpoint.
Make a GetChatMessageContentsAsync call with tools.
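A minimal repro sketch for the steps above, assuming a hypothetical endpoint URL, API key, and plugin (`Time`/`GetUtcNow` is invented here purely for illustration; substitute your own deployment details and tools):

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.AzureAIInference;

var builder = Kernel.CreateBuilder();
builder.AddAzureAIInferenceChatCompletion(
    modelId: "Llama-3.3-70B-Instruct",                       // deployed model name (assumption)
    apiKey: "<your-api-key>",                                // placeholder
    endpoint: new Uri("https://<your-endpoint>.inference.ai.azure.com")); // placeholder

// Register a trivial tool so the model has something to call.
builder.Plugins.AddFromFunctions("Time",
    [KernelFunctionFactory.CreateFromMethod(
        () => DateTime.UtcNow.ToString("o"),
        "GetUtcNow",
        "Returns the current UTC time.")]);

var kernel = builder.Build();
var chat = kernel.GetRequiredService<IChatCompletionService>();

var history = new ChatHistory();
history.AddUserMessage("What time is it right now?");

// FunctionChoiceBehavior.Auto() advertises the registered tools and
// lets SK auto-invoke whichever one the model selects.
var settings = new AzureAIInferencePromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

// Works with Mistral-Nemo; with the Llama models the tool is either
// ignored (3.3-70B) or the call throws (3.2-90B).
var result = await chat.GetChatMessageContentsAsync(history, settings, kernel);
Console.WriteLine(result[0].Content);
```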
Expected behavior
Models like Llama 3.3 and Llama 3.2 should support tool calling and are expected to function as they do when hosted in Ollama.
Platform
OS: Mac
IDE: Rider
Language: C#
Source: SK 1.33.0, Microsoft.SemanticKernel.Connectors.AzureAIInference 1.33.0-beta
Additional context
With Llama-3.3-70B, when making a chat completion call with tools, it responds with an incorrect result and does not invoke any tools, regardless of whether GetChatMessageContentsAsync or GetStreamingChatMessageContentsAsync is used.
With Llama-3.2-90B, when making a chat completion call with tools, it throws an exception right away (error message below):
Azure.RequestFailedException: {"object":"error","message":"\"auto\" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set","type":"BadRequestError","param":null,"code":400}
Status: 400 (Bad Request)
ErrorCode: Bad Request
Content:
{"error":{"code":"Bad Request","message":"{\"object\":\"error\",\"message\":\"\\\"auto\\\" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set\",\"type\":\"BadRequestError\",\"param\":null,\"code\":400}","status":400}}
Headers:
x-ms-rai-invoked: REDACTED
x-envoy-upstream-service-time: REDACTED
X-Request-ID: REDACTED
ms-azureml-model-error-reason: REDACTED
ms-azureml-model-error-statuscode: REDACTED
ms-azureml-model-time: REDACTED
azureml-destination-model-group: REDACTED
azureml-destination-region: REDACTED
azureml-destination-deployment: REDACTED
azureml-destination-endpoint: REDACTED
x-ms-client-request-id: 42bc2161-5a3f-4e88-8c9e-577390db941e
Request-Context: REDACTED
azureml-model-session: REDACTED
azureml-model-group: REDACTED
Date: Wed, 15 Jan 2025 05:46:51 GMT
Content-Length: 246
Content-Type: application/json
at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
at Azure.AI.Inference.ChatCompletionsClient.CompleteAsync(RequestContent content, String extraParams, RequestContext context)
at Azure.AI.Inference.ChatCompletionsClient.CompleteAsync(ChatCompletionsOptions chatCompletionsOptions, CancellationToken cancellationToken)
at Microsoft.Extensions.AI.AzureAIInferenceChatClient.CompleteAsync(IList`1 chatMessages, ChatOptions options, CancellationToken cancellationToken)
at Microsoft.Extensions.AI.FunctionInvokingChatClient.CompleteAsync(IList`1 chatMessages, ChatOptions options, CancellationToken cancellationToken)
at Microsoft.SemanticKernel.ChatCompletion.ChatClientChatCompletionService.GetChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)
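The flags named in the error (`--enable-auto-tool-choice`, `--tool-call-parser`) are vLLM server launch options, which suggests the serverless backend is serving Llama via vLLM without tool calling enabled. On a self-hosted vLLM deployment this would be fixed at launch time roughly as below (the model name and `llama3_json` parser choice are assumptions; the flags are server-side and not configurable on an Azure serverless endpoint):

```shell
# Hypothetical self-hosted launch; these flags must be set by the
# server operator, not by the SK client.
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json
```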