-
Yes, it is possible to configure LangChain to use the increased token limit of 8192 tokens for Claude 3.5 Sonnet with Amazon Bedrock. Here is the relevant code snippet for setting the limit:

```python
from langchain_community.llms.bedrock import Bedrock

# Initialize the Bedrock LLM with the appropriate model ID and client
llm = Bedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",  # Bedrock model ID for Claude 3.5 Sonnet
    client=bedrock_client,  # your boto3 "bedrock-runtime" client
    model_kwargs={
        "max_tokens_to_sample": 8192  # Set the token limit to 8192
    },
)

# Example usage; invoke() is the public entry point (not the private _call())
response = llm.invoke("Your prompt here")
print(response)
```

In the `prepare_input` classmethod of the Bedrock integration, the request body is assembled as follows:

```python
@classmethod
def prepare_input(
    cls,
    provider: str,
    model_kwargs: Dict[str, Any],
    prompt: Optional[str] = None,
    system: Optional[str] = None,
    messages: Optional[List[Dict]] = None,
) -> Dict[str, Any]:
    input_body = {**model_kwargs}
    if provider == "anthropic":
        if messages:
            input_body["anthropic_version"] = "bedrock-2023-05-31"
            input_body["messages"] = messages
            if system:
                input_body["system"] = system
            if "max_tokens" not in input_body:
                input_body["max_tokens"] = 1024
        if prompt:
            input_body["prompt"] = _human_assistant_format(prompt)
            if "max_tokens_to_sample" not in input_body:
                input_body["max_tokens_to_sample"] = 1024
    elif provider in ("ai21", "cohere", "meta", "mistral"):
        input_body["prompt"] = prompt
    elif provider == "amazon":
        input_body = dict()
        input_body["inputText"] = prompt
        input_body["textGenerationConfig"] = {**model_kwargs}
    else:
        input_body["inputText"] = prompt
    return input_body
```

To utilize the increased token limit, ensure that the `max_tokens` key (or `max_tokens_to_sample` for the prompt-based path) is set in `model_kwargs`; `prepare_input` only applies its 1024-token default when the key is absent.
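Since Claude 3 models on Bedrock use the messages API, here is a minimal sketch of the messages-based path, assuming the `ChatBedrock` class from the langchain-aws package (the model ID and region are illustrative; adjust for your account):

```python
import boto3
from langchain_aws import ChatBedrock  # pip install langchain-aws

# Assumes AWS credentials are already configured in the environment.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

chat = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    client=client,
    # The messages API uses "max_tokens", not "max_tokens_to_sample".
    model_kwargs={"max_tokens": 8192},
)

response = chat.invoke("Your prompt here")
print(response.content)
```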
-
The implementation mentioned above did not work for me. As of August 19, 2024, the original Anthropic API officially supports this feature, though that does not apply to Amazon Bedrock. Additionally, now that "stop_reason" is included in the response_metadata of Bedrock (Anthropic) responses, it is possible to implement features such as generating a continuation when the output is cut off by the token limit. When you actually run a request, you can see that stop_reason is present in the metadata.
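Here is a rough sketch of such a continuation loop, assuming the `ChatBedrock` class from the langchain-aws package and that a stop_reason of "max_tokens" signals truncation (the model ID, prompt, and round cap are illustrative):

```python
from langchain_aws import ChatBedrock
from langchain_core.messages import AIMessage, HumanMessage

chat = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_kwargs={"max_tokens": 4096},
)

messages = [HumanMessage(content="Write a very long essay about your topic here.")]
parts = []

# Keep asking the model to continue while output is cut off at the token limit.
for _ in range(5):  # safety cap on the number of continuation rounds
    response = chat.invoke(messages)
    parts.append(response.content)
    if response.response_metadata.get("stop_reason") != "max_tokens":
        break
    # Feed the partial answer back and ask the model to pick up where it stopped.
    messages.append(AIMessage(content=response.content))
    messages.append(HumanMessage(content="Continue exactly where you left off."))

full_text = "".join(parts)
print(full_text)
```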
-
I am still facing this issue: when I set max_tokens in ChatBedrock to 8192, I still get an error.
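One speculative avenue, offered purely as an assumption: on the direct Anthropic API, 8192-token output for Claude 3.5 Sonnet initially sat behind the beta flag `max-tokens-3-5-sonnet-2024-07-15`. Because `prepare_input` forwards `model_kwargs` straight into the request body, the equivalent `anthropic_beta` field could be passed the same way; whether Bedrock honors this flag is not confirmed here:

```python
from langchain_aws import ChatBedrock

chat = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_kwargs={
        "max_tokens": 8192,
        # Speculative: the beta flag the direct Anthropic API used for
        # 8192-token output; Bedrock may ignore or reject this field.
        "anthropic_beta": ["max-tokens-3-5-sonnet-2024-07-15"],
    },
)

print(chat.invoke("Your prompt here").content)
```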
-
Description
Anthropic's Claude 3.5 Sonnet now supports a maximum output of 8192 tokens. I'd like to know if there's currently a way to set the max_tokens parameter to 8192 when using Claude 3.5 Sonnet through Amazon Bedrock via LangChain.
For reference, Anthropic's documentation about Claude models can be found here:
https://docs.anthropic.com/en/docs/about-claude/models
Has anyone successfully configured LangChain to utilize this increased token limit with Amazon Bedrock's implementation of Claude 3.5 Sonnet? If so, could you please share the method or any relevant code snippets?
Thank you in advance for any insights or assistance!