The `POST /chat/{chat_id}` endpoint receives one message at a time; as soon as a message arrives, it is processed by the LLM and a reply is returned. Example:
QA 1: hello how are you?
QA 2: I need help, can you help me, please?
QA 3: how do I perform operation XYZ on the web page?
This conversation flow is common in chat environments (e.g. WhatsApp), where the user splits a single thought across several short messages ("breaks the line"). What the user actually wants to receive is only the answer to QA 3; the other messages are introductory ("presentation").
As implemented today, we answer one message at a time:
Reply to QA 1: Hi, how can I help?
Reply to QA 2: It would be a pleasure to help you, how can I help you?
Reply to QA 3: You should access, ... the answer to the question
The answer that matters is the one to QA 3; the replies to QA 1 and QA 2 are redundant ("duplicated").
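For illustration, here is roughly what the current per-message behavior looks like from a client's point of view (a minimal sketch; the base URL, chat id, and the `message` field name are assumptions, not the actual schema):

```python
import requests

BASE_URL = "http://localhost:8000"  # assumption: local dev server
CHAT_ID = "abc123"                  # assumption: an existing chat id

# Today, each POST is answered independently by the LLM.
for text in [
    "hello how are you?",
    "I need help, can you help me, please?",
    "how do I perform operation XYZ on the web page?",
]:
    # assumption: the request body uses a "message" field
    resp = requests.post(f"{BASE_URL}/chat/{CHAT_ID}", json={"message": text})
    print(resp.json())  # three separate replies, one per message
```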
Solution
Add a parameter to the endpoint (POST, create message), called something like "message bucket", that activates intelligence in the backend to collect messages and make a single call to the LLM with the whole collection.
One solution I can think of is to collect the requests and, if no new message arrives within X time after the last one received, call the LLM with all the messages that have not yet been sent, aggregated together.
It is not necessarily the best solution, but it is the one that comes to mind first. This issue is here to discuss the best approach; the proposal above is probably not it.
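A minimal sketch of what that parameter could look like, assuming a FastAPI-style endpoint. The `message_bucket` parameter name, the in-memory `pending` buffer, and the `call_llm` helper are all hypothetical; how and when the buffered messages are flushed to the LLM is sketched further down.

```python
from collections import defaultdict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# hypothetical in-memory buffer: chat_id -> messages not yet sent to the LLM
pending: dict[str, list[str]] = defaultdict(list)


class MessageIn(BaseModel):
    message: str


def call_llm(prompt: str) -> str:
    """Placeholder for the real LLM call."""
    return f"LLM answer for: {prompt!r}"


@app.post("/chat/{chat_id}")
def create_message(chat_id: str, body: MessageIn, message_bucket: bool = False):
    if not message_bucket:
        # current behavior: one LLM call per message
        return {"reply": call_llm(body.message)}

    # proposed behavior: just collect; the aggregated LLM call happens later
    pending[chat_id].append(body.message)
    return {"buffered": True, "pending": len(pending[chat_id])}
```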
@avelino I'm thinking that we will need a message buffer that stores the messages for a certain period before sending them to the server. Would we need a Redis or Memcached server or would you implement this by hand?
> Would we need a Redis or Memcached server or would you implement this by hand?
I don't want to pick a technology (database) here but rather discuss the architecture, so let's treat "storage" as a generic storage resource (Redis, Memcached, or something else) rather than as the solution to the problem itself.
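To keep that separation concrete, the buffer could sit behind a small interface so that Redis, Memcached, or a plain in-memory dict are interchangeable. The `MessageStore` protocol and its methods below are illustrative, not an existing API:

```python
from typing import Protocol


class MessageStore(Protocol):
    """Minimal storage contract; Redis, Memcached, in-memory, etc. can all satisfy it."""

    def append(self, chat_id: str, message: str) -> None:
        """Add a message to the chat's pending bucket."""

    def drain(self, chat_id: str) -> list[str]:
        """Return and clear all pending messages for the chat."""


class InMemoryStore:
    """Simplest possible backend, useful for tests and local development."""

    def __init__(self) -> None:
        self._buckets: dict[str, list[str]] = {}

    def append(self, chat_id: str, message: str) -> None:
        self._buckets.setdefault(chat_id, []).append(message)

    def drain(self, chat_id: str) -> list[str]:
        return self._buckets.pop(chat_id, [])
```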
1. Bucket endpoint:
- a temporary place to store messages
- control is on the client side: the client fills the bucket it created and then sends all of the bucket's messages to the LLM

I don't like this option, because the "intelligence" would live on the client side rather than on the server side (see the sketch below).
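A rough sketch of what the client-driven version could look like; the routes (`/bucket`, `/bucket/flush`) and every name in it are hypothetical, not part of the current API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

buckets: dict[str, list[str]] = {}  # hypothetical in-memory bucket per chat


class MessageIn(BaseModel):
    message: str


def call_llm(prompt: str) -> str:
    """Placeholder for the real LLM call."""
    return f"LLM answer for: {prompt!r}"


@app.post("/chat/{chat_id}/bucket")
def add_to_bucket(chat_id: str, body: MessageIn):
    # the client keeps posting messages here while the user is still typing
    buckets.setdefault(chat_id, []).append(body.message)
    return {"pending": len(buckets[chat_id])}


@app.post("/chat/{chat_id}/bucket/flush")
def flush_bucket(chat_id: str):
    # the client decides when the turn is done and triggers a single LLM call
    messages = buckets.pop(chat_id, [])
    return {"reply": call_llm("\n".join(messages))}
```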
"Intelligence" based on the delay time for receiving messages:
provisional solution for storing messages, activated by a "parameter" of the endpoint
if it doesn't receive messages for X amount of time, it collects all the messages not sent to the LLM and sends them all at once (to the prompt)
version "2" is the way I'd like to see it working, but one doesn't prevent the other