The `POST /chat/{chat_id}` endpoint receives one message at a time; as soon as a message arrives, it is processed by the LLM and a reply is returned. Example:
QA 1: hello how are you?
QA 2: I need help, can you help me, please?
QA 3: how do I perform operation XYZ on the web page?
This conversation flow is common in chat environments (e.g. WhatsApp), where the user splits a single thought across several short messages ("breaks the line"). What the user actually wants to receive is only the answer to QA 3; the other messages are introductory ("presentation").
As implemented today, we answer one message at a time:
Reply to QA 1: Hi, how can I help?
Reply to QA 2: It would be a pleasure to help you, how can I help you?
Reply to QA 3: You should access, ... the answer to the question
The answer that matters is the one to QA 3; the replies to QA 1 and QA 2 are redundant ("duplicated").
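For illustration, here is roughly what the current per-message behavior looks like from a client's point of view (a minimal sketch; the base URL, chat id, and the `message` field name are assumptions, not the actual schema):

```python
import requests

BASE_URL = "http://localhost:8000"  # assumption: local dev server
CHAT_ID = "abc123"                  # assumption: an existing chat id

# Today, each POST is answered independently by the LLM.
for text in [
    "hello how are you?",
    "I need help, can you help me, please?",
    "how do I perform operation XYZ on the web page?",
]:
    # assumption: the request body uses a "message" field
    resp = requests.post(f"{BASE_URL}/chat/{CHAT_ID}", json={"message": text})
    print(resp.json())  # three separate replies, one per message
```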
Solution
Add a parameter to the endpoint (POST, create message), called something like "message bucket", that activates intelligence in the backend to collect messages and make a single call to the LLM with the whole collection.
One solution I can think of is to collect the requests and, if no new message arrives within X time after the last one received, call the LLM with all the messages that have not yet been sent, aggregated together.
It is not necessarily the best solution, but it is the one that comes to mind first. This issue is here to discuss the best approach; the proposal above is probably not it.
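A minimal sketch of what that parameter could look like, assuming a FastAPI-style endpoint. The `message_bucket` parameter name, the in-memory `pending` buffer, and the `call_llm` helper are all hypothetical; how and when the buffered messages are flushed to the LLM is sketched further down.

```python
from collections import defaultdict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# hypothetical in-memory buffer: chat_id -> messages not yet sent to the LLM
pending: dict[str, list[str]] = defaultdict(list)


class MessageIn(BaseModel):
    message: str


def call_llm(prompt: str) -> str:
    """Placeholder for the real LLM call."""
    return f"LLM answer for: {prompt!r}"


@app.post("/chat/{chat_id}")
def create_message(chat_id: str, body: MessageIn, message_bucket: bool = False):
    if not message_bucket:
        # current behavior: one LLM call per message
        return {"reply": call_llm(body.message)}

    # proposed behavior: just collect; the aggregated LLM call happens later
    pending[chat_id].append(body.message)
    return {"buffered": True, "pending": len(pending[chat_id])}
```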
@avelino I'm thinking that we will need a message buffer that stores the messages for a certain period before sending them to the server. Would we need a Redis or Memcached server or would you implement this by hand?
> Would we need a Redis or Memcached server or would you implement this by hand?
I don't want to pick a technology (database) here but rather discuss the architecture, so let's treat "storage" as a generic storage resource (Redis, Memcached, or something else) rather than as the solution to the problem itself.
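To keep that separation concrete, the buffer could sit behind a small interface so that Redis, Memcached, or a plain in-memory dict are interchangeable. The `MessageStore` protocol and its methods below are illustrative, not an existing API:

```python
from typing import Protocol


class MessageStore(Protocol):
    """Minimal storage contract; Redis, Memcached, in-memory, etc. can all satisfy it."""

    def append(self, chat_id: str, message: str) -> None:
        """Add a message to the chat's pending bucket."""

    def drain(self, chat_id: str) -> list[str]:
        """Return and clear all pending messages for the chat."""


class InMemoryStore:
    """Simplest possible backend, useful for tests and local development."""

    def __init__(self) -> None:
        self._buckets: dict[str, list[str]] = {}

    def append(self, chat_id: str, message: str) -> None:
        self._buckets.setdefault(chat_id, []).append(message)

    def drain(self, chat_id: str) -> list[str]:
        return self._buckets.pop(chat_id, [])
```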
1. Bucket endpoint:
- a temporary place to store messages
- control is on the client side: the client fills the bucket it created and then sends all of the bucket's messages to the LLM

I don't like this option, because the "intelligence" would live on the client side rather than on the server side (see the sketch below).
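A rough sketch of what the client-driven version could look like; the routes (`/bucket`, `/bucket/flush`) and every name in it are hypothetical, not part of the current API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

buckets: dict[str, list[str]] = {}  # hypothetical in-memory bucket per chat


class MessageIn(BaseModel):
    message: str


def call_llm(prompt: str) -> str:
    """Placeholder for the real LLM call."""
    return f"LLM answer for: {prompt!r}"


@app.post("/chat/{chat_id}/bucket")
def add_to_bucket(chat_id: str, body: MessageIn):
    # the client keeps posting messages here while the user is still typing
    buckets.setdefault(chat_id, []).append(body.message)
    return {"pending": len(buckets[chat_id])}


@app.post("/chat/{chat_id}/bucket/flush")
def flush_bucket(chat_id: str):
    # the client decides when the turn is done and triggers a single LLM call
    messages = buckets.pop(chat_id, [])
    return {"reply": call_llm("\n".join(messages))}
```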
"Intelligence" based on the delay time for receiving messages:
provisional solution for storing messages, activated by a "parameter" of the endpoint
if it doesn't receive messages for X amount of time, it collects all the messages not sent to the LLM and sends them all at once (to the prompt)
version "2" is the way I'd like to see it working, but one doesn't prevent the other