Text Generate REST API schema #18

Merged · 14 commits into kserve:main · Feb 6, 2024

Conversation

@gavrissh (Contributor) commented Dec 5, 2023:

Proposes the following generate REST API endpoints (a sketch of the corresponding OpenAPI paths entry is shown below the list):

/v2/models/{model_name}/versions/{model_version}/generate
/v2/models/{model_name}/versions/{model_version}/generate_stream
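
For context, a minimal sketch of how these endpoints might be declared in the OpenAPI document. The schema names GenerateRequest, GenerateResponse, and GenerateStreamResponse are illustrative assumptions, not necessarily the names used in this PR:

    # Sketch only; request/response schemas are assumed for illustration.
    paths:
      /v2/models/{model_name}/versions/{model_version}/generate:
        post:
          requestBody:
            content:
              application/json:
                schema:
                  $ref: '#/components/schemas/GenerateRequest'
          responses:
            '200':
              description: Generated text for the prompt
              content:
                application/json:
                  schema:
                    $ref: '#/components/schemas/GenerateResponse'
      /v2/models/{model_name}/versions/{model_version}/generate_stream:
        post:
          requestBody:
            content:
              application/json:
                schema:
                  $ref: '#/components/schemas/GenerateRequest'
          responses:
            '200':
              description: Server-sent stream of generated text chunks
              content:
                text/event-stream:
                  schema:
                    $ref: '#/components/schemas/GenerateStreamResponse'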


[Three screenshots of the proposed schema, captured 2024-01-16]

Reference - https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/extension_generate.html#generate-extension

@gavrissh changed the title from "Propose generate rest api endpoints schema" to "Text Generate REST API schema" on Dec 6, 2023
@gavrissh marked this pull request as ready for review on December 6, 2023 at 12:27
- model_version
- done
properties:
text_output:
Member:

This is the concatenated text output; we might still want to see the token generated at each iteration.

@gavrissh (Contributor, Author):

In the Nvidia implementation, each response returns the cumulative set of tokens.

First JSON response:

{
  "text_output": "Here is"
}

...

Subsequent JSON response:

{
  "text_output": "Here is the output for the prompt"
}

Should we add an additional property to expose the tokens generated in the current response chunk?

@gavrissh (Contributor, Author):
@yuzisun Wanted to follow up: is the current state of the changes alright?

@gavrissh (Contributor, Author) commented Jan 3, 2024:

I have updated the PR with all the recently discussed changes.

@cmaddalozzo commented Jan 11, 2024:

We should probably add the option to return log probabilities in the result. This seems to be fairly common among other APIs. This would comprise a boolean logprobs parameter in the request and a corresponding logprobs property in the response containing an array of objects with keys token and logprob.
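
A rough sketch of how this could look in the schema; the field names follow the comment above and are assumptions rather than the PR's final shape. The text_input/text_output names follow the Triton generate extension referenced earlier:

    # Hedged sketch of the suggested logprobs option.
    GenerateRequest:
      type: object
      properties:
        text_input:
          type: string
        logprobs:
          type: boolean
          description: Whether to return log probabilities for generated tokens.
    GenerateResponse:
      type: object
      properties:
        text_output:
          type: string
        logprobs:
          type: array
          items:
            type: object
            properties:
              token:
                type: string
              logprob:
                type: number
                format: float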

@gavrissh (Contributor, Author):

> We should probably add the option to return log probabilities in the result. This seems to be fairly common among other APIs. This would comprise a boolean logprobs parameter in the request and a corresponding logprobs property in the response containing an array of objects with keys token and logprob.

I have updated the PR to support the above.

type: string
logprobs:
$ref: '#/components/schemas/Logprobs'
Logprobs:
@yuzisun (Member) commented Jan 17, 2024:

Suggest changing the name to Token, as it is not just a logprob field; see https://github.com/huggingface/text-generation-inference/blob/main/docs/openapi.json#L844.

type: string
model_version:
type: string
logprobs:
@yuzisun (Member) commented Jan 17, 2024:

Suggest changing the name here as well; in TGI it is called details, which includes the tokens. Not sure if we should follow the same convention:
https://github.com/huggingface/text-generation-inference/blob/main/docs/openapi.json#L645


@gavrissh (Contributor, Author) commented Jan 17, 2024:

In OpenAI, logprobs is a property under choices. I was not sure here either; it is open to suggestions.

Current:

Output -> {
  text_output,
  model_name,
  model_version,
  logprobs -> List[Token]
}

Token follows TGI (https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/generate_stream):

Token -> {
  id,
  logprob,
  special,
  text
}
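
Rendered as OpenAPI, the outline above would look roughly like this (a sketch only; the Token field types are taken from the schema fragment reviewed further down in this PR):

    Output:
      type: object
      properties:
        text_output:
          type: string
        model_name:
          type: string
        model_version:
          type: string
        logprobs:
          type: array
          items:
            $ref: '#/components/schemas/Token'
    Token:
      type: object
      properties:
        id:
          type: integer
          format: int32
          minimum: 0
        logprob:
          type: number
          format: float
        special:
          type: boolean
        text:
          type: string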

type: string
finish_reason:
type: string
logprobs:
Member:

For the streaming case it is a single token.
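
A hedged sketch of the distinction (schema names are assumptions): the non-streaming response would carry a list of tokens, while each streamed chunk would carry a single Token:

    GenerateResponse:
      properties:
        logprobs:
          type: array
          items:
            $ref: '#/components/schemas/Token'
    GenerateStreamResponse:
      properties:
        logprobs:
          $ref: '#/components/schemas/Token'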

parameters:
allOf:
- $ref: '#/components/schemas/GenerateParameters'
logprob:
Member:

Should this be part of GenerateParameters?
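
A sketch of what folding the flag into GenerateParameters could look like (illustrative only):

    # Hypothetical: the logprob flag moved under GenerateParameters rather
    # than sitting alongside parameters in the request body.
    GenerateParameters:
      type: object
      properties:
        logprob:
          type: boolean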

type: string
description: Sequences where the API will stop generating further tokens.
logprob:
type: boolean
@yuzisun (Member) commented Jan 21, 2024:

Can you add a description for this flag? Also, I think this should be a details flag, since logprob is just one of the fields it enables.
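
For example (the description wording here is illustrative, not taken from the PR):

    # Illustrative sketch of the suggested rename and description.
    details:
      type: boolean
      description: Flag to request generation details (finish reason and per-token log probabilities) in the response.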

type: string
details:
$ref: '#/components/schemas/StreamDetails'
Logprobs:
Member:

Add a description for this.

Comment on lines 110 to 120
id:
  type: integer
  format: int32
  minimum: 0
logprob:
  type: number
  format: float
special:
  type: boolean
text:
  type: string
Member:

Let's make sure we have descriptions for these fields.
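
For instance (the description texts here are illustrative placeholders):

    id:
      type: integer
      format: int32
      minimum: 0
      description: Token ID in the model's vocabulary.
    logprob:
      type: number
      format: float
      description: Log probability assigned to this token.
    special:
      type: boolean
      description: Whether this is a special token (e.g. BOS/EOS).
    text:
      type: string
      description: Decoded text of this token.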

type: object
additionalProperties: {}
properties:
finish_reason:
Member:

finish_reason should be an enum.
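
For example, borrowing TGI's finish reasons (the exact value set is a suggestion, not something this comment pins down):

    finish_reason:
      type: string
      enum:
        - length
        - eos_token
        - stop_sequence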

properties:
finish_reason:
type: string
logprobs:
Member:

Both finish_reason and logprobs should be required if details is requested.
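
In OpenAPI terms this could be expressed with a required list on the details schema (the StreamDetails name comes from the snippet above; the exact layout is a sketch):

    StreamDetails:
      type: object
      required:
        - finish_reason
        - logprobs
      properties:
        finish_reason:
          type: string
        logprobs:
          # Per the earlier review comment, a streamed chunk carries one token.
          $ref: '#/components/schemas/Token'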

@yuzisun (Member) commented Jan 25, 2024:

Thanks @gavrishp!! Great job on getting this going with the initial version.

/lgtm
/approve

oss-prow-bot (bot) commented Jan 25, 2024:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gavrishp, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuzisun merged commit 52528cf into kserve:main on Feb 6, 2024 · 4 checks passed
3 participants