change the configuration of engineArgs in config.pbtxt #72
base: main
Conversation
# We need to use decoupled transaction policy for saturating
# vLLM engine for max throughput.
# TODO [DLIS:5233]: Allow asynchronous execution to lift this
# restriction for cases where there is exactly a single response to
# a single request.
model_transaction_policy {
  decoupled: True
}

input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "stream"
    data_type: TYPE_BOOL
    dims: [ 1 ]
    optional: true
  },
  {
    name: "sampling_parameters"
    data_type: TYPE_STRING
    dims: [ 1 ]
    optional: true
  }
]

output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
We've recently added autocomplete to vLLM's model.py in this PR: triton-inference-server/vllm_backend#20. Thus, this entry can be eliminated from config.pbtxt.
Hi @oandreeva-nv Yes, I have seen the auto_complete_config code. But in my opinion, this may not be the better way to handle these parameters, because users will not know which parameters they are expected to prepare via the config.pbtxt entries unless they read model.py. This approach will indeed simplify config.pbtxt, but I don't think it is a good interaction for users, especially those who are used to Triton Server. Within the Triton Server ecosystem, this processing logic may make them feel confused, or at least it feels that way to me.
IIRC, inputs, outputs, and model_transaction_policy do not change between different vLLM models, so it might be helpful to auto-complete these fields. That was the motivation.
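For readers following along, here is a minimal sketch of what such an auto_complete_config hook in model.py could look like. It simply mirrors the fields from the config.pbtxt snippet above; the actual implementation in triton-inference-server/vllm_backend#20 may differ in details.

```python
class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        # Fill in the fields that are identical for every vLLM model,
        # so users do not have to repeat them in config.pbtxt.
        inputs = [
            {"name": "text_input", "data_type": "TYPE_STRING", "dims": [1]},
            {"name": "stream", "data_type": "TYPE_BOOL", "dims": [1], "optional": True},
            {"name": "sampling_parameters", "data_type": "TYPE_STRING", "dims": [1], "optional": True},
        ]
        outputs = [{"name": "text_output", "data_type": "TYPE_STRING", "dims": [-1]}]

        for inp in inputs:
            auto_complete_model_config.add_input(inp)
        for out in outputs:
            auto_complete_model_config.add_output(out)

        # vLLM streams multiple responses per request, hence the decoupled policy.
        auto_complete_model_config.set_model_transaction_policy(dict(decoupled=True))
        return auto_complete_model_config
```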
if k in model_engine_args:
    model_engine_args[k] = float(model_engine_args[k])

int_keys = ["seed", "pipeline_parallel_size", "tensor_parallel_size", "block_size",
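The pattern in this diff, as I read it: config.pbtxt parameters always arrive as strings, so model.py has to cast them back to the types vLLM expects before building the engine. Below is a rough sketch of that idea under my own assumptions; the helper name and the key lists are illustrative, not the exact ones from this PR.

```python
from vllm.engine.arg_utils import AsyncEngineArgs


def build_engine_args(model_config):
    """Turn config.pbtxt `parameters` (all string_value) into AsyncEngineArgs.

    `model_config` is assumed to be the dict from json.loads(args["model_config"])
    inside TritonPythonModel.initialize(), and a "model" parameter is assumed
    to be present among the entries.
    """
    params = model_config.get("parameters", {})
    engine_args = {k: v["string_value"] for k, v in params.items()}

    # Illustrative type maps; a real implementation would mirror the
    # AsyncEngineArgs field types exactly.
    float_keys = ["gpu_memory_utilization"]
    int_keys = ["seed", "pipeline_parallel_size", "tensor_parallel_size", "block_size"]
    bool_keys = ["trust_remote_code", "disable_log_stats"]

    for k in float_keys:
        if k in engine_args:
            engine_args[k] = float(engine_args[k])
    for k in int_keys:
        if k in engine_args:
            engine_args[k] = int(engine_args[k])
    for k in bool_keys:
        if k in engine_args:
            engine_args[k] = engine_args[k].lower() == "true"

    return AsyncEngineArgs(**engine_args)
```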
I see what you are doing. My only problem is that the vLLM repository is under active development and these fields may change in the future.
@oandreeva-nv This is a very good point. In fact, just as with the sampling-parameter processing logic, we cannot handle parameters that change dynamically in a fully elegant way, but in general they do not change very often. We can address this by pinning matching versions of Triton Server + vLLM + model.py. In fact, 23.10-vllm-python-py3 has already been released, and the combination of Triton Server + vLLM + model.py is self-contained and runs very well. You can even change the model.py in /opt/tritonserver/backends/vllm of the vllm_backend container if you need to.
I would like to keep the major part of this tutorial as is, since it shows how to use Triton's vllm_backend. So please don't change the core part of the README; you are welcome to add an extra section, though.
On the other hand, it seems like this work is more of a feature request, i.e. allowing vllm_backend to set engineArgs from parameters specified in config.pbtxt and eliminating the dependency on model.json. If this is something you would like added to our vllm_backend, please open an issue on the main Triton issues page.
]

# The configuration of engineArgs
parameters {
can we do something like:
parameters [
{
key: ...
value: {...}
},
{
key: ...
value: {...}
},
...
]
@oandreeva-nv Yes, of course we can. But in fact, this is actually a relatively common usage. Can we do the same?
The final decision is up to you; it just makes more sense to me to keep a list in parameters, rather than multiple parameters entries with 1 parameter each.
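To make the list form being discussed concrete, a single parameters block could look roughly like the sketch below; the keys are only illustrative engineArgs (all values pass through config.pbtxt as strings), not the exact set used in this PR.

```
parameters [
  {
    key: "model"
    value: { string_value: "facebook/opt-125m" }
  },
  {
    key: "gpu_memory_utilization"
    value: { string_value: "0.9" }
  },
  {
    key: "tensor_parallel_size"
    value: { string_value: "1" }
  }
]
```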
CLA has been accepted by the Triton Team. This PR can be merged at our discretion.
@nv-kmcgill53 OK, thank you so much.
Hi @activezhao, thanks for your work! I was thinking about the structure of this tutorial. I would like to keep the original version of this tutorial, which utilizes
Hi @oandreeva-nv Cool, that sounds pretty good; it's a good idea. Could you please help us build the basic code structure, and then we can fill things in?
Hi @oandreeva-nv I have created a folder named Customization with a new README and the suggested model.py and config.pbtxt. Could you please review the code? Thanks
Quick_Deploy/Customization/vLLM/model_repository/vllm_model/1/model.py
Quick_Deploy/Customization/vLLM/model_repository/vllm_model/config.pbtxt
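For orientation, the changed files above imply a model repository laid out roughly as follows (inferred from the paths, not an authoritative listing; the new README mentioned above also lives under the Customization folder):

```
Quick_Deploy/Customization/vLLM/
└── model_repository/
    └── vllm_model/
        ├── 1/
        │   └── model.py
        └── config.pbtxt
```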
## Step 1: Prepare your model repository

To use Triton, we need to build a model repository. A sample model repository for deploying `facebook/opt-125m` using vLLM in Triton is
Please clarify that in this tutorial we are using the Python backend + vLLM, so we probably also need to install vLLM in the environment.
How about this: since we are not using the vllm backend, we target this tutorial at the general Triton container rather than the vLLM one?
Hi @oandreeva-nv In fact, I have used nvcr.io/nvidia/tritonserver:23.10-vllm-python-py3 with the model.py + config.pbtxt that I modified in the prod environment, and it is now running well.
So I wonder if I could change the README.md into a best-practice guide with details? What do you think?
Sure, feel free to do as you see fit. I can take a look at the final README
Hi @activezhao, the pre-commit hook failed unfortunately, could you please fix the formatting?
OK, I will fix them.
Hi @oandreeva-nv I have just updated the README and adjusted the directory structure, could you please help me review the code?