I would see if you can't do what you want to do with Formatron. There's an example here. Formatron isn't hindered by the GIL and so you should have more luck with multiple jobs each running their own filter in a separate thread (as in the example).
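A minimal sketch of the "one filter per job, one job per thread" layout described above. It uses only the Python standard library; `ToyFilter`, `run_job`, `run_batch`, and `VOCAB` are hypothetical names for illustration and are not the Formatron, exllamav2, or xgrammar API. The toy filter is pure Python, so it would still contend for the GIL; the point is only that each job owns its own filter state.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ["{", "}", '"key"', ":", '"value"', ","]


@dataclass
class ToyFilter:
    """Hypothetical per-job filter that tracks its position in a fixed token pattern."""
    pattern: tuple = ("{", '"key"', ":", '"value"', "}")
    pos: int = 0

    def allowed_tokens(self):
        # Only the next token of the pattern is allowed; nothing once finished.
        if self.done():
            return set()
        return {self.pattern[self.pos]}

    def feed(self, token):
        # Advance the filter state, rejecting tokens that break the pattern.
        if token not in self.allowed_tokens():
            raise ValueError(f"token {token!r} not allowed at position {self.pos}")
        self.pos += 1

    def done(self):
        return self.pos == len(self.pattern)


def run_job(job_id):
    # Each job creates its own filter instance, so no state is shared across threads.
    filt = ToyFilter()
    out = []
    while not filt.done():
        allowed = filt.allowed_tokens()
        # Stand-in for sampling: take the first vocabulary token the filter allows.
        token = next(t for t in VOCAB if t in allowed)
        filt.feed(token)
        out.append(token)
    return job_id, "".join(out)


def run_batch(n_jobs, max_workers=4):
    # Run the jobs concurrently, one filter per job.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_job, range(n_jobs)))
```

Calling `run_batch(3)` returns one constrained string per job ID. In a real setup the filter work would have to be done by a library that releases the GIL for the threads to run in parallel.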
Problem
I am using exl2 in my environment to generate structured output (JSON). It works well. The issue is inference speed: I am unable to pass a multi-instance filter to the dynamic generator, so I cannot do batch inference.
On profiling, I found that the GIL is wrecking throughput: CPU usage sits at 100% and is a huge bottleneck. The system is unable to use additional CPUs, and the lm-format-enforcer framework has not been under much active development recently.
Solution
I found that there are frameworks for the same task that support multi-instance inference and also offer a significant speed-up in single-threaded performance.
xgrammar framework
Could you look into this, or guide me on how to get started with integrating it into your inference framework?
Integration Docs
Alternatives
No response
Explanation
The new framework should support:
Paper
Blogpost
Examples
No response
Additional context
No response
Acknowledgements