RAI 2.0 will introduce the rai-benchmark package, designed for various Language Model (LM) based benchmarks. The architecture is being designed to be easily extensible, supporting:

- Multiple simulation engines (with O3DE as our primary target)
- Various benchmark tasks
- Different types of metrics that can be collected at different points during scenario execution (see the metric sketch after this list)
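
As a rough illustration of the metric extension point, the sketch below assumes a `Metric` base class with hooks at several points of scenario execution; the class and method names are hypothetical and not part of the actual rai-benchmark API.

```python
# Hypothetical sketch only: names below are illustrative, not the rai-benchmark API.
from abc import ABC, abstractmethod
from typing import Any


class Metric(ABC):
    """A metric that can collect data at different points of scenario execution."""

    def on_scenario_start(self, engine: Any) -> None:
        """Called once, after the scene has been set up."""

    def on_step(self, engine: Any) -> None:
        """Called periodically while the agent works on the task."""

    @abstractmethod
    def score(self, engine: Any) -> float:
        """Called once at the end; returns the final value of the metric."""
```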
The architecture should be reusable for the following use cases (a configuration sketch follows the list):

- Tracking RAI performance across versions (locked LLM model, locked agent structure)
- Benchmarking LLM models (locked agent structure, locked engine)
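
To make the two use cases concrete, here is a hypothetical run configuration for each; the keys and values are illustrative only, not a defined rai-benchmark schema.

```python
# Illustrative only: not a defined rai-benchmark configuration schema.

# Use case 1: track RAI performance across versions
# (LLM model and agent structure stay locked, the RAI version varies between runs).
track_rai_performance = {
    "llm_model": "gpt-4o",             # locked
    "agent_structure": "rai-default",  # locked
    "rai_version": ["1.9.0", "2.0.0"],  # varied
}

# Use case 2: benchmark different LLM models
# (agent structure and engine stay locked, the model varies between runs).
benchmark_llm_models = {
    "agent_structure": "rai-default",  # locked
    "engine": "o3de",                  # locked
    "llm_model": ["gpt-4o", "claude-3-5-sonnet", "llama-3.1-70b"],  # varied
}
```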
The benchmark will be built from the following components (a minimal dataclass sketch follows the list):

- tasks (a prompt plus an appropriate metric; the metric can be based on the provided services, e.g. a service returning the poses of the objects),
- scene_configs defining e.g. the initial positions of objects,
- scenarios (pairs of tasks and scene_configs).
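
A minimal sketch of how these three building blocks could relate to each other, assuming plain dataclasses; the actual rai-benchmark types and field names may differ.

```python
# Sketch only: the actual rai-benchmark types and field names may differ.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    prompt: str
    # The metric may query engine services, e.g. a service returning object poses.
    metric: Callable[..., float]


@dataclass
class SceneConfig:
    # e.g. initial object positions; engine-specific subclasses add their own fields.
    objects: dict[str, tuple[float, float, float]] = field(default_factory=dict)


@dataclass
class Scenario:
    task: Task
    scene_config: SceneConfig
```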
The simulation needs to meet assumptions defined by the specific EngineConnector, e.g. provide specific services (such as those used for spawning objects, getting object names and poses, and listing the objects available to spawn); for O3DEngineConnector, the simulation is provided as a binary.
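
The sketch below illustrates how such assumptions could be expressed as an abstract EngineConnector interface; the method names and the binary-launching O3DE stub are assumptions for illustration, not the actual connector API.

```python
# Illustrative sketch: method names are assumptions, not the actual connector API.
from abc import ABC, abstractmethod


class EngineConnector(ABC):
    """Defines what the simulation must provide for the benchmark to run."""

    @abstractmethod
    def get_spawnable_object_names(self) -> list[str]:
        """Objects the simulation is able to spawn."""

    @abstractmethod
    def spawn_object(self, name: str, pose: tuple[float, float, float]) -> None:
        """Spawn an object at the given position."""

    @abstractmethod
    def get_object_poses(self) -> dict[str, tuple[float, float, float]]:
        """Names and current poses of the objects in the scene."""


class O3DEngineConnector(EngineConnector):
    """For O3DE the simulation is provided as a prebuilt binary."""

    def __init__(self, binary_path: str) -> None:
        # The connector would launch the binary (e.g. via subprocess) and talk
        # to it through the services listed above.
        self.binary_path = binary_path

    def get_spawnable_object_names(self) -> list[str]:
        raise NotImplementedError  # would call the appropriate simulation service

    def spawn_object(self, name: str, pose: tuple[float, float, float]) -> None:
        raise NotImplementedError

    def get_object_poses(self) -> dict[str, tuple[float, float, float]]:
        raise NotImplementedError
```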
SceneConfig is a model that matches a specific EngineConnector, e.g. since O3DEngineConnector is able to spawn objects, O3DESceneConfig has a field defining object names and their initial poses.
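
For example, an engine-specific scene config could look like the sketch below, assuming a pydantic model; the field names are illustrative, not the actual O3DESceneConfig schema.

```python
# Illustrative sketch: field names are assumptions, not the actual O3DESceneConfig schema.
from pydantic import BaseModel, Field


class Pose(BaseModel):
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


class O3DESceneConfig(BaseModel):
    # Because O3DEngineConnector can spawn objects, the config lists
    # the object names together with their initial poses.
    objects: dict[str, Pose] = Field(default_factory=dict)
```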