RAI Benchmark: Architecture proposal #405

maciejmajek · 2025-02-06T10:50:56Z

RAI 2.0 will introduce the rai-benchmark package, designed for various Language Model (LM) based benchmarks. The architecture is being designed to be easily extensible, supporting:

- Multiple simulation engines (with O3DE as our primary target)
- Various benchmark tasks
- Different types of metrics that can be collected at different points during scenario execution
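
To make "metrics collected at different points during scenario execution" concrete, here is a minimal sketch. It is not taken from the draft: `Stage`, `MetricCollector`, and the hook names are hypothetical, and the actual rai-benchmark design may look quite different.

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    """Hypothetical points in a scenario at which metrics may be collected."""
    SCENARIO_START = "scenario_start"
    AGENT_STEP = "agent_step"
    SCENARIO_END = "scenario_end"


class MetricCollector:
    """Runs registered hooks whenever the benchmark emits a stage."""

    def __init__(self) -> None:
        self._hooks: Dict[Stage, List[Callable[[dict], None]]] = {s: [] for s in Stage}
        self.values: Dict[str, float] = {}

    def register(self, stage: Stage, hook: Callable[[dict], None]) -> None:
        self._hooks[stage].append(hook)

    def emit(self, stage: Stage, state: dict) -> None:
        for hook in self._hooks[stage]:
            hook(state)


collector = MetricCollector()


def count_step(state: dict) -> None:
    # Collected on every agent step.
    collector.values["steps"] = collector.values.get("steps", 0.0) + 1.0


def record_success(state: dict) -> None:
    # Collected once, when the scenario finishes.
    collector.values["success"] = 1.0 if state["done"] else 0.0


collector.register(Stage.AGENT_STEP, count_step)
collector.register(Stage.SCENARIO_END, record_success)

# Simulate a scenario with three agent steps that ends successfully.
for _ in range(3):
    collector.emit(Stage.AGENT_STEP, {"done": False})
collector.emit(Stage.SCENARIO_END, {"done": True})
```

Keeping the collection points in an enum lets a new metric attach to an existing stage without changes to the benchmark loop.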

The architecture should be reusable for the following use-cases:

- Tracking RAI performance across versions (locked LLM, locked agent structure)
- Benchmarking LLMs (locked agent structure, locked engine)
- Benchmarking agent structures (locked LLM, locked engine)
- Benchmarking various game engines (locked LLM, locked agent structure)
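
Each use-case varies one factor and locks the others. One way to check that for a set of runs is sketched below; `RunConfig`, `varied_fields`, and all field values are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical description of a single benchmark run."""
    rai_version: str
    llm_model: str
    agent_structure: str
    engine: str


def varied_fields(runs: List[RunConfig]) -> List[str]:
    """Return the names of the fields that differ between the given runs."""
    return [
        f.name
        for f in fields(RunConfig)
        if len({getattr(run, f.name) for run in runs}) > 1
    ]


# "Benchmarking LLMs": only the model changes, everything else stays locked.
runs = [
    RunConfig("2.0", "model-a", "agent-x", "O3DE"),
    RunConfig("2.0", "model-b", "agent-x", "O3DE"),
]
varied = varied_fields(runs)
```

For a valid comparison `varied` should contain exactly one field name, here `llm_model`.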

MagdalenaKotynia · 2025-02-06T17:11:07Z

Current draft proposal: https://github.com/RobotecAI/rai/blob/4bb8c80960c2ea311cb1b29a44cd245e0912b106/src/rai_benchmarks/benchmark_model.py

MagdalenaKotynia · 2025-02-07T16:56:17Z

@maciejmajek cc @jmatejcz @knicked
Here is a revised draft of the benchmark design.

It is based on the following assumptions:

The user defines:
- the agent,
- tasks (a prompt and an appropriate metric; the metric can be based on the provided services, e.g. a service that returns the poses of objects),
- scene_configs defining e.g. the initial positions of objects,
- scenarios (pairs of tasks and scene_configs).
The simulation needs to meet the assumptions defined by the specific EngineConnector, e.g. it must expose specific services (for spawning objects, getting object names and poses, and getting the objects available to spawn). For O3DEngineConnector, the simulation is provided as a binary.
SceneConfig is a model that matches a specific EngineConnector. For example, because O3DEngineConnector can spawn objects, O3DESceneConfig has a field defining object names and their initial poses.
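
The assumptions above could fit together roughly as follows. This is a sketch, not the linked draft: `EngineConnector` and `SceneConfig` are names from this thread, but every method name, `SpawnableSceneConfig`, `InMemoryConnector`, and `run_benchmark` are hypothetical. The fake in-memory engine stands in for O3DE so the example runs without a simulator.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, Generic, List, Tuple, TypeVar

Pose = Tuple[float, float, float]  # simplified to x, y, z


@dataclass
class SceneConfig:
    """Base scene description; engine-specific subclasses add fields."""


@dataclass
class SpawnableSceneConfig(SceneConfig):
    # Hypothetical stand-in for O3DESceneConfig: object names and initial poses.
    objects: Dict[str, Pose] = field(default_factory=dict)


ConfigT = TypeVar("ConfigT", bound=SceneConfig)


class EngineConnector(ABC, Generic[ConfigT]):
    """What a simulation must provide to be usable by the benchmark."""

    @abstractmethod
    def setup_scene(self, config: ConfigT) -> None: ...

    @abstractmethod
    def get_object_poses(self) -> Dict[str, Pose]: ...


class InMemoryConnector(EngineConnector[SpawnableSceneConfig]):
    """Fake engine that keeps object poses in a dict."""

    def __init__(self) -> None:
        self._poses: Dict[str, Pose] = {}

    def setup_scene(self, config: SpawnableSceneConfig) -> None:
        self._poses = dict(config.objects)

    def get_object_poses(self) -> Dict[str, Pose]:
        return dict(self._poses)

    def move_object(self, name: str, pose: Pose) -> None:
        self._poses[name] = pose


@dataclass
class Task:
    prompt: str
    metric: Callable[[EngineConnector], float]  # scores the final scene


@dataclass
class Scenario:
    task: Task
    scene_config: SceneConfig


def run_benchmark(agent, connector, scenarios: List[Scenario]) -> List[float]:
    """Set up each scene, let the agent act, then score it with the task metric."""
    scores = []
    for scenario in scenarios:
        connector.setup_scene(scenario.scene_config)
        agent(scenario.task.prompt, connector)
        scores.append(scenario.task.metric(connector))
    return scores


def cube_lifted(connector: EngineConnector) -> float:
    # Metric based on a pose service: 1.0 if the cube ends at z >= 1.0.
    _, _, z = connector.get_object_poses()["cube"]
    return 1.0 if z >= 1.0 else 0.0


def scripted_agent(prompt: str, connector: InMemoryConnector) -> None:
    # Placeholder for an LLM-based agent: always lifts the cube.
    connector.move_object("cube", (0.0, 0.0, 1.0))


scenario = Scenario(
    Task("Lift the cube", cube_lifted),
    SpawnableSceneConfig({"cube": (0.0, 0.0, 0.0)}),
)
scores = run_benchmark(scripted_agent, InMemoryConnector(), [scenario])
```

Binding the connector to its config type through `Generic[ConfigT]` is one way to express that a SceneConfig has to match a specific EngineConnector. A type checker can then flag a scene config passed to the wrong engine.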

maciejmajek added this to RAI 2.0 Feb 4, 2025

maciejmajek converted this from a draft issue Feb 6, 2025

maciejmajek assigned MagdalenaKotynia and knicked Feb 6, 2025

maciejmajek assigned jmatejcz Feb 6, 2025

MagdalenaKotynia mentioned this issue Feb 12, 2025

feat: rai_sim #415

Merged

MagdalenaKotynia mentioned this issue Feb 19, 2025

feat: O3DE test benchmark #426

Merged
