RAI Benchmark: Architecture proposal #405

Open
maciejmajek opened this issue Feb 6, 2025 · 2 comments

maciejmajek (Member) commented Feb 6, 2025

RAI 2.0 will introduce the rai-benchmark package, designed for various language model (LM)-based benchmarks. The architecture is designed to be easily extensible, supporting:

  • Multiple simulation engines (with O3DE as our primary target)
  • Various benchmark tasks
  • Different types of metrics that can be collected at different points during scenario execution
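One way metrics collected at different points of a scenario could plug into a run is a hook-based interface. The sketch below assumes that design: `Metric`, its hook names (`on_scenario_start`, `on_step`, `on_scenario_end`), the two example metrics and `run_with_metrics` are all hypothetical and are not part of any existing rai-benchmark API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Metric(ABC):
    """Hypothetical metric interface: hooks fire at different points of a scenario run."""

    def on_scenario_start(self, state: Dict[str, Any]) -> None:
        """Called once before the agent receives the task prompt (no-op by default)."""

    def on_step(self, state: Dict[str, Any]) -> None:
        """Called after every agent step, e.g. each tool call (no-op by default)."""

    @abstractmethod
    def on_scenario_end(self, state: Dict[str, Any]) -> float:
        """Called once after the agent finishes; returns the final score."""


class StepCountMetric(Metric):
    """Collected during execution: counts how many steps the agent took."""

    def __init__(self) -> None:
        self.steps = 0

    def on_scenario_start(self, state: Dict[str, Any]) -> None:
        self.steps = 0

    def on_step(self, state: Dict[str, Any]) -> None:
        self.steps += 1

    def on_scenario_end(self, state: Dict[str, Any]) -> float:
        return float(self.steps)


class GoalReachedMetric(Metric):
    """Collected only at the end: inspects the final state."""

    def on_scenario_end(self, state: Dict[str, Any]) -> float:
        return 1.0 if state.get("goal_reached") else 0.0


def run_with_metrics(
    metrics: List[Metric],
    step_states: List[Dict[str, Any]],
    final_state: Dict[str, Any],
) -> Dict[str, float]:
    """Replay a recorded run through every metric's hooks and collect the final scores."""
    for metric in metrics:
        metric.on_scenario_start({})
    for state in step_states:
        for metric in metrics:
            metric.on_step(state)
    return {type(m).__name__: m.on_scenario_end(final_state) for m in metrics}


results = run_with_metrics(
    [StepCountMetric(), GoalReachedMetric()],
    step_states=[{}, {}, {}],
    final_state={"goal_reached": True},
)
print(results)  # {'StepCountMetric': 3.0, 'GoalReachedMetric': 1.0}
```

Under this design a new metric only overrides the hooks it needs, so adding one would not require touching the scenario runner.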

The architecture should be reusable for the following use cases:

  • Tracking RAI performance across versions (locked LLM, locked agent structure)
  • Benchmarking LLMs (locked agent structure, locked engine)
  • Benchmarking agent structures (locked LLM, locked engine)
  • Benchmarking various game engines (locked LLM, locked agent structure)
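Each use case locks some dimensions and sweeps one. A minimal sketch of generating such run configurations, assuming four dimensions (RAI version, LLM, agent structure, engine); `RunConfig`, `vary_one` and all field values are hypothetical placeholders, not real model or engine identifiers.

```python
from dataclasses import dataclass, fields, replace
from typing import List


@dataclass(frozen=True)
class RunConfig:
    """One benchmark run, described by the dimensions the use cases lock or vary."""

    rai_version: str
    llm: str
    agent: str
    engine: str


def vary_one(baseline: RunConfig, dimension: str, values: List[str]) -> List[RunConfig]:
    """Keep every dimension locked to `baseline` except `dimension`, which takes each of `values`."""
    names = {f.name for f in fields(RunConfig)}
    if dimension not in names:
        raise ValueError(f"unknown dimension: {dimension!r}")
    return [replace(baseline, **{dimension: value}) for value in values]


# Hypothetical placeholder values.
baseline = RunConfig(rai_version="2.0", llm="llm-a", agent="agent-a", engine="o3de")

# "Benchmarking LLMs": agent structure and engine stay locked, only the LLM changes.
llm_runs = vary_one(baseline, "llm", ["llm-a", "llm-b", "llm-c"])
```

Because `RunConfig` is frozen, every run is an immutable, hashable record, which would make it easy to cache results or compare runs across RAI versions.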
maciejmajek converted this from a draft issue Feb 6, 2025
MagdalenaKotynia (Member) commented Feb 6, 2025

MagdalenaKotynia (Member) commented Feb 7, 2025

@maciejmajek cc @jmatejcz @knicked
Redefined draft of the benchmark design.

It is based on the following assumptions:

  • User defines:
    • agent,
    • tasks (a prompt and an appropriate metric; the metric can rely on the provided services, e.g. a service that returns object poses),
    • scene_configs, defining e.g. the initial object positions,
    • scenarios (pairs of tasks and scene_configs).
  • The simulation must meet the requirements defined by the specific EngineConnector, e.g. it must expose specific services (for spawning objects, getting object names and poses, and listing the objects available to spawn). For O3DEngineConnector, the simulation is provided as a binary.
  • SceneConfig is a model that matches a specific EngineConnector; e.g. since O3DEngineConnector can spawn objects, O3DESceneConfig has a field defining object names and their initial poses.
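The assumptions above can be sketched as a small set of types. This is a minimal sketch under stated assumptions, not the proposed implementation: the concepts (agent, task, scene config, scenario, EngineConnector, O3DESceneConfig) come from the list, while `Pose`, the connector methods `setup_scene` and `get_object_poses`, `InMemoryConnector` (a stand-in for O3DEngineConnector so it runs without a simulation binary), the agent signature and `run_scenarios` are hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical pose type: position only (x, y, z), orientation omitted for brevity.
Pose = Tuple[float, float, float]


@dataclass
class SceneConfig:
    """Base model; each EngineConnector is paired with its own SceneConfig subclass."""


@dataclass
class O3DESceneConfig(SceneConfig):
    # Object name -> initial pose; the connector spawns these objects.
    objects: Dict[str, Pose]


class EngineConnector(ABC):
    """Services the simulation must expose (method names are assumptions)."""

    @abstractmethod
    def setup_scene(self, config: SceneConfig) -> None: ...

    @abstractmethod
    def get_object_poses(self) -> Dict[str, Pose]: ...


class InMemoryConnector(EngineConnector):
    """Hypothetical stand-in for O3DEngineConnector; keeps poses in a dict."""

    def __init__(self) -> None:
        self.poses: Dict[str, Pose] = {}

    def setup_scene(self, config: SceneConfig) -> None:
        if not isinstance(config, O3DESceneConfig):
            raise TypeError("InMemoryConnector expects an O3DESceneConfig")
        self.poses = dict(config.objects)  # "spawn" every object at its initial pose

    def get_object_poses(self) -> Dict[str, Pose]:
        return dict(self.poses)

    def move(self, name: str, pose: Pose) -> None:
        self.poses[name] = pose


@dataclass
class Task:
    prompt: str
    # The metric scores the final scene through connector services (here: object poses).
    metric: Callable[[EngineConnector], float]


@dataclass
class Scenario:
    task: Task
    scene_config: SceneConfig


# Hypothetical agent signature: receives the prompt and acts on the simulation.
Agent = Callable[[str, EngineConnector], None]


def run_scenarios(agent: Agent, scenarios: List[Scenario], connector: EngineConnector) -> List[float]:
    """Set up each scene, let the agent act on the task prompt, then score with the task metric."""
    scores: List[float] = []
    for scenario in scenarios:
        connector.setup_scene(scenario.scene_config)
        agent(scenario.task.prompt, connector)
        scores.append(scenario.task.metric(connector))
    return scores


# --- usage with toy components ---
def cube_on_table(connector: EngineConnector) -> float:
    # Assumes a table top at z = 0.8 (hypothetical scene geometry).
    return 1.0 if connector.get_object_poses()["cube"][2] >= 0.8 else 0.0


def toy_agent(prompt: str, connector: InMemoryConnector) -> None:
    connector.move("cube", (0.0, 0.0, 0.8))  # pretend the agent solved the task


task = Task(prompt="Put the cube on the table", metric=cube_on_table)
scenarios = [Scenario(task, O3DESceneConfig(objects={"cube": (1.0, 0.0, 0.0)}))]
print(run_scenarios(toy_agent, scenarios, InMemoryConnector()))  # [1.0]
```

Pairing each connector with its own SceneConfig subclass lets a connector reject configs it cannot realize (the `TypeError` above), which mirrors the assumption that a SceneConfig only makes sense for a matching EngineConnector.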

Labels: None yet
Projects: Status: In Progress
Development: No branches or pull requests
4 participants