ai-evaluation

Here are 12 public repositories matching this topic:

lechmazur / confabulations

Hallucinations (Confabulations) Document-Based Benchmark for RAG

benchmark leaderboard gemini llama language-model claude rag o1 hallucinations ai-evaluation llm gemini-pro llm-benchmarking confabulations deepseek-r1 o3-mini

Updated Feb 10, 2025
HTML

METR / vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

ai elicitation ai-evaluation evals

Updated Feb 11, 2025
TypeScript

taoAIGC / AI-Shortcuts

One click to open multiple AI sites and view their results

ai gemini poe claude perplexity ai-evaluation llm chatgpt

Updated Jan 21, 2025

lechmazur / deception

Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.

nlp machine-learning gemini llama language-model model-evaluation ai-safety mistral claude disinformation ai-security ai-benchmarks ai-evaluation llm llm-benchmarking gpt4o

Updated Jan 21, 2025

rungalileo / agent-leaderboard

Ranking LLMs on agentic tasks

ai evaluation ai-agents ai-evaluation llms

Updated Feb 11, 2025
Jupyter Notebook

dpc10ster / RJafrocRocBook

ROC methodology explained with R examples

book roc ai-evaluation

Updated Apr 25, 2024
TeX

bigdata-ustc / CAT4AI

Adaptive Testing Framework for AI Models (Psychometrics in AI Evaluation)

psychometrics adaptive-testing ai-evaluation

Updated Oct 1, 2024
Jupyter Notebook

dpc10ster / RJafrocFrocBook

FROC methodology explained with R examples

pdf r book ai-evaluation

Updated Dec 26, 2023
TeX

dpc10ster / RJafrocQuickStart

RJafroc quick start for those already familiar with Windows JAFROC

r rjafroc ai-evaluation

Updated Dec 28, 2023
TeX

dpc10ster / WindowsJAFROC

Installation files for Windows JAFROC software

windows ai-evaluation jafroc

Updated Feb 8, 2023

dpc10ster / datasets

ROC/FROC datasets from my collaborations

datasets ai-evaluation

Updated Aug 14, 2023

gabrielhamalwa / magpie

Repository for the LWDA'24 presentation on 'Psychometric Profiling of GPT Models for Bias Exploration', featuring conference materials including the poster, paper, slides, and references.

ai-safety personality-traits interpretability cognitive-bias explainability ai-evaluation gpt-models machine-psychology ai-bias psychometric-analysis lwda24

Updated Sep 23, 2024
TeX
