This repository contains a benchmark for evaluating large language models (LLMs) on medical ethics tasks, assessing their performance in both medical knowledge and ethical decision-making.
The repository currently contains the following folder:
dataset/
This folder contains the datasets used to evaluate LLMs on medical ethics tasks. It covers four task types:
- Knowledge Evaluation: Assessing the model's grasp of medical ethics knowledge.
- Detecting Violations: Evaluating the model's ability to identify violations of medical ethics.
- Priority Dilemma: Testing the model's decision-making in ethically charged dilemmas with clear priorities.
- Equilibrium Dilemma: Evaluating how well the model handles ethically neutral or balanced dilemmas.
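A minimal sketch of how the four task types might be loaded for evaluation. The file names (`knowledge_evaluation.json`, etc.) and the JSON schema used here are assumptions for illustration only; consult the actual files under `dataset/` for the repository's real layout.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file names mirroring the four task categories described above;
# the real dataset/ folder may use different names and formats.
TASKS = [
    "knowledge_evaluation",
    "detecting_violations",
    "priority_dilemma",
    "equilibrium_dilemma",
]

def load_task(dataset_dir, task):
    """Read one task's examples from <dataset_dir>/<task>.json."""
    with open(Path(dataset_dir) / f"{task}.json", encoding="utf-8") as f:
        return json.load(f)

# Demo against a throwaway directory with a placeholder record per task.
with tempfile.TemporaryDirectory() as tmp:
    sample = [{"question": "…", "answer": "…"}]
    for task in TASKS:
        (Path(tmp) / f"{task}.json").write_text(
            json.dumps(sample), encoding="utf-8"
        )
    data = {task: load_task(tmp, task) for task in TASKS}
```

A per-task loader like this keeps the four evaluations independent, so a model can be scored on each category separately.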