leave comments for eval performance #384

Draft
wants to merge 4 commits into base: main

Conversation

sameelarif
Collaborator

why

We want an easy way to view eval category performance

what changed

Added CI comments that report the performance of each eval category

test plan

Run the existing evals and check that CI leaves the comments


changeset-bot bot commented Jan 7, 2025

⚠️ No Changeset found

Latest commit: 331190f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@sameelarif added the combination label ("These changes affect multiple Stagehand functions") Jan 7, 2025
@sameelarif requested a review from kamath January 7, 2025 02:20
Contributor

github-actions bot commented Jan 7, 2025

🧪 E2E Test Results

✅ All E2E tests passed successfully

Contributor

@kamath left a comment

pretty sure this leaves a bunch of comments; this will get annoying super quick. we want one comment that can edit itself
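
One way to get a single self-editing comment is to tag the bot's comment with a hidden marker, then update the tagged comment when it already exists instead of posting a new one. The sketch below is not taken from this PR: the marker string, the `PrComment` shape, and the `planEvalComment` helper are all hypothetical names, and the function only decides what to do, leaving the actual GitHub call to the workflow.

```typescript
// Hypothetical hidden marker; any unique string inside an HTML comment would work.
const MARKER = "<!-- eval-results-comment -->";

// Minimal shape of an existing PR comment (assumed for this sketch).
interface PrComment {
  id: number;
  body: string;
}

// Either post a fresh comment or edit the one we posted earlier.
type CommentAction =
  | { kind: "create"; body: string }
  | { kind: "update"; id: number; body: string };

// Decide whether CI should create a new comment or update its previous one.
function planEvalComment(existing: PrComment[], report: string): CommentAction {
  // Prefix the report with the marker so later runs can find this comment again.
  const body = `${MARKER}\n${report}`;
  const previous = existing.find((c) => c.body.includes(MARKER));
  return previous
    ? { kind: "update", id: previous.id, body }
    : { kind: "create", body };
}
```

In a workflow, `existing` could come from the GitHub REST "list issue comments" endpoint, and the returned action could map to "create an issue comment" or "update an issue comment". With that wiring, repeated CI runs would keep rewriting the same comment rather than stacking new ones.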

Contributor

github-actions bot commented Jan 7, 2025

🔄 Combination Eval Results

Score: 78%
View detailed results

@sameelarif marked this pull request as draft January 7, 2025 23:23
Contributor

github-actions bot commented Jan 8, 2025

🧪 E2E Test Results

✅ All E2E tests passed successfully
