Bug: cache is not working well #395

Open
arlenvasconcelos opened this issue Jan 9, 2025 · 2 comments

@arlenvasconcelos

Hi, team.

I am debugging my application that uses stagehand, and I noticed that the cache is not working well.

I've attached a small reproduction to help validate this.

When running it, I see many cache misses, even when the application is executed three times in a row against a static page.

While debugging, I found that the key used to read from the cache and the key used to write to it are different because of a single attribute inside the ZodObject: _cached. The ZodObject is nested inside the cacheOptions object, which is hashed to produce the cache key.
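The mechanism can be reproduced without stagehand or Zod. In this minimal sketch, LazySchema is a hypothetical stand-in for ZodObject that starts with _cached set to null and fills it in on first use (which is what the two dumps below show), and fnv1a is a dependency-free stand-in for whatever hash stagehand really uses. It is also an assumption that validating the LLM response is what populates _cached between the lookup and the store.

```typescript
// Hypothetical stand-in for ZodObject (not the real Zod class): it starts
// with _cached = null and memoizes its keys the first time it is used.
class LazySchema {
  shape: Record<string, string>;
  _cached: { keys: string[] } | null = null;

  constructor(shape: Record<string, string>) {
    this.shape = shape;
  }

  // Validating input fills the memo as a side effect.
  parse(input: Record<string, unknown>): Record<string, unknown> {
    if (this._cached === null) {
      this._cached = { keys: Object.keys(this.shape) };
    }
    const out: Record<string, unknown> = {};
    for (const key of this._cached.keys) out[key] = input[key];
    return out;
  }
}

// 32-bit FNV-1a, used here only as a dependency-free stand-in hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

const schema = new LazySchema({ pageTitle: "string" });
const options = {
  model: "gpt-4o-mini",
  response_model: { schema, name: "Extraction" },
};

const readInput = JSON.stringify(options); // hashed when looking up the cache
schema.parse({ pageTitle: "Example Domain" }); // assumed: response validation
const writeInput = JSON.stringify(options); // hashed when storing the result

console.log(readInput === writeInput); // false: _cached went from null to { keys: [...] }
console.log(fnv1a(readInput), fnv1a(writeInput)); // lookup key vs. store key
```

The live schema object is part of the hash input, so any internal state it mutates after the lookup leaks into the key used for the store.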

On the first iteration, the object hashed to look up the cache is:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are extracting content on behalf of a user. If a user asks you to extract a 'list' of information, or 'all' information, YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS. You will be given: 1. An instruction 2. A list of DOM elements to extract from. Print the exact text from the DOM elements with all symbols, characters, and endlines as is. Print null or an empty string if no new information is found. "
    },
    {
      "role": "user",
      "content": "Instruction: extract page title\nDOM: 0:<h1>Example Domain</h1>\n1:Example Domain\n2:<p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n3:This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.\n4:<a href=\"https://www.iana.org/domains/example\">More information...</a>\n5:More information...\n"
    }
  ],
  "temperature": 0.1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_model": {
    "schema": {
      "_def": {
        "unknownKeys": "strip",
        "catchall": {
          "_def": {
            "typeName": "ZodNever"
          },
          "~standard": {
            "version": 1,
            "vendor": "zod"
          }
        },
        "typeName": "ZodObject"
      },
      "~standard": {
        "version": 1,
        "vendor": "zod"
      },
      "_cached": null
    },
    "name": "Extraction"
  }
}

and when setting the cache, the cacheOptions object (used to compute the hash) is:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are extracting content on behalf of a user. If a user asks you to extract a 'list' of information, or 'all' information, YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS. You will be given: 1. An instruction 2. A list of DOM elements to extract from. Print the exact text from the DOM elements with all symbols, characters, and endlines as is. Print null or an empty string if no new information is found. "
    },
    {
      "role": "user",
      "content": "Instruction: extract page title\nDOM: 0:<h1>Example Domain</h1>\n1:Example Domain\n2:<p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n3:This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.\n4:<a href=\"https://www.iana.org/domains/example\">More information...</a>\n5:More information...\n"
    }
  ],
  "temperature": 0.1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_model": {
    "schema": {
      "_def": {
        "unknownKeys": "strip",
        "catchall": {
          "_def": {
            "typeName": "ZodNever"
          },
          "~standard": {
            "version": 1,
            "vendor": "zod"
          }
        },
        "typeName": "ZodObject"
      },
      "~standard": {
        "version": 1,
        "vendor": "zod"
      },
-      "_cached": null
+      "_cached": {
+        "shape": {
+          "pageTitle": {
+            "_def": {
+              "checks": [],
+              "typeName": "ZodString",
+              "coerce": false
+            },
+            "~standard": {
+              "version": 1,
+              "vendor": "zod"
+            }
+          }
+        },
+        "keys": [
+          "pageTitle"
+        ]
+      }
    },
    "name": "Extraction"
  }
}
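To pinpoint where two large objects like these differ, a small recursive diff is handy. diffPaths below is a hypothetical debugging helper (not part of stagehand), fed with trimmed copies of the two objects above in which the identical fields are omitted.

```typescript
// Hypothetical debugging helper: list the dotted paths where two values differ.
function diffPaths(a: unknown, b: unknown, path = ""): string[] {
  const isObject = (v: unknown): v is Record<string, unknown> =>
    typeof v === "object" && v !== null;

  // Leaf or type mismatch: report the path unless the values are identical.
  if (!isObject(a) || !isObject(b)) {
    return a === b ? [] : [path || "(root)"];
  }

  // Both are objects (arrays included): recurse into the union of their keys.
  const keys = Array.from(new Set(Object.keys(a).concat(Object.keys(b))));
  const diffs: string[] = [];
  for (const key of keys) {
    const childPath = path ? `${path}.${key}` : key;
    diffs.push(...diffPaths(a[key], b[key], childPath));
  }
  return diffs;
}

// Trimmed copies of the lookup object and the store object.
const lookupOptions = {
  model: "gpt-4o-mini",
  response_model: { schema: { _cached: null }, name: "Extraction" },
};
const storeOptions = {
  model: "gpt-4o-mini",
  response_model: {
    schema: { _cached: { shape: { pageTitle: {} }, keys: ["pageTitle"] } },
    name: "Extraction",
  },
};

console.log(diffPaths(lookupOptions, storeOptions)); // → ["response_model.schema._cached"]
```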

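In the getCache function these objects are hashed, and the resulting hashes differ. Since each new run starts with _cached set to null again, an entry stored with the populated schema is never matched by a later lookup.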

Every cache miss means another LLM call, so these misses increase AI costs.
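One possible direction, sketched here as a hypothetical and not as stagehand's actual fix: derive the key from a deterministic serialization that sorts object keys and drops the lazily populated _cached field, so the lookup and the store hash the same input. stableStringify, stableCacheKey, and the plain-object DemoSchema are illustrative names, and fnv1a again stands in for a real hash.

```typescript
// Assumption: _cached is the only lazily populated field that leaks into the key.
const VOLATILE_KEYS = new Set(["_cached"]);

// Deterministic serialization: object keys sorted, volatile fields dropped.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    const fields = Object.keys(record)
      .filter((key) => !VOLATILE_KEYS.has(key))
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${fields.join(",")}}`;
  }
  // Primitives; undefined and functions have no JSON form, so map them to null.
  return JSON.stringify(value) ?? "null";
}

// 32-bit FNV-1a, a dependency-free stand-in for a real hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function stableCacheKey(options: unknown): string {
  return fnv1a(stableStringify(options));
}

type DemoSchema = { shape: Record<string, string>; _cached: { keys: string[] } | null };
const schema: DemoSchema = { shape: { pageTitle: "string" }, _cached: null };
const options = { model: "gpt-4o-mini", response_model: { schema, name: "Extraction" } };

const readKey = stableCacheKey(options);
schema._cached = { keys: ["pageTitle"] }; // simulate the memo being filled in
const writeKey = stableCacheKey(options);

console.log(readKey === writeKey); // true
```

Ignoring _cached is only half of a fix. In the dumps above, _def has no shape entry, which suggests JSON serialization silently drops Zod's function-valued shape; with _cached ignored, the key would then no longer reflect which fields are being extracted. Hashing an explicit, serializable description of the schema (for example, a JSON Schema generated from it) instead of the live Zod object would likely be more robust.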

If I'm wrong about this, please let me know.

@arlenvasconcelos
Author

cache-bug.zip

@kamath
Contributor

kamath commented Jan 13, 2025

Thanks for pointing this out! Caching is definitely still a WIP. We appreciate the detailed bug report so we can repro and actually address this.
