Bug: cache is not working well #395

arlenvasconcelos · 2025-01-09T18:55:38Z

Hi, team.

I am debugging an application that uses Stagehand, and I noticed that the cache is not working reliably.

I have attached a small code sample to help reproduce the problem.

When running this application, I see many cache misses, even when it is executed three times in a row against a static page.

While debugging, I found that the key used to read from the cache differs from the key used to write to it, because of a single attribute on the ZodObject: the _cached attribute. The ZodObject sits inside the cacheOptions object, which is hashed to produce the cache key.
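
To illustrate the mechanism, here is a minimal sketch of an object that memoises derived data in a _cached field the first time it is used. LazySchema, shapeFn and keys() are hypothetical stand-ins written for this example. They are not Zod's or Stagehand's actual code, only an assumption about the kind of lazy caching that would produce the behaviour described here.

```typescript
// Hypothetical stand-in for a schema object that lazily fills `_cached`.
class LazySchema {
  _cached: { keys: string[] } | null = null;
  shapeFn: () => Record<string, unknown>;

  constructor(shapeFn: () => Record<string, unknown>) {
    this.shapeFn = shapeFn;
  }

  // The first call computes the keys and stores them on the instance.
  keys(): string[] {
    if (this._cached === null) {
      this._cached = { keys: Object.keys(this.shapeFn()) };
    }
    return this._cached.keys;
  }
}

const schema = new LazySchema(() => ({ pageTitle: "string" }));

// Serialise once before the schema is used and once after.
// JSON.stringify omits function-valued properties such as shapeFn.
const before = JSON.stringify(schema);
schema.keys();
const after = JSON.stringify(schema);

console.log(before); // {"_cached":null}
console.log(after);  // {"_cached":{"keys":["pageTitle"]}}
```

If the cache lookup serialises the object before the schema is used and the cache write serialises it afterwards, the two serialised forms (and any hash of them) will not match.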

For the first iteration, the object that is hashed on the cache lookup is:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are extracting content on behalf of a user. If a user asks you to extract a 'list' of information, or 'all' information, YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS. You will be given: 1. An instruction 2. A list of DOM elements to extract from. Print the exact text from the DOM elements with all symbols, characters, and endlines as is. Print null or an empty string if no new information is found. "
    },
    {
      "role": "user",
      "content": "Instruction: extract page title\nDOM: 0:<h1>Example Domain</h1>\n1:Example Domain\n2:<p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n3:This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.\n4:<a href=\"https://www.iana.org/domains/example\">More information...</a>\n5:More information...\n"
    }
  ],
  "temperature": 0.1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_model": {
    "schema": {
      "_def": {
        "unknownKeys": "strip",
        "catchall": {
          "_def": {
            "typeName": "ZodNever"
          },
          "~standard": {
            "version": 1,
            "vendor": "zod"
          }
        },
        "typeName": "ZodObject"
      },
      "~standard": {
        "version": 1,
        "vendor": "zod"
      },
      "_cached": null
    },
    "name": "Extraction"
  }
}

When setting the cache, the cacheOptions object that is hashed is:

{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are extracting content on behalf of a user. If a user asks you to extract a 'list' of information, or 'all' information, YOU MUST EXTRACT ALL OF THE INFORMATION THAT THE USER REQUESTS. You will be given: 1. An instruction 2. A list of DOM elements to extract from. Print the exact text from the DOM elements with all symbols, characters, and endlines as is. Print null or an empty string if no new information is found. "
    },
    {
      "role": "user",
      "content": "Instruction: extract page title\nDOM: 0:<h1>Example Domain</h1>\n1:Example Domain\n2:<p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n3:This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.\n4:<a href=\"https://www.iana.org/domains/example\">More information...</a>\n5:More information...\n"
    }
  ],
  "temperature": 0.1,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_model": {
    "schema": {
      "_def": {
        "unknownKeys": "strip",
        "catchall": {
          "_def": {
            "typeName": "ZodNever"
          },
          "~standard": {
            "version": 1,
            "vendor": "zod"
          }
        },
        "typeName": "ZodObject"
      },
      "~standard": {
        "version": 1,
        "vendor": "zod"
      },
-      "_cached": null
+      "_cached": {
+        "shape": {
+          "pageTitle": {
+            "_def": {
+              "checks": [],
+              "typeName": "ZodString",
+              "coerce": false
+            },
+            "~standard": {
+              "version": 1,
+              "vendor": "zod"
+            }
+          }
+        },
+        "keys": [
+          "pageTitle"
+        ]
+      }
    },
    "name": "Extraction"
  }
}

In the getCache function these objects are hashed, and the resulting keys are different.
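
The effect can be reproduced with trimmed-down copies of the two objects above. This is only a sketch: hashOptions and hashOptionsStable are hypothetical helpers, and the assumption that the key is a SHA-256 digest of the JSON-serialised options is mine, since the real hashing code is not shown in this report. The second helper shows one possible workaround, dropping _cached through a JSON.stringify replacer before hashing.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache-key function: SHA-256 over the JSON form of the options.
function hashOptions(options: unknown): string {
  return createHash("sha256").update(JSON.stringify(options)).digest("hex");
}

// Same idea, but the replacer drops every property named "_cached" first.
function hashOptionsStable(options: unknown): string {
  const json = JSON.stringify(options, (key, value) =>
    key === "_cached" ? undefined : value,
  );
  return createHash("sha256").update(json).digest("hex");
}

// Trimmed-down copy of the object seen at cache lookup.
const atLookup = {
  model: "gpt-4o-mini",
  response_model: {
    schema: { _def: { typeName: "ZodObject" }, _cached: null },
    name: "Extraction",
  },
};

// Trimmed-down copy of the object seen when the cache is set.
const atStore = {
  model: "gpt-4o-mini",
  response_model: {
    schema: {
      _def: { typeName: "ZodObject" },
      _cached: {
        shape: { pageTitle: { _def: { typeName: "ZodString" } } },
        keys: ["pageTitle"],
      },
    },
    name: "Extraction",
  },
};

const naiveKeysMatch = hashOptions(atLookup) === hashOptions(atStore);
const stableKeysMatch =
  hashOptionsStable(atLookup) === hashOptionsStable(atStore);

console.log({ naiveKeysMatch, stableKeysMatch });
// { naiveKeysMatch: false, stableKeysMatch: true }
```

One caveat with this workaround: in the dumps above, the field names (pageTitle) only appear inside _cached, so dropping it would make schemas with different fields hash to the same key. A real fix would probably need to hash an explicit, stable description of the schema instead.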

Each cache miss results in another model request, so the more misses there are, the higher the AI cost.

If I have misunderstood something, please let me know.

arlenvasconcelos · 2025-01-09T19:02:03Z

cache-bug.zip

kamath · 2025-01-13T00:16:08Z

Thanks for pointing this out! Caching is definitely still a WIP. We appreciate the detailed bug report, which lets us repro and actually address this.
