Commit
Showing 1 changed file with 31 additions and 12 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,43 @@ | ||
# Caching | ||
# Caching in Portkey | ||
|
||
Portkey supports caching across text & chat completions. When the exact same request comes in to Portkey, we can return the response from our cache. | ||
Portkey offers two types of caching to speed up response retrieval: a fixed string matching cache (simple) and a semantic cache. | ||
|
||
This could be useful if you have fixed input prompts or are testing the app with the same inputs. | ||
## Fixed String Matching Cache | ||
The fixed string matching cache is the traditional caching mechanism where an exact match is performed on the input prompts. If the exact same request is received again, Portkey can directly return the response from the cache without executing the model. | ||
|
||
### Enabling cache | ||
|
||
To enable caching, pass the following headers in your requests. | ||
### Enabling Fixed String Matching Cache | ||
To enable the fixed string matching cache, include the following headers in your requests: | ||
|
||
```sh | ||
"x-portkey-cache": true | ||
"Cache-Control": "max-age:1000" | ||
"x-portkey-cache": "simple" | ||
"Cache-Control": "max-age:1000" | ||
``` | ||
The x-portkey-cache header enables or disables cache storage and retrieval. The Cache-Control header accepts the max-age parameter in seconds, which specifies the maximum age of the cached response. If the Cache-Control header is not provided, Portkey will automatically cache requests for 7*24*60*60 seconds (7 days) whenever caching is enabled through x-portkey-cache. | ||
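
As a rough illustration of the header rules described above, here is a minimal Python sketch. The helper names `build_cache_headers` and `effective_max_age` are hypothetical and not part of any Portkey SDK. The `max-age:<seconds>` spelling simply mirrors the header example in this file, and the 7-day default is taken from the paragraph above.

```python
from typing import Dict, Optional

# Default lifetime quoted in the text: 7 days, expressed in seconds.
DEFAULT_MAX_AGE_SECONDS = 7 * 24 * 60 * 60


def build_cache_headers(mode: str = "simple", max_age: Optional[int] = None) -> Dict[str, str]:
    """Hypothetical helper: assemble the cache headers for one request."""
    if mode not in ("simple", "semantic"):
        raise ValueError(f"unsupported cache mode: {mode!r}")
    headers = {"x-portkey-cache": mode}
    if max_age is not None:
        if max_age <= 0:
            raise ValueError("max_age must be a positive number of seconds")
        # Same "max-age:<seconds>" spelling as the header example in this file.
        headers["Cache-Control"] = f"max-age:{max_age}"
    return headers


def effective_max_age(headers: Dict[str, str]) -> int:
    """Lifetime in seconds that the text describes for these headers."""
    value = headers.get("Cache-Control")
    if value is None:
        # No Cache-Control header: fall back to the documented 7-day default.
        return DEFAULT_MAX_AGE_SECONDS
    return int(value.split(":", 1)[1])


headers = build_cache_headers("simple", max_age=1000)
print(headers)
print(effective_max_age(build_cache_headers("simple")))
```

The returned dictionary can be merged into the headers of whichever HTTP client sends the request.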
|
||
The `x-portkey-cache` enables or disables cache storage and retrieval. The `Cache-Control` header accepts `max-age` in seconds. The minimum value for `Cache-Control` is 30. If you don't provide this header, we will automatically cache requests for `7*24*60*60 seconds` (7 days) when the `x-portkey-cache` is set to `true`. | ||
### Invalidating Fixed String Matching Cache | ||
You can force refresh the fixed string matching cache by using the x-portkey-cache-force-refresh header. Setting it to true ensures that the cache is invalidated, and a new value is stored in the cache. | ||
|
||
### Invalidating Cache | ||
```sh | ||
"x-portkey-cache-force-refresh": true | ||
``` | ||
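
The effect of a force refresh can be pictured with a toy in-memory cache. This is only a sketch of the described behaviour, not Portkey's implementation. `ExactMatchCache`, `fetch`, and `fake_model` are made-up names for illustration.

```python
from typing import Callable, Dict, List


class ExactMatchCache:
    """Toy in-memory stand-in for an exact-match response cache."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def fetch(self, prompt: str, call_model: Callable[[str], str],
              force_refresh: bool = False) -> str:
        # Without force_refresh, an exact match is served straight from the cache.
        if not force_refresh and prompt in self._store:
            return self._store[prompt]
        # Otherwise run the model and overwrite whatever was stored before.
        response = call_model(prompt)
        self._store[prompt] = response
        return response


calls: List[str] = []


def fake_model(prompt: str) -> str:
    """Stand-in for a model call; numbers its answers so refreshes are visible."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


cache = ExactMatchCache()
first = cache.fetch("hello", fake_model)                       # miss, model runs
second = cache.fetch("hello", fake_model)                      # hit, no model call
third = cache.fetch("hello", fake_model, force_refresh=True)   # refresh, model runs
fourth = cache.fetch("hello", fake_model)                      # hit on the new value
```

After the forced refresh, later requests receive the newly stored value rather than the original one.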
|
||
You can choose to force refresh cache by using the `x-portkey-cache-force-refresh` header. Setting it to `true` ensures that the cache is invalidated, and a new value is stored in the cache. | ||
## Semantic Cache | ||
The semantic cache in Portkey goes beyond exact string matching and takes into account the contextual similarity between input prompts. It uses cosine similarity to determine if the similarity between the input and a cached request exceeds a certain threshold. If the similarity threshold is met, Portkey retrieves the response from the cache. | ||
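
The threshold check described above can be sketched as follows. This is an illustration under stated assumptions, not Portkey's actual code: the embeddings are hand-written toy vectors, the `0.95` threshold is an arbitrary example value, and `semantic_lookup` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def semantic_lookup(query: Sequence[float],
                    entries: List[Tuple[Sequence[float], str]],
                    threshold: float = 0.95) -> Optional[str]:
    """Return the cached response of the closest entry if it meets the threshold."""
    best_score, best_response = -1.0, None
    for embedding, response in entries:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_score, best_response = score, response
    # A miss (None) means the request would go on to the model.
    return best_response if best_score >= threshold else None


# Toy embeddings standing in for two previously cached prompts.
entries = [([1.0, 0.0], "cached A"), ([0.0, 1.0], "cached B")]
near_hit = semantic_lookup([0.99, 0.05], entries)   # close to the first entry
miss = semantic_lookup([1.0, 1.0], entries)         # equally far from both
```

In this sketch a higher threshold trades cache hits for stricter matching, while a lower one returns cached responses for looser paraphrases.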
|
||
### Enabling Semantic Cache | ||
To enable the semantic cache feature, use the following header in your requests: | ||
|
||
```sh | ||
"x-portkey-cache-force-refresh": true | ||
"x-portkey-cache": "semantic" | ||
``` | ||
|
||
Setting the x-portkey-cache header to "semantic" enables the semantic cache functionality. | ||
|
||
### Implementation Details | ||
The Cache-Control header still applies when the semantic cache is in use and controls the maximum age of the cached response. | ||
|
||
If you wish to force refresh the semantic cache and invalidate existing entries, you can use the x-portkey-cache-force-refresh header as described earlier. | ||
|
||
Because the semantic cache matches on contextual similarity rather than exact text, prompts that are worded differently but are similar enough to meet the threshold can be served from the cache. | ||
|
||
Choose the appropriate caching mechanism based on your use case to improve performance and minimize unnecessary model executions in Portkey. |