docs: ratelimitAI #364

Open · wants to merge 1 commit into base: main
8 changes: 8 additions & 0 deletions mint.json
@@ -616,6 +616,14 @@
}
]
},
{
"group": "Ratelimit AI (TS)",
"pages": [
"redis/sdks/ratelimit-ai/overview",
"redis/sdks/ratelimit-ai/getting-started",
"redis/sdks/ratelimit-ai/features"
]
},
{
"group": "Ratelimit (TS)",
"pages": [
98 changes: 98 additions & 0 deletions redis/sdks/ratelimit-ai/features.mdx
@@ -0,0 +1,98 @@
---
title: "Features"
---

## Ratelimit Types

Ratelimit AI provides three types of rate limits specifically designed for LLM API usage. You can use these limits individually or combine them based on your needs.

### Requests Per Minute (RPM)

The number of requests allowed per minute. This limit is useful for:

- Controlling the number of requests made to the LLM API in a short period
- Preventing API throttling
- Managing concurrent request load

```typescript Example
const ratelimit = new RatelimitAI({
RPM: 100, // 100 requests per minute
});
```

The RPM limit is checked before each request. If a user exceeds the limit in a minute, subsequent requests will be rate limited until the minute window resets.
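Conceptually, the RPM check is a fixed one-minute window: requests increment a counter that resets when the window rolls over. The sketch below illustrates that windowing logic in plain TypeScript; the actual library keeps this state in Upstash Redis rather than in process memory.

```typescript
// Illustrative fixed-window RPM counter. The real implementation stores the
// counter in Redis so it works across serverless invocations.
class FixedWindowRPM {
  private count = 0;
  private window = -1;

  constructor(private readonly limit: number) {}

  // Returns true if the request fits in the current one-minute window.
  allow(nowMs: number = Date.now()): boolean {
    const current = Math.floor(nowMs / 60_000);
    if (current !== this.window) {
      this.window = current; // new minute: the counter resets
      this.count = 0;
    }
    return ++this.count <= this.limit;
  }
}
```

Once the limit is hit, `allow` keeps returning `false` until the next minute begins, matching the behavior described above.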

### Requests Per Day (RPD)

The number of requests allowed per day. This limit is useful for:

- Implementing daily usage quotas
- Setting up tiered access levels

```typescript Example
const ratelimit = new RatelimitAI({
RPD: 1000, // 1000 requests per day
});
```

The RPD counter resets at midnight UTC, making it ideal for implementing quotas like "1000 requests per day per API key".
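Daily quotas pair naturally with tiered access: choose the limit configuration from the user's plan and pass it to the constructor. A sketch, where the tier names and numbers are illustrative (only the `RPM`/`RPD`/`TPM` options come from this SDK):

```typescript
// Hypothetical plan tiers; adjust names and numbers to your pricing model.
type Limits = { RPM?: number; RPD?: number; TPM?: number };

const TIER_LIMITS: Record<"free" | "pro" | "enterprise", Limits> = {
  free: { RPM: 10, RPD: 200 },
  pro: { RPM: 100, RPD: 5_000 },
  enterprise: { RPM: 1_000, RPD: 100_000 },
};

function limitsForTier(tier: keyof typeof TIER_LIMITS): Limits {
  return TIER_LIMITS[tier];
}

// Usage: const ratelimit = new RatelimitAI(limitsForTier(user.tier));
```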

### Tokens Per Minute (TPM)

A specialized limit that tracks total token usage per minute, which is essential for LLM API cost management:

- Counts both input (prompt) and output (completion) tokens
- Provides more granular cost control than request-based limits
- Aligns with LLM provider pricing models

```typescript Example
const ratelimit = new RatelimitAI({
TPM: 5000, // 5000 tokens per minute
});
```
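Because TPM counts prompt and completion tokens together, each call charges their sum against the per-minute budget. A small illustration; the `usage` shape mirrors what LLM APIs typically report and is not part of this SDK:

```typescript
// Illustrative: how a TPM budget accounts for a single call's tokens.
interface TokenUsage {
  promptTokens: number;     // input (prompt) tokens
  completionTokens: number; // output (completion) tokens
}

// Tokens one call charges against the TPM budget.
function totalTokens(usage: TokenUsage): number {
  return usage.promptTokens + usage.completionTokens;
}

// Whether a call still fits into the remaining budget for this minute.
function fitsInBudget(usage: TokenUsage, usedThisMinute: number, tpm: number): boolean {
  return usedThisMinute + totalTokens(usage) <= tpm;
}
```

For example, with `TPM: 5000`, a call using a 120-token prompt and an 80-token completion consumes 200 tokens of the budget.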

## Request Scheduling

When rate limits are hit, Ratelimit AI can automatically schedule requests for later processing.

```typescript Example
const ratelimit = new RatelimitAI({
RPM: 100,
RPD: 1000,
callback: "https://api.example.com/retry"
});
```
<Note>
This feature requires the `QSTASH_TOKEN` environment variable to be set.
</Note>


When a request hits the rate limit:

1. The request is automatically scheduled in QStash.
2. QStash waits until the rate limit resets.
3. The request is executed (the prompt is sent to the LLM API) and the response (completion) is delivered to your callback URL.
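Your callback endpoint receives the completion once QStash has executed the scheduled request. The payload shape below is an assumption for illustration, not a documented contract; check the actual delivery format before relying on it:

```typescript
// Hypothetical callback payload; field names are illustrative only.
interface RetryCallbackBody {
  identifier: string; // the rate-limited user
  completion: string; // the LLM response text
}

// Framework-agnostic parsing/validation; wire this into your HTTP route.
function handleRetryCallback(rawBody: string): RetryCallbackBody {
  const body = JSON.parse(rawBody) as Partial<RetryCallbackBody>;
  if (typeof body.identifier !== "string" || typeof body.completion !== "string") {
    throw new Error("unexpected callback payload");
  }
  return body as RetryCallbackBody;
}
```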

## Analytics

RatelimitAI can collect analytics about your rate limit usage. Analytics tracking is disabled by default and can be enabled during initialization:

```typescript Example
const ratelimit = new RatelimitAI({
// ...
analytics: true
});
```

When analytics is enabled, RatelimitAI will collect information about the number of requests made, rate limit successes, and failures. This data can be viewed in the [Upstash Console](https://console.upstash.com).

### Dashboard

The Upstash Console provides a Rate Limit Analytics dashboard where you can monitor your usage. Access it by clicking the three dots menu in your Redis database page and selecting **Rate Limit Analytics**.

The dashboard displays three main categories of requests: allowed requests (successful API calls), rate-limited requests (identifiers that hit a limit), and denied requests (blocked API calls). You can view this data over time and see usage patterns for each rate limit type.

<Note>
If you've configured RatelimitAI with a custom prefix, enter the same prefix in the dashboard's top left corner to filter your analytics data.
</Note>

For each rate-limited request, the analytics system records the identifier, timestamp, limit type (RPM/RPD/TPM), and status. For token-based limits, it also tracks the number of tokens used. This information helps you understand your API usage patterns and optimize your rate limit configurations.
89 changes: 89 additions & 0 deletions redis/sdks/ratelimit-ai/getting-started.mdx
@@ -0,0 +1,89 @@
---
title: Getting Started
---

## Steps
<Steps>
<Step title="Create Redis Instance">
    For rate limiting to work, Ratelimit AI needs an Upstash Redis database and its
    credentials. To create one, follow the [Upstash Redis
    "Get Started" guide](https://upstash.com/docs/redis/overall/getstarted).
</Step>

<Step title="Installation">
    First, we need to install `@upstash/ratelimit-ai`. You can install the package with npm, pnpm, or bun from your terminal.
<CodeGroup>
```bash npm
npm install @upstash/ratelimit-ai
```

```bash pnpm
pnpm add @upstash/ratelimit-ai
```

```bash bun
bun add @upstash/ratelimit-ai
```
</CodeGroup>
</Step>

<Step title="Environment Variables">
Set up your environment variables. You can find these in your [Upstash Console](https://console.upstash.com/).

```bash .env
UPSTASH_REDIS_REST_URL=****
UPSTASH_REDIS_REST_TOKEN=****
# Optional: For request scheduling
QSTASH_TOKEN=****
```
</Step>

<Step title="Initialize RatelimitAI">
Create a new instance of RatelimitAI with your desired configuration:
```typescript
import { fetchRatelimitAI, RatelimitAI } from "@upstash/ratelimit-ai/openai";
import { createOpenAI } from '@ai-sdk/openai';

const ratelimit = new RatelimitAI({
RPM: 100, // Requests per minute
RPD: 1000, // Requests per day
TPM: 5000, // Tokens per minute
analytics: true, // Optional: Enable analytics
});

const userid = "user1";
const openai = createOpenAI({
fetch: fetchRatelimitAI(userid, ratelimit)
});
```

RatelimitAI will automatically:
- Track request counts (RPM, RPD)
- Track token counts (TPM)
- Handle rate limiting
- Schedule requests if configured
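With the wrapped `fetch` in place, calls made through the Vercel AI SDK are rate limited per user transparently. A sketch of a call site, assuming valid OpenAI credentials (the model id is illustrative):

```typescript
import { generateText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { fetchRatelimitAI, RatelimitAI } from "@upstash/ratelimit-ai/openai";

const ratelimit = new RatelimitAI({ RPM: 100, RPD: 1000, TPM: 5000 });

// Every request made by this client is checked against the limits first.
const openai = createOpenAI({
  fetch: fetchRatelimitAI("user1", ratelimit),
});

const { text } = await generateText({
  model: openai("gpt-4o-mini"), // illustrative model id
  prompt: "Write a haiku about rate limits.",
});
console.log(text);
```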
</Step>

<Step title="Optional: Enable Request Scheduling">
If you want to automatically retry rate-limited requests, add QStash integration:

```typescript
const ratelimit = new RatelimitAI({
RPM: 100,
RPD: 1000,
TPM: 5000,
callback: "https://api.example.com/retry", // Your callback URL
analytics: true,
});
```

When a request is rate-limited, RatelimitAI will schedule a retry using QStash. You can then handle the retry in your backend.

</Step>
</Steps>

## Next Steps
Now that you have RatelimitAI set up, you can:
- Learn about available features in the [Features](./features) section
- See integration examples in the [Examples](./examples) section
- Monitor your usage in the [Upstash Console](https://console.upstash.com/ratelimit)
72 changes: 72 additions & 0 deletions redis/sdks/ratelimit-ai/overview.mdx
@@ -0,0 +1,72 @@
---
title: Overview
---

# Upstash Ratelimit AI

Upstash Ratelimit AI is a specialized rate-limiting library for Large Language Model (LLM) API providers (OpenAI, Anthropic, Azure, etc.), built on top of Upstash Redis. This connectionless library provides token-aware rate limiting with built-in support for request scheduling and analytics.

## Quick Links

<CardGroup cols={3}>

<Card
title="GitHub Repository"
icon="github"
href="https://upstash.com/docs/redis/sdks/ratelimit-ts/features#caching"
>
Have a look at the source code
</Card>


<Card
title="Getting Started"
icon="flag-checkered"
href="./getting-started"
>
Start using Ratelimit AI
</Card>

<Card
title="Features"
icon="wand-magic-sparkles"
href="./features"
>
See the things you can do with Ratelimit AI
</Card>
</CardGroup>

## Features

<CardGroup cols={2}>
<Card title="Multiple Rate Limits" icon="chart-line" href="#">
Support for RPM (Requests Per Minute), RPD (Requests Per Day), and TPM
(Tokens Per Minute)
</Card>
<Card title="Token Counting" icon="calculator" href="#">
Automatic token counting for both prompts and responses
</Card>
<Card title="Analytics" icon="chart-bar" href="#">
Built-in analytics support with Upstash Console integration
</Card>
<Card title="Request Scheduling" icon="clock" href="#">
Automatic request scheduling with QStash when rate limits are hit
</Card>
<Card title="Serverless First" icon="cloud" href="#">
Designed for serverless environments including Edge functions
</Card>
<Card title="Easy Integration" icon="puzzle-piece" href="#">
Simple integration with popular LLM providers and AI SDKs
</Card>
</CardGroup>

## Examples

<CardGroup cols={2}>
<Card title="OpenAI" href="#">
Rate limit OpenAI API calls with token tracking on Vercel AI SDK
</Card>
<Card title="Anthropic" href="#">
Rate limit Anthropic Claude API with request scheduling on Vercel AI SDK
</Card>
</CardGroup>