Azure API Management (APIM) provides built-in rate limiting policies, but enforcing dollar-based cost quotas for Azure OpenAI services requires a more tailored approach. This solution combines Azure Functions, Cosmos DB, and stored procedures to implement cost-based quota management with automatic renewal periods.
Architecture
Client → APIM (with RateLimitConfig) → Azure Function Proxy → Azure OpenAI
                                                ↓
                                       Cosmos DB (quota tracking)
Technical Implementation
1. Rate Limit Configuration in APIM
The rate limiting configuration is injected into the request body by APIM using a policy fragment. Here's an example for a basic $5 quota:
<set-variable name="rateLimitConfig" value="@{
var productId = context.Product.Id;
var config = new JObject();
config["counterKey"] = productId;
config["quota"] = 5;
return config.ToString();
}" />
<include-fragment fragment-id="RateLimitConfig" />
For more advanced scenarios, you can customize token costs. Here's an example for a $10 quota with custom token pricing:
<set-variable name="rateLimitConfig" value="@{
var productId = context.Product.Id;
var config = new JObject();
config["counterKey"] = productId;
config["startDate"] = "2025-03-02T00:00:00Z";
config["renewal_period"] = 86400;
config["explicitEndDate"] = null;
config["quota"] = 10;
config["input_cost_per_token"] = 0.00003;
config["output_cost_per_token"] = 0.00006;
return config.ToString();
}" />
<include-fragment fragment-id="RateLimitConfig" />
Flexible Counter Keys
The counterKey parameter is highly flexible and can be set to any unique identifier that suits your rate limiting strategy:
- Product ID: Limit all users of a specific APIM product (e.g., "starter", "professional")
- User ID: Apply individual limits per user
- Subscription ID: Track usage at the subscription level
- Custom combinations: Combine identifiers for granular control (e.g., "product_starter_user_12345")
Rate Limit Configuration Parameters
Parameter | Description | Example Value | Required |
---|---|---|---|
counterKey | Unique identifier for tracking quota usage | "starter10" or "user_12345" | Yes |
quota | Maximum cost allowed in the renewal period | 10 | Yes |
startDate | When the quota period begins. If not provided, the system uses the time when the policy is first applied | "2025-03-02T00:00:00Z" | No |
renewal_period | Seconds until quota resets (86400 = daily). If not provided, no automatic reset occurs | 86400 | No |
endDate | Optional end date for the quota period | null or "2025-12-31T23:59:59Z" | No |
input_cost_per_token | Custom cost per input token | 0.00003 | No |
output_cost_per_token | Custom cost per output token | 0.00006 | No |
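To make the required and optional parameters concrete, here is a minimal Python sketch of how a function might parse the injected configuration. The `RateLimitConfig` dataclass and `parse_rate_limit_config` helper are hypothetical names chosen for illustration, not taken from the repository; only the JSON keys come from the table above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateLimitConfig:
    """Illustrative Python view of the JSON config that APIM injects."""
    counter_key: str
    quota: float
    start_date: Optional[str] = None       # ISO 8601 string; None = when first applied
    renewal_period: Optional[int] = None   # seconds; None = no automatic reset
    end_date: Optional[str] = None
    input_cost_per_token: Optional[float] = None
    output_cost_per_token: Optional[float] = None


def parse_rate_limit_config(raw: str) -> RateLimitConfig:
    """Parse the JSON string and enforce the two required parameters."""
    data = json.loads(raw)
    for name in ("counterKey", "quota"):
        if data.get(name) is None:
            raise ValueError(f"missing required parameter: {name}")
    return RateLimitConfig(
        counter_key=str(data["counterKey"]),
        quota=float(data["quota"]),
        start_date=data.get("startDate"),
        renewal_period=data.get("renewal_period"),
        end_date=data.get("endDate"),
        input_cost_per_token=data.get("input_cost_per_token"),
        output_cost_per_token=data.get("output_cost_per_token"),
    )


config = parse_rate_limit_config('{"counterKey": "starter5", "quota": 5}')
print(config)
```

Optional parameters that are absent stay `None`, which lets later steps distinguish "no renewal" or "no custom pricing" from an explicit value.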
Scheduling and Time Windows
The time-based parameters work together to create flexible quota schedules:
- If the current date falls outside the range defined by startDate and endDate, requests will be rejected with an error
- The renewal window begins either on the specified startDate or when the policy is first applied
- The renewal_period determines how frequently the accumulated cost resets to zero
- Without a renewal_period, the cost accumulates indefinitely until the endDate is reached
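The interaction of these parameters can be sketched in a few lines of Python. This is an illustration under one assumption: that renewal windows are aligned to startDate in whole multiples of renewal_period. The function names `is_active` and `current_window_start` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_active(now: datetime, start: datetime, end: Optional[datetime]) -> bool:
    """Requests are only accepted between startDate and endDate (if set)."""
    if now < start:
        return False
    if end is not None and now > end:
        return False
    return True


def current_window_start(now: datetime, start: datetime,
                         renewal_period: Optional[int]) -> datetime:
    """Return the start of the renewal window that contains `now`.

    Assumption: windows are aligned to `start`. Without a renewal period
    there is a single window that begins at `start`.
    """
    if renewal_period is None:
        return start
    elapsed = (now - start).total_seconds()
    completed_windows = int(elapsed // renewal_period)
    return start + timedelta(seconds=completed_windows * renewal_period)


start = datetime(2025, 3, 2, tzinfo=timezone.utc)
now = datetime(2025, 3, 5, 12, 0, tzinfo=timezone.utc)
window = current_window_start(now, start, 86400)
print(is_active(now, start, None), window.isoformat())
```

With a startDate of 2025-03-02T00:00:00Z and a daily renewal period, a request at noon on 5 March falls into the window that began at 2025-03-05T00:00:00Z, which is 1741132800000 in epoch milliseconds.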
2. Quota Checking and Cost Tracking
The Azure Function performs two key operations:
- Pre-request quota check: Before processing each request, it verifies if the user has exceeded their quota
- Post-request cost tracking: After a successful request, it calculates the cost and updates the accumulated usage
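The two operations can be pictured as a thin wrapper around the upstream call. The sketch below is a simplified model, not the function's actual code: an in-memory dictionary stands in for Cosmos DB, and `handle_request` and `call_model` are hypothetical names.

```python
from typing import Callable, Dict, Tuple


def handle_request(store: Dict[str, float], counter_key: str, quota: float,
                   call_model: Callable[[], Tuple[str, float]]) -> Tuple[int, str]:
    """Check the quota, forward the request, then record its cost.

    `store` maps counter keys to accumulated cost (a stand-in for Cosmos DB).
    `call_model` is a hypothetical callable returning (response body, cost).
    """
    accumulated = store.get(counter_key, 0.0)

    # Pre-request quota check: reject before any upstream cost is incurred.
    if accumulated >= quota:
        return 429, "Quota exceeded"

    body, cost = call_model()

    # Post-request cost tracking: the cost is only known after the call.
    store[counter_key] = accumulated + cost
    return 200, body


store: Dict[str, float] = {}
print(handle_request(store, "starter5", 5.0, lambda: ("ok", 4.0)))
print(handle_request(store, "starter5", 5.0, lambda: ("ok", 4.0)))
print(handle_request(store, "starter5", 5.0, lambda: ("ok", 4.0)))
```

Because a request's cost is only known once the response arrives, the last accepted request in a window can push the total slightly past the quota in this model; subsequent requests are then rejected.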
Cost Calculation
For cost calculation, the system uses:
- Custom pricing: If input_cost_per_token and output_cost_per_token are provided in the rate limit config
- LiteLLM pricing: If custom pricing is not specified, the system falls back to LiteLLM's model prices for accurate cost estimation based on the model being used
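The pricing fallback amounts to a simple rule: use the custom per-token prices when both are present, otherwise look the model up in a price table. In the sketch below, `FALLBACK_PRICES` is a hypothetical stand-in for LiteLLM's pricing data, and its model name and prices are placeholders rather than real figures.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for LiteLLM's model prices:
# model name -> (input USD per token, output USD per token).
FALLBACK_PRICES: Dict[str, Tuple[float, float]] = {
    "example-model": (0.00001, 0.00002),
}


def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int,
                   input_cost_per_token: Optional[float] = None,
                   output_cost_per_token: Optional[float] = None) -> float:
    """Return the request cost in USD."""
    if input_cost_per_token is not None and output_cost_per_token is not None:
        # Custom pricing from the rate limit config takes precedence.
        in_price, out_price = input_cost_per_token, output_cost_per_token
    else:
        # Otherwise fall back to the per-model price table.
        in_price, out_price = FALLBACK_PRICES[model]
    return prompt_tokens * in_price + completion_tokens * out_price


# 1,000 input tokens and 500 output tokens at the custom prices shown earlier.
print(calculate_cost("example-model", 1000, 500, 0.00003, 0.00006))
```

At the custom prices from the $10 example (0.00003 per input token, 0.00006 per output token), a request with 1,000 input tokens and 500 output tokens costs 0.03 + 0.03 = 0.06 dollars.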
The function returns appropriate HTTP status codes and headers:
- HTTP 429 (Too Many Requests) when quota is exceeded
- Response headers with usage information:
x-counter-key: starter5
x-accumulated-cost: 5.000915
x-quota: 5
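On the client side, these headers are enough to work out how much budget is left. The following sketch assumes the header names shown above and uses a plain dictionary in place of a real HTTP response object; `remaining_budget` is a hypothetical helper.

```python
from typing import Mapping


def remaining_budget(headers: Mapping[str, str]) -> float:
    """Return the unspent quota in USD, never below zero."""
    quota = float(headers["x-quota"])
    accumulated = float(headers["x-accumulated-cost"])
    return max(quota - accumulated, 0.0)


headers = {
    "x-counter-key": "starter5",
    "x-accumulated-cost": "5.000915",
    "x-quota": "5",
}
print(remaining_budget(headers))
```

A client could use this value to back off before it starts receiving HTTP 429 responses.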
3. Cosmos DB for State Management
Cosmos DB maintains the quota state with documents that track:
{
"id": "starter5",
"counterKey": "starter5",
"accumulatedCost": 5.000915,
"startDate": "2025-03-02T00:00:00.000Z",
"renewalPeriod": 86400,
"renewalStart": 1741132800000,
"endDate": null,
"quota": 5
}
A stored procedure handles atomic updates to ensure accurate tracking, including:
- Adding costs to the accumulated total
- Automatically resetting costs when the renewal period is reached
- Updating quota values when configuration changes
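The update logic can be modeled in plain Python. Cosmos DB stored procedures are written in JavaScript and run server-side, so the sketch below only illustrates the three behaviors listed above; it is not the stored procedure itself. The `apply_usage` name is hypothetical, the field names follow the sample document, and it assumes windows advance in whole multiples of renewalPeriod.

```python
from typing import Optional


def apply_usage(doc: dict, cost: float, now_ms: int,
                quota: Optional[float] = None) -> dict:
    """Return an updated copy of the quota document.

    `now_ms` is the current time in epoch milliseconds, matching the
    unit of `renewalStart` in the sample document.
    """
    updated = dict(doc)

    # Reset the accumulated cost once the renewal period has elapsed.
    period = updated.get("renewalPeriod")
    if period:
        period_ms = period * 1000
        elapsed = now_ms - updated["renewalStart"]
        if elapsed >= period_ms:
            updated["renewalStart"] += (elapsed // period_ms) * period_ms
            updated["accumulatedCost"] = 0.0

    # Pick up a changed quota from the incoming configuration.
    if quota is not None and quota != updated["quota"]:
        updated["quota"] = quota

    # Add the cost of the request that just completed.
    updated["accumulatedCost"] += cost
    return updated


doc = {"id": "starter5", "counterKey": "starter5", "accumulatedCost": 5.000915,
       "renewalPeriod": 86400, "renewalStart": 1741132800000, "quota": 5}
one_day_later = doc["renewalStart"] + 25 * 3600 * 1000
print(apply_usage(doc, 0.01, one_day_later))
```

Running the real logic as a stored procedure matters because it executes as a single transaction within one partition, so concurrent requests cannot interleave the read, reset, and write steps.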
Benefits
- Fine-grained Cost Control: Track actual API usage costs rather than just request counts
- Flexible Quotas: Set daily, weekly, or monthly quotas with automatic renewal
- Transparent Usage: Response headers provide real-time quota usage information
- Product Differentiation: Different APIM products can have different quota levels
- Custom Pricing: Override default token costs for special pricing tiers
- Flexible Tracking: Use any identifier as the counter key for versatile quota management
- Time-based Scheduling: Define active periods and automatic reset windows for quota management
Getting Started
- Deploy the Azure Function with Cosmos DB integration
- Configure APIM policies to include rate limit configuration
- Set up different product policies for various quota levels
For a detailed implementation, visit our GitHub repository.
Demo Video: https://www.youtube.com/watch?v=vMX86_XpSAo
Tags: #AzureOpenAI #APIM #CosmosDB #RateLimiting #Serverless
Updated Mar 06, 2025
Version 5.0
hieunhu
Microsoft