Routing strategies
| Strategy | Behavior |
|---|---|
| Priority | Match rules in priority order. Default. |
| Lowest-latency | Route to the fastest provider based on historical metrics. |
| Lowest-cost | Route to the cheapest provider for the requested model. |
| Least-busy | Route to the provider with the lowest concurrent load. |
| Tag-based | Route based on tags attached to the request. |
| Auto-router | Adaptive routing based on SLA requirements. |
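
As a rough illustration of the three metric-driven strategies in the table (lowest-latency, lowest-cost, least-busy), the sketch below picks the provider that minimizes the relevant metric. The `ProviderStats` type, its field names, the provider names, and the numbers are all hypothetical and are not taken from the product; the real router may weigh metrics differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderStats:
    """Hypothetical snapshot of per-provider metrics (illustrative only)."""
    name: str
    avg_latency_ms: float       # historical latency
    cost_per_1k_tokens: float   # USD price for the requested model
    in_flight: int              # current concurrent requests


# Each metric-driven strategy minimizes exactly one field.
SELECTORS = {
    "lowest-latency": lambda p: p.avg_latency_ms,
    "lowest-cost": lambda p: p.cost_per_1k_tokens,
    "least-busy": lambda p: p.in_flight,
}


def pick_provider(strategy: str, providers: list[ProviderStats]) -> ProviderStats:
    """Return the provider with the smallest value of the strategy's metric."""
    if strategy not in SELECTORS:
        raise ValueError(f"unsupported strategy: {strategy}")
    if not providers:
        raise ValueError("no providers available")
    return min(providers, key=SELECTORS[strategy])


# Made-up metrics for three placeholder providers.
providers = [
    ProviderStats("provider-a", avg_latency_ms=420.0, cost_per_1k_tokens=0.50, in_flight=12),
    ProviderStats("provider-b", avg_latency_ms=310.0, cost_per_1k_tokens=0.75, in_flight=30),
    ProviderStats("provider-c", avg_latency_ms=650.0, cost_per_1k_tokens=0.20, in_flight=4),
]

fastest = pick_provider("lowest-latency", providers)
cheapest = pick_provider("lowest-cost", providers)
```

Priority, tag-based, and auto-router strategies depend on rules, tags, and SLA targets rather than a single metric, so they are left out of this sketch.
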
Routing rules
Rules use pattern matching against the model name in the request:

- Pattern types: `equals`, `startsWith`, `contains`.
- Priority: lower number = higher priority.
- Budget limits: optional per-rule USD budget with a time window (`hour`, `day`, `month`). Windows auto-reset.
- Token limits: optional per-rule token cap with a time window.
- Fallback strategies: `next-priority`, `lowest-cost`, `lowest-latency`, `fail`.
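
The matching and priority behavior described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the `Rule` type, its field names, and the provider names are hypothetical, and case-sensitive matching is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of a routing rule."""
    pattern_type: str   # "equals", "startsWith", or "contains"
    pattern: str
    priority: int       # lower number = higher priority
    provider: str


def matches(rule: Rule, model: str) -> bool:
    """Apply the rule's pattern type to the requested model name."""
    if rule.pattern_type == "equals":
        return model == rule.pattern
    if rule.pattern_type == "startsWith":
        return model.startswith(rule.pattern)
    if rule.pattern_type == "contains":
        return rule.pattern in model
    raise ValueError(f"unknown pattern type: {rule.pattern_type}")


def select_rule(rules: list[Rule], model: str) -> Optional[Rule]:
    """Return the first matching rule, checking lower priority numbers first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, model):
            return rule
    return None


rules = [
    Rule("contains", "gpt", priority=50, provider="provider-b"),
    Rule("startsWith", "gpt-4o", priority=10, provider="provider-a"),
]

chosen = select_rule(rules, "gpt-4o-mini")
```

Here `gpt-4o-mini` matches both rules, and the priority-10 rule is chosen because its number is lower. A model name that matches no rule returns `None` in this sketch; what the router actually does in that case is not specified here.
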
Example
A rule that sends every `gpt-4o*` request to OpenAI, falls back to the cheapest available provider if OpenAI is down, and caps the team at $100/day:
| Field | Value |
|---|---|
| Pattern | startsWith: gpt-4o |
| Priority | 10 |
| Target provider | OpenAI |
| Budget | $100 / day |
| Fallback | lowest-cost |
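
To make the example concrete, here is one way the budget window and the `lowest-cost` fallback could behave. Everything in this sketch is an assumption made for illustration: the `Budget` class, the `choose_target` helper, calendar-aligned windows, and the per-provider cost figures are hypothetical. What the router does once the budget is exhausted is also not specified here, so the sketch only reports that the spend was refused.

```python
from dataclasses import dataclass
from datetime import datetime


def window_key(window: str, now: datetime) -> tuple:
    """Identify the current window; a new key means the window has reset.

    Assumes calendar-aligned windows, which is a guess for illustration.
    """
    if window == "hour":
        return (now.year, now.month, now.day, now.hour)
    if window == "day":
        return (now.year, now.month, now.day)
    if window == "month":
        return (now.year, now.month)
    raise ValueError(f"unknown window: {window}")


@dataclass
class Budget:
    limit_usd: float
    window: str
    spent_usd: float = 0.0
    current_key: tuple = ()

    def try_spend(self, amount_usd: float, now: datetime) -> bool:
        """Record the spend if it fits in the current window's budget."""
        key = window_key(self.window, now)
        if key != self.current_key:   # window rolled over: auto-reset
            self.current_key = key
            self.spent_usd = 0.0
        if self.spent_usd + amount_usd > self.limit_usd:
            return False
        self.spent_usd += amount_usd
        return True


def choose_target(preferred: str, healthy: set[str], costs: dict[str, float]) -> str:
    """Use the preferred provider if healthy, else the cheapest healthy one."""
    if preferred in healthy:
        return preferred
    candidates = [name for name in costs if name in healthy]
    if not candidates:
        raise RuntimeError("no healthy provider available")
    return min(candidates, key=lambda name: costs[name])


# The example rule's budget: $100 per day.
budget = Budget(limit_usd=100.0, window="day")
# Made-up relative costs for the requested model.
costs = {"OpenAI": 5.0, "provider-b": 3.0, "provider-c": 4.0}

target = choose_target("OpenAI", healthy={"provider-b", "provider-c"}, costs=costs)
```

With OpenAI marked unhealthy, `choose_target` returns the cheapest of the remaining providers, mirroring the `lowest-cost` fallback in the table.
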