Provider Limitations
Limited Provider Support
Current: 18 providers
Target: 50+ providers
Impact: May not support all LLM providers needed by users
Supported Providers:
- OpenAI (GPT-4, GPT-3.5, DALL-E, Whisper, embeddings)
- Anthropic (Claude models)
- Google (Gemini)
- Groq (fast inference)
- Mistral AI
- Cohere
- Deepseek
- Fireworks
- HuggingFace
- Together AI
- Perplexity
- OpenRouter
- xAI (Grok)
- Voyage (embeddings)
- AWS Bedrock
- AssemblyAI (audio)
- ElevenLabs (TTS)
- Fal.ai (images)
Missing Providers
Compared to LiteLLM, the following providers are not yet supported:
- AI21 Labs
- Aleph Alpha
- Baseten
- Cloudflare Workers AI
- Databricks
- Gemini Pro Vision
- Hugging Face Inference Endpoints
- Ollama (local models)
- Replicate
- VertexAI
- Many others (60+ additional providers in LiteLLM)
Missing Features
The following features are not yet implemented or only partially available. Workarounds are provided where possible.
1. Batch Processing API
Status: Not implemented
Description: No support for the OpenAI batch API (/v1/batches)
Impact: Cannot process large batches of requests asynchronously
Workaround: Process requests individually in parallel client-side
Planned: Q2 2025
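Until the batch API lands, a large batch can be fanned out client-side with a bounded concurrency pool. A minimal TypeScript sketch; the gateway URL, model name, and payload shape below are illustrative, and `mapWithConcurrency` is a helper defined here, not part of any gateway SDK:

```typescript
// Run `fn` over `items` with at most `limit` requests in flight at once,
// preserving result order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: single-threaded event loop, no await in between
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}

// Example: send each prompt as its own chat completion request.
async function runBatch(prompts: string[]): Promise<string[]> {
  return mapWithConcurrency(prompts, 5, async (prompt) => {
    const res = await fetch("http://localhost:8080/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
      }),
    });
    const body = await res.json();
    return body.choices[0].message.content;
  });
}
```

Tune the concurrency limit to stay under your gateway's rate limits.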
2. Vector Store Support
Status: Not implemented
Description: No built-in vector database integration for RAG applications
Missing:
- Vector storage endpoints
- Embedding management
- Similarity search
3. Assistants API
Status: Partial implementation
Description: OpenAI Assistants API (/v1/assistants) partially implemented
Missing:
- Code interpreter
- File search
- Function calling with assistants
- Thread management
4. Fine-tuning API
Status: Not implemented
Description: No support for model fine-tuning endpoints
Impact: Cannot manage fine-tuned models through gateway
Workaround: Fine-tune directly with provider APIs
Planned: Q4 2025 (lower priority)
5. Advanced Routing Strategies
Status: Partially implemented
Available:
- Priority-based routing
- Lowest-cost routing
- Lowest-latency routing
- Load-based routing (current request queue)
Missing:
- Geographic routing (nearest region)
- A/B testing routing
- Custom routing logic
6. Multi-Modal Support
Status: Partial support
Supported:
- Text (chat, completions)
- Images (vision models, image generation)
- Audio (transcription, TTS)
Missing:
- Video processing
- Multi-modal embeddings
- Image editing
Performance Limitations
Guardrail and API latency are the primary performance concerns. Disable unused guardrails and choose appropriate model sizes to minimize impact.
1. Guardrail Latency
Issue: SLM guardrails add 30-50ms latency
Impact: Noticeable delay for low-latency requirements (<100ms)
Breakdown:
- PII Detection: ~15ms
- Hate Speech Detection: ~20ms
- Prompt Injection Detection: ~15ms
- Total (parallel): ~30-50ms
Mitigation:
- Disable unused guardrails
- Use selective guardrails (only on sensitive endpoints)
- Adjust inspection direction (request-only vs both)
2. OrionFence API Latency
Issue: OrionFence ML API adds 100-500ms latency
Impact: Significant latency increase for PII detection
Breakdown (depends on model):
- spacy-sm: 50-100ms
- spacy-md: 100-200ms
- spacy-lg: 200-300ms
- transformers models: 300-500ms
Mitigation:
- Use local regex-based detection (faster but less accurate)
- Enable fallback mode
- Use smaller models for non-critical detection
3. Cache Hit Rate
Issue: Cache effectiveness depends on request similarity
Typical Hit Rates:
- Identical requests: 90-100%
- Similar requests (different parameters): 0%
- Streaming requests: 0% (not cacheable)
Mitigation:
- Use cache for repeated queries (FAQ, common questions)
- Implement request normalization
- Increase cache TTL for stable responses
4. Memory Usage
Issue: In-memory store requires all metadata to fit in RAM
Current Usage:
- Baseline: ~100MB
- Per 1000 API keys: ~5MB
- Per 10000 request logs: ~50MB
- Per provider model catalog: ~1MB
Recommendations:
- < 10,000 API keys: 2GB RAM sufficient
- < 100,000 API keys: 4GB RAM sufficient
- > 100,000 API keys: Consider external database
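The per-unit figures above imply a simple sizing formula. A small sketch that turns them into a rough RAM estimate (the coefficients come straight from the list; real usage will vary):

```typescript
// Rough RAM estimate (MB) from the per-unit figures above.
function estimateMemoryMB(
  apiKeys: number,
  requestLogs: number,
  providers: number,
): number {
  const baseline = 100;                     // ~100 MB baseline
  const keys = (apiKeys / 1000) * 5;        // ~5 MB per 1,000 API keys
  const logs = (requestLogs / 10000) * 50;  // ~50 MB per 10,000 request logs
  const catalogs = providers * 1;           // ~1 MB per provider model catalog
  return baseline + keys + logs + catalogs;
}

// Example: 50,000 keys, the 10,000-request in-memory log window, 18 providers:
// 100 + 250 + 50 + 18 = 418 MB, comfortably inside the 4 GB recommendation.
```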
Scalability Constraints
Single Redis instance and in-memory log storage are the primary scalability bottlenecks. Plan for Redis Cluster and external log storage for large deployments.
1. Single Redis Instance
Issue: Single Redis instance is a bottleneck and single point of failure
Impact:
- Limited throughput (~50,000 ops/sec)
- No automatic failover
- Data loss if Redis crashes (without persistence)
Mitigation:
- Enable Redis persistence (AOF + RDB)
- Use Redis Sentinel for automatic failover
- Use Redis Cluster for horizontal scaling
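The persistence mitigation maps to standard redis.conf directives (these are stock Redis settings, not gateway-specific configuration):

```
# redis.conf — combine AOF and RDB persistence
appendonly yes          # append-only file for durability
appendfsync everysec    # fsync the AOF once per second
save 900 1              # RDB snapshot if >= 1 change in 15 min
save 300 10             # ... or >= 10 changes in 5 min
save 60 10000           # ... or >= 10,000 changes in 1 min
```

AOF with `everysec` bounds data loss to roughly one second of writes at a modest throughput cost.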
2. Request Log Storage
Issue: Logs stored in-memory with limited retention
Current Limits:
- In-memory: Last 10,000 requests
- Redis: Configurable TTL (default 30 days)
3. Concurrent Request Limit
Issue: Node.js event loop limits concurrent processing
Typical Limits:
- CPU-bound: ~1,000 requests/sec per core
- I/O-bound: ~5,000 requests/sec per core
Mitigation:
- Horizontal scaling (add more instances)
- Use clustering (Node.js cluster module)
- Optimize middleware (reduce processing time)
4. WebSocket Limitations
Issue: No WebSocket support for real-time bidirectional communication
Impact: Cannot use WebSockets for streaming or real-time updates
Workaround: Use Server-Sent Events (SSE) for server-to-client streaming
Planned: WebSocket support for admin UI in Q3 2025
Known Bugs and Workarounds
Streaming with Anthropic Claude
Issue: Streaming responses may occasionally include duplicate chunks
Affected: Anthropic provider, streaming mode
Workaround: Client-side deduplication based on chunk IDs
Status: Fix planned for next release
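The deduplication workaround amounts to tracking chunk IDs you have already seen. A minimal sketch; the `{ id }` chunk shape is an assumption about the stream format, so adapt the key to whatever identifier your chunks actually carry:

```typescript
// Drop streaming chunks whose ID has already been seen.
function dedupeChunks<T extends { id: string }>(chunks: Iterable<T>): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const chunk of chunks) {
    if (seen.has(chunk.id)) continue; // duplicate delivery — skip it
    seen.add(chunk.id);
    out.push(chunk);
  }
  return out;
}
```

For a live stream, keep the `seen` set alive for the duration of the response rather than buffering all chunks first.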
Rate Limit Reset Time
Issue: Rate limit reset time in headers may be slightly inaccurate
Affected: X-RateLimit-Reset-Requests header
Impact: Off by a few seconds (~5s)
Workaround: Add a 10s buffer when checking the reset time
Status: Low priority, cosmetic issue
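The 10-second buffer looks like this in practice. A sketch that assumes the header carries an epoch timestamp in seconds; if your deployment reports a duration instead, adjust the parsing accordingly:

```typescript
// Conservative reset check: treat the advertised reset time as passed
// only after a 10-second safety buffer has also elapsed.
const RESET_BUFFER_SECONDS = 10;

function rateLimitHasReset(
  resetHeader: string,      // X-RateLimit-Reset-Requests value
  nowEpochSeconds: number,  // current time, epoch seconds
): boolean {
  const resetAt = Number(resetHeader);
  return nowEpochSeconds >= resetAt + RESET_BUFFER_SECONDS;
}
```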
Large Request Payloads
Issue: Requests >1MB may timeout with default settings
Affected: Image generation, large context windows
Workaround: Increase request and upstream timeouts in the gateway configuration
Status: Working as designed, configurable
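One possible shape for the timeout override, assuming environment-variable configuration; the variable names below are illustrative, not confirmed gateway settings, so check the configuration reference before using them:

```
# Hypothetical settings — verify the actual names in your gateway's
# configuration reference before relying on these.
REQUEST_TIMEOUT_MS=120000    # client-facing request timeout (2 min)
UPSTREAM_TIMEOUT_MS=120000   # timeout for the upstream provider call
```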
Model Discovery Performance
Issue: Model discovery on startup can be slow (2-5 seconds per provider)
Impact: Delayed startup time
Workaround: Disable automatic discovery and manually configure models in provider settings
Status: Optimization planned for Q2 2025
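A sketch of the workaround as an environment variable, assuming the gateway exposes a discovery toggle (the variable name is illustrative, not a documented setting):

```
# Hypothetical setting — the actual name may differ in your deployment.
MODEL_DISCOVERY_ENABLED=false
```

With discovery off, each model must be listed explicitly in the provider settings.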
MCP Session Timeout
Issue: MCP sessions time out after 25 seconds of inactivity
Impact: Need to reinitialize frequently used MCP servers
Workaround: Increase the session TTL in the gateway configuration
Status: Working as designed, configurable
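A sketch of the TTL override, following the MCP_* naming used elsewhere in this document (e.g. MCP_MAX_REQUEST_BYTES); the exact variable name is an assumption, so verify it in the configuration reference:

```
# Hypothetical setting — verify the real name before using it.
MCP_SESSION_TTL_SECONDS=300   # keep idle sessions alive for 5 minutes
```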
Browser Compatibility
Admin UI Browser Support
Fully Supported:
- Chrome 90+
- Firefox 88+
- Safari 14+
- Edge 90+
Partially Supported:
- Safari 13 (missing some CSS features)
- Firefox 85-87 (missing some features)
Not Supported:
- Internet Explorer (all versions)
- Chrome < 90
- Safari < 13
Known Issues:
- Safari 13: Dark mode toggle animation broken
- Firefox < 88: WebSocket status indicator not working
- Mobile browsers: Limited support for playground features
MCP Limitations
1. MCP Protocol Version
Issue: Only supports MCP protocol version 2024-11-05
Impact: Incompatible with older MCP servers
Workaround: Update MCP servers to latest version
Planned: Backward compatibility in Q2 2025
2. MCP Server Types
Supported:
- Remote HTTP MCP servers
- Stdio-based MCP servers (via bridge)
Not Supported:
- WebSocket MCP servers
- gRPC MCP servers
3. MCP Tool Size Limits
Issue: MCP tool responses limited to 512KB
Configured: MCP_MAX_REQUEST_BYTES=524288
Impact: Cannot return large tool responses
Workaround: Increase limit or stream responses
Status: Working as designed, configurable
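For example, raising the documented limit from 512KB to 2MB (the value is bytes; 2 MiB shown):

```
MCP_MAX_REQUEST_BYTES=2097152   # 2 MiB, up from the 524288 (512 KiB) default
```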
4. MCP Session Management
Issue: Sessions not shared across gateway instances
Impact: Each gateway instance creates separate MCP sessions
Workaround: Use session affinity (sticky sessions) in load balancer
Planned: Shared session management in Q3 2025
Caching Limitations
1. Cache Key Strategy
Issue: Cache key is exact request hash (no semantic similarity)
Impact: Minor request variations bypass cache
Example:
- “What is 2+2?” (cache miss)
- “What is two plus two?” (cache miss, different wording)
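Request normalization cannot unify true paraphrases like the pair above, but it does stop cosmetic differences (whitespace, casing, JSON key order) from defeating the cache. A sketch; the `ChatRequest` shape is a simplified assumption about the request payload:

```typescript
// Normalize a chat request before hashing it into a cache key, so that
// cosmetic differences do not produce distinct keys.
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
}

function normalizedCacheKey(req: ChatRequest): string {
  const messages = req.messages.map((m) => ({
    role: m.role,
    content: m.content.trim().replace(/\s+/g, " ").toLowerCase(),
  }));
  // Serialize fields in a fixed order so JSON key order cannot vary,
  // and pin defaults so an omitted parameter matches its explicit value.
  return JSON.stringify({
    model: req.model,
    temperature: req.temperature ?? 1,
    messages,
  });
}
```

Hash the resulting string (e.g. with SHA-256) to get a compact key.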
2. Streaming Responses
Issue: Streaming responses are not cached
Impact: No cache benefit for streaming requests
Workaround: Use non-streaming requests when possible
Status: Cannot cache streaming due to SSE protocol
3. Cache Invalidation
Issue: No automatic cache invalidation on configuration changes
Impact: Stale cached responses after model changes
Workaround: Manually clear the cache after configuration changes
4. Cache Size Limits
Issue: In-memory cache size limited to prevent memory exhaustion
Default: 10,000 entries
Impact: Cache eviction under high load
Workaround: Use the Redis cache backend for larger capacity
Future Improvements
See below for planned improvements to address these limitations.
Short-Term (0-3 months)
- Expand provider support to 50+ providers
- Improve API documentation
- Optimize guardrail performance
- Add batch processing API
Medium-Term (3-6 months)
- Vector store integration
- Advanced routing strategies
- Redis Cluster support
- WebSocket support
Long-Term (6-12 months)
- Semantic caching
- Fine-tuning API
- Multi-modal expansion
- Auto-scaling optimizations
Reporting Issues
Found a bug or limitation not listed here?
- Check GitHub Issues: https://github.com/guardwayai/agsec/issues
- Search Documentation: May already be documented
- Create Issue: Include:
- Description of issue
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Docker version, etc.)
- Logs/screenshots
For workarounds and best practices, see:
