Current limitations, known issues, and planned improvements for Guardway Gateway.

Provider Limitations

Limited Provider Support

Current: 18 providers
Target: 50+ providers
Impact: May not support all LLM providers needed by users
Supported Providers:
  • OpenAI (GPT-4, GPT-3.5, DALL-E, Whisper, embeddings)
  • Anthropic (Claude models)
  • Google (Gemini)
  • Groq (fast inference)
  • Mistral AI
  • Cohere
  • Deepseek
  • Fireworks
  • HuggingFace
  • Together AI
  • Perplexity
  • OpenRouter
  • xAI (Grok)
  • Voyage (embeddings)
  • AWS Bedrock
  • AssemblyAI (audio)
  • ElevenLabs (TTS)
  • Fal.ai (images)
Missing Providers (compared to LiteLLM):
  • AI21 Labs
  • Aleph Alpha
  • Baseten
  • Cloudflare Workers AI
  • Databricks
  • Gemini Pro Vision
  • Hugging Face Inference Endpoints
  • Ollama (local models)
  • Replicate
  • VertexAI
  • Many others (60+ additional providers in LiteLLM)
Workaround: Use the generic OpenAI-compatible adapter for providers that expose OpenAI-compatible APIs
Planned: Provider expansion is the top priority
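As a sketch only, registering a provider through the generic OpenAI-compatible adapter might look like the following. The `/providers` path, the `adapter` value, and the JSON field names are assumptions for illustration, not the actual admin API; check your deployment's admin reference for the real endpoint and schema:

```shell
# Hypothetical admin call: point the generic adapter at an
# OpenAI-compatible endpoint (here, a local Ollama server).
curl -X POST http://localhost:8000/providers \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "adapter": "openai-compatible",
    "name": "ollama-local",
    "baseUrl": "http://localhost:11434/v1"
  }'
```

Any provider that speaks the OpenAI wire format (chat completions, embeddings) can be attached this way, even if it is not on the supported list above.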

Missing Features

The following features are not yet implemented or only partially available. Workarounds are provided where possible.

1. Batch Processing API

Status: Not implemented
Description: No support for the OpenAI batch API (/v1/batches)
Impact: Cannot process large batches of requests asynchronously
Workaround: Process requests individually in parallel client-side
Planned: Q2 2025
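The client-side workaround can be sketched as a bounded-concurrency map: fire requests in parallel, but cap how many are in flight so the gateway's rate limits are not exhausted. The function below is generic; the async callback stands in for your real gateway call:

```typescript
// Client-side "batch" processing: run many requests in parallel with a
// concurrency cap instead of the (unimplemented) /v1/batches endpoint.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until the list is drained.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}
```

Usage: `mapWithConcurrency(prompts, 5, (p) => callGateway(p))`, where `callGateway` is whatever function you already use for single requests. Results come back in input order regardless of completion order.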

2. Vector Store Support

Status: Not implemented
Description: No built-in vector database integration for RAG applications
Missing:
  • Vector storage endpoints
  • Embedding management
  • Similarity search
Impact: Must manage vector storage separately
Workaround: Use external vector databases (Pinecone, Weaviate, Qdrant, etc.)
Planned: Q3 2025
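For small corpora, the similarity-search piece a vector store would provide can also be done in-process. This sketch assumes you already have embedding vectors (e.g. from the gateway's embeddings endpoint); the document IDs and vectors here are placeholders:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the IDs of the k documents most similar to the query vector.
function topK(
  query: number[],
  docs: { id: string; vector: number[] }[],
  k: number,
): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.id);
}
```

This brute-force scan is fine up to a few tens of thousands of vectors; beyond that, the external vector databases listed above are the right tool.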

3. Assistants API

Status: Partial implementation
Description: OpenAI Assistants API (/v1/assistants) partially implemented
Missing:
  • Code interpreter
  • File search
  • Function calling with assistants
  • Thread management
Impact: Cannot use assistants for complex workflows
Workaround: Implement assistant logic client-side
Planned: Q2 2025

4. Fine-tuning API

Status: Not implemented
Description: No support for model fine-tuning endpoints
Impact: Cannot manage fine-tuned models through the gateway
Workaround: Fine-tune directly with provider APIs
Planned: Q4 2025 (lower priority)

5. Advanced Routing Strategies

Status: Partially implemented
Available:
  • Priority-based routing
  • Lowest-cost routing
  • Lowest-latency routing
Missing:
  • Load-based routing (current request queue)
  • Geographic routing (nearest region)
  • A/B testing routing
  • Custom routing logic
Impact: Limited routing flexibility
Workaround: Use priority-based routing with multiple rules
Planned: Q2 2025

6. Multi-Modal Support

Status: Partial support
Supported:
  • Text (chat, completions)
  • Images (vision models, image generation)
  • Audio (transcription, TTS)
Missing:
  • Video processing
  • Multi-modal embeddings
  • Image editing
Impact: Cannot process video or complex multi-modal inputs
Planned: Q3 2025

Performance Limitations

Guardrail and API latency are the primary performance concerns. Disable unused guardrails and choose appropriate model sizes to minimize impact.

1. Guardrail Latency

Issue: SLM guardrails add 30-50ms latency
Impact: Noticeable delay for low-latency requirements (<100ms)
Breakdown:
  • PII Detection: ~15ms
  • Hate Speech Detection: ~20ms
  • Prompt Injection Detection: ~15ms
  • Total (parallel): ~30-50ms
Mitigation:
  • Disable unused guardrails
  • Use selective guardrails (only on sensitive endpoints)
  • Adjust inspection direction (request-only vs both)
Planned: Optimize to <20ms P95 (Q2 2025)

2. OrionFence API Latency

Issue: OrionFence ML API adds 100-500ms latency
Impact: Significant latency increase for PII detection
Breakdown (depends on model):
  • spacy-sm: 50-100ms
  • spacy-md: 100-200ms
  • spacy-lg: 200-300ms
  • transformers models: 300-500ms
Mitigation:
  • Use local regex-based detection (faster but less accurate)
  • Enable fallback mode
  • Use smaller models for non-critical detection
Workaround: Disable OrionFence API for low-latency endpoints

3. Cache Hit Rate

Issue: Cache effectiveness depends on request similarity
Typical Hit Rates:
  • Identical requests: 90-100%
  • Similar requests (different parameters): 0%
  • Streaming requests: 0% (not cacheable)
Impact: Limited cache effectiveness for diverse workloads
Mitigation:
  • Use cache for repeated queries (FAQ, common questions)
  • Implement request normalization
  • Increase cache TTL for stable responses
Planned: Semantic caching (similar requests) in Q3 2025
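Request normalization can be sketched as canonicalizing the request before sending it, so superficially different requests hash to the same cache key. The request shape below is a simplification and the choice of which defaults to strip is an assumption; tune it to your workload:

```typescript
interface ChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  temperature?: number;
}

// Canonicalize a chat request so trivially different variants
// (extra whitespace, explicit-vs-implicit defaults) cache identically.
function normalizeRequest(req: ChatRequest): ChatRequest {
  return {
    model: req.model,
    messages: req.messages.map((m) => ({
      role: m.role,
      // Collapse whitespace runs and trim. Casing is left alone,
      // since it can be semantically meaningful to the model.
      content: m.content.replace(/\s+/g, " ").trim(),
    })),
    // Drop the default temperature so "unset" and "explicit 1" match.
    ...(req.temperature !== undefined && req.temperature !== 1
      ? { temperature: req.temperature }
      : {}),
  };
}
```

The more consistently clients normalize, the higher the hit rate for the "similar requests" bucket above, which is otherwise 0%.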

4. Memory Usage

Issue: In-memory store requires all metadata to fit in RAM
Current Usage:
  • Baseline: ~100MB
  • Per 1000 API keys: ~5MB
  • Per 10000 request logs: ~50MB
  • Per provider model catalog: ~1MB
Impact: Memory limits scale of metadata
Recommendation:
  • < 10,000 API keys: 2GB RAM sufficient
  • < 100,000 API keys: 4GB RAM sufficient
  • > 100,000 API keys: Consider external database
Planned: Hybrid storage (hot data in memory, cold in database) in Q3 2025
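The per-item figures above can be turned into a rough sizing calculation. The formula is a straight sum of those figures (the function itself is illustrative, not part of the gateway):

```typescript
// Rough memory estimate (MB) from the per-item figures on this page.
function estimateMemoryMB(
  apiKeys: number,
  requestLogs: number,
  providers: number,
): number {
  const baseline = 100;                       // ~100MB baseline
  const keys = (apiKeys / 1000) * 5;          // ~5MB per 1,000 API keys
  const logs = (requestLogs / 10000) * 50;    // ~50MB per 10,000 request logs
  const catalogs = providers * 1;             // ~1MB per provider catalog
  return baseline + keys + logs + catalogs;
}
```

For example, 10,000 API keys, the full 10,000-entry in-memory request log, and all 18 provider catalogs come to roughly 218MB, comfortably inside the 2GB recommendation.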

Scalability Constraints

Single Redis instance and in-memory log storage are the primary scalability bottlenecks. Plan for Redis Cluster and external log storage for large deployments.

1. Single Redis Instance

Issue: Single Redis instance is a bottleneck and single point of failure
Impact:
  • Limited throughput (~50,000 ops/sec)
  • No automatic failover
  • Data loss if Redis crashes (without persistence)
Mitigation:
  • Enable Redis persistence (AOF + RDB)
  • Use Redis Sentinel for automatic failover
  • Use Redis Cluster for horizontal scaling
Planned: Redis Cluster support in Q2 2025

2. Request Log Storage

Issue: Logs stored in-memory with limited retention
Current Limits:
  • In-memory: Last 10,000 requests
  • Redis: Configurable TTL (default 30 days)
Impact: Cannot query old logs beyond retention period
Workaround: Export logs to external system (Elasticsearch, S3, etc.)
Planned: External log storage integration (Q3 2025)

3. Concurrent Request Limit

Issue: Node.js event loop limits concurrent processing
Typical Limits:
  • CPU-bound: ~1,000 requests/sec per core
  • I/O-bound: ~5,000 requests/sec per core
Impact: Need multiple instances for high throughput
Mitigation:
  • Horizontal scaling (add more instances)
  • Use clustering (Node.js cluster module)
  • Optimize middleware (reduce processing time)
Recommendation: 1 instance per 5,000 req/s
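Applying the 5,000 req/s-per-instance recommendation, the worker count for the Node.js cluster module can be derived from target throughput and available cores. A small sizing sketch (the 5,000 figure comes from the recommendation above; `workersFor` is illustrative, not a gateway API):

```typescript
import os from "node:os";

// Workers needed for a target throughput at ~5,000 req/s per instance,
// at least 1 and at most one worker per available core.
function workersFor(targetRps: number, cores: number): number {
  return Math.max(1, Math.min(cores, Math.ceil(targetRps / 5000)));
}

// Example: sizing for 20,000 req/s on this machine.
console.log(`fork ${workersFor(20_000, os.cpus().length)} workers`);
```

In a cluster setup, the primary process would call `cluster.fork()` that many times and each worker would boot its own gateway instance; a load balancer in front achieves the same effect across machines.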

4. WebSocket Limitations

Issue: No WebSocket support for real-time bidirectional communication
Impact: Cannot use WebSockets for streaming or real-time updates
Workaround: Use Server-Sent Events (SSE) for server-to-client streaming
Planned: WebSocket support for admin UI in Q3 2025
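On the client side, consuming SSE mostly means parsing the wire format: each event's payload arrives as one or more `data:` lines, terminated by a blank line. A minimal parser (assumes the full text is buffered; a production client would handle chunks split mid-event):

```typescript
// Extract event payloads from raw SSE text. Events are separated by
// blank lines; each "data:" line contributes one line of the payload.
function parseSSE(raw: string): string[] {
  const events: string[] = [];
  for (const block of raw.split("\n\n")) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data) events.push(data);
  }
  return events;
}
```

OpenAI-style streams deliver one JSON chunk per `data:` event and signal completion with a final `data: [DONE]` event, which the caller should check for before `JSON.parse`.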

Known Bugs and Workarounds

Issue: Streaming responses may occasionally include duplicate chunks
Affected: Anthropic provider, streaming mode
Workaround: Client-side deduplication based on chunk IDs
Status: Fix planned for next release
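The deduplication workaround can be sketched as filtering out any chunk whose ID has already been seen. The `id` field name is an assumption; adjust it to the actual chunk shape your client receives:

```typescript
// Drop chunks whose ID has already been seen, preserving order.
// The "id" field name is an assumption about the chunk shape.
function dedupeChunks<T extends { id: string }>(chunks: T[]): T[] {
  const seen = new Set<string>();
  return chunks.filter((c) => {
    if (seen.has(c.id)) return false;
    seen.add(c.id);
    return true;
  });
}
```

For live streams, keep the `Set` alive for the duration of the stream and apply the same check to each chunk as it arrives.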
Issue: Rate limit reset time in headers may be slightly inaccurate
Affected: X-RateLimit-Reset-Requests header
Impact: Off by a few seconds (~5s)
Workaround: Add 10s buffer when checking reset time
Status: Low priority, cosmetic issue
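The 10-second buffer amounts to treating the advertised reset time as inexact and not resuming until the reset plus the buffer has passed. A minimal sketch (function names are illustrative):

```typescript
// Safety margin covering the ~5s header inaccuracy, with room to spare.
const RESET_BUFFER_MS = 10_000;

// Earliest time (epoch ms) at which it is safe to resume sending.
function safeResumeTime(resetEpochMs: number): number {
  return resetEpochMs + RESET_BUFFER_MS;
}

function canResume(resetEpochMs: number, nowMs: number): boolean {
  return nowMs >= safeResumeTime(resetEpochMs);
}
```

A client that backs off on 429s would compare `Date.now()` against `safeResumeTime(reset)` instead of the raw header value.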
Issue: Requests >1MB may time out with default settings
Affected: Image generation, large context windows
Workaround: Increase timeouts:
GATEWAY_REQUEST_TIMEOUT_MS=300000
GATEWAY_COMPLETION_TIMEOUT_MS=240000
Status: Working as designed, configurable
Issue: Model discovery on startup can be slow (2-5 seconds per provider)
Impact: Delayed startup time
Workaround: Disable automatic discovery:
GATEWAY_ENABLE_MODEL_DISCOVERY=0
Manually configure models in provider settings.
Status: Optimization planned for Q2 2025
Issue: MCP sessions time out after 25 seconds of inactivity
Impact: Need to reinitialize frequently used MCP servers
Workaround: Increase session TTL:
MCP_SESSION_TTL_MS=60000
Status: Working as designed, configurable

Browser Compatibility

Admin UI Browser Support

Fully Supported:
  • Chrome 90+
  • Firefox 88+
  • Safari 14+
  • Edge 90+
Partial Support:
  • Safari 13 (missing some CSS features)
  • Firefox 85-87 (missing some features)
Not Supported:
  • Internet Explorer (all versions)
  • Chrome < 90
  • Safari < 13
Known Issues:
  • Safari 13: Dark mode toggle animation broken
  • Firefox < 88: WebSocket status indicator not working
  • Mobile browsers: Limited support for playground features
Recommendation: Use latest Chrome or Firefox for best experience

MCP Limitations

1. MCP Protocol Version

Issue: Only supports MCP protocol version 2024-11-05
Impact: Incompatible with older MCP servers
Workaround: Update MCP servers to the latest version
Planned: Backward compatibility in Q2 2025

2. MCP Server Types

Supported:
  • Remote HTTP MCP servers
  • Stdio-based MCP servers (via bridge)
Not Supported:
  • WebSocket MCP servers
  • gRPC MCP servers
Impact: Cannot use WebSocket or gRPC-based MCP servers
Workaround: Convert to HTTP or stdio
Planned: WebSocket support in Q3 2025

3. MCP Tool Size Limits

Issue: MCP tool responses limited to 512KB
Configured: MCP_MAX_REQUEST_BYTES=524288
Impact: Cannot return large tool responses
Workaround: Increase limit or stream responses
Status: Working as designed, configurable

4. MCP Session Management

Issue: Sessions not shared across gateway instances
Impact: Each gateway instance creates separate MCP sessions
Workaround: Use session affinity (sticky sessions) in load balancer
Planned: Shared session management in Q3 2025

Caching Limitations

1. Cache Key Strategy

Issue: Cache key is exact request hash (no semantic similarity)
Impact: Minor request variations bypass cache
Example:
  • “What is 2+2?” (cached after the first request)
  • “What is two plus two?” (cache miss, different wording)
Workaround: Normalize requests client-side
Planned: Semantic caching in Q3 2025
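The exact-hash behavior can be demonstrated in a few lines: any byte-level difference in the serialized request changes the digest, so rewordings never hit the cache while exact repeats always do. (This illustrates the keying scheme; the gateway's actual serialization details may differ.)

```typescript
import { createHash } from "node:crypto";

// Exact-hash cache key: SHA-256 of the serialized request. Any
// byte-level difference yields a different key.
function cacheKey(request: object): string {
  return createHash("sha256")
    .update(JSON.stringify(request))
    .digest("hex");
}
```

Here `cacheKey({ prompt: "What is 2+2?" })` differs from `cacheKey({ prompt: "What is two plus two?" })`, even though a model would answer both identically; that gap is what the planned semantic caching addresses.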

2. Streaming Responses

Issue: Streaming responses are not cached
Impact: No cache benefit for streaming requests
Workaround: Use non-streaming requests when possible
Status: Cannot cache streaming due to SSE protocol

3. Cache Invalidation

Issue: No automatic cache invalidation on configuration changes
Impact: Stale cached responses after model changes
Workaround: Manual cache clear after config changes:
curl -X POST http://localhost:8000/cache/clear \
  -H "Authorization: Bearer $SESSION_TOKEN"
Planned: Automatic invalidation in Q2 2025

4. Cache Size Limits

Issue: In-memory cache size limited to prevent memory exhaustion
Default: 10,000 entries
Impact: Cache eviction under high load
Workaround: Use Redis cache for larger capacity:
CACHE_TYPE=redis
Status: Working as designed, configurable

Future Improvements

See below for planned improvements to address these limitations.

Short-Term (0-3 months)

  • Expand provider support to 50+ providers
  • Improve API documentation
  • Optimize guardrail performance
  • Add batch processing API

Medium-Term (3-6 months)

  • Vector store integration
  • Advanced routing strategies
  • Redis Cluster support
  • WebSocket support

Long-Term (6-12 months)

  • Semantic caching
  • Fine-tuning API
  • Multi-modal expansion
  • Auto-scaling optimizations

Reporting Issues

Found a bug or limitation not listed here?
  1. Check GitHub Issues: https://github.com/guardwayai/agsec/issues
  2. Search Documentation: May already be documented
  3. Create Issue: Include:
    • Description of issue
    • Steps to reproduce
    • Expected vs actual behavior
    • Environment details (OS, Docker version, etc.)
    • Logs/screenshots
Security Issues: Email security@guardway.io (do not create public issue)
For workarounds and best practices, see: