Glossary

LLM/AI Terms

Adapter

A software component that translates between AgSec’s unified API format and a provider’s specific API format. Each provider (OpenAI, Anthropic, etc.) has its own adapter.

Chat Completion

The primary LLM interaction where a model generates a response based on a conversation history (messages). Also called “chat” or “completion.”

Completion Tokens

The number of tokens in the generated response from the LLM. Typically costs more per token than prompt tokens.

Context Length

The maximum number of tokens (prompt + completion) that a model can process in a single request. For example, GPT-4 has a 128K token context length.

Embeddings

Dense vector representations of text that capture semantic meaning. Used for similarity search, clustering, and retrieval-augmented generation (RAG).

Few-Shot Learning

Providing a model with examples in the prompt to guide its behavior without fine-tuning. For example, showing 3 examples of the desired output format.

Fine-Tuning

The process of training a base model on domain-specific data to specialize its behavior. Results in a custom model.

Function Calling

See Tool Use.

Guardrails

Security and safety mechanisms that validate, filter, or block LLM inputs and outputs. Examples: PII detection, hate speech filtering, prompt injection detection.

Hallucination

When an LLM generates false or nonsensical information presented as fact. A key challenge in production LLM deployments.

JSON Mode

A feature where the LLM is constrained to output valid JSON only. Useful for structured data extraction.

Max Tokens

The maximum number of tokens the model can generate in its response. Acts as a cost control and prevents runaway generation.

Message

A unit in a conversation with an LLM, consisting of a role (system, user, assistant, or tool) and content (the text).

Model

A trained neural network capable of text generation, embeddings, image generation, or other AI tasks. Examples: GPT-4, Claude 3, Llama 2.

Moderation

Content filtering to detect harmful, unsafe, or inappropriate content. Can be applied to inputs (user prompts) or outputs (LLM responses).

Prompt

The input text sent to an LLM. Can include instructions, examples, and the actual query.

Prompt Engineering

The practice of crafting effective prompts to get desired behaviors from LLMs without fine-tuning.

Prompt Injection

A security attack where malicious input attempts to override the system prompt or manipulate the LLM’s behavior.

Prompt Injection vs. Jailbreaking: Prompt injection targets the system prompt to change the model’s behavior, while jailbreaking attempts to bypass safety guardrails. Both are security concerns addressed by Guardway.

Prompt Tokens

The number of tokens in the input sent to the LLM. Typically costs less per token than completion tokens.

Provider

A company or service that offers LLM APIs. Examples: OpenAI, Anthropic, Google, Cohere, Groq.

RAG (Retrieval-Augmented Generation)

A technique where relevant documents are retrieved from a knowledge base and included in the prompt to ground the LLM’s response in factual information.

Semantic Cache

A caching system that matches queries based on meaning rather than exact text match. Uses embeddings to find similar queries.

Stop Sequence

A string that, when generated by the model, signals the end of generation. Used to create structured outputs.

Streaming

Sending the LLM response in chunks as it’s generated, rather than waiting for the complete response. Improves perceived latency.

System Prompt

Instructions given to the LLM that set its behavior, persona, and constraints. Typically the first message in a conversation.

Temperature

A parameter (0-2) that controls randomness in generation. Higher = more creative/random, lower = more deterministic/focused.

Token

The basic unit of text processing for LLMs. Roughly 4 characters or 0.75 words in English. Tokenization varies by model.

Tokens vs. Words: A common source of confusion. Tokens are not words — they are sub-word units. The word “unbelievable” might be split into multiple tokens. Always check the model’s tokenizer for exact counts.

Tool Use

The ability for an LLM to call external functions/APIs. The model decides when to use a tool, AgSec calls it, and the result is fed back to the model.

Top-K

Sampling strategy where only the K most likely next tokens are considered. Reduces randomness.

Top-P (Nucleus Sampling)

Sampling strategy where tokens are selected from the smallest set whose cumulative probability exceeds P. More dynamic than top-K.

Vision

The ability for an LLM to process and understand images in addition to text. Example: GPT-4V, Claude 3.

Zero-Shot Learning

Using an LLM without providing examples, relying solely on instructions. The model must generalize from its training.

Gateway Terms

API Key

A secret token used to authenticate requests to the AgSec gateway. Each key can have quotas, budgets, and access controls.

Budget

A spending limit (in dollars) associated with an API key or team. Requests are blocked when the budget is exceeded.

Failover

The automatic switching to a backup provider when the primary provider fails or is unavailable.

Gateway

The central service that receives client requests, applies security policies, routes to providers, and returns responses. The core of AgSec.

Health Check

An endpoint (/health) that reports the operational status of the gateway and its dependencies (Redis, providers).

Latency

The time between sending a request and receiving a complete response. Measured in milliseconds (ms).

Middleware

Software components that intercept and process requests before they reach route handlers. Examples: authentication, rate limiting, logging.

Multi-tenancy

The ability to serve multiple independent customers (tenants) from a single gateway instance with isolation and access controls.

Quota

A limit on the number of requests allowed within a time period. Can be per-API-key, per-user, or per-team.

Rate Limiting

Restricting the number of requests allowed in a time window to prevent abuse and manage load. Can limit by requests/minute or tokens/minute.

Rate Limiting vs. Quota: Rate limiting controls the speed of requests (e.g., 100 requests/minute), while quotas control the total volume (e.g., 10,000 requests/month). Both are important for cost control and abuse prevention.

Routing

The process of selecting which provider and model to use for a request based on rules, strategies, or load balancing.

Routing Rule

A configuration that maps requests to specific providers based on patterns (model name, user, tags, etc.).

Routing Strategy

An algorithm for selecting providers. Examples: lowest-cost, lowest-latency, least-busy, priority-based.

Sanitization

The process of removing or redacting sensitive information (like PII) from text while preserving the rest of the content.

Store

AgSec’s data persistence layer, backed by Redis, that holds configuration, keys, logs, and metrics.

Throughput

The number of requests processed per unit of time, typically measured in requests per second (req/sec).

Webhook

An HTTP callback that AgSec can trigger when certain events occur (quota exceeded, budget threshold, etc.).

Security Terms

AES-256-GCM

Advanced Encryption Standard with 256-bit keys in Galois/Counter Mode. Used by AgSec to encrypt API keys and secrets at rest.

AppArmor

A Linux kernel security module that confines programs to a limited set of resources. Used in AgSec’s container hardening.

Attack Surface

The sum of all points where an unauthorized user could try to enter or extract data from a system.

Authentication

Verifying the identity of a user or system. In AgSec, this is done via API keys.

Authorization

Determining what actions an authenticated user is allowed to perform. In AgSec, this is per-API-key permissions.

Authentication vs. Authorization: Authentication answers “Who are you?” while authorization answers “What are you allowed to do?” Both are required for secure API access.

Capabilities (Linux)

Fine-grained privileges that can be granted to processes instead of full root access. AgSec drops unnecessary capabilities.

Defense in Depth

A security strategy employing multiple layers of defense so that if one layer fails, others still provide protection.

Encryption at Rest

Encrypting data when it’s stored (e.g., API keys in Redis) so it’s unreadable without the decryption key.

Encryption in Transit

Encrypting data while it’s being transmitted over the network, typically using TLS/HTTPS.

Fail-Closed

A security posture where errors cause requests to be blocked. More secure but less available.

Fail-Open

A security posture where errors allow requests to proceed. More available but less secure.

Fail-Closed vs. Fail-Open: These are opposite security postures. Fail-closed blocks requests on error (more secure), while fail-open allows them (more available). Choose based on your security requirements.

Least Privilege

The principle of granting only the minimum permissions necessary for a task. Applied to processes, users, and API keys.

Non-root User

Running processes as a non-privileged user rather than root to limit the impact of security breaches. AgSec containers use UID 1001.

PII (Personally Identifiable Information)

Data that can be used to identify an individual. Examples: SSN, email, phone number, name, address.

RBAC (Role-Based Access Control)

An access control approach where permissions are assigned to roles, and users are assigned to roles.

Read-only Root Filesystem

A security hardening technique where the container’s root filesystem cannot be modified at runtime, preventing certain types of attacks.

Secrets Management

Secure storage, access control, and rotation of sensitive data like API keys, passwords, and certificates.

Seccomp (Secure Computing Mode)

A Linux kernel feature that limits the system calls a process can make. AgSec uses a restricted seccomp profile.

TLS (Transport Layer Security)

Cryptographic protocol for secure communication over networks. HTTPS uses TLS.

Zero Trust

A security model that assumes no implicit trust and requires verification for every access request, regardless of location.

Observability Terms

Cardinality

The number of unique values for a dimension in time series data. High cardinality (many unique values) can cause performance issues.

Dashboard

A visual interface displaying metrics, graphs, and charts for monitoring system health and performance.

Distributed Tracing

Following a request as it flows through multiple services, collecting timing and metadata at each step.

Event

A discrete occurrence in the system, such as a request received, error encountered, or threshold exceeded.

Grafana

An open-source platform for visualizing metrics and logs, commonly used with Prometheus.

Instrumentation

Adding code to collect metrics, logs, and traces from an application.

Log Aggregation

Collecting logs from multiple sources into a centralized system for search and analysis.

Logging

Recording events, errors, and diagnostic information to files or logging services.

Metric

A numerical measurement collected over time. Examples: request count, latency, memory usage.

OpenTelemetry

An observability framework for generating, collecting, and exporting telemetry data (metrics, logs, traces).

P50 (50th Percentile / Median)

The value below which 50% of observations fall. Represents typical performance.

P95 (95th Percentile)

The value below which 95% of observations fall. Represents near-worst-case performance, filtering outliers.

P99 (99th Percentile)

The value below which 99% of observations fall. Represents worst-case performance for most requests.

Prometheus

An open-source monitoring and alerting system that collects and stores metrics as time series data.

Span

A single operation within a distributed trace, with start/end times and metadata.

Structured Logging

Logging in a machine-readable format (typically JSON) with consistent fields, enabling better search and analysis.

Time Series

A sequence of data points indexed by time, used for metrics like request rate over time.

Trace

A record of the path a request takes through multiple services, consisting of multiple spans.

Trace ID

A unique identifier for a distributed trace, used to correlate spans across services.

MCP Terms

JSON-RPC

A remote procedure call protocol encoded in JSON. Used by MCP for client-server communication.

MCP (Model Context Protocol)

A protocol that allows LLMs to interact with external tools, data sources, and services in a standardized way.

MCP Server

A service that implements the MCP protocol and exposes tools, resources, or prompts to clients.

Prompt (MCP)

A pre-defined prompt template provided by an MCP server that clients can use.

Resource (MCP)

A data source or document that an MCP server makes available to clients (e.g., files, database records).

Session

A stateful connection between an MCP client and server, maintaining context across multiple requests.

stdio Transport

Communication via standard input/output streams. Used by Python and Node.js MCP servers.

Tool (MCP)

A function that an MCP server exposes to clients. The LLM can call tools to perform actions or retrieve information.

Tool Filter

Access control rules that restrict which MCP tools are available to specific API keys.

Infrastructure Terms

Autoscaling

Automatically adjusting the number of running instances based on load metrics like CPU or request rate.

Container

A lightweight, standalone package containing an application and its dependencies. AgSec uses Docker containers.

Container Hardening

Security measures applied to containers, such as running as non-root, read-only filesystems, and capability dropping.

Docker

A platform for building, shipping, and running containerized applications.

Docker Compose

A tool for defining and running multi-container Docker applications using a YAML configuration file.

Health Probe

A check performed by orchestrators (Kubernetes) to determine if a container is healthy and should receive traffic.

Horizontal Scaling

Adding more instances of a service to handle increased load. Preferred over vertical scaling for stateless services.

Image (Container)

A read-only template used to create containers. Contains the application code and dependencies.

Infrastructure as Code (IaC)

Managing infrastructure through code (e.g., Terraform, CloudFormation) rather than manual processes.

Kubernetes (K8s)

An open-source container orchestration platform for automating deployment, scaling, and management of containerized applications.

Liveness Probe

A health check that determines if a container is running. Failed liveness probes cause the container to be restarted.

Load Balancer

A component that distributes incoming requests across multiple instances to ensure no single instance is overloaded.

Namespace (Kubernetes)

A way to divide cluster resources between multiple users or teams. Provides scope for names.

Orchestration

Automated configuration, coordination, and management of computer systems and services. Kubernetes is an orchestrator.

Pod (Kubernetes)

The smallest deployable unit in Kubernetes, consisting of one or more containers.

Readiness Probe

A health check that determines if a container is ready to receive traffic. Failed readiness probes remove the container from load balancing.

Replica

An identical copy of a service instance. Multiple replicas provide redundancy and increased capacity.

Service Mesh

Infrastructure layer providing features like traffic management, security, and observability for microservices. Examples: Istio, Linkerd.

Vertical Scaling

Increasing the resources (CPU, memory) of existing instances rather than adding more instances.

Volume

Persistent storage that can be attached to containers. Used for data that should survive container restarts.

HTTP/API Terms

API (Application Programming Interface)

A set of protocols and tools for building software applications. RESTful APIs use HTTP.

Endpoint

A specific URL path that accepts requests. Example: /v1/chat/completions.Metadata sent with HTTP requests and responses. Examples: Content-Type, Authorization.

HTTP Method

The action to perform on a resource. Common methods: GET (read), POST (create), PUT (update), DELETE (remove), PATCH (partial update).

HTTP Status Code

A three-digit code indicating the result of an HTTP request:

2xx: Success (200 OK, 201 Created)
3xx: Redirection
4xx: Client error (400 Bad Request, 401 Unauthorized, 404 Not Found)
5xx: Server error (500 Internal Server Error, 503 Service Unavailable)

Idempotency

The property where multiple identical requests have the same effect as a single request. GET, PUT, and DELETE are idempotent.

JSON (JavaScript Object Notation)

A lightweight data interchange format that’s easy for humans to read and write and easy for machines to parse and generate.

OpenAPI

A specification for describing RESTful APIs in a machine-readable format. Formerly known as Swagger.

Query Parameter

Data passed in the URL after a ?. Example: /search?q=query&limit=10.

Request Body

The data sent with POST/PUT/PATCH requests, typically in JSON format.

REST (Representational State Transfer)

An architectural style for distributed systems where resources are identified by URLs and manipulated using standard HTTP methods.

SSE (Server-Sent Events)

A standard for servers to push real-time updates to clients over HTTP. Used for streaming LLM responses.

Webhook

An HTTP callback that occurs when something happens. AgSec sends webhooks for events like quota exceeded.

Database Terms

Cache

Temporary storage for frequently accessed data to reduce latency and load on primary data sources.

Cache Eviction

The process of removing entries from a cache when it reaches capacity. Policies include LRU (Least Recently Used).

Cache Hit

When requested data is found in the cache, avoiding a slower lookup in the primary data source.

Cache Miss

When requested data is not in the cache, requiring a lookup in the primary data source.

Cache Hit vs. Cache Miss: A cache hit means the data was found in cache (fast), while a cache miss means it wasn’t (requires slower primary lookup). High hit rates indicate effective caching.

Cache Warming

Pre-populating a cache with data before it’s requested to improve hit rates.

Cardinality (Database)

The number of unique values in a column or field. High cardinality can impact index performance.

Connection Pool

A cache of database connections maintained so that connections can be reused when needed, reducing connection overhead.

Hash (Data Structure)

A data structure that maps keys to values using a hash function. Redis supports hash data types.

Index

A data structure that improves the speed of data retrieval operations. Created on specific fields for faster lookups.

Key-Value Store

A database that stores data as a collection of key-value pairs. Redis is a key-value store.

LRU (Least Recently Used)

A cache eviction policy that removes the least recently accessed items first when the cache is full.

Persistence

Saving data to disk so it survives process restarts. Redis supports RDB (snapshots) and AOF (append-only file) persistence.

Pipeline

Sending multiple commands to a database in a single request/response cycle. Reduces network round trips.

Redis

An in-memory key-value database used for caching, session storage, and real-time data. AgSec uses Redis for all state.

Replication

Copying data from one database to another to provide redundancy and increase availability.

Schema

The structure of a database, defining tables, fields, relationships, and constraints.

TTL (Time To Live)

The duration for which a cache entry or database record remains valid before expiring and being removed.

Vector Database

A database optimized for storing and querying high-dimensional vectors (embeddings). Examples: Qdrant, Pinecone, Weaviate.

Getting Started

Guardway Gateway

Troubleshooting

Reference

​Glossary

​Adapter

​Chat Completion

​Completion Tokens

​Context Length

​Embeddings

​Few-Shot Learning

​Fine-Tuning

​Function Calling

​Guardrails

​Hallucination

​JSON Mode

​Max Tokens

​Message

​Model

​Moderation

​Prompt

​Prompt Engineering

​Prompt Injection

​Prompt Tokens

​Provider

​RAG (Retrieval-Augmented Generation)

​Semantic Cache

​Stop Sequence

​Streaming

​System Prompt

​Temperature

​Token

​Tool Use

​Top-K

​Top-P (Nucleus Sampling)

​Vision

​Zero-Shot Learning

​API Key

​Budget

​Failover

​Gateway

​Health Check

​Latency

​Middleware

​Multi-tenancy

​Quota

​Rate Limiting

​Routing

​Routing Rule

​Routing Strategy

​Sanitization

​Store

​Throughput

​Webhook

​AES-256-GCM

​AppArmor

​Attack Surface

​Authentication

​Authorization

​Capabilities (Linux)

​Defense in Depth

​Encryption at Rest

​Encryption in Transit

​Fail-Closed

​Fail-Open

​Least Privilege

​Non-root User

​PII (Personally Identifiable Information)

​RBAC (Role-Based Access Control)

​Read-only Root Filesystem

​Secrets Management

​Seccomp (Secure Computing Mode)

​TLS (Transport Layer Security)

​Zero Trust

​Cardinality

​Dashboard

​Distributed Tracing

​Event

​Grafana

​Instrumentation

Adapter

Chat Completion

Completion Tokens

Context Length

Embeddings

Few-Shot Learning

Fine-Tuning

Function Calling

Guardrails

Hallucination

JSON Mode

Max Tokens

Message

Model

Moderation

Prompt

Prompt Engineering

Prompt Injection

Prompt Tokens

Provider

RAG (Retrieval-Augmented Generation)

Semantic Cache

Stop Sequence

Streaming

System Prompt

Temperature

Token

Tool Use

Top-K

Top-P (Nucleus Sampling)

Vision

Zero-Shot Learning

API Key

Budget

Failover

Gateway

Health Check

Latency

Middleware

Multi-tenancy

Quota

Rate Limiting

Routing

Routing Rule

Routing Strategy

Sanitization

Store

Throughput

Webhook

AES-256-GCM

AppArmor

Attack Surface

Authentication

Authorization

Capabilities (Linux)

Defense in Depth

Encryption at Rest

Encryption in Transit

Fail-Closed

Fail-Open

Least Privilege

Non-root User

PII (Personally Identifiable Information)

RBAC (Role-Based Access Control)

Read-only Root Filesystem

Secrets Management

Seccomp (Secure Computing Mode)

TLS (Transport Layer Security)

Zero Trust

Cardinality

Dashboard

Distributed Tracing

Event

Grafana

Instrumentation

Log Aggregation

Logging

Metric

OpenTelemetry