Skip to main content

Glossary

Adapter

A software component that translates between Guardway’s unified API format and a provider’s specific API format. Each provider (OpenAI, Anthropic, etc.) has its own adapter.

Chat Completion

The primary LLM interaction where a model generates a response based on a conversation history (messages). Also called “chat” or “completion.”

Completion Tokens

The number of tokens in the generated response from the LLM. Typically costs more per token than prompt tokens.

Context Length

The maximum number of tokens (prompt + completion) that a model can process in a single request. For example, GPT-4 has a 128K token context length.

Embeddings

Dense vector representations of text that capture semantic meaning. Used for similarity search, clustering, and retrieval-augmented generation (RAG).

Few-Shot Learning

Providing a model with examples in the prompt to guide its behavior without fine-tuning. For example, showing 3 examples of the desired output format.

Fine-Tuning

The process of training a base model on domain-specific data to specialize its behavior. Results in a custom model.

Function Calling

See Tool Use.

Guardrails

Security and safety mechanisms that validate, filter, or block LLM inputs and outputs. Examples: PII detection, hate speech filtering, prompt injection detection.

Hallucination

When an LLM generates false or nonsensical information presented as fact. A key challenge in production LLM deployments.

JSON Mode

A feature where the LLM is constrained to output valid JSON only. Useful for structured data extraction.

Max Tokens

The maximum number of tokens the model can generate in its response. Acts as a cost control and prevents runaway generation.

Message

A unit in a conversation with an LLM, consisting of a role (system, user, assistant, or tool) and content (the text).

Model

A trained neural network capable of text generation, embeddings, image generation, or other AI tasks. Examples: GPT-4, Claude 3, Llama 2.

Moderation

Content filtering to detect harmful, unsafe, or inappropriate content. Can be applied to inputs (user prompts) or outputs (LLM responses).

Prompt

The input text sent to an LLM. Can include instructions, examples, and the actual query.

Prompt Engineering

The practice of crafting effective prompts to get desired behaviors from LLMs without fine-tuning.

Prompt Injection

A security attack where malicious input attempts to override the system prompt or manipulate the LLM’s behavior.
Prompt Injection vs. Jailbreaking: Prompt injection targets the system prompt to change the model’s behavior, while jailbreaking attempts to bypass safety guardrails. Both are security concerns addressed by Guardway.

Prompt Tokens

The number of tokens in the input sent to the LLM. Typically costs less per token than completion tokens.

Provider

A company or service that offers LLM APIs. Examples: OpenAI, Anthropic, Google, Cohere, Groq.

RAG (Retrieval-Augmented Generation)

A technique where relevant documents are retrieved from a knowledge base and included in the prompt to ground the LLM’s response in factual information.

Semantic Cache

A caching system that matches queries based on meaning rather than exact text match. Uses embeddings to find similar queries.

Stop Sequence

A string that, when generated by the model, signals the end of generation. Used to create structured outputs.

Streaming

Sending the LLM response in chunks as it’s generated, rather than waiting for the complete response. Improves perceived latency.

System Prompt

Instructions given to the LLM that set its behavior, persona, and constraints. Typically the first message in a conversation.

Temperature

A parameter (0-2) that controls randomness in generation. Higher = more creative/random, lower = more deterministic/focused.

Token

The basic unit of text processing for LLMs. Roughly 4 characters or 0.75 words in English. Tokenization varies by model.
Tokens vs. Words: A common source of confusion. Tokens are not words — they are sub-word units. The word “unbelievable” might be split into multiple tokens. Always check the model’s tokenizer for exact counts.

Tool Use

The ability for an LLM to call external functions/APIs. The model decides when to use a tool, Guardway calls it, and the result is fed back to the model.

Top-K

Sampling strategy where only the K most likely next tokens are considered. Reduces randomness.

Top-P (Nucleus Sampling)

Sampling strategy where tokens are selected from the smallest set whose cumulative probability exceeds P. More dynamic than top-K.

Vision

The ability for an LLM to process and understand images in addition to text. Example: GPT-4V, Claude 3.

Zero-Shot Learning

Using an LLM without providing examples, relying solely on instructions. The model must generalize from its training.

API Key

A secret token used to authenticate requests to the Guardway gateway. Each key can have quotas, budgets, and access controls.

Budget

A spending limit (in dollars) associated with an API key or team. Requests are blocked when the budget is exceeded.

Failover

The automatic switching to a backup provider when the primary provider fails or is unavailable.

Gateway

The central service that receives client requests, applies security policies, routes to providers, and returns responses. The core of Guardway.

Health Check

An endpoint (/health) that reports the operational status of the gateway and its dependencies (Redis, providers).

Latency

The time between sending a request and receiving a complete response. Measured in milliseconds (ms).

Middleware

Software components that intercept and process requests before they reach route handlers. Examples: authentication, rate limiting, logging.

Multi-tenancy

The ability to serve multiple independent customers (tenants) from a single gateway instance with isolation and access controls.

Quota

A limit on the number of requests allowed within a time period. Can be per-API-key, per-user, or per-team.

Rate Limiting

Restricting the number of requests allowed in a time window to prevent abuse and manage load. Can limit by requests/minute or tokens/minute.
Rate Limiting vs. Quota: Rate limiting controls the speed of requests (e.g., 100 requests/minute), while quotas control the total volume (e.g., 10,000 requests/month). Both are important for cost control and abuse prevention.

Routing

The process of selecting which provider and model to use for a request based on rules, strategies, or load balancing.

Routing Rule

A configuration that maps requests to specific providers based on patterns (model name, user, tags, etc.).

Routing Strategy

An algorithm for selecting providers. Examples: lowest-cost, lowest-latency, least-busy, priority-based.

Sanitization

The process of removing or redacting sensitive information (like PII) from text while preserving the rest of the content.

Store

Guardway’s data persistence layer, backed by Redis, that holds configuration, keys, logs, and metrics.

Throughput

The number of requests processed per unit of time, typically measured in requests per second (req/sec).

Webhook

An HTTP callback that Guardway can trigger when certain events occur (quota exceeded, budget threshold, etc.).

AES-256-GCM

Advanced Encryption Standard with 256-bit keys in Galois/Counter Mode. Used by Guardway to encrypt API keys and secrets at rest.

AppArmor

A Linux kernel security module that confines programs to a limited set of resources. Used in Guardway’s container hardening.

Attack Surface

The sum of all points where an unauthorized user could try to enter or extract data from a system.

Authentication

Verifying the identity of a user or system. In Guardway, this is done via API keys.

Authorization

Determining what actions an authenticated user is allowed to perform. In Guardway, this is per-API-key permissions.
Authentication vs. Authorization: Authentication answers “Who are you?” while authorization answers “What are you allowed to do?” Both are required for secure API access.

Capabilities (Linux)

Fine-grained privileges that can be granted to processes instead of full root access. Guardway drops unnecessary capabilities.

Defense in Depth

A security strategy employing multiple layers of defense so that if one layer fails, others still provide protection.

Encryption at Rest

Encrypting data when it’s stored (e.g., API keys in Redis) so it’s unreadable without the decryption key.

Encryption in Transit

Encrypting data while it’s being transmitted over the network, typically using TLS/HTTPS.

Fail-Closed

A security posture where errors cause requests to be blocked. More secure but less available.

Fail-Open

A security posture where errors allow requests to proceed. More available but less secure.
Fail-Closed vs. Fail-Open: These are opposite security postures. Fail-closed blocks requests on error (more secure), while fail-open allows them (more available). Choose based on your security requirements.

Least Privilege

The principle of granting only the minimum permissions necessary for a task. Applied to processes, users, and API keys.

Non-root User

Running processes as a non-privileged user rather than root to limit the impact of security breaches. Guardway containers use UID 1001.

PII (Personally Identifiable Information)

Data that can be used to identify an individual. Examples: SSN, email, phone number, name, address.

RBAC (Role-Based Access Control)

An access control approach where permissions are assigned to roles, and users are assigned to roles.

Read-only Root Filesystem

A security hardening technique where the container’s root filesystem cannot be modified at runtime, preventing certain types of attacks.

Secrets Management

Secure storage, access control, and rotation of sensitive data like API keys, passwords, and certificates.

Seccomp (Secure Computing Mode)

A Linux kernel feature that limits the system calls a process can make. Guardway uses a restricted seccomp profile.

TLS (Transport Layer Security)

Cryptographic protocol for secure communication over networks. HTTPS uses TLS.

Zero Trust

A security model that assumes no implicit trust and requires verification for every access request, regardless of location.

JSON-RPC

A remote procedure call protocol encoded in JSON. Used by MCP for client-server communication.

MCP (Model Context Protocol)

A protocol that allows LLMs to interact with external tools, data sources, and services in a standardized way.

MCP Server

A service that implements the MCP protocol and exposes tools, resources, or prompts to clients.

Prompt (MCP)

A pre-defined prompt template provided by an MCP server that clients can use.

Resource (MCP)

A data source or document that an MCP server makes available to clients (e.g., files, database records).

Session

A stateful connection between an MCP client and server, maintaining context across multiple requests.

stdio Transport

Communication via standard input/output streams. Used by Python and Node.js MCP servers.

Tool (MCP)

A function that an MCP server exposes to clients. The LLM can call tools to perform actions or retrieve information.

Tool Filter

Access control rules that restrict which MCP tools are available to specific API keys.