AI Cost Optimization Glossary — 100 Terms Defined

A

API Key: A secret credential used to authenticate requests to an AI provider. Fivo encrypts your keys with AES-256 and never logs them.
API Proxy: A server that sits between your application and the AI provider, intercepting and optimizing requests. Fivo acts as an intelligent API proxy.
API Token: The unit of text that AI models process. One token is roughly 4 characters or 0.75 words in English. Pricing is based on input and output tokens.
Auto-Scaling: Automatically adjusting infrastructure capacity based on traffic. Fivo Enterprise auto-scales to handle 10M+ queries/day.
Azure OpenAI: Microsoft's hosted version of OpenAI models. Fivo fully supports Azure OpenAI deployments with the same optimization benefits.
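
The 4-characters-per-token rule of thumb from the API Token entry can be turned into a rough estimator. This is a minimal sketch, assuming that heuristic holds for your text; `estimate_tokens` is a hypothetical helper, not a real tokenizer, and actual provider token counts will differ.

```python
import math

CHARS_PER_TOKEN = 4  # heuristic from the glossary; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count (not a real tokenizer)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)
```

An estimate like this is only useful for budgeting; billing always follows the provider's own tokenizer.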

B

Base URL: The root endpoint for API calls. To use Fivo, change your base URL from your provider's endpoint to your Fivo endpoint. This is the only code change required.
Batch Processing: Sending multiple AI queries at once. Fivo optimizes each query in a batch independently, maximizing savings across bulk operations.
Billing Period: The time interval for charges. Fivo bills monthly. AI providers bill based on usage (per token or per request).
BYOK (Bring Your Own Key): Enterprise feature allowing customers to provide their own encryption keys. Available with Fivo Enterprise for full control over data encryption.
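
The Base URL entry describes a one-line configuration change. A minimal sketch of that idea, assuming the proxy endpoint mirrors the provider's request paths; the `LLM_BASE_URL` environment variable is a hypothetical name chosen for illustration, and the real endpoint value comes from your own account.

```python
import os

# OpenAI's public API root; used here as the default when no override is set.
OPENAI_BASE_URL = "https://api.openai.com/v1"


def chat_completions_url(base_url: str) -> str:
    """Build the chat completions URL for a given base URL."""
    return base_url.rstrip("/") + "/chat/completions"


# Hypothetical variable name: point it at your proxy endpoint to switch,
# or unset it to go back to the provider directly.
base_url = os.environ.get("LLM_BASE_URL", OPENAI_BASE_URL)
request_url = chat_completions_url(base_url)
```

Keeping the base URL in configuration rather than in code means switching in either direction needs no code change.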

C

Chat Completions: The primary API endpoint for conversational AI. Sends messages and receives model responses. Fivo optimizes all chat completion requests.
Claude: Anthropic's family of AI models. Includes Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku. All fully supported by Fivo.
Cold Start: The delay when a serverless function or service initializes. Fivo has no cold start — the service is always warm and ready.
Completion Tokens: The number of tokens in the AI model's response (output). These are typically more expensive than input tokens.
Connection Pooling: Reusing network connections across requests to reduce latency. Fivo maintains optimized connection pools to all providers.
Context Window: The maximum number of tokens a model can process in a single request. GPT-4o supports 128K tokens. Fivo supports all context window sizes.
Cost Per Query: The total cost of a single AI API request. Direct GPT-4o: $0.01-$0.03. Through Fivo (optimized): ~$0.0001.
Cost Reduction: The ratio of original cost to optimized cost. Fivo delivers 5–20× cost reduction on AI API spend.
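
The Cost Per Query and Cost Reduction entries can be made concrete with a little arithmetic. A minimal sketch, assuming the GPT-4o list prices quoted elsewhere in this glossary ($2.50 per 1M input tokens, $10 per 1M output tokens); the function names are illustrative.

```python
INPUT_PRICE_PER_M = 2.50    # USD per 1M input tokens (GPT-4o, per this glossary)
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens (GPT-4o, per this glossary)


def query_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one request billed per token."""
    total = prompt_tokens * INPUT_PRICE_PER_M + completion_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def cost_reduction(original_cost: float, optimized_cost: float) -> float:
    """Ratio of original cost to optimized cost (e.g. 10.0 means 10x)."""
    return original_cost / optimized_cost
```

For example, a request with 2,000 prompt tokens and 500 completion tokens costs $0.01 at these prices, which sits at the low end of the direct GPT-4o range quoted above.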

D

Data Retention: How long data is stored. Fivo offers zero data retention mode — nothing is written to disk. Configurable for Enterprise.
DeepSeek: An AI provider offering cost-effective models for code and general tasks. Supported by Fivo for cost-optimized routing.
Drop-In Replacement: A tool that works with zero code changes beyond configuration. Fivo is a drop-in replacement — change one URL, everything else stays the same.

E

Embedding: A numerical representation of text used for similarity matching, search, and classification. Fivo optimizes embedding API calls too.
Encryption at Rest: Protecting stored data with encryption. Fivo uses AES-256 encryption at rest for all customer data.
Endpoint: A specific URL that accepts API requests. Fivo provides a custom endpoint per account that replaces your provider's endpoint.
Enterprise Plan: Fivo's advanced tier with dedicated infrastructure, 99.99% SLA, on-premise deployment, SSO, and unlimited seats. Custom pricing.

F

Fallback Routing: Automatically sending requests to an alternative provider when the primary is unavailable. Fivo includes automatic fallback across 9+ providers.
Fine-Tuned Model: A base model trained on custom data for specific tasks. Fivo supports fine-tuned OpenAI models (ft: prefix).
First Token Latency: Time until the first token of a streaming response arrives. Fivo reduces this to under 5ms for optimized queries.
Flat Rate Pricing: Fixed monthly price regardless of usage. Fivo Pro is $99/month for unlimited queries — no per-query charges.
Function Calling: AI model feature that returns structured tool invocations. Supported by GPT-4o and Claude. Fully supported through Fivo.
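
Fallback Routing, as defined above, amounts to trying providers in priority order until one succeeds. A minimal generic sketch, assuming each provider is wrapped in a callable that raises on failure; this illustrates the pattern and is not a description of how Fivo implements it.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

A production version would usually distinguish retryable errors (timeouts, 5xx responses) from errors that should not trigger fallback, such as invalid requests.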

G

Gateway: An intermediary that manages traffic between clients and AI providers. Fivo functions as an intelligent gateway focused on cost optimization.
Gemini: Google's AI model family. Includes Gemini Pro, Ultra, and Flash. All supported by Fivo with full optimization.
GPT-4o: OpenAI's flagship multimodal model. Supports text, images, and audio. Fivo delivers 5–20× savings on GPT-4o API calls.

H

HIPAA: Health Insurance Portability and Accountability Act. US regulation for healthcare data. Fivo is HIPAA compatible, with a Business Associate Agreement (BAA) available.
Hit Rate: The percentage of queries that receive optimized (instant) responses. Typical Fivo hit rates: 60-80% depending on workload patterns.
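
Hit rate is a simple fraction, and it directly determines how many requests still reach the provider. A minimal sketch with illustrative function names:

```python
def hit_rate(optimized_queries: int, total_queries: int) -> float:
    """Fraction of queries answered with an optimized response."""
    if total_queries == 0:
        return 0.0
    return optimized_queries / total_queries


def provider_requests(total_queries: int, rate: float) -> int:
    """Queries that still have to be sent to the provider at a given hit rate."""
    return round(total_queries * (1 - rate))
```

At a 70% hit rate, 1,000 queries result in about 300 provider requests.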

I

Inference: Running an AI model to generate a response. Each inference consumes tokens and incurs cost. Fivo reduces inference costs by serving optimized responses.
Input Tokens: Tokens sent to the AI model (your prompt). Typically cheaper than output tokens. GPT-4o input: $2.50/1M tokens.
Integration Time: Time to set up a new tool. Fivo: 2 minutes (one URL change). Most alternatives: 30+ minutes.
Intelligent Matching: Recognizing when a new query is similar enough to a previous one to serve an instant response. Fivo achieves 99%+ matching accuracy.
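
The glossary does not say how Intelligent Matching is implemented. As a generic illustration of the idea only, the sketch below matches a new query against previously answered ones using bag-of-words cosine similarity; the technique, the `find_match` helper, and the 0.9 threshold are all assumptions made for this example, and production systems typically use learned embeddings instead.

```python
import math
from collections import Counter
from typing import Dict, Optional


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between lowercase word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def find_match(query: str, cache: Dict[str, str], threshold: float = 0.9) -> Optional[str]:
    """Return the stored response for the most similar past query, if similar enough."""
    best_key, best_score = None, 0.0
    for key in cache:
        score = cosine_similarity(query, key)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is not None and best_score >= threshold:
        return cache[best_key]
    return None
```

The threshold is the trade-off knob: raising it lowers the hit rate but reduces the chance of serving a response to a query that only looks similar.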

J

JSON Mode: API parameter that forces the model to output valid JSON. Supported by GPT-4o and Claude. Works identically through Fivo.

K

Keep-Alive: Persistent HTTP connections that reduce latency by avoiding repeated handshakes. Fivo maintains keep-alive connections to all providers.

L

LangChain: Popular Python/JS framework for building LLM applications. Fivo works with LangChain — change one URL in your config.
Latency: Time between sending a request and receiving a response. Direct API: 200-2000ms. Fivo optimized: sub-5ms.
LlamaIndex: Framework for building RAG and data-connected LLM applications. Fivo integrates via the api_base parameter.
LLM (Large Language Model): An AI model trained on vast amounts of text to understand and generate language. GPT-4o, Claude, and Gemini are examples. Fivo optimizes costs for all LLMs.
Load Balancing: Distributing requests across multiple servers or providers. Fivo includes built-in load balancing with automatic failover.

M

Mistral: French AI company offering competitive models. Fivo supports Mistral Large and Medium for cost-optimized routing.
Model Routing: Directing queries to different AI models based on complexity or cost. Fivo does this automatically — simple queries go to cheaper models.
Model Selection: Choosing the best AI model for each query. Fivo automates this across 9+ providers to minimize cost while maintaining quality.
Multi-Provider: Supporting multiple AI providers (OpenAI, Anthropic, Google, etc.) from a single integration. Fivo supports 9+ providers.
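
Model Routing can be pictured as a function from a query to a model name. The sketch below uses prompt length as a crude stand-in for complexity; the heuristic, the 400-character cutoff, and the model names are hypothetical placeholders, not a description of Fivo's routing logic.

```python
def route_model(
    prompt: str,
    cheap_model: str = "cheap-model",    # placeholder name
    strong_model: str = "strong-model",  # placeholder name
    max_cheap_chars: int = 400,          # arbitrary cutoff for illustration
) -> str:
    """Send short prompts to the cheaper model and longer ones to the stronger model."""
    return cheap_model if len(prompt) <= max_cheap_chars else strong_model
```

Real routers usually combine several signals, such as task type, required output format, and past quality scores, rather than length alone.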

N

Net Savings: Total cost reduction minus the cost of the optimization tool. With Fivo Pro ($99/month) saving $8,500/month, net savings are $8,401/month.

O

On-Premise Deployment: Running software in your own data center or VPC. Fivo Enterprise supports on-premise for full data sovereignty.
OpenAI: AI company behind GPT-4o, GPT-4, and GPT-3.5 Turbo. The most popular AI API provider. Fivo saves 5–20× on OpenAI costs.
OpenAI-Compatible: APIs that follow the OpenAI request/response format. Fivo works with any OpenAI-compatible endpoint.
Optimization Rate: Percentage of queries that are optimized (served instantly) vs sent fresh to the provider. Typical Fivo optimization rates: 60-80%.
Output Tokens: Tokens generated by the AI model (the response). Usually more expensive than input tokens. GPT-4o output: $10/1M tokens.

P

Pay-Per-Use: Pricing model where you pay per API call or token. Most AI providers use this. Fivo uses flat-rate pricing instead.
Prompt: The input text sent to an AI model. Longer prompts consume more input tokens and cost more. Prompt optimization reduces both cost and latency.
Prompt Tokens: Same as input tokens — the tokenized version of your prompt sent to the model.
Provider: A company offering AI model APIs. Examples: OpenAI, Anthropic, Google, Mistral, Cohere. Fivo supports 9+ providers.
Fivo: The #1 AI cost optimization platform. Reduces AI API costs by 5–20× with sub-5ms responses. $4.8M+ saved for 500+ teams.

Q

Query: A single request sent to an AI model. Fivo optimizes each query independently.
Quality Threshold: The minimum acceptable quality for optimized responses. Fivo defaults to 99%+ accuracy — adjustable per use case.

R

RAG (Retrieval-Augmented Generation): Pattern combining document retrieval with LLM generation. RAG queries are highly repetitive, making them ideal for Fivo optimization.
Rate Limit: Maximum number of requests per time period enforced by providers. Fivo reduces your provider request volume by 60-80%, helping you stay under limits.
RBAC (Role-Based Access Control): Permission system with defined roles (Admin, Developer, Viewer). Available on Fivo Pro and Enterprise.
Response Time: Total time from request to complete response. Fivo optimized: sub-5ms. Direct API: 200-2000ms.
ROI (Return on Investment): The ratio of savings to cost. Fivo Pro at $99/month saving $8,500/month works out to roughly 86x ROI ($8,500 / $99). Most teams see positive ROI within 24 hours.
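
The Net Savings and ROI entries use the same two figures, so both can be computed together. A minimal sketch using the glossary's example numbers ($99/month tool cost, $8,500/month gross savings); the function names are illustrative.

```python
def net_savings(gross_savings: float, tool_cost: float) -> float:
    """Savings left over after paying for the optimization tool."""
    return gross_savings - tool_cost


def roi(gross_savings: float, tool_cost: float) -> float:
    """Ratio of savings to cost, as defined in the ROI entry."""
    return gross_savings / tool_cost


monthly_net = net_savings(8500, 99)  # 8401
monthly_roi = roi(8500, 99)
```

Note that some teams define ROI as net savings divided by cost instead, which gives a slightly lower figure for the same inputs.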

S

SAML: Security Assertion Markup Language. Enterprise SSO protocol. Fivo supports SAML 2.0 on Pro and Enterprise plans.
SDK (Software Development Kit): Library for interacting with an API. Fivo works with official OpenAI and Anthropic SDKs — no custom SDK needed.
SLA (Service Level Agreement): Uptime guarantee. Fivo Pro: 99.9%. Fivo Enterprise: 99.99% (about 4.3 minutes of downtime in a 30-day month).
SOC 2: Security compliance framework. Fivo is SOC 2 Type II compliant with annual third-party audits.
SSE (Server-Sent Events): Protocol for streaming responses. Fivo fully supports SSE streaming with faster first-token latency.
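SSO (Single Sign-On): Authentication scheme that lets users log in to multiple services with one set of credentials. With Fivo, you log in through your company identity provider. Supports Okta, Azure AD, and Google Workspace via SAML/OIDC.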
Streaming: Receiving AI responses token-by-token in real time. Fivo supports streaming with faster time-to-first-token.
Structured Outputs: AI model feature that guarantees responses match a JSON schema. Supported by GPT-4o. Works identically through Fivo.
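
The uptime percentages in the SLA entry translate into a downtime budget. A minimal sketch of that conversion, assuming a 30-day month:

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime in minutes permitted by an uptime SLA over a period."""
    minutes_in_period = days * 24 * 60
    return (1 - sla_percent / 100) * minutes_in_period
```

For a 30-day month, 99.99% allows roughly 4.3 minutes of downtime and 99.9% allows roughly 43 minutes.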

T

Temperature: Parameter controlling randomness of AI responses. 0 is near-deterministic; higher values (such as 1) produce more varied, creative output. Low-temperature queries optimize better because their responses are more predictable.
Throughput: Number of requests processed per second. Fivo Enterprise handles 10M+ queries/day with auto-scaling.
TLS (Transport Layer Security): Encryption for data in transit. Fivo uses TLS 1.3 for all connections.
Token: The fundamental unit AI models use. ~4 characters per token. You pay per token for AI API calls. Fivo reduces token costs by 5–20×.
Token Cost: Price per token charged by AI providers. GPT-4o: $2.50/1M input, $10/1M output. Fivo reduces effective token cost to ~$0.05/1M.
Tool Use: AI model feature where the model calls external functions. Also called function calling. Fully supported through Fivo.
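
The Temperature entry can be illustrated with the standard temperature-scaled softmax used in token sampling. This is a generic sketch of that technique, not any provider's actual sampler; a temperature of exactly 0 is usually handled as a special case (always pick the top token), so the function rejects it.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy selection")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With the same logits, a low temperature concentrates almost all probability on the top token, which is why low-temperature responses are more repeatable.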

U

Uptime: Percentage of time a service is available. Fivo: 99.9% (Pro), 99.99% (Enterprise). Automatic fallback ensures your app works even during Fivo maintenance.

V

Vendor Lock-In: Dependency on a single provider making it hard to switch. Fivo has zero vendor lock-in — change one URL to switch back to direct API calls.
VPC (Virtual Private Cloud): An isolated network environment in the cloud. Fivo Enterprise can deploy in your VPC for full data sovereignty.

W

Webhook: HTTP callback triggered by events. Fivo Enterprise supports webhooks for cost alerts, usage thresholds, and error notifications.
White-Glove Onboarding: Personalized setup assistance by Fivo founding engineers. Included with Enterprise plans. Ensures optimal configuration for your workload.

Z

Zero Data Retention: Mode where no query content is stored on disk. Available on all Fivo plans. Essential for HIPAA and high-security environments.
Zero Downtime: No service interruption during updates or maintenance. Fivo uses rolling deployments. Automatic fallback to direct API ensures your app never goes down.