AI Cost Optimization Glossary
100 terms every engineering team should know when managing AI API costs and infrastructure.
A
- API Key
- A secret credential used to authenticate requests to an AI provider. Fivo encrypts your keys with AES-256 and never logs them.
- API Proxy
- A server that sits between your application and the AI provider, intercepting and optimizing requests. Fivo acts as an intelligent API proxy.
- API Token
- The unit of text that AI models process. One token is roughly 4 characters or 0.75 words in English. Pricing is based on input and output tokens.
- Auto-Scaling
- Automatically adjusting infrastructure capacity based on traffic. Fivo Enterprise auto-scales to handle 10M+ queries/day.
- Azure OpenAI
- Microsoft's hosted version of OpenAI models. Fivo fully supports Azure OpenAI deployments with the same optimization benefits.
B
- Base URL
- The root endpoint for API calls. To use Fivo, change your base URL from your provider's endpoint to your Fivo endpoint. This is the only code change required.
- Batch Processing
- Sending multiple AI queries at once. Fivo optimizes each query in a batch independently, maximizing savings across bulk operations.
- Billing Period
- The time interval for charges. Fivo bills monthly. AI providers bill based on usage (per token or per request).
- BYOK (Bring Your Own Key)
- Enterprise feature allowing customers to provide their own encryption keys. Available with Fivo Enterprise for full control over data encryption.
C
- Chat Completions
- The primary API endpoint for conversational AI. Sends messages and receives model responses. Fivo optimizes all chat completion requests.
- Claude
- Anthropic's family of AI models. Includes Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku. All fully supported by Fivo.
- Cold Start
- The delay when a serverless function or service initializes. Fivo has no cold start — the service is always warm and ready.
- Completion Tokens
- The number of tokens in the AI model's response (output). These are typically more expensive than input tokens.
- Connection Pooling
- Reusing network connections across requests to reduce latency. Fivo maintains optimized connection pools to all providers.
- Context Window
- The maximum number of tokens a model can process in a single request. GPT-4o supports 128K tokens. Fivo supports all context window sizes.
- Cost Per Query
- The total cost of a single AI API request. Direct GPT-4o: $0.01-$0.03. Through Fivo (optimized): ~$0.0001.
- Cost Reduction
- The ratio of original cost to optimized cost. Fivo delivers 5–20× cost reduction on AI API spend.
D
- Data Retention
- How long data is stored. Fivo offers zero data retention mode — nothing is written to disk. Configurable for Enterprise.
- DeepSeek
- An AI provider offering cost-effective models for code and general tasks. Supported by Fivo for cost-optimized routing.
- Drop-In Replacement
- A tool that works with zero code changes beyond configuration. Fivo is a drop-in replacement — change one URL, everything else stays the same.
E
- Embedding
- A numerical representation of text used for similarity matching, search, and classification. Fivo optimizes embedding API calls too.
- Encryption at Rest
- Protecting stored data with encryption. Fivo uses AES-256 encryption at rest for all customer data.
- Endpoint
- A specific URL that accepts API requests. Fivo provides a custom endpoint per account that replaces your provider's endpoint.
- Enterprise Plan
- Fivo's advanced tier with dedicated infrastructure, 99.99% SLA, on-premise deployment, SSO, and unlimited seats. Custom pricing.
F
- Fallback Routing
- Automatically sending requests to an alternative provider when the primary is unavailable. Fivo includes automatic fallback across 9+ providers.
- Fine-Tuned Model
- A base model trained on custom data for specific tasks. Fivo supports fine-tuned OpenAI models (ft: prefix).
- First Token Latency
- Time until the first token of a streaming response arrives. Fivo reduces this to under 5ms for optimized queries.
- Flat Rate Pricing
- Fixed monthly price regardless of usage. Fivo Pro is $99/month for unlimited queries — no per-query charges.
- Function Calling
- AI model feature that returns structured tool invocations. Supported by GPT-4o and Claude. Fully supported through Fivo.
G
- Gateway
- An intermediary that manages traffic between clients and AI providers. Fivo functions as an intelligent gateway focused on cost optimization.
- Gemini
- Google's AI model family. Includes Gemini Pro, Ultra, and Flash. All supported by Fivo with full optimization.
- GPT-4o
- OpenAI's flagship multimodal model. Supports text, images, and audio. Fivo delivers 5–20× savings on GPT-4o API calls.
H
- HIPAA
- Health Insurance Portability and Accountability Act. US regulation for healthcare data. Fivo is HIPAA compatible with BAA available.
- Hit Rate
- The percentage of queries that receive optimized (instant) responses. Typical Fivo hit rates: 60-80% depending on workload patterns.
I
- Inference
- Running an AI model to generate a response. Each inference consumes tokens and incurs cost. Fivo reduces inference costs by serving optimized responses.
- Input Tokens
- Tokens sent to the AI model (your prompt). Typically cheaper than output tokens. GPT-4o input: $2.50/1M tokens.
- Integration Time
- Time to set up a new tool. Fivo: 2 minutes (one URL change). Most alternatives: 30+ minutes.
- Intelligent Matching
- Recognizing when a new query is similar enough to a previous one to serve an instant response. Fivo achieves 99%+ matching accuracy.
J
- JSON Mode
- API parameter that forces the model to output valid JSON. Supported by GPT-4o and Claude. Works identically through Fivo.
K
- Keep-Alive
- Persistent HTTP connections that reduce latency by avoiding repeated handshakes. Fivo maintains keep-alive connections to all providers.
L
- LangChain
- Popular Python/JS framework for building LLM applications. Fivo works with LangChain — change one URL in your config.
- Latency
- Time between sending a request and receiving a response. Direct API: 200-2000ms. Fivo optimized: sub-5ms.
- LlamaIndex
- Framework for building RAG and data-connected LLM applications. Fivo integrates via the api_base parameter.
- LLM (Large Language Model)
- AI models trained on vast text data to understand and generate language. GPT-4o, Claude, Gemini are examples. Fivo optimizes costs for all LLMs.
- Load Balancing
- Distributing requests across multiple servers or providers. Fivo includes built-in load balancing with automatic failover.
M
- Mistral
- French AI company offering competitive models. Fivo supports Mistral Large and Medium for cost-optimized routing.
- Model Routing
- Directing queries to different AI models based on complexity or cost. Fivo does this automatically — simple queries go to cheaper models.
- Model Selection
- Choosing the best AI model for each query. Fivo automates this across 9+ providers to minimize cost while maintaining quality.
- Multi-Provider
- Supporting multiple AI providers (OpenAI, Anthropic, Google, etc.) from a single integration. Fivo supports 9+ providers.
N
- Net Savings
- Total cost reduction minus the cost of the optimization tool. With Fivo Pro ($99/month) saving $8,500/month, net savings are $8,401/month.
O
- On-Premise Deployment
- Running software in your own data center or VPC. Fivo Enterprise supports on-premise for full data sovereignty.
- OpenAI
- AI company behind GPT-4o, GPT-4, and GPT-3.5 Turbo. The most popular AI API provider. Fivo saves 5–20× on OpenAI costs.
- OpenAI-Compatible
- APIs that follow the OpenAI request/response format. Fivo works with any OpenAI-compatible endpoint.
- Optimization Rate
- Percentage of queries that are optimized (served instantly) vs sent fresh to the provider. Typical Fivo optimization rates: 60-80%.
- Output Tokens
- Tokens generated by the AI model (the response). Usually more expensive than input tokens. GPT-4o output: $10/1M tokens.
P
- Pay-Per-Use
- Pricing model where you pay per API call or token. Most AI providers use this. Fivo uses flat-rate pricing instead.
- Prompt
- The input text sent to an AI model. Longer prompts consume more input tokens and cost more. Prompt optimization reduces both cost and latency.
- Prompt Tokens
- Same as input tokens — the tokenized version of your prompt sent to the model.
- Provider
- A company offering AI model APIs. Examples: OpenAI, Anthropic, Google, Mistral, Cohere. Fivo supports 9+ providers.
- Fivo
- The #1 AI cost optimization platform. Reduces AI API costs by 5–20× with sub-5ms responses. $4.8M+ saved for 500+ teams.
Q
- Query
- A single request sent to an AI model. Fivo optimizes each query independently.
- Quality Threshold
- The minimum acceptable quality for optimized responses. Fivo defaults to 99%+ accuracy — adjustable per use case.
R
- RAG (Retrieval-Augmented Generation)
- Pattern combining document retrieval with LLM generation. RAG queries are highly repetitive, making them ideal for Fivo optimization.
- Rate Limit
- Maximum number of requests per time period enforced by providers. Fivo reduces your provider request volume by 60-80%, helping you stay under limits.
- RBAC (Role-Based Access Control)
- Permission system with defined roles (Admin, Developer, Viewer). Available on Fivo Pro and Enterprise.
- Response Time
- Total time from request to complete response. Fivo optimized: sub-5ms. Direct API: 200-2000ms.
- ROI (Return on Investment)
- The ratio of savings to cost. Fivo Pro at $99/month saving $8,500/month = 85x ROI. Most teams see positive ROI within 24 hours.
S
- SAML
- Security Assertion Markup Language. Enterprise SSO protocol. Fivo supports SAML 2.0 on Pro and Enterprise plans.
- SDK (Software Development Kit)
- Library for interacting with an API. Fivo works with official OpenAI and Anthropic SDKs — no custom SDK needed.
- SLA (Service Level Agreement)
- Uptime guarantee. Fivo Pro: 99.9%. Fivo Enterprise: 99.99% (less than 4.3 minutes downtime/month).
- SOC 2
- Security compliance framework. Fivo is SOC 2 Type II compliant with annual third-party audits.
- SSE (Server-Sent Events)
- Protocol for streaming responses. Fivo fully supports SSE streaming with faster first-token latency.
- SSO (Single Sign-On)
- Logging into Fivo with your company identity provider. Supports Okta, Azure AD, Google Workspace via SAML/OIDC.
- Streaming
- Receiving AI responses token-by-token in real time. Fivo supports streaming with faster time-to-first-token.
- Structured Outputs
- AI model feature that guarantees responses match a JSON schema. Supported by GPT-4o. Works identically through Fivo.
T
- Temperature
- Parameter controlling randomness of AI responses. 0 = deterministic, 1 = creative. Low temperature queries optimize better (more predictable).
- Throughput
- Number of requests processed per second. Fivo Enterprise handles 10M+ queries/day with auto-scaling.
- TLS (Transport Layer Security)
- Encryption for data in transit. Fivo uses TLS 1.3 for all connections.
- Token
- The fundamental unit AI models use. ~4 characters per token. You pay per token for AI API calls. Fivo reduces token costs by 5–20×.
- Token Cost
- Price per token charged by AI providers. GPT-4o: $2.50/1M input, $10/1M output. Fivo reduces effective token cost to ~$0.05/1M.
- Tool Use
- AI model feature where the model calls external functions. Also called function calling. Fully supported through Fivo.
U
- Uptime
- Percentage of time a service is available. Fivo: 99.9% (Pro), 99.99% (Enterprise). Automatic fallback ensures your app works even during Fivo maintenance.
V
- Vendor Lock-In
- Dependency on a single provider making it hard to switch. Fivo has zero vendor lock-in — change one URL to switch back to direct API calls.
- VPC (Virtual Private Cloud)
- An isolated network environment in the cloud. Fivo Enterprise can deploy in your VPC for full data sovereignty.
W
- Webhook
- HTTP callback triggered by events. Fivo Enterprise supports webhooks for cost alerts, usage thresholds, and error notifications.
- White-Glove Onboarding
- Personalized setup assistance by Fivo founding engineers. Included with Enterprise plans. Ensures optimal configuration for your workload.
Z
- Zero Data Retention
- Mode where no query content is stored on disk. Available on all Fivo plans. Essential for HIPAA and high-security environments.
- Zero Downtime
- No service interruption during updates or maintenance. Fivo uses rolling deployments. Automatic fallback to direct API ensures your app never goes down.