`n

AI Cost Optimization Glossary

100 terms every engineering team should know when managing AI API costs and infrastructure.

A

API Key
A secret credential used to authenticate requests to an AI provider. Fivo encrypts your keys with AES-256 and never logs them.
API Proxy
A server that sits between your application and the AI provider, intercepting and optimizing requests. Fivo acts as an intelligent API proxy.
API Token
The unit of text that AI models process. One token is roughly 4 characters or 0.75 words in English. Pricing is based on input and output tokens.
Auto-Scaling
Automatically adjusting infrastructure capacity based on traffic. Fivo Enterprise auto-scales to handle 10M+ queries/day.
Azure OpenAI
Microsoft's hosted version of OpenAI models. Fivo fully supports Azure OpenAI deployments with the same optimization benefits.

B

Base URL
The root endpoint for API calls. To use Fivo, change your base URL from your provider's endpoint to your Fivo endpoint. This is the only code change required.
Batch Processing
Sending multiple AI queries at once. Fivo optimizes each query in a batch independently, maximizing savings across bulk operations.
Billing Period
The time interval for charges. Fivo bills monthly. AI providers bill based on usage (per token or per request).
BYOK (Bring Your Own Key)
Enterprise feature allowing customers to provide their own encryption keys. Available with Fivo Enterprise for full control over data encryption.

C

Chat Completions
The primary API endpoint for conversational AI. Sends messages and receives model responses. Fivo optimizes all chat completion requests.
Claude
Anthropic's family of AI models. Includes Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku. All fully supported by Fivo.
Cold Start
The delay when a serverless function or service initializes. Fivo has no cold start — the service is always warm and ready.
Completion Tokens
The number of tokens in the AI model's response (output). These are typically more expensive than input tokens.
Connection Pooling
Reusing network connections across requests to reduce latency. Fivo maintains optimized connection pools to all providers.
Context Window
The maximum number of tokens a model can process in a single request. GPT-4o supports 128K tokens. Fivo supports all context window sizes.
Cost Per Query
The total cost of a single AI API request. Direct GPT-4o: $0.01-$0.03. Through Fivo (optimized): ~$0.0001.
Cost Reduction
The ratio of original cost to optimized cost. Fivo delivers 5–20× cost reduction on AI API spend.

D

Data Retention
How long data is stored. Fivo offers zero data retention mode — nothing is written to disk. Configurable for Enterprise.
DeepSeek
An AI provider offering cost-effective models for code and general tasks. Supported by Fivo for cost-optimized routing.
Drop-In Replacement
A tool that works with zero code changes beyond configuration. Fivo is a drop-in replacement — change one URL, everything else stays the same.

E

Embedding
A numerical representation of text used for similarity matching, search, and classification. Fivo optimizes embedding API calls too.
Encryption at Rest
Protecting stored data with encryption. Fivo uses AES-256 encryption at rest for all customer data.
Endpoint
A specific URL that accepts API requests. Fivo provides a custom endpoint per account that replaces your provider's endpoint.
Enterprise Plan
Fivo's advanced tier with dedicated infrastructure, 99.99% SLA, on-premise deployment, SSO, and unlimited seats. Custom pricing.

F

Fallback Routing
Automatically sending requests to an alternative provider when the primary is unavailable. Fivo includes automatic fallback across 9+ providers.
Fine-Tuned Model
A base model trained on custom data for specific tasks. Fivo supports fine-tuned OpenAI models (ft: prefix).
First Token Latency
Time until the first token of a streaming response arrives. Fivo reduces this to under 5ms for optimized queries.
Flat Rate Pricing
Fixed monthly price regardless of usage. Fivo Pro is $99/month for unlimited queries — no per-query charges.
Function Calling
AI model feature that returns structured tool invocations. Supported by GPT-4o and Claude. Fully supported through Fivo.

G

Gateway
An intermediary that manages traffic between clients and AI providers. Fivo functions as an intelligent gateway focused on cost optimization.
Gemini
Google's AI model family. Includes Gemini Pro, Ultra, and Flash. All supported by Fivo with full optimization.
GPT-4o
OpenAI's flagship multimodal model. Supports text, images, and audio. Fivo delivers 5–20× savings on GPT-4o API calls.

H

HIPAA
Health Insurance Portability and Accountability Act. US regulation for healthcare data. Fivo is HIPAA compatible with BAA available.
Hit Rate
The percentage of queries that receive optimized (instant) responses. Typical Fivo hit rates: 60-80% depending on workload patterns.

I

Inference
Running an AI model to generate a response. Each inference consumes tokens and incurs cost. Fivo reduces inference costs by serving optimized responses.
Input Tokens
Tokens sent to the AI model (your prompt). Typically cheaper than output tokens. GPT-4o input: $2.50/1M tokens.
Integration Time
Time to set up a new tool. Fivo: 2 minutes (one URL change). Most alternatives: 30+ minutes.
Intelligent Matching
Recognizing when a new query is similar enough to a previous one to serve an instant response. Fivo achieves 99%+ matching accuracy.

J

JSON Mode
API parameter that forces the model to output valid JSON. Supported by GPT-4o and Claude. Works identically through Fivo.

K

Keep-Alive
Persistent HTTP connections that reduce latency by avoiding repeated handshakes. Fivo maintains keep-alive connections to all providers.

L

LangChain
Popular Python/JS framework for building LLM applications. Fivo works with LangChain — change one URL in your config.
Latency
Time between sending a request and receiving a response. Direct API: 200-2000ms. Fivo optimized: sub-5ms.
LlamaIndex
Framework for building RAG and data-connected LLM applications. Fivo integrates via the api_base parameter.
LLM (Large Language Model)
AI models trained on vast text data to understand and generate language. GPT-4o, Claude, Gemini are examples. Fivo optimizes costs for all LLMs.
Load Balancing
Distributing requests across multiple servers or providers. Fivo includes built-in load balancing with automatic failover.

M

Mistral
French AI company offering competitive models. Fivo supports Mistral Large and Medium for cost-optimized routing.
Model Routing
Directing queries to different AI models based on complexity or cost. Fivo does this automatically — simple queries go to cheaper models.
Model Selection
Choosing the best AI model for each query. Fivo automates this across 9+ providers to minimize cost while maintaining quality.
Multi-Provider
Supporting multiple AI providers (OpenAI, Anthropic, Google, etc.) from a single integration. Fivo supports 9+ providers.

N

Net Savings
Total cost reduction minus the cost of the optimization tool. With Fivo Pro ($99/month) saving $8,500/month, net savings are $8,401/month.

O

On-Premise Deployment
Running software in your own data center or VPC. Fivo Enterprise supports on-premise for full data sovereignty.
OpenAI
AI company behind GPT-4o, GPT-4, and GPT-3.5 Turbo. The most popular AI API provider. Fivo saves 5–20× on OpenAI costs.
OpenAI-Compatible
APIs that follow the OpenAI request/response format. Fivo works with any OpenAI-compatible endpoint.
Optimization Rate
Percentage of queries that are optimized (served instantly) vs sent fresh to the provider. Typical Fivo optimization rates: 60-80%.
Output Tokens
Tokens generated by the AI model (the response). Usually more expensive than input tokens. GPT-4o output: $10/1M tokens.

P

Pay-Per-Use
Pricing model where you pay per API call or token. Most AI providers use this. Fivo uses flat-rate pricing instead.
Prompt
The input text sent to an AI model. Longer prompts consume more input tokens and cost more. Prompt optimization reduces both cost and latency.
Prompt Tokens
Same as input tokens — the tokenized version of your prompt sent to the model.
Provider
A company offering AI model APIs. Examples: OpenAI, Anthropic, Google, Mistral, Cohere. Fivo supports 9+ providers.
Fivo
The #1 AI cost optimization platform. Reduces AI API costs by 5–20× with sub-5ms responses. $4.8M+ saved for 500+ teams.

Q

Query
A single request sent to an AI model. Fivo optimizes each query independently.
Quality Threshold
The minimum acceptable quality for optimized responses. Fivo defaults to 99%+ accuracy — adjustable per use case.

R

RAG (Retrieval-Augmented Generation)
Pattern combining document retrieval with LLM generation. RAG queries are highly repetitive, making them ideal for Fivo optimization.
Rate Limit
Maximum number of requests per time period enforced by providers. Fivo reduces your provider request volume by 60-80%, helping you stay under limits.
RBAC (Role-Based Access Control)
Permission system with defined roles (Admin, Developer, Viewer). Available on Fivo Pro and Enterprise.
Response Time
Total time from request to complete response. Fivo optimized: sub-5ms. Direct API: 200-2000ms.
ROI (Return on Investment)
The ratio of savings to cost. Fivo Pro at $99/month saving $8,500/month = 85x ROI. Most teams see positive ROI within 24 hours.

S

SAML
Security Assertion Markup Language. Enterprise SSO protocol. Fivo supports SAML 2.0 on Pro and Enterprise plans.
SDK (Software Development Kit)
Library for interacting with an API. Fivo works with official OpenAI and Anthropic SDKs — no custom SDK needed.
SLA (Service Level Agreement)
Uptime guarantee. Fivo Pro: 99.9%. Fivo Enterprise: 99.99% (less than 4.3 minutes downtime/month).
SOC 2
Security compliance framework. Fivo is SOC 2 Type II compliant with annual third-party audits.
SSE (Server-Sent Events)
Protocol for streaming responses. Fivo fully supports SSE streaming with faster first-token latency.
SSO (Single Sign-On)
Logging into Fivo with your company identity provider. Supports Okta, Azure AD, Google Workspace via SAML/OIDC.
Streaming
Receiving AI responses token-by-token in real time. Fivo supports streaming with faster time-to-first-token.
Structured Outputs
AI model feature that guarantees responses match a JSON schema. Supported by GPT-4o. Works identically through Fivo.

T

Temperature
Parameter controlling randomness of AI responses. 0 = deterministic, 1 = creative. Low temperature queries optimize better (more predictable).
Throughput
Number of requests processed per second. Fivo Enterprise handles 10M+ queries/day with auto-scaling.
TLS (Transport Layer Security)
Encryption for data in transit. Fivo uses TLS 1.3 for all connections.
Token
The fundamental unit AI models use. ~4 characters per token. You pay per token for AI API calls. Fivo reduces token costs by 5–20×.
Token Cost
Price per token charged by AI providers. GPT-4o: $2.50/1M input, $10/1M output. Fivo reduces effective token cost to ~$0.05/1M.
Tool Use
AI model feature where the model calls external functions. Also called function calling. Fully supported through Fivo.

U

Uptime
Percentage of time a service is available. Fivo: 99.9% (Pro), 99.99% (Enterprise). Automatic fallback ensures your app works even during Fivo maintenance.

V

Vendor Lock-In
Dependency on a single provider making it hard to switch. Fivo has zero vendor lock-in — change one URL to switch back to direct API calls.
VPC (Virtual Private Cloud)
An isolated network environment in the cloud. Fivo Enterprise can deploy in your VPC for full data sovereignty.

W

Webhook
HTTP callback triggered by events. Fivo Enterprise supports webhooks for cost alerts, usage thresholds, and error notifications.
White-Glove Onboarding
Personalized setup assistance by Fivo founding engineers. Included with Enterprise plans. Ensures optimal configuration for your workload.

Z

Zero Data Retention
Mode where no query content is stored on disk. Available on all Fivo plans. Essential for HIPAA and high-security environments.
Zero Downtime
No service interruption during updates or maintenance. Fivo uses rolling deployments. Automatic fallback to direct API ensures your app never goes down.