# Choosing AI Models
Every Horizon agent is powered by a large language model (LLM) that handles reasoning, language understanding, and response generation. Choosing the right model for each agent is a balance between quality, speed, and cost.
## Available models

Horizon supports multiple model providers and tiers. The exact models available depend on your subscription plan and any custom model connections your organization has configured.
| Tier | Best for | Characteristics |
|---|---|---|
| Flagship | Complex reasoning, multi-step analysis, nuanced writing | Highest quality output, slower response time, highest token cost |
| Standard | General-purpose conversations, report generation, data lookups | Good balance of quality, speed, and cost |
| Fast | High-volume, simple tasks, quick lookups, routing decisions | Fastest response time, lowest cost, may miss nuance in complex requests |
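One way to think about the table above is as a routing decision: match the task type to the cheapest tier that handles it well. The sketch below is purely illustrative; Horizon assigns one model per agent, and the task names and mapping here are assumptions, not part of the product.

```python
# Hypothetical tier-selection table mirroring the "Best for" column above.
# Task-type names are made up for illustration.
TIER_BY_TASK = {
    "routing": "Fast",
    "quick_lookup": "Fast",
    "general_chat": "Standard",
    "report_generation": "Standard",
    "multi_step_analysis": "Flagship",
    "nuanced_writing": "Flagship",
}

def pick_tier(task_type: str) -> str:
    """Pick a tier for a task type, defaulting to Standard when unknown."""
    return TIER_BY_TASK.get(task_type, "Standard")

print(pick_tier("routing"))              # Fast
print(pick_tier("multi_step_analysis"))  # Flagship
```

Defaulting to Standard is a reasonable middle ground when a task does not clearly fit either extreme.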
## Configuring the model

To set or change an agent’s model:

1. Open the agent in the dashboard.
2. Navigate to the Model tab.
3. Select a model tier and specific model version.
4. Optionally adjust advanced parameters (see below).
5. Save.
Changes take effect on the next conversation turn — you do not need to redeploy the agent.
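If your organization manages agents through configuration files or a custom API rather than the dashboard, the same settings might be expressed roughly like this. The field names and values below are illustrative assumptions, not a documented Horizon schema:

```json
{
  "agent": "finance-assistant",
  "model": {
    "tier": "standard",
    "version": "standard-2025-01",
    "parameters": {
      "temperature": 0.3,
      "max_tokens": 2048
    }
  }
}
```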
## Advanced parameters

For most agents, the default parameters work well. These settings are available for fine-tuning:
### Temperature

Controls randomness in the model’s output.
- 0.0 — deterministic, always picks the most likely response. Good for factual lookups and data reporting.
- 0.3-0.7 — balanced. Good for most business conversations.
- 0.8-1.0 — more creative and varied. Good for brainstorming or content generation.
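Under the hood, temperature scales the model's raw token scores (logits) before they are turned into probabilities: low temperature concentrates probability on the top token, high temperature flattens the distribution. The minimal sketch below uses made-up logits to illustrate the effect; Horizon applies this scaling inside the model provider, not in your code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    if temperature == 0.0:
        # Temperature 0 degenerates to argmax: the single most likely
        # token gets probability 1.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens

print(softmax_with_temperature(logits, 0.0))  # deterministic: [1.0, 0.0, 0.0]
print(softmax_with_temperature(logits, 0.7))  # top token still dominates
print(softmax_with_temperature(logits, 1.0))  # flatter, more varied sampling
```

This is why 0.0 is recommended for factual lookups (always the same, most likely answer) while higher values produce more varied output.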
### Max tokens

The maximum number of tokens the model can generate in a single response. Higher values allow longer responses but increase cost and latency.
- Default: 2,048 — sufficient for most conversational responses.
- 4,096-8,192 — appropriate for agents that generate long reports or detailed analysis.
### Context window

The total number of tokens (input + output) the model can process. This includes the agent’s instructions, conversation history, skill results, and the response.
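A back-of-the-envelope check helps when sizing max tokens against the context window: everything listed above plus the planned response must fit inside the window. The numbers below are illustrative and the window size is hypothetical; Horizon does its own accounting.

```python
def fits_context_window(instructions, history, skill_results, max_response, window):
    """Check that input tokens plus the planned response fit the window."""
    input_tokens = instructions + history + skill_results
    return input_tokens + max_response <= window

# Example: a 1,200-token system prompt, 5,000 tokens of history,
# a 2,000-token skill result, and the default 2,048-token response cap,
# checked against a hypothetical 16,384-token window.
print(fits_context_window(1200, 5000, 2000, 2048, 16384))  # True
```

When this check fails in practice, the usual levers are trimming conversation history (see memory summarization below) or lowering the max tokens setting.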
## Token usage and cost

Every interaction with an agent consumes tokens:
| Component | Token consumption |
|---|---|
| Agent instructions (system prompt) | Consumed on every turn |
| Conversation history | Grows with each turn |
| Skill inputs (parameters sent to skills) | Small, typically 100-500 tokens |
| Skill outputs (data returned from services) | Varies widely — a P&L report may be 2,000+ tokens |
| Model response | Varies by response length |
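Summing the components in the table gives a rough per-turn estimate. All counts below are hypothetical placeholders chosen to match the table's examples, not measured values.

```python
def tokens_per_turn(instructions, history, skill_inputs, skill_outputs, response):
    """Rough per-turn token total from the components in the table above."""
    return instructions + history + skill_inputs + skill_outputs + response

turn = tokens_per_turn(
    instructions=1200,   # system prompt, consumed on every turn
    history=3000,        # grows with each turn
    skill_inputs=300,    # parameters sent to skills (typically 100-500)
    skill_outputs=2000,  # e.g. a P&L report
    response=800,        # the generated answer
)
print(turn)  # 7300 tokens for this turn
```

Note that the instructions and history are paid again on every turn, which is why the optimization strategies below focus on keeping both compact.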
### Monitoring usage

Track token consumption from several places:
- Agent Settings > Usage — per-agent token consumption over time.
- Dashboard KPIs — workspace-wide token usage against your wallet balance.
- Billing > Usage Report — detailed breakdown by department, agent, and model.
### Cost optimization strategies

- Right-size your model — do not use Flagship for simple Q&A agents.
- Keep instructions concise — every token in the system prompt is processed on every turn.
- Use memory wisely — configure memory summarization to keep conversation context compact.
- Limit skill output — use pagination and filtering parameters in skills to avoid pulling unnecessary data.
- Set token budgets — configure per-agent and per-department token limits to prevent runaway costs.
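The last strategy, token budgets, amounts to a running counter with a hard cap. Horizon's built-in limits are configured per agent and per department in the dashboard; the minimal sketch below only illustrates the idea and is not the product's implementation.

```python
class TokenBudget:
    """Minimal per-agent token budget guard (illustrative only)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def record(self, tokens):
        """Record usage; return False if the spend would exceed the budget."""
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True

budget = TokenBudget(limit=10_000)
print(budget.record(7_300))  # True: within budget
print(budget.record(4_000))  # False: would push usage past 10,000
```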
## Switching models

You can change an agent’s model at any time without losing conversation history or memory. The agent simply starts using the new model on its next turn.