# Choosing AI Models
Every Horizon agent is powered by a large language model (LLM) that handles reasoning, language understanding, and response generation. Choosing the right model for each agent is a balance between quality, speed, and cost.
## Available models

Horizon supports multiple model providers and tiers. The exact models available depend on your subscription plan and any custom model connections your organization has configured.
| Tier | Best for | Characteristics |
|---|---|---|
| Flagship | Complex reasoning, multi-step analysis, nuanced writing | Highest quality output, slower response time, highest token cost |
| Standard | General-purpose conversations, report generation, data lookups | Good balance of quality, speed, and cost |
| Fast | High-volume, simple tasks, quick lookups, routing decisions | Fastest response time, lowest cost, may miss nuance in complex requests |
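One way to think about the table above is as a routing decision: match the task type to the cheapest tier that handles it well. The sketch below is purely illustrative; Horizon assigns one model per agent, and the task names and mapping here are assumptions, not part of the product.

```python
# Hypothetical tier-selection table mirroring the "Best for" column above.
# Task-type names are made up for illustration.
TIER_BY_TASK = {
    "routing": "Fast",
    "quick_lookup": "Fast",
    "general_chat": "Standard",
    "report_generation": "Standard",
    "multi_step_analysis": "Flagship",
    "nuanced_writing": "Flagship",
}

def pick_tier(task_type: str) -> str:
    """Pick a tier for a task type, defaulting to Standard when unknown."""
    return TIER_BY_TASK.get(task_type, "Standard")

print(pick_tier("routing"))              # Fast
print(pick_tier("multi_step_analysis"))  # Flagship
```

Defaulting to Standard is a reasonable middle ground when a task does not clearly fit either extreme.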
## Configuring the model

To set or change an agent’s model:

1. Open the agent in the dashboard.
2. Navigate to the Model tab.
3. Select a model tier and specific model version.
4. Optionally adjust advanced parameters (see below).
5. Save.
Changes take effect on the next conversation turn — you do not need to redeploy the agent.
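If your organization manages agents through configuration files or a custom API rather than the dashboard, the same settings might be expressed roughly like this. The field names and values below are illustrative assumptions, not a documented Horizon schema:

```json
{
  "agent": "finance-assistant",
  "model": {
    "tier": "standard",
    "version": "standard-2025-01",
    "parameters": {
      "temperature": 0.3,
      "max_tokens": 2048
    }
  }
}
```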
## Advanced parameters

For most agents, the default parameters work well. These settings are available for fine-tuning:
### Temperature

Controls randomness in the model’s output.
- 0.0 — deterministic, always picks the most likely response. Good for factual lookups and data reporting.
- 0.3-0.7 — balanced. Good for most business conversations.
- 0.8-1.0 — more creative and varied. Good for brainstorming or content generation.
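Under the hood, temperature scales the model's raw token scores (logits) before they are turned into probabilities: low temperature concentrates probability on the top token, high temperature flattens the distribution. The minimal sketch below uses made-up logits to illustrate the effect; Horizon applies this scaling inside the model provider, not in your code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    if temperature == 0.0:
        # Temperature 0 degenerates to argmax: the single most likely
        # token gets probability 1.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens

print(softmax_with_temperature(logits, 0.0))  # deterministic: [1.0, 0.0, 0.0]
print(softmax_with_temperature(logits, 0.7))  # top token still dominates
print(softmax_with_temperature(logits, 1.0))  # flatter, more varied sampling
```

This is why 0.0 is recommended for factual lookups (always the same, most likely answer) while higher values produce more varied output.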
### Max tokens

The maximum number of tokens the model can generate in a single response. Higher values allow longer responses but increase cost and latency.
- Default: 2,048 — sufficient for most conversational responses.
- 4,096-8,192 — appropriate for agents that generate long reports or detailed analysis.
### Context window

The total number of tokens (input + output) the model can process. This includes the agent’s instructions, conversation history, skill results, and the response.
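A back-of-the-envelope check helps when sizing max tokens against the context window: everything listed above plus the planned response must fit inside the window. The numbers below are illustrative and the window size is hypothetical; Horizon does its own accounting.

```python
def fits_context_window(instructions, history, skill_results, max_response, window):
    """Check that input tokens plus the planned response fit the window."""
    input_tokens = instructions + history + skill_results
    return input_tokens + max_response <= window

# Example: a 1,200-token system prompt, 5,000 tokens of history,
# a 2,000-token skill result, and the default 2,048-token response cap,
# checked against a hypothetical 16,384-token window.
print(fits_context_window(1200, 5000, 2000, 2048, 16384))  # True
```

When this check fails in practice, the usual levers are trimming conversation history (see memory summarization below) or lowering the max tokens setting.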
## Token usage and cost

Every interaction with an agent consumes tokens:
| Component | Token consumption |
|---|---|
| Agent instructions (system prompt) | Consumed on every turn |
| Conversation history | Grows with each turn |
| Skill inputs (parameters sent to skills) | Small, typically 100-500 tokens |
| Skill outputs (data returned from services) | Varies widely — a P&L report may be 2,000+ tokens |
| Model response | Varies by response length |
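Summing the components in the table gives a rough per-turn estimate. All counts below are hypothetical placeholders chosen to match the table's examples, not measured values.

```python
def tokens_per_turn(instructions, history, skill_inputs, skill_outputs, response):
    """Rough per-turn token total from the components in the table above."""
    return instructions + history + skill_inputs + skill_outputs + response

turn = tokens_per_turn(
    instructions=1200,   # system prompt, consumed on every turn
    history=3000,        # grows with each turn
    skill_inputs=300,    # parameters sent to skills (typically 100-500)
    skill_outputs=2000,  # e.g. a P&L report
    response=800,        # the generated answer
)
print(turn)  # 7300 tokens for this turn
```

Note that the instructions and history are paid again on every turn, which is why the optimization strategies below focus on keeping both compact.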
### Monitoring usage

Track token consumption from several places:
- Agent Settings > Usage — per-agent token consumption over time.
- Dashboard KPIs — workspace-wide token usage against your wallet balance.
- Billing > Usage Report — detailed breakdown by department, agent, and model.
### Cost optimization strategies

- Right-size your model — do not use Flagship for simple Q&A agents.
- Keep instructions concise — every token in the system prompt is processed on every turn.
- Use memory wisely — configure memory summarization to keep conversation context compact.
- Limit skill output — use pagination and filtering parameters in skills to avoid pulling unnecessary data.
- Set token budgets — configure per-agent and per-department token limits to prevent runaway costs.
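The last strategy, token budgets, amounts to a running counter with a hard cap. Horizon's built-in limits are configured per agent and per department in the dashboard; the minimal sketch below only illustrates the idea and is not the product's implementation.

```python
class TokenBudget:
    """Minimal per-agent token budget guard (illustrative only)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def record(self, tokens):
        """Record usage; return False if the spend would exceed the budget."""
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True

budget = TokenBudget(limit=10_000)
print(budget.record(7_300))  # True: within budget
print(budget.record(4_000))  # False: would push usage past 10,000
```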
## Switching models

You can change an agent’s model at any time without losing conversation history or memory. The agent simply starts using the new model on its next turn.