
Choosing AI Models

Every Horizon agent is powered by a large language model (LLM) that handles reasoning, language understanding, and response generation. Choosing the right model for each agent is a balance between quality, speed, and cost.

Horizon supports multiple model providers and tiers. The exact models available depend on your subscription plan and any custom model connections your organization has configured.

| Tier | Best for | Characteristics |
|---|---|---|
| Flagship | Complex reasoning, multi-step analysis, nuanced writing | Highest-quality output, slower response time, highest token cost |
| Standard | General-purpose conversations, report generation, data lookups | Good balance of quality, speed, and cost |
| Fast | High-volume, simple tasks, quick lookups, routing decisions | Fastest response time, lowest cost, may miss nuance in complex requests |
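As a rough illustration of right-sizing, a routing layer might map task types to tiers. The task labels and mapping below are illustrative assumptions, not Horizon features:

```python
# Hypothetical helper for right-sizing a model tier by task type.
# Tier names come from the table above; the task categories and the
# mapping are assumptions made up for this example.

TIER_BY_TASK = {
    "complex_reasoning": "flagship",
    "report_generation": "standard",
    "data_lookup": "standard",
    "routing": "fast",
    "quick_lookup": "fast",
}

def pick_tier(task_type: str) -> str:
    """Return a model tier for a task, defaulting to the balanced tier."""
    return TIER_BY_TASK.get(task_type, "standard")

tier = pick_tier("routing")        # "fast"
fallback = pick_tier("unknown")    # "standard"
```

The default of Standard mirrors the tier table: it is the reasonable middle ground when a task does not clearly demand Flagship quality or Fast latency.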

To set or change an agent’s model:

  1. Open the agent in the dashboard.
  2. Navigate to the Model tab.
  3. Select a model tier and specific model version.
  4. Optionally adjust advanced parameters (see below).
  5. Save.

Changes take effect on the next conversation turn — you do not need to redeploy the agent.
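Conceptually, the settings saved in steps 3–4 amount to a small configuration object. A sketch in Python, where the field names and the model version label are assumptions for illustration; the real schema is defined by your Horizon workspace:

```python
# Illustrative shape of an agent's model settings -- field names and
# the version label are hypothetical, not the actual Horizon schema.
model_settings = {
    "tier": "standard",
    "model_version": "standard-2024-06",  # hypothetical version label
    "temperature": 0.5,
    "max_response_tokens": 2048,
}
```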

For most agents, the default parameters work well. The following settings are available for finer control:

Temperature

Controls randomness in the model’s output.

  • 0.0 — deterministic, always picks the most likely response. Good for factual lookups and data reporting.
  • 0.3-0.7 — balanced. Good for most business conversations.
  • 0.8-1.0 — more creative and varied. Good for brainstorming or content generation.
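Under the hood, temperature rescales the model’s next-token probabilities before sampling. A generic sketch (not Horizon’s internals) of how lower temperatures sharpen the distribution toward the most likely token:

```python
import math

def token_probs(logits, temperature):
    """Probability of each candidate next token after temperature scaling.
    Generic LLM sampling sketch -- illustrative, not Horizon's internals."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]          # made-up scores for three candidate tokens
low = token_probs(logits, 0.2)    # sharply peaked: near-deterministic
high = token_probs(logits, 1.0)   # flatter: more varied output
```

At low temperature nearly all the probability mass lands on the top token, which is why settings near 0.0 behave deterministically; higher values spread the mass and produce more varied responses.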

Max response tokens

The maximum number of tokens the model can generate in a single response. Higher values allow longer responses but increase cost and latency.

  • Default: 2,048 — sufficient for most conversational responses.
  • 4,096-8,192 — appropriate for agents that generate long reports or detailed analysis.
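A minimal sketch of how a max-tokens cap truncates generation, with a finish reason that distinguishes a cap cut-off from a natural stop. Generic illustration, not Horizon’s implementation:

```python
def generate_with_cap(token_stream, max_tokens):
    """Collect generated tokens until the model stops or the cap is hit.
    Illustrative sketch of max-token truncation, not a real generator."""
    out = []
    for tok in token_stream:
        if len(out) >= max_tokens:
            return out, "length"   # response was cut off at the cap
        out.append(tok)
    return out, "stop"             # model finished naturally

tokens, reason = generate_with_cap(["The", "report", "shows", "growth"], 2)
# tokens == ["The", "report"], reason == "length"
```

A cap that is too low for the agent’s typical responses produces visibly truncated output, which is why report-generating agents need the larger 4,096–8,192 settings.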

Context window

The total number of tokens (input + output) the model can process. This includes the agent’s instructions, conversation history, skill results, and the response.
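When the combined inputs approach the context window, something has to give. A simplified sketch that drops the oldest history turns until the turn fits; real agents often summarize history instead of dropping it outright:

```python
def fit_to_context(system_tokens, history_turns, reserved_output, window):
    """Drop the oldest turns until everything fits in the context window.
    history_turns: per-turn token counts, oldest first.
    Simplified sketch -- production systems usually summarize instead."""
    kept = list(history_turns)
    while kept and system_tokens + sum(kept) + reserved_output > window:
        kept.pop(0)  # drop the oldest turn first
    return kept

# 8,192-token window: 1,200-token system prompt, 2,048 reserved for output
kept = fit_to_context(1200, [1500, 900, 800, 1500, 700, 600], 2048, 8192)
# the oldest 1,500-token turn is dropped; the rest fits
```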

Every interaction with an agent consumes tokens:

| Component | Token consumption |
|---|---|
| Agent instructions (system prompt) | Consumed on every turn |
| Conversation history | Grows with each turn |
| Skill inputs (parameters sent to skills) | Small, typically 100-500 tokens |
| Skill outputs (data returned from services) | Varies widely — a P&L report may be 2,000+ tokens |
| Model response | Varies by response length |
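Putting the table together, a turn’s total consumption is simply the sum of those components. A toy estimate with made-up token counts:

```python
def turn_tokens(system, history, skill_in, skill_out, response):
    """Estimate the tokens one conversation turn consumes by summing the
    components in the table above. Counts here are illustrative only."""
    return system + history + skill_in + skill_out + response

# e.g. a mid-conversation turn that calls a skill returning a P&L report:
total = turn_tokens(system=800, history=1500, skill_in=300,
                    skill_out=2000, response=500)   # 5,100 tokens
```

Note that the history component keeps growing, so later turns in a long conversation cost more than earlier ones.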

Track token consumption from several places:

  • Agent Settings > Usage — per-agent token consumption over time.
  • Dashboard KPIs — workspace-wide token usage against your wallet balance.
  • Billing > Usage Report — detailed breakdown by department, agent, and model.

To keep token costs under control:

  1. Right-size your model — do not use Flagship for simple Q&A agents.
  2. Keep instructions concise — every token in the system prompt is processed on every turn.
  3. Use memory wisely — configure memory summarization to keep conversation context compact.
  4. Limit skill output — use pagination and filtering parameters in skills to avoid pulling unnecessary data.
  5. Set token budgets — configure per-agent and per-department token limits to prevent runaway costs.
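The budget idea in the last tip can be pictured as a counter that refuses further work once the limit is reached. A sketch of the concept only; Horizon’s actual budget enforcement is configured in settings, not written as code:

```python
class TokenBudget:
    """Toy per-agent token budget illustrating limit enforcement.
    Conceptual sketch, not part of the Horizon product."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_spend(self, tokens):
        """Record usage; refuse (return False) if it would exceed the budget."""
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True

budget = TokenBudget(limit=10_000)
ok = budget.try_spend(6_000)        # True: 6,000 of 10,000 used
denied = budget.try_spend(5_000)    # False: would exceed the limit
```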

You can change an agent’s model at any time without losing conversation history or memory. The agent will simply start using the new model on its next turn.