
Agent Memory

Memory is what separates a stateless chatbot from a useful agent. Horizon’s memory system lets agents retain context across conversation turns and even across separate conversations, building up knowledge about your business over time.

Horizon agents have three distinct memory layers (working, episodic, and semantic), each serving a different purpose.

Working memory is the agent’s short-term scratchpad. It holds information relevant to the current task within the active conversation.

  • Scope: current conversation turn and the immediately preceding turns.
  • Lifetime: cleared when the conversation ends (unless promoted to episodic memory).
  • Use case: holding intermediate results, tracking multi-step tasks, remembering what the user just said.

Working memory is automatic — you do not need to configure it. The model’s context window effectively defines its capacity.

Example: A user asks “What are the outstanding invoices for Acme Corp?” The agent queries QuickBooks, receives 15 invoices, and holds them in working memory so it can answer follow-up questions like “Which of those are overdue?” without querying again.
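The invoice example can be pictured as a small session-scoped cache. This is only an illustrative sketch, not Horizon's implementation: `WorkingMemory`, `get_or_fetch`, and `fetch_invoices` are hypothetical names, and the invoice records are made-up stand-ins for a QuickBooks response.

```python
from typing import Any, Callable, Dict, List


class WorkingMemory:
    """Hypothetical session-scoped scratchpad for one conversation."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        # Only call the external system the first time a key is requested.
        if key not in self._items:
            self._items[key] = fetch()
        return self._items[key]

    def clear(self) -> None:
        # Called when the conversation ends.
        self._items.clear()


fetch_log: List[str] = []


def fetch_invoices() -> List[dict]:
    # Stand-in for a real accounting-system query (illustrative data only).
    fetch_log.append("query")
    return [
        {"id": "INV-1", "overdue": True},
        {"id": "INV-2", "overdue": False},
    ]


memory = WorkingMemory()

# Turn 1: "What are the outstanding invoices for Acme Corp?"
invoices = memory.get_or_fetch("acme_invoices", fetch_invoices)

# Turn 2: "Which of those are overdue?" is answered from working memory.
cached = memory.get_or_fetch("acme_invoices", fetch_invoices)
overdue = [inv["id"] for inv in cached if inv["overdue"]]
```

The follow-up question reuses the stored result, so the external query runs once per conversation rather than once per turn.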

Each memory entry has a scope that controls its lifetime:

| Scope | Lifetime | Cleared when |
| --- | --- | --- |
| Session | Active conversation only | Conversation ends |
| Persistent | Across conversations | Manually cleared, retention period expires, or agent is deleted |

  • Working memory is always session-scoped.
  • Episodic memory is always persistent (but subject to retention settings).
  • Semantic memory is always persistent.
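One way to model these scope rules is sketched below. The type names and the `should_clear` helper are assumptions made for illustration and do not reflect Horizon's actual API. The sketch covers only automatic clearing: manual clearing and agent deletion are left out, and the retention check is applied to episodic memory only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Scope(Enum):
    SESSION = "session"
    PERSISTENT = "persistent"


class Layer(Enum):
    WORKING = "working"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"


# Layer-to-scope mapping taken from the three rules above.
LAYER_SCOPE = {
    Layer.WORKING: Scope.SESSION,
    Layer.EPISODIC: Scope.PERSISTENT,
    Layer.SEMANTIC: Scope.PERSISTENT,
}


@dataclass
class MemoryEntry:
    layer: Layer
    content: str
    created_at: datetime

    @property
    def scope(self) -> Scope:
        return LAYER_SCOPE[self.layer]


def should_clear(
    entry: MemoryEntry,
    *,
    conversation_ended: bool,
    now: datetime,
    episodic_retention: timedelta = timedelta(days=90),
) -> bool:
    """Return True if the entry would be cleared automatically."""
    if entry.scope is Scope.SESSION:
        return conversation_ended
    if entry.layer is Layer.EPISODIC:
        # Persistent, but subject to the retention period.
        return now - entry.created_at > episodic_retention
    # Semantic: persistent until removed manually (not modeled here).
    return False
```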

Navigate to the agent’s Memory tab to configure:

  • Episodic retention period — how long conversation summaries are kept. Default: 90 days.
  • Semantic memory limit — maximum number of semantic memories stored. Default: 500.
  • Auto-summarization — whether to compress older episodic memories to save tokens. Default: enabled.
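For teams that track agent configuration in code, the three settings and their documented defaults could be represented roughly as follows. `MemorySettings` and its field names are hypothetical. The settings themselves are configured in the Memory tab, and this sketch does not imply that a matching API exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySettings:
    """Hypothetical mirror of the Memory tab settings and their defaults."""

    episodic_retention_days: int = 90    # how long conversation summaries are kept
    semantic_memory_limit: int = 500     # maximum number of semantic memories
    auto_summarization: bool = True      # compress older episodic memories

    def __post_init__(self) -> None:
        # Basic sanity checks; the platform's real validation rules are unknown.
        if self.episodic_retention_days <= 0:
            raise ValueError("episodic_retention_days must be positive")
        if self.semantic_memory_limit <= 0:
            raise ValueError("semantic_memory_limit must be positive")


defaults = MemorySettings()

# Example: a stricter profile for a compliance-sensitive environment.
strict = MemorySettings(episodic_retention_days=30)
```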

Every semantic memory entry is assigned a confidence score between 0.0 and 1.0 that reflects how reliable the platform considers that memory to be.

| Score range | Meaning | Example |
| --- | --- | --- |
| 0.9 - 1.0 | High confidence: explicitly stated by a user or confirmed multiple times | “Our fiscal year starts April 1” (stated directly by the CFO) |
| 0.7 - 0.89 | Medium confidence: inferred from consistent behavior | The user always asks for accrual-based reports |
| 0.5 - 0.69 | Low confidence: inferred from limited data | A single mention of a preferred report format |
| Below 0.5 | Uncertain: may be outdated or contradicted | A preference from 6 months ago that has not been reconfirmed |

Confidence scores affect how the agent uses memories:

  • High-confidence memories are applied automatically without asking the user.
  • Medium-confidence memories are applied but the agent may mention its assumption (“I’m using accrual-based accounting as usual — let me know if you’d prefer cash basis”).
  • Low-confidence memories prompt the agent to ask for confirmation before applying them.
  • Uncertain memories are available but not proactively used.
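The mapping from score to behavior can be sketched as a simple threshold function. The `Action` names and `action_for` are illustrative assumptions rather than platform identifiers. The sketch treats each band's lower bound as inclusive so that a score such as 0.895 still lands in a band, which is an assumption because the documented ranges do not say how such values are handled.

```python
from enum import Enum


class Action(Enum):
    APPLY_SILENTLY = "apply without asking"
    APPLY_AND_MENTION = "apply, possibly mentioning the assumption"
    ASK_FIRST = "ask for confirmation before applying"
    HOLD_BACK = "keep available, do not use proactively"


def action_for(confidence: float) -> Action:
    """Map a semantic memory's confidence score to the agent's behavior."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.9:
        return Action.APPLY_SILENTLY      # high confidence
    if confidence >= 0.7:
        return Action.APPLY_AND_MENTION   # medium confidence
    if confidence >= 0.5:
        return Action.ASK_FIRST           # low confidence
    return Action.HOLD_BACK               # uncertain
```

For example, `action_for(0.75)` returns `Action.APPLY_AND_MENTION`, which corresponds to the accrual-based accounting message quoted above.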

From the agent’s Memory tab, you can:

  • Browse all memories — see episodic and semantic memories with their confidence scores and timestamps.
  • Search memories — find specific memories by keyword.
  • Edit memories — correct inaccurate memories or update outdated information.
  • Delete memories — remove memories that are no longer relevant.
  • Export memories — download the agent’s memory as JSON for backup or migration.

To keep agent memory accurate and cost-effective:

  1. Let memory build naturally: resist the urge to pre-load hundreds of semantic memories. Agents learn best from real conversations.
  2. Review memories periodically: check the Memory tab monthly to remove outdated or incorrect entries.
  3. Use confidence scores: if an agent keeps making wrong assumptions, check the confidence scores of the memories it is applying, and correct or remove any that are inaccurate.
  4. Set appropriate retention: for compliance-sensitive environments, configure episodic retention to match your data retention policies.
  5. Monitor memory token cost: each retrieved memory consumes tokens. If an agent has hundreds of semantic memories, it may be retrieving too many per turn. Tune the retrieval settings to limit the number of memories injected per turn.
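
To illustrate why limiting memories per turn controls token cost, here is a minimal sketch of a capped selection step. It assumes each candidate memory carries a relevance score and a token count. `ScoredMemory`, `select_for_turn`, and both limits are hypothetical and do not correspond to named Horizon settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredMemory:
    text: str
    relevance: float  # hypothetical retrieval score, higher is more relevant
    tokens: int       # approximate token cost of injecting this memory


def select_for_turn(
    memories: List[ScoredMemory],
    max_memories: int,
    token_budget: int,
) -> List[ScoredMemory]:
    """Greedily pick the most relevant memories within both limits."""
    chosen: List[ScoredMemory] = []
    used = 0
    for memory in sorted(memories, key=lambda m: m.relevance, reverse=True):
        if len(chosen) >= max_memories:
            break
        if used + memory.tokens > token_budget:
            continue  # too expensive; try the next, cheaper candidate
        chosen.append(memory)
        used += memory.tokens
    return chosen


candidates = [
    ScoredMemory("A", relevance=0.9, tokens=50),
    ScoredMemory("B", relevance=0.8, tokens=80),
    ScoredMemory("C", relevance=0.7, tokens=30),
    ScoredMemory("D", relevance=0.6, tokens=10),
]
selected = select_for_turn(candidates, max_memories=3, token_budget=100)
```

With these sample values, memory B is skipped because it would exceed the 100-token budget, and the selection stops once three memories have been chosen.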