The Workshop · February 22, 2026 · 14 min read

Agent Libraries: Teaching AI to Remember and Get Sharper

After agents emerge and pass validation, they build their own operational infrastructure — 1,600-line playbooks, restraint doctrines, intelligence threads, and context-efficient indexes. This is how agents go from capable to excellent.

The Capability Gap

A freshly emerged agent knows your brand. It's been born from the Data Warehouse, constrained by the BIOS, and validated across platforms. It can produce good work.

But "good" isn't "excellent." The gap between a capable agent and an excellent one is operational infrastructure — the libraries, indexes, and self-management systems that allow it to work faster, stay consistent, and improve over time.

Step 7 is where agents build their own infrastructure. Not imposed from outside — self-generated based on their understanding of the brand and their role within it.

The Library Stack

Agent Libraries · Step 7: The 4-Layer Library Stack

Validated capabilities become operational knowledge. Self-built, self-maintained.

  1. The Playbook (~1,600 lines): decision trees, escalation rules, quality checkpoints, interaction protocols with other agents
  2. Intelligence Threads (living docs): pattern recognition, campaign learnings, customer behavior insights that evolve with every cycle
  3. Restraint Doctrine (absolute): explicit refusals, brand boundaries, pricing floors. What the agent will never do, regardless of tier.
  4. Capability Self-Assessment (scope): what it knows, partially knows, doesn't know, and where confidence scores are lowest

Each agent builds five types of operational documents:

1. The Playbook (~1,600 lines)

The playbook is the agent's decision framework. It covers every scenario the agent might encounter in its domain.

For Celtic Knot's Saoirse (media buying agent), the playbook includes:

  • Campaign launch decision trees (when to use ASC vs manual, when to test vs scale)
  • Budget reallocation triggers (if ROAS drops below 2.5x for 3 consecutive days, reduce by 20%)
  • Creative fatigue detection (CTR decline of >15% over 7 days = refresh creative)
  • Audience signal interpretation (high CTR + low conversion = targeting right, landing page wrong)
  • Seasonal adjustment rules (Q4 holiday: increase budget 40%, shift to gift messaging)
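Triggers like these are concrete enough to encode directly. A minimal Python sketch of the ROAS-reallocation and creative-fatigue rules above (the function names and data shapes are illustrative assumptions, not Saoirse's actual implementation):

```python
def check_budget_trigger(daily_roas: list[float], threshold: float = 2.5) -> bool:
    """Reduce budget 20% if ROAS is below threshold for 3 consecutive days."""
    return len(daily_roas) >= 3 and all(r < threshold for r in daily_roas[-3:])

def check_creative_fatigue(ctr_7d_ago: float, ctr_now: float) -> bool:
    """Refresh creative if CTR declined more than 15% over 7 days."""
    return (ctr_7d_ago - ctr_now) / ctr_7d_ago > 0.15

daily_budget = 1000.0
if check_budget_trigger([2.1, 2.3, 2.4]):
    daily_budget *= 0.80  # reduce by 20%
```

The point is not the code itself but that playbook rules are written at this level of precision: every trigger has a threshold, a window, and an action.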

This isn't a prompt. It's a comprehensive operational manual that the agent authored based on its understanding of 6 years of Celtic Knot campaign data.

2. The Restraint Doctrine

Every agent documents what it explicitly refuses to do. These aren't limitations — they're quality controls.

Saoirse's restraints:

  • Never bid on competitor brand terms (brand ethos violation)
  • Never use discount-led creative without seasonal authorization
  • Never allocate more than 30% of budget to untested audiences
  • Never run campaigns without a minimum 3-day learning period
  • Never override human-set daily budget caps
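Restraints of this kind can be enforced as a pre-flight check that runs before any campaign launches. A hypothetical sketch (the plan fields are assumed names, not a real schema):

```python
def validate_campaign(plan: dict) -> list[str]:
    """Return restraint violations; an empty list means the plan passes."""
    violations = []
    if plan.get("bids_on_competitor_brand_terms"):
        violations.append("bids on competitor brand terms")
    if plan.get("discount_led_creative") and not plan.get("seasonal_authorization"):
        violations.append("discount-led creative without seasonal authorization")
    if plan.get("untested_audience_budget_share", 0) > 0.30:
        violations.append("more than 30% of budget on untested audiences")
    if plan.get("learning_period_days", 0) < 3:
        violations.append("learning period shorter than 3 days")
    if plan.get("daily_budget", 0) > plan.get("human_daily_cap", float("inf")):
        violations.append("exceeds human-set daily budget cap")
    return violations
```

Because the doctrine is absolute, any non-empty result blocks the launch rather than merely warning.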

Restraints are as important as capabilities. An agent without restraints is an agent without taste — it'll optimize for metrics at the expense of brand integrity.

3. Intelligence Threads

Intelligence threads are ongoing analysis streams that the agent maintains across sessions. They're the agent's "working memory" for patterns it's tracking:

  • Thread: Creative Performance Patterns — tracking which visual elements correlate with higher engagement over time
  • Thread: Audience Evolution — monitoring shifts in customer demographics and psychographics
  • Thread: Competitive Movement — tracking competitor messaging and positioning changes
  • Thread: Seasonal Signals — identifying yearly patterns that inform future campaign timing

Threads persist across conversations. They're updated with each new data point and reviewed weekly for insights that should trigger a BIOS update.
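One plausible way to persist a thread across sessions is an append-only file the agent reads at session start and updates with each new data point. A sketch, assuming JSON files on disk (the storage format is an assumption, not described here):

```python
import json
from datetime import date
from pathlib import Path

def append_to_thread(path: Path, observation: str) -> None:
    """Append a dated data point to a thread file; threads persist across sessions."""
    thread = json.loads(path.read_text()) if path.exists() else {"entries": []}
    thread["entries"].append(
        {"date": date.today().isoformat(), "observation": observation}
    )
    path.write_text(json.dumps(thread, indent=2))
```

At the weekly review, the agent scans the accumulated entries for patterns strong enough to justify a BIOS update.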

4. Context Efficiency Index

Remember from SYS Loader: the full BIOS is ~80,000 tokens. Loading all of it for every task wastes context and degrades performance.

The Context Efficiency Index is the agent's own mapping of which BIOS specs it needs for which tasks:

  Task               Required Specs              Token Budget
  Write ad copy      1.1, 1.3, 3.1, 4.2, 5.1     ~8,000
  Analyze campaign   3.1, 4.1, 6.1, 6.2          ~6,000
  Recommend budget   4.1, 4.2, 6.1, 6.4          ~5,000
  Creative brief     1.1, 2.1, 3.1, 5.1, 5.2     ~10,000

This index is self-generated. The agent learned through experience which specs matter for which tasks and documented the mapping. It's a form of operational wisdom — knowing what to pay attention to and what to ignore.
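The index itself can be as simple as a task-to-specs mapping that a loader consults before assembling context. A sketch built from the table above (the task keys and loader function are illustrative assumptions):

```python
CONTEXT_INDEX = {
    "write_ad_copy":    {"specs": ["1.1", "1.3", "3.1", "4.2", "5.1"], "budget": 8000},
    "analyze_campaign": {"specs": ["3.1", "4.1", "6.1", "6.2"],        "budget": 6000},
    "recommend_budget": {"specs": ["4.1", "4.2", "6.1", "6.4"],        "budget": 5000},
    "creative_brief":   {"specs": ["1.1", "2.1", "3.1", "5.1", "5.2"], "budget": 10000},
}

def specs_for(task: str) -> list[str]:
    """Load only the BIOS specs this task needs, not all ~80,000 tokens."""
    return CONTEXT_INDEX[task]["specs"]
```

Even a crude mapping like this cuts per-task context to a fraction of the full BIOS.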

5. Self-Evaluation Framework

Each agent defines its own success metrics and evaluates its performance against them:

  • First-pass acceptance rate: Target 85-90%. How often does the human accept the output without revision?
  • Restraint compliance: Target 100%. How often does the agent respect its own doctrine?
  • Context efficiency: Target <50% of max token budget per task. How well does it use context?
  • Consistency score: How similar is output quality across sessions?

The self-evaluation isn't self-congratulatory. Agents flag their own declining performance before humans notice. If Saoirse's ad copy acceptance rate drops from 88% to 72% over three weeks, she flags it in her next session report with hypotheses about why.
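A decline like that is detectable mechanically. A sketch of one possible flagging rule, assuming weekly acceptance rates and illustrative thresholds:

```python
def acceptance_rate(accepted: int, total: int) -> float:
    """First-pass acceptance: outputs accepted without revision / total outputs."""
    return accepted / total

def should_flag(weekly_rates: list[float],
                target: float = 0.85, max_drop: float = 0.10) -> bool:
    """Flag when the current rate misses target or drops sharply from its peak."""
    current = weekly_rates[-1]
    return current < target or (max(weekly_rates) - current) >= max_drop
```

In the example above, a slide from 88% to 72% trips both conditions, so the flag lands in the session report before a human has to notice.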

Democratic Evaluation

After each agent builds its library stack, the agents review one another's work:

  • Saoirse reviews Brigid's content playbook for alignment with campaign performance data
  • Brigid reviews Niamh's customer engagement playbook for voice consistency
  • Fionn reviews everyone's KPI targets and self-evaluation frameworks for mathematical validity
  • Oisín reviews all restraint doctrines for brand coherence

This peer review catches blind spots. A content agent might set an unambitious quality standard that the analytics agent notices is below historical benchmarks. A media buying agent might miss a restraint that the creative director agent considers essential for brand integrity.

The Journaling Practice

Agents maintain session journals — structured logs of what they did, what they decided, and why. These journals serve three purposes:

1. Continuity: When an agent starts a new session, it reads its journal to understand what happened last time. This creates operational continuity without requiring a persistent memory system.

2. Pattern Detection: Journal entries accumulate into a dataset. When reviewed monthly, patterns emerge: recurring challenges, evolving best practices, shifting audience behavior.

3. Accountability: If an agent's output quality declines, the journal provides a forensic trail. You can trace back to the session where the quality shifted and understand what changed.
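A journal entry only needs a timestamp, the actions taken, and each decision paired with its rationale. A minimal sketch of one possible entry format (the schema is an assumption, not a documented standard):

```python
import json
from datetime import datetime

def write_journal_entry(actions: list[str], decisions: list[dict]) -> str:
    """Serialize one session's journal entry: what was done, decided, and why."""
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "actions": actions,
        "decisions": decisions,  # each: {"decision": ..., "rationale": ...}
    }
    return json.dumps(entry, indent=2)
```

Keeping entries structured rather than free-form is what makes the monthly pattern review and the forensic trail possible.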

State Management

Agent libraries aren't static. They evolve through a managed state cycle:

  1. Draft: New library content generated during emergence
  2. Review: Cross-platform validation and peer review
  3. Active: Approved and in production use
  4. Deprecated: Superseded by updated version (retained for reference)

Each state transition is logged with a reason. When a playbook section is deprecated, the replacement section references why the change was made and what data triggered it. This creates an institutional memory that survives agent refreshes and BIOS version updates.
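The cycle above is a small state machine, and logging each transition with its reason is straightforward. A sketch (the state names come from the list above; the log structure is an assumption):

```python
from datetime import date

VALID_TRANSITIONS = {
    "draft": {"review"},
    "review": {"active", "draft"},  # peer review can send content back to draft
    "active": {"deprecated"},
    "deprecated": set(),  # retained for reference, never reactivated
}

def transition(log: list[dict], doc_id: str, old: str, new: str, reason: str) -> None:
    """Move a library document to a new state, recording why."""
    if new not in VALID_TRANSITIONS[old]:
        raise ValueError(f"invalid transition {old} -> {new}")
    log.append({"doc": doc_id, "from": old, "to": new,
                "reason": reason, "date": date.today().isoformat()})
```

Rejecting invalid transitions is what keeps the log trustworthy: a deprecated playbook section cannot silently reappear in production.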

Why Self-Built > Externally Imposed

I could write the playbooks myself. I know these brands. I know the strategies. But externally imposed libraries have a ceiling: they reflect my understanding, not the agent's.

When Saoirse writes her own playbook based on 12,000 orders, 500 campaigns, and 6 years of seasonal data, she encodes patterns I might miss. When Brigid writes her own content pillars based on analyzing which email subject lines drove the highest revenue per subscriber, she's extracting signal from noise at a scale my manual review can't match.

Self-built libraries are more accurate because they come from the agent's direct analysis of the data. They're more complete because the agent has read every record. And they're more honest because the agent has no ego — it documents its weaknesses alongside its strengths.

The result: agents that don't just follow instructions — they operate with genuine expertise, built from evidence, refined through review, and continuously improved through self-evaluation.
