Optimizing AI Agent Architecture: From Chaos to 92%

Breakdown by the «ИИ для чайников» editorial team · Updated May 23, 2026

💡 What this guide covers
As an AI agent grows, its system prompt bloats (400+ lines), its tools pile up, and its evals drop. A refactor built on three pillars (skills instead of prompt text, basic tools instead of custom ones, and sparing use of subagents) cuts the system prompt to 15 lines and raises eval success from 62–83% to 92%. Hill climbing on evals is the only way to measure whether a change actually helped.

Agent degradation: Long system prompts (400+ lines) degrade performance as requirements pile up without refactoring

Hill climbing: Run evals → analyze failures → fix architecture → repeat until target metrics hit

Skills over prompts: Move business logic from system prompt to reusable skills (60% reduction possible)

Primitives first: Start with bash, file I/O, and web search. Add custom tools only when necessary

Subagents wisely: Use only for parallelization or separation of concerns (e.g., code review); avoid over-complication

Results (Stock Pilot): 400-line prompt → 15 lines, 12 tools → 3 primitives, eval success 62–83% → 92%

The Problem: The Bloated Agent

Agents like Stock Pilot (inventory management) face typical growth problems:

Long system prompt (~400 lines) from constant addition of business requirements
12 tools, 3 of which are wrappers around isolated subagents
Falling eval performance: the success rate dropped to 62–83%, which is critical for the business

Failed eval examples:

F1 (daily low-stock check): Agent reaches goal but via inefficient, winding path
F2 (sale order process): Communication failure between orchestrator and subagent
R8 (promo-month forecasting): Conflicting policies in different sections of the prompt lead to errors (e.g., using a 1.35x multiplier instead of 3.1x)
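
To get a feel for how costly the R8 conflict can be, here is a small arithmetic sketch. Only the two multipliers come from the case; the baseline of 1,000 units and the function name are hypothetical, chosen purely for illustration.

```python
# Hypothetical illustration: the baseline demand is invented for this example.
# Only the two multipliers (1.35 and 3.1) come from the R8 failure above.

def promo_forecast(baseline_units: int, multiplier: float) -> int:
    """Scale a baseline monthly demand by a promo multiplier."""
    return round(baseline_units * multiplier)

baseline = 1_000  # assumed demand for a normal month

wrong = promo_forecast(baseline, 1.35)  # multiplier the agent used
right = promo_forecast(baseline, 3.1)   # multiplier the promo policy intended

shortfall = right - wrong
print(f"forecast with 1.35x: {wrong} units")
print(f"forecast with 3.1x:  {right} units")
print(f"under-forecast by {shortfall} units ({shortfall / right:.0%} of intended demand)")
```

Under these assumed numbers the agent would plan for less than half of the intended promo demand, which is why a single conflicting line in a long prompt can fail an eval outright.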

Strategy: Hill Climbing on Evals

The methodology of iterative improvement:

Run evals and establish baseline
Analyze failure causes (Claude can help)
Modify agent architecture
Re-run evals and measure improvement
Repeat until target metrics reached

Evals are your compass for systematic improvement. Without them, you're guessing.
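
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the harness used in the case: `EvalCase`, `run_evals`, `hill_climb`, and the two stub agents are all hypothetical names, and the pre-written list of candidates stands in for the manual "analyze failures, then change the architecture" steps.

```python
from dataclasses import dataclass
from typing import Callable

Agent = Callable[[str], str]  # assumption: an agent maps a task prompt to an output


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's output passes


def run_evals(agent: Agent, cases: list[EvalCase]) -> dict[str, bool]:
    """Run every eval case and record pass/fail by case name."""
    return {case.name: case.check(agent(case.prompt)) for case in cases}


def success_rate(results: dict[str, bool]) -> float:
    return sum(results.values()) / len(results)


def hill_climb(candidates: list[Agent], cases: list[EvalCase], target: float):
    """Evaluate candidates in order; keep one only if it beats the best so far."""
    best_agent, best_score = None, -1.0
    for agent in candidates:
        results = run_evals(agent, cases)
        score = success_rate(results)
        failures = [name for name, ok in results.items() if not ok]
        print(f"{agent.__name__}: {score:.0%} passed, failures: {failures}")
        if score > best_score:
            best_agent, best_score = agent, score
        if best_score >= target:
            break  # target metric reached, stop iterating
    return best_agent, best_score


# Stub agents standing in for "before" and "after" an architecture change.
def agent_v1(prompt: str) -> str:
    return "multiplier=1.35" if "promo" in prompt else "ok"


def agent_v2(prompt: str) -> str:
    return "multiplier=3.1" if "promo" in prompt else "ok"


cases = [
    EvalCase("F1", "daily low-stock check", lambda out: out == "ok"),
    EvalCase("R8", "promo-month forecast", lambda out: out == "multiplier=3.1"),
]

best, score = hill_climb([agent_v1, agent_v2], cases, target=0.9)
```

The point of the sketch is the acceptance rule: a change is kept only when the measured score goes up, so every architecture decision is tied to an eval result rather than to intuition.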

Three Pillars of Architecture Modernization

1. Skills Instead of Long Prompts

A skill is packaged, composable information that Claude pulls into context only when needed for a specific task.

Problem: All business processes (policies, procedures) baked into system prompt, polluting the context window.

Solution: Move specific instructions (e.g., forecasting logic) into Skills. Reserve the system prompt for only the most general, always-needed information.

Result: Moving business logic into Skills shrank the system prompt from ~400 to ~50 lines (the final architecture reports 15 lines).

Rule: System prompt = information needed always. Skills = information needed sometimes.
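
The "always vs. sometimes" rule can be illustrated with a plain-Python sketch. This is not the actual Skills file format or API; `Skill`, `SKILLS`, `build_system_prompt`, and `load_skill` are hypothetical names, and the skill contents are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # short summary, always visible to the agent
    body: str         # full instructions, pulled into context only on demand


# Placeholder skills; real ones would hold the actual policies and procedures.
SKILLS = {
    "promo-forecasting": Skill(
        name="promo-forecasting",
        description="How to forecast demand for promo months.",
        body="(full promo forecasting procedure would live here)",
    ),
    "sale-orders": Skill(
        name="sale-orders",
        description="How to process a sale order end to end.",
        body="(full sale order procedure would live here)",
    ),
}

BASE_PROMPT = "You are Stock Pilot, an inventory management agent."


def build_system_prompt(skills: dict[str, Skill]) -> str:
    """Always-needed information plus a one-line index of available skills."""
    index = "\n".join(f"- {s.name}: {s.description}" for s in skills.values())
    return f"{BASE_PROMPT}\n\nAvailable skills:\n{index}"


def load_skill(name: str, skills: dict[str, Skill]) -> str:
    """Return the full skill body; called only when a task needs it."""
    return skills[name].body


system_prompt = build_system_prompt(SKILLS)
print(system_prompt)
```

The system prompt grows by one line per skill, while the detailed procedure only enters the context when `load_skill` is called for a task that needs it.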

2. Primitive Tools Instead of Custom Ones

Philosophy: Give agents the same primitives a human has at a computer.

Recommended sequence:

Start with Claude Code primitives: code execution (bash), file system, web search, task list
Remove what is unnecessary: if the agent doesn't need web search, disable it
Add custom tools only when required
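
The sequence above amounts to simple set arithmetic over tool names. The sketch below assumes illustrative labels for the primitives; they are not identifiers from any real SDK, and `select_tools` and `inventory_api` are hypothetical.

```python
from collections.abc import Iterable

# Illustrative labels for the primitives listed above (not real SDK identifiers).
PRIMITIVES = frozenset({"bash", "file_system", "web_search", "task_list"})


def select_tools(unneeded: Iterable[str] = (), custom: Iterable[str] = ()) -> set[str]:
    """Start from the primitives, drop what the agent does not need, then add custom tools."""
    unneeded = set(unneeded)
    unknown = unneeded - PRIMITIVES
    if unknown:
        raise ValueError(f"not a primitive: {sorted(unknown)}")
    return set(PRIMITIVES - unneeded) | set(custom)


# An agent that never browses the web: disable web search, add nothing custom.
tools = select_tools(unneeded={"web_search"})
print(sorted(tools))
```

Keeping the default path free of custom tools makes every later addition an explicit, reviewable decision.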

Example from the case: Instead of creating separate tools for analyzing each CSV, the agent got bash access. Now it writes Python scripts for data analysis, which:

Drastically reduced token usage (from over 200K tokens to a minimum)
Lowered cost
Increased flexibility
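
As an illustration, this is the kind of short script an agent with bash access might write for a low-stock check. The column names and sample rows are hypothetical; a real agent would read the export from disk rather than from an inline string.

```python
import csv
import io

# Hypothetical inventory export, inlined so the sketch is self-contained.
INVENTORY_CSV = """sku,on_hand,reorder_point
A-100,12,20
B-200,75,50
C-300,0,10
"""


def low_stock(csv_text: str) -> list[dict]:
    """Return SKUs whose on-hand quantity is below the reorder point."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {"sku": r["sku"], "short_by": int(r["reorder_point"]) - int(r["on_hand"])}
        for r in rows
        if int(r["on_hand"]) < int(r["reorder_point"])
    ]


# Only this compact summary goes back into the agent's context.
for item in low_stock(INVENTORY_CSV):
    print(f"{item['sku']}: short by {item['short_by']}")
```

A plausible reason for the token savings is visible here: the raw rows stay on disk and in the script's memory, and only a few summary lines return to the model.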

MCP note: Connect an MCP server only when a toolkit is shared across many agents or clients. Otherwise, letting the agent make API calls through code execution is often more flexible.

3. Deliberate Use of Subagents

When you truly need them:

Parallelization: When you need to throw many Claude instances at one hard task (deep research, code analysis)
Fresh perspective: When you need to separate creation from review (one agent writes, another reviews)
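
Both patterns can be sketched with the standard library. `run_subagent` is a hypothetical stand-in for whatever call starts a fresh agent instance; here it just echoes its input so the sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(role: str, task: str) -> str:
    """Hypothetical stand-in for a call that starts a fresh agent instance."""
    return f"[{role}] {task}"


def research_in_parallel(questions: list[str]) -> list[str]:
    """Parallelization: one independent subagent per question, results in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda q: run_subagent("researcher", q), questions))


def write_then_review(task: str) -> tuple[str, str]:
    """Fresh perspective: the reviewer receives only the draft, not the writer's context."""
    draft = run_subagent("writer", task)
    review = run_subagent("reviewer", f"Review this draft: {draft}")
    return draft, review


findings = research_in_parallel(["supplier lead times", "seasonal demand"])
draft, review = write_then_review("reorder policy summary")
print(findings)
print(review)
```

In both cases the orchestrator passes a small, explicit input and gets a small, explicit output back, which keeps the handoff easy to log and inspect.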

Subagent problems: Complex orchestrator communication, logging and observability issues.

Claude Managed Agents (CMA) solution: Use native Callable Agents for managed subagents with centralized observability.

In Stock Pilot, only one subagent (forecasting) was kept, to isolate that process from the main context.

Final Architecture and Results

Before:

Orchestrator + 400-line prompt
12 tools (3 of them wrappers around subagents)
Eval success: ~62–83%

After (post-refactoring):

Orchestrator on Claude Managed Agents (no infrastructure worries)
3 primitive tools: bash, read, write
System prompt: 15 lines
Business logic moved to Skills
Eval success: ~92%

Key improvements:

Reduced token usage and cost
Higher productivity
Simplified maintenance and architecture

Bloated agents are inevitable as requirements grow, but they are fixable. Use skills to keep prompts lean, start with primitives and add tools only when needed, and deploy subagents deliberately. Most importantly, write evals and use hill climbing to measure every improvement. Claude Managed Agents removes the infrastructure burden so you can focus on agent design itself.

Frequently Asked Questions

How do I know when my agent has become too complex?

Once the system prompt exceeds 100 lines, you are accumulating technical debt. If evals show a falling success rate, or the agent needs many turns to finish simple tasks, it is time to refactor. Use evals to measure the degradation and hill climbing to improve iteratively.

Should I use skills or keep everything in the system prompt?

Use skills for business logic, policies, and procedures that are not needed on every request. The system prompt should stay light and cover only the agent's core behavior. This reduces context pollution and can cut token costs by up to 60%.

When should I create a subagent instead of a tool?

Subagents are for parallelizing independent tasks or for separating responsibilities (one creates, another reviews). For simple, single-step operations, use tools. Subagents add orchestration complexity, so avoid them for simple tasks.

How do I measure the success of a refactor?

Run the evals before and after the refactor. Track success rate, token usage, and latency. Hill climbing is a loop: measure → analyze → improve → re-measure. Evals are the single source of truth; do not rely on subjective impressions.
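
A before/after comparison over these three metrics might look like the sketch below. `EvalRun`, `summarize`, and `compare` are hypothetical names, and the sample numbers are invented for illustration; they are not measurements from the Stock Pilot case.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRun:
    passed: bool
    tokens: int
    latency_s: float


def summarize(runs: list[EvalRun]) -> dict[str, float]:
    """Aggregate the three metrics tracked across a set of eval runs."""
    return {
        "success_rate": mean(1.0 if r.passed else 0.0 for r in runs),
        "avg_tokens": mean(r.tokens for r in runs),
        "avg_latency_s": mean(r.latency_s for r in runs),
    }


def compare(before: list[EvalRun], after: list[EvalRun]) -> dict[str, float]:
    """Return after-minus-before deltas for each metric."""
    b, a = summarize(before), summarize(after)
    return {key: a[key] - b[key] for key in b}


# Invented sample data, for illustration only.
before = [EvalRun(True, 200_000, 40.0), EvalRun(False, 220_000, 55.0)]
after = [EvalRun(True, 30_000, 12.0), EvalRun(True, 34_000, 14.0)]

delta = compare(before, after)
for metric, change in delta.items():
    print(f"{metric}: {change:+}")
```

A refactor counts as an improvement when the success rate delta is positive and the token and latency deltas do not regress beyond what you are willing to pay for it.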