
Building Your First AI Agent Team with Claude + n8n

AI Agent · Claude · n8n · Workflow Automation · Multi-Agent Systems · Hands-On

Introduction

Last month I built a 5-agent team using the Claude API and n8n that knocked out a content production task that would normally take 3 days — research, writing, editing, formatting, all fully automated — for under $8 in API costs. This article breaks down the entire build process, including the pitfalls I hit and the architecture that finally worked.

The Problem

Lots of people have heard of multi-agent systems, but actually building one surfaces a few real problems:

First, how do agents communicate? You have one agent doing research and another writing the article — how do you pass data between them? How do you keep the format consistent?

Second, how do you handle errors? If the research agent returns garbage data, the writing agent downstream will produce even bigger garbage. A multi-agent system without quality gates is worse than a single agent.

Third, how do you debug? With 5 agents chained together, when the final output is wrong, how do you pinpoint which step went sideways?

n8n solves all three. It's an open-source workflow automation platform originally designed for API orchestration, but its visual nodes, webhooks, and error-handling mechanisms make it a natural fit as an agent orchestration layer.

Core Architecture

Design Principles

  1. Stateless agents: Each agent only cares about its own input and output — it doesn't need to know who's upstream or downstream
  2. n8n handles orchestration: All agent invocation order, conditional branching, and retry logic lives in n8n
  3. Quality gates: Critical nodes have scoring mechanisms that automatically trigger revision workflows when results fall below threshold
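
The stateless-agent principle boils down to a function from input JSON to output JSON. A minimal sketch of that contract (this framing is my own, not an n8n API; `callModel` stands in for whatever actually calls the Claude API):

```javascript
// A stateless agent is just a function from input JSON to output JSON.
// (sketch; `makeAgent` and `callModel` are my own names, not an n8n API)
function makeAgent(systemPrompt, callModel) {
  return async function run(inputJson) {
    // The agent sees only its own input; ordering and branching live in n8n
    return callModel(systemPrompt, JSON.stringify(inputJson));
  };
}
```

Because each agent closes over nothing but its own system prompt, you can reorder, swap, or parallelize agents in n8n without touching their code.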

Architecture Overview

Trigger (Webhook / Schedule)
    |
n8n Main Workflow
    |-- Node 1: Research Agent (Claude Sonnet)
    |       | Output: Structured research data (JSON)
    |-- Node 2: Writing Agent (Claude Sonnet)
    |       | Output: Article draft (Markdown)
    |-- Node 3: Review Agent (Claude Haiku)
    |       | Output: Score + revision notes
    |-- Conditional Branch: Score >= 75?
    |       |-- Yes -> Node 4: Formatting Agent
    |       |-- No  -> Node 5: Revision Agent -> back to Review
    |-- Output: Final file written to storage

Key Components

Research Agent: Receives topic keywords, generates a search strategy via the Claude API, calls external APIs to gather data, and outputs structured JSON.

Writing Agent: Receives the research data plus writing rules (tone, word count, structure template), and outputs a Markdown article.

Review Agent: Uses Claude Haiku (low cost, high speed) to score the article against preset criteria, outputting a score and specific revision suggestions.

Implementation Details

Step 1: Setting Up the n8n Environment

n8n supports Docker deployment or cloud hosting (Pro at $60/month, includes 10,000 executions). Self-hosting is cheaper:

# Deploy n8n with Docker
docker run -it --rm \
  --name n8n \
  -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  -e N8N_AI_ENABLED=true \
  docker.n8n.io/n8nio/n8n

Key configuration: Enable the N8N_AI_ENABLED environment variable. This activates n8n's AI node panel, which includes Agent nodes, Tool nodes, and Memory nodes.

Step 2: Creating the Research Agent Node

In n8n, each agent is essentially an HTTP Request node calling the Claude API:

// Claude API call in an n8n Code node (via the built-in httpRequest helper)
const response = await this.helpers.httpRequest({
  method: 'POST',
  url: 'https://api.anthropic.com/v1/messages',
  headers: {
    'x-api-key': $env.ANTHROPIC_API_KEY,
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json',
  },
  body: {
    model: 'claude-sonnet-4-5-20250514',
    max_tokens: 4096,
    system: `You are a professional researcher. Given a topic, output structured research data.
    Output format must be JSON:
    {
      "key_facts": [...],      // Core facts with data sources
      "market_data": {...},     // Market data
      "competitor_analysis": [] // Competitor analysis
    }`,
    messages: [
      {
        role: 'user',
        content: `Research topic: ${$input.item.json.topic}`,
      },
    ],
  },
});

// Parse and pass to the next node
return { json: JSON.parse(response.content[0].text) };

One gotcha here: Claude sometimes wraps the returned JSON in Markdown code blocks (```json ... ```). The fix is to clean the response before parsing:

// Clean Claude's JSON response before JSON.parse
// (n8n Code nodes run plain JavaScript, so no TypeScript annotations)
function cleanJsonResponse(text) {
  // Strip Markdown code block markers
  const cleaned = text.replace(/```json\n?/g, '').replace(/```\n?/g, '');
  return cleaned.trim();
}
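
In practice I wrap the cleanup and the parse in one helper, so a malformed response fails loudly instead of silently corrupting downstream nodes. A self-contained sketch (the name `parseAgentOutput` is mine):

```javascript
// Strip Markdown code fences, then parse; fail loudly on bad output
// (sketch; `parseAgentOutput` is my own helper name)
function parseAgentOutput(text) {
  const cleaned = text.replace(/```json\n?/g, '').replace(/```\n?/g, '').trim();
  try {
    return JSON.parse(cleaned);
  } catch (err) {
    // A clear error message lets n8n's Error Trigger catch format failures
    throw new Error(`Agent returned non-JSON output: ${cleaned.slice(0, 80)}`);
  }
}
```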

Step 3: Writing Agent Node

The heart of the writing agent is its system prompt. I packed the brand voice rules, article structure template, and banned-words list all into it:

# Writing Agent's system prompt structure (simplified)
WRITING_SYSTEM_PROMPT = """
You are a technical content writing expert.

## Writing Rules
- Tone: Professional yet approachable, first person
- Word count: 2000-3000 words
- Structure: Opening hook -> Problem -> Solution -> Code examples -> Real-world experience -> Summary
- Banned words list: {banned_words}

## Input
You will receive a JSON-formatted research dataset. Write based on it.

## Output
Markdown-formatted article with YAML frontmatter.
"""

Step 4: Review Agent + Conditional Branching

This is the most critical part of the entire system. The review agent uses Claude Haiku ($1/M input, $5/M output), which keeps costs extremely low:

// Review Agent scoring logic
const reviewPrompt = `
Evaluate this article's quality across these 5 dimensions (20 points each, 100 total):

1. Depth of content: Does it include specific data and case studies?
2. Structural clarity: Is the logic coherent?
3. Tone consistency: Does it match the brand voice?
4. Technical accuracy: Are code and technical descriptions correct?
5. Originality: Does it offer a unique perspective?

Output JSON:
{
  "total_score": 82,
  "dimensions": { ... },
  "issues": ["Specific issue 1", "Specific issue 2"],
  "suggestions": ["Revision suggestion 1", "Revision suggestion 2"]
}

Article content:
${$input.item.json.article}
`;

n8n's Switch node can route based on the score:

  • total_score >= 75 -> proceed to the formatting node
  • total_score < 75 -> proceed to the revision node, then back to review (max 2 loops)
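
To enforce the two-loop cap, I carry a revision counter on the item itself. The routing decision can be sketched as a pure function (field names like `revision_count` are my own convention, not an n8n built-in):

```javascript
// Quality-gate routing with a revision-loop cap
// (sketch; `revision_count` is a field I add to the item JSON)
const PASS_THRESHOLD = 75;
const MAX_REVISIONS = 2;

function routeAfterReview(item) {
  const revisions = item.revision_count || 0;
  if (item.total_score >= PASS_THRESHOLD) {
    return { route: 'format', item };
  }
  if (revisions >= MAX_REVISIONS) {
    // After two failed revision loops, stop and flag for a human
    return { route: 'manual_review', item };
  }
  // Increment the counter so the Switch node can enforce the cap next pass
  return { route: 'revise', item: { ...item, revision_count: revisions + 1 } };
}
```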

Step 5: Error Handling and Retries

n8n has a built-in Error Trigger node. For each agent node, I configured:

  • Timeout: Retry after 30 seconds of no response (Claude API can occasionally be slow)
  • Format errors: If JSON parsing fails, re-request with an added instruction: "Output strictly valid JSON with no additional explanation"
  • API rate limiting: 429 errors trigger exponential backoff retries (1s, 2s, 4s)

// n8n retry configuration (in node Settings)
{
  "retryOnFail": true,
  "maxTries": 3,
  "waitBetweenTries": 2000  // milliseconds
}
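
The 429 backoff goes beyond what the fixed-wait retry settings above cover, so I handle it in a Code node. A sketch of the wrapper (all names are mine; `request` stands in for the actual Claude API call):

```javascript
// Exponential backoff for 429 rate-limit errors: waits 1s, 2s, 4s
// (sketch; `request` stands in for the actual Claude API call)
async function withBackoff(request, maxTries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt < maxTries; attempt += 1) {
    try {
      return await request();
    } catch (err) {
      const lastTry = attempt === maxTries - 1;
      if (err.status !== 429 || lastTry) throw err; // only retry rate limits
      // Delay doubles each attempt: 1000ms, 2000ms, 4000ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```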

Real-World Results

Production Data

I ran this system through a batch of 50 articles. Here are the numbers:

| Metric | Value |
| --- | --- |
| Average time per article | 4 min 12 sec |
| Research Agent token usage | ~2,800 input / ~1,500 output |
| Writing Agent token usage | ~4,200 input / ~3,800 output |
| Review Agent token usage | ~5,000 input / ~800 output |
| Average API cost per article | $0.15 (Sonnet) + $0.03 (Haiku) = $0.18 |
| First-pass rate (>= 75 points) | 72% |
| Pass rate after one revision | 94% |

Pitfalls I Hit

Pitfall 1: Agent role confusion. Initially my writing agent's prompt was too vague, so it would sometimes "go rogue" and do its own research, producing content inconsistent with the research agent's data. The fix: explicitly state in the system prompt, "You may only use the provided research data. Do not supplement with additional information."

Pitfall 2: n8n's built-in AI Agent node vs. custom HTTP node? n8n has a built-in AI Agent node that connects directly to Anthropic. But I found that custom HTTP Request nodes offer more flexibility — you get precise control over prompts, temperature, and output format. The built-in node is great for quick prototyping; for production, go custom.

Pitfall 3: Memory issues with parallel execution. n8n keeps all node data in memory by default. When 5 agents process long articles simultaneously, a single workflow's memory usage can exceed 500MB. The fix: split batch jobs into sub-workflows, each handling one article.

When to Use This

Good fit:

  • Content production pipelines (research -> writing -> review)
  • Data processing pipelines (extract -> clean -> analyze -> report)
  • Automated customer ticket classification and initial responses

Not a good fit:

  • Scenarios requiring real-time dialogue between agents (n8n executes sequentially and doesn't support multi-turn conversation orchestration)
  • Scenarios requiring complex state management (agents needing cross-task context)
  • Low-latency requirements (< 1 second response)

Comparison

| Dimension | n8n + Claude API | LangChain/LangGraph | CrewAI |
| --- | --- | --- | --- |
| Learning curve | Low (visual drag-and-drop) | Medium (requires coding) | Medium (intuitive role abstraction) |
| Flexibility | Medium (limited by node types) | High (fully programmable) | Medium (framework constraints) |
| Debug experience | Excellent (visual execution logs) | Average (text logs) | Average |
| Production ops | Built-in monitoring and retries | Build your own | Build your own |
| Self-hosting cost | Just Docker | Custom deployment | Custom deployment |
| Best for | Ops / product teams | Engineering teams | Engineering teams |

Takeaways

Three core takeaways:

  1. n8n is the best entry point for multi-agent orchestration — visual nodes let you see the data flow intuitively, and the built-in error handling and retry mechanisms save you from writing a ton of infrastructure code.

  2. Quality gates are the lifeblood of a multi-agent system — an agent chain without a review node is an assembly line without brakes. Using Claude Haiku for review keeps costs negligible ($0.03 per article).

  3. Get a single pipeline working before scaling horizontally — don't start by trying to run 10 agents in parallel. Begin with a simple three-node flow (research -> writing -> review) to validate the logic, then add nodes and parallelism from there.

If you're ready to build your own agent team, I'd suggest starting with the simplest possible two-node workflow: one agent generates content, another reviews it. You can have it running within two days, then iterate from there.
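
That two-node starting point can be sketched as a single orchestration function. Here the writer and reviewer are injected as plain functions so the loop is testable without API calls (all names are mine):

```javascript
// Minimal two-agent loop: one writer, one reviewer, a single revision pass
// (sketch; `writer`/`reviewer` are injected stand-ins for Claude API calls)
async function twoAgentPipeline(writer, reviewer, topic) {
  let draft = await writer(topic, []);
  let review = await reviewer(draft);
  if (review.total_score < 75) {
    // One revision pass, reusing the reviewer's suggestions
    draft = await writer(topic, review.suggestions);
    review = await reviewer(draft);
  }
  return { draft, score: review.total_score };
}
```

Once this loop works end to end, each later agent (research, formatting) is just another node dropped into the same chain.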

What problems have you run into when building multi-agent systems? Let's talk about it in the comments.