How I Built an AI Content Agent — The Complete Workflow

Opening
The article you're reading right now, from topic selection to first draft to review to formatting, was produced entirely by an AI Agent Pipeline. The system has processed 300+ topics across 6 content series, Chinese-first, in both article and tweet formats. This article discloses the full architecture: each Agent's responsibilities, the quality control mechanisms, and real production data.
Problem Background
When you're a one-person content operation, the bottleneck isn't writing itself — it's the entire writing workflow:
- Research: A technical article needs 1-2 hours of research — reading docs, comparing prices, finding data
- Writing: 2-3 hours for a first draft, plus maintaining tonal consistency
- Review: it's hard to spot problems in your own writing
- Formatting: YAML frontmatter, file naming, directory structure — tedious but must be accurate
To produce 300 pieces of content at a human pace (one per day), you'd need an entire year. With the Agent Pipeline, I completed 80% of the first drafts in two weeks, spending the remaining time on human review and iteration.
Core Architecture
Pipeline Overview
Topic Source (300+ topics)
            ↓
┌──────────────────────────────────────────────┐
│               Content Pipeline               │
│                                              │
│  ┌──────────┐        ┌──────────┐            │
│  │ Research │───────→│ Chinese  │            │
│  │  Agent   │        │  Draft   │            │
│  │          │        │  Agent   │            │
│  └──────────┘        └────┬─────┘            │
│                           │                  │
│                      ┌────▼─────┐            │
│                      │ Quality  │            │
│                      │ Reviewer │            │
│                      └────┬─────┘            │
│                           │                  │
│                  ┌────────┴────────┐         │
│                  │                 │         │
│             score >= 75      score < 75      │
│                  │                 │         │
│            ┌─────▼─────┐   ┌──────▼──────┐   │
│            │  Format   │   │  Revision   │   │
│            │   Agent   │   │    Agent    │   │
│            └─────┬─────┘   └──────┬──────┘   │
│                  │                │          │
│                  │    (back to review,       │
│                  │     max 2 rounds)         │
│                  ↓                           │
│             Final Output                     │
└──────────────────────────────────────────────┘
Design Principles
- Single responsibility: Each Agent does one thing only — the researcher doesn't write, the writer doesn't format
- Quality gates: The review Agent uses a scoring system to gate quality; below 75 triggers automatic revision
- Chinese-first: All content is written in Chinese first, not translated from English
- Batch processing: Tweets run 10 per batch, articles 5 per batch, executed in parallel
Implementation Details
Agent 1: Research Agent
The Research Agent's core task: given a topic, use WebSearch to collect facts, prices, and data, then output a structured research report.
import anthropic
import json

class ResearchAgent:
    """Research Agent: collects factual data"""

    def __init__(self):
        # Async client, so calls can be awaited inside the async pipeline
        self.client = anthropic.AsyncAnthropic()

    async def research(self, topic: dict) -> dict:
        """Research a given topic"""
        system_prompt = """You are a professional technical researcher.

Task: Collect the latest factual data for the given topic.

Output format (JSON):
{
  "key_facts": [
    {"fact": "description", "source": "source", "date": "data date"}
  ],
  "pricing_data": {
    "product_name": {"price": "xxx", "source": "url"}
  },
  "statistics": [
    {"metric": "metric name", "value": "value", "context": "context"}
  ],
  "competitor_info": [...],
  "technical_details": [...]
}

Rules:
1. Only output verifiable facts, no speculation
2. Cite the source for each data point
3. Pricing data must include currency and date
4. If a category of information cannot be found, leave the field as an empty array"""
        # NOTE: the WebSearch tool wiring is omitted from this sketch
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=4096,
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": f"""Topic information:
Title: {topic['title']}
Description: {topic['description']}
Series: {topic['series']}
Keywords: {', '.join(topic['keywords'])}"""
            }]
        )
        return json.loads(response.content[0].text)
Agent 2: Chinese Draft Agent
This is the most critical Agent. It receives research data and writing rules, and outputs a Chinese article.
class ChineseDraftAgent:
    """Chinese writing Agent"""

    def __init__(self, series_rules: str, brand_voice: str, anti_ai_patterns: list[str]):
        self.client = anthropic.AsyncAnthropic()
        self.series_rules = series_rules
        self.brand_voice = brand_voice
        self.banned_phrases = anti_ai_patterns

    async def write(self, topic: dict, research_data: dict) -> str:
        """Write an article based on research data"""
        system_prompt = f"""You are Jessie Qin's AI writing assistant.

## Author Background
- CS PhD + NYU Stern Master
- Senior Member of Technical Staff, Generative AI
- Founder of Solo Unicorn Club
- 12 years of living in the US, bilingual Chinese-English thinker

## Writing Rules
{self.series_rules}

## Brand Voice
{self.brand_voice}

## Banned Phrases (any occurrence counts as a quality failure)
{json.dumps(self.banned_phrases, ensure_ascii=False)}

## Core Requirements
1. Write in native Chinese, not translated from English
2. First person, based on hands-on experience
3. 2000-3000 words
4. Must include code examples
5. Must include production data (latency, cost, accuracy)
6. Keep technical terms in English: Agent, RAG, LLM, token, etc.
7. Every paragraph should have information density, no filler

## Article Structure
Opening hook → Problem background → Core architecture → Implementation details (with code) → Field lessons → Conclusion (3 takeaways)"""
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=8192,
            temperature=0.7,  # writer runs at 0.7; the reviewer uses 0.3 (see Pitfall 2)
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": f"""Topic: {topic['title']}
Description: {topic['description']}

Research data:
{json.dumps(research_data, ensure_ascii=False, indent=2)}

Please write the complete article."""
            }]
        )
        return response.content[0].text
Agent 3: Quality Reviewer
class QualityReviewer:
    """Quality review Agent"""

    def __init__(self, rubric: dict, anti_ai_patterns: list[str]):
        self.client = anthropic.AsyncAnthropic()
        self.rubric = rubric
        self.anti_ai_patterns = anti_ai_patterns

    async def review(self, article: str, topic: dict) -> dict:
        """Review article quality"""
        # Step 1: rule-based checks (no LLM needed)
        rule_issues = self._rule_check(article)

        # Step 2: LLM evaluation
        review_prompt = f"""You are a strict content quality reviewer.

## Scoring Criteria (20 points each, 100 total)

1. Content Depth (20 pts)
   - Contains specific data and case studies?
   - Has code examples?
   - Has production environment data?

2. Structural Clarity (20 pts)
   - Logical flow?
   - Natural transitions?
   - Clear hierarchy?

3. Brand Consistency (20 pts)
   - First-person practitioner perspective?
   - Concise, no-nonsense tone?
   - No AI-sounding writing?

4. Technical Accuracy (20 pts)
   - Code syntax correct?
   - Data consistent with descriptions?
   - Terminology used accurately?

5. Original Value (20 pts)
   - Unique perspectives or experiences?
   - Not a generic tutorial?
   - Reader gains something?

Output JSON:
{{
  "total_score": 82,
  "dimensions": {{
    "content_depth": {{"score": 18, "feedback": "..."}},
    "structure": {{"score": 16, "feedback": "..."}},
    "brand_voice": {{"score": 17, "feedback": "..."}},
    "technical_accuracy": {{"score": 15, "feedback": "..."}},
    "originality": {{"score": 16, "feedback": "..."}}
  }},
  "critical_issues": ["..."],
  "improvement_suggestions": ["..."]
}}

Topic: {topic['title']}

Article content:
{article}"""
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=2048,
            temperature=0.3,  # lower than the writer's 0.7 to dampen self-preference (see Pitfall 2)
            messages=[{"role": "user", "content": review_prompt}]
        )
        llm_review = json.loads(response.content[0].text)

        # Merge rule-based checks with the LLM evaluation
        if rule_issues:
            llm_review["critical_issues"].extend(rule_issues)
            # Deduct 5 points per banned phrase
            penalty = len(rule_issues) * 5
            llm_review["total_score"] = max(0, llm_review["total_score"] - penalty)
        return llm_review

    def _rule_check(self, article: str) -> list[str]:
        """Rule-based checks (banned phrases, etc.)"""
        issues = []
        for phrase in self.anti_ai_patterns:
            if phrase in article:
                issues.append(f"Contains banned phrase: '{phrase}'")
        return issues
Agent 4: Revision Agent
class RevisionAgent:
    """Revision Agent: revises articles based on review feedback"""

    def __init__(self):
        self.client = anthropic.AsyncAnthropic()

    async def revise(
        self, article: str, review: dict, attempt: int
    ) -> str:
        """Revise the article based on review feedback"""
        system_prompt = f"""You are an article revision expert.
You've received an article and review feedback. Revise the article according to the feedback.

## Revision Rules
1. Only fix the issues identified in the review, don't overhaul parts that are fine
2. Preserve the overall structure and style of the original
3. If the review asks for additional content, weave it in naturally, don't force it
4. All banned phrases must be replaced
5. This is revision {attempt}/2, please address every issue carefully

Review feedback:
{json.dumps(review, ensure_ascii=False, indent=2)}"""
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=8192,
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": f"Please revise the following article:\n\n{article}"
            }]
        )
        return response.content[0].text
Agent 5: Format Agent
class FormatAgent:
    """Format Agent: generates the final file.
    No LLM client needed: formatting is deterministic."""

    async def format_article(
        self, article: str, topic: dict, quality_score: int
    ) -> str:
        """Add frontmatter, validate format, output final file"""
        # Word count for Chinese text: strip whitespace, count characters
        word_count = len(article.replace(" ", "").replace("\n", ""))
        # Frontmatter lines stay flush-left so the emitted YAML is valid
        frontmatter = f"""---
title: "{topic['title']}"
date: 2026-03-07
series: {topic['series']}
topic_id: {topic['topic_id']}
lang: zh
format: article
word_count: {word_count}
status: draft
quality_score: {quality_score}
images: []
tags: {json.dumps(topic['tags'], ensure_ascii=False)}
twitter_summary: ""
---"""
        return f"{frontmatter}\n\n{article}"
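One detail worth noting: word_count strips whitespace and counts characters, which is a sensible proxy for Chinese word count since Chinese doesn't delimit words with spaces. A quick standalone sanity check of that logic:

```python
def cjk_word_count(text: str) -> int:
    """Same logic as FormatAgent: strip spaces/newlines, count characters."""
    return len(text.replace(" ", "").replace("\n", ""))

# Chinese text: every character counts
print(cjk_word_count("你好 世界\n"))  # 4
# Caveat: in mixed-language text, each English letter counts as one,
# which differs from an English word count
print(cjk_word_count("RAG 检索"))  # 5
```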
Pipeline Orchestration
class ContentPipeline:
    """Content production Pipeline: orchestrates all Agents"""

    def __init__(self, series_config: dict):
        self.research = ResearchAgent()
        self.writer = ChineseDraftAgent(
            series_rules=series_config["rules"],
            brand_voice=series_config["voice"],
            anti_ai_patterns=series_config["banned_phrases"]
        )
        self.reviewer = QualityReviewer(
            rubric=series_config["rubric"],
            anti_ai_patterns=series_config["banned_phrases"]
        )
        self.reviser = RevisionAgent()
        self.formatter = FormatAgent()
        self.max_revision_cycles = 2

    async def produce(self, topic: dict) -> dict:
        """Full production workflow for a single article"""
        # 1. Research
        research_data = await self.research.research(topic)

        # 2. Write first draft
        draft = await self.writer.write(topic, research_data)

        # 3. Review + revision loop
        current_draft = draft
        for cycle in range(self.max_revision_cycles + 1):
            review = await self.reviewer.review(current_draft, topic)

            if review["total_score"] >= 75:
                # Passed review
                final = await self.formatter.format_article(
                    current_draft, topic, review["total_score"]
                )
                return {
                    "status": "approved",
                    "content": final,
                    "score": review["total_score"],
                    "revision_cycles": cycle
                }

            if cycle < self.max_revision_cycles:
                # Didn't pass: revise and re-review
                current_draft = await self.reviser.revise(
                    current_draft, review, cycle + 1
                )

        # Still didn't pass after two revisions: flag for human review
        return {
            "status": "needs_human_review",
            "content": current_draft,
            "score": review["total_score"],
            "last_review": review
        }
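The batch execution mentioned in the design principles (5 articles or 10 tweets in parallel) isn't shown in the pipeline class above. A minimal sketch of how it could be orchestrated with asyncio, using a hypothetical stub in place of ContentPipeline.produce:

```python
import asyncio

async def produce_stub(topic: dict) -> dict:
    """Stand-in for ContentPipeline.produce (same signature assumed)."""
    await asyncio.sleep(0)
    return {"status": "approved", "topic_id": topic["topic_id"]}

async def run_batch(topics: list[dict], concurrency: int = 5) -> list[dict]:
    """Run pipelines in parallel, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(topic: dict) -> dict:
        async with sem:  # cap concurrent API usage
            return await produce_stub(topic)

    # gather preserves input order in its result list
    return await asyncio.gather(*(bounded(t) for t in topics))

results = asyncio.run(run_batch([{"topic_id": i} for i in range(12)]))
print(len(results))  # 12
```

The semaphore is the same mechanism Pitfall 3 below relies on to stay under rate limits.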
Lessons from the Field
Production Data
This system processed 300+ topics across two content formats — tweets and articles:
| Metric | Tweet Series (D+F, 90 pieces) | Article Series (A+B+C+E, 210 pieces) |
|---|---|---|
| Research time per piece | 8s | 25s |
| Writing time per piece | 12s | 45s |
| Review time per piece | 5s | 18s |
| Total cost per piece | $0.06 | $0.22 |
| First-pass approval rate | 78% | 68% |
| Approval after 1 revision | 95% | 91% |
| Needs human review rate | 5% | 9% |
| Batch processing (parallel) | 10/batch, ~2 min | 5/batch, ~4 min |
Total API cost for 300 pieces of content: ~$58. Time investment: two weeks (including system development + human review iterations).
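A back-of-the-envelope check on those numbers, assuming the per-piece figures in the table are first-pass costs (my reading; revision and re-review passes would account for the remainder of the ~$58 total):

```python
# Per-piece API cost × piece count, from the table above
tweet_total = 90 * 0.06     # tweet series (D+F)
article_total = 210 * 0.22  # article series (A+B+C+E)
total = tweet_total + article_total
print(round(total, 2))  # 51.6
```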
Pitfalls We Hit
Pitfall 1: The writer Agent's "creative overreach." Initially, the writer Agent would fabricate production data (e.g., "latency decreased by 47%"). Solution: Emphasize in the system prompt that "all data must come from the research report — do not fabricate," and have the review Agent specifically check whether data has a cited source.
Pitfall 2: The review Agent scores the writer Agent's articles too generously. Using the same model for both writing and reviewing creates a "self-preference" problem. Solution: Use a different temperature for the review Agent (0.3 vs. 0.7 for writing), and emphasize in the prompt: "You and the writer are different people; please evaluate objectively."
Pitfall 3: Rate limits during parallel execution. Running 5 articles through the Claude API simultaneously easily triggers rate limits (Claude API's Sonnet tier defaults to 4,000 RPM). Solution: Add a semaphore to control concurrency, and use exponential backoff for 429 errors.
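The semaphore-plus-backoff combination from Pitfall 3 can be sketched as follows. This is a minimal illustration, not the production code: RateLimitError here stands in for the SDK's 429 exception (the real client raises anthropic.RateLimitError), and the flaky call is a stub.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error."""

async def with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    """Retry an async call on rate-limit errors with exponential backoff + jitter."""
    for attempt in range(max_retries):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Demo: a call that gets rate-limited twice, then succeeds
attempts = {"n": 0}

async def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

result = asyncio.run(with_backoff(flaky_call, base_delay=0.01))
print(result, attempts["n"])  # ok 3
```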
Pitfall 4: Banned phrases slipping through. When the review Agent uses an LLM for checking, it occasionally misses banned phrases. Solution: Use regex matching as a hard check for banned phrases; reserve LLM review for semantic and quality evaluation only. If a rule can handle it, don't hand it to the LLM.
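The hard check from Pitfall 4 can be a single compiled regex, run before any LLM call. A sketch with made-up banned phrases (the real list isn't disclosed here); compiling one escaped alternation keeps the scan to a single pass and leaves room for pattern-style rules later:

```python
import re

# Example AI-sounding Chinese phrases, not the real banned list
BANNED = ["游戏规则改变者", "不难发现", "综上所述"]
# Escape each phrase so it matches literally, then join into one alternation
BANNED_RE = re.compile("|".join(re.escape(p) for p in BANNED))

def rule_check(article: str) -> list[str]:
    """Deterministic banned-phrase check; each hit costs 5 points downstream."""
    return [f"Contains banned phrase: '{m}'" for m in set(BANNED_RE.findall(article))]

issues = rule_check("综上所述，这是一个游戏规则改变者。")
print(len(issues))  # 2
```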
Conclusion
Three core takeaways:
1. The value of an Agent Pipeline lies not in any single Agent's capability, but in the automation of the workflow. Research, writing, review, revision, formatting: none of these steps is individually hard, but orchestrating them into a managed pipeline is what truly saves time.
2. Quality gates are the heart of an Agent Pipeline. A pipeline without a review Agent is a machine for mass-producing garbage. A 75-point threshold, at most 2 revision rounds, and a human fallback: these three layers of gating ensure output quality.
3. Rule-based checks and LLM checks should work in tandem. Use rule-based checks for banned phrases, format validation, and data completeness (deterministic, zero cost). Use LLM evaluation for tonal consistency, content depth, and originality (flexible but variable). The two are complementary.
If you want to build your own content production Agent, start with the simplest two-step setup — one writing Agent + one review Agent. Validate quality on 10 pieces of content, tune the prompts, then add research and formatting stages.
Are you using AI for content production? Which part of the workflow has seen the biggest efficiency gain? I'd love to discuss.