How I Built an AI Content Agent — The Complete Workflow

Opening
The article you're reading right now, from topic selection to first draft to review to formatting, was produced entirely by an AI Agent Pipeline. The system has processed 300+ topics across 6 content series, Chinese-first, in both article and tweet formats. This article discloses the full architecture: each Agent's responsibilities, the quality control mechanisms, and real production data.
Problem Background
When you're a one-person content operation, the bottleneck isn't writing itself — it's the entire writing workflow:
- Research: A technical article needs 1-2 hours of research — reading docs, comparing prices, finding data
- Writing: 2-3 hours for a first draft, plus maintaining tonal consistency
- Review: it's hard to spot problems in your own writing
- Formatting: YAML frontmatter, file naming, directory structure — tedious but must be accurate
To produce 300 pieces of content at a human pace (one per day), you'd need an entire year. With the Agent Pipeline, I completed 80% of the first drafts in two weeks, spending the remaining time on human review and iteration.
Core Architecture
Pipeline Overview
Topic Source (300+ topics)
            ↓
┌──────────────────────────────────────────────┐
│               Content Pipeline               │
│                                              │
│  ┌──────────┐        ┌──────────┐            │
│  │ Research │───────→│ Chinese  │            │
│  │  Agent   │        │  Draft   │            │
│  │          │        │  Agent   │            │
│  └──────────┘        └────┬─────┘            │
│                           │                  │
│                      ┌────▼─────┐            │
│                      │ Quality  │            │
│                      │ Reviewer │            │
│                      └────┬─────┘            │
│                           │                  │
│                  ┌────────┴────────┐         │
│                  │                 │         │
│             score >= 75      score < 75      │
│                  │                 │         │
│            ┌─────▼─────┐   ┌──────▼──────┐   │
│            │  Format   │   │  Revision   │   │
│            │   Agent   │   │    Agent    │   │
│            └─────┬─────┘   └──────┬──────┘   │
│                  │                │          │
│                  │    (back to review,       │
│                  │     max 2 rounds)         │
│                  ↓                           │
│             Final Output                     │
└──────────────────────────────────────────────┘
Design Principles
- Single responsibility: Each Agent does one thing only — the researcher doesn't write, the writer doesn't format
- Quality gates: The review Agent uses a scoring system to gate quality; below 75 triggers automatic revision
- Chinese-first: All content is written in Chinese first, not translated from English
- Batch processing: Tweets run 10 per batch, articles 5 per batch, executed in parallel
Implementation Details
Agent 1: Research Agent
The Research Agent's core task: given a topic, use WebSearch to collect facts, prices, and data, then output a structured research report.
import anthropic
import json

class ResearchAgent:
    """Research Agent: collects factual data"""

    def __init__(self):
        # Async client, so calls can be awaited inside the async pipeline
        self.client = anthropic.AsyncAnthropic()

    async def research(self, topic: dict) -> dict:
        """Research a given topic"""
        system_prompt = """You are a professional technical researcher.

Task: Collect the latest factual data for the given topic.

Output format (JSON):
{
  "key_facts": [
    {"fact": "description", "source": "source", "date": "data date"}
  ],
  "pricing_data": {
    "product_name": {"price": "xxx", "source": "url"}
  },
  "statistics": [
    {"metric": "metric name", "value": "value", "context": "context"}
  ],
  "competitor_info": [...],
  "technical_details": [...]
}

Rules:
1. Only output verifiable facts, no speculation
2. Cite the source for each data point
3. Pricing data must include currency and date
4. If a category of information cannot be found, leave the field as an empty array"""
        # NOTE: the WebSearch tool wiring is omitted from this sketch
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=4096,
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": f"""Topic information:
Title: {topic['title']}
Description: {topic['description']}
Series: {topic['series']}
Keywords: {', '.join(topic['keywords'])}"""
            }]
        )
        return json.loads(response.content[0].text)
Agent 2: Chinese Draft Agent
This is the most critical Agent. It receives research data and writing rules, and outputs a Chinese article.
class ChineseDraftAgent:
    """Chinese writing Agent"""

    def __init__(self, series_rules: str, brand_voice: str, anti_ai_patterns: list[str]):
        self.client = anthropic.AsyncAnthropic()
        self.series_rules = series_rules
        self.brand_voice = brand_voice
        self.banned_phrases = anti_ai_patterns

    async def write(self, topic: dict, research_data: dict) -> str:
        """Write an article based on research data"""
        system_prompt = f"""You are Jessie Qin's AI writing assistant.

## Author Background
- CS PhD + NYU Stern Master
- Senior Member of Technical Staff, Generative AI
- Founder of Solo Unicorn Club
- 12 years of living in the US, bilingual Chinese-English thinker

## Writing Rules
{self.series_rules}

## Brand Voice
{self.brand_voice}

## Banned Phrases (any occurrence counts as a quality failure)
{json.dumps(self.banned_phrases, ensure_ascii=False)}

## Core Requirements
1. Write in native Chinese, not translated from English
2. First person, based on hands-on experience
3. 2000-3000 words
4. Must include code examples
5. Must include production data (latency, cost, accuracy)
6. Keep technical terms in English: Agent, RAG, LLM, token, etc.
7. Every paragraph should have information density, no filler

## Article Structure
Opening hook → Problem background → Core architecture → Implementation details (with code) → Field lessons → Conclusion (3 takeaways)"""
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=8192,
            temperature=0.7,  # writer runs at 0.7; the reviewer uses 0.3 (see Pitfall 2)
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": f"""Topic: {topic['title']}
Description: {topic['description']}

Research data:
{json.dumps(research_data, ensure_ascii=False, indent=2)}

Please write the complete article."""
            }]
        )
        return response.content[0].text
Agent 3: Quality Reviewer
class QualityReviewer:
    """Quality review Agent"""

    def __init__(self, rubric: dict, anti_ai_patterns: list[str]):
        self.client = anthropic.AsyncAnthropic()
        self.rubric = rubric
        self.anti_ai_patterns = anti_ai_patterns

    async def review(self, article: str, topic: dict) -> dict:
        """Review article quality"""
        # Step 1: rule-based checks (no LLM needed)
        rule_issues = self._rule_check(article)

        # Step 2: LLM evaluation
        review_prompt = f"""You are a strict content quality reviewer.

## Scoring Criteria (20 points each, 100 total)

1. Content Depth (20 pts)
   - Contains specific data and case studies?
   - Has code examples?
   - Has production environment data?

2. Structural Clarity (20 pts)
   - Logical flow?
   - Natural transitions?
   - Clear hierarchy?

3. Brand Consistency (20 pts)
   - First-person practitioner perspective?
   - Concise, no-nonsense tone?
   - No AI-sounding writing?

4. Technical Accuracy (20 pts)
   - Code syntax correct?
   - Data consistent with descriptions?
   - Terminology used accurately?

5. Original Value (20 pts)
   - Unique perspectives or experiences?
   - Not a generic tutorial?
   - Reader gains something?

Output JSON:
{{
  "total_score": 82,
  "dimensions": {{
    "content_depth": {{"score": 18, "feedback": "..."}},
    "structure": {{"score": 16, "feedback": "..."}},
    "brand_voice": {{"score": 17, "feedback": "..."}},
    "technical_accuracy": {{"score": 15, "feedback": "..."}},
    "originality": {{"score": 16, "feedback": "..."}}
  }},
  "critical_issues": ["..."],
  "improvement_suggestions": ["..."]
}}

Topic: {topic['title']}

Article content:
{article}"""
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=2048,
            temperature=0.3,  # lower than the writer's 0.7 to dampen self-preference (see Pitfall 2)
            messages=[{"role": "user", "content": review_prompt}]
        )
        llm_review = json.loads(response.content[0].text)

        # Merge rule-based checks with the LLM evaluation
        if rule_issues:
            llm_review["critical_issues"].extend(rule_issues)
            # Deduct 5 points per banned phrase
            penalty = len(rule_issues) * 5
            llm_review["total_score"] = max(0, llm_review["total_score"] - penalty)
        return llm_review

    def _rule_check(self, article: str) -> list[str]:
        """Rule-based checks (banned phrases, etc.)"""
        issues = []
        for phrase in self.anti_ai_patterns:
            if phrase in article:
                issues.append(f"Contains banned phrase: '{phrase}'")
        return issues
Agent 4: Revision Agent
class RevisionAgent:
    """Revision Agent: revises articles based on review feedback"""

    def __init__(self):
        self.client = anthropic.AsyncAnthropic()

    async def revise(
        self, article: str, review: dict, attempt: int
    ) -> str:
        """Revise the article based on review feedback"""
        system_prompt = f"""You are an article revision expert.
You've received an article and review feedback. Revise the article according to the feedback.

## Revision Rules
1. Only fix the issues identified in the review, don't overhaul parts that are fine
2. Preserve the overall structure and style of the original
3. If the review asks for additional content, weave it in naturally, don't force it
4. All banned phrases must be replaced
5. This is revision {attempt}/2, please address every issue carefully

Review feedback:
{json.dumps(review, ensure_ascii=False, indent=2)}"""
        response = await self.client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=8192,
            system=system_prompt,
            messages=[{
                "role": "user",
                "content": f"Please revise the following article:\n\n{article}"
            }]
        )
        return response.content[0].text
Agent 5: Format Agent
class FormatAgent:
    """Format Agent: generates the final file.
    No LLM client needed: formatting is deterministic."""

    async def format_article(
        self, article: str, topic: dict, quality_score: int
    ) -> str:
        """Add frontmatter, validate format, output final file"""
        # Word count for Chinese text: strip whitespace, count characters
        word_count = len(article.replace(" ", "").replace("\n", ""))
        # Frontmatter lines stay flush-left so the emitted YAML is valid
        frontmatter = f"""---
title: "{topic['title']}"
date: 2026-03-07
series: {topic['series']}
topic_id: {topic['topic_id']}
lang: zh
format: article
word_count: {word_count}
status: draft
quality_score: {quality_score}
images: []
tags: {json.dumps(topic['tags'], ensure_ascii=False)}
twitter_summary: ""
---"""
        return f"{frontmatter}\n\n{article}"
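One detail worth noting: word_count strips whitespace and counts characters, which is a sensible proxy for Chinese word count since Chinese doesn't delimit words with spaces. A quick standalone sanity check of that logic:

```python
def cjk_word_count(text: str) -> int:
    """Same logic as FormatAgent: strip spaces/newlines, count characters."""
    return len(text.replace(" ", "").replace("\n", ""))

# Chinese text: every character counts
print(cjk_word_count("你好 世界\n"))  # 4
# Caveat: in mixed-language text, each English letter counts as one,
# which differs from an English word count
print(cjk_word_count("RAG 检索"))  # 5
```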
Pipeline Orchestration
class ContentPipeline:
    """Content production Pipeline: orchestrates all Agents"""

    def __init__(self, series_config: dict):
        self.research = ResearchAgent()
        self.writer = ChineseDraftAgent(
            series_rules=series_config["rules"],
            brand_voice=series_config["voice"],
            anti_ai_patterns=series_config["banned_phrases"]
        )
        self.reviewer = QualityReviewer(
            rubric=series_config["rubric"],
            anti_ai_patterns=series_config["banned_phrases"]
        )
        self.reviser = RevisionAgent()
        self.formatter = FormatAgent()
        self.max_revision_cycles = 2

    async def produce(self, topic: dict) -> dict:
        """Full production workflow for a single article"""
        # 1. Research
        research_data = await self.research.research(topic)

        # 2. Write first draft
        draft = await self.writer.write(topic, research_data)

        # 3. Review + revision loop
        current_draft = draft
        for cycle in range(self.max_revision_cycles + 1):
            review = await self.reviewer.review(current_draft, topic)

            if review["total_score"] >= 75:
                # Passed review
                final = await self.formatter.format_article(
                    current_draft, topic, review["total_score"]
                )
                return {
                    "status": "approved",
                    "content": final,
                    "score": review["total_score"],
                    "revision_cycles": cycle
                }

            if cycle < self.max_revision_cycles:
                # Didn't pass: revise and re-review
                current_draft = await self.reviser.revise(
                    current_draft, review, cycle + 1
                )

        # Still didn't pass after two revisions: flag for human review
        return {
            "status": "needs_human_review",
            "content": current_draft,
            "score": review["total_score"],
            "last_review": review
        }
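The batch execution mentioned in the design principles (5 articles or 10 tweets in parallel) isn't shown in the pipeline class above. A minimal sketch of how it could be orchestrated with asyncio, using a hypothetical stub in place of ContentPipeline.produce:

```python
import asyncio

async def produce_stub(topic: dict) -> dict:
    """Stand-in for ContentPipeline.produce (same signature assumed)."""
    await asyncio.sleep(0)
    return {"status": "approved", "topic_id": topic["topic_id"]}

async def run_batch(topics: list[dict], concurrency: int = 5) -> list[dict]:
    """Run pipelines in parallel, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(topic: dict) -> dict:
        async with sem:  # cap concurrent API usage
            return await produce_stub(topic)

    # gather preserves input order in its result list
    return await asyncio.gather(*(bounded(t) for t in topics))

results = asyncio.run(run_batch([{"topic_id": i} for i in range(12)]))
print(len(results))  # 12
```

The semaphore is the same mechanism Pitfall 3 below relies on to stay under rate limits.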
Lessons from the Field
Production Data
This system processed 300+ topics across two content formats — tweets and articles:
| Metric | Tweet Series (D+F, 90 pieces) | Article Series (A+B+C+E, 210 pieces) |
|---|---|---|
| Research time per piece | 8s | 25s |
| Writing time per piece | 12s | 45s |
| Review time per piece | 5s | 18s |
| Total cost per piece | $0.06 | $0.22 |
| First-pass approval rate | 78% | 68% |
| Approval after 1 revision | 95% | 91% |
| Needs human review rate | 5% | 9% |
| Batch processing (parallel) | 10/batch, ~2 min | 5/batch, ~4 min |
Total API cost for 300 pieces of content: ~$58. Time investment: two weeks (including system development + human review iterations).
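A back-of-the-envelope check on those numbers, assuming the per-piece figures in the table are first-pass costs (my reading; revision and re-review passes would account for the remainder of the ~$58 total):

```python
# Per-piece API cost × piece count, from the table above
tweet_total = 90 * 0.06     # tweet series (D+F)
article_total = 210 * 0.22  # article series (A+B+C+E)
total = tweet_total + article_total
print(round(total, 2))  # 51.6
```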
Pitfalls We Hit
Pitfall 1: The writer Agent's "creative overreach." Initially, the writer Agent would fabricate production data (e.g., "latency decreased by 47%"). Solution: Emphasize in the system prompt that "all data must come from the research report — do not fabricate," and have the review Agent specifically check whether data has a cited source.
Pitfall 2: The review Agent scores the writer Agent's articles too generously. Using the same model for both writing and reviewing creates a "self-preference" problem. Solution: Use a different temperature for the review Agent (0.3 vs. 0.7 for writing), and emphasize in the prompt: "You and the writer are different people; please evaluate objectively."
Pitfall 3: Rate limits during parallel execution. Running 5 articles through the Claude API simultaneously easily triggers rate limits (Claude API's Sonnet tier defaults to 4,000 RPM). Solution: Add a semaphore to control concurrency, and use exponential backoff for 429 errors.
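The semaphore-plus-backoff combination from Pitfall 3 can be sketched as follows. This is a minimal illustration, not the production code: RateLimitError here stands in for the SDK's 429 exception (the real client raises anthropic.RateLimitError), and the flaky call is a stub.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 error."""

async def with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    """Retry an async call on rate-limit errors with exponential backoff + jitter."""
    for attempt in range(max_retries):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Demo: a call that gets rate-limited twice, then succeeds
attempts = {"n": 0}

async def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

result = asyncio.run(with_backoff(flaky_call, base_delay=0.01))
print(result, attempts["n"])  # ok 3
```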
Pitfall 4: Banned phrases slipping through. When the review Agent uses an LLM for checking, it occasionally misses banned phrases. Solution: Use regex matching as a hard check for banned phrases; reserve LLM review for semantic and quality evaluation only. If a rule can handle it, don't hand it to the LLM.
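The hard check from Pitfall 4 can be a single compiled regex, run before any LLM call. A sketch with made-up banned phrases (the real list isn't disclosed here); compiling one escaped alternation keeps the scan to a single pass and leaves room for pattern-style rules later:

```python
import re

# Example AI-sounding Chinese phrases, not the real banned list
BANNED = ["游戏规则改变者", "不难发现", "综上所述"]
# Escape each phrase so it matches literally, then join into one alternation
BANNED_RE = re.compile("|".join(re.escape(p) for p in BANNED))

def rule_check(article: str) -> list[str]:
    """Deterministic banned-phrase check; each hit costs 5 points downstream."""
    return [f"Contains banned phrase: '{m}'" for m in set(BANNED_RE.findall(article))]

issues = rule_check("综上所述，这是一个游戏规则改变者。")
print(len(issues))  # 2
```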
Conclusion
Three core takeaways:
1. The value of an Agent Pipeline lies not in any single Agent's capability, but in the automation of the workflow. Research, writing, review, revision, formatting: none of these steps is individually hard, but orchestrating them into a managed pipeline is what truly saves time.
2. Quality gates are the heart of an Agent Pipeline. A pipeline without a review Agent is a machine for mass-producing garbage. A 75-point threshold, at most 2 revision rounds, and a human fallback: these three layers of gating ensure output quality.
3. Rule-based checks and LLM checks should work in tandem. Use rule-based checks for banned phrases, format validation, and data completeness (deterministic, zero cost). Use LLM evaluation for tonal consistency, content depth, and originality (flexible but variable). The two are complementary.
If you want to build your own content production Agent, start with the simplest two-step setup — one writing Agent + one review Agent. Validate quality on 10 pieces of content, tune the prompts, then add research and formatting stages.
Are you using AI for content production? Which part of the workflow has seen the biggest efficiency gain? I'd love to discuss.