
How I Designed an 8-Agent Community Management System

AI Agent · Multi-Agent · Community Management · Solo Unicorn Club · Case Study

Opening

The Solo Unicorn Club now has over 2,000 members spanning finance, tech, design, law, and more. Managing a community this large by yourself is simply not humanly possible. I spent three weeks building an 8-Agent system that now automatically processes 200+ messages daily, with monthly API costs under $45. This article is a full breakdown of the design thinking, architectural details, and pitfalls I encountered along the way.

The Problem

The pain points of community management are very specific:

  • Message volume: 200-400 messages per day, ranging from technical questions to event signups to casual chat
  • Response time expectations: Members expect replies within minutes, not hours
  • Scattered knowledge: Historical discussions, event info, and resource links are scattered everywhere
  • Repetitive work: Over 60% of questions are repeats, yet each one requires a manual response

I tried several approaches:

  1. Pure manual: Spending 3-4 hours per day answering messages. Unsustainable
  2. Keyword bot: Too dumb, poor matching accuracy, bad member experience
  3. Single AI Agent: Prompt stuffed with 8,000 tokens, capable of everything but good at nothing

I ultimately went with a Multi-Agent architecture for one core reason: community management subtasks are wildly different from each other — onboarding, Q&A, content recommendations, event management — each requires a completely different context and behavioral pattern.

Core Architecture

Design Principles

Three principles, set in stone before building:

  1. Router-first: All messages go through the Router Agent first. It only classifies — never replies — to ensure accurate routing
  2. Specialists over generalists: Each Worker Agent does one thing only, with prompts trimmed to under 800 tokens
  3. Human escalation: Any question where the Agent is uncertain (confidence < 0.7) automatically escalates to me

System Architecture Diagram

User Message → Router Agent (classification)
                │
    ┌───────────┼───────────┬───────────┐
    │           │           │           │
    ▼           ▼           ▼           ▼
 Greeter    Q&A Agent   Content     Event
 Agent      (Q&A)       Curator    Scheduler
 (Welcome)              (Content)  (Events)
    │           │           │           │
    │     ┌─────┼─────┐     │           │
    │     ▼           ▼     │           │
    │  Knowledge    Search  │           │
    │  Base Agent   Agent   │           │
    │  (KB)        (Search) │           │
    │                       │           │
    └───────────┬───────────┘           │
                ▼                       │
          Digest Agent ←────────────────┘
          (Daily Digest)

The 8 Agents and Their Roles

Agent | Responsibility | Model | Avg Latency | Daily Calls
Router | Message classification and routing | GPT-4.1-mini | 0.4s | 220
Greeter | New member welcome and onboarding | GPT-4.1-mini | 0.8s | 8
Q&A Agent | Answering community questions | Claude Sonnet 4.5 | 2.1s | 85
Content Curator | Recommending relevant content and resources | GPT-4.1 | 1.5s | 35
Event Scheduler | Event creation and registration management | GPT-4.1-mini | 0.6s | 15
Knowledge Base | Retrieving info from historical discussions | text-embedding-3-small | 0.3s | 85
Search Agent | Searching external information | GPT-4.1-mini | 1.8s | 25
Digest Agent | Generating daily/weekly community digests | Claude Sonnet 4.5 | 4.2s | 1

Implementation Details

Step 1: Router Agent — The Most Critical Component

The Router's accuracy directly determines the entire system's performance. I use GPT-4.1-mini for classification because this task doesn't need strong reasoning — it needs speed and consistency.

from openai import OpenAI
from pydantic import BaseModel
from enum import Enum

client = OpenAI()

class MessageCategory(str, Enum):
    GREETING = "greeting"       # New member introduction
    QUESTION = "question"       # Asking a question
    CONTENT = "content"         # Content sharing/discussion
    EVENT = "event"             # Event-related
    CHITCHAT = "chitchat"       # Casual chat (no action)
    ESCALATE = "escalate"       # Needs human intervention

class RouterResult(BaseModel):
    category: MessageCategory
    confidence: float           # 0-1 confidence score
    sub_topic: str              # Subtopic for downstream Agents
    needs_context: bool         # Whether to pull message history

ROUTER_PROMPT = """You are a community message classifier. Classify each message.
Rules:
- greeting: New member introduction or saying hello
- question: Any kind of question (technical, tools, career, business)
- content: Sharing articles, tools, opinions
- event: Event registration, schedule inquiries, meetups
- chitchat: Casual chat, memes, no substantive content
- escalate: Complaints, disputes, sensitive topics, anything you're unsure about
When confidence is below 0.7, set category to escalate."""

def route_message(message: str, user_info: dict) -> RouterResult:
    """Message classification: fast, accurate, low-cost"""
    response = client.beta.chat.completions.parse(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": f"User info: {user_info}\nMessage: {message}"}
        ],
        response_format=RouterResult,
        temperature=0.1,  # Low temperature for classification tasks
    )
    return response.choices[0].message.parsed

Lesson learned: The Router's prompt originally lacked an escalate category, causing ambiguous messages to get randomly routed to other categories. After adding escalate + a confidence threshold, routing accuracy jumped from 82% to 94%.
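
With the confidence threshold in place, the dispatch layer on top of route_message stays trivial: pick a handler by category, and fall back to a human for low confidence or anything unrecognized. A minimal sketch of that glue (the handler names and the dataclass stand-in are illustrative, not the production code):

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7

@dataclass
class Classified:
    category: str
    confidence: float

def dispatch(result: Classified, handlers: dict, escalate):
    """Send a classified message to its Worker Agent.

    Low confidence or an unknown category always falls back to a human.
    """
    if result.category == "escalate" or result.confidence < CONFIDENCE_THRESHOLD:
        return escalate(result)
    return handlers.get(result.category, escalate)(result)

# Illustrative handlers; in production each would invoke its Worker Agent
handlers = {
    "greeting": lambda r: "greeter",
    "question": lambda r: "qa",
    "event": lambda r: "event_scheduler",
}
```

The point of the explicit fallback is that a category the Router invents (or a config typo) degrades to human review rather than a silent drop.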

Step 2: Q&A Agent — The Workhorse

The Q&A Agent handles the highest volume and demands the highest quality. Its workflow: first check the Knowledge Base — if a historical answer exists, cite it directly; if not, call the Search Agent for external information.

# Assumes an async OpenAI-compatible client (a gateway can serve both GPT
# and Claude models behind the same chat.completions API).
from datetime import datetime

async def qa_agent(question: str, context: dict) -> str:
    """Q&A Agent: check internal knowledge base first, then external sources"""

    # Step 1: Search internal knowledge base
    kb_results = await knowledge_base.search(
        query=question,
        top_k=3,
        min_score=0.78  # Similarity threshold
    )

    if kb_results and kb_results[0].score > 0.85:
        # High similarity: generate answer based on historical response
        source = "knowledge_base"
        reference = kb_results[0].content
    else:
        # Low similarity: search external information
        source = "web_search"
        reference = await search_agent.search(question)

    # Step 2: Generate answer
    response = await client.chat.completions.create(
        model="claude-sonnet-4-5",  # Use a strong model for answer quality
        messages=[
            {"role": "system", "content": QA_PROMPT},
            {"role": "user", "content": f"""Question: {question}
Reference material (source: {source}): {reference}
User background: {context.get('user_profile', 'Unknown')}"""}
        ],
        max_tokens=500,  # Community answers should be concise
    )
    answer = response.choices[0].message.content

    # Step 3: Write new answer to knowledge base (for future retrieval)
    await knowledge_base.upsert(
        content=f"Q: {question}\nA: {answer}",
        metadata={"source": source, "timestamp": datetime.now().isoformat()}
    )
    return answer

Step 3: Greeter Agent — The New Member Experience

GREETER_PROMPT = """You are the welcome assistant for the Solo Unicorn Club. Tone: warm but not cheesy, professional but not cold.
Tasks:
1. Welcome the new member
2. Based on their introduction, recommend 2-3 discussion topics they might find interesting
3. Share community guidelines (brief version)
4. Encourage them to write a short self-introduction

Keep it under 150 words. No emojis."""

async def greet_new_member(intro_message: str, member_name: str) -> str:
    """New member onboarding: personalized welcome + topic recommendations"""
    # Retrieve recent hot topics to recommend to the newcomer
    hot_topics = await get_recent_hot_topics(limit=5)

    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": GREETER_PROMPT},
            {"role": "user", "content": f"""New member: {member_name}
Self-introduction: {intro_message}
Recent hot topics: {hot_topics}"""}
        ],
        max_tokens=200,
    )
    return response.choices[0].message.content

Step 4: Knowledge Base — The Heart of RAG

The knowledge base uses Qdrant as a vector store, holding all historical Q&A and discussion content.

from uuid import uuid4

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

class CommunityKnowledgeBase:
    def __init__(self):
        self.client = QdrantClient(url="http://localhost:6333")
        self.collection = "community_knowledge"
        self.embedding_model = "text-embedding-3-small"

    async def _embed(self, text: str) -> list:
        """Embed text via the embeddings endpoint.

        Assumes an AsyncOpenAI client named openai_client exists in scope.
        """
        resp = await openai_client.embeddings.create(
            model=self.embedding_model, input=text
        )
        return resp.data[0].embedding

    async def search(self, query: str, top_k: int = 3,
                     min_score: float = 0.75) -> list:
        """Semantic search: find the most relevant historical content"""
        # Generate query embedding
        query_vector = await self._embed(query)

        results = self.client.query_points(
            collection_name=self.collection,
            query=query_vector,
            limit=top_k,
            score_threshold=min_score,
        )
        return results.points

    async def upsert(self, content: str, metadata: dict):
        """Write new content to the knowledge base"""
        vector = await self._embed(content)
        point = PointStruct(
            id=str(uuid4()),
            vector=vector,
            payload={"content": content, **metadata}
        )
        self.client.upsert(
            collection_name=self.collection,
            points=[point]
        )

Key numbers: Embeddings use text-embedding-3-small ($0.02/1M tokens). Fully indexing ~15,000 pieces of historical content costs less than $0.30. Retrieval latency at p95 is ~45ms.
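
That figure is simple arithmetic: assuming an average of roughly 1,000 tokens per piece of content (my estimate, not a measured number), 15,000 items come out to about $0.30, and shorter items push it below that:

```python
EMBED_PRICE_PER_1M = 0.02  # USD per 1M tokens, text-embedding-3-small

def indexing_cost(num_items: int, avg_tokens_per_item: int) -> float:
    """One-time cost estimate for embedding a corpus."""
    total_tokens = num_items * avg_tokens_per_item
    return total_tokens / 1_000_000 * EMBED_PRICE_PER_1M

# ~15,000 pieces at ~1,000 tokens each = 15M tokens -> $0.30
cost = indexing_cost(15_000, 1_000)
```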

Step 5: Digest Agent — Automated Daily Summaries

Runs automatically every night, summarizing the day's discussion highlights.

async def generate_daily_digest(date: str) -> str:
    """Generate a daily community digest"""
    # Pull all non-chitchat messages from the day
    messages = await get_messages_by_date(date, exclude=["chitchat"])

    response = await client.chat.completions.create(
        model="claude-sonnet-4-5",  # Summarization needs strong synthesis ability
        messages=[
            {"role": "system", "content": """Generate a community daily digest. Format:
## Today's Highlights (Date)
### Hot Discussions (3-5 items)
### Useful Resources (with links)
### New Members (Welcome XX new friends)
### Upcoming Events
Keep it concise — no more than two sentences per item."""},
            {"role": "user", "content": f"Today's messages ({len(messages)} total):\n{format_messages(messages)}"}
        ],
        max_tokens=800,
    )
    return response.choices[0].message.content
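
The nightly trigger itself is plain scheduling. A minimal asyncio sketch (the 23:00 run time and the helper shape are my assumptions, not the production setup):

```python
import asyncio
from datetime import datetime, timedelta

def seconds_until(hour: int, minute: int = 0) -> float:
    """Seconds from now until the next occurrence of hour:minute."""
    now = datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # already passed today; run tomorrow
    return (target - now).total_seconds()

async def run_nightly(job, hour: int = 23):
    """Sleep until the configured hour, run the job with today's date, repeat."""
    while True:
        await asyncio.sleep(seconds_until(hour))
        await job(datetime.now().strftime("%Y-%m-%d"))

# Usage: asyncio.run(run_nightly(generate_daily_digest))
```

A cron job calling a one-shot script works just as well; the loop above only keeps everything inside one async process.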

Practical Lessons

Production Data (30-Day Average)

Metric | Value
Daily messages processed | 215
Router classification accuracy | 94.2%
Q&A satisfaction rate (member feedback) | 87%
Average response latency | 2.3 seconds
Human escalation rate | 8.5%
Monthly API cost (average) | $42.80

Cost Breakdown

Agent | Model | Monthly Tokens | Monthly Cost
Router | GPT-4.1-mini | 1.2M | $0.72
Greeter | GPT-4.1-mini | 180K | $0.11
Q&A | Claude Sonnet 4.5 | 1.8M | $18.90
Content Curator | GPT-4.1 | 650K | $2.60
Event Scheduler | GPT-4.1-mini | 280K | $0.17
Knowledge Base | text-embedding-3-small | 900K | $0.02
Search Agent | GPT-4.1-mini | 480K | $0.29
Digest Agent | Claude Sonnet 4.5 | 350K | $3.68
Qdrant hosting | - | - | $16.31
Total | | | $42.80

The Q&A Agent alone accounts for 44% of the total monthly cost ($18.90 of $42.80) because it uses Claude Sonnet 4.5 and has the highest call volume of any strong-model Agent. Switching it to GPT-4.1 would cut API costs by roughly 40%, but answer quality drops noticeably; I've tested this.

Pitfalls I've Hit

Pitfall 1: Over-complicated Router prompt. The initial Router prompt was 2,000 tokens long — not just classifying, but also analyzing sentiment and extracting keywords. This made latency high and responses inconsistent. I trimmed it to 500 tokens, focusing purely on classification, and accuracy actually improved.

Pitfall 2: Knowledge Base cold start. During the system's first week, the knowledge base was nearly empty, so the Q&A Agent constantly had to call the Search Agent, driving up both latency and cost. Solution: before launch, I pre-loaded 500 FAQs covering 80% of common questions.
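
Seeding is a one-time loop over a FAQ file through the same upsert path the live system uses. A sketch, assuming a simple JSON format of question/answer pairs (the file format is my assumption):

```python
import json

async def seed_knowledge_base(kb, faq_path: str) -> int:
    """Pre-load FAQs so the Q&A Agent has something to retrieve on day one.

    Expects a JSON array of {"q": ..., "a": ...} objects.
    """
    with open(faq_path) as f:
        faqs = json.load(f)
    for item in faqs:
        await kb.upsert(
            content=f"Q: {item['q']}\nA: {item['a']}",
            metadata={"source": "faq_seed"},
        )
    return len(faqs)
```

Storing seeds as "Q: ...\nA: ..." matches the format the Q&A Agent writes back, so retrieval treats seeded and organic answers identically.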

Pitfall 3: Digest Agent hallucinations. One daily digest included a discussion topic that never happened — the model "filled in" content on its own. Solution: I added explicit instructions in the prompt ("only summarize the messages provided; do not add any information not in the source material") and built a simple fact-check step that cross-validates generated summaries against source messages using keyword matching.
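
The keyword cross-validation can be as blunt as flagging digest lines whose words rarely appear in the source messages. A sketch of that idea (the length filter and overlap threshold are assumptions, not the production values):

```python
import re

def suspicious_lines(digest: str, source_messages: list,
                     min_overlap: float = 0.3) -> list:
    """Flag digest lines whose keywords barely occur in the source messages."""
    corpus = " ".join(source_messages).lower()
    flagged = []
    for line in digest.splitlines():
        # Crude keyword extraction: lowercase words longer than 3 characters
        words = [w for w in re.findall(r"[a-z0-9]+", line.lower()) if len(w) > 3]
        if not words:
            continue
        hits = sum(1 for w in words if w in corpus)
        if hits / len(words) < min_overlap:
            flagged.append(line)
    return flagged
```

Flagged lines can be dropped or sent for human review; the check is cheap enough to run on every digest before posting.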

Pitfall 4: Evening peak concurrency. Between 8-10 PM, message volume was 3x the average, and API rate limits became a bottleneck. Solution: non-urgent tasks (content recommendations, daily digest) were moved to async processing; during peak hours, only the Router and Q&A Agent get real-time priority.
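
The peak-hour split reduces to a time check plus a category allowlist; anything deferred goes onto a queue for off-peak processing. A sketch (the category split follows the text above; the window handling and names are my own):

```python
from collections import deque

# Only the Router and Q&A path stay real-time during the evening peak
REALTIME_DURING_PEAK = {"question", "escalate"}

deferred = deque()  # drained by an off-peak worker

def is_peak(hour: int) -> bool:
    """Evening peak window, 8-10 PM."""
    return 20 <= hour < 22

def should_defer(category: str, hour: int) -> bool:
    """Queue non-urgent categories during the peak instead of calling the API."""
    return is_peak(hour) and category not in REALTIME_DURING_PEAK
```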

Takeaways

Three things to remember:

  1. The Router Agent is the lifeline of a Multi-Agent system — spending 50% of your debugging time on the Router is not overkill. Every percentage point improvement in routing accuracy lifts the effective output of every downstream Agent
  2. The Knowledge Base determines the system's long-term value — prepare seed data for cold start, then keep accumulating. Three months in, my knowledge base covers 90%+ of common questions, and Q&A Agent search calls have dropped by 60%
  3. Model tiering is the key to cost control — not every Agent needs the strongest model. Router and Greeter use mini models, Q&A and Digest use strong models. Overall cost drops 60% without impacting experience
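
The tiering itself can live in a single lookup table, so re-tiering an Agent is a one-line change. A sketch of such a config (the mapping follows the cost table above; the shape and the cheap-tier default are my own):

```python
MODEL_TIERS = {
    # cheap, fast: classification and templated replies
    "router": "gpt-4.1-mini",
    "greeter": "gpt-4.1-mini",
    "event_scheduler": "gpt-4.1-mini",
    "search": "gpt-4.1-mini",
    # mid tier: content recommendation
    "content_curator": "gpt-4.1",
    # strong: answer quality and synthesis
    "qa": "claude-sonnet-4-5",
    "digest": "claude-sonnet-4-5",
}

def model_for(agent: str) -> str:
    """Look up an Agent's model, defaulting new Agents to the cheap tier."""
    return MODEL_TIERS.get(agent, "gpt-4.1-mini")
```

Defaulting unknown Agents to the cheap tier means a new Agent has to earn its way up to an expensive model, not the other way around.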

If you're managing a community too, start with just two Agents: a Router + a Q&A Agent. Get those working, then add others. Want to see the exact prompts and configs? Come to the Solo Unicorn Club — I'm happy to share directly.