A Human-in-the-Loop Framework for Enterprise AI Agents

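Opening

Last year I helped an e-commerce company deploy a customer-service Agent. On its third day in production, the Agent took it upon itself to refund $2,400 to a complaining customer, on an order worth only $240. The cause: the Agent interpreted "full refund" as ten times the order amount. That incident cost me two days redesigning the Human-in-the-Loop mechanism. The lesson is simple: the most dangerous thing about an AI Agent is not that it lacks intelligence, but that it does the wrong thing with too much confidence.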

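Problem Background

In 2026, enterprises deploying AI Agents face a dilemma:

Full automation: efficient, but mistakes are costly, and enterprise compliance requirements do not allow handing over control completely
Approval at every step: safe, but throughput drops to the point where it can be worse than doing the work manually

The right approach is tiered management: low-risk operations run automatically, high-risk operations require human approval, and the middle ground is decided dynamically with a confidence threshold.

The mistake most teams make is treating Human-in-the-Loop as a switch: either fully automatic or fully gated by approval. In practice it should be a continuous spectrum, adjusted dynamically according to the risk level of the operation and the Agent's confidence.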

Core Framework: Three Layers of Protection

Layer 1: Confidence Threshold

Every Agent output carries a confidence score. The score determines whether a human needs to step in.

from dataclasses import dataclass
from enum import Enum

class ActionLevel(Enum):
    AUTO = "auto"           # execute immediately
    NOTIFY = "notify"       # execute, then notify a human
    APPROVE = "approve"     # wait for approval before executing
    ESCALATE = "escalate"   # hand over to a human

@dataclass
class ConfidenceConfig:
    """Confidence thresholds for one operation type"""
    auto_threshold: float      # at or above: execute automatically
    notify_threshold: float    # at or above: execute, then notify
    approve_threshold: float   # at or above: wait for approval
    # below approve_threshold: escalate automatically

# Thresholds configured per risk level
THRESHOLDS = {
    "read_only": ConfidenceConfig(
        auto_threshold=0.6,    # queries: low bar
        notify_threshold=0.4,
        approve_threshold=0.2,
    ),
    "low_risk_write": ConfidenceConfig(
        auto_threshold=0.85,   # low-risk writes: medium bar
        notify_threshold=0.7,
        approve_threshold=0.5,
    ),
    "high_risk_write": ConfidenceConfig(
        auto_threshold=0.95,   # high-risk operations: very high bar
        notify_threshold=0.85,
        approve_threshold=0.7,
    ),
    "financial": ConfidenceConfig(
        auto_threshold=0.99,   # money involved: almost never automatic
        notify_threshold=0.95,
        approve_threshold=0.8,
    ),
}

def determine_action_level(
    confidence: float,
    operation_type: str,
    amount: float = 0,
) -> ActionLevel:
    """Decide whether a human must step in, based on confidence and operation type"""
    # Unknown operation types fall back to the high-risk tier
    config = THRESHOLDS.get(operation_type, THRESHOLDS["high_risk_write"])

    # Amount limits are hard rules that override confidence.
    # Check the larger limit first so the smaller one does not shadow it.
    if amount > 5000:
        return ActionLevel.ESCALATE
    if amount > 500:
        return ActionLevel.APPROVE

    if confidence >= config.auto_threshold:
        return ActionLevel.AUTO
    elif confidence >= config.notify_threshold:
        return ActionLevel.NOTIFY
    elif confidence >= config.approve_threshold:
        return ActionLevel.APPROVE
    else:
        return ActionLevel.ESCALATE

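Key design decisions:

Thresholds are not one-size-fits-all; they are tiered by operation type
Amount is a hard rule: anything over $500 requires approval no matter how high the confidence
ESCALATE is not a failure; it is the system protecting itself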
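
Because the tiers only make sense when each one satisfies approve < notify < auto, it can help to validate the table at startup. The following is a minimal sketch, not part of the original system: `TierThresholds` and `validate_thresholds` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierThresholds:
    """Hypothetical stand-in for one row of a per-operation threshold table."""
    auto_threshold: float
    notify_threshold: float
    approve_threshold: float


def validate_thresholds(name: str, t: TierThresholds) -> list[str]:
    """Return human-readable problems; an empty list means the tier is sane."""
    problems = []
    values = (t.approve_threshold, t.notify_threshold, t.auto_threshold)
    if not all(0.0 <= v <= 1.0 for v in values):
        problems.append(f"{name}: thresholds must lie in [0, 1]")
    if not (t.approve_threshold < t.notify_threshold < t.auto_threshold):
        problems.append(f"{name}: expected approve < notify < auto")
    return problems


good = TierThresholds(auto_threshold=0.95, notify_threshold=0.85, approve_threshold=0.7)
bad = TierThresholds(auto_threshold=0.8, notify_threshold=0.85, approve_threshold=0.7)

print(validate_thresholds("high_risk_write", good))  # no problems reported
print(validate_thresholds("typo_tier", bad))         # ordering problem reported
```

A misordered tier would otherwise fail silently: with auto below notify, the NOTIFY branch could never be reached for that operation type.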

Layer 2: Approval Workflow

When an Agent operation needs approval, the system generates a structured approval request.

import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ApprovalRequest:
    request_id: str
    agent_name: str
    action: str                 # the operation to execute
    reasoning: str              # the Agent's reasoning
    confidence: float
    impact: str                 # description of the blast radius
    affected_amount: float
    context: dict               # relevant context
    deadline: datetime          # approval deadline
    fallback_action: str        # default action on timeout

class ApprovalWorkflow:
    def __init__(self, notification_service, timeout_minutes: int = 30):
        self.notification = notification_service
        self.timeout = timedelta(minutes=timeout_minutes)
        self.pending: dict[str, ApprovalRequest] = {}

    async def request_approval(self, request: ApprovalRequest) -> bool:
        """Send an approval request and wait for a human decision"""
        # 1. Build the summary (written for humans, not for the AI)
        summary = self._format_for_human(request)

        # 2. Notify through tiered channels (Slack + email + SMS)
        if request.affected_amount > 1000:
            await self.notification.send_urgent(summary)  # SMS + Slack
        else:
            await self.notification.send_normal(summary)   # Slack only

        # 3. Wait for the decision, with a timeout
        self.pending[request.request_id] = request
        try:
            decision = await asyncio.wait_for(
                self._wait_for_decision(request.request_id),
                timeout=self.timeout.total_seconds()
            )
            return decision
        except asyncio.TimeoutError:
            # On timeout, run the fallback instead of the original action
            await self._handle_timeout(request)
            return False
        finally:
            # Drop the entry so resolved or expired requests do not accumulate
            self.pending.pop(request.request_id, None)

    def _format_for_human(self, req: ApprovalRequest) -> str:
        """Format the request so a human can decide quickly"""
        return f"""
--- AI Agent Approval Request ---
Agent: {req.agent_name}
Action: {req.action}
Confidence: {req.confidence:.0%}
Amount involved: ${req.affected_amount:,.2f}
Impact: {req.impact}
Agent reasoning: {req.reasoning}
Deadline: {req.deadline.strftime('%H:%M')}
Default on timeout: {req.fallback_action}
---
Reply Y to approve / N to reject / M to handle manually
"""

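Design notes:

The approval request must include the Agent's reasoning, so the human knows why it made this decision
There must be a timeout: a pending request cannot be left hanging forever
On timeout, run a conservative fallback action rather than the original operation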
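
The workflow above relies on a `_wait_for_decision` helper that is not shown. One way to implement the "wait with a timeout, then fall back" behavior is one `asyncio.Future` per pending request, resolved by whatever callback receives the reviewer's reply. This is a sketch under that assumption; `DecisionInbox`, `wait`, and `resolve` are hypothetical names, not the original implementation.

```python
import asyncio


class DecisionInbox:
    """Hypothetical helper: one Future per pending approval request."""

    def __init__(self) -> None:
        self._futures: dict[str, asyncio.Future] = {}

    async def wait(self, request_id: str, timeout_s: float, fallback: str) -> str:
        """Wait for a human decision; return the fallback label on timeout."""
        future = asyncio.get_running_loop().create_future()
        self._futures[request_id] = future
        try:
            return await asyncio.wait_for(future, timeout=timeout_s)
        except asyncio.TimeoutError:
            return fallback
        finally:
            # Always drop the entry so timed-out requests do not pile up.
            self._futures.pop(request_id, None)

    def resolve(self, request_id: str, decision: str) -> bool:
        """Called by the reply callback; returns False if the request is gone."""
        future = self._futures.get(request_id)
        if future is None or future.done():
            return False
        future.set_result(decision)
        return True


async def demo() -> tuple[str, str]:
    inbox = DecisionInbox()
    # Case 1: a reviewer answers before the deadline.
    waiter = asyncio.create_task(inbox.wait("req-1", 1.0, "defer"))
    await asyncio.sleep(0)  # let the waiter register its future
    inbox.resolve("req-1", "approved")
    answered = await waiter
    # Case 2: nobody answers, so the conservative fallback is returned.
    timed_out = await inbox.wait("req-2", 0.05, "defer")
    return answered, timed_out


print(asyncio.run(demo()))  # ('approved', 'defer')
```

A late reply to an expired request gets `False` from `resolve`, which lets the caller tell the reviewer that the request has already fallen back.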

Layer 3: Escalation Pattern

Not every problem can be solved through approval. Some situations need to be handed over to a human entirely.

from datetime import timedelta

class EscalationManager:
    # Scenarios that must escalate (hard rules, confidence is ignored)
    HARD_ESCALATION_RULES = [
        "Legal or compliance issue involved",
        "Customer explicitly asked to talk to a human",
        "Sensitive personal data involved (national ID, bank card)",
        "Agent rejected by approvers twice in a row",
        "Third approval triggered by the same user within 24 hours",
    ]

    async def evaluate_escalation(
        self,
        agent_output: dict,
        conversation_history: list,
        user_context: dict,
    ) -> bool:
        """Evaluate whether to escalate to a human"""

        # Check the hard rules first
        for rule in self.HARD_ESCALATION_RULES:
            if self._matches_rule(rule, agent_output, user_context):
                await self._escalate(
                    reason=rule,
                    priority="high",
                    context=conversation_history,
                )
                return True

        # Soft rule: a run of low-confidence outputs
        recent_scores = self._get_recent_confidence_scores(
            user_id=user_context["user_id"],
            window=timedelta(hours=1)
        )
        if len(recent_scores) >= 3 and all(s < 0.6 for s in recent_scores):
            await self._escalate(
                reason="Sustained low confidence; the Agent may be unable to handle this user's request",
                priority="medium",
                context=conversation_history,
            )
            return True

        return False
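
The hard rules above are stored as descriptions, and the matching logic in `_matches_rule` is not shown. One possible shape, sketched here purely as an assumption, pairs each description with a predicate over the Agent output and the user context. The field names (`intent`, `consecutive_rejections`, `approvals_last_24h`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class HardRule:
    reason: str
    applies: Callable[[dict, dict], bool]  # (agent_output, user_context) -> bool


# Hypothetical predicates; the dictionary keys are illustrative only.
HARD_RULES = [
    HardRule(
        "Customer explicitly asked to talk to a human",
        lambda out, ctx: out.get("intent") == "request_human",
    ),
    HardRule(
        "Agent rejected by approvers twice in a row",
        lambda out, ctx: ctx.get("consecutive_rejections", 0) >= 2,
    ),
    HardRule(
        "Third approval triggered by the same user within 24 hours",
        lambda out, ctx: ctx.get("approvals_last_24h", 0) >= 3,
    ),
]


def first_matching_rule(agent_output: dict, user_context: dict) -> Optional[str]:
    """Return the reason of the first hard rule that fires, or None."""
    for rule in HARD_RULES:
        if rule.applies(agent_output, user_context):
            return rule.reason
    return None


print(first_matching_rule({"intent": "request_human"}, {}))
print(first_matching_rule({}, {"consecutive_rejections": 2}))
print(first_matching_rule({"intent": "refund"}, {"approvals_last_24h": 1}))
```

Keeping the predicate next to its description means the reason shown to the human responder cannot drift from the condition that triggered it.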

Lessons from Production

Production Data

Numbers from the e-commerce customer-service system three months after launch:

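Metric	At launch	After optimization
Tickets handled per day	450	520
Auto-completion rate	62%	78%
Share requiring approval	28%	15%
Human escalation	10%	7%
Average approval wait time	18 minutes	6 minutes
Erroneous execution rate	3.2%	0.4%
Refund errors	2 per week	0 per month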

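Key optimizations:

Switching the confidence threshold from a single fixed value to per-operation-type tiers raised the auto-completion rate by 16 percentage points
Changing approval requests from plain text to structured cards (with highlighted amounts and one-click actions) cut approval wait time from 18 minutes to 6 minutes
Adding a "learning loop": approved requests are automatically added to the training data, so confidence is higher the next time a similar case comes up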

Threshold Tuning Method

Do not set thresholds by gut feeling; let the data drive them:

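from collections import defaultdict

def calibrate_thresholds(historical_data: list[dict]) -> dict:
    """Calibrate auto_threshold per operation type from historical approval data"""
    # Group records by operation type
    grouped = defaultdict(list)
    for record in historical_data:
        grouped[record["operation_type"]].append(record)

    calibrated = {}
    for op_type, records in grouped.items():
        # Find the lowest confidence at or above which humans approved >= 95% of requests
        sorted_records = sorted(records, key=lambda r: r["confidence"])
        for i, record in enumerate(sorted_records):
            remaining = sorted_records[i:]
            approval_rate = sum(1 for r in remaining if r["approved"]) / len(remaining)
            if approval_rate >= 0.95:
                calibrated[op_type] = record["confidence"]
                print(f"{op_type}: auto_threshold = {record['confidence']:.2f}")
                break
    return calibrated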

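The logic: find a confidence cutoff at or above which the human approval rate is at least 95%. Operations above that cutoff can then run automatically with reasonable safety.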
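
To make the cutoff search concrete, here is a small worked example on toy data for a single operation type. It is a sketch with two additions that are my own assumptions rather than part of the method described above: it scans distinct confidence values so tied records are judged together, and it stops once fewer than `min_samples` records remain, since a rate computed from one or two records is not trustworthy. The function name and the data are invented for illustration.

```python
from typing import Optional


def lowest_safe_cutoff(
    records: list[dict], target: float = 0.95, min_samples: int = 3
) -> Optional[float]:
    """Lowest confidence c such that records with confidence >= c
    were approved at a rate of at least `target`; None if there is no such c."""
    for cutoff in sorted({r["confidence"] for r in records}):
        tail = [r for r in records if r["confidence"] >= cutoff]
        if len(tail) < min_samples:
            break  # too few samples left to trust the rate
        rate = sum(1 for r in tail if r["approved"]) / len(tail)
        if rate >= target:
            return cutoff
    return None


# Toy history: (confidence, did the human approve?)
toy = [
    {"confidence": c, "approved": a}
    for c, a in [
        (0.55, False), (0.62, True), (0.70, False),
        (0.78, True), (0.81, True), (0.88, True),
        (0.90, True), (0.93, True), (0.97, True),
    ]
]

# At 0.55 the tail rate is 7/9, at 0.62 it is 7/8, at 0.70 it is 6/7,
# and at 0.78 all 6 remaining records were approved.
print(lowest_safe_cutoff(toy))  # 0.78
```

With real data, a cutoff found this way is only as good as the sample behind it, which is one reason to start conservative and loosen gradually.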

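Pitfalls We Hit

Pitfall 1: notification fatigue. At first we sent a notification for everything; approvers got 80 Slack messages a day and soon started ignoring them. Fix: notify only when human judgment is really needed, and just log purely informational events.

Pitfall 2: no fallback action. When an approval request timed out, the system stalled and every ticket behind it queued up. Fix: every approval request must define a safe fallback, usually "politely tell the user we will follow up later".

Pitfall 3: unreliable confidence scores. The model's self-assessed confidence is often too high (overconfident). Fix: do not rely solely on self-reported confidence; add rule-based sanity checks, for example verifying that the amount in a monetary operation falls within a reasonable range.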
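
A sanity check of the kind described in Pitfall 3 can be very small. The sketch below assumes a refund proposal carries the order total alongside the proposed refund; `RefundProposal` and both functions are hypothetical names. It would have caught the $2,400 refund on a $240 order regardless of the reported confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundProposal:
    order_total: float    # what the customer actually paid
    refund_amount: float  # what the Agent wants to refund
    confidence: float     # self-reported by the model, possibly inflated


def refund_sanity_issues(p: RefundProposal) -> list[str]:
    """Rule-based checks that deliberately ignore the model's confidence."""
    issues = []
    if p.refund_amount <= 0:
        issues.append("refund must be positive")
    if p.refund_amount > p.order_total:
        issues.append("refund exceeds order total")
    return issues


def needs_human(p: RefundProposal) -> bool:
    """Any sanity failure forces review, however confident the model is."""
    return bool(refund_sanity_issues(p))


incident = RefundProposal(order_total=240.0, refund_amount=2400.0, confidence=0.99)
normal = RefundProposal(order_total=240.0, refund_amount=240.0, confidence=0.99)

print(refund_sanity_issues(incident))  # ['refund exceeds order total']
print(needs_human(normal))             # False
```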

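Comparing the Options

Approach	Best for	Pros	Cons
Pure rules + allowlist	Few, fixed operation types	Simple and controllable	Inflexible; new operations must be configured by hand
Confidence threshold	Varied operations with quantifiable risk	Adapts dynamically	Depends on the accuracy of confidence
LLM second-pass review	Complex scenarios needing semantic understanding	Strong understanding	High cost, added latency
Hybrid (recommended)	Enterprise deployments	Balances safety and efficiency	Higher configuration complexity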
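
One way to read the hybrid row is as a pipeline: static rules decide first, the confidence threshold handles most of the rest, and only borderline cases pay for a second-pass review. The sketch below is an illustration of that ordering under my own assumptions. The operation names, the `gray_zone` width, and the `second_opinion` callable (standing in for an LLM review) are all hypothetical.

```python
from typing import Callable, Optional

ALLOWLIST = {"lookup_order", "track_shipment"}  # hypothetical always-safe operations
BLOCKLIST = {"delete_account"}                  # hypothetical always-human operations


def hybrid_decision(
    operation: str,
    confidence: float,
    auto_threshold: float = 0.9,
    gray_zone: float = 0.1,
    second_opinion: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return 'auto', 'approve', or 'escalate' for one proposed operation."""
    # 1. Static rules first: cheap and predictable.
    if operation in BLOCKLIST:
        return "escalate"
    if operation in ALLOWLIST:
        return "auto"
    # 2. Confidence threshold handles the clear cases.
    if confidence >= auto_threshold:
        return "auto"
    # 3. Only borderline cases pay the cost and latency of a second check.
    if second_opinion is not None and confidence >= auto_threshold - gray_zone:
        return "auto" if second_opinion(operation) else "approve"
    return "approve"


print(hybrid_decision("lookup_order", 0.10))    # auto (allowlisted)
print(hybrid_decision("delete_account", 0.99))  # escalate (blocklisted)
print(hybrid_decision("issue_refund", 0.85, second_opinion=lambda op: False))  # approve
```

Whether a second-pass review should be allowed to upgrade a borderline case to automatic execution is itself a policy choice; a stricter variant would let it only downgrade.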

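Summary

Three takeaways:

Human-in-the-Loop does not restrict AI; it is what lets AI go live. An Agent system without an approval mechanism will not be adopted by enterprises. With this layer in place, you can actually grant the Agent a wider scope of permissions
Calibrate thresholds with data, not gut feeling. Run two weeks of canary traffic, analyze how confidence correlates with human judgment, and only then set the thresholds. In my experience, start conservative (more approvals) and loosen gradually
The approval experience matters as much as the approval mechanism. An approver who receives a wall of AI-generated text will most likely approve without reading. With structured cards, highlighted key information, and one-click actions, approvals can be 3 times faster

If you are pushing AI Agents inside an enterprise, build the Human-in-the-Loop framework before you start. This is not a "we will add it later" feature; it is a prerequisite for going live.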

How does your Agent system handle human-AI collaboration? Any good practices? Come share them at 一人独角兽俱乐部.
