Structured Output and Schema Control: Getting AI to Return Predictable Data Formats

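Why structured output is needed

When an LLM API call returns free-form text, extracting the useful parts means parsing it yourself: regular expressions, string handling, and assorted fallback logic. That is tedious and error-prone.

Structured output has the model return JSON or another fixed format directly, which removes the parsing step and makes the code simpler and more reliable.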

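This article covers

How to use OpenAI JSON Mode
Function Calling: principles and practice
Anthropic's approach to structured output
Techniques for custom schema constraints
Best practices for front-end scenarios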

OpenAI JSON Mode

The simplest way to get structured output.

Basic usage

const response = await openai.chat.completions.create({
  model: 'gpt-4-turbo-preview',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
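      content: 'You are a data extraction assistant. Return the result as JSON.'
    },
    {
      role: 'user',
      content: 'Extract the person described in this text: Zhang San is a 28-year-old front-end engineer working in Beijing.'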
    }
  ]
})

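// content is typed as string | null, so fall back to an empty object
const data = JSON.parse(response.choices[0].message.content ?? '{}')
// e.g. { "name": "Zhang San", "age": 28, "occupation": "front-end engineer", "city": "Beijing" }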

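Key points

The word "JSON" must appear in a system or user message, otherwise the API rejects the request
The output parses as valid JSON unless generation is cut off (finish_reason is 'length'), so check for truncation before parsing
Suited to simple cases; for complex structures, Function Calling is recommended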
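
JSON Mode output can still be cut off when the token limit is reached; in that case `finish_reason` is `'length'` and the content may be incomplete JSON. Below is a minimal sketch of a guard for this, assuming a choice object shaped like the Chat Completions one. `ChoiceLike`, `ParseResult`, and `parseJsonModeChoice` are hypothetical names for illustration, not part of the OpenAI SDK.

```typescript
// Hypothetical minimal shape of one entry in `response.choices`.
interface ChoiceLike {
  finish_reason: string | null
  message: { content: string | null }
}

type ParseResult<T> =
  | { ok: true; data: T }
  | { ok: false; reason: 'truncated' | 'empty' | 'invalid_json' }

// Parse a JSON Mode choice, refusing output that was cut off or is empty.
function parseJsonModeChoice<T = unknown>(choice: ChoiceLike): ParseResult<T> {
  if (choice.finish_reason === 'length') return { ok: false, reason: 'truncated' }
  const text = choice.message.content
  if (!text) return { ok: false, reason: 'empty' }
  try {
    // The cast is unchecked: validate the shape separately if it matters.
    return { ok: true, data: JSON.parse(text) as T }
  } catch {
    return { ok: false, reason: 'invalid_json' }
  }
}
```

Returning a tagged result rather than throwing lets the caller decide whether to retry with a larger `max_tokens`, fall back, or surface an error.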

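Specifying the output structure

const systemPrompt = `You are a data extraction assistant.
Return the result strictly in the following JSON format:
{
  "products": [
    {
      "name": "product name",
      "price": price as a number,
      "currency": "currency unit",
      "inStock": true or false
    }
  ],
  "totalCount": total number of products
}`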
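
A format described in the prompt is only a request, so the parsed result is worth checking at runtime before use. The following is a minimal sketch of a type guard for the shape in the prompt above. `Product`, `ProductList`, and the `is...` helpers are illustrative names introduced here.

```typescript
interface Product {
  name: string
  price: number
  currency: string
  inStock: boolean
}

interface ProductList {
  products: Product[]
  totalCount: number
}

// True for plain (non-null, non-array) objects.
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

function isProduct(v: unknown): v is Product {
  return (
    isRecord(v) &&
    typeof v.name === 'string' &&
    typeof v.price === 'number' &&
    typeof v.currency === 'string' &&
    typeof v.inStock === 'boolean'
  )
}

// Check the whole payload: every product must pass, and totalCount must be numeric.
function isProductList(v: unknown): v is ProductList {
  return (
    isRecord(v) &&
    Array.isArray(v.products) &&
    v.products.every(isProduct) &&
    typeof v.totalCount === 'number'
  )
}
```

A typical use is `const parsed: unknown = JSON.parse(text); if (isProductList(parsed)) { ... }`, which narrows the type only after the shape has been confirmed.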

Function Calling

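A more powerful and more controllable way to get structured output.

How it works

User input → the LLM decides which function to call → it returns the function arguments (structured JSON) → you execute the function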
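
The last step of this flow, running the function yourself, can be sketched as a small dispatcher. This is an illustration under assumptions: `ToolCallLike` mirrors only the fields used here, and the `handlers` registry with its `extract_user_info` entry is hypothetical application code.

```typescript
// Hypothetical minimal shape of a tool call as returned by the model.
interface ToolCallLike {
  function: { name: string; arguments: string }
}

type Handler = (args: Record<string, unknown>) => unknown

// Registry of functions the application is willing to run.
const handlers = new Map<string, Handler>([
  ['extract_user_info', (args) => ({ saved: true, name: args.name })],
])

// Parse the model-supplied arguments (a JSON string) and run the matching function.
function dispatchToolCall(call: ToolCallLike): unknown {
  const handler = handlers.get(call.function.name)
  if (!handler) throw new Error(`Unknown function: ${call.function.name}`)
  const args = JSON.parse(call.function.arguments) as Record<string, unknown>
  return handler(args)
}
```

The model never executes anything itself; it only proposes a name and arguments, so the registry doubles as an allow-list.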

Defining the function schema

const tools = [
  {
    type: 'function',
    function: {
      name: 'extract_user_info',
      description: 'Extract user information from text',
      parameters: {
        type: 'object',
        properties: {
          name: {
            type: 'string',
            description: 'User name'
          },
          age: {
            type: 'integer',
            description: 'User age'
          },
          email: {
            type: 'string',
            format: 'email',
            description: 'Email address'
          },
          skills: {
            type: 'array',
            items: { type: 'string' },
            description: 'List of skills'
          }
        },
        required: ['name']
      }
    }
  }
]

Example call

const response = await openai.chat.completions.create({
  model: 'gpt-4-turbo-preview',
  messages: [
    { 
      role: 'user', 
      content: 'Analyze: Xiao Ming, 25 years old, email xiaoming@example.com, skilled in Vue and React'
    }
  ],
  tools,
  tool_choice: { type: 'function', function: { name: 'extract_user_info' } }
})

const toolCall = response.choices[0].message.tool_calls?.[0]
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments)
  // e.g. { name: 'Xiao Ming', age: 25, email: 'xiaoming@example.com', skills: ['Vue', 'React'] }
}

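tool_choice options

Option	Meaning
`auto`	The model decides whether to call a function
`none`	Function calls are disabled
`required`	The model must call at least one function
`{ type: 'function', function: { name: 'xxx' } }`	Forces a call to the named function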
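
With `auto`, the model may answer in plain text instead of calling a function, so the calling code needs a branch for each case. A minimal sketch, assuming a message object with the same `content` and `tool_calls` fields used in the example call; `AssistantMessageLike`, `Outcome`, and `readOutcome` are hypothetical names.

```typescript
// Hypothetical minimal shape of `response.choices[0].message`.
interface AssistantMessageLike {
  content: string | null
  tool_calls?: { function: { name: string; arguments: string } }[]
}

type Outcome =
  | { kind: 'tool'; name: string; args: unknown }
  | { kind: 'text'; text: string }

// Normalise the two possible reply shapes into one tagged value.
function readOutcome(message: AssistantMessageLike): Outcome {
  const call = message.tool_calls?.[0]
  if (call) {
    return { kind: 'tool', name: call.function.name, args: JSON.parse(call.function.arguments) }
  }
  return { kind: 'text', text: message.content ?? '' }
}
```

When a specific function is forced, the text branch should not occur, but keeping it makes the same helper usable under every `tool_choice` setting.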

Complex schema design

Nested objects

{
  type: 'object',
  properties: {
    user: {
      type: 'object',
      properties: {
        profile: {
          type: 'object',
          properties: {
            avatar: { type: 'string' },
            bio: { type: 'string' }
          }
        }
      }
    }
  }
}

Arrays and enums

{
  type: 'object',
  properties: {
    status: {
      type: 'string',
      enum: ['pending', 'approved', 'rejected']
    },
    tags: {
      type: 'array',
      items: { type: 'string' },
      minItems: 1,
      maxItems: 5
    }
  }
}
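
Keywords such as `enum`, `minItems`, and `maxItems` guide the model but may not be strictly enforced, so it is safer to mirror them in application code. One possible sketch keeps the enum values in a single `as const` array and derives both the TypeScript type and the runtime check from it. `STATUSES`, `isStatus`, and `isTagList` are names invented for this example.

```typescript
// Single source of truth for the enum values used in the schema.
const STATUSES = ['pending', 'approved', 'rejected'] as const
type Status = (typeof STATUSES)[number]

// Runtime check matching the schema's `enum`.
function isStatus(v: unknown): v is Status {
  return typeof v === 'string' && (STATUSES as readonly string[]).indexOf(v) !== -1
}

// Runtime check matching `minItems: 1` and `maxItems: 5` on a string array.
function isTagList(v: unknown): v is string[] {
  return (
    Array.isArray(v) &&
    v.length >= 1 &&
    v.length <= 5 &&
    v.every((t) => typeof t === 'string')
  )
}

// The same array can be spread into the JSON Schema sent to the model.
const statusSchema = { type: 'string', enum: [...STATUSES] }
```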

Union types

{
  type: 'object',
  properties: {
    result: {
      oneOf: [
        { type: 'string' },
        { type: 'number' },
        { 
          type: 'object',
          properties: {
            error: { type: 'string' }
          }
        }
      ]
    }
  }
}
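
On the consuming side, a `oneOf` like this maps naturally onto a TypeScript union that is narrowed before use. A small sketch; `Result` and `describeResult` are illustrative names.

```typescript
// TypeScript counterpart of the `oneOf` above.
type Result = string | number | { error: string }

// Narrow the union with typeof checks before touching the value.
function describeResult(result: Result): string {
  if (typeof result === 'string') return `text: ${result}`
  if (typeof result === 'number') return `number: ${result}`
  return `error: ${result.error}`
}
```

Because the two primitive branches are handled first, the compiler knows the remaining case is the error object.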

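Anthropic Claude approach

The Claude 3 Messages API has no JSON Mode switch comparable to OpenAI's response_format, but the same result can be achieved through prompting: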

const response = await anthropic.messages.create({
  model: 'claude-3-opus-20240229',
  max_tokens: 1024,
  messages: [
    {
      role: 'user',
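      content: `Extract the user information and return it as JSON.

Input: Xiao Hong, 30 years old, product manager

Output format:
\`\`\`json
{
  "name": "name",
  "age": age as a number,
  "role": "job title"
}
\`\`\`

Output the JSON only, with no additional text.`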
    }
  ]
})

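// Extract the JSON from the response; the model may or may not wrap it in a code fence
const block = response.content[0]
const content = block.type === 'text' ? block.text : ''
const jsonMatch = content.match(/```(?:json)?\s*([\s\S]*?)\s*```/)
// JSON.parse throws on malformed output, so wrap this in try/catch in real code
const data = JSON.parse(jsonMatch ? jsonMatch[1] : content)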

Claude's Tool Use

Claude 3 also supports a feature similar to Function Calling:

const response = await anthropic.messages.create({
  model: 'claude-3-opus-20240229',
  max_tokens: 1024,
  tools: [
    {
      name: 'get_weather',
      description: 'Get the weather for a given city',
      input_schema: {
        type: 'object',
        properties: {
          city: { type: 'string', description: 'City name' },
          unit: { type: 'string', enum: ['celsius', 'fahrenheit'] }
        },
        required: ['city']
      }
    }
  ],
  messages: [{ role: 'user', content: 'What is the weather like in Beijing today?' }]
})
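
When Claude decides to use a tool, the response `content` array contains a block of type `tool_use` whose `input` is already a parsed object rather than a JSON string. The sketch below shows one way to pull it out. `ContentBlockLike` models only the two block types used here, and `findToolInput` is a hypothetical helper.

```typescript
// Hypothetical minimal shapes of the content blocks in a Messages API response.
type ContentBlockLike =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }

// Return the input of the first tool_use block for `toolName`, or undefined if absent.
function findToolInput(content: ContentBlockLike[], toolName: string): unknown {
  for (const block of content) {
    if (block.type === 'tool_use' && block.name === toolName) return block.input
  }
  return undefined
}
```

The tool may be preceded by a text block, which is why the helper scans the whole array instead of reading `content[0]`.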

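Front-end use cases

Scenario 1: Smart form filling

// The user pastes a block of text and the form is filled in automatically
const tools = [{
  type: 'function',
  function: {
    name: 'fill_form',
    description: 'Extract information from text and fill in the form',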
    parameters: {
      type: 'object',
      properties: {
        company: { type: 'string' },
        contact: { type: 'string' },
        phone: { type: 'string' },
        address: { type: 'string' },
        notes: { type: 'string' }
      }
    }
  }
}]

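// User input: "Contact: Manager Zhang 138-xxxx-xxxx, company: XX Technology Co., Ltd., address: Beijing..."
// The AI returns structured data that can be assigned straight to the form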
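
Since none of the `fill_form` fields are required, the model may omit some of them. One way to assign the result is to copy only the fields that were actually extracted, as in this sketch. `ContactForm` and `applyExtracted` are hypothetical names, and the form is treated as a plain state object.

```typescript
interface ContactForm {
  company: string
  contact: string
  phone: string
  address: string
  notes: string
}

const FORM_KEYS: (keyof ContactForm)[] = ['company', 'contact', 'phone', 'address', 'notes']

// Copy only non-empty string fields from the model's arguments into a new form object,
// leaving anything the model did not extract untouched.
function applyExtracted(form: ContactForm, extracted: Record<string, unknown>): ContactForm {
  const next = { ...form }
  for (const key of FORM_KEYS) {
    const value = extracted[key]
    if (typeof value === 'string' && value.trim() !== '') next[key] = value.trim()
  }
  return next
}
```

Returning a new object instead of mutating the input fits reactive front-end state, where replacing the value is what triggers a re-render.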

Scenario 2: Content classification and tagging

const classifyContent = async (content: string) => {
  const tools = [{
    type: 'function',
    function: {
      name: 'classify',
      description: 'Classify the content and assign tags',
      parameters: {
        type: 'object',
        properties: {
          category: {
            type: 'string',
            enum: ['tech', 'product', 'design', 'operations', 'other']
          },
          tags: {
            type: 'array',
            items: { type: 'string' },
            maxItems: 5
          },
          sentiment: {
            type: 'string',
            enum: ['positive', 'neutral', 'negative']
          },
          summary: {
            type: 'string',
            description: 'Summary of 100 characters or fewer'
          }
        },
        required: ['category', 'tags', 'summary']
      }
    }
  }]

  // Call the API...
}

Scenario 3: Data validation and repair

const validateAndFix = async (data: unknown, schema: JSONSchema) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
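        content: `You are a data validation assistant.
Check whether the input data conforms to the schema; if it does not, try to fix it.

Schema:
${JSON.stringify(schema, null, 2)}

Return a JSON object in this format:
{
  "valid": true or false,
  "errors": ["list of errors"],
  "fixed": the repaired data (if it can be repaired)
}`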
      },
      {
        role: 'user',
        content: JSON.stringify(data)
      }
    ]
  })

  // content is typed as string | null, so fall back to an empty object
  return JSON.parse(response.choices[0].message.content ?? '{}')
}

TypeScript type safety

Combine with Zod to get type-safe structured output:

import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'

// Define the schema
const UserSchema = z.object({
  name: z.string(),
  age: z.number().int().positive(),
  email: z.string().email().optional(),
  role: z.enum(['admin', 'user', 'guest'])
})

type User = z.infer<typeof UserSchema>

// Convert to JSON Schema
const jsonSchema = zodToJsonSchema(UserSchema)

// Call the API and validate; callAI stands in for your own wrapper that returns the parsed JSON
const response = await callAI(prompt, jsonSchema)
const parsed = UserSchema.safeParse(response)

if (parsed.success) {
  const user: User = parsed.data // type-safe
} else {
  console.error('Parsing failed:', parsed.error)
}

Error handling

JSON parse failure

function safeParseJSON<T>(text: string, fallback: T): T {
  try {
    return JSON.parse(text)
  } catch (e) {
    // Try to extract a fenced JSON block
    const match = text.match(/```(?:json)?\n?([\s\S]*?)\n?```/)
    if (match) {
      try {
        return JSON.parse(match[1])
      } catch {}
    }
    
    // Try to repair common problems
    const cleaned = text
      .replace(/[\x00-\x1F]+/g, '') // strip control characters
      .replace(/,\s*}/g, '}')       // strip trailing commas
      .replace(/,\s*]/g, ']')
    
    try {
      return JSON.parse(cleaned)
    } catch {
      return fallback
    }
  }
}
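
Models sometimes wrap the JSON in conversational text without any code fence, which neither a direct parse nor a fence regex will catch. As an extra fallback, the sketch below scans for the first balanced object or array. `extractFirstJson` is a hypothetical helper; it does not verify that bracket types match, so its result still needs to go through `JSON.parse`.

```typescript
// Return the first balanced {...} or [...] span in `text`, or null if none is found.
function extractFirstJson(text: string): string | null {
  const start = text.search(/[{\[]/)
  if (start === -1) return null
  let depth = 0
  let inString = false
  let escaped = false
  for (let i = start; i < text.length; i++) {
    const ch = text[i]
    if (inString) {
      // Inside a string literal, only an unescaped quote ends it.
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
      continue
    }
    if (ch === '"') inString = true
    else if (ch === '{' || ch === '[') depth++
    else if (ch === '}' || ch === ']') {
      depth--
      if (depth === 0) return text.slice(start, i + 1)
    }
  }
  // Brackets never closed: the output was probably truncated.
  return null
}
```

Tracking string state matters because braces inside string values would otherwise end the span too early.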

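Schema mismatch

// validate and askAIToFix are assumed helpers: a schema validator and a model call that repairs data
async function validateWithRetry(data: unknown, schema: JSONSchema, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const result = validate(data, schema)
    if (result.valid) return data
    if (attempt === maxRetries) break

    // Ask the AI to fix the data, then validate it again on the next pass
    data = await askAIToFix(data, result.errors)
  }

  throw new Error('Could not obtain valid structured output')
}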
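
The retry loop relies on a `validate` helper that returns `{ valid, errors }`. To make that contract concrete, here is a deliberately tiny sketch that supports only top-level `required` and primitive `type` checks. `MiniSchema` and `ValidationResult` are assumptions made for this example; a real project would more likely use a full JSON Schema validator library or the Zod approach shown earlier.

```typescript
// Hypothetical, deliberately small subset of JSON Schema.
interface MiniSchema {
  type: 'object'
  properties: Record<string, { type: string }>
  required?: string[]
}

interface ValidationResult {
  valid: boolean
  errors: string[]
}

// Check a value against a JSON Schema primitive type name.
function typeMatches(value: unknown, type: string): boolean {
  switch (type) {
    case 'integer':
      return typeof value === 'number' && Number.isInteger(value)
    case 'array':
      return Array.isArray(value)
    case 'object':
      return typeof value === 'object' && value !== null && !Array.isArray(value)
    default:
      return typeof value === type // 'string', 'number', 'boolean'
  }
}

// Collect every problem instead of stopping at the first, so the model sees them all.
function validate(data: unknown, schema: MiniSchema): ValidationResult {
  if (!typeMatches(data, 'object')) return { valid: false, errors: ['root: expected object'] }
  const record = data as Record<string, unknown>
  const errors: string[] = []
  for (const key of schema.required ?? []) {
    if (!(key in record)) errors.push(`${key}: required`)
  }
  for (const key of Object.keys(schema.properties)) {
    const prop = schema.properties[key]
    if (prop && key in record && !typeMatches(record[key], prop.type)) {
      errors.push(`${key}: expected ${prop.type}`)
    }
  }
  return { valid: errors.length === 0, errors }
}
```

Returning the full error list is a design choice that suits the repair step, since the messages can be passed to the model verbatim.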

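Summary

Approach	Best for	Pros	Cons
JSON Mode	Simple structures	Easy to use	Structure is not enforced
Function Calling	Complex structures	Strong schema constraints	Slightly more setup
Prompt guidance	Compatibility across models	Works with any model	Requires post-processing

Key recommendations:

Prefer Function Calling: it gives the most control over structure
Validate with Zod: type safety plus runtime checks
Prepare a fallback: AI output cannot be trusted 100%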
