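A New Operating System? ChatGPT Apps and the Apps SDK (Built on MCP): Unlocking a New Platform

Introduction:

Apps in ChatGPT now let third-party developers build interactive mini-apps that run inside the chat interface. Rather than sending users to a website or mobile app, these apps run within the conversation and use the model's reasoning to drive actions. Early partners such as Canva, Coursera, Expedia and Zillow demonstrated how users can request a playlist, design a poster or search for real estate without leaving ChatGPT[1]. The new Apps SDK is built on the Model Context Protocol (MCP), an open standard that lets models interact with external tools and user interfaces[2]. This post takes a close look at the MCP-based app architecture, explains what the SDK provides, walks through building an app step by step, explores how users discover and use apps, and discusses privacy and security considerations. We cite official documentation and reputable news reports throughout to keep the analysis credible.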

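Understanding the Model Context Protocol (MCP)

Why an Open Standard Matters

The Model Context Protocol is the foundation of the Apps SDK. According to the developer documentation, every Apps SDK integration uses an MCP server to expose tools, handle authentication, and package the structured data and HTML that ChatGPT renders. MCP is an open standard: anyone can implement a server in any language and connect it to models such as GPT-4 or Codex. Its open nature means there is no vendor lock-in; in theory, the same app can run on any AI platform that implements the protocol. This openness encourages community contributions and fosters an ecosystem reminiscent of the early web, where standards such as HTTP made websites interoperable.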

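Servers, Tools and Resources

An MCP server exposes one or more tools. A tool defines an action the model may invoke, such as “create a board”, “search for homes” or “generate a playlist”. Each tool is described by a machine name, a human-friendly title and a JSON schema that tells the model which parameters it accepts. When ChatGPT decides a tool should be called, it sends a structured call to the server. The server runs its logic, whether by querying an API, performing a computation or talking to a database, and then returns a tool response. This response includes three fields:

structuredContent - Data visible to the model that describes the current state. For example, a kanban board might contain arrays of columns and tasks[3].
content - Optional text the assistant relays to the user. It can summarise the result or give the user instructions.
_meta - Hidden metadata that the model cannot see. Developers use it to store IDs or lists used by the UI component. For example, the board example uses a tasksById map in _meta to keep task details without exposing them to the model[4].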
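To make the split between the three fields concrete, here is a minimal sketch in plain TypeScript. It does not use the SDK; the ToolDescriptor and ToolResponse types, the Task shape and the buildBoardResponse helper are hypothetical names chosen for illustration, and only the three response field names come from the description above.

```typescript
// Illustrative types only: the three response field names follow the text,
// everything else is an assumption made for this sketch.
interface ToolDescriptor {
  name: string;        // machine name
  title: string;       // human-friendly title
  inputSchema: object; // JSON Schema describing the accepted parameters
}

interface ToolResponse {
  structuredContent: Record<string, unknown>; // visible to the model
  content?: string;                           // optional text for the user
  _meta?: Record<string, unknown>;            // hidden from the model
}

interface Task {
  id: string;
  title: string;
  column: string;
  notes: string; // detail we do not want the model to see
}

const createBoardTool: ToolDescriptor = {
  name: "createKanbanBoard",
  title: "Create a kanban board",
  inputSchema: {
    type: "object",
    properties: { title: { type: "string" } },
    required: ["title"],
  },
};

// Put only column names and task IDs in structuredContent;
// keep full task details in _meta for the UI component.
function buildBoardResponse(tasks: Task[]): ToolResponse {
  const columnNames = Array.from(new Set(tasks.map((t) => t.column)));
  const columns = columnNames.map((name) => ({
    name,
    taskIds: tasks.filter((t) => t.column === name).map((t) => t.id),
  }));
  const tasksById: Record<string, Task> = {};
  for (const task of tasks) tasksById[task.id] = task;
  return {
    structuredContent: { columns },
    content: `Board with ${tasks.length} tasks in ${columns.length} columns`,
    _meta: { tasksById },
  };
}

const boardResponse = buildBoardResponse([
  { id: "t1", title: "Write spec", column: "Todo", notes: "private note" },
  { id: "t2", title: "Ship", column: "Done", notes: "" },
]);
```

In this sketch the model-facing payload carries only column names and task IDs, while the component can look up titles and notes through _meta.tasksById.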

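Tools can also reference resources, such as HTML templates or images, through ui:// URLs. The server registers these resources at startup. The documentation warns that because resources are cached by OpenAI's infrastructure, developers should version them by including a build hash in the file name[5]. Otherwise users may see a stale UI after a deployment.

Structured Content and Metadata

Distinguishing structuredContent from _meta is essential. According to the documentation, structuredContent is visible to the model and is used to populate the UI component; _meta is hidden from the model and may carry extra data for the UI, such as the list behind a dropdown menu[3]. Separating visible and hidden data lets developers keep sensitive information away from the model while still rendering a rich interface. The design also encourages minimal data sharing: exposing only the data needed to complete the task is in line with privacy principles.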

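Authentication and Sessions

When a user first invokes an app, the server may need to authenticate them. The Apps SDK supports OAuth 2.1 flows; the developer specifies scopes and redirects the user to an identity provider. Once the user grants consent, the app obtains a token and can access the user's data. The server is responsible for managing session state, typically by storing the token in a database keyed to the user's ChatGPT account. This lets subsequent tool calls reuse the session without prompting the user again.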
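The session handling described above could look roughly like the following. This is a sketch under stated assumptions, not the SDK's mechanism: SessionStore, its method names and the in-memory Map are hypothetical, and a real server would keep tokens encrypted in a database rather than in process memory.

```typescript
// Hypothetical in-memory session store keyed by the user's account ID.
interface Session {
  accessToken: string;
  scopes: string[];
  expiresAt: number; // epoch milliseconds
}

class SessionStore {
  private sessions = new Map<string, Session>();

  // Remember the token obtained after the user grants consent.
  save(userId: string, accessToken: string, scopes: string[], ttlMs: number, now: number = Date.now()): void {
    this.sessions.set(userId, { accessToken, scopes, expiresAt: now + ttlMs });
  }

  // Return a live session, or undefined if the user must authenticate again.
  get(userId: string, now: number = Date.now()): Session | undefined {
    const session = this.sessions.get(userId);
    if (!session) return undefined;
    if (session.expiresAt <= now) {
      this.sessions.delete(userId); // expired: drop it and force re-consent
      return undefined;
    }
    return session;
  }
}

const sessionStore = new SessionStore();
sessionStore.save("user-1", "example-token", ["tasks:read"], 1000, 0);
```

A tool handler would call get() first and only start the OAuth flow when it returns undefined, which is what lets later calls reuse the session.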

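Security Principles

OpenAI emphasises least privilege, explicit user consent and defence in depth[6]. Apps should request only the minimum permissions they need, and users must explicitly authorise data sharing; the model itself should never guess credentials. Data retention is limited: structured content is kept only while the user's prompt is active, and logs are redacted before being shared with developers[6]. Network access from app components is restricted by a content security policy; the iframe cannot access arbitrary browser APIs, and all HTTP requests must originate from the server rather than the client[7]. This helps prevent cross-site scripting and token leakage.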
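One way to put least privilege into practice on the server is to declare the scopes each tool needs and check them before running it. The sketch below is an illustration only: the toolScopes table, the scope names such as tasks:read, and both helper functions are hypothetical and not part of the SDK.

```typescript
// Hypothetical mapping from tool name to the OAuth scopes it needs.
const toolScopes: Record<string, string[]> = {
  listTasks: ["tasks:read"],
  createTasks: ["tasks:read", "tasks:write"],
};

// Scopes the tool needs but the user has not granted.
function missingScopes(tool: string, granted: string[]): string[] {
  const required = toolScopes[tool] ?? [];
  return required.filter((scope) => granted.indexOf(scope) === -1);
}

// Scopes the app asks for although no tool uses them (a least-privilege smell).
function unusedScopes(requested: string[]): string[] {
  const needed: string[] = [];
  for (const tool of Object.keys(toolScopes)) {
    for (const scope of toolScopes[tool]) {
      if (needed.indexOf(scope) === -1) needed.push(scope);
    }
  }
  return requested.filter((scope) => needed.indexOf(scope) === -1);
}
```

A server could refuse to execute a tool while missingScopes() is non-empty and fail its own build or review step while unusedScopes() is non-empty.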

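The Apps SDK: Building Real Applications in ChatGPT!

Developer Experience

The Apps SDK wraps MCP in idiomatic client libraries (currently Python and TypeScript) and scaffolding tools. To create an app, you define tools, register UI templates and implement the server logic. The server can run on your own infrastructure with any framework (FastAPI, Express and so on), but it must implement the MCP endpoint. OpenAI provides a development server and an MCP Inspector for testing calls locally.

Developers design both the logic and the user interface. UIs are typically written in React and compiled to static assets, which are served inside a sandboxed iframe in ChatGPT. Inside this iframe, developers can use the global window.openai object to interact with the host. According to the “Build a custom UX” guide, this API provides: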

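Globals – displayMode, maxHeight, theme and locale tell the component about layout and styling[8].
Tool payloads – toolInput, toolOutput and widgetState give access to the arguments, the results and state that persists across renders[8].
Actions – setWidgetState() saves state that persists across messages; callTool() triggers a server action; sendFollowupTurn() sends a follow-up prompt to the model; requestDisplayMode() requests fullscreen or picture-in-picture mode[8].
Events – components can subscribe to openai:set_globals, fired when the host updates the layout or theme, and to openai:tool_response, fired when a tool call resolves[8].

These APIs let developers build rich interactive components that stay in sync with the model's reasoning. For example, if a user drags a task to a new column on a kanban board, the component can issue a callTool to the server, which persists the new state and returns fresh structuredContent. Meanwhile, the model sees only the high-level board state, while the UI handles details such as drag and drop.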
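The drag-and-drop flow can be sketched without a browser by standing in for the host. Everything here is an assumption made for illustration: the OpenAiHost interface is a reduced guess at the shape of window.openai, moveTask is a hypothetical tool name, and createMockHost plays the role of both ChatGPT and the server.

```typescript
interface BoardState {
  columns: { name: string; taskIds: string[] }[];
}

// Reduced, assumed shape of the host API used by this sketch.
interface OpenAiHost {
  toolOutput: BoardState;
  callTool(name: string, args: Record<string, unknown>): Promise<BoardState>;
}

// Hypothetical stand-in for the host and the server behind it.
function createMockHost(initial: BoardState): OpenAiHost {
  const host: OpenAiHost = {
    toolOutput: initial,
    async callTool(name, args) {
      if (name !== "moveTask") throw new Error(`unknown tool: ${name}`);
      const taskId = String(args.taskId);
      const toColumn = String(args.toColumn);
      // Remove the task from every column, then append it to the target.
      const columns = host.toolOutput.columns.map((c) => ({
        name: c.name,
        taskIds: c.taskIds.filter((id) => id !== taskId),
      }));
      const target = columns.find((c) => c.name === toColumn);
      if (!target) throw new Error(`unknown column: ${toColumn}`);
      target.taskIds.push(taskId);
      host.toolOutput = { columns };
      return host.toolOutput;
    },
  };
  return host;
}

// What a drop handler in the component might do: delegate to the server
// and render whatever board state comes back.
async function onTaskDropped(host: OpenAiHost, taskId: string, toColumn: string): Promise<BoardState> {
  return host.callTool("moveTask", { taskId, toColumn });
}

const mockHost = createMockHost({
  columns: [
    { name: "Todo", taskIds: ["t1", "t2"] },
    { name: "Done", taskIds: [] },
  ],
});
```

The component never mutates the board itself in this sketch; it sends the intent to the server and treats the returned state as the source of truth.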

Registering Tools and Templates

In the server code you register a tool and its template. For instance, in a TypeScript server you might write:

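import { z } from "zod";
import { Tool, StructuredToolResponse } from "@openai/apps";

// `server` and `buildHtml()` are assumed to be defined elsewhere in this module.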

// Register UI template
server.registerResource("ui://kanban-board/abc123", buildHtml());

// Define tool schema
const createBoard: Tool = {
  name: "createKanbanBoard",
  description: "Create a new kanban board with given tasks and columns",
  inputSchema: z.object({
    title: z.string(),
    columns: z.array(z.object({ name: z.string() })),
    tasks: z.array(z.object({ name: z.string(), columnIndex: z.number() }))
  }),
  async execute(input, ctx): Promise<StructuredToolResponse> {
    // compute board state
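    const columns = input.columns.map((col, i) => ({
      id: i,
      title: col.name,
      // keep each task's index in input.tasks so the IDs match tasksById
      taskIds: input.tasks
        .map((t, idx) => ({ t, idx }))
        .filter(({ t }) => t.columnIndex === i)
        .map(({ idx }) => idx)
    }));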
    const tasksById = input.tasks.map((task, id) => ({ id, name: task.name }));
    return {
      content: `Created board '${input.title}'`,
      structuredContent: { title: input.title, columns },
      _meta: { tasksById, uiTemplate: "ui://kanban-board/abc123" }
    };
  }
};

The _meta field includes tasksById for hidden metadata and uiTemplate referencing the registered HTML. When ChatGPT receives this response, it will render the template with the structured content. The window.openai.toolOutput object in the component can then read the board data and display it.

Versioning and Caching

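Because resources such as UI templates are cached on OpenAI's servers, developers should include a unique hash or version in the ui:// identifier. The documentation warns that if you deploy a new version without updating the path, users may keep seeing the old UI because of caching. A good practice is to embed the commit SHA or build ID in the URL, so that every deployment produces a new resource.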
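As a rough illustration of cache busting, the sketch below derives the version segment from the template's content. The helper names are hypothetical, and the small FNV-1a hash is used only to keep the example dependency-free; a real build would more likely embed the commit SHA or a cryptographic digest.

```typescript
// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// A stand-in for a commit SHA or build ID in this sketch.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Build a ui:// identifier that changes whenever the template changes.
function versionedTemplateUri(appName: string, html: string): string {
  return `ui://${appName}/${fnv1a32(html)}`;
}

const uriV1 = versionedTemplateUri("task-tracker", "<div>v1</div>");
const uriV2 = versionedTemplateUri("task-tracker", "<div>v2</div>");
```

The same string would be used both when registering the resource and when referencing it from a tool response, so the two cannot drift apart after a deployment.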

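Persisting State and Follow-up Actions

Components often need to persist state. For example, a playlist app might let users favourite songs; those favourites should survive even after the user asks another question. The setWidgetState() method stores data outside structuredContent and persists it across conversation turns. The model does not see this state, which preserves privacy.

Sometimes an app needs to ask the user a clarifying question. The sendFollowupTurn() method lets a component send a new prompt back to ChatGPT, which then appears in the transcript as a question from the model. This is useful for multi-step workflows: a travel booking app, for instance, might ask “How many nights will you stay?” after the user picks a hotel.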
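Both ideas can be sketched against a recording stand-in for the host. The method names setWidgetState and sendFollowupTurn come from the text above; their argument shapes, the synchronous signatures, and the helpers createRecordingHost, toggleFavourite and askForNights are assumptions made for this illustration.

```typescript
interface FavouritesState {
  favourites: string[];
}

// Assumed, simplified shape of the two host methods used here.
interface FollowupHost {
  widgetState: FavouritesState | null;
  setWidgetState(state: FavouritesState): void;
  sendFollowupTurn(args: { prompt: string }): void;
}

// Hypothetical stand-in that records what the component sends to the host.
function createRecordingHost(): FollowupHost & { followups: string[] } {
  const host: FollowupHost & { followups: string[] } = {
    widgetState: null,
    followups: [],
    setWidgetState(state) {
      host.widgetState = state;
    },
    sendFollowupTurn(args) {
      host.followups.push(args.prompt);
    },
  };
  return host;
}

// Toggle a song and persist the new list so it survives later turns.
function toggleFavourite(host: FollowupHost, songId: string): string[] {
  const current = host.widgetState ? host.widgetState.favourites : [];
  const next =
    current.indexOf(songId) >= 0
      ? current.filter((id) => id !== songId)
      : current.concat(songId);
  host.setWidgetState({ favourites: next });
  return next;
}

// After the user picks a hotel, hand the conversation back to the model.
function askForNights(host: FollowupHost, hotelName: string): void {
  host.sendFollowupTurn({
    prompt: `The user selected ${hotelName}. Ask how many nights they will stay.`,
  });
}

const recordingHost = createRecordingHost();
```

Keeping the state update in one function makes it easy to restore the component from widgetState when it is rendered again.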

Building Your First App: Step‑By‑Step Guide

In this section we will build a simple Task Tracker app that demonstrates the core concepts of the Apps SDK. The app will let a user create tasks and organise them into categories. We choose this example because it is generic, easy to extend and showcases structured content, metadata, custom UI and tool calls.

Set up the MCP Server

First install the TypeScript SDK and scaffolding tool:

npm install -g @openai/apps-generator
apps init task-tracker
cd task-tracker
npm install

This command scaffolds a project with a server, a React frontend and build scripts. The server uses Express and the @openai/apps library. Run npm run dev to start the development server; the project includes an MCP Inspector that opens in your browser and simulates ChatGPT calling your app.

Define the Tool

Open src/server.ts and define a tool called createTasks. The tool accepts an array of tasks and returns structured content grouping them by category. It also provides a summary in the content field.

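import fs from "fs";
import path from "path";
import { z } from "zod";
import { Tool, StructuredToolResponse } from "@openai/apps";

// `server` is assumed to be the MCP server instance created by the scaffold.
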
export const createTasks: Tool = {
  name: "createTasks",
  description: "Create a list of tasks grouped by category",
  inputSchema: z.object({ tasks: z.array(z.object({ name: z.string(), category: z.string() })) }),
  async execute({ tasks }): Promise<StructuredToolResponse> {
    const categories = Array.from(new Set(tasks.map(t => t.category)));
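    const grouped = categories.map(category => ({
      name: category,
      // use each task's index in the original array so IDs line up with tasksById
      taskIds: tasks
        .map((t, id) => ({ t, id }))
        .filter(({ t }) => t.category === category)
        .map(({ id }) => id)
    }));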
    const tasksById = tasks.map((task, id) => ({ id, name: task.name, category: task.category }));
    return {
      content: `Created ${tasks.length} tasks in ${categories.length} categories`,
      structuredContent: { categories: grouped },
      _meta: { tasksById, uiTemplate: "ui://task-tracker/1.0.0" }
    };
  }
};

server.registerResource("ui://task-tracker/1.0.0", fs.readFileSync(path.join(__dirname, "../dist/index.html"), "utf8"));
server.registerTool(createTasks);

Build the Custom UI

Next open src/frontend/App.tsx. This React component will read the structuredContent and display categories and tasks. It will also allow users to mark tasks as complete and persist that state using setWidgetState.

import { useEffect, useState } from "react";

declare global {
  interface Window {
    openai: any;
  }
}



export default function App() {
  const [complete, setComplete] = useState<{ [id: string]: boolean }>(() => window.openai.widgetState?.complete || {});
  const output = window.openai.toolOutput;
  const tasksById = output?._meta?.tasksById || [];
  const categories = output?.structuredContent?.categories || [];

  // persist completion state
  useEffect(() => {
    window.openai.setWidgetState({ complete });
  }, [complete]);

  return (
    <div className="task-tracker">
      {categories.map((cat: any, ci: number) => (
        <div key={ci} className="category">
          <h3>{cat.name}</h3>
          <ul>
            {cat.taskIds.map((tid: number) => (
              <li key={tid}>
                <label>
                  {/* coerce to boolean so the checkbox is always controlled */}
                  <input type="checkbox" checked={!!complete[tid]} onChange={() => setComplete(prev => ({ ...prev, [tid]: !prev[tid] }))} />
                  {tasksById[tid]?.name}
                </label>
              </li>
            ))}
          </ul>
        </div>
      ))}
    </div>
  );
}

This component uses window.openai.toolOutput to access the structuredContent and _meta fields. It stores completion state in widgetState so that checking a box persists even when the user continues the conversation. On subsequent tool calls, the component can fetch new tasks or update existing ones. This demonstrates how to combine model reasoning with client‑side interactions.

Testing and Iterating

Run npm run dev again and open the MCP Inspector. In the prompt area, type:

@task‑tracker create a list of tasks: buy milk in shopping, finish report in work, call mom in personal

The inspector will show the structured content and render the task list UI. You can check tasks off; the state persists across turns. You can then ask ChatGPT: “Remind me of my tasks later.” Because the model retains context, it can call the tool again, display the UI and summarise your progress.
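Alongside the inspector, the grouping logic can be checked in isolation. The sketch below restates that logic as a plain function with no SDK dependency, using each task's position in the input array as its ID; groupByCategory and its types are hypothetical names, and the extra “buy eggs” task is added only to exercise a category with more than one entry.

```typescript
interface TaskInput {
  name: string;
  category: string;
}

interface IndexedTask extends TaskInput {
  id: number; // position in the original input array
}

interface Category {
  name: string;
  taskIds: number[];
}

// Group tasks by category while keeping IDs that index into tasksById.
function groupByCategory(tasks: TaskInput[]): { categories: Category[]; tasksById: IndexedTask[] } {
  const tasksById = tasks.map((task, id) => ({ id, name: task.name, category: task.category }));
  const names = Array.from(new Set(tasks.map((t) => t.category)));
  const categories = names.map((name) => ({
    name,
    taskIds: tasksById.filter((t) => t.category === name).map((t) => t.id),
  }));
  return { categories, tasksById };
}

const grouped = groupByCategory([
  { name: "buy milk", category: "shopping" },
  { name: "finish report", category: "work" },
  { name: "call mom", category: "personal" },
  { name: "buy eggs", category: "shopping" },
]);

// Every ID listed under a category must point at a task in that category.
const mismatches = grouped.categories.filter((cat) =>
  cat.taskIds.some((id) => grouped.tasksById[id].category !== cat.name)
);
```

If mismatches is non-empty, the UI would show tasks under the wrong heading, which is the kind of error that is easier to catch here than by eye in the rendered list.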

How Users Discover and Use Apps

Named Mention and In‑Conversation Discovery

ChatGPT surfaces apps when it believes they can assist the user. There are two primary discovery modes. Named mention occurs when the user explicitly mentions the app name at the beginning of a prompt; in this case, the app will be surfaced automatically[9]. For instance, “@Spotify create a workout playlist” immediately invokes the Spotify integration. The user must place the app name at the start; otherwise the assistant may treat it as part of the conversation.

In‑conversation discovery happens when the model infers that an app could help based on context. The documentation explains that the model evaluates the conversation context, prior tool results and the user’s linked apps to determine which app might be relevant[9]. For example, if you are discussing travel plans, ChatGPT might suggest the Expedia app to book flights. The algorithm uses metadata like tool descriptions and keywords to match the conversation with potential actions[10]. Developers can improve discoverability by writing action‑oriented descriptions and clear UI component names.
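The two discovery modes can be illustrated with a toy model. This is not how ChatGPT actually selects apps, which the documentation does not spell out; parseNamedMention and keywordScore are hypothetical helpers that only show why a leading app name and action-oriented descriptions matter.

```typescript
// Named mention: the prompt must start with @AppName for a linked app.
function parseNamedMention(prompt: string, linkedApps: string[]): string | null {
  const match = /^@([\w-]+)/.exec(prompt.trim());
  if (!match) return null;
  const name = match[1].toLowerCase();
  const found = linkedApps.find((app) => app.toLowerCase() === name);
  return found ?? null;
}

// Toy relevance score: count description words that also appear in the
// conversation, ignoring short words. Purely illustrative.
function keywordScore(conversation: string, description: string): number {
  const seen = new Set(
    conversation.toLowerCase().split(/\W+/).filter((w) => w.length > 3)
  );
  return description.toLowerCase().split(/\W+/).filter((w) => seen.has(w)).length;
}

const linked = ["Spotify", "Expedia"];
const travelScore = keywordScore("I want to book flights to Rome", "Search and book flights and hotels");
const musicScore = keywordScore("I want to book flights to Rome", "Create a workout playlist");
```

In this toy model a description that names concrete actions (“book flights”) overlaps with what users actually say, which is the intuition behind writing action-oriented tool descriptions.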

Directory and Launcher

OpenAI plans to release an app directory where users can browse and discover new apps[10]. Each listing will include the app name, description, supported prompts and any onboarding instructions. Users can also access the launcher via the “+” button in chat; this shows a menu of available apps based on context. These entry points will help less technical users find and enable apps without memorising names.

Onboarding and Consent

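The first time a user activates an app, ChatGPT starts an onboarding flow. The model asks the user to connect their account (if required) and explains what data the app needs. The developer guidelines stress that apps must respect user privacy, behave predictably and have clear policies[11]. Users must explicitly allow or deny permissions; there is no silent data access. Once connected, an app can stay linked for later interactions, but users can always disconnect it and revoke its permissions.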

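Privacy, Security and Responsible Design

Principles for Trustworthy Apps

OpenAI's app developer guidelines define several principles intended to keep the ecosystem safe and trustworthy. Apps must provide a legitimate service, have a clear privacy policy and data retention practices, and comply with the usage policies[11]. They should minimise data collection, avoid storing sensitive personal information, and must not share user data without consent[12]. Apps must behave predictably; they may not manipulate the model into producing harmful or misleading content.

Data Boundaries and Minimisation

The guidelines stress that apps should collect only the data their functionality requires and must not request or store sensitive data such as health records or government IDs[12]. Structured content sent to the model should not contain secrets, and hidden metadata should not store user tokens or private information. Developers must apply strong encryption and secure storage to any tokens obtained through OAuth. The server should strictly maintain boundaries between user sessions; one user's data must never leak into another user's context.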
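A simple guard against leaking secrets into a tool response is to scan the payload before returning it. The following is a heuristic sketch, not a complete defence: the key pattern and the findSensitiveKeys helper are hypothetical, and matching on key names will miss secrets stored under innocuous names.

```typescript
// Key names that suggest a secret; adjust to your own data model.
const SENSITIVE_KEY = /(token|secret|password|ssn|api[_-]?key)/i;

// Recursively collect dotted paths whose key name looks sensitive.
function findSensitiveKeys(value: unknown, path: string = ""): string[] {
  if (value === null || typeof value !== "object") return [];
  const found: string[] = [];
  const obj = value as Record<string, unknown>;
  for (const key of Object.keys(obj)) {
    const here = path ? `${path}.${key}` : key;
    if (SENSITIVE_KEY.test(key)) found.push(here);
    found.push(...findSensitiveKeys(obj[key], here));
  }
  return found;
}

const suspicious = findSensitiveKeys({
  structuredContent: { title: "Trip plan" },
  _meta: { accessToken: "abc", items: [{ apiKey: "k" }] },
});
```

A server could run this check in tests or just before sending a response and refuse to return anything for which the result is non-empty.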

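Security Measures in the SDK

The security and privacy guide outlines the defences built into the platform. It names least privilege and explicit user consent as core principles[6]. Data retention is limited: logs available to developers are redacted to remove personally identifiable information, and structured content is kept only as long as the prompt requires[6]. Network access from inside the iframe is restricted by a content security policy; external requests must go through the server, which prevents unauthorised cross-origin requests[7]. Authentication uses industry-standard OAuth flows with short-lived tokens. Developers are expected to carry out security reviews, provide a vulnerability reporting channel and monitor for incidents to stay operationally ready[7].

Fairness and Appropriateness

Apps must be suitable for a broad audience. The guidelines prohibit apps that deliver long-form content, complex automation or advertising[13]. For example, an app should not try to serve a 30-minute video inside ChatGPT or replicate an entire social network. The platform encourages concise interactions that complement the flow of conversation. Violations can lead to rejection or removal.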

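Opportunities and Considerations

A New Distribution Channel for Developers

By opening ChatGPT to third-party apps, OpenAI positions itself as an “intent layer” between users and services. Developers can now reach millions of users through the chat interface without building a separate web or mobile app. Apps have the potential to reduce friction: users simply mention the service in conversation instead of downloading an app or visiting a website. This could democratise access to tools and level the playing field for smaller developers.

Early partnerships show what is possible: users can ask ChatGPT questions while watching a Coursera lecture; design a poster in Canva; browse travel options from Expedia or real estate listings from Zillow; generate a Spotify playlist; or sketch ideas with Figma[14][13]. Because these apps run inside the chat, the model can summarise, analyse and generate recommendations, turning static content into an interactive lesson. The apps also offer several display modes (inline card, fullscreen or picture-in-picture), giving flexibility for different tasks[15].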

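Changing User Expectations

The ability to use apps without switching context may change how people interact with services. ChatGPT becomes not just a chatbot but a general-purpose operating system for intent. As Casey Newton observes, this moves us from launching separate apps to simply stating what we need[16]. Some analysts compare the shift to the launch of the App Store or the browser: a platform that aggregates functionality and competition.

This shift also raises questions about control and power, however. If ChatGPT decides which apps to surface, it may become a gatekeeper. Newton warns that an “AI graph” built from user preferences could carry more serious privacy risks than social networks[16]. Economic incentives could lead to paid placement or app rankings. Developers may feel forced to design for ChatGPT rather than own the relationship with their users. It is essential that the platform remain transparent and fair to sustain trust.

Regulatory and Ethical Implications

Because apps can access personal data such as location, contacts and payment methods, regulators are likely to scrutinise how data flows through ChatGPT. Even though the platform has not yet launched in the EU, developers must still comply with privacy laws such as GDPR. OpenAI has promised more granular privacy controls and monetisation options, including an agentic commerce protocol that would allow instant checkout within the chat. The success of this ecosystem will depend on strong security, clear user consent and a fair economic model.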

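Future Directions and Research

The Apps SDK is still in preview, and many features are yet to be completed. The developer roadmap includes:

Submission and review process – Developers can currently build apps but cannot list them publicly. A formal review process will ensure compliance with the guidelines and build trust.
Revenue sharing and monetisation – OpenAI has hinted at an agentic commerce protocol that could let users buy goods directly in the chat. This opens opportunities for e-commerce but also raises questions about fees, ranking and competition.
Developer tooling – More languages and frameworks, better debugging tools and simpler deployment pipelines will lower the barrier to entry. MCP's status as an open standard may lead to community-driven implementations and hosting providers.
Interoperability – Because MCP is open, other platforms or models can adopt it. This could enable a cross-model app ecosystem in which developers “write once, run anywhere”. Research on standardising agent protocols and context sharing will be important.
Safety research – Assessing how to prevent prompt injection, malicious code or misuse of user data remains a major research area. Papers on adversarial attacks against LLM-integrated applications will inform best practices and guidelines.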

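Conclusion: A New Operating System Takes Shape

The introduction of apps in ChatGPT and the MCP-based Apps SDK marks a significant shift in how we interact with software. By bringing third-party apps directly into the chat interface, OpenAI has created a new platform that blends natural language, reasoning and interactive user interfaces. The Model Context Protocol gives models an open, standardised way to call tools and render components; the Apps SDK simplifies development by handling server communication, UI integration and state management. Step-by-step examples such as the Task Tracker show how a useful app can be built with little effort while keeping strict data boundaries and privacy.

This innovation also brings responsibility, however. Developers must follow guidelines that prioritise user privacy, security and fairness[11][12]. Security mechanisms such as least privilege and explicit consent protect users[6]. At the same time, industry observers warn that the platform could create new gatekeeping and privacy risks[16]. As the ecosystem matures, transparency, open standards and community participation will determine whether ChatGPT's app platform becomes a transformative, trustworthy layer for everyday tasks.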

[1] The latest in the AI arms race: ChatGPT now lets users connect Spotify and Zillow in chats

https://www.forbes.com/sites/antoniopequenoiv/2025/10/06/openais-chatgpt-now-connects-with-third-party-apps-like-spotify-and-zillow-heres-the-latest-in-the-ai-arms-race/

[2] [3] [4] [5] Set up your server

https://developers.openai.com/apps-sdk/build/mcp-server

[6] [7] Security and privacy

https://developers.openai.com/apps-sdk/guides/security-privacy

[8] Build a custom UX

https://developers.openai.com/apps-sdk/build/custom-ux

[9] [10] User interaction

https://developers.openai.com/apps-sdk/concepts/user-interaction

[11] [12] App developer guidelines

https://developers.openai.com/apps-sdk/app-developer-guidelines/