A New Operating System? Apps in ChatGPT and the MCP-Based Apps SDK: Unlocking a New Platform

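Introduction

Apps in ChatGPT now let third-party developers build interactive mini-apps that run embedded in the chat interface. Instead of sending users to a website or mobile app, these apps run inside the conversation and use the model's reasoning to drive actions. Early partners such as Canva, Coursera, Expedia, Spotify and Zillow show how users can ask for a playlist, design a poster or search for real estate without leaving ChatGPT[1]. The new Apps SDK is built on the Model Context Protocol (MCP), an open standard that lets models interact with external tools and user interfaces[2]. This post examines the architecture of MCP-based apps, explains what the SDK can do, walks through building an app step by step, explores how users discover and use apps, and discusses privacy and security considerations. Throughout, we cite official documentation and reputable news coverage to support the analysis.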

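Understanding the Model Context Protocol (MCP)

Why an Open Standard Matters

The Model Context Protocol is the foundation of the Apps SDK. According to the developer documentation, every Apps SDK integration uses an MCP server to expose tools, handle authentication, and package the structured data and HTML that ChatGPT renders[2]. MCP is an open standard: anyone can implement a server in any language and connect it to models such as GPT‑4 or Codex. Because the protocol is open, it reduces vendor lock-in. In principle, the same server could work with any AI platform that implements MCP, although ChatGPT-specific UI features such as the window.openai API may not carry over. This openness encourages community contributions and could foster an ecosystem like the early web, when standards such as HTTP made websites interoperable.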

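Servers, Tools and Resources

An MCP server exposes one or more tools. A tool defines an action the model may call, such as "create board", "search homes" or "generate playlist". Each tool is described by three things:

- a machine name,
- a human-readable title, and
- a JSON Schema that tells the model which parameters it accepts.

When ChatGPT decides to call a tool, it sends a structured call to the server. The server runs the logic, such as querying an API, performing a computation or reading and writing a database. It then returns a tool response with three fields: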

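structuredContent – data that shows the model the current state. For example, a Kanban board might include arrays of columns and tasks[3].
content – optional text the assistant can use in its reply, for example to summarise the result or guide the user.
_meta – metadata hidden from the model. Developers use it to store IDs or lists for UI components. For example, the board example keeps a tasksById map in _meta, so the UI has the task details without exposing them to the model[4].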
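The three fields can be captured as a TypeScript type. The sketch below is illustrative only:

- ToolResponse and buildBoardResponse are hypothetical names.
- content is modelled as a plain string to match the description above. In the MCP specification itself, content is an array of typed content blocks.
- The official SDK types may differ.

```typescript
// Hypothetical response shape with the three fields described above.
interface ToolResponse<S, M> {
  structuredContent: S; // visible to the model; also feeds the UI
  content?: string;     // optional text for the assistant's reply
  _meta?: M;            // hidden from the model; extra data for the component
}

interface BoardState {
  columns: { id: number; title: string; taskIds: number[] }[];
}

interface BoardMeta {
  tasksById: Record<number, { id: number; name: string; notes: string }>;
}

// The model sees column titles and task IDs; task names and notes travel in _meta.
function buildBoardResponse(
  columns: string[],
  tasks: { name: string; notes: string; column: number }[]
): ToolResponse<BoardState, BoardMeta> {
  const tasksById: BoardMeta["tasksById"] = {};
  tasks.forEach((t, id) => {
    tasksById[id] = { id, name: t.name, notes: t.notes };
  });
  return {
    structuredContent: {
      columns: columns.map((title, i) => ({
        id: i,
        title,
        // A task's ID is its index in `tasks`
        taskIds: tasks.flatMap((t, id) => (t.column === i ? [id] : [])),
      })),
    },
    content: `Board with ${columns.length} columns and ${tasks.length} tasks`,
    _meta: { tasksById },
  };
}

const res = buildBoardResponse(["To do", "Done"], [
  { name: "Write spec", notes: "draft v2", column: 0 },
  { name: "Ship", notes: "", column: 1 },
  { name: "Review", notes: "ask Ana", column: 0 },
]);
```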

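Tools can also reference resources, such as HTML templates or images, through ui:// URLs. The server registers these resources at startup. OpenAI's infrastructure caches resources, so the documentation advises versioning them by including a version hash in the file name[5]. Otherwise, users may see a stale UI after a deployment.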

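Structured Content vs. Metadata

The distinction between structuredContent and _meta is crucial. According to the documentation, structured content is visible to the model and populates UI components. _meta is hidden from the model and can carry extra data for the UI, such as the items in a dropdown menu[3]. Separating visible from hidden data lets developers keep sensitive information away from the model while still rendering a rich interface. The design also encourages minimal data sharing: in line with privacy principles, expose only the information the task needs.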
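One way to apply this minimisation is an allowlist. Only fields you name explicitly reach structuredContent; everything else defaults to _meta. The helper below is a hypothetical sketch, and the listing fields are invented for illustration. Keep in mind that _meta is hidden from the model but is still delivered to the component, so it is not a place for secrets either.

```typescript
// Hypothetical helper: partition a record into model-visible and hidden parts.
// Only allowlisted keys reach structuredContent; every other key goes to _meta.
function splitForModel<T extends Record<string, unknown>>(
  record: T,
  visibleKeys: readonly (keyof T)[]
): { structuredContent: Partial<T>; _meta: Partial<T> } {
  const structuredContent: Partial<T> = {};
  const _meta: Partial<T> = {};
  for (const key of Object.keys(record) as (keyof T)[]) {
    if (visibleKeys.includes(key)) {
      structuredContent[key] = record[key];
    } else {
      _meta[key] = record[key];
    }
  }
  return { structuredContent, _meta };
}

// Invented example record for a real-estate search result
const listing = {
  address: "12 Elm St",
  price: 450000,
  internalId: "lst_8831",
  photoUrls: ["a.jpg", "b.jpg"],
};
const { structuredContent, _meta } = splitForModel(listing, ["address", "price"]);
```

Defaulting to hidden means a field added to the record later stays out of the model's view until someone deliberately allows it.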

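Authentication and Sessions

When a user first invokes an app, the server may need to authenticate them. The Apps SDK supports OAuth 2.1 flows: the developer specifies scopes and redirects the user to an identity provider. Once the user grants consent, the app obtains a token and can access the user's data. Managing session state is the server's job. Typically the server stores tokens in a database keyed by the user's ChatGPT account, so that later tool calls can reuse the session without prompting the user again.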
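Below is a minimal sketch of that session bookkeeping. It assumes the server already has a stable user identifier for each ChatGPT account; how it obtains one is outside this sketch. SessionStore is a hypothetical in-memory class. A production server would encrypt tokens at rest and keep them in a database.

```typescript
// Hypothetical token record: one per user, with an absolute expiry time.
interface TokenRecord {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory session store keyed by user ID.
class SessionStore {
  private sessions = new Map<string, TokenRecord>();

  save(userId: string, accessToken: string, ttlMs: number, now = Date.now()): void {
    this.sessions.set(userId, { accessToken, expiresAt: now + ttlMs });
  }

  // Returns the token while it is valid. Expired tokens are dropped, so the
  // caller knows to restart the OAuth flow.
  get(userId: string, now = Date.now()): string | undefined {
    const rec = this.sessions.get(userId);
    if (!rec) return undefined;
    if (rec.expiresAt <= now) {
      this.sessions.delete(userId);
      return undefined;
    }
    return rec.accessToken;
  }

  // Called when the user disconnects the app.
  revoke(userId: string): void {
    this.sessions.delete(userId);
  }
}

const store = new SessionStore();
store.save("user-a", "tok-a", 60_000, 0);
```

Passing `now` explicitly keeps the expiry logic deterministic in tests. Because lookups are keyed by user ID, one user's token is never returned for another user.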

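Security Principles

OpenAI emphasises least privilege, explicit user consent and defence in depth[6]. Apps should request only the minimum permissions they need. Users must explicitly authorise data sharing, and the model itself should never guess credentials.

Data retention is limited: structured content exists only while the user's prompt is active, and logs are redacted before they are shared with developers[6].

Network access from app components is restricted by a Content Security Policy. The iframe cannot access arbitrary browser APIs, and all HTTP requests must originate from the server rather than the client[7]. This helps prevent cross-site scripting and token exfiltration.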
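Least privilege can be made mechanical: derive the OAuth scopes an app requests from the tools it actually enables. The tool names and scope strings below are hypothetical; real scope names come from your identity provider.

```typescript
// Hypothetical mapping from each tool to the OAuth scopes it needs.
const toolScopes: Record<string, string[]> = {
  searchHomes: ["listings.read"],
  saveHome: ["listings.read", "favorites.write"],
  scheduleTour: ["calendar.write"],
};

// Request only the scopes needed by the enabled tools. Scopes are
// deduplicated and sorted so the consent screen stays stable.
function scopesFor(enabledTools: string[]): string[] {
  const scopes = new Set<string>();
  for (const tool of enabledTools) {
    const needed = toolScopes[tool];
    if (!needed) throw new Error(`Unknown tool: ${tool}`);
    needed.forEach((s) => scopes.add(s));
  }
  return Array.from(scopes).sort();
}
```

An app that only searches and saves homes never asks for calendar access.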

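Apps SDK: Building Real Apps in ChatGPT

The Developer Experience

The Apps SDK wraps MCP in idiomatic libraries (currently Python and TypeScript) and scaffolding tools. To create an app, you define tools, register UI templates and implement the server logic. The server runs on your own infrastructure and can use any framework, such as FastAPI or Express, as long as it implements the MCP endpoints. For local testing, you can run a development server and exercise tool calls with the MCP Inspector, a tool from the MCP project.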
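Under the hood, MCP messages are JSON-RPC 2.0. Tools are listed and invoked through the tools/list and tools/call methods. The dispatcher below is a simplified, hand-rolled sketch with a hypothetical echo tool. It skips the initialize handshake, transports and schema validation that the official MCP SDKs handle for you.

```typescript
// Simplified sketch of MCP tool handling over JSON-RPC 2.0 (not production code).
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { [k: string]: Json };
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: any; // method-specific payload
  error?: { code: number; message: string };
}

interface ToolDef {
  name: string;
  description: string;
  inputSchema: { [k: string]: Json }; // a JSON Schema object
  run(args: { [k: string]: Json }): { text: string; structuredContent?: { [k: string]: Json } };
}

// Hypothetical example tool
const tools: ToolDef[] = [
  {
    name: "echo",
    description: "Echo back a message",
    inputSchema: { type: "object", properties: { message: { type: "string" } }, required: ["message"] },
    run: (args) => ({ text: String(args.message), structuredContent: { echoed: args.message } }),
  },
];

function handle(req: JsonRpcRequest): JsonRpcResponse {
  const { id } = req;
  if (req.method === "tools/list") {
    // Advertise each tool's name, description and input schema
    const list = tools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema }));
    return { jsonrpc: "2.0", id, result: { tools: list } };
  }
  if (req.method === "tools/call") {
    const tool = tools.find((t) => t.name === req.params?.name);
    if (!tool) return { jsonrpc: "2.0", id, error: { code: -32602, message: "Unknown tool" } };
    const rawArgs = req.params?.arguments;
    const args: { [k: string]: Json } =
      rawArgs && typeof rawArgs === "object" && !Array.isArray(rawArgs) ? rawArgs : {};
    const out = tool.run(args);
    // The result carries text content blocks plus optional structuredContent
    return {
      jsonrpc: "2.0",
      id,
      result: { content: [{ type: "text", text: out.text }], structuredContent: out.structuredContent },
    };
  }
  return { jsonrpc: "2.0", id, error: { code: -32601, message: "Method not found" } };
}
```

The error codes follow JSON-RPC conventions: -32601 for an unknown method and -32602 (invalid params) for an unknown tool name.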

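Developers design both the logic and the user interface. The UI is usually written in React and compiled into static assets, which ChatGPT serves inside a sandboxed iframe. Inside this iframe, the component can use the global window.openai object to interact with the host. According to the "Build a custom UX" guide, this API provides:

Globals - displayMode, maxHeight, theme and locale give the component layout and styling information [8].
Tool payloads - toolInput, toolOutput and widgetState let the component read the tool's arguments, its results, and state persisted between renders [8].
Actions - setWidgetState() stores state that persists across messages; callTool() triggers a server action; sendFollowupTurn() sends a follow-up prompt to the model; requestDisplayMode() requests fullscreen or picture-in-picture mode [8].
Events - the component can subscribe to openai:set_globals, fired when the host updates layout or theme, and to openai:tool_response, fired when a tool call resolves [8].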
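Replacing the `any` type with a typed view of these members catches typos at compile time. The interface below mirrors only the members listed above. It is a sketch, not the official type definitions:

- The value sets for displayMode and theme are assumptions.
- The argument shapes of the actions are also assumptions.
- layoutFor and onGlobalsChange are hypothetical helpers.

```typescript
// Assumed shapes for the host API members listed above.
type DisplayMode = "inline" | "fullscreen" | "pip";

interface OpenAiGlobals {
  displayMode: DisplayMode;
  maxHeight: number;
  theme: "light" | "dark";
  locale: string;
}

interface OpenAiHost extends OpenAiGlobals {
  toolInput: unknown;
  toolOutput: unknown;
  widgetState: unknown;
  setWidgetState(state: unknown): Promise<void>;
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
  sendFollowupTurn(args: { prompt: string }): Promise<void>;
  requestDisplayMode(args: { mode: DisplayMode }): Promise<unknown>;
}

// Hypothetical helper: inline cards are clamped to maxHeight; fullscreen and
// picture-in-picture use the content's own height.
function layoutFor(globals: OpenAiGlobals, contentHeight: number): { height: number; className: string } {
  const height = globals.displayMode === "inline" ? Math.min(contentHeight, globals.maxHeight) : contentHeight;
  return { height, className: `theme-${globals.theme}` };
}

// Re-run a callback whenever the host changes layout or theme. In the browser
// the target would be window; any object with addEventListener works here.
function onGlobalsChange(target: { addEventListener(type: string, cb: () => void): void }, cb: () => void): void {
  target.addEventListener("openai:set_globals", cb);
}
```

In a component, `declare global { interface Window { openai: OpenAiHost } }` would replace an `openai: any` declaration.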

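These APIs let developers build rich interactive components that stay in sync with the model's reasoning. Suppose the user drags a task to a new column on a Kanban board. The component can issue a callTool request to update the server, which saves the new state and returns fresh structuredContent. The model sees only the high-level board state, while the UI handles details such as drag and drop.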
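The state change behind that drag-and-drop can be kept as a pure function, which makes it easy to test outside ChatGPT. The sketch assumes the column shape used in this post's examples. The tool name moveTask and its arguments in the closing comment are hypothetical.

```typescript
// Column shape matching the structuredContent in this post's examples.
interface Column { id: number; title: string; taskIds: number[] }

// Pure update: remove the task from whichever column holds it and append it
// to the target column. Returns new objects; the input is not mutated.
function moveTask(columns: Column[], taskId: number, toColumn: number): Column[] {
  if (!columns.some((c) => c.id === toColumn)) throw new Error(`No column ${toColumn}`);
  return columns.map((c) => {
    const without = c.taskIds.filter((id) => id !== taskId);
    return { ...c, taskIds: c.id === toColumn ? [...without, taskId] : without };
  });
}

// In the component's drop handler (hypothetical tool name and arguments):
//   setColumns(moveTask(columns, taskId, toColumn));        // optimistic UI update
//   await window.openai.callTool("moveTask", { taskId, toColumn });
```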

註冊工具和模板

In the server code you register a tool and its template. For instance, in a TypeScript server you might write:

import { z } from "zod";
import { Tool, StructuredToolResponse } from "@openai/apps";

// `server` is the app's MCP server instance, created at startup (not shown).
// buildHtml() returns the compiled component's HTML (not shown).

// Register the UI template under a versioned ui:// URI
server.registerResource("ui://kanban-board/abc123", buildHtml());

// Define the tool: machine name, description and input schema
const createBoard: Tool = {
  name: "createKanbanBoard",
  description: "Create a new kanban board with given tasks and columns",
  inputSchema: z.object({
    title: z.string(),
    columns: z.array(z.object({ name: z.string() })),
    tasks: z.array(z.object({ name: z.string(), columnIndex: z.number() }))
  }),
  async execute(input, _ctx): Promise<StructuredToolResponse> {
    // A task's ID is its index in input.tasks; each column lists the IDs of its tasks
    const columns = input.columns.map((col, i) => ({
      id: i,
      title: col.name,
      taskIds: input.tasks.flatMap((t, id) => (t.columnIndex === i ? [id] : []))
    }));
    // Task details go to _meta for the UI, keyed by the same IDs
    const tasksById = input.tasks.map((task, id) => ({ id, name: task.name }));
    return {
      content: `Created board '${input.title}'`,
      structuredContent: { title: input.title, columns },
      _meta: { tasksById, uiTemplate: "ui://kanban-board/abc123" }
    };
  }
};

server.registerTool(createBoard);

The _meta field includes tasksById for hidden metadata and uiTemplate referencing the registered HTML. When ChatGPT receives this response, it will render the template with the structured content. The window.openai.toolOutput object in the component can then read the board data and display it.

Versioning and Caching

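Resources such as UI templates are cached on OpenAI's servers, so developers should include a unique hash or version number in the ui:// identifier. The documentation warns that if you deploy a new version without updating the path, caching may keep serving users the old UI. Best practice is to embed the commit SHA or build ID in the URL, so that every deployment produces a new resource.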
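A small helper can derive the URI from the build identifier, so that registerResource() and the tool's template reference always agree. This is a hypothetical sketch. It assumes your CI exposes a commit SHA or build ID, and it keeps the ui://<app>/<version> layout used in this post's examples.

```typescript
// Hypothetical helper: build a cache-busting template URI from a build identifier.
function templateUri(app: string, buildId: string): string {
  // Keep only URL-safe characters, then shorten long SHAs for readability.
  const version = buildId.trim().replace(/[^A-Za-z0-9._-]/g, "").slice(0, 12);
  if (!version) throw new Error("buildId must contain at least one URL-safe character");
  return `ui://${app}/${version}`;
}

// Compute the URI once at startup and reuse it everywhere.
// The SHA below is a placeholder for the value supplied by the build.
const KANBAN_TEMPLATE = templateUri("kanban-board", "3f9c2a1b7d4e5f6a7b8c");
// server.registerResource(KANBAN_TEMPLATE, html);
// ... _meta: { uiTemplate: KANBAN_TEMPLATE }
```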

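Persisting State and Follow-ups

Components often need persistent state. For example, a playlist app might let users favourite songs, and those favourites should survive even when the user asks about something else. The setWidgetState() method stores this data separately from structuredContent and keeps it consistent across interactions. The custom UX guide notes, however, that anything passed to setWidgetState() is shown to the model[8]. Keep this state small and never put secrets in it.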
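Keeping the update logic pure makes favourites easy to test. This sketch pairs a pure toggle with a small test double that stands in for the host. FakeHost and the PlaylistState shape are hypothetical, and in ChatGPT the real methods live on window.openai. Because widget state is visible to the model, the sketch stores only song IDs, not full track metadata.

```typescript
// Hypothetical widget state: just the IDs of favourited songs.
interface PlaylistState { favorites: string[] }

// Pure state update: add the song if absent, remove it if present.
function toggleFavorite(state: PlaylistState | null, songId: string): PlaylistState {
  const current = state?.favorites ?? [];
  return {
    favorites: current.includes(songId)
      ? current.filter((id) => id !== songId)
      : [...current, songId],
  };
}

// Hypothetical test double: it keeps the last state it was given,
// the way widgetState is rehydrated on the next render.
class FakeHost {
  widgetState: PlaylistState | null = null;
  setWidgetState(state: PlaylistState): void {
    this.widgetState = state;
  }
}

// In the component, a click handler would do something like:
//   window.openai.setWidgetState(toggleFavorite(window.openai.widgetState, id));
const host = new FakeHost();
for (const id of ["song-1", "song-2", "song-1"]) {
  host.setWidgetState(toggleFavorite(host.widgetState, id));
}
```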

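Sometimes an app needs to ask the user a clarifying question. The sendFollowupTurn() method lets the component send a new prompt back into the conversation, and ChatGPT answers it as a new turn in the transcript. This is useful for multi-step workflows. For example, after the user picks a hotel, a travel-booking app could send a follow-up prompt so that the model asks, "How many nights will you stay?"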
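The follow-up prompt can be built from the component's state, so the model has everything it needs for the next turn. In this sketch, the HotelChoice shape and the prompt wording are hypothetical. The call shape in the final comment ({ prompt }) is an assumption to check against the custom UX guide.

```typescript
// Hypothetical selection made in the component.
interface HotelChoice { name: string; city: string; checkIn: string }

// Build an explicit follow-up prompt so the model knows what to ask next.
function followupPrompt(choice: HotelChoice): string {
  return `I picked ${choice.name} in ${choice.city}, checking in on ${choice.checkIn}. ` +
    "Ask me how many nights I will stay.";
}

// Inside ChatGPT the component would then call (assumed argument shape):
//   await window.openai.sendFollowupTurn({ prompt: followupPrompt(choice) });
```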

Building Your First App: Step‑By‑Step Guide

In this section we will build a simple Task Tracker app that demonstrates the core concepts of the Apps SDK. The app will let a user create tasks and organise them into categories. We choose this example because it is generic, easy to extend and showcases structured content, metadata, custom UI and tool calls.

Set up the MCP Server

First install the TypeScript SDK and scaffolding tool:

npm install -g @openai/apps-generator
apps init task-tracker
cd task-tracker
npm install

This command scaffolds a project with a server, a React frontend and build scripts. The server uses Express and the @openai/apps library. Run npm run dev to start the development server; the project includes an MCP Inspector that opens in your browser and simulates ChatGPT calling your app.

Define the Tool

Open src/server.ts and define a tool called createTasks. The tool accepts an array of tasks and returns structured content grouping them by category. It also provides a summary in the content field.

import * as fs from "node:fs";
import * as path from "node:path";
import { z } from "zod";
import { Tool, StructuredToolResponse } from "@openai/apps";

// `server` is the app's MCP server instance, created at startup (not shown).

export const createTasks: Tool = {
  name: "createTasks",
  description: "Create a list of tasks grouped by category",
  inputSchema: z.object({ tasks: z.array(z.object({ name: z.string(), category: z.string() })) }),
  async execute({ tasks }): Promise<StructuredToolResponse> {
    // Distinct categories, in the order they first appear
    const categories = Array.from(new Set(tasks.map(t => t.category)));
    // A task's ID is its index in `tasks`; each category lists the IDs of its tasks
    const grouped = categories.map(category => ({
      name: category,
      taskIds: tasks.flatMap((t, id) => (t.category === category ? [id] : []))
    }));
    // Full task details go to _meta for the UI, keyed by the same IDs
    const tasksById = tasks.map((task, id) => ({ id, name: task.name, category: task.category }));
    return {
      content: `Created ${tasks.length} tasks in ${categories.length} categories`,
      structuredContent: { categories: grouped },
      _meta: { tasksById, uiTemplate: "ui://task-tracker/1.0.0" }
    };
  }
};

// Register the built UI (dist/index.html from the frontend build) and the tool
server.registerResource("ui://task-tracker/1.0.0", fs.readFileSync(path.join(__dirname, "../dist/index.html"), "utf8"));
server.registerTool(createTasks);

Build the Custom UI

Next open src/frontend/App.tsx. This React component will read the structuredContent and display categories and tasks. It will also allow users to mark tasks as complete and persist that state using setWidgetState.

import { useEffect, useState } from "react";

declare global {
  interface Window {
    openai: any;
  }
}

export default function App() {
  // Restore completion state saved by an earlier render, if any
  const [complete, setComplete] = useState<{ [id: string]: boolean }>(() => window.openai.widgetState?.complete || {});
  const output = window.openai.toolOutput;
  const tasksById = output?._meta?.tasksById || [];
  const categories = output?.structuredContent?.categories || [];

  // Persist completion state whenever it changes
  useEffect(() => {
    window.openai.setWidgetState({ complete });
  }, [complete]);

  return (
    <div className="task-tracker">
      {categories.map((cat: any) => (
        <div key={cat.name} className="category">
          <h3>{cat.name}</h3>
          <ul>
            {cat.taskIds.map((tid: number) => (
              <li key={tid}>
                <label>
                  {/* !! keeps the checkbox controlled even before a task has been toggled */}
                  <input
                    type="checkbox"
                    checked={!!complete[tid]}
                    onChange={() => setComplete(prev => ({ ...prev, [tid]: !prev[tid] }))}
                  />
                  {tasksById[tid]?.name}
                </label>
              </li>
            ))}
          </ul>
        </div>
      ))}
    </div>
  );
}

This component uses window.openai.toolOutput to access the structuredContent and _meta fields. It stores completion state in widgetState so that checking a box persists even when the user continues the conversation. On subsequent tool calls, the component can fetch new tasks or update existing ones. This demonstrates how to combine model reasoning with client‑side interactions.

Testing and Iterating

Run npm run dev again and open the MCP Inspector. In the prompt area, type:

@task-tracker create a list of tasks: buy milk in shopping, finish report in work, call mom in personal

The inspector will show the structured content and render the task list UI. You can check tasks off; the state persists across turns. You can then ask ChatGPT: “Remind me of my tasks later.” Because the model retains context, it can call the tool again, display the UI and summarise your progress.
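Whichever interface you test through, the model's decision reaches your server as an MCP tools/call request in JSON-RPC 2.0 form. The sketch below builds that message for the prompt above. The request id is arbitrary, and the argument values show what the model would plausibly extract from the prompt; they are not captured output.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolsCall(id: number, name: string, args: Record<string, unknown>): ToolsCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Arguments the model might extract from the natural-language prompt
const request = toolsCall(1, "createTasks", {
  tasks: [
    { name: "buy milk", category: "shopping" },
    { name: "finish report", category: "work" },
    { name: "call mom", category: "personal" },
  ],
});

console.log(JSON.stringify(request, null, 2));
```

Seeing the raw message makes debugging easier, because the server only ever receives JSON that must satisfy the tool's input schema.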

How Users Discover and Use Apps

Named Mention and In‑Conversation Discovery

ChatGPT surfaces apps when it believes they can assist the user. There are two primary discovery modes. Named mention occurs when the user explicitly mentions the app name at the beginning of a prompt; in this case, the app will be surfaced automatically[9]. For instance, “@Spotify create a workout playlist” immediately invokes the Spotify integration. The user must place the app name at the start; otherwise the assistant may treat it as part of the conversation.
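The "name at the start" rule can be illustrated with a toy parser. It recognises an app name only at the very beginning of the prompt, with or without a leading @. This is not ChatGPT's implementation; the function name and matching rules are hypothetical.

```typescript
// Toy illustration of named-mention detection; not ChatGPT's actual logic.
// Returns the mentioned app and the rest of the request, or null when the
// prompt does not begin with a known app name.
function parseNamedMention(prompt: string, apps: string[]): { app: string; request: string } | null {
  const text = prompt.trimStart();
  for (const app of apps) {
    for (const prefix of ["@" + app, app]) {
      const head = text.slice(0, prefix.length);
      const next = text.charAt(prefix.length);
      // Case-insensitive match, followed by end of text, whitespace, comma or colon
      if (head.toLowerCase() === prefix.toLowerCase() && (next === "" || /[\s,:]/.test(next))) {
        return { app, request: text.slice(prefix.length).replace(/^[\s,:]+/, "") };
      }
    }
  }
  return null;
}
```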

In‑conversation discovery happens when the model infers that an app could help based on context. The documentation explains that the model evaluates the conversation context, prior tool results and the user’s linked apps to determine which app might be relevant[9]. For example, if you are discussing travel plans, ChatGPT might suggest the Expedia app to book flights. The algorithm uses metadata like tool descriptions and keywords to match the conversation with potential actions[10]. Developers can improve discoverability by writing action‑oriented descriptions and clear UI component names.
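To see why action-oriented descriptions matter, consider a toy scorer that counts the words a conversation shares with each app's description. ChatGPT's actual matching is model-driven and not public, so treat this purely as an illustration. The app descriptions are invented.

```typescript
// Toy keyword-overlap ranking; illustrative only, not ChatGPT's algorithm.
interface AppMeta { name: string; description: string }

// Lowercase words longer than three characters, to skip "and", "for", etc.
const tokenize = (s: string): Set<string> =>
  new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 3));

// Apps sharing at least one word with the conversation, best match first.
function rankApps(conversation: string, apps: AppMeta[]): string[] {
  const words = tokenize(conversation);
  return apps
    .map((app) => {
      let score = 0;
      tokenize(app.description).forEach((w) => {
        if (words.has(w)) score++;
      });
      return { name: app.name, score };
    })
    .filter((a) => a.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((a) => a.name);
}
```

In this sketch "playlist" does not match "playlists", because literal keywords are brittle. That is one reason to describe tools in the words users actually use.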

Directory and Launcher

OpenAI plans to release an app directory where users can browse and discover new apps[10]. Each listing will include the app name, description, supported prompts and any onboarding instructions. Users can also access the launcher via the “+” button in chat; this shows a menu of available apps based on context. These entry points will help less technical users find and enable apps without memorising names.

Onboarding and Consent

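The first time a user launches an app, ChatGPT guides them through an onboarding flow. The model asks the user to connect their account, if needed, and explains what data the app requires. The developer guidelines stress that apps must respect user privacy, behave predictably and have clear policies[11]. Users must explicitly grant or deny permission; there is no silent data access. Once connected, the app can stay linked for later interactions, but the user can always disconnect it and revoke permissions.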
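These consent rules can be modelled as a small state machine in which data access is possible only after an explicit grant. The states and events are a hypothetical sketch of one user's connection to one app, not ChatGPT's internal model.

```typescript
// Hypothetical connection lifecycle for one user and one app.
type ConnectionState = "not_connected" | "awaiting_consent" | "connected" | "denied";
type ConsentEvent = "launch" | "grant" | "deny" | "disconnect";

// Allowed transitions. Anything else is rejected, so "connected" can only be
// reached through an explicit "grant".
const transitions: Record<ConnectionState, Partial<Record<ConsentEvent, ConnectionState>>> = {
  not_connected: { launch: "awaiting_consent" },
  awaiting_consent: { grant: "connected", deny: "denied" },
  connected: { disconnect: "not_connected" },
  denied: { launch: "awaiting_consent" },
};

function step(state: ConnectionState, event: ConsentEvent): ConnectionState {
  const target = transitions[state][event];
  if (!target) throw new Error(`Event "${event}" not allowed in state "${state}"`);
  return target;
}

const canAccessData = (state: ConnectionState): boolean => state === "connected";
```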

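Privacy, Security and Responsible Design

Principles for Trustworthy Apps

OpenAI's "App developer guidelines" define several principles to keep the ecosystem safe and trustworthy. Apps must provide a legitimate service, have a clear privacy policy and data-retention practices, and follow the usage policies[11]. They should minimise data collection, avoid storing sensitive personal information, and never share user data without consent[12]. Apps must behave predictably and must not manipulate the model into producing harmful or misleading content.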

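Data Boundaries and Minimisation

The guidelines stress that apps should collect only the data their functionality requires, and must not request or store sensitive data such as health records or government IDs[12]. Structured content sent to the model should contain no secrets. Hidden metadata should not store user tokens or private details either, because _meta is still delivered to the component. Developers must use strong encryption and secure storage for any tokens obtained during OAuth. Servers should also keep strict boundaries between user sessions, so that one user's data never leaks into another user's context.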
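As a last line of defence, a server can scrub credential-like keys from every tool response before returning it. The sketch below is hypothetical. A name-based denylist catches mistakes but can also produce false positives (a field called tokenCount would be removed). It complements, rather than replaces, never putting secrets into responses in the first place.

```typescript
// Key names that suggest credentials (case-insensitive).
const SENSITIVE = /token|secret|password|authorization|api[_-]?key/i;

// Hypothetical guard: recursively drop credential-like keys from a response.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      if (!SENSITIVE.test(k)) out[k] = redact(v);
    }
    return out;
  }
  return value;
}
```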

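Safeguards in the SDK

The "Security & Privacy" guide outlines the platform's built-in defences, with least privilege and explicit user consent as core principles[6]. Data retention is limited: logs that developers can access are redacted to remove personally identifiable information, and structured content is kept only as long as the prompt requires[6]. Network access inside the iframe is restricted by a Content Security Policy. External fetches must go through the server, which prevents unauthorised cross-origin requests[7]. Authentication uses industry-standard OAuth flows with short-lived tokens. To stay operationally ready, developers must put security reviews, a vulnerability-reporting channel and incident monitoring in place[7].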
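Because the component cannot fetch arbitrary origins, the server typically makes those calls on its behalf, and it should then avoid becoming an open proxy. Below is a hypothetical allowlist check. It assumes the upstream hosts are known in advance; the example.com hosts are placeholders.

```typescript
// Placeholder hosts the server is willing to call on the component's behalf.
const ALLOWED_HOSTS = new Set(["api.example.com", "images.example.com"]);

// Forward only HTTPS requests whose exact hostname is allowlisted.
function isAllowedUpstream(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not an absolute URL
  }
  // URL normalises the hostname to lowercase and strips any userinfo
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}
```

Comparing the parsed hostname, rather than doing a string prefix check, rejects lookalikes such as api.example.com.evil.io.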

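Fairness and Appropriateness

Apps must be suitable for a broad audience. The guidelines prohibit apps that deliver long-form content, complex automation or advertising[13]. For example, an app should not try to stream a 30-minute video inside ChatGPT or replicate an entire social network. The platform encourages concise interactions that complement the conversational flow. Violations can lead to rejection or removal.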

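Opportunities and Considerations

A New Distribution Channel for Developers

By opening ChatGPT to third-party apps, OpenAI positions itself as an "intent layer" between users and services. Developers can now reach millions of users through the chat interface without building a separate web or mobile app. Apps could also reduce friction: users simply mention a service in conversation instead of downloading an app or visiting a website. This could make tools more widely accessible and level the playing field for small developers.

Early partnerships show what is possible. Users can ask ChatGPT questions while watching a Coursera course, design a poster in Canva, browse Expedia travel options or Zillow listings, generate a Spotify playlist, or sketch ideas as diagrams in Figma[14][13]. Because apps run inside the chat, the model can summarise, analyse and make suggestions, turning static content into interactive lessons. Apps also offer several display modes (inline cards, fullscreen and picture-in-picture), giving flexibility for different tasks[15].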

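Changing User Expectations

Being able to use apps without switching context could reshape how people interact with services. ChatGPT becomes less a chatbot than a general-purpose operating system for intent. As Casey Newton observes, this moves us from launching standalone apps to simply stating what we want[16]. Some analysts compare the shift to the arrival of the App Store or the web browser: a single platform that aggregates functionality and competition.

However, the shift also raises questions of control and power. If ChatGPT decides which apps to surface, it could become a gatekeeper. Newton warns that an "AI graph" built from user preferences could pose greater privacy risks than social networks[16]. Economic incentives could lead to paid placement or ranking. Developers may feel pressure to design for ChatGPT rather than owning their relationship with users. Keeping the platform transparent and fair is essential to maintaining trust.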

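Regulatory and Ethical Implications

Apps can access personal data such as location, contacts and payment methods, so regulators may scrutinise how data flows through ChatGPT. Even though the platform has not yet launched in the EU, developers must still comply with privacy laws such as the GDPR[17]. OpenAI has promised finer-grained privacy controls and monetisation options, including an Agentic Commerce Protocol that allows instant checkout within chat[18]. The success of this ecosystem will depend on strong security, clear user consent and fair economic models.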

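Future Directions and Research

The Apps SDK is still in preview, and many features remain unfinished. The roadmap includes:

Submission and review process – for now, developers can build apps but cannot list them publicly. A formal review process will check compliance with the guidelines and build trust.
Revenue sharing and monetisation – OpenAI has hinted that an Agentic Commerce Protocol could let users buy goods directly in chat[18]. This creates e-commerce opportunities but also raises questions about fees, ranking and competition.
Developer tooling – support for more languages and frameworks, better debugging tools and simpler deployment pipelines will lower the barrier to entry. Because MCP is an open standard, community-driven implementations and hosting providers may emerge.
Interoperability – because MCP is open, other platforms or models may adopt it. This could enable a cross-model app ecosystem in which developers write once and run anywhere. Research on standardised agent protocols and context sharing will become important.
Safety research – preventing prompt injection, malicious code and misuse of user data remains a major research area. Work on adversarial attacks against LLM-integrated apps will inform best practices and guidelines.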

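Conclusion: A New Operating System in the Making

The introduction of apps in ChatGPT and the MCP-based Apps SDK marks a major shift in how we interact with software. By bringing third-party apps directly into the chat interface, OpenAI has created a new platform that blends natural language, reasoning and interactive UI. The Model Context Protocol provides an open, standardised way for models to call tools and render components. The Apps SDK simplifies development by handling server communication, UI integration and state management. Step-by-step examples like the Task Tracker show that a useful app can be built with little code while keeping strict data boundaries and privacy.

This innovation comes with responsibility, however. Developers must follow guidelines that prioritise user privacy, safety and fairness[11][12]. Safeguards such as least privilege and explicit consent protect users[6]. At the same time, industry observers warn that the platform could create new gatekeepers and privacy risks[16]. As the ecosystem matures, transparency, open standards and community participation will determine whether ChatGPT's app platform becomes a transformative, trustworthy layer for everyday tasks.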

[1] The latest in the AI arms race: ChatGPT now lets users connect Spotify and Zillow in chat

https://www.forbes.com/sites/antoniopequenoiv/2025/10/06/openais-chatgpt-now-connects-with-third-party-apps-like-spotify-and-zillow-heres-the-latest-in-the-ai-arms-race/

[2] [3] [4] [5] Set up your server

https://developers.openai.com/apps-sdk/build/mcp-server

[6] [7] Security & Privacy

https://developers.openai.com/apps-sdk/guides/security-privacy

[8] Build a custom UX

https://developers.openai.com/apps-sdk/build/custom-ux

[9] [10] User interaction

https://developers.openai.com/apps-sdk/concepts/user-interaction

[11] [12] App developer guidelines

https://developers.openai.com/apps-sdk/app-developer-guidelines/