A New OS? Apps in ChatGPT & Apps SDK (MCP‑Based): Unlocking a New Platform

Author: Boxu Li at Macaron


Introduction:

Apps in ChatGPT now allows third‑party developers to build interactive mini‑applications that live inside the chat interface. Rather than sending users away to websites or mobile apps, these apps run within the conversation and leverage the model’s reasoning to drive actions. Early partners like Canva, Coursera, Expedia and Zillow demoed how users can ask for a playlist, design a poster or search real estate listings without leaving ChatGPT[1]. The new Apps SDK is built on the Model Context Protocol (MCP), an open standard that lets models interact with external tools and user interfaces[2]. This blog dives deeply into the architecture of MCP‑based apps, explains the SDK’s capabilities, walks through building an app step by step, explores how users discover and use apps, and discusses privacy and security considerations. Throughout we cite official documentation and reputable journalism to ground the analysis in credible sources.

Understanding the Model Context Protocol (MCP)

Why Open Standards Matter

The Model Context Protocol is the foundation of the Apps SDK. According to the developer documentation, every Apps SDK integration uses an MCP server to expose tools, handle authentication and package both structured data and HTML that renders in ChatGPT[2]. MCP is an open standard—anyone can implement a server in any language and connect a model such as GPT‑4 or Codex. The open‑source nature means there is no vendor lock‑in; the same app can theoretically run on any AI platform that implements the protocol. This openness encourages community contributions and fosters an ecosystem analogous to the early web, where standards like HTTP enabled interoperable websites.

Servers, Tools and Resources

An MCP server exposes one or more tools. A tool defines an action the model may call, such as “create a kanban board,” “search for houses,” or “generate a playlist.” Each tool is described by a machine name, a human‑friendly title and a JSON schema that tells the model what arguments it accepts. When ChatGPT decides the tool should be invoked, it sends a structured call to the server. The server executes the logic—whether by querying an API, performing a computation or interacting with a database—and then returns a tool response. This response includes three fields:

  • structuredContent – data visible to the model that describes the current state. For example, a kanban board might include an array of columns and tasks[3].

  • content – optional text that the assistant speaks back to the user. This can summarise the result or instruct the user.

  • _meta – hidden metadata not visible to the model. Developers use this to store IDs or lists used in UI components. For instance, the board example uses a tasksById map in _meta to maintain task details without exposing them to the model[4].
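
To make the three fields concrete, here is a small sketch in TypeScript. The field names follow the docs, but the `ToolResponse` interface and `buildBoardResponse` helper are our own illustration, not the SDK’s actual types:

```typescript
// Illustrative shape of a tool response; field names follow the docs,
// but this interface and builder are a sketch, not the SDK's real types.
interface ToolResponse {
  structuredContent: Record<string, unknown>; // visible to the model, hydrates the UI
  content?: string;                           // optional text the assistant speaks back
  _meta?: Record<string, unknown>;            // hidden from the model, UI-only data
}

function buildBoardResponse(title: string, columnNames: string[]): ToolResponse {
  return {
    structuredContent: {
      title,
      columns: columnNames.map((name, id) => ({ id, name, taskIds: [] })),
    },
    content: `Created board '${title}' with ${columnNames.length} columns`,
    _meta: { tasksById: {} }, // task details the model never sees
  };
}
```

The key design point is the split: everything in `structuredContent` shapes the model’s reasoning, while `_meta` only ever reaches the UI component.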

Tools can also refer to resources, such as HTML templates or images, by referencing a ui:// URL. The server registers these resources during startup. The documentation warns that because resources are cached by OpenAI’s infrastructure, developers should version them by including a build hash in the filename[5]. Otherwise, users might see stale UI after deployments.

Structured Content vs. Metadata

The distinction between structuredContent and _meta is critical. According to the docs, structuredContent is visible to the model and is used to hydrate the UI component; _meta is hidden from the model and may contain extra data for the UI such as lists for dropdown menus[3]. By separating visible and hidden data, developers can protect sensitive information from the model while still rendering rich interfaces. This design also encourages minimal data sharing; only what is needed to accomplish the task is exposed, aligning with privacy principles.

Authentication and Sessions

When a user first calls an app, the server may need to authenticate them. The Apps SDK supports OAuth 2.1 flows; developers specify scopes and redirect users to the identity provider. Once the user grants consent, the app obtains a token and can access the user’s data. The server’s job is to manage session state, often by storing tokens in a database keyed to the user’s ChatGPT account. This ensures that subsequent tool calls can reuse the session without prompting the user again.
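
A minimal sketch of that session bookkeeping might look like the following, assuming sessions are keyed by the user’s ChatGPT account id; a production server would persist this in a database and encrypt tokens at rest:

```typescript
// Minimal sketch of server-side session storage, keyed by the user's
// ChatGPT account id. A real server would use a database and encrypt
// tokens at rest rather than an in-memory map.
interface Session {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

const sessions = new Map<string, Session>();

function storeSession(userId: string, accessToken: string, ttlSeconds: number): void {
  sessions.set(userId, { accessToken, expiresAt: Date.now() + ttlSeconds * 1000 });
}

// Returns the session only while the token is still valid, so subsequent
// tool calls can reuse it without prompting the user again.
function getSession(userId: string): Session | undefined {
  const s = sessions.get(userId);
  return s && s.expiresAt > Date.now() ? s : undefined;
}
```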

Security Principles

OpenAI emphasises least privilege, explicit user consent and defense in depth[6]. Apps should request only the minimum permissions needed, and users must explicitly authorise data sharing; the model itself should never guess credentials. Data retention is limited: structured content remains only while the user’s prompt is active, and logs are redacted before being shared with developers[6]. Network access for app components is restricted by a content security policy; iframes cannot access arbitrary browser APIs, and all HTTP requests must originate from the server rather than the client[7]. This prevents cross‑site scripting and exfiltration of tokens.

The Apps SDK: Building Real Applications in ChatGPT

The Developer Experience

The Apps SDK wraps the MCP in idiomatic client libraries (currently Python and TypeScript) and scaffolding tools. When you create an app, you define the tools, register UI templates and implement server logic. The server can run on your own infrastructure and uses any framework (FastAPI, Express, etc.), but it must implement the MCP endpoints. OpenAI provides development servers and an MCP Inspector to test calls locally.

Developers design both the logic and the user interface. UIs are usually written in React and compiled into static resources. They are served inside a sandboxed iframe in ChatGPT. Within this iframe, developers can access a global window.openai object to interact with the host. According to the Build a custom UX guide, this API provides:

  • Globals – displayMode, maxHeight, theme and locale inform the component about layout and style[8].

  • Tool payloads – toolInput, toolOutput and widgetState allow reading the arguments, results and persistent state across renders[8].

  • Actions – setWidgetState() saves state that persists across messages; callTool() triggers a server action; sendFollowupTurn() sends a follow‑up prompt to the model; requestDisplayMode() asks to go fullscreen or picture‑in‑picture[8].

  • Events – the component can subscribe to openai:set_globals when the host updates layout or theme, and openai:tool_response when a tool call resolves[8].

These APIs let developers build rich interactive components that stay synchronised with the model’s reasoning. For example, if a user drags a task to a new column in a kanban board, the component can send a callTool to update the server, persist the new state, and then return a new structuredContent. Meanwhile the model sees only the high‑level board state; the UI handles details like drag‑and‑drop.
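
The drag‑and‑drop flow can be sketched as below. The host API is passed in as a parameter for testability (in the component itself it would simply be `window.openai`), and the `moveTask` tool name is hypothetical:

```typescript
// Sketch of the drag-and-drop flow described above. The host API is
// passed in as a parameter for testability; "moveTask" is a hypothetical
// tool name, not part of the SDK.
type OpenAiHost = {
  callTool: (name: string, args: Record<string, unknown>) => Promise<unknown>;
  setWidgetState: (state: Record<string, unknown>) => void;
};

function onTaskDropped(host: OpenAiHost, taskId: number, toColumn: number): Promise<unknown> {
  // Persist the optimistic UI state so it survives re-renders and new turns...
  host.setWidgetState({ lastMove: { taskId, toColumn } });
  // ...then ask the server to update the canonical board state; the model
  // only ever sees the resulting high-level structuredContent.
  return host.callTool("moveTask", { taskId, toColumn });
}
```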

Registering Tools and Templates

In the server code you register a tool and its template. For instance, in a TypeScript server you might write:

import { z } from "zod";
import { Tool, StructuredToolResponse } from "@openai/apps";

// Register UI template (the `server` instance and buildHtml() come from
// the surrounding scaffold)
server.registerResource("ui://kanban-board/abc123", buildHtml());

// Define tool schema
const createBoard: Tool = {
  name: "createKanbanBoard",
  description: "Create a new kanban board with given tasks and columns",
  inputSchema: z.object({
    title: z.string(),
    columns: z.array(z.object({ name: z.string() })),
    tasks: z.array(z.object({ name: z.string(), columnIndex: z.number() }))
  }),
  async execute(input, ctx): Promise<StructuredToolResponse> {
    // compute board state: each column lists the global ids of its tasks,
    // matching the ids used in tasksById below
    const columns = input.columns.map((col, i) => ({
      id: i,
      title: col.name,
      taskIds: input.tasks
        .map((task, id) => ({ task, id }))
        .filter(({ task }) => task.columnIndex === i)
        .map(({ id }) => id)
    }));
    const tasksById = input.tasks.map((task, id) => ({ id, name: task.name }));
    return {
      content: `Created board '${input.title}'`,
      structuredContent: { title: input.title, columns },
      _meta: { tasksById, uiTemplate: "ui://kanban-board/abc123" }
    };
  }
};

The _meta field includes tasksById for hidden metadata and uiTemplate referencing the registered HTML. When ChatGPT receives this response, it renders the template with the structured content, and the component can then read the board data via window.openai.toolOutput and display it.

Versioning and Caching

Because resources like UI templates are cached on OpenAI’s servers, developers should include a unique hash or version in the ui:// identifier. The docs caution that if you deploy a new version without updating the path, users may continue to see the old UI due to caching[5]. A best practice is to embed the commit SHA or build ID into the URL. This ensures that each deployment results in a fresh resource.
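
One way to do this, sketched with Node’s built‑in crypto module (the helper name is ours), is to hash the built template and embed the digest in the `ui://` path:

```typescript
// Sketch: derive the ui:// identifier from a hash of the built template so
// every deployment produces a fresh, cache-busting resource path. Uses
// Node's built-in crypto module; the helper name is illustrative.
import { createHash } from "crypto";

function versionedTemplateUri(appName: string, templateHtml: string): string {
  const hash = createHash("sha256").update(templateHtml).digest("hex").slice(0, 8);
  return `ui://${appName}/${hash}`;
}
```

The server would then register the resource under this derived path, e.g. something like `server.registerResource(versionedTemplateUri("kanban-board", html), html)`, so the URI changes whenever the template does.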

Persisting State and Follow‑ups

Components often need to persist state. For instance, a playlist app might let users favourite songs; these favourites should remain even when the user asks another question. The setWidgetState() method stores data outside of structuredContent and persists across turns[8]. The model does not see this state, ensuring privacy.

Sometimes an app needs to ask the user a clarifying question. The sendFollowupTurn() method allows the component to send a new prompt back to ChatGPT, which will then appear in the transcript as if the model asked the question[8]. This is useful for multi‑step workflows: for example, a travel booking app might ask “How many nights will you stay?” after the user selects a hotel.
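
That hotel example can be sketched as follows, assuming sendFollowupTurn accepts a prompt payload as described in the guide (the exact signature may differ):

```typescript
// Sketch of a clarifying follow-up, assuming sendFollowupTurn takes a
// prompt payload as described in the guide; the exact signature may differ.
type FollowupHost = {
  sendFollowupTurn: (payload: { prompt: string }) => void;
};

function askNightCount(host: FollowupHost, hotelName: string): void {
  // The question appears in the transcript as if the assistant asked it.
  host.sendFollowupTurn({ prompt: `How many nights will you stay at ${hotelName}?` });
}
```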

Building Your First App: Step‑By‑Step Guide

In this section we will build a simple Task Tracker app that demonstrates the core concepts of the Apps SDK. The app will let a user create tasks and organise them into categories. We choose this example because it is generic, easy to extend and showcases structured content, metadata, custom UI and tool calls.

  1. Set up the MCP Server

First install the TypeScript SDK and scaffolding tool:

npm install -g @openai/apps-generator
apps init task-tracker
cd task-tracker
npm install

These commands scaffold a project with a server, a React frontend and build scripts. The server uses Express and the @openai/apps library. Run npm run dev to start the development server; the project includes an MCP Inspector that opens in your browser and simulates ChatGPT calling your app.

  2. Define the Tool

Open src/server.ts and define a tool called createTasks. The tool accepts an array of tasks and returns structured content grouping them by category. It also provides a summary in the content field.

import { z } from "zod";
import { Tool, StructuredToolResponse } from "@openai/apps";

export const createTasks: Tool = {
  name: "createTasks",
  description: "Create a list of tasks grouped by category",
  inputSchema: z.object({ tasks: z.array(z.object({ name: z.string(), category: z.string() })) }),
  async execute({ tasks }): Promise<StructuredToolResponse> {
    const categories = Array.from(new Set(tasks.map(t => t.category)));
    // each category lists the global ids of its tasks, matching tasksById
    const grouped = categories.map(category => ({
      name: category,
      taskIds: tasks
        .map((task, id) => ({ task, id }))
        .filter(({ task }) => task.category === category)
        .map(({ id }) => id)
    }));
    const tasksById = tasks.map((task, id) => ({ id, name: task.name, category: task.category }));
    return {
      content: `Created ${tasks.length} tasks in ${categories.length} categories`,
      structuredContent: { categories: grouped },
      _meta: { tasksById, uiTemplate: "ui://task-tracker/1.0.0" }
    };
  }
};

Register the template before using it:

import fs from "fs";
import path from "path";

server.registerResource("ui://task-tracker/1.0.0", fs.readFileSync(path.join(__dirname, "../dist/index.html"), "utf8"));
server.registerTool(createTasks);

  3. Build the Custom UI

Next open src/frontend/App.tsx. This React component will read the structuredContent and display categories and tasks. It will also allow users to mark tasks as complete and persist that state using setWidgetState.

import { useEffect, useState } from "react";

declare global {
  interface Window {
    openai: any;
  }
}

export default function App() {
  const [complete, setComplete] = useState<{ [id: string]: boolean }>(() => window.openai.widgetState?.complete || {});
  const output = window.openai.toolOutput;
  const tasksById = output?._meta?.tasksById || [];
  const categories = output?.structuredContent?.categories || [];

  // persist completion state
  useEffect(() => {
    window.openai.setWidgetState({ complete });
  }, [complete]);

  return (
    <div className="task-tracker">
      {categories.map((cat: any, ci: number) => (
        <div key={ci} className="category">
          <h3>{cat.name}</h3>
          <ul>
            {cat.taskIds.map((tid: number) => (
              <li key={tid}>
                <label>
                  <input type="checkbox" checked={!!complete[tid]} onChange={() => setComplete(prev => ({ ...prev, [tid]: !prev[tid] }))} />
                  {tasksById[tid]?.name}
                </label>
              </li>
            ))}
          </ul>
        </div>
      ))}
    </div>
  );
}

This component uses window.openai.toolOutput to access the structuredContent and _meta fields. It stores completion state in widgetState so that checking a box persists even when the user continues the conversation. On subsequent tool calls, the component can fetch new tasks or update existing ones. This demonstrates how to combine model reasoning with client‑side interactions.

  4. Testing and Iterating

Run npm run dev again and open the MCP Inspector. In the prompt area, type:

@task-tracker create a list of tasks: buy milk in shopping, finish report in work, call mom in personal

The inspector will show the structured content and render the task list UI. You can check tasks off; the state persists across turns. You can then ask ChatGPT: “Remind me of my tasks later.” Because the model retains context, it can call the tool again, display the UI and summarise your progress.

How Users Discover and Use Apps

Named Mention and In‑Conversation Discovery

ChatGPT surfaces apps when it believes they can assist the user. There are two primary discovery modes. Named mention occurs when the user explicitly mentions the app name at the beginning of a prompt; in this case, the app will be surfaced automatically[9]. For instance, “@Spotify create a workout playlist” immediately invokes the Spotify integration. The user must place the app name at the start; otherwise the assistant may treat it as part of the conversation.

In‑conversation discovery happens when the model infers that an app could help based on context. The documentation explains that the model evaluates the conversation context, prior tool results and the user’s linked apps to determine which app might be relevant[9]. For example, if you are discussing travel plans, ChatGPT might suggest the Expedia app to book flights. The algorithm uses metadata like tool descriptions and keywords to match the conversation with potential actions[10]. Developers can improve discoverability by writing action‑oriented descriptions and clear UI component names.

Directory and Launcher

OpenAI plans to release an app directory where users can browse and discover new apps[10]. Each listing will include the app name, description, supported prompts and any onboarding instructions. Users can also access the launcher via the “+” button in chat; this shows a menu of available apps based on context. These entry points will help less technical users find and enable apps without memorising names.

Onboarding and Consent

The first time a user activates an app, ChatGPT initiates an onboarding flow. The model asks the user to connect their account (if required) and explains what data the app needs. The developer guidelines emphasise that apps must respect users’ privacy, behave predictably and have clear policies[11]. Users must explicitly grant or deny permission; there is no silent data access. Once connected, the app can remain linked for subsequent interactions, but users always have the ability to disconnect and revoke permissions.

Privacy, Security and Responsible Design

Principles of Trustworthy Apps

OpenAI’s App Developer Guidelines define several principles to ensure the ecosystem remains safe and trustworthy. Apps must provide a legitimate service, have a clear privacy policy and data retention practices, and comply with usage policies[11]. They should minimise data collection, avoid storing sensitive personal information and not share user data without consent[12]. Apps must behave predictably; they cannot manipulate the model to produce harmful or misleading content.

Data Boundaries and Minimisation

The guidelines stress that apps should only collect data essential for their function and must not request or store sensitive data like health records or government IDs[12]. Structured content sent to the model should not contain secrets; hidden metadata should not store user tokens or private details. Developers must implement strong encryption and secure storage for any tokens obtained during OAuth. The server should maintain strict boundaries between user sessions; data from one user must never leak into another’s context.

Security Measures in the SDK

The Security & Privacy Guide outlines defence mechanisms built into the platform. It emphasises least privilege and explicit user consent as central principles[6]. Data retention is limited; logs accessible to developers are redacted to remove personally identifiable information, and structured content is only retained as long as the prompt requires[6]. Network access from within the iframe is restricted by content security policy; external fetches must go through the server, preventing unauthorized cross‑origin requests[7]. Authentication uses industry‑standard OAuth flows with short‑lived tokens. Developers are required to implement security reviews, bug reporting channels and incident monitoring to maintain operational readiness[7].

Fairness and Appropriateness

Apps must be appropriate for a broad audience. The guidelines forbid apps that deliver long‑form content, complex automation or advertisements[13]. For example, an app should not try to deliver a 30‑minute video or replicate an entire social network within ChatGPT. The platform encourages succinct interactions that complement the conversational flow. Violations may lead to rejection or removal.

Opportunities and Considerations

A New Distribution Channel for Developers

By opening ChatGPT to third‑party apps, OpenAI positions itself as an “intent layer” between users and services. Developers can now reach millions of users through the chat interface without building separate web or mobile apps. Apps have the potential to lower friction: instead of downloading an app or visiting a website, users just mention the service in conversation. This could democratise access to tools and level the playing field for small developers.

Early partnerships show the possibilities: users can watch Coursera lectures while asking ChatGPT questions; design posters in Canva; browse Expedia travel options or Zillow real estate listings; generate Spotify playlists; or diagram ideas with Figma[14][13]. Because the apps run inside chat, the model can summarise, analyse and generate recommendations, turning static content into interactive lessons. The apps also offer multiple display modes—inline cards, fullscreen or picture‑in‑picture—providing flexibility for different tasks[15].

Transforming User Expectations

The ability to use apps without switching contexts could reshape how people interact with services. ChatGPT becomes not just a chatbot but a universal operating system for intents. As Casey Newton observed, this moves us from launching discrete apps to simply stating what we want[16]. Some analysts compare this shift to the launch of the App Store or the browser: a single platform that aggregates functionality and competition.

However, this transformation raises questions about control and power. If ChatGPT determines which apps to surface, it might become a gatekeeper. Newton warns that an “AI graph” built on user preferences could create privacy risks more serious than those of social networks[16]. Economic incentives could lead to pay‑to‑play placement or ranking of apps. Developers may feel pressured to design for ChatGPT instead of owning their relationship with users. It is crucial that the platform remains transparent and fair to maintain trust.

Regulatory and Ethical Implications

Because apps can access personal data—location, contacts, payment methods—regulators may scrutinise how data flows through ChatGPT. Developers must comply with privacy laws like GDPR, even though the platform is not yet available in the European Union[17]. OpenAI has promised more granular privacy controls and monetisation options, including an agentic commerce protocol that will allow instant checkout within chat[18]. The success of this ecosystem will depend on robust security, clear user consent and fair economic models.

Future Directions and Research

The Apps SDK is still in preview, and many features remain to be fleshed out. The developer roadmap includes:

  • Submission and review workflow – Currently developers can build apps but cannot list them publicly. A formal review process will ensure compliance with guidelines and trust.

  • Revenue sharing and monetisation – OpenAI hinted at an agentic commerce protocol that could let users purchase goods directly in chat[18]. This raises opportunities for e‑commerce but also questions about fees, rankings and competition.

  • Developer tooling – More languages and frameworks, improved debugging tools and easier deployment pipelines will lower the barrier to entry. The open standard nature of MCP may lead to community‑driven implementations and hosting providers.

  • Interoperability – Because MCP is open, other platforms or models could adopt it. This could enable a cross‑model app ecosystem where developers write once and run anywhere. Research on standardising agent protocols and context sharing will be important.

  • Safety research – Evaluating how to prevent prompt injection, malicious code, or misuse of user data remains a major area of research. Papers on adversarial attacks against LLM‑integrated applications will inform best practices and guidelines.

Conclusion: A New OS in the Making

The introduction of Apps in ChatGPT and the MCP‑based Apps SDK marks a significant shift in how we interact with software. By bringing third‑party applications directly into the chat interface, OpenAI has created a new platform that blends natural language, reasoning and interactive UIs. The Model Context Protocol provides an open, standardised way for models to call tools and render components; the Apps SDK simplifies development by handling server communication, UI integration and state management. Step‑by‑step examples like the Task Tracker demonstrate how easy it is to build a useful app while maintaining strict data boundaries and privacy.

Yet this innovation comes with responsibilities. Developers must follow guidelines that prioritise user privacy, safety and fairness[11][12]. Security mechanisms like least privilege and explicit consent protect users[6]. At the same time, industry observers caution that the platform could create new forms of gatekeeping and privacy risks[16]. As the ecosystem matures, transparency, open standards and community engagement will determine whether ChatGPT’s app platform becomes a transformative, trusted layer for everyday tasks.


[1] AI Arms Race Latest: ChatGPT Now Lets Users Connect With Spotify And Zillow In Chats

https://www.forbes.com/sites/antoniopequenoiv/2025/10/06/openais-chatgpt-now-connects-with-third-party-apps-like-spotify-and-zillow-heres-the-latest-in-the-ai-arms-race/

[2] [3] [4] [5] Set up your server

https://developers.openai.com/apps-sdk/build/mcp-server

[6] [7] Security & Privacy

https://developers.openai.com/apps-sdk/guides/security-privacy

[8] Build a custom UX

https://developers.openai.com/apps-sdk/build/custom-ux

[9] [10] User Interaction

https://developers.openai.com/apps-sdk/concepts/user-interaction

[11] [12] App developer guidelines

https://developers.openai.com/apps-sdk/app-developer-guidelines/

[13] ChatGPT apps are live: Here are the first ones you can try | The Verge

https://www.theverge.com/news/793081/chagpt-apps-sdk-spotify-zillow-openai

[14] OpenAI DevDay 2025: ChatGPT gets apps, AgentKit for developers, and cheaper GPT models

https://indianexpress.com/article/technology/artificial-intelligence/openai-devday-2025-chatgpt-gets-apps-agentkit-for-developers-and-cheaper-gpt-models-10292443/

[15] OpenAI announces Apps SDK allowing ChatGPT to launch and run third party apps like Zillow, Canva, Spotify | VentureBeat

https://venturebeat.com/ai/openai-announces-apps-sdk-allowing-chatgpt-to-launch-and-run-third-party

[16] New platform, familiar risks: Zillow and Expedia bet on OpenAI's ChatGPT apps rollout – GeekWire

https://www.geekwire.com/2025/new-platform-familiar-risks-zillow-and-expedia-bet-on-openais-chatgpt-apps-rollout/

[17] OpenAI DevDay: ChatGPT Apps, AgentKit, and GA release of Codex - SD Times

https://sdtimes.com/ai/openai-devday-chatgpt-apps-agentkit-and-ga-release-of-codex/

[18] OpenAI wants to make ChatGPT into a universal app frontend - Ars Technica

https://arstechnica.com/ai/2025/10/openai-wants-to-make-chatgpt-into-a-universal-app-frontend/
