Cross‑Lingual Personalization: How Macaron AI Bridges Culture

Author: Boxu Li at Macaron


Introduction

When Macaron AI was unveiled in August 2025, it positioned itself not as another enterprise assistant but as a personal companion designed to enrich everyday life. Its mission is inherently international: from the outset the platform supported English, Chinese, Japanese, Korean and Spanish, signalling an ambition to operate across linguistic and cultural boundaries. For users in Japan and South Korea – two countries with vibrant yet distinct digital ecosystems – this multilingual promise is more than a marketing slogan. It raises technical questions: How does Macaron handle cross‑lingual conversations? How does its memory system cope with diverse scripts, vocabulary and cultural references? What design choices enable a single agent to "think" in hiragana one moment and Hangul the next? This blog explores Macaron AI's cross‑lingual architecture and the mechanisms that allow it to personalize experiences for Japanese and Korean users while maintaining a coherent identity.

Personalization at scale requires more than translation. Macaron aims to model who you are through daily interactions, remembering not just facts but nuances like dietary goals and emotional highs. Achieving this for multiple languages demands data structures and algorithms that can capture meaning across writing systems, handle code‑switching, and respect cultural norms. This post breaks down the underlying techniques: multilingual tokenization, reinforcement‑guided memory retrieval, distributed identity management, and cultural adaptation. We will also discuss challenges such as bias, privacy and cross‑regional compliance, and outline research directions for cross‑lingual personal agents.

1 Multilingual Architecture and Tokenization

1.1 Universal vocabulary with script‑aware subword units

Large language models rely on tokenizers to break raw text into units the model can process. For languages like English and Spanish, subword tokenization (Byte‑Pair Encoding or SentencePiece) can capture morphology reasonably well. Japanese and Korean, however, pose unique challenges: Japanese mixes three scripts (kanji, hiragana and katakana) and lacks spaces, while Korean's Hangul is a featural alphabet assembled into syllable blocks. Macaron's engineers therefore built a multilingual vocabulary with script‑aware subword units. Each token encodes not only characters but also a language identifier, keeping phonetically or visually similar forms from different languages apart (e.g., the romanized syllable "ha" could be the Korean syllable 하 or the Japanese kana は). The vocabulary includes tokens for common kanji compounds, radicals and Hangul jamo, allowing the model to represent morphological units efficiently and to break down rare words into meaningful pieces.
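A toy sketch makes the idea concrete. The Token structure, language tags and character‑level splitting below are illustrative assumptions; a production tokenizer would operate on learned subword merges rather than single characters:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    lang: str   # language identifier, e.g. "ja", "ko", "en"
    piece: str  # subword unit (kanji compound, kana, Hangul jamo, BPE piece)

def tokenize(segments):
    """Emit language-tagged tokens from (lang, text) segments, so the
    same surface form stays distinct across languages."""
    return [Token(lang, ch) for lang, text in segments for ch in text]

# The kana は is tagged "ja"; the Hangul syllable 하 (same romanization,
# "ha") is tagged "ko", so the two never collide in the vocabulary.
print(tokenize([("ja", "勉強は楽しい"), ("ko", "하루")]))
```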

By sharing subword units across languages, Macaron leverages cross‑lingual transfer. For example, the concept of "study" appears in Japanese as 勉強 (benkyō) and in Korean as 공부 (gongbu). While the characters and sounds differ, the agent uses semantic embeddings learned across languages to map these tokens to a similar vector space. This unified representation enables Macaron to understand a Japanese user's interest in "language study" and later apply that knowledge when a Korean friend asks about "공부 계획" (study schedule). Without a unified vocabulary, the model would treat these as unrelated concepts.
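To illustrate the shared space, here is a toy example with hand‑picked four‑dimensional vectors (real models learn hundreds of dimensions from data, so the numbers are purely illustrative):

```python
import numpy as np

# Toy embeddings: 勉強 (ja) and 공부 (ko) both mean "study", so they sit
# close together; 料理 ("cooking") is an unrelated distractor.
emb = {
    ("ja", "勉強"): np.array([0.81, 0.10, 0.55, 0.02]),
    ("ko", "공부"): np.array([0.78, 0.14, 0.58, 0.05]),
    ("ja", "料理"): np.array([0.05, 0.92, 0.11, 0.33]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb[("ja", "勉強")], emb[("ko", "공부")]))  # high: same concept
print(cosine(emb[("ja", "勉強")], emb[("ja", "料理")]))  # low: unrelated
```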

1.2 Context window and alignment across scripts

Macaron's 671‑billion‑parameter model is trained on a large multilingual corpus, but the sheer sequence length of conversations requires an efficient context window. Japanese and Korean sentences can be longer than English due to the agglutinative nature of verbs and embedded particles. To support long dialogues, Macaron employs a hierarchical attention mechanism: the model processes local windows (sentences or paragraphs) before passing summarized representations to a global layer. This approach reduces the memory footprint while allowing the agent to maintain context across extended conversations. It also supports cross‑script alignment, where the model learns correspondences between segments in Japanese and Korean by minimizing the distance between their representations during training (a technique borrowed from cross‑lingual natural language processing).
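A minimal numpy sketch of the two‑stage pattern, assuming a stand‑in local encoder and a single scaled dot‑product global layer (an illustration of the idea, not Macaron's production architecture):

```python
import numpy as np

def encode_window(window: str, d: int = 8) -> np.ndarray:
    """Stand-in local encoder: one summary vector per sentence/window."""
    rng = np.random.default_rng(abs(hash(window)) % 2**32)
    return rng.normal(size=d)

def global_attention(summaries: list[np.ndarray], query: np.ndarray) -> np.ndarray:
    """Attend over window summaries only, so the global layer never
    sees the full token sequence."""
    S = np.stack(summaries)                    # (n_windows, d)
    scores = S @ query / np.sqrt(S.shape[1])   # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over windows
    return weights @ S                         # fused context vector

windows = ["昨日は勉強した。", "오늘은 요리를 했다.", "Tomorrow we travel."]
summaries = [encode_window(w) for w in windows]
context = global_attention(summaries, query=summaries[-1])
print(context.shape)  # (8,) — one compact context vector for the turn
```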

1.3 Runtime language detection and code‑switching

Japanese and Korean users often mix English or Chinese terms into conversations, especially in technical domains or pop culture. Macaron's inference pipeline includes a runtime language detector that tags each incoming utterance with probability scores for supported languages. When a sentence includes loanwords or phrases from multiple languages, the agent splits the input into segments and processes each with the appropriate language context. This ensures correct pronunciation in voice output and proper handling of idioms. The memory subsystem attaches language tags to retrieved entries, allowing Macaron to retrieve relevant experiences even when the query language differs from the stored language.
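As a rough illustration, the sketch below segments code‑switched text purely by Unicode script; real detectors are trained classifiers that emit per‑language probability scores, so treat these heuristics as placeholders:

```python
import unicodedata

def script_of(ch: str) -> str | None:
    """Crude script-based language guess for a single character."""
    name = unicodedata.name(ch, "")
    if "HANGUL" in name:
        return "ko"
    if "HIRAGANA" in name or "KATAKANA" in name or "CJK UNIFIED" in name:
        return "ja"  # CJK ideographs could equally be Chinese in real input
    if ch.isascii() and ch.isalpha():
        return "en"
    return None      # spaces/punctuation inherit the current segment's tag

def segment(text: str) -> list[tuple[str, str]]:
    segments, cur, buf = [], None, []
    for ch in text:
        lang = script_of(ch) or cur
        if lang != cur and buf:
            segments.append((cur, "".join(buf)))
            buf = []
        cur = lang
        buf.append(ch)
    if buf:
        segments.append((cur, "".join(buf)))
    return segments

print(segment("K-POPのコンサートは daebak だった"))
# [('en', 'K-POP'), ('ja', 'のコンサートは '), ('en', 'daebak '), ('ja', 'だった')]
```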

2 Memory Token and Cross‑Lingual Retrieval

2.1 Reinforcement‑guided retrieval and memory tokens

Macaron's hallmark innovation is its memory token, a dynamic pointer that helps the agent decide what to remember, when to update memory, and how to apply those memories to current tasks. The token interacts with a hierarchical memory bank: short‑term context, medium‑term episodic memory and long‑term knowledge. Reinforcement learning (RL) trains the agent to adjust the token based on feedback such as user satisfaction and task success. If a Japanese user repeatedly asks about the same train schedule, the RL policy learns to promote those details in memory. If a Korean user expresses discomfort when past comments are resurfaced, the policy learns to decay references faster.
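A minimal sketch of this promote/decay dynamic, assuming a scalar salience per memory item and a fixed update rule (the real policy is learned via RL rather than hand‑written):

```python
class MemoryItem:
    def __init__(self, content: str, salience: float = 0.5):
        self.content = content
        self.salience = salience  # how eagerly this item is surfaced

def rl_update(item: MemoryItem, reward: float, lr: float = 0.2) -> None:
    """Positive feedback pushes salience toward 1 (promote);
    negative feedback pushes it toward 0 (decay faster)."""
    target = 1.0 if reward > 0 else 0.0
    item.salience += lr * (target - item.salience)

train = MemoryItem("8:15 Yamanote line schedule")        # asked repeatedly
old_joke = MemoryItem("embarrassing comment from March") # user disliked recall
for _ in range(3):
    rl_update(train, reward=+1.0)
    rl_update(old_joke, reward=-1.0)
print(round(train.salience, 2), round(old_joke.salience, 2))  # 0.74 0.26
```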

2.2 Distributed identity and domain boundaries

The Macaron team rejects the notion of a monolithic user profile; instead, identity is treated as an emergent narrative built from small interactions. Memories are organized by domain boundaries (e.g., work, hobbies, family) with a relevance federation mechanism that allows cross‑domain retrieval. For Japanese and Korean users, domain boundaries also include language domains: a memory item might be tagged as "Japanese—hobbies—music" or "Korean—family—finance". When the agent receives a query in Korean, it first searches Korean memories but can federate to Japanese memories if the semantic content matches. This prevents cross‑contamination while enabling cross‑lingual continuity.
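The sketch below illustrates this tag‑first, federate‑second lookup, with the similarity gate stubbed out as an exact topic match (an assumption; the real system compares embeddings):

```python
# Memory items tagged by language, domain and topic (illustrative data).
memories = [
    {"lang": "ja", "domain": "hobbies", "topic": "music",   "text": "ギター練習の記録"},
    {"lang": "ko", "domain": "family",  "topic": "finance", "text": "가족 예산 메모"},
]

def retrieve(query_lang: str, topic: str) -> list[dict]:
    """Search the query's language first; federate to other languages
    only if nothing matches in the primary language."""
    same_lang = [m for m in memories
                 if m["lang"] == query_lang and m["topic"] == topic]
    return same_lang or [m for m in memories if m["topic"] == topic]

# A Korean query about music federates to the Japanese memory:
print(retrieve("ko", "music")[0]["text"])  # ギター練習の記録
```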

2.3 Reference decay and privacy in multilingual contexts

Memories that are rarely accessed decay over time; the decay rate can vary across domains. The reference decay mechanism reduces the weight of unused memories, ensuring that a Japanese user's brief interest in a Korean drama does not permanently occupy memory space. Decay also supports privacy; sensitive information about family or finances can be set to decay faster. Users can explicitly delete memories or mark them as confidential. Macaron's policy binding framework attaches machine‑readable privacy rules directly to data, so that a memory with a "private—Korean" tag might only be accessible during authenticated sessions in that language. Combined with differentiated transparency, which offers different levels of disclosure to different stakeholders, these mechanisms allow Macaron to navigate Japan's privacy norms and Korea's evolving AI regulations.
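Reference decay can be pictured as an exponential half‑life that varies by domain. The half‑life values below are invented for illustration:

```python
HALF_LIFE_DAYS = {"hobbies": 90, "finance": 14, "family": 30}  # assumed values

def decayed_weight(weight: float, days_since_access: float, domain: str) -> float:
    """Exponential decay: weight halves every half-life without access."""
    half_life = HALF_LIFE_DAYS.get(domain, 60)
    return weight * 0.5 ** (days_since_access / half_life)

# A brief interest in a Korean drama fades within months...
print(decayed_weight(1.0, 180, "hobbies"))   # 0.25
# ...while sensitive finance memories can be configured to fade faster.
print(decayed_weight(1.0, 28, "finance"))    # 0.25
```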

3 Cultural Adaptation and Persona Customization

3.1 Onboarding through personality tests and color palettes

Upon signing up, users complete three personality tests that help Macaron match them with a personalized persona – including colours, communication styles and voice. In Japan, where aesthetic harmony and formality are valued, the tests might emphasize social etiquette, while Korean questionnaires might focus on family dynamics and peer relationships. The resulting persona influences not only the user interface but also the agent's politeness level, tone and choice of cultural references. A Japanese persona might prefer indirect suggestions ("How about planning a picnic next week?"), whereas a Korean persona might appreciate direct encouragement ("Let's plan a family trip!").

3.2 Localized mini‑apps: from kakeibo to hojikwan

Macaron's ability to generate mini‑apps on demand is not limited to generic productivity tools. The platform can produce bespoke applications with over 100,000 lines of code, such as a budgeting tool inspired by Japan's kakeibo tradition (a method of household accounting) or a Korean hojikwan planning app (managing family events and ancestral memorials). The user simply describes their needs in natural language, and the agent synthesizes a program that aligns with local customs. This requires a library of domain‑specific templates and the ability to integrate local calendars, public holidays and financial regulations. Reinforcement learning optimizes the generation process by evaluating user satisfaction: if Japanese users frequently tweak the kakeibo app to add categories like "omiyage" (souvenir) and "otsukuri" (monthly charity), the generator learns to include them by default in future apps.

3.3 Emotional norms and communication styles

Japan and South Korea have different norms for expressing emotion. Japanese culture often values modesty and context sensitivity, while Korean culture embraces expressive social interactions. Macaron adapts its response style accordingly, drawing on digital personhood research that emphasises fluid identity and user empowerment. In practice, this means that the agent may use honorific forms and indirect speech when conversing in Japanese, and more proactive suggestions when speaking Korean. The memory system logs feedback on tone and adaptively adjusts conversation styles. These adaptations are not hard-coded but emerge through RL: if a user consistently responds positively to a certain communication style, the reward signal reinforces that behaviour.

4 Implementation Details: Engineering for Cross‑Lingual Personal Agents

4.1 Data collection and training pipeline

Creating a personal agent that can converse in Japanese and Korean requires high‑quality data. Macaron's training corpus includes licensed books, news articles, blogs, transcripts and user‑generated content across all supported languages. Data is filtered for politeness, bias and domain coverage. The pre‑training phase uses masked language modelling and next‑token prediction on combined multilingual data to learn shared representations. Fine‑tuning introduces reinforcement learning from human feedback (RLHF): bilingual annotators in Tokyo and Seoul rate responses for cultural appropriateness, enabling the model to learn subtle cues such as when to use honorifics or when to ask clarifying questions. Additional contrastive learning objectives encourage alignment between semantically equivalent phrases across languages.
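The contrastive objective can be sketched as an InfoNCE‑style loss over translation pairs; the toy vectors and temperature below are assumptions, not Macaron's training configuration:

```python
import numpy as np

def info_nce(src: np.ndarray, tgt: np.ndarray, tau: float = 0.07) -> float:
    """src[i] and tgt[i] embed a translation pair; the loss is low when
    each source row is most similar to its own target row."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / tau                       # (n, n) similarities
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))     # matched pairs on diagonal

rng = np.random.default_rng(0)
tgt = rng.normal(size=(4, 16))
src = tgt + 0.05 * rng.normal(size=(4, 16))          # near-perfect alignment
print(info_nce(src, tgt))                            # close to zero
```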

4.2 Cross‑lingual memory index and vector retrieval

Macaron's memory bank stores embeddings in a high‑dimensional vector space. For each memory item, the agent computes a representation that captures both the content and the language. A cross‑lingual memory index uses approximate nearest neighbour search to retrieve items regardless of the language of the query. For example, if a Korean user asks "피자 만들기 레시피" (pizza recipe), the agent may find a Japanese memory about "ピザの作り方" (how to make pizza) because both embed close to the concept of pizza. At retrieval time, the agent filters by user permissions and then converts the retrieved memory into the user's preferred language using a built‑in translator and summarizer. This enables knowledge sharing across languages while preserving privacy boundaries.
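A brute‑force stand‑in for that index is shown below; a production system would use an ANN library such as FAISS or HNSW over far higher‑dimensional vectors, and the embeddings and permission flags here are invented:

```python
import numpy as np

index = [
    {"lang": "ja", "text": "ピザの作り方", "vec": np.array([0.90, 0.10, 0.05]),
     "allowed": True},
    {"lang": "ja", "text": "東京の天気",   "vec": np.array([0.05, 0.20, 0.95]),
     "allowed": True},
    {"lang": "ko", "text": "비밀 메모",    "vec": np.array([0.85, 0.15, 0.10]),
     "allowed": False},  # permission filter drops this before ranking
]

def search(query_vec: np.ndarray, k: int = 1):
    visible = [m for m in index if m["allowed"]]
    def score(m):
        return float(query_vec @ m["vec"]
                     / (np.linalg.norm(query_vec) * np.linalg.norm(m["vec"])))
    return sorted(visible, key=score, reverse=True)[:k]

# A Korean query "피자 만들기 레시피" embeds near the pizza concept, so the
# Japanese memory is retrieved and then translated into Korean:
print(search(np.array([0.88, 0.12, 0.08]))[0]["text"])  # ピザの作り方
```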

4.3 Safety and bias mitigation

Cross‑lingual models risk propagating biases present in training data. For Japan and Korea, where gender roles and age hierarchies play significant cultural roles, Macaron implements bias‑mitigation strategies. During fine‑tuning, the RL reward includes penalties for responses that reinforce stereotypes or violate local norms (e.g., assuming that only women handle household finances). The policy binding system ensures that personal data is never translated across languages without user consent. Furthermore, Macaron's differentiated transparency allows regulators to audit model behaviour at varying levels of detail: Japanese authorities might review general usage patterns, while Korean regulators could inspect raw logs under strict confidentiality.
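One simple way to express such penalties is as a shaped reward; the weights and classifier inputs below are illustrative assumptions:

```python
def shaped_reward(task_reward: float,
                  stereotype_score: float,   # [0, 1] from a learned classifier
                  norm_violations: int,      # hits from locale-specific checks
                  w_stereo: float = 0.5,
                  w_norm: float = 1.0) -> float:
    """Penalize stereotype reinforcement and local-norm violations on top
    of the base task reward."""
    return task_reward - w_stereo * stereotype_score - w_norm * norm_violations

# A task-successful answer that assumes only women manage household
# budgets still scores poorly overall:
print(shaped_reward(task_reward=1.0, stereotype_score=0.9, norm_violations=0))
```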

5 Challenges and Research Directions

5.1 Handling dialects and regional variations

Both Japanese and Korean have regional dialects. In Japan, the Kansai dialect differs from standard Tokyo speech in vocabulary and intonation; Korean dialects such as Jeolla and Gyeongsang present similar challenges. Current language detectors may misclassify dialectal inputs, leading to awkward responses. Future work could incorporate dialect embeddings trained on regional corpora, enabling the agent to identify and respond in the appropriate dialect. Users could even ask Macaron to mimic a specific accent, which might be appealing for role‑playing games or language‑learning modules.

5.2 Cross‑lingual commonsense reasoning

While the current model aligns semantic representations across languages, commonsense reasoning still suffers from cultural gaps. Expressions like "tsundoku" (積ん読, buying books and not reading them) or "빵셔틀" (bbang shuttle, a slang term for someone bullied into buying bread for others) have no direct English equivalent. Research on cross‑lingual commonsense knowledge graphs could help Macaron understand and explain such culture‑specific concepts. Integration with knowledge bases like ConceptNet or localized versions of ATOMIC could provide structured cultural knowledge that complements the LLM's statistical learning.

5.3 Privacy and regulatory alignment

The AI Promotion Act in Japan emphasises transparency and aligns AI development with existing regulations, while Korea's proposed AI Framework Act introduces obligations for risk management and human oversight. Personal agents must navigate these frameworks while respecting user privacy. Research is needed on federated learning to keep user data on device, differential privacy to prevent re‑identification across languages, and legal compliance engines that can interpret regulatory text in Japanese and Korean and map it to policy binding rules.

5.4 Cross‑modal integration

Future personal agents will not be limited to text. Macaron's vision includes connecting to IoT devices, VR interfaces and wearables. Cross‑modal interaction adds new complexity when dealing with multiple languages: a Japanese user might speak to a smart speaker in Japanese while reading Korean subtitles on a mixed reality headset. Aligning audio, text and visual data across languages will require multimodal transformers that can process speech, text and images simultaneously, as well as temporal synchronization between modalities.

5.5 Case study: bilingual education apps

To illustrate how cross‑lingual personalization works in practice, consider a Japanese user who wants to learn Korean and asks Macaron to build a study app. The agent begins by consulting the user's memory for previous language experiences—perhaps they studied English, so the agent knows they prefer visual aids and spaced repetition. The intent parser extracts slots like "target language: Korean," "source language: Japanese," "study focus: grammar and vocabulary," and "daily time: 20 minutes." Macaron's program synthesis engine then assembles modules: a morphological analyzer for Hangul, a sentence segmentation module for Japanese subtitles, a spaced‑repetition scheduler, and a quiz generator that integrates examples from the user's interests (e.g., Korean dramas or J‑pop lyrics).
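A hypothetical rendering of the parsed intent and module lookup might look like this (the slot names and module identifiers are assumptions that mirror the description above):

```python
# Hypothetical parsed intent for the study-app request:
intent = {
    "app_type": "language_study",
    "source_language": "ja",
    "target_language": "ko",
    "study_focus": ["grammar", "vocabulary"],
    "daily_minutes": 20,
    "preferences": ["visual_aids", "spaced_repetition"],  # pulled from memory
}

# Hypothetical module lookup keyed off the language pair:
MODULES = {
    ("ja", "ko"): ["hangul_morph_analyzer", "ja_subtitle_segmenter",
                   "spaced_repetition_scheduler", "quiz_generator"],
}
print(MODULES[(intent["source_language"], intent["target_language"])])
```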

The resulting app presents vocabulary cards with pronunciations, example sentences and cultural notes. A bidirectional translation layer links Korean vocabulary to equivalent Japanese phrases, using the cross‑lingual embeddings described earlier. Reinforcement learning personalizes the sequence: if the user struggles with verb conjugations, the reward model prioritizes grammar exercises; if they enjoy reading song lyrics, the agent surfaces more lyric translations. Because the memory system tags each lesson with language and domain, progress in Korean studies can later inform the user's Japanese creative writing, fostering transfer learning between languages. Users can share their bilingual study plans in the Macaron community, and the agent monitors feedback to refine the module library.

5.6 Philosophical reflections on cross‑lingual identity

The ability to operate across languages raises deeper questions about digital identity. Macaron's self‑model treats identity as an emergent narrative built from interactions. When those interactions occur in multiple languages, the narrative becomes even more fluid. Words carry cultural connotations: the Japanese term kokoro and the Korean term 마음 both translate to "heart/mind" but evoke different nuances. As Macaron weaves a user's memories across languages, it must decide which words to use when referring to feelings or memories. This choice shapes the user's perception of themselves. Philosophers of language argue that thought is influenced by the words we use; Macaron operationalizes this idea by selecting language based on context and desired emotional tone.

Cross‑lingual identity also touches on the concept of digital personhood. A user might maintain different personas in Japanese and Korean contexts—formal and reserved at work, casual and expressive in fandom communities. Macaron respects these boundaries by maintaining separate memory clusters while allowing deliberate cross‑pollination. Over time, users may choose to merge aspects of their identities, discovering common threads between their Japanese and Korean lives. Macaron facilitates this process by highlighting similar values, habits and aspirations found in both sets of memories, helping users craft a coherent personal narrative across cultures.

