
Multi-Intent Detection for Korean Spoken Language

Korea Computer Congress 2024 (KCC 2024) Outstanding Paper Award
Yejin Yoon*, Jisu Kim*, Jungmin Im, Jungyeon Lee, Taeuk Kim

One-Line Summary

Constructs the first Korean multi-intent detection dataset for spoken language and benchmarks multiple models, revealing that Korean-specific linguistic features such as conjunctive verb endings and topic marker omission make multi-intent detection significantly harder than in English.

Background & Motivation

In real-world dialogue systems, users frequently express multiple intents within a single utterance — for example, asking about the weather while simultaneously requesting a restaurant recommendation. While multi-intent detection (MID) has been actively studied for English, Korean presents fundamentally different challenges that cannot be addressed by simply translating existing datasets or fine-tuning English-centric models.

Why Korean Multi-Intent Detection Is Uniquely Challenging:

  • Agglutinative morphology: Korean expresses grammatical relations through suffixes attached to word stems, making intent boundaries harder to isolate at the token level compared to English's analytic structure.
  • Conjunctive verb endings (연결어미): Korean speakers merge multiple intents using endings like -고 (and), -(으)면서 (while), and -(으)니까 (because/since), which often blur intent transitions without explicit delimiters.
  • Topic/subject marker omission: In spoken Korean, particles like -은/는 (topic) and -이/가 (subject) are frequently dropped, removing syntactic cues that help delineate separate intent clauses.
  • Lack of Korean MID resources: Existing multi-intent datasets (MixATIS, MixSNIPS) are entirely English-based, and direct translation fails to capture natural Korean multi-intent utterance patterns.

These factors mean that a Korean user saying the equivalent of "Check tomorrow's weather and also book a nearby restaurant" will produce an utterance where intent boundaries are encoded through morphological suffixes rather than separate clauses — a pattern existing English MID models are not designed to handle.

Concrete Examples of Korean Multi-Intent Challenges

| Phenomenon | Korean Example | English Gloss | Difficulty |
|---|---|---|---|
| Conjunctive ending (-고) | "내일 날씨 알려주고 맛집도 추천해줘" | "Tell me tomorrow's weather and recommend a restaurant" | Intent boundary hidden in suffix |
| Simultaneous (-면서) | "음악 틀어주면서 알람 설정해줘" | "Play music while setting an alarm" | Temporal overlap blurs intent separation |
| Topic marker omission | "날씨 어때 식당 예약해줘" | "How's the weather, book a restaurant" | No syntactic marker between intents |
| Honorific variation | "날씨 알려주세요" vs. "날씨 알려줘" | Same intent, different formality | Surface variation confuses classifiers |

Proposed Method

The authors propose a three-stage approach: constructing a Korean-native multi-intent dataset, adapting intent annotation guidelines to Korean linguistic conventions, and benchmarking both traditional and pretrained language models on the new dataset. The methodology is designed to produce a dataset that is linguistically authentic rather than a translation artifact, ensuring that the resulting benchmark genuinely measures Korean-specific MID challenges.

1. Korean Multi-Intent Dataset Construction
Rather than translating existing English MID datasets, the authors construct utterances from scratch to reflect natural Korean spoken language patterns. Multi-intent utterances are generated by combining single-intent templates using authentic Korean conjunctive structures. The dataset covers diverse intent combinations across multiple domains (weather, transportation, restaurants, schedule, etc.), ensuring that intent blending patterns reflect real Korean conversational behavior.
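The template-combination idea described above can be sketched as follows. This is a hypothetical illustration, not the authors' construction code: the template strings, intent labels, and joining rule are all invented for the example.

```python
# Illustrative sketch: single-intent Korean templates (verb stems) are joined
# with conjunctive endings (-고 "and", -(으)면서 "while") to synthesize
# multi-intent utterances, mirroring the paper's template-combination idea.
# All names and templates here are hypothetical.

import itertools

# Single-intent templates: intent label -> clause ending in a verb stem
# so a connective or sentence ending can be attached.
SINGLE_INTENT = {
    "weather.query": "내일 날씨 알려주",       # "tell me tomorrow's weather"
    "restaurant.recommend": "맛집 추천해주",   # "recommend a good restaurant"
    "alarm.set": "알람 설정해주",              # "set an alarm"
}

# Conjunctive endings that merge the first clause into the second.
CONNECTIVES = ["고 ", "면서 "]  # -go (and), -myeonseo (while)

def combine(first: str, second: str, connective: str) -> tuple[str, list[str]]:
    """Join two single-intent clauses into one multi-intent utterance."""
    utterance = (SINGLE_INTENT[first] + connective
                 + SINGLE_INTENT[second] + "세요")  # final clause gets an ending
    return utterance, [first, second]

# Every ordered pair of distinct intents, with every connective.
dataset = [
    combine(a, b, c)
    for a, b in itertools.permutations(SINGLE_INTENT, 2)
    for c in CONNECTIVES
]
```

Because the intent boundary ends up inside a morphological suffix rather than between separate clauses, even this toy generator reproduces the core difficulty the paper identifies.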
2. Korean-Specific Linguistic Annotation
Annotation guidelines are adapted to handle Korean-specific phenomena: honorific speech levels (e.g., 해요체 vs. 합쇼체) that affect utterance structure, zero-anaphora where subjects are dropped entirely, and cases where conjunctive endings create ambiguity between a single complex intent and genuinely separate intents. Annotators are trained to distinguish between syntactic coordination (single intent with multiple actions) and true multi-intent utterances.
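A minimal sketch of what an annotation record distinguishing coordination from true multi-intent might look like, assuming a simple schema we invented for illustration (the paper's actual annotation format is not specified here):

```python
# Hypothetical annotation schema: one intent label means syntactic
# coordination annotated as a single intent; several labels mean a true
# multi-intent utterance. Field names and labels are illustrative only.

from dataclasses import dataclass

@dataclass
class Annotation:
    utterance: str
    intents: list[str]              # 1 label = single intent; 2+ = multi-intent
    speech_level: str               # e.g. "해요체", "합쇼체", "해체"
    omitted_markers: bool = False   # True when topic/subject particles dropped

# True multi-intent: weather query + restaurant booking, particles dropped.
multi = Annotation(
    "날씨 어때 식당 예약해줘",
    ["weather.query", "restaurant.book"],
    speech_level="해체",
    omitted_markers=True,
)

# Syntactic coordination annotated as ONE intent: "search the weather and
# tell me" is a single weather request with two coordinated actions.
single = Annotation(
    "날씨 검색해서 알려줘",
    ["weather.query"],
    speech_level="해체",
)

def is_multi_intent(a: Annotation) -> bool:
    return len(a.intents) > 1
```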
3. Model Benchmarking & Analysis
Multiple model architectures are evaluated on the Korean MID dataset, including BERT-based classifiers (KR-BERT, KoBERT), sequence-level multi-label classification approaches, and token-level intent boundary detection methods. The evaluation uses both exact-match accuracy (all intents correctly identified) and partial-match metrics (F1 score over individual intents) to capture the full picture of model capabilities. Error analysis is conducted to categorize failure modes specific to Korean.
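The two evaluation views mentioned above can be computed directly from gold and predicted intent sets. A minimal sketch, with made-up label names; the paper's exact metric implementations may differ:

```python
# Exact-match accuracy: the predicted intent SET must equal the gold set.
# Micro-averaged F1 over individual intents: partial credit when only some
# intents in an utterance are found. Labels below are illustrative.

def exact_match_accuracy(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Fraction of utterances where every intent is identified, no extras."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Micro-F1 over individual intent labels across all utterances."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # correctly found intents
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # spurious intents
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # missed intents
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

gold = [{"weather", "restaurant"}, {"alarm"}, {"weather", "schedule"}]
pred = [{"weather"},               {"alarm"}, {"weather", "schedule"}]

# Exact match penalizes the whole first utterance for the missed "restaurant"
# intent; micro-F1 still credits the correctly found "weather".
```

The gap between the two numbers is exactly what makes partial-match metrics informative: a model that reliably finds one of two merged intents looks far better under F1 than under exact match.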

Experimental Results

The benchmark experiments reveal a consistent and significant performance gap between Korean multi-intent detection and comparable English tasks, confirming that language-specific challenges are the primary bottleneck rather than model capacity or dataset size.

Error Analysis Summary: A systematic categorization of model failures reveals three dominant error types, each tied to a specific Korean linguistic phenomenon:

  • Intent merging (most frequent): The model treats two distinct intents as a single compound intent, triggered primarily by conjunctive endings that join clauses without explicit separation.
  • Intent omission: The model detects only one of multiple intents, typically missing the latter intent when topic/subject markers are omitted, causing the model to interpret the utterance as a single extended request.
  • Intent splitting: Less common but notable — the model incorrectly splits a single complex intent into multiple intents, often triggered by honorific inflections that create surface-level variation within a single request.
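The three error categories above can be assigned mechanically by comparing gold and predicted intent sets per utterance. The set-size heuristics below are our own illustration of the categorization, not the paper's exact procedure:

```python
# Hypothetical error categorizer matching the three failure modes described
# in the error analysis. Heuristics are illustrative, not the authors' code.

def categorize_error(gold: set[str], pred: set[str]) -> str:
    if pred == gold:
        return "correct"
    if len(pred) < len(gold) and pred <= gold:
        # Only some gold intents found, nothing spurious: omission.
        return "intent_omission"
    if len(pred) == 1 and len(gold) > 1:
        # Distinct intents collapsed into one (possibly compound) label.
        return "intent_merging"
    if len(pred) > len(gold):
        # A single complex intent incorrectly split into several.
        return "intent_splitting"
    return "other"
```

For example, predicting only `{"weather"}` for a gold set of `{"weather", "restaurant"}` is an omission, while predicting a single compound label not in the gold set is a merge.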

Key Findings

| Finding | Detail |
|---|---|
| Korean vs. English difficulty | Models trained on Korean MID show substantially lower exact-match accuracy compared to English MID benchmarks, even when using Korean-pretrained encoders |
| Implicit intent blending | Utterances using conjunctive endings (-고, -(으)면서) to merge intents are the hardest category, with the largest accuracy drops |
| Subject omission impact | Utterances with dropped topic/subject markers cause models to misidentify the number of intents, often under-predicting |
| Honorific variation | Different speech levels for the same intent content introduce surface-level variation that confuses classifiers |

Why It Matters

This work fills a critical gap in Korean dialogue understanding research by establishing the first systematic study of multi-intent detection for Korean spoken language. As Korean conversational AI products (voice assistants, customer service chatbots, smart home controllers) mature, the ability to handle multi-intent utterances is no longer optional but essential for user satisfaction. The key contributions are threefold:

  • The first Korean-native multi-intent detection dataset, built from scratch rather than translated from English resources.
  • Annotation guidelines adapted to Korean-specific phenomena such as honorific speech levels, zero-anaphora, and conjunctive-ending ambiguity.
  • A benchmark of traditional and pretrained models with an error analysis that ties failure modes to concrete Korean linguistic features.

The paper was recognized with the Outstanding Paper Award at KCC 2024, underscoring its significance to the Korean NLP community.
