One-Line Summary
Constructs the first Korean multi-intent detection dataset for spoken language and benchmarks multiple models, revealing that Korean-specific linguistic features such as conjunctive verb endings and topic marker omission make multi-intent detection significantly harder than in English.
Background & Motivation
In real-world dialogue systems, users frequently express multiple intents within a single utterance — for example, asking about the weather while simultaneously requesting a restaurant recommendation. While multi-intent detection (MID) has been actively studied for English, Korean presents fundamentally different challenges that cannot be addressed by simply translating existing datasets or fine-tuning English-centric models.
Why Korean Multi-Intent Detection Is Uniquely Challenging:
- Agglutinative morphology: Korean expresses grammatical relations through suffixes attached to word stems, making intent boundaries harder to isolate at the token level compared to English's analytic structure.
- Conjunctive verb endings (연결어미): Korean speakers merge multiple intents using endings like -고 (and), -(으)면서 (while), and -(으)니까 (because/since), which often blur intent transitions without explicit delimiters.
- Topic/subject marker omission: In spoken Korean, particles like -은/는 (topic) and -이/가 (subject) are frequently dropped, removing syntactic cues that help delineate separate intent clauses.
- Lack of Korean MID resources: Existing multi-intent datasets (MixATIS, MixSNIPS) are entirely English-based, and direct translation fails to capture natural Korean multi-intent utterance patterns.
These factors mean that a Korean user saying the equivalent of "Check tomorrow's weather and also book a nearby restaurant" will produce an utterance where intent boundaries are encoded through morphological suffixes rather than separate clauses — a pattern existing English MID models are not designed to handle.
Concrete Examples of Korean Multi-Intent Challenges
| Phenomenon | Korean Example | English Gloss | Difficulty |
|---|---|---|---|
| Conjunctive ending (-고) | "내일 날씨 알려주고 맛집도 추천해줘" | "Tell me tomorrow's weather and recommend a restaurant" | Intent boundary hidden in suffix |
| Simultaneous (-면서) | "음악 틀어주면서 알람 설정해줘" | "Play music while setting an alarm" | Temporal overlap blurs intent separation |
| Topic marker omission | "날씨 어때 식당 예약해줘" | "How's the weather, book a restaurant" | No syntactic marker between intents |
| Honorific variation | "날씨 알려주세요" vs. "날씨 알려줘" | Same intent, different formality | Surface variation confuses classifiers |
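To see why these boundaries are invisible at the whitespace level, consider passing the first example through a morphological analyzer. This is a minimal sketch using konlpy's Okt tagger as a stand-in; the paper does not prescribe a particular analyzer, and any Korean morphological analyzer would serve the same illustrative purpose.

```python
# Sketch: the intent boundary in "알려주고" lives inside a single
# whitespace-delimited word, so it only becomes visible after
# morphological analysis splits the conjunctive ending -고 off the verb.
from konlpy.tag import Okt  # assumed analyzer choice, not the paper's

analyzer = Okt()
utterance = "내일 날씨 알려주고 맛집도 추천해줘"  # two intents joined by -고

# pos() returns (morpheme, part-of-speech) pairs. Unlike English, where
# "and" sits between two clauses as its own token, the boundary here is
# a suffix fused onto the first verb.
for morpheme, tag in analyzer.pos(utterance):
    print(morpheme, tag)
```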
Proposed Method
The authors propose a three-stage approach: constructing a Korean-native multi-intent dataset, adapting intent annotation guidelines to Korean linguistic conventions, and benchmarking both traditional and pretrained language models on the new dataset. The methodology is designed to produce a dataset that is linguistically authentic rather than a translation artifact, ensuring that the resulting benchmark genuinely measures Korean-specific MID challenges.
1. Korean Multi-Intent Dataset Construction
Rather than translating existing English MID datasets, the authors construct utterances from scratch to reflect natural Korean spoken language patterns. Multi-intent utterances are generated by combining single-intent templates using authentic Korean conjunctive structures. The dataset covers diverse intent combinations across multiple domains (weather, transportation, restaurants, schedule, etc.), ensuring that intent blending patterns reflect real Korean conversational behavior.
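A minimal sketch of what this template-combination step could look like. The templates, intent labels, and inflected forms below are illustrative assumptions, not the authors' actual resources; the paper's own generation procedure and intent inventory may differ.

```python
import random

# Each hypothetical template stores two pre-inflected surface forms: a
# non-final clause ending in the conjunctive -고, and a final clause
# ending in the plain request form -줘. Other endings such as -(으)면서
# could be added the same way.
TEMPLATES = [
    {"label": "weather.query",
     "nonfinal": "내일 날씨 알려주고",
     "final": "내일 날씨 알려줘"},
    {"label": "restaurant.recommend",
     "nonfinal": "근처 맛집 추천해주고",
     "final": "근처 맛집도 추천해줘"},
    {"label": "alarm.set",
     "nonfinal": "오전 7시에 알람 설정해주고",
     "final": "오전 7시에 알람 설정해줘"},
]

def make_multi_intent(n_intents: int = 2) -> tuple[str, list[str]]:
    """Sample n templates and join them: every clause except the last
    uses the conjunctive form, the last uses the request form."""
    chosen = random.sample(TEMPLATES, n_intents)
    clauses = [t["nonfinal"] for t in chosen[:-1]] + [chosen[-1]["final"]]
    labels = [t["label"] for t in chosen]
    return " ".join(clauses), labels

utt, labels = make_multi_intent(2)
print(utt)     # e.g. 내일 날씨 알려주고 근처 맛집도 추천해줘
print(labels)  # e.g. ['weather.query', 'restaurant.recommend']
```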
2. Korean-Specific Linguistic Annotation
Annotation guidelines are adapted to handle Korean-specific phenomena: honorific speech levels (e.g., 해요체 vs. 합쇼체) that affect utterance structure, zero-anaphora where subjects are dropped entirely, and cases where conjunctive endings create ambiguity between a single complex intent and genuinely separate intents. Annotators are trained to distinguish between syntactic coordination (single intent with multiple actions) and true multi-intent utterances.
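For concreteness, here is a hypothetical annotation record showing the kinds of Korean-specific fields such guidelines imply. The field names and values are assumptions for illustration, not the paper's actual schema.

```python
# Hypothetical annotation record; every field name below is an assumption.
annotation = {
    "utterance": "날씨 어때 식당 예약해줘",
    "intents": ["weather.query", "restaurant.reserve"],  # gold labels
    "speech_level": "해체",          # informal level (vs. 해요체/합쇼체)
    "topic_marker_dropped": True,    # "날씨(는)" -- topic particle omitted
    "zero_anaphora": True,           # subject dropped entirely
    # The key annotator judgment: syntactic coordination of one intent
    # (a single routine expressed as multiple actions) vs. genuinely
    # separate intents. This utterance requests two independent actions.
    "is_true_multi_intent": True,
}
```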
3. Model Benchmarking & Analysis
Multiple model architectures are evaluated on the Korean MID dataset, including BERT-based classifiers (KR-BERT, KoBERT), sequence-level multi-label classification approaches, and token-level intent boundary detection methods. The evaluation uses both exact-match accuracy (all intents correctly identified) and partial-match metrics (F1 score over individual intents) to capture the full picture of model capabilities. Error analysis is conducted to categorize failure modes specific to Korean.
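The two evaluation views can be made precise with a short sketch. The function names below are mine, assuming gold and predicted intents are represented as one label set per utterance.

```python
# Exact match: the full predicted intent set must equal the gold set.
# Micro-F1: partial credit for every individual intent correctly found.
def exact_match_accuracy(gold: list[set[str]], pred: list[set[str]]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # intents found
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # spurious intents
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # missed intents
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

gold = [{"weather.query", "restaurant.recommend"}, {"alarm.set"}]
pred = [{"weather.query"}, {"alarm.set"}]
print(exact_match_accuracy(gold, pred))  # 0.5 -- one intent missed entirely
print(micro_f1(gold, pred))              # 0.8 -- partial credit retained
```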
Experimental Results
The benchmark experiments reveal a consistent and significant performance gap between Korean multi-intent detection and comparable English tasks, confirming that language-specific challenges are the primary bottleneck rather than model capacity or dataset size.
Error Analysis Summary: A systematic categorization of model failures reveals three dominant error types, each tied to a specific Korean linguistic phenomenon (a sketch of this bucketing follows the list):
- Intent merging (most frequent): The model treats two distinct intents as a single compound intent, triggered primarily by conjunctive endings that join clauses without explicit separation.
- Intent omission: The model detects only one of multiple intents, typically missing the latter intent when topic/subject markers are omitted, causing the model to interpret the utterance as a single extended request.
- Intent splitting: Less common but notable — the model incorrectly splits a single complex intent into multiple intents, often triggered by honorific inflections that create surface-level variation within a single request.
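A rough sketch of how these buckets could be assigned automatically from gold and predicted intent sets. The decision rules are my assumptions, not the paper's exact procedure; in particular, cleanly separating merging from omission may require inspecting the compound label the model actually produced, so this sketch groups them.

```python
def categorize_error(gold: set[str], pred: set[str]) -> str:
    """Coarse error bucketing from label sets alone (assumed rules)."""
    if pred == gold:
        return "correct"
    if pred < gold:
        # Under-prediction: only some gold intents detected -- consistent
        # with intent omission, or with merging two intents into one.
        return "under_prediction (omission/merging)"
    if pred > gold:
        # Over-prediction: all gold intents plus extras -- a single
        # complex intent split into several.
        return "over_prediction (splitting)"
    return "other (substitution)"

print(categorize_error({"weather.query", "restaurant.recommend"},
                       {"weather.query"}))                           # omission/merging
print(categorize_error({"alarm.set"}, {"alarm.set", "music.play"}))  # splitting
```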
Key Findings
| Finding | Detail |
|---|---|
| Korean vs. English difficulty | Models trained on Korean MID show substantially lower exact-match accuracy compared to English MID benchmarks, even when using Korean-pretrained encoders |
| Implicit intent blending | Utterances using conjunctive endings (-고, -(으)면서) to merge intents are the hardest category, with the largest accuracy drops |
| Subject omission impact | Utterances with dropped topic/subject markers cause models to misidentify the number of intents, often under-predicting |
| Honorific variation | Different speech levels for the same intent content introduce surface-level variation that confuses classifiers |
- Conjunctive endings are the primary error source: When intents are merged through Korean verb endings rather than explicit conjunctions like "and" or "also," models frequently fail to detect the intent boundary, treating two intents as one.
- Korean-pretrained models outperform multilingual models: Models pretrained specifically on Korean corpora (KR-BERT, KoBERT) consistently outperform multilingual BERT variants, confirming the importance of language-specific pretraining.
- Multi-label classification outperforms sequence labeling: Treating MID as a multi-label classification task over the full utterance proves more effective than token-level intent boundary detection for Korean, likely because intent boundaries in Korean are morphologically encoded rather than positionally separated (see the classifier sketch after this list).
- Performance degrades with more intents: As the number of intents per utterance increases from 2 to 3+, accuracy drops sharply, indicating that models struggle to maintain sensitivity to multiple overlapping intent signals in complex Korean utterances.
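A minimal sketch of the multi-label formulation: encode the whole utterance once and score every intent with an independent sigmoid. The encoder checkpoint (klue/bert-base) and intent inventory are placeholders of my choosing; the paper's KR-BERT and KoBERT checkpoints would slot into the same structure, possibly with their own tokenizers.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

INTENTS = ["weather.query", "restaurant.recommend", "alarm.set"]  # assumed inventory

class MultiLabelIntentClassifier(nn.Module):
    def __init__(self, encoder_name: str, num_intents: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_intents)

    def forward(self, **inputs):
        # The [CLS] (first-token) vector summarizes the full utterance;
        # one logit per intent, trained with nn.BCEWithLogitsLoss against
        # a multi-hot label vector.
        hidden = self.encoder(**inputs).last_hidden_state[:, 0]
        return self.classifier(hidden)

name = "klue/bert-base"  # placeholder Korean encoder
tokenizer = AutoTokenizer.from_pretrained(name)
model = MultiLabelIntentClassifier(name, len(INTENTS))

batch = tokenizer(["내일 날씨 알려주고 맛집도 추천해줘"], return_tensors="pt")
probs = torch.sigmoid(model(**batch))
# Independent thresholds let the model emit zero, one, or many intents,
# which is the point of the multi-label formulation.
predicted = [INTENTS[i] for i in range(len(INTENTS)) if probs[0, i] > 0.5]
```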
Why It Matters
This work fills a critical gap in Korean dialogue understanding research by establishing the first systematic study of multi-intent detection for Korean spoken language. As Korean conversational AI products (voice assistants, customer service chatbots, smart home controllers) mature, the ability to handle multi-intent utterances is no longer optional but essential for user satisfaction. The key contributions are threefold:
- First Korean MID benchmark: Provides the research community with a purpose-built dataset that captures authentic Korean multi-intent utterance patterns, enabling future research without the limitations of translated English data.
- Language-specific design principles: Demonstrates empirically that simply applying English MID methods to Korean is insufficient, and identifies the specific linguistic phenomena (conjunctive endings, marker omission, honorific variation) that must be addressed for robust Korean intent detection.
- Practical implications for Korean conversational AI: Provides both the evaluation framework and the baseline models needed for production systems such as voice assistants and customer service chatbots to handle multi-intent utterances robustly.
The paper was recognized with the Outstanding Paper Award at KCC 2024, underscoring its significance to the Korean NLP community.