
A Multi-Intent Dataset Enhanced with Implicit Concatenation

Korea Computer Congress 2024 (KCC 2024)
Sungmin So*, Jiwoo Min*, Yejin Yoon, Jungyeon Lee, Taeuk Kim

One-Line Summary

A new multi-intent detection dataset that incorporates realistic linguistic phenomena -- ellipsis and coreference -- to create naturalistic multi-intent utterances beyond the simple concatenation of single-intent sentences, revealing significant performance drops when existing models face natural language complexity.

Background & Motivation

Multi-intent detection is the task of identifying multiple user intents within a single utterance, a common scenario in real-world dialogue systems (e.g., "Book a flight to Seoul and find me a hotel nearby"). While this capability is critical for building practical conversational AI, existing datasets fall short of capturing the true complexity of human language.

Key Limitations of Existing Multi-Intent Datasets:

  • Mechanical concatenation: Most datasets (e.g., MixATIS, MixSNIPS) are constructed by simply joining two or more single-intent utterances with a conjunction, producing stilted and artificial examples like "Play jazz music and what is the weather in Seoul."
  • Missing ellipsis: In natural speech, speakers omit repeated elements (e.g., "Book a flight to Seoul and a hotel there" instead of "Book a flight to Seoul and book a hotel in Seoul"), but existing datasets rarely model this phenomenon.
  • Absent coreference: Real users naturally use pronouns and anaphoric references across intents (e.g., "Find a restaurant nearby and reserve it for two"), which existing datasets do not capture.
  • Evaluation gap: Models trained and evaluated on artificially concatenated data may appear to perform well but fail on real-world utterances that exhibit these natural linguistic phenomena.

This gap between artificial benchmarks and real-world language use motivates the creation of a dataset that incorporates implicit concatenation -- multi-intent utterances where ellipsis and coreference make the expressions compact and natural, yet significantly more challenging for automated systems to parse.

Explicit vs. Implicit Concatenation: Concrete Examples

Type | Example | Linguistic Phenomenon
Explicit (Existing) | "Book a flight to Seoul and book a hotel in Seoul" | Mechanical conjunction; repeated elements preserved
Implicit (Ellipsis) | "Book a flight and a hotel in Seoul" | Shared verb and location collapsed
Implicit (Coreference) | "Find an Italian restaurant and reserve it for two" | Anaphoric pronoun replaces repeated entity
Implicit (Combined) | "Find a hotel near the airport and check in at 3 PM" | Both ellipsis (hotel omitted) and deixis (implied location)

The examples illustrate a spectrum from artificial to natural: while explicit concatenation preserves all information redundantly, implicit concatenation removes redundancy in ways that require inference to recover the full semantic content — exactly the kind of reasoning that current models lack.
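To make the distinction concrete, one implicit example above could be represented as a labeled record roughly like the following. This is a hypothetical layout for illustration only; the field names ("utterance", "intents", "slots", "phenomena") are not the dataset's actual schema.

```python
# Hypothetical record for one implicit-concatenation example.
# Field names are illustrative, not the dataset's released schema.
example = {
    "utterance": "Find an Italian restaurant and reserve it for two",
    "intents": ["FindRestaurant", "MakeReservation"],
    "slots": {
        "FindRestaurant": {"cuisine": "Italian"},
        # "it" must be resolved back to the entity from the first intent:
        "MakeReservation": {"venue": "Italian restaurant", "party_size": "two"},
    },
    "phenomena": ["coreference"],  # tags which implicit phenomena appear
}

# The pronoun "it" carries no slot value on its own; recovering the venue
# slot requires anaphora resolution across the two intents.
print(example["slots"]["MakeReservation"]["venue"])  # Italian restaurant
```

Tagging each example with its phenomena also makes the later per-phenomenon error analysis straightforward.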

Proposed Method: Implicit Concatenation Dataset Construction

The core idea is to transform mechanically concatenated multi-intent utterances into linguistically natural ones by systematically applying ellipsis and coreference. Unlike previous datasets that simply join single-intent sentences with conjunctions, this approach produces utterances that mirror how real users naturally compress multiple requests into a single, compact expression. The construction follows a structured pipeline:

1. Base Pair Selection: Single-intent utterances from existing benchmarks (e.g., ATIS, SNIPS) are paired to form multi-intent combinations. Pairs are selected to cover a diverse range of intent combinations across domains such as travel, music, weather, and restaurant booking.
2. Implicit Concatenation via Ellipsis: Repeated elements shared between the two intents are identified and removed from one of the utterances. For example, "Book a flight to Seoul" + "Book a hotel in Seoul" becomes "Book a flight and a hotel in Seoul," where the repeated verb and location are collapsed into a single, natural expression.
3. Implicit Concatenation via Coreference: Entities mentioned in the first intent are replaced with pronouns or deictic expressions in the second intent. For instance, "Find an Italian restaurant" + "Make a reservation at the Italian restaurant" becomes "Find an Italian restaurant and make a reservation there," introducing anaphoric reference across intents.
4. Quality Validation & Annotation: Human annotators review the transformed utterances for linguistic naturalness, semantic preservation, and correct intent/slot labeling. Utterances that sound unnatural or lose intent information are revised or discarded to maintain dataset quality.
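As a rough sketch, the two transformation steps can be pictured as template operations over the shared parts of a paired utterance. The function names and string templates below are illustrative; the actual pipeline identifies shared elements through linguistic analysis and human review rather than fixed slots.

```python
def ellipsis_concat(verb: str, obj_a: str, obj_b: str, shared_pp: str) -> str:
    """Collapse a verb and a prepositional phrase shared by both intents.

    Illustrative template only -- not the paper's actual construction code.
    """
    return f"{verb} {obj_a} and {obj_b} {shared_pp}"


def coref_concat(first_utt: str, verb_b: str, anaphor: str, rest: str = "") -> str:
    """Replace the repeated entity in the second intent with an anaphor."""
    second = f"{verb_b} {anaphor}" + (f" {rest}" if rest else "")
    return f"{first_utt} and {second}"


# "Book a flight to Seoul" + "Book a hotel in Seoul"
print(ellipsis_concat("Book", "a flight", "a hotel", "in Seoul"))
# -> Book a flight and a hotel in Seoul

# "Find an Italian restaurant" + "Make a reservation at the Italian restaurant"
print(coref_concat("Find an Italian restaurant", "make a reservation", "there"))
# -> Find an Italian restaurant and make a reservation there
```

Note that both transforms discard surface redundancy while preserving the gold intent and slot labels, which is what makes the resulting utterances harder for models that rely on explicit repetition.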

Experimental Results

State-of-the-art multi-intent detection models were evaluated on both the standard (explicit) concatenation dataset and the new implicit concatenation dataset to quantify the impact of realistic linguistic phenomena. The evaluation is designed to answer a critical question: do models that succeed on artificial benchmarks truly understand multi-intent utterances, or have they merely learned to exploit surface-level concatenation patterns?

Performance Comparison: Explicit vs. Implicit Concatenation

Evaluation Setting | Intent Detection | Slot Filling | Overall Difficulty
Explicit Concatenation (Baseline) | High | High | Standard
Implicit Concatenation (Ours) | Significantly Lower | Significantly Lower | Challenging

Detailed Error Analysis by Phenomenon Type:

  • Shared-verb ellipsis: When two intents share a verb (e.g., "book") and only one instance is retained, models frequently fail to propagate the action to both intent slots, resulting in incomplete slot filling for the second intent.
  • Shared-entity ellipsis: When a location or entity is mentioned once but applies to both intents (e.g., "in Seoul"), models sometimes assign it only to the first intent, leaving the second intent's location slot empty.
  • Pronominal coreference: When "it" or "there" refers back to an entity from the first intent, models struggle to resolve the reference, often leaving the referent slot unfilled or filling it with a generic placeholder.
  • Combined phenomena: Utterances exhibiting both ellipsis and coreference simultaneously show the steepest performance drops, as models must perform multiple types of inference to recover the full intent structure.
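A per-phenomenon breakdown like the one above can be computed by grouping exact-match results by phenomenon tag. The sketch below assumes each example carries gold intent labels and phenomenon tags; the field names are illustrative, not the paper's actual evaluation code.

```python
from collections import defaultdict


def accuracy_by_phenomenon(examples, predictions):
    """Exact-match intent accuracy, broken down by linguistic phenomenon.

    An utterance counts as correct only if the predicted intent set
    matches the gold set exactly (order-insensitive).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for ex, pred in zip(examples, predictions):
        correct = set(pred) == set(ex["intents"])
        for tag in ex["phenomena"]:
            totals[tag] += 1
            hits[tag] += int(correct)
    return {tag: hits[tag] / totals[tag] for tag in totals}


# Toy check: the ellipsis example is fully recovered, but the model
# misses the second intent of the coreference example.
gold = [
    {"intents": ["BookFlight", "BookHotel"], "phenomena": ["ellipsis"]},
    {"intents": ["FindRestaurant", "MakeReservation"], "phenomena": ["coreference"]},
]
pred = [["BookHotel", "BookFlight"], ["FindRestaurant"]]
print(accuracy_by_phenomenon(gold, pred))  # {'ellipsis': 1.0, 'coreference': 0.0}
```

The same grouping applies to slot-level metrics, which is how the ellipsis-specific slot-filling failures described above can be isolated from coreference failures.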

Why It Matters

As dialogue systems are deployed in increasingly complex real-world scenarios, the ability to understand natural multi-intent utterances becomes essential. The findings of this work have direct implications for both academic research on dialogue understanding and the practical engineering of production-grade conversational AI systems. This work makes three important contributions:

Dialogue Benchmark