
Enhancing Conversational Context Inference via Reasoning Feedback-Based Learning

The 37th Annual Conference on Human and Cognitive Language Technology (HCLT 2025)
Yuri Son, Taeuk Kim

One-Line Summary

A novel Reasoning Feedback-Based Learning (RFL) framework that harnesses detailed reasoning feedback from an external model to progressively refine training data, achieving 95.04% accuracy on conversational context inference (a 7.93%p gain over the baseline and 1.32%p over standard fine-tuning).

Background & Motivation

Conversational context inference — the task of understanding implicit information, speaker intentions, and situational dynamics from dialogue — is a fundamental challenge in building robust dialogue systems. While large language models (LLMs) have advanced dialogue understanding, they still falter on complex cases that require multi-step reasoning over conversation history.

Key Challenges with Existing Approaches:

  • Performance plateau on hard examples: Standard fine-tuning methods improve overall accuracy but stall on difficult inference cases where surface-level pattern matching is insufficient.
  • Limited learning signal: Multiple-choice question (MCQ) fine-tuning teaches the model what the correct answer is, but not why — the reasoning path remains opaque.
  • No targeted error correction: Conventional training treats all examples equally rather than concentrating effort on the most challenging instances where the model repeatedly fails.

These limitations motivate RFL: a framework that supplements standard training with structured reasoning feedback from a stronger external model, explicitly teaching the target model why its incorrect predictions are wrong and guiding it toward the correct reasoning path.

Proposed Method: Reasoning Feedback-Based Learning (RFL)

RFL is a three-stage iterative framework that leverages an external, more capable model to generate detailed reasoning feedback. This feedback is used to progressively refine training data so the target model can overcome its most persistent errors.

1. Initial Fine-Tuning & Error Collection
The target model is first fine-tuned on the conversational context inference task using standard MCQ training. After training, it is evaluated on the training set to identify incorrectly predicted instances: the hard cases where the model fails despite standard supervision.

2. Reasoning Feedback Generation
An external, more powerful model receives each incorrectly predicted instance along with the target model's wrong answer. It generates detailed reasoning feedback explaining (a) why the chosen answer is incorrect, (b) what the correct reasoning chain should be, and (c) the correct answer with justification. This transforms bare labels into a rich, explanatory training signal.

3. Progressive Data Refinement & Re-Training
The reasoning feedback is incorporated into the training data, replacing or augmenting the original instances for the hard cases. The target model is then re-trained on this progressively refined dataset, concentrating learning effort on the most challenging examples. This cycle can be repeated for iterative improvement.
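The three stages can be sketched as a single training round. This is a minimal illustration only: `Instance`, `finetune`, `predict`, and `generate_feedback` are hypothetical placeholder interfaces, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    dialogue: str
    choices: list[str]
    answer: str          # gold label, e.g. "B"
    rationale: str = ""  # populated by reasoning feedback for hard cases

def rfl_round(train_set: list[Instance], target_model, feedback_model) -> int:
    """One iteration of the RFL loop (sketch; model interfaces are assumed)."""
    # Stage 1: fine-tune, then collect the instances the model still gets wrong.
    target_model.finetune(train_set)
    errors = [(ex, pred) for ex in train_set
              if (pred := target_model.predict(ex)) != ex.answer]

    # Stage 2: a stronger external model explains each error — why the
    # prediction is wrong, the correct reasoning chain, and the answer.
    for ex, wrong_pred in errors:
        ex.rationale = feedback_model.generate_feedback(ex, wrong_pred)

    # Stage 3: re-train on the refined dataset, in which hard cases now
    # carry reasoning traces instead of bare labels.
    target_model.finetune(train_set)
    return len(errors)
```

Repeating `rfl_round` implements the iterative cycle, with each round sending only the remaining errors to the external model.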

How Reasoning Feedback Differs from Standard Labels:

In standard MCQ fine-tuning, the training signal is simply the correct answer label (e.g., "Answer: B"). RFL replaces this with a structured reasoning trace that includes three components:

  • Error diagnosis: An explicit explanation of why the target model's predicted answer is wrong, pinpointing the faulty reasoning step.
  • Correct reasoning chain: A step-by-step reasoning path that connects dialogue context clues to the correct answer, making the implicit inference process explicit.
  • Justified answer: The correct answer accompanied by a natural language justification grounded in the conversation, reinforcing the causal link between evidence and conclusion.

This rich signal transforms each error into a targeted learning opportunity, teaching the model not just to memorize answers but to internalize the reasoning patterns behind them.
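The three-component trace above could be serialized into a single training target along the following lines; the field names and layout are illustrative assumptions, not the paper's exact prompt format.

```python
# Sketch: combining the three feedback components into one training target
# that replaces the bare "Answer: B" label. (Illustrative format only.)
def format_feedback(error_diagnosis: str,
                    reasoning_chain: list[str],
                    answer: str,
                    justification: str) -> str:
    steps = "\n".join(f"  {i}. {step}" for i, step in enumerate(reasoning_chain, 1))
    return (
        f"Error diagnosis: {error_diagnosis}\n"
        f"Correct reasoning:\n{steps}\n"
        f"Answer: {answer} ({justification})"
    )
```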

Experimental Setup

RFL is evaluated on the conversational context inference task, which frames dialogue understanding as a multiple-choice problem: given a conversation history, the model must select the correct inference about implicit information, speaker intent, or situational context from a set of candidates.
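For concreteness, a single task instance might look like the following. The dialogue, question, and candidate inferences are invented for illustration; the summary does not show the dataset's actual schema.

```python
# A hypothetical MCQ instance for conversational context inference.
example = {
    "dialogue": [
        "A: I waited at the cafe for an hour.",
        "B: I'm so sorry, my phone died and I lost track of time.",
    ],
    "question": "What can be inferred about speaker A?",
    "choices": {
        "A": "A is indifferent to the missed meeting.",
        "B": "A is upset that B did not arrive on time.",
        "C": "A forgot about the meeting entirely.",
    },
    "answer": "B",  # requires inferring A's unstated frustration
}
```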

  • Task: Conversational Context Inference (MCQ format)
  • Target model: LLM fine-tuned on dialogue inference data
  • External feedback model: larger, more capable LLM used to generate reasoning feedback
  • Metric: accuracy (%)
  • Training strategy: iterative (fine-tune → collect errors → generate feedback → retrain)

The key design choice is the progressive refinement loop: after each training round, only the instances the model still gets wrong are sent to the external model for feedback, ensuring that computational resources are concentrated where they matter most.

Experimental Results

Main Results

RFL is compared against a baseline model (without fine-tuning) and standard MCQ fine-tuning (without reasoning feedback).

Method             Accuracy (%)   Improvement over Baseline
Baseline           87.11
MCQ Fine-Tuning    93.72          +6.61%p
RFL (Proposed)     95.04          +7.93%p

Where Does the Gain Come From?

Decomposing the RFL Improvement:

  • Standard fine-tuning contribution (+6.61%p): MCQ-based training captures the majority of learnable patterns, lifting accuracy from 87.11% to 93.72%. This addresses the "easy" and "medium" difficulty examples where pattern matching suffices.
  • Reasoning feedback contribution (+1.32%p): The additional gain from 93.72% to 95.04% comes entirely from hard cases that standard fine-tuning fails to solve. Though smaller in absolute terms, this gain is especially significant because it targets the most challenging tail of the distribution — cases that resist conventional training.
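The decomposition follows directly from the reported accuracies, as a quick check confirms:

```python
# Reported accuracies from the paper's main results table.
baseline, mcq_ft, rfl = 87.11, 93.72, 95.04

ft_gain = round(mcq_ft - baseline, 2)  # standard fine-tuning contribution
fb_gain = round(rfl - mcq_ft, 2)       # additional reasoning-feedback contribution
total   = round(rfl - baseline, 2)     # overall RFL gain over baseline

assert (ft_gain, fb_gain, total) == (6.61, 1.32, 7.93)
```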

Analysis: Impact on Error Categories

Why It Matters

This work demonstrates that structured reasoning feedback from external models can serve as a powerful training signal for improving dialogue understanding. The contributions extend beyond the specific task: errors become explanatory supervision rather than bare labels, and training effort is concentrated on the instances a model persistently gets wrong.
