A training-free contrastive decoding method that dynamically blends parametric, contextual, and abstention distributions so that LLMs answer when they have relevant knowledge and gracefully abstain when they do not.
LLMs acquire broad parametric knowledge during pre-training, yet inevitably lack information about underrepresented or rapidly evolving topics. Retrieval-Augmented Generation (RAG) supplements this with external contextual knowledge, but sometimes neither source contains the answer. Forcing the model to respond in such cases produces confident-sounding hallucinations, which is especially dangerous in high-stakes domains.
The Missing Scenario: Prior contrastive decoding methods (e.g., Context-Aware Decoding, Adaptive Contrastive Decoding) only handle cases where at least one knowledge source is relevant. None address the critical fourth scenario, in which both parametric and contextual knowledge are absent; as a result, these methods never abstain and achieve near-zero F1abs scores.
The authors identify four distinct scenarios: (1) only parametric knowledge available, (2) only contextual knowledge available, (3) both available, and (4) neither available. They first construct a controlled testbed that explicitly labels each scenario, then propose Contrastive Decoding with Abstention (CDA)—a training-free method that robustly handles all four.
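The paper's exact equations aren't reproduced here, but a minimal sketch of the blending step conveys the idea: compute next-token distributions with and without the retrieved context, estimate how confident each knowledge source is, and route probability mass to a designated abstain token when both sources look uncertain (scenario 4). The entropy-based confidence proxy, the weighting scheme, and the `abstain_id` token below are illustrative assumptions, not CDA's actual formulation.

```python
import numpy as np

def entropy_confidence(p: np.ndarray) -> float:
    """Confidence as 1 minus normalized entropy: peaked distributions
    score near 1, near-uniform ones near 0. (Illustrative proxy; the
    paper's confidence/relevance estimate may differ.)"""
    h = -np.sum(p * np.log(p + 1e-12))
    return float(1.0 - h / np.log(len(p)))

def cda_step(p_param: np.ndarray, p_ctx: np.ndarray, abstain_id: int) -> np.ndarray:
    """One decoding step: blend parametric, contextual, and abstention
    distributions according to how confident each knowledge source is."""
    c_param = entropy_confidence(p_param)
    c_ctx = entropy_confidence(p_ctx)

    # Abstention weight grows only when BOTH sources are uncertain (scenario 4).
    w_abs = (1.0 - c_param) * (1.0 - c_ctx)
    # Split the remaining mass between the two sources by relative confidence.
    total = c_param + c_ctx + 1e-12
    w_param = (1.0 - w_abs) * c_param / total
    w_ctx = (1.0 - w_abs) * c_ctx / total

    # Abstention "distribution": all mass on a designated abstain token.
    p_abs = np.zeros_like(p_param)
    p_abs[abstain_id] = 1.0

    blended = w_param * p_param + w_ctx * p_ctx + w_abs * p_abs
    return blended / blended.sum()
```

With two near-uniform input distributions, `w_abs` approaches 1 and the abstain token dominates; with a peaked contextual distribution, `w_ctx` dominates instead, matching scenario (2).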
Evaluated on three QA benchmarks (Natural Questions, HotpotQA, TriviaQA) with four LLMs (Llama3-8B, Llama2-7B/13B, Mistral-7B). Metrics: F1ans (answer quality on answerable questions), F1abs (accuracy of the abstention decision), and an overall Reliability Score (RS); a hedged sketch of how such metrics can be scored follows the results table.
| Dataset | Method | F1ans (%) | F1abs (%) | RS (%) |
|---|---|---|---|---|
| NQ | FSB (best baseline) | 69.27 | 54.94 | 59.64 |
| NQ | CDA | 72.06 | 55.49 | 62.95 |
| NQ | CDA-m | 73.15 | 55.47 | 63.72 |
| HotpotQA | FSB (best baseline) | 74.89 | 58.51 | 66.21 |
| HotpotQA | CDA | 78.71 | 62.50 | 70.20 |
| HotpotQA | CDA-m | 79.32 | 62.59 | 70.64 |
| TriviaQA | FSB (best baseline) | 77.02 | 59.84 | 68.55 |
| TriviaQA | CDA | 80.39 | 65.67 | 72.35 |
| TriviaQA | CDA-m | 80.93 | 65.66 | 72.74 |
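For concreteness, here is one hedged way such a split evaluation could be scored. The paper defines F1ans, F1abs, and RS precisely; the reading below (answer F1 averaged over answerable questions, binary F1 of the abstain decision with unanswerable as the positive class, and RS as a harmonic mean of the two) is an assumption for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Example:
    answerable: bool   # gold label: does some knowledge source contain the answer?
    abstained: bool    # did the model choose to abstain?
    answer_f1: float   # token-level F1 of the produced answer (ignored if abstained)

def harmonic_mean(a: float, b: float) -> float:
    return 2 * a * b / (a + b) if a + b else 0.0

def score(examples: list[Example]) -> dict[str, float]:
    # F1ans: answer F1 averaged over answerable questions; abstaining on an
    # answerable question contributes 0 (assumed reading, not the paper's text).
    ans = [0.0 if e.abstained else e.answer_f1 for e in examples if e.answerable]
    f1_ans = sum(ans) / len(ans) if ans else 0.0

    # F1abs: binary F1 of the abstain decision, with "should abstain"
    # (unanswerable) as the positive class.
    tp = sum(e.abstained and not e.answerable for e in examples)
    fp = sum(e.abstained and e.answerable for e in examples)
    fn = sum(not e.abstained and not e.answerable for e in examples)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1_abs = harmonic_mean(prec, rec)

    # RS: illustrative harmonic mean of the two; the paper's RS may differ.
    return {"F1ans": f1_ans, "F1abs": f1_abs, "RS": harmonic_mean(f1_ans, f1_abs)}
```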
For trustworthy AI deployment, knowing when not to answer is just as important as answering correctly. CDA is the first training-free decoding approach to integrate abstention directly into the contrastive decoding framework, handling all four knowledge-access scenarios without any parameter updates. Its calibrated uncertainty estimation supports robust performance across diverse models and datasets, while momentum stabilization (the CDA-m variant) damps error propagation during autoregressive generation; a sketch of that smoothing follows. The method applies out of the box to any instruction-tuned LLM with RAG, making it a practical step toward more reliable question-answering systems.
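The momentum stabilization behind CDA-m plausibly amounts to smoothing the per-step blend weights so that a single noisy decoding step cannot abruptly flip the model into or out of abstention. A minimal sketch, assuming a simple exponential moving average with coefficient `beta` (both the EMA form and the 0.9 default are illustrative assumptions, not CDA-m's exact mechanism):

```python
import numpy as np

class MomentumWeights:
    """Exponential moving average over the per-step blend weights
    (parametric, contextual, abstention), so one noisy step cannot
    abruptly change which knowledge source dominates. An illustrative
    sketch of the idea behind CDA-m, not its exact mechanism."""

    def __init__(self, beta: float = 0.9):
        self.beta = beta   # higher beta = heavier smoothing (assumed value)
        self.ema = None    # running weights, set at the first step

    def update(self, weights) -> np.ndarray:
        w = np.asarray(weights, dtype=float)
        if self.ema is None:
            self.ema = w
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * w
        return self.ema / self.ema.sum()  # renormalize to a valid mixture
```

At each autoregressive step, the raw weights (`w_param`, `w_ctx`, `w_abs`) from the blending sketch above would pass through `update()` before the three distributions are mixed.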