Introduces PURE, a select-then-generate framework for preference-consistent explanations in LLM recommenders, addressing inconsistencies missed by standard metrics.

Topological visualization of "Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation"
Brave API

The provided context does not contain information about a framework called PURE, a select-then-generate approach for preference-consistent explanations, or any specific method addressing preference-inconsistent explanations in LLM-based recommenders beyond factual correctness. While several papers discuss the factual consistency and quality of explanations generated by large language models (LLMs) in recommendation systems, none explicitly introduce or describe the PURE framework as mentioned in the query.

Research highlighted in the context emphasizes the gap between surface-level fluency and actual factual consistency in LLM-generated explanations, showing that high semantic similarity scores (e.g., BERTScore F1: 0.81–0.90) often mask low factual precision (4.38%–32.88%) when evaluated against evidence from user reviews. A prompting-based pipeline has been proposed to extract atomic statements from reviews and construct ground truth explanations for more reliable evaluation. Additionally, counterfactual approaches have been explored to generate explanations by identifying minimal changes that alter recommendation outcomes, enhancing both user understanding and model debugging.
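The fluency/factuality gap described above can be made concrete as a supported-claim ratio over atomic statements. The sketch below is illustrative only: exact string matching stands in for the entailment check a review-based pipeline would actually use, and all identifiers are hypothetical.

```python
def atomic_precision(explanation_facts, review_facts):
    """Fraction of the explanation's atomic statements that are supported
    by atomic statements extracted from the user's reviews."""
    if not explanation_facts:
        return 0.0
    supported = sum(1 for fact in explanation_facts if fact in review_facts)
    return supported / len(explanation_facts)

# A fluent explanation can score high on semantic similarity while only
# a fraction of its atomic claims is actually supported by evidence:
expl = ["great battery life", "waterproof design", "family-friendly"]
evidence = {"great battery life"}
print(round(atomic_precision(expl, evidence), 2))  # 0.33
```

In a real evaluation, the membership test `fact in review_facts` would be replaced by an NLI or LLM-based entailment judgment between each claim and the review evidence.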

LLMs are also being used as explainable re-rankers in hybrid frameworks that combine traditional models with LLMs to improve both accuracy and interpretability, leveraging techniques like Direct Preference Optimization (DPO) and bootstrapping to mitigate biases. User studies indicate a strong preference for LLM-generated explanations over baseline methods, particularly due to their clarity, detail, and perceived transparency, even though they may not always reflect the internal logic of the recommender system.
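For reference, the DPO objective mentioned above reduces, for a single preference pair, to a logistic loss on the policy-vs-reference log-probability margin. This is the standard formulation, not anything specific to the frameworks in the context; the function signature is illustrative.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares the policy's chosen-vs-rejected log-prob
    gap against a frozen reference model's gap."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization the policy equals the reference, the margin is zero,
# and the loss is log(2):
print(round(dpo_loss(-1.0, -2.0, -1.0, -2.0), 4))  # 0.6931
```

The loss falls below log(2) exactly when the policy ranks the chosen response above the rejected one more strongly than the reference does.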

However, no source in the context addresses the specific issue of preference-inconsistency—where explanations may be factually correct but misaligned with user preferences—or introduces a solution named PURE for this problem. Therefore, based on the available information, the claim in the query cannot be substantiated.

Generated Mar 7, 2026
Cerebras Thinking

This paper addresses a critical gap in LLM-based explainable recommendation: while current systems effectively minimize factual hallucinations, they frequently generate explanations that are factually accurate yet contextually irrelevant to the specific user's preferences. The authors identify this phenomenon as "preference inconsistency," where a model might correctly identify a product feature (e.g., "this movie is family-friendly") but fail to recognize that the target user prioritizes a different attribute (e.g., "dark, gritty tone"). The research argues that standard evaluation metrics focused solely on factual correctness are insufficient, as they fail to capture the semantic alignment between an explanation and the user's underlying intent.

To resolve this, the study introduces PURE, a novel "select-then-generate" framework designed to ground explanations in user-specific intent. Unlike standard end-to-end generative approaches that often produce text based on generic item attributes, PURE first filters and selects the most relevant user preferences from historical interactions or reviews before passing them to the generator. This intermediate selection step acts as a grounding mechanism, ensuring that the subsequent explanation generation is strictly aligned with the user's actual taste profile rather than just the item's general properties.
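The select-then-generate idea can be sketched in a few lines. This is a toy illustration, not PURE's actual method: token overlap stands in for the learned relevance selector, the generator is a string template rather than an LLM call, and all names (including the example item) are hypothetical.

```python
def select_preferences(history, item_desc, k=2):
    """Rank the user's stated preferences by token overlap with the
    candidate item and keep the top-k (a crude stand-in for the learned
    selection step that grounds the generator)."""
    item_tokens = set(item_desc.lower().split())
    return sorted(history,
                  key=lambda p: len(item_tokens & set(p.lower().split())),
                  reverse=True)[:k]

def explain(item_name, selected):
    """Placeholder generator: a real pipeline would prompt an LLM with
    only the selected preferences as grounding context."""
    return (f"Recommended {item_name} because you value "
            + " and ".join(selected) + ".")

history = ["dark gritty tone", "strong ensemble cast",
           "family-friendly stories"]
picked = select_preferences(history, "a dark crime drama with a gritty tone",
                            k=1)
print(explain("Mindhunter", picked))
# Recommended Mindhunter because you value dark gritty tone.
```

The point of the intermediate step is visible even in this toy: "family-friendly stories" is a true fact about the user's history, but the selector filters it out because it is irrelevant to this item, so the generator never mentions it.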

The contributions of this work are validated through new evaluation benchmarks specifically designed to measure preference consistency, demonstrating that PURE significantly outperforms baseline methods in maintaining user alignment without sacrificing factual accuracy. This research is vital for the advancement of trustworthy recommender systems, as it shifts the paradigm of explainability from a focus on generic truthfulness to a focus on personalized relevance. By ensuring that explanations are not only true but also meaningful to the specific user, PURE enhances the utility and acceptance of AI-driven recommendations.

Generated 29d ago
Open-Weights Reasoning

Summary: Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

This paper introduces PURE (Preference-Unified Recommendation Explanations), a framework designed to generate preference-consistent explanations for large language model (LLM)-based recommenders. While existing explainability methods often focus on factual correctness (e.g., ensuring recommendations match user history), they frequently overlook preference consistency—whether explanations align with the underlying ranking logic or user preferences. The authors argue that standard evaluation metrics (e.g., faithfulness, plausibility) fail to capture this discrepancy, leading to explanations that may appear coherent but do not reflect the model’s actual decision-making process.

PURE adopts a select-then-generate approach, where it first selects preference-aligned reasoning paths from the LLM’s internal rationale generation and then refines them into coherent explanations. Experiments demonstrate that PURE outperforms baselines in generating explanations that are not only factually accurate but also aligned with the recommender’s ranking criteria. The work highlights a critical gap in explainable recommendation systems: even factually correct explanations can be misleading if they do not reflect the model’s true preference-based decision logic. This contribution is particularly relevant for applications where transparency and trust are paramount, such as personalized content or product recommendations. By addressing preference consistency, PURE sets a new benchmark for evaluating and improving the reliability of LLM-driven explainable systems.
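The "select preference-aligned reasoning paths" step described above resembles best-of-n selection over sampled rationales. The sketch below is an assumption-laden illustration (term counting stands in for a learned preference-alignment scorer; the paper's actual mechanism may differ):

```python
def select_rationale(candidates, preference_terms):
    """Best-of-n selection over sampled rationales: keep the candidate
    that mentions the most terms from the user's preference profile
    (a crude stand-in for a learned alignment scorer)."""
    def alignment(rationale):
        text = rationale.lower()
        return sum(term.lower() in text for term in preference_terms)
    return max(candidates, key=alignment)

sampled = [
    "This movie suits you because it is family-friendly.",
    "This movie suits you because of its dark, gritty tone.",
]
print(select_rationale(sampled, ["dark", "gritty tone"]))
# This movie suits you because of its dark, gritty tone.
```

A refinement stage would then rewrite the selected rationale into a fluent explanation; selection before generation is what keeps the final text tied to the ranking logic.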

Why it matters:
- Technical insight: Challenges the assumption that factual correctness alone ensures high-quality explanations.
- Practical impact: Improves alignment between explanations and model behavior, enhancing user trust and system interpretability.
- Methodological innovation: Introduces a framework that bridges the gap between explanation generation and underlying ranking logic, a problem often overlooked in prior work.

Source: [arXiv:2603.03080](https://arxiv.org/abs/2603.03080)

Generated 29d ago