# Summary: Resources for Automated Evaluation of Assistive RAG Systems that Help Readers with News Trustworthiness Assessment

The TREC 2025 DRAGUN track evaluates RAG systems for generating reader-oriented reports on news trustworthiness amid misinformation.
The TREC 2025 DRAGUN (Detection, Retrieval, and Augmented Generation for Understanding News) Track provides a framework for evaluating Retrieval-Augmented Generation (RAG) systems designed to assist readers in assessing the trustworthiness of online news articles in the presence of misinformation. As the successor to the TREC 2024 Lateral Reading Track, it supports reader-driven judgment by generating neutral, multi-source context rather than delivering definitive verdicts on truthfulness.
The track features two parallel tasks: (1) Question Generation, which involves producing 10 ranked investigative questions per news article to guide trustworthiness assessment, and (2) Report Generation, the core task, which requires generating a 250-word, well-attributed report grounded in the MS MARCO V2.1 Segmented Corpus. Each sentence in the report may cite up to three document segments, ensuring the output remains factually anchored to the corpus. The 30 target news articles, released as topics, serve as the basis for evaluation.
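The two Task 2 constraints above (a roughly 250-word report, at most three cited segments per sentence) lend themselves to a simple pre-submission check. The sketch below is illustrative only: the `(text, [segment_ids])` structure and the segment-ID strings are assumptions, not the official submission schema.

```python
import re

def validate_report(sentences, max_words=250, max_cites=3):
    """Check a draft report against the stated Task 2 constraints: about
    250 words in total, with each sentence citing at most three corpus
    segments. Returns a list of human-readable problem descriptions."""
    problems = []
    total_words = sum(len(re.findall(r"\w+", text)) for text, _ in sentences)
    if total_words > max_words:
        problems.append(f"report has {total_words} words (limit {max_words})")
    for i, (_, cites) in enumerate(sentences):
        if len(cites) > max_cites:
            problems.append(
                f"sentence {i} cites {len(cites)} segments (limit {max_cites})"
            )
    return problems

# Hypothetical draft: the second sentence cites four segments and is flagged.
draft = [
    ("The article's central claim is disputed by several outlets.",
     ["doc_01#seg3", "doc_02#seg7"]),
    ("Independent fact-checkers reached mixed conclusions.",
     ["doc_03#seg1", "doc_04#seg2", "doc_05#seg5", "doc_06#seg9"]),
]
print(validate_report(draft))
```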
To support automated evaluation, the DRAGUN organizers developed an AutoJudge system that replicates human assessment using importance-weighted rubrics created by NIST assessors. These rubrics comprise key questions, with expected short answers, deemed critical for trustworthiness assessment. AutoJudge correlates strongly with human judgments (Kendall's $\tau = 0.678$ for Task 1 and $\tau = 0.872$ for Task 2), enabling reliable reuse of the evaluation framework in future research.
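To make the two evaluation ideas concrete, here is a minimal sketch of (a) an importance-weighted rubric score and (b) Kendall's $\tau$ over system rankings. Both functions mirror the concepts named above, not the official AutoJudge implementation; all weights and scores are made up.

```python
from itertools import combinations

def rubric_score(items):
    """Importance-weighted rubric coverage: each item is (weight, answered).
    The score is the fraction of total importance weight whose rubric
    questions the system's output answered."""
    total = sum(w for w, _ in items)
    return sum(w for w, hit in items if hit) / total if total else 0.0

def kendall_tau(x, y):
    """Kendall's tau-a over two equal-length score lists (no tie handling):
    (concordant pairs - discordant pairs) / all pairs. This is the kind of
    statistic used to compare AutoJudge and human system rankings."""
    pairs = list(combinations(range(len(x)), 2))
    s = 0
    for i, j in pairs:
        d = (x[i] - x[j]) * (y[i] - y[j])
        s += 1 if d > 0 else -1 if d < 0 else 0
    return s / len(pairs)

# A rubric where the two most important questions are answered: 5/6 coverage.
print(rubric_score([(3.0, True), (2.0, True), (1.0, False)]))

# Hypothetical per-system scores; identical orderings give tau = 1.0.
auto = [0.91, 0.74, 0.40, 0.22]
human = [0.88, 0.61, 0.52, 0.10]
print(kendall_tau(auto, human))
```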
This resource allows for both benchmarking assistive RAG systems and advancing automated evaluation methods, with human assessments serving as a gold standard. The structured submission format (JSONL for reports, tab-separated files for questions) facilitates consistency and cross-track participation with other TREC 2025 RAG-related tasks. These developments represent significant progress in AI tools for misinformation detection and media literacy support.
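The container formats mentioned above can be sketched as follows. Every field name and identifier here is an assumption for illustration, not the official DRAGUN submission schema; only the container formats themselves (JSONL for reports, TSV for ranked questions) come from the track description.

```python
import json

# Hypothetical Task 2 record: one JSON object per line (JSONL), with each
# report sentence carrying its cited segment IDs.
report_line = json.dumps({
    "topic_id": "example-topic",  # hypothetical topic identifier
    "sentences": [
        {"text": "The claim is contested.",
         "citations": ["segment_a", "segment_b"]},
    ],
})

# Hypothetical Task 1 record: topic, rank, and question as one TSV line.
question_line = "\t".join(["example-topic", "1", "Who owns this outlet?"])

with open("reports.jsonl", "w", encoding="utf-8") as f:
    f.write(report_line + "\n")
with open("questions.tsv", "w", encoding="utf-8") as f:
    f.write(question_line + "\n")
```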
This paper outlines the framework and resources for the TREC 2025 DRAGUN track, which focuses on evaluating Retrieval-Augmented Generation (RAG) systems that assist readers in assessing news trustworthiness. As misinformation proliferates, the track challenges participants to move beyond simple question answering to generate comprehensive, reader-oriented reports that analyze the veracity of news claims. The authors define the task architecture, detailing how systems must retrieve relevant evidence and synthesize it into coherent explanations that aid human decision-making rather than providing a binary "true/false" label.
A key contribution of this work is the provision of a robust dataset and an automated evaluation pipeline tailored for assistive AI. The paper describes the construction of the test collection, which includes news articles, associated claims, and evidence sources, as well as the methodologies for assessing system performance. Crucially, it introduces metrics designed to evaluate the helpfulness and reliability of the generated reports, addressing the difficulty of automatically scoring nuanced, explanatory text. This includes metrics for citation accuracy, argumentation quality, and the ability to detect subtle forms of misinformation or bias.
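Of the metric families mentioned, citation accuracy is the easiest to make concrete. The toy function below scores one sentence's citations against a judge's support labels; it is illustrative of the kind of metric described, not the track's exact formula, and all segment IDs are invented.

```python
def citation_precision(cited, supporting):
    """Toy citation-accuracy score: the fraction of a sentence's cited
    segment IDs that a judge marked as actually supporting the sentence."""
    cited = set(cited)
    return len(cited & set(supporting)) / len(cited) if cited else 0.0

# Two of the three cited segments are judged supportive.
print(citation_precision(["s1", "s2", "s3"], ["s1", "s3", "s9"]))  # → 2/3
```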
This research is significant because it establishes a standardized benchmark for a critical application of large language models: combating misinformation through user assistance. By shifting the evaluation focus from mere retrieval accuracy to the quality of reader guidance, the DRAGUN track encourages the development of RAG systems that are not only factually correct but also genuinely useful for non-expert users navigating complex media landscapes. The resources provided here serve as a foundation for future research into trustworthy AI, offering a rigorous testbed for systems intended to enhance media literacy and public understanding.
By formalizing the evaluation of assistive RAG systems, this paper advances both technical capabilities and ethical considerations in AI-driven media literacy.