About Me

I’m a fifth-year Ph.D. student at the University of North Carolina at Chapel Hill, where I am advised by Mohit Bansal. My work is supported by a Google PhD Fellowship.

Previously, I received my bachelor’s and master’s degrees from Columbia University, where I was advised by Kathleen McKeown.

My research focuses on Faithful and Multimodal AI, aiming to reduce hallucinations and improve reasoning in generation tasks. My recent work covers three main pillars:

Faithfulness & Hallucination Mitigation: Developing metrics and methods to ensure model outputs are factually consistent (e.g., FactPEGASUS, Faithfulness-Aware Decoding, PrefixNLI).
Fine-grained Attribution & RAG: Creating frameworks that allow models to cite their sources and reason transparently (e.g., GenerationPrograms, LAQuer).
Multimodal Reasoning & Retrieval: Grounding vision-language models to reduce hallucinations in cross-modal tasks (e.g., CLaMR, Contrastive Region Guidance).

Publications & Preprints

For an up-to-date list of my publications, please visit my Google Scholar page. (* denotes equal contribution.)

MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments
Han Wang, David Wan, Hyunji Lee, Thinh Pham, Mikaela Cankosyan, Weiyuan Chen, Elias Stengel-Eskin, Tu Vu, and Mohit Bansal
arXiv Preprint
[Paper] [Code]

Multimodal Fact-Level Attribution for Verifiable Reasoning
David Wan, Han Wang, Ziyang Wang, Elias Stengel-Eskin, Hyunji Lee, and Mohit Bansal
Forty-third International Conference on Machine Learning (ICML 2026)
[Paper] [Code]

PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise
Sapir Harary, Eran Hirsch, Aviv Slobodkin, David Wan, Mohit Bansal, and Ido Dagan
The 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
[Paper] [Code]

DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning
Nithin Sivakumaran, Justin Chen, David Wan, Yue Zhang, Jaehong Yoon, Elias Stengel-Eskin, and Mohit Bansal
The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026)
[Paper] [Code]

CLaMR: Multimodal Late-Interaction Retrieval
David Wan, Han Wang, Elias Stengel-Eskin, Jaemin Cho, and Mohit Bansal
arXiv Preprint
[Paper] [Code]

GenerationPrograms: Fine-grained Attribution with Executable Programs
David Wan, Eran Hirsch, Elias Stengel-Eskin, Ido Dagan, and Mohit Bansal
Second Conference on Language Models (COLM 2025)
[Paper] [Code]

MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration
David Wan, Justin Chih-Yao Chen, Elias Stengel-Eskin, and Mohit Bansal
Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025)
[Paper] [Code]

QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang*, David Wan*, Arie Cattan, Ayal Klein, Ido Dagan, and Mohit Bansal
Second Conference on Language Models (COLM 2025)
[Paper] [Code]

On Positional Bias of Faithfulness for Long-form Summarization
David Wan, Jesse Vig, Mohit Bansal, and Shafiq Joty
Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025)
[Paper] [Code]

Localizing Factual Inconsistencies in Attributable Text Generation
Arie Cattan, Paul Roit, Shiyue Zhang, David Wan, Roee Aharoni, Idan Szpektor, Mohit Bansal, and Ido Dagan
Transactions of the Association for Computational Linguistics (TACL 2025)
[Paper] [Code]

ACUEval: Fine-grained Hallucination Evaluation and Correction for Abstractive Summarization
David Wan, Koustuv Sinha, Srinivasan Iyer, Asli Celikyilmaz, Mohit Bansal, and Ramakanth Pasunuru
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024 Findings)
[Paper] [Code]

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan, Jaemin Cho, Elias Stengel-Eskin, and Mohit Bansal
Proceedings of the European Conference on Computer Vision (ECCV 2024)
[Paper] [Code] [Website]

HistAlign: Improving Context Dependency in Language Generation by Aligning with History
David Wan, Shiyue Zhang, and Mohit Bansal
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
[Paper] [Code]

Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
Shiyue Zhang*, David Wan*, and Mohit Bansal
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
[Paper] [Code]

Faithfulness-Aware Decoding Strategies for Abstractive Summarization
David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, and Mohit Bansal
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)
[Paper] [Code]

Evaluating and Improving Factuality in Multimodal Abstractive Summarization
David Wan and Mohit Bansal
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
[Paper] [Code]

Constrained Regeneration for Cross-Lingual Query-Focused Extractive Summarization
Elsbeth Turcan, David Wan, Faisal Ladhak, Petra Galuscakova, Sukanta Sen, Svetlana Tchistiakova, Weijia Xu, Marine Carpuat, Kenneth Heafield, Douglas Oard, and Kathleen McKeown
Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022)
[Paper]

FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization
David Wan and Mohit Bansal
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022)
[Paper] [Code]

Segmenting Subtitles for Correcting ASR Segmentation Errors
David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuscakova, Elena Zotkina, Zhengping Jiang, Peter Bell, and Kathleen McKeown
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (EACL 2021)
[Paper]

Incorporating Terminology Constraints in Automatic Post-Editing
David Wan, Chris Kedzie, Faisal Ladhak, Marine Carpuat, and Kathleen McKeown
Proceedings of the Fifth Conference on Machine Translation (WMT 2020)
[Paper] [Code]

Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines
David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, and Kathy McKeown
Proceedings of the Workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS 2020)
[Paper]