David Wan

I’m a fourth-year Ph.D. student at the University of North Carolina at Chapel Hill, where I am advised by Mohit Bansal. Before this, I graduated with bachelor’s and master’s degrees from Columbia University, where I was advised by Kathleen McKeown. My research interest is in natural language processing. My work at UNC was supported by a Google PhD Fellowship.

Internships

Awards

Publications

  • On Positional Bias of Faithfulness for Long-form Summarization
    David Wan, Jesse Vig, Mohit Bansal, Shafiq Joty
    [paper][code]

  • Localizing Factual Inconsistencies in Attributable Text Generation
    Arie Cattan, Paul Roit, Shiyue Zhang, David Wan, Roee Aharoni, Idan Szpektor, Mohit Bansal and Ido Dagan
    [paper][code]

  • ACUEval: Fine-grained Hallucination Evaluation and Correction for Abstractive Summarization
    David Wan, Koustuv Sinha, Srinivasan Iyer, Asli Celikyilmaz, Mohit Bansal, and Ramakanth Pasunuru
    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024 Findings)
    [paper][code]

  • Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
    David Wan, Jaemin Cho, Elias Stengel-Eskin and Mohit Bansal
    Proceedings of the European Conference on Computer Vision (ECCV 2024)
    [paper][code][website]

  • HistAlign: Improving Context Dependency in Language Generation by Aligning with History
    David Wan, Shiyue Zhang and Mohit Bansal
    Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)
    [paper][code]

  • Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
    Shiyue Zhang*, David Wan* and Mohit Bansal
    Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
    [paper][bib][code]

  • Faithfulness-Aware Decoding Strategies for Abstractive Summarization
    David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, and Mohit Bansal
    Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023)
    [paper][bib][code]

  • Evaluating and Improving Factuality in Multimodal Abstractive Summarization
    David Wan and Mohit Bansal
    Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
    [paper][bib][code]

  • Constrained Regeneration for Cross-Lingual Query-Focused Extractive Summarization
    Elsbeth Turcan, David Wan, Faisal Ladhak, Petra Galuscakova, Sukanta Sen, Svetlana Tchistiakova, Weijia Xu, Marine Carpuat, Kenneth Heafield, Douglas Oard and Kathleen McKeown
    Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022)
    [paper][bib]

  • FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization
    David Wan and Mohit Bansal
    Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022)
    [paper][bib][code]

  • Segmenting Subtitles for Correcting ASR Segmentation Errors
    David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuscakova, Elena Zotkina, Zhengping Jiang, Peter Bell and Kathleen McKeown
    Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (EACL 2021)
    [paper][bib]

  • Incorporating Terminology Constraints in Automatic Post-Editing
    David Wan, Chris Kedzie, Faisal Ladhak, Marine Carpuat and Kathleen McKeown
    Proceedings of the Fifth Conference on Machine Translation (WMT 2020)
    [paper][bib][code]

  • Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines
    David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell and Kathy McKeown
    Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS 2020)
    [paper][bib]