James Clarke & Research

Models for Sentence Compression: A Comparison across Domains, Training Requirements and Evaluation Measures

James Clarke and Mirella Lapata. 2006. Models for Sentence Compression: A Comparison across Domains, Training Requirements and Evaluation Measures. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 377–384. Sydney, Australia.

Download talk slides.

Abstract

Sentence compression is the task of producing a summary at the sentence level. This paper focuses on three aspects of this task which have not received detailed treatment in the literature: training requirements, scalability, and automatic evaluation. We provide a novel comparison between a supervised constituent-based and a weakly supervised word-based compression algorithm and examine how these models port to different domains (written vs. spoken text). To achieve this, a human-authored compression corpus has been created and our study highlights potential problems with the automatically gathered compression corpora currently used. Finally, we assess whether automatic evaluation measures can be used to determine compression quality.

Bibtex

@inproceedings{Clarke:Lapata:06a,
  author =       {James Clarke and Mirella Lapata},
  title =        {Models for Sentence Compression: A Comparison across
                  Domains, Training Requirements and Evaluation
                  Measures},
  booktitle =    {Proceedings of the 21st International Conference on
                  Computational Linguistics and 44th Annual Meeting of
                  the Association for Computational Linguistics},
  pages =        {377--384},
  year =         2006,
  address =      {Sydney, Australia},
  URL =          {http://jamesclarke.net/media/papers/clarke-lapata-acl06a.pdf},
}