PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (sequence-to-sequence models)
Year: 2020
Journal: International Conference on Machine Learning
Languages: All Languages
Programming languages: C, Python
Input data: sentences
Output data: text
Project website: https://github.com/google-research/pegasus
In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective. In PEGASUS, important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary.
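The sketch below illustrates how a gap-sentence-generation training pair of this kind might be constructed: selected sentences are replaced by a mask token in the encoder input, and the removed sentences are concatenated into the decoder target. The selection heuristic (sentence length), the [MASK1] token string, and all helper names are illustrative assumptions, not the exact procedure of the paper, which scores sentence importance with ROUGE.

```python
# Illustrative sketch of a gap-sentence-generation (GSG) training pair.
# The selection heuristic and token names are simplified placeholders.

MASK_SENT = "[MASK1]"  # placeholder token standing in for a removed "gap" sentence

def select_gap_sentences(sentences, gap_ratio=0.3):
    """Pick the 'most important' sentences to remove.

    PEGASUS scores importance with ROUGE against the rest of the document;
    here sentence length is used as a crude stand-in for illustration.
    """
    n_gaps = max(1, int(len(sentences) * gap_ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(sentences[i].split()),
                    reverse=True)
    return set(ranked[:n_gaps])

def make_gsg_example(document):
    """Split a document into a (masked encoder input, decoder target) pair."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    gap_idx = select_gap_sentences(sentences)
    # Remaining sentences stay in place; gap sentences are replaced by a mask token.
    encoder_input = " ".join(
        MASK_SENT if i in gap_idx else sentences[i]
        for i in range(len(sentences))
    )
    # Removed sentences are concatenated into one output sequence,
    # which the decoder learns to generate, similar to an extractive summary.
    decoder_target = " ".join(sentences[i] for i in sorted(gap_idx))
    return encoder_input, decoder_target

if __name__ == "__main__":
    doc = ("PEGASUS pre-trains an encoder-decoder on large corpora. "
           "Important sentences are masked in the input. "
           "The model must generate them as a single output sequence.")
    inp, tgt = make_gsg_example(doc)
    print("Encoder input :", inp)
    print("Decoder target:", tgt)
```

In an actual pre-training setup, the resulting pairs would be tokenized and fed to the Transformer encoder-decoder with a standard sequence-to-sequence cross-entropy loss over the target.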