Robustly optimized BERT approach
Journal: International Conference on Learning Representations
Languages: English
Programming languages: Python
Input data:
Concatenation of two segments (sequences of tokens).
Segments usually consist of more than one natural sentence. The two segments are presented to RoBERTa as a single input sequence, with special tokens delimiting them (as in BERT).
-> pair of sentences (see the encoding sketch below)
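
A minimal sketch of how such a segment pair can be encoded into a single delimited input sequence, using the RoBERTa hub interface from the fairseq repository linked under "Project website"; the checkpoint name 'roberta.base' and the example sentences are illustrative assumptions, not taken from the paper.

```python
import torch

# Load a pretrained RoBERTa checkpoint via torch.hub (requires fairseq to be installed).
roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
roberta.eval()  # disable dropout

# The two segments are concatenated into one input sequence, delimited by
# special tokens (<s> ... </s> </s> ... </s>), analogous to BERT's
# [CLS] ... [SEP] ... [SEP] format.
tokens = roberta.encode(
    'First segment, which may span several natural sentences.',
    'Second segment of the pair.',
)
print(tokens)  # 1-D tensor of BPE token ids for the concatenated pair
```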
Output data:
token, segment, and position embeddings
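
A short follow-up sketch, reusing the `roberta` object and `tokens` tensor from the example above: the learned token and position embeddings are combined inside the model, and per-token contextual representations for the encoded pair can be read out with `extract_features` (again an assumption based on the fairseq hub interface, not a claim about the paper's own evaluation code).

```python
# Contextual representation for every position of the encoded segment pair.
features = roberta.extract_features(tokens)
print(features.shape)  # e.g. (1, sequence_length, 768) for roberta.base
```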
Project website: https://github.com/pytorch/fairseq
We present a replication study of BERT pretraining that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it.