RoBERTa: A Robustly Optimized BERT Pretraining Approach

Journal: International Conference on Learning Representations
Languages: English
Programming languages: Python
Input data:

A concatenation of two segments (sequences of tokens), i.e., a pair of text spans. Segments usually consist of more than one natural sentence. The two segments are presented to RoBERTa as a single input sequence, with special tokens delimiting them (cf. BERT); see the sketch below.
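A minimal sketch of this input format, assuming the Hugging Face `transformers` package and the pretrained "roberta-base" checkpoint (not part of the original entry): two segments are packed into one sequence with special delimiter tokens.

```python
from transformers import RobertaTokenizer

# Load the pretrained tokenizer (assumed checkpoint name: "roberta-base").
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Two example segments; each may span more than one natural sentence.
segment_a = "RoBERTa is a replication study of BERT pretraining."
segment_b = "It shows that BERT was significantly undertrained."

# Both segments are concatenated into a single input sequence of the form
# <s> segment_a </s></s> segment_b </s>
encoding = tokenizer(segment_a, segment_b)
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
```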

Output data:

token, segment, and position embeddings
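These three embedding types mirror BERT's input representation, where they are summed element-wise. Below is a minimal, self-contained sketch of that combination; the dimensions and example token IDs are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

# Illustrative sizes (roughly matching roberta-base): vocabulary, max sequence
# length, number of segments, and hidden dimension.
vocab_size, max_len, num_segments, hidden = 50265, 512, 2, 768

token_emb = nn.Embedding(vocab_size, hidden)
segment_emb = nn.Embedding(num_segments, hidden)
position_emb = nn.Embedding(max_len, hidden)

input_ids = torch.tensor([[0, 31414, 232, 2]])           # one example sequence
segment_ids = torch.zeros_like(input_ids)                 # all tokens in segment 0
positions = torch.arange(input_ids.size(1)).unsqueeze(0)  # 0, 1, 2, ...

# The input representation is the element-wise sum of the three embeddings.
embeddings = token_emb(input_ids) + segment_emb(segment_ids) + position_emb(positions)
print(embeddings.shape)  # torch.Size([1, 4, 768])
```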

We present a replication study of BERT pretraining that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it.
