Year: 2020
Venue: Conference on Neural Information Processing Systems (NeurIPS)
Languages: English
Programming languages: Python
Input data:


We present a novel span-based dynamic convolution operator and integrate it into the self-attention mechanism to form our mixed attention block for language pre-training. We also devise a bottleneck structure applied to the self-attention module and a grouped linear operation for the feed-forward module.
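To make the two ideas concrete, here is a minimal NumPy sketch of (a) a span-based dynamic convolution, where the kernel is generated on the fly from each token's local span rather than being a fixed learned filter, and (b) a grouped linear layer that cuts feed-forward parameters by giving each feature group its own smaller weight matrix. This is an illustrative simplification, not the paper's implementation: the function names (`span_dynamic_conv`, `grouped_linear`), the single kernel-generator vector `w_gen`, and the plain loop over positions are assumptions made for clarity, and details such as multi-head structure and the bottleneck projection are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def span_dynamic_conv(X, w_gen, k=3):
    """Span-based dynamic convolution (illustrative sketch).

    For each position, a convolution kernel of width k is *generated*
    from the local span of hidden states, softmax-normalized, and then
    applied over that same span. `w_gen` (d,) is a hypothetical
    kernel-generator parameter.
    """
    n, d = X.shape
    pad = k // 2
    Xp = np.pad(X, ((pad, pad), (0, 0)))  # zero-pad the sequence
    out = np.empty_like(X)
    for i in range(n):
        span = Xp[i:i + k]                # (k, d) local span around token i
        kernel = softmax(span @ w_gen)    # (k,) kernel conditioned on the span
        out[i] = kernel @ span            # convex combination of span vectors
    return out

def grouped_linear(X, Ws):
    """Grouped linear layer (illustrative sketch).

    Features are split into len(Ws) groups; each group gets its own
    smaller weight matrix, so parameters drop by a factor of the group
    count compared with one dense (d, d_out) matrix.
    """
    parts = np.split(X, len(Ws), axis=-1)
    return np.concatenate([p @ W for p, W in zip(parts, Ws)], axis=-1)
```

Because the dynamic kernel is softmax-normalized, each output vector is a convex combination of its span; in a mixed attention block, such convolution outputs would be concatenated with ordinary self-attention heads.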
