Journal: Association for Computational Linguistics
Languages: All Languages
Programming languages: Python
Project website: https://github.com/kimiyoung/transformer-xl
We propose Transformer-XL, a novel neural architecture that enables learning dependencies beyond a fixed length without disrupting temporal coherence. It combines a segment-level recurrence mechanism with a novel positional encoding scheme. Our method not only captures longer-term dependencies but also resolves the context fragmentation problem.
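The abstract describes the architecture only at a high level; as a rough illustration, here is a minimal Python sketch of the segment-level recurrence idea. Hidden states from the previous segment are cached, and the current segment attends over them with gradients stopped, so the effective context extends across segment boundaries. The class and parameter names (`SegmentRecurrentAttention`, `mem_len`) are hypothetical, not the repository's API, and the relative positional encoding scheme from the paper is omitted for brevity.

```python
import torch
import torch.nn as nn

class SegmentRecurrentAttention(nn.Module):
    """Causal self-attention over the current segment plus a cached
    memory of hidden states from the previous segment (sketch only)."""

    def __init__(self, d_model: int, n_heads: int, mem_len: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head, self.mem_len = n_heads, d_model // n_heads, mem_len
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, memory=None):
        # x: (batch, seg_len, d_model); memory: (batch, mem_len, d_model) or None
        B, T, _ = x.shape
        # Keys/values span the cached memory plus the current segment;
        # gradients are stopped on the memory (segment-level recurrence).
        ctx = x if memory is None else torch.cat([memory.detach(), x], dim=1)
        S = ctx.size(1)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(ctx).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(ctx).view(B, S, self.n_heads, self.d_head).transpose(1, 2)
        # Causal mask: segment position i may attend to every memory slot
        # and to segment positions <= i.
        mask = torch.ones(T, S, dtype=torch.bool, device=x.device).tril(S - T)
        att = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        att = att.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        y = (att @ v).transpose(1, 2).reshape(B, T, -1)
        # Cache the last mem_len context states for the next segment.
        return self.out(y), ctx[:, -self.mem_len:].detach()

# Processing a long sequence segment by segment, threading the memory:
layer = SegmentRecurrentAttention(d_model=64, n_heads=4, mem_len=16)
memory = None
for segment in torch.randn(8, 4, 16, 64):  # 8 segments, batch 4, length 16
    out, memory = layer(segment, memory)
```

Because the memory is detached, backpropagation stays within a single segment while attention still reaches into earlier ones, which is what lets the model capture dependencies beyond a fixed length without the context fragmentation the abstract mentions.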