Year: 2021
Programming languages: Python
Input data:

Text? (Described as “quantized input”, but the model probably handles the quantization itself.)

In this work, we propose I-BERT, a novel quantization scheme for Transformer-based models that quantizes the entire inference with integer-only arithmetic. Based on lightweight integer-only approximation methods for nonlinear operations, e.g., GELU, Softmax, and Layer Normalization, I-BERT performs an end-to-end integer-only BERT inference without any floating-point calculation.
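To make "integer-only arithmetic" concrete, here is a minimal sketch of symmetric uniform quantization followed by an integer dot product. It is an illustration of the general idea only, not the scheme from the paper: the function names `quantize` and `int_dot`, the 8-bit width, and the sample values are all assumptions, and the nonlinear approximations (GELU, Softmax, Layer Normalization) are not covered. The float rescale at the end is there purely to compare against the exact result.

```python
def quantize(values, num_bits=8):
    """Symmetric uniform quantization: map real values to (integers, scale).

    Hypothetical helper for illustration; the scale is computed in floating
    point here, as it would typically be done offline before inference.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(q_a, q_b):
    """Dot product that touches only Python ints (no floats involved)."""
    return sum(a * b for a, b in zip(q_a, q_b))


# Assumed sample activations and weights.
x = [0.5, -1.0, 0.25]
w = [1.0, 0.5, -2.0]

q_x, s_x = quantize(x)
q_w, s_w = quantize(w)

acc = int_dot(q_x, q_w)          # integer accumulator
approx = acc * s_x * s_w         # float rescale, for comparison only
exact = sum(a * b for a, b in zip(x, w))

print("integer accumulator:", acc)
print("rescaled result:", approx, "exact:", exact)
```

The point of the sketch is that the expensive part (the multiply-accumulate) runs entirely on integers, while the scale factors are carried alongside. A fully integer-only pipeline would also have to keep the rescaling step in integer arithmetic, which this example does not attempt.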
