Contextual String Embeddings

Year: 2018
Journal: International Conference on Computational Linguistics
Languages: Dutch, English, German, Polish
Programming languages: Python
Input data:

Sentences (sequence of characters)

Output data:

Word embeddings (contextual string embeddings)

This paper proposes a novel type of contextualized character-level word embedding that is hypothesized to combine the best attributes of existing types of word embeddings; namely, the ability to (1) pre-train on large unlabeled corpora, (2) capture word meaning in context and therefore produce different embeddings for polysemous words depending on their usage, and (3) model words and context fundamentally as sequences of characters, both to better handle rare and misspelled words and to model subword structures such as prefixes and endings. It presents a method for generating such a contextualized embedding for any string of characters in a sentential context, and accordingly refers to the proposed representations as contextual string embeddings.
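
The embeddings are extracted from the internal hidden states of pre-trained forward and backward character-level language models. The Python sketch below illustrates only that extraction rule; it is not the authors' code, and it substitutes small untrained LSTMs (with arbitrarily chosen sizes) for the trained character language models so that the example is self-contained and runs end to end.

import torch
import torch.nn as nn

class CharLM(nn.Module):
    """Stand-in character-level language model; only its hidden states are used."""
    def __init__(self, n_chars, emb_dim=16, hidden_dim=64):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, char_ids):
        out, _ = self.lstm(self.emb(char_ids))  # (1, seq_len, hidden_dim)
        return out.squeeze(0)                   # (seq_len, hidden_dim)

sentence = "Washington was born"
chars = list(sentence)
vocab = {c: i for i, c in enumerate(sorted(set(chars)))}
ids = torch.tensor([[vocab[c] for c in chars]])

# In the paper these would be language models pre-trained on large unlabeled
# corpora; here they are untrained stand-ins so the sketch runs on its own.
forward_lm = CharLM(len(vocab))
backward_lm = CharLM(len(vocab))

forward_states = forward_lm(ids)  # left-to-right pass over the characters
backward_states = torch.flip(backward_lm(torch.flip(ids, dims=[1])), dims=[0])  # right-to-left pass, re-aligned

# Each word embedding concatenates the forward state at the word's last character
# with the backward state at its first character, so the vector reflects both the
# characters of the word and its surrounding sentential context.
embeddings = {}
offset = 0
for word in sentence.split(" "):
    start, end = offset, offset + len(word) - 1
    embeddings[word] = torch.cat([forward_states[end], backward_states[start]])
    offset = end + 2  # skip the space that follows the word

for word, vec in embeddings.items():
    print(word, tuple(vec.shape))  # each word maps to a 128-dimensional contextual vector

In practice, pre-trained character language models of this kind are distributed with the authors' Flair library, which exposes these forward/backward embeddings through a ready-made API rather than requiring manual extraction as above.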
