T-Pos

Year: 2011
Journal: Conference on Empirical Methods in Natural Language Processing
Languages: English
Programming languages: Python
Input data:

If the input is a tab-separated file, the i-th column (counting from 0) is used as the text column. In the output file, that column's data is replaced with the annotated text.
CAUTION: Make sure the text column contains no newline characters; they break the format.
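The caution above can be handled before the file reaches the tagger. The following is a minimal sketch, not part of T-Pos itself: the helper names `clean_text` and `to_tsv_line` are hypothetical, and it assumes each row is already available as a list of strings. It collapses every run of whitespace in the text column (newlines and stray tabs included) into a single space, so each record stays on one line.

```python
def clean_text(text):
    """Collapse all whitespace runs (newlines, tabs, repeated spaces) into single spaces."""
    return " ".join(text.split())


def to_tsv_line(fields, text_col):
    """Build one tab-separated line, sanitizing only the text column.

    `text_col` is the 0-based index of the text column, matching the
    column numbering described above. Hypothetical helper for illustration.
    """
    fields = list(fields)  # copy so the caller's row is not modified
    fields[text_col] = clean_text(fields[text_col])
    return "\t".join(fields)


if __name__ == "__main__":
    row = ["42", "hello\nworld  again", "en"]
    print(to_tsv_line(row, text_col=1))
```

Only the text column is rewritten; the other columns are passed through unchanged and are assumed to contain no tabs or newlines.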

Output data:

The output contains the tokenized and tagged words separated by spaces, with tags separated by a forward slash ('/').
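To consume this output downstream, each token can be split back into a word and its tag. The sketch below is an illustration under stated assumptions, not the tool's own code: the function name `parse_tagged` is hypothetical, the tags in the example are made up, and it assumes one tag per token placed after the last '/'. Splitting on the last slash keeps words that themselves contain slashes (such as URLs) intact. If the tool is run so that several tag layers are attached to each token, the split would need adjusting.

```python
def parse_tagged(line):
    """Split a tagged line into (word, tag) pairs.

    Assumes tokens are space-separated and the tag follows the LAST '/'
    in each token. Tokens without any '/' get a tag of None.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            # No slash found: rpartition puts the whole token in `tag`.
            word, tag = token, None
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    # Illustrative line only; the tags shown are not taken from real tool output.
    for word, tag in parse_tagged("I/PRP love/VBP http://t.co/x/URL"):
        print(word, tag)
```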

The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition.
