SATZ
Year: 1,995
Languages: English, French, German
Input data:
Plain text
Output data:
Text with sentence boundaries disambiguated
As an alternative, I present an ecient, trainable algorithm that can be easily adapted to new text genres and some range of natural languages. The algorithm uses a lexicon with part-of-speech probabilities and a feedforward neural network for rapid training. The method described requires minimal storage overhead and a very small amount of training data. The algorithm overcomes the limitations of existing methods and produces a very high accuracy.