Language Detection

Lexalytics provides a library that detects the language of a piece of text. It focuses on the languages we support directly, but it also detects languages that are easily confused with a supported one, such as Bulgarian versus Russian. A text can mix several languages; intermingled Spanish and English, for instance, is quite common on some social media. We do not try to identify the individual pieces. We only judge which language the text as a whole is most likely to be.

In our Semantria product, language detection is done at submission time, and its result cannot be used to route content within our system. The content goes to the configuration you specified regardless of the language detected. However, you can use the detected value to check whether you are submitting content in a language different from the configuration language.
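
Since the detected value cannot drive routing, the check has to happen on the client side, after results come back. The sketch below is one way to do that and is not part of the Semantria SDK. It assumes each result is a dictionary with hypothetical "id" and "detected_language" keys and that languages are reported as plain names such as "English"; adjust both assumptions to match the fields your results actually contain.

```python
from typing import Dict, Iterable, List, Tuple


def find_language_mismatches(
    results: Iterable[Dict[str, str]],
    config_language: str,
) -> List[Tuple[str, str]]:
    """Return (document id, detected language) pairs that differ from the
    configuration language.

    The "id" and "detected_language" keys are hypothetical field names used
    for illustration only; they are not confirmed Semantria response fields.
    """
    expected = config_language.strip().lower()
    mismatches = []
    for doc in results:
        # Treat a missing or empty detected value as "unknown" and skip it.
        detected = (doc.get("detected_language") or "").strip()
        if detected and detected.lower() != expected:
            mismatches.append((doc["id"], detected))
    return mismatches


if __name__ == "__main__":
    # Illustrative results only; real results carry many more fields.
    sample_results = [
        {"id": "doc-1", "detected_language": "English"},
        {"id": "doc-2", "detected_language": "Spanish"},
        {"id": "doc-3", "detected_language": ""},
    ]
    for doc_id, language in find_language_mismatches(sample_results, "English"):
        print(f"{doc_id}: detected {language}, but the configuration is English")
```

Documents flagged this way were still processed by the configuration they were submitted to. To analyze them in their own language, resubmit them to a configuration for that language.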