Searching content in different languages

Until now, we've talked about language analysis mostly in theory, for example, about the fact that our data can consist of multiple languages. That changes now: we will discuss how we can actually handle multiple languages in our data.

Why we need to handle languages differently

As you already know, ElasticSearch allows us to choose different analyzers for our data. We can have our data divided into words on the basis of whitespace, have those words lowercased, and so on. This can usually be done regardless of the language: tokenization on the basis of whitespace works the same way for English, German, and Polish (that doesn't apply to Chinese, though, which does not separate words with spaces). However, what if you want to ...
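To make the language-independent steps mentioned above (splitting on whitespace and lowercasing) more concrete, here is a minimal sketch in plain Python. It is not ElasticSearch code and does not reproduce any ElasticSearch analyzer exactly; the function name and the sample sentences are made up for illustration only.

```python
from typing import List


def whitespace_lowercase_tokens(text: str) -> List[str]:
    """Hypothetical helper: split text on whitespace and lowercase each token."""
    # str.split() with no arguments splits on any run of whitespace.
    return [token.lower() for token in text.split()]


# Illustrative sample sentences (assumptions, not taken from the book).
samples = {
    "English": "The Quick Brown Fox",
    "German": "Der Schnelle Braune Fuchs",
    "Polish": "Szybki Brązowy Lis",
    "Chinese": "敏捷的棕色狐狸",
}

for language, text in samples.items():
    print(language, whitespace_lowercase_tokens(text))
```

For the English, German, and Polish samples, the sketch yields one lowercased token per word. The Chinese sample contains no whitespace, so it comes back as a single token, which is why this simple approach does not carry over to that language.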
