Administrating Solr by Surendra Mohan

Language Detection

In this section, we will learn about language detection and how to set it up and configure it.

Solr can identify the language of a document and map its content to language-specific fields while indexing. To do so, it uses langid, which is an UpdateRequestProcessor. Language detection can be implemented in Solr using any of the following:

  • Tika language detection
  • LangDetect language detection
  • Compact Language Detector (CLD)
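To make the idea of mapping detected languages to their respective fields concrete, here is a minimal Python sketch. It is not Solr's implementation; it only illustrates the mapping step once a language code is already known. The function name `map_fields_by_language`, the `language_s` field, and the `<field>_<language>` naming pattern are assumptions chosen for illustration.

```python
from typing import Dict, Iterable


def map_fields_by_language(
    doc: Dict[str, str],
    lang: str,
    fields: Iterable[str],
    lang_field: str = "language_s",  # hypothetical field that stores the language code
) -> Dict[str, str]:
    """Return a copy of doc whose listed fields are renamed to <field>_<lang>."""
    mapped = dict(doc)  # copy so the caller's document is left untouched
    for name in fields:
        if name in mapped:
            # Move the value to a language-specific field, e.g. title -> title_de
            mapped[f"{name}_{lang}"] = mapped.pop(name)
    mapped[lang_field] = lang  # record the detected language alongside the content
    return mapped


# Usage: the language code "de" is assumed to come from a detector.
doc = {"id": "1", "title": "Guten Morgen", "content": "Dies ist ein Beispiel."}
result = map_fields_by_language(doc, "de", ["title", "content"])
```

Keeping the mapping separate from detection means the same step applies whichever of the three detectors supplies the language code.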

The following table compares these three implementations.

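Parameter                | CLD  | Apache Tika                                | LangDetect
-------------------------+------+--------------------------------------------+-----------
Language count supported | 21   | 17                                         | 21
Languages not supported  | N/A  | Bulgarian, Czech, Lithuanian, and Latvian  | N/A
Languages detected       | > 76 | 27                                         | 53
Accuracy                 | Medium ...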
