August 2019
Intermediate to advanced
560 pages
13h 41m
English
The html_strip character filter removes HTML tags (for more information about HTML tags and entities, see https://www.w3schools.com/html/default.asp) and replaces HTML entities with the corresponding decoded UTF-8 characters. By default, the text content between the tags is left unchanged, while HTML comments are removed entirely. Suppose we use the same HTML input text string as in the previous example; after applying the html_strip character filter, the output is as follows:
"You'll love Elasticsearch 7.0."
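To make the three behaviors concrete (tags dropped, entities decoded, comments removed), here is a minimal local sketch that uses only the Python standard library. It is an approximation for illustration, not Elasticsearch's implementation, and the real filter may differ in details such as whitespace handling. The TextOnly class, the strip_html helper, and the sample string are hypothetical stand-ins; the sample is not the book's actual input.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects text content only; tags and comments are ignored."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &apos; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags. Comments are routed to
        # handle_comment, which we do not override, so they are dropped.
        self.parts.append(data)


def strip_html(text):
    """Rough local approximation of what html_strip does to a string."""
    parser = TextOnly()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


# Hypothetical input, assumed for illustration only.
sample = (
    "<html><body>You&apos;ll love Elasticsearch 7.0."
    "<!-- internal note --></body></html>"
)
print(strip_html(sample))
```

The tags and the comment disappear, and the &apos; entity becomes a plain apostrophe, which matches the behavior described above.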
Let's apply the html_strip character filter, the standard tokenizer and the lowercase token filter to the input text. In the following screenshot, the body token does not exist in the response ...
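The combination of character filter, tokenizer, and token filter can be expressed as a request body for the Elasticsearch _analyze API. The sketch below only builds and prints that body; it does not contact a cluster. The build_analyze_request helper and the sample text are hypothetical, assumed for illustration.

```python
import json


def build_analyze_request(text):
    """Build a body for POST /_analyze using the three components."""
    return {
        "char_filter": ["html_strip"],  # strip tags, decode entities
        "tokenizer": "standard",        # split the text into word tokens
        "filter": ["lowercase"],        # lowercase every token
        "text": text,
    }


# Hypothetical input, assumed for illustration only.
sample_html = "<html><body>You&apos;ll love Elasticsearch 7.0.</body></html>"
request_body = build_analyze_request(sample_html)

# Paste the printed request into the Kibana console, or send it to a
# cluster with any HTTP client.
print("POST /_analyze")
print(json.dumps(request_body, indent=2))
```

Because html_strip runs before the tokenizer, markup such as the body tag is removed before any tokens are produced.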