Mark Harwood

Revealing the Uncommonly Common with Elasticsearch

Date: This event took place live on October 30, 2014

Presented by: Mark Harwood

Duration: Approximately 60 minutes.

Cost: Free

Questions? Please send email to


Hosted By: Ben Lorica

This webcast discusses how Elasticsearch is extending search engine technology beyond its roots in relevance-ranking search results to provide more insightful analysis of large datasets.

The biggest distinction between a database and a search engine is that the search engine starts with an assumption that not all information is equal. A search engine maintains the frequency of use of every word in its index and uses these counts to help relevance-rank matches, separating the signal from the noise.
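To make the idea of word counts driving relevance concrete, here is a minimal Python sketch of inverse document frequency weighting. It uses a generic textbook formula with smoothing, not necessarily the exact calculation Elasticsearch or Lucene performs, and the toy corpus and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of words.
docs = [
    ["the", "engine", "failed"],
    ["the", "engine", "started"],
    ["the", "gearbox", "failed"],
    ["the", "report", "arrived"],
]


def document_frequencies(documents):
    """Count how many documents contain each word at least once."""
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    return df


def idf(word, df, num_docs):
    """Smoothed inverse document frequency: rarer words get higher weights."""
    return math.log((1 + num_docs) / (1 + df[word])) + 1


df = document_frequencies(docs)
weights = {word: idf(word, df, len(docs)) for word in df}

# Print words from the most common (lowest weight) to the rarest.
for word, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{word:10s} {weight:.3f}")
```

A word such as "the", which appears in every document, receives the lowest weight, while a word that appears in only one document receives the highest. This is the sense in which not all information is treated as equal.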

Elasticsearch is taking this treasure-trove of statistics and developing powerful new analytic capabilities that can spot anomalies and patterns in subsets of information, whether that information is text, IP addresses, bank account IDs, product purchases, or something else.
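One way to picture this kind of analysis is to compare how often a term appears in a subset of documents with how often it appears in the dataset as a whole. The Python sketch below is an illustration under stated assumptions: the fault-report data, the function name, and the scoring heuristic (absolute change in frequency multiplied by relative change) are all hypothetical choices made for the example, and are not presented as the scoring Elasticsearch itself uses.

```python
from collections import Counter

# Hypothetical fault reports: (vehicle model, set of terms in the report).
reports = [
    ("A", {"engine", "noise", "pump"}),
    ("A", {"engine", "pump", "leak"}),
    ("A", {"engine", "noise"}),
    ("B", {"engine", "noise"}),
    ("B", {"engine", "wear"}),
    ("B", {"brakes", "noise"}),
    ("C", {"engine", "noise"}),
    ("C", {"brakes", "wear"}),
]


def uncommonly_common(all_docs, in_subset, min_count=2):
    """Rank terms that are more frequent in a subset than in the whole set.

    The subset is drawn from all_docs, so every subset term also occurs
    in the background and the background rate is never zero.
    """
    subset = [terms for key, terms in all_docs if in_subset(key)]
    if not subset:
        return []
    fg = Counter(term for terms in subset for term in terms)
    bg = Counter(term for _, terms in all_docs for term in terms)

    scores = {}
    for term, count in fg.items():
        if count < min_count:
            continue  # too little evidence in the subset
        fg_rate = count / len(subset)
        bg_rate = bg[term] / len(all_docs)
        if fg_rate <= bg_rate:
            continue  # no more common here than elsewhere
        # Assumed heuristic: absolute change times relative change.
        scores[term] = (fg_rate - bg_rate) * (fg_rate / bg_rate)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = uncommonly_common(reports, lambda model: model == "A")
for term, score in ranked:
    print(f"{term:8s} {score:.3f}")
```

In this toy data "engine" is the most frequent term in model A's reports, but it is frequent everywhere. "pump" ranks first because it is common in the subset and rare in the background, which is the "uncommonly common" signal of the webcast title.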

In this webcast we will demonstrate how various forms of useful signal can be separated from the backdrop of noise that exists in all of our datasets.

Applications include:

  • Making product recommendations
  • Root cause analysis in fault reports
  • Detecting unusual hotspots of crime
  • Training classifiers
  • Revealing badly categorised content
  • Detecting credit card fraud

These use cases are outlined in this blog post.

About Mark Harwood

Mark Harwood is a software engineer at Elasticsearch and a long-time contributor to Lucene. Prior to joining Elasticsearch, Mark was Chief Scientist at BAE Systems Detica, designing search and visualization systems on multi-billion-document solutions for analysts at commercial and government clients. Twitter: @elasticmark

About Ben Lorica

Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining, Machine Learning and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services. He is an advisor to Databricks.
