Practical Data Analysis by Hector Cuesta

Chapter 2. Working with Data

Building real-world data analytics requires accurate data. In this chapter we discuss how to obtain, clean, normalize, and transform raw data into a standard format such as Comma-Separated Values (CSV) or JavaScript Object Notation (JSON) using OpenRefine.
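As a rough illustration of what "clean, normalize, and transform" can mean in practice, the sketch below uses only the Python standard library rather than OpenRefine. The sample data, the column names, and the helper functions `csv_to_records` and `normalize` are hypothetical and chosen just for this example.

```python
import csv
import io
import json

# Hypothetical raw input: stray whitespace and one missing value.
RAW_CSV = """name, city ,visits
 Ana ,Lima,3
Bob, Quito ,
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, trimming whitespace (cleaning)."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = [v.strip() for v in row]
        records.append(dict(zip(header, values)))
    return records


def normalize(record):
    """Convert 'visits' to an integer, using None for a missing value."""
    visits = record["visits"]
    return {
        "name": record["name"],
        "city": record["city"],
        "visits": int(visits) if visits else None,
    }


records = [normalize(r) for r in csv_to_records(RAW_CSV)]

# Transform: serialize the cleaned records as JSON.
print(json.dumps(records, indent=2))
```

The same three steps (trim, convert types, export) are the kind of work that a tool such as OpenRefine performs interactively; the code is only meant to make the idea concrete.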

In this chapter we will cover:

  • Datasource
    • Open data
    • Text files
    • Excel files
    • SQL databases
    • NoSQL databases
    • Multimedia
    • Web scraping
  • Data scrubbing
    • Statistical methods
    • Text parsing
    • Data transformation
  • Data formats
    • CSV
    • JSON
    • XML
    • YAML
  • Getting started with OpenRefine

Datasource

Datasource is a term used for all the technology related to the extraction and storage of data. A datasource can be anything from a simple text file to a large database. The raw data can come from observation logs, sensors, ...
