Data Science Craftsmanship, Part II

In the previous chapter, you explored several data science techniques and recipes, including visualizing data with data videos, new types of metrics, computer science topics, and questions to ask when choosing a vendor, as well as a comparison of data scientists, statisticians, and data engineers.

In this chapter, you consider material that is less focused on metrics and more focused on applications. It covers how to create a data dictionary, and it discusses hidden decision trees, hash joins in the context of NoSQL databases, and the first Analyticbridge Theorem, which provides a simple, model-free, nonparametric way to compute confidence intervals without statistical theory or knowledge.

This chapter is less oriented toward statistical theory than the previous one. Its topics are typically classified as data analyses rather than statistical or computer analyses. Most of the material has not been published before. Case studies, applications, and success stories are discussed in the next chapter.

The topics discussed in this chapter, such as hidden decision trees, data dictionaries, and hash joins, are important subjects for data scientists because they are at the intersection of statistics and computer science, and are designed to handle big data. Traditional statisticians typically don't learn or use these techniques, but data scientists do.

Data Dictionary

One of the most valuable tools when ...
