Video description
Machine-learning expert Mikio Braun moves budding data scientists into the world of big data with this overview of how to do complex data analysis at scale. You'll learn the general concepts behind machine learning, compare small-scale and large-scale data analysis algorithms, and review the basics of the architectures used in large-scale distributed processing. You'll then explore Spark programming for data flow systems and the many uses of approximation. Braun also outlines the computing costs of evaluation, feature extraction, and model selection in big data analysis. The video closes with a discussion of the relationship between the amount of available data and the complexity of the learning problem.
- Review machine learning concepts such as fitting a model to data
- Learn core concepts behind large-scale algorithms like stochastic gradient descent
- Review the architectures used in Hadoop-based systems and data flow systems
- Explore resilient distributed dataset structures, vectors, and matrices using Spark
- Review Spark's machine learning libraries and how to run basic machine learning tasks
- Understand the use of approximation in optimization and compressing feature spaces
- Learn what makes data “complex”
Mikio Braun is a data science researcher, a start-up entrepreneur, and the creator of jblas, an open source library for fast linear algebra in Java. He has a Ph.D. in computer science and works at Zalando.
Product information
- Title: Scalable Machine Learning
- Author(s): Mikio Braun
- Release date: December 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491939437