Microsoft Azure Machine Learning by Sumit Mund


Data exploration and preparation

Create a new experiment in ML Studio, drag the uploaded dataset onto the canvas, and visualize it. The dataset has 1,157 rows and 3,600 columns. Data provided for a Kaggle competition is usually already cleaned, which saves you data-cleansing work such as dealing with missing values. The ML Studio visualization cannot show all the rows and columns. Of the columns, 3,578 hold mid-infrared absorbance measurements, and all of their names start with the letter 'm'. You may want to separate them out. To do so, you can use an Execute Python Script module with the following code, where the inline comments explain the lines of code. For this, refer to Chapter 10, Extensibility with R and Python, to find ...
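The book's own listing is not part of this excerpt. As a minimal sketch of the idea, assuming the dataset arrives on the first input port as a pandas DataFrame and that only the absorbance columns have names beginning with 'm', the module's script might look like this. The `azureml_main` entry point is the one the Execute Python Script module calls; the toy column names in the usage example (`id`, `m1`, `m2`, `depth`) are hypothetical and do not come from the actual dataset.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Assumption: every column whose name starts with 'm' is a
    # mid-infrared absorbance measurement, and no other column does.
    spectral_cols = [c for c in dataframe1.columns if str(c).startswith("m")]

    # Select just those columns; the original DataFrame is left unchanged.
    spectral = dataframe1[spectral_cols]

    # The module expects a sequence containing a pandas DataFrame.
    return (spectral,)


if __name__ == "__main__":
    # Hypothetical toy data, only to show the behaviour outside ML Studio.
    toy = pd.DataFrame(
        {"id": [1, 2], "m1": [0.1, 0.2], "m2": [0.3, 0.4], "depth": ["a", "b"]}
    )
    (result,) = azureml_main(toy)
    print(list(result.columns))
```

If any non-measurement column also starts with 'm', a stricter rule (for example, matching 'm' followed by a number) would be needed; that depends on the actual column names, which are not shown here.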
