C# Machine Learning Projects by Yoon Hyup Hwang

Categorical variable distribution

The features in this dataset are a mixture of categorical and continuous variables. For example, the feature duration, which represents the length of the connection, is a continuous variable, whereas the feature protocol_type, which represents the type of protocol (tcp, udp, and so forth), is a categorical variable. For a complete set of feature descriptions, see http://kdd.ics.uci.edu/databases/kddcup99/task.html.

In this section, we look at how the distributions of the categorical variables differ between normal connections and malicious connections. The following code shows how we separate the sample set into two subgroups, ...
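
As a rough illustration of the idea described above, the following is a minimal sketch in Python using only the standard library; it is not the book's C# code. The sample records, the label field name, the label values, and the helper names split_by_label and category_distribution are all hypothetical and exist only to show the technique: split the samples into a normal group and a malicious group, then compare the relative frequency of each value of a categorical feature in the two groups.

```python
from collections import Counter

# Hypothetical sample records for illustration only; the real dataset has
# many more features and rows, and its label values may be spelled differently.
records = [
    {"protocol_type": "tcp", "duration": 0, "label": "normal"},
    {"protocol_type": "tcp", "duration": 12, "label": "normal"},
    {"protocol_type": "udp", "duration": 3, "label": "normal"},
    {"protocol_type": "tcp", "duration": 1, "label": "normal"},
    {"protocol_type": "icmp", "duration": 0, "label": "smurf"},
    {"protocol_type": "icmp", "duration": 0, "label": "smurf"},
    {"protocol_type": "tcp", "duration": 0, "label": "neptune"},
    {"protocol_type": "icmp", "duration": 0, "label": "smurf"},
]


def split_by_label(rows, normal_label="normal"):
    """Split rows into (normal, malicious) lists based on the label field."""
    normal = [r for r in rows if r["label"] == normal_label]
    malicious = [r for r in rows if r["label"] != normal_label]
    return normal, malicious


def category_distribution(rows, feature):
    """Return the relative frequency of each value of a categorical feature."""
    counts = Counter(r[feature] for r in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


normal_rows, malicious_rows = split_by_label(records)

# Print the two distributions one after the other for comparison.
for name, rows in (("normal", normal_rows), ("malicious", malicious_rows)):
    print(name, category_distribution(rows, "protocol_type"))
```

Comparing the two dictionaries for each categorical feature gives a quick view of which values are over-represented in one group relative to the other. Relative frequencies are used instead of raw counts so that the comparison still makes sense when the two groups differ in size.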
