To use the data saved in the previous chapter, you first need to load it:
    import pickle

    # Open in binary mode ("rb"); pickle files are bytes, not text
    with open("data", "rb") as f:
        L = pickle.load(f)
Your data is a list of lists. Each inner list has three items: ['x', 'y', 'out']. Your tree will predict the last item: 'out'. You’ll provide a label for each column to help make readable rules from the tree. The tree itself is built up of sub-trees using split points, so first you need to find the split points.
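To make the layout concrete, here is a small sketch of what such data might look like. The rows below are invented for illustration, and the names `rows`, `labels`, `features`, and `targets` are hypothetical; they are not the values or names saved in the previous chapter. The sketch also assumes 'out' is a boolean.

```python
# Hypothetical rows in the same [x, y, out] layout; the values are invented.
rows = [
    [1.0, 2.0, True],
    [-3.5, 0.5, False],
    [0.2, -4.1, True],
]

# One label per column, so rules can mention names rather than indexes.
labels = ["x", "y", "out"]

# The tree predicts the last column; the other columns are what it can split on.
features = [row[:-1] for row in rows]
targets = [row[-1] for row in rows]

print(features)  # [[1.0, 2.0], [-3.5, 0.5], [0.2, -4.1]]
print(targets)   # [True, False, True]
```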
You use information gain to find split points. Computing it needs the proportion of the data that falls into each group. You can use Python's collections module to count how often each value occurs, which gets you most of the way there.
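As a rough sketch of where this is heading, the snippet below turns counts into proportions and then into an information gain figure. It assumes the usual entropy-based definition of information gain (entropy of the whole group minus the size-weighted entropy of the groups after a split). The helper names `proportions`, `entropy`, and `information_gain` are made up for this illustration, and the sketch does not handle empty lists.

```python
import collections
import math

def proportions(values):
    """Return {value: fraction of items equal to that value}."""
    counts = collections.Counter(values)
    total = len(values)
    return {value: n / total for value, n in counts.items()}

def entropy(values):
    """Shannon entropy, in bits, of the values (assumed definition)."""
    return -sum(p * math.log2(p) for p in proportions(values).values())

def information_gain(parent, groups):
    """Entropy of parent minus the size-weighted entropy of the groups."""
    total = len(parent)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(parent) - remainder

outs = [True, True, False, False]  # invented 'out' values
print(proportions(outs))  # {True: 0.5, False: 0.5}
print(entropy(outs))      # 1.0
# A split that separates the two classes perfectly recovers all of the entropy.
print(information_gain(outs, [[True, True], [False, False]]))  # 1.0
```

The counting step inside `proportions` is the part that collections handles for you.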
Try it on a list of numbers: