As we saw earlier, the dataset contains the pickup and drop-off coordinates for each trip. However, it does not include the distance between the pickup and drop-off points, which is arguably the most important factor in determining a taxi fare. Therefore, let's create a new feature that calculates the distance between each pair of pickup and drop-off points.
Recall from geometry that the Euclidean distance is the straight-line distance between two points:
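For two points in the plane, the standard formula can be written as follows. The symbols $(x_1, y_1)$ and $(x_2, y_2)$ are notation assumed here for the two points; in this dataset they would correspond to the latitude and longitude of the pickup and drop-off locations.

$$
d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
$$

Note that applying this formula directly to latitude and longitude treats the coordinates as if they lay on a flat plane, so the result is expressed in degrees rather than in kilometers or miles.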
Let's define a function to calculate the Euclidean distance between any two points, given the latitude and longitude of each point:
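A minimal sketch of such a function is shown here. The function name `euc_distance` and its parameter names are assumptions chosen for illustration, not names taken from the dataset.

```python
def euc_distance(lat1, long1, lat2, long2):
    """Return the straight-line (Euclidean) distance between two points.

    The inputs are latitudes and longitudes in degrees, so the result
    is also in degrees, not in kilometers or miles.
    """
    # Square the differences along each axis, sum them, take the square root.
    return ((lat1 - lat2) ** 2 + (long1 - long2) ** 2) ** 0.5


# Quick sanity check with a 3-4-5 right triangle.
print(euc_distance(0.0, 0.0, 3.0, 4.0))  # prints 5.0
```

Because the function uses only arithmetic operators, it should also work element-wise when it is passed whole columns (for example, pandas Series or NumPy arrays) instead of single numbers, which is convenient for computing the distance for every row at once.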