Appendix A. Datasets

All datasets are stored under src/main/resources/datasets. Java source code is stored under src/main/java, whereas user resources are stored under src/main/resources. In general, we load files as classpath resources, which retrieves their contents directly from the JAR rather than from the filesystem.
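Because the datasets ship inside the JAR, they have to be read as classpath resources rather than as files. The following is a minimal sketch of that approach. It assumes lookup through the standard `Class.getResourceAsStream` method; the class name `ResourceLoader` and the path `/datasets/example.csv` are hypothetical and not taken from the book.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ResourceLoader {

    /**
     * Opens a classpath resource such as "/datasets/example.csv" (hypothetical name).
     * A leading "/" resolves the path from the classpath root; in a standard
     * Maven/Gradle layout, the contents of src/main/resources are copied there,
     * so the same lookup works from an IDE build directory or from the packaged JAR.
     */
    public static InputStream open(String path) throws FileNotFoundException {
        InputStream in = ResourceLoader.class.getResourceAsStream(path);
        if (in == null) {
            // getResourceAsStream signals a missing resource with null, not an exception.
            throw new FileNotFoundException("Resource not found on classpath: " + path);
        }
        return in;
    }

    /** Reads all lines from a stream as UTF-8 text; closing the reader also closes the stream. */
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

For example, `ResourceLoader.readLines(ResourceLoader.open("/datasets/example.csv"))` would return the file's lines regardless of whether the classes are loaded from a directory or from a JAR, which is the main reason to prefer classpath lookup over `java.io.File` paths.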

Anscombe’s Quartet

Anscombe’s quartet is a set of four x-y data series with a remarkable property: although the scatter plots of the four series look completely different, their summary statistics (the means and variances of x and y, the correlation between x and y, and the least-squares regression line) are almost identical. The values for each of the four x-y data series are in Table A-1.
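The statistics in question are the standard textbook ones, not notation from this book. For a series of n points (x_k, y_k), with n = 11 for every series here, and using the n - 1 sample variance, they can be written as follows (the y mean and variance are defined analogously to those of x):

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{k=1}^{n} x_k \\
s_x^2 &= \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})^2 \\
r &= \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}
          {\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2 \, \sum_{k=1}^{n} (y_k - \bar{y})^2}} \\
\hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x, \qquad
\hat{\beta}_1 = \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}{\sum_{k=1}^{n} (x_k - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\end{aligned}
```

Applied to Table A-1, every series gives a mean of x of exactly 9 and a sample variance of x of exactly 11. To about two or three decimal places, every series also gives a mean of y of 7.50, a sample variance of y between 4.12 and 4.13, a correlation of 0.816, and a fitted line of y = 3.00 + 0.500x.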

Table A-1. Anscombe’s quartet data
x1     y1     x2     y2     x3     y3     x4     y4
10.0   8.04   10.0   9.14   10.0   7.46   8.0    6.58
8.0    6.95   8.0    8.14   8.0    6.77   8.0    5.76
13.0   7.58   13.0   8.74   13.0   12.74  8.0    7.71
9.0    8.81   9.0    8.77   9.0    7.11   8.0    8.84
11.0   8.33   11.0   9.26   11.0   7.81   8.0    8.47
14.0   9.96   14.0   8.10   14.0   8.84   8.0    7.04
6.0    7.24   6.0    6.13   6.0    6.08   8.0    5.25
4.0    4.26   4.0    3.10   4.0    5.39   19.0   12.50
12.0   10.84  12.0   9.13   12.0   8.15   8.0    5.56
7.0    4.82   7.0    7.26   7.0    6.42   8.0    7.91
5.0    5.68   5.0    4.74   5.0    5.73   8.0    6.89

We can easily hardcode the data as static members of the class:

public class Anscombe {
    public static final double[] x1 = {10.0, 8.0, 13.0, 9.0, 11.0,
                                       14.0, 6.0, 4.0, 12.0, 7.0, 5.0};
    public static final double[] y1 = {8.04, 6.95, 7.58, 8.81, 8.33,
                                       9.96, 7.24, 4.26, 10.84, 4.82, 5.68};
    public static final double[] x2 = {10.0, 8.0, 13.0, 9.0, 11.0,
                                       14.0, 6.0, 4.0, 12.0, 7.0, 5.0};
    public static final double[] y2 = {9.14, 8.14, 8.74, 8.77, 9.26,
                                       8.10, 6.13, 3.10, 9.13, 7.26, 4.74};
    public static final double ...

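To check the claim that the four series share almost identical summary statistics, here is a minimal sketch, not the book's code. It stores the Table A-1 values in two hypothetical row-per-series arrays, `X` and `Y`, and computes the mean, sample variance (n - 1 denominator), Pearson correlation, and least-squares line for each series.

```java
public class AnscombeStats {

    // Values transcribed from Table A-1; row i holds series i + 1.
    public static final double[][] X = {
        {10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0},
        {10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0},
        {10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0},
        {8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0}
    };
    public static final double[][] Y = {
        {8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68},
        {9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74},
        {7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73},
        {6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89}
    };

    public static double mean(double[] v) {
        double sum = 0.0;
        for (double d : v) {
            sum += d;
        }
        return sum / v.length;
    }

    /** Sum over k of (a_k - mean(a)) * (b_k - mean(b)). */
    public static double coSum(double[] a, double[] b) {
        double ma = mean(a);
        double mb = mean(b);
        double s = 0.0;
        for (int k = 0; k < a.length; k++) {
            s += (a[k] - ma) * (b[k] - mb);
        }
        return s;
    }

    /** Sample variance with an n - 1 denominator. */
    public static double variance(double[] v) {
        return coSum(v, v) / (v.length - 1);
    }

    /** Pearson correlation coefficient between x and y. */
    public static double correlation(double[] x, double[] y) {
        return coSum(x, y) / Math.sqrt(coSum(x, x) * coSum(y, y));
    }

    /** Slope of the least-squares line y = intercept + slope * x. */
    public static double slope(double[] x, double[] y) {
        return coSum(x, y) / coSum(x, x);
    }

    public static double intercept(double[] x, double[] y) {
        return mean(y) - slope(x, y) * mean(x);
    }

    public static void main(String[] args) {
        for (int i = 0; i < X.length; i++) {
            System.out.printf(
                "series %d: mean x=%.2f var x=%.2f mean y=%.2f var y=%.3f r=%.3f fit y=%.2f+%.3fx%n",
                i + 1, mean(X[i]), variance(X[i]), mean(Y[i]), variance(Y[i]),
                correlation(X[i], Y[i]), intercept(X[i], Y[i]), slope(X[i], Y[i]));
        }
    }
}
```

Running `main` prints one line per series. The x statistics agree exactly, and the y statistics, correlation, and fitted line agree to about two decimal places. Keeping each series as a row of a `double[][]` lets one loop handle all four series, whereas the `Anscombe` class gives each array its own name (x1, y1, and so on).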