Data! Data! Data! I can’t make bricks without clay.
Even though it is tempting to dive right into retrieving financial data from Bloomberg and other sources, taking a moment to plan where we are going to store the data will save us a lot of grief down the road. Coming up with a layout, or schema, is an essential step because unorganized data quickly becomes cumbersome to update and prone to errors. Take, for example, a Microsoft Excel workbook or Access database for tracking students, teachers, and classes. If the workbook identified the teacher of each class by the teacher’s full name, and a teacher then married and changed her name, every reference to that name would need to be updated. If even one instance were missed, the records would disagree about who teaches that class. If the school were large enough, two teachers could also share the same full name, which could cause scheduling and payroll mistakes.
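The renaming problem is easy to reproduce in a few lines of Python. This is only a sketch: the class list, the teacher names, and the `rename_teacher` helper are hypothetical, and the `limit` argument exists solely to simulate an update that misses a row.

```python
# Hypothetical class list in which every row repeats the teacher's full name.
classes = [
    {"class": "Algebra I", "teacher": "Jane Smith"},
    {"class": "Geometry", "teacher": "Jane Smith"},
    {"class": "Calculus", "teacher": "Jane Smith"},
    {"class": "Biology", "teacher": "Robert Jones"},
]


def rename_teacher(rows, old_name, new_name, limit=None):
    """Rewrite a teacher's name row by row and return the number of rows changed.

    `limit` (an assumption for this demo) stops the update early,
    mimicking a manual edit that overlooks some references.
    """
    changed = 0
    for row in rows:
        if limit is not None and changed >= limit:
            break
        if row["teacher"] == old_name:
            row["teacher"] = new_name
            changed += 1
    return changed


# Jane Smith changes her name, but only two of her three rows get updated.
updated = rename_teacher(classes, "Jane Smith", "Jane Doe", limit=2)

# The data now lists three distinct teacher names for two actual people.
teacher_names = sorted({row["teacher"] for row in classes})
print(updated, teacher_names)
```

After the partial update, the Calculus row still says "Jane Smith" while the other two say "Jane Doe", and nothing in the data itself indicates that both names refer to the same person.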
To solve this problem, we need to store our data in what is known as “third normal form.” Put simply, we will create one database table or Excel worksheet for each set of data (one for students, one for teachers, one for classes, etc.). The columns in each table or worksheet will contain only the attributes that belong to its entity. This link between an entity and its attributes is referred to as a has-a relationship. This way, no information or attribute will be repeated. For instance, a teacher’s full name would appear only in the ...