Big data governance is essential for maintaining data quality and enabling analysts to make better decisions. It helps financial organizations avoid the cost of reworking low-quality data and report in compliance with regulations such as Sarbanes-Oxley and Basel II/Basel III.
At a minimum, big data governance should include:
- Data definitions, including metadata: We must know what data is stored on the Hadoop platform.
- Full data process lineage: We must know where the data comes from, what transformations it has gone through, and where it eventually lands.
- NoSQL stores: These have flexible schemas, but that doesn't mean we should store any junk. Even if we allow a flexible schema, any changes to the schema ...
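To make the governance requirements above more concrete, here is a minimal sketch of a metadata catalog that records data definitions, tracks lineage between datasets, and checks schema-flexible records against their registered definition. Everything in it is hypothetical: the class names (`DatasetDefinition`, `LineageStep`, `Catalog`), the dataset names (`raw_trades`, `daily_positions`), and the validation rules are illustrative assumptions, not the API of any Hadoop governance product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDefinition:
    """Hypothetical data definition: what a dataset is and which fields it holds."""
    name: str
    description: str
    fields: dict[str, type]  # field name -> expected Python type


@dataclass
class LineageStep:
    """Hypothetical lineage edge: `output` produced from `inputs` by `transformation`."""
    output: str
    inputs: list[str]
    transformation: str


@dataclass
class Catalog:
    definitions: dict[str, DatasetDefinition] = field(default_factory=dict)
    lineage: dict[str, LineageStep] = field(default_factory=dict)

    def register(self, definition: DatasetDefinition) -> None:
        self.definitions[definition.name] = definition

    def record_step(self, step: LineageStep) -> None:
        # Reject lineage that mentions undefined datasets, so every hop is documented.
        for name in [step.output, *step.inputs]:
            if name not in self.definitions:
                raise KeyError(f"undefined dataset: {name}")
        self.lineage[step.output] = step

    def trace(self, name: str) -> list[str]:
        """Return the upstream transformations for `name`, sources first.

        Assumes the lineage graph is acyclic.
        """
        step = self.lineage.get(name)
        if step is None:
            return []  # a raw source has no recorded upstream step
        history: list[str] = []
        for source in step.inputs:
            history.extend(self.trace(source))
        history.append(f"{', '.join(step.inputs)} -> {step.transformation} -> {step.output}")
        return history

    def validate(self, name: str, record: dict) -> list[str]:
        """List the ways a schema-flexible record deviates from its definition."""
        expected = self.definitions[name].fields
        problems = [f"missing field: {f}" for f in expected if f not in record]
        problems += [f"unexpected field: {f}" for f in record if f not in expected]
        problems += [
            f"wrong type for {f}: expected {t.__name__}"
            for f, t in expected.items()
            if f in record and not isinstance(record[f], t)
        ]
        return problems


# Usage with hypothetical datasets.
catalog = Catalog()
catalog.register(DatasetDefinition(
    "raw_trades", "Trades as received from the booking system",
    {"trade_id": str, "amount": float}))
catalog.register(DatasetDefinition(
    "daily_positions", "Net positions per account per day",
    {"account": str, "net": float}))
catalog.record_step(LineageStep("daily_positions", ["raw_trades"], "aggregate by account and day"))

print(catalog.trace("daily_positions"))
# ['raw_trades -> aggregate by account and day -> daily_positions']
print(catalog.validate("raw_trades", {"trade_id": "T1", "amount": "100"}))
# ['wrong type for amount: expected float']
```

The design choice here is to refuse lineage entries for undefined datasets: it ties the lineage requirement back to the data-definition requirement, so nothing can land on the platform without a recorded definition. A production system would persist this catalog and handle schema evolution explicitly rather than keeping everything in memory.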