After your hardware has been racked and the software installed, it’s time to validate the installation. You want to ensure that the cluster performs within a reasonable, expected range and that all of its components work well together. Validation serves several purposes:
To detect bad hardware and platform misconfigurations. Any setup can have disks that are dead on arrival, memory sticks that aren’t seated properly, network adapters that fail intermittently, and more. Storage disks, in particular, tend to fail according to the bathtub curve.1 You can use burn-in tests to “smoke” these components out and replace them before production demand hits the system.
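A minimal sketch of the write/read-back idea behind a disk burn-in, assuming a hypothetical scratch directory on the disk under test. A real burn-in would use a purpose-built tool such as fio or badblocks, run for hours across the whole device, and evict the page cache between write and read; this one-pass version only illustrates the checksum-comparison technique.

```shell
#!/bin/sh
# Burn-in sketch: write a known pattern to the disk under test, read it
# back, and compare checksums to catch silent corruption.
# SCRATCH is an assumed mount point; point it at the disk you want to test.
set -e
SCRATCH=${SCRATCH:-/tmp/burnin}
mkdir -p "$SCRATCH"

# Write 64 MiB of pseudo-random data and record its checksum.
dd if=/dev/urandom of="$SCRATCH/pattern.bin" bs=1M count=64 2>/dev/null
WRITE_SUM=$(sha256sum "$SCRATCH/pattern.bin" | awk '{print $1}')

# Read it back. A serious test would first drop the page cache
# (e.g., echo 3 > /proc/sys/vm/drop_caches) so the read hits the media.
READ_SUM=$(sha256sum "$SCRATCH/pattern.bin" | awk '{print $1}')

if [ "$WRITE_SUM" = "$READ_SUM" ]; then
  echo "PASS: read-back checksum matches"
else
  echo "FAIL: checksum mismatch, suspect the disk" >&2
  exit 1
fi
rm -rf "$SCRATCH"
```

Memory and network components get the same treatment with their own tools (memtester, iperf, and the like); the common thread is stressing the part in isolation before the cluster depends on it.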
To demonstrate or prove degraded performance. For this you need evidence, not expectations. If you exercise your system at regular intervals while you configure the hardware, the operating system, and the Hadoop components, you can correlate recent changes to a change in system efficiency. You can identify regressions (or progressions!) caused by new hardware or software upgrades simply by running regular, repeatable tests for performance.
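The regular, repeatable test described above can be as simple as timing a fixed workload and appending the result to a history file, so a regression shows up as a step change in the elapsed column. This sketch uses a trivial stand-in command so it is self-contained; on a real Hadoop cluster you would substitute a benchmark such as a TeraSort run of fixed size.

```shell
#!/bin/sh
# Repeatable benchmark logger: run the same workload each time and
# append a timestamped row, then diff runs across config changes.
# BENCH_CMD is a placeholder; replace it with your real benchmark.
LOG=${LOG:-/tmp/bench_history.csv}
BENCH_CMD=${BENCH_CMD:-"sleep 1"}

# Create the history file with a header on first use.
[ -f "$LOG" ] || echo "timestamp,elapsed_seconds" > "$LOG"

START=$(date +%s)
sh -c "$BENCH_CMD"
END=$(date +%s)
ELAPSED=$((END - START))

echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),$ELAPSED" >> "$LOG"
echo "elapsed: ${ELAPSED}s (history in $LOG)"
```

Keeping the workload and input size identical between runs is what makes the numbers comparable; only the hardware, OS, or Hadoop configuration should vary.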
To ensure that your monitoring, alerting, day-to-day operations, and other triage procedures work as you expect. Rehearsing your recovery procedures and playbooks before the system goes into service—without the pressure of production demand or the clamor of angry tenants—is ...