Murder by the Masses
So after all that load testing, what happened on the day of the launch? How could the site crash so badly and so fast? Our first thought was that marketing was just way off on their demand estimates. Perhaps the customers had built up anticipation for the new site and all rushed in at once. That theory died quickly when we found out that customers had never been told the launch date. Maybe there was some misconfiguration or mismatch between production and the test environment?
The session counts led us almost straight to the problem. It was the number of sessions that killed the site. Sessions are the Achilles’ heel of every application server. Each session consumes resources, mainly RAM. With session replication enabled (it was), each session ...