Introduction to DevOps for the financial industry
DevOps lessons from Capital One and LMAX.
From small trading firms to big banks and exchanges, financial industry players, racing to deliver content and features to customers more quickly, are looking at DevOps ideas, practices, and tools to solve problems in financial systems. Let’s look at the drivers for adopting DevOps in financial systems and the examples of Capital One and LMAX for insight into how it can be done effectively.
One of the major drivers for DevOps in financial enterprises is the adoption of cloud services. Online financial institutions like exchanges or clearinghouses are essentially cloud services providers to the rest of the market. And most order and execution management system vendors are, or are becoming, SaaS providers to trading firms. So it makes sense for them to adopt some of the same ideas and design approaches as cloud providers: Infrastructure as Code; virtualization; rapid, automated system provisioning and deployment.
The financial services industry is spending billions of dollars on building private internal clouds and using public cloud SaaS and PaaS (or private/public hybrid) solutions. This trend started in backend, general-purpose systems, with HR, CRM, and office services using popular SaaS platforms and services like Microsoft's Office 365 or Azure. Now more financial services providers are taking advantage of cloud platforms for data intelligence and analytics, using cloud storage services, and building test platforms in the Cloud.
Today, even the regulators are in the Cloud. FINRA's new surveillance platform runs on Amazon's AWS, using public trade data. The SEC has moved its SEC.gov website and EDGAR company filing system, as well as its MIDAS data analytics platform, to a private/public cloud to save operations and maintenance costs, improve availability, and handle surges in demand (such as the one during Facebook's IPO). Cloud adoption is still being held back by concerns about security and data privacy, data residency and data protection, and other compliance restrictions, according to a recent survey from the Cloud Security Alliance ("How Cloud is Being Used in the Financial Sector: Survey Report", March 2015).
However, as cloud providers continue to raise the level of transparency and improve auditing controls over operations, encryption, and e-discovery, and as regulators provide clearer guidance on the use of cloud services, more and more financial data will make its way into the Cloud.
DevOps is a natural next step in organizations where Agile development has proved successful. Development teams who have proven that they can iterate through designs and deliver features quickly, and the business sponsors who are waiting for these features, grow frustrated with delays in getting systems into production. They start looking for ways to simplify and streamline the work of acceptance testing and security and compliance reviews; dependency analysis and packaging; release management and deployment.
Agile ideas and principles—prioritizing working software over documentation, frequent delivery, face-to-face collaboration, and a focus on technical excellence and automation—form the foundation of DevOps. And Continuous Delivery, which is the control framework for DevOps, is also built on top of a fundamental Agile development practice: Continuous Integration.
Capital One purchased ING Direct USA in 2012. Until then, Capital One outsourced most of its IT. Today, Capital One is fully committed to Agile and DevOps.
According to public presentations made by staff, Capital One’s Agile experiment started in late 2011, with just two teams. As more teams were trained in Agile development, as at ING, they found that they were building software quickly, but it was taking too long to get working software into production. Development sprints led to testing and hardening sprints before the code was finally ready to be packaged and handed off to production. This wasn’t Agile; it was “Agilefall.”
Capital One developers were following the Scaled Agile Framework (SAFe). They leveraged the idea of System Teams in SAFe, creating dedicated DevOps teams in each program to help streamline the hand-offs between development and operations. These teams were responsible for setting up and managing the development and test environments, automating build and deployment processes, and release management, acting as "air traffic controllers to navigate through the CABs" (Change Advisory Boards).
Integration testing, security testing, and performance testing were all being done outside of development sprints by separate test teams. They brought this testing into the dedicated DevOps teams and automated it. Then they moved all testing into the development sprints, adopting behavior-driven/acceptance-test-driven development and wiring integration, security, and performance testing into a Continuous Delivery pipeline. Today they have 700 Agile teams following Continuous Delivery.
In Continuous Integration, developers make sure that the code builds and runs correctly on each check-in. Continuous Delivery takes this to the next step.
It's not just about automating unit testing, which the development team already owns. Continuous Delivery means automatically configuring test environments to match production as closely as possible; automatically packaging the code and deploying it to those environments; and automatically running acceptance, stress, performance, and security tests and other checks, with pass/fail feedback to the team. It also means auditing every one of these steps, reporting status to a dashboard, and later using the same pipeline and deployment steps to deploy the changes to production.
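To make the shape of such a pipeline concrete, here is a minimal Python sketch. It assumes each stage can be reduced to a check that returns pass or fail for a given build and target environment; the stage names, the `Pipeline` class, and the stubbed checks are hypothetical and do not describe any particular tool. The points it illustrates are the ones in the paragraph above: stages run in a fixed order, every outcome is written to an audit log, a failure stops the run and is reported back, and the same pipeline is reused unchanged when the target is production.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# A check takes (build_id, target_env) and returns True if the stage passed.
Check = Callable[[str, str], bool]


@dataclass
class StageResult:
    build_id: str
    target: str
    stage: str
    passed: bool
    timestamp: str


@dataclass
class Pipeline:
    # Ordered (stage_name, check) pairs, run the same way for every target.
    stages: List[Tuple[str, Check]]
    audit_log: List[StageResult] = field(default_factory=list)

    def run(self, build_id: str, target: str) -> bool:
        for name, check in self.stages:
            passed = bool(check(build_id, target))
            # Every outcome is recorded, pass or fail, so the run is auditable.
            self.audit_log.append(StageResult(
                build_id, target, name, passed,
                datetime.now(timezone.utc).isoformat()))
            if not passed:
                return False  # fail fast and report back to the team
        return True

    def dashboard(self, build_id: str) -> Dict[str, str]:
        # Latest recorded status per stage for one build.
        return {r.stage: "PASS" if r.passed else "FAIL"
                for r in self.audit_log if r.build_id == build_id}


# Hypothetical demo with stubbed checks; the performance stage is made to fail.
demo = Pipeline([
    ("provision_env", lambda b, t: True),
    ("package_and_deploy", lambda b, t: True),
    ("acceptance_tests", lambda b, t: True),
    ("performance_tests", lambda b, t: False),
    ("security_tests", lambda b, t: True),
])
demo_ok = demo.run("build-42", "test")
```

Because the run stops at the failing performance stage, the security stage never executes for that build, and the audit log shows exactly how far the build got.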
Continuous Delivery is the backbone of DevOps. It's an automated framework for making software and infrastructure changes and for pushing out software upgrades, patches, and configuration changes in a way that is repeatable, predictable, efficient, and fully audited.
Putting a Continuous Delivery pipeline together requires a high degree of cooperation between development and operations, and a much greater shared understanding of how the system works, what production really looks like, and how it runs. It forces teams to start talking to each other, exposing details about how they work.
There is a lot of work that needs to be done. Understanding dependencies, standardizing configurations, and bringing configuration into code. Cleaning up the build—getting rid of inconsistencies, hardcoding, and jury rigging. Putting everything into version control: application code and configuration, binary dependencies (like the Java Runtime), infrastructure configuration (recipes/manifests), database schemas, and configurations for the CI/CD pipeline itself. Automating testing. Getting all of the steps for deployment together and automating them carefully. Doing all of this in a heterogeneous environment, with different architectures and technology platforms and languages.
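One way to picture "putting everything into version control" is a release manifest that pins every input to a build, from application code to the pipeline's own configuration. The Python sketch below is a hypothetical illustration, not a description of Capital One's or LMAX's practice: the manifest keys and version strings are invented, `find_unpinned` flags floating versions such as "latest" that make builds unrepeatable, and `release_fingerprint` hashes a canonical form of the manifest so that two environments built from identical inputs get the same identifier.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical release manifest: every input to a build, pinned to an exact version.
manifest: Dict[str, str] = {
    "app_commit": "3f2a9c1",        # application code and configuration
    "java_runtime": "17.0.10",      # binary dependency
    "infra_recipes": "v2.4.1",      # infrastructure configuration
    "db_schema": "migration-0087",  # database schema version
    "pipeline_config": "ci-v12",    # configuration of the CI/CD pipeline itself
}


def find_unpinned(m: Dict[str, str]) -> List[str]:
    # Floating or missing versions mean two builds may silently differ.
    return sorted(k for k, v in m.items() if v.strip() in ("", "latest"))


def release_fingerprint(m: Dict[str, str]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal manifests hash equally.
    canonical = json.dumps(m, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Changing any single input, such as bumping the schema migration, changes the fingerprint, which makes drift between environments easy to detect.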
This work isn't development, and it's not operations either. That can make it hard to build a business case for it: it's not about delivering specific business features or content, and it can take time to show results. But the payoff can be huge.
The London Multi-Asset Exchange (LMAX) is a highly regulated FX retail market in the UK, where Dave Farley (coauthor of the Continuous Delivery book) helped pioneer the model of Continuous Delivery.
LMAX’s systems were built from scratch following Agile best practices: TDD, pair programming, and Continuous Integration. But LMAX took this further, automatically deploying code to integration, acceptance, and performance testing environments, building up a Continuous Delivery pipeline.
LMAX has made a massive investment in automated testing. Each build runs through 25,000 unit tests with code coverage checks that can fail the build, simple code analysis (using tools like FindBugs, PMD, and custom architectural dependency checks), and automated integration sanity checks. All of these tests and checks must pass for every piece of code submitted.
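The source does not say how LMAX's custom architectural dependency checks work. The Python sketch below shows one common form of such a check under stated assumptions: the layer names (`api`, `service`, `domain`), the rule that a layer may only depend on layers listed after it, and the 80% coverage threshold are all hypothetical. `commit_gate` combines this check with the other conditions named above, so a commit is accepted only if everything passes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layering rule: a layer may depend only on layers listed after it.
LAYERS = ["api", "service", "domain"]


def layer_violations(deps: Dict[str, Set[str]],
                     layers: List[str] = LAYERS) -> List[Tuple[str, str]]:
    rank = {name: i for i, name in enumerate(layers)}
    # A dependency pointing "upward" (to an earlier layer) breaks the architecture.
    return sorted((src, dst) for src, targets in deps.items()
                  for dst in targets if rank[dst] < rank[src])


def commit_gate(tests_failed: int, coverage: float, static_findings: int,
                deps: Dict[str, Set[str]], min_coverage: float = 0.80) -> bool:
    # Every check must pass for the commit to be accepted.
    return (tests_failed == 0
            and coverage >= min_coverage
            and static_findings == 0
            and not layer_violations(deps))


# Hypothetical dependency graphs: the second has the domain layer calling the API.
deps_ok = {"api": {"service"}, "service": {"domain"}, "domain": set()}
deps_bad = {"api": {"service"}, "service": {"domain"}, "domain": {"api"}}
```

Expressing the architecture as data keeps the rule reviewable in version control alongside the code it constrains.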
The last good build is automatically picked up and promoted to integration and acceptance testing, where more than 10,000 end-to-end tests are run on a test cluster, including API-level acceptance tests, multiple levels of performance tests, and fault injection tests that selectively fail parts of the system and verify that the system recovers correctly without losing data. More than 24 hours’ worth of tests are executed in parallel in less than 1 hour.
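Running more than 24 hours of serial tests in under an hour implies at least 24 parallel workers even before accounting for overhead, and the suite only finishes when the busiest worker does. The source does not describe how LMAX distributes its tests. The Python sketch below swaps in a standard technique, longest-processing-time-first (LPT) scheduling, which assigns each test, longest first, to the currently least-loaded worker. The test names and durations are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple


def min_workers(serial_secs: float, budget_secs: float) -> int:
    # Lower bound on parallelism, ignoring setup overhead and uneven test lengths.
    return math.ceil(serial_secs / budget_secs)


def shard_tests(durations: Dict[str, float], workers: int) -> List[List[str]]:
    # LPT heuristic: give each test, longest first, to the least-loaded worker.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(workers)]
    shards: List[List[str]] = [[] for _ in range(workers)]
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return shards


def wall_clock(durations: Dict[str, float], shards: List[List[str]]) -> float:
    # The suite finishes when the busiest worker finishes.
    return max(sum(durations[t] for t in shard) for shard in shards)


# Hypothetical per-test durations in seconds, split across two workers.
durations = {"t1": 5, "t2": 4, "t3": 3, "t4": 3, "t5": 2, "t6": 1}
shards = shard_tests(durations, 2)
```

Here 18 seconds of serial work is split into two shards of 9 seconds each. In a real suite, recorded durations from previous runs would feed the scheduler.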
If all of the tests and reviews pass, the build is tagged. All builds are kept in a secure repository, together with dependent binaries (such as the Java Runtime). Everything is tracked in version control.
QA can conduct manual exploratory testing or other kinds of tests on a build. Operations can then pull a tagged build from the development repository to their separate secure production repository, and use the same automated tools to deploy to production. Releases to production are scheduled every two weeks, on a Saturday, outside of trading hours.
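The promotion step can be sketched as a guard between two repositories. In this hypothetical Python example, the repositories are plain dictionaries, and the `Artifact` type, `promote`, and `in_release_window` are invented names. Only a tagged build may be copied from the development repository to the production repository, and its checksum is re-verified so that production receives exactly the bytes that were tested. The Saturday check mirrors the release schedule described above, while the two-week cadence is left out.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class Artifact:
    build_id: str
    payload: bytes  # packaged build, including dependent binaries
    sha256: str     # checksum recorded when the build was created
    tagged: bool    # True only if every test and review passed


def make_artifact(build_id: str, payload: bytes, tagged: bool) -> Artifact:
    return Artifact(build_id, payload, hashlib.sha256(payload).hexdigest(), tagged)


def promote(build_id: str, dev_repo: Dict[str, Artifact],
            prod_repo: Dict[str, Artifact]) -> Artifact:
    artifact = dev_repo[build_id]
    if not artifact.tagged:
        raise ValueError(f"{build_id} is not tagged for release")
    # Re-verify the content so production gets exactly the bytes that were tested.
    if hashlib.sha256(artifact.payload).hexdigest() != artifact.sha256:
        raise ValueError(f"{build_id} failed checksum verification")
    prod_repo[build_id] = artifact
    return artifact


def in_release_window(when: datetime) -> bool:
    # Hypothetical guard: production deploys only on a Saturday (weekday() == 5).
    return when.weekday() == 5


# Hypothetical repositories: b41 passed the pipeline and was tagged, b42 did not.
dev_repo = {"b41": make_artifact("b41", b"app-41.jar", True),
            "b42": make_artifact("b42", b"app-42.jar", False)}
prod_repo: Dict[str, Artifact] = {}
promote("b41", dev_repo, prod_repo)
```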
There is nothing sexy about the technology involved: they rolled a lot of the tooling on their own using scripts and simple conventions. But it's everything that we've come to know today as Continuous Delivery.