Chapter 1. Modularity Introduction
Dealing with change is one of the most fundamental issues in building applications. It doesn’t have to be difficult, however, since we are dealing with software that is not constrained to any physical laws. Change is an architectural issue, but unfortunately, it is often overlooked as such. This is logical to some extent, because most of the time you simply cannot see it coming.
Traditional software development processes can make it hard to cope with change. In waterfall, and even in DSDM and RUP to some extent, the big design is done up front. Once the design is created, it is carved in stone, and further down the line there is rarely any possibility to deviate from the original ideas. As soon as the first big chunks of functionality are built, it is pretty much impossible to make a fundamental change to the software design.
When using an agile methodology, you must have all disciplines involved: requirements, architecture, development, testing, and so on. As most activities of the software development process are performed in short iterations (or sprints), this affects the way that one activity can influence the remainder of the development process. Shorter development cycles can lead to greater transparency. But they can also keep you from seeing the entire effect of certain architectural decisions when you first encounter them. Some parts of an application’s architecture are harder to change than others. The key thing when dealing with change is to predict where it is coming from, and to delay any permanent architectural decisions until you have all the necessary information for those decisions. This is hard, of course. On the other hand, a completely flexible application architecture is also impossible to achieve. Flexibility will likely require so many abstractions that the complexity of the resulting application is unreasonably high. Complexity is a monster and should be avoided at all costs. It was Leonardo da Vinci who said, “Simplicity is the ultimate sophistication,” and he is absolutely right about that.
A number of studies have shown that the number of lines of code doubles about every seven years. Not only are we building bigger applications, but we are also using more diverse technologies: advanced mechanisms such as Aspect Oriented Programming; Polyglot Programming (using multiple programming languages within the same solution); the enormous number, availability, and diversity of open source frameworks for all layers of the application; alternative storage solutions in the form of NoSQL; the need for multitenancy in PaaS and SaaS environments; massive scale and elastic resource allocation; and so on. Currently, complexity is rising at such a disturbing pace that we have to put mechanisms in place to bring it to a halt.
More complexity usually means spending more time understanding what the application actually does, not to mention the time, energy, and money that have to be invested in maintaining these applications for a long time. As an example, how many times have you looked at some code that you wrote yourself a couple of years ago and had a hard time understanding what it actually does? With the pace at which open source frameworks are becoming available, the amount of knowledge your teams have to keep up with is also very hard to deal with. This is also why writing readable code is so important. Would you rather name your variable l instead of listOfBooks? You save some typing now but will regret it later on.
Divide and Conquer
Why is it difficult to maintain a large code base over many years? Many projects run into more and more problems as the code base grows. The core of the problem is that the code is not modular: there is no clear separation between different parts of the code base. Because of that, it becomes difficult to understand what the code is doing. To understand a nonmodular code base, you basically need to understand all the code, which is unrealistic. Working on a large code base that you can only partly comprehend will introduce bugs in places you couldn’t foresee whenever you make any changes. We have all seen these kinds of projects, and we all know what is wrong with them: they ended up as spaghetti code.
This can be improved by identifying separate parts of a system and strictly separating them from other parts of the code. Doing so starts with one of the most basic object orientation best practices: program to interfaces, not to implementations. By programming to interfaces only, you can make changes to implementations without breaking consumers of the interface. We all know and apply this every day, but code bases still end up as an unmaintainable mess. In real applications, we are dealing with several interfaces and implementation classes when working on a part of a system. How do you know which interfaces and classes belong together and should be seen as one piece when looking at the bigger picture? By programming to interfaces, we try to prevent coupling. At the same time, there is always some coupling between classes. Without any coupling there wouldn’t be any useful code. Again, we are looking at a basic object orientation best practice: promote cohesion; prevent coupling. In fact, this best practice can be seen as the golden rule of modular software design. Code that logically belongs together should be cohesive; it should do only one thing. The goal is to prevent coupling between these logically cohesive parts.
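As a minimal sketch of programming to interfaces, consider the following plain Java example. All names in it (BookRepository, InMemoryBookRepository, BookCatalog) are hypothetical and chosen only for illustration. The consumer depends on the interface alone, so the implementation could be swapped without touching the consumer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical API: the only type that consumers are meant to depend on.
interface BookRepository {
    void add(String title);
    List<String> findAll();
}

// Hypothetical implementation detail; it can change or be replaced freely.
final class InMemoryBookRepository implements BookRepository {
    private final List<String> listOfBooks = new ArrayList<>();

    @Override
    public void add(String title) {
        listOfBooks.add(title);
    }

    @Override
    public List<String> findAll() {
        // Return a copy so callers cannot modify the internal list.
        return new ArrayList<>(listOfBooks);
    }
}

// Consumer that is coupled to the interface only, never to the implementation.
final class BookCatalog {
    private final BookRepository repository;

    BookCatalog(BookRepository repository) {
        this.repository = repository;
    }

    int countBooks() {
        return repository.findAll().size();
    }
}
```

Note that nothing in plain Java stops another class from instantiating InMemoryBookRepository directly; the separation holds only as long as everyone respects the convention.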
How can we communicate and enforce cohesion in separated parts of the code base while keeping coupling between them low? This is a far more difficult question. Object orientation and design patterns do give us the tools to deal with this at the finest level of single classes and interfaces. They don’t give much guidance in dealing with groups of classes and interfaces. The Java language itself doesn’t have any tools for this. We need something at a higher level: techniques to deal with well-defined groups of classes and interfaces, i.e., logical parts of a system. Let’s call them modules.
Modularizing a software design refers to a logical partitioning of the system design that allows complex software to be manageable for the purpose of implementation and maintenance. The logic of partitioning may be based on related functions, implementation considerations, data, or other criteria. The term modularity is widely used, and systems are deemed modular when they can be decomposed into a number of components that may be mixed and matched in a variety of configurations. Such components are able to connect, interact, or exchange resources (data) in some way by adhering to a standardized interface. Modular applications are systems of components that are loosely coupled.
Service Oriented Architecture All Over Again?
If you have ever heard of Service Oriented Architecture (SOA), you might recognize that it proposes a solution to the problems discussed. The SOA promise is in fact even better (on paper). With SOA, we reuse existing systems, creating flexible and reusable business process services to an extent that new business requirements can be implemented by simply mixing and matching services. Theoretically speaking. Many of the SOA promises were never delivered and probably never will be because integrating systems at this level has many challenges. There is something very valuable in the concept, however. Isolating reusable parts of a larger system makes a lot of sense.
SOA is not the solution for the problems discussed in this book. SOA is on a very different architectural level: the level of integrating and reusing systems. Modularity is about implementing systems and about separating concerns within such a system. Modularity is about writing code; SOA is about system integration. Some have tried to apply patterns from the SOA world within a single system because it does theoretically solve the spaghetti code problem. You could identify parts of an application and implement them as separate “services.” This is basically what we are looking for, but the patterns that are related to SOA are not fit for use within a single system. Applying integration patterns introduces a lot of complexity and overhead, both during development and runtime. It’s an extreme form of over-engineering and dramatically slows down development and maintenance. Instead, we need something to apply the same concepts in a way that makes sense on the scale of classes and interfaces, with minimal development and runtime overhead.
A Better Look at Modularity and What It Really Means
Understanding the golden rule of software design, and designing a system on paper such that it promotes cohesion within separate parts of the application and minimizes coupling between them, is not the hardest part of applying modularity to software design. To create a truly modular system, it is important not only to make it modular in the design phase, but also to implement that design in such a way that it is still modular at runtime. Modularity can therefore be subdivided into design time modularity and runtime modularity. In the next few paragraphs, we are going to explore them a bit.
Modularity is an architectural principle that starts at design time. Modules and relationships between modules have to be carefully identified. Similar to object-oriented programming, there is no silver bullet that will magically introduce modularity. There are trade-offs to be made, and patterns exist to help you do so. We will show many examples of this throughout the book. The most important step toward modularity is coming up with a logical separation of modules. This is basically what most architects will do, but they do this on the level of systems and layers. We need to do this on the code level as well, on a much more fine-grained level. This means that modularity doesn’t come for free: we need to actively work toward it. Just throwing a framework in the mix will not make a system modular all of a sudden.
Design time modularity can be applied to any code base, even if you are on a traditional nonmodular Java EE application server. Identifying modules in a design already helps to create a clear design. Without a runtime environment that respects modules, however, it is very hard to enforce modularity. Enforcing sounds like a bad thing, and you might say that it is the role of developers to respect modularity. It certainly is our job as developers to do so, but we can use some help. When working in a large code base, it is very difficult not to break modularity accidentally if you rely only on standards and guidelines. The module borders are simply too vague to spot mistakes. Over time, small errors will leak into the code base, and modularity will slowly evaporate.
It’s much easier to work with modules if our runtime supports it. The runtime should at least:
- Enforce module isolation
- Enforce and clarify module dependencies
- Provide a services framework to consume modules
Enforcing module isolation means making sure that other modules are not using internal classes. A module should only be used through an explicitly defined API, and the runtime should make sure that other classes and interfaces are not visible to the outside world (other modules). Enforcing module dependencies means making sure that modules only depend on a well-defined set of other modules; to prevent runtime errors, a module should not load when its dependencies are not resolved. Services are another key aspect of modularity, and we will talk a lot more about them in the next chapter.
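To make the dependency rule concrete, here is a toy model in plain Java. It is not how any real module runtime is implemented, and the class names (ModuleDescriptor, ToyModuleRuntime) and module names are assumptions made up for this sketch. It only illustrates the idea that a module may load when all of its declared dependencies, directly and transitively, are present.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of a module: a name plus the names of the modules it requires.
final class ModuleDescriptor {
    final String name;
    final Set<String> requires;

    ModuleDescriptor(String name, String... requires) {
        this.name = name;
        this.requires = new LinkedHashSet<>(Arrays.asList(requires));
    }
}

// Toy runtime that refuses to resolve modules with missing dependencies.
final class ToyModuleRuntime {
    private final Map<String, ModuleDescriptor> installed = new HashMap<>();

    void install(ModuleDescriptor module) {
        installed.put(module.name, module);
    }

    // A module resolves only if it is installed and every dependency resolves too.
    boolean canResolve(String name) {
        return canResolve(name, new HashSet<>());
    }

    private boolean canResolve(String name, Set<String> seen) {
        ModuleDescriptor module = installed.get(name);
        if (module == null) {
            return false; // dependency is not installed at all
        }
        if (!seen.add(name)) {
            return true; // already checked or currently being checked (cycle)
        }
        for (String dependency : module.requires) {
            if (!canResolve(dependency, seen)) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, a module that requires an uninstalled module is reported as unresolvable before any of its code would run, which is the behavior the requirement above asks of a real runtime.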
In essence, one could say that classes are a pretty modular concept. However, they are very limited in that you cannot express much structure with just plain classes. Grouping classes can be done using packages, but the intention of packages is more to provide a mechanism to organize classes in namespaces belonging to a similar category or providing similar functionality. Packages offer only coarse visibility control: a package-private class is hidden from every other package, a public class is visible to all of them, and classes within the same package cannot be hidden from each other. A logical module that spans several packages therefore has to make its shared classes public, which exposes them to the rest of the code base as well. Sure, there are some tricks, such as inner classes, or classes contained in other classes, but in the end this does not give you a truly modular concept.
An important aspect of runtime modularity is the packaging of the deployment artifact. On the Java platform, the JAR file has been the traditional unit of deployment. A JAR file holds a collection of classes or packages of classes, and their resources, and has the ability to carry some metadata about that distribution. Unfortunately, the Java runtime treats all JAR files as equal, and the information in the metadata description is ignored by both the classloader and the virtual machine. In plain Java and Java EE, all classes found in deployed JAR files are put in one large space, the classpath. In order to get the concept of runtime modularity to work, an additional mechanism is needed.
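The metadata mentioned here lives in the JAR manifest, which can be read with the standard java.util.jar API. The following sketch parses a manifest from a string instead of a real JAR file to stay self-contained. Bundle-SymbolicName, Bundle-Version, and Export-Package are standard OSGi manifest headers; the package name com.example.catalog.api and the ManifestReader class are assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

final class ManifestReader {

    // Hypothetical manifest of a bundle that exports a single API package.
    static final String SAMPLE_MANIFEST =
            "Manifest-Version: 1.0\n"
            + "Bundle-SymbolicName: com.example.catalog.api\n"
            + "Bundle-Version: 1.0.0\n"
            + "Export-Package: com.example.catalog.api\n";

    // Returns the value of a main-section header, or null if it is absent.
    static String header(String manifestText, String headerName) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(bytes));
            return manifest.getMainAttributes().getValue(headerName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ManifestReader.header(ManifestReader.SAMPLE_MANIFEST, "Export-Package") returns the exported package name. The plain Java classpath does nothing with such headers; a modular runtime is what gives them meaning.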
The basic idea of working on a modular approach using JAR files is not a bad one at all. JAR files are not only a unit of deployment, but also a unit of distribution, a unit of reuse, and a unit that you can add a version to.
There have been a number of attempts at giving Java a modular runtime. OSGi was an early candidate, given its Java Specification Request (JSR 8) number, but mostly because of political reasons and colliding characters, this attempt never had a chance of success. Halfway through the 2000s, the concept of the Java Module System (JSR 277) was introduced, but it never made it into the Java runtime. As part of the original plans for Java SE 7, in what was later to become the final days of Sun as the steward of Java, a new plan for modularity was launched under the codename Jigsaw. Then there is Maven, which was originally designed to be a build system isolating application modules and managing dependencies. Finally, vendors such as Red Hat have attempted to create a modularity system. In the next paragraphs, we will take a better look at the solutions that are available in Java today.
OSGi is the best known modularity solution for Java. Unfortunately, it also has a reputation for being an over-engineered, hard-to-use technology. To start with the latter: it’s not. Many of the complaints are due to misunderstanding and inadequate tooling. It is true that OSGi is a complex specification. OSGi has been in use for over 10 years, in many different kinds of systems. Because of this, there are many corner cases that the specification has to cover. This is not over-engineering, but the result of evolving a standard for many years. The good news is that you won’t have to deal with most of the complexity as an application developer or architect. Understanding the basics and, more important, understanding basic modularity patterns will get you a long way. That, combined with the greatly improved tooling available today, makes OSGi hardly any more difficult to use than Java without OSGi. And it gives you the power of modularity and some really nice development features as a side effect. The rest of the book will use OSGi as the technology to work with, and instead of trying to convince you that OSGi is easy to work with, we will get you started as soon as possible and let you see for yourself.
The most interesting alternative to OSGi seems to be Jigsaw, as it is supposed to become the standard Java modularity solution. There is one problem, however: it doesn’t exist yet and will not anytime soon. Jigsaw was delayed for Java SE 8 and then again deferred to Java SE 9, which is currently scheduled for 2015–2016. The current scope of Jigsaw is also far from sufficient to build truly modular applications. It will merely be the basis to modularize the JDK itself and a basis for future work, post Java 9. Jigsaw is trying to solve two problems at the same time: modularizing the JDK itself and pushing the same model to developers. Splitting the JDK into smaller pieces makes a lot of sense, especially on “small” devices. Because of all the legacy, this is a much more difficult challenge than creating modular applications. To facilitate the JDK modularization, there will be some new language constructs to define modules, and these will also be available to developers. Unfortunately, the proposal is based on concepts that will not be sufficient for application modularization. If you are looking at modularity today, Jigsaw is simply not a viable alternative. Will Jigsaw eventually be an OSGi killer? That is for the future to tell.
JBoss Modules is the modularity solution developed by Red Hat as the basis of JBoss Application Server 7. JBoss Modules was developed with one very specific goal: startup speed of the application server. Because of this goal, JBoss Modules does a lot less than OSGi in order to start up faster. This made JBoss AS7 one of the fastest application servers to start up, but it also made JBoss Modules an inadequate modularity solution for general use. The most important concept that JBoss Modules lacks is a dynamic service layer, which is the most important modularity concept while building applications. So far, there has also been little effort to make JBoss Modules usable outside of JBoss; documentation and real-life examples are very limited.
Although Maven isn’t really a modularity solution, it is often discussed in this context. By separating a code base into smaller projects and splitting APIs and implementations into separate Maven projects, you can get pretty decent compile time modularity. As discussed previously, compile time modularity is only half of the solution. We need runtime modularity as well, and Maven doesn’t help with this at all. However, you could very well use Maven for compile time modularity while using OSGi for runtime modularity. This has been done in many projects and works well. In this book, you will see that you do not need Maven at all when using OSGi. OSGi modules (bundles) already contain the metadata required to define modules, and we don’t need to duplicate this in our build environment. Maven itself is slow and painful to use, and it’s difficult to get very fast turnarounds on code changes. For those reasons, and the fact that we actually don’t need Maven at all, we don’t advise using it as a modularity solution.
Choosing a Solution: OSGi
OSGi is the only mature modularity solution for Java today, and that is unlikely to change anytime soon. Although OSGi has been criticized for being too complex to use in the past, recent improvements to tooling and frameworks have changed this completely. The remainder of this book will be focused on using OSGi to achieve modularity. Although the concepts discussed in this book would work with any modularity framework, we want to keep the book as practical as possible for solving today’s problems.
What Is OSGi?
OSGi is a modularity framework for the Java platform. The OSGi Alliance started work on this in 1999 to support Java on devices such as set-top boxes, service gateways, and all kinds of consumer electronics. Nowadays OSGi is applied in a much broader field and has become the de facto modularity solution in Java.
The OSGi framework is specified in the OSGi Core Specification, which describes the inner workings of the OSGi framework. Next to the OSGi Core Specification there is the OSGi Compendium Specification, which describes a set of standardized OSGi services such as the Log Service, Configuration Admin, and the HTTP Service. Several of the compendium services will be used in this book. Additionally, there is Enterprise OSGi, a set of specifications focused on using OSGi in an enterprise environment. This includes Remote Services, the JDBC Service, and the JPA Service. Besides these specifications, there are a number of lesser-known compendiums for residential and mobile usage that are not covered in this book.
When using OSGi, you need an implementation. The most popular OSGi implementations are Apache Felix and Equinox from the Eclipse Foundation. Both are mature implementations, but the authors of this book do have a preference for Apache Felix, because Equinox tends to be a little more heavyweight, and it makes some implementation choices that only make sense for Eclipse usage. Therefore, Apache Felix is the implementation used throughout this book. It really doesn’t matter much which implementation is used, however, because the bundles you create and the compendium services run on any implementation.
You should understand that an OSGi framework such as Apache Felix only offers an implementation of the OSGi Core Specification. When you want to use the compendium or enterprise services, you will need implementations for those as well. The great thing about this is that you never bloat your runtime with anything that you are not using. Compare that to traditional application servers.
OSGi in the Real World
OSGi has a very long tradition of being used in all kinds of different systems. Its history started in embedded systems such as cars and home automation gateways, where modularity was mostly envisioned as a way for modules from different vendors to coexist in the same framework, giving them independent lifecycles so they could be added and updated without having to constantly reboot the system. Another well known example is the Eclipse IDE. Eclipse is built entirely on top of OSGi, which enables the Eclipse plug-in system. While Eclipse is probably the best known OSGi example, you can consider it the worst example as well. There are many Eclipse-specific design decisions that were necessary to support the Eclipse ecosystem but can be considered bad practices for most other applications.
In more recent years, we have seen a move toward the enterprise world of software, starting with application servers that based their core inner constructs on OSGi, such as Oracle GlassFish Application Server and IBM WebSphere® Application Server. The main reasons for this move to OSGi were to isolate the complexity of various parts of the application server and to optimize startup speed along the way.
There are also countless examples of applications built on top of OSGi. Just like with most other technology, it is hard to find exact usage numbers, but there is a very active user base. The authors of this book use OSGi as the primary technology for most of their projects, including some very large, high-profile applications.
In the past few years, a lot of progress has been made on OSGi tooling. When considering a good tool for OSGi development, we have a list of requirements it should take care of:
- Automatically generate bundle JAR files
- Generate package imports using byte code analysis
- Provide an easy way to start an OSGi container to run and test our code
- Hot code updates in a running container for a fast development turnaround
- Run in-container integration tests
- Help with versioning of bundles
Bndtools is by far the tool that fits these requirements best, and we strongly advise using Bndtools for OSGi development. Bndtools is an Eclipse plug-in focused on OSGi development. Under the hood, it’s based on BND, a library and set of command-line tools to facilitate OSGi development. Bndtools brings this to the Eclipse IDE. When using Bndtools, you don’t have to use Maven for builds, although there is some integration. Bndtools instead generates Ant build files that can be used out of the box for headless/offline builds, on a build server, for example. Bndtools also provides wizards for editing manifest files and supports repositories.
Similar to Maven, Bndtools supports the concept of repositories. Repositories are basically a place where bundles are stored and indexed. A Maven repository needs external metadata (the POM file), however, whereas in OSGi the metadata is already included in the bundle itself. A repository in Bndtools contains OSGi bundles that you can use within your project. We will delve deeper into using and setting up repositories later.
The most appealing feature of Bndtools is running an OSGi container directly from the IDE enabling hot code updates. This improves development speed enormously and would almost justify the use of OSGi by itself. We will use Bndtools in the remainder of this book, and we strongly recommend you do so as well.
The BND Maven plug-in can generate bundles and calculate package imports directly from a Maven build. Similar to Bndtools, the plug-in is based on the underlying BND library, and the code analysis used to generate package imports is the same. Other manifest details such as the bundle name and version are extracted from the POM file. This works fine, and many projects use this approach. The major downside is Maven itself. Maven is a decent build tool, but it does a very bad job at facilitating the development process. Fast development requires a fast turnaround of code changes, and this is very difficult to achieve with Maven. Because OSGi bundles already contain all the metadata required to set up dependencies, etc., there is actually no real reason to still use Maven.
Eclipse Tycho is another Eclipse plug-in that facilitates OSGi development. It is more closely tied to Eclipse plug-in development and less fit for general OSGi development. Tycho is also more an addition to Maven than a complete development environment. It helps, for example, to build p2 installation sites, which again is very Eclipse specific.
NetBeans and IntelliJ
NetBeans currently offers only very basic support for OSGi, as described on the NetBeans Wiki. IntelliJ doesn’t offer support for OSGi out of the box. However, both IDEs support Maven, and using the Maven Bundle plug-in, you can develop OSGi projects in much the same way as any other type of Maven-based project. Although this works just as well as it does for other types of Java projects, this model misses features that Bndtools offers, such as dynamic bundle reloads and editors for bundle descriptors. Without those features, the development experience is considerably poorer, and you don’t get the full potential out of OSGi. Even if you are a NetBeans or IntelliJ user, we recommend trying Bndtools with Eclipse. Hopefully, OSGi support in NetBeans and IntelliJ will improve in the near future.