NoSQL technologies are built to solve business problems, not just “wrangle big data”

Knowing the architectures is key to thinking strategically and delivering value.

By Dale Kim

February 3, 2016

It's raining pencils... (source: Adrien Lebrun on Flickr)

If you’re an application developer, then you probably have a good grasp of NoSQL databases and why they get so much hype. Usually, the impetus for moving from a relational database to a NoSQL technology is that you are hitting the performance limits of your current database. Understanding NoSQL options before hitting the failure point of current technology enables you to formulate a strong data strategy. You can choose the right tool for the job and derive greater value from your data. And if you’re deploying enterprise-wide data-as-a-service systems (often a strong option for non-tech companies), knowing the strengths and limitations of the underlying technologies helps you better plan how you deploy your data.

As a business decision maker, you don’t have to understand all the intricacies of the NoSQL technologies on the market, but a high-level overview can give you the background you need to make better decisions when selecting technologies to solve your business problems.


Types of NoSQL databases

Unlike relational database management systems (RDBMS), which all model data the same way with relational schemas, there are different types of NoSQL databases that offer different ways to model and store data. Here are the major types:

Key-value databases, as the name suggests, map keys to values. The “key” is simply an identifier, and the value tends to be implemented as an opaque binary object that’s decoded by the database application, similar to the way an RDBMS deals with large data objects like images, sound files, or big chunks of unstructured text. Key-value databases enable quick access to data but the storage schema doesn’t embody relationships among the data.
Document databases (also known as document-oriented databases) store data in a format such as XML or JSON. These are referred to as self-describing formats because they include descriptive labels for the data they store. Document databases are good for storing data that is hierarchical and nested, like books and other text-heavy content.
Wide column stores are somewhat similar to the relational model in that they store data in tables, but they are much more flexible. Relational databases require every row in a table to share the same fixed set of columns, which is an important constraint for the sake of data integrity, but there are environments where adding columns on the fly is useful. Wide column databases are far more efficient for storing records that have different sets of columns.
Graph databases use a branch of mathematics known as graph theory. Graph theory represents entities as “vertices” connected by “edges.” The edges show relationships between entities. Examples include airline route networks or “friend of a friend” relationships in social networks.
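To make the differences concrete, here is a minimal sketch of how the same kind of record might look under each model. It uses plain Python dictionaries and sets as stand-ins for real databases, and every name in it (the `user:42` keys, the field names, the sample values) is a hypothetical chosen for illustration, not the API of any particular product.

```python
import json

# Key-value: the store sees only an opaque byte string per key;
# the application is responsible for decoding it.
kv_store = {}
kv_store["user:42"] = json.dumps({"name": "Ada", "city": "London"}).encode("utf-8")
decoded = json.loads(kv_store["user:42"].decode("utf-8"))

# Document: a nested, self-describing structure whose labels travel
# with the data, so fields inside the document can be addressed directly.
doc_store = {
    "user:42": {
        "name": "Ada",
        "addresses": [{"type": "home", "city": "London"}],
    }
}
home_city = doc_store["user:42"]["addresses"][0]["city"]

# Wide column: rows live in one table, but each row carries its own
# set of columns, so new columns can appear on the fly.
wide_rows = {
    "user:42": {"name": "Ada", "city": "London"},
    "user:43": {"name": "Lin", "twitter": "@lin", "last_login": "2016-02-03"},
}
columns_42 = sorted(wide_rows["user:42"])
columns_43 = sorted(wide_rows["user:43"])

# Graph: entities are vertices, and relationships are stored as edges.
vertices = {"user:42", "user:43"}
edges = {("user:42", "follows", "user:43")}
```

Note that the first three shapes can hold essentially the same information. What differs is how much of the structure the database itself can see and query.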

In theory, each of these types is good for specific types of data and use cases, but in practice, many different types of data can be handled by any of the first three types of NoSQL databases. NoSQL users tend to choose their technology for other features and characteristics, and typically worry less about the primary supported data model. That simply means it is up to the application developer to model their data to fit the type of NoSQL database. It is up to you as the business owner to define the business requirements (response SLAs, uptime, data sources and types, etc.) and then match the right technology to your needs.

Graph databases are more specialized than the other three types. They are great for social network analysis but less practical for general database workloads. If you have business requirements around quickly identifying links between many different entities, a graph database is typically far faster and more efficient than any other type of database.
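As a rough illustration of the kind of link-finding query involved, the sketch below answers a “friend of a friend” question over a tiny in-memory graph. The adjacency dictionary, the names in it, and the `friends_of_friends` helper are all assumptions made for this example. A real graph database would store and index the edges itself and expose its own query language.

```python
# Hypothetical social graph: each person maps to the set of their direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """Return people exactly two hops away: friends of friends who are
    neither the person themselves nor already a direct friend."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        # Follow each edge one step further.
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


print(sorted(friends_of_friends(friends, "alice")))  # ['dave', 'erin']
```

Each hop here is a direct lookup along an edge. In a relational schema, the same question is usually expressed as a self-join per hop, which is part of why relationship-heavy queries tend to favor a graph model.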

Advantages of NoSQL

So, why should you go with a NoSQL database? As I mentioned above, the primary reason is that you have data volumes that are hitting the performance limits of your RDBMS. This can be resolved with two main characteristics of NoSQL databases: data flexibility and scalability. If you need to access a wide variety of data formats in a single system, your developers might end up spending an inordinate amount of time trying to create a standardized schema that can accept all the different source data formats. And if your data simply grows at an ongoing and unpredictable rate, the cost of scaling your RDBMS to handle that load can be prohibitive. NoSQL databases provide the data flexibility and scalability to handle your big data environment in a cost-effective manner.

NoSQL database must-haves

What should you look for in a NoSQL database? There’s no simple answer to that because your requirements may differ significantly from other NoSQL users. In addition to performance (i.e., throughput and latency), some key questions you should be asking include:

How well can the system scale? If you’re looking at tens of gigabytes of data, then you’ll probably be fine with any NoSQL technology. Once you start managing many terabytes, or even petabytes, of data, you’ll need a system that was designed for such volume.
What access controls does the system include? If your data requires different levels of access for different roles and users, built-in security (access controls) is a must.
How does the system integrate with Hadoop? Since most enterprises today will deploy an analytics system for data stored in NoSQL, tight integrations with Hadoop are often necessary to more efficiently derive business-critical insights.
What administrative tasks are required? Some systems require intensive disk cleanup tasks, which impact the overall uptime of your system, so seeking automated optimizations is important for running a production environment.

Disadvantages of NoSQL

This all sounds good, but there are some disadvantages. If a basic tenet of NoSQL is about using the right tool for the job, then certainly there are jobs for which NoSQL isn’t ideal. For example, you can’t use NoSQL as a drop-in replacement for an RDBMS. It doesn’t have full SQL support, and you don’t get the multi-row transactional guarantees (known as “ACID transactions”) that make an RDBMS so powerful. There is also no API standard across NoSQL technologies (i.e., no analog to the SQL query language), so writing applications that can be easily ported to other NoSQL databases is a challenge. But NoSQL continues to flourish because of the significant advantages it provides when it comes to cost-effective scaling and performance.
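To show what a multi-row transactional guarantee buys you, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for an RDBMS. The `accounts` table, the sample balances, and the `transfer` helper are assumptions made for illustration. The transfer touches two rows, and when the second update fails, the first is rolled back with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back every statement in it if one raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was applied.
        return False


ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # False {'alice': 100, 'bob': 50}
```

Without this guarantee, the application itself has to detect the failed second write and undo the first one, which is the extra burden to weigh when a workload depends on updating several records together.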

Conclusion

Databases are complex products, and NoSQL adds yet another new dimension to data management. But if you take a little time to plan carefully, and map some of the NoSQL advantages to your existing challenges as well as to your existing aspirations, you will have a smoother experience with the latest in database technology.
