Chapter 4. GraphQL Servers

At a high level, a GraphQL server is responsible for responding to queries from clients. It’s typically fronted by an HTTP server that listens at an endpoint such as https://api.myserver.com/graphql. Clients (whether GraphQL-specific or otherwise) make requests to that endpoint, and the server responds.

A GraphQL server is composed of two parts: an HTTP server and a GraphQL engine, as shown in Figure 4-1.

Figure 4-1. GraphQL server

The core GraphQL engine accepts the schema definition upon instantiation, builds the type schema, and allows you to execute queries against that schema. This is a library of code implemented in many common programming languages.

The HTTP server accepts the GraphQL queries and then passes them to the core GraphQL engine. When the engine responds, the HTTP server then passes the JSON response back to the client.

Let’s discuss this in more detail.

Building a Type Schema

The GraphQL schema definition is just text. It’s like code that hasn’t been compiled yet. It’s pretty useless on its own sitting in a text file or as a string in memory somewhere. For the GraphQL schema to be of any use, it must be loaded into a GraphQL server upon instantiation. Here’s how you’d do it with GraphQL.js, which is Facebook’s reference implementation and the de facto standard:

var { graphql, buildSchema } = require('graphql');

// Construct a schema, using GraphQL schema language
var schema = buildSchema(`
	type Image {
		url: String!
		altText: String
	}

	type Product {
		id: ID!
		brandIconURL: String
		name: String!
		description: String!
		price: Float!
		ableToSell: Boolean!
		averageReview: Float
		numberOfReviews: Int
		maxAllowableQty: Int
		images: [Image]
	}

	type Query {
		product: Product
	}
`);

// Each Product field gets a resolver function that provides its value
var product = {
  id: () => { return '94695736'; },
  brandIconURL: () => { return 'https://www.legocdn.com/images/disney_icon.png'; },
  name: () => { return 'The Disney Castle'; },
  description: () => { return 'Welcome to the magical Disney Castle!....'; },
  price: () => { return 349.99; },
  ableToSell: () => { return true; },
  averageReview: () => { return 4.5; },
  numberOfReviews: () => { return 208; },
  maxAllowableQty: () => { return 5; },
  images: () => { return [
    {url: 'https://www.legocdn.com/images/products/94695736/1.png', altText: 'Fully assembled castle'},
    {url: 'https://www.legocdn.com/images/products/94695736/2.png', altText: 'Castle in the box'}
  ]; },
};

// The root provides a resolver function for each field on the Query type
var root = {
  product: () => { return product; },
};

This example is loosely based on the one found in the GraphQL.js tutorial.

In this very simple example, we built a schema around a single Product type, along with the Image type it references and the Query type that exposes it. We cover this shortly, but each field must have a corresponding “resolver,” which is a function that provides the value for the field. In this example, the data is hardcoded, but in real-world code you’d call the various REST APIs to retrieve the data needed for each field.

At this point, the server is instantiated but is not yet ready to accept queries.

HTTP Request Handling

The GraphQL engine is responsible for building the type schema and parsing/validating/executing queries, but it doesn’t provide the functionality required to accept and respond to queries. You need a way to pass queries to the library and for the responses to be passed back to the clients given that the clients are always physically separated from the servers. Although you could front your GraphQL engine with anything that can be used to communicate with a client, an HTTP server makes the most sense and is by far the default within the GraphQL community.

Note

The GraphQL specification doesn’t include anything in front of the GraphQL engine. HTTP servers, authentication, authorization, caching, and so on are all completely beyond the scope of the specification.

Fronting the engine with an HTTP server gives you a natural place to handle the following:

  • Authentication

  • Authorization

  • Protection from malicious queries

  • Monitoring

  • Metrics collection

  • Health checking

And so on. There are hundreds of feature-rich, mature HTTP servers available, so it makes sense to take advantage of that ecosystem to front your GraphQL engine.

Most GraphQL servers are written in JavaScript and therefore require a JavaScript-based HTTP server such as Express, Koa, or Hapi. You can easily embed these HTTP servers in your application. You can also layer your HTTP servers with other intermediaries that are capable of working with HTTP. For example, you could put AWS Elastic Load Balancer (ELB) in front and have that route HTTP requests down to Express. AWS ELB could have the logic for security, monitoring, and more, and Express could serve as a pass-through to the GraphQL engine.

By convention, GraphQL is exposed at /graphql. All requests are sent to that single URI, typically as HTTP POST, though HTTP GET is often used as well.

Here are the variables that you’ll need to post:

query

The actual query, like "query ($id: ID!) { product(id: $id) { price } }". Any variable the query uses, such as $id, must be declared at the top of the operation. This is required.

operationName

The name of the operation to execute, in case the query string contains multiple operations. Recall that in Chapter 2 we discussed that multiple queries are possible in the same string. This is required only if multiple operations are provided.

variables

A map of key/value pairs that supply values for the variables declared in the query.

Here’s an example of what you’d post to a /graphql URI over HTTP POST:

{
  "query": "query ($id: ID!) { product(id: $id) { price } }",
  "variables": { "id": "94695736" }
}
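As a rough sketch of what the HTTP layer might do with that payload before handing it to the engine, the following function checks the three variables described above. The function name and error messages are illustrative assumptions; they do not come from GraphQL.js or any HTTP framework.

```javascript
// Hypothetical helper: validates the JSON body of a POST to /graphql.
function parseGraphQLRequest(rawBody) {
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch (err) {
    throw new Error('Request body is not valid JSON');
  }
  if (body === null || typeof body !== 'object') {
    throw new Error('Request body must be a JSON object');
  }
  // "query" is the only required variable
  if (typeof body.query !== 'string' || body.query.trim() === '') {
    throw new Error('"query" is required and must be a non-empty string');
  }
  const operationName = body.operationName ?? null;
  const variables = body.variables ?? {};
  if (operationName !== null && typeof operationName !== 'string') {
    throw new Error('"operationName" must be a string when provided');
  }
  if (typeof variables !== 'object' || Array.isArray(variables)) {
    throw new Error('"variables" must be a map of key/value pairs');
  }
  return { query: body.query, operationName, variables };
}

const request = parseGraphQLRequest(JSON.stringify({
  query: 'query ($id: ID!) { product(id: $id) { price } }',
  variables: { id: '94695736' },
}));
console.log(request.variables.id); // prints 94695736
```

Rejecting malformed requests at this layer keeps obviously bad input away from the engine, which still performs its own parsing and validation.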

Next, it’s time for the GraphQL engine to parse the query.

Parsing Queries

Upon receiving a query, the GraphQL engine immediately parses it to an abstract syntax tree (AST). An AST is basically a parsed version of the query with extra metadata around arguments, data types, location of code, and so on. Visually, an AST looks like a graph, as demonstrated in Figure 4-2.

Figure 4-2. AST

Conversion to AST is required by the GraphQL specification.

A very simple GraphQL query looks something like this:

query {
	product(id: "100001", locale: "en_US") {
		name
	}
}

In an AST, it would be represented as shown in Figure 4-3.

The full AST representation of this short query is 120 lines long. Conversion to an AST is something that the GraphQL server does automatically. You will rarely need to see or deal with an AST directly; it’s largely internal to the GraphQL server.

Figure 4-3. An AST representation of a sample query (from https://astexplorer.net)
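To give a feel for what the engine produces, here is a heavily abbreviated sketch of the AST for the query above, written as a plain JavaScript object. The node kinds and property names follow the ones GraphQL.js uses, but location metadata and several other properties are left out, which is why this is far shorter than the full tree. The collectFieldNames helper is a hypothetical illustration of walking the tree, not part of any library.

```javascript
// Abbreviated AST for: query { product(id: "100001", locale: "en_US") { name } }
const ast = {
  kind: 'Document',
  definitions: [{
    kind: 'OperationDefinition',
    operation: 'query',
    selectionSet: {
      kind: 'SelectionSet',
      selections: [{
        kind: 'Field',
        name: { kind: 'Name', value: 'product' },
        arguments: [
          { kind: 'Argument', name: { kind: 'Name', value: 'id' },
            value: { kind: 'StringValue', value: '100001' } },
          { kind: 'Argument', name: { kind: 'Name', value: 'locale' },
            value: { kind: 'StringValue', value: 'en_US' } },
        ],
        selectionSet: {
          kind: 'SelectionSet',
          selections: [
            { kind: 'Field', name: { kind: 'Name', value: 'name' }, arguments: [] },
          ],
        },
      }],
    },
  }],
};

// Hypothetical helper: walks the tree depth-first and lists every field name.
function collectFieldNames(node, names = []) {
  if (node.kind === 'Field') names.push(node.name.value);
  const children = [
    ...(node.definitions || []),
    ...(node.selections || []),
    ...(node.selectionSet ? [node.selectionSet] : []),
  ];
  children.forEach((child) => collectFieldNames(child, names));
  return names;
}

console.log(collectFieldNames(ast)); // prints [ 'product', 'name' ]
```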

Validating

Next, the GraphQL server takes the newly generated AST and checks it for errors. The GraphQL specification is very particular about validation. It starts out the validation section by stating the following:

GraphQL does not just verify if a request is syntactically correct, but also ensures that it is unambiguous and mistake‐free in the context of a given GraphQL schema.

Common errors include:

  • Referencing types or fields that don’t exist

  • Missing required parameters or fields

  • Data in the wrong format (e.g., a String when the schema called for a Float)

The GraphQL specification is explicit in stating that only queries that have been fully validated should be executed.
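The following toy validator illustrates two of those checks: unknown fields and missing required arguments. It is a deliberately simplified sketch. The schema and query are represented as plain objects invented for this example, whereas a real engine validates the AST against the type schema using the full rule set in the specification.

```javascript
// Hypothetical, simplified schema description: type -> field -> definition
const schemaFields = {
  Query: { product: { type: 'Product', requiredArgs: ['id'] } },
  Product: {
    name: { type: 'String', requiredArgs: [] },
    price: { type: 'Float', requiredArgs: [] },
  },
};

// selection shape (also hypothetical): { fieldName: { args: {...}, fields: {...} } }
function validateSelection(typeName, selection, errors = []) {
  for (const [fieldName, { args = {}, fields = {} }] of Object.entries(selection)) {
    const fieldDef = (schemaFields[typeName] || {})[fieldName];
    if (!fieldDef) {
      errors.push(`Cannot query field "${fieldName}" on type "${typeName}".`);
      continue;
    }
    for (const argName of fieldDef.requiredArgs) {
      if (!(argName in args)) {
        errors.push(`Field "${fieldName}" argument "${argName}" is required.`);
      }
    }
    // Recurse into the sub-selection using the field's declared type
    validateSelection(fieldDef.type, fields, errors);
  }
  return errors;
}

const validationErrors = validateSelection('Query', {
  product: { fields: { name: {}, colour: {} } }, // no id argument, unknown field
});
console.log(validationErrors.length); // prints 2
```

Only when the list of errors is empty would the query move on to execution.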

Note

This is another example of how GraphQL is different from REST. Unless you implement custom validation for each REST API, most applications will execute any request that passes XML or JSON validation.

Executing Queries

Now that the server has been started and the query has been parsed and validated, it’s time to actually execute it.

The GraphQL server starts by crawling the AST it produced earlier. It then executes what are called “resolver” functions for each type and field to retrieve the data from the underlying source. Finally, the GraphQL server constructs a single JSON response that is passed back to the HTTP server in front.

Let’s spend some time on resolvers because they’re really the heart of GraphQL. A resolver is the function that calls the underlying REST APIs, databases, legacy backends, or any other source of data.

Resolvers return that data in the format the schema calls for. They shouldn’t contain any business logic, because GraphQL is strictly an intermediary.

Here’s an example of a simple query:

query {
	product(id: "94695736", locale: "en_US") {
		name
	}
}

And here’s a simple response:

{
	"data": {
		"product": {
			"name": "The Disney Castle"
		}
	}
}

In the GraphQL server, there’s a resolver function for the “name” field that returns the actual name of the product. Here’s what that function would look like in GraphQL.js, the JavaScript-based GraphQL reference implementation from Facebook:

async name(obj, args, context, info) {
	if (context.product == null) {
		// Fetch the product once and cache it on context for other resolvers
		const resp = await fetch('https://api.myserver.com/product/94695736');
		context.product = await resp.json();
	}
	return context.product.name;
}

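In this example, name is the field name, obj is the parent object (in this example, the product that contains the name field), args are any arguments provided to the field (like locale: "en_US" on the product field), context is a catch-all object shared by all resolvers during a request that can be used to store long-lived objects of value to other resolver functions, and info carries details about the query being executed, such as the field name and the schema. Caching the product on context matters because you wouldn’t want to call a REST API for every field.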
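One way to avoid a REST call per field, sketched below under some assumptions, is to cache the in-flight request on context so that sibling resolvers share it. The fetchProduct function is a hypothetical stand-in for a real REST call, and the loadProduct helper is invented for this example.

```javascript
// Hypothetical stand-in for a REST call; counts how often it is invoked.
let restCalls = 0;
function fetchProduct(id) {
  restCalls += 1;
  return Promise.resolve({ id, name: 'The Disney Castle', price: 349.99 });
}

// Cache the promise (not just the result) so resolvers that run at the
// same time still share a single call.
function loadProduct(context, id) {
  if (!context.products) context.products = new Map();
  if (!context.products.has(id)) context.products.set(id, fetchProduct(id));
  return context.products.get(id);
}

const productResolvers = {
  name: (obj, args, context) => loadProduct(context, obj.id).then((p) => p.name),
  price: (obj, args, context) => loadProduct(context, obj.id).then((p) => p.price),
};

const requestContext = {}; // created once per incoming request
const productStub = { id: '94695736' };
Promise.all([
  productResolvers.name(productStub, {}, requestContext),
  productResolvers.price(productStub, {}, requestContext),
]).then(([name, price]) => console.log(name, price, restCalls));
// prints: The Disney Castle 349.99 1
```

Because the cache lives on context, it is discarded at the end of the request, so one user’s data never leaks into another user’s response.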

Server Implementations

GraphQL is a specification, not a specific implementation. Anyone can write a GraphQL server using whatever programming language and implementation methodology, so long as it adheres to the specification. As we’ve discussed, the GraphQL specification is fairly silent on how the internals of a GraphQL server should work.

Much of the GraphQL community uses GraphQL.js either directly or indirectly as the GraphQL server. GraphQL.js was released by Facebook (along with the original specification) in 2015 and is actively maintained by a large community of individual and corporate contributors. The entire Apollo ecosystem, which is the largest and most popular within the GraphQL community, is built around GraphQL.js. There really aren’t any other JavaScript-based implementations of GraphQL servers, and with JavaScript being the default client- and server-side programming language in the GraphQL community, GraphQL.js is the de facto standard.

Many organizations do not run much or any JavaScript on the server side, yet they still choose GraphQL.js because of how strong a GraphQL server implementation it is. Remember, there shouldn’t be any business logic in your GraphQL server. GraphQL is a layer over the top of your REST APIs, databases, legacy backends, or any other source of data. Most of the time, you call an API, parse the results, and return the data specified by each type or field. The programming language you use is mostly irrelevant. What matters is that you’re able to find developers and that you’re supported by a rich ecosystem of tooling. JavaScript checks both of those boxes.

If you want a non-JavaScript GraphQL server, there are implementations available in many programming languages including Scala, Go, Java, Ruby, Python, Haskell, and more.

When possible, it’s best to use a GraphQL client and server provided by the same vendor because of the additional functionality provided outside of the GraphQL specification. Remember that caching, security, monitoring, testing, and other aspects are completely beyond the scope of the GraphQL specification. Many of those features require making changes to both the client and the server, with both pieces working together. Furthermore, some frontend frameworks like Relay require that the GraphQL server have additional functionality that’s outside the GraphQL specification. For example, users of Sangria, the GraphQL implementation written in Scala, must add an additional library to support Relay.

Now that we’ve covered the fundamentals of GraphQL servers, let’s discuss some of the additional features that are available in various implementations.

Monitoring

Like all services in production, you must monitor GraphQL for availability, functionality, and performance. Unlike REST APIs, for which each endpoint can be monitored separately, all GraphQL requests are GETs or POSTs to /graphql. A query can be for one field or for an entire product catalog.

The key to monitoring GraphQL is to monitor your resolver functions because that’s where the real work happens. The GraphQL server itself is very unlikely to fail. Within each resolver function, it’s best practice to log a correlation ID, the operation type, and the query complexity (which we discuss soon) to a log file and/or a time series database like InfluxDB or Prometheus. You then can layer an analytics platform like Grafana on top to aggregate and analyze the data.
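A minimal sketch of that idea is a wrapper that times each resolver and reports the result to a sink. The instrument function and the record callback are assumptions made for this example; in practice the sink might write to a log file or a time series database, and you would likely add the operation type and query complexity to each sample.

```javascript
// Wraps a resolver so every call reports its duration to `record`.
function instrument(fieldName, resolver, record) {
  return function instrumented(obj, args, context, info) {
    const startedAt = Date.now();
    const done = () => record({
      field: fieldName,
      correlationId: context.correlationId,
      durationMs: Date.now() - startedAt,
    });
    let result;
    try {
      result = resolver(obj, args, context, info);
    } catch (err) {
      done(); // still record resolvers that throw
      throw err;
    }
    if (result && typeof result.then === 'function') {
      // Asynchronous resolver: report once the promise settles
      return Promise.resolve(result).finally(done);
    }
    done();
    return result;
  };
}

const samples = [];
const timedName = instrument(
  'Product.name',
  () => 'The Disney Castle',
  (sample) => samples.push(sample)
);
timedName({}, {}, { correlationId: 'req-123' }, {});
console.log(samples[0].field, samples[0].correlationId); // prints: Product.name req-123
```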

Per the GraphQL specification, the fields of a query may be resolved concurrently, whereas the top-level fields of a mutation must be resolved serially. Therefore, the response time of any given GraphQL query is roughly bounded by its slowest chain of resolvers, and the response time of any given GraphQL mutation is roughly the sum of its top-level fields.

Commercial GraphQL vendors have full monitoring solutions in place already, so it’s best to use that functionality if it’s available.

Testing

Every time you touch your schema, resolvers, or underlying data sources (REST APIs, databases, legacy backends, or any other source of data) you need to retest your entire GraphQL layer to make sure you didn’t introduce any errors.

Fortunately, GraphQL is easy to test in local environments as well as in integration or QA environments. With GraphQL, you have a fixed set of inputs (queries, mutations, and variables) and a fixed set of outputs (in nicely formatted JSON). It’s easy to write test cases with real or mocked data that exercise every type and field in your schema. Writing tests should be mandatory for every new type and field introduced to your schema.

If you want to provide your frontend developers with some mocked data so that they can build their frontends as the backend is being built in parallel, it’s easy to have each resolver return some mocked data. If you want to test resolver functionality in isolation, outside of the GraphQL server, you can call each resolver independently because they’re self-contained functions with a few inputs and a fixed output.

Some commercial GraphQL server vendors have features that allow you to replay all transactions from a past period of time (usually a few days) against your changed GraphQL schema to make sure nothing broke. You can even integrate these tests into your CI pipeline with every change to the schema.

Everyone’s testing needs are unique, and there are enough commercial products and open source tools available to meet your needs.

Security

A topic of particular importance to those adopting GraphQL is security. GraphQL’s centralization makes security easier (authentication, authorization, etc.) but also more challenging (expensive queries, destructive mutations, etc.).

Let’s explore the security-related topics that you’ll need to address.

Authentication

Though the /graphql URI is often publicly reachable, you don’t want just anyone to be able to call it. Users should be required to properly authenticate. Authentication ensures that a user, whether a human or another system, is who they claim to be.

Authentication should be performed in a layer that sits atop your GraphQL server. The HTTP servers embedded within a typical GraphQL server tend to be fairly minimalist and therefore might not support the additional authentication-related features that a more robust HTTP server/load balancer/reverse proxy would be able to support. You also want to shield your GraphQL server from abusive queries and denial of service attacks.

Because GraphQL is served over HTTP, you can take advantage of all of the common authentication schemes and tooling available for traditional REST APIs. See APIs for Modern Commerce (O’Reilly 2017) for more information.

Authorization

After you’ve authenticated your client, you must now authorize that client to call specific operations (query, mutation, subscription, introspection), specific types (products, orders, inventory, etc.) and specific fields (price, quantity, availableToSell).

Here are some common business rules you’d want to implement:

  • The merchandising team should be able to view only product catalog–related data. Any data related to an individual customer shouldn’t be viewable.

  • Only connections made from the CSR application should be allowed to add credits to a customer’s account.

  • Administrators should be able to do anything.

  • Nobody should be allowed to call the deleteAllOrders mutation in a production environment.

If you recall from earlier in the chapter, each resolver is passed a context object of some sort, which is used to access long-lived objects, database connections, and other objects that are of value to all resolvers. That context object can also be instantiated with a user object. That user object could have the following:

  • User ID/name

  • Role(s)

  • Organization

Within each resolver, you can then apply limited business logic. Here’s a very simple example of how you’d prevent merchandising team members from viewing the products field of an order:

products(obj, args, context, info) {
	// Deny access when no user is attached or the user is on the merchandising team
	if (context.user == null || context.user.role === "merchandising") {
		return null;
	}
	return context.order.products;
}

If the user isn’t attached to context or the user has the wrong role, you could return null (as in this example), an empty value/array, or throw an error to the client. It’s up to you.

In this example, we’re using GraphQL.js, but the concepts are the same regardless of your GraphQL server implementation.
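Rather than repeating that check in every resolver, you could factor it into a small wrapper. The sketch below is one possible approach, not a feature of GraphQL.js; the allowRoles helper and the role names are assumptions made for illustration.

```javascript
// Hypothetical helper: runs the resolver only when the user attached to
// context has one of the allowed roles; otherwise returns null.
function allowRoles(allowedRoles, resolver) {
  return function guarded(obj, args, context, info) {
    const user = context.user;
    if (user == null || !allowedRoles.includes(user.role)) {
      return null;
    }
    return resolver(obj, args, context, info);
  };
}

const orderResolvers = {
  products: allowRoles(['csr', 'admin'], (obj) => obj.products),
};

const sampleOrder = { products: ['94695736'] };
console.log(orderResolvers.products(sampleOrder, {}, { user: { role: 'merchandising' } }, {}));
// prints null
console.log(orderResolvers.products(sampleOrder, {}, { user: { role: 'admin' } }, {}));
// prints [ '94695736' ]
```

Listing the roles that are allowed, rather than the ones that are denied, means a newly introduced role has no access until someone grants it.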

Expensive Queries

One of the benefits of GraphQL is that it allows you to query and mutate large amounts of data with a single line of text. Can you imagine the amount of work a GraphQL server would need in order to serve this response?

query lotsOfData {
	allProducts
	allSKUs
	allCategories
	allCustomers
	allOrders
}

That single query would quickly peg any server’s CPU. The response size would be well into the gigabytes. If someone were to accidentally run that a few times, it could very quickly bring down an entire GraphQL server.

Another challenge with GraphQL is that its ability to traverse a graph can lead to deep recursions that burn valuable resources. You could define a product with a reference to category and a category with a reference to all of its products as follows:

type Product {
	category: Category!
}

type Category {
	products: [Product]
}

Now imagine a query like this:

query somethingMalicious {
	allProducts {
		category {
			products {
				category {
					products {
						category
					}
				}
			}
		}
	}
}

Fortunately, there are well-established ways to protect your GraphQL server from overly complex queries, whether malicious or not.

Query size limits

Before the query even hits your GraphQL server, you can filter out unusually large HTTP requests. Large could be defined as follows:

  • Number of characters

  • Number of bytes

  • Number of unique types and/or fields requested

This filtering can be done in the HTTP server above your GraphQL server.

Size alone is a crude signal, because a short query can still be very expensive, but requests that are dramatically larger than others should be filtered. You might want to block all HTTP requests that are larger than 250 kilobytes, for example.
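A byte-based check of this kind can be sketched in a few lines. The function name is hypothetical, and the limit simply reuses the 250-kilobyte figure from the example above; Buffer is the built-in Node.js global.

```javascript
const MAX_REQUEST_BYTES = 250 * 1000; // assumed limit: 250 kilobytes

// Returns true when the raw request body exceeds the byte limit.
function isRequestTooLarge(rawBody, maxBytes = MAX_REQUEST_BYTES) {
  return Buffer.byteLength(rawBody, 'utf8') > maxBytes;
}

console.log(isRequestTooLarge('{"query":"{ product { price } }"}')); // prints false
console.log(isRequestTooLarge('x'.repeat(300000)));                  // prints true
```

Most HTTP servers and load balancers expose an equivalent setting, so in practice you would usually configure the limit there rather than write it yourself.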

Timeouts

Another way to protect your GraphQL server is to kill queries that are taking too long to execute. If a query is running for 10 seconds, something is probably wrong and the query should be terminated.

It’s best to terminate long-running queries both at the HTTP server above the GraphQL server and within the GraphQL server itself. Depending on the implementation, it is possible to halt the execution of the resolvers, but there might be unexpected errors as connections to data sources are abruptly terminated.
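Within the GraphQL server, one simple way to stop waiting on a slow query is to race its promise against a timer. The sketch below assumes the query execution is available as a promise; withTimeout is a hypothetical helper, and slowQuery is a stand-in for a real execution.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Query timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; always clean up the timer afterward
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in for a query that takes 200 ms to execute
const slowQuery = new Promise((resolve) => setTimeout(() => resolve('done'), 200));
withTimeout(slowQuery, 50).catch((err) => console.log(err.message));
// prints: Query timed out after 50 ms
```

Note that this only stops the caller from waiting. The resolvers behind the original promise keep running unless the implementation offers a way to cancel them.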

Allowlists

Rather than allow all GraphQL queries, another approach is to create an allowlist of acceptable queries. Only queries on the list are allowed to be executed.

If you have control over your clients, you can run tools like PersistGraphQL to analyze your code and pull out any GraphQL queries. Those queries are then added to the list. If you were to pull up GraphiQL and arbitrarily execute queries, they wouldn’t work.

Another option is to watch the queries executed over the course of a week and build an allowlist from that. Subsequent queries not on that allowlist wouldn’t be allowed to be executed.
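At its simplest, an allowlist is a set lookup performed before the query reaches the engine. The following sketch uses hypothetical helper names and compares queries after collapsing whitespace; a production setup would more likely key on a hash or an ID for each query.

```javascript
// Collapse whitespace so formatting differences do not defeat the lookup.
function normalizeQuery(query) {
  return query.replace(/\s+/g, ' ').trim();
}

function buildAllowlist(queries) {
  return new Set(queries.map(normalizeQuery));
}

function isAllowed(allowlist, query) {
  return allowlist.has(normalizeQuery(query));
}

const allowlist = buildAllowlist([
  'query ($id: ID!) { product(id: $id) { name price } }',
]);

console.log(isAllowed(allowlist, 'query ($id: ID!) {\n  product(id: $id) { name price }\n}'));
// prints true
console.log(isAllowed(allowlist, '{ allOrders }'));
// prints false
```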

Query depth

Another option for protecting your GraphQL server is to check the depth of your queries before you execute them. Using something like graphql-depth-limit, you can see how many levels deep your queries are.

For example, this query is one level deep:

query somethingMalicious {
	allProducts
}

This query is two levels deep:

query somethingMalicious {
	allProducts {
		category
	}
}

And so on.

Part of the value of GraphQL is that you can form these large, nested queries. But there needs to be a limit. A query five levels deep is verging on abusive. A query 10 levels deep is definitely abusive.

An advantage of checking query depth is that it forces your developers to come up with more elegant solutions. Nobody wants to write or debug a query that’s five levels deep.
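To make the counting concrete, here is a rough approximation that measures depth by tracking how deeply the selection sets are nested in the query text. It is only a sketch: it ignores fragments and comments, and the function name and limit are assumptions. Tools such as graphql-depth-limit work from the parsed query rather than from raw text.

```javascript
// Counts the deepest nesting of selection sets ({ ... }) in a query string.
function estimateDepth(query) {
  let depth = 0;
  let maxDepth = 0;
  let parens = 0;        // braces inside (...) belong to argument values
  let inString = false;
  for (let i = 0; i < query.length; i++) {
    const ch = query[i];
    if (inString) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '(') parens++;
    else if (ch === ')') parens--;
    else if (parens === 0 && ch === '{') maxDepth = Math.max(maxDepth, ++depth);
    else if (parens === 0 && ch === '}') depth--;
  }
  return maxDepth;
}

const MAX_DEPTH = 5; // assumed limit
const nestedQuery = 'query { allProducts { category { products { category { products { category } } } } } }';
console.log(estimateDepth('query { allProducts }'));   // prints 1
console.log(estimateDepth(nestedQuery) > MAX_DEPTH);   // prints true
```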

Query complexity/cost

Similar to query depth, you can estimate the cost of a query before it’s executed by looking at its complexity. Here are some factors that influence a query’s complexity:

  • Nesting depth

  • How many fields are requested

  • How many types are requested

Assuming defaults from graphql-query-complexity, the following query would have a cost score of 2:

query {
	product(id: "94695736", locale: "en_US") { # cost=1
		name                               # cost=1
	}
}

You could assign a higher weight to specific fields. For example, retrieving a product and its attributes is pretty straightforward. Now let’s add in price:

query {
	product(id: "94695736", locale: "en_US") { # cost=1
		name                               # cost=1
		description                        # cost=1
		brandIconURL                       # cost=1
		price                              # cost=5
	}
}

This query comes out with a cost of 9. Price has a higher complexity score because retrieving it requires another call to a different REST API. Network hops are always more expensive and therefore cost more.

You then can set a maximum cost for a query. For example, you could set a cost limit of 100. As with query depth, enforcing a maximum cost forces your developers to come up with more elegant solutions.
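The arithmetic behind that score can be sketched as follows. This is not the algorithm used by graphql-query-complexity; it is a simplified illustration in which the query is represented as a nested object and the weights table is an assumption made for this example.

```javascript
const fieldWeights = { price: 5 }; // assumed weights; unlisted fields cost 1
const MAX_COST = 100;              // assumed limit, from the example above

// selection: nested object, e.g. { product: { name: true, price: true } }
function estimateCost(selection) {
  let cost = 0;
  for (const [field, sub] of Object.entries(selection)) {
    cost += fieldWeights[field] ?? 1;
    if (sub && typeof sub === 'object') cost += estimateCost(sub);
  }
  return cost;
}

const productQuery = {
  product: { name: true, description: true, brandIconURL: true, price: true },
};
console.log(estimateCost(productQuery));             // prints 9
console.log(estimateCost(productQuery) <= MAX_COST); // prints true
```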

Merging Schemas

The greatest value of GraphQL is that there’s a single schema for frontend developers to execute queries against. Developers can get any data they want from any backend datasource with one query. The GraphQL schema and the interconnectedness of the types and fields is what enables this. Unfortunately, the GraphQL layer can quickly become a monolith. Imagine having 25 different microservice teams, each trying to contribute to a single GraphQL schema as depicted in Figure 4-4. It quickly becomes complicated. The whole “monolith in the pipes” problem is what led to service-oriented architecture’s decline.

Given this tension between centralization and decentralization, how do you allow individual teams to work in parallel while exposing a single cohesive GraphQL schema to your clients?

Figure 4-4. Centralization of GraphQL versus decentralization of clients/microservices

Separate Files

A very simple yet effective way of distributing the ownership of your GraphQL schema is to break apart your schema into separate physical files. When you instantiate your GraphQL server, you need only to pass it a string containing your schema definition. That string could be retrieved from 1 file or 100 files; it doesn’t matter to your server, which needs only a single string. At runtime or at build time, you could easily combine multiple files. Popular libraries like graphql-import can automate this process for you.

In this model, you would have your pricing microservice team own pricing.graphql, while your product catalog team would own product_catalog.graphql, and so on. Different teams will still end up touching other teams’ schema definitions, but at least there’s some ownership and physical separation of definitions.
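Combining the files can be as simple as concatenating their contents. The sketch below also flags a type that is defined in more than one file, which is an easy mistake to make when several teams contribute. The file contents and the combineSchemaFiles helper are hypothetical; in a real project you would read the files from disk or use a library such as graphql-import.

```javascript
// Hypothetical file contents, keyed by the owning team's file name.
const schemaFiles = {
  'product_catalog.graphql': 'type Product {\n  id: ID!\n  name: String!\n}',
  'pricing.graphql': 'type Price {\n  amount: Float!\n  currency: String!\n}',
};

// Joins all files into one schema string, rejecting duplicate type names.
function combineSchemaFiles(files) {
  const owners = new Map(); // type name -> file that defines it
  for (const [fileName, sdl] of Object.entries(files)) {
    for (const match of sdl.matchAll(/^\s*type\s+(\w+)/gm)) {
      const typeName = match[1];
      if (owners.has(typeName)) {
        throw new Error(
          `Type "${typeName}" is defined in both ${owners.get(typeName)} and ${fileName}`
        );
      }
      owners.set(typeName, fileName);
    }
  }
  return Object.values(files).join('\n\n');
}

const combinedSchema = combineSchemaFiles(schemaFiles);
console.log(combinedSchema.includes('type Price')); // prints true
```

The resulting string is what you would pass to the GraphQL engine when you instantiate it.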

Schema Stitching

To this point, we’ve only discussed having different microservice teams within an organization contributing to the same schema. With many software vendors exposing their own GraphQL endpoints, your frontend developers could end up having to call different /graphql endpoints based on what data they want to retrieve. Suppose that you have a commerce platform vendor, a CMS vendor, and some microservices you’ve built in-house. Each organization exposes its own /graphql endpoint as follows:

Commerce platform: http://commerce-platform-vendor.com/graphql

query product {
	product(id: "94695736") {
		displayName
	}
}

CMS: http://cms-vendor.com/graphql

query content {
	content(productId: "94695736") {
		longDescription
		images
	}
}

Your own microservices: http://your-company.com/graphql

query inventory {
	inventory(productId: "94695736") {
		quantity
	}
}

Imagine the challenges your frontend developers would have calling each of those endpoints, each with their own authentication and authorization schemes. You might as well go back to calling individual REST endpoints. You can run into the same issue if you have multiple teams within an organization each exposing their own GraphQL endpoint.

With GraphQL schema stitching (again, not part of the GraphQL specification), you can combine those three queries into one:

query productDetailPage {
	product(id: "94695736") {
		displayName
	}

	inventory(productId: "94695736") {
		quantity
	}

	content(productId: "94695736") {
		longDescription
		images
	}
}

To set it up, you need a GraphQL gateway that exposes its own /graphql endpoint. That gateway then connects to and merges the schemas from the other /graphql endpoints.

The big drawback with schema stitching is that you must still query each type separately. You can’t have one query that accesses a single type (like “product” in this example) and gets everything it needs. Instead, you need to have one query that accesses the product, inventory, and content types. Another drawback is that there can be naming conflicts. For example, what happens if two schemas both define a type named product?

Schema Federation

Increasingly, schema federation is replacing schema stitching. Schema federation allows multiple teams/vendors to contribute to a single type, so that clients need to query only one type.

Here’s an example of how you’d instantiate your Apollo-based GraphQL server with different types from different sources:

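const { ApolloServer } = require('apollo-server');
const { ApolloGateway } = require('@apollo/gateway');

// Each entry points the gateway at one federated /graphql endpoint.
// Note: serviceList reflects the @apollo/gateway API at the time of writing.
const gateway = new ApolloGateway({
	serviceList: [
		{ name: 'product', url: 'http://commerce-platform-vendor.com/graphql' },
		{ name: 'content', url: 'http://cms-vendor.com/graphql' },
		{ name: 'inventory', url: 'http://your-company.com/graphql' }
	]
});

(async () => {
	// Load the merged schema, then start a server that executes against it
	const { schema, executor } = await gateway.load();
	const server = new ApolloServer({ schema, executor });
	server.listen();
})();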

You can then query a single type (in this case product), and the gateway pulls each field from the appropriate /graphql endpoint, provided the @key and @external directives are properly used:

query productDetailPage {
	product(id: "94695736") {
		displayName       # from http://commerce-platform-vendor.com/graphql
		longDescription   # from http://cms-vendor.com/graphql
		images            # from http://cms-vendor.com/graphql
		inventory         # from http://your-company.com/graphql
	}
}

Schema federation at scale is difficult but worth it due to the autonomy it gives each team.

Final Thoughts

By now you should have a firm understanding of the shortcomings of using plain REST for commerce, graphs and how they are used in everyday life, the origins of GraphQL, the GraphQL specification, and how GraphQL clients and servers work.

Adopting GraphQL will give your frontend developers a substantial competitive advantage by allowing them to get exactly the data they want, without the burden of trying to find and retrieve data from various REST APIs. Even though GraphQL is an additional layer to maintain, many will find that not having to support many diverse frontends that rapidly change will far outweigh the overhead incurred by adopting it.

Happy GraphQL’ing!
