Chapter 4. Onboarding and Identity

Now that you have a sense of the broader multi-tenant terminology and landscape, let’s look at what it means to bring these concepts to life in a working solution. The question is: where do you start? So many teams ask me this question. Fortunately, this is an area where I think there’s a pretty uniform answer. Whether you’re migrating or greenfield, I’d always point you at onboarding, identity, and the control plane as the starting point for building most multi-tenant architectures. Each of these elements forces important, foundational constructs into your environment, defining how tenants will be introduced and how users will be created and bound to tenants. These first steps will begin to establish the building blocks of our control plane.

By starting here, you’ll put tenancy front and center. This means that all the layers of your architecture are now forced to be multi-tenant aware. Each component of your system will now have to consider how tenancy might shape its design and implementation. While this may seem like a subtle nuance, its impact is quite profound. The mere presence of tenancy touches how you isolate tenants, how you represent their data, how you support multiple personas, how you bill tenants, and a host of other aspects of your solution. It also begins to establish the clear boundary between the control and application planes. The goal is to avoid falling into the trap of starting with the application and bolting on tenancy after the fact. This never works well and typically leads to significant refactoring and compromises that undermine the design of your SaaS architecture.

I’ll start the chapter by looking at what it takes to get the basics of our control plane up and running. This is where we’ll look at the provisioning of the infrastructure and resources that are needed to host the various services that will be used to manage and operate your SaaS architecture. While this control plane will ultimately host many services, I’ll keep this chapter mostly focused on the onboarding and identity functionality. Then, later, we can see how we build out more aspects of the control plane.

As we dig into onboarding, you’ll get a much better sense of all the moving parts that are part of this process. For some environments, the orchestration of this process can be quite complex. While the nature of onboarding can vary for each SaaS environment, there are still some common themes that span many implementations. I’ll dig into some of these themes as I walk you through a sample onboarding flow. This should surface some of the considerations that go into building your own onboarding service. It should also highlight the critical role onboarding plays within your SaaS architecture.

The next area I’ll review is identity. I’ll look more into the details of how we bind individual users to tenants to arrive at the notion of tenant context that was discussed in Chapter 1. This will include going deeper into the specific identity mechanisms that allow us to shape how tenants are authenticated, injecting tenant context into the requests that flow through all the backend services of your SaaS application. We’ll see how this context ends up shaping and influencing how teams build and manage the multi-tenant features of their SaaS architecture.

Looking at all these foundational concepts together should give you a clearer view into just how essential it is to address these concepts up front. The goal is to expose you to the key strategies, patterns, and considerations without getting too close to the specifics of any one technology. Understanding these core concepts will equip you with the insights that will shape how you approach many of the multi-tenant topics we’ll be covering in subsequent chapters.

Creating a Baseline Environment

To get started on this journey, I want to approach onboarding and identity as if we are starting from scratch. This should give you a better sense of how you might approach implementing these strategies from the ground up. That means we have to take a step back from the specifics of onboarding and identity and first think about what foundational pieces we have to put in place before we can start onboarding tenants. The services that support onboarding run inside of the control plane, so we need to start by putting in place all the bits that are needed to run all the control plane microservices that support onboarding and identity.

Creation of infrastructure, its dependent resources, and the control plane is what I refer to as creating a baseline environment. We essentially need to create the scripts and the automation that will allow us to spin up all of the constructs that are needed to host our SaaS environment. While our goal is to get onboarding and identity up and running, the scope of the baseline environment includes all the resources that would be one-time provisioned to set up our multi-tenant environment before we start onboarding. This means we’ll be setting up some resources that go beyond the scope of tenant onboarding and identity. We won’t focus on those other bits right now, but it’s important to note that a baseline environment is inclusive of all of these concepts.

The actual creation of our baseline environment is achieved through a classic DevOps model, using infrastructure automation tooling to create, configure, and deploy all the assets that are required by our baseline environment. Figure 4-1 provides a highly conceptualized view of this experience.

Figure 4-1. Automating creation of a baseline environment

The basic idea is that you’ll pick the DevOps tool(s) that fits your environment and create a single, repeatable automation model that can configure everything you need to get your environment moved to a state where it can begin onboarding tenants.

Of course, what’s actually in your baseline environment will vary wildly based on the nature of the specific technology stack you’re using for your SaaS solution. A Kubernetes stack, for example, could look very different from a serverless stack. The nuances of different cloud providers would also influence the provisioning process. We’ll look at more specific examples to see how they land, but for now we want to come up a level and just focus on what needs to get provisioned in this step to prepare our system to begin onboarding tenants.

Creating Your Baseline Environment

To get a better sense of what’s in this baseline environment, let’s look at a sample of what might get provisioned, configured, and deployed to bring your baseline environment to life. In Figure 4-2 you’ll see that I’ve assembled a conceptual view of the components and infrastructure that might get created in a baseline environment. The goal was to represent some of the core baseline infrastructure concepts without getting too lost in the details of any specific technology.

Figure 4-2. Provisioning a baseline environment

In the middle of Figure 4-2, you’ll see that I’ve created the foundational networking infrastructure that’s needed to host my multi-tenant SaaS environment. For this example, I’ve just grabbed some common AWS networking constructs (a VPC, Availability Zones, and some subnets) to represent the high availability network that will host my SaaS environment. These same networking constructs could be mapped to any number of different technologies. The key at this stage is just to focus on the fact that the configuration and setup of this baseline environment will require you to provision and configure all the core networking constructs that will be used by your control plane and, potentially, your tenants.

Within this network, I’ve also shown the deployment of the control plane. Since the control plane is shared by all tenants, it can be configured and deployed as part of the provisioning of your baseline environment. The control plane must also be in place for us to begin onboarding tenants and establishing their identity. Here, to simplify matters, I included a sampling of a few services. In reality the list of control plane services would include a much broader range. We’ll see those services in more detail when we start digging into more concrete solutions.

On the bottom righthand side of Figure 4-2, you’ll also see a collection of pooled resources. The items here represent the conceptual placeholders for any resources that might be shared by tenants. Generally, if you have pooled resources that will be shared by all tenants, you can provision them during the setup of your baseline environment (since they won’t need to be created during the onboarding process). Storage often provides a good example here. Imagine having a pooled database for some microservice in your solution. If it’s pooled, it could be created when the baseline environment is provisioned. You’ll also see the setup of a shared identity construct and a pooled message queue. Again, these are just here to highlight the fact that you’ll want to consider whether these should be provisioned during the setup of your baseline environment. I’ll get into some of the trade-offs when we go deeper into the tenant onboarding experience later in this chapter.

Finally, on the top right, I’ve shown placeholders for the system admin identity and administration console. This represents the users that are logging into the specific tooling that you’ve created to support, update, configure, and generally manage the state of your multi-tenant architecture. I refer to this targeted tooling as your system admin console. It’s this console that serves as the single pane of glass for your SaaS environment, providing your team with a purpose-built collection of features and capabilities that are essential to operating your multi-tenant environment. It will be used in combination with other off-the-shelf solutions that provide more generalized functionality. Even with these other tools, most SaaS teams require their own custom admin application that can address the specific multi-tenant needs of their environment.

Figure 4-3 provides a snapshot of a simple SaaS administration console application to help make this concept more concrete. It’s through this application that you’ll have access to all core information about your SaaS solution. You’ll be able to monitor the status of onboarding tenants, activate/deactivate tenants, manage tenant policies, view tenant/tier metrics, and any other functionality that’s needed to manage and operate your SaaS solution. This application must be configured and deployed as part of the setup of your baseline environment.

Figure 4-3. Creating and deploying a system admin console

It’s worth noting that some teams tend to underinvest in their admin consoles, deferring to ready-made solutions instead of building something themselves. This trade-off rarely seems worth it. While you might be able to use third-party solutions to compose a console experience, there are specific operations, insights, and configuration options that can only be addressed effectively through the creation of a targeted experience.
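To make the idea of a single, repeatable baseline automation a little more concrete, here is a minimal sketch in Python. It is not tied to any real infrastructure tool: `BaselineEnvironment`, `BASELINE_STEPS`, and the resource names are all hypothetical, and an in-memory dictionary stands in for real infrastructure. The only point it illustrates is that the one-time baseline steps are ordered and safe to rerun.

```python
from dataclasses import dataclass, field


@dataclass
class BaselineEnvironment:
    """Hypothetical in-memory stand-in for the one-time baseline resources."""
    resources: dict = field(default_factory=dict)

    def ensure(self, name: str, kind: str) -> bool:
        """Create the resource only if it is missing; return True when created."""
        if name in self.resources:
            return False
        self.resources[name] = kind
        return True


# Ordered one-time steps, loosely following Figure 4-2. Names are illustrative.
BASELINE_STEPS = [
    ("network", "networking"),
    ("control-plane", "services"),
    ("admin-identity", "identity"),
    ("admin-console", "application"),
    ("pooled-database", "pooled"),
    ("pooled-queue", "pooled"),
]


def provision_baseline(env: BaselineEnvironment) -> list:
    """Run every baseline step and return the names that were newly created."""
    return [name for name, kind in BASELINE_STEPS if env.ensure(name, kind)]
```

Running `provision_baseline` a second time against the same environment creates nothing new, which is the property you would want from whatever real tooling you choose.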

Creating and Managing System Admin Identities

As part of setting up your baseline environment and configuring your administration application, you’ll see that your provisioning process must also set up your system admin identity model. Each time you trigger the creation of a baseline environment, you’ll be required to provide the profile of the initial administrative user that will be able to log into your admin console. Creation of this identity is entirely separate from the creation of a tenant identity. This also means you’ll need to have a completely separate authentication experience to allow these system admin users to access the admin console or any command-line tooling you might be using to manage your multi-tenant environment.

To support this system admin identity, you’ll need to have some identity provider that owns and authenticates these users. The identity provider you use here could be the same identity provider that will be used for your tenant identities. Or, it could be a separate identity provider that is used as part of a more global enterprise administration strategy. Regardless of which identity provider you use, the basic mechanics of introducing a system admin identity are going to be very similar.

The key takeaway is that you’ll need some steps in your baseline provisioning automation to create and configure your system administration identity model. This automation will include the creation and configuration of the identity provider along with the creation of the initial system admin user. Once that user is set up, you should be able to use this identity to access your system admin console. Once you’ve made it into the system admin console, you’ll be able to create and manage more system admin users.

The example in Figure 4-3 happens to show a view of system admin user management. Here, I’ve accessed and authenticated into the console after provisioning my environment. I can now use this same page to create and manage other system admin users.
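The bootstrapping step described in this section could be sketched roughly as follows. `AdminIdentityStore` is a hypothetical in-memory stand-in for whichever identity provider holds your system admin users, and the role name and email addresses are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AdminIdentityStore:
    """Hypothetical stand-in for the identity provider that owns system admins."""
    users: dict = field(default_factory=dict)

    def create_admin(self, email: str, created_by: str) -> dict:
        """Add a system admin user; duplicate emails are rejected."""
        if email in self.users:
            raise ValueError(f"admin already exists: {email}")
        user = {"email": email, "role": "system_admin", "created_by": created_by}
        self.users[email] = user
        return user


def bootstrap_admin_identity(store: AdminIdentityStore, initial_admin_email: str) -> dict:
    """Baseline provisioning step: seed the first system admin user."""
    return store.create_admin(initial_admin_email, created_by="baseline-provisioning")
```

The first admin comes from the provisioning automation, and every later admin is created by an existing admin through the console. Note that no tenant identifier appears anywhere, which reflects the separation between system admin and tenant identities.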

Triggering Onboarding from the Admin Console

Once you’ve established your system admin user and you have your admin console up and running, you have all the pieces in place to create and onboard tenants. Now, in the final version of your offering, your onboarding could be invoked as part of some self-service experience, or it could be driven by some internal process. Obviously, if this is an internally driven process, then you’ll want to use your system admin console to manage onboarding. This would mean having some operation within your console that collects all the data needed for a new tenant before invoking the onboarding operation.

Some teams find lots of value in being able to onboard tenants from within the system admin console. Even if onboarding were to eventually be a self-service model, you could still have the ability to test and validate your onboarding experience from the admin console. This can be especially helpful to teams that are validating and testing the onboarding experience of their application.

Control Plane Provisioning Options

In Figure 4-2, I showed the control plane being deployed into the same baseline infrastructure where your tenants would also land. This is a perfectly valid option. However, it’s worth noting that how and where this control plane is placed can vary based on the needs of your environment and the technology stack that’s being used for your multi-tenant architecture. In Kubernetes, for example, I could have a separate namespace for the control plane, placing my tenant environments alongside the control plane within the same cluster and networking infrastructure. I could also choose to land the control plane in a completely separate infrastructure that is dedicated to the control plane.

Figure 4-4 provides a conceptual view of these two options. On the left, you’ll see the shared control plane model where the control plane is deployed into the same environment with your tenant infrastructure. And, on the right, you’ll see an approach where the control plane gets its own dedicated environment. Here the tenants are running in a completely separate network or cluster that draws a harder line between the control and application planes.

Figure 4-4. Picking a control plane deployment model

The trade-offs of these two choices are pretty straightforward. You might choose to have a dedicated control plane environment if you want to scale, manage, and operate these environments completely independently. Compliance could also factor in here; your compliance requirements or the needs of your domain may be better addressed by placing stronger boundaries between your control and application planes. Of course, putting the control plane in the same environment with the application plane does simplify things a bit. It reduces the number of moving parts you have to manage, configure, and provision. It might also reduce your cost footprint. If you do opt for the dedicated model, you’ll need to decide how you’ll integrate these separate constructs to allow the control plane to interact with your application plane.

Your technology stack choices might also influence how you deploy your control plane. Some teams, for example, might opt for different technology stacks for the control and application planes. I might, for example, choose serverless for the control plane and containers for the application plane. This might steer you more toward a dedicated control plane model.

The Onboarding Experience

Now that our baseline environment is provisioned and configured, we can turn our attention to the onboarding of tenants. It’s through onboarding that you’ll find that you’re establishing and exercising some of the most foundational elements of a multi-tenant architecture. In fact, when working with greenfield or migrating SaaS customers, I always suggest that they focus their initial attention on the onboarding process.

Starting here forces teams to answer many of the hard questions that will influence and shape the rest of their SaaS architecture. Onboarding isn’t just about creating a tenant. It’s about creating and configuring all the moving parts of your infrastructure that are needed to support that new tenant. In some cases, that might be a lightweight exercise and, in others, it might require a significant amount of code to orchestrate each step in the onboarding process. How your tenants are tiered, how they authenticate, how their policies are managed, how their isolation is configured, how they’re routed—these are all areas that are touched by the onboarding experience of your multi-tenant environment.

Onboarding Is Part of Your Service

Many teams fall into the trap of viewing onboarding as something that gets bolted onto their system after it’s built. They’ll create placeholders and workarounds to simulate the onboarding experience with the idea that they can “make it real” later in the process. This comes back to the discussion of comparing a service to a product. In a SaaS environment, onboarding isn’t viewed as some script or automation that’s somehow outside of the scope of your offering. Instead, it is one of the most fundamental components of your SaaS experience and getting it right should be key to any team that is building a multi-tenant solution.

Onboarding sits right in the middle of both your business and technical priorities. The experience each customer has with onboarding can have a profound impact on the broader success of the business. How seamless, efficient, and reliable this process is will have a direct impact on the experience and perception of the customers consuming your product. It is your chance to make a positive first impression. The onboarding experience is also directly connected to the notion of time to value, which looks at how long it takes a customer to move from sign-up to actual productivity and value within your SaaS offering. Any added friction that shows up here is going to impact the impression you make as a service and could, potentially, influence your ability to move customers from adopters to promoters.

Onboarding is also where the deployment, identity, routing, and tiering strategies are put into action. How tenants are siloed and pooled, for example, will need to be expressed and realized directly through your onboarding experience. How and where you authenticate tenants will be configured and applied as part of onboarding. How your tenants are contextually routed based on their tier and deployment model will be configured within the scope of onboarding. So many of these key multi-tenant design choices that you make in your SaaS architecture are ultimately expressed and brought to life through the onboarding process of your system. In many respects, your onboarding configuration, automation, and deployment code will be at the epicenter of realizing the multi-tenant strategies that you adopt for your SaaS environment.

The amount of effort and code that goes into automating onboarding may come as a surprise to some teams. It’s not uncommon for SaaS teams to underestimate the level of effort and investment that comes with building a robust onboarding experience. In reality, onboarding represents one of the most fundamental elements of a multi-tenant environment. It’s through onboarding that you can achieve the operational and agility goals that are essential to a SaaS business.

Self-Service Versus Internal Onboarding

So far, this discussion of onboarding may seem like it’s mostly describing mechanisms that are used by organizations that rely on a self-service tenant registration experience. Many of us have signed up for countless B2C SaaS offerings where we filled out some form, submitted our information, and started using some SaaS service. While this classic mode of onboarding is within our scope, we must also consider scenarios where our onboarding process may not support a self-service model. Imagine, for example, some B2B SaaS provider that only onboards a customer after a deal has been reached and the provider has agreed to bring that customer onto its system. These SaaS vendors may only have some internally managed onboarding experience.

My point is that onboarding has no binding to a particular experience. You might have self-service onboarding or you might use internal onboarding. Every SaaS solution, regardless of how it presents its onboarding experience, must still lean into the same set of values. To me, the bar for self-service and internally managed onboarding processes is the same. Both of these approaches should be creating a fully automated, repeatable, low-friction onboarding process that focuses on maximizing a customer’s time to value. Yes, someone in operations might run your internal process. This, however, does not mean that you’d expect less automation, scale, or durability from that onboarding process.

For any SaaS system that I build, I want to be sure that I’m treating this onboarding experience as a key part of my system. It is at the center of ensuring that I have a consistent, repeatable, automated onboarding mechanism that ensures that each new tenant will be introduced without requiring any manual processes or one-off configuration.

The Fundamental Parts of Onboarding

Now that you have a better sense of the importance of onboarding, let’s shift our focus more toward the details of the underlying components of an onboarding experience. While there are lots of details within the implementation of the onboarding process, my goal at this stage is to give you a top-level view of the core components of this process and outline the guiding principles that typically shape this experience.

Figure 4-5 provides a conceptual view of the moving parts of a multi-tenant onboarding experience.

Figure 4-5. The fundamentals of tenant onboarding

On the left you’ll see the illustration of the two common patterns that could be used to drive an onboarding process. First, I’ve shown a tenant administrator that is onboarding through some self-service sign-up process, presumably a web application that allows the tenant to submit their information, select a plan, and provide whatever configuration information is needed to establish themselves as a new tenant in the system. I’ve also shown a second onboarding flow that, in this example, is initiated by a system administrator. This represents some internal role at the SaaS provider using an administration console (or some other tooling) to enter the onboarding data for a new tenant and triggering the onboarding process. For this example, I included both of these onboarding paths. However, in most instances, a SaaS organization will support one of these two approaches. I only showed both here to drive home the idea that onboarding, regardless of its entry point, is meant to be a fully automated process for either of these two use cases.

For these onboarding paths, you’ll see that they both send an onboarding request to the Onboarding service (step 1). For onboarding, I generally prefer to have a single onboarding service that can own all the orchestration of onboarding. This service owns the full lifecycle of the onboarding process, managing and ensuring that all steps in the process are completed successfully. This is especially important since some aspects of onboarding may run asynchronously or have dependencies on third-party integrations that could have availability issues.
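Since both entry points feed the same Onboarding service, one way to keep them aligned is to have them submit the same request shape. The sketch below is only an illustration: the field names, the `source` values, the tier names, and the list used as a queue are all assumptions, not part of any specific implementation.

```python
from dataclasses import dataclass

# Illustrative values only; your entry points and tiers will differ.
VALID_SOURCES = {"self_service", "admin_console"}
VALID_TIERS = {"basic", "premium"}


@dataclass(frozen=True)
class OnboardingRequest:
    """The data collected before onboarding is triggered (hypothetical shape)."""
    company_name: str
    admin_email: str
    tier: str
    source: str

    def __post_init__(self):
        # Reject incomplete requests before any orchestration begins.
        if not self.company_name.strip():
            raise ValueError("company_name is required")
        if "@" not in self.admin_email:
            raise ValueError("admin_email does not look like an email address")
        if self.tier not in VALID_TIERS:
            raise ValueError(f"unknown tier: {self.tier}")
        if self.source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {self.source}")


def submit(request: OnboardingRequest, queue: list) -> int:
    """Hand the request to the Onboarding service; a list stands in for its intake."""
    queue.append(request)
    return len(queue)
```

Whether a request arrives from a sign-up page or the admin console, it passes through the same validation and lands in the same intake, so the automation behind it does not need to know which path was used.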

The onboarding process then calls a series of distributed services that are used to create and configure the tenant’s settings and supporting infrastructure. The sequencing of this onboarding flow can vary based on the nature of your SaaS application. Generally, the goal is to create and configure all of the required tenant assets before making the tenant active and/or notifying the tenant admin user that their account is active.

While there are multiple ways to implement this onboarding flow, you’ll need to start with creating a tenant identifier. In our example, this tenant identifier will be created by sending a create tenant request to the Tenant Management service (step 2), passing in all the information about our tenant (company name, identity configuration, tier, and so on). The Tenant Management service will also generate the unique identifier that will be associated with our tenant. Teams will often use a globally unique identifier (GUID) as the value for their tenant identifier, avoiding the inclusion of any attributes that might be connected to the name or other identifying information about the tenant. This makes it harder for anyone to work out which tenant a given identifier belongs to. This tenant is also created with some notion of an “active” status that manages the current state of a tenant. In this case, where we’re onboarding, the active state will initially be set to false. Once the system creates this tenant, you’ll have a tenant identifier that can be used across the rest of the onboarding experience. I’ll get into more detail about the Tenant Management service and its role within the control plane in Chapter 5.
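A minimal sketch of this step might look like the following. `TenantManagementService` here is a hypothetical in-memory class, not the interface of any real service, and the field names are assumptions. It uses Python’s standard `uuid` module to generate a random identifier that carries no information about the tenant.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Tenant:
    tenant_id: str
    company_name: str
    tier: str
    active: bool = False  # stays False until onboarding completes


@dataclass
class TenantManagementService:
    """Hypothetical in-memory stand-in for the Tenant Management service."""
    tenants: dict = field(default_factory=dict)

    def create_tenant(self, company_name: str, tier: str) -> Tenant:
        # A random (version 4) UUID reveals nothing about the tenant's name.
        tenant = Tenant(tenant_id=str(uuid.uuid4()), company_name=company_name, tier=tier)
        self.tenants[tenant.tenant_id] = tenant
        return tenant

    def set_active(self, tenant_id: str, active: bool) -> None:
        """Flip the active flag once the Onboarding service reports success."""
        self.tenants[tenant_id].active = active
```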

The next step in our tenant onboarding example will involve the provisioning of any tenant resources that are required (step 3). This provisioning step can, for some multi-tenant architectures, represent one of the most significant pieces of your onboarding implementation. For a full stack silo deployment, for example, this could mean provisioning a completely new collection of infrastructure and application services. In contrast, a full stack pool environment might require minimal infrastructure provisioning and configuration.

As we dig into more working examples, you may be surprised to find out how much code and automation is devoted to this onboarding experience. In fact, this is often an area where SaaS systems blur the DevOps boundaries. While, in traditional environments, much of the DevOps lifecycle is focused on provisioning and updating your baseline infrastructure, SaaS environments may rely on the execution of DevOps code during the onboarding of each individual tenant. Your system may be provisioning and configuring new infrastructure at runtime to process the creation of a siloed tenant infrastructure. As you can imagine, this brings new considerations to how you organize and build the overall DevOps footprint of your multi-tenant solution. For some, this represents a new mindset and new approaches to the tooling used to provision tenant environments.

At this stage, we have a tenant created and our tenant resources are provisioned. Now we can add this new tenant to the billing system (step 4). This is essentially where you’ll provide information to the billing system that identifies the new tenant and any information that’s needed to characterize the billing model that should be applied to this particular tenant. The assumption here is that, in advance of onboarding a new tenant, you’ve configured and set up the different tiers or billing plans that determine the overall pricing model of your solution. Then, during onboarding, your Billing service will correlate the tenant’s onboarding profile with the appropriate (pre-configured) billing plan.

You’ll notice that Figure 4-5 calls out a separate billing provider. The idea is that your Billing service will manage and orchestrate any integration you might have with your billing system. In many instances, this billing provider may be supported by a third-party system. It’s in these cases where you may see value in putting a separate Billing service between your onboarding process and the billing provider, allowing you to manage any unique considerations that might be required to support a given billing provider. In other instances, you might directly integrate with the billing provider from your Onboarding service. It’s also worth noting that some SaaS companies will use an internal billing system. Even in this scenario, you’d still want your onboarding process to follow a similar pattern of integration. There’s lots more about billing to consider (outside the scope of onboarding). I’ll get more into those details in Chapter 14.
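Here is one way the relationship between a Billing service and a billing provider could be sketched. Everything in it is hypothetical: the `BillingProvider` protocol, the `create_customer` method, the plan identifiers, and the fake provider are illustrative and do not reflect any real billing vendor’s API.

```python
from dataclasses import dataclass, field
from typing import Protocol


class BillingProvider(Protocol):
    """The narrow surface the Billing service needs from any provider (assumed)."""
    def create_customer(self, tenant_id: str, plan_id: str) -> str: ...


@dataclass
class FakeBillingProvider:
    """In-memory stand-in for a third-party or internal billing system."""
    customers: dict = field(default_factory=dict)

    def create_customer(self, tenant_id: str, plan_id: str) -> str:
        customer_id = f"cust-{len(self.customers) + 1}"
        self.customers[customer_id] = (tenant_id, plan_id)
        return customer_id


# Plans are configured ahead of onboarding; these identifiers are made up.
TIER_TO_PLAN = {"basic": "plan-basic-monthly", "premium": "plan-premium-monthly"}


class BillingService:
    def __init__(self, provider: BillingProvider):
        self.provider = provider

    def initialize_billing(self, tenant_id: str, tier: str) -> str:
        """Map the tenant's tier to a pre-configured plan and register the tenant."""
        if tier not in TIER_TO_PLAN:
            raise ValueError(f"no billing plan configured for tier: {tier}")
        return self.provider.create_customer(tenant_id, TIER_TO_PLAN[tier])
```

Because the Onboarding service only talks to `BillingService`, swapping a third-party provider for an internal one would mean supplying a different object that satisfies the same protocol.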

For the final piece of the onboarding experience, we need to create the tenant admin user (step 5). If you recall, the tenant admin role represents the first user that is created for a given tenant. This user will have the ability to create any additional users that will be able to access the system. At this stage, though, our main goal is to create this initial user within our identity provider to enable our tenant to authenticate and access their provisioned environment. Here, you’ll need to rely on the features of your identity provider to orchestrate the notification and validation of this new tenant. Most identity providers will support the generation of an email message that includes a URL and temporary password for accessing the system. This process then triggers the authenticating user to enter a new password as part of the login flow. The goal is to push much of the automation of this sign-up process to your identity provider. Rely on these providers to send email invites and temporary passwords and handle password resets.
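The sketch below shows the shape of this step against a fake identity provider. `FakeIdentityProvider`, its `create_user` method, and the attribute names are assumptions for illustration only. A real provider would generate the temporary password and send the invitation email itself, which is simulated here with a list.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class FakeIdentityProvider:
    """Hypothetical stand-in; a real provider sends the invite email itself."""
    users: dict = field(default_factory=dict)
    sent_invites: list = field(default_factory=list)

    def create_user(self, email: str, attributes: dict) -> dict:
        user = {
            "attributes": attributes,
            "temp_password": secrets.token_urlsafe(12),
            "must_change_password": True,  # forces a new password at first login
        }
        self.users[email] = user
        self.sent_invites.append(email)  # simulates the invitation email
        return user


def create_tenant_admin(idp: FakeIdentityProvider, tenant_id: str, email: str) -> dict:
    """Create the first user for a tenant and bind the tenant identifier to it."""
    return idp.create_user(email, {"tenant_id": tenant_id, "role": "tenant_admin"})
```

Storing the tenant identifier as an attribute of the user is one way to express the binding between users and tenants that later yields tenant context.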

There is one last bit to this onboarding flow that you’ll need to consider. Earlier, when the tenant was created (step 2), I set the active status of the tenant to false. It’s the job of your Onboarding service to track the state of all of these different onboarding steps. Only after it determines that each process has completed successfully will it set the tenant’s active status to true. This may include process retries and other fallback strategies to address any failure that may have happened during the provisioning and configuration of the tenant environment. Assuming the onboarding succeeds, the Onboarding service can now call the Tenant Management service and update the active status to true. This is especially important to the administration console of your SaaS environment, which provides the functionality that is used to view and manage the state of tenants. During this onboarding process, the view of tenants should show the state of any tenant that is being onboarded and highlight the active status of your tenants.
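The activation rule can be sketched as a small orchestration loop. This is a simplified, synchronous illustration under stated assumptions: the step names, the retry count of three, and the dictionary used for the tenant record are all hypothetical, and a real Onboarding service would likely need asynchronous handling and more careful failure policies.

```python
from typing import Callable


def run_with_retries(step: Callable[[], None], attempts: int = 3) -> bool:
    """Run a step up to `attempts` times; return True on the first success."""
    for _ in range(attempts):
        try:
            step()
            return True
        except Exception:
            continue  # treat any failure as retryable in this sketch
    return False


def onboard(tenant: dict, steps: list) -> dict:
    """Run each (name, step) in order; activate the tenant only if all succeed."""
    tenant["active"] = False
    tenant["completed"] = []
    for name, step in steps:
        if not run_with_retries(step):
            tenant["failed_step"] = name  # surfaced for troubleshooting
            return tenant
        tenant["completed"].append(name)
    tenant["active"] = True
    return tenant
```

A tenant whose step keeps failing is left inactive with the failing step recorded, so the admin console has something meaningful to show for an onboarding that stalled.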

Tracking and Surfacing Onboarding States

From looking at this process, it should be clear that your onboarding process includes lots of moving parts and dependencies. The more complex this process becomes, the more important it is to have useful, detailed operational insights into the various states of your onboarding flow. This is essential to analyzing progress, identifying issues, and profiling the overall behavior and trends of your onboarding automation. It also means identifying the right design and tooling to effectively capture and surface the onboarding profile of your solution.

At a minimum, you could imagine having a distinct set of states mapped to each of the steps in our onboarding flow. So, you might have separate states for TENANT_CREATED, TENANT_PROVISIONED, BILLING_INITIALIZED, USER_CREATED, and TENANT_ACTIVATED. Each of these states could be surfaced through the tenant view in your administration console, allowing you to inspect the onboarding of any tenant at a given moment in time.

The real value of assigning and surfacing onboarding states is to provide richer operational insights into the status of your onboarding progress. This will be essential to troubleshooting any unexpected onboarding issues. Knowing precisely where your onboarding process is failing is of prime importance to your operational teams. This is especially important when your onboarding process includes a significant amount of infrastructure provisioning and configuration. In these cases, you might track more granular states that give you insights into the various stages that are within the moving parts of your provisioning process.
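To make this more concrete, here is a minimal Python sketch of how an Onboarding service might record these states. The OnboardingRecord class, its fields, and the shape of the summary are all assumptions made for illustration; only the state names come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class OnboardingState(Enum):
    TENANT_CREATED = 1
    TENANT_PROVISIONED = 2
    BILLING_INITIALIZED = 3
    USER_CREATED = 4
    TENANT_ACTIVATED = 5


@dataclass
class OnboardingRecord:
    """Hypothetical per-tenant record kept by the Onboarding service."""
    tenant_id: str
    history: List[OnboardingState] = field(default_factory=list)
    failed_state: Optional[OnboardingState] = None
    failure_reason: Optional[str] = None

    def mark(self, state: OnboardingState) -> None:
        # Record a step that completed successfully.
        self.history.append(state)

    def fail(self, state: OnboardingState, reason: str) -> None:
        # Record where onboarding stopped so operators can see it.
        self.failed_state = state
        self.failure_reason = reason

    @property
    def active(self) -> bool:
        # Active only when every state has been reached and nothing failed.
        return self.failed_state is None and set(self.history) == set(OnboardingState)

    def summary(self) -> dict:
        # Shape of the data an administration console could display.
        return {
            "tenant_id": self.tenant_id,
            "last_state": self.history[-1].name if self.history else None,
            "failed_state": self.failed_state.name if self.failed_state else None,
            "failure_reason": self.failure_reason,
            "active": self.active,
        }
```

A more granular provisioning process could extend the enum with additional states without changing how the record is surfaced.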

Tier-Based Onboarding

As part of looking at the onboarding flow, I outlined the role the Provisioning service plays in creating and configuring tenant environments. This provisioning process gets a bit more interesting when you consider how different tenant tiers could influence how you implement your provisioning lifecycle. If you recall, we use tiers to present different tenant profiles with different experiences. These different experiences often translate into a need for separate infrastructure and configurations based on the tier of your system.

To better understand this, let’s look at a conceptual example of tier-based onboarding. Figure 4-6 provides a view of an environment that supports two separate tiers (basic and premium).

Figure 4-6. An example of tier-based onboarding

I’ve narrowed the view down to focus exclusively on the Provisioning service within our control plane. Whenever the Onboarding service triggers this Provisioning service, it provides the tenant contextual information that includes the tier that will be associated with the new tenant. When the Provisioning service receives this request, it will evaluate the tier and determine how the selected tier will influence the configuration and infrastructure that will be needed to support your tenant environment. In this example, our SaaS solution offers premium tier tenants a full stack silo deployment model with fully dedicated resources for each tenant. This means each onboarding event will need to automate the provisioning of these full tenant stacks. Basic tier tenants, however, are onboarded into a full stack pool model where all the infrastructure is shared by tenants. Here, the onboarding will be a lighter weight experience, simply augmenting the configuration to add support for this new tenant.

These full stack deployment models have pretty distinct onboarding experiences that are relatively easy to digest. Where this gets more interesting is when you have a mixed deployment model. With a mixed mode deployment, your resources are siloed and pooled with finer granularity. This means that your onboarding process will need to apply the tier-based onboarding policies to each resource based on its silo or pool configuration. Figure 4-7 provides an example of how mixed mode deployment influences your provisioning process.

Figure 4-7. Tier-based onboarding with mixed mode deployments

I’ve intentionally made this architecture a bit busy. Our same Provisioning service is shown here, but now it has much more to consider as each tenant onboards. Let’s start on the left of Figure 4-7 where you’ll see that I have two services that are deployed separately for Tenant 1 and Tenant 2. So, for every premium tier tenant, your Provisioning service will need to configure and deploy Fulfillment and Order services in a fully siloed model. The storage and compute of these two microservices are fully siloed.

Now, even though these two microservices are deployed separately for premium tier tenants, these same services are also consumed by your Basic tier tenants. This is represented in the middle of the diagram where I’ve identified pooled versions of the Fulfillment and Order microservices that are shared by all non-premium tier tenants (in this case, Tenants 3..N). This means the Provisioning service must perform a one-time configuration and deployment of these services to support the pooled tenants. Once these services are up and running, the job of the Provisioning service for each new tenant will require fewer moving parts. You may need to configure routing or set up some policies, but most of the heavy lifting will be done after the initial provisioning and deployment of these services.

Finally, on the righthand side of the environment in Figure 4-7, you’ll see a range of services that are deployed in varying models to support the needs of both premium and basic tier tenants. The silo and pool choices that you make here are driven more by the universal needs of your multi-tenant architecture (instead of tiers). The idea is that you’re selecting silo and pool options based on a set of global needs (noisy neighbor, compliance, and so on).

In this example, I’ve intentionally created some variation in these services to highlight the use cases you may need to support as part of tenant onboarding. The Product microservice, for example, uses siloed compute for all tenants; that’s why you see a separate instance of the service for Tenants 1–3. However, you’ll also see that this same service uses pooled storage. This adds a new wrinkle to the onboarding story. Now, your Provisioning service must handle this variation, provisioning the storage a single time for all tenants while still provisioning and deploying separate instances of the Product microservice as each tenant is onboarded.

The other services (Ratings and Cart) are just here to highlight additional patterns you could see when implementing your Provisioning service. Ratings is entirely pooled for compute and storage, while the Cart microservice has pooled compute and siloed storage. Supporting onboarding for these services is about knowing what’s siloed and what’s pooled and contextually triggering the creation and configuration of these resources. This mirrors the discussion we had (in Chapter 3) around mixed mode deployment. However, here we’re looking at how that mixed mode can influence the onboarding experience of your multi-tenant environment.
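One way to picture this contextual triggering is as a lookup that turns a tenant’s tier into a list of provisioning actions. The Python sketch below is only an illustration: the deployment map loosely follows Figure 4-7, and the names (TIERED_SERVICES, GLOBAL_SERVICES, provisioning_plan) and the "provision" and "configure" actions are assumptions rather than part of any real tooling. It also assumes pooled resources already exist.

```python
SILO, POOL = "silo", "pool"

# Hypothetical deployment map, loosely following Figure 4-7.
# Each value is (compute model, storage model).
# Order and Fulfillment vary by tier; the other services use one model for all tiers.
TIERED_SERVICES = {
    "premium": {"order": (SILO, SILO), "fulfillment": (SILO, SILO)},
    "basic": {"order": (POOL, POOL), "fulfillment": (POOL, POOL)},
}
GLOBAL_SERVICES = {
    "product": (SILO, POOL),  # siloed compute, pooled storage
    "ratings": (POOL, POOL),  # fully pooled
    "cart": (POOL, SILO),     # pooled compute, siloed storage
}


def provisioning_plan(tenant_id, tier):
    """List the actions needed to onboard one tenant.

    Pooled resources are assumed to be in place already, so they only
    need per-tenant configuration (routing, policies, and so on).
    """
    services = {**TIERED_SERVICES[tier], **GLOBAL_SERVICES}
    plan = []
    for service, (compute, storage) in services.items():
        for resource, mode in (("compute", compute), ("storage", storage)):
            action = "provision" if mode == SILO else "configure"
            plan.append((action, service, resource, tenant_id))
    return plan
```

Calling provisioning_plan("tenant-1", "premium") yields "provision" actions for the Order and Fulfillment compute and storage, while the same call for a basic tier tenant yields "configure" actions for those services. Both tiers still get a dedicated Product compute instance and dedicated Cart storage.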

One key question often comes up around the general timing of provisioning pooled resources during the onboarding process. Since these resources are configured and deployed once, many may prefer to pre-provision these resources as part of the initial setup of your entire multi-tenant environment. So, if you’re setting up a brand-new baseline environment, you could choose to provision all the pooled resources at this time. To me, this seems like the more natural approach. This could mean that your Provisioning service would support a separate path that is invoked by your DevOps tooling to perform the one-time creation of these resources. Then, as each new tenant onboards, this shared infrastructure would already be in place.

The other option could be to delay the creation of these pooled resources and trigger their creation during the onboarding of your first tenant (almost like the Lazy loading pattern). While this could slow your onboarding process, the overhead of this process would only be absorbed by your first tenant. My general bias is to pre-provision these resources. However, there could be other factors that steer you toward either one of these strategies.

While it can be interesting and powerful to support these different tier-based deployment models, it’s also essential to consider how your onboarding complexity might impact the complexity of our overall SaaS environment. Yes, you want to give the business lots of tools to be able to support different tenant profiles. At the same time, you don’t want to over-rotate here. Also, it is important to emphasize that this is still a tier level of customization. You should never view this mechanism as a way to support any notion of one-off customization for individual tenants.

Tracking Onboarded Resources

If your onboarding process needs to provision dedicated tenant resources, then you’ll also have to consider how your multi-tenant environment will track and identify these resources. What you’ll find here is that other aspects of your system will end up needing to locate and target these tenant-specific resources.

To really understand what I’m getting at, let’s consider a more concrete example using the mixed mode deployment model in Figure 4-7. This model includes plenty of examples of siloed and pooled resources. Now, imagine you just onboarded a premium tier tenant into this environment and created the individual resources that were needed to support that tenant.

Once onboarding is done and our tenant is up and running, you’ll still be deploying updates to this environment. Patches, new features, and other changes will certainly need to get deployed through the lifecycle of your application. This is where things get a bit interesting. With the mixed mode deployment we have here, we can’t simply deploy to one static location to update our system. Imagine, for example, rolling out a new version of the Order service. To get the new code deployed, your DevOps experience will need to find the separate deployments of the Order service that span all the different resources that were provisioned by the onboarding experience. Here, that would mean deploying the Order service to the Tenant 1 and Tenant 2 premium tier silos and to the basic tier pooled instance that is shared by the other tenants.

So, that raises the question: how would your deployment process know how to handle this? How would it know which resources are siloed for each tenant? The only way for this to work is to have your onboarding experience capture and record the location and identity of these per-tenant resources. While the need for this tracking information is clear, there’s no standard strategy that is commonly applied to address it. Some place the data in a table as new tenants are onboarded and reference this table during their deployment process. Others might use pieces of their DevOps tool chain to address this challenge. The main takeaway is that if your onboarding process provisions dedicated tenant resources, you’ll need to capture and record the information about these resources so they can be referenced by other parts of your deployment and operational experience. You’ll see more concrete examples of this when we start looking at orchestrating onboarding in EKS (Chapter 10) and serverless (Chapter 11) examples.
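As a rough illustration of the table-based approach, the Python sketch below keeps an in-memory record of where each service was deployed. The TenantResourceRegistry class and the location strings are hypothetical; a real system would persist this data and would likely hold richer identifiers.

```python
class TenantResourceRegistry:
    """Hypothetical store of where each service has been deployed."""

    POOLED = "pooled"

    def __init__(self):
        self._deployments = {}  # service name -> list of deployment records

    def record(self, service, location, tenant_id=None):
        # tenant_id is None for the shared (pooled) deployment of a service.
        owner = tenant_id if tenant_id is not None else self.POOLED
        self._deployments.setdefault(service, []).append(
            {"owner": owner, "location": location}
        )

    def targets_for(self, service):
        # Every location a new version of this service must be rolled out to.
        return [d["location"] for d in self._deployments.get(service, [])]


# Onboarding writes to the registry as it provisions resources.
registry = TenantResourceRegistry()
registry.record("order", "silo/tenant-1/order", tenant_id="tenant-1")
registry.record("order", "silo/tenant-2/order", tenant_id="tenant-2")
registry.record("order", "pool/order")

# The deployment pipeline reads from it when rolling out a new version.
order_targets = registry.targets_for("order")
```

Here, order_targets holds three locations: the two premium tier silos and the pooled instance shared by the basic tier tenants.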

Handling Onboarding Failures

Any failure in the onboarding process can represent a significant issue for SaaS providers. However, these failures take on added importance in any multi-tenant environment that has a self-service onboarding experience. Onboarding represents the first impression you’re making with a tenant and any failure in this process could translate into lost business.

While some of your reliability here will come from applying solid engineering practices, there are also areas within onboarding where your dependencies on external systems can impact the durability of your onboarding process. To get a better sense of the options, let’s look at a specific example of a potential external dependency that could be part of your onboarding experience. Figure 4-8 provides a conceptual view of the billing integration that could be part of your onboarding flow.

Figure 4-8. Fault-tolerant integration with a billing provider

In this example, let’s presume you are reliant on an integration with a third-party billing provider. By including a third-party billing solution in your onboarding (which is common), you’ve made the reliability of your onboarding experience directly dependent on the availability of the billing provider. If the billing system is down, so is your tenant onboarding.

Now, you might presume that this is just the risk associated with using third-party solutions. However, in this scenario, your system may very likely be able to continue to operate, even when the billing system is down. While it’s true that you need to get the billing account created, your system could still finish its onboarding process and complete the billing configuration when the billing system is back online.

In Figure 4-8, I’ve highlighted a potential approach to this problem: making the billing integration completely asynchronous. In this model, your onboarding process would request the addition of a new tenant through a queue. The Billing service then picks up the request and attempts to create an account in the billing system using an asynchronous request. If this request fails, the Billing service will capture the failure and schedule a retry. There are lots of different strategies for implementing a fault-tolerant integration. Don’t get lost in the details. The key takeaway is that I’ve created an integration model with the billing provider that enables my onboarding flow to continue without waiting for the creation of the billing account. For some, it may simply be preferable to have this always be an asynchronous integration purely for the benefit of expediting the onboarding experience.
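The queue-and-retry idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: FlakyBillingProvider stands in for a third-party billing API, the message format is invented, and retries happen immediately where a real Billing service would typically wait or back off between attempts.

```python
import queue


class FlakyBillingProvider:
    """Stand-in for a third-party billing API; fails for the first `outages` calls."""

    def __init__(self, outages):
        self.outages = outages
        self.accounts = set()

    def create_account(self, tenant_id):
        if self.outages > 0:
            self.outages -= 1
            raise ConnectionError("billing provider unavailable")
        self.accounts.add(tenant_id)


def request_billing_account(billing_queue, tenant_id):
    # Called by onboarding; returns immediately without waiting on billing.
    billing_queue.put({"tenant_id": tenant_id, "attempts": 0})


def process_billing_queue(billing_queue, provider, max_attempts=5):
    """Drain the queue, retrying failures; return tenants that never succeeded.

    Single-threaded sketch: empty() is only reliable with one consumer.
    """
    failed = []
    while not billing_queue.empty():
        message = billing_queue.get()
        try:
            provider.create_account(message["tenant_id"])
        except ConnectionError:
            message["attempts"] += 1
            if message["attempts"] < max_attempts:
                billing_queue.put(message)  # schedule a retry
            else:
                failed.append(message["tenant_id"])
    return failed
```

With a provider that is down for two calls, a request for tenant-1 still ends with the account created on the third attempt, and the onboarding flow never had to wait on it. Tenants returned in the failed list are the ones your operational tooling would need to surface.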

I’ve focused on billing just because it provides a natural illustration of the importance of having a fault-tolerant onboarding experience. In reality, you should look at all the moving parts of your onboarding automation and look for points of failure or bottlenecks that might require new strategies that can expedite or add durability to your onboarding process. The cost of a failed onboarding is generally high and you want to do whatever you can to make this mechanism as robust as possible.

Testing Your Onboarding Experience

At this point, the role and importance of onboarding should be clear. The potential complexity and the number of moving parts in this process can make it particularly prone to errors. With this in mind, it should also be clear that you’ll want to take extra measures to validate the efficiency and repeatability of your onboarding process. Too many teams build an onboarding process and simply rely on the activity of customers to uncover any bottlenecks or design flaws that might be impacting their onboarding experience. To get around this, I always suggest that teams invest in building a rich collection of onboarding tests that can be used to exercise and push on all the dimensions of the onboarding experience.

There’s a range of potential test types you might consider here. You might, for example, create load tests for onboarding that simulate different onboarding workloads. Or, you might create tests that validate your ability to recover from failures. Some teams will introduce performance tests that measure the time it takes to onboard tenants. Each of these tests could be executed with a mix of different tenant tiers where a tenant’s tier might exercise different paths of your onboarding experience.
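A simple load test along these lines might look like the Python sketch below. The onboard_tenant function is a hypothetical stand-in that only sleeps; in a real test it would call your onboarding API. The tier names and timings are also assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def onboard_tenant(tenant_id, tier):
    """Hypothetical stand-in for a call to your onboarding API."""
    time.sleep(0.01 if tier == "basic" else 0.02)
    return {"tenant_id": tenant_id, "tier": tier, "status": "TENANT_ACTIVATED"}


def timed_onboard(request):
    # Wrap one onboarding call and record how long it took.
    tenant_id, tier = request
    start = time.perf_counter()
    result = onboard_tenant(tenant_id, tier)
    result["seconds"] = time.perf_counter() - start
    return result


def run_onboarding_load(count, tiers=("basic", "premium"), workers=8):
    # Alternate tiers so each onboarding path is exercised under load.
    requests = [(f"tenant-{i}", tiers[i % len(tiers)]) for i in range(count)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_onboard, requests))


def slowest_by_tier(results):
    # Worst observed onboarding time per tier, useful for comparing against an SLA.
    worst = {}
    for result in results:
        tier = result["tier"]
        worst[tier] = max(worst.get(tier, 0.0), result["seconds"])
    return worst
```

Running run_onboarding_load(50) and passing the results to slowest_by_tier gives you a per-tier number you can compare against whatever onboarding targets you’ve set.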

The goal is to ensure that the design, architecture, and automation assumptions of your onboarding experience are being fully realized in your working solution. That means pushing scale by simulating a full range of use cases that will push your onboarding design and implementation. It will also allow you to verify that your environment is correctly surfacing any key metrics that are used to measure your ability to meet any SLAs you’ve defined. The emphasis here is not just on ensuring that the happy path works—it’s about ensuring that onboarding meets the scale and availability requirements and delivers the service experience that meets the expectations of your customers.

Creating a SaaS Identity

So far, I’ve touched briefly on the role of identity as part of the onboarding process. However, there are lots of pieces to the identity puzzle that need exploring. Yes, onboarding sets up identity, but what does that mean? How does identity get configured, and how does multi-tenancy affect the overall experience of our SaaS environment? Here, we’ll dig more deeply into how tenancy shapes the authentication, authorization, and general multi-tenant footprint of a SaaS environment.

With multi-tenant identity, you’ll have to go beyond thinking about identity purely as a tool for authenticating users. You must broaden your view of identity to include the idea that each authenticated user must always be authenticated in the context of a tenant. It’s true that users are connected to this experience, but much of the underlying implementation of your multi-tenant architecture is primarily focused on the tenant associated with that user. So, this means our identity model must be expanded to cover both users and tenants. The basic goal is to create a tighter binding between users and tenants that allows them to be accessed, shared, and managed as a single unit.

In Figure 4-9 you’ll see a conceptual view of how a SaaS identity is composed. On the left, I have the classic view of what I’ve labeled as a user identity. This identity is focused squarely on describing and capturing the attributes of an individual. Names, phone number, email—these are all typical descriptors that would be used to characterize the user of a system. On the righthand side, however, I have also introduced the idea of a tenant identity. A tenant is more of an entity than an individual. A company, for example, subscribes to your SaaS service as a tenant, and that tenant often has many users.

Figure 4-9. Creating a logical SaaS identity

For multi-tenant environments, these two distinct notions of identity are joined together to create what I refer to as a SaaS identity. This SaaS identity must be introduced in a way that allows it to become a first-class identity construct that is passed through all the layers of your system. It becomes the vehicle for conveying your tenant context to all the parts of your system that need access to these user and tenant attributes. This SaaS identity maps directly to the tenant context concept that I described in Chapter 1.

The key is that this SaaS identity needs to be introduced without somehow impacting or complicating the traditional authentication experience. Your SaaS authentication experience must retain the freedom to follow a classic authentication flow while still enabling the merging of the user and tenant identities. Figure 4-10 provides a view of this concept in action.

Figure 4-10. SaaS identity authentication flow

In the flow you see here, a tenant user attempts to access a SaaS web application (step 1). The application detects that the user is not authenticated and redirects them to an identity provider that has awareness of both the user and tenant identities (step 2). When the user is authenticated, the identity provider will own responsibility for returning the SaaS identity (step 3). Then, this SaaS identity is passed downstream to all the rest of the moving parts of our system (step 4). This identity includes all of the tenant and user attributes that are needed to support the needs of the remaining elements of your SaaS application.

While this flow might vary based on the nature of your identity technology, the spirit of this experience should remain similar across different identity models. It also might be influenced by how tenants flow into your system and get routed to an identity provider. Subdomains, email addresses, or lookup tables, for example, could shape how you resolve a tenant’s path to the corresponding identity provider. In the end, your goal is to resolve and create this SaaS identity at the front of this process and avoid pushing this responsibility further into the details of your design and implementation.
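As one example of this resolution step, the Python sketch below maps a request’s subdomain to an identity provider. The domain names, the IDENTITY_PROVIDERS table, and both function names are hypothetical; email-based or lookup-table strategies would follow the same shape with a different key.

```python
# Hypothetical tenant-to-identity-provider mapping maintained by onboarding.
IDENTITY_PROVIDERS = {
    "acme": "https://idp.example.com/acme",
    "globex": "https://idp.example.com/globex",
}


def tenant_from_host(host, base_domain="app.example.com"):
    """Extract the tenant subdomain from a host such as acme.app.example.com."""
    hostname = host.split(":")[0].lower()  # drop any port
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    subdomain = hostname[: -len(suffix)]
    # Reject empty or nested subdomains rather than guessing.
    if not subdomain or "." in subdomain:
        return None
    return subdomain


def identity_provider_for(host):
    # Resolve the identity provider before authentication begins.
    tenant = tenant_from_host(host)
    if tenant is None or tenant not in IDENTITY_PROVIDERS:
        raise LookupError(f"no identity provider configured for host {host!r}")
    return IDENTITY_PROVIDERS[tenant]
```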

Attaching a Tenant Identity

At this stage, I’ve talked about joining user and tenant identities. While this may make sense conceptually, we still haven’t talked about how you can combine these two concepts into a true, first-class SaaS identity construct. Naturally, how you do this will vary from one identity provider to the next.

For this discussion, I’m going to focus on how the Open Authorization (OAuth) and OpenID Connect (OIDC) specifications can be used to create and configure a SaaS identity. These specifications are used widely by a number of modern identity providers, serving as an open standard for decentralized authentication and authorization. As such, you should find that the techniques I cover here map naturally to your application’s identity model.

To get tenants attached to users, we first need to understand how the OIDC specification packages and conveys a user’s authentication information. Generally, when authenticating against an OIDC-compliant identity provider, you’ll find that each authentication returns identity and access tokens. These are represented as JSON Web Tokens (JWTs) that hold all the authentication context to be used for downstream authorization. The identity token is meant to convey information about a user, while the access token is used to authorize that user’s access to different resources.

Within these JWTs, you’ll find a set of properties and values that provide more detailed information about a user. This data is referred to as claims. There is a default set of claims that are generally included with each token to ensure a standardized representation of common attributes. It’s these JWTs that become the universal currency of our multi-tenant identity model.

The good news with JWTs is that they allow for the introduction of custom claims. These custom claims are essentially the equivalent of user-defined fields that can be used to attach your own property/value pairs to JWTs. This creates the opportunity for you to attach tenant contextual data to these tokens. Figure 4-11 provides an illustration of how these custom tenant claims would get added to your JWTs.

On the left is a sample JWT populated with the example claims that are part of the OIDC specification. I won’t go through all of these, but it is worth calling out the specific user attributes that show up here. You’ll see that name, given_name, family_name, gender, birthdate, and email are all in this list. On the right, however, are the attributes of our tenant that need to be merged into the JWT. These simply get added as property/value pairs to the standardized representation.

Figure 4-11. Adding tenant custom claims to a JWT
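The merge shown in Figure 4-11 can be pictured as combining two sets of property/value pairs. In the Python sketch below, the user claims use standard OIDC claim names, while the tenant claim names (tenant_id, tenant_tier, tenant_role) and all of the values are assumptions chosen for illustration. In practice your identity provider performs this merge, and some providers prefix or namespace custom claims.

```python
# Standard OIDC claims describing the user (example values).
user_claims = {
    "sub": "user-123",
    "name": "Jane Doe",
    "given_name": "Jane",
    "family_name": "Doe",
    "email": "jane@example.com",
}

# Hypothetical custom claims describing the tenant.
tenant_claims = {
    "tenant_id": "tenant-1",
    "tenant_tier": "premium",
    "tenant_role": "admin",
}


def build_saas_identity_claims(user, tenant):
    """Merge user and tenant claims into one payload, refusing name collisions."""
    overlap = set(user) & set(tenant)
    if overlap:
        raise ValueError(f"custom claims collide with user claims: {sorted(overlap)}")
    return {**user, **tenant}


saas_identity = build_saas_identity_claims(user_claims, tenant_claims)
```

The resulting saas_identity dictionary carries both the user attributes and the tenant context, which is the payload you’d expect to find inside the token.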

While there’s nothing magical or elegant about this model, being able to introduce these custom claims as first-class citizens provides a significant upside. Imagine how having these attributes embedded as claims ends up shaping your multi-tenant authentication and authorization experience. Figure 4-12 highlights how this seemingly simple construct ends up having a cascading impact across your multi-tenant architecture.

Figure 4-12. Authenticating with embedded tenant context

In this example, the flow starts at the web application. The user hits this page and is not authenticated, which sends them off to the identity provider for authentication (step 1). This represents a very familiar and vanilla flow that you’ve likely built multiple times. What’s different is the data that comes back from this authentication experience. When you authenticate here, the identity provider is going to return its standard tokens (step 2). However, because I’ve configured the identity provider with tenant-specific custom claims, the tokens that are returned now align with the SaaS identity that was discussed earlier. The tokens participate and behave like any other token, but are enriched with the added tenant context that we need to create a SaaS identity.

Now, these tokens can be injected as bearer tokens and sent downstream to your backend services, inheriting all the security, lifecycle, and other mechanisms that are built into the OIDC and OAuth specifications (step 3). This strategy is particularly powerful when you look at how it impacts the broader experience of your backend services. Figure 4-13 provides an example of how these tokens could flow through the different multi-tenant microservices of your application.

Figure 4-13. Passing tokens to downstream microservices

I have three different microservices, each of which requires access to tenant context. When you authenticate and receive a tenant-aware token, this token is passed into whichever microservice you’re initially calling. In this case, the call comes into the Order microservice (step 1). Then, imagine that this service needs to invoke another backend service (Product) to complete its task. It can then pass this same token along to the Product service (step 2). This pattern can then continue to cascade through additional downstream service invocations (step 3). In this example, I’ve assumed I can insert the JWT into an HTTP request as a bearer token. However, even if you’re using another protocol here, there are likely ways you can inject this JWT as part of your context.
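Here is a minimal Python sketch of what a service might do with that bearer token: verify it, read the tenant context from its claims, and forward the same header downstream. To stay self-contained it assumes an HS256-signed token with a shared secret and hypothetical tenant_id and tenant_tier claims. Many identity providers sign with asymmetric keys instead, and a production service would use a vetted JWT library that also checks expiry, issuer, and audience.

```python
import base64
import hashlib
import hmac
import json


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(segment):
    # JWT segments omit base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims, secret):
    """Helper that plays the identity provider's role for this sketch."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256(token, secret):
    # Simplified check: signature only (no expiry, issuer, or audience validation).
    header, payload, signature = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid token signature")
    return json.loads(_b64url_decode(payload))


def tenant_context(authorization_header, secret):
    """Resolve tenant context from the token itself, with no external lookup."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise ValueError("expected a bearer token")
    claims = verify_hs256(token, secret)
    return {"tenant_id": claims["tenant_id"], "tier": claims["tenant_tier"]}


def downstream_headers(authorization_header):
    # Forward the same token unchanged to the next service in the chain.
    return {"Authorization": authorization_header}
```

In this sketch, the Order service would call tenant_context on the incoming header for its own work and pass downstream_headers along when it invokes the Product service.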

You can imagine how this very simple mechanism ends up having a rather profound impact on your overall multi-tenant architecture. This single JWT will touch so many of the moving parts within the implementation of your multi-tenant environment. Microservices will use it for logging, metrics, billing, tenant isolation, data partitioning, and a host of other areas. Your broader SaaS architecture will use it for tiering, throttling, routing, and other global mechanisms that require tenant context. So, yes, it’s a simple concept, but the importance of its role within your SaaS architecture cannot be overstated.

Populating Custom Claims During Onboarding

We’ve now seen how custom claims give us a way to connect users to tenants. What may be less clear is how and when these claims are actually introduced. There are two pretty straightforward steps associated with adding and populating these custom claims. First, before you onboard any tenant, you’ll typically need to configure your identity provider, identifying each of the custom attributes that you’d like to have added to your authentication experience. Here, you’ll define each property and type that you’ll want to end up in your custom claims. This prepares your identity provider to store these additional attributes for the users of each new tenant.

The second half of this process is executed during onboarding. Earlier, I discussed the creation of the tenant administration user as part of the overall onboarding flow. However, what I didn’t mention was the population of the custom claims for your newly created tenant. As you’re adding the information about your user (name, email, etc.), you’ll also populate all the tenant context fields for that user (tenant ID, role, tier). This data must be populated for each user within the identity provider, so even after onboarding has been completed, the introduction of additional users must include the population of these custom attributes.
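The two steps can be sketched with a small in-memory stand-in for an identity provider. The InMemoryIdentityProvider class, its create_user method, and the attribute names are all hypothetical; a real provider exposes its own administration API for defining attributes and creating users.

```python
class InMemoryIdentityProvider:
    """Hypothetical stand-in for an identity provider's admin API."""

    def __init__(self, custom_attributes):
        # Step 1: the custom attributes are declared once, before any tenant onboards.
        self.custom_attributes = set(custom_attributes)
        self.users = {}

    def create_user(self, email, name, **custom):
        # Step 2: every user is created with the full set of tenant context fields.
        unknown = set(custom) - self.custom_attributes
        missing = self.custom_attributes - set(custom)
        if unknown:
            raise ValueError(f"attributes not configured: {sorted(unknown)}")
        if missing:
            raise ValueError(f"tenant context missing: {sorted(missing)}")
        self.users[email] = {"email": email, "name": name, **custom}
        return self.users[email]


idp = InMemoryIdentityProvider(["tenant_id", "tenant_role", "tenant_tier"])

# Onboarding creates the tenant administration user with tenant context attached.
admin = idp.create_user(
    "admin@acme.example",
    "Acme Admin",
    tenant_id="tenant-1",
    tenant_role="admin",
    tenant_tier="premium",
)
```

Rejecting users that lack tenant context is one way to make sure users added after onboarding carry the same attributes.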

Using Custom Claims Judiciously

Custom claims are a useful construct for attaching tenant context to your tokens. In some cases, teams will get attached to this mechanism and expand its role, using it to capture and convey application security context. While there are no hard and fast rules here, I generally assume that if something is a custom claim, it’s playing an essential role in shaping tenant context and influencing your global authorization story.

Many applications rely on access control constructs to enable or disable access to specific application functionality. These controls should be managed outside the scope of your identity provider. Generally, I’d view it as a mistake to bloat your tokens with custom claims that are part of your traditional application access control strategy. Instead, these kinds of controls should be implemented with any one of the language or technology stack mechanisms built exclusively for this purpose.

There may be times where it’s unclear whether an attribute belongs in a custom claim or your application access control model. To me, this is often resolved based on the lifecycle and role of the attribute. If the attribute tends to be evolving with the introduction of application features, functions, and capabilities, then it should be managed more through application access controls. Generally, attributes that land in your custom claims are unlikely to be changing as your application changes. The content of your tokens, for example, should not be shifting on a weekly basis based on the addition of new application features or configuration options.

No Centralized Services for Resolving Tenant Context

Some teams try to draw a harder line between tenant identity and user identity. In these environments, the identity provider is only used to authenticate users. Here, when a user is authenticated, the tokens returned from this process do not include any tenant contextual information. In this model, these systems must rely on some downstream mechanisms to resolve tenancy. Figure 4-14 provides an example of how this might be implemented.

Figure 4-14. Using a separate user/tenant mapping service

In this example, the web application authenticates against an identity provider that has no awareness of tenant context (step 1). A successful authentication here will still return the JWTs we discussed. However, these tokens will not include any of the tenant-specific custom claims that were outlined earlier (step 2). Instead, the only data here is user data. This token is then passed along to the Order microservice (step 3). Now, when the Order service needs to access data for a specific tenant, it needs to identify which tenant is associated with the current request. Since the JWT doesn’t include this information, your code would need to acquire the context from another service (step 4). In this example, I’ve introduced a Tenant Mapping service that takes the JWT, extracts the user information, resolves the user to a tenant, and returns the tenant identifier (step 5). This identifier is then used to get an order for this specific tenant (step 6).

On the surface this may seem like a perfectly valid strategy. However, it actually presents real challenges for many SaaS environments. The lesser of the issues here is that it creates a hard separation between the user and the tenant, requiring teams to manage the coupled state of the user and the tenant independently. The bigger issue, though, is that every service in the system must go through this centralized mapping mechanism to resolve tenant context. Imagine this step being performed across hundreds of services and thousands of requests. Many who adopt this approach quickly discover that this Tenant Mapping service ends up creating a significant bottleneck in their system, which then leads teams down a path of trying to optimize a service that is actually providing no business value.

This is another reason why it’s so essential that the user and tenant contexts are bound together and shared universally across the entire surface of your multi-tenant architecture. As a rule of thumb, my goal is to never have a service need to invoke some external mechanism to resolve and acquire tenant context. You want to have most everything you need to know about the tenant shared through the JWT that includes your SaaS identity information. Yes, there may be exceptions, but this should be the general mindset you take when thinking about how you’re mapping users to tenants.

Federated SaaS Identity

Most of what I’ve described so far assumes that your SaaS system will be able to run with a single identity provider that is under your control. While this represents the ideal scenario and maximizes your options, it’s also not practical to assume that every SaaS solution is built with this model. Some SaaS providers face business, domain, or customer needs that require them to support a customer or third-party hosted identity provider.

One common case I’ve seen is a scenario where a SaaS customer has some enterprise dependency on an existing, internal identity provider. Some of these customers may, as a condition of their purchase, require a SaaS provider to support authentication from these internal identity providers. These cases often come down to weighing the value of acquiring this customer against adding complexity to your environment that could impact the agility and operational efficiency of your overall SaaS experience. Still, when the right opportunity presents itself, the business parameters can push teams toward strategies that allow them to support this model.

Typically, this is achieved through some added level of tenant configuration where your tenant onboarding will add support for configuring this externally hosted identity provider. The goal would be to make this as seamless as possible, limiting the introduction of any invasive or one-off code that would include tenant-centric customization. The other challenge is that, in some cases, you’ll need to provide side-by-side support for the external and internal identity providers. The reality is that most of your customers are likely to expect your solution to include built-in identity support. Figure 4-15 provides a view of the moving parts of this identity pattern.

Figure 4-15. Supporting externally hosted identity providers

At the center of this example, you’ll see that I have an authentication manager. This is a conceptual placeholder for introducing some service into your authentication flow that can support a more distributed set of identity providers. To make this work, your system will always need to determine where a given tenant’s identity provider is hosted. Each time a user needs to authenticate, you’ll need to examine that user and retrieve the identity configuration, which will include data that describes the location and configuration of the identity provider used by that user’s tenant.
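As a rough illustration of the lookup an authentication manager might perform, the sketch below resolves a user to a tenant and then to an identity provider configuration, falling back to the internally hosted provider when the tenant has no override. Every name here (`IdentityProviderConfig`, `USER_TENANTS`, `TENANT_IDP_OVERRIDES`, the example URLs) is hypothetical, and the in-memory dictionaries simply stand in for whatever configuration store your onboarding process populates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityProviderConfig:
    hosting: str      # "internal" or "external"
    issuer_url: str   # where authentication requests are sent


# Hypothetical default: the identity provider hosted by the SaaS provider.
INTERNAL_IDP = IdentityProviderConfig("internal", "https://idp.saasprovider.example")

# Hypothetical per-tenant overrides captured during onboarding.
TENANT_IDP_OVERRIDES = {
    "tenant-a": IdentityProviderConfig("external", "https://login.tenant-a.example"),
}

# Hypothetical user-to-tenant data consulted only at authentication time.
USER_TENANTS = {
    "pat@tenant-a.example": "tenant-a",
    "lee@example.com": "tenant-b",
}


def resolve_identity_provider(email: str) -> IdentityProviderConfig:
    """Pick the identity provider that should authenticate this user."""
    tenant_id = USER_TENANTS.get(email.lower())
    if tenant_id is None:
        raise LookupError(f"no tenant found for {email}")
    # Tenants without an override use the internally hosted provider.
    return TENANT_IDP_OVERRIDES.get(tenant_id, INTERNAL_IDP)
```

Keeping the override as data rather than branching on specific tenants in code is one way to avoid the one-off, tenant-centric customization mentioned earlier.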

On the lefthand side of Figure 4-15, I’ve included a mix of internally and externally hosted identity providers that need to be supported by a single SaaS experience. Two tenants are using their own identity provider. The remaining tenants are using your internally hosted identity provider.

This model seems pretty straightforward. However, the twist is that your system has no control over these external identity providers. As such, you can’t configure the claims of these providers or have your onboarding process add additional tenant context to the identity data that’s managed by these providers. This means that the JWTs returned from your authentication requests will not include any of the tenant context that is essential to your multi-tenant environment. To resolve this, your solution will need to introduce new functionality that enriches the tokens returned from these external identity providers with tenant context that is managed within your SaaS environment. This allows all downstream services to continue to rely on tenant-aware JWTs regardless of which identity provider was used to authenticate your user. How these tokens are enriched will depend on the nature of your solution. Some strategies provide hooks that allow you to dynamically inject the added tenant context. In other instances, you may need a more custom solution. Generally, though, the federation models of the identity space offer you different techniques to deal with this use case.
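One way to picture the enrichment step is as a function that takes the already validated claims from an external identity provider and layers on tenant context held by your own environment. The sketch below assumes the external token has been verified before this point and that the enriched claims would then be re-signed by your system; the `TENANT_CONTEXT_BY_ISSUER` table, the claim names, and the example issuer URL are all invented for illustration.

```python
# Hypothetical tenant context managed inside the SaaS environment,
# keyed by the issuer of the external identity provider.
TENANT_CONTEXT_BY_ISSUER = {
    "https://login.tenant-a.example": {
        "tenant_id": "tenant-a",
        "tenant_tier": "premium",
    },
}


def enrich_claims(external_claims: dict) -> dict:
    """Return a copy of the external claims with tenant context added."""
    issuer = external_claims["iss"]
    context = TENANT_CONTEXT_BY_ISSUER.get(issuer)
    if context is None:
        raise LookupError(f"no tenant registered for issuer {issuer}")
    enriched = dict(external_claims)  # leave the caller's dictionary untouched
    # Tenant context from our own records always wins over anything
    # the external provider happened to send under the same claim names.
    enriched.update(context)
    return enriched
```

Downstream services would then see the same tenant claims they see for internally authenticated users, which is the whole purpose of the enrichment.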

I’ve included this model because it represents an inevitable pattern that appears in the wild. It’s worth noting that there are clear downsides to this approach. Any time you have to insert yourself into the authentication flow, you are taking on an added role within the security footprint of your multi-tenant architecture. You may also be required to address scale and single point of failure requirements that come with sitting in the authentication flow. So, while this may be necessary, it comes with real baggage that you’ll want to consider carefully.

Tenant Grouping/Mapping Constructs

While identity providers often conform to well-established specifications (OIDC, OAuth 2.0), the constructs that are used to organize and manage identities can vary from one identity provider to the next. These providers offer a range of different constructs to group and organize users. This is especially important in multi-tenant environments where you may want to group all the users that belong to a tenant together. These group constructs can have implications that will influence how you land tenants within your identity provider. In some cases, you might also be able to use these groups to apply tiering policies to tenants to shape their authentication and authorization experience.

If we look at Amazon Cognito, for example, you’ll see that it offers multiple ways to organize tenants. Cognito introduces the idea of a User Pool. These User Pools are used to hold a collection of users, and they can be individually configured, allowing pools to offer separate authentication experiences. This might lead to a User Pool per tenant model where each tenant would be given its own pool. The alternative would be to put all tenants in a single User Pool and use other mechanisms (groups, for example) to associate users with tenants. You’d also want to consider how any limits from your identity provider might factor into choosing a strategy.

There are trade-offs you’ll want to consider as you pick between these different identity constructs. The number of tenants you have, for example, might make it impractical to have separate User Pools for every tenant. Or you may not need much variation between tenants and prefer to have all tenants configured and managed collectively. You might also be thinking about how the choices you make here could impact the authentication flow of your SaaS solution. If you have separate User Pools for each tenant, you need to think about how to map tenants to their designated pools during the authentication process. This may add a level of indirection that you may not want to absorb as part of your solution.

Scale, identity requirements, and a host of other considerations are going to shape how you choose to map tenants to whichever constructs are supported by your identity provider. The key is that as you start to lay out your SaaS identity strategy, you’ll want to identify the different units of organization that can be used to group your tenants and determine how those will shape the scale, authentication, and configuration of your multi-tenant authentication experience.

With different organizational constructs also come different identity configuration options. Identity providers generally provide a range of options that can be used to configure your authentication experience. Multi-factor authentication (MFA), for example, is offered as an identity feature that can be enabled or disabled. You can also configure password formatting requirements and expiration policies.

The settings for these different configuration options do not have to be globally applied to all of your tenants. You may want to make different identity features available to different tenant tiers. Maybe you’ll only make MFA available to your premium tier tenants, or you might decide to surface these configuration options within the tenant administration experience of your SaaS application and allow each tenant to configure these different identity settings. This can be a differentiating feature that can add value for your tenants and allow them to create the identity experience that best fits the needs of their business.
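A small sketch can show how tier-level limits and tenant-level preferences might combine. Here I assume two hypothetical tiers, where MFA is only available to the premium tier and each tier sets a floor on password length; the names `TierIdentityOptions`, `TIER_OPTIONS`, and `effective_identity_settings` are illustrative and do not correspond to any particular identity provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierIdentityOptions:
    mfa_available: bool       # can tenants in this tier turn on MFA?
    min_password_length: int  # floor that tenants may raise but not lower


# Hypothetical tier definitions.
TIER_OPTIONS = {
    "basic": TierIdentityOptions(mfa_available=False, min_password_length=8),
    "premium": TierIdentityOptions(mfa_available=True, min_password_length=12),
}


def effective_identity_settings(tier: str, tenant_prefs: dict) -> dict:
    """Combine what the tier allows with what the tenant admin asked for."""
    options = TIER_OPTIONS[tier]
    requested_length = tenant_prefs.get(
        "min_password_length", options.min_password_length
    )
    return {
        # MFA is on only if the tier offers it and the tenant opted in.
        "mfa_enabled": options.mfa_available
        and bool(tenant_prefs.get("mfa_enabled", False)),
        # Tenants can tighten the password policy, never loosen it.
        "min_password_length": max(requested_length, options.min_password_length),
    }
```

Whether settings like these can actually be applied per tenant still depends on the constructs your identity provider exposes.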

How or if you can offer this identity customization will depend on how your specific identity provider organizes and surfaces these options. Some providers will allow you to configure this separately for individual tenants, and others will only allow this to be configured globally. You’ll need to dig into the constructs of your specific identity provider to figure out whether you can associate these identity policies with individual tenants.

Sharing User IDs Across Tenants

Each user of your SaaS system has some user ID that identifies that user to a tenant. This user identifier is often represented by an email address. In many cases, a single user will be associated with a single SaaS tenant. However, there are times when SaaS providers have an interest in associating a single email address with many tenants. This, of course, adds a level of indirection to your authentication. Somewhere in your login flow, your SaaS system will need to determine which tenant you’re accessing.

While I’ve seen requests to support this model, I have yet to uncover any out-of-the-box strategy for handling this use case. That being said, there are some patterns that I have seen applied here. The most brute force way I’ve seen is one that pushes the tenant resolution to the end user; during sign-in, the system will detect that a user belongs to multiple tenants and will prompt the user to select a target tenant. This is anything but elegant and it does create an information leak in that anyone can use an email address to see which tenants you belong to (if and only if you belong to more than one). In this model, you’d have a mapping table that connects users to tenants and you would use this as a lookup in advance of starting the authentication flow.
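The brute force pattern can be sketched as a lookup that runs before authentication begins. In this illustration, `USER_TENANT_MAP` stands in for the mapping table, and `resolve_login_tenant` returns `None` when the caller needs to prompt the user to choose a tenant; both names and the sample data are hypothetical.

```python
from typing import Optional

# Hypothetical mapping table connecting user IDs (emails) to tenants.
USER_TENANT_MAP = {
    "pat@example.com": ["tenant-a", "tenant-b"],
    "lee@example.com": ["tenant-c"],
}


def resolve_login_tenant(email: str, selected: Optional[str] = None) -> Optional[str]:
    """Decide which tenant to authenticate against before login starts.

    Returns the tenant ID, or None when the user belongs to several
    tenants and has not yet picked one (the caller should prompt).
    """
    tenants = USER_TENANT_MAP.get(email.lower(), [])
    if not tenants:
        raise LookupError(f"no tenant found for {email}")
    if len(tenants) == 1:
        return tenants[0]
    if selected is None:
        return None  # multiple tenants: the UI must ask the user to choose
    if selected not in tenants:
        raise PermissionError(f"{email} is not a member of {selected}")
    return selected
```

Notice that the `None` result is exactly where the information leak shows up: the prompt itself reveals that the email address is tied to more than one tenant.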

A cleaner approach to this would be to rely on an authentication experience that supplied context more explicitly. The best example is probably domains and subdomains. If each of your tenants is assigned a subdomain (tenant1.saasprovider.com), your authentication process can use this subdomain to acquire the tenant context. Then the system would authenticate you against the specified tenant. This would allow the user to authenticate without any intermediate process to identify the target tenant.
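Here is a minimal sketch of pulling the tenant out of the host name of an incoming request. It assumes a single-level subdomain under the `saasprovider.com` domain used in the example above; the function name and the decision to reject nested subdomains are my own simplifications, and a real deployment would also need to exclude reserved names such as `www`.

```python
from typing import Optional

BASE_DOMAIN = "saasprovider.com"  # taken from the example in the text


def tenant_from_host(host: str) -> Optional[str]:
    """Extract the tenant subdomain from a Host header value, if present."""
    # Drop any port, normalize case, and ignore a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    subdomain = hostname[: -len(suffix)]
    # Only accept a single label such as "tenant1", not "a.b".
    if not subdomain or "." in subdomain:
        return None
    return subdomain
```

With the tenant known up front, the login flow can go straight to the right identity construct without asking the user anything.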

There are other complications in this scenario. Imagine, for example, all of your users are running in a shared identity provider construct. In that model, the identity provider is going to require each user to be unique. This would make it impossible to support having a single user ID associated with multiple tenants. Instead, you may want to consider relying on a more granular construct to hold each tenant’s users (like the User Pool mentioned earlier).
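The uniqueness constraint is easy to see with a toy model. The `UserPool` class below is a deliberately simplified stand-in for an identity provider construct (it is not the Cognito API); it only enforces that a user ID appears once per pool, which is enough to show why a shared pool blocks the same email from joining two tenants while per-tenant pools do not.

```python
class UserPool:
    """Toy stand-in for an identity construct that requires unique user IDs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._users: dict = {}  # email -> tenant_id

    def add_user(self, email: str, tenant_id: str) -> None:
        key = email.lower()
        if key in self._users:
            raise ValueError(f"{email} already exists in pool {self.name}")
        self._users[key] = tenant_id


def shared_pool_allows_second_tenant(email: str) -> bool:
    """Try to register one email for two tenants in a single shared pool."""
    pool = UserPool("shared")
    pool.add_user(email, "tenant-a")
    try:
        pool.add_user(email, "tenant-b")
    except ValueError:
        return False
    return True


def per_tenant_pools_allow_second_tenant(email: str) -> bool:
    """Register the same email once in each tenant's own pool."""
    pools = {"tenant-a": UserPool("tenant-a"), "tenant-b": UserPool("tenant-b")}
    for tenant_id, pool in pools.items():
        pool.add_user(email, tenant_id)
    return True
```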

Tenant Authentication Is Not Tenant Isolation

As part of this discussion of authentication and JWTs, I sometimes find that teams will equate authentication to tenant isolation. The assumption here is that authentication is the barrier to entry for tenants and that, once you’ve made it beyond that challenge, you have met the criteria for tenant isolation in multi-tenant environments.

This is definitely an area of disconnect. Yes, authentication starts the isolation story by issuing a JWT with tenant context. However, the code in your microservices can still, even when working on behalf of an authenticated user, access the resources of another tenant. Tenant isolation builds upon the tenant context that you get from an authenticated user, implementing a completely separate layer of controls and measures to ensure that your code is not allowed to cross a tenant boundary. You’ll get a deeper look at these strategies in Chapter 9.
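To illustrate the gap, the sketch below contrasts two data access functions that both run on behalf of an authenticated user. The `ORDERS` data, the function names, and the `TenantIsolationError` type are all hypothetical, and the explicit check shown here is only the simplest possible guard, not a representation of the isolation strategies covered later in the book.

```python
# Hypothetical data store shared by multiple tenants.
ORDERS = {
    "order-1": {"tenant_id": "tenant-a", "total": 120},
    "order-2": {"tenant_id": "tenant-b", "total": 75},
}


class TenantIsolationError(Exception):
    """Raised when code tries to cross a tenant boundary."""


def get_order_unisolated(order_id: str, claims: dict) -> dict:
    # The caller is authenticated, but nothing stops a cross-tenant read.
    return ORDERS[order_id]


def get_order_isolated(order_id: str, claims: dict) -> dict:
    # Same lookup, plus a separate control scoped by the token's tenant context.
    order = ORDERS[order_id]
    if order["tenant_id"] != claims["tenant_id"]:
        raise TenantIsolationError(
            f"tenant {claims['tenant_id']} may not read {order_id}"
        )
    return order
```

An authenticated user from `tenant-a` can read `order-2` through the first function without any error, which is precisely why authentication alone does not amount to isolation.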

Conclusion

This chapter was all about describing the foundational elements that represent the starting point for creating a multi-tenant architecture. My focus was on introducing the core constructs that are used to inject the notion of tenancy into your architecture. You’ll notice that nothing about these first steps includes any effort to define the application experience. Instead, it’s putting tenancy front and center in your architecture. Putting these fundamental pieces in place early will require your team to design, build, test, and operate in a multi-tenant context across all the stages of your development process. From day one, your architecture will need to account for all the dynamics that come with supporting multiple tenants. The overall goal is to avoid the trap of viewing multi-tenancy as a bolt-on that can be added after you’ve built your application. That mode rarely works and usually leads to painful compromises and refactoring.

We started the chapter at the most basic level, exploring the process of creating your baseline environment and deploying the first bits of your control plane. Getting the shell of the control plane in place allows you to carve out the space that will eventually house all the services that will be part of it. It also forces you to begin thinking about the overall deployment, versioning, and general lifecycle of your control plane.

From there, we shifted our attention to the onboarding experience, highlighting the complexity, challenges, and considerations that come with introducing tenants into your environment. We walked through a conceptual view of an onboarding flow to give you a better sense of the moving parts that are part of this experience. A big part of this discussion also surrounded the mindset that comes with automating your onboarding flow. It’s here that we saw how this automation brings new DevOps nuances to your environment, stretching how you might think about where and when tenant environments are provisioned and configured. Our look at onboarding also emphasized the broader role it plays in supporting and enabling the scale, agility, and innovation goals of your business.

With onboarding, we talked about how tenants get introduced into your environment. The natural progression was to look at how the setup of these tenants influences the authentication experience of your environment. It’s through authentication that we see some of the payoff of the work that was done during onboarding. Our review of authentication shifted our focus to the role identity plays in a SaaS environment. We examined how our identity provider creates a connection between users and tenants, establishing what I referred to as a SaaS identity. This makes SaaS identity a first-class concept in our architecture. We explored how the authentication of tenants yields tokens that include all the context we need to inject into all the downstream bits of our SaaS architecture. This should have highlighted just how essential it is to have this SaaS identity woven into your experience from the outset of building a multi-tenant environment.

While I’ve only touched on the conceptual elements of onboarding and identity, this should give you a better sense of the moving parts and considerations that come with creating these foundational constructs. As we move forward, we’ll see more concrete versions of these mechanisms and see how different deployment models and technology stacks can influence the design and implementation of onboarding and identity. We’ll also see this notion of tenant context showing up in our review of other dimensions of your architecture, including data partitioning, tenant isolation, multi-tenant microservices, and so on.

First, though, we’re going to look a little deeper inside the control plane and examine the Tenant Management component. This chapter already hinted at how Tenant Management surfaces as part of the onboarding experience. Now, I want to look more exclusively at the role of this service within your control plane. While not exotic or overly complex, it often sits at the center of our multi-tenant story. I’ll look at what it means to create this service and outline some of the key considerations that can influence its implementation.
