Welcome back to the “Breaking the Stack” blog series. In this part (Part 3), we will take an in-depth look at the pool tenant isolation model. This section frequently references the foundational knowledge and concepts we built up in Part 1 and Part 2, so if you are feeling lost or unsure, go back and read those parts first! After exploring the pool isolation model, we will spend some time discussing monolith decomposition, the services AWS provides to achieve tenant isolation, and finally, a decision framework for choosing a new tenancy isolation model (or not) as part of your cloud migration strategy.
The pool tenancy isolation model is characterized by every resource in the stack being shared across tenants. This is by far the most common model in modern systems, especially for web and SaaS applications. Most applications in this model share a few key indicators: a single data store shared by all tenants (typically keyed by tenant identifiers), a shared application tier, and one deployment serving every tenant.
Let’s have a look at a diagram to better illustrate this tenancy model.
Diagram 3
Now that we have a good understanding of the pool isolation model, let’s dive into the considerations you may need to make when migrating an app that uses it. Most of the migrations I have been involved in for these types of apps are characterized by large data transfers and a single cut-over event. In the silo isolation model, we were able to migrate one client at a time, allowing for a phased approach. For pool isolation model apps, we have to move all of the data at once and then perform a single cut-over. If there are multiple services, they can usually be done in phases, but moving the data is where these migrations typically grow in complexity.
With silo isolation, we can use tools like backup and restore to move databases. This process usually involves a maintenance window in which the data delta is moved (i.e., any changes made between the full backup and the cut-over window). For pool isolation model apps, especially at scale, these methodologies start to break down. For one, the full backup could be massive, as it contains every tenant’s data. Second, depending on how much data has changed, a differential backup could take longer to transfer over the network and restore on the target machine than our scheduled maintenance window allows. This is where transactional replication, log shipping, logical decoding, change data capture, and many other migration strategies come into play. By continuously streaming changes from the source database to the target, we can keep the cut-over maintenance window relatively short. If someone promises a zero-downtime cut-over, inspect that claim and their strategy with scrutiny; I prefer the term near-zero downtime cut-over.
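To make the streaming approach concrete, here is a minimal sketch of one way to do it on AWS: a Database Migration Service (DMS) task that performs a full load followed by ongoing change data capture. The endpoint and replication instance ARNs are placeholders, and your table-mapping rules will almost certainly be more selective than this.

```python
# A minimal sketch of a near-zero downtime migration task using AWS DMS with
# change data capture (CDC). The ARNs below are placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all-tables",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

response = dms.create_replication_task(
    ReplicationTaskIdentifier="pool-app-full-load-and-cdc",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
    # Full load first, then stream ongoing changes until the cut-over window.
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
print(response["ReplicationTask"]["Status"])
```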
What if we are migrating a silo or bridge isolation model app and wish to move toward the pool isolation model? You may want to do this to modernize your stack, take advantage of certain economies of scale, or lower your maintenance burden. If the databases in your source environment are isolated per tenant and you want to combine them, you may be in for some serious refactoring. Remember, the mechanism that allows tenants to share a single database is tightly coupled to the actual data model. Inserting unique tenant identifiers into every row of every table, and then updating application code to leverage those identifiers in every query, can be quite the undertaking. Not only that, but the cost of getting it wrong could be compliance violations or data breaches. This is why it is so important to let compliance and performance requirements drive this architecture decision. If adding tenant identifiers to tables and updating the queries to use them is doable, and combining all the tenants into one database cluster will not hurt performance, then this change could be viable for your application.
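To illustrate the kind of refactoring involved, here is a hypothetical before/after sketch of a query moving from a per-tenant database to a pooled one. The table, column, and driver choices are illustrative only.

```python
# A hypothetical sketch of scoping queries by tenant once databases are combined.
# The table and column names (invoices, tenant_id) are illustrative.
import sqlite3  # stands in for your real database driver

def get_invoices_siloed(conn: sqlite3.Connection):
    # Silo model: the connection already points at one tenant's database,
    # so queries carry no tenant context.
    return conn.execute("SELECT id, total FROM invoices").fetchall()

def get_invoices_pooled(conn: sqlite3.Connection, tenant_id: str):
    # Pool model: every table carries a tenant identifier and every query
    # must filter on it. Missing this filter is how data leaks across tenants.
    return conn.execute(
        "SELECT id, total FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()
```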
More common than transitioning from isolated DBs to pooled DB resources is simply running the isolated DBs on a single server. If your source environment has many database servers, each hosting some number of tenant databases, consolidating onto a single database server could be the right move. Just make sure to test extensively. If you have hard-coded database users in your application, this modernization step may require some refactoring of how applications set up and maintain database connection strings. This is a great time to introduce services like AWS Secrets Manager and AWS Systems Manager Parameter Store.
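As a sketch of what that refactoring could look like, the snippet below pulls a tenant database’s connection settings from AWS Secrets Manager instead of hard-coding them. The secret naming scheme is an assumption.

```python
# A minimal sketch of replacing hard-coded database credentials with a lookup
# in AWS Secrets Manager. The secret name pattern is a hypothetical convention.
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

def get_connection_settings(tenant_db: str) -> dict:
    # One secret per tenant database, e.g. "app/db/tenant-acme" (hypothetical naming).
    secret = secrets.get_secret_value(SecretId=f"app/db/{tenant_db}")
    return json.loads(secret["SecretString"])

settings = get_connection_settings("tenant-acme")
# settings would hold host, port, username, and password for the consolidated server
```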
This may seem like heresy to some, but you can also use the migration of a pool tenancy isolation model app to change to a bridge or silo model. For instance, if you are having serious performance issues with monolithic services or large databases, splitting them into isolated services may be just the remedy you need. This is especially true if some tenants have very strict compliance or performance requirements and others don’t. Moving certain very large or busy tenants onto their own dedicated resources can take pressure off all of the other tenants; the pressure they exert is commonly referred to as the noisy neighbor problem. Sometimes, when we design applications, we design them for the traffic we have at the time (which is usually very low to begin with). As the application ages, new features are added and the code base grows. At the same time, traffic keeps increasing and the number of users continues to grow (hopefully). If scaling becomes an issue for your application, and no amount of refactoring is fixing it, or you don’t have the dev talent to refactor, then changing isolation models could be the right move for you.
Finally, let’s talk about monitoring… If you are migrating from a silo isolation model to a bridge or pool isolation model, be sure to explore how monitoring and observability will work. When tenants are isolated, troubleshooting typically involves going to that isolated environment to view logs, metrics, traces, etc. If we add a shared resource layer, we need to ensure that those logs and metrics tell us which tenant we are looking at. We can do this by making the shared resources tenant-aware and injecting tenant identifiers into the logs they emit. This is often referred to as application instrumentation. You don’t want to end up in a spot where there are error messages and you don’t know which isolated client or database they originated from.
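Here is a minimal instrumentation sketch using Python’s standard logging module: a filter attaches a tenant identifier to every record a shared service emits. In a real system, the tenant id would come from the request context rather than a constant.

```python
# A minimal sketch: inject a tenant identifier into every log record emitted by
# a shared service, so errors can be traced back to the tenant that caused them.
import logging

class TenantFilter(logging.Filter):
    def __init__(self, tenant_id: str):
        super().__init__()
        self.tenant_id = tenant_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.tenant_id = self.tenant_id  # attach tenant context to the record
        return True

logging.basicConfig(format="%(asctime)s %(levelname)s tenant=%(tenant_id)s %(message)s")
logger = logging.getLogger("orders")
logger.addFilter(TenantFilter("tenant-acme"))  # in practice, derived from the request

logger.warning("payment retry failed")
# -> ... WARNING tenant=tenant-acme payment retry failed
```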
As I mentioned above, migrating provides many opportunities to refactor and segment our systems. A common modernization path during migrations is decomposing monoliths. By breaking different parts of our monolith out into microservices, we can start to apply different tenant isolation strategies at different layers. This can be achieved using the strangler pattern or by creating a proxy service. As an example, consider the following.
Transition to leveraging a modern centralized identity and access management (CIAM) system. These systems typically let you add metadata to a user or group of users. These metadata flags, sometimes called custom claims or properties, can be used to help route between pooled and isolated resources, as sketched below. Examples of services with this capability include Amazon Cognito, Azure AD B2C, and Auth0. The added benefit of moving to one of these systems is the availability of features like MFA and self-service password resets, along with a reduction in maintenance overhead. The so-called “front door” of the application is the best place to start handling tenancy concerns, as logging in is usually the first step a user takes when accessing your software.
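As a sketch of claim-driven routing, the snippet below shows a thin proxy-style function deciding between a pooled backend and a dedicated one based on custom claims from an already-verified token. The claim names and URLs are assumptions, not any particular provider’s schema.

```python
# A hypothetical routing sketch: read custom claims from a token that has
# already been verified by your CIAM integration, then pick a backend.
POOLED_API = "https://pool.example.internal"          # placeholder
SILOED_APIS = {"acme": "https://acme.example.internal"}  # placeholder

def resolve_backend(claims: dict) -> str:
    tenant_id = claims.get("custom:tenant_id", "unknown")
    tier = claims.get("custom:tenant_tier", "pooled")
    if tier == "siloed":
        # Tenants flagged for dedicated resources get routed to their own stack.
        return SILOED_APIS.get(tenant_id, POOLED_API)
    return POOLED_API

# Example usage with illustrative claim values
print(resolve_backend({"custom:tenant_id": "acme", "custom:tenant_tier": "siloed"}))
```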
Let’s talk business logic in the monolith. In a more modern monolith, business logic likely lives in the application layer, between the user interface and where the data lives. This could be an API (or a collection of them; let’s not discount the “microservices monolith”), or it could be a piece of middleware. If this is the case, and you are migrating the application and want to adopt a different tenancy isolation model, you will need to let the business logic drive the isolation strategy in this layer. To illustrate, consider an example where we have a monolithic application with an API layer, where each tenant has its own instance of the API. The primary reason we may have ended up in this situation is that tenants have specific rules regarding authorization workflows, caching policies, or message routing between services. If that is the case, it would be quite the undertaking to switch this layer to a pool model. If the bulk of the business logic lives in this layer, it may imply that the database is not chock-full of business logic in the form of stored procedures like some legacy applications, so maybe the database is a prime candidate for consolidation into a shared resource. For legacy applications where the inverse is true, adding a shared API layer may be just the thing!
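For a feel of the alternative, here is a hypothetical sketch of per-tenant rules living inside a shared API layer instead of in per-tenant API instances. The policy fields are illustrative, not a prescription.

```python
# A hypothetical sketch of per-tenant behavior in a shared API layer: one
# codebase looks up the tenant's policy at request time instead of deploying
# a separate API instance per tenant. Policy fields are illustrative.
from dataclasses import dataclass

@dataclass
class TenantPolicy:
    cache_ttl_seconds: int
    require_approval_workflow: bool

DEFAULT_POLICY = TenantPolicy(cache_ttl_seconds=60, require_approval_workflow=False)
TENANT_POLICIES = {
    "acme": TenantPolicy(cache_ttl_seconds=0, require_approval_workflow=True),
}

def handle_order_request(tenant_id: str, order: dict) -> str:
    policy = TENANT_POLICIES.get(tenant_id, DEFAULT_POLICY)
    if policy.require_approval_workflow:
        return "queued-for-approval"  # tenant-specific authorization workflow
    return "processed"                # shared default path

print(handle_order_request("acme", {"sku": "widget"}))
```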
Let’s investigate how the AWS Cloud helps us achieve our tenancy isolation needs. At the very top layer are AWS Organizations and accounts. If your application, or rather the compliance requirements surrounding your use case, has hard isolation requirements, then putting each tenant into a separate account and managing those accounts with AWS Organizations may be the right direction for you. This strategy is reserved for the most extreme use cases, where isolation must exist at every layer.
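If you do go down the account-per-tenant road, account creation is usually automated. Below is a minimal sketch using the AWS Organizations API; in practice, most teams wrap this in an account-vending pipeline (for example, Control Tower’s Account Factory) rather than calling it directly, and the email naming scheme here is a placeholder.

```python
# A minimal sketch of the "account per tenant" pattern with AWS Organizations.
import boto3

orgs = boto3.client("organizations")

def create_tenant_account(tenant_id: str) -> str:
    response = orgs.create_account(
        Email=f"aws+{tenant_id}@example.com",  # placeholder address scheme
        AccountName=f"tenant-{tenant_id}",
    )
    # Account creation is asynchronous; poll describe_create_account_status
    # with this request id before provisioning anything into the new account.
    return response["CreateAccountStatus"]["Id"]

print(create_tenant_account("acme"))
```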
If you still need isolation but don’t require the extreme of separate AWS accounts, and you have containerized your application, then you can rely on namespaces in Elastic Kubernetes Service (EKS) or separate Fargate tasks in Elastic Container Service (ECS). These container orchestration paradigms are easier to manage logically than separate servers and provide a much easier way to choreograph the various services and pieces of your application stack. This is commonly referred to as soft isolation or logical isolation. If you are not into containers and don’t know about orchestration engines, SQL Server offers a familiar analogy: it allows you to run multiple logically isolated databases on a single machine. I won’t say this is exactly the same thing as namespaces in EKS or Fargate tasks in ECS, but the pattern is close enough to build intuition.
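As a small sketch of the namespace-per-tenant pattern, the snippet below creates a labeled Kubernetes namespace with the official Python client. It assumes your kubeconfig already points at the EKS cluster, and it leaves out the resource quotas and network policies that do the real isolation work.

```python
# A minimal sketch of logical isolation on EKS: one namespace per tenant.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
core = client.CoreV1Api()

def create_tenant_namespace(tenant_id: str) -> None:
    namespace = client.V1Namespace(
        metadata=client.V1ObjectMeta(
            name=f"tenant-{tenant_id}",
            labels={"tenant": tenant_id},  # handy for network policies and cost reports
        )
    )
    core.create_namespace(body=namespace)

create_tenant_namespace("acme")
```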
To quickly name a few more: AWS offers the RDS and Aurora relational database services, which support many engines, most of which have great isolation options. For object storage with Amazon S3, you can leverage bucket policies and prefixes for storage segmentation. For access control and resource management, you can leverage Service Control Policies, IAM permissions boundaries, resource tags, and many more. If you are curious about all the various options available to you, don’t hesitate to drop us a line.
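To make the S3 prefix idea concrete, here is a hypothetical policy document that confines a tenant’s role to its own prefix in a shared bucket. The bucket name and prefix scheme are assumptions.

```python
# A hypothetical sketch of prefix-based segmentation in a shared S3 bucket:
# an IAM policy document that limits a tenant's role to its own prefix.
import json

def tenant_prefix_policy(bucket: str, tenant_id: str) -> str:
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TenantObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/tenants/{tenant_id}/*",
            },
            {
                "Sid": "TenantList",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"tenants/{tenant_id}/*"}},
            },
        ],
    })

print(tenant_prefix_policy("shared-app-bucket", "acme"))
```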
Finally, we have reached the end of our journey. I want to leave you with two things: a decision framework and a quick peek at how tenancy isolation can impact a company’s artificial intelligence ambitions. Let’s have a look at that decision framework for choosing a tenancy isolation model. Five key considerations stand out:
1. Regulatory compliance requirements
2. Noisy neighbor risk
3. Per-tenant customizations
4. Cost optimization
5. Operational complexity
Well, it has been a heck of a journey! This blog series explores the complexities of tenant isolation strategies during cloud migrations, particularly from on-premises hypervisors like VMware to cloud environments like AWS. It defines tenant isolation and discusses how migration forces a rethinking of these strategies due to cost, performance, compliance, and architectural differences.
The blog series details three primary tenancy isolation models: silo (dedicated resources per tenant), bridge (mixed shared and isolated resources), and pool (all shared resources). It explains the considerations, challenges, and opportunities associated with each model during migration, emphasizing the importance of understanding application "touch points" and data layer segmentation.
It highlights that the pool model is common in modern web applications, while the silo model offers isolation flexibility but higher costs. The bridge model provides a balance between sharing and isolation. The choice of model depends on regulatory compliance, noisy neighbor risk, per-tenant customizations, cost optimization, and operational complexity.
The series concludes with the AWS services and constructs that support various isolation needs, emphasizing that the migration process offers significant opportunities to modernize and adapt your tenant isolation strategy. If you made it this far, why not drop us a line at sales@ipponusa.com? We would love to service your tenancy isolation needs.