Welcome back to the “Breaking the Stack” blog series. In this part (Part 3), we will take an in-depth look at the pool tenant isolation model. This section frequently references the foundational knowledge and concepts we built up in Part 1 and Part 2, so if you are feeling lost or unsure, go back and read those parts first! After exploring the pool isolation model, we will spend some time discussing monolith decomposition, the services AWS provides to achieve tenant isolation, and finally, a decision framework for choosing a new tenancy isolation model (or not) as part of your cloud migration strategy.
The pool tenancy isolation model is characterized by every resource in the stack being shared across tenants. This is by far the most common model in modern systems, especially for web and SaaS applications. Most applications in this model share a few key indicators: a single data store shared by all tenants (typically keyed by tenant identifiers), a shared application tier, and one deployment serving every tenant.
Let’s have a look at a diagram to better illustrate this tenancy model.
Diagram 3
Now that we have a good understanding of the pool isolation model, let’s dive into the considerations you may need to make when migrating an app that uses it. Most of the migrations I have been involved in for these types of apps are characterized by large data transfers and a single cut-over event. In the silo isolation model, we were able to migrate one client at a time, allowing for a phased approach. For pool isolation model apps, we have to move all of the data at once and then perform a single cut-over. If there are multiple services, they can usually be done in phases, but moving the data is where these migrations typically grow in complexity.
With silo isolation, we can use tools like backup and restore to move databases. This process usually involves a maintenance window in which the data delta is moved (i.e., any changes made between the full backup and the cut-over window). For pool isolation model apps, especially at scale, these methodologies start to break down. For one, the full backup could be massive, as it contains every tenant’s data. Second, depending on how much data has changed, a differential backup could take longer to transfer over the network and restore on the target machine than our scheduled maintenance window allows. This is where transactional replication, log shipping, logical decoding, change data capture, and many other migration strategies come into play. By continuously streaming changes from the source database to the target, we can keep the cut-over maintenance window relatively short. If someone promises a zero-downtime cut-over, inspect that claim and their strategy with scrutiny; I prefer the term near-zero downtime cut-over.
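To make the streaming approach concrete, here is a minimal sketch of one way to do it on AWS: a Database Migration Service (DMS) task that performs a full load followed by ongoing change data capture. The endpoint and replication instance ARNs are placeholders, and your table-mapping rules will almost certainly be more selective than this.

```python
# A minimal sketch of a near-zero downtime migration task using AWS DMS with
# change data capture (CDC). The ARNs below are placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all-tables",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

response = dms.create_replication_task(
    ReplicationTaskIdentifier="pool-app-full-load-and-cdc",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # placeholder
    # Full load first, then stream ongoing changes until the cut-over window.
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
print(response["ReplicationTask"]["Status"])
```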
What if we are migrating a silo or bridge isolation model app and wish to move toward the pool isolation model? You may want to do this to modernize your stack, take advantage of certain economies of scale, or lower your maintenance burden. If the databases in your source environment are isolated per tenant and you want to combine them, you may be in for some serious refactoring. Remember, the mechanism that allows tenants to share a single database is tightly coupled to the actual data model. Inserting unique tenant identifiers into every row of every table, and then updating application code to leverage those identifiers in every query, can be quite the undertaking. Not only that, but the cost of getting it wrong could be compliance violations or data breaches. This is why it is so important to let compliance and performance requirements drive this architecture decision. If adding tenant identifiers to tables and updating the queries to use them is doable, and combining all the tenants into one database cluster will not hurt performance, then this change could be viable for your application.
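To illustrate the kind of refactoring involved, here is a hypothetical before/after sketch of a query moving from a per-tenant database to a pooled one. The table, column, and driver choices are illustrative only.

```python
# A hypothetical sketch of scoping queries by tenant once databases are combined.
# The table and column names (invoices, tenant_id) are illustrative.
import sqlite3  # stands in for your real database driver

def get_invoices_siloed(conn: sqlite3.Connection):
    # Silo model: the connection already points at one tenant's database,
    # so queries carry no tenant context.
    return conn.execute("SELECT id, total FROM invoices").fetchall()

def get_invoices_pooled(conn: sqlite3.Connection, tenant_id: str):
    # Pool model: every table carries a tenant identifier and every query
    # must filter on it. Missing this filter is how data leaks across tenants.
    return conn.execute(
        "SELECT id, total FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()
```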
More common than transitioning from isolated DBs to pooled DB resources is simply running the isolated DBs on a single server. If your source environment has many database servers, each hosting some number of tenant databases, consolidating onto a single database server could be the right move. Just make sure to test extensively. If you have hard-coded database users in your application, this modernization step may require some refactoring of how applications set up and maintain database connection strings. This is a great time to introduce services like AWS Secrets Manager and AWS Systems Manager Parameter Store.
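As a sketch of what that refactoring could look like, the snippet below pulls a tenant database’s connection settings from AWS Secrets Manager instead of hard-coding them. The secret naming scheme is an assumption.

```python
# A minimal sketch of replacing hard-coded database credentials with a lookup
# in AWS Secrets Manager. The secret name pattern is a hypothetical convention.
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")

def get_connection_settings(tenant_db: str) -> dict:
    # One secret per tenant database, e.g. "app/db/tenant-acme" (hypothetical naming).
    secret = secrets.get_secret_value(SecretId=f"app/db/{tenant_db}")
    return json.loads(secret["SecretString"])

settings = get_connection_settings("tenant-acme")
# settings would hold host, port, username, and password for the consolidated server
```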
This may seem like heresy to some, but you can also use the migration of a pool tenancy isolation model app to change to a bridge or silo model. For instance, if you are having serious performance issues with monolithic services or large databases, splitting them into isolated services may be just the remedy you need. This is especially true if some tenants have very strict compliance or performance requirements and others don’t. Moving certain very large or busy tenants onto their own dedicated resources can take pressure off all of the other tenants; the pressure they exert is commonly referred to as the noisy neighbor problem. Sometimes, when we design applications, we design them for the traffic we have at the time (which is usually very low to begin with). As the application ages, new features are added and the code base grows. At the same time, traffic keeps increasing and the number of users continues to grow (hopefully). If scaling becomes an issue for your application, and no amount of refactoring is fixing it, or you don’t have the dev talent to refactor, then changing isolation models could be the right move for you.
Finally, let’s talk about monitoring… If you are migrating from a silo isolation model to a bridge or pool isolation model, be sure to explore how monitoring and observability will work. When tenants are isolated, troubleshooting typically involves going to that isolated environment to view logs, metrics, traces, etc. If we add a shared resource layer, we need to ensure that those logs and metrics tell us which tenant we are looking at. We can do this by making the shared resources tenant-aware and injecting tenant identifiers into the logs they emit. This is often referred to as application instrumentation. You don’t want to end up in a spot where there are error messages and you don’t know which isolated client or database they originated from.
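Here is a minimal instrumentation sketch using Python’s standard logging module: a filter attaches a tenant identifier to every record a shared service emits. In a real system, the tenant id would come from the request context rather than a constant.

```python
# A minimal sketch: inject a tenant identifier into every log record emitted by
# a shared service, so errors can be traced back to the tenant that caused them.
import logging

class TenantFilter(logging.Filter):
    def __init__(self, tenant_id: str):
        super().__init__()
        self.tenant_id = tenant_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.tenant_id = self.tenant_id  # attach tenant context to the record
        return True

logging.basicConfig(format="%(asctime)s %(levelname)s tenant=%(tenant_id)s %(message)s")
logger = logging.getLogger("orders")
logger.addFilter(TenantFilter("tenant-acme"))  # in practice, derived from the request

logger.warning("payment retry failed")
# -> ... WARNING tenant=tenant-acme payment retry failed
```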
As I mentioned above, migrating provides many opportunities to refactor and segment our systems. A common modernization path during migrations is decomposing monoliths. By breaking different parts of our monolith out into microservices, we can start to apply different tenant isolation strategies at different layers. This can be achieved using the strangler pattern or by creating a proxy service. As an example, consider the following.
Transition to leveraging a modern centralized identity and access management (CIAM) system. These systems typically let you add metadata to a user or group of users. These metadata flags, sometimes called custom claims or properties, can be used to help route between pooled and isolated resources, as sketched below. Examples of services with this capability include Amazon Cognito, Azure AD B2C, and Auth0. The added benefit of moving to one of these systems is the availability of features like MFA and self-service password resets, along with a reduction in maintenance overhead. The so-called “front door” of the application is the best place to start handling tenancy concerns, as logging in is usually the first step a user takes when accessing your software.
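As a sketch of claim-driven routing, the snippet below shows a thin proxy-style function deciding between a pooled backend and a dedicated one based on custom claims from an already-verified token. The claim names and URLs are assumptions, not any particular provider’s schema.

```python
# A hypothetical routing sketch: read custom claims from a token that has
# already been verified by your CIAM integration, then pick a backend.
POOLED_API = "https://pool.example.internal"          # placeholder
SILOED_APIS = {"acme": "https://acme.example.internal"}  # placeholder

def resolve_backend(claims: dict) -> str:
    tenant_id = claims.get("custom:tenant_id", "unknown")
    tier = claims.get("custom:tenant_tier", "pooled")
    if tier == "siloed":
        # Tenants flagged for dedicated resources get routed to their own stack.
        return SILOED_APIS.get(tenant_id, POOLED_API)
    return POOLED_API

# Example usage with illustrative claim values
print(resolve_backend({"custom:tenant_id": "acme", "custom:tenant_tier": "siloed"}))
```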
Let’s talk business logic in the monolith. In a more modern monolith, business logic likely lives in the application layer, between the user interface and where the data lives. This could be an API (or a collection of them; let’s not discount the “microservices monolith”), or it could be a piece of middleware. If this is the case, and you are migrating the application and want to adopt a different tenancy isolation model, you will need to let the business logic drive the isolation strategy in this layer. To illustrate, consider an example where we have a monolithic application with an API layer, where each tenant has its own instance of the API. The primary reason we may have ended up in this situation is that tenants have specific rules regarding authorization workflows, caching policies, or message routing between services. If that is the case, it would be quite the undertaking to switch this layer to a pool model. If the bulk of the business logic lives in this layer, it may imply that the database is not chock-full of business logic in the form of stored procedures like some legacy applications, so maybe the database is a prime candidate for consolidation into a shared resource. For legacy applications where the inverse is true, adding a shared API layer may be just the thing!
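For a feel of the alternative, here is a hypothetical sketch of per-tenant rules living inside a shared API layer instead of in per-tenant API instances. The policy fields are illustrative, not a prescription.

```python
# A hypothetical sketch of per-tenant behavior in a shared API layer: one
# codebase looks up the tenant's policy at request time instead of deploying
# a separate API instance per tenant. Policy fields are illustrative.
from dataclasses import dataclass

@dataclass
class TenantPolicy:
    cache_ttl_seconds: int
    require_approval_workflow: bool

DEFAULT_POLICY = TenantPolicy(cache_ttl_seconds=60, require_approval_workflow=False)
TENANT_POLICIES = {
    "acme": TenantPolicy(cache_ttl_seconds=0, require_approval_workflow=True),
}

def handle_order_request(tenant_id: str, order: dict) -> str:
    policy = TENANT_POLICIES.get(tenant_id, DEFAULT_POLICY)
    if policy.require_approval_workflow:
        return "queued-for-approval"  # tenant-specific authorization workflow
    return "processed"                # shared default path

print(handle_order_request("acme", {"sku": "widget"}))
```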
Let’s investigate how the AWS Cloud helps us achieve our tenancy isolation needs. At the very top layer are AWS Organizations and accounts. If your application, or rather the compliance requirements surrounding your use case, has hard isolation requirements, then putting each tenant into a separate account and managing those accounts with AWS Organizations may be the right direction for you. This strategy is reserved for the most extreme use cases, where isolation must exist at every layer.
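If you do go down the account-per-tenant road, account creation is usually automated. Below is a minimal sketch using the AWS Organizations API; in practice, most teams wrap this in an account-vending pipeline (for example, Control Tower’s Account Factory) rather than calling it directly, and the email naming scheme here is a placeholder.

```python
# A minimal sketch of the "account per tenant" pattern with AWS Organizations.
import boto3

orgs = boto3.client("organizations")

def create_tenant_account(tenant_id: str) -> str:
    response = orgs.create_account(
        Email=f"aws+{tenant_id}@example.com",  # placeholder address scheme
        AccountName=f"tenant-{tenant_id}",
    )
    # Account creation is asynchronous; poll describe_create_account_status
    # with this request id before provisioning anything into the new account.
    return response["CreateAccountStatus"]["Id"]

print(create_tenant_account("acme"))
```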
If you still need isolation but don’t require the extreme of separate AWS accounts, and you have containerized your application, then you can rely on namespaces in Elastic Kubernetes Service (EKS) or separate Fargate tasks in Elastic Container Service (ECS). These container orchestration paradigms are easier to manage logically than separate servers and provide a much easier way to choreograph the various services and pieces of your application stack. This is commonly referred to as soft isolation or logical isolation. If you are not into containers and don’t know about orchestration engines, SQL Server offers a familiar analogy: it allows you to run multiple logically isolated databases on a single machine. I won’t say this is exactly the same thing as namespaces in EKS or Fargate tasks in ECS, but the pattern is close enough to build intuition.
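As a small sketch of the namespace-per-tenant pattern, the snippet below creates a labeled Kubernetes namespace with the official Python client. It assumes your kubeconfig already points at the EKS cluster, and it leaves out the resource quotas and network policies that do the real isolation work.

```python
# A minimal sketch of logical isolation on EKS: one namespace per tenant.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
core = client.CoreV1Api()

def create_tenant_namespace(tenant_id: str) -> None:
    namespace = client.V1Namespace(
        metadata=client.V1ObjectMeta(
            name=f"tenant-{tenant_id}",
            labels={"tenant": tenant_id},  # handy for network policies and cost reports
        )
    )
    core.create_namespace(body=namespace)

create_tenant_namespace("acme")
```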
To quickly name a few more: AWS offers the RDS and Aurora relational database services, which support many engines, most of which have great isolation options. For object storage with Amazon S3, you can leverage bucket policies and prefixes for storage segmentation. For access control and resource management, you can leverage Service Control Policies, IAM permissions boundaries, resource tags, and many more. If you are curious about all the various options available to you, don’t hesitate to drop us a line.
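To make the S3 prefix idea concrete, here is a hypothetical policy document that confines a tenant’s role to its own prefix in a shared bucket. The bucket name and prefix scheme are assumptions.

```python
# A hypothetical sketch of prefix-based segmentation in a shared S3 bucket:
# an IAM policy document that limits a tenant's role to its own prefix.
import json

def tenant_prefix_policy(bucket: str, tenant_id: str) -> str:
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TenantObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/tenants/{tenant_id}/*",
            },
            {
                "Sid": "TenantList",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"tenants/{tenant_id}/*"}},
            },
        ],
    })

print(tenant_prefix_policy("shared-app-bucket", "acme"))
```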
Finally, we have reached the end of our journey. I want to leave you with two things: a decision framework and a quick peek at how tenancy isolation can impact a company’s artificial intelligence ambitions. Let’s have a look at that decision framework for choosing a tenancy isolation model. Five key considerations stand out:
1. Regulatory compliance requirements
2. Noisy neighbor risk
3. Per-tenant customizations
4. Cost optimization
5. Operational complexity
Well, it has been a heck of a journey! This blog series explores the complexities of tenant isolation strategies during cloud migrations, particularly from on-premises hypervisors like VMware to cloud environments like AWS. It defines tenant isolation and discusses how migration forces a rethinking of these strategies due to cost, performance, compliance, and architectural differences.
The blog series details three primary tenancy isolation models: silo (dedicated resources per tenant), bridge (mixed shared and isolated resources), and pool (all shared resources). It explains the considerations, challenges, and opportunities associated with each model during migration, emphasizing the importance of understanding application "touch points" and data layer segmentation.
It highlights that the pool model is common in modern web applications, while the silo model offers isolation flexibility but higher costs. The bridge model provides a balance between sharing and isolation. The choice of model depends on regulatory compliance, noisy neighbor risk, per-tenant customizations, cost optimization, and operational complexity.
The series concludes with the AWS services and constructs that support various isolation needs, emphasizing that the migration process offers significant opportunities to modernize and adapt your tenant isolation strategy. If you made it this far, why not drop us a line at sales@ipponusa.com? We would love to service your tenancy isolation needs.