This post is a bonus to the three-part blog series “Breaking the Stack: Rethinking Tenant Isolation in Your Cloud Migration.” Here, we will explore how the various tenant isolation strategies can affect your AI/ML strategy and maturity. If you are not familiar with the silo, bridge, and pool tenant isolation models, go back and read the three-part series first! Without further ado, let’s take a look at a very hot topic: AI adoption.
Most organizations nowadays have some level of ambition when it comes to AI and ML in their stack. I’m here to tell you it’s not just hype, and the use cases are as real as they are wide and varied. Let’s dive into the tenant isolation models and how each one can affect your machine learning and large language model designs.
In the silo model, each tenant’s data is typically kept in a separate infrastructure stack or database, often for regulatory or security reasons. While this separation strengthens compliance and security, it makes it much harder to aggregate data across tenants, which limits the ability to train global models or to gain cross-tenant insights. Training is not the only challenge; costs will be high as well, primarily because of the redundant compute required to run large language models separately for each tenant. These challenges are real, but solutions are being actively developed and explored, such as federated learning and anonymized data aggregation, that can allow you to safely leverage AI at scale even if you are using a siloed tenant isolation paradigm.
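To make the federated learning idea a little more concrete, here is a minimal sketch of federated averaging in plain Python. It is an illustration under stated assumptions, not a production recipe: the model is a toy one-variable linear regression, and the names `local_update`, `federated_average`, and `train_federated` are hypothetical. The point it shows is that each silo trains on its own data and only model weights, never raw records, leave the tenant boundary.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]   # (feature x, target y)
Weights = Tuple[float, float]  # (slope w, intercept b)


def local_update(weights: Weights, data: List[Sample],
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Run gradient descent on ONE tenant's data, inside that tenant's silo.

    Assumes `data` is non-empty. Only the resulting weights are returned.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine per-tenant weights, weighting each by its sample count."""
    total = sum(count for _, count in updates)
    w = sum(weights[0] * count for weights, count in updates) / total
    b = sum(weights[1] * count for weights, count in updates) / total
    return (w, b)


def train_federated(tenant_data: Dict[str, List[Sample]],
                    rounds: int = 50) -> Weights:
    """Repeat: send global weights to every silo, then average the results."""
    global_weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [
            (local_update(global_weights, data), len(data))
            for data in tenant_data.values()
        ]
        global_weights = federated_average(updates)
    return global_weights


if __name__ == "__main__":
    # Hypothetical tenants whose private data both follow y = 2x + 1.
    silos = {
        "tenant_a": [(0.0, 1.0), (1.0, 3.0)],
        "tenant_b": [(2.0, 5.0), (3.0, 7.0)],
    }
    print(train_federated(silos))
```

With the toy data above, the shared model should land close to a slope of 2 and an intercept of 1, even though neither tenant ever sees the other's samples. Real deployments typically add further protections on top of this pattern, since shared weights can still reveal information about the data they were trained on.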
In the pool model, data and services are shared across tenants, which makes it much better suited for applying AI/ML technologies at scale. The centralized data enables more powerful and accurate models to be trained across tenants. Inference endpoints can serve all tenants, improving cost, efficiency, and performance. Integration with cloud-native services like Amazon Bedrock, Amazon SageMaker, and Amazon Comprehend is also much simpler. Sounds perfect, right? Well, look out! The pool isolation model, when combined with LLMs or ML, introduces a higher risk of data leakage or unintended model behavior. Tenant data must be strictly segmented during training and inference, and tenant context enforcement, role-based access controls, and explainability controls all need to be strong and critically examined.
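As a rough illustration of tenant context enforcement at inference time, the sketch below filters a pooled document store by tenant before anything reaches an LLM prompt. Everything here is hypothetical (`Document`, `TenantContext`, `PooledStore`, `build_prompt`, and the `"reader"` role are names invented for this example), and the keyword match stands in for whatever retrieval mechanism you actually use. What matters is that the tenant filter and the role check live inside the data access layer, so callers cannot skip them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Document:
    tenant_id: str
    text: str


@dataclass(frozen=True)
class TenantContext:
    """Who is asking. Assumed to come from a verified token, not user input."""
    tenant_id: str
    roles: FrozenSet[str]


class PooledStore:
    """A shared store holding documents for every tenant."""

    def __init__(self, documents: Iterable[Document]) -> None:
        self._documents = list(documents)

    def search(self, ctx: TenantContext, query: str) -> List[Document]:
        # Role-based access control: refuse callers without the reader role.
        if "reader" not in ctx.roles:
            raise PermissionError("caller lacks the 'reader' role")
        terms = query.lower().split()
        # The tenant filter is applied here, unconditionally, on every search.
        return [
            doc for doc in self._documents
            if doc.tenant_id == ctx.tenant_id
            and any(term in doc.text.lower() for term in terms)
        ]


def build_prompt(ctx: TenantContext, store: PooledStore, question: str) -> str:
    """Assemble an LLM prompt using only the calling tenant's documents."""
    docs = store.search(ctx, question)
    context = "\n".join(doc.text for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Because `search` requires a `TenantContext` and applies the filter itself, a prompt built for one tenant never contains another tenant's text, even though both tenants share the same store and the same inference endpoint.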
That brings us to the final tenant isolation model: the bridge model. How the bridge model affects your AI plans depends on which parts of the stack are pooled and which parts are siloed, and that gives you some flexibility. For instance, sensitive data, such as health records or financial data, can remain siloed for privacy-focused ML tasks. Less-sensitive, aggregated data, such as behavioral patterns and usage metrics, can be pooled for shared model training or LLM fine-tuning. This allows a selective approach, using tenant-specific models for compliance-sensitive use cases and shared models for general features like chatbots, recommendations, or anomaly detection. This approach may seem balanced, but it is also very complex and requires carefully planned architectural designs to define where models run, how data is shared, and how to avoid tenant data bleed.
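One way to picture the selective approach is a small routing function that decides, per request, whether a tenant-specific or a shared model should handle it. This is only a sketch under assumptions: the use-case labels, the model identifiers, and the `select_model` helper are all hypothetical, and a real system would derive sensitivity from a data classification policy rather than a hard-coded set.

```python
from typing import Dict

# Hypothetical labels for use cases that must stay on siloed, per-tenant models.
SENSITIVE_USE_CASES = frozenset({"health_records", "financial_data"})


def select_model(tenant_id: str, use_case: str,
                 tenant_models: Dict[str, str],
                 shared_model: str = "shared-model") -> str:
    """Return the identifier of the model that should serve this request."""
    if use_case in SENSITIVE_USE_CASES:
        if tenant_id not in tenant_models:
            # Fail closed: never fall back to the shared model for sensitive
            # work, since that is exactly how tenant data bleed would happen.
            raise LookupError(f"no dedicated model for tenant {tenant_id!r}")
        return tenant_models[tenant_id]
    # General features (chatbots, recommendations, anomaly detection).
    return shared_model


if __name__ == "__main__":
    models = {"acme": "acme-private-model"}
    print(select_model("acme", "health_records", models))
    print(select_model("acme", "chatbot", models))
```

The design choice worth noting is the fail-closed branch: a tenant without a dedicated model gets an error for a sensitive use case instead of a silent downgrade to the pooled model.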
In this blog, we looked at how tenant isolation models can impact a company's AI/ML maturity. Silo models hinder cross-tenant data aggregation, pool models enhance AI/ML applications but introduce data leakage risks, and bridge models offer selective AI application. If you are curious about your AI/ML strategy or maturity, or if you want to explore changes to your tenant isolation strategies in support of your AI initiatives, then drop us a line at sales@ipponusa.com.