Snowflake’s Cortex AI capabilities allow teams to shift their analytical bottleneck from technological restrictions to their users’ creativity. In this article, we discover how Cortex can be used to analyze data stored in free-form text fields at scale.
I recently had the opportunity to help an insurance company with a common problem: valuable data is being captured but is stored in free-form text fields. As such, it was nearly impossible to construct a query that extracts and delivers key information in a reportable format. My client knew they could add value to the company if they could just tap into this data, but before Cortex, they had no easy solution.
While this article covers solutions for my client’s specific use case, these strategies can be applied to any use case dealing with free-form text data at scale. The examples in this article are simplified versions of what was used in my project.
Calls or chats between an insurance agent and a customer are recorded in a notes table. The agent selects a category and a subcategory for the interaction and adds any notes from the conversation. While the category fields enable users to create queries to identify records that fall into different buckets, it is difficult to extract insights from the notes fields themselves - especially at scale.
note_id |
category |
subcategory |
Notes |
1 |
Billing |
Policy |
The customer called in and said they were dropping us. They were disappointed with the lack of transparency on the website. |
2 |
Claims |
Status |
The customer wasn’t sure how to find the claim status on the website and is also considering shopping around for new insurance. |
3 |
Other |
Other |
The customer called to express her gratitude for Jeff and mentioned he was the best insurance representative she had ever dealt with. |
Let’s consider the above records. Sure, we can easily write a query to identify how many Billing or Claims calls came in, but what happens if more than one category applies to a call? How can we identify calls that discuss cancellations if the category field is not ‘Cancellation’? How can we get a better view of issues negatively affecting the customer experience on our website? This is the type of analysis our customer wanted to be able to run and what drove us to our solution.
We required a solution that allowed users to easily query records for key information that may get lost inside the notes fields. To do so, we take advantage of the Cortex complete function to task an LLM with identifying whether or not key topics appear during a discussion. These ‘yes’ or ‘no’ responses can then be extracted into columns where the new data points are ready for analysis.
Using our example from above, let's say we want to capture information on whether or not a call discusses cancellations, website questions, or unhappy customers. By using Snowflake's Cortex complete function, we are able to make the following transformation:
note_id |
category |
subcategory |
notes |
1 |
Billing |
Policy |
The customer called in and said they were dropping us. They were disappointed with the lack of transparency on the website. |
2 |
Claims |
Status |
The customer likes her service but is considering shopping around for cheaper rates. |
3 |
Other |
Other |
The customer was frustrated with how difficult filing a claim is on the website. |
note_id |
category |
subcategory |
notes |
is_cancel |
is_website |
is_unhappy |
1 |
Billing |
Policy |
The customer called in and said they were dropping us. They were disappointed with the lack of transparency on the website. |
1 |
1 |
1 |
2 |
Claims |
Status |
The customer likes her service but is considering shopping around for cheaper rates. |
1 |
0 |
0 |
3 |
Other |
Other |
The customer was frustrated with how difficult filing a claim is on the website. |
0 |
1 |
1 |
With a few more steps, and using 100% Snowflake native features, we were able to fully automate this process for production workloads. We simply added the code for the above transformation into a stored procedure, created a stream on the base table, and created a task to execute the procedure whenever new records arrive in the base table.
In discussions with the end users, we pinpointed ten key category flags that would be most advantageous for the LLM to recognize. For our initial phase, we processed a sample of 50,000 records. Here are some highlights of the value this process added.
While this new feature allows users to identify which categories an interaction falls into more easily, it also opens the door for previously impossible analytics. For example, if we want to quickly identify website issues that are hurting the customer experience, we could query for records that have the ‘Website Question’ and ‘Unhappy Customer’ flags marked as “true.” We could then run an additional Cortex process on top of these to identify common themes throughout this subset of notes.
Or what if we want a better picture of why customers are canceling even if they are not unhappy with our service? Do we see a high correlation between website questions and late payments? How can we use this information to reduce overall call volume - and, ultimately, our business costs?
Cortex allows teams to shift their analytical bottleneck from technological restrictions to their users’ creativity, and I was pleasantly surprised at how easy Snowflake makes it. I was able to get a working POC up and running for this project in a little under 2 hours.
While this was an impactful use case, it is just scratching the surface of what Cortex can do. I’m excited to continue diving deeper, and am eager to hear how others are incorporating Cortex into their own workloads as well. Stay tuned for more insights, or contact us today to learn more about how Ippon can help your organization!