Breaking Analytical Barriers with Snowflake Cortex

Snowflake’s Cortex AI capabilities allow teams to shift their analytical bottleneck from technological restrictions to their users’ creativity. In this article, we discover how Cortex can be used to analyze data stored in free-form text fields at scale.

A Common Dilemma

I recently had the opportunity to help an insurance company with a common problem: valuable data is being captured but is stored in free-form text fields. As such, it was nearly impossible to construct a query that extracts and delivers key information in a reportable format. My client knew they could add value to the company if they could just tap into this data, but before Cortex, they had no easy solution.

While this article covers solutions for my client’s specific use case, these strategies can be applied to any use case dealing with free-form text data at scale. The examples in this article are simplified versions of what was used in my project.

The Use Case

Calls or chats between an insurance agent and a customer are recorded in a notes table. The agent selects a category and a subcategory for the interaction and adds any notes from the conversation. While the category fields enable users to create queries to identify records that fall into different buckets, it is difficult to extract insights from the notes fields themselves - especially at scale.

Example:

note_id	category	subcategory	Notes
1	Billing	Policy	The customer called in and said they were dropping us. They were disappointed with the lack of transparency on the website.
2	Claims	Status	The customer wasn’t sure how to find the claim status on the website and is also considering shopping around for new insurance.
3	Other	Other	The customer called to express her gratitude for Jeff and mentioned he was the best insurance representative she had ever dealt with.

Let’s consider the above records. Sure, we can easily write a query to identify how many Billing or Claims calls came in, but what happens if more than one category applies to a call? How can we identify calls that discuss cancellations if the category field is not ‘Cancellation’? How can we get a better view of issues negatively affecting the customer experience on our website? This is the type of analysis our customer wanted to be able to run and what drove us to our solution.

The Solution

We required a solution that allowed users to easily query records for key information that may get lost inside the notes fields. To do so, we take advantage of the Cortex complete function to task an LLM with identifying whether or not key topics appear during a discussion. These ‘yes’ or ‘no’ responses can then be extracted into columns where the new data points are ready for analysis.

Using our example from above, let's say we want to capture information on whether or not a call discusses cancellations, website questions, or unhappy customers. By using Snowflake's Cortex complete function, we are able to make the following transformation:

Before

note_id	category	subcategory	notes
1	Billing	Policy	The customer called in and said they were dropping us. They were disappointed with the lack of transparency on the website.
2	Claims	Status	The customer likes her service but is considering shopping around for cheaper rates.
3	Other	Other	The customer was frustrated with how difficult filing a claim is on the website.

After

note_id	category	subcategory	notes	is_cancel	is_website	is_unhappy
1	Billing	Policy	The customer called in and said they were dropping us. They were disappointed with the lack of transparency on the website.	1	1	1
2	Claims	Status	The customer likes her service but is considering shopping around for cheaper rates.	1	0	0
3	Other	Other	The customer was frustrated with how difficult filing a claim is on the website.	0	1	1

With a few more steps, and using 100% Snowflake native features, we were able to fully automate this process for production workloads. We simply added the code for the above transformation into a stored procedure, created a stream on the base table, and created a task to execute the procedure whenever new records arrive in the base table.

The Result

In discussions with the end users, we pinpointed ten key category flags that would be most advantageous for the LLM to recognize. For our initial phase, we processed a sample of 50,000 records. Here are some highlights of the value this process added.

No more two-category restrictions.

- 90% of the records had at least one of the ten flags marked as true, and over 50% of the records had at least two flags marked as true.

Key topics are no longer missed.

- Before running the process, roughly 10,000 records were marked as ‘Cancellation’ records using the category and subcategory fields. With our process, over 8,000 additional cancellation records were identified. Each of these interactions is an opportunity to potentially intervene and retain customers - that previously went unidentified.

Adding labels for vague or unclear categories.

Several records had either ‘Unknown’ or ‘Other’ for both their primary and subcategory fields. Our process was able to identify at least one additional category for over 85% of these records.

While this new feature allows users to identify which categories an interaction falls into more easily, it also opens the door for previously impossible analytics. For example, if we want to quickly identify website issues that are hurting the customer experience, we could query for records that have the ‘Website Question’ and ‘Unhappy Customer’ flags marked as “true.” We could then run an additional Cortex process on top of these to identify common themes throughout this subset of notes.

Or what if we want a better picture of why customers are canceling even if they are not unhappy with our service? Do we see a high correlation between website questions and late payments? How can we use this information to reduce overall call volume - and, ultimately, our business costs?

Wrapping it up

Cortex allows teams to shift their analytical bottleneck from technological restrictions to their users’ creativity, and I was pleasantly surprised at how easy Snowflake makes it. I was able to get a working POC up and running for this project in a little under 2 hours.

While this was an impactful use case, it is just scratching the surface of what Cortex can do. I’m excited to continue diving deeper, and am eager to hear how others are incorporating Cortex into their own workloads as well. Stay tuned for more insights, or contact us today to learn more about how Ippon can help your organization!

Tags:

Post by Adam Tobin
Oct 29, 2024 2:45:00 AM

Breaking Analytical Barriers with Snowflake Cortex

A Common Dilemma

The Use Case

Example:

The Solution

Before

After

The Result

Wrapping it up

Comments

Subscribe to Our Blog!

Categories

Recent Posts

Navigation

Services

Ippon International

Contact

Breaking Analytical Barriers with Snowflake Cortex

A Common Dilemma

The Use Case

Example:

The Solution

Before

After

The Result

Wrapping it up

Related Articles

Snowflake Introduces UBAC - What it Means For RBAC

Snowflake Summit 2025 - Platform Keynote Feature Announcements

Snowflake Notebooks QuickStart: Ingesting Data via External APIs

Comments

Subscribe to Our Blog!

Categories

Recent Posts

Navigation

Services

Ippon International

Contact