
Snowflake Cortex: Insights from an Insurance Use Case

Written by Adam Tobin | Nov 12, 2024 7:43:24 PM

If you haven’t read my last blog post, here’s the short version: I recently completed an incredible project where I had the opportunity to utilize Snowflake’s Cortex AI capabilities. This article highlights the insights, tips, and pain points I discovered while working with this new feature.

It’s Fast. It’s Easy.

Before we dive deep into the weeds, I want to point out that one of the most refreshing parts of working with Cortex is how quickly and easily you can develop solutions. Prior to this project, I worked on another AI project for a hedge fund company using a much more complex tech stack, including LangChain, and we spent several days just setting up the framework before we could even begin developing our solution.

With Cortex, I was able to get a working POC for our solution in a little under 2 hours. Having all of the AI capabilities built right into Snowflake, including the ability to tap the power of an LLM through basic SQL functions, gave me more time to work on the logic rather than on getting the LLM to work.

Token Considerations

With Snowflake’s Cortex LLM functions, users are charged in credits per 1 million tokens (see the Snowflake Consumption Table). Tokens are the units an LLM uses to decompose text; a token can be loosely thought of as a small group of characters, often around four characters of English text.

For Cortex LLM functions, tokens from both the prompt and the response are included in the final cost. A prompt can be broken down further into the instructions and the context. Let’s take a look at a quick example to make sure this is clear.

Instruction:

“Who is the author of this input? Respond with only the author’s name.”

Context:

“Title: Adam’s Book; Author: Adam Tobin.”

Response:

“Adam Tobin.”
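In Cortex, this whole exchange boils down to a single Complete call, with the instruction and context concatenated into one prompt string. Here is a minimal sketch (the books table and book_metadata column are hypothetical):

select SNOWFLAKE.CORTEX.COMPLETE(
    'llama3-70b',
    -- the instruction and the context are concatenated into a single prompt;
    -- tokens from both, plus the response, are billed
    'Who is the author of this input? Respond with only the author''s name. ' || book_metadata
) as author
from books;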

In the above example, tokens from each of these three texts are added together to come up with the final cost. With this in mind, I used a few different strategies to help reduce the total cost of our LLM process.

Summarize

Summarize lengthy note fields before processing to reduce overall cost. 

In my project, I analyzed data from notes, which were stored in free-form text fields. These varied in length, anywhere from 0 to over 5000 characters. In short, this means the cost of running the Complete function is heavily dependent on how large the note data is.

One strategy to compensate for the cost of extremely large text fields is to first summarize the data before passing it to the Complete function. The Cortex Summarize function costs only 0.10 credits per 1 million tokens, compared to the 1.21 credits charged by the Complete function with the llama3-70b model. While we risk losing some accuracy by summarizing, we gain the advantage of analyzing only the most critical parts of a note and dismissing unnecessary noise. Not to mention, this strategy can yield significant cost savings by reducing the number of tokens processed by the Complete function.

Let’s do some math to see how running the Summarize function first can reduce our overall cost. It’s important to note that the Summarize function uses an implicit prompt of unknown length, which we can estimate at roughly 25 tokens.

Example: 

Suppose we want to run the Complete function against a text field that is 5000 tokens long. The instruction text we supply is 500 tokens long, and our prompt is constructed in a way that guarantees our response will be 100 tokens long. We will be using the llama3-70b model, which charges 1.21 credits per 1 million tokens.

Method 1: Complete

Equation for Complete Cost

I = Instruction tokens, C = Context tokens, R = Response tokens

Cost = ((I + C + R) × 1.21) / 1,000,000

→ ((500 + 5000 + 100) × 1.21) / 1,000,000 = 0.006776

Using the Complete function by itself, this run will cost roughly 0.0068 credits.

Method 2: Summarize before Complete

Summarize function:

I = Instruction tokens, C = Context tokens, S = Summary tokens (response)

Cost = ((I + C + S) × 0.10) / 1,000,000

→ ((25 + 5000 + S) × 0.10) / 1,000,000

Now, the actual cost of the Summarize function is dependent on how many tokens exist in the response (S). This is typically a small percentage of the original token count, but for argument’s sake, let’s say this is 2500 tokens.

Summarize: ((25 + 5000 + 2500) × 0.10) / 1,000,000 = 0.0007525 credits

Complete: ((500 + 2500 + 100) × 1.21) / 1,000,000 = 0.003751 credits

Total: ~0.0045 credits

Compared to our original cost of 0.0068 credits, this may not seem like much of a difference. However, consider running this process on 100,000 records inside a Business Critical account, which charges $4 per credit.

Complete: 0.0068 * 100,000 * 4 = $2,720

Summarize + Complete: 0.0045 * 100,000 * 4 = $1,800

By summarizing before running the Complete function, users save over $900 in LLM costs.
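In SQL, the two-step approach might look something like this sketch, where the claim_notes table, note_id, and note_text columns are hypothetical and the instruction is abbreviated:

-- Step 1: summarize the lengthy notes (0.10 credits per 1 million tokens)
create or replace table note_summaries as
select note_id,
       SNOWFLAKE.CORTEX.SUMMARIZE(note_text) as note_summary
from claim_notes;

-- Step 2: run the pricier Complete function against the shorter summaries
select note_id,
       SNOWFLAKE.CORTEX.COMPLETE(
           'llama3-70b',
           'Your 500-token instruction goes here. ' || note_summary
       ) as llm_response
from note_summaries;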

*Important note on the Summarize Function:

One behavior I noticed with the current state of the Summarize function is that there is no parameter to control response length. The summary can be as short as a single sentence or as long as multiple paragraphs. Our team met with several Snowflake representatives who acknowledged the issue and mentioned they were working on a solution.

Know when to skip.

Understand which records should be skipped to avoid unnecessary costs.

Another cost-saving tip is to add a layer of logic that checks whether or not the LLM function should run in the first place. As we’ve already covered, users are charged for tokens in the prompt (instructions and context) as well as the response. Let’s take a look at a costly mistake below:

Instruction:

“You are a grammar expert tasked with analyzing how many grammatical mistakes are in the student’s submission below. The criteria for…”

Context:

“User submission deleted.”

Response:

“Zero.”

For this context, we would not expect, or even desire, a response from our LLM function. However, if we do not have any logic in place to prevent the LLM function from running on this record, we are charged for the prompt regardless. With relatively short instructions, this may seem inconsequential. However, what if your prompt trails on and is over 1000 tokens long? Regardless of whether or not the LLM can generate a valuable response, you are still charged for those 1000 tokens.

At the very minimum, I recommend adding logic so that the Complete function only runs if the field is not null and contains more than 15 characters.
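As a sketch, that gate can be as simple as a WHERE clause (reusing the hypothetical claim_notes table from the summarization example):

select note_id,
       SNOWFLAKE.CORTEX.COMPLETE(
           'llama3-70b',
           'You are a grammar expert tasked with analyzing the submission below. ' || note_text
       ) as llm_response
from claim_notes
-- only spend tokens on records that can actually yield a useful response
where note_text is not null
  and length(note_text) > 15;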

Count your tokens!

This last piece of advice for dealing with tokens is simple but important. Cortex comes with a built-in COUNT_TOKENS function that allows users to calculate the number of tokens in a given piece of text.

Query:

select SNOWFLAKE.CORTEX.COUNT_TOKENS('llama3-70b', 'George Washington');

Response:

2

This is a great way for users to provide stakeholders with cost estimates, as well as to double-check their work before kicking off a potentially costly process.
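For example, here is one way to estimate the cost of a batch before running it, again using the hypothetical claim_notes table and the 1.21 credits per 1 million tokens charged for llama3-70b:

select sum(SNOWFLAKE.CORTEX.COUNT_TOKENS('llama3-70b', note_text)) as context_tokens,
       -- context tokens only; add your instruction and expected response tokens
       -- for a fuller estimate
       context_tokens * 1.21 / 1000000 as estimated_credits
from claim_notes;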

Prepare... For Failure?

As with any project, it’s important that the appropriate steps are taken to prepare for any workloads you plan to process using Cortex. With typical Snowflake jobs, a failure results in the loss of credits consumed from a warehouse. With Cortex jobs, however, a failure may be much more costly depending on how much work has been lost.  

Start small.

While it’s hard not to go full speed ahead with the excitement of working with new and powerful technologies, it’s important to slow down and start small with Cortex. Accidentally running a select query on a large table has little to no consequence. However, accidentally running LLM functions on a large table could cost you a decent chunk of your monthly budget. I urge users to start with small data sets and put restrictions in place before beginning any heavy lifting with Cortex.

Save your work.

If you are using Cortex inside of a large or complex job, I recommend saving the LLM responses to a separate table before further transformations take place. If a job fails after your LLM responses are generated, it is extremely beneficial not to need to reprocess this step, as it is likely one of your most expensive. 
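A minimal sketch of that checkpoint pattern, assuming a hypothetical prepared_prompts table that already holds the full prompt for each record:

-- persist the expensive LLM step so a downstream failure never forces a re-run
create or replace table llm_raw_responses as
select note_id,
       SNOWFLAKE.CORTEX.COMPLETE('llama3-70b', prompt) as llm_response
from prepared_prompts;

-- later transformations read from the saved table, not from the LLM
create or replace table llm_cleaned as
select note_id, trim(llm_response) as cleaned_response
from llm_raw_responses;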

Plan for errors.

To avoid failures in the first place, ensure that you are using all of the error-handling techniques at your disposal. A common technique when working with LLMs is to generate responses in JSON format so they are consistent and easy to extract. Snowflake has a built-in function called TRY_PARSE_JSON, which I highly recommend over the more commonly known PARSE_JSON function. It returns a NULL value for records that Snowflake is unable to parse rather than raising an error, which would fail your job if no other handlers are in place.
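Here is how that might look, assuming the prompt instructs the model to respond with a JSON object containing an author field (hypothetical, building on the llm_raw_responses table above):

select note_id,
       -- returns NULL for unparsable responses instead of failing the whole job
       TRY_PARSE_JSON(llm_response):author::string as author
from llm_raw_responses;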

Wrapping it up

Of all the AI tools I have worked with, Cortex currently holds the top spot in my book. My biggest takeaway so far is how impressed I am with Snowflake’s ability to remove the barrier to entry and put AI tools within reach of the average user. Stay tuned for more insights, or contact us today to learn more about how Ippon can help your organization!