Snowflake Notebooks QuickStart: Ingesting Data via External APIs
In late 2024, Snowflake Notebooks became generally available to all users. While this feature is popularly advertised as a great tool for AI/ML workloads, my favorite aspect of notebooks is that they are yet another Snowflake feature that lets users manage and handle all of their data workloads directly inside their Snowflake environment.
In this article, I will demonstrate how to pull data directly into Snowflake from external APIs in three easy steps using notebooks. For our example, we will be using data from consumerfinance.gov with the help of their free API (https://cfpb.github.io/api/ccdb/api.html). We will be ingesting complaint data for Walmart for no other reason than the fact that it’s the first company that came to mind!
Prerequisites
If you do not have access to the ACCOUNTADMIN role, the role you intend to use must have the following privileges:
- CREATE NOTEBOOK privilege on a location.
- CREATE TABLE on a location.
- USAGE on a warehouse.
- CREATE INTEGRATION on the account.
STEP 1: Create the External Access Integration
By default, Snowflake blocks all communication with external URLs unless access is explicitly granted through an external access integration object. That is, any request to the API will fail until this has been configured. Fortunately, this can easily be done directly inside a SQL worksheet in Snowsight.
Users must first create a network rule that specifies the different URLs that are allowed:
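A minimal sketch of such a rule, assuming the consumerfinance.gov host used later in this article (the rule name is arbitrary):

```sql
-- Egress rule allowing outbound calls to the consumerfinance.gov API host
-- (run this in a SQL worksheet; the rule name is an illustrative choice)
CREATE OR REPLACE NETWORK RULE cfpb_api_network_rule
  TYPE = HOST_PORT
  MODE = EGRESS
  VALUE_LIST = ('www.consumerfinance.gov');
```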
Once the network rule is set up, users simply need to create an External Access Integration object with this network rule listed under the ALLOWED_NETWORK_RULES parameter:
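A sketch, assuming the rule name from above; the integration name is arbitrary, and my_notebook_role is a placeholder for whatever role will run the notebook:

```sql
-- Integration referencing the network rule created above
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION cfpb_api_integration
  ALLOWED_NETWORK_RULES = (cfpb_api_network_rule)
  ENABLED = TRUE;

-- The notebook's role needs USAGE to toggle the integration on
GRANT USAGE ON INTEGRATION cfpb_api_integration TO ROLE my_notebook_role;
```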
STEP 2: Create the Notebook and Enable the Integration
The next step is to create the notebook. This can be done by selecting “Notebooks” under the “Projects” icon on the left side of the screen in Snowsight. Make sure that you have the correct role selected (bottom left of your screen) and that you choose the appropriate location and warehouse for your notebook.
Once the notebook has been created, users must enable the external access integration. Do so by clicking the three dots at the top right of your screen and choosing ‘notebook settings’. Next, click ‘external access’ and toggle on your newly created integration. If you do not see your integration, ensure that you have the correct role chosen and that your chosen role has the usage privilege on the integration.
STEP 3: Time to code!
Users should have the ‘requests’ package installed by default, but if not, you can do so by navigating to the ‘packages’ drop-down and searching for ‘requests’ under the Anaconda tab.
Once you verify that all required packages are installed, you can begin writing the code. Below I have posted the full code, followed by explanations of what each section is doing.
Full Notebook Code
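Below is a sketch of the full notebook code, assuming the CFPB complaint search endpoint linked above; the variable names and exact column formatting are illustrative choices, and the table name matches the one described later in this article.

```python
import json

import pandas as pd
import requests
from snowflake.snowpark.context import get_active_session

# Interact with Snowflake through the notebook's current session
session = get_active_session()

# Pull all complaints that have "Walmart" as a keyword
url = "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/"
response = requests.get(url, params={"search_term": "Walmart"})
response.raise_for_status()  # fail fast if the request was blocked or rejected

# The API returns Elasticsearch-style JSON; each complaint sits under "_source"
data = json.loads(response.text)
records = [hit["_source"] for hit in data["hits"]["hits"]]

# Flatten nested JSON values into columns
df = pd.json_normalize(records)

# Upper-case column names (and replace dots from nested fields) so the
# table can be queried without quotation marks
df.columns = [col.upper().replace(".", "_") for col in df.columns]

# Create a Snowpark DataFrame and write it to the notebook's current schema,
# replacing the table if it already exists
snowpark_df = session.create_dataframe(df)
snowpark_df.write.save_as_table("complaint_data", mode="overwrite")
```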
First, we get our imports and create the session object.
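From the sketch above:

```python
import json

import pandas as pd
import requests
from snowflake.snowpark.context import get_active_session

session = get_active_session()
```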
get_active_session allows us to interact with Snowflake through our current session, requests enables us to gather the data via the API, and pandas and json will help us prepare that data to write to a Snowflake table.
Next, we make our request for data through the API
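Again from the sketch, assuming the CFPB search endpoint:

```python
url = "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/"
response = requests.get(url, params={"search_term": "Walmart"})
response.raise_for_status()
```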
This request pulls all data for complaints that have ‘Walmart’ as a keyword. If you are using a different API, refer to the appropriate documentation to determine which URL format should be used.
Next, we prepare the data for writing to a Snowflake table.
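In the sketch, this section looks like the following; the "_source" layout is how the CFPB API happens to nest its results:

```python
data = json.loads(response.text)
records = [hit["_source"] for hit in data["hits"]["hits"]]

df = pd.json_normalize(records)
df.columns = [col.upper().replace(".", "_") for col in df.columns]
```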
Note that these steps may vary slightly depending on how your specific data is returned. Users may want to print their data after the first line of code is run from this section to better understand the structure of the data.
For this example, we utilize json_normalize as our response includes nested JSON values. We also include lines to format our column names exactly how we want them to appear in our final Snowflake table. I have found that unless column names are in all upper case, Snowflake will require users to use quotation marks around the column names when querying the table.
Finally, we create a Snowpark DataFrame, which can write directly to a Snowflake table.
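In the sketch:

```python
snowpark_df = session.create_dataframe(df)
snowpark_df.write.save_as_table("complaint_data", mode="overwrite")
```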
Since this notebook was created in the DEMO.PUBLIC schema, this code will create a table called ‘complaint_data’ in the DEMO.PUBLIC schema. With the ‘overwrite’ parameter, Snowflake will replace the table if it already exists.
Final Steps
You should now be able to see your data by querying the table. After confirming that the process was successful, users can run the notebook ad hoc as needed or schedule a task to run it at a specific interval using the scheduling option in the notebook’s UI.
If your table was not created, or you ran into errors along the way, I have compiled a list of the most common reasons that may be preventing a successful run:
- The Access Integration is not enabled or was configured incorrectly. If this is the case, users will receive an error indicating that the web address could not be found.
- The API requires a key. In some cases, even APIs that are free require users to generate and provide a key when making requests. Users should refer to the corresponding documentation for their API to resolve this issue.
- Incorrect Snowflake configuration. Ensure that you have selected the correct role when creating the notebook. This is easy to miss, as the role used when navigating to the UI for creating a notebook does not always match the last role a user had activated inside a worksheet. Remember, this is the role in the bottom left of the UI. Also, double-check that the table you are trying to create does not already exist with a different role as its owner. Snowflake will not allow you to drop and recreate tables not owned by your role.
Conclusion
While consuming API data may be straightforward for many data professionals, leveraging entirely native Snowflake features for this task offers significant advantages. Traditionally, the storing, executing, and scheduling of data ingestion tasks involve multiple steps across various external tools. For many workloads, however, this process can be streamlined with notebooks, minimizing failure points, lowering the technical barrier, and, perhaps most importantly, enhancing security with Snowflake’s built-in protections.
Tags:
Snowflake, Data, SQL, Cloud, Containerization, business strategy, digital transformation, cloud strategy, Cloud Maturity