Authors: Sarah Narum, Elle Estwick, Stephen Lowery, Emma York, Andy Lamora
While having a notebook is a great step forward for exploring and sharing data, one of the main advantages of notebooks is the ability to rapidly develop visualizations using Python's robust package ecosystem. Snowflake provides a couple of ways to install additional packages into a Snowflake notebook; for this example, we'll use Anaconda to install the packages we need.
To do so, simply navigate to your notebook inside of the Snowflake UI, and click on the 'Packages' tab on the top right of the page near the 'Start' button.
Users can then search for packages available through Anaconda and install them directly from this interface. After selecting the package, it may take several minutes for the package to be installed and ready for use inside of your notebook.
The second method for installing packages is through a Snowflake stage. This method should only be used for packages that are unavailable through Anaconda.
Users must first upload the package to a Snowflake stage, and then install the package in a similar manner as we did for Anaconda.
Note: Installing packages this way only works for Python packages.
Note: Wheel and tar.gz files are not currently supported.
See also: Snowflake Docs
Next, let's create a visualization of a query.
For this example, we'll be using Seaborn.
Follow the steps in the Installing Packages section above to install seaborn via Anaconda.
Once installed, the library can be imported and used within a cell. The notebook cell below imports the seaborn package to create a graph that groups movie release years by decade. The full script can be copied from here:
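A minimal sketch of such an import cell, assuming seaborn and matplotlib have been added through the Packages tab. `get_active_session` is the standard Snowpark helper for notebooks; the wrapper name `get_notebook_session` is hypothetical and only exists so the snippet can also be loaded outside Snowflake.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def get_notebook_session():
    """Return the Snowpark session attached to the running notebook."""
    # Imported lazily: an active session only exists inside Snowflake,
    # so this call is expected to fail anywhere else.
    from snowflake.snowpark.context import get_active_session

    return get_active_session()


# In a notebook cell, this would be followed by:
#   session = get_notebook_session()
```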
This brings the seaborn library into the notebook runtime and sets up a Snowpark session that we can use to interact with Snowflake objects. Next, we can run a query that groups the movie data we uploaded earlier into decades:
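One possible shape for that query is sketched below. The table name `MOVIES` and the column name `RELEASE_YEAR` are assumptions, so substitute the names used when the data was uploaded. The helper `to_decade` is hypothetical and only mirrors the SQL arithmetic in plain Python.

```python
# Hypothetical table and column names; replace with your own.
DECADE_QUERY = """
SELECT FLOOR(RELEASE_YEAR / 10) * 10 AS DECADE
FROM MOVIES
WHERE RELEASE_YEAR IS NOT NULL
"""


def load_decades(session):
    """Run the query through a Snowpark session and return a pandas DataFrame."""
    return session.sql(DECADE_QUERY).to_pandas()


def to_decade(year):
    """Same bucketing as the SQL expression: drop the last digit of the year."""
    return (year // 10) * 10
```

For instance, `to_decade(1994)` gives `1990`, which is the value the SQL expression computes for a 1994 release.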
Next, we'll use seaborn to create our graph with the countplot() function (see docs here):
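A minimal sketch of the call, using a small made-up DataFrame in place of the query result. The column name `DECADE` is an assumption carried over from the grouping step.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the query result: one row per movie, bucketed by decade.
df = pd.DataFrame({"DECADE": [1980, 1990, 1990, 2000, 2000, 2000]})

fig, ax = plt.subplots(figsize=(8, 4))
# countplot draws one bar per distinct DECADE value; bar height is the row count.
sns.countplot(data=df, x="DECADE", ax=ax)
```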
With our chart initialized, we can now begin modifying other attributes to clean it up a bit. Let's configure the title and labels:
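As a sketch, the title and axis labels can be set on the matplotlib Axes that countplot() draws on. The label text and the sample data here are illustrative assumptions, not taken from the original script.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the query result.
df = pd.DataFrame({"DECADE": [1980, 1990, 1990, 2000, 2000, 2000]})

fig, ax = plt.subplots(figsize=(8, 4))
sns.countplot(data=df, x="DECADE", ax=ax)

# Illustrative wording; adjust to suit your data.
ax.set_title("Movies Released per Decade")
ax.set_xlabel("Decade")
ax.set_ylabel("Number of Movies")
```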
Then, we can add some polish by removing decimals from our labels (decades are integers, after all!):
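One way to do this, which may differ from the original script, is to rebuild the tick labels from the data as integers. The sketch assumes the `DECADE` column came back as floats (for example `1990.0`); the sample data is made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the query result, with float decades as they might arrive.
df = pd.DataFrame({"DECADE": [1980.0, 1990.0, 1990.0, 2000.0, 2000.0, 2000.0]})
decades = sorted(df["DECADE"].unique())

fig, ax = plt.subplots(figsize=(8, 4))
# Passing order= fixes the bar positions at 0, 1, 2, ... in ascending decade order.
sns.countplot(data=df, x="DECADE", order=decades, ax=ax)

# Replace labels such as "1990.0" with "1990".
ax.set_xticks(list(range(len(decades))))
ax.set_xticklabels([str(int(d)) for d in decades])
```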
Finally, we can run the cell and see our chart!