Snowflake Storage Integration and Azure

Snowflake Storage Integration and Azure

By Shamir Charania
on September 26, 2020

One of the first tasks when getting started with Snowflake is to make data from your existing data sources available for consumption in the platform. There are a couple of different ways of doing this, but today we will focus on creating a storage integration, specifically with Microsoft Azure.

What is a Storage Integration?

For those with a web application background, you can think of a storage integration as the configuration of a service account between your application and your back-end data store. Effectively what happens is that Snowflake creates an Azure Active Directory backed service principal, which it controls, and prompts you to provide the appropriate access to you azure resources. The Storage integration itself is just a representation of the authentication configuration for a given target.

Here is what the configuration for a storage integration looks like:

CREATE STORAGE INTEGRATION azure_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = AZURE
  ENABLED = TRUE
  AZURE_TENANT_ID = 'a123b4c5-1234-123a-a12b-1a23b45678c9'
  STORAGE_ALLOWED_LOCATIONS = ('*')
  STORAGE_BLOCKED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer/path3/', 'azure://myaccount.blob.core.windows.net/mycontainer/path4/');

A couple of notes:

You can create one or more storage integration objects, however, they all make use of the same service principal tied to your storage account.
You can grant usage on particular storage integration objects to particular roles via the grant usage command
Allowed locations can be used to govern where in Azure this particular storage integration can be used
Denied locations provides a block list of locations where you don’t want this storage integration to be used

Some Security Notes

From a security perspective, these external storage integration objects form some of the outbound network security rules you’ll want to make sure you get right. Connecting storage integration objects to the wrong storage accounts could lead to sensitive data leaving the secured boundaries you are creating. Further, due to the fact that a single service principal is being used across all storage integration objects, you’ll want to plan out usage permissions to roles appropriate.

If I were coming up with a security rule set for this integration type, it would look something like the following:

Create fit-for-purpose/role-based storage integration objects

You will likely want to err on the side of more storage integration objects than less, separating them out by business unit/function and also possibly by environment (development vs test vs production). This is to support granular grant permissions access downstream in the snowflake process.

Automate creation of storage integration objects along with grants

Creation of all your storage integration objects and relevant grants should be automated and stored in version control. You could also create some tests scripts to examine these statements to enforce naming convention and the fact that storage allowed location should never be *.

Least Privilege

Even though snowflake assists with managing access to storage integration objects, the service principal used in the process should be given the least privilege to Azure resources. This could include making use of granular POSIX permissions on your Azure Data Lake Gen2 than giving super permissive roles such as Storage Blob Reader via Azure RBAC.

Audit

Audits and alerts should be created to monitor create/update/delete activities on storage integration objects themselves, and also on the grant statements involving those objects.

Conclusion

There are many ways to move data in and out of Snowflake. As with any cloud service, there are still customer responsibilities when it comes to data security, and monitoring/managing storage integration objects is one of them. Hopefully this article helped outline some the actions one needs to take.

About Shamir Charania

Shamir Charania, a seasoned cloud expert, possesses in-depth expertise in Amazon Web Services (AWS) and Microsoft Azure, complemented by his six-year tenure as a Microsoft MVP in Azure. At Keep Secure, Shamir provides strategic cloud guidance, with senior architecture-level decision-making to having the technical chops to back it all up. With a strong emphasis on cybersecurity, he develops robust global cloud strategies prioritizing data protection and resilience. Leveraging complexity theory, Shamir delivers innovative and elegant solutions to address complex requirements while driving business growth, positioning himself as a driving force in cloud transformation for organizations in the digital age.