One of the first tasks when getting started with Snowflake is to make data from your existing data sources available for consumption in the platform. There are a couple of different ways of doing this, but today we will focus on creating a storage integration, specifically with Microsoft Azure.
For those with a web application background, you can think of a storage integration as the configuration of a service account between your application and your back-end data store. Effectively what happens is that Snowflake creates an Azure Active Directory backed service principal, which it controls, and prompts you to provide the appropriate access to you azure resources. The Storage integration itself is just a representation of the authentication configuration for a given target.
Here is what the configuration for a storage integration looks like:
CREATE STORAGE INTEGRATION azure_int TYPE = EXTERNAL_STAGE STORAGE_PROVIDER = AZURE ENABLED = TRUE AZURE_TENANT_ID = 'a123b4c5-1234-123a-a12b-1a23b45678c9' STORAGE_ALLOWED_LOCATIONS = ('*') STORAGE_BLOCKED_LOCATIONS = ('azure://myaccount.blob.core.windows.net/mycontainer/path3/', 'azure://myaccount.blob.core.windows.net/mycontainer/path4/');
A couple of notes:
From a security perspective, these external storage integration objects form some of the outbound network security rules you’ll want to make sure you get right. Connecting storage integration objects to the wrong storage accounts could lead to sensitive data leaving the secured boundaries you are creating. Further, due to the fact that a single service principal is being used across all storage integration objects, you’ll want to plan out usage permissions to roles appropriate.
If I were coming up with a security rule set for this integration type, it would look something like the following:
You will likely want to err on the side of more storage integration objects than less, separating them out by business unit/function and also possibly by environment (development vs test vs production). This is to support granular grant permissions access downstream in the snowflake process.
Creation of all your storage integration objects and relevant grants should be automated and stored in version control. You could also create some tests scripts to examine these statements to enforce naming convention and the fact that storage allowed location should never be *.
Even though snowflake assists with managing access to storage integration objects, the service principal used in the process should be given the least privilege to Azure resources. This could include making use of granular POSIX permissions on your Azure Data Lake Gen2 than giving super permissive roles such as Storage Blob Reader via Azure RBAC.
Audits and alerts should be created to monitor create/update/delete activities on storage integration objects themselves, and also on the grant statements involving those objects.
There are many ways to move data in and out of Snowflake. As with any cloud service, there are still customer responsibilities when it comes to data security, and monitoring/managing storage integration objects is one of them. Hopefully this article helped outline some the actions one needs to take.
Shamir is a Microsoft Most Valuable Professional (MVP – Azure) and has extensive experience building solutions in the cloud, from strategy to deployment to automation