
Azure ADF and the ARM Testing Toolkit

Recently I was asked to work on a deployment pipeline for ADF. It turns out that deployment pipelines in ADF are quite easy, because the entire service is built on ARM API calls, which can ultimately be templated. You can read more in depth about the recommended process here (https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment). For those who don't like to click links, the process goes like so:

1. Developers work in feature branches, using the UI to craft their pipelines and dependent resources.
2. Changes are merged (by magic surely, or pull requests if you are a pro) to the "master" branch.
3. Changes in the master branch can then be published by hitting the publish button (fancy).
4. The ADF service updates an "adf_publish" branch with a set of ARM templates that represents the current state of the "code".
5. CD pipelines can be created to take those ARM templates and deploy them to target environments.

This is all well and good, but what if we want to introduce tests on top of our ADF code?
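The CD step at the end boils down to one ARM deployment call per target environment. As a minimal sketch, this builds the URL and body for the real `Microsoft.Resources/deployments` PUT endpoint; the subscription, resource group, and deployment names here are placeholders, and in a real pipeline you would send the request with a bearer token from a service principal.

```python
import json

ARM_API_VERSION = "2021-04-01"  # ARM deployments API version


def build_deployment_request(subscription_id, resource_group, deployment_name,
                             template, parameters):
    """Build the URL and JSON body for an ARM template deployment PUT call."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/providers/Microsoft.Resources/deployments/{deployment_name}"
        f"?api-version={ARM_API_VERSION}"
    )
    body = {
        "properties": {
            "mode": "Incremental",     # leave resources not in the template untouched
            "template": template,      # contents of the generated factory ARM template
            "parameters": parameters,  # environment-specific parameter values
        }
    }
    return url, json.dumps(body)
```

Incremental mode is usually what you want for ADF, since the generated template describes the whole factory and you don't want the deployment deleting unrelated resources in the group.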

Read more

Communication Security and Azure Databricks

We are going to continue exploring security considerations with Azure Databricks. In this post, I'm going to touch on communication security and how it is handled within the various components that make up the Azure Databricks service. Communication security is particularly important when using cloud services. Most regulations and cyber security frameworks will expect that data (sensitive data, at least) be protected both at rest and in transit. What they typically mean by that is that both the integrity and the confidentiality of the data are protected while it is transmitted.
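In practice, protection in transit means TLS with certificate and hostname verification turned on. As a minimal illustration with Python's standard library (not anything Databricks-specific), a default client-side SSL context already enforces both:

```python
import ssl

# A default client-side context verifies the server's certificate chain and
# hostname, which is what gives you confidentiality *and* integrity in transit.
context = ssl.create_default_context()
```

Anywhere client code disables these checks (a common "quick fix" for certificate errors), it is trading away exactly the guarantees the frameworks above are asking for.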

Read more

Exploring Azure Databricks Permissions

We are continuing on with our discussion about DevOps and security concerns with Azure Databricks. In this post, we will talk about setting up granular permissions inside of Azure Databricks. By default, particularly with workspaces in the standard tier, all users have access to all resources within the workspace. By resources, I mean specific Databricks “objects” such as directories, notebooks, clusters, pools, jobs and tables. Luckily, Azure Databricks offers a premium plan, which allows administrators to configure custom role-based access controls based on the permissions API.
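A grant through the Permissions API is a PATCH against a per-object endpoint with an access control list in the body. As a minimal sketch (the workspace URL, user, and object id below are placeholders; `CAN_MANAGE` and friends are the permission levels the API uses):

```python
def permissions_url(workspace_url, object_type, object_id):
    """Endpoint for permissions on a workspace object (cluster, notebook, job, ...)."""
    return f"{workspace_url}/api/2.0/permissions/{object_type}/{object_id}"


def build_permissions_payload(user_name, permission_level):
    """Build the JSON body for a Permissions API PATCH call granting one user a level."""
    return {
        "access_control_list": [
            {
                "user_name": user_name,
                "permission_level": permission_level,  # e.g. CAN_READ, CAN_RUN, CAN_MANAGE
            }
        ]
    }
```

The same payload shape works with `group_name` instead of `user_name`, which is generally how you want to manage this at any scale.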

Read more

Azure Databricks and SCIM Integration

We are continuing on with our discussion about devops and security concerns with Azure Databricks. In this post, we are focusing on user provisioning in Azure Databricks by way of the System for Cross-domain Identity Management (SCIM). As a quick refresher, the Azure Databricks “service” is really just a multi-tenanted application. The application itself is secured by Azure Active Directory, which means that it uses that source as an authentication provider.
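SCIM is just a standard JSON shape for user records, so provisioning a user amounts to POSTing a small document to the workspace's SCIM endpoint. A minimal sketch of the payload (the entitlement name is the one Databricks uses for cluster creation; the email is a placeholder):

```python
def build_scim_user(user_name, allow_cluster_create=False):
    """Build a SCIM 2.0 user payload for Databricks user provisioning."""
    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": user_name,  # for Azure Databricks this is the AAD user's email
    }
    if allow_cluster_create:
        payload["entitlements"] = [{"value": "allow-cluster-create"}]
    return payload
```

In most shops you would not call this yourself: the AAD enterprise application's provisioning feature sends these same SCIM documents for you on a sync schedule.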

Read more

Crying Over a Pool of Spilt IOPS

You may have the crazy idea that you can move your high-performing database server, running on your all-flash SAN, to the cloud and expect it to perform the same for a tenth of the price. Boy, are you in for an awkward moment trying to explain why the migrated server isn't performing the way you planned, or why it is costing you more.

Use Premium Disk

Premium disks do buy you some improvement over standard HDDs and SSDs, but a single disk is generally nowhere close to the performance you get from a dedicated SAN.
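The usual mitigation is to stripe several premium disks into a pool, but the aggregate is still capped by the VM size's own IOPS and throughput limits, which is where people get surprised. A back-of-the-envelope sketch (the per-disk and VM-cap figures below are illustrative assumptions, not quotes from any SKU sheet):

```python
def pooled_throughput(disk_iops, disk_mbps, disk_count, vm_iops_cap, vm_mbps_cap):
    """Aggregate IOPS and MB/s of a striped disk pool, capped by the VM's limits."""
    pool_iops = min(disk_iops * disk_count, vm_iops_cap)
    pool_mbps = min(disk_mbps * disk_count, vm_mbps_cap)
    return pool_iops, pool_mbps


# Four hypothetical 5000-IOPS / 200 MB/s disks behind a VM capped at
# 12800 IOPS / 192 MB/s: the pool hits the VM cap, not 4x the disk numbers.
iops, mbps = pooled_throughput(5000, 200, 4, 12800, 192)
```

In other words, adding disks past the VM cap buys you capacity, not speed; you have to size the VM and the disks together.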

Read more

Building your own Azure Databricks CLI

We are continuing on with our discussion about DevOps and security concerns with Azure Databricks. In this post, we focus on building our own Databricks CLI. From a control-plane perspective, Databricks has a REST API that backs all management operations. You can read more about the API by going here. To make automation easier, Databricks also has a CLI that you can use. The CLI is written in Python and you can find out more here.
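The core of a home-grown CLI is just authenticated HTTP against that REST API. A minimal sketch using only the standard library, hitting the real `clusters/list` endpoint (the workspace URL and token below are placeholders you would supply from your own environment):

```python
import json
import urllib.request


def databricks_request(workspace_url, token, endpoint):
    """Build an authenticated GET request for a Databricks REST API endpoint."""
    return urllib.request.Request(
        f"{workspace_url}/api/2.0/{endpoint}",
        headers={"Authorization": f"Bearer {token}"},  # personal access token auth
    )


def list_clusters(workspace_url, token):
    """Call the clusters/list endpoint and return the parsed JSON response."""
    with urllib.request.urlopen(
        databricks_request(workspace_url, token, "clusters/list")
    ) as resp:
        return json.load(resp)
```

Wrap functions like `list_clusters` in an argument parser and you have the skeleton of a CLI; everything else is picking which endpoints to expose.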

Read more