
Building your own Azure Databricks CLI

We are continuing our discussion of DevOps and security concerns with Azure Databricks. In this post, we focus on building our own Databricks CLI.

From a control plane perspective, Databricks has a REST API that backs all management operations. You can read more about the API by going here.

To make automation easier, Databricks provides a CLI. It is written in Python and you can find out more here. As per the official documentation, this CLI is “experimental” and you should use it at your own risk.

I am generally a big fan of CLIs, but many of them have inherent design issues. My biggest concern is the use of plain text credential files holding personal access tokens (or an equivalent). Unfortunately, the Databricks CLI is no different.

So, let’s talk about some steps to build your own Databricks CLI. In my case, that means using the only shell that has “power” in its name.

Let’s talk about authentication

As I mentioned in a previous post, users in a Databricks workspace can exist in two ways. The first is directly in the workspace itself, and the second is where users and permissions are inherited from Azure. In the first case, you have to create and use personal access tokens for the users you would like to have access.

You can read more about token access here. Here is some sample PowerShell code to get you started:

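class DatabricksUtility{
    # Workspace host name without the scheme; the methods prepend https://
    [string]$url
    # Personal access token, carried in the password part of a PSCredential
    hidden [PSCredential]$token

    # Builds the headers sent with every REST call
    [object]GetHeaders(){
        $headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
        $headers.Add("Authorization", "Bearer {0}" -f $this.token.GetNetworkCredential().Password)
        $headers.Add("Content-Type", "application/json")
        return $headers
    }

    DatabricksUtility([string]$url,[PSCredential]$token){
        $this.url = $url
        $this.token = $token
    }

    # Calls the REST API and returns the parsed JSON body
    [object]ExecuteCall([string]$uri,[string]$method){
        $response = Invoke-WebRequest -Method $method -Uri $uri -Headers $this.GetHeaders() -Verbose
        Write-Verbose ("RESPONSE: {0}" -f $response.Content)
        # Parse the body text explicitly rather than relying on string conversion of the response object
        return ($response.Content | ConvertFrom-Json)
    }
}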
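
As a rough sketch of how a token might reach the DatabricksUtility constructor without ever being written to a file, the helpers below wrap a SecureString in a PSCredential and build the bearer header from it. The function names New-TokenCredential and Get-BearerHeader, and the placeholder user name "token", are my own assumptions for illustration and are not part of any Databricks tooling.

```powershell
# Sketch only: the function names and the "token" user name are hypothetical.
function New-TokenCredential {
    param([Parameter(Mandatory)][securestring]$SecureToken)
    # The user name is an arbitrary placeholder; only the password part is used.
    return [PSCredential]::new("token", $SecureToken)
}

function Get-BearerHeader {
    param([Parameter(Mandatory)][PSCredential]$Credential)
    # The plain text token is only materialized here, when the header is built.
    return @{ Authorization = "Bearer {0}" -f $Credential.GetNetworkCredential().Password }
}

# Interactive usage (the prompt does not echo the token):
# $secure = Read-Host -Prompt "Databricks token" -AsSecureString
# $cred   = New-TokenCredential -SecureToken $secure
# $client = [DatabricksUtility]::new("<workspace-host>", $cred)
```

Prompting for the token, or pulling it from a secret store, keeps it out of a plain text credentials file, which is the main reason for going this route.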

The other way to handle authentication is to use Azure Active Directory (AAD), which is currently a preview feature. I’ll try to write up a post on this feature after I get a chance to use it. From a security perspective, you will likely want to remove users’ ability to create personal access tokens. In that case, AAD authentication would be the only way forward from an automation perspective.

Using the REST API

As I mentioned earlier, the CLI is effectively just a wrapper on the REST API. Because of this, you are going to be doing a lot of JSON manipulation to make things work. Looking at the REST API, you will notice that lots of calls require that you know the ID of the resource you are working with.

For example, the clusters get call requires that you know the cluster_id. As you might expect, this cluster ID isn’t something you can specify, but something that is assigned by the workspace.
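
To make that concrete, here is a small sketch of what building a clusters get request might look like. It assumes the ID is passed as a cluster_id query parameter on the 2.0 endpoint; the function name Get-ClusterGetUri, the host name and the cluster ID are made up for illustration.

```powershell
# Sketch only: Get-ClusterGetUri is a hypothetical helper, not part of any Databricks tooling.
function Get-ClusterGetUri {
    param(
        [Parameter(Mandatory)][string]$WorkspaceHost,
        [Parameter(Mandatory)][string]$ClusterId
    )
    # Escape the ID so it is safe to place in a query string.
    $escapedId = [uri]::EscapeDataString($ClusterId)
    return "https://{0}/api/2.0/clusters/get?cluster_id={1}" -f $WorkspaceHost, $escapedId
}

# Made-up host name and cluster ID.
Get-ClusterGetUri -WorkspaceHost "example.azuredatabricks.net" -ClusterId "0000-000000-example1"
```

Without the ID there is nothing to put in that query string, which is why the list-based helpers become useful.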

I decided to create some convenience methods that would use list commands (and the parsing of those lists) instead of relying on knowing the cluster ID. While not ideal as workspace sizes increase, it works for now. Here is a sample.

    [object]GetClusterByName([string]$clusterName){
        $clusters = $this.ListClusters()
        return $clusters.clusters.Where({$_.cluster_name -eq $clusterName})
    }

    [object]ListClusters(){
        $uri = "https://{0}/api/2.0/clusters/list" -f $this.url
        return $this.ExecuteCall($uri,"Get")
    }

This will allow you to build methods/tooling that might make more sense than relying on knowing the cluster IDs.
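
One thing to watch with name-based lookups is that a name may match zero clusters or more than one. The sketch below shows one way to turn a parsed list response into a single cluster ID and fail loudly otherwise. Resolve-ClusterId is a hypothetical helper, and the sample JSON is invented data shaped like a list response with cluster_id and cluster_name fields.

```powershell
# Sketch only: Resolve-ClusterId is a hypothetical helper.
function Resolve-ClusterId {
    param(
        [Parameter(Mandatory)][object]$ListResponse,
        [Parameter(Mandatory)][string]$ClusterName
    )
    # @() forces an array so that Count is reliable for zero or one match.
    $found = @($ListResponse.clusters | Where-Object { $_.cluster_name -eq $ClusterName })
    if ($found.Count -eq 0) { throw "No cluster named '$ClusterName' was found." }
    if ($found.Count -gt 1) { throw "Cluster name '$ClusterName' is ambiguous ($($found.Count) matches)." }
    return $found[0].cluster_id
}

# Invented sample data shaped like a parsed list response.
$sample = '{"clusters":[{"cluster_id":"0000-000000-aaaa1","cluster_name":"etl"},{"cluster_id":"0000-000000-bbbb2","cluster_name":"adhoc"}]}' | ConvertFrom-Json
Resolve-ClusterId -ListResponse $sample -ClusterName "etl"
```

Inside the class, the same check could sit on top of GetClusterByName so that every other method can keep accepting names.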

Introducing the concept of WhatIf

If you’ve used PowerShell for any length of time, you will understand the power of WhatIf. Effectively, WhatIf is a flag that the scripts you write should support; it lets users test the execution of a script without making any changes.

Since the Databricks CLI is effectively a REST API wrapper, most operations are carried out by passing JSON payloads back and forth. As such, you can easily augment your methods to compare the current running configuration against the one that has been passed in, and then report what would change if the script were run without the WhatIf flag.
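
Here is a minimal sketch of that idea. It uses advanced functions rather than class methods, because the built-in WhatIf support ($PSCmdlet.ShouldProcess) is only available there. The names Get-ConfigDifference and Set-ClusterConfig are hypothetical, the comparison only handles flat scalar settings, and the actual REST call is left as a comment.

```powershell
# Sketch only: both function names are hypothetical, and only flat scalar settings are compared.
function Get-ConfigDifference {
    param(
        [Parameter(Mandatory)][hashtable]$Current,
        [Parameter(Mandatory)][hashtable]$Desired
    )
    foreach ($key in $Desired.Keys) {
        # Report a setting when it is missing from, or different in, the current configuration.
        if (-not $Current.ContainsKey($key) -or $Current[$key] -ne $Desired[$key]) {
            [pscustomobject]@{ Property = $key; Current = $Current[$key]; Desired = $Desired[$key] }
        }
    }
}

function Set-ClusterConfig {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)][string]$ClusterName,
        [Parameter(Mandatory)][hashtable]$Current,
        [Parameter(Mandatory)][hashtable]$Desired
    )
    $changes = @(Get-ConfigDifference -Current $Current -Desired $Desired)
    foreach ($change in $changes) {
        $action = "Change {0} from '{1}' to '{2}'" -f $change.Property, $change.Current, $change.Desired
        # With -WhatIf, ShouldProcess prints the action and returns $false.
        if ($PSCmdlet.ShouldProcess($ClusterName, $action)) {
            # The real REST call (for example a clusters edit request) would go here.
            Write-Verbose "Applied: $action"
        }
    }
    return $changes
}

# Dry run with made-up values: reports the change without applying it.
Set-ClusterConfig -ClusterName "etl" -Current @{ num_workers = 2 } -Desired @{ num_workers = 4 } -WhatIf
```

In practice the current configuration would come from the workspace (for example via a get call on the cluster) and the desired one from the JSON payload the user passed in.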

In this article, we talked about creating our own Databricks CLI so that we could extend the functionality of the existing client. Depending on your scenario, something like this may come in handy. Hope you enjoyed it!

 
