Exploring Azure Databricks Permissions

By Shamir Charania
on February 15, 2020

We are continuing our discussion of DevOps and security concerns with Azure Databricks. In this post, we will talk about setting up granular permissions inside Azure Databricks.

By default, particularly with workspaces in the standard tier, all users have access to all resources within the workspace. By resources, I mean specific Databricks “objects” such as directories, notebooks, clusters, pools, jobs and tables. Luckily, Azure Databricks offers a premium plan, which allows administrators to configure custom role-based access controls based on the permissions API.

When you are creating production Databricks workspaces, you are likely going to have two main use cases. The first is job-specific: the workspace is used to run pre-created reports and functions that have followed some type of development process and have been promoted into production. The second is exploratory: end users will want to experiment and play with the data, creating notebooks interactively and examining the results.

From a job-specific workspace perspective, you likely want the creation of new notebooks, jobs, clusters, etc. locked down to approved CI/CD processes only. Because these jobs will likely use service principals, ensuring that users cannot simply create notebooks and run them is extremely important. Of course, you will still have support personnel who need to monitor job execution and results. Azure Databricks role-based access control can help with this use case.

For interactive clusters, you will likely want to ensure that users have “safe” places to create their notebooks, run jobs, and examine results. Because the results of a notebook are stored with the notebook itself, you’ll want to create appropriate role-based access controls to ensure that only users with the same security clearance can see those outputs. You may also want to create “home” directories accessible only by the individual users. Again, role-based access control is a good fit here.

Permissions Architecture

From an architecture perspective, the permission model in Azure Databricks is quite simple. Each object within a Databricks workspace (for example, a notebook) has a set of “permissions” that can be associated with it.

For example, notebooks can have the following permissions:

CAN_READ
- Users can view and comment on a notebook
CAN_RUN
- Users can view, comment and also attach/detach the notebook from a cluster. They can also run commands within that notebook
CAN_EDIT
- All the above plus the ability to edit the notebook
CAN_MANAGE
- All the above and can also change permissions on the notebook
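Since each level listed above includes everything granted by the level before it, the hierarchy can be modelled as an ordered enumeration. The following is a minimal sketch for illustration only, not Databricks code: the class name, the action names, and the is_allowed helper are all hypothetical, and only the four level names come from the list above.

```python
from enum import IntEnum


class NotebookPermission(IntEnum):
    """Notebook permission levels, ordered so a higher value includes every lower one."""
    CAN_READ = 1    # view and comment
    CAN_RUN = 2     # plus attach/detach and run commands
    CAN_EDIT = 3    # plus edit the notebook
    CAN_MANAGE = 4  # plus change permissions on the notebook


# Hypothetical action names, invented for this sketch; each maps to the
# lowest level that grants it according to the list above.
REQUIRED_LEVEL = {
    "view": NotebookPermission.CAN_READ,
    "comment": NotebookPermission.CAN_READ,
    "attach": NotebookPermission.CAN_RUN,
    "run": NotebookPermission.CAN_RUN,
    "edit": NotebookPermission.CAN_EDIT,
    "change_permissions": NotebookPermission.CAN_MANAGE,
}


def is_allowed(granted: NotebookPermission, action: str) -> bool:
    """Return True if the granted level is at least the level the action needs."""
    return granted >= REQUIRED_LEVEL[action]


if __name__ == "__main__":
    level = NotebookPermission.CAN_RUN
    for action in REQUIRED_LEVEL:
        print(f"{level.name} may {action}: {is_allowed(level, action)}")
```

Modelling the levels as an IntEnum keeps the "all the above plus" rule in one comparison rather than a separate list of actions per level.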

These permissions can be assigned to the respective objects along with a user or a group. This typically manifests itself as adding an access control list to that particular object. Generically, it looks something like this:

{
	"access_control_list": [
	{
		"user_name": "<UserName>",
		"permission_level": "<PermissionLevel>"
	},
	{
		"group_name": "<GroupName>",
		"permission_level": "<PermissionLevel>"
	}
	]
}
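If you end up generating these lists from a script, a small helper can keep the entries well-formed. The sketch below only builds and prints the JSON body and does not call any Databricks endpoint. The helper names (acl_entry, acl_payload) and the sample user and group names are hypothetical, and the set of valid levels assumes the notebook permissions listed earlier.

```python
import json

# Assumption: the notebook permission levels described earlier in this post.
VALID_LEVELS = {"CAN_READ", "CAN_RUN", "CAN_EDIT", "CAN_MANAGE"}


def acl_entry(permission_level, user_name=None, group_name=None):
    """Build one access control entry for either a user or a group, never both."""
    if permission_level not in VALID_LEVELS:
        raise ValueError(f"unknown permission level: {permission_level!r}")
    if (user_name is None) == (group_name is None):
        raise ValueError("provide exactly one of user_name or group_name")
    if user_name is not None:
        principal = {"user_name": user_name}
    else:
        principal = {"group_name": group_name}
    return {**principal, "permission_level": permission_level}


def acl_payload(*entries):
    """Wrap the entries in the access_control_list structure shown above."""
    return {"access_control_list": list(entries)}


# Hypothetical principals, for illustration only.
payload = acl_payload(
    acl_entry("CAN_MANAGE", user_name="alice@example.com"),
    acl_entry("CAN_READ", group_name="support"),
)

if __name__ == "__main__":
    print(json.dumps(payload, indent=2))
```

Rejecting an entry that names both a user and a group (or neither) catches the most likely mistake before the list is ever sent anywhere.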

Conclusion

At the time of writing, permissions can be used in premium tier workspaces with workspace access control enabled. They are editable via the portal experience, and, if you ask nicely, you may get access to a preview for setting the permissions via script.
