Azure Databricks implementations follow a fairly typical pattern for a multi-user application deployment. The initial phases are all about delivering business value that attracts users to the platform and encourages them to use it. Once you’ve made it that far, the focus often shifts to understanding how the service is being used. Data-driven organizations and projects use this information to fuel decisions around optimization and further feature development. While the Azure Databricks platform offers some built-in reporting capabilities, they are generally operational in nature. The goal of this post is to discuss how to use diagnostic logs to gain insight into how users are making use of your Azure Databricks workspaces. Diagnostic logs are a great way to aggregate this data across workspaces and get a holistic view of usage.
Diagnostic logs, sometimes called resource logs, are a feature of Azure services whereby a service emits information about activities that occur on its data plane. It is important to note that diagnostic logs are service specific, and each service emits a different set of information. While you can forward these logs to a variety of destinations, Log Analytics is the typical choice.
You can read more about these logs here.
In the case of Azure Databricks, the following logs are forwarded:
A couple of key notes:
If you want to learn more about how to enable diagnostic logs, please see this link.
Let’s discuss a couple of these logs in more detail.
The Databricks accounts log captures events surrounding logins to your various workspaces. While most logins will be of type “Microsoft.Databricks/accounts/tokenLogin” or “Microsoft.Databricks/accounts/aadBrowserLogin”, I’d expect more login types soon as features such as service principal login come out of preview.
One interesting query here is to group the logins by UserAgent.
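To check which login types actually show up in your own workspaces, a minimal sketch along these lines may help. It assumes the login type appears in the ActionName column (the same column the later queries in this post filter on); the column alias is my own.

```kusto
// Count login events per login type.
// Assumption: ActionName holds values such as tokenLogin or aadBrowserLogin.
DatabricksAccounts
| where ActionName contains "login"
| summarize Logins = count() by ActionName
| order by Logins desc
```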
DatabricksAccounts | summarize count() by UserAgent
You can see that for this particular workspace, most of the usage comes from Apache-HTTPClient, which I can only assume is the Databricks workspace itself. This is likely tied to how jobs are handled and run in the backend. The second is AzureDataFactory, which is likely due to Data Factory being used as the ELT orchestration engine.
You can also run other queries to figure out who these users are by looking at the Identity field. For example:
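If you want to see whether that mix of user agents shifts over time, the same grouping can be bucketed by day. This is only a sketch: it assumes the standard Log Analytics TimeGenerated column is populated for this table, and the 30-day window is an arbitrary choice.

```kusto
// Daily login counts per user agent over the last 30 days.
// Assumption: TimeGenerated is the standard Log Analytics timestamp column.
DatabricksAccounts
| where TimeGenerated > ago(30d)
| where ActionName contains "login"
| summarize Logins = count() by UserAgent, bin(TimeGenerated, 1d)
| order by TimeGenerated asc
```

Appending `| render timechart` should plot one line per user agent in the Log Analytics query editor.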
DatabricksAccounts | where ActionName contains "login" | extend d=parse_json(Identity) | project UserEmail=d.email, SourceIPAddress
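The query above returns one row per login event. To roll that up into one row per user, a sketch like the following could work. Note the tostring() call: values pulled out of parse_json() are dynamic, and summarize needs a scalar type to group on. The names Logins and DistinctIPs are my own.

```kusto
// Logins and distinct source IP addresses per user.
DatabricksAccounts
| where ActionName contains "login"
| extend d = parse_json(Identity)
| extend UserEmail = tostring(d.email)  // cast dynamic to string so it can be a group key
| summarize Logins = count(), DistinctIPs = dcount(SourceIPAddress) by UserEmail
| order by Logins desc
```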
The notebooks diagnostic logs can be interesting when you want to understand how notebooks are being used in the organization. Typically, one creates a couple of workspaces for different purposes. An analytics workspace is more for analysts doing one-off work, whereas engineering workspaces mainly support your ETL processes. As such, I’d expect to see lots of creates in the analytics workspaces, but few to none in my engineering workspaces (unless a code deploy was done).
Some sample queries that could provide some insight:
DatabricksNotebook | where ActionName == "createNotebook" | summarize count() by ResourceId
This query helps you understand where the majority of your notebook creation is happening. If you want to understand who is creating these notebooks, you could run something like:
DatabricksNotebook | where ActionName == "createNotebook" | summarize count() by Identity
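Grouping on Identity groups on the raw JSON string. If you would rather see a readable email address, and which workspace each user creates notebooks in, a sketch along these lines may be easier to work with. It assumes the Identity column in DatabricksNotebook carries the same email field used in the DatabricksAccounts query earlier.

```kusto
// Notebook creations per user and workspace.
// Assumption: Identity here has the same JSON shape (an email field) as in DatabricksAccounts.
DatabricksNotebook
| where ActionName == "createNotebook"
| extend UserEmail = tostring(parse_json(Identity).email)
| summarize NotebooksCreated = count() by UserEmail, ResourceId
| order by NotebooksCreated desc
```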
You may also want to understand which notebooks the operations are occurring on. You could use a query like the following to pull out the notebook name:
DatabricksNotebook | extend requestJSON = parse_json(RequestParams) | project notebookName = requestJSON.path
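Building on that, you could rank notebooks by how much activity they see. This is a sketch under one assumption: not every action necessarily records a path in RequestParams, so rows without one are dropped. The cutoff of 20 is arbitrary.

```kusto
// Most active notebooks, broken down by operation type.
DatabricksNotebook
| extend requestJSON = parse_json(RequestParams)
| extend notebookName = tostring(requestJSON.path)
| where isnotempty(notebookName)  // skip events that carry no path
| summarize Operations = count() by notebookName, ActionName
| top 20 by Operations desc
```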
These diagnostic logs contain a wealth of information. A key part of your infrastructure deployment should be to forward these logs to a Log Analytics workspace so you can start to perform this type of analysis. Hopefully this post was enough to get you started on your journey.