
Creating ADLS Gen2 Items

Often, as part of build and release pipelines, one needs to interact with the data lake in order to create items such as files and folders. The goal of this post is to examine how to do this with Azure DevOps.

One really cool feature of Azure DevOps is service connections. Service connections let you securely manage access to resources, such as the Azure management plane, and pass that “access” to your Azure DevOps tasks. This is particularly useful with the Azure PowerShell task. In the not-too-distant past, there were no Azure PowerShell cmdlets for Azure Data Lake Storage Gen2, so you had to use the REST API. This is no longer the case.

Steps for creating a pipeline that interacts with ADLS Gen2:

  1. Create a service principal inside the portal
  2. Grant appropriate access either via Azure RBAC or POSIX permissions to your data lake
  3. Create a service connection with the required information
  4. Make use of the Azure PowerShell task to interact with the lake
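
If you would rather script steps 1 and 2 than click through the portal, the following is a minimal sketch using the Az.Resources and Az.Storage modules. The display name, resource group, and storage account name are hypothetical placeholders, and the "Storage Blob Data Contributor" role is an assumption about what "appropriate access" means for your lake; pick a narrower role or scope if that suits you better.

```powershell
# Sketch only: every name below is a hypothetical placeholder.
# Assumes you are already signed in (Connect-AzAccount) with rights to
# create service principals and assign roles.
$displayName        = "adls-pipeline-sp"    # hypothetical
$resourceGroupName  = "my-resource-group"   # hypothetical
$storageAccountName = "mystorageaccount"    # hypothetical

# Step 1: create the service principal.
$servicePrincipal = New-AzADServicePrincipal -DisplayName $displayName

# Step 2: grant data-plane access on the storage account via Azure RBAC.
$storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroupName `
                                       -Name $storageAccountName

New-AzRoleAssignment -ObjectId $servicePrincipal.Id `
                     -RoleDefinitionName "Storage Blob Data Contributor" `
                     -Scope $storageAccount.Id
```

The service principal's application ID, tenant ID, and secret are then what you enter when creating the service connection in step 3.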

Here is an example Azure DevOps task:

  - task: AzurePowerShell@4
    inputs:
      azureSubscription: $(connectionName)
      scriptType: 'filepath'
      scriptPath: DeployFile.ps1
      scriptArguments: -filePath $(filePath) -destinationStorageAccountName $(storageAccountName) -destinationStorageAccountFolder $(storageAccountFolder) -destinationFileSystemName $(fileSystemName)
      azurePowerShellVersion: latestVersion

And here is the corresponding PowerShell:

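param(
    # Local path of the file to upload
    [Parameter(Mandatory=$true)]
    [string]$filePath,
    # Name of the ADLS Gen2 storage account
    [Parameter(Mandatory=$true)]
    [string]$destinationStorageAccountName,
    # Folder inside the file system that receives the file
    [Parameter(Mandatory=$true)]
    [string]$destinationStorageAccountFolder,
    # Name of the file system (container) in the storage account
    [Parameter(Mandatory=$true)]
    [string]$destinationFileSystemName
)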

Write-Host ("##[command]Installing modules")
if (-not (Get-Module -ListAvailable -Name Az.Storage)){
    Install-Module Az.Storage -Repository PSGallery -Force -Scope CurrentUser
}

if (-not ($destinationStorageAccountFolder.EndsWith("/"))){
    $destinationStorageAccountFolder = $destinationStorageAccountFolder + "/"
}


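    # Assumption: authenticate with the identity supplied by the service connection
    # (Azure AD), which matches the RBAC/POSIX access granted to the service principal.
    $storageContext = New-AzStorageContext -StorageAccountName $destinationStorageAccountName `
                                    -UseConnectedAccount

    $file = Get-Item -Path $filePath

    Write-Host ("##[command]Processing file {0}" -f $file.FullName)
    $destinationFilePath = $destinationStorageAccountFolder + $file.Name
    Write-Host ("##[command]Uploading to {0}" -f $destinationFilePath)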
    $dataLakeFile = Get-AzDataLakeGen2Item -FileSystem $destinationFileSystemName `
                                    -Path $destinationFilePath `
                                    -Context $storageContext `
                                    -ErrorAction SilentlyContinue
    
    if ($dataLakeFile){
        Remove-AzDataLakeGen2Item -FileSystem $destinationFileSystemName `
                                    -Path $destinationFilePath `
                                    -Context $storageContext `
                                    -Force
    }
    New-AzDataLakeGen2Item -Context $storageContext `
                            -Path $destinationFilePath `
                            -FileSystem $destinationFileSystemName `
                            -Source $file.FullName `
                            -Force

You’ll notice in the code above that I remove the file if it already exists before uploading it again. At the time of writing, overwriting an existing file doesn’t work properly with these cmdlets and fails with an error saying the “Overwrite permission is not allowed”.
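
The script above only uploads a file, but the same cmdlets can create folders too. Here is a minimal sketch, assuming the same kind of storage context and using hypothetical account, file system, and folder names. It checks for the folder first and only creates it when missing, rather than relying on overwrite behaviour.

```powershell
# Sketch only: the three values below are hypothetical placeholders.
$storageAccountName = "mystorageaccount"
$fileSystemName     = "myfilesystem"
$folderPath         = "raw/sales/"

# Assumption: Azure AD auth through the identity of the service connection.
$storageContext = New-AzStorageContext -StorageAccountName $storageAccountName `
                                       -UseConnectedAccount

# Look for the folder; returns nothing (without failing the task) if it is absent.
$existingFolder = Get-AzDataLakeGen2Item -Context $storageContext `
                                         -FileSystem $fileSystemName `
                                         -Path $folderPath `
                                         -ErrorAction SilentlyContinue

if (-not $existingFolder) {
    # -Directory tells the cmdlet to create a folder instead of a file.
    New-AzDataLakeGen2Item -Context $storageContext `
                           -FileSystem $fileSystemName `
                           -Path $folderPath `
                           -Directory
}
```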

With the approach above, you can easily add ADLS Gen2 environment setup steps to your build/release pipelines. The Gen2 cmdlets are part of the Az.Storage module, and you can find all of them in its cmdlet reference.

 
