Notebooks in Synapse

At first glance, the most appealing feature of Azure Synapse Analytics is Synapse Studio: one unified UX across data stores, notebooks and pipelines. The notebook experience is appreciated most by people who read data that takes minutes or hours to load and then operate on it, whether in data engineering, feature engineering or ML training. The ability to divide your code into smaller chunks and control which ones to execute, and when, is a powerful productivity tool.

An added value is that the notebook stores not only the code but also the results of that code. In other words, the notebook now has a data storage capacity, and that worries organizations that are highly regulated or that handle highly confidential information.

Save output property

Luckily, that’s controllable through a per-notebook property, SaveOutput. It’s enabled by default for any workspace that is not linked to Git source control.

The main security concern arises when the workspace is linked to source control, because the source control repo would then hold both the code and the data. That’s why this feature is disabled once you link your workspace. The screenshot below shows the configuration when linked to source control.

So if you have this concern, just link the workspace to source control and the problem is fixed.

However, in cases where you can’t link to source control for whatever reason, you have to ask users to disable it per notebook, as there is no workspace-wide configuration. That is what inspired me to write this script. If you just want the script, head to the bottom of this article; otherwise keep reading for an explanation of how it works.

The Dev endpoint

There are three endpoints for any workspace. The first two, the SQL on-demand endpoint and the dedicated SQL endpoint, are self-explanatory: they are the SQL endpoints for the built-in serverless logical server and for the dedicated pool (if you have created one), respectively. The third is the dev endpoint. There isn’t much detail about that one other than that it’s the endpoint for anything else. What that means is that it’s the API endpoint for any APIs that are specific to your workspace; in other words, anything under the data plane category of the Synapse APIs. That endpoint is specific to your workspace and has the format workspace-name.dev.azuresynapse.net. It’s important to understand that this endpoint has the same network capabilities as the other two, so you can link it to a private endpoint to make sure the traffic comes only from your own VNets. For more information about network security, refer to my YouTube video.

The Notebook APIs

One of the API sets exposed by the dev endpoint is the notebook APIs. You can create/update, delete and get notebooks through these APIs. That’s what I leveraged to create the script.
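To make the URL shape concrete, here is a minimal sketch of how the request URLs for these APIs can be assembled. The function name Get-SynapseNotebookUri is hypothetical and not part of the original script, and the api-version value 2020-12-01 is an assumption, so check it against the current REST reference.

```powershell
# Hypothetical helper, not part of the original script.
# Builds the URL of a notebook API call on the workspace dev endpoint.
function Get-SynapseNotebookUri {
    param(
        [Parameter(Mandatory)] [string] $WorkspaceName,
        [string] $NotebookName,               # omit to target the notebooksSummary listing
        [string] $ApiVersion = '2020-12-01'   # assumed API version
    )
    # Dev endpoint format described above: workspace-name.dev.azuresynapse.net
    $base = "https://${WorkspaceName}.dev.azuresynapse.net"
    if ($NotebookName) {
        # Notebook names can contain spaces, so escape them for use in a URL path
        $path = "/notebooks/" + [uri]::EscapeDataString($NotebookName)
    } else {
        $path = "/notebooksSummary"
    }
    return "$base$path" + "?api-version=$ApiVersion"
}

# Example with a made-up workspace and notebook name
Get-SynapseNotebookUri -WorkspaceName 'contoso-ws' -NotebookName 'My Notebook'
```

The same URL would then be used with GET, PUT or DELETE depending on the operation.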

The script: disable SaveOutput

The Token

First, before sending our first API call, we need an access token to use for authentication. This token is different from the Azure Resource Manager token because the resource in this case is different: it should be a token for the dev endpoint.

Write-Host "Getting token for workspace $workspaceName"
# Request a token whose resource is the dev endpoint, not Azure Resource Manager
$token = (Get-AzAccessToken -ResourceUrl "https://$devDomain" -TenantId $tenantId).Token
return $token

Get all the notebooks in the workspace

Using the GET /notebooksSummary API, we get the names of all the notebooks so that we can loop through them.
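The listing step can be sketched as follows. This is only an illustration: the function name Get-NotebookSummaries, its parameters and the api-version are assumptions, and it assumes the response carries the notebooks in a value array with an optional nextLink for further pages.

```powershell
# Sketch only: function name, parameters and api-version are assumptions.
function Get-NotebookSummaries {
    param(
        [Parameter(Mandatory)] [string] $DevEndpoint,   # e.g. "https://my-workspace.dev.azuresynapse.net"
        [Parameter(Mandatory)] [string] $Token,         # plain-text bearer token for the dev endpoint
        [string] $ApiVersion = '2020-12-01'
    )
    $headers = @{ Authorization = "Bearer $Token" }
    $uri = "$DevEndpoint/notebooksSummary?api-version=$ApiVersion"
    $notebooks = @()
    while ($uri) {
        $page = Invoke-RestMethod -Method Get -Uri $uri -Headers $headers
        # Each page is assumed to carry its notebooks in the "value" array
        $notebooks += $page.value
        # Follow "nextLink" until the service stops returning one
        $uri = $page.nextLink
    }
    return $notebooks
}
```

Each returned item has a name property, which is all the loop in the next step needs.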

Loop and update

Loop through the notebooks, get the full details of each notebook, and change its saveOutput property to false.

Write-Host "Notebook: $($notebook.name)"
$response = invokeREST -method GET -relativeUrl "/notebooks/$($notebook.name)" -body $null
# convert the response to a notebookDetails object
$notebookDetails = $response | ConvertFrom-Json

# Set the saveOutput flag to false
$notebookDetails.properties.metadata.saveOutput=$false

That doesn’t remove any output already written to the notebook. To remove what was already saved, I use these lines:

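# Remove the stored widget state of the notebook, if there is any
if ($notebookDetails.properties.metadata.PSObject.Properties['synapse_widget']) {
    $notebookDetails.properties.metadata.synapse_widget = New-Object -TypeName object
}
# Remove the outputs of the notebook cells; cells without an outputs property are skipped
foreach ($cell in $notebookDetails.properties.cells) {
    if ($cell.PSObject.Properties['outputs']) {
        $cell.outputs = @()
    }
}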
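To see the effect of these changes without calling the service, here is a small worked example on a made-up notebook document. The JSON below is a simplified sample I wrote for illustration, not a real API response, and the checks for missing properties are a defensive variant that assumes some cells (for example markdown cells) may not carry an outputs property.

```powershell
# Made-up sample that mimics the general shape of a notebook document
$sampleJson = @'
{
  "name": "demo",
  "properties": {
    "metadata": { "saveOutput": true, "synapse_widget": { "state": { "k": "v" } } },
    "cells": [
      { "cell_type": "markdown", "source": [ "# Title" ] },
      { "cell_type": "code", "source": [ "print(1)" ],
        "outputs": [ { "output_type": "stream", "text": [ "1" ] } ] }
    ]
  }
}
'@

$notebookDetails = $sampleJson | ConvertFrom-Json

# Stop saving outputs from now on
$notebookDetails.properties.metadata.saveOutput = $false

# Replace the stored widget state with an empty object, if present
if ($notebookDetails.properties.metadata.PSObject.Properties['synapse_widget']) {
    $notebookDetails.properties.metadata.synapse_widget = New-Object -TypeName object
}

# Clear outputs only on cells that actually have an outputs property
foreach ($cell in $notebookDetails.properties.cells) {
    if ($cell.PSObject.Properties['outputs']) {
        $cell.outputs = @()
    }
}

# Serialize with an explicit depth so nested cells are not truncated
$cleanedJson = $notebookDetails | ConvertTo-Json -Depth 100
```

After this runs, $cleanedJson holds the same notebook with saveOutput set to false, an empty widget state, and no cell outputs.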

Finally, before we send the update, we make sure that we are updating the right notebook. Matching by name alone is not enough, because names might change between the time we GET the notebook and the time we PUT it, so I use the eTag.

We also remove some JSON sections that the PUT API does not expect.

$headers = @{
    "If-Match" =  """$($notebookDetails.etag)"""
 }

# Remove id, type and etag properties from the notebook details
$notebookDetails.PSObject.properties.remove('id')
$notebookDetails.PSObject.properties.remove('type')
$notebookDetails.PSObject.properties.remove('etag')
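The update call itself can be sketched like this. The function name Send-NotebookUpdate, its parameters and the api-version are assumptions made for illustration, and it uses Invoke-RestMethod directly instead of the script’s own invokeREST helper. One detail worth noting is the -Depth argument: ConvertTo-Json only serializes two levels by default, which would truncate the nested cells of a notebook.

```powershell
# Sketch only: function name, parameters and api-version are assumptions.
function Send-NotebookUpdate {
    param(
        [Parameter(Mandatory)] [string] $DevEndpoint,        # e.g. "https://my-workspace.dev.azuresynapse.net"
        [Parameter(Mandatory)] [string] $Token,              # plain-text bearer token for the dev endpoint
        [Parameter(Mandatory)] [psobject] $NotebookDetails,  # already stripped of id, type and etag
        [Parameter(Mandatory)] [hashtable] $Headers,         # the If-Match header built from the eTag
        [string] $ApiVersion = '2020-12-01'
    )
    # Combine the bearer token with the conditional-update header
    $allHeaders = @{ Authorization = "Bearer $Token" } + $Headers
    # Notebook documents are deeply nested, so raise the serialization depth
    $body = $NotebookDetails | ConvertTo-Json -Depth 100
    $name = [uri]::EscapeDataString($NotebookDetails.name)
    $uri = "$DevEndpoint/notebooks/$name" + "?api-version=$ApiVersion"
    Invoke-RestMethod -Method Put -Uri $uri -Headers $allHeaders -Body $body -ContentType 'application/json'
}
```

With the If-Match header in place, the service should reject the PUT when the notebook has changed since it was read, rather than silently overwriting it.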

The complete script