This article provides an overview of how Azure Databricks redacts access keys and credentials in logs.
Credential redaction overview
Credential redaction is a critical security practice that masks sensitive information, such as passwords or API keys, to prevent unauthorized access. Azure Databricks redacts keys and credentials in audit logs and in Apache Spark log4j logs so that sensitive information does not leak through them. Azure Databricks automatically redacts cloud credentials and credentials in URIs.
For some credential types, Azure Databricks adds a hash_prefix, which is a short code generated from the credential using the MD5 hash algorithm. This code is used to check that the credential is valid and hasn't been altered.
Cloud credentials redaction
Redacted cloud credentials can appear with one of several replacement strings. Some are replaced with [REDACTED], while others get a more specific replacement such as REDACTED_POSSIBLE_CLOUD_SECRET_ACCESS_KEY.
Azure Databricks might redact certain long strings that appear randomly generated, even if they are not cloud credentials.
Credentials in URI redaction
Azure Databricks detects //username:password@mycompany.com in a URI and replaces username:password with REDACTED_CREDENTIALS(hash_prefix). Azure Databricks computes the hash from username:password (including the :).
For example, Azure Databricks logs 2017/01/08: Accessing https://admin:admin@mycompany.com as 2017/01/08: Accessing https://REDACTED_CREDENTIALS(d2abaa37)@mycompany.com.
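The URI redaction described above can be approximated with a short Python sketch. This is an illustration only, not the Azure Databricks implementation: the regular expression, the function names hash_prefix and redact_uri_credentials, and the assumption that the prefix is the first eight hexadecimal characters of the MD5 digest are all hypothetical, inferred from the format of the example.

```python
import hashlib
import re

# Hypothetical pattern (an assumption, not the actual Azure Databricks rule):
# match "username:password" sitting between "//" and "@" in a URI.
_URI_CREDENTIALS = re.compile(r"(?<=//)([^/\s:@]+:[^/\s@]+)(?=@)")


def hash_prefix(credential: str, length: int = 8) -> str:
    """Return the first `length` hex characters of the MD5 digest.

    The prefix length of 8 is an assumption based on the documented example.
    """
    return hashlib.md5(credential.encode("utf-8")).hexdigest()[:length]


def redact_uri_credentials(line: str) -> str:
    """Replace username:password in URIs with REDACTED_CREDENTIALS(hash_prefix)."""
    return _URI_CREDENTIALS.sub(
        # The hash covers the whole "username:password" string, including the colon.
        lambda match: f"REDACTED_CREDENTIALS({hash_prefix(match.group(1))})",
        line,
    )


if __name__ == "__main__":
    print(redact_uri_credentials(
        "2017/01/08: Accessing https://admin:admin@mycompany.com"
    ))
```

If the prefix-length assumption holds, the printed line should have the same shape as the redacted log entry in the example above. Lines that contain no embedded credentials pass through unchanged.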