Filter sensitive table data using row filters and column masks
This article provides guidance and examples for using row filters, column masks, and mapping tables to filter sensitive data in your tables. These features require Unity Catalog.
What are row filters?
Row filters allow you to apply a filter to a table so that queries return only rows that meet the filter criteria. You implement a row filter as a SQL user-defined function (UDF). Python and Scala UDFs are also supported, but only when they are wrapped in SQL UDFs.
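As a rough sketch of the wrapping pattern, the following registers a Python UDF and then a SQL UDF that calls it. The function names py_is_us and us_filter_wrapped are hypothetical, and the sketch assumes the current catalog and schema are where you want the functions registered.

```sql
-- Hypothetical Python UDF registered in Unity Catalog.
CREATE FUNCTION py_is_us(region STRING)
RETURNS BOOLEAN
LANGUAGE PYTHON
AS $$
  return region == 'US'
$$;

-- SQL UDF wrapper: this is the function you would attach as the row filter.
CREATE FUNCTION us_filter_wrapped(region STRING)
RETURN py_is_us(region);
```

Keeping the wrapper thin means the filter logic stays in one place, and the table only ever references the SQL UDF.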
What are column masks?
Column masks let you apply a masking function to a table column. The masking function evaluates at query runtime, replacing each reference to the target column with the result of the masking function. For most use cases, column masks determine whether to return the original column value or redact it based on the identity of the invoking user. Column masks are expressions written as SQL UDFs or as Python or Scala UDFs that are wrapped in SQL UDFs.
Each table column can have only one masking function applied to it. The masking function takes the unmasked value of the column as input and returns the masked value as its result. The return value of the masking function should be the same type as the column being masked. The masking function can also take additional columns as input parameters and use them in its masking logic.
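To illustrate the return-type requirement, here is a minimal sketch of a mask for an INT column that returns either the original value or a typed NULL. The function name salary_mask and the group name hr are hypothetical.

```sql
-- The parameter and both CASE branches are INT, so the result type matches the masked column.
CREATE FUNCTION salary_mask(salary INT)
RETURN CASE
  WHEN is_account_group_member('hr') THEN salary
  ELSE CAST(NULL AS INT)
END;
```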
What is the difference between these filters and dynamic views?
Dynamic views, row filters, and column masks all let you apply complex logic to tables and process their filtering decisions at query runtime.
A dynamic view is an abstracted, read-only view of one or more source tables. The user can access a dynamic view without having access to the source tables directly. Creating a dynamic view defines a new table name that must not match the name of any source tables or other tables and views present in the same schema.
On the other hand, associating a row filter or column mask with a target table applies the corresponding logic directly to the table itself without introducing new table names. Subsequent queries can continue to refer directly to the target table using its original name.
Use dynamic views if you need to apply transformation logic such as filters and masks to read-only tables, and if it is acceptable for users to refer to the dynamic views using different names. If you want to filter data when you share it using Delta Sharing, you must use dynamic views. Use row filters and column masks if you want to filter or compute expressions over specific data but still provide users access to the tables using their original names.
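For comparison, a dynamic view that expresses a similar policy might look like the sketch below. It assumes a sales table with region and id columns. The view name sales_us_view and the group name admin are illustrative only.

```sql
-- Users query sales_us_view instead of sales; the filtering logic lives in the view definition.
CREATE VIEW sales_us_view AS
SELECT region, id
FROM sales
WHERE is_account_group_member('admin') OR region = 'US';
```

With a row filter, the same predicate would be attached to sales itself, and users would keep querying sales by name.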
Before you begin
To add row filters and column masks to tables, you must have:
- A workspace that is enabled for Unity Catalog.
- A function that is registered in Unity Catalog. This can be a SQL UDF, or a Python or Scala UDF that is registered in Unity Catalog and wrapped in a SQL UDF. For details, see What are user-defined functions (UDFs)?, Column mask clause, and ROW FILTER clause.
You must also meet the following requirements:
- To assign a function that adds row filters or column masks to a table, you must have the EXECUTE privilege on the function, USE SCHEMA on the schema, and USE CATALOG on the parent catalog.
- If you are adding filters or masks when you create a new table, you must have the CREATE TABLE privilege on the schema.
- If you are adding filters or masks to an existing table, you must be the table owner or have both the MODIFY and SELECT privileges on the table.
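The privileges above could be granted with statements along these lines. This is a sketch only: the catalog main, the schema default, the function us_filter, the table sales, and the group data_stewards are all placeholder names.

```sql
-- Privileges needed to assign an existing function as a filter or mask.
GRANT USE CATALOG ON CATALOG main TO `data_stewards`;
GRANT USE SCHEMA ON SCHEMA main.default TO `data_stewards`;
GRANT EXECUTE ON FUNCTION main.default.us_filter TO `data_stewards`;

-- Needed only when adding filters or masks while creating a new table.
GRANT CREATE TABLE ON SCHEMA main.default TO `data_stewards`;

-- Needed for non-owners adding filters or masks to an existing table.
GRANT SELECT, MODIFY ON TABLE main.default.sales TO `data_stewards`;
```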
To access a table that has row filters or column masks, your compute resource must meet one of these requirements:
- A SQL warehouse.
- Shared access mode on Databricks Runtime 12.2 LTS or above.
- Single user access mode on Databricks Runtime 15.4 LTS or above.
You cannot read row filters or column masks using single user compute on Databricks Runtime 15.3 or below.
To take advantage of the data filtering provided in Databricks Runtime 15.4 LTS and above, you must also verify that your workspace is enabled for serverless compute, because the data filtering functionality that supports row filters and column masks runs on serverless compute. You might therefore be charged for serverless compute resources when you use single user compute to read tables that use row filters or column masks. See Fine-grained access control on single user compute.
Apply a row filter
To create a row filter, you write a function (UDF) to define the filter policy and then apply it to a table. Each table can have only one row filter. A row filter accepts zero or more input parameters where each input parameter binds to one column of the corresponding table.
You can apply a row filter using Catalog Explorer or SQL commands. The Catalog Explorer instructions assume that you have already created a function and that it is registered in Unity Catalog. The SQL instructions include examples of creating a row filter function and applying it to a table.
Catalog Explorer
- In your Azure Databricks workspace, click Catalog.
- Browse or search for the table you want to filter.
- On the Overview tab, click Row filter: Add filter.
- On the Add row filter dialog, select the catalog and schema that contain the filter function, then select the function.
- On the expanded dialog, view the function definition and select the table columns that match the columns included in the function statement.
- Click Add.
To remove the filter from the table, click fx Row filter and click Remove.
SQL
To create a row filter and then add it to an existing table, use CREATE FUNCTION and apply the function using ALTER TABLE. You can also apply a function when you create a table using CREATE TABLE.
Create the row filter:
CREATE FUNCTION <function_name> (<parameter_name> <parameter_type>, ...) RETURN {filter clause whose output must be a boolean};
Apply the row filter to a table using a column name:
ALTER TABLE <table_name> SET ROW FILTER <function_name> ON (<column_name>, ...);
Additional syntax examples:
Apply the row filter to a table using a constant literal that matches a function parameter:
ALTER TABLE <table_name> SET ROW FILTER <function_name> ON (<constant_literal>, ...);
Remove a row filter from a table:
ALTER TABLE <table_name> DROP ROW FILTER;
Modify a row filter:
Run a DROP FUNCTION statement to drop the existing function, or use CREATE OR REPLACE FUNCTION to replace it.
Delete a row filter:
ALTER TABLE <table_name> DROP ROW FILTER;
DROP FUNCTION <function_name>;
Note
You must perform the ALTER TABLE ... DROP ROW FILTER command before you drop the function. If you don’t, the table will be in an inaccessible state. If the table becomes inaccessible in this way, alter the table and drop the orphaned row filter reference using ALTER TABLE <table_name> DROP ROW FILTER;.
See also ROW FILTER clause.
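As a sketch of the constant-literal form, the function below takes the allowed region as a second parameter, and the ALTER TABLE statement binds that parameter to the literal 'US'. The function name region_filter and its parameter names are hypothetical, and the sketch assumes a sales table with a region column.

```sql
-- The second parameter is supplied as a literal when the filter is attached, not read from the table.
CREATE FUNCTION region_filter(region STRING, allowed_region STRING)
RETURN is_account_group_member('admin') OR region = allowed_region;

-- Bind the first parameter to the region column and the second to a constant.
ALTER TABLE sales SET ROW FILTER region_filter ON (region, 'US');
```

This lets one function serve several tables that differ only in the constant they pass.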
Row filter examples
This example creates a SQL user-defined function that lets members of the admin group see all rows and restricts all other users to rows in the US region.
When this sample function is applied to the sales table, members of the admin group can access all records in the table. If the function is called by a non-admin, the IF condition evaluates to false and the region='US' expression is evaluated, filtering the table to only show records in the US region.
CREATE FUNCTION us_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, region='US');
Apply the function to a table as a row filter. Subsequent queries from the sales table then return a subset of rows.
CREATE TABLE sales (region STRING, id INT);
ALTER TABLE sales SET ROW FILTER us_filter ON (region);
Disable the row filter. Future user queries from the sales table then return all of the rows in the table.
ALTER TABLE sales DROP ROW FILTER;
Create a table with the function applied as a row filter as part of the CREATE TABLE statement. Future queries from the sales table then each return a subset of rows.
CREATE TABLE sales (region STRING, id INT)
WITH ROW FILTER us_filter ON (region);
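To change the policy later without detaching it, one option mentioned earlier is CREATE OR REPLACE FUNCTION. The sketch below widens the us_filter policy to a second region. The extra region value 'CA' is purely illustrative.

```sql
-- Same name and signature as the original function, so the table's row filter reference is unchanged.
CREATE OR REPLACE FUNCTION us_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, region IN ('US', 'CA'));
```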
Apply a column mask
To apply a column mask, create a function (UDF) and then apply it to a table column.
You can apply a column mask using Catalog Explorer or SQL commands. The Catalog Explorer instructions assume that you have already created a function and that it is registered in Unity Catalog. The SQL instructions include examples of creating a column mask function and applying it to a table column.
Catalog Explorer
- In your Azure Databricks workspace, click Catalog.
- Browse or search for the table.
- On the Overview tab, find the row you want to apply the column mask to and click the Mask edit icon.
- On the Add column mask dialog, select the catalog and schema that contain the mask function, then select the function.
- On the expanded dialog, view the function definition. If the function includes any parameters in addition to the column that is being masked, select the table columns that you want to cast those additional function parameters to.
- Click Add.
To remove the column mask from the table, click fx Column mask in the table row and click Remove.
SQL
To create a column mask and add it to an existing table column, use CREATE FUNCTION and apply the masking function using ALTER TABLE. You can also apply a function when you create a table using CREATE TABLE.
You use SET MASK to apply the masking function. Within the MASK clause, you can use any of the Azure Databricks built-in runtime functions or call other user-defined functions. Common use cases include inspecting the identity of the invoking user running the function by using current_user() or getting the groups they are a member of using is_account_group_member(). For details, see Column mask clause and Built-in functions.
Create a column mask:
CREATE FUNCTION <function_name> (<parameter_name> <parameter_type>, ...) RETURN {expression with the same type as the first parameter};
Apply the column mask to a column in an existing table:
ALTER TABLE <table_name> ALTER COLUMN <col_name> SET MASK <mask_func_name> USING COLUMNS (<additional_columns>);
Additional syntax examples:
Apply the column mask to a column in an existing table using a constant literal that matches a function parameter:
ALTER TABLE <table_name> ALTER COLUMN <col_name> SET MASK <mask_func_name> USING COLUMNS (<constant_name>, ...);
Remove a column mask from a column in a table:
ALTER TABLE <table_name> ALTER COLUMN <column where mask is applied> DROP MASK;
Modify a column mask: Either DROP the existing function, or use CREATE OR REPLACE TABLE.
Delete a column mask:
ALTER TABLE <table_name> ALTER COLUMN <column where mask is applied> DROP MASK;
DROP FUNCTION <function_name>;
Note
You must perform the ALTER TABLE command before dropping the function or the table will be in an inaccessible state. If the table becomes inaccessible in this way, alter the table and drop the orphaned mask reference using ALTER TABLE <table_name> ALTER COLUMN <column where mask is applied> DROP MASK;.
Column mask examples
In this example, you create a user-defined function that masks the ssn column so that only users who are members of the HumanResourceDept group can view values in that column.
CREATE FUNCTION ssn_mask(ssn STRING)
RETURN CASE WHEN is_member('HumanResourceDept') THEN ssn ELSE '***-**-****' END;
Apply the new function to a table as a column mask. You can add the column mask when you create the table or after.
--Create the `users` table and apply the column mask in a single step:
CREATE TABLE users (
name STRING,
ssn STRING MASK ssn_mask);
--Create the `users` table and apply the column mask after:
CREATE TABLE users
(name STRING, ssn STRING);
ALTER TABLE users ALTER COLUMN ssn SET MASK ssn_mask;
Queries on that table now return masked ssn column values when the querying user is not a member of the HumanResourceDept group:
SELECT * FROM users;
James ***-**-****
To disable the column mask so that queries return the original values in the ssn column:
ALTER TABLE users ALTER COLUMN ssn DROP MASK;
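A masking function can also read other columns of the same row through USING COLUMNS. The following sketch masks ssn only for rows whose country is 'US'. The table users_by_country, the function ssn_mask_by_country, and the country column are hypothetical names introduced for illustration.

```sql
-- The first parameter receives the masked column; the second is bound through USING COLUMNS.
CREATE FUNCTION ssn_mask_by_country(ssn STRING, country STRING)
RETURN CASE
  WHEN is_account_group_member('HumanResourceDept') THEN ssn
  WHEN country = 'US' THEN '***-**-****'
  ELSE ssn
END;

CREATE TABLE users_by_country (name STRING, ssn STRING, country STRING);

-- Attach the mask to ssn and pass the country column as the additional argument.
ALTER TABLE users_by_country ALTER COLUMN ssn
  SET MASK ssn_mask_by_country USING COLUMNS (country);
```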
Use mapping tables to create an access-control list
To achieve row-level security, consider defining a mapping table (or access-control list). A mapping table encodes which rows in the original table are accessible to certain users or groups. Mapping tables are useful because they integrate with your fact tables through direct joins.
This approach addresses many use cases with custom requirements. Examples include:
- Imposing restrictions based on the logged-in user while accommodating different rules for specific user groups.
- Creating intricate hierarchies, such as organizational structures, requiring diverse sets of rules.
- Replicating complex security models from external source systems.
Using mapping tables in this way lets you address these scenarios with row-level and column-level security.
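As one possible shape for the first use case, the sketch below maps individual users to the regions they may see and uses that mapping in a row filter. The table user_regions, the function region_acl_filter, the parameter name row_region, and the example addresses are all hypothetical. The sketch assumes a sales table with a region column and assumes the SQL UDF body may reference its parameter inside the subquery.

```sql
-- Hypothetical mapping table: one row per (user, region) pair the user may read.
CREATE TABLE user_regions (username STRING, region STRING);
INSERT INTO user_regions VALUES
  ('user1@example.com', 'US'),
  ('user2@example.com', 'EU');

-- The parameter is named row_region so it cannot be confused with user_regions.region.
CREATE FUNCTION region_acl_filter(row_region STRING)
RETURN EXISTS (
  SELECT 1 FROM user_regions m
  WHERE m.username = current_user()
    AND m.region = row_region
);

ALTER TABLE sales SET ROW FILTER region_acl_filter ON (region);
```

Granting or revoking access then becomes an INSERT or DELETE on the mapping table rather than a change to the function.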
Mapping table examples
Use a mapping table to check if the current user is in a list:
USE CATALOG main;
Create a new mapping table:
DROP TABLE IF EXISTS valid_users;
CREATE TABLE valid_users(username string);
INSERT INTO valid_users
VALUES
('fred@databricks.com'),
('barney@databricks.com');
Create a new filter:
Note
All filters run with definer’s rights except for functions that check user context (for example, the CURRENT_USER and IS_MEMBER functions) which run as the invoker.
In this example, the function checks to see if the current user is in the valid_users table. If the user is found, the function returns true.
DROP FUNCTION IF EXISTS row_filter;
CREATE FUNCTION row_filter()
RETURN EXISTS(
SELECT 1 FROM valid_users v
WHERE v.username = CURRENT_USER()
);
The example below applies the row filter during table creation. You can also add the filter later using an ALTER TABLE statement. Because this function takes no parameters, use the ON () syntax. For a function that takes column values as parameters, use ON (<column_name>, ...).
DROP TABLE IF EXISTS data_table;
CREATE TABLE data_table
(x INT, y INT, z INT)
WITH ROW FILTER row_filter ON ();
INSERT INTO data_table VALUES
(1, 2, 3),
(4, 5, 6),
(7, 8, 9);
Select data from the table. This should only return data if the user is in the valid_users table.
SELECT * FROM data_table;
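If the query returns no rows and you are unsure why, it can help to check how your identity appears to the filter. This is a small diagnostic sketch and assumes you have SELECT on valid_users. The column aliases are arbitrary.

```sql
-- Show the identity the filter compares against.
SELECT current_user() AS invoking_user;

-- Count mapping-table entries for that identity; zero means the filter hides every row.
SELECT count(*) AS matching_entries
FROM valid_users
WHERE username = current_user();
```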
Create a mapping table comprising accounts that should always have access to view all the rows in the table, regardless of the column values:
CREATE TABLE valid_accounts(account string);
INSERT INTO valid_accounts
VALUES
('admin'),
('cstaff');
Now create a SQL UDF that returns true if the values of all columns in the row are less than five, or if the invoking user is a member of the above mapping table.
CREATE FUNCTION row_filter_small_values (x INT, y INT, z INT)
RETURN (x < 5 AND y < 5 AND z < 5)
OR EXISTS(
SELECT 1 FROM valid_accounts v
WHERE IS_ACCOUNT_GROUP_MEMBER(v.account));
Finally, apply the SQL UDF to the table as a row filter:
ALTER TABLE data_table SET ROW FILTER row_filter_small_values ON (x, y, z);
Support and limitations
Row filters and column masks are not supported with all Azure Databricks functionality or on all compute resources. This section lists supported functionality and limitations.
Supported features and formats
This list of supported functionality is not exhaustive. Some items are listed because they were unsupported during the Public Preview.
- Databricks SQL and Databricks notebooks for SQL workloads are supported.
- DML commands by users with MODIFY privileges are supported. Filters and masks are applied to the data that is read by UPDATE and DELETE statements and are not applied to data that is written (including INSERT).
- Supported data formats:
  - Delta and Parquet for managed and external tables.
  - Multiple other data formats for foreign tables registered in Unity Catalog using Lakehouse Federation.
- Policy parameters can include constant expressions (strings, numeric, intervals, booleans, nulls).
- SQL, Python, and Scala UDFs are supported as row filter or column mask functions, as long as they are registered in Unity Catalog. Python and Scala UDFs must be wrapped in a SQL UDF.
- You can create views on tables that reference column masks or row filters, but you cannot add column masks or row filters to a view.
- Delta Lake change data feeds are supported as long as the schema is compatible with the row filters and column masks that apply to the target table.
- Foreign tables are supported.
- Table sampling is supported.
- MERGE statements are supported when source tables, target tables, or both use row filters and column masks. This includes tables with row filter functions that contain simple subqueries, but there are limitations, listed in the section that follows.
- Databricks SQL materialized views and Databricks SQL streaming tables support row filters and column masks (Public Preview):
  - You can add row filters and column masks to a Databricks SQL materialized view or streaming table. This must be done declaratively when the materialized view or streaming table is defined. See CREATE MATERIALIZED VIEW or CREATE STREAMING TABLE.
  - You can define Databricks SQL materialized views or streaming tables on tables that include row filters and column masks.
- Materialized views and streaming tables declared and published in Delta Live Tables support row filters or column masks (Public Preview):
  - You can add row filters and column masks to a Delta Live Tables materialized view or streaming table.
  - You can define Delta Live Tables materialized views or streaming tables on tables that include row filters and column masks.
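A declarative row filter on a Databricks SQL materialized view might look like the sketch below. The view name sales_summary is hypothetical, the us_filter function and sales table are assumed to exist as in the row filter examples, and the exact clause placement should be checked against the CREATE MATERIALIZED VIEW reference.

```sql
-- The filter is declared as part of the definition; it cannot be added afterwards with ALTER.
CREATE MATERIALIZED VIEW sales_summary
WITH ROW FILTER us_filter ON (region)
AS SELECT region, count(*) AS order_count
FROM sales
GROUP BY region;
```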
Performance considerations
Row filters and column masks guarantee the visibility of your data by ensuring that no user can view base table values before filtering and masking operations are applied. They are engineered to perform well for the most common query patterns. In less common cases, where the query engine must choose between optimizing query performance and protecting against leaking information from the filtered or masked values, it always makes the secure decision at the expense of some query performance. To minimize this performance impact, apply the following principles:
- Use simple policy functions: Policy functions with fewer expressions will generally perform better than more complex expressions. Avoid using mapping tables and expression subqueries in favor of simple CASE functions.
- Reduce the number of function arguments: Azure Databricks cannot optimize away column references to the source table resulting from policy function arguments, even if these columns are not otherwise used in the query. Use policy functions with fewer arguments, as the queries from these tables will generally perform better.
- Avoid adding row filters with too many AND conjuncts: Since each table supports at most one row filter, a common approach is to combine multiple desired policy functions with AND. However, each additional conjunct increases the chance that the filter includes components mentioned elsewhere in this list that could affect performance (such as mapping tables). Use fewer conjuncts to improve performance.
- Use deterministic expressions that cannot throw errors in table policies and queries from these tables: Some expressions can throw errors if the provided inputs are invalid, such as ANSI division. In such cases, the SQL compiler must not push down operations with those expressions (such as filters) too far down in the query plan, to avoid the possibility of errors like “division by zero” that reveal information about values prior to filtering and/or masking operations. Use expressions that are deterministic and never throw errors, such as try_divide in this example.
- Run test queries over your table to gauge performance: Construct realistic queries that represent the workload you expect for your table with row filters and/or column masks and measure the performance. Make small modifications to the policy functions and observe their effects until you reach a good balance between performance and expressiveness of the filtering and masking logic.
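To make the try_divide advice concrete, here is a sketch of a row filter that compares a per-unit price against a threshold. The function name high_price_filter, its parameters, and the threshold of 10 are hypothetical.

```sql
-- A version written as "revenue / units > 10" can raise a division-by-zero error under ANSI mode.
-- try_divide returns NULL instead of raising, so the expression never throws.
CREATE FUNCTION high_price_filter(revenue DOUBLE, units DOUBLE)
RETURN coalesce(try_divide(revenue, units) > 10, false);
```

The coalesce call turns the NULL produced for a zero or NULL divisor into an explicit false, so such rows are filtered out deliberately.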
Limitations
- Databricks Runtime versions below 12.2 LTS do not support row filters or column masks. These runtimes fail securely, meaning that if you try to access tables from unsupported versions of these runtimes, no data is returned.
- Delta Sharing does not work with row-level security or column masks.
- You cannot apply row-level security or column masks to a view.
- Time travel does not work with row-level security or column masks.
- Path-based access to files in tables with policies is not supported.
- Row-filter or column-mask policies with circular dependencies back to the original policies are not supported.
- Deep and shallow clones are not supported.
- MERGE statements do not support tables with row filter policies that contain nesting, aggregations, windows, limits, or non-deterministic functions.
- Delta Lake APIs are not supported.
Single user compute limitation
You cannot access a table that has row filters or column masks from a single user compute resource on Databricks Runtime 15.3 or below. You can use single user access mode on Databricks Runtime 15.4 LTS or above, if your workspace is enabled for serverless compute. For more information, see Fine-grained access control on single user compute.