Prepare MySQL for ingestion using the utility objects script

Important

The MySQL connector is in Public Preview. Contact your Azure Databricks account team to request access.

Important

This page contains references to the term slave, a term that Microsoft no longer uses. When the term is removed from the software, we'll remove it from this page.

Complete the MySQL database setup tasks to ingest into Azure Databricks using Lakeflow Connect.

Requirements

  • MySQL version: 5.7.44 or 8.0+
  • Executing user: Must be root or have full administrative privileges (SUPER, ALL PRIVILEGES)
  • CDC user: Must already exist with appropriate authentication plugin before running setup procedures
  • Target schemas/tables: Must already exist before granting permissions

Step 1: Install utility objects

This step installs the utility stored procedures needed for MySQL CDC setup.

  1. Download the script: mysql_setup.sql
  2. Open the script in MySQL Workbench, DBeaver, or your preferred SQL client.
  3. Connect to your MySQL instance as root or an admin user.
  4. Make sure you are connected to your desired schema.
  5. Run the script.
  6. Verify installation:
SELECT ROUTINE_NAME
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = '<schema-name>'
AND ROUTINE_TYPE = 'PROCEDURE'
AND ROUTINE_NAME IN ('lakeflow_cdc_setup', 'lakeflow_setup_cdc_user');

Step 2: Create the CDC user

Before running the setup procedures, create the CDC user with the appropriate authentication plugin.

For MySQL 8.x:

CREATE USER 'cdc_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'your_secure_password';

For MySQL 5.x:

CREATE USER 'cdc_user'@'%' IDENTIFIED WITH sha256_password BY 'your_secure_password';

Note

The authentication plugin must match your MySQL version. Using the wrong plugin will cause the setup procedure to fail.

Step 3: Enable binary logging (for standalone MySQL only)

This step configures binary logging settings required for CDC operations. For details, see lakeflow_setup_cdc_user in the procedure reference below.

CALL lakeflow_cdc_setup();

Note

This procedure only works on standalone MySQL instances. For managed services (Amazon RDS, Azure Database for MySQL, Google Cloud SQL), use the service-specific configuration tools:

  • Amazon RDS: Use Parameter Groups to set binlog_format = ROW and binlog_row_image = FULL
  • Azure MySQL: Use Server Parameters
  • Google Cloud SQL: Use Database Flags

Step 4: Grant CDC permissions

This step validates the CDC user and grants the necessary permissions for ingestion. For details, see lakeflow_setup_cdc_user in the procedure reference below.

-- Grant permissions on specific tables
CALL lakeflow_setup_cdc_user('cdc_user', '`mydb`.`orders`, `mydb`.`customers`');

Alternative options:

  • For all tables in specific schemas: 'mydb1.*, mydb2.*'
  • For mixed (schemas and specific tables): 'mydb1.*, mydb2.orders'
  • For replication permissions only (no table access): 'NULL'

Step 5: Verify setup

Run the following queries to confirm that the CDC user has the correct permissions:

-- Check user privileges
SHOW GRANTS FOR 'cdc_user'@'%';

-- Verify binary logging configuration
SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';

-- Verify user authentication plugin
SELECT user, host, plugin FROM mysql.user WHERE user = 'cdc_user';

Procedure reference

lakeflow_setup_cdc_user

Validates that a CDC user exists with the appropriate authentication plugin and grants the necessary privileges for CDC operations.

Syntax:

CALL lakeflow_setup_cdc_user(
    @User = 'username',
    @Tables = 'table_list'
);

Parameters:

Parameter Type Description
@User VARCHAR(255) The CDC username (must already exist)
@Tables MEDIUMTEXT Comma-separated list of tables/schemas

Table list format:

Format Example Description
Specific tables `schema`.`table1`, `schema`.`table2` Grants SELECT on specific tables
All tables in schema `schema`.* Grants SELECT on all tables in the schema
Mixed `schema1`.*, `schema2`.`table1` Combination of schemas and specific tables
Replication only 'NULL' Grants only replication privileges, no table access

Privileges granted:

Privilege Scope Purpose
REPLICATION SLAVE Global (*.*) Required to read binary logs
REPLICATION CLIENT Global (*.*) Required for SHOW MASTER STATUS
SELECT (plugin, user) mysql.user Required to read authentication plugin info
SELECT Specified tables Required for reading CDC data

Validation checks:

The procedure performs the following validations before granting privileges:

  1. MySQL version is 5.x or 8.x
  2. User exists with host = '%'
  3. User has correct authentication plugin for the MySQL version
  4. All specified schemas exist
  5. All specified tables exist (for non-wildcard entries)

lakeflow_cdc_setup

Configures MySQL binary logging settings required for CDC operations.

Syntax:

CALL lakeflow_cdc_setup();

Parameters: None

Configuration applied:

Setting Value Purpose
binlog_format ROW Captures all column changes in binary log
binlog_row_image FULL Logs complete before/after images

Behavior by MySQL version:

Version Behavior
MySQL 8.x Uses SET PERSIST (survives restarts)
MySQL 5.x Uses SET GLOBAL (requires manual persistence)

Common scenarios

Scenario 1: Specific tables only

-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Setup binary logging
CALL lakeflow_cdc_setup();

-- Grant permissions on specific tables
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.`orders`, `sales`.`customers`, `inventory`.`products`');

Scenario 2: All tables in specific schemas

-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Setup binary logging
CALL lakeflow_cdc_setup();

-- Grant permissions on all tables in schemas
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.*, `inventory`.*, `hr`.*');

Scenario 3: Hybrid approach (schemas + specific tables)

-- Create user
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Setup binary logging
CALL lakeflow_cdc_setup();

-- Grant all tables in sales schema, but only specific tables in other schemas
CALL lakeflow_setup_cdc_user('lakeflow_user', '`sales`.*, `inventory`.`products`, `hr`.`employees`');

Scenario 4: Amazon RDS MySQL

-- Create user (binary logging configured via Parameter Groups)
CREATE USER 'lakeflow_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'secure_password';

-- Grant permissions (skip lakeflow_cdc_setup for managed services)
CALL lakeflow_setup_cdc_user('lakeflow_user', '`mydb`.*');

Tip

For Amazon RDS, configure binary logging using a custom Parameter Group with binlog_format = ROW and binlog_row_image = FULL.

MySQL 5.x persistence

When using lakeflow_cdc_setup() on MySQL 5.x, the configuration is not persistent across restarts. To make it permanent:

  1. Open /etc/my.cnf with root privileges:
sudo vi /etc/my.cnf
  1. Add under [mysqld] section:
[mysqld]
binlog_format=ROW
binlog_row_image=FULL
  1. Restart MySQL:
sudo systemctl restart mysqld

Troubleshooting

User does not exist

Error: User 'xxx'@'%' does NOT exist

Solution: Create the user before running the setup procedure. See Step 2.

Wrong authentication plugin

Error: User exists with wrong plugin type. Expected: caching_sha2_password, Found: mysql_native_password

Solution: Recreate the user with the correct authentication plugin:

DROP USER 'cdc_user'@'%';
CREATE USER 'cdc_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'your_password';

Schema does not exist

Error: Catalog 'xxx' does NOT exist

Solution: Create the schema or correct the schema name in your tables list.

Table does not exist

Error: Table 'xxx.yyy' does NOT exist

Solution: Create the table or correct the table name in your tables list.

Managed service detected

Error: Cloud/managed service detected. Use parameter/server group tools for binlog setup.

Solution: Skip lakeflow_cdc_setup() and configure binary logging through your cloud provider's management console.

Additional resources