Configure Oracle database in a copy activity
This article outlines how to use the copy activity in data pipeline to copy data from and to Oracle database.
Supported configuration
For the configuration of each tab under copy activity, go to the following sections respectively.
General
Refer to the General settings guidance to configure the General settings tab.
Source
The following properties are supported for Oracle database under the Source tab of a copy activity.
The following properties are required:
- Connection: Select an Oracle database connection from the connection list. If no connection exists, then create a new Oracle database connection by selecting More at the bottom of the connection list.
- Use query: Select from Table or Query.
If you select Table:
Table: Specify the name of the table in the Oracle database to read data. Select the table from the drop-down list or select Enter manually to enter the schema and table name.
If you select Query:
Query: Specify the custom SQL query to read data. For example:
SELECT * FROM MyTable
.When you enable partitioned load, you need to hook any corresponding built-in partition parameters in your query. For examples, see the Parallel copy from Oracle database section.
Under Advanced, you can specify the following fields:
Partition option: Specifies the data partitioning options used to load data from Oracle database. When a partition option is enabled (that is, not None), the degree of parallelism to concurrently load data from an Oracle database is controlled by Degree of copy parallelism in copy activity settings tab.
If you select None, you choose not to use partition.
If you select Physical partitions of table:
Partition names: Specify the list of physical partitions that needs to be copied.
If you use a query to retrieve the source data, hook
?DfTabularPartitionName
in the WHERE clause. For an example, see the Parallel copy from Oracle database section.
If you select Dynamic range:
Partition column name: Specify the name of the source column in integer type that will be used by range partitioning for parallel copy. If not specified, the primary key of the table is auto-detected and used as the partition column.
If you use a query to retrieve the source data, hook
?DfRangePartitionColumnName
in the WHERE clause. For an example, see the Parallel copy from Oracle database section.Partition upper bound: Specify maximum value of the partition column to copy data out.
If you use a query to retrieve the source data, hook
?DfRangePartitionUpbound
in the WHERE clause. For an example, see the Parallel copy from Parallel copy from Oracle database section.Partition lower bound: Specify the minimum value of the partition column to copy data out.
If you use a query to retrieve the source data, hook
?DfRangePartitionLowbound
in the WHERE clause. For an example, see the Parallel copy from Parallel copy from Oracle database section.
Query timeout (minutes): Specify the timeout for query command execution, default is 120 minutes. If a parameter is set for this property, allowed values are timespan, such as "02:00:00" (120 minutes).
Additional columns: Add additional data columns to store source files' relative path or static value. Expression is supported for the latter.
Destination
The following properties are supported for Oracle database under the Destination tab of a copy activity.
The following properties are required:
- Connection: Select an Oracle database connection from the connection list. If the connection doesn't exist, then create a new Oracle database connection by selecting More at the bottom of the connection list.
- Table: Select the table in your database from the drop-down list. Or check Enter manually to enter the schema and table name.
Under Advanced, you can specify the following fields:
- Pre-copy script: Specify a SQL query for the copy activity to execute before you write data into Oracle database in each run. You can use this property to clean up the preloaded data.
- Write batch timeout: The wait time for the batch insert operation to complete before it times out. The allowed value is timespan. An example is 00:30:00 (30 minutes).
- Write batch size: Specify the number of rows to insert into the Oracle database table per batch. The allowed value is integer (number of rows). The default value is 10,000.
- Max concurrent connections: The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections.
Mapping
For Mapping tab configuration, go to Configure your mappings under mapping tab.
Settings
For Settings tab configuration, see Configure your other settings under settings tab.
Parallel copy from Oracle database
The Oracle database connector provides built-in data partitioning to copy data from Oracle database in parallel. You can find data partitioning options on the Source tab of the copy activity.
When you enable partitioned copy, the service runs parallel queries against your Oracle database source to load data by partitions. The parallel degree is controlled by the Degree of copy parallelism setting in the copy activity settings tab. For example, if you set Degree of copy parallelism to four, the service concurrently generates and runs four queries based on your specified partition option and settings, and each query retrieves a portion of data from your Oracle database.
You are suggested to enable parallel copy with data partitioning especially when you load large amount of data from your Oracle database. The following are suggested configurations for different scenarios. When copying data into file-based data store, it's recommended to write to a folder as multiple files (only specify folder name), in which case the performance is better than writing to a single file.
Scenario | Suggested settings |
---|---|
Full load from large table, with physical partitions. | Partition option: Physical partitions of table. During execution, the service automatically detects the physical partitions, and copies data by partitions. |
Full load from large table, without physical partitions, while with an integer column for data partitioning. | Partition options: Dynamic range partition. Partition column: Specify the column used to partition data. If not specified, the primary key column is used. |
Load a large amount of data by using a custom query, with physical partitions. | Partition options: Physical partitions of table. Query: SELECT * FROM <TABLENAME> PARTITION("?DfTabularPartitionName") WHERE <your_additional_where_clause> . Partition name: Specify the partition name(s) to copy data from. If not specified, the service automatically detects the physical partitions on the table you specified in the Oracle database data. During execution, the service replaces ?DfTabularPartitionName with the actual partition name, and sends to Oracle database. |
Load a large amount of data by using a custom query, without physical partitions, while with an integer column for data partitioning. | Partition options: Dynamic range partition. Query: SELECT * FROM <TABLENAME> WHERE ?DfRangePartitionColumnName <= ?DfRangePartitionUpbound AND ?DfRangePartitionColumnName >= ?DfRangePartitionLowbound AND <your_additional_where_clause> .Partition column: Specify the column used to partition data. You can partition against the column with integer data type. Partition upper bound and partition lower bound: Specify if you want to filter against partition column to retrieve data only between the lower and upper range. During execution, the service replaces ?DfRangePartitionColumnName , ?DfRangePartitionUpbound , and ?DfRangePartitionLowbound with the actual column name and value ranges for each partition, and sends to Oracle database. For example, if your partition column "ID" is set with the lower bound as 1 and the upper bound as 80, with parallel copy set as 4, the service retrieves data by 4 partitions. Their IDs are between [1,20], [21, 40], [41, 60], and [61, 80], respectively.` |
Tip
When copying data from a non-partitioned table, you can use "Dynamic range" partition option to partition against an integer column. If your source data doesn't have such type of column, you can leverage ORA_HASH function in source query to generate a column and use it as partition column.
Table summary
The following tables contain more information about the copy activity in Oracle database.
Source information
Name | Description | Value | Required | JSON script property |
---|---|---|---|---|
Connection | Your connection to the source data store. | <your Oracle database connection> | Yes | connection |
Use query | The way to read data from Oracle database. Apply Table to read data from the specified table or apply Query to read data using SQL queries. | • Table • Query |
Yes | / |
For Table | ||||
schema name | Name of the schema. | < your schema name > | No | schema |
table name | Name of the table. | < your table name > | No | table |
For Query | ||||
Query | Use the custom SQL query to read data. An example is SELECT * FROM MyTable . When you enable partitioned load, you need to hook any corresponding built-in partition parameters in your query. For examples, see the Parallel copy from Oracle database section. |
< SQL queries > | No | oracleReaderQuery |
Partition option | The data partitioning options used to load data from Oracle database. | • None (default) • Physical partitions of table • Dynamic range |
No | / |
For Physical partitions of table | ||||
Partition names | The list of physical partitions that needs to be copied. If you use a query to retrieve the source data, hook ?DfTabularPartitionName in the WHERE clause. |
< your partition names > | No | partitionNames |
For Dynamic range | ||||
Partition column name | Specify the name of the source column in integer type that will be used by range partitioning for parallel copy. If not specified, the primary key of the table is auto-detected and used as the partition column. If you use a query to retrieve the source data, hook ?DfRangePartitionColumnName in the WHERE clause. For an example, see the Parallel copy from Oracle database section. |
< your partition column names > | No | partitionColumnName |
Partition upper bound | Specify maximum value of the partition column to copy data out. If you use a query to retrieve the source data, hook ?DfRangePartitionUpbound in the WHERE clause. For an example, see the Parallel copy from Parallel copy from Oracle database section. |
< your partition upper bound > | No | partitionUpperBound |
Partition lower bound | Specify the minimum value of the partition column to copy data out. If you use a query to retrieve the source data, hook ?DfRangePartitionLowbound in the WHERE clause. For an example, see the Parallel copy from Parallel copy from Oracle database section. |
< your partition lower bound > | No | partitionLowerBound |
Query timeout | The timeout for query command execution, default is 120 minutes. | timespan | No | queryTimeout |
Additional columns | Add additional data columns to store source files' relative path or static value. Expression is supported for the latter. | • Name • Value |
No | additionalColumns: • name • value |
Destination information
Name | Description | Value | Required | JSON script property |
---|---|---|---|---|
Connection | Your connection to the destination data store. | <your Oracle database connection> | Yes | connection |
Table | Your destination data table. | <name of your destination table> | Yes | / |
schema name | Name of the schema. | < your schema name > | Yes | schema |
table name | Name of the table. | < your table name > | Yes | table |
Pre-copy script | A SQL query for the copy activity to execute before you write data into Oracle database in each run. You can use this property to clean up the preloaded data. | < your pre-copy script > | No | preCopyScript |
Write batch timeout | The wait time for the batch insert operation to complete before it times out. | timespan | No | writeBatchTimeout |
Write batch size | The number of rows to insert into the SQL table per batch. | integer (the default is 10,000) |
No | writeBatchSize |
Max concurrent connections | The upper limit of concurrent connections established to the data store during the activity run. Specify a value only when you want to limit concurrent connections. | < max concurrent connections > | No | maxConcurrentConnections |