Get data from a database via a linked service using a Synapse notebook Spark pool

JEichhorn 0 Reputation points
2024-12-05T09:28:38.5666667+00:00

Hi, I have read the following article: get-data-from-azure-sql-database-managed-instance

It helped me on my way; however, I am trying to connect to an Oracle database (instead of Microsoft SQL) using a linked service, and it seems that this is not supported. Is this correct?

If so, is there an alternative way that I can access an Oracle database server via a linked service from the notebooks?

The code I have used is:

Python

from pyspark.sql import SparkSession
# Replace with your linked service name
linked_service_name = "LS_ORACLE_Connection"
# Retrieve connection string from linked service using Synapse library
access_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_name)
# Build the JDBC URL
jdbc_url = "jdbc:oracle:thin:@testdb123.test.local:1521:DB01"
# Create Spark DataFrame
df = spark.read \
    .format("oracle") \
    .option("url", jdbc_url) \
    .option("databaseName", "DB01") \
    .option("accessToken", access_token) \
    .option("dbtable", "Tablename") \
    .load()


This results in the following error message:

Py4JJavaError: An error occurred while calling z:mssparkutils.credentials.getConnectionStringOrCreds.

: com.microsoft.azure.synapse.tokenlibrary.TokenServiceClientResponseStatusException: Token Service returned 'Client Error' (400), with message: {"result":"DependencyError","errorId":"BadRequest","errorMessage":"[Code=LinkedServiceTypeNotSupported, Target=LS_ORACLE_Connection, Message=Linked Service Type 'Oracle' not supported].


2 answers

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-12-05T17:37:48.4066667+00:00

    Azure Synapse Analytics currently does not natively support connecting to Oracle databases through linked services directly in notebooks via mssparkutils.credentials.getConnectionStringOrCreds, since that utility is primarily designed for Azure-based services such as Azure SQL Database and Azure Blob Storage.

    First, download the Oracle JDBC driver (ojdbc8.jar) from the Oracle website.

    Then upload the ojdbc8.jar file to your Synapse workspace's linked storage account or any other storage location accessible to the workspace.

    Then specify the location of the ojdbc8.jar file when starting your Synapse Spark notebook session, or use the --jars option (see the configuration sketch after the code below).

    Finally, replace the code that uses mssparkutils.credentials with a direct JDBC configuration:

    from pyspark.sql import SparkSession
    # Oracle database connection details
    jdbc_url = "jdbc:oracle:thin:@testdb123.test.local:1521:DB01"
    database_properties = {
        "user": "your_username",  # Replace with your Oracle DB username
        "password": "your_password",  # Replace with your Oracle DB password
        "driver": "oracle.jdbc.OracleDriver"
    }
    # Read data from Oracle table
    df = spark.read \
        .format("jdbc") \
        .option("url", jdbc_url) \
        .option("dbtable", "Tablename") \
        .options(**database_properties) \
        .load()
    # Show data
    df.show()
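
    For the session-configuration route mentioned above, Synapse notebooks support a %%configure magic cell (run as the first cell of the session) that can point spark.jars at the uploaded driver. A minimal sketch, assuming a hypothetical ADLS Gen2 path for the jar:

    %%configure -f
    {
        "conf": {
            "spark.jars": "abfss://jars@yourstorageaccount.dfs.core.windows.net/ojdbc8.jar"
        }
    }

    The jars container and yourstorageaccount are placeholders; point spark.jars at wherever you actually uploaded ojdbc8.jar.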
    

  2. Ganesh Gurram 7,295 Reputation points Microsoft External Staff Moderator
    2024-12-09T14:10:09.4933333+00:00

    Hi @JEichhorn

    Thanks for the question and for using the MS Q&A forum.

    The error "ORA-17868: Unknown host specified" indicates that the Synapse Spark pool cannot directly reach your on-premises Oracle server. This is because linked services with a self-hosted integration runtime are currently not supported for connecting to Oracle databases within Synapse notebooks.

    As an alternative, use Azure Data Factory (ADF) as a mediator between your Synapse Spark pool and the on-premises Oracle server.

    Here’s the approach you can take:

    Create an ADF pipeline: Define a copy activity in ADF that extracts data from your Oracle database and stores it in a temporary location, such as an Azure Data Lake Storage (ADLS) Gen2 account accessible by your Synapse workspace.

    Access data in Synapse notebook: In your notebook, read the data from the ADLS location using Spark. This leverages ADF's self-hosted integration runtime capability to establish a secure connection to your on-premises Oracle server.
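
    Once the copy activity has landed the data, reading it back in the notebook is straightforward. A minimal sketch, assuming the pipeline wrote Parquet files to a hypothetical ADLS Gen2 staging path:

    # Staging path is an assumption; replace with the ADLS Gen2 location your ADF pipeline writes to
    adls_path = "abfss://staging@yourstorageaccount.dfs.core.windows.net/oracle/tablename/"
    # Read the staged copy of the Oracle table into a Spark DataFrame
    df = spark.read.parquet(adls_path)
    df.show()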

    Hope this helps. Do let us know if you have any further queries.



