How to write data to an on-prem PostgreSQL database from Synapse?

Anonymous
2024-08-14T17:52:20.7866667+00:00

I am trying to write data to an on-premises PostgreSQL database from Azure Synapse. I have tried two approaches, but neither has been successful.

  1. Created a data flow for ETL, but an on-prem PostgreSQL database is not supported as a sink by Synapse (link).
  2. Tried performing the ETL in a PySpark notebook, using psycopg2 to write data to the PostgreSQL database, but I get the following error: (psycopg2.OperationalError) connection to server at "xx.x.xx.xx", port 5432 failed: Connection timed out. Is the server running on that host and accepting TCP/IP connections?

Is there any workaround I can use to write data to the on-prem PostgreSQL database from Synapse?

Azure Synapse Analytics
Azure Data Factory

1 answer

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-08-15T11:25:53.67+00:00

    The error message suggests that the connection between Azure Synapse and your on-premises PostgreSQL server is being blocked or cannot be established, which is usually a network-configuration issue. You need to ensure that the PostgreSQL server is reachable from the Azure Synapse environment; typically, this involves configuring the network to allow inbound traffic to the PostgreSQL server from the Azure Synapse IP addresses.
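
    As a quick sanity check, you can run something like the following from a Synapse notebook. This is only a sketch; the host, database name, and credentials are placeholders you would replace with your own values.

    ```python
    import socket

    import psycopg2

    PG_HOST = "xx.x.xx.xx"  # placeholder: your on-prem PostgreSQL host/IP
    PG_PORT = 5432

    # Check raw TCP reachability first: a timeout here means the problem is
    # at the network level (firewall, routing, missing VPN), not PostgreSQL.
    try:
        with socket.create_connection((PG_HOST, PG_PORT), timeout=5):
            print("TCP connection succeeded")
    except OSError as exc:
        print(f"TCP connection failed: {exc}")

    # If TCP works, try an actual PostgreSQL login with a short timeout.
    try:
        conn = psycopg2.connect(
            host=PG_HOST,
            port=PG_PORT,
            dbname="mydb",          # placeholder database name
            user="myuser",          # placeholder user
            password="mypassword",  # placeholder password
            connect_timeout=5,
        )
        conn.close()
        print("PostgreSQL login succeeded")
    except psycopg2.OperationalError as exc:
        print(f"PostgreSQL connection failed: {exc}")
    ```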

    • VPN or ExpressRoute: One common approach is to set up a site-to-site VPN or use Azure ExpressRoute to establish a secure, private connection between your Azure resources and your on-premises network. This lets your Synapse environment communicate with your on-prem PostgreSQL database as if they were on the same local network.
    • Firewall rules: Make sure that any firewalls or network security groups are configured to allow traffic on port 5432 from the IP range of your Azure Synapse environment.
    • PostgreSQL configuration: Even if the network configuration is correct, the PostgreSQL server itself might not accept remote connections; by default, PostgreSQL only allows local connections. Modify postgresql.conf so the server listens on all available IP addresses (listen_addresses = '*') and adjust pg_hba.conf to allow connections from the IP addresses of your Azure Synapse environment (see the example after this list).
    • Authentication: Ensure that the authentication method configured in pg_hba.conf supports connections from Azure Synapse. For instance, you may need to set the method to md5 or another appropriate authentication mechanism.
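
    For illustration, the relevant edits might look like this. The 203.0.113.0/24 range is a placeholder; substitute the address range your Synapse traffic actually arrives from (for example, your VPN gateway's range).

    ```
    # postgresql.conf -- listen on all interfaces, not only localhost
    listen_addresses = '*'

    # pg_hba.conf -- allow password-authenticated connections from the
    # placeholder range 203.0.113.0/24
    # TYPE  DATABASE  USER  ADDRESS          METHOD
    host    all       all   203.0.113.0/24   md5
    ```

    Note that changing listen_addresses requires a server restart, while pg_hba.conf changes only need a configuration reload.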

    Since you mentioned that using a dataflow in Synapse was not successful because the on-prem PostgreSQL database is not supported as a sink, and you faced challenges with PySpark and psycopg2, you might want to consider alternative approaches:

    • Using ADF: Azure Data Factory is a more robust tool for ETL operations and supports a wider range of connectors, including those for on-prem databases. You can create an ADF pipeline that moves data from Synapse to your on-prem PostgreSQL database. ADF integrates with your on-prem network through a Self-Hosted Integration Runtime, which allows it to connect securely to on-prem resources.
    • Using a gateway: Another approach is to use an on-premises data gateway, which bridges the gap between your on-prem environment and Azure Synapse and allows secure data transfer.

    If direct writing from Synapse to the on-prem PostgreSQL database continues to be problematic, consider the following:

    • Staging in Azure: Write the data to an intermediate storage location within Azure, such as Azure Blob Storage or an Azure SQL Database, and then use a separate process (ADF, custom scripts) to move the data from there to your on-prem PostgreSQL (see the sketch below).
    • Batch processing: If real-time data transfer is not required, consider batching the data and transferring it during off-peak hours when network traffic is lower.
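
    As a rough sketch of the staging approach from a Synapse notebook, assuming the workspace has access to the storage account ("mystorageaccount", the "staging" container, and the source table name are placeholders):

    ```python
    # Stage a Spark DataFrame as Parquet in ADLS Gen2; a downstream process
    # (an ADF pipeline with a Self-Hosted Integration Runtime, or a script
    # running on-prem) then loads the files into PostgreSQL.
    df = spark.read.table("my_source_table")  # placeholder source table

    (
        df.write
          .mode("overwrite")
          .parquet("abfss://staging@mystorageaccount.dfs.core.windows.net/postgres_export/")
    )
    ```

    On the on-prem side, the staged files can then be loaded with whatever tooling you prefer, for example PostgreSQL's COPY command or a small psycopg2 script.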
