How to set up ADF Python script to access an external SFTP with Firewall Exceptions (whitelist)?

11-4688 111 Reputation points
2024-06-24T12:08:45.5766667+00:00

I am working on an ADF pipeline. One of the steps will include a Python script that connects to an external SFTP server, downloads some files, and uploads them to my Storage Account.

The SFTP owner asked me to share the IP address that they should add to their firewall exceptions so the code can connect to the SFTP server.

My current setup:

  1. Virtual Network
  2. Public IP Address with static IP and DNS Name Label
  3. A pool in the Azure Batch service. In the network configuration step I used the virtual network from 1 with its default subnet, the user-managed IP address provisioning type, and the public IP from 2 assigned.
  4. I can now see that the public IP of the node that was created is the same as the one from 2.

Should this work? I prepared a POC of this solution that follows this path by copying a file from one container of a Storage Account to another:

  1. It works with public access enabled in the Networking blade
  2. It works with "Enabled from selected virtual networks and IP addresses" selected and my virtual network added
  3. It works with "Enabled from selected virtual networks and IP addresses" selected and my local IP added to the whitelist, when I run the script locally
  4. It DOES NOT WORK with "Enabled from selected virtual networks and IP addresses" selected and the node's public IP (from 2/4) added to the whitelist. What might be the reason for that, and how can I deal with it?

The code is pretty simple:

from azure.storage.blob import BlobClient
import pandas as pd
from io import BytesIO
import requests

# Print current IP address
response = requests.get('https://api.ipify.org?format=json')
ip_address = response.json()['ip']
print(f'Current IP Address: {ip_address}')

# Define parameters
connectionString = "connectionstring"
inputContainerName = "input"
inputBlobName = "iris.csv"
outputContainerName = "output"
outputBlobName = "iris_setosa.csv"

# Establish connection with the blob storage account for input container
input_blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=inputContainerName, blob_name=inputBlobName)

# Download the blob as a stream
input_stream = input_blob.download_blob()
df = pd.read_csv(BytesIO(input_stream.readall()))

# Take a subset of the records
df = df[df['Species'] == "setosa"]

# Save the subset of the iris dataframe locally in memory
output_stream = BytesIO()
df.to_csv(output_stream, index=False)
output_stream.seek(0)  # Reset the stream position to the beginning

# Establish connection with the blob storage account for output container
output_blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=outputContainerName, blob_name=outputBlobName)

# Upload the stream to the output container
output_blob.upload_blob(output_stream, overwrite=True)

1 answer

  1. Nandan Hegde 30,556 Reputation points MVP
    2024-06-24T12:38:24.8133333+00:00

    You can access SFTP as a source in ADF and copy data into blob storage via the Copy activity.

    There are two ways to whitelist the IP for the SFTP server:

    1. If you need a single static IP, you can host a self-hosted integration runtime (IR) and whitelist that server's public IP on the SFTP side
    2. Every ADF region has a specific range of IP addresses that you can whitelist
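    For option 2, the published Azure IP ranges ship as a "service tags" JSON file (downloadable from the Microsoft download center, or via `az network list-service-tags`). A minimal sketch of pulling the DataFactory prefixes for one region out of that schema; the inline sample below only mimics the file's structure, and its prefixes are made-up documentation ranges, not real DataFactory addresses:

```python
import json

# Extract address prefixes for a given service tag from an Azure
# service-tags JSON document (the schema used by the downloadable file).
def prefixes_for_tag(service_tags: dict, tag_name: str) -> list:
    for value in service_tags.get("values", []):
        if value.get("name") == tag_name:
            return value["properties"]["addressPrefixes"]
    return []

# Inline sample mimicking the real file's structure; prefixes are
# illustrative only (RFC 5737 documentation ranges).
sample = json.loads("""
{
  "values": [
    {
      "name": "DataFactory.WestEurope",
      "properties": {"addressPrefixes": ["203.0.113.0/24", "198.51.100.0/24"]}
    }
  ]
}
""")

print(prefixes_for_tag(sample, "DataFactory.WestEurope"))
# -> ['203.0.113.0/24', '198.51.100.0/24']
```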

    Similar thread:

    https://learn.microsoft.com/en-us/answers/questions/1513454/how-to-generate-a-static-ip-for-adf-to-connect-to