Azure Synapse Spark client library for .NET - version 1.0.0-preview.8

This directory contains the open source subset of the .NET SDK. For documentation of the complete Azure SDK, please see the Microsoft Azure .NET Developer Center.

Use the client library for Synapse to:

  • Submit Spark batch jobs and Spark session jobs

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

Getting started

The complete Microsoft Azure SDK can be downloaded from the Microsoft Azure Downloads Page and ships with support for building deployment packages, integrating with tooling, rich command line tooling, and more.

For the best development experience, developers should use the official Microsoft NuGet packages for libraries. NuGet packages are regularly updated with new functionality and hotfixes.

Install the package

Install the Spark client library for Azure Synapse Analytics for .NET with NuGet:

dotnet add package Azure.Analytics.Synapse.Spark --version 1.0.0-preview.8

Prerequisites

  • Azure Subscription: To use Azure services, including Azure Synapse, you'll need a subscription. If you do not have an existing Azure account, you may sign up for a free trial or use your Visual Studio Subscription benefits when you create an account.
  • An existing Azure Synapse workspace. If you need to create an Azure Synapse workspace, you can use the Azure Portal or Azure CLI.

If you use the Azure CLI, the command looks like this:

az synapse workspace create \
    --name <your-workspace-name> \
    --resource-group <your-resource-group-name> \
    --storage-account <your-storage-account-name> \
    --file-system <your-storage-file-system-name> \
    --sql-admin-login-user <your-sql-admin-user-name> \
    --sql-admin-login-password <your-sql-admin-user-password> \
    --location <your-workspace-location>

Authenticate the client

To interact with the Azure Synapse Analytics service, you'll need to create an instance of the SparkBatchClient or SparkSessionClient class. To instantiate a client object, you need a workspace endpoint, which the portal may show as "Development endpoint", and client secret credentials (client ID, client secret, tenant ID).

This getting started section uses client secret credential authentication, but you can find more ways to authenticate with Azure Identity. To use the DefaultAzureCredential provider, or other credential providers provided with the Azure SDK, install the Azure.Identity package:

Install-Package Azure.Identity
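A minimal sketch of creating both clients with DefaultAzureCredential. It assumes the workspace endpoint and Spark pool name come from environment variables named AZURE_SYNAPSE_ENDPOINT and AZURE_SYNAPSE_SPARK_POOL_NAME. These are hypothetical names chosen for this example, and the sketch falls back to placeholder values when they are unset. DefaultAzureCredential can pick up the client secret credentials from the AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID environment variables. Constructing the clients does not contact the service.

```csharp
using System;
using Azure.Analytics.Synapse.Spark;
using Azure.Identity;

// Hypothetical variable names and placeholder fallbacks. The endpoint is the workspace
// "Development endpoint", of the form https://<your-workspace-name>.dev.azuresynapse.net
string endpoint = Environment.GetEnvironmentVariable("AZURE_SYNAPSE_ENDPOINT")
    ?? "https://myworkspace.dev.azuresynapse.net";
string sparkPoolName = Environment.GetEnvironmentVariable("AZURE_SYNAPSE_SPARK_POOL_NAME")
    ?? "mysparkpool";

// Reads AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_TENANT_ID (among other sources);
// no token is requested until the first service call.
var credential = new DefaultAzureCredential();

// Both clients are bound to one workspace endpoint and one Spark pool.
SparkBatchClient batchClient = new SparkBatchClient(new Uri(endpoint), sparkPoolName, credential);
SparkSessionClient sessionClient = new SparkSessionClient(new Uri(endpoint), sparkPoolName, credential);

Console.WriteLine("Clients created");
```

The clients can share a single credential instance, so a token cached for one client can also serve the other.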

Examples

The examples below show how to list, create, and cancel Spark batch jobs with the Azure.Analytics.Synapse.Spark library.

Spark Batch Job examples

List Spark batch jobs

List the Spark batch jobs in a specific Spark pool of a Synapse workspace.

// client is a SparkBatchClient bound to one workspace and Spark pool
Response<SparkBatchJobCollection> jobs = client.GetSparkBatchJobs();
foreach (SparkBatchJob job in jobs.Value.Sessions)
{
    Console.WriteLine(job.Name);
}
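For pools with many jobs, GetSparkBatchJobs also takes optional from and size arguments that select one page of results. The following sketch pages through the jobs under that assumption. The endpoint and pool name are placeholders, and the page size of 20 is arbitrary. Confirm the parameter names against the package version you install.

```csharp
using System;
using Azure;
using Azure.Analytics.Synapse.Spark;
using Azure.Analytics.Synapse.Spark.Models;
using Azure.Identity;

// Placeholder endpoint and pool name (hypothetical).
var client = new SparkBatchClient(
    new Uri("https://myworkspace.dev.azuresynapse.net"), "mysparkpool", new DefaultAzureCredential());

const int pageSize = 20;
int offset = 0;
int listed = 0;
while (true)
{
    // Request one page of batch jobs starting at the current offset.
    Response<SparkBatchJobCollection> page = client.GetSparkBatchJobs(from: offset, size: pageSize);
    var jobs = page.Value.Sessions;
    foreach (SparkBatchJob job in jobs)
    {
        Console.WriteLine($"{job.Id}: {job.Name}");
    }
    listed += jobs.Count;

    // A page shorter than pageSize means there is nothing left to fetch.
    if (jobs.Count < pageSize) break;
    offset += pageSize;
}
Console.WriteLine($"Listed {listed} jobs");
```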

Create a Spark batch job

Create a Spark batch job in a specific workspace and Spark pool.

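// fileSystem and storageAccount name the ADLS Gen2 file system and storage account holding the sample files
string name = $"batch-{Guid.NewGuid()}";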
string file = string.Format("abfss://{0}@{1}.dfs.core.windows.net/samples/net/wordcount/wordcount.zip", fileSystem, storageAccount);
SparkBatchJobOptions request = new SparkBatchJobOptions(name, file)
{
    ClassName = "WordCount",
    Arguments =
    {
        string.Format("abfss://{0}@{1}.dfs.core.windows.net/samples/net/wordcount/shakespeare.txt", fileSystem, storageAccount),
        string.Format("abfss://{0}@{1}.dfs.core.windows.net/samples/net/wordcount/result/", fileSystem, storageAccount),
    },
    DriverMemory = "28g",
    DriverCores = 4,
    ExecutorMemory = "28g",
    ExecutorCores = 4,
    ExecutorCount = 2
};

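// Submit the job, then poll every 2 seconds until the create operation completes
SparkBatchOperation createOperation = client.StartCreateSparkBatchJob(request);
while (!createOperation.HasCompleted)
{
    System.Threading.Thread.Sleep(2000);
    createOperation.UpdateStatus();
}
// The completed operation's value describes the created batch job
SparkBatchJob jobCreated = createOperation.Value;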
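The loop above polls synchronously. SparkBatchOperation is an Azure.Core long-running operation (Operation&lt;SparkBatchJob&gt;), so you can instead await WaitForCompletionAsync and let it do the polling. The following is a minimal sketch using the async overload. The endpoint, pool, file system, and storage account names are hypothetical placeholders.

```csharp
using System;
using Azure.Analytics.Synapse.Spark;
using Azure.Analytics.Synapse.Spark.Models;
using Azure.Identity;

// Placeholder endpoint, pool and storage names (hypothetical).
var client = new SparkBatchClient(
    new Uri("https://myworkspace.dev.azuresynapse.net"), "mysparkpool", new DefaultAzureCredential());
string fileSystem = "myfilesystem";
string storageAccount = "mystorageaccount";
string root = $"abfss://{fileSystem}@{storageAccount}.dfs.core.windows.net/samples/net/wordcount";

var request = new SparkBatchJobOptions($"batch-{Guid.NewGuid()}", $"{root}/wordcount.zip")
{
    ClassName = "WordCount",
    Arguments = { $"{root}/shakespeare.txt", $"{root}/result/" },
    DriverMemory = "28g",
    DriverCores = 4,
    ExecutorMemory = "28g",
    ExecutorCores = 4,
    ExecutorCount = 2
};

// Start the create operation, then let the SDK poll until it finishes.
SparkBatchOperation createOperation = await client.StartCreateSparkBatchJobAsync(request);
SparkBatchJob jobCreated = await createOperation.WaitForCompletionAsync();
Console.WriteLine($"Created batch job {jobCreated.Id}");
```

The await on WaitForCompletionAsync yields a Response&lt;SparkBatchJob&gt;, which converts implicitly to the SparkBatchJob value.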

Cancel a Spark batch job

Cancel a Spark batch job by its batch ID in a specific workspace and Spark pool.

Response operation = client.CancelSparkBatchJob(jobCreated.Id);
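The batch examples above do not cover the Spark session jobs listed at the top of this document. The following hedged sketch creates a session with SparkSessionClient and then cancels it, following the same flow as the batch job. The names SparkSessionOptions, StartCreateSparkSessionAsync, and CancelSparkSessionAsync reflect the preview package as best understood here, so check them against the version you install. The endpoint and pool name are placeholders.

```csharp
using System;
using Azure.Analytics.Synapse.Spark;
using Azure.Analytics.Synapse.Spark.Models;
using Azure.Identity;

// Placeholder endpoint and pool name (hypothetical).
var sessionClient = new SparkSessionClient(
    new Uri("https://myworkspace.dev.azuresynapse.net"), "mysparkpool", new DefaultAzureCredential());

var sessionRequest = new SparkSessionOptions($"session-{Guid.NewGuid()}")
{
    DriverMemory = "28g",
    DriverCores = 4,
    ExecutorMemory = "28g",
    ExecutorCores = 4,
    ExecutorCount = 2
};

// Start the session and wait until the create operation completes.
SparkSessionOperation createSession = await sessionClient.StartCreateSparkSessionAsync(sessionRequest);
SparkSession session = await createSession.WaitForCompletionAsync();
Console.WriteLine($"Session {session.Id} created");

// Cancel the session when done so it stops holding pool resources.
await sessionClient.CancelSparkSessionAsync(session.Id);
```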

To build

For information on building the Azure Synapse client library, please see Building the Microsoft Azure SDK for .NET.

Target frameworks

For information about the target frameworks of the Azure Synapse client library, please refer to the Target Frameworks of the Microsoft Azure SDK for .NET.

Key concepts

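The library provides two clients for submitting Spark work. SparkBatchClient submits and manages Spark batch jobs, and SparkSessionClient manages Spark sessions. Each client is bound to one Synapse workspace endpoint and one Spark pool.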

Thread safety

We guarantee that all client instance methods are thread-safe and independent of each other (guideline). This ensures that the recommendation of reusing client instances is always safe, even across threads.
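Because instance methods are thread-safe, one client can serve concurrent callers. The following minimal sketch issues several listing calls in parallel through a single SparkBatchClient. The endpoint, pool name, and number of parallel calls are placeholders.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Azure;
using Azure.Analytics.Synapse.Spark;
using Azure.Analytics.Synapse.Spark.Models;
using Azure.Identity;

// One shared client instance (placeholder endpoint and pool name).
var client = new SparkBatchClient(
    new Uri("https://myworkspace.dev.azuresynapse.net"), "mysparkpool", new DefaultAzureCredential());

// Four concurrent calls reuse the same client and its HTTP pipeline.
Task<Response<SparkBatchJobCollection>>[] calls = Enumerable.Range(0, 4)
    .Select(_ => client.GetSparkBatchJobsAsync())
    .ToArray();
Response<SparkBatchJobCollection>[] results = await Task.WhenAll(calls);

foreach (Response<SparkBatchJobCollection> result in results)
{
    Console.WriteLine($"{result.Value.Sessions.Count} jobs");
}
```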

Additional concepts

Client options | Accessing the response | Long-running operations | Handling failures | Diagnostics | Mocking | Client lifetime

Troubleshooting

If you run into a problem, please open an issue on GitHub.
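Failed service calls throw Azure.RequestFailedException. Its Status (HTTP status code) and ErrorCode properties help narrow down a problem, and they are useful details to include in an issue. The following sketch assumes a placeholder endpoint and pool, configured credentials, and a hypothetical batch ID that does not exist in the pool.

```csharp
using System;
using Azure;
using Azure.Analytics.Synapse.Spark;
using Azure.Identity;

// Placeholder endpoint and pool name (hypothetical).
var client = new SparkBatchClient(
    new Uri("https://myworkspace.dev.azuresynapse.net"), "mysparkpool", new DefaultAzureCredential());

int missingBatchId = 999999; // hypothetical ID, assumed not to exist in the pool
int? status = null;
try
{
    client.CancelSparkBatchJob(missingBatchId);
}
catch (RequestFailedException ex)
{
    // Status is the HTTP status code; ErrorCode may be null if the service sent none.
    status = ex.Status;
    Console.WriteLine($"Request failed: {ex.Status} {ex.ErrorCode}: {ex.Message}");
}
```

Credential problems surface separately, as Azure.Identity's AuthenticationFailedException, before any request reaches the service.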

Next steps

The next step is to add more examples.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.