Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.
The Azure Synapse Analytics Spark client library enables programmatically managing Spark jobs.
Source code | API reference documentation | Product documentation | Samples
Maven dependency for the Azure Synapse Spark client library. Add it to your project's POM file.
<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-analytics-synapse-spark</artifactId>
    <version>1.0.0-beta.4</version>
</dependency>
az synapse workspace create \
    --name <your-workspace-name> \
    --resource-group <your-resource-group-name> \
    --storage-account <your-storage-account-name> \
    --file-system <your-storage-file-system-name> \
    --sql-admin-login-user <your-sql-admin-user-name> \
    --sql-admin-login-password <your-sql-admin-user-password> \
    --location <your-workspace-location>
To interact with the Azure Synapse service, you need to create an instance of a Spark client class such as SparkBatchClient. To instantiate a client object with the DefaultAzureCredential examples shown in this document, you need a workspace endpoint and client secret credentials (client ID, client secret, tenant ID).
This getting started section authenticates with DefaultAzureCredential by providing client secret credentials, but azure-identity offers more ways to authenticate.
To create or get client secret credentials, you can use the Azure Portal, Azure CLI, or Azure Cloud Shell.
The following Azure Cloud Shell snippet creates a service principal and configures its access to Azure resources:
az ad sp create-for-rbac -n <your-application-name> --skip-assignment
Output:
{
    "appId": "generated-app-ID",
    "displayName": "dummy-app-name",
    "name": "http://dummy-app-name",
    "password": "random-password",
    "tenant": "tenant-ID"
}
Use the returned credentials to set the AZURE_CLIENT_ID (appId), AZURE_CLIENT_SECRET (password), and AZURE_TENANT_ID (tenant) environment variables. Once these are populated and you have replaced {YOUR_WORKSPACE_NAME} and {SPARK_POOL_NAME} with your own values, you can create Spark clients. For example, the following code creates a SparkBatchClient:
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.analytics.synapse.spark.SparkBatchClient;
import com.azure.analytics.synapse.spark.SparkClientBuilder;

SparkBatchClient batchClient = new SparkClientBuilder()
    .endpoint("https://{YOUR_WORKSPACE_NAME}.dev.azuresynapse.net")
    .sparkPoolName("{SPARK_POOL_NAME}")
    .credential(new DefaultAzureCredentialBuilder().build())
    .buildSparkBatchClient();
NOTE: To use an asynchronous client, use SparkBatchAsyncClient instead of SparkBatchClient and call buildSparkBatchAsyncClient().
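Because the client secret credentials are supplied through environment variables, a small preflight check before building the client can surface a missing value early. The following is a minimal sketch using only the Java standard library; the CredentialEnvCheck class and its missing method are hypothetical names for illustration and are not part of the Azure SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Azure SDK.
final class CredentialEnvCheck {
    // The environment variables named in this section.
    static final List<String> REQUIRED =
        Arrays.asList("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID");

    // Returns the required variable names that are absent or blank in the given map.
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Check the real process environment.
        List<String> absent = missing(System.getenv());
        if (absent.isEmpty()) {
            System.out.println("All credential variables are set.");
        } else {
            System.out.println("Missing environment variables: " + absent);
        }
    }
}
```

Passing the environment in as a map keeps the check easy to exercise without changing the real process environment.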
The Spark batch client performs the interactions with the Azure Synapse service for getting, setting, updating, deleting, and listing Spark batch jobs. Asynchronous (SparkBatchAsyncClient) and synchronous (SparkBatchClient) clients exist in the SDK allowing for the selection of a client based on an application's use case.
The azure-analytics-synapse-spark package supports synchronous and asynchronous APIs. The following sections provide several code snippets covering some of the most common synchronous Azure Synapse Spark service tasks, including:
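createSparkBatchJob creates a Spark batch job.
import java.util.Arrays;
import com.azure.analytics.synapse.spark.models.SparkBatchJob;
import com.azure.analytics.synapse.spark.models.SparkBatchJobOptions;

// Placeholders: replace with your own storage account, file system, and job name.
String storageAccount = "<storage-account>";
String fileSystem = "<file-system>";
String name = "<job-name>";
// Location of the JAR that contains the WordCount class.
String file = String.format("abfss://%s@%s.dfs.core.windows.net/samples/java/wordcount/wordcount.jar", fileSystem, storageAccount);

SparkBatchJobOptions options = new SparkBatchJobOptions()
    .setName(name)
    .setFile(file)
    .setClassName("WordCount")
    // Arguments passed to the job: the input file and the output directory.
    .setArguments(Arrays.asList(
        String.format("abfss://%s@%s.dfs.core.windows.net/samples/java/wordcount/shakespeare.txt", fileSystem, storageAccount),
        String.format("abfss://%s@%s.dfs.core.windows.net/samples/java/wordcount/result/", fileSystem, storageAccount)
    ))
    .setDriverMemory("28g")
    .setDriverCores(4)
    .setExecutorMemory("28g")
    .setExecutorCores(4)
    .setExecutorCount(2);

SparkBatchJob jobCreated = batchClient.createSparkBatchJob(options);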
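The storage locations in the snippet above all follow the pattern abfss://<file-system>@<storage-account>.dfs.core.windows.net/<path>. As a minimal sketch, that repeated String.format call could be factored into one helper. The AbfssUris class and its of method are hypothetical names introduced here for illustration; they are not part of the Azure SDK.

```java
// Hypothetical helper for illustration; not part of the Azure SDK.
final class AbfssUris {
    // Builds abfss://<fileSystem>@<storageAccount>.dfs.core.windows.net/<path>.
    static String of(String fileSystem, String storageAccount, String path) {
        // Drop one leading slash, if present, because the format string adds its own.
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return String.format("abfss://%s@%s.dfs.core.windows.net/%s",
            fileSystem, storageAccount, relative);
    }

    public static void main(String[] args) {
        // Placeholders: replace with your own file system and storage account.
        String fileSystem = "<file-system>";
        String storageAccount = "<storage-account>";
        System.out.println(of(fileSystem, storageAccount, "samples/java/wordcount/wordcount.jar"));
        System.out.println(of(fileSystem, storageAccount, "samples/java/wordcount/shakespeare.txt"));
        System.out.println(of(fileSystem, storageAccount, "samples/java/wordcount/result/"));
    }
}
```

With such a helper, the job file and both arguments would be built from the same file system and storage account values, which reduces the chance of a typo in one of the three URIs.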
getSparkBatchJobs enumerates the Spark batch jobs in the Synapse workspace.
SparkBatchJobCollection jobs = batchClient.getSparkBatchJobs();
for (SparkBatchJob job : jobs.getSessions()) {
    System.out.println(job.getName());
}
cancelSparkBatchJob cancels a Spark batch job by the given job ID.
batchClient.cancelSparkBatchJob(jobId);
The following sections provide several code snippets covering some of the most common asynchronous Azure Synapse Spark service tasks, including:
Note: You should add System.in.read() or Thread.sleep() after the function calls in the main class/thread to allow async functions/operations to execute and finish before the main application/thread exits.
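One alternative to a fixed sleep is to have the main thread wait on a latch that the asynchronous callback releases. The sketch below shows that pattern with the Java standard library only. It uses CompletableFuture as a stand-in for the asynchronous call, which is an assumption made so the example is self-contained; it is not the type the SDK's asynchronous client returns, and the AwaitAsyncResult class is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: CompletableFuture stands in for an asynchronous SDK call.
final class AwaitAsyncResult {
    // Waits until the callback has run or the timeout elapses.
    // Returns the callback's value, or null on failure, timeout, or interruption.
    static String awaitResult(CompletableFuture<String> asyncCall, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        asyncCall.whenComplete((value, error) -> {
            result.set(error == null ? value : null);
            done.countDown(); // release the waiting thread on success or failure
        });
        try {
            done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
        }
        return result.get();
    }

    public static void main(String[] args) {
        // Simulated asynchronous operation running on another thread.
        CompletableFuture<String> call = CompletableFuture.supplyAsync(() -> "job submitted");
        System.out.println(awaitResult(call, 5_000));
    }
}
```

Unlike a fixed sleep, the main thread resumes as soon as the callback fires, and the timeout bounds the wait if the operation never completes.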
createSparkBatchJob creates a Spark batch job.
// In this section, batchClient is a SparkBatchAsyncClient.
// Placeholders: replace with your own storage account, file system, and job name.
String storageAccount = "<storage-account>";
String fileSystem = "<file-system>";
String name = "<job-name>";
String file = String.format("abfss://%s@%s.dfs.core.windows.net/samples/java/wordcount/wordcount.jar", fileSystem, storageAccount);

SparkBatchJobOptions options = new SparkBatchJobOptions()
    .setName(name)
    .setFile(file)
    .setClassName("WordCount")
    .setArguments(Arrays.asList(
        String.format("abfss://%s@%s.dfs.core.windows.net/samples/java/wordcount/shakespeare.txt", fileSystem, storageAccount),
        String.format("abfss://%s@%s.dfs.core.windows.net/samples/java/wordcount/result/", fileSystem, storageAccount)
    ))
    .setDriverMemory("28g")
    .setDriverCores(4)
    .setExecutorMemory("28g")
    .setExecutorCores(4)
    .setExecutorCount(2);

// The job ID is an integer, so format it with %d rather than %f.
batchClient.createSparkBatchJob(options).subscribe(job -> System.out.printf("Job ID: %d%n", job.getId()));
getSparkBatchJobs enumerates the Spark batch jobs in the Synapse workspace.
batchClient.getSparkBatchJobs().subscribe(jobs -> {
    for (SparkBatchJob job : jobs.getSessions()) {
        System.out.println(job.getName());
    }
});
cancelSparkBatchJob cancels a Spark batch job by the given job ID.
// The asynchronous call only runs once it is subscribed to.
batchClient.cancelSparkBatchJob(jobId).subscribe();
All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the HTTP clients wiki.
All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL operations. The Boring SSL library is an Uber JAR containing native libraries for Linux / macOS / Windows, and provides better performance compared to the default SSL implementation within the JDK. For more information, including how to reduce the dependency size, refer to the performance tuning section of the wiki.
Several Synapse Java SDK samples are available to you in the SDK's GitHub repository. These samples provide example code for additional scenarios commonly encountered while working with Azure Synapse Analytics.
For more extensive documentation on Azure Synapse Analytics, see the API reference documentation.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.