This tutorial shows you how to use the Stream Analytics no code editor to create a job that captures Event Hubs data into Azure Data Lake Storage Gen2 in Parquet format.
In this tutorial, you learn how to:
- Create a Stream Analytics job with the no code editor that captures Event Hubs data in Parquet format.
- Verify the Parquet files written to Azure Data Lake Storage Gen2.
- Query the captured data from Azure Synapse Analytics with both Synapse Spark and Synapse SQL.
Before you start, make sure you've completed the following steps:
- Deploy the TollApp event generator, which sends sample data to an event hub.
- Create an Azure Synapse Analytics workspace with an Azure Data Lake Storage Gen2 account.
Locate the Resource Group in which the TollApp event generator was deployed.
Select the Azure Event Hubs namespace. You might want to open it in a separate tab or window.
On the Event Hubs namespace page, select Event Hubs under Entities on the left menu.
Select the entrystream instance. (A programmatic way to confirm this event hub exists is sketched after these steps.)
On the Event Hubs instance page, select Process data in the Features section on the left menu.
Select Start on the Capture data to ADLS Gen2 in Parquet format tile.
Name your job parquetcapture and select Create.
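If you'd like to confirm the entrystream event hub from code, as referenced above, here's a minimal sketch using the azure-mgmt-eventhub Python SDK; the subscription ID, resource group, and namespace values are placeholder assumptions.

from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient

# Placeholder values are assumptions; substitute your own.
client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")
for hub in client.event_hubs.list_by_namespace("<resource-group>", "<namespace>"):
    print(hub.name)  # 'entrystream' should appear in the output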
On the event hub configuration page, follow these steps:
For Consumer group, select Use existing.
Confirm that the $Default consumer group is selected.
Confirm that Serialization is set to JSON.
Confirm that Authentication method is set to Connection String.
Confirm that Event hub shared access key name is set to RootManageSharedAccessKey (a sketch for reading its connection string follows these steps).
Select Connect at the bottom of the window.
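As noted above, the connection string behind RootManageSharedAccessKey can also be read programmatically. A hedged sketch with the azure-mgmt-eventhub Python SDK, with placeholder values as assumptions:

from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient

client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")
keys = client.namespaces.list_keys("<resource-group>", "<namespace>", "RootManageSharedAccessKey")
print(keys.primary_connection_string)  # the value the no code editor uses to connect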
Within a few seconds, you'll see sample input data and the schema. You can choose to drop fields, rename fields, or change data types.
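For context, an event from the TollApp generator resembles the following record. The field names come from the TollApp sample; treat the values as illustrative only.

# Illustrative TollApp entry event (values are made up)
sample_event = {
    "TollId": 1,
    "EntryTime": "2023-03-31T10:00:00.000Z",
    "LicensePlate": "JNB7001",
    "State": "NY",
    "Make": "Honda",
    "Model": "CRV",
    "VehicleType": 1,
    "VehicleWeight": 0,
    "Toll": 7,
    "Tag": "",
}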
Select the Azure Data Lake Storage Gen2 tile on your canvas, and configure it by specifying:
- Subscription where your Azure Data Lake Storage Gen2 account is located.
- Storage account name, which should be the same ADLS Gen2 account that you used with your Azure Synapse Analytics workspace in the Prerequisites section.
- Container inside which the Parquet files will be created.
- For Delta table path, specify a name for the table.
- Date and time pattern as the default yyyy-mm-dd and HH.
Select Connect.
Select Save in the top ribbon to save your job, and then select Start to run it. Once the job starts, select X in the right corner to close the Stream Analytics job page.
You'll then see a list of all Stream Analytics jobs created with the no code editor. Within about two minutes, your job moves to a Running state. Select the Refresh button on the page to see the status change from Created -> Starting -> Running.
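If you prefer to poll the job state from code instead of refreshing the page, here's a minimal sketch, assuming the azure-mgmt-streamanalytics Python package; placeholder values are assumptions.

from azure.identity import DefaultAzureCredential
from azure.mgmt.streamanalytics import StreamAnalyticsManagementClient

client = StreamAnalyticsManagementClient(DefaultAzureCredential(), "<subscription-id>")
job = client.streaming_jobs.get("<resource-group>", "parquetcapture")
print(job.job_state)  # expect 'Running' once startup completes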
Locate the Azure Data Lake Storage Gen2 account you used in the previous step.
Select the container you used in the previous step. You'll see Parquet files created in the folder you specified earlier.
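You can also list the captured files from code. A minimal sketch, assuming the azure-storage-file-datalake package and an identity with read access to the account; with the default yyyy-mm-dd and HH pattern, paths look like 2023-03-31/10/<file>.parquet.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Account and container names are placeholders; substitute your own.
service = DataLakeServiceClient(
    account_url="https://<adlsname>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("<container>")
for path in fs.get_paths(recursive=True):
    if path.name.endswith(".parquet"):
        print(path.name)  # Parquet files written by the capture job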
Locate your Azure Synapse Analytics workspace and open Synapse Studio.
Create a serverless Apache Spark pool in your workspace if one doesn't already exist.
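If you'd rather create the pool from code than through the portal, here's a hedged sketch using the azure-mgmt-synapse Python SDK; the pool name, node size, node count, and Spark version are assumptions to adjust for your workspace.

from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import BigDataPoolResourceInfo

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")
# 'sparkpool1' and the sizing below are hypothetical values
pool = client.big_data_pools.begin_create_or_update(
    "<resource-group>",
    "<workspace-name>",
    "sparkpool1",
    BigDataPoolResourceInfo(
        location="<region>",
        node_size="Small",
        node_size_family="MemoryOptimized",
        node_count=3,
        spark_version="3.3",
    ),
).result()
print(pool.provisioning_state)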
In Synapse Studio, go to the Develop hub and create a new Notebook.
Create a new code cell and paste the following code in that cell. Replace container and adlsname with the name of the container and ADLS Gen2 account used in the previous step.
%%pyspark
# Load every Parquet file the capture job wrote to the container
df = spark.read.load('abfss://container@adlsname.dfs.core.windows.net/*/*.parquet', format='parquet')
# Preview the first 10 rows
display(df.limit(10))
# Total number of captured events
df.count()
# Schema inferred from the Parquet files
df.printSchema()
For Attach to on the toolbar, select your Spark pool from the dropdown list.
Select Run All to see the results.
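Once the results appear, you can keep exploring the captured data in the same notebook session. For example, a small follow-on sketch that counts events per vehicle make; the Make column is assumed from the TollApp sample shown earlier.

%%pyspark
from pyspark.sql import functions as F

# Count captured events per vehicle make ('Make' is assumed from the TollApp data)
df.groupBy("Make").agg(F.count("*").alias("events")).show()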
In the Develop hub, create a new SQL script.
Paste the following script and run it using the Built-in serverless SQL endpoint. Replace container and adlsname with the name of the container and ADLS Gen2 account used in the previous step.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://adlsname.dfs.core.windows.net/container/*/*.parquet',
    FORMAT='PARQUET'
) AS [result]
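If you'd rather run the same query from Python than from Synapse Studio, a hedged sketch with pyodbc follows; the serverless endpoint naming pattern, the driver version, and the placeholder names are assumptions.

import pyodbc

# Connect to the workspace's serverless SQL endpoint (name pattern is an assumption)
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<workspace>-ondemand.sql.azuresynapse.net;"
    "Database=master;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)
cursor = conn.cursor()
cursor.execute(
    "SELECT TOP 10 * FROM OPENROWSET("
    "BULK 'https://adlsname.dfs.core.windows.net/container/*/*.parquet',"
    "FORMAT='PARQUET') AS [result]"
)
for row in cursor.fetchall():
    print(row)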
In this tutorial, you learned how to create a Stream Analytics job using the no code editor to capture Event Hubs data streams in Parquet format. You then used Azure Synapse Analytics to query the Parquet files with both Synapse Spark and Synapse SQL.