What is an Apache Spark job definition?

Article
01. 12. 2023

An Apache Spark job definition is a Microsoft Fabric code item that lets you submit batch or streaming jobs to Spark clusters. By uploading the binary files produced by compiling code in different languages (for example, a .jar file from Java), you can apply different transformation logic to the data hosted in a lakehouse. Besides the binary file, you can further customize the job's behavior by uploading additional libraries and supplying command-line arguments.
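As a rough illustration of how command-line arguments can drive a job, here is a minimal sketch of a main definition file, assuming the job is written in PySpark. The argument names (`--input`, `--output`, `--min-amount`), the `amount` column, and the function names are hypothetical and chosen only for this example; they are not part of Fabric or of the original article.

```python
import argparse


def parse_job_args(argv):
    """Parse the command-line arguments supplied to the job.

    All argument names here are hypothetical examples.
    """
    parser = argparse.ArgumentParser(description="Example batch job")
    parser.add_argument("--input", required=True, help="path to read from")
    parser.add_argument("--output", required=True, help="path to write to")
    parser.add_argument("--min-amount", type=float, default=0.0,
                        help="rows below this amount are dropped")
    return parser.parse_args(argv)


def run(argv):
    """Read Parquet data, filter it, and write the result."""
    # Imported lazily so the argument parsing can be tried without Spark installed.
    from pyspark.sql import SparkSession

    args = parse_job_args(argv)
    spark = SparkSession.builder.appName("example-job").getOrCreate()

    # "amount" is an assumed column name in the input data.
    df = spark.read.parquet(args.input)
    filtered = df.filter(df["amount"] >= args.min_amount)
    filtered.write.mode("overwrite").parquet(args.output)
```

In an actual main definition file you would typically end with a call such as `run(sys.argv[1:])` (after `import sys`), so that the arguments configured on the job definition reach the parser.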

To run a Spark job definition, you must have at least one lakehouse associated with it. This default lakehouse context serves as the default file system for the Spark runtime. Any Spark code that uses a relative path to read or write data is served from the default lakehouse.
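The effect of a default file system on relative paths can be pictured with a small conceptual model. This sketch does not call Spark and is not how the runtime is implemented; the actual resolution is done by the Spark runtime. The `resolve` helper and the root URI are assumptions for illustration, and the `<workspace>` and `<lakehouse>` parts are placeholders, not real values.

```python
from urllib.parse import urlparse

# Placeholder root; the real value depends on your workspace and lakehouse.
DEFAULT_LAKEHOUSE_ROOT = (
    "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse"
)


def resolve(path, default_root=DEFAULT_LAKEHOUSE_ROOT):
    """Return the location a path would refer to in this simplified model.

    Paths that already carry a URI scheme are returned unchanged;
    scheme-less (relative) paths are placed under the default root.
    """
    if urlparse(path).scheme:
        return path
    return default_root.rstrip("/") + "/" + path.lstrip("/")


# A relative path lands under the default lakehouse root.
print(resolve("Files/raw/orders.csv"))
```

Under this model, code such as `spark.read.csv("Files/raw/orders.csv")` reads from the default lakehouse, while a fully qualified path is left untouched and can point at a different lakehouse.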

Tip

To run a Spark job definition item, you must have a main definition file and default lakehouse context. If you don't have a lakehouse, create one by following the steps in Create a lakehouse.

How to create an Apache Spark job definition in Fabric
