Azure Data Factory Spark Activity JSON Properties - Required/Mandatory


Question

There are a number of JSON properties used in the JSON definition of a Spark Activity in Azure Data Factory.

From the given list of properties, choose the properties that are required/mandatory in the JSON definition.

(Select all that apply)

Answers

A. linkedServiceName
B. SparkJobLinkedService
C. rootPath
D. className
E. proxyUser
F.

Correct Answers: A and C

Explanations

The table below lists the JSON properties used in the JSON definition, along with a description of each property and whether it is required.

Property | Description | Required
name | Name of the activity in the pipeline. | Yes
description | Text describing what the activity does. | No
type | For the Spark activity, the activity type is HDInsightSpark. | Yes
linkedServiceName | Name of the HDInsight Spark linked service on which the Spark program runs. To learn about this linked service, see the Compute linked services article. | Yes
sparkJobLinkedService | The Azure Storage linked service that holds the Spark job file, dependencies, and logs. Only Azure Blob Storage and ADLS Gen2 linked services are supported here. If you do not specify a value for this property, the storage associated with the HDInsight cluster is used. | No
rootPath | The Azure Blob container and folder that contain the Spark file. The file name is case-sensitive. Refer to the folder structure section (next section) for details about the structure of this folder. | Yes
entryFilePath | Relative path to the root folder of the Spark code/package. The entry file must be either a Python file or a .jar file. | Yes
className | Application's Java/Spark main class. | No
arguments | A list of command-line arguments to the Spark program. | No
proxyUser | The user account to impersonate to execute the Spark program. | No
sparkConfig | Specify values for the Spark configuration properties listed in the topic: Spark Configuration - Application properties. | No
getDebugInfo | Specifies when the Spark log files are copied to the Azure storage used by the HDInsight cluster (or) specified by sparkJobLinkedService. Allowed values: None, Always, or Failure. Default value: None. | No
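To show how these properties fit together, here is a sketch of a Spark activity JSON definition. The linked-service names, container name, and file name are placeholders, not values from the source; only name, type, linkedServiceName, rootPath, and entryFilePath are required, while the rest are optional.

```json
{
    "name": "SparkActivitySample",
    "description": "Runs a PySpark script on an HDInsight cluster",
    "type": "HDInsightSpark",
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "sparkJobLinkedService": {
            "referenceName": "MyAzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "rootPath": "adfspark",
        "entryFilePath": "test.py",
        "getDebugInfo": "Failure"
    }
}
```

Note that sparkJobLinkedService and getDebugInfo could be removed without invalidating the definition, since they fall back to the cluster's associated storage and to None, respectively.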

Option A is correct.

linkedServiceName is a required property in the JSON definition.

Option B is incorrect.

SparkJobLinkedService is not a required property in the JSON definition.

Option C is correct.

rootPath is a required property in the JSON definition.

Option D is incorrect.

className is not a required property in the JSON definition.

Option E is incorrect.

proxyUser is not a required property in the JSON definition.

Option F is incorrect.

Out of the given options, only linkedServiceName and rootPath are required JSON properties.

To learn more about transforming data using the Spark activity, refer to the Azure Data Factory documentation.

Here is a more detailed look at the JSON properties used in the JSON definition of a Spark activity in Azure Data Factory and which of them are mandatory.

The Spark activity in Azure Data Factory runs an Apache Spark program on your own or an on-demand HDInsight cluster. When you define a Spark activity, you provide a JSON definition that includes various properties.

Out of the given list of properties, the mandatory ones are:

A. linkedServiceName - This property specifies the name of the HDInsight Spark linked service on which the Spark program runs. It is mandatory because the service must know which cluster the activity targets.

C. rootPath - This property specifies the Azure Blob container and folder that contain the Spark file. It is mandatory because the service must know where to find the job file.

The remaining options are optional:

B. SparkJobLinkedService - The Azure Storage linked service that holds the Spark job file, dependencies, and logs. If it is omitted, the storage associated with the HDInsight cluster is used.

D. className - The application's Java/Spark main class. It is only relevant for Java/Scala jobs; a Python entry file does not need it.

E. proxyUser - The user account to impersonate to execute the Spark program. If it is omitted, the default account is used.

Therefore, the correct answers are options A and C: linkedServiceName and rootPath are the mandatory properties among the given options.
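To make the distinction concrete, the following is a minimal sketch keeping only the required properties (the linked-service name, container, and file name are hypothetical placeholders):

```json
{
    "name": "SparkProgram",
    "type": "HDInsightSpark",
    "linkedServiceName": {
        "referenceName": "MyHDInsightLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "rootPath": "adfspark",
        "entryFilePath": "main.py"
    }
}
```

Removing linkedServiceName or rootPath would make this definition invalid, which is why options A and C are the correct answers.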