Develop U-SQL with Python, R, and C# for Azure Data Lake Analytics in Visual Studio Code
Learn how to use Visual Studio Code (VS Code) to write Python, R and C# code behind with U-SQL and submit jobs to Azure Data Lake service. For more information about Azure Data Lake Tools for VS Code, see Use the Azure Data Lake Tools for Visual Studio Code.
Before writing code-behind custom code, you need to open a folder or a workspace in VS Code.
Prerequisites for Python and R
Register Python and, R extensions assemblies for your ADL account.
Open your account in portal.
- Select Overview.
- Select Sample Script.
Select More.
Select Install U-SQL Extensions.
Confirmation message is displayed after the U-SQL extensions are installed.
Note
For best experiences on Python and R language service, please install VSCode Python and R extension.
Develop Python file
Select the New File in your workspace.
Write your code in U-SQL. The following is a code sample.
REFERENCE ASSEMBLY [ExtPython]; @t = SELECT * FROM (VALUES ("D1","T1","A1","@foo Hello World @bar"), ("D2","T2","A2","@baz Hello World @beer") ) AS D( date, time, author, tweet ); @m = REDUCE @t ON date PRODUCE date string, mentions string USING new Extension.Python.Reducer("pythonSample.usql.py", pyVersion : "3.5.1"); OUTPUT @m TO "/tweetmentions.csv" USING Outputters.Csv();
Right-click a script file, and then select ADL: Generate Python Code Behind File.
The xxx.usql.py file is generated in your working folder. Write your code in Python file. The following is a code sample.
def get_mentions(tweet): return ';'.join( ( w[1:] for w in tweet.split() if w[0]=='@' ) ) def usqlml_main(df): del df['time'] del df['author'] df['mentions'] = df.tweet.apply(get_mentions) del df['tweet'] return df
Right-click in USQL file, you can select Compile Script or Submit Job to running job.
Develop R file
Select the New File in your workspace.
Write your code in U-SQL file. The following is a code sample.
DEPLOY RESOURCE @"/usqlext/samples/R/my_model_LM_Iris.rda"; DECLARE @IrisData string = @"/usqlext/samples/R/iris.csv"; DECLARE @OutputFilePredictions string = @"/my/R/Output/LMPredictionsIris.txt"; DECLARE @PartitionCount int = 10; @InputData = EXTRACT SepalLength double, SepalWidth double, PetalLength double, PetalWidth double, Species string FROM @IrisData USING Extractors.Csv(); @ExtendedData = SELECT Extension.R.RandomNumberGenerator.GetRandomNumber(@PartitionCount) AS Par, SepalLength, SepalWidth, PetalLength, PetalWidth FROM @InputData; // Predict Species @RScriptOutput = REDUCE @ExtendedData ON Par PRODUCE Par, fit double, lwr double, upr double READONLY Par USING new Extension.R.Reducer(scriptFile : "RClusterRun.usql.R", rReturnType : "dataframe", stringsAsFactors : false); OUTPUT @RScriptOutput TO @OutputFilePredictions USING Outputters.Tsv();
Right-click in USQL file, and then select ADL: Generate R Code Behind File.
The xxx.usql.r file is generated in your working folder. Write your code in R file. The following is a code sample.
load("my_model_LM_Iris.rda") outputToUSQL=data.frame(predict(lm.fit, inputFromUSQL, interval="confidence"))
Right-click in USQL file, you can select Compile Script or Submit Job to running job.
Develop C# file
A code-behind file is a C# file associated with a single U-SQL script. You can define a script dedicated to UDO, UDA, UDT, and UDF in the code-behind file. The UDO, UDA, UDT, and UDF can be used directly in the script without registering the assembly first. The code-behind file is put in the same folder as its peering U-SQL script file. If the script is named xxx.usql, the code-behind is named as xxx.usql.cs. If you manually delete the code-behind file, the code-behind feature is disabled for its associated U-SQL script. For more information about writing customer code for U-SQL script, see Writing and Using Custom Code in U-SQL: User-Defined Functions.
Select the New File in your workspace.
Write your code in U-SQL file. The following is a code sample.
@a = EXTRACT Iid int, Starts DateTime, Region string, Query string, DwellTime int, Results string, ClickedUrls string FROM @"/Samples/Data/SearchLog.tsv" USING Extractors.Tsv(); @d = SELECT DISTINCT Region FROM @a; @d1 = PROCESS @d PRODUCE Region string, Mkt string USING new USQLApplication_codebehind.MyProcessor(); OUTPUT @d1 TO @"/output/SearchLogtest.txt" USING Outputters.Tsv();
Right-click in USQL file, and then select ADL: Generate CS Code Behind File.
The xxx.usql.cs file is generated in your working folder. Write your code in CS file. The following is a code sample.
namespace USQLApplication_codebehind { [SqlUserDefinedProcessor] public class MyProcessor : IProcessor { public override IRow Process(IRow input, IUpdatableRow output) { output.Set(0, input.Get<string>(0)); output.Set(1, input.Get<string>(0)); return output.AsReadOnly(); } } }
Right-click in USQL file, you can select Compile Script or Submit Job to running job.
Next steps
- Use the Azure Data Lake Tools for Visual Studio Code
- U-SQL local run and local debug with Visual Studio Code
- Get started with Data Lake Analytics using PowerShell
- Get started with Data Lake Analytics using the Azure portal
- Use Data Lake Tools for Visual Studio for developing U-SQL applications
- Use Data Lake Analytics(U-SQL) catalog