Deploying .whl artifacts from an Azure DevOps feed to Synapse Spark pools

Victor Seifert
2022-06-02T12:51:43.733+00:00

I was wondering whether it is possible to deploy artifacts (in my case, custom Python packages as .whl files) from an Azure DevOps artifact feed to a Synapse Spark pool.

Currently I have to manually:

  • download the .whl file from the artifact feed
  • upload the .whl file to the Synapse workspace
  • add the package manually to the Spark pool's packages ("Select from workspace packages").

So far I have not found any option to do this as part of an Azure DevOps pipeline, and I have not found any documentation either. Is this actually possible (and if so, how), or is it planned as a future feature?

Best regards,
Victor


1 answer

  1. ShaikMaheer-MSFT, Microsoft Employee Moderator
    2022-06-28T16:32:50.353+00:00

    Hi @Victor Seifert ,

    An alternative is to use the Synapse management library to add the custom package to the pool programmatically:
    BigDataPoolsOperationsExtensions.BeginCreateOrUpdate Method (Microsoft.Azure.Management.Synapse) - Azure for .NET Developers | Microsoft Learn

    Something like this (the snippet comes from a larger deployment tool, so `sparkPoolEntity`, `resourceManagement`, `PathManager`, and `Logger` are that tool's own helpers):

        // Custom packages can only be added after the initial Spark pool creation,
        // otherwise Azure throws an error.
        if (this.sparkPoolEntity.HasCustomPackages)
        {
            // If a requirements file is specified, use it.
            string requirementsFileContents = null;
            if (!string.IsNullOrEmpty(this.sparkPoolEntity.RequirementsFilePath))
            {
                var requirementsFilePath = PathManager.Instance.GetFilePath(this.sparkPoolEntity.RequirementsFilePath);
                var requirementsFileName = Path.GetFileName(requirementsFilePath);
                requirementsFileContents = File.ReadAllText(requirementsFilePath);

                sparkPoolInfo.LibraryRequirements = new LibraryRequirements(DateTime.UtcNow, requirementsFileContents, requirementsFileName);
            }

            // Collect the custom libraries to deploy and compare them against
            // what is already installed on the pool.
            sparkPoolInfo.CustomLibraries = await this.GetPythonLibrariesToDeploy();
            var sparkPool = await resourceManagement.GetSparkPoolAsync(this.sparkPoolEntity.ResourceGroup, this.sparkPoolEntity.Workspacename, this.sparkPoolEntity.SparkPoolName);
            var librariesToUpdate = sparkPoolInfo.CustomLibraries.Select(l => l.Name);
            var existingLibraries = (sparkPool.CustomLibraries ?? new List<LibraryInfo>()).Select(l => l.Name);

            var librariesChanged = !Enumerable.SequenceEqual(librariesToUpdate.OrderBy(e => e), existingLibraries.OrderBy(e => e));
            var requirementsChanged = requirementsFileContents != sparkPool.LibraryRequirements?.Content;

            // Skip re-provisioning if there is no change in libraries or requirements.
            Logger.Instance.LogMessage($"Spark pool {this.sparkPoolEntity.SparkPoolName} libraries changed: {librariesChanged}. Requirements changed: {requirementsChanged}");
            if (librariesChanged || requirementsChanged)
            {
                await WaitForProvisioningToComplete();
                Logger.Instance.LogMessage($"Begin re-provisioning of spark pool {this.sparkPoolEntity.SparkPoolName}");

                // Use BeginCreateOrUpdate here so the pipeline deployment isn't blocked on the spark pool provisioning.
                await resourceManagement.BeginCreateOrUpdateSparkPoolAsync(this.sparkPoolEntity.SparkPoolName,
                                                                           this.sparkPoolEntity.ResourceGroup,
                                                                           this.sparkPoolEntity.Workspacename,
                                                                           sparkPoolInfo);
            }
            else
            {
                Logger.Instance.LogMessage($"Skip re-provisioning of spark pool {this.sparkPoolEntity.SparkPoolName}");
            }
        }
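
    For context, here is a minimal self-contained sketch of the same approach written directly against the Microsoft.Azure.Management.Synapse client. It assumes service principal authentication via ApplicationTokenProvider; every ID, name, path, and container value below is a placeholder (not from the original question), and the .whl is assumed to already be uploaded as a workspace package:

        // Sketch only: all IDs, names, paths, and the library type below are
        // placeholder assumptions. Requires the Microsoft.Azure.Management.Synapse
        // and Microsoft.Rest.ClientRuntime.Azure.Authentication packages.
        using System.Collections.Generic;
        using System.Threading.Tasks;
        using Microsoft.Azure.Management.Synapse;
        using Microsoft.Azure.Management.Synapse.Models;
        using Microsoft.Rest.Azure.Authentication;

        public static class SparkPoolPackageDeployer
        {
            public static async Task AddWheelAsync()
            {
                // Authenticate with a service principal (placeholder credentials).
                var credentials = await ApplicationTokenProvider.LoginSilentAsync(
                    "<tenant-id>", "<client-id>", "<client-secret>");

                var client = new SynapseManagementClient(credentials)
                {
                    SubscriptionId = "<subscription-id>"
                };

                // Read the current pool definition so existing settings are preserved.
                BigDataPoolResourceInfo pool = await client.BigDataPools.GetAsync(
                    "<resource-group>", "<workspace-name>", "<spark-pool-name>");

                // Reference the .whl that was uploaded as a workspace package.
                // The path and container values are assumptions; check the
                // workspace packages blade for the actual ones.
                pool.CustomLibraries ??= new List<LibraryInfo>();
                pool.CustomLibraries.Add(new LibraryInfo(
                    name: "mypackage-1.0.0-py3-none-any.whl",
                    path: "<workspace-name>/libraries/mypackage-1.0.0-py3-none-any.whl",
                    containerName: "prep",
                    type: "whl"));

                // Start the update without blocking on pool provisioning.
                await client.BigDataPools.BeginCreateOrUpdateAsync(
                    "<resource-group>", "<workspace-name>", "<spark-pool-name>", pool);
            }
        }

    Because BeginCreateOrUpdateAsync only starts the long-running update, the pipeline step returns quickly; poll GetAsync on the pool if you need to wait for provisioning to finish.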
    

    Hope this helps. Please let us know how it goes.

    -----------

    Please consider hitting the Accept Answer button. Accepted answers help the community as well.

