Deploying .whl artifacts from an Azure DevOps feed to Synapse Spark pools

Victor Seifert
2022-06-02T12:51:43.733+00:00

I was wondering whether it is possible to deploy artifacts (in my case, custom Python packages as .whl files) from an Azure DevOps artifact feed to a Synapse Spark pool.

Currently I have to manually:

  • download the .whl file from the artifact feed
  • upload the .whl file to the Synapse workspace
  • add the package manually to the Spark pool's packages ("Select from workspace packages").

So far I have not found any option to do this as part of an Azure DevOps pipeline, and I have not found any documentation either. Is this actually possible (and if so, how), or is it planned as a future feature?

Best regards,
Victor


1 answer

  1. ShaikMaheer-MSFT, Microsoft Employee Moderator
    2022-06-28T16:32:50.353+00:00

    Hi @Victor Seifert ,

    An alternative is to use the Synapse management library to add the custom package to the pool programmatically:
    BigDataPoolsOperationsExtensions.BeginCreateOrUpdate Method (Microsoft.Azure.Management.Synapse) - Azure for .NET Developers | Microsoft Learn

    Something like this (the snippet comes from a larger deployment tool, so `sparkPoolEntity`, `resourceManagement`, `PathManager`, and `Logger` are that tool's own helpers):

        // Custom packages can only be added after the initial Spark pool creation,
        // otherwise Azure throws an error.
        if (this.sparkPoolEntity.HasCustomPackages)
        {
            // If a requirements file is specified, use it.
            string requirementsFileContents = null;
            if (!string.IsNullOrEmpty(this.sparkPoolEntity.RequirementsFilePath))
            {
                var requirementsFilePath = PathManager.Instance.GetFilePath(this.sparkPoolEntity.RequirementsFilePath);
                var requirementsFileName = Path.GetFileName(requirementsFilePath);
                requirementsFileContents = File.ReadAllText(requirementsFilePath);

                sparkPoolInfo.LibraryRequirements = new LibraryRequirements(DateTime.UtcNow, requirementsFileContents, requirementsFileName);
            }

            // Collect the custom libraries to deploy and compare them against
            // what is already installed on the pool.
            sparkPoolInfo.CustomLibraries = await this.GetPythonLibrariesToDeploy();
            var sparkPool = await resourceManagement.GetSparkPoolAsync(this.sparkPoolEntity.ResourceGroup, this.sparkPoolEntity.Workspacename, this.sparkPoolEntity.SparkPoolName);
            var librariesToUpdate = sparkPoolInfo.CustomLibraries.Select(l => l.Name);
            var existingLibraries = (sparkPool.CustomLibraries ?? new List<LibraryInfo>()).Select(l => l.Name);

            var librariesChanged = !Enumerable.SequenceEqual(librariesToUpdate.OrderBy(e => e), existingLibraries.OrderBy(e => e));
            var requirementsChanged = requirementsFileContents != sparkPool.LibraryRequirements?.Content;

            // Skip re-provisioning if there is no change in libraries or requirements.
            Logger.Instance.LogMessage($"Spark pool {this.sparkPoolEntity.SparkPoolName} libraries changed: {librariesChanged}. Requirements changed: {requirementsChanged}");
            if (librariesChanged || requirementsChanged)
            {
                await WaitForProvisioningToComplete();
                Logger.Instance.LogMessage($"Begin re-provisioning of spark pool {this.sparkPoolEntity.SparkPoolName}");

                // Use BeginCreateOrUpdate here so the pipeline deployment isn't blocked on the spark pool provisioning.
                await resourceManagement.BeginCreateOrUpdateSparkPoolAsync(this.sparkPoolEntity.SparkPoolName,
                                                                           this.sparkPoolEntity.ResourceGroup,
                                                                           this.sparkPoolEntity.Workspacename,
                                                                           sparkPoolInfo);
            }
            else
            {
                Logger.Instance.LogMessage($"Skip re-provisioning of spark pool {this.sparkPoolEntity.SparkPoolName}");
            }
        }
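
    For context, here is a minimal self-contained sketch of the same approach written directly against the Microsoft.Azure.Management.Synapse client. It assumes service principal authentication via ApplicationTokenProvider; every ID, name, path, and container value below is a placeholder (not from the original question), and the .whl is assumed to already be uploaded as a workspace package:

        // Sketch only: all IDs, names, paths, and the library type below are
        // placeholder assumptions. Requires the Microsoft.Azure.Management.Synapse
        // and Microsoft.Rest.ClientRuntime.Azure.Authentication packages.
        using System.Collections.Generic;
        using System.Threading.Tasks;
        using Microsoft.Azure.Management.Synapse;
        using Microsoft.Azure.Management.Synapse.Models;
        using Microsoft.Rest.Azure.Authentication;

        public static class SparkPoolPackageDeployer
        {
            public static async Task AddWheelAsync()
            {
                // Authenticate with a service principal (placeholder credentials).
                var credentials = await ApplicationTokenProvider.LoginSilentAsync(
                    "<tenant-id>", "<client-id>", "<client-secret>");

                var client = new SynapseManagementClient(credentials)
                {
                    SubscriptionId = "<subscription-id>"
                };

                // Read the current pool definition so existing settings are preserved.
                BigDataPoolResourceInfo pool = await client.BigDataPools.GetAsync(
                    "<resource-group>", "<workspace-name>", "<spark-pool-name>");

                // Reference the .whl that was uploaded as a workspace package.
                // The path and container values are assumptions; check the
                // workspace packages blade for the actual ones.
                pool.CustomLibraries ??= new List<LibraryInfo>();
                pool.CustomLibraries.Add(new LibraryInfo(
                    name: "mypackage-1.0.0-py3-none-any.whl",
                    path: "<workspace-name>/libraries/mypackage-1.0.0-py3-none-any.whl",
                    containerName: "prep",
                    type: "whl"));

                // Start the update without blocking on pool provisioning.
                await client.BigDataPools.BeginCreateOrUpdateAsync(
                    "<resource-group>", "<workspace-name>", "<spark-pool-name>", pool);
            }
        }

    Because BeginCreateOrUpdateAsync only starts the long-running update, the pipeline step returns quickly; poll GetAsync on the pool if you need to wait for provisioning to finish.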
    

    Hope this helps. Please let us know how it goes.

    -----------

    Please consider hitting the Accept Answer button. Accepted answers help the community as well.

