Best way to implement a sheet update?

Cataster 641 Reputation points
2021-02-22T22:15:47.227+00:00

Today, we have a process that involves the following:

  • JIRA
  • Google Sheets/tsv files
  • c# code
  • Python code
  • Manual intervention

There is a c# project hosted on bitbukcet that has Azure functions. Amongst the important functions, this one stands out ReadFrameGoogleSheetCols. (Code for it below)

Data is read from the database and then a function uploads the generated worksheet to a JIRA task (via webhooks)

worksheet123.tsv sample file:
sheet1

My colleague downloads the sheet attached in the respective JIRA task (e.g. worksheet123.tsv), uses it as input file to run python machine learning code to generate productItemID's from columns A,B, D, and E of the worksheet.

pycode

The output of this python code run is a mappings.tsv file that contains columns A, B of the input file as well as the generated ProductItemID column (as column C).

mapping.tsv output file sample:
ProdItemID

Then, she copies the ProductItemID column and pastes into the original file (e.g.worksheet123.tsv) on Google Drive. That is the manual part today.

We want to automate this process, and refine the workflow.

As you can probably spot from the images/steps, my colleague hosts all this on her machine today.

We are thinking of utilizing blob storage to host the input file, by maybe creating a webhook in the c# code to generate the file there. and then there would be some form of Azure file watcher that watches whenever theres an input file, inputs it into the python code and generate the mapping.tsv file with productItemID and then copy/paste it into the original worksheet used as input as additional column.
This means that the python code has to run on Azure of course, cause today my colleague runs it locally on her machine. I am not sure if thats possible...

We are also considering a message queue instead of file watcher.

Either way, Im looking for best advise/recommendations about streamlining our current process today. Like what services would be helpful for this on Azure? Is the filewatcher/message queueing method the right/best approaches here? Are there better alternatives? Suggestions are welcome!!

--------------------

Note: I was not sure the right forum to post this, but since i think we may utilize blob storage or maybe a data lake, and there is some Azure functions involved, i tagged it under the respective tags. but there may be other better ways to do this so id appreciate some guidance on this!

ReadFrameGoogleSheetCols()

    public static Dictionary<string, List<string>> ReadFrameGoogleSheetCols(string fileId, string sheetCol, int startcol = 1)  
    {  
        var dc = new Dictionary<string, List<string>>();  
        var sheetname = Helper.GetSetting("GoogleSheetName");  

        var cols = Helper.GetSetting("SheetSPcolumn_Mapping") + "," + sheetCol;  

        if (!string.IsNullOrEmpty(sheetCol))  
        {  
            var col1 = cols.Split(',')[0];  
            cols = $"{col1},{sheetCol}";  
        }  

        String range = $"{sheetname}!A{startcol}:ZZ2";  

        GoogleAPIService api = new GoogleAPIService();  
        SpreadsheetsResource.ValuesResource.GetRequest request =  
               api.GoogleSheetService().Spreadsheets.Values.Get(fileId, range);  
        ValueRange response = request.Execute();  
        IList<IList<Object>> values = response.Values;  
        if (values != null && values.Count > 0)  
        {  
            var iWarning = 0;  
            var iMaxCol = 0;  
            for (var i = 0; i < values[0].Count; i++)  
            {  
                if (values[0][i].ToString().ToLower().Equals(cols.Split(',')[0].ToLower()))  
                {  
                    iWarning = i;  
                    break;  
                }  
            }  
              
            List<int> lCols = new List<int>();  

            for (var i = iWarning + 1; i < values[0].Count; i++)  
            {  
                if (cols.ToLower().Contains(values[0][i].ToString().ToLower()))  
                {  
                    iMaxCol = i;  
                    lCols.Add(i - iWarning);  
                }  
            }  

            var startRow = startcol + 1; // row where to start read data  

            range = $"{sheetname}!{api.GetColumnName(iWarning)}{startRow}:{api.GetColumnName(iMaxCol)}";  
            SpreadsheetsResource.ValuesResource.GetRequest reqSP =  
              api.GoogleSheetService().Spreadsheets.Values.Get(fileId, range);  

            ValueRange resSP = reqSP.Execute();  
            IList<IList<Object>> valsp = resSP.Values;  


            var lstwarnings = new List<string>();  
            lstwarnings.AddRange(from row in valsp  
                                 where !row[0].ToString().ToUpper().StartsWith("OK")  
                                 select row[0].ToString());  

            // return null if All warnings is not OK  
            if (lstwarnings.Count > 0)  
                return null;  


            if (valsp != null && valsp.Count > 0)  
            {  
                var colSP = sheetCol.Split(',');  
                var icol = 0;  

                foreach (var (i, lstsp) in from i in lCols  
                                           let lstsp = new List<string>()  
                                           select (i, lstsp))  
                {  
                    lstsp.AddRange(from row in valsp  
                                   where row.Count > i && row[0].ToString().ToUpper().StartsWith("OK") && !row[i].ToString().StartsWith("--")  
                                   select row[i].ToString());  
                    dc.Add(colSP[icol], lstsp);  
                    icol++;  
                }  

                if (!ValidateHeaders(colSP, dc))  
                    return null;  
            }  

        }  

        return dc;  
    }  
Azure Data Lake Storage
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,337 questions
Azure Functions
Azure Functions
An Azure service that provides an event-driven serverless compute platform.
4,227 questions
Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,424 questions
C#
C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.
10,215 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Pramod Valavala 20,516 Reputation points Microsoft Employee
    2021-02-23T06:19:17.763+00:00

    Azure Functions would be the ideal choice for all things related to code (like the python part) in your scenario.

    For the sheets, you could even leverage Logic Apps by using Excel and/or Google Sheets connectors, reducing the amount of code that you need to manage, assuming the connectors satisfy all your requirements.

    And instead of a file watcher, you could leverage Blob Storage Events (powered by Event Grid) directly without polling and/or implementing your own queue.