Synapse DataFlow String interpolation

Saru Thiagarajan 31 Reputation points
2021-08-15T21:09:23.373+00:00

I have a source string literal containing one or more placeholders. E.g.

"Hi [[personID: 10001]]"
"Hi [[personID: 10001]] and [[personID: 10012]],"
"Hi [[personID: 1200]]"

The above are stored as rows in a table. I have another table (t_person) where you can find the name of a person using the personID.

personID, name
10001, Mike
10012, Dan
1200, Ram

The dataflow results should be:
"Hi Mike"
"Hi Mike and Dan,"
"Hi Ram"

In my dataflow expression, I tried using mapIf, map etc and it is actually is getting complicated. I'm pretty sure there is a simpler way to interpolate this. Any help is greatly appreciated.

Note: Due to various non technical reasons, want to get it done within the context of the dataflow engine.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,368 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,539 questions
0 comments No comments
{count} votes

Accepted answer
  1. MartinJaffer-MSFT 26,026 Reputation points
    2021-08-16T20:54:29.357+00:00

    Hello @Saru Thiagarajan and welcome to Microsoft Q&A.

    I believe I have a solution that should work, but found a product bug while implementing it.

    The idea is to put the t_person table into a cached sink so we can use a cached lookup. A cached lookup lets us do the lookup in-line rather than a separate lookup activity.

    Ideally, I would extract and then replace the personID from the greeting, but you have an example with multiple persons, so I must use a different approach.

    First I split the greeting, and clean up the brackets. I do this with

    regexSplit(GreetingWithID, `(\[\[personID: )|(\]\])`)  
    

    "Hi [[personID: 10001]]" => ["Hi ","10001", ""]

    Next I want to replace the "10001" with "Mike". I do not want to replace "Hi". So I can do a test isInteger() . When using a map, #item represents the current thing.

    map(regexSplit(GreetingWithID, `(\[\[personID: )|(\]\])`), iif(not(isInteger(#item)), #item, toString(cache#lookup(toShort(#item)).name)))  
    

    Lastly I want to join the pieces together. We can use the reduce() funtion.

    toString(reduce(The_map_above, '', #acc + #item, #result))  
    

    The bug I'm working on right now is "name" not being found.

    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Saru Thiagarajan 31 Reputation points
    2021-08-17T03:21:23.013+00:00

    Thanks for working on this. The "reduce" is a good one, +1 for that.

    I've been breaking my head around the same cache lookup. The dataflow would simply be in the queue state and pipeline would be in the running state when the expression has the cachedPerson inline lookup.

    Would you double check for me in your logic that it is replacing all the placeholders in the string for the use case : "Hi [[personID: 10001]] and [[personID: 10012]],"

    Kudos (+1) for the use of reducer. Let me also test on my end by renaming the column to something else. "name" could be a reserved word and it could hinder with the metadata namespaces.

    Look forward to hearing back from you.


  2. Saru Thiagarajan 31 Reputation points
    2021-08-18T01:51:31.18+00:00

    I'll accept your answer but do let us know after you've gone past that bug.

    The below is using the join and not a lookup in the map.

    Create a Crossjoin where left stream is the srcGreetingWithID and Right Stream is the srct_person
    Condition instr(GreetingWithID, '[[personID: '+personID+']]') > 0

    and in the aggregator

    reduce(collect(personid+': :'+name), first(GreetingWithID),replace(#acc,'[personID:'+toString(split(#item, ': :')[1]+']]', toString(split(#item, ': :')[2])), #result)

    0 comments No comments