Azure Data Factory performing transformation on derived columns

Antony Ellis 11 Reputation points
2021-10-25T15:09:55.17+00:00

Hi, so I must be missing something quite fundamental here, as I don't believe my question is that complex, yet I have found nothing to help online. The basic requirement is to read from a Blob storage CSV file (comma delimited) and then update Microsoft Dynamics 365. For this I am using a Data Flow to create a new CSV file that has the transformed values in it, ready for a Copy Activity into D365, because at the time of writing a D365 sink from a data flow does not appear to be supported. Also, all columns in the CSV dataset are defined as "string" and I have no option to override them with actual types in that dataset. The only thing I have seen is the "Projection" tab, where I can specify a type, but it makes no difference to the behaviour described below:

Whenever there is an empty string value within a Date column I get the following error:

ErrorCode=TypeConversionFailure,Exception occurred when converting value '' for column name 'xxx_movedinnolaterthan' from type 'String' (precision:, scale:) to type 'DateTime' (precision:, scale:). Additional info: String was not recognized as a valid DateTime.

Whenever there is an empty string value within an Integer column I get the following error:

ErrorCode=TypeConversionFailure,Exception occurred when converting value '' for column name 'xxx_xgroup' from type 'String' (precision:, scale:) to type 'Int32' (precision:, scale:). Additional info: Input string was not in a correct format.

OK... so I thought: no problem for the date, let's add an expression in the derived column:

case(
    isDate(toString(toDate(date_of_validation,'dd/MM/yy'))),
    toDate(date_of_validation,'dd/MM/yy'),
    currentDate('GMT')
)

No error now! But the problem is that I am forced to provide a valid datetime. I can't, for example, return NULL or an empty value, and the latter is what needs to happen. The moment I try to return any other kind of type/value, the ADF expression builder complains "Expression should return same type <date> as previous expression".

case(
    fld_status == "Some Value a", 167410000,
    fld_status == "Some Value b", 167410002,
    fld_status == "Some Value c", 167410001,
    fld_status == "Some Value d", 167410003,
    fld_status == "Some Value e", 167410004,
    null() -- forget it! :-)
)
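
For comparison, what I'm effectively after is what most languages model as a nullable return: every branch yields the same type, and "no match" yields a typed null. A rough Python analogy (the field and value names are just my sample data, not anything ADF-specific):

```python
# Rough Python analogy of the mapping I want: every branch returns the
# same type (int), and the fallback is None rather than a made-up value.
OPTIONSET = {
    "Some Value a": 167410000,
    "Some Value b": 167410002,
    "Some Value c": 167410001,
    "Some Value d": 167410003,
    "Some Value e": 167410004,
}

def map_status(fld_status):
    # dict.get returns None when nothing matches -- the "typed null"
    # that the ADF expression builder insists must match the branch type
    return OPTIONSET.get(fld_status)
```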

What am I missing here?

Even if I change the source field types (in the Projection tab) it doesn't prevent the errors, so how am I supposed to test for invalid types/values and return a NULL? If no value was provided then I can't, and would not want to, make one up. Finally, I tried the "ignore null" option in the D365 sink copy activity just as a test and it made no difference.

I must be doing something very silly here.
Appreciate any help!

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

3 answers

  1. svijay-MSFT 5,256 Reputation points Microsoft Employee Moderator
    2021-10-28T20:30:05.767+00:00

    Hello @Antony Ellis ,

    My understanding is that the issue does not appear when you have "Quote All" disabled.

    This is because when you pass the value "" (the blank date value when it gets quoted), it is parsed as '' and fails, as this is not a valid date-time string.

    Hence you see this piece of string '' in your error text.

    Now, when you unquote, the above behaviour is not observed because D365 is able to handle a blank value.

    144732-image.png

    Now coming back to your questions :

    How do I Return a NULL because the expression editor moans about different types?

    I was able to return null using the below expression :

    toDate(null())  
    

    Output :

    144693-image.png

    How do I ensure text values are quoted and commas are escaped properly? If I don't tick the "Quote All" option, the output CSV is not always quoted.

    If my understanding is correct, you wanted to know whether the output CSV contains quotes. When the above option is unchecked, the text values in the output are not quoted.

    Sample output :

    144676-image.png

    Additional Note :

    I used the below expression in the Derived column to quote only the text fields

    144694-image.png

    Here name is a text field.

    Also you'd need to remove the quote character and set no escape character at the dataset property.

    144705-image.png

    Output :

    144723-image.png

    Hope this will help. Please let us know if any further queries.


  2. Antony Ellis 11 Reputation points
    2021-11-22T15:26:12.63+00:00

    OK, I asked this question weeks ago and haven't had a solid answer. Below is a summary of what I have found to work, and perhaps someone can tell me if there is a better way of approaching it. The first reply was very useful in terms of giving me a clue about the issue, but didn't explain how to handle type conversion.

    Copy Activity

    In a nutshell, for a Copy Activity, when the source dataset is a text-quoted source (e.g. CSV) and the non-text fields are surrounded with quotes, this will cause type conversion errors. For example, “19/10/21” would generate the following error, even if a format of dd/MM/yy was specified in the copy activity:

    ErrorCode=TypeConversionFailure,Exception occurred when converting value '19/10/21' for column name 'xxx_yourdate' from type 'String' (precision:, scale:) to type 'DateTime' (precision:, scale:). Additional info: String was not recognized as a valid DateTime.

    The sink type doesn’t matter, as this error is raised as part of ADF’s conversion of datasets to its interim types before the sink operation. To help avoid type conversion errors, ensure that all non-text fields (e.g. Date, Integer, Decimal) are not text-quoted. In addition, a clear plan for handling non-text data types and formats needs to be in place. The Copy Activity has some very basic options for handling date and number formats, but these have limited use cases.
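
    As an analogy (not ADF itself), Python's strict date parsing shows the same failure mode: the raw string, including any leftover quote characters, must match the format exactly:

    ```python
    from datetime import datetime

    def to_datetime_strict(value, fmt="%d/%m/%y"):
        # Analogy for the Copy Activity's interim type conversion: the raw
        # string must match the format exactly, so '' or a still-quoted
        # value fails, much like the TypeConversionFailure above.
        return datetime.strptime(value, fmt)

    to_datetime_strict("19/10/21")        # parses to 2021-10-19
    # to_datetime_strict("")              # ValueError (the empty-string case)
    # to_datetime_strict('"19/10/21"')    # ValueError (quotes are part of the value)
    ```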

    Scenario A: You don’t use data flow and don’t need to specify a date format in copy activity settings.

    This applies only when you know that the dates in all source date fields will be unquoted and in the literal format yyyy-MM-dd or yyyy-MM-dd hh:mm:ss. All other date formats will fail.

    Scenario B: You don’t use data flow and know all dates are unquoted. You know the values in the date fields will always be dd/MM/yy or dd/MM/yy HH:mm:ss across all date fields

    In this scenario, because the date format contains a “/” rather than a “-” and is not the expected format, you will need to specify a format in the copy activity. As long as any date value provided follows one of these formats, and of course is a valid date, it will not error.
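
    If I recall the Copy Activity JSON correctly, the format for Scenario B is supplied via the tabular translator's type conversion settings; treat the exact property names below as an assumption to verify against the ADF documentation:

    ```
    "translator": {
        "type": "TabularTranslator",
        "typeConversion": true,
        "typeConversionSettings": {
            "dateTimeFormat": "dd/MM/yy HH:mm:ss"
        }
    }
    ```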

    Scenario C: Date fields may or may not be text quoted and can vary in format

    In this scenario the only option is to use data flows and derived date field columns. It should be an assumed (normal) requirement that if any source fields are non-text then a data flow and transformation will most likely be needed in real world migration or integration scenarios.

    Summary

    Based on my tests, it is highly unlikely that the type conversion settings in a copy activity alone will be sufficient to handle most real-world scenarios. This means the use of data flows and derived columns should be an assumed requirement, especially when non-text fields (date, integer, decimal) in the source are text-quoted.
    The sink types supported directly within a data flow are at this time very limited (e.g. D365 and Azure Table are not supported sink types). If the intended target is not supported, then you have to sink first to an interim location, such as a SQL table or Blob, which has quotes removed and formats enforced. Then reference the interim location as the source within a Copy Activity, which itself has access to more supported sink types. The section below outlines the transformation expressions that were tested and used in the scenario where fields were quoted and the incoming date values were not supported.

    DATES

    Input: 19/10/21 or 2021-10-19

    The replace function was required because the date functions were returning NULL values simply because the date contained a “/” rather than a “-”.

    toString(
        coalesce(
            toTimestamp(replace(exp_date_of_validation, "/", "-"), 'dd-MM-yy HH:mm:ss'),
            toTimestamp(replace(exp_date_of_validation, "/", "-"), 'dd-MM-yy'),
            toTimestamp(replace(exp_date_of_validation, "/", "-"), 'dd-MM-yyyy HH:mm:ss'),
            toTimestamp(replace(exp_date_of_validation, "/", "-"), 'dd-MM-yyyy'),
            toDate(date_of_validation),
            null()
        )
    )

    151446-image.png
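
    Outside ADF, the coalesce chain above amounts to "normalise the separator, try each format in order, else null". A Python analogy of the same idea (the format list mirrors the expression; names are illustrative):

    ```python
    from datetime import datetime

    # Try each candidate format in order after normalising the separator,
    # then fall back to an ISO-style parse, and finally to None.
    FORMATS = ["%d-%m-%y %H:%M:%S", "%d-%m-%y", "%d-%m-%Y %H:%M:%S", "%d-%m-%Y"]

    def parse_date(value):
        if not value:
            return None
        normalised = value.replace("/", "-")
        for fmt in FORMATS:
            try:
                return datetime.strptime(normalised, fmt)
            except ValueError:
                continue
        try:
            return datetime.fromisoformat(value)  # handles e.g. 2021-10-19 17:32:00
        except ValueError:
            return None
    ```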

    Input: 2021-10-19 17:32:00

    151534-image.png

    INTEGERS

    For example an optionset value in D365.

    toInteger(
        coalesce(
            toInteger(
                case(
                    os_type == "1", 167410000,
                    os_type == "2", 167410001,
                    os_type == "3", 167410002
                )
            ),
            null()
        )
    )

    DECIMALS

    Things like currency/money fields or any other kind of fraction.

    toDecimal(
        coalesce(
            toDecimal(
                estimated_tenure
            ),
            null()
        )
    )
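
    The same "convert or fall back to null" pattern for numbers, sketched in Python (an analogy, not ADF):

    ```python
    from decimal import Decimal, InvalidOperation

    def to_int_or_none(value):
        # Mimics toInteger(coalesce(...)): blank or non-numeric input -> None
        try:
            return int(value)
        except (TypeError, ValueError):
            return None

    def to_decimal_or_none(value):
        # Mimics toDecimal(coalesce(...)): blank or non-numeric input -> None
        try:
            return Decimal(value)
        except (TypeError, InvalidOperation):
            return None
    ```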

    STRINGS

    An example of string handling. Although this isn't doing much at the moment, you can build on it with case statements and return different values, so it would support basic inline business logic.

    toString(
        coalesce(
            toString(
                polling_key
            ),
            null()
        )
    )

    I hope this helps others, and if there is a better way of validating dates or other non-text values in ADF I would like to know about it.


  3. Antony Ellis 11 Reputation points
    2021-12-06T15:16:41.34+00:00

    Hi, thanks @svijay-MSFT .

    Your suggestion:

    toTimestamp("19/10/21",'dd/MM/yy HH:mm:ss')

    If we present a date value of "19-10-21", your example breaks. If we present "19/10/21" or "19/10/2021", it breaks. If we present "19-10-21 00:00:00", it breaks. Only if we present "19/10/21 00:00:00" does it work and return a valid date. It seems far too bothered about the separator ("-" vs. "/"), and surely for a date it shouldn't be that picky! I mean, an incoming date of "19/10/21" should work whether the format specified is "dd/MM/yy HH:mm:ss" or even "dd-MM-yy HH:mm:ss"; in fact both scenarios would fail, because of a missing timestamp and/or because the incoming format did not exactly match the date value.

    My example below seems to cover all the permutations, including whether or not the date field had a timestamp included in the source. We don't trust incoming data, so having an expression in a derived column that handles multiple scenarios and ideally always returns one consistent format (e.g. yyyy-MM-dd 00:00:00) seems to be the goal. The only way I have found to achieve this is by doing something like the following:

    toString(
        coalesce(
            toTimestamp(replace("19-10-21", "-", "/"), 'dd/MM/yy HH:mm:ss'),
            toTimestamp(replace("19-10-21", "-", "/"), 'dd/MM/yy'),
            toTimestamp(replace("19-10-21", "-", "/"), 'dd/MM/yyyy HH:mm:ss'),
            toTimestamp(replace("19-10-21", "-", "/"), 'dd/MM/yyyy'),
            toTimestamp(replace("19/10/21", "/", "-"), 'dd-MM-yy HH:mm:ss'),
            toTimestamp(replace("19/10/21", "/", "-"), 'dd-MM-yy'),
            toTimestamp(replace("19/10/21", "/", "-"), 'dd-MM-yyyy HH:mm:ss'),
            toTimestamp(replace("19/10/21", "/", "-"), 'dd-MM-yyyy'),
            toDate("19/10/21"),
            null()
        )
    )
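
    The separator strictness described above isn't unique to ADF; Python's strptime behaves the same way, which at least shows the parsing model (an analogy, not ADF itself):

    ```python
    from datetime import datetime

    # The format string's separator is a literal: value and format must
    # agree character-for-character, exactly as with toTimestamp.
    datetime.strptime("19/10/21", "%d/%m/%y")      # parses
    try:
        datetime.strptime("19-10-21", "%d/%m/%y")  # '-' where '/' expected
    except ValueError:
        pass                                       # parse fails
    ```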

