I have already tried this solution, but it did not work: it raised multiple errors whether I used "-", an index number, or the actual array element name. The source DataFrame schema is shown below; the articles array contains struct elements.
The DataFrame schema is:
id:string
countryCode:string
articles:array
    element:struct
        articleId:string
        mainCategoryId:string
        subCategoryId:string
        transactionDate:date
        size:string
The business requirement is as follows. When we load data for a particular year, say 2021, the transactions for each customer (identified by id) are loaded as shown below. In this example, customer 12345 purchased two products in 2021.
{
"id": "12345",
"countryCode": "IN",
"articles": [
{
"articleId": "123",
"mainCategoryId": "Boots",
"subCategoryId": "ladies",
"purchasePrice": 200,
"orderId": 12345,
"transactionDate": 17861,
"size": "38"
},
{
"articleId": "456",
"mainCategoryId": "Boots",
"subCategoryId": "men",
"purchasePrice": 300,
"orderId": 12367,
"transactionDate": 17867,
"size": "38"
}
],
"_rid": "XXXXX",
"_self": "YYY",
"_etag": "ZZZ",
"_attachments": "attachments/",
"_ts": 1660823865
}
When we load data for 2022, it should append the articles the customer purchased in 2022 to the articles already purchased in 2021.
Ideal behavior:
{
"id": "12345",
"countryCode": "IN",
"articles": [
{
"articleId": "123",
"mainCategoryId": "Boots",
"subCategoryId": "ladies",
"purchasePrice": 200,
"orderId": 12345,
"transactionDate": 17861,
"size": "38"
},
{
"articleId": "456",
"mainCategoryId": "Boots",
"subCategoryId": "men",
"purchasePrice": 300,
"orderId": 12367,
"transactionDate": 17867,
"size": "38"
},
{
"articleId": "123",
"mainCategoryId": "Tshirt",
"subCategoryId": "ladies",
"purchasePrice": 500,
"orderId": 12343,
"transactionDate": 17889,
"size": "L"
},
{
"articleId": "456",
"mainCategoryId": "Tshirt",
"subCategoryId": "men",
"purchasePrice": 600,
"orderId": 12382,
"transactionDate": 17889,
"size": "L"
}
],
"_rid": "XXXXX",
"_self": "YYY",
"_etag": "ZZZ",
"_attachments": "attachments/",
"_ts": 1660823865
}
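To make the expected behavior concrete, here is a minimal sketch in plain Python (not Spark and not the connector) of the merge I am after. The names merge_articles and upsert_customer are hypothetical, the documents are trimmed versions of the samples above, and treating (orderId, articleId) as the identity of a purchase is my assumption so that reloading the same year does not duplicate entries.

```python
from typing import Any, Dict, List, Optional

Article = Dict[str, Any]
Customer = Dict[str, Any]


def merge_articles(existing: List[Article], incoming: List[Article]) -> List[Article]:
    """Append incoming articles to the existing ones, keeping the original order.

    Assumption: (orderId, articleId) identifies one purchase, so an article
    that is already stored is skipped instead of being added twice.
    """
    merged = list(existing)
    seen = {(a["orderId"], a["articleId"]) for a in existing}
    for article in incoming:
        key = (article["orderId"], article["articleId"])
        if key not in seen:
            merged.append(article)
            seen.add(key)
    return merged


def upsert_customer(stored: Optional[Customer], incoming: Customer) -> Customer:
    """Return the document to write: a new customer as-is, else merged articles."""
    if stored is None:
        return dict(incoming)
    result = dict(stored)  # shallow copy, the stored document is not mutated
    result["articles"] = merge_articles(
        stored.get("articles", []), incoming.get("articles", [])
    )
    return result


# Trimmed sample documents for customer 12345 (2021 load, then 2022 load).
stored_2021 = {
    "id": "12345",
    "countryCode": "IN",
    "articles": [
        {"articleId": "123", "mainCategoryId": "Boots", "orderId": 12345},
        {"articleId": "456", "mainCategoryId": "Boots", "orderId": 12367},
    ],
}
load_2022 = {
    "id": "12345",
    "countryCode": "IN",
    "articles": [
        {"articleId": "123", "mainCategoryId": "Tshirt", "orderId": 12343},
        {"articleId": "456", "mainCategoryId": "Tshirt", "orderId": 12382},
    ],
}

result = upsert_customer(stored_2021, load_2022)
print([a["orderId"] for a in result["articles"]])
# -> [12345, 12367, 12343, 12382]
```

In a Spark job the same idea would presumably mean reading the existing documents, joining them with the new load on id, and combining the two arrays before writing, but I have not verified that approach here.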
The issue is that the load replaces the existing articles with the new ones instead of appending them:
{
"id": "12345",
"countryCode": "IN",
"articles": [
{
"articleId": "123",
"mainCategoryId": "Tshirt",
"subCategoryId": "ladies",
"purchasePrice": 500,
"orderId": 12343,
"transactionDate": 17889,
"size": "L"
},
{
"articleId": "456",
"mainCategoryId": "Tshirt",
"subCategoryId": "men",
"purchasePrice": 600,
"orderId": 12382,
"transactionDate": 17889,
"size": "L"
}
],
"_rid": "XXXXX",
"_self": "YYY",
"_etag": "ZZZ",
"_attachments": "attachments/",
"_ts": 1660823865
}