IndexError: string index out of range - ADF - Databricks - Python

Jered Willoughby 1 Reputation point
2023-02-03T17:44:09.69+00:00
I'm passing a UDF value from a JSON table through Azure Data Factory as the following:

"schema_array": "[('creationdate','timestamp'),('agent_name_txt','string'),('email','string'),('agent_hire_date','date'),('days_since_hire_text','int'),('department_auto','string')]"

But when my Databricks notebook accepts it, the value is interpreted as a string (and it needs to be treated as an array).

This works in my notebook when the value is hardcoded, but not when it's passed from ADF. Here's the data type when passed from ADF: print(type(schema_array)) gives <class 'str'>.

I need to know how to transform this (I'm in Python) so my notebook will accept it as an array. I've tried to set it as an array in ADF many different ways, but I get the following error: "The variable 'udf' of type 'String' cannot be initialized or updated with value of type 'Array'. The variable 'udf' only supports values of types 'String'".

I'm running it through a schema mapping function to transform my data frame:

import pyspark.sql.types as t

def string_to_datatype(datatype):
  if datatype == 'timestamp':
    final_datatype = t.TimestampType()
  elif datatype == 'integer' or datatype == 'int':
    final_datatype = t.IntegerType()
  elif datatype == 'date':
    final_datatype = t.DateType()
  else:
    final_datatype = t.StringType()
  return final_datatype

schema = t.StructType([])

for s in schema_array:
    schema = schema.add(t.StructField(s[0], string_to_datatype(s[1])))
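This is also where the IndexError in the title comes from: iterating over the raw string yields single characters, so `s[0]` is just that character and `s[1]` is out of range. A minimal sketch of the failure mode:

```python
# The value exactly as ADF delivers it to the notebook: a plain str
schema_array = "[('creationdate','timestamp'),('email','string')]"

# Iterating a str yields single characters, not (name, type) tuples
s = next(iter(schema_array))
print(repr(s))      # '[' -- a one-character string
print(repr(s[0]))   # '[' -- s[0] just returns the same character

try:
    s[1]            # there is no second character in a 1-char string
except IndexError as exc:
    print(exc)      # string index out of range
```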
1 answer

  1. BhargavaGunnam-MSFT 25,876 Reputation points Microsoft Employee
    2023-02-14T00:07:33.9066667+00:00

    Hello @Jered Willoughby,

    Welcome to the MS Q&A platform.

    One caveat up front: json.loads() only parses valid JSON, and the string you are passing is not valid JSON — JSON requires double quotes and has no tuple syntax, so json.loads() would raise a json.JSONDecodeError on it as-is. It would work if you reformatted the value on the ADF side as JSON (double quotes, nested lists instead of tuples). Here is an example:

    import json

    # JSON form: double quotes, nested lists instead of tuples
    schema_array = '[["creationdate","timestamp"],["agent_name_txt","string"],["email","string"],["agent_hire_date","date"],["days_since_hire_text","int"],["department_auto","string"]]'

    # Convert the string into a list
    schema_list = json.loads(schema_array)

    # Print the data type of the list
    print(type(schema_list))

    This outputs <class 'list'>, indicating that the string has been successfully converted into a list. You can then use the list in your Databricks notebook as needed.

    For the string exactly as you have it (single quotes, tuples), use the ast.literal_eval() method instead. Unlike eval(), it only evaluates Python literals, never arbitrary code, which makes it safe for externally supplied input such as an ADF parameter. Here is an example:

    import ast

    schema_array = "[('creationdate','timestamp'),('agent_name_txt','string'),('email','string'),('agent_hire_date','date'),('days_since_hire_text','int'),('department_auto','string')]"

    # Convert the string into a list of (name, type) tuples
    schema_list = ast.literal_eval(schema_array)

    # Print the data type of the list
    print(type(schema_list))

    This outputs <class 'list'>, and each element is a (column_name, type_name) tuple, so you can use schema_list in your Databricks notebook's schema-building loop as-is.
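Putting the pieces together, here is a sketch of the parsed list feeding the question's loop; the PySpark type objects are replaced with plain type-name strings so the snippet runs standalone (in the notebook, string_to_datatype would return t.TimestampType() and friends as in the original code):

```python
import ast

# The value as received from ADF (shortened for the example)
schema_array = ("[('creationdate','timestamp'),('email','string'),"
                "('days_since_hire_text','int')]")

# Parses Python literal syntax: single quotes and tuples are fine
pairs = ast.literal_eval(schema_array)
print(type(pairs))   # <class 'list'>

# Stand-in for the question's string_to_datatype(); in the notebook
# this would return t.TimestampType(), t.IntegerType(), and so on
def string_to_datatype(datatype):
    mapping = {'timestamp': 'TimestampType',
               'integer': 'IntegerType',
               'int': 'IntegerType',
               'date': 'DateType'}
    return mapping.get(datatype, 'StringType')

# s is now a (name, type) tuple, so s[0] and s[1] behave as intended
for s in pairs:
    print(s[0], '->', string_to_datatype(s[1]))
```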

    I hope this helps. Let me know if you have any other questions.
