IndexError: string index out of range - ADF - Databricks - Python

Jered Willoughby 1 Reputation point
2023-02-03T17:44:09.69+00:00
I'm passing a UDF value from a JSON table through Azure Data Factory as the following:

"schema_array": "[('creationdate','timestamp'),('agent_name_txt','string'),('email','string'),('agent_hire_date','date'),('days_since_hire_text','int'),('department_auto','string')]"

But when my Databricks notebook accepts it, the value is interpreted as a string (and it needs to be treated as an array).

This works in my notebook when the value is hardcoded, but not when it's passed from ADF. Here's the data type when passed from ADF: print(type(schema_array)) gives <class 'str'>.

I need to know how to transform this (I'm in Python) so my notebook will accept it as an array. I've tried to set it as an array in ADF many different ways, but I get the following error: "The variable 'udf' of type 'String' cannot be initialized or updated with value of type 'Array'. The variable 'udf' only supports values of types 'String'".

I'm running it through a schema mapping function to transform my data frame:

import pyspark.sql.types as t

def string_to_datatype(datatype):
  if datatype == 'timestamp':
    final_datatype = t.TimestampType()
  elif datatype == 'integer' or datatype == 'int':
    final_datatype = t.IntegerType()
  elif datatype == 'date':
    final_datatype = t.DateType()
  else:
    final_datatype = t.StringType()
  return final_datatype

schema = t.StructType([])

for s in schema_array:
    schema = schema.add(t.StructField(s[0], string_to_datatype(s[1])))
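This is also where the IndexError in the title comes from: iterating over the raw string yields single characters, so `s[0]` is just that character and `s[1]` is out of range. A minimal sketch of the failure mode:

```python
# The value exactly as ADF delivers it to the notebook: a plain str
schema_array = "[('creationdate','timestamp'),('email','string')]"

# Iterating a str yields single characters, not (name, type) tuples
s = next(iter(schema_array))
print(repr(s))      # '[' -- a one-character string
print(repr(s[0]))   # '[' -- s[0] just returns the same character

try:
    s[1]            # there is no second character in a 1-char string
except IndexError as exc:
    print(exc)      # string index out of range
```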
1 answer

  1. BhargavaGunnam-MSFT 25,876 Reputation points Microsoft Employee
    2023-02-14T00:07:33.9066667+00:00

    Hello @Jered Willoughby,

    Welcome to the MS Q&A platform.

    One caveat up front: json.loads() only parses valid JSON, and the string you are passing is not valid JSON — JSON requires double quotes and has no tuple syntax, so json.loads() would raise a json.JSONDecodeError on it as-is. It would work if you reformatted the value on the ADF side as JSON (double quotes, nested lists instead of tuples). Here is an example:

    import json

    # JSON form: double quotes, nested lists instead of tuples
    schema_array = '[["creationdate","timestamp"],["agent_name_txt","string"],["email","string"],["agent_hire_date","date"],["days_since_hire_text","int"],["department_auto","string"]]'

    # Convert the string into a list
    schema_list = json.loads(schema_array)

    # Print the data type of the list
    print(type(schema_list))

    This outputs <class 'list'>, indicating that the string has been successfully converted into a list. You can then use the list in your Databricks notebook as needed.

    For the string exactly as you have it (single quotes, tuples), use the ast.literal_eval() method instead. Unlike eval(), it only evaluates Python literals, never arbitrary code, which makes it safe for externally supplied input such as an ADF parameter. Here is an example:

    import ast

    schema_array = "[('creationdate','timestamp'),('agent_name_txt','string'),('email','string'),('agent_hire_date','date'),('days_since_hire_text','int'),('department_auto','string')]"

    # Convert the string into a list of (name, type) tuples
    schema_list = ast.literal_eval(schema_array)

    # Print the data type of the list
    print(type(schema_list))

    This outputs <class 'list'>, and each element is a (column_name, type_name) tuple, so you can use schema_list in your Databricks notebook's schema-building loop as-is.
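Putting the pieces together, here is a sketch of the parsed list feeding the question's loop; the PySpark type objects are replaced with plain type-name strings so the snippet runs standalone (in the notebook, string_to_datatype would return t.TimestampType() and friends as in the original code):

```python
import ast

# The value as received from ADF (shortened for the example)
schema_array = ("[('creationdate','timestamp'),('email','string'),"
                "('days_since_hire_text','int')]")

# Parses Python literal syntax: single quotes and tuples are fine
pairs = ast.literal_eval(schema_array)
print(type(pairs))   # <class 'list'>

# Stand-in for the question's string_to_datatype(); in the notebook
# this would return t.TimestampType(), t.IntegerType(), and so on
def string_to_datatype(datatype):
    mapping = {'timestamp': 'TimestampType',
               'integer': 'IntegerType',
               'int': 'IntegerType',
               'date': 'DateType'}
    return mapping.get(datatype, 'StringType')

# s is now a (name, type) tuple, so s[0] and s[1] behave as intended
for s in pairs:
    print(s[0], '->', string_to_datatype(s[1]))
```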

    I hope this helps. Let me know if you have any other questions.
