PySpark split using expr

Ryan Abbey 1,171 Reputation points
2022-01-17T23:01:30.61+00:00

According to the Spark/PySpark documentation, the "split" function can take 2 or 3 parameters, with the third being the maximum number of elements to create.


However, when trying to use this within the "expr" function, it claims that a maximum of two arguments can be provided. Which documentation should I be reading to identify correct usage?

As for the issue at hand: I'm trying to split a string into a maximum of 3 elements. If I can't do that using split, is there an alternative function? And if I have no choice but to use split, what's the easiest approach to bringing all the subsequent elements back into a string?

e.g. for simplicity

I start with A,B,C,D,E

ideally I want
A B C,D,E

but split will give me
A B C D E

what is the easiest way to recreate "C,D,E" given an unknown number of elements?
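For comparison, Spark's three-argument split (available from Spark 3.0) behaves much like Python's str.split with maxsplit: the final element keeps the un-split remainder. A minimal pure-Python sketch of that behaviour, including how to stitch the tail back together when only a full split is available:

```python
s = "A,B,C,D,E"

# Python's maxsplit counts splits, not elements: maxsplit=2 yields 3 elements,
# with the remainder left intact in the last one.
parts = s.split(",", 2)
print(parts)  # ['A', 'B', 'C,D,E']

# If only a full split is available, rebuild the tail with join:
full = s.split(",")
parts2 = full[:2] + [",".join(full[2:])]
assert parts2 == parts
```

Note the off-by-one difference in the two APIs: Spark's third argument is the maximum number of *elements* (3), while Python's maxsplit is the maximum number of *splits* (2).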

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2022-01-18T16:28:25.94+00:00

    Hi @Ryan Abbey ,

    Thank you for posting query in Microsoft Q&A Platform.

    Consider using a Spark pool with Apache Spark version 3.1 to run your notebook. This runtime has the split() function that takes all 3 parameters, as you mentioned. Kindly check the screenshot below.

    166073-image.png

    While creating the Spark pool, make sure that under additional settings you select Apache Spark version 3.1:
    166124-image.png

    Hope this helps. Please let us know if you have any further queries.


    Please consider hitting Accept Answer. Accepted answers help community as well.