Remove characters from column values in PySpark

Shambhu Rai 1,411 Reputation points
2022-08-18T18:32:32.387+00:00

Hi Expert,

How do I remove characters from column values in PySpark SQL?

E.g. gffg546, gfg6544

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Answer accepted by question author
  1. Dillon Silzer 60,736 Reputation points Volunteer Moderator
    2022-08-18T20:24:27.46+00:00

    Hi @Shambhu Rai

    You can do this by loading the Spark table into a pandas DataFrame and replacing the non-digit characters there:

    Example:

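    import pandas as pd

    # Load the Spark table (`spark` is the session the notebook provides)
    df = spark.table("your.table_name")

    # Convert to a pandas DataFrame; this collects all rows onto the driver
    pddf = df.toPandas()

    # Replace every run of non-digit characters with "", e.g. "gffg546" -> "546"
    pddf['ColumnName'] = pddf['ColumnName'].replace(regex=[r'\D+'], value="")

    # Show the cleaned DataFrame in the notebook
    display(pddf)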
    

    Output without replace:

    232574-image.png

    Output after using the replace function:

    232560-image.png

    Documentation for converting between Spark and pandas DataFrames:

    https://docs.databricks.com/spark/latest/spark-sql/spark-pandas.html
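Since the question asks about PySpark SQL, the same cleanup can also be done without converting to pandas. The following is a minimal sketch, assuming the goal is to keep only the digits. `strip_non_digits` and `strip_non_digits_spark` are hypothetical helper names, and the column name is whatever you pass in. The Spark helper relies on `pyspark.sql.functions.regexp_replace`; the plain-Python helper is only there to check the pattern locally.

```python
import re

# Character class written out explicitly so it behaves the same in
# Python's `re` and in the Java regex engine that Spark uses.
NON_DIGITS = r"[^0-9]+"


def strip_non_digits(value: str) -> str:
    """Plain-Python check of the pattern: drop everything except 0-9."""
    return re.sub(NON_DIGITS, "", value)


def strip_non_digits_spark(df, column: str):
    """Apply the same pattern to a Spark DataFrame column (hypothetical helper)."""
    # Imported here so the pattern check above runs without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn(column, F.regexp_replace(F.col(column), NON_DIGITS, ""))


print(strip_non_digits("gffg546"))  # 546
print(strip_non_digits("gfg6544"))  # 6544
```

In a notebook this might be called as `display(strip_non_digits_spark(spark.table("your.table_name"), "ColumnName"))`. The result stays a string column, so cast it if a numeric type is needed. Unlike `toPandas()`, this keeps the work distributed instead of collecting the table onto the driver.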

    --------------------------

    If this is helpful, please accept the answer.


5 additional answers

  1. Shambhu Rai 1,411 Reputation points
    2022-08-18T20:29:36.467+00:00

    Sir,
    One last point: how do I handle a value like 10-25 in the same column, when the output in that column should be 25?
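One possible way to handle the 10-25 case, offered as a sketch and not as part of the accepted answer: removing all non-digits would turn 10-25 into 1025, so instead extract only the last run of digits. This assumes "the last number in the value" is the rule wanted. `last_number` and `last_number_spark` are hypothetical helper names; the Spark helper uses `pyspark.sql.functions.regexp_extract`.

```python
import re

# Capture the final run of digits; anything after it must be non-digits.
LAST_NUMBER = r"([0-9]+)[^0-9]*$"


def last_number(value: str) -> str:
    """Plain-Python check: return the last run of digits, or "" if none."""
    match = re.search(LAST_NUMBER, value)
    return match.group(1) if match else ""


def last_number_spark(df, column: str):
    """Apply the same pattern to a Spark DataFrame column (hypothetical helper)."""
    # Imported here so the pattern check above runs without Spark installed.
    from pyspark.sql import functions as F

    # regexp_extract returns an empty string when the pattern does not match.
    return df.withColumn(column, F.regexp_extract(F.col(column), LAST_NUMBER, 1))


print(last_number("10-25"))    # 25
print(last_number("gffg546"))  # 546
```

Values that contain a single number, such as gffg546, still come out as 546, so the one expression covers both cases from the thread.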

