Remove characters from column values in PySpark

Question


Shambhu Rai 1,411

Hi Expert,

How can I remove the alphabetic characters from column values in PySpark SQL?

For example: gffg546, gfg6544

Dillon Silzer 57,831 Reputation points Volunteer Moderator

2022-08-18T19:16:57.44+00:00

Just to clarify, are you trying to remove the "ff" from all strings and replace it with "f"?
Shambhu Rai 1,411 Reputation points

2022-08-18T19:20:59.64+00:00

No, only the numeric values should remain, and values like 10-25 should be kept as they are:
546,654,10-25



Answer 1

Hi @Shambhu Rai

You can do this by reading the Spark table and converting it to a pandas DataFrame:

Example:

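import pandas as pd

# Load the Spark table and convert it to a pandas DataFrame.
df = spark.table("your.table_name")
pddf = df.toPandas()

# Remove every run of non-digit characters (\D+); note that this also removes hyphens.
pddf['ColumnName'] = pddf['ColumnName'].replace(regex=[r'\D+'], value="")

display(pddf)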

[Screenshots omitted: output without replace, and output after using the replace function]

Documentation for converting between Spark and pandas DataFrames:

https://docs.databricks.com/spark/latest/spark-sql/spark-pandas.html
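
Since the question also asks for values like 10-25 to stay unchanged, it may help to see what the `\D+` pattern does to such a value. The following is a minimal sketch using only Python's built-in `re` module, whose regex syntax the pandas call above follows. The alternative pattern `[^0-9-]+` is an assumption about what "keep 10-25 as it is" means (keep digits and hyphens, drop everything else); it is not part of the original answer.

```python
import re

# Sample values from the question, plus the range-style value mentioned in the comments.
samples = ["gffg546", "gfg6544", "10-25"]

# \D+ matches every run of non-digit characters, and a hyphen is a non-digit.
digits_only = [re.sub(r"\D+", "", s) for s in samples]

# [^0-9-]+ matches runs of characters that are neither digits nor hyphens,
# so the hyphen in "10-25" survives.
digits_and_hyphens = [re.sub(r"[^0-9-]+", "", s) for s in samples]

print(digits_only)         # ['546', '6544', '1025']
print(digits_and_hyphens)  # ['546', '6544', '10-25']
```

If hyphens can also appear inside the values that need cleaning (for example "abc-123"), this pattern would keep them too, so it would need adjusting to the actual data.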

--------------------------

If this is helpful please accept answer.

Answer 2

Shambhu Rai 1,411

Suggestion please

Answer 3

Hi @Shambhu Rai

You can process the PySpark table as a pandas DataFrame to remove non-numeric characters, as shown below:

Example code (replace the sample DataFrame with your PySpark data):

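import pandas as pd

# Sample data standing in for the real column values.
df = pd.DataFrame({
    'A': ['gffg546', 'gfg6544', 'gfg65443213123'],
})

# Remove every run of non-digit characters from column 'A'.
df['A'] = df['A'].replace(regex=[r'\D+'], value="")
display(df)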

Cited from: https://stackoverflow.com/questions/44117326/how-can-i-remove-all-non-numeric-characters-from-all-the-values-in-a-particular

----------------------------

If this is helpful please accept answer.

Answer 4

Shambhu Rai 1,411

Hi Expert,

How can I do this at the column level and keep values like 10-25 as they are in the target column? Instead of 'A', can we add a column?

Dillon Silzer 57,831 Reputation points Volunteer Moderator

2022-08-18T20:10:05.993+00:00

Are you calling a spark table or something else?
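
Assuming the data is already a Spark DataFrame, the column-level version asked about here could be sketched without converting to pandas, using `regexp_replace` and `withColumn` from PySpark. This is only a sketch: the column names `raw_value` and `clean_value`, the helper names `strip_letters`, `add_cleaned_column`, and `demo`, and the pattern `[^0-9-]+` (keep digits and hyphens) are all hypothetical choices for illustration, not something confirmed in the thread.

```python
import re

# Assumed pattern: any run of characters that is neither a digit nor a hyphen.
# The same pattern text is valid for Python's re and for Spark's Java regex engine.
NON_DIGIT_OR_HYPHEN = r"[^0-9-]+"


def strip_letters(value):
    """Plain-Python preview of the pattern, handy for checking it on single strings."""
    return re.sub(NON_DIGIT_OR_HYPHEN, "", value)


def add_cleaned_column(df, source_col, target_col):
    """Return df with a new column target_col holding the cleaned source_col values."""
    # Imported here so the preview helper above works even without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn(
        target_col,
        F.regexp_replace(F.col(source_col), NON_DIGIT_OR_HYPHEN, ""),
    )


def demo():
    """Build a tiny DataFrame with hypothetical column names and show the result."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("gffg546",), ("gfg6544",), ("10-25",)], ["raw_value"]
    )
    add_cleaned_column(df, "raw_value", "clean_value").show()
```

Calling `demo()` in a notebook or script with Spark available shows the original column next to the new one. Writing to a new column rather than overwriting the source keeps the raw values available for comparison.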

