Unable to read special character in Databricks

Shambhu Rai 1,411 Reputation points
2024-01-07T14:05:39.9766667+00:00

Hi experts, Databricks is unable to read a special character in a column of a CSV file. I tried reading it with UTF and ANSI encodings, but it is not working. For example, `Select col1 from table1` fails when the column contains a value with a special character, such as 'Säck'.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. Sedat SALMAN 14,170 Reputation points MVP
    2024-01-08T12:14:21.95+00:00

Hi, I have added a more detailed explanation to my previous answer.

    Azure Databricks offers different modes for handling malformed records when reading CSV files:

    • PERMISSIVE (default): Inserts nulls for fields that couldn't be parsed correctly.
    • DROPMALFORMED: Drops lines with fields that couldn't be parsed.
    • FAILFAST: Aborts reading if any malformed data is found.

    As described in the documentation:

    https://learn.microsoft.com/en-us/azure/databricks/query/formats/csv

    You can set the mode and the encoding at the same time:

    df = (spark.read
          .format("csv")
          .option("mode", "PERMISSIVE")   # How to handle malformed records
          .option("charset", "UTF-8")     # Encoding of the source file
          .load("path_to_your_csv_file"))
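    One point worth checking: if the file was actually saved as ANSI (Windows-1252 / Latin-1), reading it as UTF-8 will mangle characters like 'ä'. A minimal plain-Python sketch (no Spark required) of why the wrong encoding corrupts 'Säck':

    ```python
    # 'Säck' saved in a Windows-1252 ("ANSI") file is the bytes b'S\xe4ck'.
    raw = "Säck".encode("windows-1252")

    # Reading those bytes as UTF-8 fails: 0xe4 is not valid UTF-8 here,
    # so the character is replaced with U+FFFD.
    mangled = raw.decode("utf-8", errors="replace")   # 'S\ufffdck'

    # Reading with the matching encoding recovers the text intact.
    correct = raw.decode("windows-1252")              # 'Säck'
    ```

    If this matches your situation, try `.option("charset", "windows-1252")` (or `"ISO-8859-1"`) instead of UTF-8 when reading the CSV.
    
    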
    

0 additional answers

