ADF Copy Activity giving ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.time.DateTimeException:Invalid date 'February 29' as '200' is

Immaneni, Nanda 25 Reputation points
2024-11-25T14:27:18.02+00:00

We are using a Copy Activity to read Parquet files from a specific directory and write them as Parquet files to a target directory. While doing so, we get the following error:

{ "Code": 21000, "Message": "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.time.DateTimeException:Invalid date 'February 29' as '200' is not a leap year\ntotal entry:7\r\njava.time.LocalDate.create(LocalDate.java:429)\r\njava.time.LocalDate.of(LocalDate.java:269)\r\njava.time.LocalDateTime.of(LocalDateTime.java:361)

When I looked at the data in the Parquet file, I found a few rows with nearby dates, but none that is exactly February 29, 200 (no row has 02-29). Can you please help us understand this? The nearby values are:

0166-04-29T00:00:00.000+00:00

0200-03-01T00:00:00.000+00:00

0200-03-05T00:00:00.000+00:00

0200-03-06T00:00:00.000+00:00

0200-03-12T00:00:00.000+00:00

0200-03-13T00:00:00.000+00:00

0200-04-15T00:00:00.000+00:00

0200-04-23T00:00:00.000+00:00

0200-07-28T00:00:00.000+00:00

0200-08-01T00:00:00.000+00:00

0200-08-06T00:00:00.000+00:00

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Vinodh247 34,741 Reputation points MVP Volunteer Moderator
    2024-11-25T16:51:19.19+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

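    The error occurs because java.time.LocalDate cannot construct the date February 29 of the year 200. The java.time classes use the ISO-8601 (proleptic Gregorian) calendar, in which the year 200 is not a leap year because it is divisible by 100 but not by 400. The stack trace shows LocalDate.of being called from LocalDateTime.of, so the failure happens while a date is being built from year, month, and day fields, not while text is being parsed. Here's how to resolve the issue: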

    Possible Root Cause:

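    The error suggests there are rows in your dataset whose date values cannot be represented in Java's calendar. Even if the exact "February 29, 200" does not appear when you view the file, a stored value may still be decoded into that invalid date if the reader interprets it differently from the tool you used to inspect it (for example, under a different calendar system or time zone).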
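
To make the calendar point concrete, here is a minimal Python sketch. It assumes, without confirmation from the error message or from ADF documentation, that somewhere in the read path the date fields are derived under Julian calendar rules, possibly after a shift into a local time zone. The -5 hour offset and the helper name `to_julian_fields` are made up for illustration. Python's `datetime`, like java.time, uses the proleptic Gregorian calendar, so it rejects February 29, 200 in the same way.

```python
from datetime import date, datetime, timedelta, timezone


def is_leap_gregorian(year):
    """Leap-year rule of the proleptic Gregorian (ISO-8601) calendar."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_julian(year):
    """Leap-year rule of the Julian calendar: every fourth year."""
    return year % 4 == 0


def to_julian_fields(d):
    """Convert a proleptic Gregorian date to (year, month, day) in the Julian calendar."""
    jdn = d.toordinal() + 1721425  # Julian Day Number of the same day
    f = jdn + 1401
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


# The year 200 is a leap year only under the Julian rule.
print(is_leap_gregorian(200), is_leap_julian(200))  # False True

# The proleptic Gregorian calendar has no February 29, 200.
try:
    date(200, 2, 29)
except ValueError as exc:
    print("rejected:", exc)

# A value shown as 0200-03-01T00:00 UTC falls on the previous day once it is
# shifted to a negative UTC offset (assumed here: -5 hours) ...
utc_midnight = datetime(200, 3, 1, tzinfo=timezone.utc)
local = utc_midnight.astimezone(timezone(timedelta(hours=-5)))
print(local.date())  # 0200-02-28

# ... and that day is February 29, 200 when expressed in Julian fields.
print(to_julian_fields(local.date()))  # (200, 2, 29)
```

If something like this is happening, the row to look at is the one shown as 0200-03-01 in the question, not a row that literally contains 02-29.
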

    Steps to Troubleshoot and Resolve

    Validate the Data Source

    • Check your source Parquet file for date anomalies.
    • Pay attention to rows with near-boundary dates (e.g., dates close to February 29 or the year 200).
    • Use a tool like Apache Spark or Python's PyArrow to inspect the Parquet file for any inconsistencies.
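
The inspection step above could look roughly like the following sketch. The choice of which rows count as "near-boundary" is an assumption: it flags values within one day of March 1 in century years that are leap years only in the Julian calendar (such as 200). The file name, column name, and both function names are hypothetical. `load_column` needs the `pyarrow` package; `rows_near_boundary` is plain Python.

```python
from datetime import date, datetime


def julian_only_leap_year(year):
    """True for century years that are leap years in the Julian calendar only."""
    return year % 100 == 0 and year % 400 != 0


def rows_near_boundary(values, window_days=1):
    """Return (index, value) pairs within window_days of March 1 of such a year."""
    hits = []
    for index, value in enumerate(values):
        if value is None:
            continue
        # datetime is a subclass of date, so reduce it to its calendar day first.
        day = value.date() if isinstance(value, datetime) else value
        if not julian_only_leap_year(day.year):
            continue
        if abs((day - date(day.year, 3, 1)).days) <= window_days:
            hits.append((index, value))
    return hits


def load_column(path, column):
    """Read one column of a Parquet file into Python objects (requires pyarrow)."""
    import pyarrow.parquet as pq  # imported lazily so the helper above has no dependency

    # Microsecond units keep very old INT96 timestamps within range;
    # nanosecond timestamps cannot represent the year 200.
    table = pq.read_table(path, columns=[column], coerce_int96_timestamp_unit="us")
    return table.column(column).to_pylist()


# Hypothetical usage:
# suspects = rows_near_boundary(load_column("part-00000.parquet", "event_ts"))
```

Run against the values listed in the question, this would flag only the 0200-03-01 row.
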

    Apply Transformation in ADF

    Use a Data Flow in Azure Data Factory to transform the data before writing it to the target. Specifically:

    • Replace or correct invalid dates.
    • Apply a filter or transformation to identify and clean malformed data.

    Example Transformation:

    • Use a Conditional Split or Derived Column transformation in the data flow to check if the date is invalid. (If Condition is a pipeline activity and is not available inside a data flow.)
    • Replace invalid dates with a default or null value.
        
    Preprocessing with Spark

    If the ADF Data Flow is not sufficient or practical:

    • Load the Parquet file into a Spark environment.
    • Use Spark's DataFrame operations to correct or filter invalid dates.
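
As a rough PySpark sketch of the preprocessing idea: the function below nulls out values that fall on February 28/29 or March 1 of a century year that is a leap year only in the Julian calendar. That selection rule is an assumption about which rows are problematic, and the function name, paths, and column name are hypothetical. It requires `pyspark`, which is imported inside the function.

```python
def null_out_boundary_dates(src_path, dst_path, column):
    """Rewrite a Parquet dataset with suspicious boundary dates set to null."""
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("fix-parquet-dates").getOrCreate()
    df = spark.read.parquet(src_path)

    value = F.col(column)
    year = F.year(value)

    # Century years that are leap years in the Julian calendar only (100, 200, 300, 500, ...).
    julian_only_leap = (year % 100 == 0) & (year % 400 != 0)

    # Days adjacent to the leap day that exists in only one of the two calendars.
    near_boundary = ((F.month(value) == 2) & (F.dayofmonth(value) >= 28)) | (
        (F.month(value) == 3) & (F.dayofmonth(value) == 1)
    )

    # when() without otherwise() yields null for rows that do not match,
    # so suspicious rows become null and all other rows keep their value.
    cleaned = df.withColumn(column, F.when(~(julian_only_leap & near_boundary), value))
    cleaned.write.mode("overwrite").parquet(dst_path)


# Hypothetical usage:
# null_out_boundary_dates("/data/in", "/data/out", "event_ts")
```

Note that Spark 3.x may ask you to choose a datetime rebase mode when it reads Parquet files containing very old dates written by older writers. Check the Spark documentation for the exact setting in your version before relying on this sketch.
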
      
    Set the Correct Schema

    In ADF's copy activity, ensure the schema for date fields is explicitly defined and matches the data format to avoid misinterpretation.

    Use a Custom Java Runtime

    If dates must conform to a custom calendar system (e.g., Julian calendar):

    • Implement a custom processing layer in Java.
    • Alternatively, preprocess the data externally before ingestion into ADF.

    Log and Capture Errors

    Enable logging in ADF to identify which specific rows are causing the failure. This can help you zero in on problematic records.

    Preventive Measures

    • Data Validation in Source Systems: Ensure that the source system generating Parquet files validates dates to prevent invalid entries.
    • Schema Evolution: When dealing with legacy datasets, ensure schema definitions are updated and validated against the source data.

    Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.

