Azure Synapse: Error handling external file

Mårten Lindblad 1 Reputation point
2021-04-22T18:36:36.19+00:00

We run Azure Synapse serverless SQL on top of Azure Time Series Insights data that is stored as parquet.
It works well, except that the Time Series Insights service appends to the parquet files for up to 10 minutes at a time.
During those windows we get an error when querying the data:

Msg 15813, Level 16, State 1, Line 1 
Error handling external file: 'Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.'. File/External table name: 'dbo.foo'.

With OPENROWSET:

Msg 15813, Level 16, State 1, Line 1
Error handling external file: 'Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.'. File/External table name: 'https://foo.dfs.core.windows.net/env-foo/V=1/PT=Time/Y=2021/M=04/foo.parquet'.

Is there a way to ignore those files gracefully in Synapse?
Thanks.
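
For context, a valid parquet file both starts and ends with the 4-byte magic "PAR1", and it is the missing footer magic that the error above reports while Time Series Insights is still appending. One possible workaround is to pre-check candidate files on the client side before querying them; a minimal C# sketch of such a check (a hypothetical helper reading from a stream you open yourself, e.g. a downloaded blob) could look like this:

    // Minimal sketch, not a Synapse feature: test the 4-byte parquet footer magic.
    using System.IO;
    using System.Text;

    static class ParquetFooterCheck
    {
        // A finished parquet file ends with the ASCII bytes "PAR1"; a file that is
        // still being appended to typically does not yet have them.
        public static bool HasParquetFooter(Stream stream)
        {
            if (stream.Length < 8)
            {
                return false; // too small to hold a parquet header plus footer
            }

            var magic = new byte[4];
            stream.Seek(-4, SeekOrigin.End);            // the footer magic is the last 4 bytes
            int read = stream.Read(magic, 0, magic.Length);
            return read == 4 && Encoding.ASCII.GetString(magic) == "PAR1";
        }
    }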

Azure Synapse Analytics

1 answer

  1. Ebram Tawfik 1 Reputation point Microsoft Employee
    2022-08-16T23:05:33.523+00:00

    @MartinJaffer-MSFT I am getting the same error; however, the file is not corrupted:

    Msg 15813, Level 16, State 1, Line 1  
    Error handling external file: 'Invalid metadata in parquet file. Number of rows in metadata does not match actual number of rows in parquet file.'. File/External table name:  
    

    I was able to read the file fine using Parquet.Net (https://github.com/aloneguid/parquet-dotnet/tree/master); here is the code:

    // usings needed for this handler (Parquet.Net 3.x)
    using System;
    using System.IO;
    using System.Linq;
    using System.Windows;
    using Parquet;
    using Parquet.Data;

    private void btn_read_Click(object sender, RoutedEventArgs e)
    {
        txt_data.Text = "";
        using (Stream fileStream = File.OpenRead(txt_fileName.Text))
        {
            // open parquet file reader
            using (var parquetReader = new ParquetReader(fileStream))
            {
                // get the file schema (available straight after opening the parquet reader);
                // however, get only the data fields, as only they contain data values
                DataField[] dataFields = parquetReader.Schema.GetDataFields();

                // enumerate through the row groups in this file
                for (int i = 0; i < parquetReader.RowGroupCount; i++)
                {
                    // create a row group reader
                    using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                    {
                        // read all columns inside each row group (you have the option to read
                        // only the required columns if you need to)
                        DataColumn[] columns = dataFields.Select(groupReader.ReadColumn).ToArray();

                        for (int j = 0; j < columns.Length; j++)
                        {
                            txt_data.Text += dataFields[j].Name + ": \n";
                            // the .Data member contains a typed array of column data you can cast to the column's type
                            Array data = columns[j].Data;
                            foreach (var item in data)
                            {
                                txt_data.Text += item.ToString() + "\n";
                            }
                            txt_data.Text += "\n\n\n\n";
                        }
                    }
                }
            }
        }
    }
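
    For what it is worth, the error quoted above is about the row count recorded in the file metadata. A rough sketch of surfacing that mismatch with the same Parquet.Net 3.x API as in the handler above (assuming the RowCount property exposed by ParquetRowGroupReader) could look like this:

    // Rough sketch: sum the row counts the metadata claims per row group and
    // compare them with the number of values actually read from the first column.
    using (Stream fileStream = File.OpenRead(txt_fileName.Text))
    using (var parquetReader = new ParquetReader(fileStream))
    {
        DataField firstField = parquetReader.Schema.GetDataFields()[0];
        long metadataRows = 0;
        long actualRows = 0;

        for (int i = 0; i < parquetReader.RowGroupCount; i++)
        {
            using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
            {
                metadataRows += groupReader.RowCount;                         // rows the row-group metadata claims
                actualRows += groupReader.ReadColumn(firstField).Data.Length; // values actually present
            }
        }

        txt_data.Text = $"metadata rows: {metadataRows}, actual rows: {actualRows}";
    }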
    