Ingest parquet file from blob into ADX table using Python SDK

Bexy Morgan 260 Reputation points
2023-09-27T04:18:21.18+00:00

When I try to ingest a parquet file from a blob container into an ADX table using the Python SDK, the output shows the status as queued, but the parquet file data never appears in the ADX table.

User's image

Azure Data Explorer
An Azure data analytics service for real-time analysis on large volumes of data streaming from sources including applications, websites, and internet of things devices.

Accepted answer
  1. Sander van de Velde | MVP 32,726 Reputation points MVP
    2023-09-27T16:38:54.35+00:00

    Hello @Bexy Morgan ,

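    IKustoQueuedIngestClient (QueuedIngestClient in the Python SDK) is a 'fire-and-forget' client, so a status of "queued" only tells you the request was accepted, not that the data was ingested.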

    The ingestion operation on the client side ends by posting a message to an Azure queue. After the posting, the client job is done.

    For the client user's convenience, KustoQueuedIngestClient provides a mechanism for tracking the individual ingestion status.

    Check out this documentation regarding the Ingestion report level.

    A Python example is found here.

    The properties are set on KustoQueuedIngestionProperties (in the Python SDK, on IngestionProperties through its report_level and report_method arguments).

    Please be careful: asking for ingestion reports puts a lot of pressure on the system.

    Turning on positive notifications for every ingestion request for large volume data streams should be avoided, since this places an extreme load on the underlying xStore resources, which might lead to increased ingestion latency and even complete cluster non-responsiveness.

    So only turn it on for individual calls when diagnosing a problem; please do not make it part of regular ingestion.
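    For a one-off diagnostic run, the reporting described above could look roughly like this in Python. This is a minimal sketch, assuming a recent azure-kusto-ingest release (one where DataFormat lives in azure.kusto.data.data_format and BlobDescriptor accepts a URL without a size) and Azure CLI authentication. The ingest URI, database, table, and blob URL are placeholders you supply, and build_properties, drain_status, and ingest_and_wait are hypothetical helper names, not part of the SDK.

```python
import time


def build_properties(database, table):
    """Ingestion properties for a parquet blob, with status reporting enabled."""
    # SDK imports stay inside the functions so the plain helpers in this file
    # can be loaded without the SDK installed.
    from azure.kusto.data.data_format import DataFormat
    from azure.kusto.ingest import IngestionProperties, ReportLevel, ReportMethod

    return IngestionProperties(
        database=database,
        table=table,
        data_format=DataFormat.PARQUET,
        # Diagnostics only: report both failures and successes to the status queues.
        report_level=ReportLevel.FailuresAndSuccesses,
        report_method=ReportMethod.Queue,
    )


def drain_status(status_queues, batch_size=10):
    """Hypothetical helper: pop everything currently in both status queues."""
    successes, failures = [], []
    pairs = ((status_queues.success, successes), (status_queues.failure, failures))
    for queue, sink in pairs:
        while not queue.is_empty():
            batch = queue.pop(batch_size)
            if not batch:  # guard against a queue that reports non-empty but yields nothing
                break
            sink.extend(batch)
    return successes, failures


def ingest_and_wait(ingest_uri, database, table, blob_url, timeout_s=600, poll_s=15):
    """Queue one blob for ingestion, then poll the status queues for an outcome."""
    from azure.kusto.data import KustoConnectionStringBuilder
    from azure.kusto.ingest import BlobDescriptor, KustoIngestStatusQueues, QueuedIngestClient

    # Assumption: you are logged in with the Azure CLI; swap in your own auth method.
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(ingest_uri)
    client = QueuedIngestClient(kcsb)
    client.ingest_from_blob(
        BlobDescriptor(blob_url),
        ingestion_properties=build_properties(database, table),
    )

    status_queues = KustoIngestStatusQueues(client)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        successes, failures = drain_status(status_queues)
        if successes or failures:
            return successes, failures
        time.sleep(poll_s)  # batching can delay ingestion by several minutes
    return [], []
```

    Calling ingest_and_wait with your ingest endpoint, database, table, and a blob URL that includes a SAS token should return whatever messages arrived on the success and failure queues before the timeout; printing each failure message is usually enough to see why a parquet file was rejected.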

    I tried it out and was able to see an error on a queued ingestion:

    User's image

    Alternatively, you could enable diagnostic settings on the ADX cluster:

    Screenshot of the Diagnostic settings screen, on which you configure which monitoring data to collect for your Azure Data Explorer cluster.

    As you can see, ingestion batching can also be logged.


    If the response helped, do "Accept Answer". If it doesn't work, please let us know the progress. All community members with similar issues will benefit by doing so. Your contribution is highly appreciated.


1 additional answer

  1. Sander van de Velde | MVP 32,726 Reputation points MVP
    2023-09-27T06:42:24.3566667+00:00

    Hello @Bexy Morgan ,

    Queued ingestion is preferred over direct ingestion due to increased scalability.

    But if ingestion fails, how do you know?

    Did the call return an operation ID? If so, check:

      .show operations <operationId>
    

    It's always interesting to check the operations:

    Kusto maintains an internal log of running and historic operations that it processes, such as ingestion operations and data management operations.

    In addition, you can check the commands:

    .show commands
    

    This command can also provide some information:

    Currently, only some of the admin commands are covered by the commands table (.ingest, .set, .append, .set-or-replace, .set-or-append). Gradually, more commands are added to the commands table.
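
    Both management commands can also be run from Python rather than from the query editor. The following is a minimal sketch, assuming the azure-kusto-data package and Azure CLI authentication. The cluster URI and database name are placeholders, and show_operation_command, RECENT_COMMANDS, and run_mgmt are hypothetical names introduced here for illustration. Note that management commands go to the cluster's engine URI, not the ingest- endpoint.

```python
import uuid

# Commands started in the last hour; narrow the filter to suit your case.
RECENT_COMMANDS = ".show commands | where StartedOn > ago(1h)"


def show_operation_command(operation_id):
    """Build '.show operations <operationId>' for a single operation."""
    # Parsing as a GUID first keeps arbitrary text out of the command string.
    return f".show operations {uuid.UUID(str(operation_id))}"


def run_mgmt(cluster_uri, database, command):
    """Run a management command and return its rows as dictionaries."""
    # Imported here so the helpers above work without the SDK installed.
    from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

    # Assumption: you are logged in with the Azure CLI; swap in your own auth method.
    kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
    client = KustoClient(kcsb)
    response = client.execute_mgmt(database, command)
    return [row.to_dict() for row in response.primary_results[0]]
```

    For example, run_mgmt(cluster_uri, database, show_operation_command(operation_id)) returns the state of one operation, and run_mgmt(cluster_uri, database, RECENT_COMMANDS) lists recent commands so you can look for a failed ingestion into your table.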



