Stream appears to be forcibly closed after a fixed number of records when collecting billing data with the Azure Blob SDK

Sunyoup Park (박선엽) 105 Reputation points
2024-09-13T06:30:43.68+00:00

https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-download-java#download-from-a-stream

I am in the process of transitioning to the Azure CSP Billing Data API and have implemented data collection functionality using the Azure Blob SDK, as shown in the code below.

I am testing this functionality with multiple customer accounts, and in every case, the stream appears to be forcibly closed after exactly 1.5 million records are processed.

Data processing takes approximately 10 to 15 minutes, but I am unsure of the root cause. I am seeking clarification on whether this issue is related to the Azure Blob SDK or if there is an option to adjust the connection timeout settings. Could you provide guidance on this matter?

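import java.io.{BufferedReader, EOFException, IOException, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPInputStream
import scala.collection.mutable.ListBuffer
import com.fasterxml.jackson.core.JsonFactory
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.databind.exc.MismatchedInputException
import com.fasterxml.jackson.module.paramnames.ParameterNamesModule
import com.fasterxml.jackson.module.scala.DefaultScalaModule

try {
  // Open a streaming download of the gzip-compressed blob
  val blobStream = blobClient.openInputStream()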
  val gzipStream = new GZIPInputStream(blobStream)
  val inputStreamReader = new InputStreamReader(gzipStream, StandardCharsets.UTF_8)
  val bufferedReader = new BufferedReader(inputStreamReader)

  val jsonFactory = new JsonFactory()
  val jsonParser = jsonFactory.createParser(bufferedReader)

  val usageBlobMapper = new ObjectMapper()
  usageBlobMapper.registerModule(DefaultScalaModule)
  usageBlobMapper.registerModule(new ParameterNamesModule())
  usageBlobMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

  val blobUsageList = ListBuffer[CspApUsageBlob]()
  // Counts loop iterations (records attempted); used only in log messages
  var currentLine = 0
  
  while (!jsonParser.isClosed) {
    try {
      
      currentLine += 1

      if (jsonParser.nextToken() != null) {
        val blobUsage = usageBlobMapper.readValue(jsonParser, classOf[CspApUsageBlob])
        blobUsageList += blobUsage
      }

      // Data processing...

    } catch {
      case e: MismatchedInputException =>
        println(s"Error reading JSON data at line $currentLine: ${e.getMessage}")
        e.printStackTrace()

      case e: EOFException =>
        println(s"Reached end of JSON input at line $currentLine: ${e.getMessage}")
        e.printStackTrace()

      case e: Exception =>
        println(s"Unknown error at line $currentLine: ${e.getMessage}")
        e.printStackTrace()
    }

  }

  // Last data processing...

} catch {
  case e: IOException => e.printStackTrace()
}
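
One way to narrow down whether the stream stops at a fixed byte offset or after a fixed elapsed time is to count the compressed bytes consumed before the failure. The following is a minimal sketch using only the JDK; `CountingInputStream` and `bytesRead` are hypothetical names introduced here and are not part of the Azure SDK. Bytes skipped via `skip()` are not counted in this sketch.

```scala
import java.io.{ByteArrayInputStream, FilterInputStream, InputStream}

// Hypothetical helper (not part of the Azure SDK): counts the bytes pulled
// from the wrapped stream so that a failure can be tied to a byte offset.
final class CountingInputStream(underlying: InputStream)
    extends FilterInputStream(underlying) {

  private var count = 0L

  // Total number of bytes returned by read calls so far
  def bytesRead: Long = count

  override def read(): Int = {
    val b = super.read()
    if (b >= 0) count += 1 // -1 signals end of stream and is not counted
    b
  }

  override def read(buf: Array[Byte], off: Int, len: Int): Int = {
    val n = super.read(buf, off, len)
    if (n > 0) count += n
    n
  }
}

object CountingDemo {
  def main(args: Array[String]): Unit = {
    // An in-memory stream stands in for the blob stream in this demo
    val counting =
      new CountingInputStream(new ByteArrayInputStream(Array[Byte](1, 2, 3, 4, 5)))
    val buf = new Array[Byte](3)
    counting.read(buf) // bulk read of 3 bytes
    counting.read()    // single-byte read
    println(s"bytes read so far: ${counting.bytesRead}") // prints 4
  }
}
```

In the code above, the wrapper would go around the result of `blobClient.openInputStream()` before it is passed to `GZIPInputStream`, and `bytesRead` could be logged in the `catch` blocks together with the elapsed time.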


Accepted answer
  1. Nehruji R 7,801 Reputation points Microsoft Vendor
    2024-09-13T10:03:46.5366667+00:00

    Hello Sunyoup Park (박선엽),

    Greetings! Welcome to Microsoft Q&A Platform.

    If you are using an Azure SDK, the SDK may have an automatic retry configuration. For more information, see Retry guidance for Azure services.
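
For the Java Blob SDK used in the question, the retry behavior can be set on the client builder through `RequestRetryOptions`. The following is a minimal sketch, assuming the blob is reached through a single URL that already carries its credentials (for example a SAS URL); the `blobUrl` parameter, the object name, and every numeric value are placeholders chosen for illustration, not recommended settings.

```scala
import com.azure.storage.blob.{BlobClient, BlobClientBuilder}
import com.azure.storage.common.policy.{RequestRetryOptions, RetryPolicyType}

object RetryConfigSketch {
  // `blobUrl` is a hypothetical parameter: a full blob URL, assumed to
  // include whatever credentials (e.g. a SAS token) the blob requires.
  def buildClient(blobUrl: String): BlobClient = {
    // All values below are illustrative placeholders.
    val retryOptions = new RequestRetryOptions(
      RetryPolicyType.EXPONENTIAL,
      Integer.valueOf(5),             // maxTries: attempts per operation
      Integer.valueOf(120),           // tryTimeoutInSeconds: limit for one attempt
      java.lang.Long.valueOf(1000L),  // retryDelayInMs: initial backoff
      java.lang.Long.valueOf(30000L), // maxRetryDelayInMs: backoff ceiling
      null                            // secondaryHost: none
    )

    new BlobClientBuilder()
      .endpoint(blobUrl) // the builder parses container and blob name from the URL
      .retryOptions(retryOptions)
      .buildClient()
  }
}
```

Whether these settings affect the behavior described in the question is not established in this thread; the sketch only shows where the retry configuration lives.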

    Some resource providers return 429 to report a temporary problem. The problem could be an overload condition that isn't directly caused by your request. Or, it could represent a temporary error about the state of the target resource or dependent resource. For example, the network resource provider returns 429 with the RetryableErrorDueToAnotherOperation error code when the target resource is locked by another operation. To determine if the error comes from throttling or a temporary condition, view the error details in the response.
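
The throttling behavior described above can be handled on the client side by waiting and retrying when a 429 is returned. The following is a minimal sketch using only the Scala standard library; the `Response` type, `withThrottleRetry`, and the injectable `sleep` function are hypothetical names introduced for illustration and do not correspond to any Azure SDK type.

```scala
import scala.annotation.tailrec

object ThrottleRetrySketch {
  // Hypothetical response model; a real HTTP client exposes its own type.
  final case class Response(status: Int, retryAfterSeconds: Option[Int], body: String)

  // Calls `send` until it returns a non-429 response or `maxTries` attempts
  // have been made. Waits for the server-provided Retry-After value when
  // present; otherwise doubles `baseDelayMs` on each further attempt.
  def withThrottleRetry(
      maxTries: Int,
      baseDelayMs: Long,
      sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)
  )(send: () => Response): Response = {
    @tailrec
    def loop(attempt: Int): Response = {
      val res = send()
      if (res.status != 429 || attempt >= maxTries) res
      else {
        val delayMs = res.retryAfterSeconds
          .map(_ * 1000L)
          .getOrElse(baseDelayMs << (attempt - 1))
        sleep(delayMs)
        loop(attempt + 1)
      }
    }
    loop(1)
  }
}
```

Passing `sleep` as a parameter keeps the helper testable without real delays; in normal use the default, which calls `Thread.sleep`, applies.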

    For more information, refer to this article: https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling

    Troubleshooting API throttling errors

    The limits are documented here: https://learn.microsoft.com/en-us/azure/azure-subscription-service-limits (please see Subscription limits - Azure Resource Manager section). And you can see the 429-error code from here.

    Based on the documentation, you are currently allowed to make 15,000 read requests per hour against the Azure Resource Manager API.

    There is a similar discussion thread on Stack Overflow; please refer to the suggestions there.

    Hope this helps! Kindly let us know if the above helps or you need further assistance on this issue.


    Please don't forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

    1 person found this answer helpful.

0 additional answers
