Data Source to GPT Deployment

Ionut Dutescu 60 Reputation points
2024-03-25T14:12:20.5366667+00:00

Hello,

I was wondering how the model accesses the data when I add my own data. In the playground chat, the references show files from Blob Storage. However, I have only one file in the storage account, and the references show it split into multiple files. Why is this? It seems the model cannot read the whole file, and it is missing details from parts of it. How can I avoid this split?

Thanks!

Azure Blob Storage

An Azure service that stores unstructured data in the cloud as blobs.

Foundry Tools

Formerly known as Azure AI Services or Azure Cognitive Services, Foundry Tools is a unified collection of prebuilt AI capabilities within the Microsoft Foundry platform.

Answer accepted by question author

  1. Amira Bedhiafi 41,386 Reputation points MVP Volunteer Moderator
    2024-03-25T16:23:28.6833333+00:00

    Azure Blob Storage is a massively scalable object storage solution for large amounts of unstructured data. When you upload data to Blob Storage, it is stored as a "blob," which can then be accessed by various services, including Azure AI services, for different purposes like training or inference.

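    Your issue is most likely related to one of the following:

    • Data chunking: when you add your own data, the ingestion step splits each document into smaller chunks before indexing them, so a single file can appear as several references in the playground.
    • A limitation of the API you are using, for example a limit on how much content it can process at once.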
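
    To illustrate why one file can show up as several references, here is a minimal sketch of chunking. It is a simplified stand-in and not the service's actual implementation: the function name `chunk_words`, the word-based splitting, and the sizes are assumptions chosen for illustration (real pipelines typically count tokens, and their chunk size differs).

```python
# Illustrative only: a simplified stand-in for the chunking an ingestion
# pipeline performs. Names and sizes are hypothetical; real services
# usually split on tokens rather than words.

def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


# One "file" of 120 words becomes several chunks, each indexed separately.
document = " ".join(f"word{i}" for i in range(120))
chunks = chunk_words(document, chunk_size=50, overlap=10)
for i, chunk in enumerate(chunks, start=1):
    print(f"reference {i}: {len(chunk.split())} words")
```

    Each chunk is retrieved on its own, which is why the playground can list the same source file more than once and why an answer may draw on only part of the file.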

    If you need to avoid or manage the splitting of files, I recommend the following:

    • Manually split your large files into smaller segments before uploading them to Blob Storage.
    • If possible, adjust the sizes of your files to be within the processing limits of the Azure AI service you're using.
    • Some Azure services and third-party tools offer preprocessing capabilities that can help you prepare your data in a way that's optimized for use with AI models.
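
    As a rough sketch of the manual splitting suggested above, the following groups whole paragraphs into segments under a size limit, so that each segment ends on a paragraph boundary instead of in the middle of a section. The function name `split_by_paragraph`, the character-based limit, and the sample text are assumptions for illustration; pick a limit that suits the service you are using.

```python
# Illustrative only: pre-split a document on paragraph boundaries before
# uploading. The helper name and the character limit are hypothetical.

def split_by_paragraph(text: str, max_chars: int = 2000) -> list[str]:
    """Group whole paragraphs into segments of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own segment.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    segments: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the segment.
            segments.append(current)
            current = para
    if current:
        segments.append(current)
    return segments


sample = (
    "Intro paragraph.\n\n"
    "Details about section A.\n\n"
    "Details about section B."
)
segments = split_by_paragraph(sample, max_chars=45)
for i, segment in enumerate(segments, start=1):
    print(f"segment {i}: {len(segment)} characters")
```

    Each segment could then be saved and uploaded as its own file, so that related details stay together when the data is indexed.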
