Is Azure OpenAI HIPAA compliant for PDF inputs?

Thomas Wood 0 Reputation points
2025-12-03T16:12:17.5433333+00:00

Hi,

I am using Azure OpenAI and I would like to process a PDF containing PHI data.

I know that I am covered for HIPAA by the BAA for the text endpoint, and it doesn't extend to image inputs. Can I also process PDFs and be HIPAA compliant?

One thing that I am unsure of:

  1. I think OpenAI will do some image processing internally to handle PDFs - so would this count as images?
  2. There isn't a single API endpoint that I can find where I can just process a PDF. I have to do it in two calls: 1. - to create the file and 2. to query it. That means that the PDF gets stored on OpenAI's servers. Would it be HIPAA compliant if I deleted it always immediately afterwards?

Here's an example of my code:

 openai_file = client.files.create(
                file=open(path, "rb"),
                purpose="user_data"
            )

file_id = openai_file.id

json_data = {
'model': 'gpt-4.1',
'input': [
    {
        'role': 'user',
        'content': [
            {
                'type': 'input_file',
                'file_id': file_id,
            },
            {
                'type': 'input_text',
                'text':
                    prompt
            },
        ],
    },
],
"temperature": 0
}
response = requests.post('https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment}/responses', headers=headers, json=json_data)
Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Divyesh Govaerdhanan 10,065 Reputation points
    2025-12-04T01:28:22.1+00:00

    Hi Thomas Wood,

    Welcome to Microsoft Q&A,

    Today, I would not treat PDF file inputs containing PHI as HIPAA-covered in Azure OpenAI.

    For HIPAA workloads, the conservative and documented pattern is still:

    "Extract the text from the PDF in a HIPAA-eligible service, then send only text to Azure OpenAI under your BAA."

    In the new Responses API, PDF support is implemented via models that have vision capabilities. The official docs say:

    “Models with vision capabilities support PDF input… To help models interpret PDF content, both the extracted text and an image of each page are included in the model’s context… Only models that support both text and image inputs can accept PDF files as input.”

    So even though you pass a “PDF file”, under the hood it’s text + rendered page images, which falls into the vision path, not just the classic text-only path.

    Text inputs under a BAA → HIPAA-eligible

    Image/vision / audio-streaming modalities → not explicitly in HIPAA scope yet

    Since PDFs via the Responses API are processed as text + page images, they currently look much closer to the “image/vision PHI” scenario than the plain text scenario.

    When you upload the PDF via the Files API:

    1. The file is stored, processed, and then referenced by ID in the Responses call.
    2. Deleting the file afterwards is good security hygiene, but:
      • It does not change whether that feature is formally in HIPAA scope.
      • It does not guarantee that all derived artifacts (logs, abuse-monitoring traces, derived images) are instantly wiped.

    HIPAA eligibility is tied to whether the service + feature is in the in-scope list under Microsoft’s BAA, not only to your deletion behavior.

    Please upvote and accept the answer if it helps!!

    0 comments No comments

  2. SRILAKSHMI C 10,640 Reputation points Microsoft External Staff Moderator
    2025-12-04T04:11:37.8533333+00:00

    Hello Thomas Wood,

    Welcome to Microsoft Q&A and Thank you for reaching out.

    I understand that you're trying to find out if you can process PDFs containing protected health information (PHI) with Azure OpenAI while staying HIPAA compliant.

    Here's what you need to know:

    HIPAA and Azure OpenAI: Azure OpenAI can support HIPAA compliance, but it's crucial that you're operating under a Business Associate Agreement (BAA) with Microsoft. You mentioned that you've already established this for the text endpoint, which is a good first step.

    PDF Processing: According to the Azure documentation, PDF files can be processed by models with vision capabilities, and they include both text extraction and image data within the model's context. However, because PDFs contain both text and potentially visual data, they could be treated similarly to image inputs in some respects.

    Data Storage: You pointed out that you're required to make two API calls: one to upload the PDF and another to process it. While this means the PDF gets stored temporarily on OpenAI's servers, it's important to ensure that any PHI contained in the document is managed according to HIPAA regulations. If you delete the file immediately after processing, it reinforces your effort to maintain compliance, but it’s always advisable to consult your legal counsel to ensure this approach meets HIPAA standards.

    Best Practices: Make sure to review the Azure OpenAI data privacy documentation and ensure that your use case aligns with HIPAA compliance requirements. Also, as you noted, you’ll want to delete the PDFs immediately after extracting the information, and ensure you maintain encryption standards during transmission.

    Please refer this

    I Hope this helps. Do let me know if you have any further queries.


    If this answers your query, please do click Accept Answer and Yes for was this answer helpful.

    Thank you!

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.