Microsoft Graph API: write retrieved raw .pdf string content to a PDF file from Python

Tim 156 Reputation points
2022-10-25T12:45:52.623+00:00

I'm using the Microsoft Graph API to retrieve SharePoint document content from within a Python script. I search for documents with the https://graph.microsoft.com/v1.0/search/query endpoint and then try to retrieve the document content via https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/content. I want to write the content as a .pdf to blob storage for further processing.
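The search-then-download flow described above can be sketched as two pure helpers: one builds the search request body, the other turns search hits into /content URLs of the form used in the question. This is a sketch, not the Graph SDK: the helper names are hypothetical, and the response shape (value → hitsContainers → hits → resource, with the drive and site IDs under resource.parentReference) is assumed from driveItem search results and should be checked against a real response.

```python
from __future__ import annotations

GRAPH = "https://graph.microsoft.com/v1.0"


def build_search_body(query_string: str) -> dict:
    """Body for POST {GRAPH}/search/query, limited to driveItem (file) results."""
    return {
        "requests": [
            {"entityTypes": ["driveItem"], "query": {"queryString": query_string}}
        ]
    }


def content_urls(search_json: dict) -> list[str]:
    """Build a /content download URL for every driveItem hit in a search response.

    Assumes the response shape value -> hitsContainers -> hits -> resource,
    with the drive and site IDs under resource.parentReference.
    """
    urls = []
    for result in search_json.get("value", []):
        for container in result.get("hitsContainers", []):
            for hit in container.get("hits", []):
                resource = hit.get("resource", {})
                parent = resource.get("parentReference", {})
                site_id = parent.get("siteId")
                drive_id = parent.get("driveId")
                item_id = resource.get("id")
                if site_id and drive_id and item_id:  # skip hits missing an ID
                    urls.append(
                        f"{GRAPH}/sites/{site_id}/drives/{drive_id}"
                        f"/items/{item_id}/content"
                    )
    return urls
```

The body would be sent with something like requests.post(f"{GRAPH}/search/query", json=build_search_body("contract"), headers={"Authorization": f"Bearer {token}"}), where token is a hypothetical access token, and each resulting URL would then be fetched with requests.get.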

Now, when I call the content endpoint with the Python requests library, the endpoint returns the .pdf as a string, which I can read from response.text. This text looks as you would expect for .pdf content (snippet):

%PDF-1.7  
%����  
1 0 obj  
<</Type/Catalog/Pages 2 0 R/Lang(nl-NL) /StructTreeRoot 29 0 R/MarkInfo<</Marked true>>/Metadata 117 0 R/ViewerPreferences 118 0 R>>  
endobj  
2 0 obj  
<</Type/Pages/Count 2/Kids[ 3 0 R 24 0 R] >>  
endobj  
3 0 obj  
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 10 0 R/F3 12 0 R/F4 17 0 R/F5 19 0 R>>/ExtGState<</GS7 7 0 R/GS8 8 0 R>>/XObject<</Image9 9 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 594.96 842.04] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>  
endobj  
4 0 obj  
<</Filter/FlateDecode/Length 3438>>  
stream  
x��\mS�8 �N �A ��EX�$�s[T  

So I try to write this content to a file like this:

with open('pdffilefromsharepoint.pdf', 'w') as f:  
  f.write(response.text)  

This writes the PDF without error. However, when I open the document in a PDF reader, I get just two empty pages with no content at all. Moreover, when I compare the raw contents of the original SharePoint file with the .pdf file written from the Graph API response, they seem identical: the same number of lines, and apparently the same content line by line.

One notable difference is that the original document is only 68 KB, while the one written from the API content is 113 KB.
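The size growth and the blank pages are consistent with the binary parts of the PDF being decoded to text and re-encoded on write. A minimal sketch of that effect, assuming UTF-8 with errors="replace" as the codec (requests chooses the encoding for response.text itself, so the real one may differ): every byte that is not valid UTF-8 becomes U+FFFD, which takes three bytes when written back as UTF-8. The ASCII page structure survives, which would explain why the reader still finds two pages, but compressed streams such as the FlateDecode object above are altered and can no longer be decompressed. The helper names are hypothetical.

```python
import os
import tempfile


def text_roundtrip(raw: bytes) -> bytes:
    """Decode bytes to str and encode them again, as a text-mode write would.

    UTF-8 with errors="replace" is an assumption; requests picks the
    encoding for response.text itself, so the real codec may differ.
    """
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def binary_roundtrip(raw: bytes) -> bytes:
    """Write bytes to a temporary file in binary mode and read them back."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Hypothetical PDF-like payload: ASCII header, the usual binary marker line,
# then high bytes standing in for compressed stream data
sample = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n" + bytes(range(128, 256))

print(len(sample), len(text_roundtrip(sample)))  # the text round trip is larger
print(binary_roundtrip(sample) == sample)        # the binary round trip is lossless
```

The four binary marker bytes after the header turn into four U+FFFD characters, which matches the %���� line in the snippet.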

Has anyone tried something similar? Do I need a special package to write this content back to a .pdf from Python?

Also, isn't it possible to just get the bytes from the Graph API for a document?
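A minimal sketch of saving the bytes, assuming the requests library as in the question: requests keeps the undecoded response body in response.content (bytes), while response.text decodes it to str using a chosen or guessed encoding. Writing response.content to a file opened in binary mode stores the file unchanged. requests.get also follows the 302 redirect that the /content endpoint returns to its pre-authenticated download URL by default. The helper name save_pdf_bytes is hypothetical.

```python
from pathlib import Path


def save_pdf_bytes(response, path) -> int:
    """Save the raw body of a requests.Response (hypothetical helper).

    `response` is assumed to come from requests.get() on the
    .../items/{item_id}/content URL. Only .raise_for_status() and
    .content (the undecoded bytes) are used; .text is never touched.
    """
    response.raise_for_status()   # don't save a JSON error body as a "PDF"
    data = response.content       # bytes exactly as received, no text decoding
    Path(path).write_bytes(data)  # binary write, equivalent to open(path, "wb")
    return len(data)
```

Usage, with url being the .../items/{item_id}/content URL and token a hypothetical access token: save_pdf_bytes(requests.get(url, headers={"Authorization": f"Bearer {token}"}), "pdffilefromsharepoint.pdf"). For blob storage, the same response.content bytes could be passed to the storage client's upload call instead of a local file.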

Microsoft 365 and Office SharePoint Development
Microsoft Security Microsoft Graph

Accepted answer
  1. Tong Zhang_MSFT 9,251 Reputation points
    2022-10-26T09:18:15.53+00:00

    Hi @Tim ,

    According to my research and testing, when we use the Graph API https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/content, only strings are returned in the response.

    If you want to get the byte size of a document, please try the following Graph API request; the size is returned in bytes:

    https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}?$select=id,size
    


    Hope it can help you. Thanks for your understanding.
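The $select=id,size request returns the item's metadata, where size is the file length in bytes, not the file content itself. One way to use it, sketched below under the assumption that size matches the stored file stream (which may not hold for every file type), is to check that bytes downloaded from the /content endpoint arrived intact; a text-decoded copy like the 113 KB file in the question would fail this check. The helper names are hypothetical.

```python
GRAPH = "https://graph.microsoft.com/v1.0"


def metadata_url(site_id: str, drive_id: str, item_id: str) -> str:
    """driveItem metadata URL restricted to the id and size properties."""
    return f"{GRAPH}/sites/{site_id}/drives/{drive_id}/items/{item_id}?$select=id,size"


def download_is_complete(content: bytes, metadata: dict) -> bool:
    """Compare the downloaded byte count with the size Graph reports.

    Returns False when the metadata has no size property.
    """
    return len(content) == metadata.get("size")
```

In practice, metadata would be the parsed JSON of a GET on metadata_url(...) and content the response.content of the /content request.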




