Microsoft Graph API write raw retrieved .pdf string content to a pdf file from Python

Question

Tim 156

I'm using the Microsoft Graph API to retrieve SharePoint document content from within a Python script. I search for documents with the https://graph.microsoft.com/v1.0/search/query endpoint, and then attempt to retrieve the document content via https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/content. I want to write the content as a .pdf to blob storage for further processing.

Now, when I call the content endpoint with the Python requests library, I get the .pdf back as a string from the endpoint, which I can retrieve with response.text. This text looks as you would expect for .pdf content (snippet):

%PDF-1.7  
%����  
1 0 obj  
<</Type/Catalog/Pages 2 0 R/Lang(nl-NL) /StructTreeRoot 29 0 R/MarkInfo<</Marked true>>/Metadata 117 0 R/ViewerPreferences 118 0 R>>  
endobj  
2 0 obj  
<</Type/Pages/Count 2/Kids[ 3 0 R 24 0 R] >>  
endobj  
3 0 obj  
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 10 0 R/F3 12 0 R/F4 17 0 R/F5 19 0 R>>/ExtGState<</GS7 7 0 R/GS8 8 0 R>>/XObject<</Image9 9 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 594.96 842.04] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>  
endobj  
4 0 obj  
<</Filter/FlateDecode/Length 3438>>  
stream  
x��\mS�8 �N �A ��EX�$�s[T

so what I try to do is write this content to a file like:

with open('pdffilefromsharepoint.pdf', 'w') as f:  
  f.write(response.text)

This writes the PDF without error. However, when I open the document in a PDF reader I get just two empty pages with no content at all. Moreover, when I compare the raw contents of my original SharePoint file with the .pdf file written from the content returned by the Graph API, they seem to be identical: the same number of lines, and seemingly the same content line by line.

One notable thing is that the original document is just 68 KB, while the one written from the API content is 113 KB.
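
A possible explanation for the size difference, sketched under assumptions: response.text is the downloaded bytes decoded into a str, and bytes that are not valid in the guessed encoding are replaced with U+FFFD (the � characters in the snippet above). Writing that str in text mode re-encodes it, and in UTF-8 each U+FFFD takes three bytes, so the file grows and the binary streams inside the PDF are corrupted. The helper name round_trip_through_text is hypothetical, and UTF-8 is assumed on both sides; the encoding that requests actually guesses may differ.

```python
def round_trip_through_text(raw: bytes) -> bytes:
    """Simulate decoding binary data as text and writing it back out.

    Assumption: UTF-8 with errors="replace" stands in for whatever
    encoding the HTTP client guesses for the response body.
    """
    text = raw.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
    return text.encode("utf-8")                   # each U+FFFD is three bytes


# First two lines of a PDF, as in the bytes example later in this thread.
original = b"%PDF-1.7\r\n%\xb5\xb5\xb5\xb5\r\n"
damaged = round_trip_through_text(original)

print(len(original), len(damaged))  # the round-tripped data is longer
print(damaged == original)          # False: the original bytes are lost
```

The damaged data still shows the same lines in a text view, which would match the observation that both files look identical line by line even though their sizes differ.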

Has anyone tried to achieve something similar? Do I need a special package to write this content to a .pdf from Python?

Also, isn't it possible to just get the bytes from the Graph API for a document?

Tong Zhang_MSFT 9,251 Reputation points

2022-10-26T02:45:14.737+00:00

Hi @Tim ,

We are currently doing some research on this issue and will let you know as soon as possible.

Tim 156 Reputation points

2022-10-26T08:11:50.343+00:00

Thank you TongZhang, much appreciated!

Tim 156 Reputation points

2022-10-26T10:14:44.057+00:00

Hi @Tong Zhang_MSFT , so this gets me the size of the PDF file in bytes. What I'm trying to achieve is getting the actual bytes content from the endpoint instead of this string. So from the example content I receive now (also shown in original question):

%PDF-1.7 %��

I want to be able to get the bytes object by doing a conversion, or even better, by being able to retrieve the actual bytes for the file content from the https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/content endpoint or some other endpoint, so that it looks more like this:

b'%PDF-1.7\r\n' b'%\xb5\xb5\xb5\xb5\r\n'

(This is just the first two lines, but I want this for all the lines in the content.)

Now this ^^ I can actually write away. The string in the snippet in the original question I cannot.

Accepted answer

Answer 1

Tong Zhang_MSFT 9,251

Hi @Tim ,

According to my research and testing, when we use the Graph API https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/content, we can only get strings in the response.

If you want to get the size of a document in bytes, please try the following Graph API request; the size property is in bytes:

https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}?$select=id,size
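
If the size value is available, it could serve as a rough sanity check on a download. The sketch below compares it with the number of bytes actually received. The function name and the sample values are hypothetical; it assumes the metadata response has already been parsed from JSON into a dict with a size field, as requested by $select=id,size.

```python
def matches_reported_size(metadata: dict, content: bytes) -> bool:
    """Return True when the downloaded byte count equals the reported size.

    Assumption: `metadata` is the parsed JSON of the item request above,
    where `size` is the file size in bytes.
    """
    return metadata.get("size") == len(content)


# Hypothetical values for illustration only.
metadata = {"id": "example-item-id", "size": 17}
content = b"%PDF-1.7\r\n%\xb5\xb5\xb5\xb5\r\n"

print(matches_reported_size(metadata, content))
```

A mismatch is a hint rather than proof of corruption, since the reported size and the downloaded length are not guaranteed to agree for every file type.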

Hope it can help you. Thanks for your understanding.

If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

Tong Zhang_MSFT 9,251 Reputation points

2022-10-27T02:54:41.51+00:00

Hi @Tim ,

Thanks for your reply. Unfortunately, it is currently not possible to convert string content to bytes objects using the Graph API. We can only get the string content through the Graph API https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/content, and then convert it using Python code:

test_string = "GFG is best"
# printing original string
print("The original string : " + str(test_string))
# Using encode(enc)
# convert string to byte
res = test_string.encode('utf-8')
# print result
print("The byte converted string is : " + str(res) + ", type : " + str(type(res)))

Also, I recommend submitting feedback on this issue. Many features of our current products are designed and upgraded based on customer feedback. As requests like this increase, the feature may well be released in the future.

Thanks for your understanding.

Feedback: https://feedbackportal.microsoft.com/feedback/forum/ebe2edae-97d1-ec11-a7b5-0022481f3c80

Tong Zhang_MSFT 9,251 Reputation points

2022-10-31T07:53:23.157+00:00

Hi @Tim ,

Do you have any further questions about this thread? If you have any questions or updates, feel free to contact me.

Tim 156 Reputation points

2022-11-01T08:22:57.727+00:00

Thank you, this has been helpful. I eventually solved it by using response.content from the requests response. This gives the bytes object.
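
A minimal sketch of that approach, not a definitive implementation: read the undecoded body from response.content and write it with the file opened in binary mode ('wb'), so no text decoding or re-encoding takes place. The function name save_item_content and its parameters are hypothetical. It assumes session is a requests.Session (or anything with a compatible get method), that content_url is the /content URL from the question, and that the access token is obtained elsewhere.

```python
def save_item_content(session, content_url: str, access_token: str, path: str) -> int:
    """Download a file's raw bytes and write them to `path` unchanged.

    Assumption: `session` behaves like requests.Session, i.e. it has a
    get(url, headers=..., timeout=...) method returning a response with
    raise_for_status() and a bytes attribute named `content`.
    """
    response = session.get(
        content_url,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    response.raise_for_status()  # fail early on HTTP errors

    # 'wb' writes the bytes exactly as received; 'w' would expect a str.
    with open(path, "wb") as f:
        return f.write(response.content)
```

With the requests library this could be called as save_item_content(requests.Session(), content_url, token, 'pdffilefromsharepoint.pdf'). The same bytes object could also be handed to a blob storage client instead of a local file.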
