Missing characters when saving an Arabic-language MS Word document as PDF

Question

Anonymous

Folks

We are running into the following issue. (I am not a native Arabic speaker and cannot read or write Arabic, but I can work with the Unicode characters.) The steps to reproduce are as follows:

1. Create a new MS Word document.
2. Copy and paste the following string (as an example): "الأحكام والشروط"
3. The font is standard Arial, size 10.
4. Save the file as PDF using Save As, which is standard functionality in MS Word.
5. Open the PDF file and try selecting the text.

The following are the issues:

In MS Word, all the characters are proper Unicode. The code points (hexadecimal) for the above string are:

Unicode (hex)	Description
0627	ARABIC LETTER ALEF
0644	ARABIC LETTER LAM
0623	ARABIC LETTER ALEF WITH HAMZA ABOVE
062D	ARABIC LETTER HAH
0643	ARABIC LETTER KAF
0627	ARABIC LETTER ALEF
0645	ARABIC LETTER MEEM
0020	SPACE
0648	ARABIC LETTER WAW
0627	ARABIC LETTER ALEF
0644	ARABIC LETTER LAM
0634	ARABIC LETTER SHEEN
0631	ARABIC LETTER REH
0648	ARABIC LETTER WAW
0637	ARABIC LETTER TAH
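
A table like the one above can be produced programmatically rather than by hand. The following is a minimal Python sketch using only the standard library; the function name `code_points` is hypothetical, and the sample string is written with escape sequences for the same 15 characters listed in the table.

```python
import unicodedata

def code_points(text):
    """Return (hex code point, Unicode name) pairs, one per character."""
    return [(f"{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

# The sample string from the steps above, spelled out as escapes.
sample = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)

for cp, name in code_points(sample):
    print(cp, name, sep="\t")
```

Pasting the text copied from the PDF into `sample` instead gives the corresponding listing for the extracted text.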

When we open the PDF file created by MS Word, press CTRL+A (select all text), and inspect the copied text, the code points are as follows:

Unicode (hex)	Description	Remarks
0627	ARABIC LETTER ALEF
0644	ARABIC LETTER LAM
0623	ARABIC LETTER ALEF WITH HAMZA ABOVE
062D	ARABIC LETTER HAH
0643	ARABIC LETTER KAF
0627	ARABIC LETTER ALEF
0645	ARABIC LETTER MEEM
0020	SPACE
0648	ARABIC LETTER WAW
0627	ARABIC LETTER ALEF
0627	ARABIC LETTER ALEF	Original code point was 0644 (ARABIC LETTER LAM)
0634	ARABIC LETTER SHEEN
0631	ARABIC LETTER REH
0648	ARABIC LETTER WAW
0627	ARABIC LETTER ALEF	Original code point was 0637 (ARABIC LETTER TAH)
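
The two listings can also be compared position by position in code. This is a rough Python sketch, not a definitive tool: the function name `diff_positions` is hypothetical, and it assumes the extracted text has the same length and logical order as the original, which may not hold for every PDF viewer when copying right-to-left text.

```python
def diff_positions(original, extracted):
    """List (index, original hex, extracted hex) where the characters differ.

    Assumes both strings are aligned character for character.
    """
    return [
        (i, f"{ord(a):04X}", f"{ord(b):04X}")
        for i, (a, b) in enumerate(zip(original, extracted))
        if a != b
    ]

# Text as typed in MS Word (first table).
original = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645 "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)
# Text as copied from the PDF (second table).
extracted = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645 "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

for index, before, after in diff_positions(original, extracted):
    print(f"position {index}: {before} became {after}")
```

For the sample string this reports the two substitutions marked in the Remarks column (zero-based positions 10 and 14).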

As you can see, when the MS Word document was saved as PDF, certain characters were replaced automatically. This is a loss of data in the text of the PDF file.

Visually, the PDF file looks the same as the original; every character appears to be rendered correctly.

We have also tried Adobe Acrobat Professional to convert the Arabic document into PDF, and the issue remains the same.

On a 20-page document, when we compare the original MS Word characters with the corresponding text extracted from the PDF via copy and paste, about 17% of the characters are replaced. We cannot identify any pattern in the replacements.
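
One way to look for a pattern is to tally which code point is replaced by which, alongside the overall replacement rate. The sketch below is an illustration only: the helper names `replacement_rate` and `substitution_counts` are hypothetical, the two strings are the short sample from this question rather than the 20-page document, and the texts are assumed to be aligned character for character.

```python
from collections import Counter

def replacement_rate(original, extracted):
    """Percentage of positions whose character changed."""
    if len(original) != len(extracted):
        raise ValueError("texts differ in length; align them first")
    if not original:
        return 0.0
    changed = sum(a != b for a, b in zip(original, extracted))
    return 100.0 * changed / len(original)

def substitution_counts(original, extracted):
    """Count each (original hex, replacement hex) pair that occurs."""
    return Counter(
        (f"{ord(a):04X}", f"{ord(b):04X}")
        for a, b in zip(original, extracted)
        if a != b
    )

original = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645 "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)
extracted = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645 "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

print(f"{replacement_rate(original, extracted):.1f}% replaced")
for (before, after), count in substitution_counts(original, extracted).most_common():
    print(f"{before} -> {after}: {count}")
```

On a longer document, the most frequent pairs in the tally would show whether the same source characters are always replaced by the same targets.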

Request support for the above.

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

4 answers

Answer 1

Dear Doug & Paul

Thank you for your response.
Adobe Acrobat Professional doesn't solve the issue at all.
We tried creating the PDF file via the Foxit printer, and its quality is relatively better than all the other options we tried, including Nitro, CutePDF, and PDFFill.
We have yet to try the PrimoPDF printer that Doug suggested.

In all, Adobe Acrobat certainly doesn't solve the issue. Foxit is better, and we need to run more samples to see whether the character replacements made by Foxit follow any regular pattern.

We will continue our efforts, but in the meantime, if anyone has any pointers, they are most welcome.

Thank you

R

Answer 2

Folks

Thank you for giving us the pointers to investigate.
We have installed the latest version of Adobe Acrobat Professional and created the PDF.
When using MS Word, the PDF was generated by:

PDF Producer: Microsoft® Word 2010

PDF Version: 1.4 (Acrobat 5.x)

When using Adobe Acrobat Professional, the PDF was generated by:

PDF Producer: Acrobat Distiller 15.0 (Windows)

PDF Version: 1.5 (Acrobat 6.x)

In the PDF version 1.5 file, the Arabic characters with Unicode values 0647 and 06BE, both of which are forms of the letter HEH, still have issues (they are interchanged), but all other characters seem to come through correctly.
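
If the interchange of 0647 and 06BE turns out to be consistent, which is an assumption that would need checking against the original Word text, the extracted text could be corrected by swapping the two code points back. A minimal Python sketch with a hypothetical helper name `swap_heh_forms`:

```python
# Assumption: every 0647 in the extracted text should be 06BE and vice versa.
# U+0647 is ARABIC LETTER HEH; U+06BE is ARABIC LETTER HEH DOACHASHMEE.
HEH_SWAP = str.maketrans({"\u0647": "\u06BE", "\u06BE": "\u0647"})

def swap_heh_forms(text):
    """Exchange the two HEH forms; all other characters are left unchanged."""
    return text.translate(HEH_SWAP)

# Hypothetical usage: compare the corrected text with the original from Word.
# corrected = swap_heh_forms(text_copied_from_pdf)
# print(corrected == text_from_word)
```

If the comparison still fails after the swap, the interchange is not uniform and a plain swap is not a safe correction.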

This is by far the closest we could get to resolving this issue as of now.

Thank you for your support

Ritesh

Answer 3

Paul Edstein 82,861 Volunteer Moderator

I have seen similar behaviour with Greek characters being lost in documents saved as PDF with Word 2010. All content is preserved when using a PDF print driver (e.g. Adobe Acrobat Pro) to generate PDFs.

Answer 4

Doug Robbins - MVP - Office Apps and Services 323K MVP Volunteer Moderator

Try using the free PrimoPDF converter that will install itself as a printer and use it to "print" the file to a PDF and see if the issue still occurs.
