Missing characters when saving an Arabic-language MS Word document as PDF

Anonymous
2016-06-22T08:51:34+00:00

Folks

We are seeing the following issue. (I am not a native Arabic speaker and cannot read or write Arabic, but I can work with the Unicode code points.) The steps to reproduce are as follows:

  1. Create a new MS Word Document.
  2. Copy and paste the following string (as an example): "الأحكام والشروط"
  3. Set the font to standard Arial, size 10.
  4. Save the file as PDF using the standard Save As functionality in MS Word.
  5. Open the PDF file and try selecting the text.

The following are the issues:

  1. In MS Word, all the characters are correct Unicode. The code points (hexadecimal) for the string above are:

     Unicode  Description
     0627     ARABIC LETTER ALEF
     0644     ARABIC LETTER LAM
     0623     ARABIC LETTER ALEF WITH HAMZA ABOVE
     062D     ARABIC LETTER HAH
     0643     ARABIC LETTER KAF
     0627     ARABIC LETTER ALEF
     0645     ARABIC LETTER MEEM
     0020     SPACE
     0648     ARABIC LETTER WAW
     0627     ARABIC LETTER ALEF
     0644     ARABIC LETTER LAM
     0634     ARABIC LETTER SHEEN
     0631     ARABIC LETTER REH
     0648     ARABIC LETTER WAW
     0637     ARABIC LETTER TAH

  2. When we open the PDF file created by MS Word, press CTRL+A (select all text), and inspect the copied text, the code points are as follows:

     Unicode  Description                          Remarks
     0627     ARABIC LETTER ALEF
     0644     ARABIC LETTER LAM
     0623     ARABIC LETTER ALEF WITH HAMZA ABOVE
     062D     ARABIC LETTER HAH
     0643     ARABIC LETTER KAF
     0627     ARABIC LETTER ALEF
     0645     ARABIC LETTER MEEM
     0020     SPACE
     0648     ARABIC LETTER WAW
     0627     ARABIC LETTER ALEF
     0627     ARABIC LETTER ALEF                   Original code point was 0644
     0634     ARABIC LETTER SHEEN
     0631     ARABIC LETTER REH
     0648     ARABIC LETTER WAW
     0627     ARABIC LETTER ALEF                   Original code point was 0637

As you can see, when the MS Word document was saved as PDF, certain characters were replaced automatically. This is a loss of text data in the resulting PDF file.
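
The comparison in the two listings above can be reproduced with a short script. The following is a minimal Python sketch, not the procedure we actually used: the function names `describe` and `diff_positions` are illustrative, and the `extracted` string is built by hand from the second listing rather than read from a real PDF. It assumes the extracted text has the same length and logical character order as the original.

```python
import unicodedata

def describe(text):
    """Return (hex code point, Unicode name) for every character in text."""
    return [(f"{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

def diff_positions(original, extracted):
    """Return (index, original hex, extracted hex) wherever the two strings differ.

    Assumption: both strings are aligned position by position.
    """
    return [
        (i, f"{ord(a):04X}", f"{ord(b):04X}")
        for i, (a, b) in enumerate(zip(original, extracted))
        if a != b
    ]

# Original string as typed in MS Word.
original = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)

# Hand-built stand-in for the text copied out of the PDF (second listing).
extracted = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

for code, name in describe(original):
    print(code, name)

for index, was, now in diff_positions(original, extracted):
    print(f"position {index}: {was} was replaced by {now}")
```

For the sample string this reports two replacements, at positions 10 and 14 (counting from 0), matching the remarks in the second listing.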

Visually, the PDF file looks fine: every character appears to be rendered the same as in the original.

We have even tried Adobe Acrobat Professional to convert the Arabic document to PDF, and the issue remains the same.

For a 20-page document, we compared the original MS Word characters with the corresponding text extracted from the PDF via copy and paste, and found that about 17% of the characters had been replaced. We cannot identify any pattern in the replacements.
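
One way to measure the replacement rate and look for a pattern is to tally which code point was replaced by which. The following is a minimal Python sketch under stated assumptions: `substitution_stats` is a hypothetical helper, the two sample strings are the short example from this post (not the 20-page document), and the texts are assumed to align position by position, which may not hold if the PDF extraction reorders or drops characters.

```python
from collections import Counter

def substitution_stats(original, extracted):
    """Return (replacement rate, Counter of (original hex, extracted hex) pairs).

    Assumption: the two strings are aligned character by character.
    """
    pairs = Counter(
        (f"{ord(a):04X}", f"{ord(b):04X}")
        for a, b in zip(original, extracted)
        if a != b
    )
    rate = sum(pairs.values()) / len(original) if original else 0.0
    return rate, pairs

# Sample data: the example string and a hand-built copy with two replacements.
original = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)
extracted = (
    "\u0627\u0644\u0623\u062D\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

rate, pairs = substitution_stats(original, extracted)
print(f"replacement rate: {rate:.1%}")

# The most frequent substitutions come first; a dominant pair would hint at a pattern.
for (was, now), count in pairs.most_common():
    print(f"{was} -> {now}: {count}")
```

Run over a full document, a table like this would show whether a few substitution pairs dominate or whether the replacements are spread evenly.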

We request support for the above.

Microsoft 365 and Office | Word | For home | Windows

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

4 answers

  1. Anonymous
    2016-06-22T13:42:39+00:00

    Dear Doug & Paul

    1. Thank you for your response.
    2. Adobe Acrobat Professional doesn't solve the issue at all.
    3. We tried creating the PDF file via the Foxit printer, and its output is relatively better than all the other options we tried, including Nitro, CutePDF, and PDFFill.
    4. We have yet to try the solution suggested by Doug, using the PrimoPDF printer.

    In short, Adobe Acrobat certainly doesn't solve the issue. Foxit is better, and we need to run more samples to see whether the character replacements made by Foxit follow any regular pattern.

    We will continue our efforts, but in the meantime, if anyone has any pointers, they are most welcome.

    Thank you

    R

  2. Anonymous
    2016-06-23T08:25:18+00:00

    Folks

    1. Thank you for giving us the pointers to investigate.
    2. We have installed the latest version of Adobe Acrobat Professional and created the PDF.
    3. When using MS Word, the PDF was generated by:

    PDF Producer:   Microsoft® Word 2010

    PDF Version:  1.4 (Acrobat 5.x)

    4. When using Adobe Acrobat Professional, the PDF was generated by:

    PDF Producer:   Acrobat Distiller 15.0 (Windows)

    PDF Version:  1.5 (Acrobat 6.x)

    With PDF version 1.5, all characters seem to be converted correctly except for two Arabic characters, Unicode values 0647 and 06BE, both of which are forms of the letter HEH. These two are interchanged.

    This is by far the closest we have come to resolving this issue so far.

    Thank you for your support

    Ritesh

  3. Paul Edstein 82,861 Reputation points Volunteer Moderator
    2016-06-22T13:11:16+00:00

    I have seen similar behaviour with Greek characters being lost in documents saved as PDF with Word 2010. All content is preserved when using a PDF print driver (e.g. Adobe Acrobat Pro) to generate PDFs.

  4. Doug Robbins - MVP - Office Apps and Services 323K Reputation points MVP Volunteer Moderator
    2016-06-22T09:46:40+00:00

    Try the free PrimoPDF converter, which installs itself as a printer. Use it to "print" the file to a PDF and see if the issue still occurs.
