How to extract embedded files from a Word document to a folder

Anonymous
2010-12-17T11:26:26+00:00

I have a Word document with 10 embedded objects in Excel, PowerPoint, Project, and Word formats. These are the issues I am facing:

  1. I cannot directly copy each file into a specified folder. If we can save Outlook email attachments in one go, why can't we save Word attachments in one go?
  2. Even if I open each file and try to save it, it is not saved under its default document name.

I have to open each file and save it with a new name, which is very tedious and time-consuming.

Answer accepted by question author
  1. Anonymous
    2010-12-17T15:49:41+00:00

    If you are working with Word 2007-format documents (.docx or .docm), you should be able to pull the embedded files out of the package. For a simple manual process on a single document, rename it as whatever.zip, navigate to the word/embeddings folder within it, and copy the files you want. An automated process for multiple files is rather more involved.
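The manual rename-and-copy steps above can also be scripted. The following is only a minimal sketch in Python, not part of the original answer: the function name `extract_embeddings` and its parameters are made up for illustration, and it assumes the document is a valid .docx or .docm package. It opens the package as a zip archive (no renaming needed) and copies everything under word/embeddings to a folder of your choice.

```python
import zipfile
from pathlib import Path


def extract_embeddings(docx_path, out_dir):
    """Copy every file under word/embeddings/ in a .docx/.docm to out_dir.

    Hypothetical helper for illustration; returns the list of saved paths.
    """
    docx_path, out_dir = Path(docx_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # A .docx/.docm file is an ordinary zip archive, so zipfile can read it directly.
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Skip everything outside word/embeddings/ and any directory entries.
            if not name.startswith("word/embeddings/") or name.endswith("/"):
                continue
            target = out_dir / Path(name).name
            target.write_bytes(package.read(name))
            saved.append(target)
    return saved
```

For example, `extract_embeddings("report.docx", "extracted")` would write the embedded parts into an `extracted` folder. The files keep the generic names stored in the package (such as Microsoft_Excel_Worksheet1.xlsx or oleObject1.bin), not the names shown on their icons.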


    Enjoy,

    Tony

    www.WordArticles.com

    97 people found this answer helpful.

21 additional answers

  1. Anonymous
    2014-03-03T14:56:08+00:00

    It's actually easier than that! The following macro will extract all embedded media files from a docx or docm document, regardless of whether the apps associated with those objects are installed. After you select the folder to process, the code extracts the images from all docx & docm files in that folder and outputs them to a new 'DocMedia' folder in that folder. Each output file's name is prefixed with the parent document's name. If the files have media other than images embedded, these will be extracted too. Note that the macro only processes docx & docm files; doc files can't be processed this way. If you only want to process one file, you could put just that file in the folder to be processed, or modify the code to process only a selected file.
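The macro referred to above is not reproduced in this thread. As a rough illustration of the behaviour it describes, here is a hedged Python sketch, not the original VBA: the function name `extract_folder`, the underscore separator in the output names, and the choice to read both word/media and word/embeddings are assumptions made for this example.

```python
import zipfile
from pathlib import Path

# Package folders to copy from; an assumption about what "media" covers here.
PARTS = ("word/media/", "word/embeddings/")


def extract_folder(folder, out_name="DocMedia"):
    """Extract media and embedded parts from every .docx/.docm in folder.

    Output goes to folder/out_name, each file prefixed with the document name.
    Hypothetical helper for illustration; returns the saved file names.
    """
    folder = Path(folder)
    out_dir = folder / out_name
    out_dir.mkdir(exist_ok=True)
    saved = []
    for doc in sorted(folder.iterdir()):
        # Only zip-based Word formats; skip Word's "~$" lock files.
        if doc.suffix.lower() not in (".docx", ".docm") or doc.name.startswith("~$"):
            continue
        with zipfile.ZipFile(doc) as package:
            for name in package.namelist():
                if name.startswith(PARTS) and not name.endswith("/"):
                    target = out_dir / f"{doc.stem}_{Path(name).name}"
                    target.write_bytes(package.read(name))
                    saved.append(target.name)
    return saved
```

Because this only unpacks the zip package, it does not need the parent applications to be installed, but it also cannot recover the original file names of the embedded objects.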

    Yes, I'd seen that routine before. Unfortunately, it does not save the embedded files with their actual filename, but as generic filenames (Microsoft_Word_Document1, etc.). The filename is stored as the IconLabel of the InlineShapes.OLEFormat. It could be useful, but doesn't accomplish what I need for the files I work with.

  2. Paul Edstein 82,806 Reputation points Volunteer Moderator
    2014-03-04T06:57:34+00:00

    Since embedded objects (as opposed to linked objects) that aren't inserted as icons don't have an IconLabel, which I imagine is the vast majority of cases, you can't rely on that approach for anything other than a very small number of such objects. Furthermore, your macro only works with InlineShape objects in the document's main story; it doesn't work with Shape objects or content anywhere other than in the body of the document (e.g. in textboxes, headers, footers, footnotes, endnotes, etc.). And, as you've noted, its functionality with PDFs is limited; it also doesn't work for any other object whose parent application isn't installed on the host PC. The code I posted handles all embedded objects, regardless of the parent application and regardless of where they're found in the document.

  3. Anonymous
    2014-03-04T14:19:11+00:00

    Your code did not extract embedded pdf files (Insert object -> Display as icon). There were a couple of .bin files in the Embedded folder created, one of which might have been the embedded .pdf, but changing the extension to pdf did not work for either of them. It is still useful code, depending upon what you are dealing with and what you need to do.
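A likely reason renaming fails is that an oleObject .bin file is typically an OLE compound file that wraps the PDF rather than the PDF itself. The Python sketch below is a heuristic only, with hypothetical function names: it checks for the compound-file signature and then carves out the bytes between the first `%PDF-` marker and the last `%%EOF` marker. It assumes the PDF data is stored contiguously inside the container, which is not guaranteed; a proper compound-file parser would be more reliable.

```python
# Signature at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_container(blob):
    """Return True if the bytes start with the OLE compound file signature."""
    return blob.startswith(OLE_MAGIC)


def carve_pdf(blob):
    """Return the bytes from the first %PDF- to the last %%EOF, or None.

    Heuristic: assumes the PDF is stored in one contiguous run of bytes.
    """
    start = blob.find(b"%PDF-")
    end = blob.rfind(b"%%EOF")
    if start == -1 or end < start:
        return None
    return blob[start:end + len(b"%%EOF")]
```

To try it, read a .bin file with `Path("oleObject1.bin").read_bytes()`, pass the result to `carve_pdf`, and write any returned bytes to a .pdf file. If the result does not open, the data was probably fragmented inside the container.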

    This is pretty much the only type of embedded object I need to deal with, so the code I have works well for those, unless as you noted, you do not have the application(s).

    EDIT: I also had to add a line to your example to create the StrMediaFold folder. I was getting an error that it didn't exist at this line:

    FileCopy StrTmpFold & "\" & StrMediaFile, StrMediaFold & "\" & Split(Split(StrFileList, "|")(i), ".")(0) & StrMediaFile

  4. Anonymous
    2014-06-09T21:08:59+00:00

    You can just use this tool: https://github.com/Sicos1977/OfficeExtractor

    1 person found this answer helpful.