How to extract embedded files from word document in a folder

Anonymous
2010-12-17T11:26:26+00:00

I have a word document with 10 objects embedded – excel/ ppt/ project and word formats. Below are the issues that i am facing

  1. I cannot directly copy each file and place in a specified folder. If we can save Outlook email attachments at one go why are we unable to save word attachments at one go!!??
  2. Even if i open each file and try to save it does not save by default document name

I have to open each file and save with a new name, which is very tedious and time consuming.

Microsoft 365 and Office | Word | For home | Windows

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments No comments
{count} votes
Answer accepted by question author
  1. Anonymous
    2010-12-17T15:49:41+00:00

    If you are working with Word-2007 format documents, you should be able to pull the embedded files out of the package. For a simple manual process on a single document, rename it aswhatever.zip, and navigate to the word/embeddings folder within it and just copy the files you want. For an automated process for multiple files, it is rather more involved.


    Enjoy,

    Tony

    www.WordArticles.com

    97 people found this answer helpful.
    0 comments No comments

21 additional answers

Sort by: Most helpful
  1. HansV 462.4K Reputation points MVP Volunteer Moderator
    2014-02-03T15:40:32+00:00

    I'm afraid I don't know what else to suggest. Sorry.

    0 comments No comments
  2. Paul Edstein 82,806 Reputation points Volunteer Moderator
    2014-02-04T03:32:22+00:00

    Use Save As to save the document as a Word 2007 document (.docx, or if it contains macros, .docm).

    Then use the method described by Tony Jollans.

    After the above methods didn't work this was exactly what I tried with no luck

    Actually, it does work - when done as described. Do note that you have to open the file in Word, then save it in the docx or docm format. Simply changing the extension from doc to docx or docm doesn't change the file format and won't work.

    1 person found this answer helpful.
    0 comments No comments
  3. Anonymous
    2014-02-28T18:04:30+00:00

    This is a routine that I cobbled together from various references online to extract Word, Excel, Powerpoint, Visio and .pdf files embedded in Word docs. It is easily breakable, particularly the .pdf portion, and probably contains numerous mistakes and things that could be done better. It requires Acrobat full version, and will NOT work with Reader. It creates an Embedded Files folder and extracts all embedded files using their visible name in the Word doc they are embedded in. This sort of functionality should be built into Office programs, but obviously isn't. Any comments, etc. appreciated, just thought I would share.

    Sub ExtractEmbeddedDocs()

    Dim MyObj As Object

    Dim xlApp As Object

    Dim xlWkb As Object

    Dim myshape As Word.InlineShape

    Dim myFormat As Word.WdSaveFormat

    Dim WDDoc As Word.Document

    '    Dim embedObj As OLEObject

    Dim FileExtStr As String

    Dim FileFormatNum As Long

    Dim StrInFold As String, StrOutFold As String

    Dim StrDocFile As String, Obj_App As Object, i As Long

    Dim StrFile As String, StrFileList As String, StrMediaFile As String, j As Long

    Dim outFileName As String

    Dim SBar As Boolean

    Dim exten As String

    Dim embedCount As Integer, wordCount As Integer, excelCount As Integer, visioCount As Integer, pptCount As Integer

    Dim pdfCount As Integer

    Dim msg As String, temp As String

    Dim ok As Boolean

    Dim docs As Variant, doc As Variant    ', temp As Variant

    Dim AcroApp As Acrobat.CAcroApp

    Dim AcroPDDoc As Acrobat.CAcroPDDoc

    Dim AcroAVDoc As Acrobat.CAcroAVDoc

    Dim jso As Object

    StrInFold = ActiveDocument.Path

    If StrInFold = "" Then Exit Sub

    ' Store current Status Bar status, then switch on

    SBar = Application.DisplayStatusBar

    Application.DisplayStatusBar = True

    StrOutFold = StrInFold & "\Embedded Files"

    Application.ScreenUpdating = False

    '    On Error GoTo error_handler

    'Test for existing output folder, create if they don't already exist

    If Dir(StrOutFold, vbDirectory) = "" Then MkDir StrOutFold

    embedCount = ActiveDocument.InlineShapes.Count

    ' This opens the embedded documents, each in their own instance of the program

    For Each myshape In ActiveDocument.InlineShapes

    If (myshape.Type = wdInlineShapeEmbeddedOLEObject) Then

    If (InStr(myshape.OLEFormat.ClassType, "Word") > 0) Then

    myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ' Open the first embedded word doc

    ' Now I want to save it

    Set MyObj = GetObject(, "Word.Application")

    If MyObj Is Nothing Then

    ' Word is not running, create new instance

    Set MyObj = CreateObject("Word.Application")

    End If

    Set WDDoc = MyObj.ActiveDocument

    ' older Word files had " on each end of the IconLabel of the embedded file, resulting in a path name with "" in it

    ' this was causing an error 4148 when the SaveAs line executed. Go figure :)

    temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), ""), Chr(34), ""))

    If Right(temp, 3) = "doc" Then

    myFormat = wdFormatDocument

    Else

    Select Case Right(temp, 4)

    Case "docx"

    myFormat = wdFormatXMLDocument

    Case "docm"

    myFormat = wdFormatXMLDocumentMacroEnabled

    Case Else

    myFormat = wdFormatDocument

    End Select

    End If

    outFileName = StrOutFold & "" + temp '

    WDDoc.SaveAs2 FileName:=outFileName, FileFormat:=myFormat   'CompatibilityMode:=12    '

    WDDoc.Close savechanges:=False 'MyObj.Application.Documents.Item(1)

    wordCount = wordCount + 1

    Set WDDoc = Nothing

    '                MyObj.Quit ' will quit the existing instance of Word, don't do!

    Set MyObj = Nothing

    End If

    If (InStr(myshape.OLEFormat.ClassType, "Visio") > 0) Then

    myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ' Open the first embedded Visio file

    ' Now I want to save it

    Set MyObj = GetObject(, "Visio.Application")

    If MyObj Is Nothing Then

    ' Visio is not running, create new instance

    Set MyObj = CreateObject("Visio.Application")

    End If

    'Debug.Print MyObj.Application.Documents.Item(1) ' Prints the filename

    ' older Word files had " on each end of the IconLabel of the embedded file, resulting in a path name with "" in it

    ' this was causing an error 4148 when the SaveAs line executed. Go figure :)

    temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), ""), Chr(34), ""))

    outFileName = StrOutFold & "" + temp '

    MyObj.Application.Documents.Item(1).SaveAs FileName:=outFileName

    MyObj.Application.Documents.Item(1).Close 'savechanges:=False

    visioCount = visioCount + 1

    MyObj.Quit

    Set MyObj = Nothing

    End If

    If (InStr(myshape.OLEFormat.ClassType, "Excel") > 0) Then

    myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ' Open the first embedded Excel

    ' Now I want to save it

    Set xlApp = GetObject(, "Excel.Application")

    If xlApp Is Nothing Then

    ' Excel is not running, create new instance

    Set xlApp = CreateObject("Excel.Application")

    End If

    Set xlWkb = xlApp.Workbooks(1)

    ' older Word files had " on each end of the IconLabel of the embedded file, resulting in a path name with "" in it

    ' this was causing an error 4148 when the SaveAs line executed. Go figure :)

    temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), ""), Chr(34), ""))

    outFileName = StrOutFold & "" + temp '

    ' find file format from extension

    With xlApp.ActiveWorkbook

    If Val(xlApp.Application.Version) < 12 Then

    'You use Excel 97-2003

    FileExtStr = ".xls": FileFormatNum = -4143

    Else

    'You use Excel 2007-2013

    Select Case .FileFormat

    Case 51: FileExtStr = ".xlsx": FileFormatNum = 51

    Case 52:

    If .HasVBProject Then

    FileExtStr = ".xlsm": FileFormatNum = 52

    Else

    FileExtStr = ".xlsx": FileFormatNum = 51

    End If

    Case 56: FileExtStr = ".xls": FileFormatNum = 56

    Case Else: FileExtStr = ".xlsb": FileFormatNum = 50

    End Select

    End If

    End With

    xlApp.Application.DisplayAlerts = False

    xlApp.ActiveWorkbook.SaveAs FileName:=outFileName, FileFormat:=FileFormatNum 'xlWorkbookNormal -4143 xlWorkbookDefault 51

    xlApp.ActiveWorkbook.Close savechanges:=False

    Set xlWkb = Nothing

    xlApp.Quit

    Set xlApp = Nothing

    excelCount = excelCount + 1

    End If

    If (InStr(myshape.OLEFormat.ClassType, "PowerPoint") > 0) Then

    myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ' Open the first powerpoint file

    ' Now I want to save it

    Set MyObj = GetObject(, "PowerPoint.Application")

    If MyObj Is Nothing Then

    ' Powerpoint is not running, create new instance

    Set MyObj = CreateObject("PowerPoint.Application")

    End If

    ' older Word files had " on each end of the IconLabel of the embedded file, resulting in a path name with "" in it

    ' this was causing an error 4148 when the SaveAs line executed. Go figure :)

    temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), ""), Chr(34), ""))

    outFileName = StrOutFold & "" + temp '

    myshape.OLEFormat.Object.SaveAs FileName:=outFileName

    myshape.OLEFormat.Object.Close 'savechanges:=False

    pptCount = pptCount + 1

    MyObj.Quit

    Set MyObj = Nothing

    End If

    If (InStr(myshape.OLEFormat.ClassType, "Acro") > 0) Then

    myshape.OLEFormat.DoVerb (wdOLEVerbOpen) ' Open the first embedded pdf

    myshape.OLEFormat.Activate  ' probably not needed

    Set AcroAVDoc = CreateObject("AcroExch.AVDoc")

    Set AcroApp = CreateObject("AcroExch.App")

    '                If AcroApp Is Nothing Then

    '                ' Acrobat is not running, create new instance

    '                    Set AcroApp = CreateObject("AcroExch.App")

    '                End If

    Set AcroAVDoc = AcroApp.GetActiveDoc    ' get the logical doc

    Set AcroPDDoc = AcroAVDoc.GetPDDoc      ' get the physical doc

    'some code I found (KHK) for working with the javascript bridge, not needed here

    '                Set jso = AcroPDDoc.GetJSObject     ' get the javascript bridge

    '                docs = jso.app.activeDocs       ' get array of active docs

    '

    '                For Each doc In docs

    '                    If doc.documentFileName = AcroPDDoc.GetFileName Then

    '                        ' insert template document

    '                    End If

    '

    '                Next

    ' older Word files had " on each end of the IconLabel of the embedded file, resulting in a path name with "" in it

    ' this was causing an error 4148 when the SaveAs line executed. Go figure :)

    temp = Trim(Replace(Replace(myshape.OLEFormat.IconLabel, Chr(34), ""), Chr(34), ""))

    outFileName = StrOutFold & "" + temp '

    If AcroPDDoc.Save(PDSaveFull, outFileName) = False Then

    MsgBox "Cannot save document"

    End If

    AcroAVDoc.Close (1)

    AcroPDDoc.Close

    pdfCount = pdfCount + 1

    AcroApp.Exit

    Set AcroApp = Nothing

    Set AcroAVDoc = Nothing

    Set AcroPDDoc = Nothing

    End If

    End If

    Next myshape

    ' Clear the Status Bar

    Application.StatusBar = False

    ' Restore original Status Bar status

    Application.DisplayStatusBar = SBar

    Application.ScreenUpdating = True

    temp = "Embedded file counts" & vbCrLf & "Total " & vbTab & vbTab & embedCount & vbCrLf & "Word Files " & vbTab & wordCount & vbCrLf & _

    "Excel Files " & vbTab & vbTab & excelCount & vbCrLf & "Visio Files " & vbTab & vbTab & visioCount & vbCrLf & "PowerPoint Files " & vbTab & pptCount & vbCrLf

    temp = temp & "PDF Files " & vbTab & vbTab & pdfCount & vbCrLf & "Unknown files" & vbTab & embedCount - (wordCount + excelCount + visioCount)

    msg = temp

    msg = msg & vbCrLf & vbCrLf & "You should have " & vbTab & (wordCount + excelCount + visioCount + pptCount + pdfCount) & " files"

    MsgBox msg, vbInformation + vbOKOnly

    Exit Sub

    error_handler:

    If Err.Number Then

    MsgBox Err.Number & "  " & Err.Description, vbCritical + vbOKOnly

    End If

    If Err.Number = 1004 Then

    If Err.Description = "No cells were found." Then

    '            GoTo get_filename

    ElseIf Err.Description = "You cannot save this workbook with the " & _

    "same name as another open workbook or " & _

    "add-in. Choose a different name, or " & _

    "close the other workbook or add-in " & _

    "before saving." Then

    MsgBox "There is another file with the same name " & _

    "already open.  Please chose a different name " & _

    "for this file."

    '            GoTo get_filename

    End If

    End If

    Exit Sub

    pdfError:

    MsgBox "PDF files require Adobe Acrobat (not Reader) to work"

    Resume

    End Sub

    1 person found this answer helpful.
    0 comments No comments
  4. Paul Edstein 82,806 Reputation points Volunteer Moderator
    2014-03-01T06:48:42+00:00

    It's actually easier than that! The following macro will extract all embedded media files from a docx or docm document, regardless of whether the apps associated with those objects are installed. After selecting the folder to process, the code extracts the images from all docx & docm files in that folder and outputs them to a new 'DocMedia' folder in that folder. Each output file's name is prefixed with the parent document's name. If the files have media other than images embedded, these will be extracted too. Note that, the macro only processes docx & docm files - doc files can't be processed this way. If you only want to process one file, you could put just that file in the folder to be processed, or modify the code to process only a selected file.

    Sub ExtractDocResources()

    ' The following macro extracts the media and embedded objects from all docx & docm

    ' files in that folder and outputs them to new 'Media' and 'Embedded' folders in that folder.

    ' Each output file's name is prefixed with the parent document's name.

    '

    'Note: The macro only processes docx & docm files - doc files can't be processed this way

    ' (though they could be converted to the docx format for processing).

    '

    Application.ScreenUpdating = False

    Dim SBar As Boolean           ' Status Bar flag

    Dim StrInFold As String, StrEmbedFold As String, StrMediaFold As String, StrTmpFold As String

    Dim StrDocFile As String, StrZipFile As String, Obj_App As Object, i As Long, j As Long

    Dim StrFile As String, StrFileList As String, StrEmbedFile As String, StrMediaFile As String

    StrInFold = GetFolder

    If StrInFold = "" Then Exit Sub

    ' Store current Status Bar status, then switch on

    SBar = Application.DisplayStatusBar

    Application.DisplayStatusBar = True

    StrEmbedFold = StrInFold & "\Embedded"

    StrMediaFold = StrInFold & "\Media"

    StrTmpFold = StrInFold & "\Tmp"

    'Test for existing tmp & output folders, create they if they don't already exist

    If Dir(StrTmpFold, vbDirectory) = "" Then MkDir StrTmpFold

    If Dir(StrMediaFold, vbDirectory) = "" Then MkDir StrMediaFold

    If Dir(StrEmbedFold, vbDirectory) = "" Then MkDir StrEmbedFold

    'Create a Shell App for accessing the zip archives

    Set Obj_App = CreateObject("Shell.Application")

    'Look for docx files to process

    StrFile = Dir(StrInFold & "\*.doc?", vbNormal)

    'Build the file list

    While StrFile <> ""

      StrFileList = StrFileList & "|" & StrFile

      StrFile = Dir()

    Wend

    'process the file list

    j = UBound(Split(StrFileList, "|"))

    For i = 1 To j

      'ID the document to process

      StrDocFile = StrInFold & "" & Split(StrFileList, "|")(i)

      ' Report progress on Status Bar.

      Application.StatusBar = "Processing file " & i & " of " & j & ": " & StrDocFile

      'Define the zip name

      StrZipFile = Split(StrDocFile, ".")(0) & ".zip"

      'First process any embedded objects

      'In case the file is in use or zip file has no  embedded objects

      On Error Resume Next

      'Create the zip file, by simply copying to a new file with a zip extension

      FileCopy StrDocFile, StrZipFile

      'Extract the zip archive's embedded object files to the temporary folder

      Obj_App.NameSpace(StrTmpFold & "").CopyHere Obj_App.NameSpace(StrZipFile & "\word\embeddings").Items

      'Delete the zip file - the loop takes care of timing issues

      Do While Dir(StrZipFile) <> ""

        Kill StrZipFile

      Loop

      'Restore error trapping

      On Error GoTo 0

      'Get the temporary folder's file listing

      StrEmbedFile = Dir(StrTmpFold & "\*.*", vbNormal)

      'Process the temporary folder's files

      While StrEmbedFile <> ""

        'Copy the file to the output folder, prefixed with the source file's name

        FileCopy StrTmpFold & "" & StrEmbedFile, StrEmbedFold & "" & Split(Split(StrFileList, "|")(i), ".")(0) & StrEmbedFile

        'Delete the media file

        Kill StrTmpFold & "" & StrEmbedFile

        'Get the next media file

        StrEmbedFile = Dir()

      Wend

      'Next, process any media

      'In case the file is in use or zip file has no media

      On Error Resume Next

      'Create the zip file, by simply copying to a new file with a zip extension

      FileCopy StrDocFile, StrZipFile

      'Extract the zip archive's media files to the temporary folder

      Obj_App.NameSpace(StrTmpFold & "").CopyHere Obj_App.NameSpace(StrZipFile & "\word\media").Items

      'Delete the zip file - the loop takes care of timing issues

      Do While Dir(StrZipFile) <> ""

        Kill StrZipFile

      Loop

      'Restore error trapping

      On Error GoTo 0

      'Get the temporary folder's file listing

      StrMediaFile = Dir(StrTmpFold & "\*.*", vbNormal)

      'Process the temporary folder's files

      While StrMediaFile <> ""

        'Copy the file to the output folder, prefixed with the source file's name

        FileCopy StrTmpFold & "" & StrMediaFile, StrMediaFold & "" & Split(Split(StrFileList, "|")(i), ".")(0) & StrMediaFile

        'Delete the media file

        Kill StrTmpFold & "" & StrMediaFile

        'Get the next media file

        StrMediaFile = Dir()

      Wend

    Next

    'Delete the temporary folder

    RmDir StrTmpFold

    ' Clear the Status Bar

    Application.StatusBar = False

    ' Restore original Status Bar status

    Application.DisplayStatusBar = SBar

    Application.ScreenUpdating = True

    End Sub

    Function GetFolder() As String

    Dim oFolder As Object

    GetFolder = ""

    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)

    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path

    Set oFolder = Nothing

    End Function

    0 comments No comments