Share via

Split document into multiple documents

Anonymous
2013-03-04T21:56:32+00:00

I am using VB.NET. I have a function that reads a Word document and splits it into multiple documents, based on finding a certain paragraph of text sprinkled throughout the original document. Here is part of the code that I'm using:

Dim BookDoc As New Word.Document

BookDoc = WordApp.Documents.Open(FileName:=CType("C:\MyDoc.doc", Object), ReadOnly:=True, Visible:=False)

NumPgf = BookDoc.Paragraphs.Count

PgfNum = 0

While PgfNum < NumPgf

PgfNum += 1

Dim BookPgf As Word.Paragraph = BookDoc.Paragraphs(Index:=PgfNum)

PgfStr = BookPgf.Range.Text

' Determine whether to split document at this paragraph ...

End While

' Save final article document

SaveSplitDocument()

BookDoc.Close(SaveChanges:=False)

BookDoc = Nothing

GC.Collect()

The code was working fine until I encountered a document that contains a table in the middle of the content that needs to be split. I don't need to split the table into two pieces, but I need to be able to include the table in the new split document exactly in the same place in the flow of paragraphs where it was in the original. It makes me think that I need a different way to read through the document, not using the Paragraphs collection. But I'm not sure how to do this. Any suggestions? Thanks!

Microsoft 365 and Office | Word | For home | Windows

Locked Question. This question was migrated from the Microsoft Support Community. You can vote on whether it's helpful, but you can't add comments or replies or follow the question.

0 comments No comments
Answer accepted by question author
  1. Paul Edstein 82,861 Reputation points Volunteer Moderator
    2013-03-06T03:36:05+00:00

    Although it's quite possible to do nested searches in Word, it's also unnecessary in this case. Instead, one can use a single Find expression then test what's found. Try:

    Sub ExportTaggedRanges()

    Dim StrPath As String, StrName As String

    Dim NewDoc As Document, Rng As Range

    StrPath = GetFolder & ""

    With ActiveDocument

      Set Rng = .Range(0, 0)

      With .Range

        With .Find

          .ClearFormatting

          .Replacement.ClearFormatting

          .Text = "{[C/]{1,2}ontentI[!\}]{1,}}^13"

          .Replacement.Text = ""

          .Forward = True

          .Wrap = wdFindStop

          .Format = False

          .MatchWildcards = True

          .Execute

        End With

        Do While .Find.Found

          Rng.End = .End

          Rng.MoveEnd wdParagraph, -1

          If Rng.Paragraphs.First.Range.Text Like "{ContentID #####*}" & vbCr Then

            StrName = Replace(Replace(Rng.Paragraphs.First.Range.Text, "{ContentID ", ""), "}" & vbCr, "")

            Rng.Copy

            Set NewDoc = Documents.Add(Template:="Normal", Visible:=False)

            With NewDoc

              .Range.Paste

              .Range.Paragraphs.First.Range.Delete

              With .Range.Characters.Last

                While .Previous Like "[" & Chr(9) & "-" & Chr(14) & Chr(32) & Chr(160) & "]"

                  .Previous.Text = vbNullString

                Wend

              End With

              .SaveAs2 FileName:=StrPath & StrName, Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False

              .Close

            End With

          End If

          Rng.Collapse wdCollapseEnd

          .Collapse wdCollapseEnd

          .Find.Execute

        Loop

      End With

    End With

    End Sub

    Function GetFolder() As String

    Dim oFolder As Object

    GetFolder = ""

    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)

    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path

    Set oFolder = Nothing

    End Function

    Note: With the above code, the output document is saved with just the tag # for its name, and the tag is deleted from the body of the document. The former is managed by 'StrName = Replace(Replace(Rng.Paragraphs.First.Range.Text, "{ContentID ", ""), "}" & vbCr, "")'. The latter is managed by '.Range.Paragraphs.First.Range.Delete'

    0 comments No comments

5 additional answers

Sort by: Most helpful
  1. Anonymous
    2013-03-05T20:51:31+00:00

    Paul,

    That was very helpful. Thanks very much!

    I got your code to work in my program without any problems, I just need to make a few changes and I'm hung up again.

    The tags that are inserted throughout the document are like the following:

    {ContentID 12345} to start an article.

    {/ContentID} to end an article.

    Each of these tags is in a paragraph by itself. I need to be able to retrieve the article number from the starting tag. All of the Word content between these two tags constitutes an article and needs to be saved as a separate document.

    I thought I could modify your code to alternate searches to first look for .Find.Text = "{ContentID [0-9]{5,6}}^13^10" as a wildcard search. Then the second time through the loop it would look for .Find.Text = "{/ContentID}^13^10" and save the pasted range as a new document. However, it ends up with an error in the middle of processing the source document. Is it not possible to switch finds in the middle of handling a document range?

    If I remove the end tag search and always look for the starting tag as "{ContentID " (without wildcards), it works fine. It may be an issue with wildcard searching, but I'm not sure. Do I have to do more than set .MatchWildcards = True?

    0 comments No comments
  2. Paul Edstein 82,861 Reputation points Volunteer Moderator
    2013-03-05T01:20:06+00:00

    In that case, you'll need to copy and paste the ranges to a new document, preferably one based on the same template. This process can become fairly involved if the paragraphs are numbered, there are different page layouts, with headers, footers, etc involved.

    Document splitters based on Section breaks resulting from mailmerges are readily available (see, for example http://www.gmayor.com/individual_merge_letters.htm), but these usually don't need to concern themselves with different page layouts, etc, and they don't need to Find textual tags for their splitting cues.

    The following macro implements a basic tag-based splitting function, saving the documents to the folder of your choice and naming based on the source document's name and a sequence number. If needed, code to manage headers, footers, different page layouts, etc, could be added. The tags are assumed to be in paragraphs of their own and don't appear in the output. Simply replace <tag text> with your tag.

    Sub ExportTaggedRanges()

    Dim StrPath As String, StrName As String

    Dim NewDoc As Document, Rng As Range, i As Long

    StrPath = GetFolder & ""

    i = 1

    With ActiveDocument

      StrName = Split(.Name, ".")(0) & " "

      Set Rng = .Range(0, 0)

      With .Range

        With .Find

          .ClearFormatting

          .Replacement.ClearFormatting

          .Text = "<tag text>^p"

          .Replacement.Text = ""

          .Forward = True

          .Wrap = wdFindStop

          .Format = False

          .MatchCase = False

          .MatchWholeWord = False

          .MatchWildcards = False

          .MatchSoundsLike = False

          .MatchAllWordForms = False

          .Execute

        End With

        Do While .Find.Found

          If i > 10 Then Exit Sub

          Rng.End = .End

          Rng.MoveEnd wdParagraph, -1

          Rng.Copy

          Set NewDoc = Documents.Add(Template:="Normal", Visible:=False)

          With NewDoc

            .Range.Paste

            With .Range.Characters.Last

              While .Previous Like "[" & Chr(9) & "-" & Chr(14) & Chr(32) & Chr(160) & "]"

                .Previous.Text = vbNullString

              Wend

            End With

            .SaveAs2 FileName:=StrPath & StrName & Format(i, "000"), Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False

            .Close

          End With

          i = i + 1

          Rng.Start = .End

          .Collapse wdCollapseEnd

          .Find.Execute

        Loop

      End With

      Rng.End = .Range.End

      Rng.Copy

      Set NewDoc = Documents.Add(Template:="Normal", Visible:=False)

      With NewDoc

        .Range.Paste

        With .Range.Characters.Last

          While .Previous Like "[" & Chr(9) & "-" & Chr(14) & Chr(32) & Chr(160) & "]"

            .Previous.Text = vbNullString

          Wend

        End With

        .SaveAs2 FileName:=StrPath & StrName & Format(i, "000"), Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False

        .Close

      End With

    End With

    End Sub

    Function GetFolder() As String

    Dim oFolder As Object

    GetFolder = ""

    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)

    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path

    Set oFolder = Nothing

    End Function

    Note: The code contains a line 'If i > 10 Then Exit Sub' for testing - to limit the output. Dlete that before going into production!!

    0 comments No comments
  3. Anonymous
    2013-03-05T00:29:56+00:00

    Thank you for your reply!

    My Word document contains a certain user-inserted tag every so often. I want my program to read through the source document and every time it encounters one of these tags, start a new document and copy everything in the source document from this tag to the next one to the new document. When it encounters another tag, save and close the current target document and open a new one and start the process over again till the next tag.

    Between tags, there are usually "normal" paragraphs. However, there can be one or more tables. I want the tables to be copied intact along with the "normal" paragraphs to the target new document.

    So I don't need to detect whether a paragraph is in a table or not, but I need to be able to copy paragraphs and tables to a new document.

    I hope this additional explanation is clearer.

    0 comments No comments
  4. Paul Edstein 82,861 Reputation points Volunteer Moderator
    2013-03-04T23:12:03+00:00

    You can test where your paragraph is within a table, via the Information property, then either extend or contract the range, using something like the following for the test:

    If BookDoc.Paragraphs(Index:=PgfNum).Range.Information(12) = True Then ...

    Note: 12 = wdWithInTable

    0 comments No comments