
How to fix conversion error line breaks in Microsoft Word?

TJS 40 Reputation points
2026-01-08T22:29:46.2333333+00:00

I have a batch of Microsoft Word documents that were converted from other formats, including WordPerfect and PDF, several years ago. Some of these have many short lines in a row. For example, the text might look like

Bananas

Oranges

Apples

After each of these lines, I would expect to see a paragraph mark (hard return), which would make them parse as separate lines when converted to HTML. Instead, the converted Word files end each line with spaces and a sharp right indent, so when the document is converted to HTML, the text becomes:

Bananas Oranges Apples

which is not workable.

Is there a way that I can identify only spaces that are at the ends of these short lines, and replace them with paragraph marks? In the shorter files I can do it by hand, but some of these are hundreds of pages long. The only other thing I can think to do is to convert them to PDF and back to Word, which seems to fix the line breaks, but I'm not sure if that will cause other issues that are less visible.

Any thoughts appreciated!

Microsoft 365 and Office | Word | For education | Windows

Answer accepted by question author
  1. John Korchok 231.4K Reputation points Volunteer Moderator
    2026-01-08T23:53:06.3333333+00:00

    Thanks for posting. See if the following works for you. I asked Copilot:

    "Word" VBA regex find space followed by uppercase character and add return between space and character

    and it gave me the following VBA:

    Sub AddReturnBetweenSpaceAndUppercase()
        Dim regEx As Object
        Dim docRange As Range
        
        ' Create RegExp object (late-bound, so no library reference is needed)
        Set regEx = CreateObject("VBScript.RegExp")
        regEx.Pattern = " ([A-Z])"   ' Space followed by uppercase letter
        regEx.Global = True
        regEx.IgnoreCase = False     ' Case-sensitive (only uppercase)
        
        ' Work on the whole document
        Set docRange = ActiveDocument.Content
        
        ' Replace space+uppercase with paragraph mark + uppercase.
        ' VBScript.RegExp refers to captured groups as $1 in the replacement string.
        ' Caution: this also matches capitals in mid-sentence (e.g. "New York"),
        ' and assigning .Text discards the document's formatting.
        docRange.Text = regEx.Replace(docRange.Text, vbCr & "$1")
        
        MsgBox "Done! Paragraph breaks inserted.", vbInformation
    End Sub
    

    Here are the notes accompanying the code:

    How It Works

    Pattern:

    • " ([A-Z])" → matches a space followed by an uppercase letter (A–Z).
    • The uppercase letter is captured in group 1 ([A-Z]).

    Replacement:

    • vbCr & "$1" → replaces the space with a paragraph break (vbCr) and keeps the captured uppercase letter.

    Scope:

    • ActiveDocument.Content → applies to the entire document.

    Important Notes

    • This replaces all matches in the document at once.
    • If you want to preserve formatting, you should use Word’s Find with wildcards instead of replacing .Text directly (because .Text replacement removes formatting).
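
    Because the replacement changes every match at once, it may help to see how many matches there are before changing anything. The following is only a sketch of a dry run, reusing the same pattern as above; the macro name CountSpaceUppercaseMatches is made up for illustration, and it assumes the VBScript.RegExp object is available on your Windows installation.

    ```vba
    Sub CountSpaceUppercaseMatches()
        Dim regEx As Object
        Dim matches As Object

        ' Same pattern as the replacement macro: a space followed by A-Z
        Set regEx = CreateObject("VBScript.RegExp")
        regEx.Pattern = " ([A-Z])"
        regEx.Global = True          ' Find every match, not just the first
        regEx.IgnoreCase = False     ' Uppercase only

        ' Execute only reads the document text; it does not modify anything
        Set matches = regEx.Execute(ActiveDocument.Content.Text)

        MsgBox matches.Count & " matches found. Nothing was changed.", vbInformation
    End Sub
    ```

    If the count is far higher than the number of short lines you expect, the pattern is probably also catching capitals inside ordinary sentences, and a narrower approach may be needed.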

    Formatting-Safe Alternative (Word Wildcards)

    Sub AddReturnFormattingSafe()
        With Selection.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = " ([A-Z])"
            .Replacement.Text = "^p\1"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = True
            .Execute Replace:=wdReplaceAll
        End With
        MsgBox "Done! Paragraph breaks inserted.", vbInformation
    End Sub
    

    Explanation:

    • " ([A-Z])" with wildcards matches space + uppercase.
    • ^p\1 inserts a paragraph break before the uppercase letter.
    • This method keeps formatting intact.
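
    Since the pattern matches any space followed by a capital letter, including names in the middle of a sentence, one cautious option is to run the wildcard replacement only on a passage you have selected (for example, one block of short lines). This is a sketch under that assumption, not a tested fix for your files; the macro name AddReturnInSelectionOnly is hypothetical.

    ```vba
    Sub AddReturnInSelectionOnly()
        Dim rng As Range

        ' Require a real selection; with only an insertion point there is nothing to limit the search to
        If Selection.Type = wdSelectionIP Then
            MsgBox "Select the lines to fix first.", vbExclamation
            Exit Sub
        End If

        Set rng = Selection.Range
        With rng.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = " ([A-Z])"            ' Space + uppercase letter (wildcard search)
            .Replacement.Text = "^p\1"    ' Paragraph mark + the captured letter
            .Forward = True
            .Wrap = wdFindStop            ' Stop at the end of the range instead of continuing
            .Format = False
            .MatchWildcards = True
            .Execute Replace:=wdReplaceAll
        End With
    End Sub
    ```

    Running it on a copy of the document, and checking the result with formatting marks shown, is a reasonable precaution before applying it to hundreds of pages.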

Answer accepted by question author
  1. Charles Kenyon 166.5K Reputation points Volunteer Moderator
    2026-01-08T23:59:15.4233333+00:00

    Conversion from other programs is almost always imperfect.

    It may look OK, but the underlying structure will be very different from what the document would be if created in Word. Documents converted from PDF (or really any other format) to Word can be tough to edit because the conversion process never maps one-to-one onto how formatting is done under the hood. As a result, a converted document will seldom use Word's features well for its formatting. One example is multiple section breaks used to change margins, where in Word you would simply change the paragraph indent (see "Margins and Indents in Word"). Another is that text formatting in Word is best done with Styles, and a converted document will not use them; it will all be direct formatting. That can make a huge difference in how easy the document is to edit (see "The Importance of Styles in Microsoft Word").

     

    With PDF files, if possible, find the file from which the PDF was created and edit that file, using the program that created it. Then, if you need it in Word format and it is not, convert it directly to Word. This cuts out one conversion step and makes for fewer editing problems.

     

    When I really need the document in Word format and intend to do much editing, I create a new Word file and paste the content into it as plain text. Then I format it to match the original using Styles for the formatting as much as possible. This takes time; for me, it is worth it and saves a lot of frustration.
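
    For a batch of files, the paste-as-plain-text step could be scripted. The following is a minimal sketch, assuming you have already copied the content of the converted document to the clipboard; the macro name PasteClipboardAsPlainText is made up for illustration. Whether the short lines come through as separate paragraphs will depend on how the source file stored them, so check the result on one file first.

    ```vba
    Sub PasteClipboardAsPlainText()
        ' Create a new blank document; it becomes the active document
        Documents.Add

        ' Paste the clipboard contents as unformatted text,
        ' leaving the converted file's direct formatting behind
        Selection.PasteSpecial DataType:=wdPasteText
    End Sub
    ```

    Styles would then be applied by hand or by further macros, as described above.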


  1. TJS 40 Reputation points
    2026-01-08T22:52:57.7+00:00

    [Image attached by the poster; not reproduced in this text.]
