Word COM add-in: document sentence detection is wrong

Ajay Pandirkar 1 Reputation point
2022-11-25T08:18:07.953+00:00

In our Word add-in, we use document.Sentences from the Microsoft.Office.Interop.Word namespace to get the sentences in a document. The sentence detection is inaccurate: it splits a complete sentence into multiple sentences. Also, for some paragraphs, it skips the part of the sentence before a delimiter (',' or '.'), which produces wrong sentences.

e.g.

        On the downside, a lack of trust between female employees and leaders as the outcome of intra-gender micro-violence can lead to increased stress and isolation of both female managers and employees (O’Neil et al., 2018, p. 337). Furthermore, according to Derks et al.    

Paragraph Split Results:         

"sentence 1": " ,"

"sentence 2": "2018, p."

"sentence 3": "337)."

"sentence 4": "Furthermore, according to Derks et al."

 

Is there another way to get correct sentence detection?

Office Development

1 answer

  1. Oskar Shon 866 Reputation points MVP
    2022-11-25T13:36:09.813+00:00

    First, loop over the paragraphs in your document.
    Then, within each paragraph, loop over the text or split it into an array on ". " (a dot followed by a space), because a single dot does not reliably mark the end of a sentence.
    If you want to be 100% sure, you can also check whether the next sentence starts with a capital letter, a symbol, or a digit.

    Regards.
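
The heuristic described in this answer can be sketched roughly as follows. This is a minimal, language-agnostic illustration written in Python, not add-in code: it assumes you have already read each paragraph's text into a string (for example from the paragraph's Range.Text in the Word object model) and it only shows the splitting logic. The function name `split_sentences`, the `ABBREVIATIONS` set, and the `OPENERS` set are hypothetical names introduced here. The abbreviation check is an extra assumption that is not part of the answer; it is added because the "next character is a digit" rule alone would still split "p. 337" in the question's example.

```python
import re

# Hypothetical abbreviation list (an assumption, not from the answer).
# A period directly after one of these words is not treated as a sentence end.
ABBREVIATIONS = {"p", "pp", "al", "etc", "vs", "fig", "no"}

# Characters other than letters and digits that may open a new sentence.
OPENERS = set("\"'([\u201c\u2018")

# Candidate boundary: a period, then whitespace, then some non-space character.
_CANDIDATE = re.compile(r"\.\s+(?=\S)")


def split_sentences(paragraph: str) -> list[str]:
    """Split one paragraph into sentences using the dot-and-space heuristic."""
    sentences = []
    start = 0
    for match in _CANDIDATE.finditer(paragraph):
        next_char = paragraph[match.end()]
        # Accept the boundary only if the next sentence could start here:
        # capital letter, digit, or an opening quote or bracket.
        if not (next_char.isupper() or next_char.isdigit() or next_char in OPENERS):
            continue
        # Look at the word right before the period and skip known abbreviations.
        words = paragraph[start:match.start()].split()
        last_word = words[-1].strip("()[]\"'\u201c\u201d\u2018\u2019") if words else ""
        if last_word.lower() in ABBREVIATIONS:
            continue
        # Keep the period with the sentence it ends.
        sentences.append(paragraph[start:match.start() + 1].strip())
        start = match.end()
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Calling `split_sentences` on the paragraph from the question should keep "(O’Neil et al., 2018, p. 337)." inside the first sentence instead of breaking at "p." or after the comma. The trade-off is that a sentence that really ends with a listed abbreviation (for example "et al." followed by a new sentence) will not be split, so the list needs tuning for your documents.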
