Word COM-Addin document sentence detection is wrong.

Ajay Pandirkar 1 Reputation point
2022-11-25T08:18:07.953+00:00

In our Word COM add-in, we use document.Sentences from the Microsoft.Office.Interop.Word namespace to get the sentences in the document. The sentence detection is inaccurate: it splits a single complete sentence into multiple sentences. Also, for some paragraphs, it skips the part of the sentence before a delimiter (',' or '.'), which results in wrong sentence detection.

For example, given this paragraph:

        On the downside, a lack of trust between female employees and leaders as the outcome of intra-gender micro-violence can lead to increased stress and isolation of both female managers and employees (O’Neil et al., 2018, p. 337). Furthermore, according to Derks et al.    

Paragraph Split Results:         

"sentence 1": " ,"

"sentence 2": "2018, p."

"sentence 3": "337)."

"sentence 4": "Furthermore, according to Derks et al."

 

Is there another way to get correct sentence detection?

Microsoft 365 and Office | Development | Other

1 answer

  1. Oskar Shon 866 Reputation points
    2022-11-25T13:36:09.813+00:00

    First, loop over the Paragraphs in your document.
    Then, within each paragraph, loop over the text or split it into an array on ". " (a period followed by a space), because a single period on its own does not reliably mark the end of a sentence.
    To be more certain, you can also check that the next sentence starts with a capital letter, a symbol, or a digit.

    Regards.
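
The approach suggested in this answer can be sketched roughly as follows. This is a minimal, language-neutral illustration in Python, not the add-in's actual code; the same logic should port to C#, where the paragraph text would presumably come from each Paragraph.Range.Text. The function name split_sentences and the ABBREVIATIONS list are assumptions made for this sketch. The abbreviation check is an addition to the answer's heuristic: without it, "p. 337" in the question's example would still be split, because a digit follows the period.

```python
import re

# Assumption: abbreviations that end in a period without ending a sentence.
# Illustrative only, not exhaustive; extend it for your own documents.
ABBREVIATIONS = {"p.", "pp.", "al.", "e.g.", "i.e.", "etc.", "vs.", "cf."}

CLOSERS = "\"'”’)]"   # characters that may follow the final period
OPENERS = "\"'“‘(["   # characters that may start the next sentence

# Candidate boundary: '.', '!' or '?', optional closing quotes/brackets,
# whitespace, then an ASCII capital letter, a digit, or an opening sign.
BOUNDARY = re.compile(
    r"[.!?][" + re.escape(CLOSERS) + r"]*\s+(?=[A-Z0-9" + re.escape(OPENERS) + r"])"
)


def split_sentences(paragraph: str) -> list[str]:
    """Split one paragraph of plain text into sentences (heuristic)."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(paragraph):
        candidate = paragraph[start:match.end()].strip()
        # Look at the last word before the boundary, ignoring any
        # leading bracket or quote, and skip known abbreviations.
        last_word = candidate.split()[-1].lstrip(OPENERS).lower()
        if last_word in ABBREVIATIONS:
            continue
        sentences.append(candidate)
        start = match.end()
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    text = (
        "On the downside, a lack of trust can lead to increased stress "
        "(O’Neil et al., 2018, p. 337). Furthermore, according to Derks et al."
    )
    for number, sentence in enumerate(split_sentences(text), start=1):
        print(number, sentence)
```

On the shortened example above, this keeps "(O’Neil et al., 2018, p. 337)." attached to the first sentence and returns "Furthermore, according to Derks et al." as the second. It remains a heuristic: an abbreviation missing from the list, or a sentence that really does end in one, will still be split incorrectly.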

