Text Similarity Percentage in Document Files using ML.Net

Diya Rawat

I want to group PDF documents that are 80% or more similar, using the K-Means algorithm and ML.Net. I am reading the text from the PDF files. My requirement is that the files are grouped according to whatever similarity percentage the user enters: if the user enters 70%, the documents in a group should be at least 70% similar.

Suppose I have 10 PDF files and want to put similar documents into groups. If the user asks for 50% text similarity, documents that are at least 50% similar should be grouped together; if the user enters 80%, only documents with at least 80% similarity should be grouped together. Is it also possible for the number of groups (i.e. clusters) to be created automatically?
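To make the requirement concrete, here is a minimal sketch of the desired behaviour. It is written in Python with only the standard library, not ML.Net, and it rests on several assumptions that are not part of the question: similarity is taken to be cosine similarity over word counts, each document is compared only with the first document of a group, and the function names (`word_counts`, `cosine_similarity`, `group_by_threshold`) are hypothetical. K-Means itself needs the number of clusters up front, so this sketch uses simple threshold-based grouping instead, which lets the number of groups fall out of the data.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count its words (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors, in the range 0..1."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def group_by_threshold(texts, threshold_percent):
    """Assign each text to the first group whose first member is at least
    threshold_percent similar to it; otherwise start a new group.
    Returns a list of groups, each a list of indices into `texts`."""
    threshold = threshold_percent / 100.0
    vectors = [word_counts(t) for t in texts]
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            # Assumption: compare against the group's first document only.
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no group was similar enough
    return groups


docs = [
    "the cat sat on the mat",
    "the cat sat on the mat today",
    "stock prices fell sharply",
]
print(group_by_threshold(docs, 80))
print(group_by_threshold(docs, 95))
```

With the three sample strings, the first two are about 94% similar under this measure, so they share a group at an 80% threshold and separate at 95%. Text extracted from real PDF files would likely need a better representation (for example TF-IDF weighting) before the percentages become meaningful.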

I am new to ML.Net and to this algorithm. Please help and guide me. Thanks.