Text Similarity Percentage in Document Files using ML.NET

Diya Rawat 1 Reputation point

I want to group PDF documents that are 80% or more similar, using the K-Means algorithm and ML.NET. I am reading the text from the PDF files. My requirement is that the files are grouped according to whatever similarity percentage the user enters: if the user enters 70%, the documents in a group should be at least 70% similar.

Suppose I have 10 PDF files and want to put similar documents into groups. If the user wants 50% text similarity, documents that are at least 50% similar should be grouped together; if the user enters 80%, only documents with at least 80% similarity should be grouped together. Is it also possible to create the number of groups (that is, clusters) automatically?
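To make the requirement concrete, here is a minimal sketch of the grouping behavior described above. It is plain Python, not ML.NET code, and it does not use K-Means: K-Means takes the number of clusters as an input and has no similarity-threshold parameter, so the sketch swaps in threshold-based single-linkage grouping instead. It assumes the PDF text has already been extracted into strings and that "similarity" means cosine similarity over simple word counts; the function names (`term_vector`, `cosine_similarity`, `group_by_threshold`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one document (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors, in [0, 1]."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_threshold(texts, threshold):
    """Group document indices linked by pairs with similarity >= threshold.

    Two documents end up in the same group if they are connected, directly
    or through other documents, by pairs that meet the threshold
    (single-linkage). The number of groups is not fixed in advance.
    """
    vectors = [term_vector(t) for t in texts]
    parent = list(range(len(texts)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Assumed sample data standing in for text extracted from PDF files.
docs = [
    "the quick brown fox",
    "the quick brown dog",
    "completely different words here",
]
print(group_by_threshold(docs, 0.70))
print(group_by_threshold(docs, 0.80))
```

The first two sample documents share three of four words, so they fall into one group at a 70% threshold and into separate groups at 80%. Note that single-linkage only guarantees a chain of sufficiently similar pairs within a group; if every pair in a group must meet the threshold, a stricter rule such as complete-linkage would be needed.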

I am new to ML.NET and to these algorithms. Please help and guide me. Thanks.
