How to implement multiclass classification on tree-structured data using ML.NET

Zhi Li
2021-03-19T03:42:49.487+00:00

I have hundreds of projects, and each one has a tree data structure like this:

[Image: example project tree]

Or like this:

[Image: another example project tree]

Each project has its own tree structure, which is a modified version of a standard tree structure. What I am trying to do is map each project's tree structure onto the standard tree structure, like this:

[Image: a project tree mapped to the standard tree]

Or like this:

[Image: mapping to standard tree]

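The mapping depends on each node's text rather than on its level.
Currently I'm using multiclass classification in ML.NET. First, I manually map the existing projects' trees to the standard tree and save the results in a database. Each row is one project node: the Level columns hold the text along that node's path in the project tree, and Label is the standard-tree node it maps to (only the first three level columns are shown):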

| Label    | Level1 | Level2 | Level3 |
| -------- | ------ | ------ | ------ |
| A        | A      | *      | *      |
| A-AA     | A      | AA1    | *      |
| A-AA-AAA | A      | AA1    | AAA1   |
| A-BB     | A      | BB2    | *      |
| A-BB-BBB | A      | BB2    | BBB2   |
| A        | A      | *      | *      |
| A-AA-AAA | A      | AAA1   | *      |
| A-BB     | A      | BB2    | *      |
| A-BB-BBB | A      | BB2    | BBB2   |

Because a column in ML.NET cannot contain missing values, I replace them with *. My tree has 15 levels, so there are 15 feature columns.
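To make the row layout concrete, here is a minimal sketch in plain C# (no ML.NET types) of turning one project node's path into the 15 level columns, padding unused levels with *. The names `ToLevelColumns` and `LevelCount` are hypothetical helpers for illustration, not part of ML.NET or the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical: number of level (feature) columns, per the 15-level tree described above.
const int LevelCount = 15;

// Hypothetical helper: copy a node's path (Level1, Level2, ...) into a fixed-width
// row and fill the levels the path does not reach with "*".
static string[] ToLevelColumns(IReadOnlyList<string> path, int levelCount)
{
    if (path.Count > levelCount)
        throw new ArgumentException($"Path has {path.Count} levels; at most {levelCount} are supported.");

    var row = new string[levelCount];
    for (int i = 0; i < levelCount; i++)
        row[i] = i < path.Count ? path[i] : "*";
    return row;
}

// Example: project node A > BB2 > BBB2, manually labeled with standard node A-BB-BBB.
string label = "A-BB-BBB";
string[] features = ToLevelColumns(new[] { "A", "BB2", "BBB2" }, LevelCount);

// Print the label and the first four level columns.
Console.WriteLine($"{label} | {string.Join(" | ", features.Take(4))}");
```

Running it prints `A-BB-BBB | A | BB2 | BBB2 | *`, which matches the last row of the table above; levels 4 to 15 are all *.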

The multiclass classification trainer I chose is SdcaMaximumEntropy. I hope to use its predictions to map the trees automatically instead of doing it manually.

I have implemented prediction successfully, but the predictions are very poor.

So my questions are:

  1. Is this the right approach?
  2. If so, should I remove the duplicate rows, and should I replace missing values with *?
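If you do decide to drop duplicates, one way is to deduplicate the rows in plain C# with LINQ before loading them into ML.NET. This is a sketch using the sample rows from the table (first three levels only), not a recommendation either way. Exact duplicates add no new information, but since the trainer's loss sums over examples, they do give those examples extra weight. The sketch also flags rows whose level columns match but whose labels differ, since removing exact duplicates does not resolve such conflicts. The separator character used in the grouping key is an assumption (it must not appear in node text).

```csharp
using System;
using System.Linq;

// Sample rows from the table: label first, then Level1..Level3.
string[][] rows =
{
    new[] { "A",        "A", "*",    "*"    },
    new[] { "A-AA",     "A", "AA1",  "*"    },
    new[] { "A-AA-AAA", "A", "AA1",  "AAA1" },
    new[] { "A-BB",     "A", "BB2",  "*"    },
    new[] { "A-BB-BBB", "A", "BB2",  "BBB2" },
    new[] { "A",        "A", "*",    "*"    },
    new[] { "A-AA-AAA", "A", "AAA1", "*"    },
    new[] { "A-BB",     "A", "BB2",  "*"    },
    new[] { "A-BB-BBB", "A", "BB2",  "BBB2" },
};

// Arrays compare by reference, so group on a joined string key to find exact duplicates.
// Assumption: "\u001F" (unit separator) never appears in node text.
string[][] distinctRows = rows
    .GroupBy(r => string.Join("\u001F", r))
    .Select(g => g.First())
    .ToArray();

// Conflicts: identical level columns (everything after the label) but different labels.
var conflicts = distinctRows
    .GroupBy(r => string.Join("\u001F", r.Skip(1)))
    .Where(g => g.Select(r => r[0]).Distinct().Count() > 1)
    .ToArray();

Console.WriteLine($"{rows.Length} rows -> {distinctRows.Length} distinct, {conflicts.Length} conflicting feature sets");
```

For the sample rows it prints `9 rows -> 6 distinct, 0 conflicting feature sets`. Note that the two A-AA-AAA rows are not duplicates: one has AA1/AAA1 and the other has AAA1 at Level2.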

Thanks in advance.

C#
.NET Machine learning