I am looking to train a model to suggest tags/categories for a given text string.
eg: "the fox is weak and limping" = [1-animal],[34-weak],[2667-injury],[16-foot] (a list of tags each with probabilities generated by past associations)
This data would be trained from a data set of many instances of text each with a corresponding string representing the list of tags that match the text.
Is there a way to featurize the text AND the result tags? And apply an algorithm to cross reference them?
The closest I have come is the idea of duplicating each of the training data rows so that each row has only one tag at a time.
I have been researching this question for a week and am thinking the problem is how I am asking it! Everything I have read does not hint at an existing algorithm to match this use case so should I look towards manipulating the data to a different structure.
Any help greatly appreciated.