selectFeatures: Machine Learning Feature Selection Transform

Article
05/23/2023

The feature selection transform selects features from the specified variables using the specified mode.

Usage

  selectFeatures(vars, mode, ...)

Arguments

`vars`

A formula or a vector/list of strings specifying the names of the variables on which feature selection is performed, if the mode is minCount(). For example, ~ var1 + var2 + var3. If the mode is mutualInformation(), a formula or a named list of strings describing the dependent variable and the independent variables. For example, label ~ var1 + var2 + var3.

`mode`

Specifies the mode of feature selection. This can be either minCount or mutualInformation.

`...`

Additional arguments to be passed directly to the Microsoft Compute Engine.
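The two formula shapes accepted by `vars` differ only in whether a left-hand side is present. The following is a minimal base R sketch of that difference; it does not call MicrosoftML, and the helper names `isTwoSided`, `candidates`, and `response` are hypothetical, introduced only for illustration.

```r
# One-sided formula: only the variables to select from (the minCount() form).
countVars <- ~ var1 + var2 + var3

# Two-sided formula: label on the left, candidate variables on the right
# (the mutualInformation() form).
miVars <- label ~ var1 + var2 + var3

# A one-sided formula has two elements (`~` and the right-hand side);
# a two-sided formula has three (`~`, left-hand side, right-hand side).
isTwoSided <- function(f) length(f) == 3

# Hypothetical helpers: extract the candidate variables and the response.
candidates <- function(f) all.vars(f[[length(f)]])
response <- function(f) if (isTwoSided(f)) all.vars(f[[2]]) else character(0)

print(candidates(countVars))
print(response(miVars))
```

How selectFeatures parses these formulas internally is not described here; the sketch only shows what information each form carries.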

Details

The feature selection transform selects features from the specified variables using one of two modes: count or mutual information. For more information, see minCount and mutualInformation.
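To make the two modes concrete, here is a minimal base R sketch of the general ideas: keeping feature slots that have enough non-zero values, and ranking slots by their mutual information with a label. It is an illustration under stated assumptions, not the MicrosoftML implementation. The toy data and the function names `selectByCount`, `mutualInfo`, and `selectByMutualInfo` are hypothetical, and the real transforms may handle thresholds, binning, and ties differently.

```r
# Toy feature matrix: rows are examples, columns are feature slots.
x <- matrix(c(1, 0, 0, 1,
              1, 0, 1, 0,
              1, 0, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("f1", "f2", "f3", "f4")))
y <- c(TRUE, FALSE, TRUE, FALSE)

# Count-based selection: keep slots with at least `count` non-zero values.
selectByCount <- function(x, count = 1) {
  keep <- colSums(x != 0) >= count
  colnames(x)[keep]
}

# Mutual information (in bits) between two discrete vectors,
# estimated from their empirical joint distribution.
mutualInfo <- function(a, b) {
  joint <- table(a, b) / length(a)
  expected <- outer(rowSums(joint), colSums(joint))
  nz <- joint > 0  # cells with zero probability contribute nothing
  sum(joint[nz] * log2(joint[nz] / expected[nz]))
}

# Keep the slots with the highest mutual information with the label.
selectByMutualInfo <- function(x, y, numFeaturesToKeep = 2) {
  scores <- apply(x, 2, mutualInfo, b = y)
  n <- min(numFeaturesToKeep, ncol(x))
  names(sort(scores, decreasing = TRUE))[seq_len(n)]
}

print(selectByCount(x))  # f2 is all zeros, so it is dropped
print(selectByMutualInfo(x, y, numFeaturesToKeep = 2))
```

In this toy data, `f2` never takes a value and is removed by the count criterion, while `f3` and `f4` each determine the label completely and so carry the most mutual information with it.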

Value

A maml object defining the transform.

See also

minCount, mutualInformation

Examples


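 # rxLogisticRegression, categoricalHash and selectFeatures come from MicrosoftML.
 library(MicrosoftML)

 trainReviews <- data.frame(review = c(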
         "This is great",
         "I hate it",
         "Love it",
         "Do not like it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I kind of hate it",
         "I do like it",
         "I really hate it",
         "It is very good",
         "I hate it a bunch",
         "I love it a bunch",
         "I hate it",
         "I like it very much",
         "I hate it very much.",
         "I really do love it",
         "I really do hate it",
         "Love it!",
         "Hate it!",
         "I love it",
         "I hate it",
         "I love it",
         "I hate it",
         "I love it"),
     like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE),
     stringsAsFactors = FALSE)

 testReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I love it",
         "I do like it",
         "I really hate it",
         "I love it"), stringsAsFactors = FALSE)

 # Use a categorical hash transform, which generates 128 features (hashBits = 7, so 2^7 slots).
 outModel1 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
 summary(outModel1)

 # Apply a categorical hash transform and a count feature selection transform,
 # which selects only those hash slots that have a value.
 outModel2 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures("reviewCatHash", mode = minCount())))
 summary(outModel2)

 # Apply a categorical hash transform and a mutual information feature selection transform,
 # which selects only the 10 features with the largest mutual information with the label.
 outModel3 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures(like ~ reviewCatHash, mode = mutualInformation(numFeaturesToKeep = 10))))
 summary(outModel3)
