selectFeatures: Machine Learning Feature Selection Transform

The feature selection transform selects features from the specified variables using the specified mode.

Usage

  selectFeatures(vars, mode, ...)

Arguments

vars

A formula or a vector/list of strings specifying the names of the variables on which feature selection is performed, if the mode is minCount(). For example, ~ var1 + var2 + var3. If the mode is mutualInformation(), a formula or a named list of strings describing the dependent variable and the independent variables. For example, label ~ var1 + var2 + var3.

mode

Specifies the feature selection mode. This can be either minCount or mutualInformation.

...

Additional arguments to be passed directly to the Microsoft Compute Engine.

Details

The feature selection transform selects features from the specified variables using one of two modes: count or mutual information. For more information, see minCount and mutualInformation.
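
The two modes expect different forms for vars. A minimal sketch of constructing each transform specification (assuming the MicrosoftML package is attached; var1, var2, var3, and label are hypothetical column names):

  # Count mode: vars is an unnamed formula listing the candidate features;
  # only feature slots seen often enough in the data are kept.
  countTransform <- selectFeatures(~ var1 + var2 + var3, mode = minCount())

  # Mutual information mode: vars is a label ~ features formula; the features
  # most informative about the label are kept.
  miTransform <- selectFeatures(label ~ var1 + var2 + var3,
      mode = mutualInformation(numFeaturesToKeep = 10))

Either object can then be passed through the mlTransforms argument of a MicrosoftML learner, as in the Examples below.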

Value

A maml object defining the transform.

See also

minCount, mutualInformation

Examples


 trainReviews <- data.frame(review = c( 
         "This is great",
         "I hate it",
         "Love it",
         "Do not like it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I kind of hate it",
         "I do like it",
         "I really hate it",
         "It is very good",
         "I hate it a bunch",
         "I love it a bunch",
         "I hate it",
         "I like it very much",
         "I hate it very much.",
         "I really do love it",
         "I really do hate it",
         "Love it!",
         "Hate it!",
         "I love it",
         "I hate it",
         "I love it",
         "I hate it",
         "I love it"),
      like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, 
         FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
     )

 testReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I love it",
         "I do like it",
         "I really hate it",
         "I love it"), stringsAsFactors = FALSE)

 # Use a categorical hash transform, which generates 128 features (hashBits = 7, so 2^7 hash slots).
 outModel1 <- rxLogisticRegression(like ~ reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
 summary(outModel1)

 # Apply a categorical hash transform and a count feature selection transform,
 # which keeps only those hash slots that have a value.
 outModel2 <- rxLogisticRegression(like ~ reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
         categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
         selectFeatures("reviewCatHash", mode = minCount())))
 summary(outModel2)

 # Apply a categorical hash transform and a mutual information feature selection transform,
 # which keeps only the 10 features with the largest mutual information with the label.
 outModel3 <- rxLogisticRegression(like ~ reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
         categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
         selectFeatures(like ~ reviewCatHash, mode = mutualInformation(numFeaturesToKeep = 10))))
 summary(outModel3)
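
 # testReviews is built above but never scored in the example. A minimal sketch of
 # scoring the mutual information model with rxPredict (extraVarsToWrite is assumed
 # here to carry the review text through to the scored output).
 scoreDF <- rxPredict(outModel3, data = testReviews, extraVarsToWrite = "review")
 scoreDF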