minCount: Feature Selection Count Mode

07/04/2024

The count mode of feature selection, used in the feature selection transform selectFeatures.

Usage

  minCount(count = 1, ...)

Arguments

`count`

The threshold for count-based feature selection. A feature is selected if and only if at least `count` examples have a non-default value in that feature. The default value is 1.

`...`

Additional arguments to be passed directly to the Microsoft Compute Engine.
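The selection rule described for `count` can be sketched in base R as follows. This is an illustrative sketch, not the MicrosoftML implementation: it assumes the features are held in a numeric matrix whose default value is 0, and the helper name `selectByMinCount` is hypothetical.

```r
# Hypothetical helper illustrating the minCount rule (not part of MicrosoftML).
# A column (feature) is kept if and only if at least `count` rows (examples)
# hold a value different from the default value (assumed to be 0 here).
selectByMinCount <- function(x, count = 1, default = 0) {
  stopifnot(is.matrix(x), count >= 1)
  # Number of examples with a non-default value, per feature
  nonDefault <- colSums(x != default)
  x[, nonDefault >= count, drop = FALSE]
}

# Toy data: 4 examples (rows), 3 features (columns, filled column by column)
x <- matrix(c(1, 0, 2, 0,   # feature f1: 2 non-default values
              0, 0, 0, 0,   # feature f2: all default
              3, 1, 1, 5),  # feature f3: 4 non-default values
            nrow = 4,
            dimnames = list(NULL, c("f1", "f2", "f3")))

colnames(selectByMinCount(x))             # "f1" "f3"
colnames(selectByMinCount(x, count = 3))  # "f3"
```

With the default `count = 1`, only the all-default feature `f2` is dropped; raising the threshold to 3 also drops `f1`, which has only two non-default values.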

Details

When the count mode is used in the feature selection transform, a feature is selected if at least the specified count of examples have a non-default value in that feature. The count-mode feature selection transform is useful when applied together with a categorical hash transform (see also categoricalHash). Count-based feature selection can remove the features generated by the hash transform that have no data in the examples.
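To see why the combination with a hash transform helps, the base R sketch below hashes a few strings into 2^7 = 128 indicator slots and then keeps only the slots used by at least one example. It is a sketch under loud assumptions: `toyHash` and `hashEncode` are hypothetical helpers, and the character-code-sum hash is a stand-in chosen for readability, not the hash function MicrosoftML uses.

```r
# Hypothetical toy hash (NOT the hash used by MicrosoftML):
# sum of the string's character codes, modulo the number of slots.
toyHash <- function(s, hashBits) {
  sum(utf8ToInt(s)) %% (2^hashBits)
}

# Hypothetical helper: one-hot encode each string into 2^hashBits hash slots
hashEncode <- function(values, hashBits) {
  nSlots <- 2^hashBits
  m <- matrix(0, nrow = length(values), ncol = nSlots,
              dimnames = list(NULL, paste0("slot", seq_len(nSlots) - 1)))
  for (i in seq_along(values)) {
    # Set the indicator for the slot this string hashes to (R is 1-indexed)
    m[i, toyHash(values[i], hashBits) + 1] <- 1
  }
  m
}

reviews <- c("This is great", "I hate it", "Love it", "I hate it", "I love it")
encoded <- hashEncode(reviews, hashBits = 7)
ncol(encoded)   # 128 slots, most of them empty

# Count-mode selection with count = 1: keep slots used by at least one example
kept <- encoded[, colSums(encoded != 0) >= 1, drop = FALSE]
ncol(kept)      # 4 here: the four distinct strings land in different slots
```

Out of 128 generated columns, only as many survive as there are occupied slots, which is at most the number of distinct input strings and fewer if two strings collide.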

Value

A character string defining the count mode.

Author(s)

Microsoft Corporation Microsoft Technical Support

See also

mutualInformation, selectFeatures

Examples


  library(MicrosoftML)

  trainReviews <- data.frame(review = c(
      "This is great",
      "I hate it",
      "Love it",
      "Do not like it",
      "Really like it",
      "I hate it",
      "I like it a lot",
      "I kind of hate it",
      "I do like it",
      "I really hate it",
      "It is very good",
      "I hate it a bunch",
      "I love it a bunch",
      "I hate it",
      "I like it very much",
      "I hate it very much.",
      "I really do love it",
      "I really do hate it",
      "Love it!",
      "Hate it!",
      "I love it",
      "I hate it",
      "I love it",
      "I hate it",
      "I love it"),
    like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
      FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
      FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
      FALSE, TRUE, FALSE, TRUE),
    stringsAsFactors = FALSE
  )

  testReviews <- data.frame(review = c(
      "This is great",
      "I hate it",
      "Love it",
      "Really like it",
      "I hate it",
      "I like it a lot",
      "I love it",
      "I do like it",
      "I really hate it",
      "I love it"),
    stringsAsFactors = FALSE
  )

  # Use a categorical hash transform, which generates 2^7 = 128 features (hashBits = 7).
  outModel1 <- rxLogisticRegression(like ~ reviewCatHash, data = trainReviews, l1Weight = 0,
      mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
  summary(outModel1)

  # Apply a categorical hash transform and a count feature selection transform,
  # which keeps only the hash features with a non-default value in at least one example.
  outModel2 <- rxLogisticRegression(like ~ reviewCatHash, data = trainReviews, l1Weight = 0,
      mlTransforms = list(
          categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7),
          selectFeatures("reviewCatHash", mode = minCount())))
  summary(outModel2)

  # Apply a categorical hash transform and a count feature selection transform,
  # which keeps only the hash features with a non-default value in at least 5 examples.
  outModel3 <- rxLogisticRegression(like ~ reviewCatHash, data = trainReviews, l1Weight = 0,
      mlTransforms = list(
          categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7),
          selectFeatures("reviewCatHash", mode = minCount(count = 5))))
  summary(outModel3)
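To anticipate which features the `count = 5` threshold keeps, the base R sketch below counts how many training rows carry each distinct review string. It assumes that categoricalHash hashes each whole review string as a single categorical value (it does not tokenize the text), so that, ignoring hash collisions, a hash slot's count equals the frequency of the string that maps to it. Collisions could merge the counts of different strings into one slot, so the actual set of slots kept may differ.

```r
# The 25 training reviews from the example above
reviews <- c("This is great", "I hate it", "Love it", "Do not like it",
             "Really like it", "I hate it", "I like it a lot", "I kind of hate it",
             "I do like it", "I really hate it", "It is very good",
             "I hate it a bunch", "I love it a bunch", "I hate it",
             "I like it very much", "I hate it very much.", "I really do love it",
             "I really do hate it", "Love it!", "Hate it!", "I love it",
             "I hate it", "I love it", "I hate it", "I love it")

# Number of training examples per distinct review string
freq <- table(reviews)
length(freq)               # 19 distinct strings
names(freq)[freq >= 1]     # all 19 strings pass minCount()
names(freq)[freq >= 5]     # only "I hate it" passes minCount(count = 5)
```

Under these assumptions, outModel2 keeps one hash feature per distinct review string, while outModel3 keeps essentially a single feature, because "I hate it" is the only review that appears five times ("I love it" appears three times).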
