categorical: Machine Learning Categorical Data Transform

A categorical transform that can be performed on data before training a model.

Usage

  categorical(vars, outputKind = "ind", maxNumTerms = 1e+06, terms = "",
    ...)

Arguments

`vars`

A character vector or list of variable names to transform. If named, the names are used as the names of the new variables to be created.
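As a small illustration of the naming rule, the base R snippet below shows how a named character vector pairs a new variable name with its source column. It does not call `categorical` itself; the `c(reviewCat = "review")` form is the one used in the example on this page, and the helper variable names are only for illustration.

```r
# A named character vector: the name is the new variable, the value is the input column.
vars <- c(reviewCat = "review")

new_names <- names(vars)       # the variable that would be created
source_cols <- unname(vars)    # the existing column that is transformed

# An unnamed vector carries no new names: names() returns NULL.
unnamed <- c("review")
has_no_names <- is.null(names(unnamed))
```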

`outputKind`

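A character string that specifies the kind of output.

"ind": Outputs an indicator vector. The input column is a vector of categories, and the output contains one indicator vector per slot in the input column.
"bag": Outputs a multi-set vector. If the input column is a vector of categories, the output contains one vector, where the value in each slot is the number of occurrences of the category in the input vector. If the input column contains a single category, the indicator vector and the bag vector are equivalent.
"key": Outputs an index. The output is an integer ID of the category (between 1 and the number of categories in the dictionary).
The default value is "ind".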
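The three output kinds can be sketched in base R as follows. This is only an illustration of the encodings described above, not the MicrosoftML implementation: the dictionary contents, its ordering, and all variable names are assumptions made for the example.

```r
# Hypothetical dictionary; the ordering is an assumption of this sketch.
dictionary <- c("red", "green", "blue")

# One input value that is a vector of categories (two slots).
input <- c("blue", "red")

# "key": 1-based integer ID of each category in the dictionary.
key <- match(input, dictionary)

# "ind": one indicator (one-hot) vector per input slot.
ind <- lapply(key, function(k) as.integer(seq_along(dictionary) == k))

# "bag": a single vector counting how often each category occurs.
bag <- as.integer(table(factor(input, levels = dictionary)))

# With a single category, the indicator and bag vectors coincide.
single <- "green"
single_ind <- as.integer(dictionary == single)
single_bag <- as.integer(table(factor(single, levels = dictionary)))
```

Each "ind" vector and the "bag" vector have one slot per dictionary entry, while "key" yields a single integer per input slot.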

`maxNumTerms`

An integer that specifies the maximum number of categories to include in the dictionary. The default value is 1000000.

`terms`

An optional character vector of terms or categories.

`...`

Additional arguments sent to the compute engine.

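Details

The categorical transform passes through a data set, operating on text columns, to build a dictionary of categories. For each row, the entire text string appearing in the input column is defined as a category. The output of the categorical transform is an indicator vector. Each slot in this vector corresponds to a category in the dictionary, so its length is the size of the built dictionary. The categorical transform can be applied to one or more columns, in which case it builds a separate dictionary for each column that it is applied to.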
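The dictionary-building step can be pictured with a minimal base R sketch. It assumes a toy data frame with two text columns and orders each dictionary by first appearance; both the data and the helper functions `build_dictionary` and `to_indicator` are hypothetical and are not part of MicrosoftML.

```r
# Toy data with two text columns (hypothetical names and values).
df <- data.frame(
  review = c("I love it", "I hate it", "I love it"),
  source = c("web", "app", "web"),
  stringsAsFactors = FALSE
)

# Build one dictionary per column from the distinct whole strings.
# Ordering by first appearance is an assumption of this sketch.
build_dictionary <- function(x) unique(x)
dictionaries <- lapply(df, build_dictionary)

# Encode a column as a matrix with one indicator row per data row;
# the number of matrix columns equals the size of that column's dictionary.
to_indicator <- function(x, dict) {
  m <- outer(x, dict, FUN = "==") * 1L
  colnames(m) <- dict
  m
}

review_ind <- to_indicator(df$review, dictionaries$review)
source_ind <- to_indicator(df$source, dictionaries$source)
```

Because each column gets its own dictionary, `review_ind` and `source_ind` are sized independently of one another.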

categorical does not currently support factor data.
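One possible way to prepare such data, sketched here in base R under the assumption that the text columns arrived as factors, is to convert them to character before applying the transform. The data frame below is hypothetical; the example on this page avoids the issue by passing `stringsAsFactors = FALSE` to `data.frame`.

```r
# Hypothetical data frame in which a text column was created as a factor.
df <- data.frame(
  review = factor(c("I love it", "I hate it", "I love it")),
  like = c(TRUE, FALSE, TRUE)
)

# Convert every factor column to character, leaving other columns untouched.
is_factor <- vapply(df, is.factor, logical(1))
df[is_factor] <- lapply(df[is_factor], as.character)

str(df)
```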

Value

A maml object defining the transform.

Author(s)

Microsoft Corporation Microsoft Technical Support

See also

rxFastTrees, rxFastForest, rxNeuralNet, rxOneClassSvm, rxLogisticRegression.

Examples


 trainReviews <- data.frame(review = c( 
         "This is great",
         "I hate it",
         "Love it",
         "Do not like it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I kind of hate it",
         "I do like it",
         "I really hate it",
         "It is very good",
         "I hate it a bunch",
         "I love it a bunch",
         "I hate it",
         "I like it very much",
         "I hate it very much.",
         "I really do love it",
         "I really do hate it",
         "Love it!",
         "Hate it!",
         "I love it",
         "I hate it",
         "I love it",
         "I hate it",
         "I love it"),
      like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, 
         FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
 )

 testReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I love it",
         "I do like it",
         "I really hate it",
         "I love it"), stringsAsFactors = FALSE)


 # Use a categorical transform: the entire string is treated as a category
 outModel1 <- rxLogisticRegression(like~reviewCat, data = trainReviews, 
     mlTransforms = list(categorical(vars = c(reviewCat = "review"))))
 # Note that 'I hate it' and 'I love it' (the only strings appearing more than once)
 # have non-zero weights
 summary(outModel1)

 # Use the model to score
 scoreOutDF1 <- rxPredict(outModel1, data = testReviews, 
     extraVarsToWrite = "review")
 scoreOutDF1

Last updated on 2025-01-02
