PcaCatalog.RandomizedPca Method

Definition

Overloads

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, RandomizedPcaTrainer+Options)

Creates a RandomizedPcaTrainer with advanced options, which trains an approximate principal component analysis (PCA) model using the randomized singular value decomposition (SVD) algorithm.

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, String, String, Int32, Int32, Boolean, Nullable<Int32>)

Creates a RandomizedPcaTrainer, which trains an approximate principal component analysis (PCA) model using the randomized singular value decomposition (SVD) algorithm.

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, RandomizedPcaTrainer+Options)

Creates a RandomizedPcaTrainer with advanced options, which trains an approximate principal component analysis (PCA) model using the randomized singular value decomposition (SVD) algorithm.

public static Microsoft.ML.Trainers.RandomizedPcaTrainer RandomizedPca (this Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog, Microsoft.ML.Trainers.RandomizedPcaTrainer.Options options);
static member RandomizedPca : Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers * Microsoft.ML.Trainers.RandomizedPcaTrainer.Options -> Microsoft.ML.Trainers.RandomizedPcaTrainer
<Extension()>
Public Function RandomizedPca (catalog As AnomalyDetectionCatalog.AnomalyDetectionTrainers, options As RandomizedPcaTrainer.Options) As RandomizedPcaTrainer

Parameters

catalog
AnomalyDetectionCatalog.AnomalyDetectionTrainers

The anomaly detection catalog trainer object.

options
RandomizedPcaTrainer.Options

Advanced options for the algorithm.

Returns

RandomizedPcaTrainer

Examples

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class RandomizedPcaSampleWithOptions
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations
            // and as the source of randomness. Setting the seed to a fixed number
            // in this example to make outputs deterministic.
            var mlContext = new MLContext(seed: 0);

            // Training data.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 2} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {1, 0, 0} }
            };

            // Convert the List<DataPoint> to IDataView, a consumable format to
            // ML.NET functions.
            var data = mlContext.Data.LoadFromEnumerable(samples);

            var options = new Microsoft.ML.Trainers.RandomizedPcaTrainer.Options()
            {
                FeatureColumnName = nameof(DataPoint.Features),
                Rank = 1,
                Seed = 10,
            };

            // Create an anomaly detector. Its underlying algorithm is randomized
            // PCA.
            var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                options);

            // Train the anomaly detector.
            var model = pipeline.Fit(data);

            // Apply the trained model on the training data.
            var transformed = model.Transform(data);

            // Read ML.NET predictions into IEnumerable<Result>.
            var results = mlContext.Data.CreateEnumerable<Result>(transformed,
                reuseRowObject: false).ToList();

            // Let's go through all predictions.
            for (int i = 0; i < samples.Count; ++i)
            {
                // The i-th example's prediction result.
                var result = results[i];

                // The i-th example's feature vector in text format.
                var featuresInText = string.Join(',', samples[i].Features);

                if (result.PredictedLabel)
                    // The i-th sample is predicted as an outlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is" +
                        "an outlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
                else
                    // The i-th sample is predicted as an inlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is" +
                        "an inlier with a score of being outlier {2}",
                        i, featuresInText, result.Score);
            }
            // Lines printed out should be
            // The 0 - th example with features[0, 2, 1] is an inlier with a score of being outlier 0.2264826
            // The 1 - th example with features[0, 2, 3] is an inlier with a score of being outlier 0.1739471
            // The 2 - th example with features[0, 2, 4] is an inlier with a score of being outlier 0.05711612
            // The 3 - th example with features[0, 2, 1] is an inlier with a score of being outlier 0.2264826
            // The 4 - th example with features[0, 2, 2] is an inlier with a score of being outlier 0.3868995
            // The 5 - th example with features[0, 2, 3] is an inlier with a score of being outlier 0.1739471
            // The 6 - th example with features[0, 2, 4] is an inlier with a score of being outlier 0.05711612
            // The 7 - th example with features[1, 0, 0] is an outlier with a score of being outlier 0.6260795
        }

        // Example with 3 feature values. A training data set is a collection of
        // such examples.
        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }

        // Class used to capture prediction of DataPoint.
        private class Result
        {
            // Outlier gets true while inlier has false.
            public bool PredictedLabel { get; set; }
            // Inlier gets smaller score. Score is between 0 and 1.
            public float Score { get; set; }
        }
    }
}

Remarks

By default, the threshold used to determine the label of a data point based on the predicted score is 0.5. Scores range from 0 to 1. A data point with a predicted score higher than 0.5 is considered an outlier. Use ChangeModelThreshold<TModel>(AnomalyPredictionTransformer<TModel>, Single) to change this threshold.
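
As a minimal sketch of what changing the threshold could look like, the snippet below assumes the mlContext, model, and data variables from the example above; the exact behavior should be verified against the ChangeModelThreshold<TModel>(AnomalyPredictionTransformer<TModel>, Single) documentation.

// A minimal sketch, assuming `mlContext`, `model`, and `data` from the example
// above. Lowering the threshold from the default 0.5 to 0.3 means that data
// points with a score above 0.3 are labeled as outliers.
var adjustedModel = mlContext.AnomalyDetection.ChangeModelThreshold(
    model, threshold: 0.3f);

// Predictions made with the adjusted model derive PredictedLabel from Score
// using the new threshold.
var adjustedPredictions = adjustedModel.Transform(data);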

Applies to

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, String, String, Int32, Int32, Boolean, Nullable<Int32>)

Creates a RandomizedPcaTrainer, which trains an approximate principal component analysis (PCA) model using the randomized singular value decomposition (SVD) algorithm.

public static Microsoft.ML.Trainers.RandomizedPcaTrainer RandomizedPca (this Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog, string featureColumnName = "Features", string exampleWeightColumnName = default, int rank = 20, int oversampling = 20, bool ensureZeroMean = true, int? seed = default);
static member RandomizedPca : Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers * string * string * int * int * bool * Nullable<int> -> Microsoft.ML.Trainers.RandomizedPcaTrainer
<Extension()>
Public Function RandomizedPca (catalog As AnomalyDetectionCatalog.AnomalyDetectionTrainers, Optional featureColumnName As String = "Features", Optional exampleWeightColumnName As String = Nothing, Optional rank As Integer = 20, Optional oversampling As Integer = 20, Optional ensureZeroMean As Boolean = true, Optional seed As Nullable(Of Integer) = Nothing) As RandomizedPcaTrainer

Parameters

catalog
AnomalyDetectionCatalog.AnomalyDetectionTrainers

The anomaly detection catalog trainer object.

featureColumnName
String

The name of the feature column. The column data must be a known-sized vector of Single.

exampleWeightColumnName
String

The name of the example weight column (optional). To use the weight column, the column data must be of type Single.

rank
Int32

The number of components in the PCA.

oversampling
Int32

Oversampling parameter for randomized PCA training.

ensureZeroMean
Boolean

If enabled, the data is centered to have zero mean.

seed
Nullable<Int32>

The seed for random number generation.

Returns

RandomizedPcaTrainer

Examples

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class RandomizedPcaSample
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations
            // and as the source of randomness. Setting the seed to a fixed number
            // in this example to make outputs deterministic.
            var mlContext = new MLContext(seed: 0);

            // Training data.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 1, 2} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {2, 0, 0} }
            };

            // Convert the List<DataPoint> to IDataView, a consumable format to
            // ML.NET functions.
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // Create an anomaly detector. Its underlying algorithm is randomized
            // PCA.
            var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                featureColumnName: nameof(DataPoint.Features), rank: 1,
                    ensureZeroMean: false);

            // Train the anomaly detector.
            var model = pipeline.Fit(data);

            // Apply the trained model on the training data.
            var transformed = model.Transform(data);

            // Read ML.NET predictions into IEnumerable<Result>.
            var results = mlContext.Data.CreateEnumerable<Result>(transformed,
                reuseRowObject: false).ToList();

            // Let's go through all predictions.
            for (int i = 0; i < samples.Count; ++i)
            {
                // The i-th example's prediction result.
                var result = results[i];

                // The i-th example's feature vector in text format.
                var featuresInText = string.Join(',', samples[i].Features);

                if (result.PredictedLabel)
                    // The i-th sample is predicted as an outlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an outlier with a score of being inlier {2}", i,
                            featuresInText, result.Score);
                else
                    // The i-th sample is predicted as an inlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an inlier with a score of being inlier {2}", i,
                        featuresInText, result.Score);
            }
            // Lines printed out should be
            // The 0 - th example with features[0, 2, 1] is an inlier with a score of being outlier 0.1101028
            // The 1 - th example with features[0, 2, 1] is an inlier with a score of being outlier 0.1101028
            // The 2 - th example with features[0, 2, 1] is an inlier with a score of being outlier 0.1101028
            // The 3 - th example with features[0, 1, 2] is an outlier with a score of being outlier 0.5082728
            // The 4 - th example with features[0, 2, 1] is an inlier with a score of being outlier 0.1101028
            // The 5 - th example with features[2, 0, 0] is an outlier with a score of being outlier 1
        }

        // Example with 3 feature values. A training data set is a collection of
        // such examples.
        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }

        // Class used to capture prediction of DataPoint.
        private class Result
        {
            // Outlier gets true while inlier has false.
            public bool PredictedLabel { get; set; }
            // Inlier gets smaller score. Score is between 0 and 1.
            public float Score { get; set; }
        }
    }
}

Remarks

By default, the threshold used to determine the label of a data point based on the predicted score is 0.5. Scores range from 0 to 1. A data point with a predicted score higher than 0.5 is considered an outlier. Use ChangeModelThreshold<TModel>(AnomalyPredictionTransformer<TModel>, Single) to change this threshold.
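
To illustrate with the expected output of the example above: raising the threshold past a point's score flips its label from outlier to inlier. The sketch below is a hypothetical continuation that reuses the mlContext, model, and data variables from that example.

// A minimal sketch, assuming `mlContext`, `model`, and `data` from the example
// above. Raising the threshold from the default 0.5 to 0.6 would label the
// point [0, 1, 2] (score ~0.51) as an inlier, while [2, 0, 0] (score 1) would
// still be flagged as an outlier.
var stricterModel = mlContext.AnomalyDetection.ChangeModelThreshold(
    model, threshold: 0.6f);
var stricterPredictions = stricterModel.Transform(data);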

Applies to