TreeExtensions.FeaturizeByFastTreeRanking Method

Reference

Definition

Namespace: Microsoft.ML

Assembly: Microsoft.ML.FastTree.dll

Package: Microsoft.ML.FastTree (v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.0, v3.0.1)

Important

Some information relates to a prerelease product that may be substantially modified before it is released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Creates a FastTreeRankingFeaturizationEstimator, which uses FastTreeRankingTrainer to train TreeEnsembleModelParameters that are then used to create tree-based features.

public static Microsoft.ML.Trainers.FastTree.FastTreeRankingFeaturizationEstimator FeaturizeByFastTreeRanking (this Microsoft.ML.TransformsCatalog catalog, Microsoft.ML.Trainers.FastTree.FastTreeRankingFeaturizationEstimator.Options options);

static member FeaturizeByFastTreeRanking : Microsoft.ML.TransformsCatalog * Microsoft.ML.Trainers.FastTree.FastTreeRankingFeaturizationEstimator.Options -> Microsoft.ML.Trainers.FastTree.FastTreeRankingFeaturizationEstimator

<Extension()>
Public Function FeaturizeByFastTreeRanking (catalog As TransformsCatalog, options As FastTreeRankingFeaturizationEstimator.Options) As FastTreeRankingFeaturizationEstimator

Parameters

catalog: TransformsCatalog

The TransformsCatalog context used to create the FastTreeRankingFeaturizationEstimator.

options: FastTreeRankingFeaturizationEstimator.Options

The options used to configure the FastTreeRankingFeaturizationEstimator. See FastTreeRankingFeaturizationEstimator.Options and TreeEnsembleFeaturizationEstimatorBase.OptionsBase for the available settings.

Returns

FastTreeRankingFeaturizationEstimator

Examples

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.FastTree;

namespace Samples.Dynamic.Transforms.TreeFeaturization
{
    public static class FastTreeRankingFeaturizationWithOptions
    {
        // This example requires installation of the additional NuGet package
        // Microsoft.ML.FastTree (https://www.nuget.org/packages/Microsoft.ML.FastTree/).
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations
            // and as the source of randomness. Setting the seed to a fixed number
            // in this example to make outputs deterministic.
            var mlContext = new MLContext(seed: 0);

            // Create a list of training data points.
            var dataPoints = GenerateRandomDataPoints(100).ToList();

            // Convert the list of data points to an IDataView object, which is
            // consumable by ML.NET API.
            var dataView = mlContext.Data.LoadFromEnumerable(dataPoints);

            // ML.NET doesn't cache data sets by default. Therefore, if a data
            // set is read from a file and accessed many times, it can be slow
            // due to expensive featurization and disk operations. When the
            // data fits into memory, a solution is to cache it in memory.
            // Caching is especially helpful when working with iterative
            // algorithms, which need many passes over the data.
            dataView = mlContext.Data.Cache(dataView);

            // Define input and output columns of tree-based featurizer.
            string labelColumnName = nameof(DataPoint.Label);
            string featureColumnName = nameof(DataPoint.Features);
            string treesColumnName = nameof(TransformedDataPoint.Trees);
            string leavesColumnName = nameof(TransformedDataPoint.Leaves);
            string pathsColumnName = nameof(TransformedDataPoint.Paths);

            // Define the configuration of the trainer used to train a tree-based
            // model.
            var trainerOptions = new FastTreeRankingTrainer.Options
            {
                // Reduce the number of trees to 3.
                NumberOfTrees = 3,
                // Number of leaves per tree.
                NumberOfLeaves = 6,
                // Feature column name.
                FeatureColumnName = featureColumnName,
                // Label column name.
                LabelColumnName = labelColumnName
            };

            // Define the tree-based featurizer's configuration.
            var options = new FastTreeRankingFeaturizationEstimator.Options
            {
                InputColumnName = featureColumnName,
                TreesColumnName = treesColumnName,
                LeavesColumnName = leavesColumnName,
                PathsColumnName = pathsColumnName,
                TrainerOptions = trainerOptions
            };

            // Define the featurizer.
            var pipeline = mlContext.Transforms.FeaturizeByFastTreeRanking(
                options);

            // Train the model.
            var model = pipeline.Fit(dataView);

            // Apply the trained transformer to the considered data set.
            var transformed = model.Transform(dataView);

            // Convert the IDataView object to a list. Each element in the
            // resulting list corresponds to a row in the IDataView.
            var transformedDataPoints = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformed, false).ToList();

            // Print out the transformation of the first 3 data points.
            for (int i = 0; i < 3; ++i)
            {
                var dataPoint = dataPoints[i];
                var transformedDataPoint = transformedDataPoints[i];
                Console.WriteLine("The original feature vector [" + String.Join(",",
                    dataPoint.Features) + "] is transformed to three different " +
                    "tree-based feature vectors:");

                Console.WriteLine("  Trees' output values: [" + String.Join(",",
                    transformedDataPoint.Trees) + "].");

                Console.WriteLine("  Leaf IDs' 0-1 representation: [" + String
                    .Join(",", transformedDataPoint.Leaves) + "].");

                Console.WriteLine("  Path IDs' 0-1 representation: [" + String
                    .Join(",", transformedDataPoint.Paths) + "].");
            }

            // Expected output:
            //   The original feature vector [1.117325,1.068023,0.8581612] is
            //   transformed to three different tree-based feature vectors:
            //     Trees' output values: [0.4095458,0.2061437,0.2364294].
            //     Leaf IDs' 0-1 representation: [0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1].
            //     Path IDs' 0-1 representation: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1].
            //   The original feature vector [0.6588848,1.006027,0.5421779] is
            //   transformed to three different tree-based feature vectors:
            //     Trees' output values: [0.2543825,-0.06570309,-0.1456212].
            //     Leaf IDs' 0-1 representation: [0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0].
            //     Path IDs' 0-1 representation: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,0].
            //   The original feature vector [0.6737045,0.6919063,0.8673147] is
            //   transformed to three different tree-based feature vectors:
            //     Trees' output values: [0.2543825,-0.06570309,0.01300209].
            //     Leaf IDs' 0-1 representation: [0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0].
            //     Path IDs' 0-1 representation: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1].
        }

        private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count,
            int seed = 0, int groupSize = 10)
        {
            var random = new Random(seed);
            float randomFloat() => (float)random.NextDouble();
            for (int i = 0; i < count; i++)
            {
                var label = random.Next(0, 5);
                yield return new DataPoint
                {
                    Label = (uint)label,
                    GroupId = (uint)(i / groupSize),
                    // Create random features that are correlated with the label.
                    // For data points with larger labels, the feature values are
                    // slightly increased by adding a constant.
                    Features = Enumerable.Repeat(label, 3).Select(x => randomFloat()
                        + x * 0.1f).ToArray()
                };
            }
        }

        // Example with label, groupId, and 3 feature values. A data set is a
        // collection of such examples.
        private class DataPoint
        {
            [KeyType(5)]
            public uint Label { get; set; }
            [KeyType(100)]
            public uint GroupId { get; set; }
            [VectorType(3)]
            public float[] Features { get; set; }
        }

        // Class used to capture the output of tree-based featurization.
        private class TransformedDataPoint : DataPoint
        {
            // The i-th value is the output value of the i-th decision tree.
            public float[] Trees { get; set; }
            // The 0-1 encoding of the leaves the input feature vector falls into.
            public float[] Leaves { get; set; }
            // The 0-1 encoding of the paths the input feature vector takes to
            // reach the leaves.
            public float[] Paths { get; set; }
        }
    }
}
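
To make the shape of the output columns in the example above easier to read, the following is a minimal conceptual sketch written without ML.NET. It assumes that the Trees vector holds one output value per tree and that the Leaves vector concatenates one 0-1 block per tree with a single 1 at the reached leaf, which is consistent with the expected output above (3 trees with 6 leaves each give 18 entries). The ToyTree type, its field layout, and the two-tree ensemble are hypothetical and invented for illustration; they are not ML.NET's internal representation. The Paths vector is left out because this page does not spell out its encoding; in the expected output above it has 15 entries, which matches 3 trees with 5 internal nodes each.

```csharp
using System;
using System.Linq;

public static class ToyTreeFeaturizationSketch
{
    // Hypothetical toy decision tree, unrelated to ML.NET's internal types.
    // A child index >= 0 points to another internal node; a negative child
    // encodes a leaf as ~leafIndex (so -1 is leaf 0, -2 is leaf 1, ...).
    public sealed class ToyTree
    {
        public int[] FeatureIndex;  // feature tested at each internal node
        public float[] Threshold;   // go left when feature <= threshold
        public int[] Left;
        public int[] Right;
        public float[] LeafValue;   // output value stored in each leaf

        // Walk from the root until a leaf is reached and return its index.
        public int FindLeaf(float[] features)
        {
            int node = 0;
            while (true)
            {
                int next = features[FeatureIndex[node]] <= Threshold[node]
                    ? Left[node] : Right[node];
                if (next < 0)
                    return ~next;
                node = next;
            }
        }
    }

    // Assumed ensemble: one tree with 3 leaves and one tree with 2 leaves.
    public static readonly ToyTree[] Ensemble =
    {
        new ToyTree
        {
            FeatureIndex = new[] { 0, 1 },
            Threshold = new[] { 0.5f, 0.5f },
            Left = new[] { ~0, ~1 },
            Right = new[] { 1, ~2 },
            LeafValue = new[] { -1.0f, 0.25f, 1.0f }
        },
        new ToyTree
        {
            FeatureIndex = new[] { 2 },
            Threshold = new[] { 0.7f },
            Left = new[] { ~0 },
            Right = new[] { ~1 },
            LeafValue = new[] { -0.5f, 0.5f }
        }
    };

    // Returns the per-tree output values and the concatenated 0-1 leaf
    // indicators for a single feature vector.
    public static (float[] Trees, float[] Leaves) Featurize(
        float[] features, ToyTree[] ensemble)
    {
        var trees = new float[ensemble.Length];
        var leaves = new float[ensemble.Sum(t => t.LeafValue.Length)];
        int offset = 0;
        for (int i = 0; i < ensemble.Length; i++)
        {
            int leaf = ensemble[i].FindLeaf(features);
            trees[i] = ensemble[i].LeafValue[leaf];
            leaves[offset + leaf] = 1f;
            offset += ensemble[i].LeafValue.Length;
        }
        return (trees, leaves);
    }

    public static void Example()
    {
        var (trees, leaves) = Featurize(new[] { 0.9f, 0.2f, 0.8f }, Ensemble);
        Console.WriteLine("Trees:  [" + string.Join(",", trees) + "]");
        Console.WriteLine("Leaves: [" + string.Join(",", leaves) + "]");
    }
}
```

For the input {0.9, 0.2, 0.8}, the first toy tree reaches its leaf 1 (value 0.25) and the second reaches its leaf 1 (value 0.5), so Trees has 2 entries and Leaves has 3 + 2 = 5 entries with exactly one 1 in each per-tree block.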
