IDF Class
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Computes the inverse document frequency (IDF) of terms across a collection of documents. The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total number of documents and d(t) is the number of documents that contain term t.
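To make the formula concrete, the following sketch evaluates it by hand for a three-document corpus in plain C#, without Spark. It assumes the natural logarithm (which is what Spark's Scala IDF uses) and treats each document as a set of terms. The corpus and the `Idf` helper are illustrative only and are not part of this API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A tiny illustrative corpus; each document is treated as a set of terms.
var docs = new List<string[]>
{
    new[] { "spark", "is", "fast" },
    new[] { "spark", "is", "fun" },
    new[] { "dotnet", "is", "fun" },
};

// Hypothetical helper: idf = log((m + 1) / (d(t) + 1)) with the natural log (Math.Log).
double Idf(string term, IReadOnlyList<string[]> corpus)
{
    int m = corpus.Count;                                   // total number of documents
    int docFreq = corpus.Count(doc => doc.Contains(term));  // documents containing the term
    return Math.Log((m + 1.0) / (docFreq + 1.0));
}

foreach (var word in new[] { "is", "spark", "dotnet", "scala" })
{
    Console.WriteLine($"{word}: {Idf(word, docs):F4}");
}
```

With m = 3, a term found in every document ("is") gets log(4/4) = 0, "spark" (2 documents) gets log(4/3) ≈ 0.2877, and "dotnet" (1 document) gets log(4/2) ≈ 0.6931. The +1 in the denominator keeps the ratio finite for a term that appears in no document: "scala" gets log(4/1) ≈ 1.3863.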
This implementation supports filtering out terms that do not appear in a minimum number of documents (controlled by the minDocFreq parameter). For terms that appear in fewer than minDocFreq documents, the IDF is set to 0, so their TF-IDF values are also 0.
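The filtering rule and its effect on TF-IDF can be sketched the same way in plain C#. This assumes TF-IDF is the raw term count multiplied by the IDF, and that a term is kept when it appears in at least minDocFreq documents. The helper `FilteredIdf`, the term counts, and the threshold of 2 are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: smoothed IDF with the minDocFreq filter applied.
// Terms found in fewer than `threshold` documents get an IDF of 0.
double FilteredIdf(int totalDocs, int docFreq, int threshold) =>
    docFreq >= threshold ? Math.Log((totalDocs + 1.0) / (docFreq + 1.0)) : 0.0;

int m = 3;          // total documents in the (hypothetical) corpus
int minDocFreq = 2; // hypothetical threshold

// Hypothetical raw term counts in one document, and each term's document frequency.
var termCounts = new Dictionary<string, int> { ["spark"] = 2, ["dotnet"] = 1 };
var docFreqs = new Dictionary<string, int> { ["spark"] = 2, ["dotnet"] = 1 };

// TF-IDF = term count * IDF, so a filtered term always ends up with 0.
var tfidf = termCounts.ToDictionary(
    kv => kv.Key,
    kv => kv.Value * FilteredIdf(m, docFreqs[kv.Key], minDocFreq));

foreach (var (term, weight) in tfidf)
{
    Console.WriteLine($"{term}: {weight:F4}");
}
```

Here "spark" (2 documents) passes the threshold and gets 2 × log(4/3) ≈ 0.5754, while "dotnet" (1 document) is filtered, so its TF-IDF is 0 even though it occurs in the document.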
C#
public class IDF : Microsoft.Spark.ML.Feature.FeatureBase<Microsoft.Spark.ML.Feature.IDF>
F#
type IDF = class
    inherit FeatureBase<IDF>
VB
Public Class IDF
    Inherits FeatureBase(Of IDF)
Inheritance: FeatureBase<IDF> → IDF
Constructors
IDF() | Creates an IDF without any constructor arguments; a random UID is generated.
IDF(String) | Creates an IDF with the specified UID, which gives the IDF a unique ID.
Methods
Clear(Param) | Clears any value previously set for the specified Microsoft.Spark.ML.Feature.Param, resetting it to its default value. (Inherited from FeatureBase<T>)
ExplainParam(Param) | Returns a description of how the specified Microsoft.Spark.ML.Feature.Param works and how it is currently set. (Inherited from FeatureBase<T>)
ExplainParams() | Returns a description of how all of the Microsoft.Spark.ML.Feature.Param objects that apply to this object work and how they are currently set. (Inherited from FeatureBase<T>)
Fit(DataFrame) | Fits a model to the input data.
GetInputCol() | Gets the name of the column that the IDF reads from.
GetMinDocFreq() | Gets the minimum number of documents in which a term must appear; terms that appear in fewer documents are filtered out (their IDF is 0).
GetOutputCol() | Gets the name of the new column that the IDF creates in the DataFrame.
GetParam(String) | Retrieves a Microsoft.Spark.ML.Feature.Param by name so that it can be used to set the value of that Param on the object. (Inherited from FeatureBase<T>)
Load(String) | Loads an IDF that was previously saved using Save.
Save(String) | Saves the object so that it can be loaded later using Load. Saved objects can be shared with Scala: an object saved here can be loaded in Scala, and an object saved in Scala can be loaded here. (Inherited from FeatureBase<T>)
Set(Param, Object) | Sets the value of the specified Microsoft.Spark.ML.Feature.Param. (Inherited from FeatureBase<T>)
SetInputCol(String) | Sets the name of the column that the IDF reads from.
SetMinDocFreq(Int32) | Sets the minimum number of documents in which a term must appear; terms that appear in fewer documents are filtered out (their IDF is 0).
SetOutputCol(String) | Sets the name of the new column that the IDF creates in the DataFrame.
ToString() | Returns the JVM toString value rather than the default .NET ToString value. (Inherited from FeatureBase<T>)
Uid() | Returns the UID that was used to create the object. If no UID is passed to the constructor, a random UID is generated when the object is created. (Inherited from FeatureBase<T>)