CustomStopWordsRemover Class
Remover with list of stopwords specified by the user.
Inheritance
nimbusml.internal.core.feature_extraction.text.stopwords._customstopwordsremover.CustomStopWordsRemover
CustomStopWordsRemover
Constructor
CustomStopWordsRemover(stopword=None, **params)
Parameters
Name | Description |
---|---|
stopword | List of stopwords. |
params | Additional arguments sent to compute engine. |
Examples
###############################################################################
# NGramFeaturizer
from nimbusml import FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.text import NGramFeaturizer
from nimbusml.feature_extraction.text.extractor import Ngram
from nimbusml.feature_extraction.text.stopwords import CustomStopWordsRemover
# data input (as a FileDataStream)
path = get_dataset('wiki_detox_train').as_filepath()
data = FileDataStream.read_csv(path, sep='\t')
print(data.head())
# Sentiment SentimentText
# 0 1 ==RUDE== Dude, you are rude upload that carl p...
# 1 1 == OK! == IM GOING TO VANDALIZE WILD ONES WIK...
# 2 1 Stop trolling, zapatancas, calling me a liar m...
# 3 1 ==You're cool== You seem like a really cool g...
# 4 1 ::::: Why are you threatening me? I'm not bein...
xf = NGramFeaturizer(
    word_feature_extractor=Ngram(),
    stop_words_remover=CustomStopWordsRemover(['!', '$', '%', '&', '\'', '\'d']),
    columns={'features': ['SentimentText']})
# fit and transform
features = xf.fit_transform(data)
# print features
print(features.head())
# Sentiment ... features.douchiest features.award.
# 0 1 ... 0.0 0.0
# 1 1 ... 0.0 0.0
# 2 1 ... 0.0 0.0
# 3 1 ... 0.0 0.0
# 4 1 ... 0.0 0.0
Remarks
The NGramFeaturizer transform produces a bag of counts of sequences of
consecutive words from a given corpus of text. It can also remove stopwords.
CustomStopWordsRemover supplies a user-defined list of stopwords for that step.
It accepts the following option: stopword.
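The idea behind removing user-defined stopwords before counting word sequences can be sketched in plain Python. This is a conceptual illustration only, not nimbusml's implementation: the helper names `remove_stopwords` and `count_ngrams` are hypothetical, tokens are split on whitespace, and matching is assumed to be exact string equality, which may differ from how the library tokenizes and compares words.

```python
from collections import Counter


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the user-supplied stopword list."""
    blocked = set(stopwords)  # set lookup keeps the filter linear in len(tokens)
    return [t for t in tokens if t not in blocked]


def count_ngrams(tokens, n=1):
    """Count sequences of n consecutive tokens (a bag of n-gram counts)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical input, tokenized by whitespace for simplicity.
tokens = "stop trolling ! you 'd better stop".split()

# Same stopword list as in the example above.
kept = remove_stopwords(tokens, ["!", "$", "%", "&", "'", "'d"])
counts = count_ngrams(kept, n=1)

print(kept)
print(counts[("stop",)])
```

Because the stopwords are filtered out before counting, they never appear as features, and an n-gram with n > 1 can span the position where a stopword was removed.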
Methods
get_params | Get the parameters for this operator. |
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |