The confusion seems to arise from a discrepancy between the AI-900 exam practice assessment and the Microsoft Learn Module: Understand Text Analytics.
In the practice assessment, the correct answer is given as “removing stop words”. However, the Microsoft Learn Module states that the first step in analyzing a corpus is tokenization, which involves breaking down the text into individual words or “tokens”.
Both tokenization and stop word removal are important steps in NLP, but their order can vary depending on the specific task or the focus of the analysis. In most NLP pipelines, tokenization is the very first step, followed by preprocessing steps such as stop word removal, stemming, and so on. In the context of the practice-assessment question, however, which focuses specifically on “statistical analysis of terms,” removing stop words is treated as the first step: stop words (common words like “is”, “the”, “and”, etc.) occur so frequently that they skew term statistics, and removing them keeps the analysis focused on the more meaningful words in the text.
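To illustrate the ordering discussed above, here is a minimal sketch in plain Python (no NLP library; the stop word list is an illustrative subset, not an official one) showing tokenization followed by stop word removal before counting term frequencies:

```python
import re
from collections import Counter

# Illustrative subset of English stop words (real lists are much longer).
STOP_WORDS = {"is", "the", "and", "a", "an", "of", "to", "in"}

def tokenize(text):
    """Break text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop high-frequency function words that would skew term statistics."""
    return [t for t in tokens if t not in STOP_WORDS]

text = "The cat sat on the mat and the cat is happy."
tokens = tokenize(text)          # tokenization first
terms = remove_stop_words(tokens)  # then stop word removal
print(Counter(terms).most_common(2))  # → [('cat', 2), ('sat', 1)]
```

Note that without the stop word removal step, “the” (three occurrences) would dominate the counts, which is exactly the skew the practice assessment's answer is getting at.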
I believe it would be beneficial for a moderator to provide a final decision on this matter to clear up any remaining confusion.
Microsoft Learn Module: Understand Text Analytics - https://learn.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-text-analytics-use-mmlspark