Interpret the Keyword Cloud

Viva Glint’s Natural Language Processing (NLP) technology extracts keywords and phrases that are relevant, commonly occurring, and unique to a filtered population. The keyword cloud is a section available on the Comments report.

  • Relevance is defined by how much more frequently a keyword appears than the naturally expected occurrence in organizational surveys.
  • The size of a word in the keyword cloud is based on the relevance of the term.
  • Color represents the sentiment (positive, negative, or neutral).

Generation of the keyword cloud

A minimum number of comments is required for a report to generate the keyword cloud. A keyword must also occur a minimum number of times to appear, and it must be in Glint’s proprietary dictionary. The keyword cloud is always displayed in English; NLP translates comments written in other languages.

Understand the keyword cloud methodology

Glint’s keyword methodology requires that keywords are:

  • Inherently meaningful within the context of employee engagement
  • Relevant within the context of a survey’s results
  • Unique compared to other companies’ surveys

These steps explain how and why keywords are considered for inclusion in a keyword cloud:

Isolating Keywords

The first step to building a keyword cloud is to identify if a word actually qualifies as a keyword. To do so, Viva Glint's Natural Language Processing (NLP) technology defines keywords, excludes stop words, and analyzes word sequence.

Defining Keywords

Glint has a proprietary dictionary of over 16,000 keywords that are commonly associated with employee engagement. As a result, some of a survey’s most commonly used words may not appear in a keyword cloud if they aren't associated with engagement.
Glint’s dictionary is continually updated through a combination of machine learning and human touch. Survey comments are matched against the dictionary to isolate keywords that should be considered for keyword cloud inclusion.

Excluding Stop Words

By its very definition, a keyword must be significant. For example, words like “the” or “to” have little meaning and are considered “stop words.” These words are excluded from Glint’s dictionary. In turn, comments analysis accounts only for meaningful words like “satisfied” or “leadership.”

Analyzing Word Sequence

Technically, a keyword can be a single word (e.g. “priorities”) or a word string, like “career development.” In order to isolate a single or multi-word keyword, each survey comment is split into various sequences. As an example, let’s look at this comment: “I am happy and have work life balance.”

  • One-word sequence: I, am, happy, and, have, work, life, balance, …
  • Two-word sequence: I am, am happy, happy and, and have, …
  • Three-word sequence: ...and have work, have work life, work life balance

Each sequence is then matched against Glint’s keyword dictionary. In this example, “happy” and “work life balance” are identified as keywords.
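
The sequence matching described above can be sketched in a few lines of Python. This is only an illustration, not Glint's implementation: the function names, the simple whitespace tokenization, and the three-entry SAMPLE_DICTIONARY are all assumptions standing in for the proprietary dictionary and NLP pipeline.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def find_keywords(comment, dictionary, max_len=3):
    """Match one-, two-, and three-word sequences against a keyword set."""
    # Assumed tokenization: split on whitespace, strip basic punctuation, lowercase.
    tokens = [t.strip(".,!?;:\"'").lower() for t in comment.split()]
    tokens = [t for t in tokens if t]
    found = []
    for n in range(1, max_len + 1):
        for seq in ngrams(tokens, n):
            if seq in dictionary and seq not in found:
                found.append(seq)
    return found


# Hypothetical stand-in for the keyword dictionary.
SAMPLE_DICTIONARY = {"happy", "work life balance", "career development"}

keywords = find_keywords("I am happy and have work life balance.", SAMPLE_DICTIONARY)
print(keywords)  # ['happy', 'work life balance']
```

Because stop words such as “and” or “have” are not in the dictionary, they never match, even though they appear in the generated sequences.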

Scoring Keywords

Once a keyword is isolated, Glint assigns it a score based on its relevance and salience (i.e. usefulness).

  • To calculate relevance, the scoring algorithm determines a keyword’s frequency within a slice. A slice is a cut of data, such as the results of a company-wide survey or the results of a filtered group within a survey. A higher frequency of the keyword within a slice indicates higher relevance, which increases its score.
  • To calculate salience, the scoring algorithm determines a keyword’s frequency within all comments on Glint, across all companies. A keyword is featured only when it's unique compared to other slices or surveys, so here a higher frequency of the keyword indicates lower salience, which decreases its score.

Keywords with the highest scores go through a final sorting process to determine which are ultimately featured in the keyword cloud.
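
Glint's exact scoring formula isn't published, but the interplay of relevance and salience can be illustrated with a simple ratio. The sketch below is an assumption, not the actual algorithm: it divides a keyword's frequency in the slice by its frequency across all comments, so the score rises with slice frequency and falls with global frequency. The function name and all counts are made up for illustration.

```python
def keyword_score(slice_count, slice_total, global_count, global_total):
    """Illustrative score: slice frequency divided by global frequency.

    Assumed formula; the real scoring algorithm is not documented.
    """
    if slice_total == 0 or global_count == 0 or global_total == 0:
        return 0.0
    slice_freq = slice_count / slice_total      # relevance: higher raises the score
    global_freq = global_count / global_total   # higher means less salient: lowers the score
    return slice_freq / global_freq


# Hypothetical counts: comments mentioning the keyword / total comments.
scores = {
    # Common in the slice, but also common everywhere.
    "management": keyword_score(80, 200, 30_000, 100_000),
    # Less common in the slice, but rare across all companies.
    "remote work": keyword_score(30, 200, 1_000, 100_000),
}

for keyword, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{keyword}: {score:.2f}")
```

With these made-up numbers, “remote work” outscores “management” even though “management” appears in more of the slice's comments, because “management” is common in every company's comments.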

Sorting Keywords

To properly represent a survey’s results, the keyword cloud should represent as many of the survey’s comments as possible. It must also strike a balance between keywords that cover all comments and keywords that are the most meaningful.

For example, certain keywords (management, communication, feedback, etc.) typically have a high frequency across comments in every slice and across all companies. If these keywords were always featured, keyword clouds would look the same in almost any scenario, which isn’t helpful.

To sort keywords effectively, Glint defines an optimal range for the number of times a keyword appears in a slice’s comments. A keyword must fall within this range to be included in the keyword cloud. This range is referred to as slice coverage. The optimal slice coverage range doesn't include keywords with the highest frequency. The purpose is to include commonly occurring, high-scoring keywords, but not those that are so frequent that they lose their unique value.

Glint puts a slice’s keywords through a final sorting process where:

  1. The highest-scoring keyword is pulled forward for consideration. If it falls within the most optimal slice coverage range, it's added to the keyword cloud.
  2. The comment with the next highest-scoring keyword is then compared to the optimal slice coverage range, and so on.
  3. Since not all keywords match the first optimal slice coverage range, Glint matches keywords from the remaining comments against the next most optimal slice coverage range.

This sorting process is repeated until a target number of keywords for inclusion is reached. Through this process of isolating, scoring, and sorting, Glint’s keyword clouds contain keywords that are meaningful, commonly occurring, and unique.
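
The sorting passes described above can be sketched as a loop over coverage ranges. This is a simplified illustration under stated assumptions: the function name, the coverage ranges, the scores, and the target count are hypothetical, and the sketch works on keywords only, without tracking which comments are already represented.

```python
def select_keywords(candidates, coverage_ranges, target):
    """Pick keywords by score, one slice coverage range at a time.

    candidates: (keyword, score, coverage) tuples, where coverage is the
    fraction of the slice's comments that mention the keyword.
    coverage_ranges: (low, high) pairs, most optimal range first (assumed values).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    selected = []
    for low, high in coverage_ranges:
        for keyword, _score, coverage in ranked:
            if len(selected) >= target:
                return selected
            if keyword not in selected and low <= coverage <= high:
                selected.append(keyword)
    return selected


# Hypothetical candidates: (keyword, score, slice coverage).
candidates = [
    ("management", 9.0, 0.60),
    ("remote work", 8.0, 0.12),
    ("onboarding", 7.0, 0.03),
    ("career development", 6.0, 0.15),
]
# Hypothetical ranges: the most optimal first, then a wider fallback.
ranges = [(0.05, 0.25), (0.02, 0.40)]

cloud = select_keywords(candidates, ranges, target=3)
print(cloud)  # ['remote work', 'career development', 'onboarding']
```

In this example, “management” has the highest score but is left out because its coverage sits above both ranges, while “onboarding” is picked up only in the second, wider range.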