Azure Search: What Are Scoring Profiles?

The latest version of the Azure Search Client Library (version 0.6.5370.1398) supports scoring profiles. But what are scoring profiles anyway?

What are scoring profiles?

Scoring profiles let you configure how results are ranked, based on one or more custom-defined criteria. Azure Search supports several kinds of scoring criteria, so you can combine them into a fairly complex ranking algorithm. Specifically, your results can be boosted by:

  • the appearance of a keyword in a specific field; for example, a football match could be boosted if its name contains the keyword, compared to matches where only the description contains it
  • the magnitude of a numeric value within a range; for example, in an index of movies, if two movies both contain the keyword you are searching for, you could boost the one with the higher user rating, on the assumption that people are more likely to be looking for it than for a low-rated movie
  • the freshness of a document; in other words, a recently added document can be ranked higher than older documents that already exist in the index and contain the same keywords you are querying for
  • the location of a document; this is especially useful when documents contain geolocation data: for example, your favorite team's matches that take place closer to you could get a better score than matches of the same team that take place on the other coast

All these scoring profiles are also supported in the Azure Search Client Library.

One of the most useful things about scoring profiles is that a single profile can define several functions for boosting results and, moreover, each function used when calculating the score can have its own boost value.

How do I boost results based on specific fields?

The most common way to boost results is probably by matching keywords in specific fields. For example, if you're querying for football matches, matches whose names contain your keyword would probably be boosted over matches where only the description contains it.

Using Azure Search Client Library, this is done by instantiating a Scoring object and specifying the weight of the fields.

Here's an example:

var scoringProfile1 = new Scoring("scoreByName", SearchableEvent.GetSearchableEventFields())
{
    FunctionAggregation = FunctionAggregationTypes.Sum
};
scoringProfile1.Text.Weights["name"] = 100;

In this example, a new Scoring object is instantiated with "scoreByName" as the name of the scoring profile and with the list of fields that belong to the index. The name is required because the scoring profile is referenced by name when querying data.

Afterwards, a weight is applied to the field named "name". This specifies that when the scoring profile is used in a query, documents containing the keyword in the name field are boosted with a weight of 100 compared to documents that contain the keyword in other fields.

How do I boost results corresponding to newer documents?

Another common scenario in search systems is to boost newer documents over stale ones. In other words, if a new document is added to the index, that document can be ranked higher. In our football matches example, freshness boosting is useful in two ways:

  1. boosting a newly added document could help sell more tickets to these events sooner
  2. inverted freshness boosting: events could be boosted a few days before the match takes place, so that they are returned in better positions shortly before the event, even if their original (un-boosted) score isn't very high

Using Azure Search Client Library, freshness boosting is applied by adding a FreshnessFunction to the list of functions within a scoring profile. Continuing the previous example, it looks like this:

var function1 = new FreshnessFunction()
{
    Boost = 20,
    BoostingDuration = new TimeSpan(0, 13, 15, 18),
    FieldName = "dateadded",
    Interpolation = InterpolationTypes.Logarithmic
};
scoringProfile1.Functions = new List() { function1 };

In this example, a new FreshnessFunction is instantiated with the following properties: the boost applied to any search results that match the keywords is 20, and it is calculated from the field named "dateadded". The boost only applies for 13 hours, 15 minutes and 18 seconds after the date and time value stored in the "dateadded" field; that is the value of the BoostingDuration property, whose TimeSpan constructor arguments are days, hours, minutes and seconds.
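As a rough illustration of that time window, here is a minimal standalone C# sketch (written as a top-level program, so it assumes C# 9 or later). It does not call Azure Search or the client library: the helper name IsWithinBoostingDuration and the sample dates are made up for this example, and it only mirrors the description above rather than the service's actual scoring code.

```csharp
using System;

// Hypothetical helper: true when asOf falls inside the boosting window
// that starts at dateAdded and lasts for duration.
static bool IsWithinBoostingDuration(DateTimeOffset dateAdded, DateTimeOffset asOf, TimeSpan duration)
{
    TimeSpan age = asOf - dateAdded;
    // Assumption for this sketch: dates in the future count as outside the window.
    return age >= TimeSpan.Zero && age <= duration;
}

// TimeSpan(days, hours, minutes, seconds): 13 hours, 15 minutes and 18 seconds.
var boostingDuration = new TimeSpan(0, 13, 15, 18);
var queryTime = new DateTimeOffset(2014, 9, 15, 12, 0, 0, TimeSpan.Zero);

// A document added 2 hours before the query is still inside the window.
Console.WriteLine(IsWithinBoostingDuration(queryTime.AddHours(-2), queryTime, boostingDuration)); // True
// A document added 2 days before the query is not.
Console.WriteLine(IsWithinBoostingDuration(queryTime.AddDays(-2), queryTime, boostingDuration));  // False
```

How strongly a document inside the window is boosted depends on the Boost and Interpolation settings, which this sketch deliberately leaves out.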

How do I boost results based on geolocation?

In our football matches example, whenever users search for their favorite team's matches, matches that take place closer to their location could be boosted over matches that take place further away. This is particularly useful for mobile applications and location-aware web applications.

Using Azure Search Client Library, geolocation boosting is applied by instantiating a DistanceFunction object. Here's an example:

var function2 = new DistanceFunction()
{
    Boost = 10,
    BoostingDistance = 150,
    ReferencePointParameter = "mylocation",
    FieldName = "geolocation",
    Interpolation = InterpolationTypes.Constant,
};
scoringProfile1.Functions = new List() { function2 };

In the previous example, a DistanceFunction is used when calculating query results with the scoringProfile1 scoring profile. This function instructs the scoring calculator to boost results located within 150 km of a location that is sent with the query through a parameter called "mylocation". Because of this parameter, the DistanceFunction is a special function: it lets the score depend on user input other than the keyword. The "geolocation" value of FieldName specifies that the field containing the location of the football match is called "geolocation". Keep in mind, though, that this field must be of type GeographyPoint. (Note: with Azure Search Client Library version 0.6.5370.1398, you can save location data using the GeographyPoint model class. It exposes Latitude and Longitude properties, which saves you the trouble of serializing and deserializing geolocation data yourself.)

How do I boost results based on their rating?

In large indexes, it's common to boost search results based on a value that falls within a specific range. For example, in a movie database, a movie rated higher by viewers would be boosted over poorly rated movies (e.g. an IMDB search for "love" returns the 1969 movie "Women in Love", rated 7.8 at the time of this writing, in 3rd position, while the 2011 title "Love Birds", rated only 5.9, appears at the end of the search results page).

Boosting results based on a value within a specific range is called magnitude boosting, and it is done by using a MagnitudeFunction. Here's an example using Azure Search Client Library:

var function3 = new MagnitudeFunction()
{
    Boost = 1000,
    BoostingRangeStart = 9,
    BoostingRangeEnd = 10,
    ConstantBoostBeyondRange = false,
    FieldName = "rating",
    Interpolation = InterpolationTypes.Constant
};
scoringProfile1.Functions = new List() { function3 };

In this example, the magnitude function applies a boost of 1000 to documents whose "rating" field contains a value between 9 and 10.

Notes on scoring profile functions

Even though each of the previous examples assigns a single function to the Functions list, the Azure Search service allows you to use several (or even all) of these functions simultaneously. Moreover, there's no restriction on using the same function type more than once, as long as the field and/or dynamic parameters used within the function are different.

To use all these functions simultaneously, simply assign a list containing all of them to the Functions property, like this:

scoringProfile1.Functions = new List() { function1, function2, function3 };

Keep in mind, though, that the boost applied to a field containing a keyword is not considered a function, for a few reasons:

  • first, functions support the notion of Interpolation which, as the Azure Search REST API explains it, is a way to 'define the slope for which the score boosting increases from the start of the range to the end of the range'. This notion cannot be applied to text keyword boosting, because a field either contains a specific keyword or it doesn't
  • second, when several functions are used within a scoring profile, their results are aggregated to get the final result. As you'll see next, there's no point in aggregating these functions with the text boost, because documents which don't contain the keyword won't be returned in the search results (or, if no keyword is used, then the text boost won't be used at all, unlike functions, which are, or at least can be, still valid for empty queries)

When you specify more than one function within a scoring profile, these functions are aggregated to get the final result score. By default, Azure Search aggregates them by summing their individual results. However, you can instruct the score calculator to use other aggregation mechanisms:

  • Maximum: only the highest score returned by any single function is used, whatever that function's type is
  • Minimum: the exact opposite of the previous aggregation type; only the lowest score returned by any single function is used
  • Average: rather than summing the scores, their average is calculated and used as the end result; this is useful when you want to lower a result's score if it doesn't match all the functions defined within the scoring profile
  • First matching: the first function in the scoring profile that applies to the document is used for calculating the end result; this is similar to a greedy algorithm and has the best performance, but might return unexpected search results
  • Sum: the default aggregation type; adds up the scores returned by all the functions and uses the sum as the end result
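To make the difference between these aggregation types concrete, here is a small standalone C# sketch (a top-level program, so it assumes C# 9 or later). It is only an illustration of the descriptions above: the Aggregate helper, the mode names passed as strings and the sample scores are hypothetical and are not part of Azure Search or the client library. The list is assumed to hold the scores of the functions that applied to a document, in the order in which they were declared.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: combines per-function scores according to the
// aggregation types described above.
static double Aggregate(string aggregationType, IReadOnlyList<double> functionScores)
{
    if (functionScores.Count == 0)
        throw new ArgumentException("At least one function score is required.", nameof(functionScores));

    switch (aggregationType)
    {
        case "sum": return functionScores.Sum();
        case "average": return functionScores.Average();
        case "minimum": return functionScores.Min();
        case "maximum": return functionScores.Max();
        // Assumption: the list only holds functions that applied, in declaration order.
        case "firstMatching": return functionScores[0];
        default: throw new ArgumentException("Unknown aggregation type: " + aggregationType, nameof(aggregationType));
    }
}

// Sample scores reusing the boost values from the earlier examples.
var scores = new List<double> { 20, 10, 1000 };
foreach (var aggregation in new[] { "sum", "average", "minimum", "maximum", "firstMatching" })
{
    Console.WriteLine(aggregation + ": " + Aggregate(aggregation, scores));
}
```

With these sample values, "sum" yields 1030 while "firstMatching" yields 20, which shows how much the choice of aggregation type can change the final ranking.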

What happens if I don't use a scoring profile?

If no scoring profile is used, Azure Search uses a model based on term frequency-inverse document frequency (tf-idf for short), which, according to Wikipedia, is 'a numerical statistic that is intended to reflect how important a word is to a document in a collection'. More specifically, Azure Search currently uses Lucene's implementation of an algebraic model called Vector Space Model.

In other words, it checks how frequent a given word is across the index (global frequency) and within the field (local frequency), and thus determines how special a given word is. From this result, Azure Search derives a specific value.

The implications of this model are:

  1. Hits of rare terms (low global frequency) will have higher scores than hits with terms that show up all over the index
  2. The more often a specific term shows up in a field (high local frequency), the higher the score for a hit to that term (within limits, however)
  3. Length normalization: if a field contains two terms and one of them is a hit, it will score better than a hit on the same term in a field that contains more terms (say, 10).

The results of these calculations are then summed up to produce the score you get when you query for documents without using any scoring profile.
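The three implications above can be reproduced with a toy calculation. The following standalone C# sketch (a top-level program, assuming C# 9 or later) uses the factors of Lucene's classic TF-IDF similarity in simplified form: the square root of the term frequency, an inverse document frequency of 1 + ln(N / (df + 1)), and a length norm of 1 / sqrt(field length). The Score helper and all the sample numbers are hypothetical, and the sketch leaves out the other factors of a real query, so it should not be read as the exact computation the Azure Search service performs.

```csharp
using System;

// Hypothetical helper: simplified single-term score built from classic
// Lucene-style TF-IDF factors.
static double Score(int termFrequency, int fieldLength, int documentFrequency, int documentCount)
{
    // Local frequency, dampened so that repeated hits help less and less.
    double tf = Math.Sqrt(termFrequency);
    // Global rarity: terms found in fewer documents get a larger value.
    double idf = 1.0 + Math.Log((double)documentCount / (documentFrequency + 1));
    // Length normalization: a hit in a short field counts for more.
    double lengthNorm = 1.0 / Math.Sqrt(fieldLength);
    return tf * idf * idf * lengthNorm;
}

// 1. A rare term scores higher than a common one.
Console.WriteLine(Score(1, 10, 5, 1000) > Score(1, 10, 500, 1000));  // True
// 2. More occurrences in the field score higher.
Console.WriteLine(Score(3, 10, 50, 1000) > Score(1, 10, 50, 1000));  // True
// 3. The same hit in a 2-term field beats the hit in a 10-term field.
Console.WriteLine(Score(1, 2, 50, 1000) > Score(1, 10, 50, 1000));   // True
```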

How do I use a scoring profile when I query my index?

Using Azure Search Client Library, when you query an index you simply specify the scoring profile's name in the ScoringProfile property of the QueryParameters object. If the scoring profile requires parameters, you also have to pass a Dictionary<string,string> object in which each key is a parameter name and each value is the corresponding parameter value. Here's an example:

var scoringParams = new Dictionary<string, string>();
scoringParams.Add("mylocation", "-122.3358423,47.6148481");
var result = await _azureSearchService.Indexes[searchIndex].QueryAsync(new QueryParameters()
    {
        QueryText = searchText,
        ScoringProfile = searchScoringProfile,
        ScoringParameters = scoringParams        
    });

Note: using Azure Search Client Library version 0.6.5370.1398, when you send a geolocation value as a scoring parameter, keep in mind that:

  • the position is serialized in LONGITUDE, LATITUDE order, as required by the Azure Search service; in a future release (date TBD), you'll also be able to use the GeographyPoint data type to get the serialization done out-of-the-box
  • the longitude and latitude values of a coordinate must be separated by a comma
  • decimals use the dot as the decimal separator, as in English

As an additional note, also keep in mind that all scoring parameters defined within a scoring profile must be sent with the query when using that scoring profile. There's currently no way of specifying a default scoring parameter value.
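Since the geolocation parameter is a plain string, it is easy to get the order or the decimal separator wrong, especially on machines whose regional settings use a comma for decimals. Here is a minimal standalone C# sketch (a top-level program, assuming C# 9 or later) of one way to build that string; the ToGeoScoringParameter helper is hypothetical and not part of the client library.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: formats a coordinate as "longitude,latitude",
// using the invariant culture so decimals are always written with a dot.
static string ToGeoScoringParameter(double latitude, double longitude)
{
    return string.Format(CultureInfo.InvariantCulture, "{0},{1}", longitude, latitude);
}

// Same coordinate as in the query example above (latitude 47.6148481, longitude -122.3358423).
Console.WriteLine(ToGeoScoringParameter(47.6148481, -122.3358423)); // prints -122.3358423,47.6148481
```

The resulting string can then be added to the scoring parameters dictionary under the parameter name defined in the scoring profile, "mylocation" in this article.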

Credits

This article is based on the original blog post from Sep 15th, 2014 on http://alexmang.ro. The blog post is available at http://alexmang.ro/archives/1541.