Entity Linking cognitive skill (v3)

Article
08/28/2024

The Entity Linking skill (v3) returns a list of recognized entities with links to articles in a well-known knowledge base (Wikipedia).

Note

This skill is bound to the Entity Linking machine learning models in Azure AI Language and requires a billable resource for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing Azure AI services pay-as-you-go price.

@odata.type

Microsoft.Skills.Text.V3.EntityLinkingSkill

Data limits

The maximum size of a record should be 50,000 characters as measured by String.Length. If you need to break up your data before sending it to the Entity Linking skill, consider using the Text Split skill. If you do use the Text Split skill, set the page length to 5000 for the best performance.
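The size check described above can be sketched in Python. This is only an illustration of the limits, not the Text Split skill itself: `utf16_length`, `split_into_pages`, and `prepare_record` are hypothetical helper names, and the sketch assumes that counting UTF-16 code units is an adequate stand-in for .NET's String.Length.

```python
MAX_RECORD_CHARS = 50_000  # record limit stated in this article
PAGE_LENGTH = 5_000        # page length suggested in this article


def utf16_length(text):
    """Length in UTF-16 code units, which is what .NET String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def split_into_pages(text, page_length=PAGE_LENGTH):
    """Naive fixed-size split on code points (hypothetical helper).

    Unlike a real splitter, this can cut a word or entity in half
    at a page boundary.
    """
    return [text[i:i + page_length] for i in range(0, len(text), page_length)]


def prepare_record(text):
    """Return the text unchanged if it fits the limit, else a list of pages."""
    if utf16_length(text) <= MAX_RECORD_CHARS:
        return [text]
    return split_into_pages(text)
```

In an indexing pipeline the Text Split skill would normally do this work. The sketch is mainly useful for checking documents against the limit before they reach the skillset.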

Skill parameters

Parameter names are case-sensitive and are all optional.

Parameter name	Description
`defaultLanguageCode`	Language code of the input text. If the default language code is not specified, English (en) will be used as the default language code. See the full list of supported languages.
`minimumPrecision`	A value between 0 and 1. If the confidence score (in the `entities` output) is lower than this value, the entity is not returned. The default is 0.
`modelVersion`	Specifies the version of the model to use when calling entity linking. Defaults to the latest available version when not specified. We recommend that you don't specify this value unless it's necessary.
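To reason about the effect of `minimumPrecision`, the following Python sketch applies a comparable threshold to an `entities` array on the client side. The skill applies its own filtering internally. The function name `filter_by_precision` is hypothetical, and the rule used here (drop matches below the threshold, then drop entities with no matches left) is an assumption about how the threshold interacts with per-match scores.

```python
def filter_by_precision(entities, minimum_precision=0.0):
    """Keep only matches whose confidenceScore meets the threshold.

    Assumption: an entity with no remaining matches is dropped entirely.
    """
    if not 0 <= minimum_precision <= 1:
        raise ValueError("minimum_precision must be between 0 and 1")

    kept = []
    for entity in entities:
        matches = [
            m for m in entity.get("matches", [])
            if m["confidenceScore"] >= minimum_precision
        ]
        if matches:
            # Copy the entity so the caller's data is not mutated.
            kept.append({**entity, "matches": matches})
    return kept
```

With the default of 0, every entity passes. Raising the value trades recall for fewer low-confidence links.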

Skill inputs

Input name	Description
`languageCode`	A string indicating the language of the records. If this parameter is not specified, the default language code will be used to analyze the records. See the full list of supported languages.
`text`	The text to analyze.

Skill outputs

Output name	Description
`entities`	An array of complex types. Each item contains the following fields:

`"name"` (The actual entity name as it appears in the text)
`"id"`
`"language"` (The language of the text as determined by the skill)
`"url"` (The linked URL for this entity)
`"bingId"` (The bingId for this linked entity)
`"dataSource"` (The data source associated with the URL)
`"matches"` (An array of complex types that contains: `text`, `offset`, `length`, and `confidenceScore`)

Sample definition

{
    "@odata.type": "#Microsoft.Skills.Text.V3.EntityLinkingSkill",
    "context": "/document",
    "defaultLanguageCode": "en", 
    "minimumPrecision": 0.5, 
    "inputs": [
        {
            "name": "text", 
            "source": "/document/content"
        },
        {
            "name": "languageCode", 
            "source": "/document/language"
        }
    ],
    "outputs": [
        {
            "name": "entities", 
            "targetName": "entities" 
        }
    ]
}
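If skill definitions are generated programmatically, a small builder can keep the optional parameters consistent with the rules above. The following Python sketch is an illustration only. `build_entity_linking_skill` is a hypothetical helper, and the input sources and target name simply mirror the sample definition.

```python
import json


def build_entity_linking_skill(default_language_code=None,
                               minimum_precision=None,
                               model_version=None):
    """Return a skill definition dict; optional parameters are omitted when None."""
    skill = {
        "@odata.type": "#Microsoft.Skills.Text.V3.EntityLinkingSkill",
        "context": "/document",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            {"name": "languageCode", "source": "/document/language"},
        ],
        "outputs": [{"name": "entities", "targetName": "entities"}],
    }
    if default_language_code is not None:
        skill["defaultLanguageCode"] = default_language_code
    if minimum_precision is not None:
        # The article states the value must lie between 0 and 1.
        if not 0 <= minimum_precision <= 1:
            raise ValueError("minimumPrecision must be between 0 and 1")
        skill["minimumPrecision"] = minimum_precision
    if model_version is not None:
        skill["modelVersion"] = model_version
    return skill


if __name__ == "__main__":
    print(json.dumps(build_entity_linking_skill("en", 0.5), indent=2))
```

Leaving `modelVersion` unset follows the recommendation to let the skill default to the latest available model.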

Sample input

{
    "values": [
      {
        "recordId": "1",
        "data":
           {
             "text": "Microsoft is liked by many.",
             "languageCode": "en"
           }
      }
    ]
}

Sample output

{
  "values": [
    {
      "recordId": "1",
      "data" : 
      {
        "entities": [
          {
            "name": "Microsoft", 
            "id": "Microsoft",
            "language": "en", 
            "url": "https://en.wikipedia.org/wiki/Microsoft", 
            "bingId": "a093e9b9-90f5-a3d5-c4b8-5855e1b01f85", 
            "dataSource": "Wikipedia", 
            "matches": [
                {
                    "text": "Microsoft", 
                    "offset": 0, 
                    "length": 9, 
                    "confidenceScore": 0.13 
                }
            ]
          }
        ]
      }
    }
  ]
}

The offsets returned for entities in the output of this skill come directly from the Language service APIs. If you use them to index into the original string, use the StringInfo class in .NET to extract the correct content. For more information, see Multilingual and emoji support in Language service features.
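Outside .NET, one cautious way to use the offsets is to compare each extracted slice against the `text` field that the skill returns for the same match. The Python sketch below does this with a hypothetical helper, `extract_matches`. Python strings index by Unicode code point, which is an assumption that can disagree with the service's offsets when the text contains emoji or combining sequences. The consistency flag is there to detect that case, not to correct it.

```python
def extract_matches(text, entities):
    """Slice each match out of the text and check it against the returned match text.

    Returns a list of (entity_name, extracted_text, is_consistent) tuples.
    """
    results = []
    for entity in entities:
        for match in entity.get("matches", []):
            start = match["offset"]
            end = start + match["length"]
            extracted = text[start:end]  # code-point slicing (see assumption above)
            results.append((entity["name"], extracted, extracted == match["text"]))
    return results
```

When `is_consistent` is false, the safer choice is to rely on the `text` value from the match and not on the slice.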

Warning cases

If the language code for the document is unsupported, a warning is returned and no entities are extracted.
