Introducing Search Schema for SharePoint 2013
Audience: Search Admin/ITPro
What is the search schema?
The search schema is a mechanism in SharePoint search that controls:
- Which aspects (properties) of items are to be indexed
- How and with what indexing structures particular pieces of data are indexed
- Which aliases can be used when querying using property restrictions
What does this really mean? It means that the search schema controls what can be searched for, how it can be searched for, and how the results can be presented on your search sites.
Main schema concepts
The most important concepts in search schema are crawled properties and managed properties.
The typical data flow is illustrated by this figure:
A crawl component reads items from the content database (or other content sources) and sends documents and metadata to the content processing component in the form of crawled properties. A lot then happens inside the content processing component: the text of the item is extracted, its language is detected, spell-checking is applied, and so on. In addition, the content processing component maps crawled properties to managed properties and outputs a set of name-value pairs of managed properties.
For example, this process includes picking the best crawled property value to index as the author of a document. For a PowerPoint document, the author is usually stored inside the document itself; in other cases, it is found in other crawled properties. This process is defined by the built-in search schema through so-called mappings between crawled properties and managed properties. Some mappings are prioritized, meaning that only the first crawled property that actually has a value is picked. Others are not prioritized, in which case the values of all the mapped crawled properties are mapped to the managed property and thus indexed.
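To make the difference between prioritized and non-prioritized mappings concrete, here is a minimal Python sketch of the mapping step. It is a toy model, not SharePoint's implementation: `ManagedProperty` and `map_item` are hypothetical names, and the crawled property names in the usage are made up for illustration. A crawled property is treated as "having a value" when it is present and not an empty string.

```python
from dataclasses import dataclass


@dataclass
class ManagedProperty:
    """Hypothetical model of a managed property and its mappings."""
    name: str
    mapped_crawled: list[str]   # crawled property names, in priority order
    prioritized: bool = True


def map_item(crawled: dict, schema: list[ManagedProperty]) -> dict:
    """Turn one item's crawled properties into managed-property values."""
    output = {}
    for mp in schema:
        # Keep only mapped crawled properties that are present and non-empty.
        values = [crawled[cp] for cp in mp.mapped_crawled
                  if crawled.get(cp) not in (None, "")]
        if not values:
            continue
        # Prioritized: the first crawled property with a value wins.
        # Non-prioritized: every mapped value is indexed.
        output[mp.name] = values[:1] if mp.prioritized else values
    return output


schema = [
    ManagedProperty("Author", ["DocAuthorFromFile", "CreatedBy"], prioritized=True),
    ManagedProperty("Keywords", ["DocKeywords", "ListTags"], prioritized=False),
]
item = {"DocAuthorFromFile": "", "CreatedBy": "Alice",
        "DocKeywords": "rose", "ListTags": "flower"}
print(map_item(item, schema))
# {'Author': ['Alice'], 'Keywords': ['rose', 'flower']}
```

Because the embedded author is empty here, the prioritized Author mapping falls through to the next crawled property, while the non-prioritized Keywords mapping indexes both values.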
Who can modify the search schema?
The search schema contains many system-defined crawled and managed properties, as well as the mappings between them, that SharePoint Search requires to work properly. However, as an on-premises or SPO customer, you also have the option to tweak the search schema to your liking.
With FAST Search for SharePoint 2010 and SharePoint Search 2010, only users with Central Administrator rights could alter the built-in search schema. In SharePoint 2013, tenant administrators and site collection administrators/owners can also modify their search schemas.
Levels of schemas in SharePoint for on-premises installations
For SharePoint on-premises installations there are two levels that matter for search schemas: search service application (SSA) level and site collection level. This is illustrated in the schema hierarchy shown below:
When SharePoint 2013 is first installed, there is only one search schema, which is accessible at the Central Administration level (also called the "SSA level", since the central administrator owns the search service application). As before, the central administrator is free to make a number of changes at this level.
Site collection owners, in turn, may choose to override the search schema that they otherwise inherit from the SSA level. This structure lets companies use the search schema to define cross-company standards that enrich company-wide searches, while still giving site collections flexibility.
Levels of schemas in SharePoint Online
For SharePoint Online (SPO), there are three levels that matter for search schemas: the search service application (SSA) level, the tenant level and the site collection level. This is illustrated in the schema hierarchy shown below:
In SPO, the SSA level is predefined (and identical to the default on-premises SSA-level search schema). This schema may only be changed by Microsoft. However, new in SharePoint 2013, tenant administrators can modify this centrally defined schema for their tenant. Each tenant inherits from this schema, and its tenant administrator may choose to override it with various types of changes (described further below).
In a similar fashion, site collection owners may choose to override the search schema that they otherwise inherit from the tenant level. This structure gives tenants the opportunity to use the search schema for cross-company standards while still giving site collections flexibility.
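The inheritance described for both on-premises and SPO can be modeled as a chain of lookups from the most specific level to the most general. The Python sketch below is a simplified, hypothetical model: it assumes an override replaces a managed property's whole mapping list, and the property names and mappings are illustrative rather than taken from the real default schema.

```python
from collections import ChainMap

# Illustrative mappings: managed property -> crawled properties in priority order.
# These names are examples, not the real default schema.
SSA_LEVEL = {
    "Title": ["TitleFromFile", "ows_Title"],
    "Author": ["CreatedBy"],
}
TENANT_LEVEL = {"Author": ["DocAuthor", "CreatedBy"]}            # SPO only
SITE_COLLECTION_LEVEL = {"Title": ["ows_LatinName", "ows_Title"]}


def effective_schema(*levels):
    """Resolve the schema one site collection sees.

    `levels` is ordered from most specific (site collection) to most general
    (SSA). ChainMap returns the first definition it finds, so a lower level's
    override hides what it would otherwise inherit.
    """
    return dict(ChainMap(*levels))


# On-premises: site collection -> SSA
on_prem = effective_schema(SITE_COLLECTION_LEVEL, SSA_LEVEL)
# SharePoint Online: site collection -> tenant -> SSA
spo = effective_schema(SITE_COLLECTION_LEVEL, TENANT_LEVEL, SSA_LEVEL)
print(on_prem["Author"])  # ['CreatedBy']
print(spo["Author"])      # ['DocAuthor', 'CreatedBy']
```

A site collection that overrides nothing simply sees the inherited definitions, which is what makes the SSA or tenant level a usable place for company-wide standards.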
What can you do with the search schema?
Central administrators have the most freedom to change the search schema. Apart from properties that are system-defined or read-only because SharePoint 2013 requires them to work properly, they can add, change and delete any crawled property, managed property or mapping.
Tenant administrators as well as site collection owners have the ability to:
- Create new managed properties of type string that are not refinable or sortable
- Create new managed properties of type Yes/No
- Override existing mappings to managed properties
- Create aliases for managed properties
The combination of the last two operations is powerful. For example, making an existing managed property refinable is not something you are allowed to do at the site collection level. Instead, you can:
- Look up which crawled properties are mapped to the managed property.
- Map those crawled properties to a refinable string property that is not already in use.
- Create an alias for the refinable string property (either with the same name as the other managed property to hide it completely, or with a new name).
For the second step, use one of the unused managed properties provided in the default schema.
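The three-step workaround can be sketched as operations on a toy, in-memory schema. This is not a SharePoint API: the dictionary layout and the functions `make_refinable_via_alias` and `resolve` are hypothetical, `Department`/`ows_Department` are made-up names, and `RefinableString00` is assumed here as the name of an unused refinable managed property.

```python
# Hypothetical in-memory schema: name -> {"mappings": [...], "aliases": [...]}.
# "Department" and "ows_Department" are made-up names; "RefinableString00"
# stands in for one of the default unused refinable properties (assumption).
schema = {
    "Department": {"mappings": ["ows_Department"], "aliases": []},
    "RefinableString00": {"mappings": [], "aliases": []},
}


def make_refinable_via_alias(schema, target, unused_refinable, alias=None):
    # Step 1: look up the crawled properties mapped to the target property.
    crawled = list(schema[target]["mappings"])
    # Step 2: map them to a refinable property that is not already in use.
    if schema[unused_refinable]["mappings"]:
        raise ValueError(f"{unused_refinable} is already in use")
    schema[unused_refinable]["mappings"] = crawled
    # Step 3: add an alias, by default the target's own name to hide the target.
    schema[unused_refinable]["aliases"].append(alias or target)


def resolve(schema, name):
    # Following the description above, an alias that reuses an existing
    # property name hides that property, so aliases are checked first.
    for prop, definition in schema.items():
        if name in definition["aliases"]:
            return prop
    return name if name in schema else None


make_refinable_via_alias(schema, "Department", "RefinableString00")
print(resolve(schema, "Department"))  # RefinableString00
```

Passing a different `alias` corresponds to the "new name" option: the original managed property then stays visible, and the refinable copy is reachable under its own alias.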
Mapping a different crawled property to "title"
That is more than enough theory for one blog post, so let's move on to an example: we would like our site collection search page to use something else as the title of documents. This is just for fun and is not considered best practice in any way.
We will go through the following steps to accomplish this:
- Create and populate a list in your site collection
- Look at your existing site collection search results page
- Modify the site collection schema
- Refeed site collection data
- Look at the changes for the search results
Step 1: Create a site collection with a list
Prerequisites: You need a site collection to perform this step. If you do not have one, log in as central administrator or tenant administrator and create a site collection (from Central Administration on-premises, or from the SharePoint admin center in SPO). For this example we will create a site collection under /sites/botanical, using the "Team Site" template.
Next, we log in as the owner of the site collection and create a list by clicking on:
Site contents -> Add an App -> Custom list
We call the list "Flowers" and add a custom column of type string called LatinName. Then we populate the list with a few items so that it looks something like this:
Step 2: Find your site collection search results page
By default, every site has a search box. It appears in the upper right corner of your site like this:
When searching, a result page such as this one appears:
Notice that there are three refiners on the left-hand side: result type, author and modified date. For each result in the result list there is a header with a larger font, then an extract of the text of the item, with the search keyword(s) highlighted, followed by the URL.
If you cannot see the list items from step 1 in your search results, you probably need to wait for a crawl to pick them up and index them. On most systems this will happen automatically within the next 30 minutes. Do not let that stop you; move on to the next step anyway. If you are a central administrator, you can trigger a crawl from Central Administration -> Manage Service Applications -> Search Service Application -> Content Sources -> Local SharePoint sites -> Start Full Crawl.
Step 3: Change schema mappings
To change the search schema, open Site Settings from the menu on the upper right:
Then select Search Schema under the Site Collection Administration heading:
This now displays the list of available managed properties on this site collection:
Next, click the managed property we are about to alter, "Title", to edit it. You will see all the attributes of Title, many of which are read-only at the site collection level. Scroll down to the mappings section of the page:
Here a number of crawled properties are mapped in prioritized order. The first of these crawled properties that has a value is indexed as the value of Title, and that value is shown in our search results.
Next, click "add a mapping" and select ows_LatinName from the list of crawled properties. You can type part of the name to avoid scrolling through the whole long list:
"ows_LatinName" is the crawled property that corresponds to the LatinName column we just created in the Flowers list. Press OK and you are back to editing the Title managed property.
Now press the "Move Up" button as many times as needed to put this crawled property at the top of the list, giving it the highest priority.
Then press "OK" at the end of the page to store the changes to the Title managed property.
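The effect of adding a mapping and pressing "Move Up" can be simulated in a few lines of Python. This sketch assumes that a newly added mapping is placed at the bottom of the list (which is why several presses of "Move Up" are needed); the existing Title mappings shown are illustrative stand-ins, and `move_up` and `prioritized_value` are hypothetical helpers.

```python
def move_up(mappings, crawled):
    """Swap `crawled` with the entry above it, like one press of "Move Up"."""
    i = mappings.index(crawled)
    if i > 0:
        mappings[i - 1], mappings[i] = mappings[i], mappings[i - 1]


def prioritized_value(item, mappings):
    """Return the value of the first mapped crawled property that has one."""
    for crawled in mappings:
        if item.get(crawled):
            return item[crawled]
    return None


# Illustrative stand-ins for the existing Title mappings.
title_mappings = ["TitleFromFile", "ows_Title", "DocTitle"]
flower = {"ows_Title": "Rose", "ows_LatinName": "Rosa"}
flowers_list = {"ows_Title": "Flowers"}  # the list itself has no LatinName

before = prioritized_value(flower, title_mappings)

title_mappings.append("ows_LatinName")   # assumed: a new mapping is added last
presses = 0
while title_mappings[0] != "ows_LatinName":
    move_up(title_mappings, "ows_LatinName")
    presses += 1

print(before, prioritized_value(flower, title_mappings), presses)  # Rose Rosa 3
print(prioritized_value(flowers_list, title_mappings))             # Flowers
```

Note the fallback: an item with no LatinName value, such as the list itself, still gets its title from the next mapping in priority order.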
Congratulations! You have now created a customized schema for this site collection. This was not possible in SharePoint 2010.
Step 4: Refeed some data
Changing the schema mappings alone has not changed anything in the search index yet. This is because mappings are applied during content processing, which only happens when items are processed as part of a crawl. To see the effect of our change, we need to make sure the list items are recrawled and reindexed. We can force a recrawl of our list like this:
- Go to the advanced settings for the Flowers list (open the list, then click List -> List Settings -> Advanced settings).
- Scroll down and press the reindex button ("Reindex List" for a list, "Reindex Document Library" for a document library).
This marks the whole list to be picked up by a continuous crawl, regardless of whether the items have actually changed. The next time a continuous crawl checks your list for changes, it will refeed all of its items.
Warning: If your system is not set up to perform continuous crawls, you need to wait for the next crawl scheduled by your administrator. Alternatively, if you are a central administrator, you can trigger a crawl from Central Administration -> Manage Service Applications -> Search Service Application -> Content Sources -> Local SharePoint sites -> Start Full Crawl.
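The reason a reindex is needed can be captured in a small toy model. This is a hypothetical sketch, not how the SharePoint index actually stores data: mappings are applied when an item is processed, so a schema change only affects items that a later crawl sends through content processing again. `ToyIndex`, `process` and `reindex_list` are illustrative names.

```python
from dataclasses import dataclass, field


def process(crawled, title_mappings):
    """Toy content processing: pick Title from prioritized mappings."""
    for name in title_mappings:
        if crawled.get(name):
            return crawled[name]
    return None


@dataclass
class ToyIndex:
    source_items: dict                            # item id -> crawled properties
    titles: dict = field(default_factory=dict)    # what the index currently holds
    marked: set = field(default_factory=set)      # items flagged for reindexing

    def reindex_list(self):
        # Like the reindex button: flag every item, changed or not.
        self.marked.update(self.source_items)

    def crawl(self, title_mappings, changed_ids=()):
        # Only changed or flagged items are re-sent to content processing,
        # so only they pick up the current schema.
        for item_id in set(changed_ids) | self.marked:
            self.titles[item_id] = process(self.source_items[item_id], title_mappings)
        self.marked.clear()


index = ToyIndex({1: {"ows_Title": "Rose", "ows_LatinName": "Rosa"}})
old_schema = ["ows_Title"]
new_schema = ["ows_LatinName", "ows_Title"]

index.crawl(old_schema, changed_ids=[1])  # initial crawl
index.crawl(new_schema)                   # schema changed, item did not
print(index.titles[1])                    # Rose
index.reindex_list()
index.crawl(new_schema)
print(index.titles[1])                    # Rosa
```

The second crawl runs with the new schema but leaves the old title in place, which is exactly the situation the reindex button resolves.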
Step 5: Wait for a crawl
Crawls are not instant, so we need to wait until a crawl has read our list and resubmitted the items to the content processing component. Only then will the index reflect our schema changes. If you are testing this with central admin rights, you can check the status of the crawl on the "Manage Content Sources" page in the management pages for the search service application.
Step 6: Look at the new search results
Now that documents have been recrawled and reprocessed according to our new schema, we can look at the new search results. Performing a test search in the site collection search box gives us a result set such as the one below:
As you can see, the second item in this list of results is the item from our list (the first is the list itself). For this second item, the title is now the Latin name from our list rather than the item's title. Here is how it used to look:
This is certainly a subtle change, but it demonstrates that we have created a custom schema that changes how items are indexed and how results are displayed. The custom schema we created applies only to a single site collection, so we have demonstrated the ability to have multiple search schemas in SharePoint 2013! In a later post we might even use this capability to do something really useful for search.