Cloud Search Service Application: Removing items from the Office 365 Search Index.

Microsoft announced the preview of cloud hybrid search at the Microsoft Ignite conference in Chicago in May 2015; a recording of the session is available on Channel 9. After the announcement, many Office 365 customers deployed cloud hybrid search in test and proof-of-concept environments. When SharePoint Server 2016 was released to market in March 2016, the feature went into mainstream support and the number of customers deploying it in production started to rise.

Feedback on the feature was, on the whole, very good: the process of indexing on-premises content into the Office 365 search index was robust and reasonably well understood by customers. One area was still considered a little murky, however: how to get items out of the Office 365 search index again. This post aims to clarify that and to draw attention to a feature update added in the April 2016 CU for SharePoint 2013, which will also ship in the June 2016 update for SharePoint 2016.

In SharePoint 2013 and SharePoint 2016, items are deleted from the on-premises search index when they are deleted from the content being indexed, when the admin removes a start address from a content source, or when the admin removes a content source completely. The deletion happens in different ways, and SharePoint uses crawl policies to govern the process. See the Microsoft documentation on crawl policies for details.

· When a SharePoint item is flagged in the change log as deleted, the crawler signals that deletion during a crawl, which ultimately leads to the item being removed from the search index.

· When a non-SharePoint item is deleted, for example an item in a file share, the next crawl of that content picks it up as an item not found, and it is eventually removed from the search index.

· When a start address is removed from a content source, or an entire content source is removed, a different process is triggered: a delete crawl. The delete crawl systematically removes all items from the search index that fall under the start address(es) being removed.

· Finally, an index reset can remove items from the search index, but this approach is non-selective: it results in a complete purge of the indexed items and, importantly, of the crawl history in the crawl databases.
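
The third path above, the delete crawl, can also be set off from PowerShell rather than from Central Administration. The following is a minimal sketch, assuming it runs on a farm server with the SharePoint snap-in loaded (for example in the SharePoint Management Shell). The function name `Remove-CrawlStartAddress` and the names in the usage comment are hypothetical placeholders, not values from this post.

```powershell
# Sketch only: removes one start address from a content source so that
# SharePoint runs a delete crawl for the items beneath that address.
# Remove-CrawlStartAddress is a hypothetical helper name.
# Assumes the SharePoint snap-in is loaded (Add-PSSnapin Microsoft.SharePoint.PowerShell).
function Remove-CrawlStartAddress {
    param(
        [Parameter(Mandatory=$true)] [string] $SearchApplicationName,
        [Parameter(Mandatory=$true)] [string] $ContentSourceName,
        [Parameter(Mandatory=$true)] [Uri]    $StartAddress
    )

    # Name the Search Service Application explicitly: a hybrid farm may host
    # both an on-premises SSA and a cloud SSA.
    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity $SearchApplicationName
    $source = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $ContentSourceName

    if (-not $source.StartAddresses.Exists($StartAddress)) {
        Write-Warning "Start address $StartAddress is not part of content source '$ContentSourceName'."
        return
    }

    # Removing the address and saving the content source is the change that
    # leads to the delete crawl described above.
    $source.StartAddresses.Remove($StartAddress)
    $source.Update()
}

# Example call (all three values are placeholders):
# Remove-CrawlStartAddress -SearchApplicationName "Cloud SSA" `
#     -ContentSourceName "File shares" -StartAddress "file://fileserver/share"
```

Removing a single start address keeps the rest of the content source intact, which makes it the more selective of the two delete-crawl triggers.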

Just like an on-premises-only Search Service Application, the Cloud Search Service Application sends signals to remove items, in this case to the Office 365 search index. The fourth process above, index reset, is however a very different animal in the Cloud Search Service Application. If an admin selects index reset in the Cloud Search Service Application, the crawl history is purged from the crawl databases but no signal is sent to Office 365 to purge the items from the Office 365 search index. This leaves orphaned items in the index with no effective means of removal, or at least it did until the April 2016 cumulative update for SharePoint Server 2013.

Strictly speaking, there were two ways to remove the orphaned items, but neither was effective. The first was to re-index everything on premises and, once indexing completed, delete the content sources to trigger a delete crawl. Re-indexing everything is not efficient: it takes time, and if items have since been deleted from the on-premises content you still risk leaving orphans in the Office 365 search index. The second was to raise a ticket with Microsoft Office 365 support and ask for an index purge, which also takes time and is inefficient for the task at hand.

The message here from Microsoft is please, please do not ever click index reset on a cloud SSA. In fact, a new warning has been added to the index reset function for this exact reason.

[Screenshot: hybridcloudssa]

So, what has changed in the April 2016 CU to give us control over this capability? First, a new method has been added to PushTenantManager, a component of the Cloud Search Service Application. The new method is DeleteAllCloudHybridSearchContent, whose name speaks for itself.

Microsoft has helped us out even further: as well as implementing the method, it also provides a convenient script to help us use it.

---------------------------------------------------------------------------------
<#
.SYNOPSIS
    Issue a call to SPO to delete all external content indexed through Cloud hybrid search. This operation is asynchronous.
.PARAMETER PortalUrl
    SharePoint Online portal URL, for example 'https://contoso.sharepoint.com'.
.PARAMETER Credential
    Logon credential for tenant admin. Will prompt for credential if not specified.
#>
param(
    [Parameter(Mandatory=$true, HelpMessage="SharePoint Online portal URL, for example https://contoso.sharepoint.com.")]
    [ValidateNotNullOrEmpty()]
    [String] $PortalUrl,
    [Parameter(Mandatory=$false, HelpMessage="Logon credential for tenant admin. Will be prompted if not specified.")]
    [PSCredential] $Credential
)
# Detect the installed SharePoint version from the registry:
# try 15.0 (SharePoint 2013) first, then fall back to 16.0 (SharePoint 2016).
$SP_VERSION = "15"
$regKey = Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Office Server\15.0\Search" -ErrorAction SilentlyContinue
if ($null -eq $regKey) {
    $regKey = Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Office Server\16.0\Search" -ErrorAction SilentlyContinue
    if ($null -eq $regKey) {
        throw "Unable to detect SharePoint installation."
    }
    $SP_VERSION = "16"
}
# Load the client-side object model (CSOM) assemblies that match the detected version.
Add-Type -AssemblyName ("Microsoft.SharePoint.Client, Version=$SP_VERSION.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c")
Add-Type -AssemblyName ("Microsoft.SharePoint.Client.Search, Version=$SP_VERSION.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c")
Add-Type -AssemblyName ("Microsoft.SharePoint.Client.Runtime, Version=$SP_VERSION.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c")
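# Prompt for the tenant admin credential if none was supplied.
if ($null -eq $Credential)
{
    $Credential = Get-Credential -Message "SPO tenant admin credential"
}
# Connect to SharePoint Online with the supplied credential.
$context = New-Object Microsoft.SharePoint.Client.ClientContext($PortalUrl)
$spocred = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Credential.UserName, $Credential.Password)
$context.Credentials = $spocred
# Queue the delete request; nothing is sent to SPO until ExecuteQuery() runs.
$manager = New-Object Microsoft.SharePoint.Client.Search.ContentPush.PushTenantManager $context
$task = $manager.DeleteAllCloudHybridSearchContent()
$context.ExecuteQuery()
# The operation is asynchronous: this only reports the ID of the task that was started.
Write-Host "Started delete task (id=$($task.Value))"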

---------------------------------------------------------------------------------

So, what happens when we run this script?

In this case we are running the script and supplying the portal URL on the command line. If the portal URL is omitted, we will be prompted for it. The credential can also be supplied as a parameter; otherwise, as here, the script prompts for an Office 365 SharePoint Online global admin account.
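
As a rough illustration of such a call, the sketch below passes the portal URL by splatting and leaves the credential out so that the script prompts for it. The file name `Remove-CloudHybridSearchContent.ps1` is an assumption (the script can be saved under any name), and the URL is the contoso placeholder from the script's own help text.

```powershell
# Hypothetical file name: save the script above under any name and adjust this path.
$scriptPath = ".\Remove-CloudHybridSearchContent.ps1"

# Parameters defined by the script's param() block.
# Credential is left out here, so the script prompts for the tenant admin account.
$arguments = @{
    PortalUrl = "https://contoso.sharepoint.com"   # placeholder tenant URL
}

if (Test-Path $scriptPath) {
    # Run the script with the splatted parameters.
    & $scriptPath @arguments
} else {
    Write-Warning "Script not found at $scriptPath"
}
```

To run it unattended, a `Credential` entry holding a PSCredential object could be added to the hashtable instead of relying on the prompt.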

[Screenshot: running the script and the credential prompt]

After a valid credential is supplied then the script responds with a simple message.

[Screenshot: the script's output message with the delete task ID]

Record this task ID: you may need it when calling Microsoft Support if the process fails for any reason. The task is asynchronous, that is, you can leave it running in the Office 365 search farm and it will eventually complete.

After this final step you will get no more feedback, but you can track the effect of the task by running a search query on the managed property IsExternalContent=1. The screenshots were taken just before the purge and again a short time later, and they show the reduction in the estimated item count for the same query.

[Screenshot: Query1]

After some time, the same query revealed a different number of estimated items. The number is only an estimate, but when you see it steadily falling over time you can rest assured that the deletion is underway. Ultimately, the query returns no items from the Office 365 search index.
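
The same check can be scripted instead of run by hand in the search center. The following is a minimal sketch that reuses the CSOM assemblies and credential type from the delete script and asks SharePoint Online for the estimated number of external items. The function name `Get-ExternalContentCount` and its `SPVersion` parameter are hypothetical, and the query is written as `IsExternalContent:1`, the form quoted in the comment thread on this post.

```powershell
# Sketch only: returns the estimated number of items in the Office 365 index
# that came from cloud hybrid search. Get-ExternalContentCount is a hypothetical name.
function Get-ExternalContentCount {
    param(
        [Parameter(Mandatory=$true)] [string] $PortalUrl,
        [Parameter(Mandatory=$true)] [PSCredential] $Credential,
        # "15" for SharePoint 2013, "16" for SharePoint 2016 (assumed default)
        [string] $SPVersion = "16"
    )

    # Load the same CSOM assemblies that the delete script uses.
    foreach ($name in "Microsoft.SharePoint.Client", "Microsoft.SharePoint.Client.Runtime", "Microsoft.SharePoint.Client.Search") {
        Add-Type -AssemblyName "$name, Version=$SPVersion.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c"
    }

    $context = New-Object Microsoft.SharePoint.Client.ClientContext($PortalUrl)
    $context.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Credential.UserName, $Credential.Password)

    # Only the count matters, so request a single row.
    $query = New-Object Microsoft.SharePoint.Client.Search.Query.KeywordQuery($context)
    $query.QueryText = "IsExternalContent:1"
    $query.RowLimit = 1
    $query.TrimDuplicates = $false

    $executor = New-Object Microsoft.SharePoint.Client.Search.Query.SearchExecutor($context)
    $results = $executor.ExecuteQuery($query)
    $context.ExecuteQuery()

    # TotalRows on the first result table is an estimate, as noted above.
    return $results.Value[0].TotalRows
}

# Example: sample the estimate every 10 minutes (URL and interval are placeholders).
# $cred = Get-Credential -Message "SPO tenant admin credential"
# while ($true) {
#     $count = Get-ExternalContentCount -PortalUrl "https://contoso.sharepoint.com" -Credential $cred
#     Write-Host "$(Get-Date -Format s) estimated external items: $count"
#     Start-Sleep -Seconds 600
# }
```

Because the figure is an estimate, the trend across several samples is more informative than any single value.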

We have deliberately not tried to provide estimates or predictions of the time taken to purge a specific number of items from the Office 365 index, because this depends on a number of factors. Suffice it to say, it will take as long as it takes.

Summary

So, this is great news for people who want end-to-end control over their Office 365 search experience. Not only can we crawl what we want and include it in the Office 365 search index, we can now also remove items in a controlled manner.

Post by: Neil Hodgkinson (MSFT) and Manas Biswas (MSFT)

Comments

  • Anonymous
    May 18, 2016
    Link broken to crawl policies
  • Anonymous
    May 18, 2016
    Thanks for addressing this! It's been at the top of my "Ugly" list and I've had several customers get slapped by this issue, so it's nice that there's a fix. I completely missed that this was in the April CU. There are always things to improve (for example... now... how can you remove just a single item from the cloud index?). But it's nice to see fixes rolling out quickly.
  • Anonymous
    May 19, 2016
    So does this script remove all Office 365 content from the on-premises farm on which I am running this script, or all external content from all Cloud SSAs within my environment?
  • Anonymous
    August 18, 2016
    The comment has been removed
    • Anonymous
      July 13, 2017
      Hi Shikhar, I have the same issue with the DeleteAllCloudHybridSearchContent method :( Did you find a solution?
  • Anonymous
    November 06, 2016
    You might want to mention a couple of places in the doc that SharePoint 2016 requires June 2016 CU and not the April 2016 CU that you reference throughout the article...
  • Anonymous
    November 29, 2016
    After a purge on O365, will content from a new crawl of the on-premises server bound to cloud hybrid search begin reappearing? My situation: for testing, an on-premises UAT server was attached to O365 for cloud hybrid search. I then removed the UAT server's cloud hybrid service, created it on production, and linked that to O365. Now, when searching the content, I also get UAT URLs. Will the purge help clean up the UAT URLs, and will a fresh full crawl on production then fill the O365 index?
    • Anonymous
      March 25, 2017
      @Sudharsanan The purge will clean every item that has been crawled by the Cloud SSA, basically everything that is returned when you run the query IsExternalContent:1 in the SPO search center.
      • Anonymous
        April 05, 2018
        @Manas, we had to run it 5 times to reduce the results from 663,714 items to 399 (it looks like it's not deleting all the content in a single iteration). But it's not deleting the remaining 399 items no matter how many times we run the script. Any suggestions?