Managing data spills

Data Spill Mitigation

A data spill occurs when content that should not be searchable is indexed by the search engine, either because it is improperly secured or because it should not have been indexed at all. There are different types of data spills:

Site Spills

A site spill occurs when a whole site was indexed with weak access control, or should not have been indexed at all. In addition to following the procedures detailed below, the most important step is to contact the site administrator and correct the access control on the data.

Large Site Spills

If the site is large (greater than 1000 URLs):

  1. Edit the result source to specifically exclude that site. This removes the site from search results immediately.
  2. Identify which content source the site is in and make sure it’s excluded from the next full crawl
  3. Verify the URLs that you don’t want crawled are excluded by the crawl rules
  4. Kick off a full crawl of that content source
  5. Wait for the full crawl to complete, then verify the site is no longer indexed, and clean up the result source’s exclusion (undo step #1)
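
Step 3 can be rehearsed offline before the full crawl starts. The following is a minimal sketch, not the search engine's own matching logic: it assumes crawl rules are wildcard paths evaluated in listed order with the first match deciding, and it approximates that with Python's `fnmatchcase`. The rule list, the host name `intranet.example.com`, and the function names are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical rules in evaluation order: (path pattern, is_exclusion).
CRAWL_RULES = [
    ("http://intranet.example.com/sites/hr/*", True),   # exclusion rule
    ("http://intranet.example.com/*", False),           # inclusion rule
]


def is_excluded(url, rules):
    """Return True if the first rule matching the URL is an exclusion rule."""
    candidate = url.lower()  # assumption: matching is case-insensitive
    for pattern, is_exclusion in rules:
        # Note: fnmatch also treats '?' and '[' as wildcards, which real
        # crawl rules may not; this sketch only uses '*'.
        if fnmatchcase(candidate, pattern.lower()):
            return is_exclusion
    # Assumption: a URL that matches no rule is still eligible for crawling.
    return False


def still_crawlable(urls, rules):
    """List the spilled URLs that the rules would still let the crawler reach."""
    return [u for u in urls if not is_excluded(u, rules)]
```

An empty result from `still_crawlable` for the spilled URLs suggests the rules cover them; any URL it returns would need an additional exclusion rule before the full crawl.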

Small Site Spills

If the site is small (fewer than 1000 URLs):

  1. Use the Get-SearchResultPaths.ps1 script to identify the URLs that should not be showing up in the index.
  2. Use the Search UI to manually remove those URLs from the index. It may be possible to remove them in batches, but the limit for the UI has not been tested. A server-side object model method can also be used: http://msdn.microsoft.com/en-us/library/office/microsoft.office.server.search.administration.searchserviceapplication.removedocumentsfromsearchresults(v=office.15).aspx
  3. Identify which content source(s) the site is in and make sure it’s excluded from the next full crawl
  4. Verify the URLs that you don’t want crawled are excluded by the crawl rules
  5. Once these steps are complete, the offending site/URLs will not be picked up during future crawls.
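
Because the batch limit for manual removal is untested, it may help to split the URL list into small, de-duplicated batches before pasting them into the UI. This is a minimal sketch; the function name `batch_urls` and the default batch size of 50 are arbitrary assumptions, not documented limits.

```python
def batch_urls(urls, batch_size=50):
    """Split URLs into ordered batches, dropping blanks and duplicates.

    batch_size is a guess to be tuned by hand; the UI limit is unknown.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Strip whitespace and skip empty lines from the exported list.
    cleaned = (u.strip() for u in urls)
    # dict.fromkeys keeps the first occurrence of each URL, in order.
    unique = list(dict.fromkeys(u for u in cleaned if u))
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]
```

Each returned batch can then be submitted separately, starting with a small size and increasing it only if the removal succeeds.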

Spills Spanning Sites

If the spill is large but confined to specific sites, follow the procedure outlined under Large Site Spills for each site.

If the spill is small and spans multiple sites, follow the procedure outlined under Small Site Spills.
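
The choice between the two procedures can be sketched as a small triage helper. This is only an illustration under stated assumptions: a "site" is approximated by the scheme and host of each URL, and a spill of exactly 1000 URLs, which the thresholds above leave undefined, is treated as small. The names `group_by_site` and `plan_response` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Threshold from the procedures above; exactly 1000 is treated as small
# (an assumption, since the text does not cover that case).
LARGE_SPILL_THRESHOLD = 1000


def group_by_site(urls):
    """Group URLs by scheme and host, a rough stand-in for 'site'."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        groups[f"{parts.scheme}://{parts.netloc}".lower()].append(url)
    return dict(groups)


def plan_response(urls):
    """Return (procedure name, URLs grouped by site) for a spill."""
    sites = group_by_site(urls)
    if len(urls) > LARGE_SPILL_THRESHOLD:
        # Large spill: run the Large Site Spills procedure once per site.
        return "Large Site Spills", sites
    # Small spill: run the Small Site Spills procedure over all URLs.
    return "Small Site Spills", sites
```

For a large spill, the keys of the returned mapping are the sites to work through one at a time; for a small spill, the grouping is informational only.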