Beware crawling the non-Default zone for a SharePoint 2013 Web Application
Update: I've now published another post "Problems Crawling the non-Default zone *Explained" that goes on to explain the underlying behaviors that I warned about and described in this post...
---------------------------------------
After playing for a while with SharePoint 2013 Search, I thought we were out of the woods regarding crawls of the non-Default Alternate Access Mapping (AAM) zone for a SharePoint Web Application. This caused all sorts of problems in earlier versions of SharePoint (primarily busted contextual scopes, broken social tagging, and workflow emails linking to the incorrect zone) because there is a built in assumption by other components throughout SharePoint that the Default zone is being crawled.
I'm still working to fully nail down the impacts for SP2013, but, from my initial testing [in SP2013], when crawling a non-Default URL, all search results will be relative to the URL crawled rather than the URL from which you query (and suspect it’s going to break scoping rules for queries as well), meaning you will get unexpected URLs when you query.
Update: I want to seriously caution against using Server Name Mappings, particularly in SharePoint 2013. Admittedly, with SharePoint 2010, Server Name Mappings did appear to provide a workaround. However, although they appear to work, Server Name Mappings were definitely not designed for this particular scenario.
Second, In SharePoint 2013, I know for certain that some managed properties (e.g. SPSiteUrl and ParentUrl to name two) in the Index absolutely do not get *updated by Server Name Mappings, so adding them will only make the problem worse!!! In other words, you'll have some URL-based properties that are relative to one URL and other MPs relative to the mapped URL...
But because Server Name Mappings were not intended for this scenario, I would not have expectation that this should work in all cases.
For example, if I issued a query from some site in the Web Application https://initech, then I should expect all results from this Web Application to be returned relative to https://initech (as in https://initech/result1.aspx and https://initech/result2.aspx). However, if I were crawling the URL of a non-Default zone, then my results will all be returned relative to this non-Default URL (such as: https://bargainclownmart:88/sites/myTeam/result1.aspx and https://bargainclownmart:88/sites/myTeam/result2.aspx ).
Update: I recently published "Alternate Access Mappings (AAMs) *Explained" to provide more insights on AAMs and to better illustrate its often misunderstood concepts.
In this scenario below, I have two Web Applications with the following Alternate Access Mappings (as a side note, I believe Host Named site collections are now the preferred method over AAMs, but I wanted to demonstrate this as an example):
Internal URL | Zone | Public URL for Zone |
---|---|---|
https://sp-foo:88 | Default | https://sp-foo:88 |
https://testingfoo:88 | Intranet | https://testingfoo:88 |
https://bargainclownmart:88 | Internet | https://bargainclownmart:88 |
https://bargainclownmart.officespace.lab:88 | Extranet | https://bargainclownmart.officespace.lab:88 |
https://faceman | Default | https://faceman |
https://initech | Intranet | https://initech |
https://initech.officespace.lab | Internet | https://initech.officespace.lab |
Observed behaviors when crawling the Default URLs...
In my content source, I specify https://faceman and https://sp-foo:88 as the start addresses and then perform a full crawl.
As expected, the URL for results is relative to the URL from which the query is performed. For example, notice the URL in the browser's address navigation bar shows https://sp-foo:88 and the results for this Web Application are also displayed relative to this same https://sp-foo:88 URL:
Results related to another Web App would also be relative to this zone (which to knowledge is new to SP2013). For example, if I query from the https://initech URL (in other words, from the Intranet zone), then all results related to this Web App would be relative to the https://initech URL (such as https://initech/result1.aspx, https://initech/result2.aspx, etc...) as seen in the last two results in the screen shot below...
- Further, the query, which was issued from the Intranet zone of the https://faceman Web App (In this case, https://initech as seen in the browser's address navigation bar), the results related to the https://sp-foo:88 Web App would also be relative to that Web App's Intranet zone
- As seen below, the search results for this Web App are relative to the https://testingfoo:88 URL (such as https://testingfoo:88/item1.aspx, https://testingfoo:88/item2.aspx, etc...) because it is also the Intranet zone for that Web Application
- If the query occurred in the Internet zone, then the results for the https://sp-foo:88 Web App would also be relative to the Internet zone (In that case, results would appear such as https://bargainclownmart:88/item1.aspx, https://bargainclownmart:88/item2.aspx, etc...)
- If the zone you're in doesn't exist in the other Web App, the results will just defer to the Default zone for that applicable Web App
- For example, if I issue a query from https://bargainclownmart.officespace.lab:88 (the Extranet zone of the https://sp-foo:88 Web App), the results from this Web App would be relative to the https://bargainclownmart.officespace.lab:88 URL (such as https://bargainclownmart.officespace.lab:88/item1.aspx, https://bargainclownmart.officespace.lab:88/item2.aspx, etc...)
- However, the https://faceman Web App does not have an Extranet zone, so all results would be relative to its Default URL for https://faceman (such as https://faceman/result1.aspx, https://faceman/result2.aspx, etc...)
- Likewise, if I extended https://faceman into https://hulkmaster as the Custom zone, queries from https://hulkmaster would show results relating to the https://sp-foo:88 Web App using its Default zone because that Web App does not have a Custom zone. In other words, results for this Web App would be relative to the Default URL https://sp-foo:88 such as https://sp-foo:88/item1.aspx, https://sp-foo:88/item2.aspx, etc...
- For example, if I issue a query from https://bargainclownmart.officespace.lab:88 (the Extranet zone of the https://sp-foo:88 Web App), the results from this Web App would be relative to the https://bargainclownmart.officespace.lab:88 URL (such as https://bargainclownmart.officespace.lab:88/item1.aspx, https://bargainclownmart.officespace.lab:88/item2.aspx, etc...)
For comparison, observed behaviors when crawling the non-Default URLs...
In my content source, I then specify https://faceman and the Internet zone https://bargainclownmart:88 as the start addresses and then perform a full crawl.
For my queries from any zone for any Web App, the search results related to the https://sp-foo:88 Web App will always return relative to the URL that was crawled... in this case https://bargainclownmart:88. In other words...
- If I query from https://sp-foo:88, https://testingfoo:88, etc., all results for this Web App will be relative to the crawled URL https://bargainclownmart:88 (such as https://bargainclownmart:88/item1.aspx, https://bargainclownmart:88/item2.aspx, etc...)
- For example, in this screen capture below, notice the URL for the result "Fight Club" shows bargainlclownmart:88 even though the query was issued from the https://sp-foo:88 URL (as seen in the browser's address navigation bar)
- Likewise, if I query from https://faceman, https://initech, etc., all results related to the https://sp-foo:88 Web App will be relative to the crawled URL https://bargainclownmart:88 such as below:
The moral to this story...
Always crawl the default URL (*the URL being crawled must be a Windows Authenticated zone) unless there is a REALLY good reason otherwise.