Search Visibility
In one scenario, a page has a custom web part called "SKILLS" that pulls data from a PeopleSoft application. The data on the page is not always there; it is displayed only when we browse the site. The SharePoint account does not have access to the database or any permission on the PeopleSoft application, yet the data was being crawled.
Below are the concerns:
1. How are we able to crawl the data from PeopleSoft?
2. If we can configure this so easily, why is there a need for the BDC?
3. What happens to security if we are able to crawl data from a Finance or HR database?
To understand the behaviour, I decided to do an in-house repro.
Repro steps
=============
Installed SQL Server 2005 SP3 and SharePoint 2007 SP2 on two different servers
Ran the configuration wizard and created a new farm
Created a new web application and a new site collection
Then installed SharePoint Designer and inserted a data view from an AdventureWorks database that is not part of the farm (similar to the PeopleSoft application).
Followed the steps explained in the link below to get a Data View web part to display data from the SQL Server 2005 server on a SharePoint page
Then created a crawl rule as below
Configured search visibility as below
Note: The data comes from an external data source, and its connection string uses the SQL_SA account, which I created for testing
Created a new content source as below
And started a full crawl
Then, when I searched for the word "Television", I was able to retrieve results.
I changed the search visibility option back to the default, "Do not index ASPX pages if this site contains fine-grained permissions".
Then I started a full crawl and searched for the same word, "Television".
I was not able to get results.
As another test,
I created a new Data View web part to insert a table that had more than 10 rows, as shown below.
In the web part I had the last name “Bjorn”, and when I searched for Bjorn I did not get any results.
Explanation: Crawling Process
The indexing process starts with configuring the content source and the start address(es). Indexing begins when the content source is triggered to start, either manually or on a schedule. The spider governs content discovery and retrieval. The gatherer reads the start address(es) of the content source and loads the protocol handler and IFilter. Once the protocol handler and IFilter are loaded, the content is collected as a stream of text. The data is then passed to the word breaker(s) and on to noise word removal before it is added to the index.
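The last stages described above (text stream, word breaking, noise word removal, index) can be sketched in a few lines of Python. This is only an illustration of the idea, not how the SharePoint search service is implemented: the function names, the noise-word list, and the sample URL are all made up for the example.

```python
import re

# Hypothetical noise-word list for illustration; the real list ships with the search service.
NOISE_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def break_words(text):
    """Stand-in for a word breaker: split a text stream into lowercase tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_noise(tokens):
    """Drop noise words before the tokens reach the index."""
    return [t for t in tokens if t not in NOISE_WORDS]


def build_index(documents):
    """Build a simple inverted index.

    documents maps a URL to the text stream that the protocol handler
    and IFilter stage would have produced for it.
    """
    index = {}
    for url, text in documents.items():
        for token in remove_noise(break_words(text)):
            index.setdefault(token, set()).add(url)
    return index


# Made-up page content for the example.
docs = {"http://server/pages/products.aspx": "The Television is in the catalog"}
index = build_index(docs)
print(sorted(index))  # ['catalog', 'television']
```

A search for "television" then becomes a lookup in `index`, while noise words such as "the" never make it in.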
Go to the site, then Site Actions -> Site Settings, and under Site Administration look for Search visibility.
Here you can set how ASPX pages are indexed.
This site does not contain fine-grained permissions. Specify the site's ASPX page indexing behavior:
- Do not index ASPX pages if this site contains fine-grained permissions
- Always index all ASPX pages on this site
- Never index any ASPX pages on this site
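Read literally, the three options above amount to a small decision rule. The sketch below expresses that rule in Python; it is based only on the option labels, and the enum and function names are hypothetical, not part of any SharePoint API.

```python
from enum import Enum


class AspxIndexing(Enum):
    """The three Search visibility options, keyed by hypothetical names."""
    SKIP_IF_FINE_GRAINED = "Do not index ASPX pages if this site contains fine-grained permissions"
    ALWAYS = "Always index all ASPX pages on this site"
    NEVER = "Never index any ASPX pages on this site"


def should_index_aspx(setting, has_fine_grained_permissions):
    """Return True if ASPX pages would be indexed under the given setting.

    This follows the wording of the options only; it is an assumption
    about the behaviour, not the product's actual implementation.
    """
    if setting is AspxIndexing.ALWAYS:
        return True
    if setting is AspxIndexing.NEVER:
        return False
    # Default option: index only when the site has no fine-grained permissions.
    return not has_fine_grained_permissions


for option in AspxIndexing:
    print(option.name, should_index_aspx(option, has_fine_grained_permissions=True))
```

With fine-grained permissions on the site, only the "Always index" option results in ASPX pages being indexed.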
Recall that in the second test the web part showed the last name “Bjorn”, yet searching for Bjorn returned no results.
This is what I think might be happening on your side and in the in-house repro that we did.
If we select the option “Always index all ASPX pages on this site”, the crawl engine hits the site through the web front end. The gatherer reads the start address(es) of the content source and loads the protocol handler, which processes the file as an HTML page, processes the URL, and adds the result to the index. Only the content rendered in that HTML is indexed. Bjorn was most likely not among the rows rendered when the crawler requested the page, so we were not able to find the word Bjorn in the search results.
So the assumptions that we had in the beginning were not true. When the option “Always index all ASPX pages on this site” is selected, whatever is displayed on the page when the crawler hits the site is processed as an HTML file.
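To make this concrete, here is a small Python simulation of a crawler that indexes only the rendered HTML. It assumes, purely for illustration, that the data view renders just the first 10 rows per page; the paging size, the row names, and every function name here are hypothetical, and this is only one possible explanation for the missing "Bjorn" result.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, as a crawler would see it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def render_page(rows, page_size=10):
    """Hypothetical data view rendering: only the first page_size rows are emitted."""
    cells = "".join(f"<tr><td>{name}</td></tr>" for name in rows[:page_size])
    return f"<html><body><table>{cells}</table></body></html>"


def crawl(html):
    """Return the set of lowercase terms found in the rendered HTML."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {chunk.lower() for chunk in parser.chunks}


# Eleven rows in the source table; "Bjorn" is the eleventh (assumed for the example).
rows = [f"Name{i}" for i in range(1, 11)] + ["Bjorn"]
indexed_terms = crawl(render_page(rows))
print("name1" in indexed_terms)  # True
print("bjorn" in indexed_terms)  # False
```

The row exists in the database, but because it never appears in the HTML that the crawler receives, it never reaches the index.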
Below are links that explain the security considerations for search
https://technet.microsoft.com/en-us/library/cc262033.aspx
More details on the Search architecture
https://msdn.microsoft.com/en-us/library/ms570748.aspx
Learn how to exclude specific URLs from being displayed in search results.
Remove URLs from search results (Search Server 2008)
Hope this helps!