Quick note about index backup

As you may already know, the only way to restore an index without having to recrawl is to use the out-of-the-box backup mechanism, either in the UI or via the STSADM catastrophic backup option. Using STSADM lets you schedule the backup, but I was asked a question the other day that I couldn't immediately answer.
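
If you're scripting the scheduled run, a minimal sketch looks like this (the stsadm path is the standard MOSS 2007 12-hive location; the share, script path, and schedule are placeholders I made up):

rem backup-farm.cmd - full catastrophic backup of the farm
"%CommonProgramFiles%\Microsoft Shared\web server extensions\12\BIN\stsadm.exe" -o backup -directory \\backupserver\spbackups -backupmethod full

rem register the script with Task Scheduler to run nightly at 2:00 AM
schtasks /create /tn "Farm Backup" /tr C:\scripts\backup-farm.cmd /sc daily /st 02:00:00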

Does STSADM allow you to back up only selected items, such as the index, or do you have to perform a backup of everything?

The concern was that in our current design the collab sites share the same farm with the SSP, so all the collab content would have to be backed up in addition to the SSP when using the out-of-the-box mechanism. Obviously, that wouldn't be good for a large farm, where you depend on some sort of differencing mechanism to ensure you can back up all content within the backup window. Well, I tested it and found that the SharePoint folks were on top of this (as expected, right? :-) ) and created an STSADM option that lets you choose which item you want to back up, just like you do in the GUI. The syntax is:

stsadm -o backup -directory <SharePath> -backupmethod <full | differential> -item <object name>

In our case, <object name> is the SSP name, which allows us to back up only the content needed to restore the SSP/index. Of course, you can choose any item you can see in the UI backup option, including individual content databases.
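
For example, assuming the SSP is named SharedServices1 and the share is \\backupserver\spbackups (both made-up names), you can list the selectable items and then back up just the SSP:

stsadm -o backup -showtree

stsadm -o backup -directory \\backupserver\spbackups -backupmethod full -item "SharedServices1"

The -showtree output mirrors the tree you see in the UI backup page, and any node it displays is a valid value for -item.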

A note about capacity and backup performance: the MSIT index contains about 25 million items, and the component sizes are as follows:

Search DB: ~370GB

SSP DB: ~65GB

Index (on the file system): ~150GB

When backing up this content (selecting just the SSP in backup), I'm told I need about 700GB of storage, but you'll notice that the cumulative size of this content is about 585GB. I'm not quite sure why it needs the extra ~115GB of space, but I do know that the backup only consumes 413GB after it completes. More investigation is needed to understand the difference; however, my buddy Sam Crewdson tells me that DB fragmentation contributes greatly to the overall backup size. When he defragments the DB, the size of the backup files is reduced.
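
If you want a rough look at how fragmented the search DB is before backing it up, something like the following works from the command line (a sketch: the server and database names are placeholders, and sys.dm_db_index_physical_stats is the SQL Server 2005 fragmentation DMV):

sqlcmd -E -S SQLSERVER01 -d SSP1_Search_DB -Q "SELECT OBJECT_NAME(object_id), index_id, avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') WHERE avg_fragmentation_in_percent > 30"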

Regardless, backup performance seems to be a function of four things: how large the content is, what the network limitations are, what the hardware limitations are, and where the backup share lives. For the longest time, MSIT backed up the index to a share on the index server. The problem with this approach is that the search DB (residing in SQL) is usually the largest component and has to be streamed from the SQL server across the network to the index server. That's not cool. Even with Gig/E, it can take quite a while to transfer that much data. A much better approach is to put the share on the SQL server. Now only the index file, which is usually the smaller of the two, has to cross the network. After making the change, MSIT saw a major decrease in backup duration, in the neighborhood of 60%.
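
In other words, the only change is where the -directory share lives (server and share names here are examples):

stsadm -o backup -directory \\SQLSERVER01\spbackups -backupmethod full -item "SharedServices1"

Same command as before; hosting \\SQLSERVER01\spbackups on the SQL server itself means the ~370GB search DB is written locally, and only the ~150GB index file travels over the network.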

So how long? Well, in my lab with nothing else going on, I can back up the aforementioned SSP in about 4 hours. That works out to about 100GB/hour (413GB of backup files divided by 4 hours), and that's with some pretty awesome hardware. Your mileage will vary, of course.

Mike

Comments

  • Anonymous
    March 24, 2008
So to back up only the Search Index, I need to back up the whole SSP?

  • Anonymous
    November 26, 2008
What about backing up content without the index? Is this possible? I don't understand why you would want to back up the index, since it can be recreated on its own.

  • Anonymous
    November 26, 2008
It is possible to back up only the content: you can select just the web applications for backup. Many organizations find search availability very important and can't absorb the downtime required to rebuild an index. Some companies with small indexes can probably tolerate a day or two while the search index rebuilds, but larger organizations may need a lot longer. Microsoft, for instance, would require more than 5 weeks to perform a full crawl.
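    For example (the web application name below is made up; use the name exactly as "stsadm -o backup -showtree" displays it):
    stsadm -o backup -directory \\backupserver\spbackups -backupmethod full -item "SharePoint - 80"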

  • Anonymous
    April 04, 2009
Did you find anything related to the extra space needed for the full backup? I tried a lot, but I can't find any reason why stsadm needs that extra space!

  • Anonymous
    June 09, 2009
Can I back up only the index and related data? I saw that backing up the whole SSP includes the user profile DB too, which I don't want.