Variations design considerations (SharePoint Server 2010)
Applies to: SharePoint Server 2010
The variations feature in Microsoft SharePoint Server 2010 provides copies of content pages that are linked across a site structure. There is a one-to-many relationship of a single source site and multiple target sites. This enables both modification and translation of content, while maintaining a process to provide page updates across the different variations sites. For more information about variations, see Variations overview (SharePoint Server 2010).
There are several design considerations that must be weighed when you implement a variations infrastructure within your publishing environment. This article outlines some of these considerations, and makes recommendations about how to approach them in common scenarios.
In this article:
Variations process overview
Variations process overview
The following steps describe the process that occurs when variation sites are created:
First, a source label and one or more target labels are created. This specifies which site within the site collection will contain the variations site hierarchy. In general, the assumption for most sites is that the variation source site will be created from the root of the site collection and the locale will be used as the variation label.
SharePoint Server 2010 supports up to 50 variation labels.
After the variation labels are created, the variations site hierarchies are created. This process creates all sites that are required for the whole variation site hierarchy. This can become very time-consuming if there is a deeply nested site structure within the site information architecture. This cannot be avoided in many large information architectures.
After the variations site hierarchies are created, the content is copied to the target variation sites. This includes the pages that were created in the source variation site. Depending on the configuration of the Resource Setting, images and documents could also be copied to the target variation site.
After the whole process is complete, pages that are updated and published on the source variation site are copied to the target variation sites as draft copies for later translation or modification before publishing the pages.
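The label requirements in the steps above can be checked before any sites are created. The following is a minimal sketch, not part of SharePoint Server: the `VariationLabel` class and `validate_labels` function are hypothetical names, and the sketch assumes that the 50-label limit counts the source label together with the target labels.

```python
from dataclasses import dataclass

# Limit stated in this article for SharePoint Server 2010.
# Assumption: the limit counts the source label together with the target labels.
MAX_VARIATION_LABELS = 50


@dataclass(frozen=True)
class VariationLabel:
    name: str                # hypothetical label name, typically a locale such as "en-us"
    is_source: bool = False  # True for the single source label


def validate_labels(labels):
    """Return a list of problems found in a planned set of variation labels."""
    problems = []
    source_count = sum(1 for label in labels if label.is_source)
    target_count = len(labels) - source_count
    if source_count != 1:
        problems.append(f"expected exactly 1 source label, found {source_count}")
    if target_count < 1:
        problems.append("expected at least 1 target label")
    if len(labels) > MAX_VARIATION_LABELS:
        problems.append(f"{len(labels)} labels exceed the limit of {MAX_VARIATION_LABELS}")
    return problems


plan = [
    VariationLabel("en-us", is_source=True),
    VariationLabel("fr-fr"),
    VariationLabel("de-de"),
]
print(validate_labels(plan))  # an empty list means the plan passed every check
```

A check like this is only a planning aid; the labels themselves are still created through the variations settings of the site collection.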
Although this process seems straightforward at face value, there are several considerations to make regarding the process and the effect of your design on the overall effectiveness of your variations solution. The number of sites, number of pages, and overall content size affect variations solutions in different ways. Although all these factors contribute to the overall database size, some have more effect than others, and different mitigation strategies can be employed to reduce the footprint of the content within the variations site hierarchy.
We recommend that you read the articles in the Performance and capacity management (SharePoint Server 2010) section before you plan for the size of your content databases. Although databases that are primarily read-only can grow much larger than the recommended size limit, you must still account for backup and restore, content deployment, and variations when you design the physical and logical architecture of your solution. Even for a target content database size of 200 GB, backing up and restoring that amount of data takes time. Also, deploying a data set of that size by using content deployment will lead to long deployment times, and can consume potentially double that amount of disk space on the source and destination servers during deployment. Therefore, consider the size of the databases and how they will be managed during the design phase of the project. For information about content database size limits, see SharePoint Server 2010 capacity management: Software boundaries and limits.
Specifically, when you use variations, the location of your content within your information architecture, in addition to the kinds of content that will be served, contributes to the overall database size of the solution. Placing large amounts of binary content such as documents and images within the solution will lead to a large database size. Where possible, take steps to reduce the footprint of this binary content, to provide the best performance and the smallest possible database size.
For information about different design scenarios, and the pros and cons of each, see Design scenarios, later in this article. These scenarios can be used as baseline configurations when you are determining the appropriate design for your information architecture.
One other concern for database sizing is the document versioning strategy used for content pages and other documents. Allowing an unlimited number of versions of these files can greatly increase the database size, because each version of each file is stored in the database. We recommend that document collaboration and versioning of source pages and other documents occur outside the publishing site, and that versions be used sparingly, only where required to support the business requirements.
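To get a feel for how a version limit affects storage, the following sketch computes an upper bound for a library. It is a rough planning aid under a stated assumption: every retained earlier version is treated as a full copy of the current file. The function name and the example figures are hypothetical.

```python
def worst_case_versioned_size_gb(current_size_gb, retained_versions):
    """Upper bound on library size when each retained version is a full copy.

    Assumption: every retained earlier version is stored at the same size as
    the current file, so storage scales with (1 + retained_versions).
    """
    if current_size_gb < 0 or retained_versions < 0:
        raise ValueError("sizes and version counts must not be negative")
    return current_size_gb * (1 + retained_versions)


# Hypothetical library: 4 GB of documents, keeping up to 3 earlier versions of each file.
print(worst_case_versioned_size_gb(4, 3))  # 16
```

With no version limit there is no upper bound at all, which is why a small, explicit limit is easier to plan for.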
Variation site size
The overall size of a variation site includes the size of all site data together with the size of all page content within that site. When you create variation sites, there are two options for how resources are referenced in the target variation sites. Resources include all binary content, such as images and documents.
Reference existing resources You can reference existing resources, which indicates that you will maintain the references (links) to the binary content within the source variation site. If you need to edit the content during translation or localization and reference a localized resource, this resource can be uploaded to the new location within the variation site and be linked to from the content appropriately.
Copy the resources You can copy the resources, which enables all items to be copied to each target variation site. Typically, when the content has links to resources, these resources are copied when they are linked to from inside the field controls on the content pages. This means that any binary content that is referenced in the source variation site is copied to the appropriate library in the target variation site and will create a copy of the file in the content database.
When designing for variation sites, the decision to either reference or copy resources should be considered carefully, as it will have an effect on the design of the solution storage. The primary benefit to referencing resources is that it results in a smaller database size. If the majority of the binary content will be reused across most or all variation target sites, use the Reference existing resources option, and only put the content to be translated or localized inside the variation site hierarchy. If the majority of the binary content is to be translated or localized, or both, use the Copy the resources option.
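The guidance above can be expressed as a simple rule of thumb. The sketch below is illustrative only: `localized_fraction` is a hypothetical planning input, and reading "the majority" as strictly more than half is an assumption.

```python
REFERENCE = "Reference existing resources"
COPY = "Copy the resources"


def choose_resource_option(localized_fraction):
    """Pick a resource option from the share of binary content to be localized.

    localized_fraction is a hypothetical planning input between 0.0 and 1.0:
    the portion of binary content expected to be translated or localized.
    """
    if not 0.0 <= localized_fraction <= 1.0:
        raise ValueError("localized_fraction must be between 0.0 and 1.0")
    # Assumption: "the majority" is read as strictly more than half.
    return COPY if localized_fraction > 0.5 else REFERENCE


print(choose_resource_option(0.2))  # Reference existing resources
print(choose_resource_option(0.8))  # Copy the resources
```

At exactly one half the sketch falls back to referencing, because that option results in the smaller database.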
Design scenarios
This section describes three different design scenarios for small, medium, and large sites, and explains the pros and cons of each scenario.
This scenario is a small site that has few pages, a small site structure, and a fairly low amount of binary content. Sites that typically fall into this category have less than 1 GB of binary content that is used on pages in the variations site hierarchy. In this kind of site, the choice of copying or referencing the content within the variation sites is less of a concern from a storage perspective. Because it has only a few pages and a small structure, this kind of site would leave a fairly small content database footprint even when variation sites are created.
With resources referenced, and 1 GB of resources in the target variation sites, we would expect a database size of less than 5 GB for even a large number of variation sites. With 1 GB of resources copied to each variation site, we would expect a linear increase in the content database size per variation site created. For example, 5 variations = 5 GB, 10 variations = 10 GB, and so on. This would still be considered manageable using typical database and site collection processes while still maintaining an acceptable restore point objective and content deployment time.
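The linear growth for copied resources can be reproduced with a one-line calculation. This is a sketch of the arithmetic only; the function name is hypothetical, and page content and site overhead are ignored, as they are in the figures above.

```python
def copied_resources_gb(resources_per_site_gb, variation_sites):
    """Binary content stored when resources are copied to every variation site."""
    return resources_per_site_gb * variation_sites


for sites in (5, 10):
    print(f"{sites} variations = {copied_resources_gb(1, sites)} GB")
```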
This scenario is a larger site with many more pages in the site structure that make up most of the content in the site, but with a fairly low amount of binary content. There may be some binary content within the variation site hierarchy, but most other content is stored in the same site collection outside the variation site hierarchy. This would enable some combination of document storage within the site collection without repeating the content per target variation site.
With resources referenced, 5 GB of binary content stored in the root of the site collection, and 1 GB of binary content stored in the source label, we would expect a database size of at least 6 GB, and only a small increase for larger numbers of variation sites. For example, 5 variations = 6 GB, 10 variations = 6 GB, 20 variations = 6 GB. If we assume 5 GB of binary content stored at the site collection level, 1 GB of data storage within each variation site, and resources copied, this database would still be manageable as a single site collection. For example, 5 variations = 10 GB, 10 variations = 15 GB, 20 variations = 25 GB, and so on.
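The two cases in this scenario differ only in whether the 1 GB inside the hierarchy is multiplied by the number of variation sites. The following sketch makes that explicit. The function and parameter names are hypothetical, and the small growth caused by page content in the referenced case is ignored.

```python
def medium_site_size_gb(shared_gb, hierarchy_gb, variation_sites, copy_resources):
    """Estimate binary content size for the medium-site scenario.

    shared_gb:    binary content in the site collection, outside the variation hierarchy
    hierarchy_gb: binary content held inside one variation site
    """
    if copy_resources:
        # Each variation site carries its own copy of the hierarchy content.
        return shared_gb + hierarchy_gb * variation_sites
    # Referenced: only the source label holds the binaries; page growth is ignored here.
    return shared_gb + hierarchy_gb


for sites in (5, 10):
    referenced = medium_site_size_gb(5, 1, sites, copy_resources=False)
    copied = medium_site_size_gb(5, 1, sites, copy_resources=True)
    print(f"{sites} variations: referenced = {referenced} GB, copied = {copied} GB")
```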
This scenario consists of large amounts of binary content together with a large site structure. We now expect to deal with potentially large amounts of data apart from the actual page content itself. These kinds of sites typically have lots of binary content, such as documents, most of which must be translated into local languages. We are still not assuming that versioning is used within the libraries, because this can bloat the overall database size significantly. If versioning is used, sizing estimates should allow for the maximum potential size of the document versions based on the versioning strategy.
In this scenario, we will assume 20 GB of binary assets that reside outside the scope of the variation site hierarchy. We will also assume 5 GB of content per variation site that resides within the hierarchy (1 GB of page content, 4 GB of binary resources). The following sections describe the effect that the location of resources has on the size of the variation sites.
Single site collection with copied resources
If we decide to leave this content in a single site collection and copy the resources, the size of the content database will quickly increase.
For a single source variation site, we have 25 GB of content. For each target variation site added, we would increase our content database by approximately 5 GB. For example, 2 variations = 30 GB, 10 variations = 70 GB, 20 variations = 120 GB, and so on.
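Because the growth is linear, the same model can be turned around to ask how many variation sites fit within a given database size. The sketch below does both. The function names are hypothetical, and the 200 GB budget is used only as an example figure, taken from the target size mentioned earlier in this article.

```python
def copied_size_gb(outside_gb, per_site_gb, variation_sites):
    """Content database size when every variation site holds a full copy."""
    return outside_gb + per_site_gb * variation_sites


def max_variation_sites(budget_gb, outside_gb, per_site_gb):
    """Largest number of variation sites that keeps the database within budget_gb."""
    if per_site_gb <= 0:
        raise ValueError("per_site_gb must be positive")
    if budget_gb < outside_gb:
        return 0
    return int((budget_gb - outside_gb) // per_site_gb)


print(copied_size_gb(20, 5, 2))          # 30
print(copied_size_gb(20, 5, 20))         # 120
print(max_variation_sites(200, 20, 5))   # 36
```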
Single site collection with referenced resources
If we decide to have a single site collection but reference the content from the source variation site, we can reduce the overall size of the content database. However, if most of the binary content within the variation sites is to be translated, this approach may not have the effect that you want, because content authors will make copies within the target variation sites. We are assuming that only 1 GB of the 5 GB of content per variation site called out in this example is page-related data that would be replicated per target variation site.
At best, we could assume the following statistics: 1 variation = 25 GB, 2 variations = 26 GB, 10 variations = 34 GB, 20 variations = 44 GB.
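These best-case figures follow from storing the binaries once and replicating only the page content. The sketch below separates the two parts. The function and parameter names are hypothetical, and the sketch assumes that content authors make no localized copies of binaries in the target sites.

```python
def referenced_size_gb(outside_gb, source_binary_gb, page_gb_per_site, variation_sites):
    """Best-case size when target sites reference binaries in the source site.

    Only page content is replicated per variation site; binaries are stored once.
    """
    stored_once = outside_gb + source_binary_gb
    return stored_once + page_gb_per_site * variation_sites


# 20 GB outside the hierarchy, 4 GB of binaries in the source site, 1 GB of pages per site.
for sites in (1, 2, 10, 20):
    print(f"{sites} variations = {referenced_size_gb(20, 4, 1, sites)} GB")
```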
Although this approach will lead to reduced overall variation site sizes, it will not solve the potential fundamental issue of a large amount of binary content that is stored at the root site collection. As this amount of content grows, it can become unacceptable to still treat this as a single site collection. An alternate design approach is to break the documents and other binary content into one or more site collections dedicated to the storage of these files. Although this will influence the overall site design, especially with content aggregation and roll-up scenarios, it will lead to a more manageable variation site hierarchy that delivers better performance.
Multiple site collections
By using this approach, the large site collection scenario could be broken into a site structure that resembles the following:
www.contoso.com This site collection is used for the publishing pages and related content. Variations of this content would be created within the site structure. This site collection is created with its own content database independent of the content database for documents and other binary content.
www.contoso.com/docs This site collection is used for the document gallery. All documents and related binary content should be tagged with languages, culture, or other appropriate information to enable aggregation via search. The document gallery should use a folder structure that mirrors the site structure of the variations site hierarchy. This makes it easy for content authors to locate related content. Documents can be distributed across one or multiple site collections based on the requirements of the project.
Since the www.contoso.com/docs site collection is not stored within the root publishing site, we could expect 1 GB of content per variation site, with 1 GB of branding-related binary content. This would enable a very manageable publishing site that uses a much smaller content database size.
In this design, there would be two content databases.
Documents One content database for documents, which would be 20 GB. Translations of documents could increase this database size in a way similar to the scenarios outlined earlier in this section. Multiple documents that are stored for translation purposes would increase this database size, but this would not have any effect on the variation process in the publishing site.
Publishing Site One content database for the publishing site. This content database would be much smaller and would give the best performance of the variation infrastructure. Only the content pages themselves and a small amount of branding resources would be involved in the site collection directly. For example, 1 variation = 2 GB, 2 variations = 3 GB, 10 variations = 11 GB, 20 variations = 21 GB, and so on.
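The effect of the split can be sketched by sizing the two content databases separately. The function name and dictionary keys below are hypothetical. The inputs are the figures used in this scenario: 20 GB of documents, 1 GB of branding-related binary content, and 1 GB of content per variation site.

```python
def split_database_sizes_gb(document_gb, branding_gb, page_gb_per_site, variation_sites):
    """Sizes of the two content databases in the multiple site collection design."""
    return {
        "documents": document_gb,  # not affected by the number of variation sites
        "publishing": branding_gb + page_gb_per_site * variation_sites,
    }


for sites in (1, 2, 10, 20):
    sizes = split_database_sizes_gb(20, 1, 1, sites)
    print(f"{sites} variations: publishing = {sizes['publishing']} GB, "
          f"documents = {sizes['documents']} GB")
```

Only the publishing figure changes as variation sites are added, which is the reason the variation process stays fast in this design.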
Although this design strategy gives more precise management of the binary content within the solution, it does impose some additional design work, especially related to aggregation of document content. If these documents were aggregated by using Content Query Web Parts within the site collection, this approach will fail, because a Content Query Web Part cannot aggregate documents across site collections. New search queries may have to be written, and Web Parts created to display search results instead of content query results. Even so, if lots of documents are part of the overall solution, breaking them into separate site collections will be better than trying to store all content in a single publishing site.
This also does not solve the intrinsic issues with deploying lots of documents by using content deployment. Deploying a large amount of content across farms will take time on the initial deployment. Incremental deployments will only be as large as the number of documents modified in the time since the last deployment. Therefore, it is strictly dependent on the number of document updates relative to the overall corpus of information.
When you determine your storage requirements for a variations project, be sure to plan based on the amount of content you expect to have in the future, not what you currently have. For example, how much content will you have two years from now? How many variation sites do you expect to add in the next two years? As you design your variations solution, consider the following guidelines:
A large content database size can greatly increase the time it takes to create new variations and to run content deployment jobs. Reducing the size of the content database will improve performance during both variation propagation and content deployment. How you achieve this size reduction will depend on the number of variation sites needed, and the amount of content used by each variation site.
If you have a large amount of related content stored outside of the Pages library of the source variation site, use the Reference existing resources option or store the content in a separate site collection.
If you store related content in a separate site collection, be sure to use Search to aggregate content into Web Parts on the variations sites.
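To plan for future content rather than current content, the sizing arithmetic used in this article can be combined with a growth projection. The following is a minimal sketch: compound yearly growth is an assumption, and the growth rate, starting sizes, and site counts are hypothetical planning inputs, not figures from this article.

```python
def projected_size_gb(current_gb, annual_growth_rate, years):
    """Project content size forward, assuming compound yearly growth."""
    return current_gb * (1 + annual_growth_rate) ** years


def planned_database_gb(shared_gb, per_site_gb, variation_sites):
    """Database size for content stored once plus content stored per variation site."""
    return shared_gb + per_site_gb * variation_sites


# Hypothetical plan: 5 GB shared and 1 GB per site today, growing 50% per year,
# with the number of variation sites rising from 5 to 10 over two years.
today = planned_database_gb(5, 1, 5)
in_two_years = planned_database_gb(
    projected_size_gb(5, 0.5, 2),
    projected_size_gb(1, 0.5, 2),
    10,
)
print(f"today = {today} GB, in two years = {in_two_years} GB")
```

Running the estimate with the future number of variation sites, and not only the future content size, shows how quickly the two factors compound.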
The SharePoint Server 2010 Content Publishing team thanks Steve Walker, Microsoft SharePoint Customer Advisory Team, for contributing to this article.