Making sense of a multi-workload ECM SharePoint installation, Part 1, Concepts

 

Introduction

SharePoint made its name with collaboration tools that integrate very well with other Microsoft products and that are among the foremost tools for user adoption.  With SharePoint 2007, Web Content Management capabilities were integrated from CMS, and it also marked the start of more Document Management and Records Management functionality.  SharePoint 2010 now scales to the demands of large enterprises and brings them the value of social computing.  After all, a company’s most important asset is its employees, and SharePoint is all about connecting people.

Historically, companies often had separate products for collaboration, documents, records, and web content.  These tools may have integrated together, but the story was rarely great, especially during update times.  ECM was made for augmenting the findability of documents – and it started with metadata + classification. 

Complete enterprise-wide success in ECM has, however, been mixed.  Simply put, a person typically enters little metadata (if any) when creating a document, and what is entered is often just the default values.  The other typical classification challenge was asking users which file plan number applied.  While I’m greatly oversimplifying, even that isn’t done properly across a complete enterprise.  Typical successes are in departments or small teams that handle records from the get-go, not with documents that you iterate over until they become final.

It’s also not intuitive for users to work on documents on their computer or file shares, and then manually bring them to a separate store meant for “final documents only”.  In some situations, customers use these stores for “published/official documentation”.  This makes the problem much more obvious, as it’s typical for some types of documents to have multiple published versions (e.g., you update a document for a new project phase) and users forget to update the published store.

Even when considering only SharePoint 2010, the first challenge is understanding these workloads and how closely they are related.  This series of posts will attempt to consolidate information into better guidance for these multi-workload scenarios: how they work with SharePoint, and how they work when integrating with other ECM products.

  • Part 1, this post, will explain the general terminology and the key components of the SharePoint platform
  • Part 2 will discuss how to sort out a strategy for your multi-workload environment with sample document lifecycles
  • Part 3 will discuss how SharePoint can and should integrate with other ECM products
  • Part 4 will cover enterprise (SKU) areas of SharePoint, such as BI, FAST, and Project, that add to the core value of SharePoint with no additional software costs

 

While we had some core elements of ECM in SharePoint 2007, we leaped ahead with SharePoint 2010.  Features like the Managed Metadata Service, with its Term Stores and Content Type Hubs, make it easy and efficient to control and share taxonomies; the content organizer and multi-stage retention capabilities are efficient; and in-place records are a new and welcome component for business collaboration spaces.  All of this relies on a serious .NET and Workflow Foundation platform, complemented by social components such as the Activity Feeds where necessary.

If you hadn’t thought of SharePoint in the past for ECM, and more so if you struggled with implementing ECM with other products, now’s the time to take a serious look at SharePoint 2010.  Microsoft Consulting Services can definitely help you make the best use of your platform, enhance adoption, and reduce costs.

Let’s go ahead with some common definitions.

 

Enterprise Content Management [ECM] explained

The Association for Information and Image Management [AIIM] currently defines ECM as :

“Enterprise Content Management (ECM) is the strategies, methods and tools used to capture, manage, store, preserve, and deliver content and documents related to organizational processes. ECM tools and strategies allow the management of an organization's unstructured information, wherever that information exists.”

According to Wikipedia, this statement was last revised in early 2010.  I’d say they are rightly revising these statements in order to include new ways of sharing information.  Previously, ECM was defined as the technologies rather than the strategies.

In the field, part of the challenge is that companies try to purchase an ECM product and hope that they’ll simply install it and content will be managed.  As stated, ECM is not a product; it’s a strategy that may require a few products to fulfill, depending on the choices made.  That strategy must include the people factor, from both an adoption and a usability perspective.  The old approach of “These 12 metadata are all required and everyone will fill them according to their file plan” didn’t work.  It’s time to look at a strategy that derives this information from context with minimal user input.

Let me give you an example of contextualizing metadata for a document: you create a project workspace for a single specific project.  This project will likely define most of the information that you would have asked for on each document.  A strategy that asks for this information a single time per project and automatically tags all related documents from their first version is much more efficient.  The same goes if you need to augment it with information coming from the user’s context.
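To make the idea concrete, here is a minimal sketch in Python of how a document could inherit its metadata from its context.  This is not SharePoint code: the Workspace class, the tag_document function, and the field names (project_code, client, file_plan, department) are all hypothetical and only illustrate the precedence I have in mind, where the workspace comes first, then the user’s context, then the few values the user actually types.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical project workspace; its metadata is entered once, at creation."""
    name: str
    metadata: dict = field(default_factory=dict)


def tag_document(workspace, title, user_context=None, overrides=None):
    """Build a document's metadata from its context, then from explicit input.

    Precedence (lowest to highest): workspace, user context, explicit overrides.
    """
    tags = dict(workspace.metadata)   # inherited from the project
    tags.update(user_context or {})   # e.g. the author's department
    tags.update(overrides or {})      # the few values the user does enter
    tags["title"] = title
    return tags


# Illustrative values only; none of these come from a real file plan.
project = Workspace(
    name="Contoso Intranet Redesign",
    metadata={"project_code": "PRJ-001", "client": "Contoso", "file_plan": "4.2.1"},
)
doc = tag_document(project, "Kickoff notes", user_context={"department": "IT"})
print(doc)
```

The user only supplied a title, yet the document carries the project’s classification from its first version.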

Another important aspect is that ECM covers all types of content, not just documents.  These can take the form of imaging, physical records, notes, emails, voicemails, meetings, blogs/wikis/other web content, IMs, etc.  I’ve seen many implementations where companies only included documents due to product limitations.  While it’s a good start, make sure that it fulfills your ECM and eDiscovery requirements.

 

Document Management [DM] explained

The AIIM currently defines Document Management as :

“Document management, often referred to as Document Management Systems (DMS), is the use of a computer system and software to store, manage and track electronic documents and electronic images of paper based information captured through the use of a document scanner.”

Typical features of Document Management are :

  • The ability to checkout and check-in a document ensuring integrity for authors and readers
  • Version control that allows segmenting what readers and authors see, and the ability to rollback to previous versions
  • The ability to audit access and changes to the information and its permissions
  • The ability to qualify the information with metadata
  • The ability to start simple and complex workflows for collecting feedback, approve, or start a process
  • The ability to reuse document templates

Historically, the technology was built so that DM systems were a hierarchical, tree-based structure of folders and documents.  In my experience, these systems were mostly implemented within specific departments, such as Legal and HR, that had a process in place for documents.  They also typically didn’t go for collaboration or social.  There will likely always be some types of documents that won’t require social or collaboration, but as we find new ways to share and participate, the number of such documents will shrink quickly.

What’s important here is that there should be multiple DM repositories in any enterprise.  Some will be collaborative, some won’t.  Your Solution Architecture will determine the repositories you will have such as collaborative, financial statements, audio/video archives, and web content (these are merely examples).

 

Collaboration and Social explained

The AIIM currently defines Collaboration as :

“Collaboration is a working practice whereby individuals work together to a common purpose to achieve business benefit.”

 

I think this statement is vague on purpose.  Collaboration assists users in working together on content and deliverables.  The final deliverable is typically either web content or a document, but the collaboration features also capture the other information shared during the process.  The key thing here is that Collaboration is scenario focused.  Rolling out SharePoint with blank Team Sites typically doesn’t respond to a scenario.  Look for examples like these:

  • You have a meeting workspace site that integrates with Microsoft Lync to deliver the presentation to remote workers.  The meeting clerk takes notes using OneNote, which connects to the meeting information and appends the notes live.  Meeting attendees can review these notes with their Windows Phone 7 and apply changes.  Finally, the clerk finalizes these changes and creates a Meeting Minutes document.
  • Your colleague finalized a document and posted it in a workspace.  You notice that your colleague added that document through the activity feeds and, after reviewing it, you assign it a rating that will affect search relevancy.
  • You have a set of company policies defined in a web content management system.  You allow users to comment on these policies, which facilitates discussion and communication with employees, and to post notes on them, which helps the HR team gauge employee feedback.
  • You post a question through the activity feed, and someone who’s defined as an expert in the subject of your question responds to you from their phone.  This response is then elevated to a formal Q&A element in that subject’s community site.
  • Your job is to respond to media queries and, when such a query pops up (through something like this morning’s newspaper), you create a team site where you aggregate information from all your content sources in order to create an official media response.

As you can see from these samples, Document Management is really a part of collaboration.  After all, checkout/check-in, version control, metadata, and workflows are all forms of collaboration – BUT so are others.  This is important when you consider integrating other ECM products, as the way they do this integration will impact your collaboration and social capabilities – or it will affect the user experience and thus adoption and employee retention.

Social can be viewed as an extension of collaboration functionalities where we move towards openness and freedom for users to author content.  Quoting the AIIM again, social software should have :

    • Search: allow users to search for other users or content
    • Links: group similar users or content together
    • Authoring: include blogs and wikis
    • Tags: allow users to tag content
    • Extensions: recommendations of users or content based on profile
    • Signals: allow people to subscribe to users or content with RSS feeds

…and…

    • Freeform: no barriers to authorship, i.e. free from a learning curve or restrictions.
    • Network-oriented: all content must be Web-addressable.
    • Social: stresses transparency (to access), diversity (in content and community members) and openness (to structure)
    • Emergence: must provide approaches that detect and leverage the collective wisdom of the community.

Personally, I’m buying this, as I can see that content is more up to date when it’s freeform.  Historically, with documents, we had to rely on the few individuals who wrote them, and they tend to be too busy.  While it’s open, there are still controls in place before an update is approved, but this allows people who are experts in a specific section of a document to contribute there.  It also allows people who identify an error (such as something that isn’t true anymore) to correct or complement the information.  When working with documents only, you either don’t have the ability to change this, or you have to go through the author, who may not be there anymore or may be too busy to update the documentation.

SharePoint Social capabilities are ultimately the My Sites coupled with the Activity Feeds and Tags.  Think of the Activity Feeds as a mix of Facebook wall and Twitter – with a business purpose.  The core framework is very solid.  You can look at NewsGator’s Social Sites for SharePoint 2010 to see what you can do with the out of the box framework but with a much improved GUI.  They merge the activity feed with community sites and allow for asking questions to experts – and so much more.

In content management, Social can also take the form of your company’s information on networks such as Facebook and Twitter – you may (probably) need to be there, but that content also needs to be prepared and recorded.  Your ECM strategy will eventually have to plan for this and define a way to be both legally compliant and fluid enough to respond quickly on these external networks.

 

Record Management [RM] explained

The AIIM currently defines Records Management as :

“Enables an enterprise to assign a specific life cycle to individual pieces of corporate information from creation, receipt, maintenance, and use to the ultimate disposition of records. A record is not necessarily the same as a document. All documents are potential records, but not vice versa. A record is essential for the business; documents are containers of "working information." Records are documents with evidentiary value.”

Historically, Document and Record Management have been done together since the hierarchical tree based structure of folders and documents was a convenient way to apply records directly.  Products were made as such and you can get a 2 in 1 easily.  However, User Adoption and proper metadata entries have been a big challenge for these solutions.

What I particularly like about the definition is that “Records are documents with evidentiary value”.  I’ve often been confronted with customers who believe that all documents are records.  When you engage in the discussion and show examples of documents that are unlikely to be records (e.g., draft documents, the document you create for some office event like Christmas, etc.), they’ll switch to “all enterprise documents”, with or without a clear indication of how or when a document becomes “enterprise”.  Things can go in circles for quite some time.

In the end, a record starts with a human intervention.  You define it as a final document by setting a property/metadata, by right-clicking on it, etc.  You might be able to automate this to some level, like automatically declaring the last published versions of all documents in a project workspace as records when you close the project.  The action of closing the project is likely manual as well, but it could be automated through the project end date.  Last, you may have repositories that only contain records, and thus the manual action is the user navigating there to drop the document (he likely created/collaborated on the document elsewhere) – unless it’s an automated system that puts the document there.

On a more practical level, you would define the document templates for a project workspace and get elements such as :

  • Unspecified document types are to be kept for the project duration + n years.
  • Documents of type Meeting Minutes are kept for n years
  • Documents of types “Envisioning”, “Planning”, “Designing”, “Architecture”, “…” are kept as records with the workspace for the workspace duration then moved to the archive and kept for a total of n years
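As a rough illustration, the three rules above can be sketched as a small lookup in Python.  This is not how SharePoint implements retention; N_YEARS, OFFICIAL_TYPES, Disposition, and disposition are hypothetical names, the value 5 is only a stand-in for “n”, and I am assuming that “a total of n years” counts from the document’s creation date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

N_YEARS = 5  # hypothetical stand-in for the "n" in the rules above

# Only the types named in the text; the "…" in the list is left open.
OFFICIAL_TYPES = {"Envisioning", "Planning", "Designing", "Architecture"}


@dataclass
class Disposition:
    is_record: bool
    archive_on: Optional[date]  # when the document moves to the archive, if ever
    delete_on: date


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition(doc_type, created, project_end):
    if doc_type == "Meeting Minutes":
        return Disposition(False, None, add_years(created, N_YEARS))
    if doc_type in OFFICIAL_TYPES:
        # Record in the workspace until it closes, then archived.
        # Assumption: the n-year total counts from creation, never ending
        # before the workspace itself closes.
        delete_on = max(add_years(created, N_YEARS), project_end)
        return Disposition(True, project_end, delete_on)
    # Catch-all for unspecified types: project duration + n years.
    return Disposition(False, None, add_years(project_end, N_YEARS))


print(disposition("Planning", date(2010, 1, 15), date(2011, 6, 30)))
```

The point is that the user never picks a retention rule: the document type, which the workspace already knows, decides it.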

By planning ahead, you can design your workspaces so that this is automated for users.  When they go into a meeting workspace and create a new document, it’s automatically a Meeting Minutes document and will inherit metadata from the site and the meeting (after all, you are creating the meeting for specific reasons, which will likely translate to some metadata for the meeting minutes).  All of the official document types are readily available in the Project Documentation library.  Site Owners may add other document libraries for their collaboration requirements, and these will fall under the “Unspecified document types” catch-all.

The nice thing about this quick sample is that all social capabilities are enabled when interacting with information and documents at the project level.  This is where it has value from a people perspective.  When the official documentation becomes a record, it stays in the workspace and shouldn’t be moved right away – this helps on the user side, as they are still working on the project and want a single place for all of the project’s documentation.  The project’s collaboration workspace will eventually close out.  It’ll likely become read-only when the project ends, and may simply be deleted a few years after the project end.  All official documents will have been moved to the archive before the site’s deletion.

Records Management isn’t an IT project.  It sounds silly, but I’ve seen it.  While IT facilitates the technology piece, and may have some sort of architect role close to information management, it needs to be business driven with the legal staff.  To make it possible, you’ll probably have a general strategy and then break it down into multiple small projects to define the policies.  The key factor is smaller projects – trying to do a whole large enterprise in a single shot doesn’t work.  While IT will still drive the general technology platform, each business unit must drive its own information requirements (possibly with the help of architects from the IT department).

Records Management has always been about policies, but one of its big requirements in the past was findability, or rather the ability to retrieve published documentation.  Search engines have evolved in such a way that findability is no longer as much of a requirement for Records Management, but eDiscovery is now a major driver for RM, and that is our next subject.

 

eDiscovery explained

Citing the AIIM again, eDiscovery is defined as :

“Discovery is the term used for the initial phase of litigation where the parties in a dispute are required to provide each other relevant information and records, along with all other evidence related to the case.”

As such, eDiscovery is for all electronic information, not just documents.  As I mentioned in the ECM section, it can take many forms such as imaging, physical records, notes, emails, voicemails, meetings, blogs/wikis/other web content, IMs, etc.  This is another reason why a document-only strategy will not lead to success.

At this level, SharePoint 2010 is designed to support compliance but may or may not be your main tool for eDiscovery.  This will depend on your email/IM archival strategy (amongst other things).  You do have the ability to store the archives of these systems closely with SharePoint so that you can lead eDiscovery with SharePoint, but you should define your strategy first.  How does your legal counsel work in these events?  Should they hold the records directly and/or should they copy what they found into a specialized team site for the litigation?  How do they identify records?  How do they audit and report back the actions on records?

This article provides starting information on Records & eDiscovery for SharePoint: https://msdn.microsoft.com/en-us/library/ee557329.aspx.

 

Publishing, or Web content, explained

When SharePoint 2007 came out, it had merged the ‘old’ Microsoft Content Management Server (MCMS) into the platform as what was then called the Publishing features.  Typical SharePoint environments in the past have mostly used either Publishing or Collaboration features, and often in separate environments, even for Intranets.  SharePoint 2010 democratized the Publishing features by making them much more convenient to use – its approach is close to that of Wikis.  Let me break down what the Publishing features contain:

  • The ability to checkout and check-in web content ensuring integrity for authors and readers
  • Version control that allows segmenting what readers and authors see, and the ability to rollback to previous versions
  • The ability to audit access and changes to the information and its permissions
  • The ability to qualify the information with metadata
  • The ability to start simple and complex workflows for collecting feedback, approve, or start a process
  • The ability to reuse web templates
  • The ability to have an authoring optimized environment, and a visitor optimized environment with synchronization of content between the 2
  • The ability to have the same page in multiple languages
  • Web Analytics
  • The ability to have dynamic queries

If you go back to my “Document Management explained” section, you will notice that the first few items are exactly the same but refer to Web content rather than Documents.  That’s because the only difference is the format presented to the user, and that’s all it should be for the most part.

Since publishing features are just like documents but in web format, plan ahead!  Ask yourself how your information should be consumed!  If all the users are going to consume it on a web portal, don’t go creating documents just to transform them – it’s more complex and doesn’t add value.  With SharePoint 2010, you can declare this web content as records just as you can documents.

And looking at social features, they all have a web content piece.  The new thing is that they are also integrated with our devices such as smartphones.  I can definitely see a future for an enterprise Twitter-like capability that could be used from your business productivity suite (Office), the web, your IM (Lync), and SharePoint.  Publishing merged with social features definitely still has a future!

 

What happens with these wikis, blogs, and notes?

Web Content can also come in other formats such as blogs, wikis, or even a simple note board – at some level, they are all essentially a textbox that allows you to enter content.  This can lead to some interesting discussions with large enterprises that haven’t explored Web 2.0.  I’ve seen customers that added blogs in a hurry because “it’s popular and we have to be social”, but weren’t accepting comments!

  • Blogs are web content articles allowing interaction between the author and readers through comments.  At a minimum, it provides capabilities for querying archives by months, and supports RSS for syndication with other devices.  Blogs are typically from a personal perspective, but it can have a different meaning such as a product blog or a team blog.
    • If you have a blog, you should allow comments, and it’s expected that the author responds to them as well!
  • Wikis open up authoring to a large audience; it could even be the whole enterprise that is allowed to update content.  A common misconception is that wikis cannot be as structured as document management.  That is not true: with SharePoint 2010, you can apply policies and workflows to wiki pages just as you can to documents.  For example, you can define a content owner who needs to approve changes before they are seen by everyone else.
    • Enterprise Wikis in SharePoint 2010 are actually based on the publishing features and merge the idea of a few content fields with the general ‘free’ content area found in wiki pages.
    • The nuance in this general free content area is that it’s up to the user to define the desired structure.  While discomforting to some, it’s actually the goal, as it allows everyone to participate in their own way.  It’s better to have content detailed in some form than to have super-structured documentation that isn’t there because people don’t like the structure or cannot take the time to fit in.
    • The term Wiki can and should be used differently for different solutions.  You may have an enterprise wiki for official information, with an approval workflow for page creation and content owners for each page.  You may have a wiki library in each project team site so that people can log information as they go; this wiki’s lifecycle will be part of the site lifecycle only.

In short, you should have a strategy on how blogs & wikis are used and presented, not on how to block them.  They are a format, not a productivity enemy, and in fact, they are a productivity helper.  In past years, some more conservative companies were adamant about blocking blogs & wikis because employees shouldn’t write about personal stuff.  The interesting part is that those employees are intelligent enough to write personal views on company policies or products in their company blog environment; if they want to share their week-end, they’ll go on Facebook, they won’t do it on the company blog.  The 2 primary reasons for that are that it’s identifiable and that people want to write so that they can either be recognized or make things better.

 

How SharePoint changed the game

SharePoint 2007’s features were superb and came out at a great time.  Most document management systems were used either for specific departments or as the archive where employees had to manually copy their files when they deemed them final.  This left a vortex of files stored massively on file shares, which offer limited (any?) features.  These files are unmanaged and have been piling up without retention forever.

With SharePoint’s ability to easily create workspaces as needed, it was a slam dunk.  Built-in integration with the Office client, without a separate deployment, was a big plus.  And making all of this available through a web interface made it convenient.  Traditional content management systems were sure to try to point out where SharePoint was missing features, or how theirs was better, but the reality is that SharePoint is more than a good-enough all-around platform.

SharePoint’s integration is top-notch and is a key piece of the Microsoft infrastructure.  Active Directory? check.  Exchange? check.  Forefront Security? check.  Configuration Manager? check.  Monitoring? check.  Data Protection? check.  Lync? check.  Office? check.  Forefront Identity Manager? check.  Beyond that, SharePoint makes AD, Exchange, Office, and Lync more relevant!  All companies struggle with maintaining employee information (the kind that the employee should update) – SharePoint does it and writes back to the AD store.  You need to manage distribution lists for communities? Plug in Exchange and Forefront Identity Manager and you’ve got it on all fronts.  You collaborate and it goes to your activity feed, which can be viewed through Lync or Outlook!

This is definitely a better-together story.  SharePoint connects people, not just information.  That’s where it won in user adoption, and that’s where the competition has been lacking.  SharePoint is also an all-around platform, not just a niche one.  Can you find a better product than SharePoint in any one particular aspect?  Probably, and you can go ahead and debate what is still missing anyway.  The reality is that products shouldn’t be thought of in silos anymore.  Integration costs are staggering and integrations rarely offer what you want.

The last ECM magic quadrant from Gartner (https://www.gartner.com/technology/media-products/reprints/microsoft/vol14/article8/article8.html) shows how much Microsoft has evolved in this sector:

[Figure: Gartner ECM Magic Quadrant, 2010]

 

I hear SharePoint is severely limited, is that so?

Ahh, the famous 100GB limit or 2,000-item list limit.  Newsflash: these were never hard-coded limits or break points.  They were recommended software boundaries, or supported and tested limits, and they should be considered in context.  First off, the 100GB limit: it’s a recommendation per content database for 2007, due to recovery times.  With 2010, the current limits are 100GB per site collection, 200GB per content database, and 1TB for mostly-read scenarios (e.g., a Records Center).  If you need a single site with more data, consider using RBS and plan your disaster recovery accordingly.

2nd, the 2,000 limit.  That’s now 5,000 by default.  This isn’t the limit in a list/library, it’s per view!  This is simply because you will start having performance issues at some point if you keep retrieving all your data.  You wouldn’t see these issues with a traditional system simply because either you didn’t have enough data, or you were able to plan ahead and make sure you weren’t retrieving too many items.  With SharePoint, the end-users have more flexibility, and it comes with responsibilities.  Unfortunately, some companies didn’t plan ahead and let their users do too much.  This can be easily mitigated through rapid training and governance.
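To show what “per view” means in practice, here is a small, generic Python sketch.  It does not call any SharePoint API: fetch_page is a hypothetical stand-in for whatever query your data source exposes, and the page size of 2,000 is arbitrary.  The only figure taken from the text is the default threshold of 5,000.

```python
LIST_VIEW_THRESHOLD = 5000  # default per-view threshold mentioned above


def fetch_page(items, start, page_size):
    """Hypothetical stand-in for one bounded query against a large list."""
    return items[start:start + page_size]


def iter_in_pages(items, page_size=2000):
    """Walk a list of any size without one request exceeding the threshold."""
    if page_size > LIST_VIEW_THRESHOLD:
        raise ValueError("page_size must not exceed the view threshold")
    start = 0
    while True:
        page = fetch_page(items, start, page_size)
        if not page:
            break
        yield page
        start += len(page)


# A library far larger than the threshold is still fine to hold and to read.
library = [f"doc-{i}" for i in range(12345)]
pages = list(iter_in_pages(library))
print(len(pages), "pages, largest has", max(len(p) for p in pages), "items")
```

The list itself holds more than twice the threshold; what matters is that no single retrieval asks for all of it at once.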

The actual physical limits are more difficult to discern because there are many facets to them.  For example, you won’t design your repository the same way for a video archive, financial statements, or project documents – their files don’t have the same size, and thus a different limit will be in play.  For file size, you can consider RBS, multiple data files in SQL, or even the Content Organizer to distribute the load across multiple site collections (it could even be a tenant).  I’m confident that SharePoint can play in the 100s of terabytes of content – you just need a lot more planning for that size than for a WCM portal!

Like any system, SharePoint should be planned.  Governance should be done and re-evaluated on a regular basis as new requirements are identified.  Active monitoring should be done to assess the environment and identify upcoming pain points before they happen.  Companies have to find the right mix between flexibility and control by providing capabilities to the end users, but also sufficient boundaries and training so that it won’t negatively affect other users.

SharePoint can definitely handle larger ECM implementations – but it requires proper planning and adequate resources, just as any other software providing ECM capabilities would.  The challenge has been that SharePoint is easier for anyone to use, often with little governance and monitoring; as such, problems are identified late and are more difficult to resolve.  Also, the Product Group has been very transparent, with articles such as the Software Boundaries and Enterprise Content Storage Planning that describe the supported limits.  ALL products have limits somewhere; Microsoft simply chose to write them down in order to make it easier to design your solution.  I’m confident that the Product Group will always look at ways to raise these supported limits.

 

SharePoint isn’t just a product, it is a platform with a strong ecosystem of partners

If you are missing a small piece for a key organizational requirement, there is likely a 3rd party that adds it to SharePoint, thanks to its rich platform.  Competitors will argue that they have this built-in.  To be honest, the reality of content management systems is that they are a collection of merged companies.  A lot of these acquisitions may have been rebranded under a single umbrella, but they aren’t fully integrated yet either.

For example, look at OpenText, one of the strong ECM players: they had LiveLink, which is now rebranded as OpenText ECM Suite but also includes Hummingbird, Captaris, RedDot, and Artesia.  These were either competitors or add-on software that were merged into the ECM suite.  When you want different segments of the suite, the cost goes up with it too.  The challenge comes with making sense of the licensing and patching – it’s not easy.

That’s what I find interesting, because 3rd party companies adding to SharePoint will adhere to best practices from the get-go and will look just as much a part of the product.  They have to do this to stay relevant as the SharePoint platform evolves.  When looking at augmenting SharePoint features, the buy vs build debate is important.  There are some very good partners for things like workflows, social, governance reporting, metadata-driven permissions, and many more.  They all share one thing in common: they augment SharePoint through what its underlying platform offers – that is definitely more integrated than a product acquisition sharing the same name.  It also shows how rich that underlying platform (.NET and Workflow Foundation) is.

And last, it also allows these expert partners to update as frequently as necessary.  Take the Social partner NewsGator, for example: they can make 10-12 releases between each SharePoint release.  That matters to them, as the Social space changes rapidly (Twitter had 400k tweets per quarter 3.5 years ago and 65 million per day in June 2010 – back when SharePoint 2010 planning started, Twitter wasn’t much on the radar).  SharePoint did create an awesome Activity Feed framework though, and NewsGator filled the GUI gap and will continue evolving it.  The same goes for other things like RBS or Workflows.

 

Final words for Part 1

I hope you found this article practical for defining the general concepts of ECM & SharePoint.  SharePoint 2010 is a great product and will be even better in the ECM space with future updates and releases.  It combines ease of use, adoption, and strong ECM capabilities – and it also brings a solid Microsoft (Office, Active Directory, Windows, etc.) and partner ecosystem.

The upcoming parts will discuss how to plan these multi-workload environments, how we integrate with other ECM products, and what additional value the Enterprise SharePoint offering gives you with no additional software cost.