OCS and Virtualization

I thought a good topic to discuss would be OCS and virtualization. Virtualization is obviously a hot topic that can save people real dollars, and there is some general confusion as to whether OCS supports it (most people think it doesn't).  As quick background, OCS brings multi-modal communication to an enterprise, covering workloads such as: Instant Messaging and Presence; Conferencing (which can include application sharing, shared content such as viewing a PowerPoint together, and real-time audio/video); and Voice (where OCS acts as a PBX for P2P audio/video but also integrates with PSTN networks through either a SIP trunk or a traditional IP/PSTN gateway).

Our product team has actually been working with virtualization technologies internally for 3 years, as it makes it much easier to deploy and test topologies prior to releasing the product.  In our OCS 2007 R2 release, we stated that our product does indeed work within a virtualized environment, but we added a few asterisks, caveats, and disclaimers to this message, such that the original spirit of what we support might have gotten a little lost.  Here is a link to a document that explains what we can and cannot do:

 https://technet.microsoft.com/en-us/library/dd572860(office.13).aspx

So if you read this blurb, it sounds like we cannot work under certain configurations, most notably the audio/video-related modalities (i.e. get 4 of your closest friends into an audio conferencing meeting using OCS hosted in a virtualized environment and what happens?).  The truth is that technically the product does work in situations like this, where you are doing real-time media such as audio/video.  In fact, we set up the product internally in our labs literally hundreds of times per week to go through our internal test passes, so we do have a fair amount of experience in this area.

So you might ask, well then, what's the problem? Why are things like audio/video listed as 'not supported' by OCS in virtualized environments?  Our problem is that we cannot assert the load and media quality characteristics of our solution in a virtualized environment, so to some degree we're trying to be super careful and not overpromise.  As an example, when our server mixes real-time audio streams together to form an audio conference, latencies are critically important (80 milliseconds of latency sounds good in an audio conference; 200 milliseconds sounds pretty terrible, and people cannot communicate effectively without talking over each other all the time).  In a virtualized environment, the thread of execution can be taken away from us at any moment to go serve another virtualized workload (e.g. perhaps SQL is another virtualized workload deployed on the same box, and at the worst possible time, while we're mixing audio streams, the SQL workload needs more real CPU cycles to complete a query).  What happens to OCS or to the user's audio conference at this time?  In all likelihood, the users are going to hear a noticeable audio glitch in the middle of the conference call, for a duration dictated by how long the thread of execution is away from our audio mixing algorithm.

If there are no other virtualized loads running in the system, there will be a slight performance drag but the solution is still usable.  So in a nutshell, since we cannot determine what other virtual loads are running in the system, and therefore cannot assert what the audio quality experience might be, we decided to officially "not support" this particular workload so we wouldn't have potentially unhappy customers.
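
To make the glitch arithmetic concrete, here is a rough back-of-the-envelope sketch. It is not how OCS is implemented: the 20 ms frame size, the idea of a fixed playout buffer, and the function names are all assumptions made purely for illustration. The point is only that any stall longer than whatever buffering exists turns directly into audio the listener never hears.

```python
import math

# Assumed audio frame duration in milliseconds (illustrative, not an OCS value).
FRAME_MS = 20


def lost_frames(stall_ms: float, buffer_ms: float, frame_ms: int = FRAME_MS) -> int:
    """Frames that miss their playout deadline while the mixer is descheduled.

    stall_ms  -- how long the hypervisor kept the mixing thread off the CPU
    buffer_ms -- audio already queued for playout when the stall began (hypothetical)
    """
    # Only the part of the stall that outlasts the buffered audio is audible.
    uncovered_ms = max(0.0, stall_ms - buffer_ms)
    return math.ceil(uncovered_ms / frame_ms)


def audible_gap_ms(stall_ms: float, buffer_ms: float, frame_ms: int = FRAME_MS) -> int:
    """Approximate length of the dropout the participants hear, in milliseconds."""
    return lost_frames(stall_ms, buffer_ms, frame_ms) * frame_ms


if __name__ == "__main__":
    for stall in (30, 80, 150):
        print(stall, "ms stall ->", audible_gap_ms(stall, buffer_ms=40), "ms dropout")
```

With these made-up numbers, a short stall is absorbed entirely by the buffer, while a long one produces a dropout roughly as long as the time the mixing thread spent waiting for the CPU, which is the behavior described above.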

What we're doing now in our next release of the product is testing what the experience is like for a smaller number of users than the normal peak (under-provisioning), so we can give people guidance on what to expect in terms of scale and audio experience.  Hopefully my explanation above makes the considerations clear: the product does actually work in these environments, but your mileage will vary greatly depending on what other workloads are running at the time and what your virtualization software solution is.  Our experience with the upcoming version of the product is that the number of users you can support ranges from 50% to 80% of normal, depending on your virtualization platform (this assumes not much else is running at the same time).  So I believe a key ingredient is really planning what other workloads run on the same box at the same time, to help them co-exist better.  Our OCS solution is fairly disk intensive for workloads like Instant Messaging and Presence, and Conferencing with data such as viewing a PowerPoint.  However, we become very CPU intensive once operations like application sharing or real-time audio become involved in a conference.  So planning and partitioning your solution such that the CPU-intensive workloads are not all on the same box is of course the first real step to having things work well together.
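
As a rough planning aid, the 50% to 80% range above can be turned into simple arithmetic. This is only a sketch: the per-server user count used in the example is a hypothetical figure, not published guidance, and the function names are invented for illustration.

```python
def virtualized_capacity(physical_users: int, low_pct: int = 50, high_pct: int = 80):
    """Return the (pessimistic, optimistic) user counts for one virtualized server.

    physical_users -- users the same server supports on bare metal (your own figure)
    low_pct/high_pct -- the 50%-80% range quoted in the text
    """
    return physical_users * low_pct // 100, physical_users * high_pct // 100


def servers_needed(total_users: int, physical_users: int):
    """Return (worst_case, best_case) virtualized server counts for total_users."""
    low, high = virtualized_capacity(physical_users)
    # Integer ceiling division: -(-a // b)
    return -(-total_users // low), -(-total_users // high)


if __name__ == "__main__":
    # Hypothetical deployment: 10,000 users, 5,000 users per physical server.
    print(virtualized_capacity(5000))
    print(servers_needed(10000, 5000))
```

Sizing for the pessimistic end of the range is the safer default if anything else CPU intensive shares the host, since the range already assumes not much else is running.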

Until next time..

shaun