

Manageability Maturity Model


The Manageability Maturity Model is a planning and self-assessment tool. To use it, review the Indications to determine your product's current manageability level, then review the impact that being at this level has on customers and on total cost of ownership (TCO).

Next, consider which investments are reasonable and which issues teams commonly encounter as they enter each level. The model also helps when engaging the management pack (MP) Best Practices team to discuss near-term, mid-term, and long-term improvements, and when framing trade-offs between different approaches.

Levels: Level 0 Chaotic, Level 1 Documented, Level 2 Basic, Level 3 Standardized, Level 4 Rationalized, Level 5 Dynamic

Indications

  Level 0 (Chaotic)
    - Instrumentation exists
    - Goal is developer level diagnostic & Trace
    - Self Documenting
  Level 1 (Documented)
    - All of level 0
    - Events have KB articles; Event Viewer links
    - Knowledge articles focus on trouble-shooting
    - Rudimentary MP possible, but afterthought if present
    - Instrumentation supports trouble-shooting. Details are symptomatic, not focused on causes or with regard to action possible.
  Level 2 (Basic)
    - All of level 1
    - Symptom biased instrumentation
    - Few root causes are identified by direct instrumentation
    - KB articles focus on diagnosis, then repair and remedy
    - Transients look like problems
    - False alarms exceed actionable alerts
  Level 3 (Standardized)
    - Change up in instrumentation approach
    - Root causes starting to be identified by instrumentation
    - Instrumentation focuses on issues not just symptoms
    - MP grade is mostly actionable
    - Matching knowledge article starts to include proactive maintenance steps
    - Tier 1 starts to close more alerts than Tier 2
  Level 4 (Rationalized)
    - Root cause issue detection supported by instrumentation
    - MP enables high levels of automation
    - Tasks support automated repair via human driven “launch”
    - Focus on prevention starting to appear
  Level 5 (Dynamic)
    - All of 4
    - MP contains diagnosis and remediation actions linked to health state transitions
    - Health state is highly actionable and customers can turn on “automate this” when they come to trust a monitor’s accuracy
    - Focus on proactive
    - Pro-Packs included

Characteristics

  Level 0 (Chaotic)
    - How-To by Google
    - Product Specific Books are primary resource
    - MSDN & Blog articles
    - Product Specific WebSites and blogs for self-serve community knowledge
  Level 1 (Documented)
    - All of level 0
    - MMD -> TechNet documentation of events
    - Manual trouble shooting is normal
    - Event Viewer link
  Level 2 (Basic)
    - MP’s are 2K5 conversions or equivalent. High noise rate if monitored
    - MMD -> MP converter used
    - MP contain few problem specific diagnostics and tasks
    - MP automates Technet KB presentation
    - Events trigger alerts
    - MP increases costs due to high tuning costs and high false alarm rates
  Level 3 (Standardized)
    - Knowledge articles focus on avoiding the problems
    - Monitors outnumber rules in MP
    - Actionable alert ratio > 50%
    - Common issues managed by tasks
    - Management pack reduces outage duration
    - SLA becomes the focus for operations
    - Change control in place
  Level 4 (Rationalized)
    - Diagnosis starting to become automated (causality by model)
    - Tier 1 handles most issues
    - Actionable alert ratio > 70%
    - Common issues automatically repaired by tasks
    - Prolonged Outages uncommon
    - MP automates diagnosis and have links to tasks
  Level 5 (Dynamic)
    - Management packs drive real time decisions
    - Many issues handled by resolvers and tasks launched by tier 1
    - Actionable alert ratio > 85%
    - Tricky issues become manageable by T1
    - MP automates repair
    - Correlation used to disambiguate symptomatic measures

Customer Experience

  Level 0 (Chaotic)
    - Bring Specialists
    - Product Expertise required
    - Random events to learn
    - Manual trouble shooting
  Level 1 (Documented)
    - General admin certified expertise required for each product
    - Browsing event log entries required to detect issues
    - Escalation to product specialists
  Level 2 (Basic)
    - MP’s are noisy, mostly consist of rules that trigger alerts
    - High MP tuning costs
    - Thresholds monitored by MP’s are tunable
    - MP generates many alarms, most of which are not actionable
    - Customers using MP catalog extensively
    - Mix of MP’s becomes a cost of ownership issue
    - Alert floods take down monitoring systems
  Level 3 (Standardized)
    - Issue detection cause direct alerts that are actionable
    - Actionable alerts outnumber non-actionable
    - MP tuning costs coming into “reasonable” levels
    - Consolidation rules are present
    - Customers learn which MP’s are noisy and avoid them
    - Trust in monitoring systems emerging
    - Perceptions of product manageability = “ok”
  Level 4 (Rationalized)
    - Few noise level alerts
    - Tuning out of the gate a small investment
    - Perception of product manageability = “good”
    - Small staff can run many applications and hosts
    - Skill levels required to do Tier 1 are optimized
    - Product specific SLM included in MP
  Level 5 (Dynamic)
    - Problems detected as they happen, before outage conditions
    - MP tuning costs are minimum
    - Customers equate great management experience with product quality
    - Task worker UI intuitive and helps to see broader business state
    - MP provides product specific rollups for dashboards and SLM

Impact

  Level 0 (Chaotic)
    - High reliance on product specific expertise
    - Must hire MS certified product specialists to manage portfolio
    - MSFT servers have dubious reputation for manageability
  Level 1 (Documented)
    - Slight cost reduction
    - MSFT is seen as helping with costs
    - Source of information centralized, “feels good”
  Level 2 (Basic)
    - MP enables automating some of operations manual efforts
    - Tuning costs become concern
    - Closing monitors vs closing alerts is next frontier in cost
    - Costs increase because non-actionable alerts happen frequently
    - Instrumentation is ambiguous, wish for root cause analysis
    - Mix of MP’s becoming a cost and performance concern
  Level 3 (Standardized)
    - Health monitors are considered accurate
    - Costs come back into reasonable levels
    - Ops manager seen as major cost saver when instrumentation and MP maturity are reasonable
    - Number of ambiguous alarms (requires escalation) < 50%
  Level 4 (Rationalized)
    - Small staff can manage thousands of machines, hundreds of applications
    - Escalations are automated/tracked.
    - IT management costs are competitive advantage
    - “Noise” less than 20% of all alarms
  Level 5 (Dynamic)
    - IT management costs are best in breed
    - Flexible capacity management lowers capital costs
    - MSFT advantage due to best of breed integrated management
    - Escalations and workflows can be fully automated and triggered by monitors
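
The actionable alert ratio thresholds listed under Characteristics (> 50%, > 70%, > 85%) are the only strictly numeric indications in the model, so they lend themselves to a quick self-check. The following is a minimal sketch in Python, not part of the model itself: the function names and the idea of deriving a level from this one ratio are assumptions made for illustration, and a real assessment would weigh the other indications as well.

```python
from typing import Optional

# Thresholds taken from the Characteristics row of the model:
# ratio > 85% -> Level 5, > 70% -> Level 4, > 50% -> Level 3.
# Checked from highest to lowest so the first match is the best level.
RATIO_THRESHOLDS = [(0.85, 5), (0.70, 4), (0.50, 3)]


def actionable_ratio(actionable_alerts: int, total_alerts: int) -> float:
    """Fraction of alerts that were actionable (hypothetical helper)."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    if not 0 <= actionable_alerts <= total_alerts:
        raise ValueError("actionable_alerts must be between 0 and total_alerts")
    return actionable_alerts / total_alerts


def level_suggested_by_ratio(ratio: float) -> Optional[int]:
    """Highest level whose ratio threshold is exceeded.

    Returns None when the ratio is 50% or lower, because the model gives
    no ratio threshold for Levels 0 through 2.
    """
    for threshold, level in RATIO_THRESHOLDS:
        if ratio > threshold:
            return level
    return None


if __name__ == "__main__":
    # Example: 640 of 800 alerts in a review period were actionable.
    ratio = actionable_ratio(640, 800)
    print(f"ratio={ratio:.0%}, suggested level={level_suggested_by_ratio(ratio)}")
```

The thresholds are strict inequalities, as in the table, so a ratio of exactly 70% still maps to Level 3. Treat the result as a starting point for discussion rather than a verdict, since a high ratio alone does not show that root-cause instrumentation or automated repair is in place.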