Plan and prioritize
Learn how to identify goals for your platform engineering efforts based on the Platform Engineering Capability Model, walk through common scenarios, and find ways to measure success by scoping your goals to business objectives.
To get started, first assess where your organization is today with the Platform Engineering Capability Model. Use the model to chart where your organization is across six core platform engineering capabilities: investment, adoption, governance, provisioning and management, interfaces, and measurement and feedback. All organizations are more advanced in some capabilities than in others. Once you know where your organization stands today, you can pick which capabilities you'd like to grow. To learn more, see how to use the model.
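If it helps to make the assessment concrete, here's a minimal, hypothetical sketch of recording capability scores and sorting them to surface growth areas. The capability names come from the model; the 1-5 scale and the code itself are illustrative assumptions, not part of the model:

```python
# Hypothetical self-assessment: score each of the six capabilities
# from the Platform Engineering Capability Model on a simple 1-5 scale.
capabilities = {
    "investment": 3,
    "adoption": 2,
    "governance": 4,
    "provisioning and management": 2,
    "interfaces": 3,
    "measurement and feedback": 1,
}

# Sort from weakest to strongest to surface where to grow first.
for name, level in sorted(capabilities.items(), key=lambda item: item[1]):
    print(f"{name}: level {level}/5")
```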
You can continually plan and update your platform engineering goals over time. Being clear on what you want to gain from investing in your platform engineering journey can go a long way in helping build organizational support.
As one platform engineering lead put it: "I don't do something for platform engineering until I have something I can gain from it." – Peter, Platform Engineering Lead, Multinational Tech Company
As you're thinking about your goals, scope them to business objectives related to your platform engineering effort, rather than the specifics of a particular development team. Common high-level platform engineering goals include:
- Increase application quality and reduce bugs and issues during release.
- Improve security by reducing the number of security incidents or Common Vulnerabilities and Exposures (CVEs) detected in production.
- Decrease risk through better compliance in areas like licensing, accessibility, privacy, or governmental regulation.
- Accelerate time-to-business value by reducing complexity and overhead and by promoting code sharing through inner source practices.
- Reduce development or operations costs, minimize duplication, and improve automation.
Picking your top goal is critical because it drives how you prioritize everything else.
Better yet, agreeing on objectives and key results (OKRs) with your leadership and internal partners establishes measurable goals for the current phase of your investments. (Other planning approaches have similar concepts if your organization uses something else.) The best OKRs are set from a concrete measure, since that removes subjectivity.
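To illustrate what "set from a concrete measure" can mean in practice, here's a minimal, hypothetical sketch of a key result modeled as baseline, target, and current values, so progress becomes a calculation rather than an opinion. All names and numbers below are invented:

```python
from dataclasses import dataclass

@dataclass
class KeyResult:
    """A key result backed by a concrete measure: baseline, target, current."""
    metric: str
    baseline: float
    target: float
    current: float

    def progress(self) -> float:
        """Fraction of the way from baseline to target, clamped to 0.0-1.0.
        Works for metrics you want to decrease as well as increase
        (assumes baseline != target)."""
        span = self.target - self.baseline
        done = self.current - self.baseline
        return max(0.0, min(1.0, done / span))

# Objective: accelerate time-to-business value (numbers are illustrative).
kr = KeyResult(
    metric="median days to complete first pull request",
    baseline=14.0, target=3.0, current=9.0,
)
print(f"{kr.metric}: {kr.progress():.0%} of the way to target")
```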
After identifying your goals, choose the key scenarios to drive out your backlog and roadmap. For example, consider the following scenarios and their associated jobs to be done.

Scenario | Jobs to be done |
---|---|
Set up a new team or project | Understand and apply organizational best practices and policies; create a new repository; provision tools; provision common infrastructure; give team members access; establish CI/CD pipelines; provision development infrastructure; make an initial deployment to test out the "pipes". |
Onboard a new team member | Update access to tools, services; set up developer machine; ramp up team member on applications; create application sandbox environment; create and review first PR. |
Update application infrastructure | Understand organizational best practices, available options; update / add infrastructure as code artifacts; create application sandbox environment; verify updates; roll out changes to other environments. |
Adopt a new tool | Understand organizational best practices, available options; request use of new tool; update team member access to tools; (if applicable) integrate tool into clients or CI/CD pipelines. |
Use an API, SDK, or service from another team | Discover available APIs, SDKs, services; assess whether it meets needs; connect with owning team for any questions; adopt for application. |
Resolve an application incident | Get notified of an issue; assess whether it's app code or infrastructure related (or both); create sandbox environment that mirrors prod (if different); make changes, test, and do an out-of-band release. |
Resolve a broad incident | Get notified of an issue; assess breadth of issue (which systems); understand whether customers are directly impacted; create sandbox environment that mirrors prod (if different); make changes, test, and do an out-of-band release; communicate with anyone affected. |
Deprecate a tool | Understand tool usage; notify users of deprecation; (optional) coordinate migration of users to new tool. |
Roll out a new architecture or practice | Pilot proposed architecture; adjust based on pilot results; update best practices documentation; create templates and update automation, policies, and governance. |
Comply with new rules or regulations | Understand current compliance rules; verify application meets rules; establish deadline for fixes for deviations; make changes, test, and release. |
Many scenarios apply to more than one role. As you select scenarios, also think about the metrics you'll use to measure improvement.
Next, seek to understand the biggest problems your developers and other roles face with the scenarios and jobs you identified. It can be tempting to start investigating new products to fill in perceived gaps, but if these products don't resolve a major pain point, they’re unlikely to be adopted or appreciated.
Several approaches can help you do this kind of investigation; one is the Hypothesis Progression Framework. Even if you don't use a formalized process, interview developers about a job to be done to scope the discussion, then identify their biggest problems in accomplishing their work. Once you have a good sense of what these problems are, move on to plans for resolving them. This helps ensure that you build features developers actually want to use.
To be able to quickly repeat this process, identify someone who can represent the voice of the customer directly on your internal developer platform team. This role is typically called product manager (even if the person has a different job title). As their knowledge grows, they can accurately predict results for smaller decisions and determine when you need to do more interviews. This keeps your agility up while ensuring you stay focused on delivering value to your internal customers.
Once you have a set of validated problems and concepts, you are in a good position to decide where to invest. However, consider the up-front investment and long-term maintenance required when evaluating solutions. The lowest effort solution that can solve the problem tends to be the right one to start with, but often it’s the maintenance work that ultimately decides whether your investment is successful.
Put another way, don’t create solutions that target later stages of your journey unless you really need to.
Once you've identified your thinnest viable platform (TVP), the equivalent of a minimum viable product for your platform, pilot it with a set of development teams that are willing to provide feedback. If your pilot solution solves problems these teams are facing, you shouldn't have trouble finding someone interested in engaging.
Capture some key metrics as you pilot a new capability or change so you can measure whether the concept was successful before you roll it out. Measuring how successful you are is an important part of a product mindset. Even small successes with modest investments can lay the groundwork for larger investments to build on.
For example, it can be difficult to secure funding or buy-in for compliance efforts because development teams delivering business value can perceive them as an operating tax. A product mindset can change that perception. With a product mindset, you're trying to ensure that the customers of your internal developer platform are happy and that the business goals of your stakeholders are met. Metrics put you in a position to use facts to prove that you're providing business value. Setting OKRs can help, particularly if you have metrics that remove subjective bias. Even if you aren't measuring anything applicable today, you can set a learning OKR to establish a baseline, then refine your targets once that baseline is known.
The following are examples of well-known metrics you can measure to determine if the teams you're working with are getting value out of what you are building. Zero in on those that help you measure whether you, and your development team customers, are achieving your goals. For example, the following is a set of metrics that help you evaluate whether your platform is contributing to an effective engineering organization:
- Speed / time to business value: Median days to complete first pull request (onboarding), median minutes for build and test processes (example: CI), median time to merge pull request.
- Software quality: Incidents (issues) created per month per dev (count normalized to number of devs), mean time to remediate (MTTR), mean time to investigate and remediate a security issue.
- Platform ease of use: Net user satisfaction (NSAT)
- Thriving ecosystem: Average score for each of the following surveyed questions: "I am empowered to do my best work," "most days I am energized by the work I do," "the work I do is meaningful to me."
You can then break down these metrics by organization, team, or project. To start you need to measure some baselines, but you can then watch these metrics change as you improve your platform.
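As a hypothetical sketch of what computing a baseline can look like, the following calculates one speed metric and the ecosystem survey scores from raw data. The data shapes, timestamps, and scores are all invented for illustration:

```python
from statistics import median
from datetime import datetime

# Invented sample data: (opened, merged) timestamps per pull request.
pull_requests = [
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 3, 15, 30)),
    (datetime(2024, 1, 4, 11, 0), datetime(2024, 1, 4, 16, 45)),
    (datetime(2024, 1, 5, 8, 15), datetime(2024, 1, 8, 10, 0)),
]

# Speed: median time to merge a pull request, in hours.
merge_hours = median(
    (merged - opened).total_seconds() / 3600
    for opened, merged in pull_requests
)
print(f"Median time to merge PR: {merge_hours:.1f} hours")

# Thriving ecosystem: average score per surveyed question (1-5 scale).
survey = {
    "I am empowered to do my best work": [4, 5, 3, 4],
    "Most days I am energized by the work I do": [3, 4, 4, 2],
}
for question, scores in survey.items():
    print(f"{question}: {sum(scores) / len(scores):.2f}")
```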
Other metrics you or your sponsors might be interested in measuring include:
Area | Example metrics |
---|---|
Software delivery performance | DevOps Research and Assessment (DORA): change lead time, deployment frequency, change fail rate, time to restore service (MTTR) |
Operations | DORA (MTTR), mean time between failures (MTBF), average time to acknowledge, end-customer availability, latency, throughput metrics, cost per team, cost per deployment |
Platform capability adoption | Number of teams or developers using a tool or feature over time, percentage of repositories using the platform, most popular templates, pipelines, etc. |
Collecting metrics requires time and effort, so focus on the critical metrics for the core goals you identified – particularly those that power your OKRs. OpenTelemetry-based products like Application Insights can help. Regardless, be sure to regularly measure platform ease of use and survey whether you have a thriving ecosystem. These metrics are often missed for internal systems, yet they're a leading indicator of whether you'll meet your broader business goals, since engaged participation is critical to success. They also help you know when it's time to do further customer development to understand where to go next.
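For example, here's a minimal sketch of emitting an adoption metric with the OpenTelemetry Python SDK. The meter name, instrument name, and attributes are assumptions for illustration, and the console exporter stands in for whatever exporter feeds your product of choice (for Application Insights, that would be an Azure Monitor exporter):

```python
# Minimal sketch; requires the opentelemetry-api and opentelemetry-sdk packages.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Export to the console for illustration; swap in an Azure Monitor or OTLP
# exporter to feed a backend like Application Insights.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# Hypothetical meter and instrument names for platform adoption metrics.
meter = metrics.get_meter("internal-developer-platform")
provision_counter = meter.create_counter(
    "platform.template.provisioned",
    description="Count of successful self-service provisioning runs",
)

# Record one self-service provisioning event, tagged by team and template
# so the metric can later be broken down by organization, team, or project.
provision_counter.add(1, {"team": "payments", "template": "web-api"})
```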
Regardless of how you begin, keep in mind that you should plan for change, start with new applications for pilots, focus on enhancing existing user experiences, adopt the principle of least privilege, plan for controlled experimentation, and minimize customization.
Your target application platform will evolve over time, and you may not be able to migrate all your existing investments at once. Think through how you can support more than one variation over time and plan for change.
Start with new applications of a reasonable size when you're piloting your new platform or platform capabilities. This also gives you experience managing your platform as a product. Shy away from replatforming projects at the start: you learn as you go, and large existing applications often have unique needs that are only uncovered during the replatforming effort itself. Because of that, coupling your success to these kinds of efforts can result in expectation mismatches or late-breaking problems. Starting with something newer gives you confidence in your direction and the value it provides, which reduces the risk of tackling these bigger efforts. Put another way, if you're confident people can start right and stay right, it becomes easier to drive a get right campaign with what you learn from experience. If this approach isn't possible, do careful analysis, set expectations, and step in incrementally rather than trying to change everything at once.
Developers are more likely to adopt and stick with new capabilities when they're surfaced in something they already like and use. As you evaluate concepts for delivering new capabilities, investigate options that extend existing "centers of gravity": editors and IDEs (Visual Studio, VS Code), DevOps suites (GitHub, Azure DevOps), existing CLIs, or an existing internal portal. These can be more effective than an entirely new portal or other UX. See user experiences to learn more.
Assume developers have limited access to downstream systems for things like provisioning infrastructure. You'll need a controlled way to enable this access for self-service experiences.
Experiment before rolling out major or risky changes. Not everything has to be fully automated to start. An automatically triggered manual workflow can be a great way to pilot ideas.
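As a purely hypothetical sketch, a self-service entry point can start out queueing work for a human operator instead of automating it, keeping the interface stable while you pilot the idea. Every name below is invented:

```python
import json
from datetime import datetime, timezone

def request_environment(team: str, template: str) -> None:
    """Pilot version: capture the self-service request and hand it to a
    human operator instead of provisioning automatically. The interface
    stays the same when automation replaces the manual step later."""
    ticket = {
        "requested_at": datetime.now(timezone.utc).isoformat(),
        "team": team,
        "template": template,
        "status": "pending-manual-fulfillment",
    }
    # Stand-in for filing a ticket or posting to an ops channel.
    print("New provisioning request:", json.dumps(ticket, indent=2))

request_environment(team="payments", template="web-api")
```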
Avoid custom building application platform capabilities that software vendors could eclipse with their own releases over time; for example, runtime hosting, service meshes, and identity systems. If you find an urgent need in an area like this, plan for multiple implementation options, since long-term maintenance costs will likely cause you to switch over time.