Plan and prioritize

Learn how to identify goals for your platform engineering efforts, walk through common scenarios, and find ways to measure success. Scoping your goals to business objectives is what makes that success measurable.

Identify goals and key scenarios

"I don't do something for platform engineering until I have something I can gain from it." – Peter, Platform Engineering Lead, Multinational Tech Company

Rather than looking at a rote checklist of capabilities or features, start by identifying the goals of your platform engineering efforts. You can continually plan and update the goals over time. However, being clear on what you want to gain from investing in your platform engineering journey can go a long way in helping build organizational support.

As you're thinking about your goals, scope them to business objectives related to your platform engineering effort, rather than the specifics of a particular development team. For example, here are some common high-level platform engineering goals:

  • Increase application quality and reduce bugs and issues found during release.
  • Improve security by reducing the number of security incidents or CVEs detected once in production.
  • Decrease risk through better compliance in areas like licensing, accessibility, privacy, or governmental regulation.
  • Accelerate time to business value by reducing complexity and overhead and by promoting code sharing through inner source practices.
  • Reduce development or operations costs by minimizing duplication and improving automation.

While all of these objectives might be long-term goals, picking your top goal is critical since it drives how you approach prioritization.

Better yet, agreeing on objectives and key results (OKRs) with your leadership and internal partners can help you establish measurable goals for the current phase of your investments. (Other planning approaches have similar concepts if your organization uses something else.) The best OKRs are set against a concrete measure, since that removes subjectivity.
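
For illustration only, here's a minimal sketch of capturing an OKR as data so that each key result stays tied to a concrete measure. The objective, metric names, baselines, and targets are all hypothetical:

```python
# A hypothetical OKR captured as data: each key result names a concrete
# metric, a measured baseline, and a target, so progress isn't subjective.
okr = {
    "objective": "Reduce friction when starting new applications",
    "key_results": [
        {"metric": "median_days_to_first_merged_pr", "baseline": 10.0, "target": 3.0},
        {"metric": "median_minutes_ci_build_and_test", "baseline": 45.0, "target": 15.0},
    ],
}

def progress(key_result: dict, current: float) -> float:
    """Fraction of the way from baseline to target, clamped to 0..1."""
    span = key_result["baseline"] - key_result["target"]
    return max(0.0, min(1.0, (key_result["baseline"] - current) / span))

# Current values would come from your metrics pipeline; these are examples.
current_values = {
    "median_days_to_first_merged_pr": 6.0,
    "median_minutes_ci_build_and_test": 30.0,
}
for kr in okr["key_results"]:
    print(f'{kr["metric"]}: {progress(kr, current_values[kr["metric"]]):.0%} to target')
```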

Scenarios and jobs to be done

Once you've identified your goals, pick out the key scenarios you'll use to drive your backlog and roadmap. The following are examples of common scenarios and their jobs to be done.

Scenario: Start building a new application

  • Understand and apply organizational best practices and policies
  • Create a new repository
  • Provision tools
  • Provision common infrastructure
  • Give team members access
  • Establish CI/CD pipelines
  • Provision dev infrastructure
  • Initial deployment to test out "pipes"
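
Jobs like these are good candidates for a self-service "golden path" template. As a minimal sketch of one possible approach, the following uses GitHub's REST endpoint for creating a repository from a template; the organization name, template repository, and token handling are assumptions for illustration:

```python
import os
import requests

# Hypothetical org and template names for illustration. The template repo
# can carry CI/CD workflows, IaC artifacts, and policy checks so new teams
# inherit organizational best practices automatically.
ORG = "contoso"
TEMPLATE_REPO = "service-starter-python"

def scaffold_app(app_name: str) -> str:
    """Create a new repository from an approved template repository."""
    resp = requests.post(
        f"https://api.github.com/repos/{ORG}/{TEMPLATE_REPO}/generate",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"owner": ORG, "name": app_name, "private": True},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["html_url"]

if __name__ == "__main__":
    print(scaffold_app("payments-api"))
```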

Scenario: Add / remove a member for an existing team

  • Update access to tools, services
  • Set up developer machine
  • Ramp up team member on applications
  • Create application sandbox environment
  • Create and review first PR

Scenario: Add / update infrastructure for existing application

  • Understand organizational best practices, available options
  • Update / add infrastructure as code artifacts
  • Create application sandbox environment
  • Verify updates
  • Roll out changes to other environments
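
As a small, hedged sketch of the "roll out changes to other environments" job, parameterizing your infrastructure-as-code per environment keeps the rollout mechanical; the environment names and settings below are hypothetical:

```python
import json
from pathlib import Path

# Hypothetical per-environment settings consumed by your IaC tooling.
ENVIRONMENTS = {
    "sandbox": {"sku": "B1", "replicas": 1},
    "test": {"sku": "S1", "replicas": 2},
    "prod": {"sku": "P1v3", "replicas": 3},
}

def render_parameters(env: str, out_dir: Path = Path("infra/params")) -> Path:
    """Write the parameter file your IaC tooling consumes for one environment."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{env}.json"
    path.write_text(json.dumps(ENVIRONMENTS[env], indent=2))
    return path

# Verify changes in a sandbox first, then promote the same change outward.
for env in ["sandbox", "test", "prod"]:
    print(render_parameters(env))
```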

Scenario: Add / update tool for existing team

  • Understand organizational best practices, available options
  • Request use of new tool
  • Update team member access to tools
  • (If applicable) Integrate tool into clients or CI/CD pipelines

Scenario: Find / reuse existing API, SDK, or service

  • Discover available APIs, SDKs, and services
  • Assess whether it meets needs
  • Connect with owning team for any questions
  • Adopt for application

Scenario: Respond to an operations incident

  • Notification of an issue
  • Assess if app code or infra related (or both)
  • Create sandbox environment that mirrors prod (if different)
  • Make changes, test, out-of-band release

Scenario: Rapidly remediate security incident / Common Vulnerabilities and Exposures (CVE) notice

  • Notification of an issue
  • Assess breadth of issues (which systems)
  • Understand if customers are directly impacted
  • Create sandbox environment that mirrors prod (if different)
  • Make changes, test, out-of-band release
  • Communicate with anyone affected

Scenario: Deprecate tool

  • Understand tool usage
  • Notify users of deprecation
  • (Optional) Coordinate migration of users to a new tool

Scenario: Define / roll out a new app model or architecture

  • Pilot proposed architecture
  • Adjust based on pilot results
  • Update best practices documentation
  • Create templates, update automation, policies, governance

Scenario: Audit application compliance (GDPR, accessibility, infrastructure standards)

  • Understand current compliance rules
  • Verify application meets rules
  • Establish deadline for fixes for deviations
  • Make changes, test, release

Many scenarios apply to more than one role, so you'll also want to think about the metrics you'll use to understand how much each scenario improves.

From problems to concepts

Next, we recommend seeking to understand the biggest problems your developers and other roles face with the scenarios and jobs you identified. It can be tempting to start investigating new products to fill in perceived gaps, but if these products don't resolve a major pain point, they’re unlikely to be adopted or appreciated.

There are several approaches that can help you with this kind of investigation. One is the Hypothesis Progression Framework. Even if you don't use a formalized process like it, interviewing developers about a job to be done is a great way to scope the discussion and identify their biggest problems in accomplishing their work. Once you have a good sense of what those problems are, you can move on to coming up with concepts for resolving them. This helps ensure that you build features developers actually want to use.

To repeat this process quickly, you'll want to identify someone who can represent the voice of the customer directly on your internal developer platform team. This role is typically called product manager (even if they have a different job title), and as their knowledge grows, they'll be able to accurately predict results for smaller decisions and determine when you need to do more interviews. This keeps your agility up while still ensuring you're focused on delivering value to your internal customers.

Make the case for your initial investments

Once you have a set of validated problems and concepts, you'll be in a good position to make the case for investment. However, keep in mind the level of up-front investment and long-term maintenance required. The lowest-effort solution that can solve the problem tends to be the right one to start with, but it's often the maintenance work that ultimately decides whether your investment is successful.

Put another way, don’t create solutions that target later stages of your journey unless you really need to.

Once you've identified your thinnest viable platform (TVP), the platform equivalent of a minimum viable product (MVP), you can pilot it with a set of development teams that are willing to provide feedback. If your pilot solution solves problems these teams are facing, you shouldn't have trouble finding someone interested in engaging.

Capture some key metrics as you pilot a new capability or change so you can measure whether the concept was successful before you roll it out broadly.

Measure success and prove value

Whether or not you're making your first investment, measuring how successful you are is an important part of a product mindset. It not only tells you whether you've achieved your goals; even small successes with modest investments can lay the groundwork for larger investments to build on.

For example, it can be difficult to secure funding or buy-in for compliance efforts because development teams delivering business value can perceive them as an operating tax. A product mindset can change that perception. With a product mindset, you're trying to ensure that the customers of your internal developer platform are happy and that the business goals of your stakeholders are met. Metrics put you in a position to use facts to prove that you're providing business value. Setting OKRs can help, particularly if you have metrics that remove subjective bias. Even if you aren't measuring anything applicable today, you can set a learning OKR to establish a baseline, then refine the OKR once that baseline is known.

The following are examples of well-known metrics you can use to determine whether the teams you're working with are getting value out of what you're building. Zero in on those that help you measure whether you, and your development team customers, are achieving your goals. For example, this set of metrics helps you evaluate whether your platform is contributing to an effective engineering organization:

  • Speed / time to business value: Median days to complete first pull request (onboarding), median minutes for build and test processes (for example, CI), median time to merge a pull request.
  • Software quality: Incidents (issues) created per month per dev (count normalized to the number of devs), mean time to remediate (MTTR), mean time to investigate and remediate a security issue.
  • Platform ease of use: Net user satisfaction (NSAT).
  • Thriving ecosystem: Average score for each of the following survey questions: "I am empowered to do my best work," "most days I am energized by the work I do," "the work I do is meaningful to me."
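
As a minimal sketch, assuming you can export pull request timestamps from your DevOps suite, the following computes the median time to merge from such an export; the record fields are hypothetical:

```python
from datetime import datetime
from statistics import median

# Hypothetical export of pull request records; in practice these would come
# from your DevOps suite's API or data warehouse.
pull_requests = [
    {"team": "payments", "opened": "2024-05-01T09:00", "merged": "2024-05-01T15:30"},
    {"team": "payments", "opened": "2024-05-02T10:00", "merged": "2024-05-03T11:00"},
    {"team": "catalog", "opened": "2024-05-01T08:00", "merged": "2024-05-01T09:45"},
]

def median_hours_to_merge(prs: list[dict]) -> float:
    """Median hours between a PR being opened and merged."""
    durations = [
        (datetime.fromisoformat(pr["merged"]) - datetime.fromisoformat(pr["opened"])).total_seconds() / 3600
        for pr in prs
    ]
    return median(durations)

# Break the metric down by team; the same slicing works for org or project.
for team in sorted({pr["team"] for pr in pull_requests}):
    team_prs = [pr for pr in pull_requests if pr["team"] == team]
    print(f"{team}: {median_hours_to_merge(team_prs):.1f}h median time to merge")
```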

You can then break down these metrics by organization, team, or project. To start, you'll need to measure some baselines, and you can then watch these metrics change as you improve your platform.

Other metrics you or your sponsors might be interested in measuring include:

  • Software delivery performance: DevOps Research and Assessment (DORA) metrics, such as change lead time, deployment frequency, change fail rate, and time to restore service (MTTR).
  • Operations: DORA MTTR, mean time between failures (MTBF), average time to acknowledge, end-customer availability, latency, throughput metrics, cost per team, cost per deployment.
  • Platform capability adoption: Number of teams or developers using a tool or feature over time, percentage of repositories using the platform, most popular templates, pipelines, and so on.
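
As a minimal sketch of computing two of the DORA metrics, assuming you can export deployment records from your CD system (the fields shown are hypothetical):

```python
# Hypothetical deployment records exported from your CD system.
deployments = [
    {"date": "2024-05-01", "caused_incident": False},
    {"date": "2024-05-03", "caused_incident": True},
    {"date": "2024-05-06", "caused_incident": False},
    {"date": "2024-05-08", "caused_incident": False},
]

days_in_period = 7
deployment_frequency = len(deployments) / days_in_period  # deployments per day
change_fail_rate = sum(d["caused_incident"] for d in deployments) / len(deployments)

print(f"Deployment frequency: {deployment_frequency:.2f}/day")
print(f"Change fail rate: {change_fail_rate:.0%}")
```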

Collecting metrics takes time and effort, so focus on the critical metrics for the core goals you identified, particularly those that power your OKRs. OpenTelemetry-based products like Application Insights can help. Regardless, be sure to measure platform ease of use and survey regularly for a thriving ecosystem. These metrics are often missed for internal systems, yet they're a leading indicator of whether you'll meet your broader business goals, since engaged participation is critical to success. They also tell you when it's time to do further customer development to understand where to go next.
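
For instance, here's a minimal sketch that emits a platform adoption counter with the OpenTelemetry Python SDK; the meter, metric, and attribute names are hypothetical, and a real setup would export to a backend such as Application Insights rather than the console:

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Export to the console for illustration; swap in an OTLP or Azure Monitor
# exporter to feed a product like Application Insights.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("platform.adoption")
scaffolds = meter.create_counter(
    "template_scaffolds",
    description="New applications created from approved templates",
)

# Record one scaffold event, tagged so the metric can be broken down by team.
scaffolds.add(1, {"team": "payments", "template": "service-starter-python"})
```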

Other tips

Regardless of how you begin, keep in mind the following guidelines.

Plan for change

Your target application platform will evolve over time, and you may not be able to migrate all your existing investments at once. You'll likely want to think through how you can support more than one variation over time and plan for change.

Validate ideas with newer applications

It's generally best to start with new applications of a reasonable size when you're piloting your new platform or platform capabilities. This also gives you experience managing your platform as a product. Shy away from re-platforming projects at first: you'll learn as you go, and large existing applications often have unique needs that are only uncovered during the re-platforming effort itself. Coupling your success to these kinds of efforts can result in expectation mismatches or late-breaking problems. Starting with something newer gives you confidence in your direction and the value it provides, which reduces the risk of tackling the bigger efforts. Put another way, if you're confident people can start right and stay right, it becomes easier to drive a get-right campaign with what you learn from experience. If this approach isn't possible, do careful analysis, set expectations, and step in incrementally rather than trying to change everything at once.

Focus on existing centers of gravity for user experiences before creating new ones

Developers are more likely to adopt and stick with new capabilities when they're surfaced in something they already like and use. As you're evaluating concepts for delivering new capabilities, be sure to investigate options that extend existing "centers of gravity." For example, editors/IDEs (Visual Studio, VS Code), DevOps suites (GitHub, Azure DevOps), existing CLIs, or an existing internal portal can be more effective than an entirely new portal or other UX. See user experiences to learn more.

Assume the principle of least privilege

Assume developers have limited access to downstream systems for things like provisioning infrastructure. You'll need a controlled way to enable this access for self-service experiences.
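
One common pattern, sketched below with illustrative names and a deliberately simplified policy check, is an internal service that brokers provisioning: developers call it with their normal, limited identity, and the service performs the privileged work under its own identity after enforcing policy:

```python
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

# Hypothetical allowlist; in practice this would come from your identity
# provider or an entitlement service.
ENTITLED_TEAMS = {"payments", "catalog"}

@app.post("/environments")
def create_environment(team: str, requested_by: str = Header(...)):
    """Create a sandbox environment on behalf of a developer.

    The developer never holds provisioning rights; this service runs under
    its own least-privileged service identity and enforces policy first.
    """
    if team not in ENTITLED_TEAMS:
        raise HTTPException(status_code=403, detail="Team not entitled to self-service")
    # ... call the infrastructure API here using the service's credentials ...
    return {"status": "provisioning", "team": team, "requested_by": requested_by}
```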

Plan for controlled experimentation

Experiment before rolling out major or risky changes. Not everything has to be fully automated from the start. A manually triggered workflow can be a great way to pilot ideas.

Minimize app platform customization

Try to avoid custom building application platform capabilities that could be eclipsed by capabilities software vendors release over time, such as runtime hosting, service meshes, or identity systems. If you find an urgent need in an area you suspect might be like this, plan for multiple implementation options, since long-term maintenance will likely cause you to switch over time.