Pipeline caching

Article
03/22/2023

Azure DevOps Services

Pipeline caching can help reduce build time by allowing the outputs or downloaded dependencies from one run to be reused in later runs, which reduces or avoids the cost of recreating or redownloading the same files. Caching is especially useful when the same dependencies are downloaded over and over at the start of each run. This is often a time-consuming process involving hundreds or thousands of network calls.

Caching can be effective at improving build time provided the time to restore and save the cache is less than the time to produce the output again from scratch. Because of this, caching may not be effective in all scenarios and may actually have a negative impact on build time.

Note

Pipeline caching is supported in agent pool jobs for both YAML and Classic pipelines. However, it is not supported in Classic release pipelines.

When to use artifacts versus caching

Pipeline caching and pipeline artifacts perform similar functions but are designed for different scenarios and shouldn't be used interchangeably.

Use pipeline artifacts when you need to take specific files produced in one job and share them with other jobs (and these other jobs will likely fail without them).
Use pipeline caching when you want to improve build time by reusing files from previous runs (and not having these files won't impact the job's ability to run).
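
As a rough illustration of the difference, the following sketch uses an artifact to hand build output from one job to another, and a cache to speed up dependency installation. The job names, the build.sh script, the dist folder, and the npm-based cache settings are assumptions made for this example and aren't taken from a real project.

```yaml
variables:
  npm_config_cache: $(Pipeline.Workspace)/.npm

jobs:
- job: Build
  steps:
  # Cache: an optimization only. The job still works on a cache miss.
  - task: Cache@2
    inputs:
      key: 'npm | "$(Agent.OS)" | package-lock.json'
      path: $(npm_config_cache)
    displayName: Cache npm (optional speed-up)

  - script: ./build.sh   # hypothetical build script that writes to dist/
    displayName: Build

  # Artifact: the Test job needs these exact files to run.
  - publish: $(System.DefaultWorkingDirectory)/dist
    artifact: dist

- job: Test
  dependsOn: Build
  steps:
  - download: current    # artifact lands in $(Pipeline.Workspace)/dist
    artifact: dist
  - script: ls $(Pipeline.Workspace)/dist
    displayName: Use the build output
```

If the cache is missing, the Build job only runs slower. If the artifact is missing, the Test job has nothing to work with.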

Note

Pipeline caching and pipeline artifacts are free for all tiers (free and paid). See Artifacts storage consumption for more details.

Cache task: how it works

Caching is added to a pipeline using the Cache task. This task works like any other task and is added to the steps section of a job.

When a cache step is encountered during a run, the task restores the cache based on the provided inputs. If no cache is found, the step completes and the next step in the job is run.

After all steps in the job have run and assuming a successful job status, a special "Post-job: Cache" step is automatically added and triggered for each "restore cache" step that wasn't skipped. This step is responsible for saving the cache.

Note

Caches are immutable, meaning that once a cache is created, its contents cannot be changed.

Configure the Cache task

The Cache task has two required arguments: key and path:

path: the path of the folder to cache. Can be an absolute or a relative path. Relative paths are resolved against $(System.DefaultWorkingDirectory).

Note

You can use predefined variables to store the path to the folder you want to cache; however, wildcards are not supported.

key: should be set to the identifier for the cache you want to restore or save. Keys are composed of a combination of string values, file paths, or file patterns, where each segment is separated by a | character.

Strings:
Fixed value (like the name of the cache or a tool name) or taken from an environment variable (like the current OS or current job name)
File paths:
Path to a specific file whose contents will be hashed. This file must exist at the time the task is run. Keep in mind that any key segment that "looks like a file path" is treated as a file path. In particular, this includes any segment containing a period (.). This can cause the task to fail when the "file" doesn't exist.

Tip

To prevent a path-like string segment from being treated as a file path, wrap it in double quotes, for example: "my.key" | $(Agent.OS) | key.file
File patterns:
Comma-separated list of glob-style wildcard patterns that must match at least one file. For example:
- **/yarn.lock: all yarn.lock files under the sources directory
- */asset.json, !bin/**: all asset.json files located in a directory under the sources directory, except under the bin directory

The contents of any file identified by a file path or file pattern are hashed to produce a dynamic cache key. This is useful when your project has files that uniquely identify what is being cached. For example, files like package-lock.json, yarn.lock, Gemfile.lock, or Pipfile.lock are commonly referenced in a cache key since they each represent a unique set of dependencies.

Relative file paths or file patterns are resolved against $(System.DefaultWorkingDirectory).
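
The three kinds of key segment and the path input can be combined as in the following sketch. The "my.tool" name and the .my-tool-cache folder are placeholders invented for illustration; the inputs themselves (key, path) are the ones described above.

```yaml
steps:
- task: Cache@2
  inputs:
    # "my.tool" contains a period, so it is quoted to keep it a plain string.
    # "$(Agent.OS)" is a string taken from a predefined variable.
    # The last segment is a file pattern: every matching package-lock.json
    # outside node_modules is hashed into the key.
    key: '"my.tool" | "$(Agent.OS)" | **/package-lock.json, !**/node_modules/**'
    # A predefined variable supplies the folder; wildcards aren't allowed here.
    path: $(Pipeline.Workspace)/.my-tool-cache
  displayName: Cache my.tool downloads
```

Because the pattern is relative, it is resolved against $(System.DefaultWorkingDirectory), and it must match at least one file or the task fails.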

Example:

Here's an example showing how to cache dependencies installed by Yarn:

variables:
  YARN_CACHE_FOLDER: $(Pipeline.Workspace)/s/.yarn

steps:
- task: Cache@2
  inputs:
    key: '"yarn" | "$(Agent.OS)" | yarn.lock'
    restoreKeys: |
       "yarn" | "$(Agent.OS)"
       "yarn"
    path: $(YARN_CACHE_FOLDER)
  displayName: Cache Yarn packages

- script: yarn --frozen-lockfile

In this example, the cache key contains three parts: a static string ("yarn"), the OS the job is running on since this cache is unique per operating system, and the hash of the yarn.lock file that uniquely identifies the set of dependencies in the cache.

On the first run after the task is added, the cache step will report a "cache miss" since the cache identified by this key doesn't exist. After the last step, a cache will be created from the files in $(Pipeline.Workspace)/s/.yarn and uploaded. On the next run, the cache step will report a "cache hit" and the contents of the cache will be downloaded and restored.

When using checkout: self, the repository is checked out to $(Pipeline.Workspace)/s, and your .yarn folder usually resides in the repository itself.

Note

Pipeline.Workspace is the local path on the agent running your pipeline where all directories are created. This variable has the same value as Agent.BuildDirectory.

Ensure you update the variable YARN_CACHE_FOLDER if using anything other than checkout: self as this should point to the repository where .yarn resides.

Restore keys

Use restoreKeys to query against multiple exact keys or key prefixes. Restore keys provide a fallback when the primary key doesn't yield a hit. A restore key searches for keys by prefix and yields the most recently created cache entry. This is useful when the pipeline can't find an exact match but can still benefit from a partial cache hit. To specify multiple restore keys, put each one on its own line (see the example for more details). Restore keys are tried in order, from top to bottom.

Required software on self-hosted agent

Archive software / Platform	Windows	Linux	Mac
GNU Tar	Required	Required	No
BSD Tar	No	No	Required
7-Zip	Recommended	No	No

The above executables need to be in a folder listed in the PATH environment variable. Hosted agents come with this software included, so this requirement applies only to self-hosted agents.

Example:

Here's an example of how to use restore keys with Yarn:

variables:
  YARN_CACHE_FOLDER: $(Pipeline.Workspace)/.yarn

steps:
- task: Cache@2
  inputs:
    key: '"yarn" | "$(Agent.OS)" | yarn.lock'
    restoreKeys: |
       yarn | "$(Agent.OS)"
       yarn
    path: $(YARN_CACHE_FOLDER)
  displayName: Cache Yarn packages

- script: yarn --frozen-lockfile

In this example, the cache task first checks whether the key exists in the cache. If it doesn't, the task tries the first restore key, yarn | $(Agent.OS), and searches for all keys that either exactly match that key or have it as a prefix. A prefix hit can happen if there was a different yarn.lock hash segment. For example, if the key yarn | $(Agent.OS) | old-yarn.lock was in the cache, where old-yarn.lock yielded a different hash than yarn.lock, the restore key yields a partial hit. If there's a miss on the first restore key, the task then tries the next restore key, yarn, which matches any key that starts with yarn. For prefix hits, the most recently created cache entry is returned.

Note

A pipeline can have one or more caching task(s). There is no limit on the caching storage capacity, and jobs and tasks from the same pipeline can access and share the same cache.

Cache isolation and security

To ensure isolation between caches from different pipelines and different branches, every cache belongs to a logical container called a scope. Scopes provide a security boundary that ensures a job from one pipeline cannot access the caches from a different pipeline, and a job building a PR has read access to the caches for the PR's target branch (for the same pipeline), but cannot write (create) caches in the target branch's scope.

When a cache step is encountered during a run, the cache identified by the key is requested from the server. The server then looks for a cache with this key from the scopes visible to the job, and returns the cache (if available). On cache save (at the end of the job), a cache is written to the scope representing the pipeline and branch. See below for more details.

CI, manual, and scheduled runs

Scope	Read	Write
Source branch	Yes	Yes
`main` branch	Yes	No
`master` branch	Yes	No

Pull request runs

Scope	Read	Write
Source branch	Yes	No
Target branch	Yes	No
Intermediate branch (such as `refs/pull/1/merge`)	Yes	Yes
`main` branch	Yes	No
`master` branch	Yes	No

Pull request fork runs

Branch	Read	Write
Target branch	Yes	No
Intermediate branch (such as `refs/pull/1/merge`)	Yes	Yes
`main` branch	Yes	No
`master` branch	Yes	No

Tip

Because caches are already scoped to a project, pipeline, and branch, there is no need to include any project, pipeline, or branch identifiers in the cache key.

Conditioning on cache restoration

In some scenarios, the successful restoration of the cache should cause a different set of steps to be run. For example, a step that installs dependencies can be skipped if the cache was restored. This is possible using the cacheHitVar task input. Setting this input to the name of an environment variable causes the variable to be set to true when there's an exact cache hit, to inexact when the cache is restored from a restore key, and to false otherwise. This variable can then be referenced in a step condition or from within a script.

In the following example, the install-deps.sh step is skipped when the cache is restored:

steps:
- task: Cache@2
  inputs:
    key: mykey | mylockfile
    restoreKeys: mykey
    path: $(Pipeline.Workspace)/mycache
    cacheHitVar: CACHE_RESTORED

- script: install-deps.sh
  condition: ne(variables.CACHE_RESTORED, 'true')

- script: build.sh
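
The variable can also be read from within a script. The sketch below reuses the placeholder key, path, and CACHE_RESTORED names from the example above, and assumes a Bash-capable agent where the variable is available to the script as an environment variable of the same name.

```yaml
steps:
- task: Cache@2
  inputs:
    key: mykey | mylockfile
    restoreKeys: mykey
    path: $(Pipeline.Workspace)/mycache
    cacheHitVar: CACHE_RESTORED

- bash: |
    # Branch on the three values the Cache task can assign.
    case "$CACHE_RESTORED" in
      true)    echo "Exact cache hit" ;;
      inexact) echo "Restored from a restore key; contents may be out of date" ;;
      *)       echo "Cache miss" ;;
    esac
  displayName: Report cache status
```

Note that a condition such as ne(variables.CACHE_RESTORED, 'true') still runs the guarded step on an inexact hit, which is usually what you want when the restored contents may be stale.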

Bundler

For Ruby projects using Bundler, override the BUNDLE_PATH environment variable to set the path where Bundler looks for gems.

Example:

variables:
  BUNDLE_PATH: $(Pipeline.Workspace)/.bundle

steps:
- task: Cache@2
  displayName: Bundler caching
  inputs:
    key: 'gems | "$(Agent.OS)" | Gemfile.lock'
    path: $(BUNDLE_PATH)
    restoreKeys: | 
      gems | "$(Agent.OS)"
      gems
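
The example above only restores and saves the cache. A minimal follow-up step, assuming Bundler is already installed on the agent, might look like this:

```yaml
steps:
- script: bundle install
  displayName: Install gems
```

Because BUNDLE_PATH is defined as a pipeline variable, it is also visible to the script as an environment variable, so Bundler installs into the cached folder.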

Ccache (C/C++)

Ccache is a compiler cache for C/C++. To use Ccache in your pipeline make sure Ccache is installed, and optionally added to your PATH (see Ccache run modes). Set the CCACHE_DIR environment variable to a path under $(Pipeline.Workspace) and cache this directory.

Example:

variables:
  CCACHE_DIR: $(Pipeline.Workspace)/ccache

steps:
- bash: |
    sudo apt-get install ccache -y    
    echo "##vso[task.prependpath]/usr/lib/ccache"
  displayName: Install ccache and update PATH to use linked versions of gcc, cc, etc

- task: Cache@2
  displayName: Ccache caching
  inputs:
    key: 'ccache | "$(Agent.OS)" | $(Build.SourceVersion)'
    path: $(CCACHE_DIR)
    restoreKeys: | 
      ccache | "$(Agent.OS)"

See Ccache configuration settings for more details.
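
To check whether the restored cache is actually being used, a build step can print Ccache statistics. This is a sketch only: make stands in for whatever build command your project uses, and the step assumes the PATH change and CCACHE_DIR variable from the example above.

```yaml
steps:
- bash: |
    ccache --zero-stats   # reset counters so the numbers reflect this build only
    make                  # hypothetical build command; gcc/cc resolve to ccache via PATH
    ccache --show-stats   # print hit and miss counts
  displayName: Build with ccache
```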

Docker images

Caching Docker images can significantly reduce the time it takes to run your pipeline, provided that loading the saved image is faster than building it again.

variables:
  repository: 'mydockerimage'   # Docker repository names must be lowercase
  dockerfilePath: '$(Build.SourcesDirectory)/app/Dockerfile'
  tag: '$(Build.BuildId)'

pool:
  vmImage: 'ubuntu-latest'
steps:
  - task: Cache@2
    displayName: Cache task
    inputs:
      key: 'docker | "$(Agent.OS)" | cache'
      path: $(Pipeline.Workspace)/docker
      cacheHitVar: CACHE_RESTORED                #Variable to set to 'true' when the cache is restored
    
  - script: |
      docker load -i $(Pipeline.Workspace)/docker/cache.tar
    displayName: Docker restore
    condition: and(not(canceled()), eq(variables.CACHE_RESTORED, 'true'))

  - task: Docker@2
    displayName: 'Build Docker'
    inputs:
      command: 'build'
      repository: '$(repository)'
      dockerfile: '$(dockerfilePath)'
      tags: |
        '$(tag)'

  - script: |
      mkdir -p $(Pipeline.Workspace)/docker
      docker save -o $(Pipeline.Workspace)/docker/cache.tar $(repository):$(tag)
    displayName: Docker save
    condition: and(not(canceled()), not(failed()), ne(variables.CACHE_RESTORED, 'true'))

key: (required) - a unique identifier for the cache.
path: (required) - path of the folder or file that you want to cache.

Golang

For Golang projects, you can specify the packages to be downloaded in the go.mod file. If the GOCACHE variable isn't already set, set it to the folder where you want the cache to be stored.

Example:

variables:
  GO_CACHE_DIR: $(Pipeline.Workspace)/.cache/go-build/

steps:
- task: Cache@2
  inputs:
    key: 'go | "$(Agent.OS)" | go.mod'
    restoreKeys: | 
      go | "$(Agent.OS)"
    path: $(GO_CACHE_DIR)
  displayName: Cache GO packages
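
The example above defines GO_CACHE_DIR but doesn't show a build step. One way to connect the two, assuming Go is installed on the agent and the project builds with go build, is to pass the folder to Go through GOCACHE:

```yaml
steps:
- script: go build ./...
  displayName: Build
  env:
    GOCACHE: $(GO_CACHE_DIR)   # point Go's build cache at the cached folder
```

GOCACHE controls Go's build cache. Downloaded modules are stored separately (see GOMODCACHE), so caching them would need its own path.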

Gradle

Using Gradle's built-in caching support can have a significant impact on build time. To enable the build cache, set the GRADLE_USER_HOME environment variable to a path under $(Pipeline.Workspace) and either run your build with --build-cache or add org.gradle.caching=true to your gradle.properties file.

Example:

variables:
  GRADLE_USER_HOME: $(Pipeline.Workspace)/.gradle

steps:
- task: Cache@2
  inputs:
    key: 'gradle | "$(Agent.OS)" | **/build.gradle.kts' # Swap build.gradle.kts for build.gradle when using Groovy
    restoreKeys: |
      gradle | "$(Agent.OS)"
      gradle
    path: $(GRADLE_USER_HOME)
  displayName: Configure gradle caching

- task: Gradle@2
  inputs:
    gradleWrapperFile: 'gradlew'
    tasks: 'build'
    options: '--build-cache'
  displayName: Build

- script: |   
    # stop the Gradle daemon to ensure no files are left open (impacting the save cache operation later)
    ./gradlew --stop    
  displayName: Gradlew stop

restoreKeys: The fallback keys if the primary key fails (Optional)

Note

Caches are immutable: once a cache with a particular key is created for a specific scope (branch), the cache cannot be updated. This means that if the key is a fixed value, all subsequent builds for the same branch will not be able to update the cache even if the cache's contents have changed. If you want to use a fixed key value, you must use the restoreKeys argument as a fallback option.
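
One possible way to apply this, similar to the Ccache example earlier in this article, is to keep a fixed prefix and add a segment that changes on every run. The mytool name and folder are placeholders, and using $(Build.BuildId) as the changing segment is an assumption for this sketch rather than a recommendation.

```yaml
steps:
- task: Cache@2
  inputs:
    # The last segment differs on every run, so a new cache is always saved.
    key: 'mytool | "$(Agent.OS)" | $(Build.BuildId)'
    # The fixed prefix restores the most recently created cache from an earlier run.
    restoreKeys: |
      mytool | "$(Agent.OS)"
    path: $(Pipeline.Workspace)/.mytool
  displayName: Cache mytool data
```

The trade-off is that every successful run uploads a new cache, so this only pays off when the upload is cheap compared with rebuilding the contents.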

Maven

Maven has a local repository where it stores downloads and built artifacts. To enable caching, set the maven.repo.local option to a path under $(Pipeline.Workspace) and cache this folder.

Example:

variables:
  MAVEN_CACHE_FOLDER: $(Pipeline.Workspace)/.m2/repository
  MAVEN_OPTS: '-Dmaven.repo.local=$(MAVEN_CACHE_FOLDER)'

steps:
- task: Cache@2
  inputs:
    key: 'maven | "$(Agent.OS)" | **/pom.xml'
    restoreKeys: |
      maven | "$(Agent.OS)"
      maven
    path: $(MAVEN_CACHE_FOLDER)
  displayName: Cache Maven local repo

- script: mvn install -B -e

If you're using a Maven task, make sure to also pass the MAVEN_OPTS variable because it gets overwritten otherwise:

- task: Maven@4
  inputs:
    mavenPomFile: 'pom.xml'
    mavenOptions: '-Xmx3072m $(MAVEN_OPTS)'

.NET/NuGet

If you use PackageReferences to manage NuGet dependencies directly within your project file and have a packages.lock.json file, you can enable caching by setting the NUGET_PACKAGES environment variable to a path under $(Pipeline.Workspace), as in the following example, and caching this directory. See Package reference in project files for more details on how to lock dependencies. If your solution has multiple packages.lock.json files, you can still use the following example without making any changes. The contents of all the packages.lock.json files are hashed, and if any of the files changes, a new cache key is generated.

Example:

variables:
  NUGET_PACKAGES: $(Pipeline.Workspace)/.nuget/packages

steps:
- task: Cache@2
  inputs:
    key: 'nuget | "$(Agent.OS)" | $(Build.SourcesDirectory)/**/packages.lock.json'
    restoreKeys: |
       nuget | "$(Agent.OS)"
       nuget
    path: $(NUGET_PACKAGES)
  displayName: Cache NuGet packages
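
A restore step can then follow the Cache task. This sketch assumes the .NET SDK is available on the agent and that the lock files are committed to the repository:

```yaml
steps:
- script: dotnet restore --locked-mode
  displayName: Restore NuGet packages
```

With --locked-mode, the restore fails if a packages.lock.json file is out of date, which keeps the restored packages consistent with the files hashed into the cache key.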

Node.js/npm

There are different ways to enable caching in a Node.js project, but the recommended way is to cache npm's shared cache directory. This directory is managed by npm and contains a cached version of all downloaded modules. During install, npm checks this directory first (by default) for modules that can reduce or eliminate network calls to the public npm registry or to a private registry.

Because the default path to npm's shared cache directory is not the same across all platforms, it's recommended to override the npm_config_cache environment variable to a path under $(Pipeline.Workspace). This also ensures the cache is accessible from both container and non-container jobs.

Example:

variables:
  npm_config_cache: $(Pipeline.Workspace)/.npm

steps:
- task: Cache@2
  inputs:
    key: 'npm | "$(Agent.OS)" | package-lock.json'
    restoreKeys: |
       npm | "$(Agent.OS)"
    path: $(npm_config_cache)
  displayName: Cache npm

- script: npm ci

If your project doesn't have a package-lock.json file, reference the package.json file in the cache key input instead.

Tip

Because npm ci deletes the node_modules folder to ensure that a consistent, repeatable set of modules is used, you should avoid caching node_modules when calling npm ci.

Node.js/Yarn

Like with npm, there are different ways to cache packages installed with Yarn. The recommended way is to cache Yarn's shared cache folder. This directory is managed by Yarn and contains a cached version of all downloaded packages. During install, Yarn checks this directory first (by default) for modules, which can reduce or eliminate network calls to public or private registries.

Example:

variables:
  YARN_CACHE_FOLDER: $(Pipeline.Workspace)/.yarn

steps:
- task: Cache@2
  inputs:
    key: 'yarn | "$(Agent.OS)" | yarn.lock'
    restoreKeys: |
       yarn | "$(Agent.OS)"
       yarn
    path: $(YARN_CACHE_FOLDER)
  displayName: Cache Yarn packages

- script: yarn --frozen-lockfile

Python/Anaconda

Set up your pipeline caching with Anaconda environments:

Example

variables:
  CONDA_CACHE_DIR: /usr/share/miniconda/envs

# Add conda to system path
steps:
- script: echo "##vso[task.prependpath]$CONDA/bin"
  displayName: Add conda to PATH

- bash: |
    sudo chown -R $(whoami):$(id -ng) $(CONDA_CACHE_DIR)
  displayName: Fix CONDA_CACHE_DIR directory permissions

- task: Cache@2
  displayName: Use cached Anaconda environment
  inputs:
    key: 'conda | "$(Agent.OS)" | environment.yml'
    restoreKeys: | 
      conda | "$(Agent.OS)"
      conda
    path: $(CONDA_CACHE_DIR)
    cacheHitVar: CONDA_CACHE_RESTORED

- script: conda env create --quiet --file environment.yml
  displayName: Create Anaconda environment
  condition: eq(variables.CONDA_CACHE_RESTORED, 'false')

Windows

- task: Cache@2
  displayName: Cache Anaconda
  inputs:
    key: 'conda | "$(Agent.OS)" | environment.yml'
    restoreKeys: | 
      conda | "$(Agent.OS)"
      conda
    path: $(CONDA)/envs
    cacheHitVar: CONDA_CACHE_RESTORED

- script: conda env create --quiet --file environment.yml
  displayName: Create environment
  condition: eq(variables.CONDA_CACHE_RESTORED, 'false')

PHP/Composer

For PHP projects using Composer, override the COMPOSER_CACHE_DIR environment variable used by Composer.

Example:

variables:
  COMPOSER_CACHE_DIR: $(Pipeline.Workspace)/.composer

steps:
- task: Cache@2
  inputs:
    key: 'composer | "$(Agent.OS)" | composer.lock'
    restoreKeys: |
      composer | "$(Agent.OS)"
      composer
    path: $(COMPOSER_CACHE_DIR)
  displayName: Cache composer

- script: composer install

Known issues and feedback

If you're experiencing issues setting up caching for your pipeline, check the list of open issues in the microsoft/azure-pipelines-tasks repo. If you don't see your issue listed, create a new one and provide the necessary information about your scenario.

Q&A

Q: Can I clear a cache?

A: Clearing a cache is currently not supported. However, you can add a string literal (such as version2) to your existing cache key to change the key in a way that avoids any hits on existing caches. For example, change the following cache key from this:

key: 'yarn | "$(Agent.OS)" | yarn.lock'

To this:

key: 'version2 | yarn | "$(Agent.OS)" | yarn.lock'

Q: When does a cache expire?

A: Caches expire after seven days of no activity.

Q: When does the cache get uploaded?

A: After the last step of the job has run, and assuming the job succeeded, a cache is created from your cache path and uploaded. See the example for more details.

Q: Is there a limit on the size of a cache?

A: There's no enforced limit on the size of individual caches or the total size of all caches in an organization.
