GATK Resource Bundle

Note

Important Update May 2025: Dear Community, We’d like to inform you of an upcoming change regarding the Genomics open datasets currently available through Azure. After careful consideration, we decided to shift our focus to new initiatives that will better serve our community and align with our long-term goals. As such, access to the Genomics open datasets on Azure will be deprecated in the coming months. We understand these datasets were valuable for research, development, and learning, and we deeply appreciate the contributions and engagement from our community over time. Thank you for your understanding and support.

The GATK resource bundle is a collection of standard files for analyzing human resequencing data with the Genome Analysis Toolkit (GATK).

Note

Microsoft provides Azure Open Datasets on an “as is” basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Data source

This dataset is a mirror of the data store at https://gatk.broadinstitute.org/hc/articles/360035890811-Resource-bundle

Data volumes and update frequency

  1. datasetgatkbestpractices : 542 GB
  2. datasetgatklegacybundles : 61 GB
  3. datasetgatktestdata : 2 TB
  4. datasetpublicbroadref : 477 GB
  5. datasetbroadpublic : 3 TB

Datasets are updated during the first week of every month.

Storage location

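All five datasets are stored in the West US 2 and West Central US Azure regions; datasetpublicbroadref and datasetbroadpublic are also available in South Central US. Allocating compute resources in a region that holds a copy of the data is recommended, so that compute runs close to the data it reads.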
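One way to apply this recommendation is to select the endpoint by region in code. The sketch below hard-codes the container URLs listed under Data Access; the helper name `pick_endpoint`, the region-name normalization, and the fallback to the West US 2 endpoint are illustrative assumptions, not part of the dataset documentation.

```python
from typing import Dict

# Container URLs copied from the Data Access section, keyed by Azure's
# programmatic region names.
ENDPOINTS: Dict[str, Dict[str, str]] = {
    "datasetgatkbestpractices": {
        "westus2": "https://datasetgatkbestpractices.blob.core.windows.net/dataset",
        "westcentralus": "https://datasetgatkbestpractices-secondary.blob.core.windows.net/dataset",
    },
    "datasetgatklegacybundles": {
        "westus2": "https://datasetgatklegacybundles.blob.core.windows.net/dataset",
        "westcentralus": "https://datasetgatklegacybundles-secondary.blob.core.windows.net/dataset",
    },
    "datasetgatktestdata": {
        "westus2": "https://datasetgatktestdata.blob.core.windows.net/dataset",
        "westcentralus": "https://datasetgatktestdata-secondary.blob.core.windows.net/dataset",
    },
    "datasetpublicbroadref": {
        "westus2": "https://datasetpublicbroadref.blob.core.windows.net/dataset",
        "westcentralus": "https://datasetpublicbroadref-secondary.blob.core.windows.net/dataset",
        "southcentralus": "https://datasetpublicbroadrefsc.blob.core.windows.net/dataset",
    },
    "datasetbroadpublic": {
        "westus2": "https://datasetbroadpublic.blob.core.windows.net/dataset",
        "westcentralus": "https://datasetbroadpublic-secondary.blob.core.windows.net/dataset",
        "southcentralus": "https://datasetbroadpublicsc.blob.core.windows.net/dataset",
    },
}


def pick_endpoint(dataset: str, compute_region: str) -> str:
    """Return the container URL co-located with compute_region.

    Accepts display names ("West US 2") or programmatic names ("westus2").
    Falls back to the West US 2 endpoint when the dataset has no copy in
    that region (an assumption, not documented behavior).
    """
    regions = ENDPOINTS[dataset]  # raises KeyError for an unknown dataset name
    key = compute_region.lower().replace(" ", "")
    return regions.get(key, regions["westus2"])
```

For example, a VM in South Central US would read datasetbroadpublic from its `sc` endpoint, while a VM in a region with no copy would be pointed at West US 2.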

Data Access

  1. datasetgatkbestpractices

    West US 2: 'https://datasetgatkbestpractices.blob.core.windows.net/dataset'

    West Central US: 'https://datasetgatkbestpractices-secondary.blob.core.windows.net/dataset'

  2. datasetgatklegacybundles

    West US 2: 'https://datasetgatklegacybundles.blob.core.windows.net/dataset'

    West Central US: 'https://datasetgatklegacybundles-secondary.blob.core.windows.net/dataset'

  3. datasetgatktestdata

    West US 2: 'https://datasetgatktestdata.blob.core.windows.net/dataset'

    West Central US: 'https://datasetgatktestdata-secondary.blob.core.windows.net/dataset'

  4. datasetpublicbroadref

    West US 2: 'https://datasetpublicbroadref.blob.core.windows.net/dataset'

    West Central US: 'https://datasetpublicbroadref-secondary.blob.core.windows.net/dataset'

    South Central US: 'https://datasetpublicbroadrefsc.blob.core.windows.net/dataset'

  5. datasetbroadpublic

    West US 2: 'https://datasetbroadpublic.blob.core.windows.net/dataset'

    West Central US: 'https://datasetbroadpublic-secondary.blob.core.windows.net/dataset'

    South Central US: 'https://datasetbroadpublicsc.blob.core.windows.net/dataset'
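Each URL above is an Azure Blob Storage container. Assuming a container still allows anonymous read access (the May 2025 note says these datasets are being deprecated), the following standard-library-only sketch lists and downloads blobs through the Blob service REST API's List Blobs and Get Blob operations. The helper names and the pinned `x-ms-version` value are assumptions; the Azure SDK package `azure-storage-blob` is the more common client for the same operations.

```python
import shutil
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

# Assumption: any recent Blob service REST version works for anonymous reads.
API_VERSION = "2020-10-02"


def list_blobs_url(container_url: str, prefix: str = "", marker: Optional[str] = None) -> str:
    """Build a List Blobs request URL (restype=container&comp=list)."""
    params = {"restype": "container", "comp": "list"}
    if prefix:
        params["prefix"] = prefix  # only return blobs whose names start with prefix
    if marker:
        params["marker"] = marker  # continuation token from the previous page
    return container_url + "?" + urllib.parse.urlencode(params)


def parse_list_blobs(xml_bytes: bytes) -> Tuple[List[Tuple[str, Optional[int]]], Optional[str]]:
    """Extract (name, size in bytes) pairs and the NextMarker from one response page."""
    root = ET.fromstring(xml_bytes)
    blobs = []
    for blob in root.iter("Blob"):
        size = blob.findtext("Properties/Content-Length")
        blobs.append((blob.findtext("Name"), int(size) if size else None))
    # An empty or missing <NextMarker> means this was the last page.
    return blobs, root.findtext("NextMarker") or None


def iter_blobs(container_url: str, prefix: str = "") -> Iterator[Tuple[str, Optional[int]]]:
    """Yield every blob under prefix, following continuation markers across pages."""
    marker = None
    while True:
        req = urllib.request.Request(
            list_blobs_url(container_url, prefix, marker),
            headers={"x-ms-version": API_VERSION},
        )
        with urllib.request.urlopen(req) as resp:
            blobs, marker = parse_list_blobs(resp.read())
        yield from blobs
        if marker is None:
            break


def blob_url(container_url: str, blob_name: str) -> str:
    """Percent-encode the blob name but keep '/' as the virtual directory separator."""
    return container_url + "/" + urllib.parse.quote(blob_name, safe="/")


def download_blob(container_url: str, blob_name: str, dest_path: str) -> None:
    """Stream one blob to a local file without loading it fully into memory."""
    req = urllib.request.Request(blob_url(container_url, blob_name),
                                 headers={"x-ms-version": API_VERSION})
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, `iter_blobs("https://datasetpublicbroadref.blob.core.windows.net/dataset")` would walk the whole container. Passing a `prefix` narrows the listing, which matters for containers in the hundreds-of-gigabytes to terabytes range listed under Data volumes.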

Use Terms

Visit the GATK resource bundle official site.

Contact

Visit the GATK resource bundle official site.

Next steps

View the rest of the datasets in the Open Datasets catalog.