Getting Started

The following prerequisites are required to run the sample:

  • Administrative access to your HPC cluster’s head node
  • A valid Windows Azure account
  • A Windows Azure worker node template defined in your head node
  • Several Windows Azure worker nodes in the HPC cluster that are started and online

For more information, follow the Deploying Windows Azure Worker Nodes in Windows HPC Server 2008 R2 Step-by-Step Guide on TechNet.

Note:
The BLAST HPC parametric sweep application runs only one instance of the application on each Windows Azure node at a time, using only one core for the processing. To conserve Windows Azure CPU hours, it is advisable that you use only small nodes for this sample. If you wish to run this sample on medium or larger nodes, you can do so, but the application will only take advantage of a single core in each node.

One possible design to utilize all cores on larger instances is a file-based locking mechanism in which each core has its own copy of the database. When a new task is dispatched to a node, it first checks whether one of the database copies is available, claims it, and releases the claim when the task completes. This is left as an additional exercise for this lab.
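A minimal sketch of that locking scheme, assuming each node pre-stages several copies of the database and an exclusive-create lock file marks a copy as claimed (the function names and file layout here are illustrative, not part of the sample):

```python
import os

def claim_database(db_root, copies):
    """Claim one of several local database copies by atomically
    creating a lock file; returns (copy_index, lock_path)."""
    for i in range(copies):
        lock_path = os.path.join(db_root, "db%02d.lock" % i)
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists,
            # so only one task can claim a given copy at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return i, lock_path
        except FileExistsError:
            continue  # this copy is already claimed by another task
    raise RuntimeError("all database copies are in use")

def release_database(lock_path):
    """Release a claim so the next task can reuse that copy."""
    os.remove(lock_path)
```

A task would call claim_database at start, run blastn against the claimed copy, and call release_database in a finally block so a failed task does not leave the copy locked forever.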

Task 1 - Inspecting the Blast Solution

In this task, you will inspect the Blast solution to see the various projects included in this sample.

  1. Open Microsoft Visual Studio 2010 from Start | All Programs | Microsoft Visual Studio 2010 | Microsoft Visual Studio 2010.
  2. Open the Blast.sln solution file located in the Blast\Source\Blast folder.
  3. Examine the projects tree in the Solution Explorer window. The solution tree includes several Windows Azure projects, some class libraries, and a console application, as shown in Figure 2:

    Figure 2

    The Blast projects tree

The following list describes the purpose of each project in the Blast solution:

  • Client
    • BlastClient. The project contains a Windows Presentation Foundation (WPF) application that submits the BLAST HPC job and monitors it. This project uses the HpcRestClient library to send job creation requests over HTTP to the HPC Job Scheduler’s web service using the REST interface. You can inspect the MainWindow.xaml.cs file in this project to see how the WPF application uses the HpcRestClient library to create the new parametric sweep job.
    • HpcRestClient. The project contains a library that supports the REST interface of the HPC Job Scheduler’s web service, which is a part of the HPC Pack web features. Using this library, clients can send REST requests over HTTP to the HPC Job Scheduler’s web service in order to create new parametric sweep jobs and get information about active and finished jobs. You can inspect the HpcClient.cs file in this project to see how the REST interface is used and which data structures the interface receives and returns.
  • HpcApplication
    • AzureBlobCopy. The project contains a command-line utility that enables users to upload files to and download files from Windows Azure blob storage.
    • AzureUtilities. The project contains a library for handling the uploading and downloading of blob files. AzureBlobCopy makes use of this library for that purpose. You can inspect the BlobUtilities.cs file in this project to see how the library uses the Microsoft.WindowsAzure.StorageClient assembly to communicate with blobs.
    • UploadToBOV. The project contains a command-line utility that uploads BLAST output files to the BLAST output visualization (BOV) website, gets the generated URL for the file, and writes it to Windows Azure table storage.
  • Shared
    • BlastTable. The project contains a library for handling the writing and reading of BLAST results from table storage. Both UploadToBOV and BlastClient make use of this library for that purpose. You can inspect the TableHelper.cs file in this project to see how the library uses the Microsoft.WindowsAzure.StorageClient assembly to communicate with tables.
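To give a flavor of what a client like HpcRestClient sends, the sketch below assembles an XML property list for a parametric sweep job request. This is a hedged illustration only: the real resource paths and XML schema are defined by the HPC Job Scheduler REST interface, and the element and property names used here are assumptions, not the actual schema.

```python
import xml.etree.ElementTree as ET

def build_sweep_job_xml(command_line, start, end, increment):
    """Assemble an illustrative XML body for a parametric sweep job
    request (element and property names are placeholders)."""
    root = ET.Element("ArrayOfProperty")
    props = [("Name", "BLAST sweep"),
             ("CommandLine", command_line),
             ("StartValue", str(start)),
             ("EndValue", str(end)),
             ("IncrementValue", str(increment))]
    for name, value in props:
        prop = ET.SubElement(root, "Property")
        ET.SubElement(prop, "Name").text = name
        ET.SubElement(prop, "Value").text = value
    return ET.tostring(root, encoding="unicode")
```

The client then POSTs such a body over HTTPS to the scheduler’s web service; inspect HpcClient.cs for the exact request shape the sample uses.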

Task 2 - Inspecting the Parametric Sweep Script Files

In this task, you will inspect the contents of the run.cmd and the verifydb.cmd files to see which commands run when the parametric sweep job executes in each Windows Azure compute node.

  1. Open the labs folder in Windows Explorer and navigate to the BLAST\Source\ncbi-blast-2.2.25+ folder.
  2. Locate the verifydb.cmd file and view its contents:

    VERIFYDB.CMD

    set root=%CCP_PACKAGE_ROOT%\ncbi-blast-2.2.25
    set dbdir=%CCP_PACKAGE_ROOT%\ncbidb

    if exist %dbdir%\*.* goto finish
    if not exist %dbdir% mkdir %dbdir%

    echo copying files

    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nhr
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nin
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nnd
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nni
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nog
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nsd
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nsi
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.00.nsq
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nhr
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nin
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nnd
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nni
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nog
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nsd
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nsi
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.01.nsq
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer ncbi -LocalDir %dbdir% -FileName est_human.nal

    :finish
  3. The verifydb.cmd file runs in each of the Windows Azure nodes, as part of a preparation task, before starting the parametric sweep task. The script performs the following commands:
    1. Verifies the existence of the NCBI human genome database in the Windows Azure node.
    2. If the database is not present, runs the AzureBlobCopy utility to download the entire database into the ncbidb folder.
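The verify-then-download flow of verifydb.cmd can be sketched in Python; here `download` stands in for an invocation of the AzureBlobCopy utility (a hypothetical callback, not part of the sample's API):

```python
import os

def ensure_database(db_dir, file_names, download):
    """Mirror verifydb.cmd: if the database directory already has
    content, skip the copy entirely; otherwise create it and fetch
    every file. Returns the number of files downloaded."""
    if os.path.isdir(db_dir) and os.listdir(db_dir):
        return 0  # database already present on this node
    os.makedirs(db_dir, exist_ok=True)
    for name in file_names:
        download(name, db_dir)  # e.g. shell out to AzureBlobCopy.exe
    return len(file_names)
```

Because the check runs in a preparation task, the 2.5 GB database is downloaded at most once per node, not once per sweep task.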
  4. Locate the run.cmd file and view its contents:

    RUN.CMD

    set inputFile=input_%1
    set outputFile=output%1.txt
    set BLASTDB=%CCP_PACKAGE_ROOT%\ncbidb
    set root=%CCP_PACKAGE_ROOT%\ncbi-blast-2.2.25
    set dbdir=%CCP_PACKAGE_ROOT%\ncbidb
    set inputdir=%CCP_WORKDIR%\%CCP_JOBID%\%CCP_TASKID%\input
    set outputdir=%CCP_WORKDIR%\%CCP_JOBID%\%CCP_TASKID%\output

    if not exist %inputdir% mkdir %inputdir%
    if not exist %outputdir% mkdir %outputdir%

    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Download -BlobContainer inputncbi -LocalDir %inputdir% -FileName %inputFile%
    %root%\bin\blastn.exe -db est_human -query %inputdir%\%inputFile% -out %outputdir%\%outputFile%
    %root%\AzureBlobCopy\AzureBlobCopy.exe -Action Upload -BlobContainer outputncbi -LocalDir %outputdir% -FileName %outputFile%
    %root%\UploadToBOV\UploadToBOV.exe %outputdir%\%outputFile%

    rem cleanup
    rmdir /S /Q %inputdir%
    rmdir /S /Q %outputdir%

  5. The run.cmd file executes the following commands:
    1. Runs the AzureBlobCopy utility to download the current nucleotide input file from the input blob container.
    2. Runs the blastn application, which performs the nucleotide matching with the NCBI human genome database.
    3. Runs the AzureBlobCopy utility to upload the resulting file to the output blob.
    4. Runs the UploadToBOV utility to upload the resulting file to the BOV website, and writes the URL returned by the website to Windows Azure table storage.
    5. Clears the input and output files to conserve space on the Windows Azure node.
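The mapping from the sweep index (run.cmd's %1 argument) to the input blob name and output file name follows directly from the script's first two set statements; a one-line Python equivalent:

```python
def sweep_file_names(index):
    """Reproduce run.cmd's naming: sweep index i reads the input
    blob "input_<i>" and writes the local file "output<i>.txt"."""
    return "input_%d" % index, "output%d.txt" % index
```

With 200 input files uploaded in Task 4, the parametric sweep runs indices 1 through 200, each task picking up its own input file by this naming convention.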

Task 3 - Preparing the Blast Solution for Deployment

In this task, you will make the necessary adjustments to the projects’ configuration files so you can deploy them to the Windows Azure nodes.

  1. In the Solution Explorer window, expand the Client folder, expand the BlastClient project, and open the app.config file. The appSettings section contains four keys: StorageAccountName, StorageKey, HpcWebServiceUrl, and NodesGroup.
    1. StorageAccountName and StorageKey. These application settings hold the name of the Windows Azure storage account and the account’s primary key. Change these settings to match your Windows Azure storage account name and storage account primary key, respectively.
    2. NodesGroup. This application setting holds the name of the Windows Azure node group. If you have a different group that contains Windows Azure worker nodes and would prefer to use it for this sample, change the value of this setting to the name of your group as it is specified in your HPC cluster’s node configuration.
    3. HpcWebServiceUrl. This application setting contains a URL that points to the location of the HPC Job Scheduler’s HTTP web service. For example, if your HPC Job Scheduler’s machine name is MyHeadNode, and the HPC cluster’s name is MyCluster, then the application setting will be:

      XML

      <add key="HpcWebServiceUrl" value="https://MyHeadNode/WindowsHpc/MyCluster"/>
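The URL follows the fixed pattern https://&lt;head node&gt;/WindowsHpc/&lt;cluster name&gt;; a small helper makes the construction explicit:

```python
def hpc_web_service_url(head_node, cluster_name):
    """Build the HpcWebServiceUrl setting value from the head node
    machine name and the HPC cluster name."""
    return "https://%s/WindowsHpc/%s" % (head_node, cluster_name)
```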
  2. In the Solution Explorer window, expand the HpcApplication folder, expand the AzureBlobCopy project, and open the App.config file. The appSettings section contains two keys: StorageAccountName and StorageKey. Change the values of these keys to match your Windows Azure storage account name and storage account primary key, respectively.
  3. In the HpcApplication folder, expand the UploadToBOV folder, and open the app.config file. The appSettings section contains two keys: StorageAccountName and StorageKey. Change the values of these keys to match your Windows Azure storage account name and storage account primary key, respectively.
  4. Save all the changed files and build the solution.

Task 4 - Uploading Input Files to Blob Storage

In this task, you will upload the input files required by the parametric sweep application to a blob in your Windows Azure storage account.

Uploading, downloading, and browsing files in blobs is an easy task if you install one of the blob storage browsing applications, such as CloudBerry Explorer for Azure Blob Storage, or the Azure Storage Explorer. The following steps are for the CloudBerry Explorer application; you can use the same techniques with Azure Storage Explorer, but the steps may differ.

  1. Download and install CloudBerry Explorer for Azure Blob Storage.
  2. Download and extract the human genome database files from the NCBI FTP server:

    1. Browse to the NCBI FTP server at ftp://ftp.ncbi.nih.gov/blast/db.
    2. Download the files est_human.00.tar.gz and est_human.01.tar.gz.
    3. Extract both archives to the same folder. After extracting both files, you should have 17 files, as shown in Figure 3:

    Figure 3

    Content of the extracted NCBI human genome database
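The count of 17 can be checked against the download list in verifydb.cmd: each archive expands to eight database volume files, and est_human.nal is the alias file that ties the two volumes together:

```python
# The eight per-volume extensions, taken from the file names that
# verifydb.cmd downloads for each of the two database volumes.
EXTENSIONS = ("nhr", "nin", "nnd", "nni", "nog", "nsd", "nsi", "nsq")

def expected_db_files():
    """List the 17 files extracted from est_human.00/.01.tar.gz."""
    names = ["est_human.%02d.%s" % (volume, ext)
             for volume in (0, 1) for ext in EXTENSIONS]
    names.append("est_human.nal")  # alias file spanning both volumes
    return names
```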

  3. Open CloudBerry Explorer for Azure Blob Storage from Start | All Programs | CloudBerryLab | CloudBerry Explorer for Azure Blob Storage | CloudBerry Explorer for Azure Blob Storage.
  4. Open the File menu and select Azure Blob Storage Accounts. The Account Registration dialog will appear, as shown in Figure 4:

    Figure 4

    The Account Registration dialog

  5. Click the Add button, and input the display name of the storage account, the storage account name, and the shared key (the primary access key) of the account. Use the same storage account settings you used in Task 3 for the AzureBlobCopy project.
  6. Click the Test Connection button and wait for the approval message. Close the approval message, click OK to add the storage account, and close the Azure Blob Storage Accounts dialog.
  7. You should now see your blob storage in the left pane of the application, and your machine (“My Computer”) in the right pane.
  8. Create a new container in the blob by clicking on the New Container button in the left pane, as shown in Figure 5:

    Figure 5

    Creating a new blob container

  9. In the Create New Container dialog, set the container name to inputncbi, select the Full public read access option from the Access control options as shown in Figure 6, and click OK.

    Figure 6

    The Create New Container dialog

  10. Locate the newly created container in the list of containers and double-click its name to view its contents (it should be empty for now).
  11. In the right pane, navigate to the Source\Input folder that is in the BLAST sample folder.
  12. Select all the files (200 files) from the Input folder and click the Copy button. Click Yes in the confirmation message that appears, and then wait for the copy procedure to complete.
  13. Repeat steps 8 and 9 to create another new container, this time naming it ncbi.
  14. In the left pane, click the Root folder in the address path to move to the root of the blob.
  15. Locate the ncbi container in the list of containers and double-click its name to see its content (it should be empty for now).
  16. In the right pane, navigate to the folder to which you extracted the database files in step 2 of this task.
  17. Select all the extracted database files (17 files) from the database folder and click the Copy button. Click Yes in the confirmation message that appears, and then wait for the copy procedure to complete.

    Note:
    The size of the database is about 2.5GB, so this operation may take some time, depending on your network bandwidth.

  18. After the upload completes, close the CloudBerry Explorer application.