Azure Storage Bindings Part 1 – Blobs

The Azure WebJobs SDK provides model binding between C# BCL types and Azure storage like Blobs, Tables, and Queues.

The SDK has a JobHost object that reflects over the functions in your assembly, so your Main method looks like this:

        static void Main()
        {
            string acs = "DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???";
            JobHost host = new JobHost(acs); // From nuget: Microsoft.WindowsAzure.Jobs.Host
            host.RunAndBlock();
        }

The JobHost will reflect over your methods looking for attributes in the Microsoft.WindowsAzure.Jobs namespace and use those attributes to set up triggers (BlobInput, QueueInput) and do bindings. RunAndBlock() will scan for the various triggers and then invoke your function when a trigger fires. Model binding refers to how the JobHost binds your function's parameters (it's very similar to MVC/WebAPI).

The benefits of model binding:

  1. Convenience. You can pick the type that's most useful for you to consume and the WebJobs SDK will take care of the glue code. If you're doing string operations on a blob, you can bind directly to TextReader/TextWriter rather than writing 10 lines of ceremony to get a TextReader from a CloudBlob (see the sketch after this list).
  2. Flushing and Closing: The WebJobs SDK will automatically flush and close outstanding outputs.
  3. Unit testability. It’s far easier to unit test and mock BCL types like TextWriter than ICloudBlob.
  4. Diagnostics. Model binding cooperates with the dashboard to give you real-time diagnostics on your parameter usage. See the screenshot below.
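
To make the convenience point concrete, here's roughly what that glue code looks like without model binding. This is a sketch against the 2.x Storage SDK; the connection string, container, and blob names are placeholders.

        // Without model binding: the ceremony to read a blob as text.
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("container");
        CloudBlockBlob blob = container.GetBlockBlobReference("in/a.txt");
        using (Stream stream = blob.OpenRead())
        using (TextReader input = new StreamReader(stream))
        {
            string content = input.ReadToEnd();
            // ...
        }

With model binding, all of the above collapses into a single [BlobInput] TextReader parameter, as in the examples below.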

And if model binding is not sufficient for you, you can always bind to the Storage SDK types directly.

That said, here are the bindings that are currently supported in the Alpha release. 

Binding to BCL types: Stream, TextReader/Writer, String

You can use [BlobInput] and [BlobOutput] attributes to bind blobs to the BCL types Stream, TextReader/TextWriter, and string.

See the triggering rules for more details, but basically a function runs when a blob matching [BlobInput] is found that is newer than the blobs specified by [BlobOutput]. This means it's important for a [BlobInput] function to write some output (even if it's just a dummy file) so that the JobHost knows it has run and doesn't keep re-triggering it.
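
For instance, a minimal sketch of that pattern (the function name, the "processed" path, and the marker content here are hypothetical):

        public static void ProcessBlob(
            [BlobInput("container/in/{name}")] TextReader input,
            [BlobOutput("container/processed/{name}")] TextWriter marker
            )
        {
            string content = input.ReadToEnd();
            // ... do the real work with content ...

            // Write a dummy output so the JobHost records that this blob
            // has been handled and doesn't re-trigger the function.
            marker.Write("done");
        }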

Here’s an example of a blob copy function using each of those types:

        public static void CopyWithStream(
            [BlobInput("container/in/{name}")] Stream input,
            [BlobOutput("container/out1/{name}")] Stream output
            )
        {
            Debug.Assert(input.CanRead && !input.CanWrite);
            Debug.Assert(!output.CanRead && output.CanWrite);

            input.CopyTo(output);
        }

        public static void CopyWithText(
            [BlobInput("container/in/{name}")] TextReader input,
            [BlobOutput("container/out2/{name}")] TextWriter output
            )
        {
            string content = input.ReadToEnd();
            output.Write(content);
        }

        public static void CopyWithString(
            [BlobInput("container/in/{name}")] string input,
            [BlobOutput("container/out3/{name}")] out string output
            )
        {
            output = input;
        }

Some notes:

  1. It’s fine to have multiple functions read from the same input blob. In this case, all functions are reading from any blob that matches “in/{name}” in the container named “container”.
  2. The Streams / TextWriters are automatically flushed when the function returns. 

You can see more examples of blob usage on the sample site.

Diagnostics!

When you look at the function invocation in the dashboard, you can see usage stats for each parameter. In this case, we see that CopyWithStream() was invoked on blob “in/a.txt”, read 4 bytes from it (spending 0.076 seconds on IO), and wrote 4 bytes to blob “out1/a.txt”.

[Screenshot: dashboard view showing per-parameter usage stats for CopyWithStream]

Again, the monitoring here “just works” when using the SDK; you don’t need to include any extra logging packages or do any extra configuration work to enable it.

Binding to Blob Storage SDK types

You can also bind directly to CloudBlob (v1 Storage SDK) or to ICloudBlob, CloudPageBlob, or CloudBlockBlob (v2+ Storage SDK). These options are good when you need blob properties not exposed as a stream (such as etags, metadata, etc.).

        public static void UseStorageSdk(
            [BlobInput("container/in/{name}")] CloudBlob input,
            [BlobOutput("container/out4/{name}")] CloudBlob output
            )
        {
            // get non-stream properties
            input.FetchAttributes();
            var keys = input.Metadata.AllKeys;

            // do stuff...
        }

The storage types obviously give you full control, but they don’t cooperate with the dashboard and so won’t give you the same monitoring experience as the BCL types.

You can also bind a parameter to CloudStorageAccount, using either the 1.7.x or the 2.x+ Storage SDK type.

        public static void Func(CloudStorageAccount account, ...)
        {
            // Now use the Azure SDK directly
        }

The SDK reflects over the method parameters, so it knows whether ‘account’ is the 1.7.x type (Microsoft.WindowsAzure.CloudStorageAccount) or the 2.x+ type (Microsoft.WindowsAzure.Storage.CloudStorageAccount) and can bind to either one. This means that existing applications using the 1.7 SDK can start to incorporate the WebJobs SDK.
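
For example, a sketch of the two flavors side by side (the function names and blob paths here are hypothetical; each function simply declares the fully-qualified account type it wants):

        // Binds the 1.7.x account type.
        public static void FuncV1(
            [BlobInput("container/in/{name}")] Stream input,
            Microsoft.WindowsAzure.CloudStorageAccount account)
        {
            // use the 1.7.x client types against this account
        }

        // Binds the 2.x+ account type.
        public static void FuncV2(
            [BlobInput("container/in/{name}")] Stream input,
            Microsoft.WindowsAzure.Storage.CloudStorageAccount account)
        {
            // use the 2.x+ client types against this account
        }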