Large numbers and Azure Mobile Services

In any distributed system with different runtime environments, the way data is represented on each node, and transferred between nodes, can cause problems if the nodes don't agree. In Azure Mobile Services we often hit this issue - the server runtime runs on a JavaScript (or more precisely, node.js) engine, while the client can run on many different platforms (CLR managed code, Objective-C, Java, JavaScript, or any other client using the REST interface). With JavaScript - the mobile service runtime - there are two data types which usually cause problems: dates and numbers. Let's look at them in this post.

Date problems aren't unique to JavaScript - dealing with conversions between local time (what people would most like to see in their applications) and standard time (usually UTC, how we'd store data) has been a problem in many frameworks, including .NET, and many smart people have written about it. Besides the framework-specific issues with dates, the main consequence of Azure Mobile Services using JavaScript in the backend is that dates in JS are represented as the number of milliseconds since the Unix zero date (1970-01-01T00:00:00.000 UTC), so dates with sub-millisecond precision are truncated. That is rarely a big problem.
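To make the truncation concrete, here is a minimal JavaScript sketch. It assumes the client measures time in 100-nanosecond ticks (as the .NET DateTime type does), counted here from the Unix epoch for simplicity; the helper names and the sample value are made up for this illustration and are not part of any SDK.

```javascript
// Hypothetical helpers, not part of any SDK. Ticks are 100 ns units counted
// from the Unix epoch, held as BigInt so the tick count itself stays exact.
const TICKS_PER_MILLISECOND = 10000n;

// A JavaScript Date can only hold whole milliseconds.
function ticksToDate(ticksSinceEpoch) {
  // BigInt division discards the remainder, i.e. the sub-millisecond part.
  return new Date(Number(ticksSinceEpoch / TICKS_PER_MILLISECOND));
}

function dateToTicks(date) {
  return BigInt(date.getTime()) * TICKS_PER_MILLISECOND;
}

const original = 13656009780091234n;   // a whole number of ms plus 1234 ticks
const roundTripped = dateToTicks(ticksToDate(original));
console.log(original - roundTripped);  // 1234n (the ticks that were dropped)
```

The lost amount is always less than one millisecond, which is why this rarely matters in practice.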

Numbers, on the other hand, tend to cause problems in heterogeneous systems with JavaScript on one side and another language on the other. In JS, all numbers are represented as 64-bit (double precision) floating point values. In the managed world, that means every number would be represented as a Double. But in the managed world (and in other strongly typed languages), other numeric types exist and are often used (with good reason) in defining the data types used by the application. Integers (usually 32 and 64 bits, but also in other sizes), single and double precision floating point numbers and fixed-point (with fixed or arbitrary precision) numbers are represented by a large variety of types in different languages. That means there are many numbers which cannot be represented in JavaScript without loss of precision, so any time one of those numbers is sent from the client to the service (e.g., as part of an object being inserted into the database), its value will be different when it’s sent back to the client. For example, no odd number beyond 2^53 can be represented as a 64-bit floating point value.
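The 2^53 boundary can be checked directly in JavaScript. The sketch below uses BigInt to hold the exact integer and a hypothetical helper (the name is mine, not an SDK function) that asks whether the value survives being stored in a double.

```javascript
// Hypothetical helper: does an exact integer (given as a BigInt) come back
// unchanged after being stored in a 64-bit double, as the server would do?
function survivesDoubleRoundTrip(exactValue) {
  return BigInt(Number(exactValue)) === exactValue;
}

const twoTo53 = 2n ** 53n;                           // 9007199254740992
console.log(survivesDoubleRoundTrip(twoTo53));       // true
console.log(survivesDoubleRoundTrip(twoTo53 + 1n));  // false: odd and above 2^53
console.log(survivesDoubleRoundTrip(twoTo53 + 2n));  // true: even, still representable
console.log(Number(twoTo53 + 1n));                   // 9007199254740992
```

Above 2^53 the representable values are spaced 2 apart (and further apart as the magnitude grows), so some large integers still round-trip while their neighbours do not.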

So what do the client SDKs for Azure Mobile Services do when faced with numbers which can potentially be corrupted by going to the server? The answer depends on how the application is interacting with the SDK, or more specifically, on the data types which are being stored into / retrieved from the backend service. In many languages, there are two possible ways for an application to save data, so let’s look at how the SDK deals with numbers in each of them separately.

Typed mode

In the typed mode, we use “regular”, user-defined data types (e.g., User, Product, Order, TodoItem, etc.), and the SDK handles the serialization / deserialization of those types into / from the format used on the wire (JSON). The clients for managed platforms (Windows Store, Windows Phone, Desktop .NET 4.5) and Android both have this mode. JavaScript-based clients (where there really are no user-defined data types - and I’m not going into the argument of prototypes versus real object-orientation here) don’t have this mode (and it really doesn’t matter for this specific post, since there’s no difference in number representation between JavaScript on the client and on the server). The iOS client SDK doesn’t have it either, since there’s no widely-used, generic serialization mechanism to translate between Objective-C @interfaces and JSON.

In the typed mode, the SDK does a lot of data manipulation under the covers, so it was coded in a way that, if data loss were to happen, an exception is thrown to the user. The idea is that the developer is trusting the SDK with its data, and we don’t want to corrupt it without warning the user. Let’s take the code below.

    public sealed partial class MainPage : Page
    {
        public static MobileServiceClient MobileService = new MobileServiceClient(
            "https://YOUR_SERVICE_HERE.azure-mobile.net/",
            "YOUR_APPLICATION_KEY_HERE"
        );

        public MainPage()
        {
            this.InitializeComponent();
        }

        private async void btnStart_Click_1(object sender, RoutedEventArgs e)
        {
            try
            {
                var table = MobileService.GetTable<Test>();
                Test item = new Test { Str = "hello", Value = (1L << 53) + 1 };
                await table.InsertAsync(item);
                AddToDebug("Inserted: {0}", item);
            }
            catch (Exception ex)
            {
                this.AddToDebug("Error: {0}", ex);
            }
        }

        void AddToDebug(string text, params object[] args)
        {
            if (args != null && args.Length > 0) text = string.Format(text, args);
            this.txtDebug.Text = this.txtDebug.Text + text + Environment.NewLine;
        }
    }

    public class Test
    {
        public int Id { get; set; }
        public string Str { get; set; }
        public long Value { get; set; }

        public override string ToString()
        {
            return string.Format("Id={0},Str={1},Value={2}", Id, Str, Value);
        }
    }

When the btnStart_Click_1 handler is invoked, the code tries to insert a typed item (Test) with a long value which would be corrupted if the operation were to complete. Instead, the SDK throws the following exception:

     System.InvalidOperationException: The value 9007199254740993 for member Value is outside the valid range for numeric columns.

The validation requires integers to fall in the range [-2^53, +2^53]; numbers outside that range are rejected, and the exception is thrown.

Now, what if you really want to use numbers beyond the allowed range? There are a few possibilities. In the .NET-based SDKs, you can actually remove the validation, which is performed by a JSON.NET converter, using code similar to the snippet that follows. Notice that this will cause data corruption, but if precision can be sacrificed for a wider range of numbers, then that’s an option.

    // Remove the converter that rejects numbers outside [-2^53, +2^53].
    // After this, out-of-range values are sent as-is and may lose precision.
    var settings = MobileService.SerializerSettings;
    var mspcc = settings.Converters.Where(c => c is MobileServicePrecisionCheckConverter).FirstOrDefault();
    if (mspcc != null)
    {
        settings.Converters.Remove(mspcc);
    }

    var table = MobileService.GetTable<Test>();
    Test item = new Test { Str = "hello", Value = (1L << 53) + 1 };
    await table.InsertAsync(item);
    AddToDebug("Inserted: {0}", item);

Another alternative is to change the data type of the value. Double is represented exactly like numbers on the server, so every double value on the client can be transferred to the server and back unchanged. But doubles themselves lose precision as the numbers grow large.

Yet another alternative is to use strings instead of numbers. With strings you can actually have arbitrary precision, but you may lose the ability to do relational queries on the data (unless you use some sort of zero-left-padding to normalize the values), and they take up more storage on the server.
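The zero-left-padding idea can be sketched as follows. This is only an illustration under stated assumptions: the helper name is hypothetical, it handles non-negative integers only, and the default width of 20 digits is my choice (enough for any unsigned 64-bit value), not something the SDK prescribes.

```javascript
// Hypothetical helper: encode a non-negative integer as a fixed-width,
// zero-padded decimal string so that string order matches numeric order.
function toSortableString(value, width = 20) {
  const digits = BigInt(value).toString();
  if (digits.startsWith("-")) {
    throw new RangeError("negative values are not handled by this sketch");
  }
  if (digits.length > width) {
    throw new RangeError("value does not fit in " + width + " digits");
  }
  return digits.padStart(width, "0");
}

const small = toSortableString(9n);
const large = toSortableString(1234567890123456789n);  // "01234567890123456789"

console.log("9" < "1234567890123456789");  // false: unpadded strings sort wrongly
console.log(small < large);                // true: padded strings sort numerically
```

Every stored value has to use the same width, otherwise comparisons between rows fall back to the broken unpadded ordering.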

The main takeaway is that if you’re dealing with large numbers and user-defined types, there will be cases where those numbers cannot be represented on the server. The client SDK will try its best to warn the user (via exceptions) that data loss would occur, but there are alternatives if the application really requires large numbers to be stored.

Untyped mode

The second way for an application to exchange data with the service is via an “untyped” model, where instead of dealing with “user types”, the application works with simpler types (dictionaries, arrays, primitives) which map directly to JSON. The untyped model appears in different ways on different platforms; the examples in this section use JSON.NET's JObject on the managed client and an NSDictionary on iOS.

Unlike in the typed mode, where there is a step which converts the object into the JSON-like structure that is sent to the server, no such step is needed in the untyped mode. Therefore, we had to make a choice: validate that the numbers can be faithfully represented on the server and report an error otherwise (by throwing an exception or invoking the appropriate error callback), incurring the cost of the additional validation for a scenario which isn’t too common; or bypass the validation, and let the user (in the scenarios where it’s applicable) deal with the large numbers themselves. After some internal discussion, we made the second choice (I don’t think there’s really a right or wrong approach, just a decision that had to be made - but if you disagree, you can always send a suggestion on our UserVoice page and we can consider it for the updates to the client SDKs).

What that decision entails is that if you run the following code, you won’t get any error:

    JObject item = new JObject();
    item.Add("Str", "hello");
    item.Add("Value", 1234567890123456789L);
    var table = MobileService.GetTable("Test");
    var inserted = await table.InsertAsync(item);
    AddToDebug("Original: {0}", item);
    AddToDebug("Inserted: {0}", inserted);

Instead, the output shows that the value was silently changed:

    Original: {
      "Str": "hello",
      "Value": 1234567890123456789
    }
    Inserted: {
      "Str": "hello",
      "Value": 1234567890123456800,
      "id": 36
    }

Similarly, this Objective-C code shows the same result:

    - (IBAction)clickMe:(id)sender {
        MSTable *table = [client tableWithName:@"Test"];
        NSDictionary *item = @{@"Str" : @"Hello", @"Value" : @(1234567890123456789L)};
        [table insert:item completion:^(NSDictionary *inserted, NSError *error) {
            NSLog(@"Original: %@", item);
            NSLog(@"Inserted: %@", inserted);
        }];
    }

And the logs:

    2013-04-10 13:36:18.009 MyApp[9289:c07] Original: {
        Str = Hello;
        Value = 1234567890123456789;
    }
    2013-04-10 13:36:18.009 MyApp[9289:c07] Inserted: {
        Str = Hello;
        Value = 1234567890123456800;
        id = 58;
    }
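The value 1234567890123456800 that comes back in both cases is how JavaScript prints the double nearest to the original number. A small sketch of what presumably happens once the number reaches the JavaScript runtime (the variable names are only for illustration):

```javascript
// The digits start out as a string so that no precision is lost before
// the conversion we want to observe.
const sent = "1234567890123456789";
const stored = Number(sent);                 // nearest 64-bit double

console.log(BigInt(stored).toString());      // "1234567890123456768" (exact stored value)
console.log(String(stored));                 // "1234567890123456800" (what gets printed)
console.log(BigInt(stored) - BigInt(sent));  // -21n
```

At this magnitude doubles are 256 apart, so the original value is rounded to the closest multiple of 256, and the number is then printed with the fewest digits that still identify that double.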

Now, what if we actually want to enforce the limit checks on untyped operations? One simple alternative is to traverse the object, prior to making the CRUD call, to see if it contains a long value which cannot be represented on the server side. Another alternative is to add a message handler (on the managed clients) or a filter (on the other platforms) which looks at the outgoing JSON request and fails if it contains numbers which can cause trouble when sent to the server side. This is one simple implementation of the validation for the managed client:

    // Returns true if every integer in the token tree lies in [-2^53, +2^53],
    // a range in which a JavaScript number represents every integer exactly.
    bool WillRoundTripWithNoDataLoss(JToken item)
    {
        if (item == null) return true;
        switch (item.Type)
        {
            case JTokenType.Array:
                JArray ja = (JArray)item;
                return ja.All(jt => WillRoundTripWithNoDataLoss(jt));
            case JTokenType.Object:
                JObject jo = (JObject)item;
                return jo.PropertyValues().All(jt => WillRoundTripWithNoDataLoss(jt));
            case JTokenType.Boolean:
            case JTokenType.Float:
            case JTokenType.Null:
            case JTokenType.String:
                return true;
            case JTokenType.Integer:
                JValue jv = (JValue)item;
                long value = jv.Value<long>();
                const long maxAllowedValue = 1L << 53;          // 0x0020000000000000
                const long minAllowedValue = -maxAllowedValue;  // 0xFFE0000000000000
                return minAllowedValue <= value && value <= maxAllowedValue;
            default:
                throw new ArgumentException("Validation for type " + item.Type + " not yet implemented");
        }
    }
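For the second alternative (inspecting the outgoing JSON text), the check itself could look like the following JavaScript sketch. The function name is hypothetical, the input is assumed to be valid JSON, and how it gets wired into a handler or filter is deliberately left out, since that depends on the platform's SDK.

```javascript
// Hypothetical helper, not part of any SDK: list the integer literals in a
// JSON text that fall outside [-2^53, 2^53], the range used by the typed mode.
const LIMIT = 2n ** 53n;

function findUnsafeIntegers(jsonText) {
  // Match string literals (so digits inside them are skipped) or number literals.
  const token = /"(?:[^"\\]|\\.)*"|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/g;
  const unsafe = [];
  for (const [text] of jsonText.matchAll(token)) {
    if (!/^-?\d+$/.test(text)) continue;  // skip strings and non-integer numbers
    const value = BigInt(text);
    if (value < -LIMIT || value > LIMIT) unsafe.push(text);
  }
  return unsafe;
}

const body = '{"Str":"hello 99999999999999999999","Value":1234567890123456789,"id":36}';
console.log(findUnsafeIntegers(body));  // [ '1234567890123456789' ]
```

Working on the raw text matters here: once the body has been parsed with JSON.parse the digits are already rounded, so the check has to run before that happens.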

In summary, it’s possible that you’ll never need to worry about this “impedance mismatch” between the client and the server for large numbers, and all values will just work. But it’s always nice to go into a framework knowing some of its pitfalls, and this one is in my opinion one which could be hard to identify.