Automation Foibles Unveiled: Saving random data

Now, many of you probably know that I am a big fan of computer generated random test data that represents a reasonable sample of the total population of possible test data. (I refer to this as probabilistic stochastic test data.) So, why would I argue against preserving randomly generated test data?

I just returned from STAREast, where for the second time in a month I heard someone suggest storing randomly generated test data in a file. Many people cite the inability to recreate random test data as a drawback of using randomly generated test data in a test. So, these people suggested storing the random data in a file so they can easily repeat a test with the same data should some randomly generated test data expose an anomaly. I absolutely concur that if we generate random test data, and that test data exposes a problem, we need a way to recreate the data. But, isn't there a better way than saving random test data in a file?

Saving randomly generated test data to a file creates a test artifact. Depending on how much random data is generated, this file could become quite large. Also, saving data to a file impacts the performance of an automated test and certainly slows down manual execution of tests. Then consider that tests which generate random test data are executed numerous times throughout the lifecycle, and it doesn't take long until we have countless test artifacts simply storing more static test data that quickly loses its value (especially if no problems were detected). Of course, we can easily delete the files after the test if no anomaly was detected, but I suspect that most testers will not delete those files upon completion of the test, even when no problems were detected.

So, the question is how can we reproduce computer generated probabilistic stochastic test data if we don't save that randomly generated data to a file?

Planting Seeds

In computing, a seed is simply an integer value that is used by a random number generator as its starting value. If we pass a seed value as an argument to a given random number generator, then we will consistently get the same sequence of random values each and every time. Essentially, a seed allows us to replicate computer generated probabilistic stochastic test data anytime, as long as we use the same seed and the same random number generator algorithm. So, instead of saving each and every piece of randomly generated test data used in any given test, we can simply log the seed value used by that test in the test results log file.
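To make the idea concrete, here is a minimal sketch (not part of the original example code) showing that two Random objects created from the same seed produce the same sequence. The class name SeedDemo, the method FirstValues, and the seed 12345 are assumptions chosen purely for illustration.

```csharp
using System;

public static class SeedDemo
{
    // Returns the first `count` values (each from 0 through 99) produced by
    // a Random object that was created from the given seed.
    public static int[] FirstValues(int seed, int count)
    {
        Random generator = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = generator.Next(0, 100);
        }
        return values;
    }

    public static void Main()
    {
        // Arbitrary seed chosen for illustration only
        int seed = 12345;

        // Two independent runs that start from the same seed
        int[] firstRun = FirstValues(seed, 5);
        int[] secondRun = FirstValues(seed, 5);

        // Both lines list the same five numbers
        Console.WriteLine("Run 1: {0}", string.Join(", ", firstRun));
        Console.WriteLine("Run 2: {0}", string.Join(", ", secondRun));
    }
}
```

The two runs match only because they use the same seed with the same generator implementation; a different random number generator given the same seed would generally produce different values.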

But, if we use the same seed all the time, then we are simply generating the same data over and over again. And, manually inputting a seed for each test that generates probabilistic stochastic test data is not an ideal situation, especially for automated tests. So, to solve that problem we can randomly generate a seed value that is then passed to the random number generator algorithm! Again, logging the randomly generated seed allows us to accurately reproduce the probabilistic stochastic test data at any later time.
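As one possible way to log the seed, the sketch below appends a single line per test run to a results log. The class name SeedLog, the method RecordSeed, the line layout, the test name, and the log file location are all hypothetical; substitute whatever logging mechanism your test harness already provides.

```csharp
using System;
using System.IO;

public static class SeedLog
{
    // Appends one line containing a UTC timestamp, the test name, and the
    // seed to the log file, creating the file if it does not exist yet.
    public static void RecordSeed(string logPath, string testName, int seed)
    {
        string line = string.Format("{0:o}\t{1}\tseed={2}{3}",
            DateTime.UtcNow, testName, seed, Environment.NewLine);
        File.AppendAllText(logPath, line);
    }

    public static void Main()
    {
        // Randomly generate the seed for this run
        int seed = new Random().Next();

        // Hypothetical log location and test name, for illustration only
        string logPath = Path.Combine(Path.GetTempPath(), "test-results.log");
        RecordSeed(logPath, "RandomStringTest", seed);

        Console.WriteLine("Seed {0} recorded in {1}", seed, logPath);
    }
}
```

One short line per run is all that has to be kept, no matter how much random data the test generates from that seed.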

The example below illustrates a simple method in C# that will either generate a random seed or return a user specified seed value.

        public static int GetSeedValue(string seedValue)
        {
            // Check whether a user specified seed value was passed as an
            // argument to the seedValue parameter. Console.ReadLine can
            // return null, so treat null the same as empty input.
            if (string.IsNullOrWhiteSpace(seedValue))
            {
                // Create a new random object
                Random randomObject = new Random();
                // Generate a random non-negative integer value that is
                // less than 2,147,483,647
                return randomObject.Next();
            }
            else
            {
                // Convert the seedValue to an integer value
                // NOTE: This example method does not include exception handling
                return int.Parse(seedValue);
            }
        }
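The method above leaves out exception handling, so non-numeric input makes int.Parse throw. A hedged variant that reports bad input instead might look like the sketch below; the class name SeedParsing and the method name TryGetSeedValue are hypothetical and not part of the original example.

```csharp
using System;

public static class SeedParsing
{
    // Returns true and sets `seed` when the input is blank (a random seed
    // is generated) or a valid integer. Returns false, with `seed` set to
    // 0, when the input cannot be parsed as an integer.
    public static bool TryGetSeedValue(string seedValue, out int seed)
    {
        if (string.IsNullOrWhiteSpace(seedValue))
        {
            // No seed supplied, so generate one
            seed = new Random().Next();
            return true;
        }

        // int.TryParse does not throw on malformed or out-of-range input
        return int.TryParse(seedValue, out seed);
    }

    public static void Main()
    {
        Console.Write("Enter a seed (or press Enter for a random seed): ");
        int seed;
        if (TryGetSeedValue(Console.ReadLine(), out seed))
        {
            Console.WriteLine("The seed value for this test is {0}", seed);
        }
        else
        {
            Console.WriteLine("The seed must be a whole number.");
        }
    }
}
```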

The following example illustrates how to use this method to get a random seed value to generate random strings and numbers that increase the breadth of test data coverage in each subsequent iteration of a test.

        static void Main(string[] args)
        {
            // These variables declare the range of characters used for the
            // string test data. In this case the strings are composed of upper
            // case ASCII characters 'A' through 'Z'
            char minChar = '\u0041';
            char maxChar = '\u005A';
            
            // This reads the user specified seed value from the console window.
            // If no seed value is specified an empty string is passed to the
            // GetSeedValue method which will cause it to generate a random
            // seed value.
            string mySeed = Console.ReadLine();
            
            // Declare a seed variable and initialize it to either the user
            // specified seed or to a computer generated random seed value
            int seed = GetSeedValue(mySeed);

            // The seed value should be permanently recorded in the logged
            // results for this test
            Console.WriteLine("The seed value for this test is {0}\n", seed);

            // Create a new random object based on the seed
            Random randomGeneratorObject = new Random(seed);

            // Generate 10 random strings
            for (int count = 0; count < 10; count++)
            {
                // Declare and initialize a string variable for our test data
                string testString = string.Empty;
                // Generate a random string length between 1 and 10 characters.
                // The length is computed once, before the loop, so that it
                // is not re-rolled every time the loop condition is checked.
                int stringLength = randomGeneratorObject.Next(1, 11);
                for (int length = 0; length < stringLength; length++)
                {
                    // Generate a random character within the defined range and
                    // concatenate it to the testString variable until the
                    // random string length has been reached
                    testString += Convert.ToChar(randomGeneratorObject.Next(
                        minChar, maxChar + 1)).ToString();
                }
                // Write the test string to the console window
                Console.WriteLine("Test String {0}: {1}", count + 1, testString);
            }

            Console.WriteLine("\nRandom numbers");
            // Generate 5 random numbers
            for (int numberCount = 0; numberCount < 5; numberCount++)
            {    
                Console.WriteLine("{0} ", randomGeneratorObject.Next());
            }
        }

Running the program and entering an integer value between 0 and 2,147,483,647 at the console will generate 10 random length strings composed of random upper case characters between 'A' and 'Z' and 5 random numbers. If no seed is entered, the GetSeedValue method generates a random seed value for use in the test. Of course, entering the same integer value will produce the same strings and numbers each and every time.
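To show how a logged seed pays off when an anomaly appears, here is a rough sketch of a test loop that reports the first failing input and then replays the run from the same seed. Everything named here is hypothetical: the class ReplayDemo, the helper methods, and especially PassesCheck, which is only a stand-in for a real check against the system under test.

```csharp
using System;

public static class ReplayDemo
{
    // Builds one random string of 1 to 10 upper case characters 'A' - 'Z'.
    public static string NextTestString(Random generator)
    {
        int length = generator.Next(1, 11);
        char[] chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            chars[i] = (char)generator.Next('A', 'Z' + 1);
        }
        return new string(chars);
    }

    // Stand-in for a real check: pretend that any string starting with 'Q'
    // exposes an anomaly in the system under test.
    public static bool PassesCheck(string input)
    {
        return input[0] != 'Q';
    }

    // Runs up to `iterations` checks on data derived from `seed`. Returns
    // the index of the first failing iteration, or -1 if none failed.
    public static int RunTest(int seed, int iterations, out string failingInput)
    {
        Random generator = new Random(seed);
        for (int i = 0; i < iterations; i++)
        {
            string data = NextTestString(generator);
            if (!PassesCheck(data))
            {
                failingInput = data;
                return i;
            }
        }
        failingInput = null;
        return -1;
    }

    public static void Main()
    {
        int seed = new Random().Next();
        Console.WriteLine("The seed value for this test is {0}", seed);

        string failing;
        int index = RunTest(seed, 1000, out failing);
        if (index >= 0)
        {
            Console.WriteLine("Iteration {0} failed on \"{1}\"", index, failing);

            // Replaying with the logged seed reproduces the same failure
            string replayed;
            int replayIndex = RunTest(seed, 1000, out replayed);
            Console.WriteLine("Replay failed at iteration {0} on \"{1}\"",
                replayIndex, replayed);
        }
    }
}
```

Only the seed needs to be carried from the failing run to the replay; none of the generated strings are stored.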

Using probabilistic stochastic test data is valuable because it efficiently increases the breadth of data coverage, and significantly augments 'typical' static test data, user-generated test data, or static test data derived from historical failure indicators. But, instead of storing randomly generated test data in a file, it is a best practice to simply record the seed value of each test. With a seed value we can easily recreate the computer generated random test data should any of the random data used in a test expose an anomaly.