2019-01-30

November 2016

Volume 31 Number 11

[Cutting Edge]

Code First and Database Initialization

By Dino Esposito | November 2016

Dino Esposito Though the term “DevOps” is relatively new and recently has grown to encompass many more activities—most notably automated testing and deployment—I believe the first example of an automatable developer operation is as old as the software itself. I’m referring to the specific ability to create and initialize the database during the application setup. Many software companies develop vertical systems that are then sold to a variety of customers and adapted to their needs.

The aspects that can be customized depend on the characteristics of the product and the business domain, but I dare say that at the very minimum any vertical software application needs to have a customer-specific database. Therefore, the database must be created with the tables and schemas required by the context and populated with ad hoc data.

Not all required tasks can always be automated and built into the product itself. Imagine, for example, the import of existing data. Whether the data to import resides in Excel files or legacy databases, odds are that an importer must be built in some way to process the data and load it into the new storage. However, if you employed Entity Framework 6.x Code First in the data access layer of the application, then at least the creation of the database schemas and tables is easy to automate and can transparently happen the first time the application is run.

In this column, I’ll summarize some features that have been available in Code First since the early days from the perspective of a multi-customer application. In particular, I’ll focus on how to create and fill a database and how to programmatically define its name and connection string.

Laying the Ground of Table Schemas

Let’s assume you have a brand-new Visual Studio project already linked to the Entity Framework 6.x NuGet package. The next step you might want to take is to create a data-access class library or at least a distinct folder in the current project to save all files that in some way concur with the working of the data-access functionality. According to Entity Framework rules, you need to have a DbContext class that represents the entry point in the application’s data management subsystem. Here’s an example of such an application-specific class:

public class RegionsContext : DbContext
{
  ...
}

To the application’s eyes, the DbContext-derived class is no more and no less than the database. The class is expected to expose a handful of DbSet<T> properties, one for each collection of entities being managed by the application. A DbSet<T> property is logically equivalent to a table in a physical database:

public class RegionsContext : DbContext
{
  public DbSet<Region> Regions { get; set; }
}

The net effect of the code snippet here is to enable the application to work with a relational database that contains a table named Regions. What about the schema of the table? The schema is determined by the public layout of the Region class. Code First offers both a range of attributes and fluent syntax to define a mapping between class properties and columns of the underlying table. In the same way, you can define indexes, primary keys, identity columns, default values, and whatever else the underlying database lets you configure in columns and tables. Here’s a minimal version of the Region class:

public class Region
{
  public Region()
  {
    Languages = "EN";
  }
  [Key]
  public string RegionCode { get; set; }
  public string RegionName { get; set; }
  public string Languages { get; set; }
  ...
}

As is, all records in the Regions table will have three nvarchar(MAX) columns (RegionCode, RegionName and Languages) and RegionCode will be set as the primary key column. In addition, any newly created instance of the Region class will have the value “EN” set to the Languages property. This is a way to ensure that “EN” is the default value for the column whenever a new record is added through the code. Note, though, that setting a value in the constructor of a Code-First solution as is done here doesn’t automatically add any default value binding in the underlying database configuration.

Naming the Database

In a Code-First solution, all connections to the database pass through the DbContext-derived class, which opens and closes connections appropriately while still publicly exposing the connection as a property for your code to take full control over open and close operations. What about the details of the connection string and, more important, how do you provide details in a parametric way?

When you create a DbContext-derived class, you must provide a constructor. Here’s a very common example:

public RegionsContext(string conn) : base(conn)
{
  ...
}

The DbContext class has a constructor that accepts the connection string as a parameter so the simplest thing to do is that you mirror the functionality of the underlying constructor through the constructor of your derived class. Intelligently, the DbContext class contains some logic to process the string being passed. Any string you pass that’s in the form name=XXX is assumed to indicate that the actual connection string can be found in the XXX entry within the connectionstrings section of the application’s configuration file (that is, web.config for a Web project). Otherwise, any string passed is assumed to be the name of the database to create. In this case, further details of the connection string, such as credentials and server location, are expected to be in the defaultConnectionFactory block of the entityFramework section in the configuration file. Note that whenever you add the Entity Framework package to a Visual Studio project, the configuration file is silently modified to support the entityFramework section. Figure 1 shows the related listing, amended a bit for clarity.

Figure 1 Sample Web.config File as Modified to Support Entity Framework Code First

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="entityFramework"
      type="System.Data.Entity.Internal.ConfigFile.EntityFrameworkSection,..." />
  </configSections>
    <startup>
      <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
    </startup>
  <connectionStrings>
    <!-- Your explicit connection strings -->
  </connectionStrings>
  <entityFramework>
    <defaultConnectionFactory type=
      "System.Data.Entity.Infrastructure.SqlConnectionFactory,
      EntityFramework">
    <parameters>
    <parameter value="Data Source=(local); Integrated Security=True;" />
    </parameters>
  </defaultConnectionFactory>
  <providers>
    <provider invariantName="System.Data.SqlClient"
      type="System.Data.Entity.SqlServer.SqlProviderServices, ..." />
    </providers>
  </entityFramework>
</configuration>

Most of the examples you find about Code First are based on a fixed and constant connection string either referenced from the configuration file or explicitly passed to the context class. The net effect is that Code First creates the database using the provided connection string the first time the application is run. Let’s delve a bit deeper into this aspect.

The DbContext class supports four initialization strategies, as listed in Figure 2.

Figure 2 Code-First Database Initialization Strategies

Strategy	Description
CreateDatabaseIfNotExists	Checks if the database exists and creates it if it can’t find one. If the database exists but has an incompatible schema then it throws an exception. Note: This is the default initializer.
DropCreateDatabaseIfModelChanges	Creates the database if it doesn’t exist. If the database exists but has an incompatible schema then it drops the existing database and creates a new one.
DropCreateDatabaseAlways	Every time you run the application it drops an existing database and recreates it.
Custom initializer	Custom initializer class you write to work out just the behavior you wish to have that none of the other options offer. Note: You must use this option to add some master content to the database.

According to the default behavior CreateDatabaseIfNotExists, every time the context class is created, it checks whether the referenced database exists and is reachable. If not, it creates it. If the database exists and is reachable but doesn’t have a schema compatible with the public layout of entity classes, then it throws an exception. To remove the exception, you have to edit the entity class or, more likely, edit the schema of the database either through the database programming interface or an Entity Framework migration script.

I find this option ideal for when the application reaches the production level. During the development stage, though, I prefer the DropCreateDatabaseIfModelChanges option, which essentially shields you from any database maintenance chores: All you do is tweak the entity classes as appropriate and Entity Framework fixes the database the next time you hit F5 in Visual Studio. To activate the initialization strategy of your choice you add the following line to the constructor of the custom DbContext class:

Database.SetInitializer<YourDbContext>(
  new DropCreateDatabaseIfModelChanges<YourDbContext>());

Note that you can also set the database initializer in the configuration file, which might be a good idea if you plan to use different strategies in production and development.

In summary, Code First enables you to write an application that automatically creates all of its database tables the first time you run it. In other words, all you have to do is copy files and binaries and then launch it. This behavior, however, works at its best if the system is built for a single customer. When you have a multi-customer system, the best you can do is use a setup utility.

One of the reasons for taking a slightly different approach is that you might want to name the database differently; for example, you might want to add a customer-specific prefix to the name. Figure 3 shows the skeleton of such a command-line utility. The program takes the customer prefix from the command line, formats the database name as appropriate and then triggers the DbContext-derived class, which recreates the database and fills it with appropriate initial data.

Figure 3 Customer-Specific Name of the Database

class Program
{
  static void Main(string[] args)
  {
    var prefix = args[0];
                // boundary checks skipped
    var dbName = String.Format("yourdb_{0}", prefix);
    using (var db = new YourDbContext(dbName))
    {
      // Fill the database in some way
    }
  }
}

Initial Fill of the Database

Any system designed to accommodate the needs of multiple customers in the same business domain must have a number of tables dedicated to storing options and preferences that differ from customer to customer. This information must be provided in some way upon installation of the software. Realistically, part of the initial load of the database is shared by all installations, but part of it is customer-specific. The part that depends on customer’s data is commonly imported from external sources and requires an ad hoc routine, whether it’s a script of some kind or compiled code. Depending on the context, it might not even be out of place thinking of some dependency injection mechanism to generalize the structure of importers within the setup utility that initializes the database. As far as static database content is concerned, though, Code First has ad hoc services to offer.

Custom Initializers

To stuff data into the database during the initialization process, it’s required that you create a custom database initializer as described in Figure 2. A custom initializer is a class that inherits from one of the predefined initializers such as DropCreateDatabaseIfModelChanges. The only strict requirement for this class is overriding the Seed method:

public class YourDbInitializer : DropCreateDatabaseAlways<YourDbContext>
{
  protected override void Seed(YourDbContext context)
  {              
    ...
  }
}

In the implementation of the Seed method, you execute any code that populates the tables of the database using the provided DbContext to access it. That’s all of it.

If you’re planning a multi-customer application, defining a custom initializer is a good move because it gives you a single point to focus on to shape up the initial form of the database on a per-customer basis. The initializer is a plain C# class so it might be empowered using dependency injection tools to connect it to specific pieces of logic that can import data from just where they live.

Last but not least, database initializers can be disabled entirely so that the setup of the database remains a completely distinct operation—perhaps even managed by a different IT or DevOps team. To instruct the Code-First framework to ignore any initializers, you need to follow code in the constructor of the custom DbContext class:

Database.SetInitializer<YourDbContext>(null);

For example, this is a safe option to use when you release updates to existing systems. Disabling initializers ensures you never lose any of the existing data no matter what.

Wrapping Up

At the end of the day, Code First makes writing multi-tenant and multi-customer applications no harder than applications specifically written for a well-known configuration and client. All that’s required is a bit of knowledge about assignment of connection strings and the initialization process. In Entity Framework Core, the core principles remain unaltered even though the details of how it ultimately works are different. In particular, the new DbContext class features an OnConfiguring overridable method through which you connect the context to the database provider of choice and pass it credentials, and whatever else.

Dino Esposito is the author of “Microsoft .NET: Architecting Applications for the Enterprise” (Microsoft Press, 2014) and “Modern Web Applications with ASP.NET” (Microsoft Press, 2016). A technical evangelist for the .NET and Android platforms at JetBrains, and frequent speaker at industry events worldwide, Esposito shares his vision of software at software2cents@wordpress.com and on Twitter: @despos.

Thanks to the following Microsoft technical expert for reviewing this article: Andrea Saltarello (andrea.saltarello@manageddesigns.it)
Andrea Saltarello is an entrepreneur and software architect from Milan, Italy, who still loves writing code for real projects to get feedback about his design decisions. As a trainer and speaker, he has had several speaking engagements for courses and conferences across Europe, such as TechEd Europe, DevWeek and Software Architect. He has been a Microsoft MVP since 2003 and was recently been appointed a Microsoft Regional Director. He is passionate about music, and is devoted to Depeche Mode, with whom he has been in love ever since listening to “Everything Counts” for the first time.

Discuss this article in the MSDN Magazine forum

Share via