Why EF Code First Sucks … and how to fix it if you must use it


NOTE: EF is a tool. As with all tools, it has pros and cons. The headline is designed to get you to examine your useage.

I am a contrarian in many cases. I look at what is in front of me and ask one of the following questions:

  • When would I use this thing I don’t like
  • When would I not use this thing I do like

By constantly questioning, I find myself being a good devil’s advocate. It also keeps me focused on solving problems the “right way”, which means adding in the variables as I find them and validating my knee jerk reaction over and over again. Sometimes this means not adopting the latest FUD, no matter how awesome it is, as it will cause too much disruption (read: too much risk).

As a developer, I think you should explore everything you find and constantly be curious. As an architect, and now a manager, I think you have to take in account how many moving parts you are introducing before making a decision to add something new. I also am very strong at examining pros and cons before striking out. This does not mean I don’t make very quick decisions at times, but that I strive to always do at least some research before firing up the engine and traveling away from home.

For me, Entity Framework makes things easy and much faster to develop. But, it also puts you in a box, like all software designed to develop (even Visual Studio puts you in a box, just larger).ย There are some projects that fit in the box, but some that do not. As I work on mission critical applications, in an Enterprise setting, I find the maintainability and acceptability of the solution to outweigh time to market in most cases. This makes EF less attractive to me, especially since most EF developers I have met understand the basics and not much more. When you build software any idiot can use, I can guarantee you some of the people using it are idiots. The easier it is, the more idiots.

In particular, this article is focused on issues I am having with a current project I am testing. The problems I have found are not unique, however, as I have seen them in other instances. In this case, there are a few additional issues that exacerbate the problem. And that takes us to “why this post?”

To put it plainly, if you are going to use EF Code First, don’t use the defaults. Let’s dive in.

The problem with EF Code First

Let me be fair. This is not really a problem with EF Code First as much as understanding how to use it correctly. The issue I see is so many people DON’T know how to use it correctly. In fact, it seems to be a one-size fits all tool for so many people. And a great number of them are ignorant about the pros and cons of the product and the cause and effect nature of the issues it creates when used in default. And, as mentioned in the last section, some are idiots. ๐Ÿ˜‰

Auto-generation is attractive. I utilize my skill in coding domain objects, push build and deploy and voila (wah-lah for those who hate French?). Now I have a site. If I just went with defaults, however, I have all of the following problems.

  • My database looks just like my business objects.
  • My interfaces to my data are CRUDdy (simply create, read, update and delete on tables that simply reconstitute my business objects)
  • My database is hard to optimize if looking just like my business objects is not the optimal way to solve it.
  • If I use GUIDs, which are great for certain scenarios (multiple live servers, creation of IDs on disconnected devices, etc.) the default clustered index can create issues down the line (mostly fragmentation)
  • If I use strings in my business objects, they are NVARCHAR(MAX) in my database
  • By default, I have no constraints (other than the auto-generated primary keys). This means no foreign keys in my database.
  • Filtering is done on the user interface.

Once again, I fully realize much of this is ignorance on how you SHOULD design your application with EF Code First, but with no warnings or guidance in Visual Studio to stop the developer from doing this, I am finding this type of bad EF Code First design is more the rule than the exception.

NOTE: I use the word business objects rather than domain models in this post, as I think the lack of domain focus exacerbates this problem. I will write about domain models in a future post and why domaincentric design (I use the made up domaincentric word to distinguish from a purist Domain Driven Design approach – see: Eric Evans).

Annotating with Attributes

I can cure most of the ills with attributes on my business objects. This includes putting keys of the correct type, adding max lengths and avoiding naked strings on objects. I will go through each of the issues I have discovered (not a complete list).

  • Clustered keys on GUIDs (uniqueidentifier data type in SQL Server)
  • NVARCHAR(MAX) fields
  • Foreign keys

Clustered Keys on Guids

When you use a clustered key, the items in a table are stored in the order of the key. With numeric keys, this means 1 comes before 2, etc. With a common key type being an IDENTITY field, or autoincrementing integer, this type of field works nicely. As each key is created, the new data is placed after previous data. You do not end up with table fragmentation very often, which makes maintenance of your database easier.

As long as you continue to add to the end of the table, the clustered key is very efficient. But, when you stick something in the middle of existing records, it causes some extra work, as it fragments the data table. Without getting too much into the internals of RDBMS, or SQL Server in particular, the underlying pages get fragmented. if the data completely fills a page, the fragmentation is cured by reorganizing pages. This is not trivial, but it is far less intrusive than when multiple rows exist in a single page and the internal data of the page has to be reorganized. Yes, there are things you can do to lessen the blow, like altering the fill percentage of the page, but there is a risk of fragmentation.

Why is fragmentation bad? It reduces the efficiency of the database, which can impact performance.

When you use a Globally Unique IDentified (GUID, which is represented as a uniqueidentifier in SQL Server), the GUIDs may be created sequentially, but this is not guaranteed, especially over time. This has to do with the algorithm to create GUIDs to “guarantee” they are unique (there is still a minuscule risk of creating 2 GUIDs that are alike, but it would be extremely rare – 2^128 or 1 in 340,282,366,920,938,000,000,000,000,000,000,000,000 – a very minuscule risk). When a new GUID is created, it can be sequential for quite some time. Then it is not. And over time, you end up with a lot out of sequence. And you start fragmenting. And performance suffers. And little children start crying … really loud.

Fortunately, this is easy to fix.

Version 1: Fragmentation city

[Key()]
public GUID Id { get;set;}

This is the same as the following in SQL Server.

tbd

Version 2: Non-fragmented bliss

Please read this entire section before attempting (Why? Because I am a smart a**). Let’s solve this problem.

[Key(NonClustered)]
public GUID Id { get;set;}

Bliss!

Yes, I a screwing with you. Entity Framework leaves NO WAY to add a non-clustered primary key in Code First with data annotations. So, you have to sling some code. This will require at least EF 6.2 (released a year ago, so you are on this if you are new to EF). In the OnModelCreating event, add the following:

modelbuilder.Entity().HasKey(k => k.Id, config => config.IsClustered(false));

I have also seen it done like this (very slight variation).

p.HasKey(k => k.ColumnId).ForSqlServerIsClustered(false);

The important take away is you dink with Is Clustered and setting it to false. Both create a migration that sets up a non-clustered index.

Cat’s and dogs sleeping together … and world peace.

If you are prior to EF 6.2, I feel really sorry for you. You can create non-clustered indexes, but not non-clustered primary keys. In that case, i would stop using Code First and cry. ๐Ÿ™‚

NVARCHAR(MAX)

Before getting into the problem, I want to rant a bit. NVARCHAR(MAX) was created to solve a problem that, for the most part, did not exist. The problem is this:

What if I have a field that normally contains a manageable amount of data, but occasionally contains the entire context of War and Peace?

I see this as a possibility, but a very rare one. In most cases, when you have a s**t ton of data in a field, that is the norm, and when you don’t, that is the norm. Usually you can either constrain or not. And, in cases where you have small amounts of data, it is more often from bad test plans on BLOB (Binary Large OBject) fields where the tester does not want to copy and paste and wait.

NOTE: I am willing to listen to cases where NVARCHAR(MAX) is a proper design choice, as I realize I do not have all of the answers. Just ping me and let’s argue over it. ๐Ÿ˜‰

Okay, I am being a bit too snarky. Let me regain my composure.

NVARCHAR(MAX) is designed to add some performance gains for the rare “I can either type in my name or enter War & Peace” scenarios. Anything small enough to fit into the SQL Server page is put in the page. Anything not, is placed in a type of BLOB storage. This means it can be much faster on records that fit in the page and only slow down on those that overflow.

Sounds Great, right? Yes, but it has some implications. You cannot index an NVARCHAR(MAX) field in SQL Server. I think this should be obvious for anyone who things about it, but let me describe the scenario and see if you can spot the problem.

  • You have a table with NVARCHAR(MAX)
  • There are 1,000,000 records in the table
  • You add the contents of War & Peace into 1/2 the records
  • You create an index on that field

Do you see the problem. Hint: How big is the index.

To solve the fact a person could create a multiple terrabyte index on a field, you cannot index using any BLOB field, even one, like NVARCHAR(MAX), that is pseudo-blob.

But, guess what EF does on strings? If you guessed “creates an NVARCHAR(MAX) field” give yourself a golden star.

Is this a problem? Probably not in most cases, but if you know you are creating a field like city, will you ever have a city that contains 1,000,000 characters? Not until we explore Zeekbart in the Omega quadrant (they are very verbose aliens) or start coding in Entish (LOTR’s creatures for the non-Tolkein fans).

Fortunately, you can solve this one very, very easily. Just add a data annotation.

[MaxLength(100)]
public string City { get; set; }

 

Wala Wala Washington. Now it will no longer create a MAX field and you can index city. And, yes, this is a common use case, although I would consider extracting the city information into it’s own table if you wish to use it as part of your queries (one primary reason for indexing this field is ad hoc searching on demographic fields, like “how many users in New York City had a problem”).

Foreign Keys

This is not an issue, as attributing for foreign keys is extremely easy to do. The problem is I find many people using EF Code first FAIL to do it. Your code is auto-generating your database folks. And constraints are a necessary element of ensuring the database catching bad data by throwing an exception when you try to put bad data in. Never rely on the brilliance of your developers to keep you company in business. Instead constrain the database so they can’t mess things up.

I ran into this problem (not EF, but lack of constraints) with a client in the early 2000s. I was asked to look at a problem with their data. When I logged on an examined the schema, I found no foreign keys. Asking the dev manager, he stated “the foreign keys were causing errors, so we removed them”. I said, “of course they were causing errors, because the code was set up to create data problems.” I got paid for identifying the problem, but they were unhappy I could not fix the missing data. Needless to say, they are out of business now. All because of a lack of foreign keys.

Fortunately, you can avoid this type of embarassment by adding … dun, dun, duh … foreign keys. Guess how? Yes, data annotations.

[ForeignKey(“KeyName”)] public int OtherTableId { get; set; }

I am not going to cover it in this post, but you can do composite (multiple field) keys, as well. I try not to use composite keys as a rule. Not because I am against the idea, per se, but because I find it harder on coders in many instances. If you start with the idea each table has a derived key as its primary key, it is very clean. You can then refactor the idea where a composite key would be better.

The Data Looks the Same

This is more of a design issue, but EF makes it very easy to simply pass objects back and forth between layers. If you adhere to principles of Domain Driven Design, you understand each domain is unique in how it treats models. Let’s take an eCommerce company as an example. In particular, let’s look at the customer object.

  • When an order is taken, a customer ID is added. For the sales department, when the order is retrieved, the customer information, including addresses, is needed to track down any issues.
  • For the warehouse, there is no need to have a customer object. You only need an order number with the items to pull.
  • For the shipping department, you need an address and phone number and nothing else.

I could go on, but if you use EF in default mode, you will be tempted to have a single domain and filter on the user interface.

This is what was done in an application I recently tested. Grab everything for all roles and then filter out what is not needed. But, since the application was not designed out enough, the following problems surfaced.

  • Some smaller tables were retrieved in toto and not filtered for the location or the user. Thus, if I added something, someone in another location saw it, even though they should not have.
  • As binding was not constrained, the tables had all of the information. If I grabbed a user object, I got all of the data if it was not filtered correctly.

Sure, the above have some other design issues that contribute to the problem, but EF in default forces you to filter after retrieval. When you do not, you are likely adding extra work, which may not be justified over other methods of data access (a topic for another day).

The bigger issue is the database storage looks just like the domain model. And, if you are using the EF defaults and auto-generation, you end up stuck in this paradigm. Data can be optimized with a few tools, like indexes, but once you figure it out, you need to make sure the changes are reflected in the EF model. In some cases, this is hard, and that which is hard is rarely tried (until someone threatens to fire you – then again, they threaten and you find a new job, so it is still not done).

In Summary

I am not a huge fan of Entity Framework in the Enterprise (important caveat?). Not because it does not work in many cases, but because it has a lot of moving parts and some that are missing. In most cases, you end up having to use custom migrations to get around the fact it is so easy to step outside of the box created for you. Once you start slinging code with every migration, you start asking whether going to scripted database changes is a better approach than data migrations in EF (I will argue that it IS in a future post, but your mileage may vary and I accept that each tool has its pros and cons).

A hill I will die on is “defaults are bad”, at least when it comes to Entity Framework in an Enterprise setting. I will also state you have to annotate your data if you are going to use your model to generate it. I would prefer don’t design your RELATIONAL database with OOP objects, but your mileage may vary.

Peace and Grace,
Greg

Twitter: @gbworld

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: