Nested ConcurrentDictionary, is it a good Idea?

Question

Wednesday, August 21, 2013 12:18 PM

Hi, 

I have an application that's written in .Net 3.5 that I want to rewrite to 4.5.

The old version has a nested dictionary ; Dictionary<int, Dictionary< string, SomeClass>>, with a bunch of locks to add thread safety.

I know .Net 4.0 introduced the ConcurrentDictionary which has some thread safe methods and a good performance for concurrent reads/writes.

If I would change the nested dictionary to a nested concurrentDictionary ;

ConcurrentDictionary<int, ConcurrentDictionary<string, SomeClass>> , can the outer dictionary lock the dictionary so the inner dictionary doesn't allow concurrent accesses anymore on the same index or does this never happen?

All replies (11)

Thursday, August 22, 2013 12:47 PM âś…Answered

I think I need some kind of 2 Dimensional dictionary where I can read and write concurrently to each element in a thread safe way, the elements can be obtained like TryGetValue.

OK, so #1: databases are hard.

If your keys are <int,string> and your values are <SomeClass> and you want to be able to do things like give me all SomeClass objects where the int key is 5, or give me all SomeClass objects where the string key is "foo", then you'll need a more complex indexing mechanism than a dictionary.

In database-land, these are indices.  And there is a cost associated with having each index because it must be maintained as records are added, deleted or updated.  The payoff of maintaining an index comes from the query.  If your use of the database involves many lookups for records based on the keys then you'll want to maintain an index for that key.  But if the operations are mostly just add and remove, then you can suspend maintenance of the index until the adds and removes are complete, and then consolidate the changes, or rebuild the index.

A two-dimensional dictionary would support partial queries in either dimension, implementing both of these interfaces:

  • IDictionary<int,IDictionary<string,SomeClass>>
  • IDictionary<string,IDictionary<int,SomeClass>>

And getting back to what I said before about maintenance of keys (again this is a reason not to embed the thread-safety strategy in the data-structure) if you know a priori what you're going to do with the data structure then you can devise a locking strategy that suits your needs.  If you know that you will basically build the indices ahead of time, and then they won't change while you do all your queries, then you can safely multiple readers during the query phase of your processing because you know that no modifications can take place.  And having multiple readers of a normal Dictionary is already thread safe, provided you don't modify the collection because it's a read-only operation.  Why take a lock at all, no matter how efficient, if you can know a priori that it will never be held?

It could be better to have one uber-ReaderWriterLock for the global read-write phase, rather than a lock per dictionary that must be used in all cases.

It's always better to know more about the future and choose a strategy based on that information.

Anyway, I feel like we've answered your original question, which is that the outer dictionary can't lock an inner dictionary.

To answer your next question, yes ConcurrentDictionaries have better performance.  ConcurrentDictionaries use a memory barrier technique to remain consistent during concurrent reads and writes.  Heavier locks are only taken when the memory barrier detects contention.  So it will out-perform a system that uses a normal Dictionary with a single external lock.  Prefer a ConcurrentDictionary to a Dictionary and single external lock that is only for that dictionary.

But it won't outperform a system that knows a priori when locks are necessary and when they aren't -- i.e.: a program that knows more about what's going on than just the discrete operations on the dictionary.


Wednesday, August 21, 2013 12:37 PM

Here's a thought:  ConcurrentDictionary<Tuple<int,string>,SomeClass>

Just like how there are states, and a state contains cities, then you can refer to a city by a <state,city> pair.

Same thing here.  You don't necessarily need a separate dictionary of strings for each int.  You can just have single a dictionary where the keys are <int,string> tuples.  Then you only need one dictionary and you don't have to worry about the thread safety, and atomicity of accessing nested dictionaries.

Just throwing it out there as an alternative approach.


Wednesday, August 21, 2013 12:38 PM

Hi,

There will be thread safe concurrent access.

so your dictionary will fully be locked by the outer dictionary hence there will be no thread affinity in the inner dictionary

so an outer concurrent dictionary with an inner normal inner dictionary will do the trick

Regadrs,

Stygen


Wednesday, August 21, 2013 12:51 PM

Thank you for your replies, 

Alright, so the outer dictionary locks the inner one.

However, this is not really what I need, it is enough that the inner dictionary is thread safe, but I still need to be able to add and remove items in a thread safe way, and all of it in a hight performance way. I'm working in an application with high throughput that needs very high performance.

The ConcurrentDictionary<Tuple<int,string>,SomeClass> seems like a good solution, but what if i want to get a list of the outer dictionary based on the integer index?

For example OuterDictionary.TryGetValue(3, out innerDictionary); -> this works with a dictionary, but how do you do this with tuples ? GetAllKeys, loop through them and check item1 == 3?

Best regards


Wednesday, August 21, 2013 1:24 PM

1. performance issues need to be tested, so why dont u create a branch of what u do with the new concurrent classes and test it with Stopwatch class?

2. if ur afraid to change classes u can always just use the ConcurrencyBag class that is just a wrapper

3. and if ur so into performance learn deeper TPL (http://msdn.microsoft.com/en-us/library/dd460717.aspx), there is alot, ALOT new stuff to learn. 

EDIT:

let me add that there is a big diff between .net 3.5- to 4+ since they change all the core so with "worse" stuff u still may get better perf


Wednesday, August 21, 2013 1:42 PM

Thanks, I kinda know that stuff ;) 

I have some throughput tests setup and I worked with this stuff before, but not the Concurrent Collections from .Net 4.0 so much.

I know older stuff sometimes is faster, but if the performance doesn't degrade too much by adding the stuff, I think it's ok to add it, because if you work in a team with several developers and some of them forget to lock collections sometimes, or add too many locks, you'll have a lot more potential bugs, and thus it'll be worth it and save you headaches further down the road.

However, if the performance degrades too much, then it's another story.

I've also read that just creating tons of tasks isn't good for performance either, and another mental model is needed (reference to Stephen Toub :P).

Regards


Wednesday, August 21, 2013 1:55 PM

"Alright, so the outer dictionary locks the inner one. "

No way. Those dictionaries are completely independent in this regard, one doesn't somehow "lock" the other. In general, .NET's Concurrent collections tend to use lock free algorithms so any mention of a "lock" in this context is kind of bogus. They're thread safe but any locking that they might do is an implementation detail.

"this is not really what I need, it is enough that the inner dictionary is thread safe, but I still need to be able to add and remove items in a thread safe way,"

That's very unclear, you still need to add/remove items in a thread safe way to which dictionary? If only the inner dictionary needs to be thread safe then why not use Dictionary<int, ConcurrentDictionary<...>>? Perhaps you can present a piece of your current Dictionary + bunch of locks code to clarify things.


Wednesday, August 21, 2013 2:07 PM

Hi,

There will be thread safe concurrent access.

so your dictionary will fully be locked by the outer dictionary hence there will be no thread affinity in the inner dictionary

so an outer concurrent dictionary with an inner normal inner dictionary will do the trick

Regadrs,

Stygen

I disagree.

As far as I know, the outer dictionary does not lock the inner one.

Acknowledge, first, that the following is not guaranteed to be true, even with a single thread-safe dictionary:

dictionary.ContainsKey( dictionary.Keys.First() );

Why? Because although two or more threads can safely obtain the first key, and two or more threads can safely query whether a dictionary contains a key, there's nothing to say that that key hasn't been removed between when you get it (dictionary.Keys.First()) and when you query to see if it exists (dictionary.ContainsKey( foo )).  Only the access to the Keys enumeration and the ContainsKey operations are thread-safe.  The little expression we wrote makes two distinctly separate calls to the dictionary.  First, it gets the Keys, then second it queries if the dictionary contains that key.  No lock is held the entire time.  The combined set of calls on a dictionary are not protected by a lock.  Only each individual call.

What I'm illustrating here is that a single expression can access the dictionary twice where there is no lock surrounding the two discrete accesses.

Now, to say that the inner dictionary is somehow protected from corruption by virtue of the outer dictionary being thread-safe is incorrect.

Consider two or more threads that execute this code:

outer[1].Add("hello", foo ); 

The threads can safely determine which inner dictionary is referred to by outer[1], even if the outer dictionary is being modified.  But when both threads attempt to perform Add("hello", foo) on that inner dictionary, they risk corrupting the data structure of the inner dictionary because it is not a thread-safe data structure.

In other words, the call to outer[1] is protected by a lock in that it will not read a dictionary that is being modified, but once the result is obtained (which is just a reference to the inner dictionary of Dictionary<string,SomeClass>) the second call is made, which is an Add operation.  Two threads both calling Add on the same dictionary run the risk of corrupting it.

Getting back to my first example (dict.ContainsKey(dict.Keys.First()), this is why I find "thread safe" data structure useless.  They are only safe from themselves.  You almost always need some kind of outer lock to maintain consistency of the data stored within.

A perfect example is: how can two threads add a key to the outer dictionary?  If each does:

outer.Add( 1, new Dictionary<string,SomeClass>() );

Then someone will get an ArgumentException: "An item with the same key has already been added."

So you might write the code:

if( !outer.ContainsKey( 1 ) )
{
   outer.Add( 1, Dictionary<string,SomeClass>() );
}

But this doesn't solve the problem either.  Because both threads can check and observe that the collection doesn't contain the key, then both threads can attempt to add an item with the same key.  You need some kind of a lock to guard the whole process of checking to see if it contains a key and adding the new one.  Having a ConcurrentDictionary doesn't help you.  True that we didn't corrupt the thread-safe dictionary itself, but our logic about what the thread-safe dictionary contains is wrong unless we have a lock of our own.

Fortunately the ConcurrentDictionary has a GetOrAdd method, but in this case, you have to create two dictionaries to use it.

Dictionary<string, SomeClass> inner = outer.GetOrAdd( 1, new Dictionary<string, SomeClass>() );

One thread will succeed, and the others will not add the key, which means they will also drop their newly created dictionary on the floor.  This is a bad performance overhead problem that could be avoided with an external lock and a lock-free outer dictionary.

You can see by the return type that the inner dictionary is not a ConcurrentDictionary, and any access we perform on it will run the risk of corrupting it.

Just because a data-structure is "thread safe" doesn't mean you ever get to just wash your hands of responsibility for using it correctly or efficiently.

[A personal note: In my industry, performance is paramount, so I would always suggest the non-thread safe data structures for efficiency of the discrete operations, with well-planned external locks where necessary to maintain integrity.  This almost always means using a senior developer though as the juniors just don't seem to get it, and we instead put the juniors to work to write code that isn't performance critical.  But I love turning juniors into seniors through education, and they seem to appreciate it too!]


Wednesday, August 21, 2013 2:35 PM

Hi Wyck,

True what you said about "dictionary.ContainsKey( dictionary.Keys.First())", however, I wouldn't use that, ConcurrentDictionary has the TryGetValue, TryRemove and AddOrUpdate functions that solve that problem.

"Dictionary<string, SomeClass> inner = outer.GetOrAdd(1, new Dictionary<string, SomeClass>());" 

I see your point, however, in this case, the dictionaries will not be added concurrently.

I just realized I should have explained a bit more;

The nested dictionaries are created on startup (non-concurrent), and then they are being accessed concurrently and only update the SomeClass instances concurrently.

Adding and removing of nested dictionaries may be possible in the future, but can be limited so that no 2 concurrent Adds/Removes can happen.

I read somewhere that the ConcurrentDictionaries have better performance for concurrent reads/writes on the elements, isn't this true?


Thursday, August 22, 2013 6:24 AM

Btw, the Tuple dictionary also has an issue ; you need to create a tuple each time you want to read, in order to get a key to access the dictionary, and afterwards it's garbage, ready to be collected by the GC, creating unnecessary GC.

I think I need some kind of 2 Dimensional dictionary where I can read and write concurrently to each element in a thread safe way, the elements can be obtained like TryGetValue.

Additions and removals can then be done on a single thread, so they don't add and remove items at the safe time -> no concurrency issues there. If a TryGetValue exists, then there should be no problem with a reader trying to access an element that has been removed.

Any suggestions?


Thursday, August 22, 2013 2:16 PM

Alright, ConcurrentDictionary question answered then.

I'll give some kind of metaphore to how the code works, I guess it would be similar to this ;

Dictionary<CompanyId (int), Dictionary<DepartmentName (string), DepartmentInfo>>.

This isn't the data that's stored in my structure, but it would work in a similar way.

The concurrent readers will want to get a DepartmentInfo directly, and they will always have both the CompanyId and DepartmentName accessable at the same time, without needing to loop through the inner dictionary.

From time to time, you'd want to add some Companies or Departments to a company, but this action can be restricted so that 2 additions don't run concurrently.

Likewise, you'd also want to eliminate Companies or Departments belonging to a company, 

as before, it can be restricted so that 2 removals don't run concurrently.

Removals and additions can be restricted so that they don't happen concurrently.

Preferrably, you'd want to be able to access  D[1]["HR"] and D[4][ "Marketting"] concurrently (readers ofc), without them locking each other. However, 2 concurrent accesses to  D[1]["HR"] should never take place.

A situation that needs to be considered is a removal at the same time as read can cause a problem if they access the same field (for example D[1]["HR"])

If you consider a ConcurrentDictionary as an 1 dimensional dictionary, then a 2 dimensional concurrentDictionary would be the way to go. But no such thing exist AFAIK.