Category Archives: C# Collections

.NET Collections: Comparing Performance and Memory Usage

I develop several web sites with dictionary search engines where you can search for words in a lot of different ways. Because the searches are flexible and use regular expressions, normal indexes in database engines are almost useless.

The current solution is to load all objects into memory. The search data is fairly static and does not grow much, so this works quite well. The current solution finds the results, most of the time, within a few hundred milliseconds (0.001 to 0.5 seconds).
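At its core, the in-memory approach is a linear scan with a regular expression, which is something no ordinary database index can accelerate. A minimal sketch of the idea (the word list and the pattern are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class RegexScan
{
    static void Main()
    {
        // All searchable words are kept in memory; a real site would load these once at startup.
        var words = new List<string> { "stone", "store", "stove", "star", "strap" };

        // A compiled regex speeds up repeated scans over the same collection.
        var pattern = new Regex("^st.ne$", RegexOptions.Compiled);

        // Linear scan: an arbitrary pattern like this defeats normal database indexes.
        var hits = words.Where(w => pattern.IsMatch(w)).ToList();

        Console.WriteLine(string.Join(",", hits)); // stone
    }
}
```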

Recently I have been looking at the graph database Neo4j, but because of the kind of searches that are done on the pages, the results on test data were incredibly poor. Most queries took 2 to 9 seconds, and that was on test data that was only a small part of the real data, with only 5% of the relations generated. I might give it another try with real data when Neo4j 2.0 is released as stable, but I don’t think it will perform well for this special case.

So the next step is to do some memory indexing of my own. The first question was to find out which of the .NET collections is the fastest and which uses the least memory.

The tests were all done on my laptop with as few other programs running as possible. This should make the results fairly reliable, but I can’t guarantee that they are fully correct. Still, I trust them and will base my future development on them.

The time column shows the total time of a fixed number of lookups.
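The original benchmark code is not included in the post, but numbers like these can be reproduced along the following lines. This is a sketch, not the author's actual code: the item count is scaled down, and using GC.GetTotalMemory and Stopwatch is my assumption about the measurement method.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class CollectionBenchmark
{
    const int Items = 1_000_000;   // scaled down from 10 million for a quick run

    static void Main()
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);

        var dict = new Dictionary<int, string>(Items);
        for (int i = 0; i < Items; i++)
            dict[i] = "same string";          // one shared string: the size difference is the key index

        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"Size: {(after - before) / (1024.0 * 1024.0):F0} MB");

        // Time a fixed number of lookups, as in the tables below.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Items; i++)
        {
            string s = dict[i];
        }
        sw.Stop();
        Console.WriteLine($"Time: {sw.ElapsedMilliseconds} ms");
    }
}
```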

One string stored 10 million times with <int> Key

This test stores the same string in all the records in the collection. Because of this the difference in size should only reflect the key index.

Collection                     Size (MB)   Time (ms)
Dictionary<int, string>              267         235
SortedDictionary<int, string>        534        4484
Hashtable                            547        1851
SortedList<int, string>              114        2156

[Chart 1]

10 million unique strings stored with <int> Key

This test stores a different string in each record of the collections. This gives a more realistic picture of the memory usage.

Collection                     Size (MB)   Time (ms)
Dictionary<int, string>              706         368
SortedDictionary<int, string>        973        4590
Hashtable                            985        2286
SortedList<int, string>              553        2122

[Chart 2]

10 million unique strings stored with <string> Key

If I could use strings as keys, the collections would be useful in more ways than one. But as you can see, some of the lookups became more than 10 times slower than in the tests above.

Collection                        Size (MB)   Time (ms)
Dictionary<string, string>              706        1957
SortedDictionary<string, string>        973       66057
Hashtable                               757        2950
SortedList<string, string>              591       74534

[Chart 3]

1 million sub-collections with 10 strings each with <int> Key

This was another test I did to see if it would be useful to store the data in a tree, where each record in the collection holds another collection. This is a special case for my needs and I don’t expect anyone else to have any use for it. But I did the test, so here are the numbers.
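The tree layout described above could look something like this. The post does not say which inner collection type was used; a dictionary of dictionaries is one plausible shape, assumed here for illustration:

```csharp
using System;
using System.Collections.Generic;

class NestedLookup
{
    static void Main()
    {
        // Outer key selects a sub-collection; inner key selects the string.
        var tree = new Dictionary<int, Dictionary<int, string>>();

        for (int outer = 0; outer < 3; outer++)
        {
            var inner = new Dictionary<int, string>();
            for (int i = 0; i < 10; i++)       // 10 strings per sub-collection, as in the test
                inner[i] = $"value {outer}-{i}";
            tree[outer] = inner;
        }

        // A lookup costs two hash probes instead of one.
        Console.WriteLine(tree[1][4]); // value 1-4
    }
}
```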

Collection                     Size (MB)   Time (ms)
Dictionary<int, string>              431         273
SortedDictionary<int, string>        656         801
Hashtable                            733        1634
SortedList<int, string>              245         618

[Chart 4]

 

Conclusion

The collection with the best performance is Dictionary<>, and the collection with the most compact memory footprint is SortedList<>.

The most surprising result to me was how badly SortedDictionary<> performed.

References

In an earlier post I linked to this blog, which has a great post about the performance of the different collections. It has a nice list of all the different collections, but it says nothing about memory usage.

http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx

Comparing C# Collection Classes

While I was looking for a way to optimize flexible searches in my collections stored in memory, I came across this page. It has a great list with just the information I needed when deciding which collection class to use for my memory indexes. I have collections with up to 1 million objects in them, and I want (if possible) to retrieve subsets of them in less than 100 ms. And since each subset selection queries several properties at random, I’m trying to optimize it with memory indexes.

http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx

 

My first plan was to scan multiple (sorted) indexes at the same time, but that was not as fast as I thought it would be. Currently the fastest way is to compare the indexes pairwise to get the intersection. My first try with Dictionary<T, T> was very fast, but it used far too much memory. So the best approach right now is to use List<T> (with int IDs) and find the intersections of the indexes by looping over one and searching the other with BinarySearch().
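The pairwise intersection described above can be sketched like this. The index contents are made up for illustration; the real indexes hold object IDs, and both lists must already be sorted for BinarySearch() to work:

```csharp
using System;
using System.Collections.Generic;

class IndexIntersection
{
    // Loop over one index and binary-search the other; prefer the smaller list as 'small'.
    public static List<int> Intersect(List<int> small, List<int> large)
    {
        var result = new List<int>();
        foreach (int id in small)
            if (large.BinarySearch(id) >= 0)   // negative return means "not found"
                result.Add(id);
        return result;
    }

    static void Main()
    {
        var indexA = new List<int> { 1, 3, 5, 7, 9, 11 };   // sorted IDs matching property A
        var indexB = new List<int> { 2, 3, 5, 8, 11, 12 };  // sorted IDs matching property B

        Console.WriteLine(string.Join(",", Intersect(indexA, indexB))); // 3,5,11
    }
}
```

Looping over the smaller index keeps the cost at roughly O(n log m), which is why this beats scanning both sorted indexes in lockstep when one index is much smaller than the other.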

Here is some more information about BinarySearch():

http://www.dotnetperls.com/binarysearch