The New Generic Kid on the Block–HashSet<T>

0 comments

Posted on 25th January 2008 by r2 in Generics

In mathematics, a Set is typically thought of as a collection of distinct objects that is usually defined by some rule that determines whether they are a member of that particular Set.  For example, a Set could be defined to contain "all the odd numbers under 100" or "every number divisible by 2" or whatever.  The main points here being that the objects in a Set are distinct (i.e. NO duplicate objects are allowed) and the objects are not ordered or sorted in any way.  (If you really feel the need to geek out on Set Theory, be my guest).

According to the MSDN Documentation for HashSet, "a set is a collection that contains no duplicate elements, and whose elements are in no particular order."  Sound familiar?  In the .NET Base Class Library, there has never been a true Set class until now with the release of .NET 3.5 and the brand-spanking-new System.Collections.Generic.HashSet<T> class.

In the past, when we needed to implement a Set in .NET, we sort of bastardized the List<T> or other unsuspecting Collection classes into something like this:

  1. int[] intArray = new int[]{ 1, 2, 5, 7, 4, 1, 7, 9, 8};
  2. List<int> intList = new List<int>();
  3. foreach(int theInt in intArray)
  4. {
  5.   // checking to make sure that the object is not in our List before adding
  6.   //  so we are calling Contains() and then Add() on every time through the loop
  7.   if (!intList.Contains(theInt))
  8.     intList.Add(theInt);
  9. }

What we needed was a Set that would let us just keep adding objects to it and free us from having to worry about it being there first.  A recent example of when I really needed a true Set class was while working on the Weather WidgetThe Weather Channel provides some 40+ images for use in their SDK, yet in the Weather Widget I never need more than maximum of 11 (and usually less as there is usually duplication).  So, I had the idea to dynamically zip the images that I actually needed for a given Zip Code and use that file for my image assets in Silverlight via CreateXamlFromDownloader. Obviously, I don’t want these images duplicated in the zip file.  So, I came up with this:

  1. HashSet<string> assets = new HashSet<string>();
  2. foreach (ForecastDay day in forecast.Days)
  3. {
  4.   // no Exception here if duplicate…it just doesn't add it.
  5.   assets.Add(Server.MapPath(String.Format("~/images/{0}", day.IconDay)));
  6. }
  7. Utility.WriteZipFile(assets.ToArray<string>());

This just scratches the surface of what HashSet<T> is capable of.  There are member methods available for most Set operations, such as IntersectWith(), UnionWith(), IsSubsetOf(), IsSupersetOf(), RemoveWhere(), etc.   Here’s a link to all of the HashSet<T> Members.

Recall that being a member of a set depends on some rule that determines this membership.  With the HashSet<T>, you can define your own rule of what it means to be in a Set.  For a good example of defining your own EqualityComparer, see Introducing HashSet<T> from Kim Hamilton on the BCL Team Blog.

So, welcome the New Kid on the Block to the Generic Collections.

One last random link I had on the subject for those interested in LINQ:

Good side-by-side comparison of the HashSet and LINQ Set Operations

No comments yet.

Sorry, the comment form is closed at this time.