Which STL Container?

Included as part of the C++ Standard Template Library (STL) is a collection of generic containers. Each of these containers serves a different purpose and has different pros and cons. It is often difficult to decide which container to use and when to use it.

This article is a guided tour of the STL containers. It is not intended to teach you how to use each of these containers; rather, it will help you decide when and where to use each of them. It will also show you a few tips, tricks and snippets of information that are not normally documented elsewhere but may come in handy.

The target audience for this article is intermediate and above. I don’t discuss every concept in depth but, whenever a new and important concept is introduced, I will provide a link to a reference where you can read more if you need to.

This article discusses the following STL containers:

  • vector
  • deque
  • list
  • set
  • map
  • multiset
  • multimap
  • bitset

Vector

The vector is a sequence container that represents an abstraction of a dynamic one dimensional array. It manages an internal buffer which can automatically grow to accommodate the contents. The allocation strategy used to allocate the internal buffer is not defined by the C++ Standard and is dependent upon the allocator. Unless a user defined allocator is provided it will default to using a vendor specific allocator, which will provide a general allocation strategy for best case general use. Typically, the default allocator will just double the size of the internal buffer when the current capacity is reached; however, different compilers may use different strategies so you are advised to consult your compiler’s documentation.

The internal buffer of a vector is guaranteed to be binary compatible with a standard C array. This means it can safely be used with legacy code. There is no specific operator to provide a C array compatible pointer; rather, you take the address of the internal buffer as though you were taking the address of the first element in an array.

An example of accessing the internal buffer of a vector

vector<int> v(10, 0); // create a vector with 10 ints, each set to 0
int * p = &v[0]; // Get a pointer to the internal dynamic array

This buffer is guaranteed to be perfectly safe to use and 100% compatible with a C style array as long as no mutable methods are called on the instance the buffer belongs to. In much the same way that mutable methods on a vector may invalidate iterators so, too, can they invalidate the internal buffer. Why is this? Put simply, the internal buffer may need to be reallocated to allow for extra capacity. When this happens the vector creates a whole new, bigger buffer, copies all the data from the old buffer to the new one and then destroys the old buffer. It should be clear that if this happens your C style pointer will be pointing to invalid memory.
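The invalidation hazard described above can be sketched as follows (the helper name is purely illustrative); the rule of thumb is to re-acquire the pointer after any mutating call:

```cpp
#include <vector>

// A minimal sketch: a pointer into the buffer is only valid until the next
// mutating call, because push_back may reallocate the internal buffer.
int first_element_after_growth()
{
	std::vector<int> v(10, 0);
	int * p = &v[0];   // valid now
	v.push_back(42);   // may reallocate; p must now be treated as dangling
	p = &v[0];         // re-acquire the pointer: valid again
	return *p;         // the original first element, still 0
}
```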

A specific exception to the C array guarantee is vector<bool>, which the C++98 version of the C++ Standard specifies should be specialised such that each Boolean element occupies only one bit. There are other considerations with this specialisation too. For example, the subscript operator does not return a reference to a bool within the vector; it returns a proxy object, constructed when the operator is called, which represents the Boolean state of the bit it refers to.
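A small sketch of the proxy behaviour just described; note that you cannot bind a genuine bool reference to an element of vector<bool>:

```cpp
#include <vector>

// Sketch: vector<bool>'s subscript operator returns a proxy, not a bool&.
bool flip_first(std::vector<bool> & v)
{
	// bool & b = v[0];                 // would not compile: no real reference exists
	std::vector<bool>::reference r = v[0]; // the proxy type defined by the specialisation
	r = !r;                                // writes back through the proxy
	return v[0];
}
```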

The vector class is very efficient in terms of memory usage. Beyond the size of the allocated buffer it has only a very small overhead of additional members used to manage its internal state. Typically an empty vector will occupy between 16 and 24 bytes (depending upon how it’s implemented, which is not defined by the C++ Standard).

The size of the buffer may very well be greater than the size of the data held within the vector. This is because the capacity of the vector (the actual size of the internal buffer) will always be at least as big as the size (the amount of the capacity in use) but will usually be greater.

When a vector needs to grow it must allocate a new buffer and copy the contents of the old buffer into it before destroying the old one. This means that during mutable function calls the amount of memory consumed by the vector may, for a short period, be twice as much as the previous capacity.

The internal buffer of a vector may not shrink when the container is cleared. This can be a problem if you have released a large quantity of data and want to reclaim the memory being used. There is a simple trick, often called the swap trick, that can be used: swap the vector to be cleared with an empty temporary one. When you use the swap method on a vector (this does not, necessarily, apply when using std::swap) all that happens is that the pointers to the two buffers are exchanged, making a swap a very efficient process. When the temporary vector is destroyed, all the memory it now owns goes with it. Your original vector will be left with an empty buffer.
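The swap trick can be wrapped up in a small helper (the name shrink_to_empty is purely illustrative, not a standard function):

```cpp
#include <vector>

// Sketch of the swap trick: swap with an empty temporary; the temporary
// takes ownership of the old buffer and destroys it when it goes out of scope.
template <typename T>
void shrink_to_empty(std::vector<T> & v)
{
	std::vector<T>().swap(v);
}
```

After the call the vector is empty and, on typical implementations, its capacity has been released as well.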

The vector is most efficient when appending items (pushing back). If you think of the vector’s internal buffer as a stack of plates, with the top plate being its back, it’s easy to add new plates: you just put them on top. What if we want to insert a plate in the middle or at the bottom? Well, now it’s a little more complex. We have to lift up the plates above to make space to insert a new one. It’s the same with a vector. If we want to insert a new item, every item after it must move back to make space. The nearer the front you insert the longer it takes, as there are more items to move. Likewise, deleting anywhere other than the back of the vector is equally inefficient.

When appending to or removing from the back of a vector the time complexity is O(1). In other words, constant time as long as the buffer does not need to be reallocated.

If the number of elements being inserted into or removed from a vector is known beforehand the time complexity is O(N+M); otherwise the time complexity is O(NM), where N is the number of items being inserted or removed and M is the number of items that need to be moved.

In all cases of adding to a vector, the time complexities do not take into account the fact that the vector may also need to reallocate the internal buffer. This has a time complexity of O(N+M), where N is the original buffer size and M is the new buffer size. Where possible, you should use the reserve method on vector to pre-allocate the internal buffer and minimise the need to reallocate it. Although vector will grow dynamically, it works best when you know in advance how big the buffer needs to be.
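A small sketch of the reserve advice above (the function name is illustrative): a single reserve call up front means the loop of pushes never triggers a reallocation, because size never exceeds capacity.

```cpp
#include <vector>
#include <cstddef>

// Sketch: reserve once, then append without any further allocations.
std::size_t fill_without_reallocating(std::size_t n)
{
	std::vector<int> v;
	v.reserve(n);                             // one allocation; capacity >= n
	for (std::size_t i = 0; i < n; ++i)
		v.push_back(static_cast<int>(i)); // no reallocation occurs here
	return v.capacity();
}
```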

The vector is random access, meaning you can read any element of a vector in constant time, or O(1). This makes vector ideal for implementing constructs that require low latency reads.

Consider using a vector if:

  • you need to store data and have a rough idea, in advance, of the number of items
  • data can either be added all in one go or be appended to the existing data
  • you want to be able to access the contents in any order

Avoid using a vector if:

  • you need to do frequent inserts or deletes anywhere other than the back of the vector
  • you do not know, in advance, roughly how much data you plan to put in it

A simple example of using vector

#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>

int main()
{
	std::vector<int> c;
	c.reserve(10);

	for(int x = 0 ; x < 10 ; ++x)
	{
		c.push_back(x);
	}

	std::copy(c.begin(), c.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

Deque

The deque is a sequence container that represents an abstraction of a dynamic one dimensional array. It manages an internal buffer which can automatically grow to accommodate the contents. The default allocation strategy used to allocate the internal buffer is not defined by the C++ Standard. Unless a user defined allocator is provided it will default to using a vendor specific allocator, which will provide a general allocation strategy for best case general use.

The size of the buffer may possibly be greater than the size of the data held within the deque. This is because the capacity of the deque (the actual size of the internal buffer) will always be at least as big as the size (the amount of the capacity in use) but may be greater.

The actual internal implementation of the deque buffer is completely implementation dependent so we cannot make any assumptions about how costly it is to grow the buffer; however, it is reasonable to assume that the worst case is going to be linear, or more specifically O(N), where N represents the size of the new buffer.

Unlike vector, deque provides no mechanism to reserve a buffer. However, this isn’t actually such a big deal, since deque doesn’t have to comply with the contiguous memory requirement of vector (which vector needs in order to remain backwards compatible with C arrays). Because of this, compiler vendors are able to implement memory allocation strategies that are far more Operating System friendly. Or, to put it another way, deque does not suffer from the same reallocation bottleneck as vector.

The internal buffer of a deque may not shrink when the container is cleared. This can be a problem if you have released a large quantity of data and want to reclaim the memory being used. Like vector, the trick is to swap the deque with an empty temporary one and let the temporary, which now owns the old buffer, be destroyed.

The deque gets its name from a contraction of “double ended queue” and is most efficient when appending items to the front or back (vector is only efficient when appending to the back). If you think of the deque’s internal buffer as a line of cups it’s easy to add a new cup at either end. What if we want to insert a cup in the middle? Well, now it’s a little more complex. We have to move some of the cups to the left or the right to make space to insert a new one. It’s the same with a deque. If we want to insert a new item, every item before or after it must move to make space. Likewise, deleting anywhere other than the back or front of the deque is equally inefficient.

When appending to or removing from the back and front of a deque the time complexity is O(1). In other words, constant time as long as the buffer does not need to be reallocated.

If the number of elements being inserted into or removed from a deque is known beforehand the time complexity is O(N+M); otherwise the time complexity is O(NM), where N is the number of items being inserted or removed and M is the distance or number of items that need to be moved.

In all cases of adding to a deque, the time complexities do not take into account the fact that the deque may also need to reallocate the internal buffer.

The deque is random access, meaning you can read any value of a deque in constant time, or O(1). This makes deque ideal for implementing constructs that require low latency reads.

Consider using a deque if:

  • you need to store data that is likely to be appended to the existing data at either end
  • you want to be able to access the contents in any order

Avoid using a deque if:

  • you need to do frequent inserts or deletes anywhere other than the back or front of the deque
  • you need to have compatibility with C style arrays (use vector)

A simple example of using deque

#include <deque>
#include <iostream>
#include <iterator>
#include <algorithm>

int main()
{
	std::deque<int> c;

	for(int x = 0 ; x < 10 ; ++x)
	{
		if(0 == (x%2))
		{
			c.push_back(x);
		}
		else
		{
			c.push_front(x);
		}

	}

	std::copy(c.begin(), c.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

List

The list is a sequence container that represents an abstraction of a doubly linked list. The list consists of nodes, which are allocated on an as-needed basis. The memory footprint of an empty list is very small; however, the memory footprint of each node is actually quite large and can, for example, exceed the size of the data being stored in the node.

For example, if we store a 32 bit int in a node on a 32 bit platform the overhead for that node will probably be at least 64 bits (the next and prev pointers), twice the size of the data, making the node as a whole at least three times the size of the data! This means that list is, generally, a poor choice if you are looking to store small data items, especially if you plan to store lots of them. Put another way, if you were storing 1 GB of 32 bit int values in a list it would have a memory footprint of at least 3 GB.

Although the cost of a node, in terms of memory, is quite high the list makes up for this by virtue of the fact that memory is only allocated when a node is created and can be released when the node is destroyed. This means that it’s a good candidate for storing an unpredictable number of items, especially if the number of items is likely to fluctuate.

The downside is that unless the allocator used takes care to avoid it, rapid allocation and de-allocation of nodes in a list can lead to bad memory fragmentation.

Most STL implementations try to get around this by allocating memory in chunks and using these chunks to allocate the nodes. When a node is released the memory it used in the chunk becomes free. This is not a C++ Standard requirement so you should check your compiler’s documentation to see if this behaviour is supported.

Since a list is, essentially, a sequential chain of nodes, where each node points to the next and the next in turn points back, it follows that removing or inserting a node does not invalidate the rest. This means that it is perfectly safe to iterate through a list whilst it is being modified; only the iterator to the modified node becomes invalid, in the case of a node being removed.
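The erase-while-iterating pattern just described can be sketched as follows (the function name is illustrative); erase returns the next valid iterator, which is how we advance past the removed node:

```cpp
#include <list>

// Sketch: remove the odd values from a list while iterating over it.
// Only the iterator to each erased node is invalidated.
int remove_odd(std::list<int> & l)
{
	int removed = 0;
	for (std::list<int>::iterator it = l.begin(); it != l.end(); )
	{
		if (*it % 2 != 0)
		{
			it = l.erase(it); // it now refers to the node after the erased one
			++removed;
		}
		else
		{
			++it;
		}
	}
	return removed;
}
```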

It also follows that since a list is just a sequential chain of nodes, access to them is sequential and not direct, as it is for the items in a vector. In fact, access to a node within a list takes linear time, or more specifically it has a time complexity of O(N), where N is the distance of the node being accessed from the point we started at. This makes list unsuitable for certain algorithms, such as a binary search.

On the other hand, inserting, moving (even between lists of the same type) or removing a node requires nothing more than the modification of the next and previous pointers in the nodes affected, and so takes constant time, or more specifically it has a time complexity of O(1).
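The constant time move between lists mentioned above is exposed by the splice member; a minimal sketch (the helper name is illustrative):

```cpp
#include <list>

// Sketch: splice transfers nodes between two lists of the same type by
// relinking pointers; no elements are copied, and src is left empty.
void move_all_to_front(std::list<int> & dst, std::list<int> & src)
{
	dst.splice(dst.begin(), src);
}
```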

Consider using a list if:

  • you need to store many items but the amount cannot be predicted
  • you need to perform lots of inserts or deletes that are not at the start or end of the sequence of data

Avoid using a list if:

  • the size of your data items is small, especially if you need to store many of them
  • you need constant time random access to the items

A simple example of using list

#include <list>
#include <iostream>
#include <iterator>
#include <algorithm>

int main()
{
	std::list<int> c;

	for(int x = 0 ; x < 10 ; ++x)
	{
		if(0 == (x%2))
		{
			c.push_back(x);
		}
		else
		{
			c.push_front(x);
		}

	}

	std::copy(c.begin(), c.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

Set

The set is a sorted associative container that represents an abstraction of an ordered unique key. Although the exact implementation of the data structures that make up a set is not specifically defined in the C++ Standard, they are typically implemented as a Binary Search Tree (more specifically, a common implementation is a Red/Black tree).

Just like list, set is a node based container meaning iterators are not invalidated when items are added or removed with the exception of the iterator of the specific item being removed.

The keys in a set must be unique. Attempting to add the same key more than once has no effect: the existing item is left in place and insert reports the failure, with the end result being nothing more than wasted time.
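The failure report mentioned above comes back as the second member of the pair that insert returns; a small sketch (the wrapper name is illustrative):

```cpp
#include <set>

// Sketch: insert's returned pair tells us whether the key was actually
// added; for a duplicate the set is left unchanged and second is false.
bool insert_unique(std::set<int> & s, int key)
{
	return s.insert(key).second;
}
```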

Accessing, adding and removing items in a set can be done directly via the key value although, unlike vector and deque, the access does not take place in constant time. In fact the time taken is generally logarithmic, or more specifically it has a time complexity of O(log N), where N is the number of items in the set.
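Lookup by key uses the find member (or count, which returns 0 or 1 for a set); a minimal sketch of a membership test:

```cpp
#include <set>

// Sketch: logarithmic time membership test using find.
bool contains(std::set<int> const & s, int key)
{
	return s.find(key) != s.end();
}
```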

Just like list, one needs to consider the possible size overhead of the framework that makes up a node versus the size of the data being stored. If the size of the data is small, say the size of a 32 bit int on a 32 bit platform it is likely that the size of the node will be at last 3 times that of the original data item.

Since set is a sorted associative container based on a Binary Search Tree it follows that keys are stored in a predetermined order. By default the order of the keys is determined by the std::less comparison predicate. You can override this behaviour by providing your own comparison predicate as one of the template parameters.
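For example, supplying std::greater as the comparison predicate orders the keys in descending order (the function here is a contrived illustration):

```cpp
#include <set>
#include <functional>

// Sketch: with std::greater as the comparator, iteration starts at the
// largest key, so begin() points at the maximum.
int largest_of(int a, int b, int c)
{
	std::set<int, std::greater<int> > s;
	s.insert(a);
	s.insert(b);
	s.insert(c);
	return *s.begin();
}
```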

As an alternative to set you might want to consider Google’s sparse_hash_set and dense_hash_set. Both of these are unsorted associative containers that are implemented using hash tables rather than Binary Search Trees. They have several advantages over set, the main ones being constant time lookup (best case; the worst case can be linear!) and a much lower cost in terms of data to framework overhead for each data item stored. Unlike set, the data is not sorted, making a hash table a poor choice if you need to access data in sorted order.

Consider using a set if:

  • you need to store data items in a sorted order and you need this sorting to be enforced in real time
  • you need to filter out duplicates from a collection and wish to retain that list for further use
  • you cannot use a 3rd party hash set
  • the nature of your data means it doesn’t hash well, resulting in many collisions

Avoid using a set if:

  • the size of your data items is very small especially if you need to store lots of them
  • you cannot afford to accept the fact that access is not done in constant time
  • you can get away with using a hash set

A simple example of using set

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cstdlib>
#include <ctime>

int main()
{
	std::set<int> c;

	srand((int)time(0));

	for(int x = 0 ; x < 10 ; ++x)
	{
		c.insert(rand());
	}

	std::copy(c.begin(), c.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

Map

The map is a sorted associative container that represents an abstraction of an ordered unique key, with a related value. Although the exact implementation of the data structures that make up a map is not specifically defined in the C++ Standard, they are typically implemented as a Binary Search Tree (more specifically, a common implementation is a Red/Black tree).

Just like list, map is a node based container meaning iterators are not invalidated when items are added or removed with the exception of the iterator of the specific item being removed.

The keys in a map must be unique (although the related values do not have the same restriction). Attempting to insert the same key more than once is ignored, just as with set; the subscript operator, on the other hand, will overwrite the value associated with an existing key.
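The difference between the two insertion routes can be sketched like this (the function name is illustrative):

```cpp
#include <map>
#include <string>

// Sketch: insert refuses to overwrite an existing key, while the
// subscript operator always writes the value.
std::string value_after_both(std::map<int, std::string> & m)
{
	m.insert(std::make_pair(1, std::string("first")));
	m.insert(std::make_pair(1, std::string("second"))); // ignored: key 1 exists
	std::string kept = m[1];                            // still "first"
	m[1] = "third";                                     // operator[] overwrites
	return kept + "/" + m[1];
}
```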

Accessing, adding and removing items in a map can be done directly via the key value although, unlike vector, the access does not take place in constant time. In fact the time taken is generally logarithmic, or more specifically it has a time complexity of O(log N), where N is the number of items in the map.

Just like list, one needs to consider the possible size overhead of the framework that makes up a node versus the size of the data being stored. If the size of the data is small, say a key the size of a 32 bit int and, likewise, the value, on a 32 bit platform it is likely that the size of the node will be at least 2 times that of the original data items.

Since map is a sorted associative container based on a Binary Search Tree it follows that keys are stored in a predetermined order. By default the order of the keys is determined by the std::less comparison predicate. You can override this behaviour by providing your own comparison predicate as one of the template parameters.

As an alternative to map you might want to consider Google’s sparse_hash_map and dense_hash_map. Both of these are unsorted associative containers that are implemented using hash tables rather than Binary Search Trees. They have several advantages over map, the main ones being constant time lookup (best case; the worst case can be linear!) and a much lower cost in terms of data to framework overhead for each data item stored. Unlike map, the data is not sorted, making a hash table a poor choice if you need to access data in sorted order.

Consider using a map if:

  • you need to store data items in a sorted order of the key and you need this sorting to be enforced in real time
  • you need to filter out duplicates from a list and wish to retain that list for further use
  • you cannot use a 3rd party hash map
  • the nature of your data means it doesn’t hash well, resulting in many collisions

Avoid using a map if:

  • the size of your data items is very small especially if you need to store lots of them
  • you cannot afford to accept the fact that access is not done in constant time
  • you can get away with using a hash map

A simple example of using map

#include <map>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cstdlib>
#include <ctime>

// Note: strictly speaking, adding new overloads to namespace std is not
// permitted; it is done here so that ostream_iterator can find this operator.
namespace std
{
	ostream & operator << (ostream & os, pair<int, int> const & val)
	{
		return (os << val.first << "=>" << val.second);
	}
}

int main()
{
	std::map<int, int> c;

	srand((int)time(0));

	for(int x = 0 ; x < 10 ; ++x)
	{
		c[rand()] = x;
	}

	std::copy(c.begin(), c.end(), std::ostream_iterator<std::pair<int, int> >(std::cout, "\n"));
}

Multiset

The multiset is a sorted associative container that represents an abstraction of an ordered non-unique key. The only difference between set and multiset is that a multiset is allowed to have duplicate keys.

A simple example of using multiset

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cstdlib>
#include <ctime>

int main()
{
	std::multiset<int> c;

	srand((int)time(0));

	for(int x = 0 ; x < 10 ; ++x)
	{
		c.insert(rand());
	}

	std::copy(c.begin(), c.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

Multimap

The multimap is a sorted associative container that represents an abstraction of an ordered non-unique key. The only difference between map and multimap is that a multimap is allowed to have duplicate keys.

A simple example of using multimap

#include <map>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cstdlib>
#include <ctime>

// Note: strictly speaking, adding new overloads to namespace std is not
// permitted; it is done here so that ostream_iterator can find this operator.
namespace std
{
	ostream & operator << (ostream & os, pair<int, int> const & val)
	{
		return (os << val.first << "=>" << val.second);
	}
}

int main()
{
	std::multimap<int, int> c;

	srand((int)time(0));

	for(int x = 0 ; x < 10 ; ++x)
	{
		c.insert(std::pair<int, int>(rand(), x));
	}

	std::copy(c.begin(), c.end(), std::ostream_iterator<std::pair<int, int> >(std::cout, "\n"));
}
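Because a multimap stores duplicate keys, retrieving every value for one key is done with equal_range, which returns the iterator pair spanning all matching entries; a small sketch (the function name is illustrative):

```cpp
#include <map>
#include <string>
#include <utility>

// Sketch: count how many entries a multimap holds for a given key by
// walking the range returned by equal_range.
int count_for_key(std::multimap<std::string, int> const & mm, std::string const & key)
{
	typedef std::multimap<std::string, int>::const_iterator iter;
	std::pair<iter, iter> range = mm.equal_range(key);
	int n = 0;
	for (iter it = range.first; it != range.second; ++it)
		++n;
	return n;
}
```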

Bitset

The bitset isn’t really a container since it doesn’t expose any iterators; however, I list it here for completeness. The bitset is basically a fixed size bitfield, where the size is defined at compile time as a template parameter. It is useful if you need a bitfield of a size that doesn’t match a standard intrinsic integer. It’s also useful because it can automatically convert a string of an appropriate format to a bitfield value and vice versa.

A simple example of using bitset

#include <bitset>
#include <iostream>

int main()
{
	std::bitset<8> c(0xA7);
	std::cout << c.to_string() << std::endl;
}
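The string conversion works in the other direction too: a bitset can be constructed from a string of '0' and '1' characters, with the leftmost character being the most significant bit. A small sketch (the function name is illustrative):

```cpp
#include <bitset>
#include <string>

// Sketch: round-trip from a binary string to an integer value via bitset.
// The constructor throws std::invalid_argument for characters other than 0/1.
unsigned long bits_from_string(std::string const & s)
{
	std::bitset<8> b(s);
	return b.to_ulong();
}
```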

Final word

That concludes our guided tour of the basic STL containers. Now that you have a good grounding in STL containers you might want to read An Introduction to STL Algorithms. This excellent article by “w00te” goes into some depth about STL algorithms and how they can be used with containers.

If you wish to find out more about STL containers please refer to the following excellent guide.
