Contents
This article and the code that accompanies it present an intrusive
reference counting
smart pointer.
If you don't know what it means click on the links in the previous sentence. The word intrusive
signifies that the reference counting is done by the object itself rather than external code. If
you ever used Microsoft COM then this is exactly the kind of reference counting used there. If you
are familiar with Boost or TR1 shared_ptr this is exactly what they are not doing.
To avoid typing 'reference counting' or 'intrusive reference counting' every time I will use
RC and IRC abbreviations. Similarly ERC will stand for external reference counting as practiced
by shared_ptr and similar classes.
First you might not have a choice. A framework or library you are using might require IRC.
Microsoft COM does for example and so do some other important libraries. You can in principle use
a non-intrusive RC smart pointer with IRC object if it supports custom acquire/cleanup actions but
doing so is more complicated and introduces unnecessary overhead.
Which brings us to another point. IRC can and usually does result in better performance than ERC.
An ERC smart pointer has to store the count somewhere and it will be either in the pointer object itself
or some external location. In the first case you pay in pointer size while in the second you pay in
additional allocation and cache misses when accessing the count (which is in a different location than
both the pointer and pointee). To put it in a different perspective all other things being equal
under best possible circumstances an ERC can only be as efficient as IRC but never better.
Yet another advantage of IRC is that you can produce an IRC smart pointer from any raw pointer
to the object including this. This means that you can easily produce smart
pointers to itself
from any of the object's member functions. This is in general impossible with ERC without modifying
the object. But then if you already modify it why not make it IRC and avoid all the trouble.
Of course nothing comes for free. The main disadvantage of IRC is of course the intrusive
part. In order to use it you must design the target object class to support reference counting.
However, this can be addressed for most
object types by using templated reference
counted adapters as shown at the end of this
article. However, template adapters are not
perfect and in some cases ERC remains the only available option.
Another IRC disadvantage is that it is hard to support so-called weak pointers. Not that it is
impossible but doing it requires some framework for object registration which pretty much removes
all IRC advantages.
Having discovered shared_ptr some people tend to treat it as the best, indeed default, smart pointer.
This is, in my opinion very misguided. An ERC smart pointer like shared_ptr is just one possible choice
and should be used only when it fits the problem you are trying to solve. If you
need single owner semantics with ownership transfer use auto_ptr, if you need single owner with no transfer
use scoped_ptr. If you need RC, control
target object source and don't need weak pointers consider and IRC
smart pointer like refcnt_ptr presented here. Finally if you need RC but don't control the target or need
weak pointers, shared_ptr is, indeed, a good choice.
Well, because I couldn't find one that I liked. There are quite a few well-known IRC smart pointers
implementations but they all have some problems I couldn't live with. First of all most smart pointers come
packaged within a library. boost::intrusive_ptr requires Boost, Loki's smart pointer requires Loki etc.
This is fine if you are already using these nice libraries but can be a problem if you don't. In a commercial
software development world bringing in a new 3rd party library can be a tricky proposition (for very
good reasons) and people often cannot do it. What I needed is an independent smart pointer class that
can be copy/pasted into any code base. (Not that you cannot grab the code of Boost or Loki smart pointers - it is
just more complicated than it ought to be)
Even ignoring the problem above there are some fundamental flaws with pretty much every IRC smart pointer
implementation. To begin with all of them include a constructor of the form
smart_ptr(T * p);
This is very, very bad design. It asserts that there is a conversion from raw pointer to smart pointer. However,
with IRC there is always the question whether such conversion results in an increment of reference count. There
is no single good answer to it. In some situations you want the count to go up and
in some you don't. Pretending that there is one default choice is misleading.
To get around this problem some IRC smart pointers include another constructor of the form
smart_ptr(T * p, bool add_ref);
Which is practically an example of how not to use bool parameters. Is it so hard to introduce an enum
with easy to comprehend name so the reader doesn't have to guess what smart_ptr(p, false) means?
Some smart pointers from Microsoft and others go further and commit various other sins. They arent generic
enough and require the pointee to have functions with some exact signature prescribed by the smart pointer. For
example if your 'decrement reference count' function isn't named Release() you cannot use MS smart pointers.
They also may overload operator & (address of) to return the address of the stored pointer. This is done
with best intentions to support passing &smart_ptr<T> to a function expecting T ** but this hack
makes it impossible to store such smart pointer in containers or do anything else that might require taking its
real address.
To summarize, most IRC smart pointers out there suck and so I decided to create my own
The source code for refcnt_ptr can be downloaded from
here. It is under
MIT style license so you are free
to include it in commercial or open source projects as long as you comply with it.
If you have any questions or constructive criticism feel free to contact me. I will try to answer
all mail but please be patient, working on open source code doesn't pay my bills.
In order to use refcnt_ptr with your class you need to define two function that tell it how to increment and
decrement the reference count for this class. For a refcnt_ptr<T> you need to define
void refcnt_add_ref(T & unk);
void refcnt_sub_ref(T & unk);
Note the reference rather than pointer parameter. It is so because you will never get a NULL parameter
in this function so don't worry about it.
These functions can be in any namespace as long as a lookup for T & can find them.
Here is an example COM's IUnknown interface
inline
void refcnt_add_ref(IUnknown & unk)
{
unk.AddRef();
}
inline
void refcnt_sub_ref(IUnknown & unk)
{
unk.Release();
}
Basic usage
Having declared/defined these two functions now you can use refcnt_ptr<T> as follows
refcnt_ptr<T> pT1 = noref(new T); //create from raw but don't increment ref count
refcnt_ptr<T> pT2;
pT2 = noref(new T); //assign from raw but don't increment ref count
T * pRawT3 = ...;
refcnt_ptr<T> pT3 = ref(pRawT3); //create from raw and increment ref count
T * pRawT4 = ...;
refcnt_ptr<T> pT4;
pT4 = ref(pRawT4); //assign from raw and increment ref count
refcnt_ptr<T> pT5 = pT4; //normal copy
refcnt_ptr<T> pT6;
pT6 = pT5; //normal assignment
T * pRawT7 = pT6.c_ptr(); //get the raw pointer. note the name which is IMO way
//better than meaningless get()
T & t8 = *pT6; //usual dereference
pT6->foo(); //and arrow operator
pT6.reset(); //releases the reference and sets the stored pointer
//to NULL. note that there is no argument to this function
T * pRawT8 = pT5.release(); //sets the stored pointer to NULL and returns its
//previous value. reference count is unchanged
pT4.swap(pT3); //the usual swap operation
Comparisons
refcnt_ptr<T> can be natuarally compared with both raw pointers and other refcnt_ptr<T>
refcnt_ptr<T> pT1 = ..., pT2 = ...;
T * pRawT = ...;
if (pT1 == pT2) ....;
if (pT1 == pRawT) ....;
if (pRawT == pT1) ....;
//and the same for !=, <, <=, >, >=
Conversions
In general if you have two types T and Y such as Y * can be implicitly converted to T * then
refcnt_ptr<T> can accept Y * and refcnt_ptr<Y> in any of the constructors and assignment
operators. To use the examples from above
refcnt_ptr<T> pT1 = noref(new Y); //create from raw but don't increment ref count
refcnt_ptr<T> pT2;
pT2 = noref(new Y); //assign from raw but don't increment ref count
Y * pRawY3 = ...;
refcnt_ptr<T> pT3 = ref(pRawY3); //create from raw and increment ref count
Y * pRawY4 = ...;
refcnt_ptr<T> pT4;
pT4 = ref(pRawY4); //assign from raw and increment ref count
refcnt_ptr<Y> pY4 = ...;
refcnt_ptr<T> pT5 = pY4; //normal copy
refcnt_ptr<T> pT6;
pT6 = pY4; //normal assignment
refcnt_ptr<T> pT7 = ...;
refcnt_ptr<Y> pY7 = ...;
Y * pRawY7 = ...;
if (pT7 == pY7) ....;
if (pT7 == pRawY7) ....;
if (pRawY7 == pT7) ....;
//and the same for !=, <, <=, >, >=
Conditionals
refcnt_ptr<T> can be used in coditional expressions. This is done without
introducing any unintended conversions to an integers
refcnt_ptr<T> pT = ....;
if (pT) ...; //fine
while(!pT) ...; //fine
pT + 7 //won't compile
Explicit conversions
If you need to convert refcnt_ptr<Y> to refcnt_ptr<T> where Y * can only be converted
to T * using any of non-implicit C++ casts
(static, dynamic, const and reinterpret) you can of course go through raw pointers but this is tedious.
So instead you can use the following helpers to save you some time
refcnt_ptr<T> pT = ....;
refcnt_ptr<const T> pConstT = ....;
refcnt_ptr<Y> pY = ....;
pT = refcnt_const_cast<T>(pConstT);
pT = refcnt_static_cast<T>(pY);
pT = refcnt_dynamic_cast<T>(pY);
pT = refcnt_reinterpret_cast<T>(pY);
This is basically all you can do with refcnt_ptr. Simple isn't it?
Requirements for type T
- The expressions T * should form a valid data pointer
- If T * p then expressions refcnt_add_ref(*p) and refcnt_sub_ref(*p) must be valid function calls
- If T * p points to a valid object then this object must remain alive if refcnt_add_ref(*p) was called more times than
refcnt_sub_ref(*p)
Exception safety
- refcnt_sub_ref function supplied by user must not throw.
If it does using refcnt_ptr results in undefined behavior
- refcnt_add_ref function supplied by user may throw freely
- if refcnt_add_ref does not throw then all functions of refcnt_ptr do not throw either (nothrow ES guarantee)
- if refcnt_add_ref throws then constructors and assignment operators of refcnt_ptr provide
strong ES guarantee, while all other functions are still nothrow
Thread safety
- Simultaneous operations on distinct objects by multiple threads do not require synchronization and
do not introduce synchronization of their own
- With exception of copy and assignment from an object simultaneous read (const) operations on on the same
object by multiple threads do not require synchronization and do not introduce synchronization of their own
- Iff refcnt_add_ref and refcnt_sub_ref do not require external
synchronization with respect to themselves
and each other then simultaneous
copy and assignment from the same object by multiple threads do not require synchronization either. Whether
they introduce synchronization of their own depends on implementation of refcnt_add_ref and refcnt_sub_ref
- Otherwise simultaneous
copy and assignment from the same object by multiple threads require external synchronization
- Simultaneous write (non const) operations on the same object by multiple threads require external synchronizations.
Failure to synchronize them results in UB
In plain English if refcnt_add_ref and refcnt_sub_ref are thread safe then refcnt_ptr is thread safe
for reads. Otherwise be careful and read the exact rules above
Making an object reference counted is a little tricky. There is one fundamental problem, two tricky
situations and some design choices that need to be taken care of. Since designing reference counted objects often goes together with
using reference counted smart pointers this is a good place to discuss these issues.
The basics are pretty well known. People usually create a class along the following lines
class foo
{
public:
void add_ref()
{ ++m_count; }
void sub_ref()
{
if (--m_count == 0)
delete this;
}
private:
atomic<int> m_count;
};
So far so good but why add_ref and sub_ref are not const? First reference count is really not a part
of the object state. Second having them non-const will prevent you from ever having smart_ptr<const T>
since any such smart pointer will have to call them on a const object. So here is the first revision
class foo
{
public:
void add_ref() const
{ ++m_count; }
void sub_ref() const
{
if (--m_count == 0)
delete this;
}
private:
mutable atomic<int> m_count;
};
Now the fundamnetal problem. What should the reference count be initialized to in object constructor:
1 or 0? People who use flawed smart pointers that have a simple constructor smart_ptr(T * p) that always
bumps the count tend to like 0. This way the object gets the desired one as soon as you stuff it into a smart_ptr
Unfortunately this turns out to be a bad design. The problem is that while in th body of the constructor you
have an object in a weird state: alive but with 0 reference count. This might sound theoretic but it hits you hard
if you ever create a temporary smart pointer out of this within the constructor. This can happen if you
call an external function that expects a smart pointer parameter
(for example LogMe(smart_ptr<T>(this)).
In this case the count is bumped to 1 when the pointer is created but then goes to 0 when it is destroyed. And this
invokes the destructor. Bang! You've got a big piece of UB happening here.
In case you say 'I don't care since I will just avoid creating smart pointers out of this in the constructor',
consider that you will have to ensure that no helper method you call from the constructor does it. Essentially
each member function will now have to deal with the possibility that the object is in a bad state. And this is a
rich source of bugs
A much better approach is to create a simple invariant: all living objects have their reference count > 0, period.
No more special states and no more avoiding smart pointers in constructor. With this is mind here is a revised foo
class foo
{
public:
foo() : m_count(1)
{}
void add_ref() const
{ ++m_count; }
void sub_ref() const
{
if (--m_count == 0)
delete this;
}
private:
mutable atomic<int> m_count;
};
Note that having this convention requires you to use
refcnt_ptr<foo> p = noref(new foo);
Which is why noref was used in all examples above that included new objects.
But we are not off the hook yet. Now consider what happens in destructor. When we enter it the reference
count is already 0 but we might call some helper there too and pass it a smart pointer created from this.
The count will be bumped again then go to 0 and you will have 'delete this' running the second time.
The solution is again to stick to principles. We agreed that a live object always has the count > 0 and
so it must be the case for destructor too. Here is the correct version
class foo
{
public:
foo() : m_count(1)
{}
void add_ref() const
{ ++m_count; }
void sub_ref() const
{
if (--m_count == 0)
{
++m_count; //restore invariant
delete this;
}
}
private:
mutable atomic<int> m_count;
};
This is a bare minimum working reference counted class. Now, it is quite tedious to type it
every time so the next question is "can we reuse it?". The answer is yes but the mechanism of reuse
depends on your circumstances. You could make it a base class a derive from it but it is intrusive.
A better approach would be a class template that can be applied to many types without
touching them. Here it is
template<typename T>
class ref_counted : public T
{
public:
ref_counted() : m_count(1)
{}
void add_ref() const
{ ++m_count; }
void sub_ref() const
{
if (--m_count == 0)
{
++m_count; //restore invariant
delete this;
}
}
private:
mutable atomic<int> m_count;
};
template<typename T>
void refcnt_add_ref(const ref_counted<T> & obj)
{
obj.add_ref();
}
template<typename T>
void refcnt_sub_ref(const ref_counted<T> & obj)
{
obj.sub_ref();
}
Now you can do things like
typedef ref_counted<std::vector<int> > ref_counted_vector;
refcnt_ptr<ref_counted_vector> p = noref(new ref_counted_vector);
This is almost like ERC. Of course there are still problems.
Current C++ doesn't allow 'inheriting' constructors so if the wrapped type
doesn't have a default construtor or you want to use a different one things
get trickier (you can partilly solve it by having delegated constructors with
up to N templated parameters). I'll leave all these extensions as an exercise for the
reader.