Eugene's Site
refcnt_ptr
Download the source code

Contents

Introduction

This article and the code that accompanies it present an intrusive reference counting smart pointer. If you don't know what it means click on the links in the previous sentence. The word intrusive signifies that the reference counting is done by the object itself rather than external code. If you ever used Microsoft COM then this is exactly the kind of reference counting used there. If you are familiar with Boost or TR1 shared_ptr this is exactly what they are not doing. To avoid typing 'reference counting' or 'intrusive reference counting' every time I will use RC and IRC abbreviations. Similarly ERC will stand for external reference counting as practiced by shared_ptr and similar classes.

Why would I use intrusive reference counting?

First you might not have a choice. A framework or library you are using might require IRC. Microsoft COM does for example and so do some other important libraries. You can in principle use a non-intrusive RC smart pointer with IRC object if it supports custom acquire/cleanup actions but doing so is more complicated and introduces unnecessary overhead.

Which brings us to another point. IRC can and usually does result in better performance than ERC. An ERC smart pointer has to store the count somewhere and it will be either in the pointer object itself or some external location. In the first case you pay in pointer size while in the second you pay in additional allocation and cache misses when accessing the count (which is in a different location than both the pointer and pointee). To put it in a different perspective all other things being equal under best possible circumstances an ERC can only be as efficient as IRC but never better.

Yet another advantage of IRC is that you can produce an IRC smart pointer from any raw pointer to the object including this. This means that you can easily produce smart pointers to itself from any of the object's member functions. This is in general impossible with ERC without modifying the object. But then if you already modify it why not make it IRC and avoid all the trouble.

Of course nothing comes for free. The main disadvantage of IRC is of course the intrusive part. In order to use it you must design the target object class to support reference counting. However, this can be addressed for most object types by using templated reference counted adapters as shown at the end of this article. However, template adapters are not perfect and in some cases ERC remains the only available option.

Another IRC disadvantage is that it is hard to support so-called weak pointers. Not that it is impossible but doing it requires some framework for object registration which pretty much removes all IRC advantages.

Having discovered shared_ptr some people tend to treat it as the best, indeed default, smart pointer. This is, in my opinion very misguided. An ERC smart pointer like shared_ptr is just one possible choice and should be used only when it fits the problem you are trying to solve. If you need single owner semantics with ownership transfer use auto_ptr, if you need single owner with no transfer use scoped_ptr. If you need RC, control target object source and don't need weak pointers consider and IRC smart pointer like refcnt_ptr presented here. Finally if you need RC but don't control the target or need weak pointers, shared_ptr is, indeed, a good choice.

Why another smart pointer?

Well, because I couldn't find one that I liked. There are quite a few well-known IRC smart pointers implementations but they all have some problems I couldn't live with. First of all most smart pointers come packaged within a library. boost::intrusive_ptr requires Boost, Loki's smart pointer requires Loki etc. This is fine if you are already using these nice libraries but can be a problem if you don't. In a commercial software development world bringing in a new 3rd party library can be a tricky proposition (for very good reasons) and people often cannot do it. What I needed is an independent smart pointer class that can be copy/pasted into any code base. (Not that you cannot grab the code of Boost or Loki smart pointers - it is just more complicated than it ought to be)

Even ignoring the problem above there are some fundamental flaws with pretty much every IRC smart pointer implementation. To begin with all of them include a constructor of the form

smart_ptr(T * p);

This is very, very bad design. It asserts that there is a conversion from raw pointer to smart pointer. However, with IRC there is always the question whether such conversion results in an increment of reference count. There is no single good answer to it. In some situations you want the count to go up and in some you don't. Pretending that there is one default choice is misleading. To get around this problem some IRC smart pointers include another constructor of the form

smart_ptr(T * p, bool add_ref);

Which is practically an example of how not to use bool parameters. Is it so hard to introduce an enum with easy to comprehend name so the reader doesn't have to guess what smart_ptr(p, false) means?

Some smart pointers from Microsoft and others go further and commit various other sins. They arent generic enough and require the pointee to have functions with some exact signature prescribed by the smart pointer. For example if your 'decrement reference count' function isn't named Release() you cannot use MS smart pointers. They also may overload operator & (address of) to return the address of the stored pointer. This is done with best intentions to support passing &smart_ptr<T> to a function expecting T ** but this hack makes it impossible to store such smart pointer in containers or do anything else that might require taking its real address.

To summarize, most IRC smart pointers out there suck and so I decided to create my own

Source code

The source code for refcnt_ptr can be downloaded from here. It is under MIT style license so you are free to include it in commercial or open source projects as long as you comply with it.

If you have any questions or constructive criticism feel free to contact me. I will try to answer all mail but please be patient, working on open source code doesn't pay my bills.

Usage examples

In order to use refcnt_ptr with your class you need to define two function that tell it how to increment and decrement the reference count for this class. For a refcnt_ptr<T> you need to define

void refcnt_add_ref(T & unk);
void refcnt_sub_ref(T & unk);

Note the reference rather than pointer parameter. It is so because you will never get a NULL parameter in this function so don't worry about it.

These functions can be in any namespace as long as a lookup for T & can find them.

Here is an example COM's IUnknown interface

inline
void refcnt_add_ref(IUnknown & unk)
{
    unk.AddRef();
}
inline
void refcnt_sub_ref(IUnknown & unk)
{
    unk.Release();
} 

Basic usage

Having declared/defined these two functions now you can use refcnt_ptr<T> as follows

refcnt_ptr<T> pT1 = noref(new T);       //create from raw but don't increment ref count

refcnt_ptr<T> pT2;
pT2 = noref(new T);                     //assign from raw but don't increment ref count

T * pRawT3 = ...;
refcnt_ptr<T> pT3 = ref(pRawT3);        //create from raw and increment ref count

T * pRawT4 = ...;
refcnt_ptr<T> pT4;
pT4 = ref(pRawT4);                      //assign from raw and increment ref count

refcnt_ptr<T> pT5 = pT4;                //normal copy

refcnt_ptr<T> pT6;
pT6 = pT5;                              //normal assignment

T * pRawT7 = pT6.c_ptr();               //get the raw pointer. note the name which is IMO way 
                                        //better than meaningless get()

T & t8 = *pT6;                          //usual dereference

pT6->foo();                             //and arrow operator 

pT6.reset();                            //releases the reference and sets the stored pointer 
                                        //to NULL. note that there is no argument to this function

T * pRawT8 = pT5.release();             //sets the stored pointer to NULL and returns its 
                                        //previous value. reference count is unchanged

pT4.swap(pT3);                          //the usual swap operation

Comparisons

refcnt_ptr<T> can be natuarally compared with both raw pointers and other refcnt_ptr<T>

refcnt_ptr<T> pT1 = ..., pT2 = ...; 
T * pRawT = ...;

if (pT1 == pT2) ....;
if (pT1 == pRawT) ....;
if (pRawT == pT1) ....;

//and the same for !=, <, <=, >, >=

Conversions

In general if you have two types T and Y such as Y * can be implicitly converted to T * then refcnt_ptr<T> can accept Y * and refcnt_ptr<Y> in any of the constructors and assignment operators. To use the examples from above

refcnt_ptr<T> pT1 = noref(new Y);       //create from raw but don't increment ref count

refcnt_ptr<T> pT2;
pT2 = noref(new Y);                     //assign from raw but don't increment ref count

Y * pRawY3 = ...;
refcnt_ptr<T> pT3 = ref(pRawY3);        //create from raw and increment ref count

Y * pRawY4 = ...;
refcnt_ptr<T> pT4;
pT4 = ref(pRawY4);                      //assign from raw and increment ref count

refcnt_ptr<Y> pY4 = ...;
refcnt_ptr<T> pT5 = pY4;                //normal copy

refcnt_ptr<T> pT6;
pT6 = pY4;                              //normal assignment

refcnt_ptr<T> pT7 = ...; 
refcnt_ptr<Y> pY7 = ...; 
Y * pRawY7 = ...;

if (pT7 == pY7) ....;
if (pT7 == pRawY7) ....;
if (pRawY7 == pT7) ....;

//and the same for !=, <, <=, >, >=

Conditionals

refcnt_ptr<T> can be used in coditional expressions. This is done without introducing any unintended conversions to an integers

refcnt_ptr<T> pT = ....; 

if (pT) ...;                           //fine
while(!pT) ...;                        //fine

pT + 7                                 //won't compile

Explicit conversions

If you need to convert refcnt_ptr<Y> to refcnt_ptr<T> where Y * can only be converted to T * using any of non-implicit C++ casts (static, dynamic, const and reinterpret) you can of course go through raw pointers but this is tedious. So instead you can use the following helpers to save you some time

refcnt_ptr<T> pT = ....; 
refcnt_ptr<const T> pConstT = ....;
refcnt_ptr<Y> pY = ....; 


pT = refcnt_const_cast<T>(pConstT);
pT = refcnt_static_cast<T>(pY);
pT = refcnt_dynamic_cast<T>(pY);
pT = refcnt_reinterpret_cast<T>(pY);

This is basically all you can do with refcnt_ptr. Simple isn't it?

Formal properties

Requirements for type T

  1. The expressions T * should form a valid data pointer
  2. If T * p then expressions refcnt_add_ref(*p) and refcnt_sub_ref(*p) must be valid function calls
  3. If T * p points to a valid object then this object must remain alive if refcnt_add_ref(*p) was called more times than refcnt_sub_ref(*p)

Exception safety

  1. refcnt_sub_ref function supplied by user must not throw. If it does using refcnt_ptr results in undefined behavior
  2. refcnt_add_ref function supplied by user may throw freely
  3. if refcnt_add_ref does not throw then all functions of refcnt_ptr do not throw either (nothrow ES guarantee)
  4. if refcnt_add_ref throws then constructors and assignment operators of refcnt_ptr provide strong ES guarantee, while all other functions are still nothrow

Thread safety

  1. Simultaneous operations on distinct objects by multiple threads do not require synchronization and do not introduce synchronization of their own
  2. With exception of copy and assignment from an object simultaneous read (const) operations on on the same object by multiple threads do not require synchronization and do not introduce synchronization of their own
  3. Iff refcnt_add_ref and refcnt_sub_ref do not require external synchronization with respect to themselves and each other then simultaneous copy and assignment from the same object by multiple threads do not require synchronization either. Whether they introduce synchronization of their own depends on implementation of refcnt_add_ref and refcnt_sub_ref
  4. Otherwise simultaneous copy and assignment from the same object by multiple threads require external synchronization
  5. Simultaneous write (non const) operations on the same object by multiple threads require external synchronizations. Failure to synchronize them results in UB

In plain English if refcnt_add_ref and refcnt_sub_ref are thread safe then refcnt_ptr is thread safe for reads. Otherwise be careful and read the exact rules above

A note on implementing reference counted objects

Making an object reference counted is a little tricky. There is one fundamental problem, two tricky situations and some design choices that need to be taken care of. Since designing reference counted objects often goes together with using reference counted smart pointers this is a good place to discuss these issues.

The basics are pretty well known. People usually create a class along the following lines

class foo
{
public:
    void add_ref()
       { ++m_count; }
    void sub_ref()
    {
        if (--m_count == 0)
            delete this;
    }
private:
    atomic<int> m_count;
};

So far so good but why add_ref and sub_ref are not const? First reference count is really not a part of the object state. Second having them non-const will prevent you from ever having smart_ptr<const T> since any such smart pointer will have to call them on a const object. So here is the first revision

class foo
{
public:
    void add_ref() const
       { ++m_count; }
    void sub_ref() const
    {
        if (--m_count == 0)
            delete this;
    }
private:
    mutable atomic<int> m_count;
};

Now the fundamnetal problem. What should the reference count be initialized to in object constructor: 1 or 0? People who use flawed smart pointers that have a simple constructor smart_ptr(T * p) that always bumps the count tend to like 0. This way the object gets the desired one as soon as you stuff it into a smart_ptr

Unfortunately this turns out to be a bad design. The problem is that while in th body of the constructor you have an object in a weird state: alive but with 0 reference count. This might sound theoretic but it hits you hard if you ever create a temporary smart pointer out of this within the constructor. This can happen if you call an external function that expects a smart pointer parameter (for example LogMe(smart_ptr<T>(this)). In this case the count is bumped to 1 when the pointer is created but then goes to 0 when it is destroyed. And this invokes the destructor. Bang! You've got a big piece of UB happening here.

In case you say 'I don't care since I will just avoid creating smart pointers out of this in the constructor', consider that you will have to ensure that no helper method you call from the constructor does it. Essentially each member function will now have to deal with the possibility that the object is in a bad state. And this is a rich source of bugs

A much better approach is to create a simple invariant: all living objects have their reference count > 0, period. No more special states and no more avoiding smart pointers in constructor. With this is mind here is a revised foo

class foo
{
public:
    foo() : m_count(1)
    {}
	
    void add_ref() const
       { ++m_count; }
    void sub_ref() const
    {
        if (--m_count == 0)
            delete this;
    }
private:
    mutable atomic<int> m_count;
};

Note that having this convention requires you to use

refcnt_ptr<foo> p = noref(new foo);

Which is why noref was used in all examples above that included new objects.

But we are not off the hook yet. Now consider what happens in destructor. When we enter it the reference count is already 0 but we might call some helper there too and pass it a smart pointer created from this. The count will be bumped again then go to 0 and you will have 'delete this' running the second time.

The solution is again to stick to principles. We agreed that a live object always has the count > 0 and so it must be the case for destructor too. Here is the correct version

class foo
{
public:
    foo() : m_count(1)
    {}
	
    void add_ref() const
       { ++m_count; }
    void sub_ref() const
    {
        if (--m_count == 0)
        {
        	++m_count; //restore invariant
            delete this;
        }
    }
private:
    mutable atomic<int> m_count;
};

This is a bare minimum working reference counted class. Now, it is quite tedious to type it every time so the next question is "can we reuse it?". The answer is yes but the mechanism of reuse depends on your circumstances. You could make it a base class a derive from it but it is intrusive. A better approach would be a class template that can be applied to many types without touching them. Here it is

template<typename T>
class ref_counted : public T
{
public:
    ref_counted() : m_count(1)
    {}
	
    void add_ref() const
       { ++m_count; }
    void sub_ref() const
    {
        if (--m_count == 0)
        {
        	++m_count; //restore invariant
            delete this;
        }
    }
private:
    mutable atomic<int> m_count;
};

template<typename T>
void refcnt_add_ref(const ref_counted<T> & obj)
{
    obj.add_ref();
}

template<typename T>
void refcnt_sub_ref(const ref_counted<T> & obj)
{
    obj.sub_ref();
}


Now you can do things like

typedef ref_counted<std::vector<int> > ref_counted_vector;

refcnt_ptr<ref_counted_vector> p = noref(new ref_counted_vector);

This is almost like ERC. Of course there are still problems. Current C++ doesn't allow 'inheriting' constructors so if the wrapped type doesn't have a default construtor or you want to use a different one things get trickier (you can partilly solve it by having delegated constructors with up to N templated parameters). I'll leave all these extensions as an exercise for the reader.