C++ Multiple Thread Concurrency with data in Classes

I was recently looking at Quora feed and someone asked the question: –

If a class in C++ has a member function that executes on a separate thread, can that member function read from and write to data members of the class?

which I thought would be a good question to answer. However, it’s not likely to be a short answer so I thought I would write a quick blog article on the subject.

I will start by saying that when dealing with multiple threads (often just referred to as concurrency), its actually a lot simpler than you might think, but it does take a certain mindset before it feels natural. Anyway, back to the answer to the question.

The short answer to the question is, No, not safely, not without adding some form of concurrency protection. Things are a lot easier in C++11 but like anything to do with concurrency, it all depends on your implementation details and the nature of the data members, their purpose and the reason for needing multiple threads to read/write to that data.

As a general (but not exclusively true) rule, having one instance of a class that contains dynamic data at runtime, should not be operated on by more than one thread. Many threads can read without issue, but modifying is generally a bad idea.

If you need to modify data of a class, give the thread that modifying the data exclusive access to that class instance, this, while not enforced or required by C++, adopting this strategy will most certainly help humans reading your code understand intent.

One of my most frequently used patterns which is incredibly useful, and (almost) lock-free, and fast, and basically foolproof from a usability point of view and really simple to understand in a concurrent system. This is not the only solution, nor is it perfect for any situation, nor is it anything special, but it does really help in multi-threaded systems (servers especially).

Let’s say we have a class that contains some configuration information, for this purpose I will use a simple map of name/values

class config_data_t
{
   std::map<std::string, std::string> _config_vals;

public:
   const std::string& get_val(const std::string& name);
}

And this configuration is global, so we load it from a file and hold it in memory so that all threads can access the configuration information quickly and without having to worry about concurrency AND we have a thread that watches the config file, and if it changes, we need to reload our configuration information safely, without either locking or disrupting any other thread.

So the first thing we are going to need to do is to create a global instance of our config object and load it from the config file. To do this we are going to create a shared_ptr which is a global variable, make a new instance of our config class and load it with its data.

using config_data_ptr_t = std::shared_ptr<config_data_t>;
config_data_ptr_t _global_config_info;

void load_config()
{
   config_data_ptr_t config_info = std::make_shared<config_data_t>();

   // Do whatever you need to do in order to load the config_info->_config_vals with data

   // update our global variable
   _global_config_info = config_info;

   // ... rest of main() stuff
}

Ok, in the above code, we have a global variable _global_config_info shared pointer, then in main() we have made a new instance of a config_data_t object as a local shared pointer (config_info), populated it with data and finally, we have assigned our config_info pointer to our global _global_config_info pointer, at this point within the scope of load_data(), both config_info and _global_config_info are pointing to the same instance of the config class, when load_config ends, config_info goes out of scope leaving the config info pointed to by the _global_config_info shared pointer.

Within our program we want any thread to be able to access our configuration information, so we provide a global getter function to make this easy and intuitive.

const config_data_ptr_t config() 
{
    return _global_config_info;
}

This is a very simple function, but this is where half of the magic happens. What we are doing is returning a new shared pointer that is pointing to the same instance of the config_info_t class as the _global_config_info pointer is pointing at. The shared pointer copy is both very fast and is thread-safe, concurrency is built-in, this is a thread-safe operation and often is implemented under the hood in a lock-free way using the CPU’s interlocking atomic operations. So, once you have the pointer you may access configuration values as required. This can be done safely from as many threads as you like, without fear of any concurrency issues

const config_data_ptr_t cfg = config();

const std::string& val = cfg->get_val("some.config.val");

Now I said above, this is “half” the magic, because there is a rule you need to follow… you must never, under any circumstances modify the values held in the global config object, if you do, your thread(s) will run into concurrency issues. So if you load it and use it, no problem. But we said that we also need to be able to detect changes to the config file and load those changes when detected! Well, that’s the other half of the magic, because we are going to take advantage of the shared_ptr behavior and we are going to follow one simple rule…

We will never change the config values in a config object, if we detect a change in the configuration file we will simply re-load it.

if(has_config_file_changed())
{
   load_config();
}

At the point we call load config, other threads might have a pointer to the current config, but when we load the config into a new instance of the config_info_t class, and then we assign this pointer to the global pointer, the second half of the magic occurs. All existing threads that have a config pointer are pointing to the old instance (pre config file changes) of the config object, the global pointer gets updated to point to the freshly loaded config object. As each thread finishes with the config the shared_ptrs go out of scope, when the last shared_ptr of the old config object goes out of scope, the old config object is freed from memory. Subsequent threads that call config() will get a pointer to the freshly loaded config object.

So this is the basic principle and it works really well, and I bet there is even a known “pattern” but if there is, I have no idea what its called. Also, there are two unanswered questions that the above raises.

What if I want my program to be able to modify configuration changes on the fly?

Absolutely this is a common requirement and its easy, you simply adopt the same principle, you never write changes to the config object that is pointed to by the global variable, instead, you simply take a copy of the global object, make your changes and replace the global object with your changes object.


void set_config_values(const std::map<std::string, std::string>& vals)
{
  // Make a new empty config object
  config_data_ptr_t config_info = std::make_shared<config_data_t>();

  // Make a copy of the global config
  *config_info = *config();

  // Modify config items as required
  for(const auto& v : vals)
    config_info->_config_vals[v.first] = v.second;

  // update our global variable
  _global_config_info = config_info;
}

Surely copying a whole config object is really expensive if you just want to change a single value?

Yes and no, it depends on the nature of your data. If the configuration data is huge, then yes, this would not be the right pattern, but in my experience, most config data has some fundamental characteristics.

a) the configuration does not change much.
b) the configuration is generally quite small from a size and a number of items point of view.

As I said, this pattern is not for all situations, generally, the copy pattern is a play-off between simplicity and efficiency. What I hope this example highlights though, is that class members are not in any way protected from concurrency issues, you need to adopt a pattern/strategy that gives you the best results for the job to are trying to do.

I will follow-up with a second example, where the same class is used with locking protection, where the copying-on-write pattern is not efficient, and-or you will make a lot of config changes dynamically, which is the other common pattern for this type of data.

This content is published under the Attribution-Noncommercial-Share Alike 3.0 Unported license.

2 comments

  1. For the copy and write approach will it subject to a race condition? For example, two threads are attempting to update the config at a similar time and they both copy an unmodified version of the config and make modification on their own copy. And whatever thread that swap latter will be end up making the earlier change disappear?

    1. No the pattern is, you modify an exclusive copy, so if two threads modify they are modifying their own copy. Of course its the last one that puts the modified object back into the shared pointer that wins in this case. Its not really for that pattern, this is used when you have a high throughput server needing shared access to an object (say a configuration object) for read, but very infrequently there are changes, which would be done by one thread. Technically the shared_ptr should be protected by a mutex because its reference counting is not atomic, although i did read somewhere that might change in the future.
      Gerry

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.