The C++ Memory Model

Basics

  • Memory consists of one or more contiguous sequences of bytes
  • Every byte has a unique address
  • A byte has at least 8 bits (CHAR_BIT >= 8)
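A minimal sketch of these guarantees using only standard headers. The helper names `adjacent_bytes_have_distinct_addresses` and `bits_in` are hypothetical, chosen for illustration.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// The standard requires a byte (unsigned char) to hold at least 8 bits.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// sizeof is measured in bytes, so sizeof(unsigned char) is 1 by definition.
static_assert(sizeof(unsigned char) == 1, "sizeof counts bytes");

// Each array element is a distinct object with its own address,
// and consecutive bytes have consecutive addresses.
bool adjacent_bytes_have_distinct_addresses() {
    unsigned char bytes[2] = {0, 0};
    return &bytes[0] != &bytes[1] && &bytes[1] - &bytes[0] == 1;
}

// Number of bits in the object representation of T.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```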

Processor Memory Hierarchy

[Figure: Memory Hierarchy]

Threads and Data Races

  • Any thread can potentially access any object in the program
  • When an evaluation of an expression writes to a memory location
    • and another evaluation reads or modifies the same memory location
    • the expressions conflict, and there is a data race unless
      • both evaluations execute on the same thread or in the same signal handler
      • both conflicting evaluations are atomic operations
      • one evaluation happens-before the other
    • if a data race occurs, the behavior of the program is undefined
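A small sketch of the rule above, assuming a hypothetical function `count_with_two_threads`. With a plain `int` counter the two threads' increments would conflict and the behavior would be undefined; making the counter atomic satisfies the "both are atomic operations" exception.

```cpp
#include <atomic>
#include <thread>

// Two threads each increment a shared counter `iterations` times.
// With a plain `int counter`, the unsynchronized ++counter in both threads
// would conflict (both write the same memory location): a data race.
// Making the counter atomic removes the race.
int count_with_two_threads(int iterations) {
    std::atomic<int> counter{0};
    auto work = [&counter, iterations] {
        for (int i = 0; i < iterations; ++i) {
            ++counter;  // atomic read-modify-write (seq_cst by default)
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    // join() makes each thread's work happen-before this load.
    return counter.load();
}
```

Because every increment is an indivisible read-modify-write, no update is lost and the result is exactly `2 * iterations`.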

Memory Order

  • The memory order of an atomic operation specifies how memory accesses, including regular non-atomic accesses, are ordered around that operation
  • Absent any constraints, when several threads read and write several variables, one thread can observe the values change in an order different from the order in which another thread wrote them
  • The default order for every standard library atomic operation is sequentially consistent (std::memory_order_seq_cst)
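A sketch of an atomic operation ordering non-atomic accesses, using only the default (sequentially consistent) order. The function name `pass_message` and the string payload are hypothetical.

```cpp
#include <atomic>
#include <string>
#include <thread>

// Message passing with the default (sequentially consistent) ordering.
// `payload` is an ordinary, non-atomic object; the atomic flag `ready`
// orders the accesses to it across threads.
std::string pass_message() {
    std::string payload;             // non-atomic
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = "hello";           // plain write...
        ready.store(true);           // ...cannot move after this seq_cst store
    });

    std::string received;
    std::thread consumer([&] {
        while (!ready.load()) {      // seq_cst load; spin until the flag is set
        }
        received = payload;          // sees "hello": no data race
    });

    producer.join();
    consumer.join();
    return received;
}
```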

Other Supported Memory Orders

  • Relaxed: no synchronization or ordering constraints are imposed on other reads or writes; only the atomicity of the operation itself is guaranteed
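A sketch of a typical use of relaxed ordering, assuming a hypothetical event counter `count_events_relaxed`: the counter only has to be exact, and no other data is published through it.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Relaxed ordering is enough for a pure event counter: each fetch_add is
// atomic, so no increment is lost, but the counter is not used to order
// any other memory accesses.
long count_events_relaxed(int threads, int events_per_thread) {
    std::atomic<long> events{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&events, events_per_thread] {
            for (int i = 0; i < events_per_thread; ++i) {
                events.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() synchronizes, so the final value is visible here
    }
    return events.load(std::memory_order_relaxed);
}
```

If another thread used this counter to decide whether some non-atomic data were ready, relaxed would no longer be sufficient.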

  • Acquire: applies to a load operation
    • no reads or writes in the current thread can be reordered before the load
    • all writes in other threads that release the same atomic variable are visible in the current thread
  • Release: applies to a store operation
    • no reads or writes in the current thread can be reordered after the store
    • all writes in the current thread are visible in other threads that acquire the same atomic variable
    • writes that carry a dependency into the atomic variable are visible in other threads that consume the same atomic variable
  • Acquire-Release: applies to read-modify-write operations
    • the operation is both an acquire and a release, with the guarantees of each
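A sketch of all three orders in one hypothetical three-thread hand-off (`release_acquire_chain` and the `ChainResult` fields are illustrative names). Thread A publishes with a release store, thread B uses an acquire-release compare-exchange to both receive A's data and publish its own, and thread C acquires the result.

```cpp
#include <atomic>
#include <thread>

struct ChainResult {
    int a_seen_by_c;
    int b_seen_by_c;
    int a_seen_by_b;
};

ChainResult release_acquire_chain() {
    int a_data = 0;   // non-atomic, written by A before its release store
    int b_data = 0;   // non-atomic, written by B before its acq_rel RMW
    int b_saw_a = 0;  // what B read from a_data
    std::atomic<int> stage{0};
    ChainResult r{};

    std::thread a([&] {
        a_data = 1;
        stage.store(1, std::memory_order_release);  // publish a_data
    });

    std::thread b([&] {
        b_data = 2;
        int expected = 1;
        // Acquire half: reading 1 from A's release store makes a_data visible.
        // Release half: writing 2 publishes b_data to whoever acquires it.
        while (!stage.compare_exchange_weak(expected, 2,
                                            std::memory_order_acq_rel,
                                            std::memory_order_relaxed)) {
            expected = 1;  // on failure, expected was overwritten; retry
        }
        b_saw_a = a_data;
    });

    std::thread c([&] {
        while (stage.load(std::memory_order_acquire) != 2) {
        }
        r.a_seen_by_c = a_data;  // visible transitively through B
        r.b_seen_by_c = b_data;
    });

    a.join();
    b.join();
    c.join();
    r.a_seen_by_b = b_saw_a;     // safe: all threads have finished
    return r;
}
```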

  • Consume: (discouraged since C++17) similar to acquire, but applies only to operations that carry a dependency on the loaded value
    • no known production compiler tracks dependency chains, so consume is promoted to acquire
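The pattern consume was designed for is pointer publication, where every read of the published data goes through the loaded pointer. Because compilers promote consume to acquire anyway, the sketch below uses acquire; the `Config` type and `read_timeout_after_publish` function are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hypothetical configuration record published to readers through a pointer.
struct Config {
    int timeout_ms;
    int retries;
};

int read_timeout_after_publish() {
    std::atomic<Config*> published{nullptr};

    std::thread writer([&published] {
        Config* c = new Config{250, 3};                 // fully initialize first...
        published.store(c, std::memory_order_release);  // ...then publish the pointer
    });

    int timeout = 0;
    std::thread reader([&published, &timeout] {
        Config* c = nullptr;
        // memory_order_consume would suffice in principle, since the read of
        // c->timeout_ms depends on the loaded pointer; acquire is used because
        // compilers promote consume to acquire anyway.
        while ((c = published.load(std::memory_order_acquire)) == nullptr) {
        }
        timeout = c->timeout_ms;  // dependent read through the loaded pointer
    });

    writer.join();
    reader.join();
    delete published.load();  // both threads are done; reclaim the object
    return timeout;
}
```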

Sequentially Consistent Ordering

  • Load operations are acquire operations
  • Store operations are release operations
  • Read-modify-write operations are both acquire and release operations
  • In addition, a single total order exists in which all threads observe all modifications in the same order
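The single total order is what distinguishes sequential consistency from plain acquire-release. A sketch of the classic store-buffering litmus test, assuming a hypothetical helper `both_zero_observed`: with the default seq_cst order, the outcome where both threads read 0 is forbidden.

```cpp
#include <atomic>
#include <thread>

// Each thread stores to one variable and then loads the other. Under seq_cst
// all four operations fall into one total order, so at least one load comes
// after the other thread's store: r1 == 0 && r2 == 0 is impossible.
// With release stores and acquire loads instead, that outcome would be
// allowed, and x86 hardware permits it too (a store may be delayed past a
// later load).
bool both_zero_observed(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] { x.store(1); r1 = y.load(); });
        std::thread t2([&] { y.store(1); r2 = x.load(); });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            return true;  // would violate sequential consistency
        }
    }
    return false;
}
```

A test like this can only fail to find a violation; it cannot prove correctness, which is why the ordering argument itself matters.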

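Important

  • Intel (x86) hardware has a strong memory model
    • Every ordinary load has implied acquire semantics and every ordinary store has implied release semantics; the hardware may only move a load ahead of an earlier store to a different location
    • That does not imply that every operation is atomic
    • And unless an atomic operation's memory order constrains it, the compiler is free to reorder operations
    • Relaxed, acquire, and sequentially consistent loads cost the same; a sequentially consistent store needs a full barrier (XCHG or MFENCE), so it costs more than a relaxed or release store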

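  • ARM has a weak memory model
    • Ordinary loads and stores are effectively relaxed; the hardware preserves only dependency ordering (roughly consume), so acquire and release need dedicated instructions (LDAR/STLR on ARMv8) or barriers (DMB)
    • And unless an atomic operation's memory order constrains it, the compiler is free to reorder operations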
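One way to see the hardware differences is to compile small wrappers and inspect the assembly (for example with a compiler explorer). The wrapper names below are hypothetical, and the instruction table in the comment is the typical GCC/Clang output at -O2; it may differ by compiler version and target flags.

```cpp
#include <atomic>

// Typical instructions emitted (may vary by compiler, version, and flags):
//                      x86-64                  AArch64 (ARMv8)
//   load  relaxed      mov                     ldr
//   load  acquire      mov                     ldar (or ldapr with RCpc)
//   load  seq_cst      mov                     ldar
//   store relaxed      mov                     str
//   store release      mov                     stlr
//   store seq_cst      xchg (or mov + mfence)  stlr

int load_relaxed(const std::atomic<int>& x) { return x.load(std::memory_order_relaxed); }
int load_acquire(const std::atomic<int>& x) { return x.load(std::memory_order_acquire); }
int load_seq_cst(const std::atomic<int>& x) { return x.load(); }

void store_relaxed(std::atomic<int>& x, int v) { x.store(v, std::memory_order_relaxed); }
void store_release(std::atomic<int>& x, int v) { x.store(v, std::memory_order_release); }
void store_seq_cst(std::atomic<int>& x, int v) { x.store(v); }
```

Identical instructions do not mean identical semantics: the compiler may still reorder surrounding code around a relaxed access where it may not around an acquire, release, or seq_cst access.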

Recommendations

  • Avoid using low-level synchronization primitives, such as atomics, at all
  • If you must, only use sequentially consistent ordering unless
    • You can demonstrate a performance gain
    • You can prove it is correct
    • You have at least one other expert review your proof
    • You write a complete description along with your proof and include it in the code
    • You include a unit test to demonstrate both correctness and gain