Data Structures¶

Goal: No incidental data structures

Definitions¶

Classic: A data structure is a format for organizing and storing data.

  • Doesn't define structure, replaces it with the related word, format
  • In mathematics, structure is defined as:

A structure on a set consists of additional entities that, in some manner, relate to the set, endowing the collection with meaning or significance.

  • A type is a pattern for storing and modifying objects

  • A type is a structure that relates a set of objects to a set of values

    • This is a representational relationship
  • A representational relationship creates a trivial data structure consisting of a single value

  • Values are related to other values, i.e. $3 \neq 4$

  • Because objects exist in memory, they have a physical relationship

A data structure is a structure utilizing value, representational, and physical relationships to encode semantic relationships on a collection of objects

  • The choice of encoding can make a dramatic difference on the performance of operations
Memory Hierarchy
Data from IT Hare

TLB is Translation Look-aside Buffer - or cache miss

  • A data structure is created anytime a relationship is established between objects
  • To avoid confusion we will reserve the term data structure to refer to types with a set of invariants which insure a set of relationships are maintained. i.e. standard containers
  • More transient data structures will be referred to as structured data

Encoding Semantic Relationships¶

  • Three primary means to encode a semantic relationship
    • Physically, using relative location in memory
      • {3, 4, 5} 3 is before 4 and 3 is less-than 4
    • Value, use an object with a value to represent the relationship
      • struct list { int _data, list* _next }; the value of _next encodes an ordered relationship
    • Representational, use the representation of the object to encode a relationship about the values of the objects
      • hash(a) == hash(b) $\implies$ a == b

[ Everything from here down is notes... ]

namespace bcc {

template <class T, class O>
void iota(T f, T l, O out) {
    while (f != l) {
        out(f);
        ++f;
    }
}

} // namespace bcc
%%timeit
{
std::list<int> _list;
bcc::iota(0, 1'000'000, [&](int n){ _list.push_back(n); });
}
%%timeit
{
std::vector<int> _vector;
bcc::iota(0, 1'000'000, [&](int n){ _vector.push_back(n); });
}
std::list<int> _list;
bcc::iota(0, 1'000'000, [&](int n){ _list.push_back(n); });
%timeit std::find(begin(_list), end(_list), 500'000);
%timeit std::find(begin(_vector), end(_vector), 500'000);
%timeit _list.insert(std::find(begin(_list), end(_list), 500'000), 42);
%timeit _vector.insert(std::find(begin(_vector), end(_vector), 500'000), 42);
%timeit -n 1000 _list.push_front(42);
%timeit -n 1000 _vector.insert(begin(_vector), 42);
  • If you only need push_front(), std::deque<> is a better choice
  • std::list<> only makes since when you are externally indexing and hence require iterator stability

Problem¶