diff options
Diffstat (limited to 'doc/design.rst')
-rw-r--r-- | doc/design.rst | 268 |
1 files changed, 268 insertions, 0 deletions
diff --git a/doc/design.rst b/doc/design.rst new file mode 100644 index 000000000000..27d29e9e617a --- /dev/null +++ b/doc/design.rst @@ -0,0 +1,268 @@ +.. _design: + +Design +====== + +This is a library of **immutable containers**. + +These containers have all their methods marked ``const``. Instead of +mutating them *in place*, they provide manipulation functions that +*return a new transformed value*, leaving the original value +unaltered. In the context of data-structures, this property of +preserving old values is called **persistence**. + +Most of these containers use data-structures in which these operations +can be done *efficiently*. In particular, not all data is copied when +a new value is produced. Instead, the new values may share, +internally, common data with other objects. We sometimes refer to +this property as **structural sharing**. This behaviour is +transparent to the user. + +Assigment +--------- + +We are sorry, we lied. These containers provide *one mutating +operation*: **assignment** --- i.e. ``operator=``. + +There is a good reason: without ``operator=`` everything becomes +complicated in C++. For example, one may not contain non-assignable +types in many standard containers, assignment would also be disabled +from your custom types holding immutable containers, and so on and so +forth. + +C++ is a multi-paradigm language with an imperative core. Thus, it is +built on the foundation that *variables* can be mutated --- +i.e. assigned to. We don't want to ride against this tide. What we +want to prevent is in-place *object* manipulation. Because of C++ +semantics, *variable* assignment is defined in terms of *object +mutation*, so we have to provide *this very particular mutating +operation*, but nothing else. Of course, you are free to mark your +variables ``const`` to completely forbid assignment. + +.. warning:: + + **Assignment is not thread safe**. When a *mutable* variable is + shared across multiple threads, protect access using some other + mechanism. + + For obvious reasons, all other methods, which are ``const``, are + thread-safe. It is safe to share *immutable* state across multiple + threads. + +To ``const`` or not to ``const`` +-------------------------------- + +Many C++ programmers, influenced by functional programming, are trying +to escape the evils of mutability by using ``const`` whenever +possible. We also do it ourselves in many of our examples to +reinforce the property of immutability. + +While in general this is a good practice backed up with very good +intentions, it has one caveat: *it disables moveability*. It does so +even when ``std::move()`` is used. This makes sense, since moving from +an object may mutate it, and ``const``, my friends, prevents *all* +mutations. For example: + +.. literalinclude:: ../example/vector/move.cpp + :language: c++ + :start-after: move-bad/start + :end-before: move-bad/end + +One may think that the variable ``v`` is moved into the +``push_back()`` call. This is not the case, because the variable +``v`` is marked ``const``. Of course, one may enable the move by +removing it, as in: + +.. literalinclude:: ../example/vector/move.cpp + :language: c++ + :start-after: move-good/start + :end-before: move-good/end + +So, is it bad style then to use ``const`` as much as possible? I +wouldn't say so and it is advisable when ``std::move()`` is not used. +An alternative style is to not use ``const`` but adopt an `AAA-style +<aaa>`_ (*Almost Always use Auto*). This way, it is easy to look for +mutations by looking for lines that contain ``=`` but no ``auto``. +Remember that when using our immutable containers ``operator=`` is the +only way to mutate a variable. + +.. _aaa: https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/ + +.. admonition:: Why does ``const`` prevent move semantics? + + For those adventurous into the grainy details C++, here is why. + ``std::move()`` does not move anything, it is just *a cast* from + normal *l-value* references (``T&``) to *r-value* reference + (``T&&``). This is, you pass it a variable, and it returns a + reference to its object disguised as an intermediate result. In + exchange, you promise not to do anything with this variable later + [#f1]_. It is the role of the thing that *receives the moved-from + value* (in the previous example, ``push_back``) to actually do + anything interesting with it --- for example, steal its contents + ๐. + + So if you pass a ``T&`` to ``std::move()`` you get a ``T&&`` and, + unsurprisingly, if you pass a ``const T&`` you get a ``const T&&``. + But the receivers of the moved-from value (like constructors or our + ``push_back()``) maybe be moved-into because they provide an + overload that expects ``T&&`` --- without the ``const``! Since a + ``const T&&`` can not be converted into a ``T&&``, the compiler + looks up for you another viable overload, and most often finds a + copy constructor or something alike that expects a ``const T&`` or + just ``T``, to which a ``const T&&`` can be converted. The code + compiles and works correctly, but it is less efficient than we + expected. Our call to ``std::move()`` was fruitless. + + .. [#f1] For the sake of completeness: it is actually allowed to do stuff + with the variable *after another value is assigned to it*. + +.. _move-semantics: + +Leveraging move semantics +------------------------- + +When using :ref:`reference counting<rc>` (which is the default) +mutating operations can often be faster when operating on *r-value +references* (temporaries and moved-from values). Note that this +removes *persistence*, since one can not access the moved-from value +anymore! However, this may be a good idea when doing a chain of +operations where the intermediate values are not important to us. + +For example, let's say we want to write a function that inserts all +integers in the range :math:`[first, last)` into an immutable vector. +From the point of view of the caller of the function, this function is +a *transaction*. Whatever intermediate vectors are generated inside +of it can be discarded since the caller can only see the initial +vector (the one passed in as argument) and the vector with *all* the +elements. We may write such function like this: + +.. literalinclude:: ../example/vector/iota-move.cpp + :language: c++ + :start-after: myiota/start + :end-before: myiota/end + +The intermediate values are *moved* into the next ``push_back()`` +call. They are going to be discarded anyways, this little +``std::move`` just makes the whole thing faster, letting ``push_back`` +mutate part of the internal data structure in place when possible. + +If you don't like this syntax, :doc:`transients<transients>` may be +used to obtain similar performance benefits. + +.. admonition:: Assigment guarantees + + From the language point of view, the only requirement on moved from + values is that they should still be destructible. We provide the + following two additional guarantees: + + - **It is valid to assign to a moved-from variable**. The variable + gets the assigned value and becomes usable again. This is the + behaviour of standard types. + + - **It is valid to assign a moved-from variable to itself**. For + most standard types this is *undefined behaviour*. However, for our + immutable containers types, expressions of the form ``v = + std::move(v)`` are well-defined. + +Recursive types +--------------- + +Most containers will fail to be instantiated with a type of unknown +size, this is, an *incomplete type*. This prevents using them for +building recursive types. The following code fails to compile: + +.. code-block:: c++ + + struct my_type + { + int data; + immer::vector<my_type> children; + }; + +However, we can easily workaround this by using an ``immer::box`` to wrap +the elements in the vector, as in: + +.. code-block:: c++ + + struct my_type + { + int data; + immer::vector<immer::box<my_type>> children; + }; + +.. admonition:: Standard containers and incomplete types + + While the first example might seem to compile when using some + implementations of ``std::vector`` instead of ``immer::vector``, such + use is actually forbidden by the standard: + + **17.6.4.8** *Other functions (...)* 2. the effects are undefined in + the following cases: (...) In particular---if an incomplete type (3.9) + is used as a template argument when instantiating a template + component, unless specifically allowed for that component. + +.. _batch-update: +Efficient batch manipulations +----------------------------- + +Sometimes you may write a function that needs to do multiple changes +to a container. Like most code you write with this library, this +function is *pure*: it takes one container value in, and produces a +new container value out, no side-effects. + +Let's say we want to write a function that inserts all integers in the +range :math:`[first, last)` into an immutable vector: + +.. literalinclude:: ../example/vector/iota-slow.cpp + :language: c++ + :start-after: include:myiota/start + :end-before: include:myiota/end + +This function works as expected, but it is slower than necessary. +On every loop iteration, a new value is produced, just to be +forgotten in the next iteration. + +Instead, we can grab a mutable view on the value, a :ref:`transient`. +Then, we manipulate it *in-place*. When we are done with it, we +extract back an immutable value from it. The code now looks like +this: + +.. _iota-transient: + +.. literalinclude:: ../example/vector/iota-transient.cpp + :language: c++ + :start-after: include:myiota/start + :end-before: include:myiota/end + +Both conversions are :math:`O(1)`. Note that calling ``transient()`` +does not break the immutability of the variable it is called on. The +new mutable object will adopt its contents, but when a mutation is +performed, it will copy the data necessary using *copy on write*. +Subsequent manipulations may hit parts that have already been copied, +and these changes are done in-place. Because of this, it does not +make sense to use transients to do only one change. + +.. tip:: + + Note that :ref:`move semantics<move-semantics>` can be used instead to + support a similar use-case. However, transients optimise updates + even when reference counting is disabled. + +.. _std-compat: +Standard library compatibility +------------------------------ + +While the immutable containers provide an interface that follows a +functional style, this is incompatible with what the standard library +algorithms sometimes expect. :ref:`transients` try to provide an +interface as similar as possible to similar standard library +containers. Thus, can also be used to interoperate with standard +library components. + +For example the :ref:`myiota() function above<iota-transient>` may as +well be written using standard library tools: + +.. literalinclude:: ../example/vector/iota-transient-std.cpp + :language: c++ + :start-after: include:myiota/start + :end-before: include:myiota/end |