about summary refs log tree commit diff
path: root/doc/design.rst
blob: 27d29e9e617af856fbe68104eba11b4c962ba0b2 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
.. _design:

Design
======

This is a library of **immutable containers**.

These containers have all their methods marked ``const``.  Instead of
mutating them *in place*, they provide manipulation functions that
*return a new transformed value*, leaving the original value
unaltered.  In the context of data-structures, this property of
preserving old values is called **persistence**.

Most of these containers use data-structures in which these operations
can be done *efficiently*.  In particular, not all data is copied when
a new value is produced.  Instead, the new values may share,
internally, common data with other objects.  We sometimes refer to
this property as **structural sharing**.  This behaviour is
transparent to the user.

Assigment
---------

We are sorry, we lied. These containers provide *one mutating
operation*: **assignment** --- i.e. ``operator=``.

There is a good reason: without ``operator=`` everything becomes
complicated in C++.  For example, one may not contain non-assignable
types in many standard containers, assignment would also be disabled
from your custom types holding immutable containers, and so on and so
forth.

C++ is a multi-paradigm language with an imperative core.  Thus, it is
built on the foundation that *variables* can be mutated ---
i.e. assigned to.  We don't want to ride against this tide.  What we
want to prevent is in-place *object* manipulation.  Because of C++
semantics, *variable* assignment is defined in terms of *object
mutation*, so we have to provide *this very particular mutating
operation*, but nothing else.  Of course, you are free to mark your
variables ``const`` to completely forbid assignment.

.. warning::

   **Assignment is not thread safe**.  When a *mutable* variable is
   shared across multiple threads, protect access using some other
   mechanism.

   For obvious reasons, all other methods, which are ``const``, are
   thread-safe.  It is safe to share *immutable* state across multiple
   threads.

To ``const`` or not to ``const``
--------------------------------

Many C++ programmers, influenced by functional programming, are trying
to escape the evils of mutability by using ``const`` whenever
possible.  We also do it ourselves in many of our examples to
reinforce the property of immutability.

While in general this is a good practice backed up with very good
intentions, it has one caveat: *it disables moveability*.  It does so
even when ``std::move()`` is used.  This makes sense, since moving from
an object may mutate it, and ``const``, my friends, prevents *all*
mutations.  For example:

.. literalinclude:: ../example/vector/move.cpp
   :language: c++
   :start-after: move-bad/start
   :end-before:  move-bad/end

One may think that the variable ``v`` is moved into the
``push_back()`` call.  This is not the case, because the variable
``v`` is marked ``const``.  Of course, one may enable the move by
removing it, as in:

.. literalinclude:: ../example/vector/move.cpp
   :language: c++
   :start-after: move-good/start
   :end-before:  move-good/end

So, is it bad style then to use ``const`` as much as possible?  I
wouldn't say so and it is advisable when ``std::move()`` is not used.
An alternative style is to not use ``const`` but adopt an `AAA-style
<aaa>`_ (*Almost Always use Auto*).  This way, it is easy to look for
mutations by looking for lines that contain ``=`` but no ``auto``.
Remember that when using our immutable containers ``operator=`` is the
only way to mutate a variable.

.. _aaa: https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/

.. admonition:: Why does ``const`` prevent move semantics?

   For those adventurous into the grainy details C++, here is why.
   ``std::move()`` does not move anything, it is just *a cast* from
   normal *l-value* references (``T&``) to *r-value* reference
   (``T&&``).  This is, you pass it a variable, and it returns a
   reference to its object disguised as an intermediate result.  In
   exchange, you promise not to do anything with this variable later
   [#f1]_. It is the role of the thing that *receives the moved-from
   value* (in the previous example, ``push_back``) to actually do
   anything interesting with it --- for example, steal its contents
   😈.

   So if you pass a ``T&`` to ``std::move()`` you get a ``T&&`` and,
   unsurprisingly, if you pass a ``const T&`` you get a ``const T&&``.
   But the receivers of the moved-from value (like constructors or our
   ``push_back()``) maybe be moved-into because they provide an
   overload that expects ``T&&`` --- without the ``const``!  Since a
   ``const T&&`` can not be converted into a ``T&&``, the compiler
   looks up for you another viable overload, and most often finds a
   copy constructor or something alike that expects a ``const T&`` or
   just ``T``, to which a ``const T&&`` can be converted.  The code
   compiles and works correctly, but it is less efficient than we
   expected.  Our call to ``std::move()`` was fruitless.

   .. [#f1] For the sake of completeness: it is actually allowed to do stuff
            with the variable *after another value is assigned to it*.

.. _move-semantics:

Leveraging move semantics
-------------------------

When using :ref:`reference counting<rc>` (which is the default)
mutating operations can often be faster when operating on *r-value
references* (temporaries and moved-from values).  Note that this
removes *persistence*, since one can not access the moved-from value
anymore!  However, this may be a good idea when doing a chain of
operations where the intermediate values are not important to us.

For example, let's say we want to write a function that inserts all
integers in the range :math:`[first, last)` into an immutable vector.
From the point of view of the caller of the function, this function is
a *transaction*.  Whatever intermediate vectors are generated inside
of it can be discarded since the caller can only see the initial
vector (the one passed in as argument) and the vector with *all* the
elements.  We may write such function like this:

.. literalinclude:: ../example/vector/iota-move.cpp
   :language: c++
   :start-after: myiota/start
   :end-before:  myiota/end

The intermediate values are *moved* into the next ``push_back()``
call.  They are going to be discarded anyways, this little
``std::move`` just makes the whole thing faster, letting ``push_back``
mutate part of the internal data structure in place when possible.

If you don't like this syntax, :doc:`transients<transients>` may be
used to obtain similar performance benefits.

.. admonition:: Assigment guarantees

   From the language point of view, the only requirement on moved from
   values is that they should still be destructible.  We provide the
   following two additional guarantees:

   - **It is valid to assign to a moved-from variable**.  The variable
     gets the assigned value and becomes usable again.  This is the
     behaviour of standard types.

   - **It is valid to assign a moved-from variable to itself**.  For
     most standard types this is *undefined behaviour*.  However, for our
     immutable containers types, expressions of the form ``v =
     std::move(v)`` are well-defined.

Recursive types
---------------

Most containers will fail to be instantiated with a type of unknown
size, this is, an *incomplete type*.  This prevents using them for
building recursive types.  The following code fails to compile:

.. code-block:: c++

  struct my_type
  {
      int data;
      immer::vector<my_type> children;
  };

However, we can easily workaround this by using an ``immer::box`` to wrap
the elements in the vector, as in:

.. code-block:: c++

  struct my_type
  {
      int data;
      immer::vector<immer::box<my_type>> children;
  };

.. admonition:: Standard containers and incomplete types

  While the first example might seem to compile when using some
  implementations of ``std::vector`` instead of ``immer::vector``, such
  use is actually forbidden by the standard:

    **17.6.4.8** *Other functions (...)* 2. the effects are undefined in
    the following cases: (...) In particular---if an incomplete type (3.9)
    is used as a template argument when instantiating a template
    component, unless specifically allowed for that component.

.. _batch-update:
Efficient batch manipulations
-----------------------------

Sometimes you may write a function that needs to do multiple changes
to a container.  Like most code you write with this library, this
function is *pure*: it takes one container value in, and produces a
new container value out, no side-effects.

Let's say we want to write a function that inserts all integers in the
range :math:`[first, last)` into an immutable vector:

.. literalinclude:: ../example/vector/iota-slow.cpp
   :language: c++
   :start-after: include:myiota/start
   :end-before:  include:myiota/end

This function works as expected, but it is slower than necessary.
On every loop iteration, a new value is produced, just to be
forgotten in the next iteration.

Instead, we can grab a mutable view on the value, a :ref:`transient`.
Then, we manipulate it *in-place*.  When we are done with it, we
extract back an immutable value from it.  The code now looks like
this:

.. _iota-transient:

.. literalinclude:: ../example/vector/iota-transient.cpp
   :language: c++
   :start-after: include:myiota/start
   :end-before:  include:myiota/end

Both conversions are :math:`O(1)`.  Note that calling ``transient()``
does not break the immutability of the variable it is called on.  The
new mutable object will adopt its contents, but when a mutation is
performed, it will copy the data necessary using *copy on write*.
Subsequent manipulations may hit parts that have already been copied,
and these changes are done in-place.  Because of this, it does not
make sense to use transients to do only one change.

.. tip::

   Note that :ref:`move semantics<move-semantics>` can be used instead to
   support a similar use-case.  However, transients optimise updates
   even when reference counting is disabled.

.. _std-compat:
Standard library compatibility
------------------------------

While the immutable containers provide an interface that follows a
functional style, this is incompatible with what the standard library
algorithms sometimes expect. :ref:`transients` try to provide an
interface as similar as possible to similar standard library
containers.  Thus, can also be used to interoperate with standard
library components.

For example the :ref:`myiota() function above<iota-transient>` may as
well be written using standard library tools:

.. literalinclude:: ../example/vector/iota-transient-std.cpp
   :language: c++
   :start-after: include:myiota/start
   :end-before:  include:myiota/end