diff options
Diffstat (limited to 'users/tazjin/blog/posts/nixery-layers.md')
-rw-r--r-- | users/tazjin/blog/posts/nixery-layers.md | 272 |
1 files changed, 272 insertions, 0 deletions
diff --git a/users/tazjin/blog/posts/nixery-layers.md b/users/tazjin/blog/posts/nixery-layers.md new file mode 100644 index 000000000000..26526d11b5dc --- /dev/null +++ b/users/tazjin/blog/posts/nixery-layers.md @@ -0,0 +1,272 @@ +TIP: This blog post was originally published as a design document for +[Nixery][] and is not written in the same style +as other blog posts. + +Thanks to my colleagues at Google and various people from the Nix community for +reviewing this. + +------ + +# Nixery: Improved Layering + +**Authors**: tazjin@ + +**Reviewers**: so...@, en...@, pe...@ + +**Status**: Implemented + +**Last Updated**: 2019-08-10 + +## Introduction + +This document describes a design for an improved image layering method for use +in Nixery. The algorithm [currently used][grhmc] is designed for a slightly +different use-case and we can improve upon it by making use of more of the +available data. + +## Background / Motivation + +Nixery is a service that uses the [Nix package manager][nix] to build container +images (for runtimes such as Docker), that are served on-demand via the +container [registry protocols][]. A demo instance is available at +[nixery.dev][]. + +In practice this means users can simply issue a command such as `docker pull +nixery.dev/shell/git` and receive an image that was built ad-hoc containing a +shell environment and git. + +One of the major advantages of building container images via Nix (as described +for `buildLayeredImage` in [this blog post][grhmc]) is that the +content-addressable nature of container image layers can be used to provide more +efficient caching characteristics (caching based on layer content) than what is +common with Dockerfiles and other image creation methods (caching based on layer +creation method). + +However, this is constrained by the maximum number of layers supported in an +image (125). A naive approach such as putting each included package (any +library, binary, etc.) in its own layer quickly runs into this limitation due to +the large number of dependencies more complex systems tend to have. In addition, +users wanting to extend images created by Nixery (e.g. via `FROM nixery.dev/…`) +share this layer maximum with the created image - limiting extensibility if all +layers are used up by Nixery. + +In theory the layering strategy of `buildLayeredImage` should already provide +good caching characteristics, but in practice we are seeing many images with +significantly more packages than the number of layers configured, leading to +more frequent cache-misses than desired. + +The current implementation of `buildLayeredImage` inspects a graph of image +dependencies and determines the total number of references (direct & indirect) +to any node in the graph. It then sorts all dependencies by this popularity +metric and puts the first `n - 2` (for `n` being the maximum number of layers) +packages in their own layers, all remaining packages in one layer and the image +configuration in the final layer. + +## Design / Proposal + +## (Close-to) ideal layer-layout using more data + +We start out by considering what a close to ideal layout of layers would look +like for a simple use-case. + +![Ideal layout](/static/img/nixery/ideal_layout.webp) + +In this example, counting the total number of references to each node in the +graph yields the following result: + +| pkg | refs | +|-------|------| +| E | 3 | +| D | 2 | +| F | 2 | +| A,B,C | 1 | + +Assuming we are constrained to 4 layers, the current algorithm would yield these layers: + +``` +L1: E +L2: D +L3: F +L4: A, B, C +``` + +The initial proposal for this design is that additional data should be +considered in addition to the total number of references, in particular a +distinction should be made between direct and indirect references. Packages that +are only referenced indirectly should be merged with their parents. + +This yields the following table: + +| pkg | direct | indirect | +|-------|--------|----------| +| E | 3 | 3 | +| D | 2 | 2 | +| F | *1* | 2 | +| A,B,C | 1 | 1 | + +Despite having two indirect references, F is in fact only being referred to +once. Assuming that we have no other data available outside of this graph, we +have no reason to assume that F has any popularity outside of the scope of D. +This might yield the following layers: + +``` +L1: E +L2: D, F +L3: A +L4: B, C +``` + +D and F were grouped, while the top-level references (i.e. the packages +explicitly requested by the user) were split up. + +An assumption is introduced here to justify this split: The top-level packages +is what the user is modifying directly, and those groupings are likely +unpredictable. Thus it is opportune to not group top-level packages in the same +layer. + +This raises a new question: Can we make better decisions about where to split +the top-level? + +## (Even closer to) ideal layering using (even) more data + +So far when deciding layer layouts, only information immediately available in +the build graph of the image has been considered. We do however have much more +information available, as we have both the entire nixpkgs-tree and potentially +other information (such as download statistics). + +We can calculate the total number of references to any derivation in nixpkgs and +use that to rank the popularity of each package. Packages within some percentile +can then be singled out as good candidates for a separate layer. + +When faced with a splitting decision such as in the last section, this data can +aid the decision. Assume for example that package B in the above is actually +`openssl`, which is a very popular package. Taking this into account would +instead yield the following layers: + +``` +L1: E, +L2: D, F +L3: B, +L4: A, C +``` + +## Layer budgets and download size considerations + +As described in the introduction, there is a finite amount of layers available +for each image (the “layer budget”). When calculating the layer distribution, we +might end up with the “ideal” list of layers that we would like to create. Using +our previous example: + +``` +L1: E, +L2: D, F +L3: A +L4: B +L5: C +``` + +If we only have a layer budget of 4 available, something needs to be merged into +the same layer. To make a decision here we could consider only the package +popularity, but there is in fact another piece of information that has not come +up yet: The actual size of the package. + +Presumably a user would not mind downloading a library that is a few kilobytes +in size repeatedly, but they would if it was a 200 megabyte binary instead. + +Conversely if a large binary was successfully cached, but an extremely popular +small library is not, the total download size might also grow to irritating +levels. + +To avoid this we can calculate a merge rating: + + merge_rating(pkg) = popularity_percentile(pkg) × size(pkg.subtree) + +Packages with a low merge rating would be merged together before packages with +higher merge ratings. + +## Implementation + +There are two primary components of the implementation: + +1. The layering component which, given an image specification, decides the image + layers. + +2. The popularity component which, given the entire nixpkgs-tree, calculates the + popularity of packages. + +## Layering component + +It turns out that graph theory’s concept of [dominator trees][] maps reasonably +well onto the proposed idea of separating direct and indirect dependencies. This +becomes visible when creating the dominator tree of a simple example: + +![Example without extra edges](/static/img/nixery/example_plain.webp) + +Before calculating the dominator tree, we inspect each node and insert extra +edges from the root for packages that match a certain popularity or size +threshold. In this example, G is popular and an extra edge is inserted: + +![Example with extra edges](/static/img/nixery/example_extra.webp) + +Calculating the dominator tree of this graph now yields our ideal layer +distribution: + +![Dominator tree of example](/static/img/nixery/dominator.webp) + +The nodes immediately dominated by the root node can now be “harvested” as image +layers, and merging can be performed as described above until the result fits +into the layer budget. + +To implement this, the layering component uses the [gonum/graph][] library which +supports calculating dominator trees. The program is fed with Nix’s +`exportReferencesGraph` (which contains the runtime dependency graph and runtime +closure size) as well as the popularity data and layer budget. It returns a list +of layers, each specifying the paths it should contain. + +Nix invokes this program and uses the output to create a derivation for each +layer, which is then built and returned to Nixery as usual. + +TIP: This is implemented in [`layers.go`][layers.go] in Nixery. The file starts +with an explanatory comment that talks through the process in detail. + +## Popularity component + +The primary issue in calculating the popularity of each package in the tree is +that we are interested in the runtime dependencies of a derivation, not its +build dependencies. + +To access information about the runtime dependency, the derivation actually +needs to be built by Nix - it can not be inferred because Nix does not know +which store paths will still be referenced by the build output. + +However for packages that are cached in the NixOS cache, we can simply inspect +the `narinfo`-files and use those to determine popularity. + +Not every package in nixpkgs is cached, but we can expect all *popular* packages +to be cached. Relying on the cache should therefore be reasonable and avoids us +having to rebuild/download all packages. + +The implementation will read the `narinfo` for each store path in the cache at a +given commit and create a JSON-file containing the total reference count per +package. + +For the public Nixery instance, these popularity files will be distributed via a +GCS bucket. + +TIP: This is implemented in [popcount][] in Nixery. + +-------- + +Hopefully this detailed design review was useful to you. You can also watch [my +NixCon talk][talk] about Nixery for a review of some of this, and some demos. + +[Nixery]: https://cs.tvl.fyi/depot/-/tree/tools/nixery +[grhmc]: https://grahamc.com/blog/nix-and-layered-docker-images +[Nix]: https://nixos.org/nix +[registry protocols]: https://github.com/opencontainers/distribution-spec/blob/master/spec.md +[nixery.dev]: https://nixery.dev +[dominator trees]: https://en.wikipedia.org/wiki/Dominator_(graph_theory) +[gonum/graph]: https://godoc.org/gonum.org/v1/gonum/graph +[layers.go]: https://cs.tvl.fyi/depot/-/blob/tools/nixery/layers/layers.go +[popcount]: https://cs.tvl.fyi/depot/-/tree/tools/nixery/popcount +[talk]: https://www.youtube.com/watch?v=pOI9H4oeXqA |