about summary refs log tree commit diff
path: root/third_party/bazel/rules_haskell/debug/linking_utils/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'third_party/bazel/rules_haskell/debug/linking_utils/README.md')
-rw-r--r--third_party/bazel/rules_haskell/debug/linking_utils/README.md265
1 files changed, 265 insertions, 0 deletions
diff --git a/third_party/bazel/rules_haskell/debug/linking_utils/README.md b/third_party/bazel/rules_haskell/debug/linking_utils/README.md
new file mode 100644
index 0000000000..57384a27fe
--- /dev/null
+++ b/third_party/bazel/rules_haskell/debug/linking_utils/README.md
@@ -0,0 +1,265 @@
+# Debugging linking errors
+
+The usual utilities, like `nm`, `objdump`, and of course `ldd` (see
+[here](https://linux-audit.com/elf-binaries-on-linux-understanding-and-analysis/#tools-for-binary-analysis)
+for a good overview of existing tools) go a long way. Yet, when
+debugging non-trivial runtime linker failures one would oftentimes
+like to filter outputs programmatically, with more advanced query
+logic than just simple `grep` and `sed` expressions.
+
+This library provides a small set of utility subroutines. These can
+help debug complicated linker errors.
+
+The main function is `ldd(f, elf_path)`. It is in the same spirit
+as `ldd(1)`, but instead of a flat list of resolved libraries, it
+returns a tree of structured information.
+
+When we use the term `ldd` in the following document, it refers
+to the `ldd` function exported from [./ldd.py](./ldd.py).
+
+To query that tree, you pass it a function `f`, which is applied to
+each dependency recursively (transforming the tree from the bottom
+up).
+
+The following functions are exported alongside the `ldd` function.
+They can be passed to `ldd` and used as building blocks for insightful
+queries:
+
+- `identity`: don’t transform, output everything
+- `remove_matching_needed`: remove needed entries that match a regex
+- `remove_matching_runpaths`: remove runpaths that match a regex
+- `non_existing_runpaths`: return a list of runpaths that don’t exist
+  in the filesystem
+- `unused_runpaths`: return a list of runpaths that are listed in the
+  elf binary header, but no dependency was actually found in them
+- `collect_unused_runpaths`: give an overview of all unused runpaths
+
+Helpers:
+- `dict_remove_empty`: remove fields with empty lists/dicts from an output
+- `items`: `dict.iteritems()` for both python 2 and 3
+
+See the introductory tutorial below on how to use these functions.
+
+## Example usage
+
+### Setup
+
+If you have a bazel target which outputs a binary which you want to
+debug, the easiest way is to use `ldd_test`:
+
+```python
+load(
+    "//:debug/linking_utils/ldd_test.bzl",
+    "ldd_test",
+)
+
+ldd_test(
+    name = "test-ldd",
+    elf_binary = "//tests/binary-indirect-cbits",
+    current_workspace = None,
+    script = r'''
+YOUR SCRIPT HERE
+'''
+)
+```
+
+All exported functions from `ldd.py` are already in scope.
+See the [`BUILD`](./BUILD) file in this directory for an example.
+
+
+### Writing queries
+
+`ldd` takes a function that is applied to each layer of elf
+dependencies. This function is passed a set of structured data.
+This data is gathered by querying the elf binary with `objdump`
+and parsing the header fields of the dynamic section:
+
+```
+DependencyInfo :
+{ needed : dict(string, union(
+    LDD_MISSING, LDD_UNKNOWN,
+    {
+        # the needed dependency
+        item : a,
+        # where the dependency was found in
+        found_in : RunpathDir
+    }))
+# all runpath directories that were searched
+, runpath_dirs : [ RunpathDir ] }
+```
+
+The amount of data can get quite extensive for larger projects, so you
+need a way to filter it down to get to the bottom of our problem.
+
+If a transitive dependency cannot be found by the runtime linker, the
+binary cannot be started. `ldd` shows such a problem by setting
+the corresponding value in the `needed` dict to `LDD_MISSING`.
+To remove everything from the output but the missing dependency and
+the path to that dependency, you can write a filter like this:
+
+```python
+# `d` is the DependencyInfo dict from above
+def filter_down_to_missing(d):
+    res = {}
+
+    # items is a .iteritems() that works for py 2 and 3
+    for name, dep in items(d['needed']):
+        if dep == LDD_MISSING:
+            res[name] = LDD_MISSING
+        elif dep in LDD_ERRORS:
+            pass
+        else:
+            # dep['item'] contains the already converted info
+            # from the previous layer
+            res[name] = dep['item']
+
+    # dict_remove_empty removes all empty fields from the dict,
+    # otherwise your result contains a lot of {} in the values.
+    return dict_remove_empty(res)
+
+# To get human-readable output, we re-use python’s pretty printing
+# library. It’s only simple python values after all!
+import pprint
+pprint.pprint(
+  # actually parse the elf binary and apply only_missing on each layer
+  ldd(
+    filter_down_to_missing,
+    # the path to the elf binary you want to expect.
+    elf_binary_path
+  )
+)
+```
+
+Note that in the filter you only need to filter the data for the
+current executable, and add the info from previous layers (which are
+available in `d['item']`).
+
+The result might look something like:
+
+```python
+{'libfoo.so.5': {'libbar.so.1': {'libbaz.so.6': 'MISSING'}}}
+```
+
+or
+
+```python
+{}
+```
+
+if nothing is missing.
+
+Now, that is a similar output to what a tool like `lddtree(1)` could
+give you. But we don’t need to stop there because it’s trivial to
+augment your output with more information:
+
+
+```python
+def missing_with_runpath(d):
+  # our previous function can be re-used
+  missing = filter_down_to_missing(d)
+
+  # only display runpaths if there are missing deps
+  runpaths = [] if missing is {} else d['runpath_dirs']
+
+  # dict_remove_empty keeps the output clean
+  return dict_remove_empty({
+    'rpth': runpaths,
+    'miss': missing
+  })
+
+# same invocation, different function
+pprint.pprint(
+  ldd(
+    missing_with_runpath,
+    elf_binary_path
+  )
+)
+```
+
+which displays something like this for my example binary:
+
+```python
+{ 'miss': { 'libfoo.so.5': { 'miss': { 'libbar.so.1': { 'miss': { 'libbaz.so.6': 'MISSING'},
+                                                          'rpth': [ { 'absolute_path': '/home/philip/.cache/bazel/_bazel_philip/fd9fea5ad581ea59473dc1f9d6bce826/execroot/myproject/bazel-out/k8-fastbuild/bin/something/and/bazel-out/k8-fastbuild/bin/other/integrate',
+                                                                      'path': '$ORIGIN/../../../../../../bazel-out/k8-fastbuild/bin/other/integrate'}]}},
+                             'rpth': [ { 'absolute_path': '/nix/store/xdsjx0gba4id3yyqxv66bxnm2sqixkjj-glibc-2.27/lib',
+                                         'path': '/nix/store/xdsjx0gba4id3yyqxv66bxnm2sqixkjj-glibc-2.27/lib'},
+                                       { 'absolute_path': '/nix/store/x6inizi5ahlyhqxxwv1rvn05a25icarq-gcc-7.3.0-lib/lib',
+                                         'path': '/nix/store/x6inizi5ahlyhqxxwv1rvn05a25icarq-gcc-7.3.0-lib/lib'}]}},
+  'rpth': [ … lots more nix rpaths … ]}
+```
+
+That’s still a bit cluttered for my taste, so let’s filter out
+the `/nix/store` paths (which are mostly noise):
+
+```python
+import re
+nix_matcher = re.compile("/nix/store.*")
+
+def missing_with_runpath(d):
+  missing = filter_down_to_missing(d)
+
+  # this is one of the example functions provided by ldd.py
+  remove_matching_runpaths(d, nix_matcher)
+  # ^^^
+
+  runpaths = [] if missing is {} else d['runpath_dirs']
+
+  # dict_remove_empty keeps the output clean
+  return dict_remove_empty({
+    'rpth': runpaths,
+    'miss': missing
+  })
+```
+
+and we are down to:
+
+```python
+{ 'miss': { 'libfoo.so.5': { 'miss': { 'libbar.so.1': { 'miss': { 'libbaz.so.6': 'MISSING'},
+                                                          'rpth': [ { 'absolute_path': '/home/philip/.cache/bazel/_bazel_philip/fd9fea5ad581ea59473dc1f9d6bce826/execroot/myproject/bazel-out/k8-fastbuild/bin/something/and/bazel-out/k8-fastbuild/bin/other/integrate',
+                                                                      'path': '$ORIGIN/../../../../../../bazel-out/k8-fastbuild/bin/other/integrate'}]}}}
+```
+
+… which shows exactly the path that is missing the dependency we
+expect. But what has gone wrong? Does this path even exist? We can
+find out!
+
+```python
+import re
+nix_matcher = re.compile("/nix/store.*")
+
+def missing_with_runpath(d):
+  missing = filter_down_to_missing(d)
+  remove_matching_runpaths(d, nix_matcher)
+  runpaths = [] if missing is {} else d['runpath_dirs']
+
+  # returns a list of runpaths that don’t exist in the filesystem
+  doesnt_exist = non_existing_runpaths(d)
+  # ^^^
+
+  return dict_remove_empty({
+    'rpth': runpaths,
+    'miss': missing,
+    'doesnt_exist': doesnt_exist,
+  })
+```
+
+I amended the output by a list of runpaths which point to non-existing
+directories:
+
+```python
+{ 'miss': { 'libfoo.so.5': { 'miss': { 'libbar.so.1': { 'miss': { 'libbaz.so.6': 'MISSING'},
+                                                        'rpth': [ { 'absolute_path': '/home/philip/.cache/bazel/_bazel_philip/fd9fea5ad581ea59473dc1f9d6bce826/execroot/myproject/bazel-out/k8-fastbuild/bin/something/and/bazel-out/k8-fastbuild/bin/other/integrate',
+                                                                    'path': '$ORIGIN/../../../../../../bazel-out/k8-fastbuild/bin/other/integrate'}]
+                                                        'doesnt_exist': [ { 'absolute_path': '/home/philip/.cache/bazel/_bazel_philip/fd9fea5ad581ea59473dc1f9d6bce826/execroot/myproject/bazel-out/k8-fastbuild/bin/something/and/bazel-out/k8-fastbuild/bin/other/integrate',
+                                                                            'path': '$ORIGIN/../../../../../../bazel-out/k8-fastbuild/bin/other/integrate'}]}}}
+```
+
+Suddenly it’s perfectly clear where the problem lies,
+`$ORIGIN/../../../../../../bazel-out/k8-fastbuild/bin/other/integrate`
+points to a path that does not exist.
+
+Any data query you’d like to do is possible, as long as it uses
+the data provided by the `ldd` function. See the lower part of
+`ldd.py` for more examples.
+