From bdeb6f406e7ea8faa496cedc627bc024014e14b6 Mon Sep 17 00:00:00 2001
From: sterni <sternenseemann@systemli.org>
Date: Wed, 17 Jan 2024 16:29:14 +0100
Subject: docs(tvix): initial notes on a possible generic Nix lang test suite
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This kind of collects points to consider which should hopefully help in
figuring out what such a lang test suite could or should look like
exactly—which is something I currently struggle somewhat.

Change-Id: If4f47546fe4b8046fb79718743fa9a72f9801876
Reviewed-on: https://cl.tvl.fyi/c/depot/+/10657
Reviewed-by: raitobezarius <tvl@lahfa.xyz>
Tested-by: BuildkiteCI
Reviewed-by: flokli <flokli@flokli.de>
Autosubmit: sterni <sternenseemann@systemli.org>
---
 tvix/nix-lang-test-suite/README.md | 140 +++++++++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 tvix/nix-lang-test-suite/README.md

(limited to 'tvix/nix-lang-test-suite/README.md')

diff --git a/tvix/nix-lang-test-suite/README.md b/tvix/nix-lang-test-suite/README.md
new file mode 100644
index 000000000000..68f87f20f58f
--- /dev/null
+++ b/tvix/nix-lang-test-suite/README.md
@@ -0,0 +1,140 @@
+# The Implementation Independent Nix Language Test Suite
+
+## Design Notes
+
+### Requirements
+
+- It should work with potentially any Nix implementation and with all serious
+  currently available ones (C++ Nix, hnix, Tvix, …). How much of it the
+  implementations pass, is of course an orthogonal question.
+- It should be easy to add test cases, independent of any specific
+  implementation.
+- It should be simple to ignore test cases and mark know failures
+  (similar to the notyetpassing mechanism in the Tvix test suite).
+
+### Test Case Types
+
+This is a summary of relevant kinds of test cases that can be found in the wild,
+usually testing some kind of concrete implementation, but also doubling up as a
+potential test case for _any_ Nix implementation. For the most part, this is the
+`lang` test suite of C++ Nix which is also used by Tvix and hnix.
+
+- **parse** test cases: Parsing the given expression should either *succeed* or
+  *fail*.
+
+  - C++ Nix doesn't have any expected output for the success cases while
+    `rnix-parser` checks them against its own textual AST representation.
+  - For the failure cases, `rnix-parser` and C++ Nix (as of recently) have
+    expected error messages/representations.
+
+  Both error and failure cases probably are hard to implement against expected
+  output/error messages for a generic test suite. Even if standardized error
+  codes are implemented (see below), it is doubtful whether it'd be useful
+  to have a dedicated code for every kind of parse/lex failure.
+- (strict) **eval** test cases: Evaluating the given expression should either
+  *fail* or *succeed* and yield a given result.
+
+  - **eval-okay** (success) tests currently require three things:
+
+    1. Successful evaluation after deeply forcing and printing the evaluation
+       result (i.e. `nix-instantiate --eval --strict`)
+    2. That the output matches an expected output exactly (string equality).
+       For this the output of `nix-instantiate(1)` is used, sometimes with
+       the addition of the `--xml --no-location` or `--json` flags.
+    3. Optionally, stderr may need to be equal to an expected string exactly
+       which would test e.g. `builtins.trace` messages or deprecation warnings
+       (C++ Nix).
+
+       This extra check is currently not supported by the Tvix test suite.
+
+  - **eval-fail** tests require that the given expression fails to evaluate. C++
+    Nix has recently started to also check the error messages via the stderr
+    mechanism described above. This is not supported by Tvix at the moment.
+- _lazy_ eval test cases: This is currently only supported by the `nix_oracle`
+  test suite in Tvix which compares the evaluation result of expressions to the
+  output of `nix-instantiate(1)` without `--strict`. By relying on the fact
+  that the resulting value is not forced deeply before printing, it can be
+  observed whether certain expressions are thunked or not.
+
+  This is somewhat fragile as permissible optimizations may prevent a thunk from
+  being created. However, this should not be an issue if the cases are chosen
+  carefully. Empirically, this test suite was useful for catching some instances
+  of overzealous evaluation early in development of Tvix.
+
+- **identity** test cases require that the given expression evaluates to a
+  value whose printed representation is the same (string equal to) the original
+  expression. Such test cases only exist in the Tvix test suite.
+
+  Of course only a limited number of expression satisfy this, but it is
+  useful for testing `nix-instantiate(1)` style value printing. Consequently,
+  it is kind of on the edge of what you can call a language test.
+
+### Extra Dependencies of Some Test Cases
+
+- **Filesystem**: Some test cases `import` other files or use `builtins.readFile`,
+  `builtins.readDir` and friends.
+- **Working and Home Directory**: Tests involving relative and home relative paths
+  need knowledge of the current and home directory to correctly interpret the output.
+  C++ Nix does a [search and replace on the test output for this purpose][cpp-nix-pwd-sed]
+- **Nix Store**: Some tests add files to the store, either via path interpolation,
+  `builtins.toFile` or `builtins.derivation`.
+
+  Additionally, it should be considered that Import-from-Derivation may be
+  interesting to test in the future. Currently, the Tvix and C++ Nix test
+  suites all pass with Import-from-Derivation disabled, i.e. a dummy store
+  implementation is enough.
+
+  Note that the absence of a store dependency ideally also influences the test
+  execution: In Tvix, for example, store independent tests can be executed
+  with a store backend that immediately errors out, verifying that the test
+  is, in fact, store independent.
+- **Environment**: The C++ Nix test suite sets a single environment variable,
+  `TEST_VAR=foo`. Additionally, `NIX_PATH` and `HOME` are sometimes set (the
+  latter is probably not a great idea, since it is not terribly reliable).
+- **Nix Path**: A predetermined Nix Path (via `NIX_PATH` and/or command line
+  arguments) needs to be set for some test cases.
+- **Nix flags**: Some tests need to have extra flags passed to `nix-instantiate(1)`
+  in order to work. This is done using a `.flags` file
+
+### Expected Output Considerations
+
+#### Success
+
+The expected output of `eval-okay` test cases (which are the majority of test
+cases) uses the standard strict output of `nix-instantiate(1)` in most cases
+which is nice to read and easy to work with. However, some more obscure aspects
+of this output inevitably leak into the test cases, namely the cycle detection
+and printing and (in the case of Tvix) the printing of thunks. Unfortunately,
+the output has been changed after Nix 2.3, bringing it closer to the output of
+`nix eval`, but in an inconsistent manner (e.g. `<CYCLE>` was changed to
+`«repeated»`, but `<LAMBDA>` remained). As a consequence, it is not always
+possible to write C++ Nix version independent test cases.
+
+It is unclear whether a satisfying solution (for a common test suite) can
+be achieved here as it has become a somewhat contentious [issue whether
+or not nix-instantiate should have a stable output](cpp-nix-attr-elision-printing-pr).
+
+A solution may be to use the XML output, specifically the `--xml --no-location`
+flags to `nix-instantiate(1)` for some of these instances. As it (hopefully)
+corresponds to `builtins.toXML`, there should be a greater incentive to keep it
+stable. It does support (only via `nix-instantiate(1)`, though) printing
+unevaluated thunks, but has no kind of cycle detection (which is fair enough for
+its intended purpose).
+
+#### Failure
+
+C++ Nix has recently (some time after Nix 2.3, probably much later actually)
+started checking error messages via expected stderr output. This naturally
+won't work for a implementation independent language test suite:
+
+- It is fine to have differing phrasing for error messages or localize them.
+- Printed error positions and stack traces may be slightly different depending
+  on implementation internals.
+- Formatting will almost certainly differ.
+
+Consequently, just checking for failure when running the test suite should be
+an option. Long term, it may be interesting to have standardized error codes
+and portable error code reporting.
+
+[cpp-nix-pwd-sed]: https://github.com/NixOS/nix/blob/2cb9c7c68102193e7d34fabe6102474fc7f98010/tests/functional/lang.sh#L109
+[cpp-nix-attr-elision-printing-pr]: https://github.com/NixOS/nix/pull/9606
-- 
cgit 1.4.1