Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
The pid field can be -1 if forking the substituter process failed.
|
|
|
|
Fixes #269.
|
|
|
|
|
|
|
|
This causes nix-copy-closure to show what it's doing before rather
than after.
|
|
C++11 lambdas ftw.
|
|
|
|
This automatically creates /nix/var/nix/profiles/per-user and sets the
permissions/ownership on /nix/store to 1775 and root:nixbld.
|
|
Otherwise you just get ‘expected string `Derive(['’ which isn't very helpful.
|
|
Not bind-mounting the /dev from the host also solves the problem with
/dev/shm being a symlink to something not in the chroot.
|
|
This will allow Hydra to detect that a build should not be marked as
"permanently failed", allowing it to be retried later.
|
|
Namely:
nix-store: derivations.cc:242: nix::Hash nix::hashDerivationModulo(nix::StoreAPI&, nix::Derivation): Assertion `store.isValidPath(i->first)' failed.
This happened because of the derivation output correctness check being
applied before the references of a derivation are valid.
|
|
Previously we would say "error: setting synchronous mode: unable to
open database file" which isn't very helpful.
|
|
This should fix building on Illumos.
|
|
AFAIK, nobody uses it, it's not maintained, and it has no tests.
|
|
To deal with SQLITE_PROTOCOL, we also need to retry read-only
operations.
|
|
Registering the path as failed can fail if another process does the
same thing after the call to hasPathFailed(). This is extremely
unlikely though.
|
|
|
|
There is no risk of getting an inconsistent result here: if the ID
returned by queryValidPathId() is deleted from the database
concurrently, subsequent queries involving that ID will simply fail
(since IDs are never reused).
|
|
|
|
In the Hydra build farm we fairly regularly get SQLITE_PROTOCOL errors
(e.g., "querying path in database: locking protocol"). The docs for
this error code say that it "is returned if some other process is
messing with file locks and has violated the file locking protocol
that SQLite uses on its rollback journal files." However, the SQLite
source code reveals that this error can also occur under high load:
if( cnt>5 ){
int nDelay = 1; /* Pause time in microseconds */
if( cnt>100 ){
VVA_ONLY( pWal->lockError = 1; )
return SQLITE_PROTOCOL;
}
if( cnt>=10 ) nDelay = (cnt-9)*238; /* Max delay 21ms. Total delay 996ms */
sqlite3OsSleep(pWal->pVfs, nDelay);
}
i.e. if certain locks cannot be not acquired, SQLite will retry a
number of times before giving up and returing SQLITE_PROTOCOL. The
comments say:
Circumstances that cause a RETRY should only last for the briefest
instances of time. No I/O or other system calls are done while the
locks are held, so the locks should not be held for very long. But
if we are unlucky, another process that is holding a lock might get
paged out or take a page-fault that is time-consuming to resolve,
during the few nanoseconds that it is holding the lock. In that case,
it might take longer than normal for the lock to free.
...
The total delay time before giving up is less than 1 second.
On a heavily loaded machine like lucifer (the main Hydra server),
which often has dozens of processes waiting for I/O, it seems to me
that a page fault could easily take more than a second to resolve.
So, let's treat SQLITE_PROTOCOL as SQLITE_BUSY and retry the
transaction.
Issue NixOS/hydra#14.
|
|
|
|
On a system with multiple CPUs, running Nix operations through the
daemon is significantly slower than "direct" mode:
$ NIX_REMOTE= nix-instantiate '<nixos>' -A system
real 0m0.974s
user 0m0.875s
sys 0m0.088s
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m2.118s
user 0m1.463s
sys 0m0.218s
The main reason seems to be that the client and the worker get moved
to a different CPU after every call to the worker. This patch adds a
hack to lock them to the same CPU. With this, the overhead of going
through the daemon is very small:
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m1.074s
user 0m0.809s
sys 0m0.098s
|
|
Common operations like instantiating a NixOS system config no longer
fitted in 8192 pages, leading to more fsyncs. So increase this limit.
|
|
For instance, it's pointless to keep copy-from-other-stores running if
there are no other stores, or download-using-manifests if there are no
manifests. This also speeds things up because we don't send queries
to those substituters.
|
|
|
|
Otherwise subsequent invocations of "--repair" will keep rebuilding
the path. This only happens if the path content differs between
builds (e.g. due to timestamps).
|
|
|
|
|
|
This greatly reduces the number of system calls.
|
|
|
|
It is surprisingly impossible to check if a mountpoint is a bind mount
on Linux, and in my previous commit I forgot to check if /nix/store was
even a mountpoint at all. statvfs.f_flag is not populated with MS_BIND
(and even if it were, my check was wrong in the previous commit).
Luckily, the semantics of mount with MS_REMOUNT | MS_BIND make both
checks unnecessary: if /nix/store is not a mountpoint, then mount will
fail with EINVAL, and if /nix/store is not a bind-mount, then it will
not be made writable. Thus, if /nix/store is not a mountpoint, we fail
immediately (since we don't know how to make it writable), and if
/nix/store IS a mountpoint but not a bind-mount, we fail at first write
(see below for why we can't check and fail immediately).
Note that, due to what is IMO buggy behavior in Linux, calling mount
with MS_REMOUNT | MS_BIND on a non-bind readonly mount makes the
mountpoint appear writable in two places: In the sixth (but not the
10th!) column of mountinfo, and in the f_flags member of struct statfs.
All other syscalls behave as if the mount point were still readonly (at
least for Linux 3.9-rc1, but I don't think this has changed recently or
is expected to soon). My preferred semantics would be for MS_REMOUNT |
MS_BIND to fail on a non-bind mount, as it doesn't make sense to remount
a non bind-mount as a bind mount.
|
|
if /nix/store is a read-only bind mount
/nix/store could be a read-only bind mount even if it is / in its own filesystem, so checking the 4th field in mountinfo is insufficient.
Signed-off-by: Shea Levy <shea@shealevy.com>
|
|
This reverts commit 28bba8c44f484eae38e8a15dcec73cfa999156f6.
|
|
|
|
|
|
Now it's really brown paper bag time...
|
|
|
|
Also, change the file mode before changing the owner. This prevents a
slight time window in which a setuid binary would be setuid root.
|
|
It turns out that in multi-user Nix, a builder may be able to do
ln /etc/shadow $out/foo
Afterwards, canonicalisePathMetaData() will be applied to $out/foo,
causing /etc/shadow's mode to be set to 444 (readable by everybody but
writable by nobody). That's obviously Very Bad.
Fortunately, this fails in NixOS's default configuration because
/nix/store is a bind mount, so "ln" will fail with "Invalid
cross-device link". It also fails if hard-link restrictions are
enabled, so a workaround is:
echo 1 > /proc/sys/fs/protected_hardlinks
The solution is to check that all files in $out are owned by the build
user. This means that innocuous operations like "ln
${pkgs.foo}/some-file $out/" are now rejected, but that already failed
in chroot builds anyway.
|
|
No need to get annoying.
|
|
|
|
Doing this once makes subsequent operations like garbage collecting
more efficient since we don't have to call makeMutable() first.
|
|
If all contending processes wait a fixed amount of time (100 ms),
there is a good probability that they'll just collide again.
|
|
Hopefully this reduces the chance of hitting ‘unable to fork: Cannot
allocate memory’ errors. vfork() is used for everything except
starting builders.
|
|
They are unnecessary because we set the close-on-exec flag.
|
|
|