Diffstat (limited to 'third_party/git/Documentation/technical')
52 files changed, 0 insertions, 10542 deletions
diff --git a/third_party/git/Documentation/technical/.gitignore b/third_party/git/Documentation/technical/.gitignore deleted file mode 100644 index 8aa891daee..0000000000 --- a/third_party/git/Documentation/technical/.gitignore +++ /dev/null @@ -1 +0,0 @@ -api-index.txt diff --git a/third_party/git/Documentation/technical/api-allocation-growing.txt b/third_party/git/Documentation/technical/api-allocation-growing.txt deleted file mode 100644 index 5a59b54844..0000000000 --- a/third_party/git/Documentation/technical/api-allocation-growing.txt +++ /dev/null @@ -1,39 +0,0 @@ -allocation growing API -====================== - -Dynamically growing an array using realloc() is error prone and boring. - -Define your array with: - -* a pointer (`item`) that points at the array, initialized to `NULL` - (although please name the variable based on its contents, not on its - type); - -* an integer variable (`alloc`) that keeps track of how big the current - allocation is, initialized to `0`; - -* another integer variable (`nr`) to keep track of how many elements the - array currently has, initialized to `0`. - -Then, before adding the `n`th element to the array, call `ALLOC_GROW(item, n, -alloc)`. This ensures that the array can hold at least `n` elements by -calling `realloc(3)` and adjusting the `alloc` variable. - ------------- -sometype *item; -size_t nr; -size_t alloc; - -for (i = 0; i < nr; i++) - if (we like item[i] already) - return; - -/* we did not like any existing one, so add one */ -ALLOC_GROW(item, nr + 1, alloc); -item[nr++] = value you like; ------------- - -You are responsible for updating the `nr` variable. - -If you need to specify the number of elements to allocate explicitly -then use the macro `REALLOC_ARRAY(item, alloc)` instead of `ALLOC_GROW`. 
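The `item`/`nr`/`alloc` pattern described in the removed allocation-growing document can be tried outside Git with a small stand-in. The sketch below is illustrative only: `SKETCH_ALLOC_GROW`, `struct int_set` and `int_set_add` are hypothetical names, and the growth policy (double the allocation, starting at 16 slots) is an assumption rather than Git's actual policy.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for ALLOC_GROW, for illustration only.
 * Assumed growth policy: double the allocation (16 slots to start),
 * or jump straight to n if doubling is still too small.
 */
#define SKETCH_ALLOC_GROW(item, n, alloc) \
	do { \
		if ((size_t)(n) > (alloc)) { \
			size_t new_alloc_ = (alloc) ? (alloc) * 2 : 16; \
			void *tmp_; \
			if (new_alloc_ < (size_t)(n)) \
				new_alloc_ = (size_t)(n); \
			tmp_ = realloc((item), new_alloc_ * sizeof(*(item))); \
			if (!tmp_) { \
				perror("realloc"); \
				exit(1); \
			} \
			(item) = tmp_; \
			(alloc) = new_alloc_; \
		} \
	} while (0)

struct int_set {
	int *values;  /* the array, named for its contents; starts as NULL */
	size_t nr;    /* number of elements currently in use */
	size_t alloc; /* number of elements currently allocated */
};

/* Append value unless it is already present; returns 1 if added, 0 if not. */
static int int_set_add(struct int_set *set, int value)
{
	size_t i;

	for (i = 0; i < set->nr; i++)
		if (set->values[i] == value)
			return 0;

	/* make room for one more element, then update nr ourselves */
	SKETCH_ALLOC_GROW(set->values, set->nr + 1, set->alloc);
	set->values[set->nr++] = value;
	return 1;
}
```

A caller would start from `struct int_set s = { NULL, 0, 0 };`, call `int_set_add(&s, value)` as often as needed, and `free(s.values)` at the end. As in the text, the macro only guarantees capacity; incrementing `nr` stays the caller's job.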
diff --git a/third_party/git/Documentation/technical/api-argv-array.txt b/third_party/git/Documentation/technical/api-argv-array.txt deleted file mode 100644 index 870c8edbfb..0000000000 --- a/third_party/git/Documentation/technical/api-argv-array.txt +++ /dev/null @@ -1,65 +0,0 @@ -argv-array API -============== - -The argv-array API allows one to dynamically build and store -NULL-terminated lists. An argv-array maintains the invariant that the -`argv` member always points to a non-NULL array, and that the array is -always NULL-terminated at the element pointed to by `argv[argc]`. This -makes the result suitable for passing to functions expecting to receive -argv from main(), or the link:api-run-command.html[run-command API]. - -The string-list API (documented in string-list.h) is similar, but cannot be -used for these purposes; instead of storing a straight string pointer, -it contains an item structure with a `util` field that is not compatible -with the traditional argv interface. - -Each `argv_array` manages its own memory. Any strings pushed into the -array are duplicated, and all memory is freed by argv_array_clear(). - -Data Structures ---------------- - -`struct argv_array`:: - - A single array. This should be initialized by assignment from - `ARGV_ARRAY_INIT`, or by calling `argv_array_init`. The `argv` - member contains the actual array; the `argc` member contains the - number of elements in the array, not including the terminating - NULL. - -Functions ---------- - -`argv_array_init`:: - Initialize an array. This is no different than assigning from - `ARGV_ARRAY_INIT`. - -`argv_array_push`:: - Push a copy of a string onto the end of the array. - -`argv_array_pushl`:: - Push a list of strings onto the end of the array. The arguments - should be a list of `const char *` strings, terminated by a NULL - argument. - -`argv_array_pushf`:: - Format a string and push it onto the end of the array. 
This is a - convenience wrapper combining `strbuf_addf` and `argv_array_push`. - -`argv_array_pushv`:: - Push a null-terminated array of strings onto the end of the array. - -`argv_array_pop`:: - Remove the final element from the array. If there are no - elements in the array, do nothing. - -`argv_array_clear`:: - Free all memory associated with the array and return it to the - initial, empty state. - -`argv_array_detach`:: - Disconnect the `argv` member from the `argv_array` struct and - return it. The caller is responsible for freeing the memory used - by the array, and by the strings it references. After detaching, - the `argv_array` is in a reinitialized state and can be pushed - into again. diff --git a/third_party/git/Documentation/technical/api-config.txt b/third_party/git/Documentation/technical/api-config.txt deleted file mode 100644 index 7d20716c32..0000000000 --- a/third_party/git/Documentation/technical/api-config.txt +++ /dev/null @@ -1,319 +0,0 @@ -config API -========== - -The config API gives callers a way to access Git configuration files -(and files which have the same syntax). See linkgit:git-config[1] for a -discussion of the config file syntax. - -General Usage -------------- - -Config files are parsed linearly, and each variable found is passed to a -caller-provided callback function. The callback function is responsible -for any actions to be taken on the config option, and is free to ignore -some options. It is not uncommon for the configuration to be parsed -several times during the run of a Git program, with different callbacks -picking out different variables useful to themselves. - -A config callback function takes three parameters: - -- the name of the parsed variable. This is in canonical "flat" form: the - section, subsection, and variable segments will be separated by dots, - and the section and variable segments will be all lowercase. E.g., - `core.ignorecase`, `diff.SomeType.textconv`. 
- -- the value of the found variable, as a string. If the variable had no - value specified, the value will be NULL (typically this means it - should be interpreted as boolean true). - -- a void pointer passed in by the caller of the config API; this can - contain callback-specific data - -A config callback should return 0 for success, or -1 if the variable -could not be parsed properly. - -Basic Config Querying ---------------------- - -Most programs will simply want to look up variables in all config files -that Git knows about, using the normal precedence rules. To do this, -call `git_config` with a callback function and void data pointer. - -`git_config` will read all config sources in order of increasing -priority. Thus a callback should typically overwrite previously-seen -entries with new ones (e.g., if both the user-wide `~/.gitconfig` and -repo-specific `.git/config` contain `color.ui`, the config machinery -will first feed the user-wide one to the callback, and then the -repo-specific one; by overwriting, the higher-priority repo-specific -value is left at the end). - -The `config_with_options` function lets the caller examine config -while adjusting some of the default behavior of `git_config`. It should -almost never be used by "regular" Git code that is looking up -configuration variables. It is intended for advanced callers like -`git-config`, which are intentionally tweaking the normal config-lookup -process. It takes two extra parameters: - -`config_source`:: -If this parameter is non-NULL, it specifies the source to parse for -configuration, rather than looking in the usual files. See `struct -git_config_source` in `config.h` for details. Regular `git_config` defaults -to `NULL`. - -`opts`:: -Specify options to adjust the behavior of parsing config files. See `struct -config_options` in `config.h` for details. As an example: regular `git_config` -sets `opts.respect_includes` to `1` by default. 
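The "overwrite previously-seen entries" rule from the removed text can be illustrated without Git's headers. In this sketch the callback has the three-parameter shape described above, but `sketch_config_fn`, `struct color_settings`, `struct sketch_entry` and `sketch_feed` are hypothetical names invented for illustration; `sketch_feed` merely stands in for a config reader that delivers entries in order of increasing priority.

```c
#include <stdlib.h>
#include <string.h>

/* Callback shape from the text: variable name, value (may be NULL), caller data. */
typedef int (*sketch_config_fn)(const char *var, const char *value, void *data);

struct color_settings {
	char *ui; /* heap-allocated copy of color.ui, or NULL if never seen */
};

/* Overwrite on every match, so the last (highest-priority) source wins. */
static int color_ui_cb(const char *var, const char *value, void *data)
{
	struct color_settings *cs = data;
	const char *v = value ? value : "true"; /* no value: treat as boolean true */
	char *copy;

	if (strcmp(var, "color.ui"))
		return 0; /* a callback is free to ignore other variables */

	copy = malloc(strlen(v) + 1);
	if (!copy)
		return -1;
	strcpy(copy, v);

	free(cs->ui); /* drop the lower-priority value seen earlier */
	cs->ui = copy;
	return 0;
}

struct sketch_entry {
	const char *var;
	const char *value;
};

/* Hypothetical driver: feeds entries to the callback in increasing priority. */
static int sketch_feed(const struct sketch_entry *entries, size_t n,
		       sketch_config_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (fn(entries[i].var, entries[i].value, data) < 0)
			return -1;
	return 0;
}
```

Feeding a user-wide `color.ui` followed by a repo-specific one leaves the repo-specific value in `cs.ui`, which is the behavior the paragraph on `git_config` describes.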
- -Reading Specific Files ----------------------- - -To read a specific file in git-config format, use -`git_config_from_file`. This takes the same callback and data parameters -as `git_config`. - -Querying For Specific Variables -------------------------------- - -For programs wanting to query for specific variables in a non-callback -manner, the config API provides two functions `git_config_get_value` -and `git_config_get_value_multi`. They both read values from an internal -cache generated previously from reading the config files. - -`int git_config_get_value(const char *key, const char **value)`:: - - Finds the highest-priority value for the configuration variable `key`, - stores the pointer to it in `value` and returns 0. When the - configuration variable `key` is not found, returns 1 without touching - `value`. The caller should not free or modify `value`, as it is owned - by the cache. - -`const struct string_list *git_config_get_value_multi(const char *key)`:: - - Finds and returns the value list, sorted in order of increasing priority - for the configuration variable `key`. When the configuration variable - `key` is not found, returns NULL. The caller should not free or modify - the returned pointer, as it is owned by the cache. - -`void git_config_clear(void)`:: - - Resets and invalidates the config cache. - -The config API also provides type specific API functions which do conversion -as well as retrieval for the queried variable, including: - -`int git_config_get_int(const char *key, int *dest)`:: - - Finds and parses the value to an integer for the configuration variable - `key`. Dies on error; otherwise, stores the value of the parsed integer in - `dest` and returns 0. When the configuration variable `key` is not found, - returns 1 without touching `dest`. - -`int git_config_get_ulong(const char *key, unsigned long *dest)`:: - - Similar to `git_config_get_int` but for unsigned longs. 
- -`int git_config_get_bool(const char *key, int *dest)`:: - - Finds and parses the value into a boolean value, for the configuration - variable `key` respecting keywords like "true" and "false". Integer - values are converted into true/false values (when they are non-zero or - zero, respectively). Other values cause a die(). If parsing is successful, - stores the value of the parsed result in `dest` and returns 0. When the - configuration variable `key` is not found, returns 1 without touching - `dest`. - -`int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest)`:: - - Similar to `git_config_get_bool`, except that integers are copied as-is, - and `is_bool` flag is unset. - -`int git_config_get_maybe_bool(const char *key, int *dest)`:: - - Similar to `git_config_get_bool`, except that it returns -1 on error - rather than dying. - -`int git_config_get_string_const(const char *key, const char **dest)`:: - - Allocates and copies the retrieved string into the `dest` parameter for - the configuration variable `key`; if NULL string is given, prints an - error message and returns -1. When the configuration variable `key` is - not found, returns 1 without touching `dest`. - -`int git_config_get_string(const char *key, char **dest)`:: - - Similar to `git_config_get_string_const`, except that retrieved value - copied into the `dest` parameter is a mutable string. - -`int git_config_get_pathname(const char *key, const char **dest)`:: - - Similar to `git_config_get_string`, but expands `~` or `~user` into - the user's home directory when found at the beginning of the path. - -`git_die_config(const char *key, const char *err, ...)`:: - - First prints the error message specified by the caller in `err` and then - dies printing the line number and the file name of the highest priority - value for the configuration variable `key`. 
- -`void git_die_config_linenr(const char *key, const char *filename, int linenr)`:: - - Helper function which formats the die error message according to the - parameters entered. Used by `git_die_config()`. It can be used by callers - handling `git_config_get_value_multi()` to print the correct error message - for the desired value. - -See test-config.c for usage examples. - -Value Parsing Helpers ---------------------- - -To aid in parsing string values, the config API provides callbacks with -a number of helper functions, including: - -`git_config_int`:: -Parse the string to an integer, including unit factors. Dies on error; -otherwise, returns the parsed result. - -`git_config_ulong`:: -Identical to `git_config_int`, but for unsigned longs. - -`git_config_bool`:: -Parse a string into a boolean value, respecting keywords like "true" and -"false". Integer values are converted into true/false values (when they -are non-zero or zero, respectively). Other values cause a die(). If -parsing is successful, the return value is the result. - -`git_config_bool_or_int`:: -Same as `git_config_bool`, except that integers are returned as-is, and -an `is_bool` flag is unset. - -`git_parse_maybe_bool`:: -Same as `git_config_bool`, except that it returns -1 on error rather -than dying. - -`git_config_string`:: -Allocates and copies the value string into the `dest` parameter; if no -string is given, prints an error message and returns -1. - -`git_config_pathname`:: -Similar to `git_config_string`, but expands `~` or `~user` into the -user's home directory when found at the beginning of the path. - -Include Directives ------------------- - -By default, the config parser does not respect include directives. -However, a caller can use the special `git_config_include` wrapper -callback to support them. To do so, you simply wrap your "real" callback -function and data pointer in a `struct config_include_data`, and pass -the wrapper to the regular config-reading functions. 
For example: - -------------------------------------------- -int read_file_with_include(const char *file, config_fn_t fn, void *data) -{ - struct config_include_data inc = CONFIG_INCLUDE_INIT; - inc.fn = fn; - inc.data = data; - return git_config_from_file(git_config_include, file, &inc); -} -------------------------------------------- - -`git_config` respects includes automatically. The lower-level -`git_config_from_file` does not. - -Custom Configsets ------------------ - -A `config_set` can be used to construct an in-memory cache for -config-like files that the caller specifies (e.g., files like `.gitmodules`, -`~/.gitconfig` etc.). For example, - ----------------------------------------- -struct config_set gm_config; -git_configset_init(&gm_config); -int b; -/* we add config files to the config_set */ -git_configset_add_file(&gm_config, ".gitmodules"); -git_configset_add_file(&gm_config, ".gitmodules_alt"); - -if (!git_configset_get_bool(&gm_config, "submodule.frotz.ignore", &b)) { - /* hack hack hack */ -} - -/* when we are done with the configset */ -git_configset_clear(&gm_config); ----------------------------------------- - -The configset API provides functions for the above-mentioned workflow, including: - -`void git_configset_init(struct config_set *cs)`:: - - Initializes the config_set `cs`. - -`int git_configset_add_file(struct config_set *cs, const char *filename)`:: - - Parses the file and adds the variable-value pairs to the `config_set`; - dies if there is an error in parsing the file. Returns 0 on success, or - -1 if the file does not exist or is inaccessible. The caller has to decide - whether to free the incomplete configset or continue using it when - the function returns -1. - -`int git_configset_get_value(struct config_set *cs, const char *key, const char **value)`:: - - Finds the highest-priority value for the configuration variable `key` - and config set `cs`, stores the pointer to it in `value` and returns 0. 
- When the configuration variable `key` is not found, returns 1 without - touching `value`. The caller should not free or modify `value`, as it - is owned by the cache. - -`const struct string_list *git_configset_get_value_multi(struct config_set *cs, const char *key)`:: - - Finds and returns the value list, sorted in order of increasing priority - for the configuration variable `key` and config set `cs`. When the - configuration variable `key` is not found, returns NULL. The caller - should not free or modify the returned pointer, as it is owned by the cache. - -`void git_configset_clear(struct config_set *cs)`:: - - Clears `config_set` structure, removes all saved variable-value pairs. - -In addition to above functions, the `config_set` API provides type specific -functions in the vein of `git_config_get_int` and family but with an extra -parameter, pointer to struct `config_set`. -They all behave similarly to the `git_config_get*()` family described in -"Querying For Specific Variables" above. - -Writing Config Files --------------------- - -Git gives multiple entry points in the Config API to write config values to -files namely `git_config_set_in_file` and `git_config_set`, which write to -a specific config file or to `.git/config` respectively. They both take a -key/value pair as parameter. -In the end they both call `git_config_set_multivar_in_file` which takes four -parameters: - -- the name of the file, as a string, to which key/value pairs will be written. - -- the name of key, as a string. This is in canonical "flat" form: the section, - subsection, and variable segments will be separated by dots, and the section - and variable segments will be all lowercase. - E.g., `core.ignorecase`, `diff.SomeType.textconv`. - -- the value of the variable, as a string. If value is equal to NULL, it will - remove the matching key from the config file. - -- the value regex, as a string. It will disregard key/value pairs where value - does not match. 
- -- a multi_replace value, as an int. If value is equal to zero, nothing or only - one matching key/value is replaced, else all matching key/values (regardless - how many) are removed, before the new pair is written. - -It returns 0 on success. - -Also, there are functions `git_config_rename_section` and -`git_config_rename_section_in_file` with parameters `old_name` and `new_name` -for renaming or removing sections in the config files. If NULL is passed -through `new_name` parameter, the section will be removed from the config file. diff --git a/third_party/git/Documentation/technical/api-credentials.txt b/third_party/git/Documentation/technical/api-credentials.txt deleted file mode 100644 index 75368f26ca..0000000000 --- a/third_party/git/Documentation/technical/api-credentials.txt +++ /dev/null @@ -1,271 +0,0 @@ -credentials API -=============== - -The credentials API provides an abstracted way of gathering username and -password credentials from the user (even though credentials in the wider -world can take many forms, in this document the word "credential" always -refers to a username and password pair). - -This document describes two interfaces: the C API that the credential -subsystem provides to the rest of Git, and the protocol that Git uses to -communicate with system-specific "credential helpers". If you are -writing Git code that wants to look up or prompt for credentials, see -the section "C API" below. If you want to write your own helper, see -the section on "Credential Helpers" below. 
- -Typical setup -------------- - ------------- -+-----------------------+ -| Git code (C) |--- to server requiring ---> -| | authentication -|.......................| -| C credential API |--- prompt ---> User -+-----------------------+ - ^ | - | pipe | - | v -+-----------------------+ -| Git credential helper | -+-----------------------+ ------------- - -The Git code (typically a remote-helper) will call the C API to obtain -credential data like a login/password pair (credential_fill). The -API will itself call a remote helper (e.g. "git credential-cache" or -"git credential-store") that may retrieve credential data from a -store. If the credential helper cannot find the information, the C API -will prompt the user. Then, the caller of the API takes care of -contacting the server, and does the actual authentication. - -C API ------ - -The credential C API is meant to be called by Git code which needs to -acquire or store a credential. It is centered around an object -representing a single credential and provides three basic operations: -fill (acquire credentials by calling helpers and/or prompting the user), -approve (mark a credential as successfully used so that it can be stored -for later use), and reject (mark a credential as unsuccessful so that it -can be erased from any persistent storage). - -Data Structures -~~~~~~~~~~~~~~~ - -`struct credential`:: - - This struct represents a single username/password combination - along with any associated context. All string fields should be - heap-allocated (or NULL if they are not known or not applicable). - The meaning of the individual context fields is the same as - their counterparts in the helper protocol; see the section below - for a description of each field. -+ -The `helpers` member of the struct is a `string_list` of helpers. Each -string specifies an external helper which will be run, in order, to -either acquire or store credentials. See the section on credential -helpers below. 
This list is filled-in by the API functions -according to the corresponding configuration variables before -consulting helpers, so there usually is no need for a caller to -modify the helpers field at all. -+ -This struct should always be initialized with `CREDENTIAL_INIT` or -`credential_init`. - - -Functions -~~~~~~~~~ - -`credential_init`:: - - Initialize a credential structure, setting all fields to empty. - -`credential_clear`:: - - Free any resources associated with the credential structure, - returning it to a pristine initialized state. - -`credential_fill`:: - - Instruct the credential subsystem to fill the username and - password fields of the passed credential struct by first - consulting helpers, then asking the user. After this function - returns, the username and password fields of the credential are - guaranteed to be non-NULL. If an error occurs, the function will - die(). - -`credential_reject`:: - - Inform the credential subsystem that the provided credentials - have been rejected. This will cause the credential subsystem to - notify any helpers of the rejection (which allows them, for - example, to purge the invalid credentials from storage). It - will also free() the username and password fields of the - credential and set them to NULL (readying the credential for - another call to `credential_fill`). Any errors from helpers are - ignored. - -`credential_approve`:: - - Inform the credential subsystem that the provided credentials - were successfully used for authentication. This will cause the - credential subsystem to notify any helpers of the approval, so - that they may store the result to be used again. Any errors - from helpers are ignored. - -`credential_from_url`:: - - Parse a URL into broken-down credential fields. 
- -Example -~~~~~~~ - -The example below shows how the functions of the credential API could be -used to log in to a fictitious "foo" service on a remote host: - ------------------------------------------------------------------------ -int foo_login(struct foo_connection *f) -{ - int status; - /* - * Create a credential with some context; we don't yet know the - * username or password. - */ - - struct credential c = CREDENTIAL_INIT; - c.protocol = xstrdup("foo"); - c.host = xstrdup(f->hostname); - - /* - * Fill in the username and password fields by contacting - * helpers and/or asking the user. The function will die if it - * fails. - */ - credential_fill(&c); - - /* - * Otherwise, we have a username and password. Try to use it. - */ - status = send_foo_login(f, c.username, c.password); - switch (status) { - case FOO_OK: - /* It worked. Store the credential for later use. */ - credential_approve(&c); - break; - case FOO_BAD_LOGIN: - /* Erase the credential from storage so we don't try it - * again. */ - credential_reject(&c); - break; - default: - /* - * Some other error occurred. We don't know if the - * credential is good or bad, so report nothing to the - * credential subsystem. - */ - break; - } - - /* Free any associated resources. */ - credential_clear(&c); - - return status; -} ------------------------------------------------------------------------ - - -Credential Helpers ------------------- - -Credential helpers are programs executed by Git to fetch or save -credentials from and to long-term storage (where "long-term" is simply -longer than a single Git process; e.g., credentials may be stored -in-memory for a few minutes, or indefinitely on disk). - -Each helper is specified by a single string in the configuration -variable `credential.helper` (and others, see linkgit:git-config[1]). -The string is transformed by Git into a command to be executed using -these rules: - - 1. 
If the helper string begins with "!", it is considered a shell - snippet, and everything after the "!" becomes the command. - - 2. Otherwise, if the helper string begins with an absolute path, the - verbatim helper string becomes the command. - - 3. Otherwise, the string "git credential-" is prepended to the helper - string, and the result becomes the command. - -The resulting command then has an "operation" argument appended to it -(see below for details), and the result is executed by the shell. - -Here are some example specifications: - ----------------------------------------------------- -# run "git credential-foo" -foo - -# same as above, but pass an argument to the helper -foo --bar=baz - -# the arguments are parsed by the shell, so use shell -# quoting if necessary -foo --bar="whitespace arg" - -# you can also use an absolute path, which will not use the git wrapper -/path/to/my/helper --with-arguments - -# or you can specify your own shell snippet -!f() { echo "password=`cat $HOME/.secret`"; }; f ----------------------------------------------------- - -Generally speaking, rule (3) above is the simplest for users to specify. -Authors of credential helpers should make an effort to assist their -users by naming their program "git-credential-$NAME", and putting it in -the $PATH or $GIT_EXEC_PATH during installation, which will allow a user -to enable it with `git config credential.helper $NAME`. - -When a helper is executed, it will have one "operation" argument -appended to its command line, which is one of: - -`get`:: - - Return a matching credential, if any exists. - -`store`:: - - Store the credential, if applicable to the helper. - -`erase`:: - - Remove a matching credential, if any, from the helper's storage. - -The details of the credential will be provided on the helper's stdin -stream. 
The exact format is the same as the input/output format of the -`git credential` plumbing command (see the section `INPUT/OUTPUT -FORMAT` in linkgit:git-credential[1] for a detailed specification). - -For a `get` operation, the helper should produce a list of attributes -on stdout in the same format. A helper is free to produce a subset, or -even no values at all if it has nothing useful to provide. Any provided -attributes will overwrite those already known about by Git. If a helper -outputs a `quit` attribute with a value of `true` or `1`, no further -helpers will be consulted, nor will the user be prompted (if no -credential has been provided, the operation will then fail). - -For a `store` or `erase` operation, the helper's output is ignored. -If it fails to perform the requested operation, it may complain to -stderr to inform the user. If it does not support the requested -operation (e.g., a read-only store), it should silently ignore the -request. - -If a helper receives any other operation, it should silently ignore the -request. This leaves room for future operations to be added (older -helpers will just ignore the new requests). - -See also --------- - -linkgit:gitcredentials[7] - -linkgit:git-config[1] (See configuration variables `credential.*`) diff --git a/third_party/git/Documentation/technical/api-diff.txt b/third_party/git/Documentation/technical/api-diff.txt deleted file mode 100644 index 30fc0e9c93..0000000000 --- a/third_party/git/Documentation/technical/api-diff.txt +++ /dev/null @@ -1,174 +0,0 @@ -diff API -======== - -The diff API is for programs that compare two sets of files (e.g. two -trees, one tree and the index) and present the found difference in -various ways. The calling program is responsible for feeding the API -pairs of files, one from the "old" set and the corresponding one from -"new" set, that are different. The library called through this API is -called diffcore, and is responsible for two things. 
- -* finding total rewrites (`-B`), renames (`-M`) and copies (`-C`), and - changes that touch a string (`-S`), as specified by the caller. - -* outputting the differences in various formats, as specified by the - caller. - -Calling sequence ----------------- - -* Prepare `struct diff_options` to record the set of diff options, and - then call `repo_diff_setup()` to initialize this structure. This - sets up the vanilla default. - -* Fill in the options structure to specify desired output format, rename - detection, etc. `diff_opt_parse()` can be used to parse options given - from the command line in a way consistent with existing git-diff - family of programs. - -* Call `diff_setup_done()`; this inspects the options set up so far for - internal consistency and make necessary tweaking to it (e.g. if - textual patch output was asked, recursive behaviour is turned on); - the callback set_default in diff_options can be used to tweak this more. - -* As you find different pairs of files, call `diff_change()` to feed - modified files, `diff_addremove()` to feed created or deleted files, - or `diff_unmerge()` to feed a file whose state is 'unmerged' to the - API. These are thin wrappers to a lower-level `diff_queue()` function - that is flexible enough to record any of these kinds of changes. - -* Once you finish feeding the pairs of files, call `diffcore_std()`. - This will tell the diffcore library to go ahead and do its work. - -* Calling `diff_flush()` will produce the output. - - -Data structures ---------------- - -* `struct diff_filespec` - -This is the internal representation for a single file (blob). It -records the blob object name (if known -- for a work tree file it -typically is a NUL SHA-1), filemode and pathname. This is what the -`diff_addremove()`, `diff_change()` and `diff_unmerge()` synthesize and -feed `diff_queue()` function with. 
- -* `struct diff_filepair` - -This records a pair of `struct diff_filespec`; the filespec for a file -in the "old" set (i.e. preimage) is called `one`, and the filespec for a -file in the "new" set (i.e. postimage) is called `two`. A change that -represents file creation has NULL in `one`, and file deletion has NULL -in `two`. - -A `filepair` starts pointing at `one` and `two` that are from the same -filename, but `diffcore_std()` can break pairs and match component -filespecs with other filespecs from a different filepair to form new -filepairs. This is called 'rename detection'. - -* `struct diff_queue` - -This is a collection of filepairs. Notable members are: - -`queue`:: - - An array of pointers to `struct diff_filepair`. This - dynamically grows as you add filepairs; - -`alloc`:: - - The allocated size of the `queue` array; - -`nr`:: - - The number of elements in the `queue` array. - - -* `struct diff_options` - -This describes the set of options with which the calling program wants -to affect the operation of the diffcore library. - -Notable members are: - -`output_format`:: - The output format used when `diff_flush()` is run. - -`context`:: - Number of context lines to generate in patch output. - -`break_opt`, `detect_rename`, `rename_score`, `rename_limit`:: - Affect how the detection logic for complete rewrites, renames - and copies works. - -`abbrev`:: - Number of hexdigits to abbreviate raw format output to. - -`pickaxe`:: - A constant string (can and typically does contain newlines to - look for a block of text, not just a single line) to filter out - the filepairs that do not change the number of strings contained - in its preimage and postimage of the diff_queue. - -`flags`:: - This is mostly a collection of boolean options that affect the - operation, but some do not have anything to do with the diffcore - library. - -`touched_flags`:: - Records whether a flag has been changed due to user request - (rather than just set/unset by default). 
- -`set_default`:: - Callback which allows tweaking the options in diff_setup_done(). - -BINARY, TEXT;; - Affects the way how a file that is seemingly binary is treated. - -FULL_INDEX;; - Tells the patch output format not to use abbreviated object - names on the "index" lines. - -FIND_COPIES_HARDER;; - Tells the diffcore library that the caller is feeding unchanged - filepairs to allow copies from unmodified files be detected. - -COLOR_DIFF;; - Output should be colored. - -COLOR_DIFF_WORDS;; - Output is a colored word-diff. - -NO_INDEX;; - Tells diff-files that the input is not tracked files but files - in random locations on the filesystem. - -ALLOW_EXTERNAL;; - Tells output routine that it is Ok to call user specified patch - output routine. Plumbing disables this to ensure stable output. - -QUIET;; - Do not show any output. - -REVERSE_DIFF;; - Tells the library that the calling program is feeding the - filepairs reversed; `one` is two, and `two` is one. - -EXIT_WITH_STATUS;; - For communication between the calling program and the options - parser; tell the calling program to signal the presence of - difference using program exit code. - -HAS_CHANGES;; - Internal; used for optimization to see if there is any change. - -SILENT_ON_REMOVE;; - Affects if diff-files shows removed files. - -RECURSIVE, TREE_IN_RECURSIVE;; - Tells if tree traversal done by tree-diff should recursively - descend into a tree object pair that are different in preimage - and postimage set. 
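The `diff_queue` layout described in the deleted api-diff.txt above (a `queue` array of filepair pointers tracked by `alloc` and `nr`, with NULL in `one` for a creation and NULL in `two` for a deletion) can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical, none of it is git's real code, and the doubling growth policy is a simplification chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the documented structures; not git's definitions. */
struct toy_filespec {
	char *path;
	unsigned mode;
};

struct toy_filepair {
	struct toy_filespec *one; /* preimage; NULL models a file creation */
	struct toy_filespec *two; /* postimage; NULL models a file deletion */
};

struct toy_queue {
	struct toy_filepair **queue; /* array of pointers to filepairs */
	size_t alloc;                /* allocated size of queue[] */
	size_t nr;                   /* number of elements in use */
};

/* Allocate a filespec holding a private copy of path. */
static struct toy_filespec *toy_spec(const char *path, unsigned mode)
{
	size_t len = strlen(path) + 1;
	struct toy_filespec *s = malloc(sizeof(*s));

	if (!s || !(s->path = malloc(len)))
		abort();
	memcpy(s->path, path, len);
	s->mode = mode;
	return s;
}

static void toy_spec_free(struct toy_filespec *s)
{
	if (!s)
		return;
	free(s->path);
	free(s);
}

/* Append a pair, growing queue[] when full. Takes ownership of one and two. */
static struct toy_filepair *toy_queue_add(struct toy_queue *q,
					  struct toy_filespec *one,
					  struct toy_filespec *two)
{
	struct toy_filepair *p = malloc(sizeof(*p));

	assert(q != NULL);
	if (!p)
		abort();
	p->one = one;
	p->two = two;
	if (q->nr + 1 > q->alloc) {
		/* Doubling is an assumption of this sketch, not git's policy. */
		size_t new_alloc = q->alloc ? q->alloc * 2 : 4;
		struct toy_filepair **grown =
			realloc(q->queue, new_alloc * sizeof(*grown));

		if (!grown)
			abort();
		q->queue = grown;
		q->alloc = new_alloc;
	}
	q->queue[q->nr++] = p;
	return p;
}

/* 'A' for a creation, 'D' for a deletion, 'M' for any other pair. */
static char toy_status(const struct toy_filepair *p)
{
	if (!p->one)
		return 'A';
	if (!p->two)
		return 'D';
	return 'M';
}

/* Free every pair and return the queue to its empty state. */
static void toy_queue_clear(struct toy_queue *q)
{
	size_t i;

	for (i = 0; i < q->nr; i++) {
		toy_spec_free(q->queue[i]->one);
		toy_spec_free(q->queue[i]->two);
		free(q->queue[i]);
	}
	free(q->queue);
	q->queue = NULL;
	q->alloc = q->nr = 0;
}
```

A caller would start from a zeroed `struct toy_queue`, feed pairs with `toy_queue_add()`, read `queue[0]` through `queue[nr - 1]`, and finish with `toy_queue_clear()`. Keeping `alloc` separate from `nr` is what lets the array grow in steps instead of reallocating on every addition.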
- -(JC) diff --git a/third_party/git/Documentation/technical/api-directory-listing.txt b/third_party/git/Documentation/technical/api-directory-listing.txt deleted file mode 100644 index 5abb8e8b1f..0000000000 --- a/third_party/git/Documentation/technical/api-directory-listing.txt +++ /dev/null @@ -1,130 +0,0 @@ -directory listing API -===================== - -The directory listing API is used to enumerate paths in the work tree, -optionally taking `.git/info/exclude` and `.gitignore` files per -directory into account. - -Data structure --------------- - -`struct dir_struct` structure is used to pass directory traversal -options to the library and to record the paths discovered. A single -`struct dir_struct` is used regardless of whether or not the traversal -recursively descends into subdirectories. - -The notable options are: - -`exclude_per_dir`:: - - The name of the file to be read in each directory for excluded - files (typically `.gitignore`). - -`flags`:: - - A bit-field of options: - -`DIR_SHOW_IGNORED`::: - - Return just ignored files in `entries[]`, not untracked - files. This flag is mutually exclusive with - `DIR_SHOW_IGNORED_TOO`. - -`DIR_SHOW_IGNORED_TOO`::: - - Similar to `DIR_SHOW_IGNORED`, but return ignored files in - `ignored[]` in addition to untracked files in - `entries[]`. This flag is mutually exclusive with - `DIR_SHOW_IGNORED`. - -`DIR_KEEP_UNTRACKED_CONTENTS`::: - - Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if this is set, the - untracked contents of untracked directories are also returned in - `entries[]`. - -`DIR_SHOW_IGNORED_TOO_MODE_MATCHING`::: - - Only has meaning if `DIR_SHOW_IGNORED_TOO` is also set; if - this is set, returns ignored files and directories that match - an exclude pattern. If a directory matches an exclude pattern, - then the directory is returned and the contained paths are - not. A directory that does not match an exclude pattern will - not be returned even if all of its contents are ignored. 
In - this case, the contents are returned as individual entries. -+ -If this is set, files and directories that explicitly match an ignore -pattern are reported. Implicitly ignored directories (directories that -do not match an ignore pattern, but whose contents are all ignored) -are not reported; instead, all of their contents are reported. - -`DIR_COLLECT_IGNORED`::: - - Special mode for git-add. Return ignored files in `ignored[]` and - untracked files in `entries[]`. Only returns ignored files that match - pathspec exactly (no wildcards). Does not recurse into ignored - directories. - -`DIR_SHOW_OTHER_DIRECTORIES`::: - - Include a directory that is not tracked. - -`DIR_HIDE_EMPTY_DIRECTORIES`::: - - Do not include a directory that is not tracked and is empty. - -`DIR_NO_GITLINKS`::: - - If set, recurse into a directory that looks like a Git - directory. Otherwise it is shown as a directory. - -The result of the enumeration is left in these fields: - -`entries[]`:: - - An array of `struct dir_entry`, each element of which describes - a path. - -`nr`:: - - The number of members in `entries[]` array. - -`alloc`:: - - Internal use; keeps track of allocation of `entries[]` array. - -`ignored[]`:: - - An array of `struct dir_entry`, used for ignored paths with the - `DIR_SHOW_IGNORED_TOO` and `DIR_COLLECT_IGNORED` flags. - -`ignored_nr`:: - - The number of members in `ignored[]` array. - -Calling sequence ----------------- - -Note: the index may be looked at for .gitignore files that are marked -CE_SKIP_WORKTREE. If you want to exclude files, make sure you have loaded the index first. - -* Prepare `struct dir_struct dir` and clear it with `memset(&dir, 0, - sizeof(dir))`. - -* To add a single exclude pattern, call `add_exclude_list()` and then - `add_exclude()`. - -* To add patterns from a file (e.g. `.git/info/exclude`), call - `add_excludes_from_file()`, and/or set `dir.exclude_per_dir`.
A - short-hand function `setup_standard_excludes()` can be used to set - up the standard set of exclude settings. - -* Set options described in the Data Structure section above. - -* Call `read_directory()`. - -* Use `dir.entries[]`. - -* Call `clear_directory()` when none of the contained elements are in use any longer. - -(JC) diff --git a/third_party/git/Documentation/technical/api-error-handling.txt b/third_party/git/Documentation/technical/api-error-handling.txt deleted file mode 100644 index ceeedd485c..0000000000 --- a/third_party/git/Documentation/technical/api-error-handling.txt +++ /dev/null @@ -1,75 +0,0 @@ -Error reporting in git -====================== - -`die`, `usage`, `error`, and `warning` report errors of various -kinds. - -- `die` is for fatal application errors. It prints a message to - the user and exits with status 128. - -- `usage` is for errors in command line usage. After printing its - message, it exits with status 129. (See also `usage_with_options` - in the link:api-parse-options.html[parse-options API].) - -- `error` is for non-fatal library errors. It prints a message - to the user and returns -1 for convenience in signaling the error - to the caller. - -- `warning` is for reporting situations that probably should not - occur but which the user (and Git) can continue to work around - without running into too many problems. Like `error`, it - returns -1 after reporting the situation to the caller. - -Customizable error handlers ---------------------------- - -The default behavior of `die` and `error` is to write a message to -stderr and then exit or return as appropriate. This behavior can be -overridden using `set_die_routine` and `set_error_routine`. For -example, "git daemon" uses set_die_routine to write the reason `die` -was called to syslog before exiting. - -Library errors --------------- - -Functions return a negative integer on error.
Details beyond that -vary from function to function: - -- Some functions return -1 for all errors. Others return a more - specific value depending on how the caller might want to react - to the error. - -- Some functions report the error to stderr with `error`, - while others leave that for the caller to do. - -- errno is not meaningful on return from most functions (except - for thin wrappers for system calls). - -Check the function's API documentation to be sure. - -Caller-handled errors ---------------------- - -An increasing number of functions take a parameter 'struct strbuf *err'. -On error, such functions append a message about what went wrong to the -'err' strbuf. The message is meant to be complete enough to be passed -to `die` or `error` as-is. For example: - - if (ref_transaction_commit(transaction, &err)) - die("%s", err.buf); - -The 'err' parameter will be untouched if no error occurred, so multiple -function calls can be chained: - - t = ref_transaction_begin(&err); - if (!t || - ref_transaction_update(t, "HEAD", ..., &err) || - ref_transaction_commit(t, &err)) - die("%s", err.buf); - -The 'err' parameter must be a pointer to a valid strbuf. To silence -a message, pass a strbuf that is explicitly ignored: - - if (thing_that_can_fail_in_an_ignorable_way(..., &err)) - /* This failure is okay. */ - strbuf_reset(&err); diff --git a/third_party/git/Documentation/technical/api-gitattributes.txt b/third_party/git/Documentation/technical/api-gitattributes.txt deleted file mode 100644 index 45f0df600f..0000000000 --- a/third_party/git/Documentation/technical/api-gitattributes.txt +++ /dev/null @@ -1,154 +0,0 @@ -gitattributes API -================= - -The gitattributes mechanism gives a uniform way to associate various -attributes with a set of paths. - - -Data Structure --------------- - -`struct git_attr`:: - - An attribute is an opaque object that is identified by its name. - Pass the name to the `git_attr()` function to obtain an object of - this type.
The internal representation of this structure is - of no interest to the calling programs. The name of the - attribute can be retrieved by calling `git_attr_name()`. - -`struct attr_check_item`:: - - This structure represents one attribute and its value. - -`struct attr_check`:: - - This structure represents a collection of `attr_check_item`. - It is passed to `git_check_attr()` function, specifying the - attributes to check, and receives their values. - - -Attribute Values ----------------- - -An attribute for a path can be in one of four states: Set, Unset, -Unspecified or set to a string, and `.value` member of `struct -attr_check_item` records it. There are three macros to check these: - -`ATTR_TRUE()`:: - - Returns true if the attribute is Set for the path. - -`ATTR_FALSE()`:: - - Returns true if the attribute is Unset for the path. - -`ATTR_UNSET()`:: - - Returns true if the attribute is Unspecified for the path. - -If none of the above returns true, `.value` member points at a string -value of the attribute for the path. - - -Querying Specific Attributes ----------------------------- - -* Prepare `struct attr_check` using attr_check_initl() - function, enumerating the names of attributes whose values you are - interested in, terminated with a NULL pointer. Alternatively, an - empty `struct attr_check` can be prepared by calling - `attr_check_alloc()` function and then attributes you want to - ask about can be added to it with `attr_check_append()` - function. - -* Call `git_check_attr()` to check the attributes for the path. - -* Inspect `attr_check` structure to see how each of the - attribute in the array is defined for the path. - - -Example -------- - -To see how attributes "crlf" and "ident" are set for different paths. - -. 
Prepare a `struct attr_check` with two elements (because - we are checking two attributes): - ------------- -static struct attr_check *check; -static void setup_check(void) -{ - if (check) - return; /* already done */ - check = attr_check_initl("crlf", "ident", NULL); -} ------------- - -. Call `git_check_attr()` with the prepared `struct attr_check`: - ------------- - const char *path; - - setup_check(); - git_check_attr(path, check); ------------- - -. Act on `.value` member of the result, left in `check->items[]`: - ------------- - const char *value = check->items[0].value; - - if (ATTR_TRUE(value)) { - The attribute is Set, by listing only the name of the - attribute in the gitattributes file for the path. - } else if (ATTR_FALSE(value)) { - The attribute is Unset, by listing the name of the - attribute prefixed with a dash - for the path. - } else if (ATTR_UNSET(value)) { - The attribute is neither set nor unset for the path. - } else if (!strcmp(value, "input")) { - If none of ATTR_TRUE(), ATTR_FALSE(), or ATTR_UNSET() is - true, the value is a string set in the gitattributes - file for the path by saying "attr=value". - } else if (... other check using value as string ...) { - ... - } ------------- - -To see how attributes in argv[] are set for different paths, only -the first step in the above would be different. - ------------- -static struct attr_check *check; -static void setup_check(const char **argv) -{ - check = attr_check_alloc(); - while (*argv) { - struct git_attr *attr = git_attr(*argv); - attr_check_append(check, attr); - argv++; - } -} ------------- - - -Querying All Attributes ------------------------ - -To get the values of all attributes associated with a file: - -* Prepare an empty `attr_check` structure by calling - `attr_check_alloc()`. - -* Call `git_all_attrs()`, which populates the `attr_check` - with the attributes attached to the path. - -* Iterate over the `attr_check.items[]` array to examine - the attribute names and values. 
The name of the attribute - described by an `attr_check.items[]` object can be retrieved via - `git_attr_name(check->items[i].attr)`. (Please note that no items - will be returned for unset attributes, so `ATTR_UNSET()` will return - false for all returned `attr_check.items[]` objects.) - -* Free the `attr_check` struct by calling `attr_check_free()`. diff --git a/third_party/git/Documentation/technical/api-grep.txt b/third_party/git/Documentation/technical/api-grep.txt deleted file mode 100644 index a69cc8964d..0000000000 --- a/third_party/git/Documentation/technical/api-grep.txt +++ /dev/null @@ -1,8 +0,0 @@ -grep API -======== - -Talk about <grep.h>, things like: - -* grep_buffer() - -(JC) diff --git a/third_party/git/Documentation/technical/api-history-graph.txt b/third_party/git/Documentation/technical/api-history-graph.txt deleted file mode 100644 index d0d1707c8c..0000000000 --- a/third_party/git/Documentation/technical/api-history-graph.txt +++ /dev/null @@ -1,173 +0,0 @@ -history graph API -================= - -The graph API is used to draw a text-based representation of the commit -history. The API generates the graph in a line-by-line fashion. - -Functions ---------- - -Core functions: - -* `graph_init()` creates a new `struct git_graph` - -* `graph_update()` moves the graph to a new commit. - -* `graph_next_line()` outputs the next line of the graph into a strbuf. It - does not add a terminating newline. - -* `graph_padding_line()` outputs a line of vertical padding in the graph. It - is similar to `graph_next_line()`, but is guaranteed to never print the line - containing the current commit. Where `graph_next_line()` would print the - commit line next, `graph_padding_line()` prints a line that simply extends - all branch lines downwards one row, leaving their positions unchanged. - -* `graph_is_commit_finished()` determines if the graph has output all lines - necessary for the current commit. 
If `graph_update()` is called before all - lines for the current commit have been printed, the next call to - `graph_next_line()` will output an ellipsis, to indicate that a portion of - the graph was omitted. - -The following utility functions are wrappers around `graph_next_line()` and -`graph_is_commit_finished()`. They always print the output to stdout. -They can all be called with a NULL graph argument, in which case no graph -output will be printed. - -* `graph_show_commit()` calls `graph_next_line()` and - `graph_is_commit_finished()` until one of them return non-zero. This prints - all graph lines up to, and including, the line containing this commit. - Output is printed to stdout. The last line printed does not contain a - terminating newline. - -* `graph_show_oneline()` calls `graph_next_line()` and prints the result to - stdout. The line printed does not contain a terminating newline. - -* `graph_show_padding()` calls `graph_padding_line()` and prints the result to - stdout. The line printed does not contain a terminating newline. - -* `graph_show_remainder()` calls `graph_next_line()` until - `graph_is_commit_finished()` returns non-zero. Output is printed to stdout. - The last line printed does not contain a terminating newline. Returns 1 if - output was printed, and 0 if no output was necessary. - -* `graph_show_strbuf()` prints the specified strbuf to stdout, prefixing all - lines but the first with a graph line. The caller is responsible for - ensuring graph output for the first line has already been printed to stdout. - (This can be done with `graph_show_commit()` or `graph_show_oneline()`.) If - a NULL graph is supplied, the strbuf is printed as-is. - -* `graph_show_commit_msg()` is similar to `graph_show_strbuf()`, but it also - prints the remainder of the graph, if more lines are needed after the strbuf - ends. 
It is better than directly calling `graph_show_strbuf()` followed by - `graph_show_remainder()` since it properly handles buffers that do not end in - a terminating newline. The output printed by `graph_show_commit_msg()` will - end in a newline if and only if the strbuf ends in a newline. - -Data structure --------------- -`struct git_graph` is an opaque data type used to store the current graph -state. - -Calling sequence ----------------- - -* Create a `struct git_graph` by calling `graph_init()`. When using the - revision walking API, this is done automatically by `setup_revisions()` if - the '--graph' option is supplied. - -* Use the revision walking API to walk through a group of contiguous commits. - The `get_revision()` function automatically calls `graph_update()` each time - it is invoked. - -* For each commit, call `graph_next_line()` repeatedly, until - `graph_is_commit_finished()` returns non-zero. Each call to - `graph_next_line()` will output a single line of the graph. The resulting - lines will not contain any newlines. `graph_next_line()` returns 1 if the - resulting line contains the current commit, or 0 if this is merely a line - needed to adjust the graph before or after the current commit. This return - value can be used to determine where to print the commit summary information - alongside the graph output. - -Limitations ------------ - -* `graph_update()` must be called with commits in topological order. It should - not be called on a commit if it has already been invoked with an ancestor of - that commit, or the graph output will be incorrect. - -* `graph_update()` must be called on a contiguous group of commits. If - `graph_update()` is called on a particular commit, it should later be called - on all parents of that commit. Parents must not be skipped, or the graph - output will appear incorrect. 
-+ -`graph_update()` may be used on a pruned set of commits only if the parent list -has been rewritten so as to include only ancestors from the pruned set. - -* The graph API does not currently support reverse commit ordering. In - order to implement reverse ordering, the graphing API needs an - (efficient) mechanism to find the children of a commit. - -Sample usage ------------- - ------------- -struct commit *commit; -struct git_graph *graph = graph_init(opts); - -while ((commit = get_revision(opts)) != NULL) { - while (!graph_is_commit_finished(graph)) - { - struct strbuf sb; - int is_commit_line; - - strbuf_init(&sb, 0); - is_commit_line = graph_next_line(graph, &sb); - fputs(sb.buf, stdout); - - if (is_commit_line) - log_tree_commit(opts, commit); - else - putchar(opts->diffopt.line_termination); - } -} ------------- - -Sample output -------------- - -The following is an example of the output from the graph API. This output does -not include any commit summary information--callers are responsible for -outputting that information, if desired. - ------------- -* -* -* -|\ -* | -| | * -| \ \ -| \ \ -*-. \ \ -|\ \ \ \ -| | * | | -| | | | | * -| | | | | * -| | | | | * -| | | | | |\ -| | | | | | * -| * | | | | | -| | | | | * \ -| | | | | |\ | -| | | | * | | | -| | | | * | | | -* | | | | | | | -| |/ / / / / / -|/| / / / / / -* | | | | | | -|/ / / / / / -* | | | | | -| | | | | * -| | | | |/ -| | | | * ------------- diff --git a/third_party/git/Documentation/technical/api-index-skel.txt b/third_party/git/Documentation/technical/api-index-skel.txt deleted file mode 100644 index eda8c195c1..0000000000 --- a/third_party/git/Documentation/technical/api-index-skel.txt +++ /dev/null @@ -1,13 +0,0 @@ -Git API Documents -================= - -Git has grown a set of internal API over time. This collection -documents them. 
- -//////////////////////////////////////////////////////////////// -// table of contents begin -//////////////////////////////////////////////////////////////// - -//////////////////////////////////////////////////////////////// -// table of contents end -//////////////////////////////////////////////////////////////// diff --git a/third_party/git/Documentation/technical/api-index.sh b/third_party/git/Documentation/technical/api-index.sh deleted file mode 100755 index 9c3f4131b8..0000000000 --- a/third_party/git/Documentation/technical/api-index.sh +++ /dev/null @@ -1,28 +0,0 @@ -#!/bin/sh - -( - c=//////////////////////////////////////////////////////////////// - skel=api-index-skel.txt - sed -e '/^\/\/ table of contents begin/q' "$skel" - echo "$c" - - ls api-*.txt | - while read filename - do - case "$filename" in - api-index-skel.txt | api-index.txt) continue ;; - esac - title=$(sed -e 1q "$filename") - html=${filename%.txt}.html - echo "* link:$html[$title]" - done - echo "$c" - sed -n -e '/^\/\/ table of contents end/,$p' "$skel" -) >api-index.txt+ - -if test -f api-index.txt && cmp api-index.txt api-index.txt+ >/dev/null -then - rm -f api-index.txt+ -else - mv api-index.txt+ api-index.txt -fi diff --git a/third_party/git/Documentation/technical/api-merge.txt b/third_party/git/Documentation/technical/api-merge.txt deleted file mode 100644 index 9dc1bed768..0000000000 --- a/third_party/git/Documentation/technical/api-merge.txt +++ /dev/null @@ -1,104 +0,0 @@ -merge API -========= - -The merge API helps a program to reconcile two competing sets of -improvements to some files (e.g., unregistered changes from the work -tree versus changes involved in switching to a new branch), reporting -conflicts if found. The library called through this API is -responsible for a few things. 
- - * determining which trees to merge (recursive ancestor consolidation); - - * lining up corresponding files in the trees to be merged (rename - detection, subtree shifting), reporting edge cases like add/add - and rename/rename conflicts to the user; - - * performing a three-way merge of corresponding files, taking - path-specific merge drivers (specified in `.gitattributes`) - into account. - -Data structures ---------------- - -* `mmbuffer_t`, `mmfile_t` - -These store data usable for use by the xdiff backend, for writing and -for reading, respectively. See `xdiff/xdiff.h` for the definitions -and `diff.c` for examples. - -* `struct ll_merge_options` - -This describes the set of options the calling program wants to affect -the operation of a low-level (single file) merge. Some options: - -`virtual_ancestor`:: - Behave as though this were part of a merge between common - ancestors in a recursive merge. - If a helper program is specified by the - `[merge "<driver>"] recursive` configuration, it will - be used (see linkgit:gitattributes[5]). - -`variant`:: - Resolve local conflicts automatically in favor - of one side or the other (as in 'git merge-file' - `--ours`/`--theirs`/`--union`). Can be `0`, - `XDL_MERGE_FAVOR_OURS`, `XDL_MERGE_FAVOR_THEIRS`, or - `XDL_MERGE_FAVOR_UNION`. - -`renormalize`:: - Resmudge and clean the "base", "theirs" and "ours" files - before merging. Use this when the merge is likely to have - overlapped with a change in smudge/clean or end-of-line - normalization rules. - -Low-level (single file) merge ------------------------------ - -`ll_merge`:: - - Perform a three-way single-file merge in core. This is - a thin wrapper around `xdl_merge` that takes the path and - any merge backend specified in `.gitattributes` or - `.git/info/attributes` into account. Returns 0 for a - clean merge. - -Calling sequence: - -* Prepare a `struct ll_merge_options` to record options. 
- If you have no special requests, skip this and pass `NULL` - as the `opts` parameter to use the default options. - -* Allocate an mmbuffer_t variable for the result. - -* Allocate and fill variables with the file's original content - and two modified versions (using `read_mmfile`, for example). - -* Call `ll_merge()`. - -* Read the merged content from `result_buf.ptr` and `result_buf.size`. - -* Release buffers when finished. A simple - `free(ancestor.ptr); free(ours.ptr); free(theirs.ptr); - free(result_buf.ptr);` will do. - -If the modifications do not merge cleanly, `ll_merge` will return a -nonzero value and `result_buf` will generally include a description of -the conflict bracketed by markers such as the traditional `<<<<<<<` -and `>>>>>>>`. - -The `ancestor_label`, `our_label`, and `their_label` parameters are -used to label the different sides of a conflict if the merge driver -supports this. - -Everything else ---------------- - -Talk about <merge-recursive.h> and merge_file(): - - - merge_trees() to merge with rename detection - - merge_recursive() for ancestor consolidation - - try_merge_command() for other strategies - - conflict format - - merge options - -(Daniel, Miklos, Stephan, JC) diff --git a/third_party/git/Documentation/technical/api-object-access.txt b/third_party/git/Documentation/technical/api-object-access.txt deleted file mode 100644 index 5b29622d00..0000000000 --- a/third_party/git/Documentation/technical/api-object-access.txt +++ /dev/null @@ -1,15 +0,0 @@ -object access API -================= - -Talk about <sha1-file.c> and <object.h> family, things like - -* read_sha1_file() -* read_object_with_reference() -* has_sha1_file() -* write_sha1_file() -* pretend_object_file() -* lookup_{object,commit,tag,blob,tree} -* parse_{object,commit,tag,blob,tree} -* Use of object flags - -(JC, Shawn, Daniel, Dscho, Linus) diff --git a/third_party/git/Documentation/technical/api-oid-array.txt 
b/third_party/git/Documentation/technical/api-oid-array.txt deleted file mode 100644 index c97428c2c3..0000000000 --- a/third_party/git/Documentation/technical/api-oid-array.txt +++ /dev/null @@ -1,90 +0,0 @@ -oid-array API -============== - -The oid-array API provides storage and manipulation of sets of object -identifiers. The emphasis is on storage and processing efficiency, -making them suitable for large lists. Note that the ordering of items is -not preserved over some operations. - -Data Structures ---------------- - -`struct oid_array`:: - - A single array of object IDs. This should be initialized by - assignment from `OID_ARRAY_INIT`. The `oid` member contains - the actual data. The `nr` member contains the number of items in - the set. The `alloc` and `sorted` members are used internally, - and should not be needed by API callers. - -Functions ---------- - -`oid_array_append`:: - Add an item to the set. The object ID will be placed at the end of - the array (but note that some operations below may lose this - ordering). - -`oid_array_lookup`:: - Perform a binary search of the array for a specific object ID. - If found, returns the offset (in number of elements) of the - object ID. If not found, returns a negative integer. If the array - is not sorted, this function has the side effect of sorting it. - -`oid_array_clear`:: - Free all memory associated with the array and return it to the - initial, empty state. - -`oid_array_for_each`:: - Iterate over each element of the list, executing the callback - function for each one. Does not sort the list, so any custom - hash order is retained. If the callback returns a non-zero - value, the iteration ends immediately and the callback's - return is propagated; otherwise, 0 is returned. - -`oid_array_for_each_unique`:: - Iterate over each unique element of the list in sorted order, - but otherwise behave like `oid_array_for_each`. If the array - is not sorted, this function has the side effect of sorting - it. 
- -`oid_array_filter`:: - Apply the callback function `want` to each entry in the array, - retaining only the entries for which the function returns true. - Preserve the order of the entries that are retained. - -Examples --------- - ------------------------------------------ -int print_callback(const struct object_id *oid, - void *data) -{ - printf("%s\n", oid_to_hex(oid)); - return 0; /* always continue */ -} - -void some_func(void) -{ - struct oid_array hashes = OID_ARRAY_INIT; - struct object_id oid; - - /* Read objects into our set */ - while (read_object_from_stdin(oid.hash)) - oid_array_append(&hashes, &oid); - - /* Check if some objects are in our set */ - while (read_object_from_stdin(oid.hash)) - if (oid_array_lookup(&hashes, &oid) >= 0) - printf("it's in there!\n"); - - /* - * Print the unique set of objects. We could also have - * avoided adding duplicate objects in the first place, - * but we would end up re-sorting the array repeatedly. - * Instead, this will sort once and then skip duplicates - * in linear time. - */ - oid_array_for_each_unique(&hashes, print_callback, NULL); -} ------------------------------------------ diff --git a/third_party/git/Documentation/technical/api-parse-options.txt b/third_party/git/Documentation/technical/api-parse-options.txt deleted file mode 100644 index 2e2e7c10c6..0000000000 --- a/third_party/git/Documentation/technical/api-parse-options.txt +++ /dev/null @@ -1,313 +0,0 @@ -parse-options API -================= - -The parse-options API is used to parse and massage options in Git -and to provide usage help with a consistent look. - -Basics ------- - -The argument vector `argv[]` may usually contain mandatory or optional -'non-option arguments', e.g. a filename or a branch, and 'options'. -Options are optional arguments that start with a dash and -that allow changing the behavior of a command.
- -* There are basically three types of options: - 'boolean' options, - options with (mandatory) 'arguments' and - options with 'optional arguments' - (i.e. a boolean option that can be adjusted). - -* There are basically two forms of options: - 'Short options' consist of one dash (`-`) and one alphanumeric - character. - 'Long options' begin with two dashes (`--`) and some - alphanumeric characters. - -* Options are case-sensitive. - Please define 'lower-case long options' only. - -The parse-options API allows: - -* 'stuck' and 'separate form' of options with arguments. - `-oArg` is stuck, `-o Arg` is separate form. - `--option=Arg` is stuck, `--option Arg` is separate form. - -* Long options may be 'abbreviated', as long as the abbreviation - is unambiguous. - -* Short options may be bundled, e.g. `-a -b` can be specified as `-ab`. - -* Boolean long options can be 'negated' (or 'unset') by prepending - `no-`, e.g. `--no-abbrev` instead of `--abbrev`. Conversely, - options that begin with `no-` can be 'negated' by removing it. - Other long options can be unset (e.g., set string to NULL, set - integer to 0) by prepending `no-`. - -* Options and non-option arguments can clearly be separated using the `--` - option, e.g. `-a -b --option -- --this-is-a-file` indicates that - `--this-is-a-file` must not be processed as an option. - -Steps to parse options ----------------------- - -. `#include "parse-options.h"` - -. define a NULL-terminated - `static const char * const builtin_foo_usage[]` array - containing alternative usage strings - -. define `builtin_foo_options` array as described below - in section 'Data Structure'. - -. in `cmd_foo(int argc, const char **argv, const char *prefix)` - call - - argc = parse_options(argc, argv, prefix, builtin_foo_options, builtin_foo_usage, flags); -+ -`parse_options()` will filter out the processed options of `argv[]` and leave the -non-option arguments in `argv[]`. -`argc` is updated appropriately because of the assignment. 
-+ -You can also pass NULL instead of a usage array as the fifth parameter of -parse_options(), to avoid displaying a help screen with usage info and -option list. This should only be done if necessary, e.g. to implement -a limited parser for only a subset of the options that needs to be run -before the full parser, which in turn shows the full help message. -+ -Flags are the bitwise-or of: - -`PARSE_OPT_KEEP_DASHDASH`:: - Keep the `--` that usually separates options from - non-option arguments. - -`PARSE_OPT_STOP_AT_NON_OPTION`:: - Usually the whole argument vector is massaged and reordered. - Using this flag, processing is stopped at the first non-option - argument. - -`PARSE_OPT_KEEP_ARGV0`:: - Keep the first argument, which contains the program name. It's - removed from argv[] by default. - -`PARSE_OPT_KEEP_UNKNOWN`:: - Keep unknown arguments instead of erroring out. This doesn't - work for all combinations of arguments as users might expect - it to do. E.g. if the first argument in `--unknown --known` - takes a value (which we can't know), the second one is - mistakenly interpreted as a known option. Similarly, if - `PARSE_OPT_STOP_AT_NON_OPTION` is set, the second argument in - `--unknown value` will be mistakenly interpreted as a - non-option, not as a value belonging to the unknown option, - stopping the parser early. That's why parse_options() errors out if - both options are set. - -`PARSE_OPT_NO_INTERNAL_HELP`:: - By default, parse_options() handles `-h`, `--help` and - `--help-all` internally, by showing a help screen. This option - turns it off and allows one to add custom handlers for these - options, or to just leave them unknown. - -Data Structure --------------- - -The main data structure is an array of the `option` struct, -say `static struct option builtin_add_options[]`. -There are some macros to easily define options: - -`OPT__ABBREV(&int_var)`:: - Add `--abbrev[=<n>]`. - -`OPT__COLOR(&int_var, description)`:: - Add `--color[=<when>]` and `--no-color`.
- -`OPT__DRY_RUN(&int_var, description)`:: - Add `-n, --dry-run`. - -`OPT__FORCE(&int_var, description)`:: - Add `-f, --force`. - -`OPT__QUIET(&int_var, description)`:: - Add `-q, --quiet`. - -`OPT__VERBOSE(&int_var, description)`:: - Add `-v, --verbose`. - -`OPT_GROUP(description)`:: - Start an option group. `description` is a short string that - describes the group or an empty string. - Start the description with an upper-case letter. - -`OPT_BOOL(short, long, &int_var, description)`:: - Introduce a boolean option. `int_var` is set to one with - `--option` and set to zero with `--no-option`. - -`OPT_COUNTUP(short, long, &int_var, description)`:: - Introduce a count-up option. - Each use of `--option` increments `int_var`, starting from zero - (even if initially negative), and `--no-option` resets it to - zero. To determine if `--option` or `--no-option` was encountered at - all, initialize `int_var` to a negative value, and if it is still - negative after parse_options(), then neither `--option` nor - `--no-option` was seen. - -`OPT_BIT(short, long, &int_var, description, mask)`:: - Introduce a boolean option. - If used, `int_var` is bitwise-ored with `mask`. - -`OPT_NEGBIT(short, long, &int_var, description, mask)`:: - Introduce a boolean option. - If used, `int_var` is bitwise-anded with the inverted `mask`. - -`OPT_SET_INT(short, long, &int_var, description, integer)`:: - Introduce an integer option. - `int_var` is set to `integer` with `--option`, and - reset to zero with `--no-option`. - -`OPT_STRING(short, long, &str_var, arg_str, description)`:: - Introduce an option with string argument. - The string argument is put into `str_var`. - -`OPT_STRING_LIST(short, long, &struct string_list, arg_str, description)`:: - Introduce an option with string argument. - The string argument is stored as an element in `string_list`. - Use of `--no-option` will clear the list of preceding values. 
- -`OPT_INTEGER(short, long, &int_var, description)`:: - Introduce an option with integer argument. - The integer is put into `int_var`. - -`OPT_MAGNITUDE(short, long, &unsigned_long_var, description)`:: - Introduce an option with a size argument. The argument must be a - non-negative integer and may include a suffix of 'k', 'm' or 'g' to - scale the provided value by 1024, 1024^2 or 1024^3 respectively. - The scaled value is put into `unsigned_long_var`. - -`OPT_EXPIRY_DATE(short, long, &timestamp_t_var, description)`:: - Introduce an option with expiry date argument, see `parse_expiry_date()`. - The timestamp is put into `timestamp_t_var`. - -`OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`:: - Introduce an option with argument. - The argument will be fed into the function given by `func_ptr` - and the result will be put into `var`. - See 'Option Callbacks' below for a more elaborate description. - -`OPT_FILENAME(short, long, &var, description)`:: - Introduce an option with a filename argument. - The filename will be prefixed by passing the filename along with - the prefix argument of `parse_options()` to `prefix_filename()`. - -`OPT_ARGUMENT(long, &int_var, description)`:: - Introduce a long-option argument that will be kept in `argv[]`. - If this option was seen, `int_var` will be set to one (except - if a `NULL` pointer was passed). - -`OPT_NUMBER_CALLBACK(&var, description, func_ptr)`:: - Recognize numerical options like -123 and feed the integer as - if it was an argument to the function given by `func_ptr`. - The result will be put into `var`. There can be only one such - option definition. It cannot be negated and it takes no - arguments. Short options that happen to be digits take - precedence over it. - -`OPT_COLOR_FLAG(short, long, &int_var, description)`:: - Introduce an option that takes an optional argument that can - have one of three values: "always", "never", or "auto". If the - argument is not given, it defaults to "always".
The `--no-` form - works like `--long=never`; it cannot take an argument. If - "always", set `int_var` to 1; if "never", set `int_var` to 0; if - "auto", set `int_var` to 1 if stdout is a tty or a pager, - 0 otherwise. - -`OPT_NOOP_NOARG(short, long)`:: - Introduce an option that has no effect and takes no arguments. - Use it to hide deprecated options that are still to be recognized - and ignored silently. - -`OPT_PASSTHRU(short, long, &char_var, arg_str, description, flags)`:: - Introduce an option that will be reconstructed into a char* string, - which must be initialized to NULL. This is useful when you need to - pass the command-line option to another command. Any previous value - will be overwritten, so this should only be used for options where - the last one specified on the command line wins. - -`OPT_PASSTHRU_ARGV(short, long, &argv_array_var, arg_str, description, flags)`:: - Introduce an option where all instances of it on the command-line will - be reconstructed into an argv_array. This is useful when you need to - pass the command-line option, which can be specified multiple times, - to another command. - -`OPT_CMDMODE(short, long, &int_var, description, enum_val)`:: - Define an "operation mode" option, only one of which in the same - group of "operating mode" options that share the same `int_var` - can be given by the user. `int_var` is set to `enum_val` when the - option is used, but an error is reported if another "operating mode" - option has already set its value to the same `int_var`. - - -The last element of the array must be `OPT_END()`. - -If not stated otherwise, interpret the arguments as follows: - -* `short` is a character for the short option - (e.g. `'e'` for `-e`, use `0` to omit), - -* `long` is a string for the long option - (e.g. `"example"` for `--example`, use `NULL` to omit), - -* `int_var` is an integer variable, - -* `str_var` is a string variable (`char *`), - -* `arg_str` is the string that is shown as argument - (e.g.
`"branch"` will result in `<branch>`). - If set to `NULL`, three dots (`...`) will be displayed. - -* `description` is a short string to describe the effect of the option. - It shall begin with a lower-case letter and a full stop (`.`) shall be - omitted at the end. - -Option Callbacks ----------------- - -The function must be defined in this form: - - int func(const struct option *opt, const char *arg, int unset) - -The callback mechanism is as follows: - -* Inside `func`, the only interesting member of the structure - given by `opt` is the void pointer `opt->value`. - `*opt->value` will be the value that is saved into `var`, if you - use `OPT_CALLBACK()`. - For example, do `*(unsigned long *)opt->value = 42;` to get 42 - into an `unsigned long` variable. - -* Return value `0` indicates success and non-zero return - value will invoke `usage_with_options()` and, thus, die. - -* If the user negates the option, `arg` is `NULL` and `unset` is 1. - -Sophisticated option parsing ----------------------------- - -If you need, for example, option callbacks with optional arguments -or without arguments at all, or if you need other special cases, -that are not handled by the macros above, you need to specify the -members of the `option` structure manually. - -This is not covered in this document, but well documented -in `parse-options.h` itself. - -Examples --------- - -See `test-parse-options.c` and -`builtin/add.c`, -`builtin/clone.c`, -`builtin/commit.c`, -`builtin/fetch.c`, -`builtin/fsck.c`, -`builtin/rm.c` -for real-world examples. 
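The suffix rule described for `OPT_MAGNITUDE` in the deleted parse-options text can be sketched in standalone C. This is only an illustration of the documented behavior (a non-negative integer with an optional 'k', 'm' or 'g' suffix scaling by 1024, 1024^2 or 1024^3); `parse_magnitude` is a hypothetical helper written for this note, not the function Git uses, and it assumes that only the lower-case suffixes named in the text are accepted.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of Git): parse a non-negative integer
 * with an optional 'k', 'm' or 'g' suffix and store the scaled value
 * in *out. Returns 0 on success, -1 on malformed input or overflow.
 */
static int parse_magnitude(const char *arg, unsigned long *out)
{
	char *end;
	unsigned long long value;
	unsigned long long factor = 1;

	/* strtoull() accepts a leading sign or blanks, so insist on a digit. */
	if (!arg || !isdigit((unsigned char)arg[0]))
		return -1;

	errno = 0;
	value = strtoull(arg, &end, 10);
	if (errno == ERANGE)
		return -1;

	switch (*end) {
	case '\0':
		break;
	case 'k':
		factor = 1024ULL;
		end++;
		break;
	case 'm':
		factor = 1024ULL * 1024;
		end++;
		break;
	case 'g':
		factor = 1024ULL * 1024 * 1024;
		end++;
		break;
	default:
		return -1;
	}

	/* Anything after the single suffix character is an error. */
	if (*end != '\0')
		return -1;

	/* Refuse values that would not fit into an unsigned long. */
	if (value > ULONG_MAX / factor)
		return -1;

	*out = (unsigned long)(value * factor);
	return 0;
}
```

With this sketch, `parse_magnitude("2k", &n)` stores 2048 in `n`, while inputs such as `-1` or `5x` are rejected.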
diff --git a/third_party/git/Documentation/technical/api-quote.txt b/third_party/git/Documentation/technical/api-quote.txt deleted file mode 100644 index e8a1bce94e..0000000000 --- a/third_party/git/Documentation/technical/api-quote.txt +++ /dev/null @@ -1,10 +0,0 @@ -quote API -========= - -Talk about <quote.h>, things like - -* sq_quote and unquote -* c_style quote and unquote -* quoting for foreign languages - -(JC) diff --git a/third_party/git/Documentation/technical/api-ref-iteration.txt b/third_party/git/Documentation/technical/api-ref-iteration.txt deleted file mode 100644 index ad9d019ff9..0000000000 --- a/third_party/git/Documentation/technical/api-ref-iteration.txt +++ /dev/null @@ -1,78 +0,0 @@ -ref iteration API -================= - - -Iteration of refs is done by using an iterate function which will call a -callback function for every ref. The callback function has this -signature: - - int handle_one_ref(const char *refname, const struct object_id *oid, - int flags, void *cb_data); - -There are different kinds of iterate functions which all take a -callback of this type. The callback is then called for each found ref -until the callback returns nonzero. The returned value is then also -returned by the iterate function. - -Iteration functions -------------------- - -* `head_ref()` just iterates the head ref. - -* `for_each_ref()` iterates all refs. - -* `for_each_ref_in()` iterates all refs which have a defined prefix and - strips that prefix from the passed variable refname. - -* `for_each_tag_ref()`, `for_each_branch_ref()`, `for_each_remote_ref()`, - `for_each_replace_ref()` iterate refs from the respective area. - -* `for_each_glob_ref()` iterates all refs that match the specified glob - pattern. - -* `for_each_glob_ref_in()` the previous and `for_each_ref_in()` combined. - -* Use `refs_` API for accessing submodules. The submodule ref store could - be obtained with `get_submodule_ref_store()`. 
- -* `for_each_rawref()` can be used to learn about broken refs and symrefs. - -* `for_each_reflog()` iterates each reflog file. - -Submodules ----------- - -If you want to iterate the refs of a submodule you first need to add the -submodule's object database. You can do this with a code snippet like -this: - - const char *path = "path/to/submodule"; - if (add_submodule_odb(path)) - die("Error submodule '%s' not populated.", path); - -`add_submodule_odb()` will return zero on success. If you -do not do this you will get an error for each ref that does not point -to a valid object. - -Note: As a side-effect of this you cannot safely assume that all -objects you look up are available in the superproject. All submodule objects -will be available the same way as the superproject's objects. - -Example: --------- - ----- -static int handle_remote_ref(const char *refname, - const struct object_id *oid, int flags, void *cb_data) -{ - struct strbuf *output = cb_data; - strbuf_addf(output, "%s\n", refname); - return 0; -} - -... - - struct strbuf output = STRBUF_INIT; - for_each_remote_ref(handle_remote_ref, &output); - printf("%s", output.buf); ----- diff --git a/third_party/git/Documentation/technical/api-remote.txt b/third_party/git/Documentation/technical/api-remote.txt deleted file mode 100644 index f10941b2e8..0000000000 --- a/third_party/git/Documentation/technical/api-remote.txt +++ /dev/null @@ -1,127 +0,0 @@ -Remotes configuration API -========================= - -The API in remote.h gives access to the configuration related to -remotes. It handles all three configuration mechanisms historically -and currently used by Git, and presents the information in a uniform -fashion. Note that the code also handles plain URLs without any -configuration, giving them just the default information.
- -struct remote -------------- - -`name`:: - - The user's nickname for the remote - -`url`:: - - An array of all of the url_nr URLs configured for the remote - -`pushurl`:: - - An array of all of the pushurl_nr push URLs configured for the remote - -`push`:: - - An array of refspecs configured for pushing, with - push_refspec being the literal strings, and push_refspec_nr - being the quantity. - -`fetch`:: - - An array of refspecs configured for fetching, with - fetch_refspec being the literal strings, and fetch_refspec_nr - being the quantity. - -`fetch_tags`:: - - The setting for whether to fetch tags (as a separate rule from - the configured refspecs); -1 means never to fetch tags, 0 - means to auto-follow tags based on the default heuristic, 1 - means to always auto-follow tags, and 2 means to fetch all - tags. - -`receivepack`, `uploadpack`:: - - The configured helper programs to run on the remote side, for - Git-native protocols. - -`http_proxy`:: - - The proxy to use for curl (http, https, ftp, etc.) URLs. - -`http_proxy_authmethod`:: - - The method used for authenticating against `http_proxy`. - -struct remotes can be found by name with remote_get(), and iterated -through with for_each_remote(). remote_get(NULL) will return the -default remote, given the current branch and configuration. - -struct refspec --------------- - -A struct refspec holds the parsed interpretation of a refspec. If it -will force updates (starts with a '+'), force is true. If it is a -pattern (sides end with '*') pattern is true. src and dest are the -two sides (including '*' characters if present); if there is only one -side, it is src, and dst is NULL; if sides exist but are empty (i.e., -the refspec either starts or ends with ':'), the corresponding side is -"". - -An array of strings can be parsed into an array of struct refspecs -using parse_fetch_refspec() or parse_push_refspec(). 
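The `struct refspec` rules described above (a leading '+' sets force, sides ending in '*' make a pattern, a missing ':' leaves dst NULL, an empty side becomes "") can be illustrated with a small standalone parser. This is a sketch under stated assumptions, not Git's parser: `struct refspec_sketch`, `parse_refspec_sketch()` and `release_refspec_sketch()` are hypothetical names, and the sketch assumes a refspec counts as a pattern when every side that is present ends with '*'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fields the text describes. */
struct refspec_sketch {
	int force;	/* refspec started with '+' */
	int pattern;	/* every present side ends with '*' (assumption) */
	char *src;	/* left-hand side, "" if empty */
	char *dst;	/* right-hand side, NULL if there was no ':' */
};

/* Copy len bytes of s into a fresh NUL-terminated string. */
static char *dup_range(const char *s, size_t len)
{
	char *out = malloc(len + 1);

	if (!out)
		return NULL;
	memcpy(out, s, len);
	out[len] = '\0';
	return out;
}

static int ends_with_star(const char *s)
{
	size_t len = strlen(s);

	return len > 0 && s[len - 1] == '*';
}

static void release_refspec_sketch(struct refspec_sketch *rs)
{
	free(rs->src);
	free(rs->dst);
	memset(rs, 0, sizeof(*rs));
}

/* Returns 0 on success, -1 if memory could not be allocated. */
static int parse_refspec_sketch(const char *spec, struct refspec_sketch *rs)
{
	const char *colon;

	memset(rs, 0, sizeof(*rs));

	if (*spec == '+') {
		rs->force = 1;
		spec++;
	}

	colon = strchr(spec, ':');
	if (colon) {
		rs->src = dup_range(spec, (size_t)(colon - spec));
		rs->dst = dup_range(colon + 1, strlen(colon + 1));
		if (!rs->src || !rs->dst) {
			release_refspec_sketch(rs);
			return -1;
		}
	} else {
		rs->src = dup_range(spec, strlen(spec));
		if (!rs->src)
			return -1;
	}

	rs->pattern = ends_with_star(rs->src) &&
		      (!rs->dst || ends_with_star(rs->dst));
	return 0;
}
```

For `+refs/heads/*:refs/remotes/origin/*` the sketch reports force and pattern as true with both sides filled in; for a bare `master` it leaves dst as NULL.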
- -remote_find_tracking(), given a remote and a struct refspec with -either src or dst filled out, will fill out the other such that the -result is in the "fetch" specification for the remote (note that this -evaluates patterns and returns a single result). - -struct branch -------------- - -Note that this may end up moving to branch.h - -struct branch holds the configuration for a branch. It can be looked -up with branch_get(name) for "refs/heads/{name}", or with -branch_get(NULL) for HEAD. - -It contains: - -`name`:: - - The short name of the branch. - -`refname`:: - - The full path for the branch ref. - -`remote_name`:: - - The name of the remote listed in the configuration. - -`merge_name`:: - - An array of the "merge" lines in the configuration. - -`merge`:: - - An array of the struct refspecs used for the merge lines. That - is, merge[i]->dst is a local tracking ref which should be - merged into this branch by default. - -`merge_nr`:: - - The number of merge configurations - -branch_has_merge_config() returns true if the given branch has merge -configuration given. - -Other stuff ------------ - -There is other stuff in remote.h that is related, in general, to the -process of interacting with remotes. - -(Daniel Barkalow) diff --git a/third_party/git/Documentation/technical/api-revision-walking.txt b/third_party/git/Documentation/technical/api-revision-walking.txt deleted file mode 100644 index 03f9ea6ac4..0000000000 --- a/third_party/git/Documentation/technical/api-revision-walking.txt +++ /dev/null @@ -1,72 +0,0 @@ -revision walking API -==================== - -The revision walking API offers functions to build a list of revisions -and then iterate over that list. - -Calling sequence ----------------- - -The walking API has a given calling sequence: first you need to -initialize a rev_info structure, then add revisions to control what kind -of revision list do you want to get, finally you can iterate over the -revision list. 
- -Functions ---------- - -`repo_init_revisions`:: - - Initialize a rev_info structure with default values. The third - parameter may be NULL or can be prefix path, and then the `.prefix` - variable will be set to it. This is typically the first function you - want to call when you want to deal with a revision list. After calling - this function, you are free to customize options, like set - `.ignore_merges` to 0 if you don't want to ignore merges, and so on. See - `revision.h` for a complete list of available options. - -`add_pending_object`:: - - This function can be used if you want to add commit objects as revision - information. You can use the `UNINTERESTING` object flag to indicate if - you want to include or exclude the given commit (and commits reachable - from the given commit) from the revision list. -+ -NOTE: If you have the commits as a string list then you probably want to -use setup_revisions(), instead of parsing each string and using this -function. - -`setup_revisions`:: - - Parse revision information, filling in the `rev_info` structure, and - removing the used arguments from the argument list. Returns the number - of arguments left that weren't recognized, which are also moved to the - head of the argument list. The last parameter is used in case no - parameter given by the first two arguments. - -`prepare_revision_walk`:: - - Prepares the rev_info structure for a walk. You should check if it - returns any error (non-zero return code) and if it does not, you can - start using get_revision() to do the iteration. - -`get_revision`:: - - Takes a pointer to a `rev_info` structure and iterates over it, - returning a `struct commit *` each time you call it. The end of the - revision list is indicated by returning a NULL pointer. - -`reset_revision_walk`:: - - Reset the flags used by the revision walking api. You can use - this to do multiple sequential revision walks. 
- -Data structures ---------------- - -Talk about <revision.h>, things like: - -* two diff_options, one for path limiting, another for output; -* remaining functions; - -(Linus, JC, Dscho) diff --git a/third_party/git/Documentation/technical/api-run-command.txt b/third_party/git/Documentation/technical/api-run-command.txt deleted file mode 100644 index 8bf3e37f53..0000000000 --- a/third_party/git/Documentation/technical/api-run-command.txt +++ /dev/null @@ -1,264 +0,0 @@ -run-command API -=============== - -The run-command API offers a versatile tool to run sub-processes with -redirected input and output as well as with a modified environment -and an alternate current directory. - -A similar API offers the capability to run a function asynchronously, -which is primarily used to capture the output that the function -produces in the caller in order to process it. - - -Functions ---------- - -`child_process_init`:: - - Initialize a struct child_process variable. - -`start_command`:: - - Start a sub-process. Takes a pointer to a `struct child_process` - that specifies the details and returns pipe FDs (if requested). - See below for details. - -`finish_command`:: - - Wait for the completion of a sub-process that was started with - start_command(). - -`run_command`:: - - A convenience function that encapsulates a sequence of - start_command() followed by finish_command(). Takes a pointer - to a `struct child_process` that specifies the details. - -`run_command_v_opt`, `run_command_v_opt_cd_env`:: - - Convenience functions that encapsulate a sequence of - start_command() followed by finish_command(). The argument argv - specifies the program and its arguments. The argument opt is zero - or more of the flags `RUN_COMMAND_NO_STDIN`, `RUN_GIT_CMD`, - `RUN_COMMAND_STDOUT_TO_STDERR`, or `RUN_SILENT_EXEC_FAILURE` - that correspond to the members .no_stdin, .git_cmd, - .stdout_to_stderr, .silent_exec_failure of `struct child_process`. 
- The argument dir corresponds the member .dir. The argument env - corresponds to the member .env. - -`child_process_clear`:: - - Release the memory associated with the struct child_process. - Most users of the run-command API don't need to call this - function explicitly because `start_command` invokes it on - failure and `finish_command` calls it automatically already. - -The functions above do the following: - -. If a system call failed, errno is set and -1 is returned. A diagnostic - is printed. - -. If the program was not found, then -1 is returned and errno is set to - ENOENT; a diagnostic is printed only if .silent_exec_failure is 0. - -. Otherwise, the program is run. If it terminates regularly, its exit - code is returned. No diagnostic is printed, even if the exit code is - non-zero. - -. If the program terminated due to a signal, then the return value is the - signal number + 128, ie. the same value that a POSIX shell's $? would - report. A diagnostic is printed. - - -`start_async`:: - - Run a function asynchronously. Takes a pointer to a `struct - async` that specifies the details and returns a set of pipe FDs - for communication with the function. See below for details. - -`finish_async`:: - - Wait for the completion of an asynchronous function that was - started with start_async(). - -`run_hook`:: - - Run a hook. - The first argument is a pathname to an index file, or NULL - if the hook uses the default index file or no index is needed. - The second argument is the name of the hook. - The further arguments correspond to the hook arguments. - The last argument has to be NULL to terminate the arguments list. - If the hook does not exist or is not executable, the return - value will be zero. - If it is executable, the hook will be executed and the exit - status of the hook is returned. - On execution, .stdout_to_stderr and .no_stdin will be set. - (See below.) 
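The return-value convention listed above (-1 with errno on a failed system call, -1 with ENOENT when the program is not found, the exit code on regular termination, signal number + 128 on death by signal) can be sketched with plain POSIX calls. `run_and_wait()` is a hypothetical helper, not Git's implementation; in particular it assumes that a child exit status of 127 means the exec failed, which is a simplification, and it prints no diagnostics.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Git's code): run argv[0] with arguments
 * and map the result onto the convention described in the text.
 */
static int run_and_wait(char *const argv[])
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;	/* fork() failed; errno is already set */

	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached when exec failed */
	}

	/* Retry waitpid() if it is interrupted by a signal. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	if (WIFSIGNALED(status))
		return WTERMSIG(status) + 128;	/* same value as a shell's $? */

	if (WIFEXITED(status) && WEXITSTATUS(status) != 127)
		return WEXITSTATUS(status);	/* regular exit code */

	/* Simplifying assumption: 127 stands for "program not found". */
	errno = ENOENT;
	return -1;
}
```

A caller would pass a NULL-terminated argument vector, for example `{ "sh", "-c", "exit 3", NULL }`, and would get 3 back.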
- - -Data structures ---------------- - -* `struct child_process` - -This describes the arguments, redirections, and environment of a -command to run in a sub-process. - -The caller: - -1. allocates and clears (using child_process_init() or - CHILD_PROCESS_INIT) a struct child_process variable; -2. initializes the members; -3. calls start_command(); -4. processes the data; -5. closes file descriptors (if necessary; see below); -6. calls finish_command(). - -The .argv member is set up as an array of string pointers (NULL -terminated), of which .argv[0] is the program name to run (usually -without a path). If the command to run is a git command, set argv[0] to -the command name without the 'git-' prefix and set .git_cmd = 1. - -Note that the ownership of the memory pointed to by .argv stays with the -caller, but it should survive until `finish_command` completes. If the -.argv member is NULL, `start_command` will point it at the .args -`argv_array` (so you may use one or the other, but you must use exactly -one). The memory in .args will be cleaned up automatically during -`finish_command` (or during `start_command` when it is unsuccessful). - -The members .in, .out, .err are used to redirect stdin, stdout, -stderr as follows: - -. Specify 0 to request no special redirection. No new file descriptor - is allocated. The child process simply inherits the channel from the - parent. - -. Specify -1 to have a pipe allocated; start_command() replaces -1 - by the pipe FD in the following way: - - .in: Returns the writable pipe end into which the caller writes; - the readable end of the pipe becomes the child's stdin. - - .out, .err: Returns the readable pipe end from which the caller - reads; the writable end of the pipe end becomes child's - stdout/stderr. - - The caller of start_command() must close the so returned FDs - after it has completed reading from/writing to it! - -. 
Specify a file descriptor > 0 to be used by the child: - - .in: The FD must be readable; it becomes child's stdin. - .out: The FD must be writable; it becomes child's stdout. - .err: The FD must be writable; it becomes child's stderr. - - The specified FD is closed by start_command(), even if it fails to - run the sub-process! - -. Special forms of redirection are available by setting these members - to 1: - - .no_stdin, .no_stdout, .no_stderr: The respective channel is - redirected to /dev/null. - - .stdout_to_stderr: stdout of the child is redirected to its - stderr. This happens after stderr is itself redirected. - So stdout will follow stderr to wherever it is - redirected. - -To modify the environment of the sub-process, specify an array of -string pointers (NULL terminated) in .env: - -. If the string is of the form "VAR=value", i.e. it contains '=' - the variable is added to the child process's environment. - -. If the string does not contain '=', it names an environment - variable that will be removed from the child process's environment. - -If the .env member is NULL, `start_command` will point it at the -.env_array `argv_array` (so you may use one or the other, but not both). -The memory in .env_array will be cleaned up automatically during -`finish_command` (or during `start_command` when it is unsuccessful). - -To specify a new initial working directory for the sub-process, -specify it in the .dir member. - -If the program cannot be found, the functions return -1 and set -errno to ENOENT. Normally, an error message is printed, but if -.silent_exec_failure is set to 1, no message is printed for this -special error condition. - - -* `struct async` - -This describes a function to run asynchronously, whose purpose is -to produce output that the caller reads. - -The caller: - -1. allocates and clears (memset(&asy, 0, sizeof(asy));) a - struct async variable; -2. initializes .proc and .data; -3. calls start_async(); -4. 
communicates with proc through .in and .out; -5. closes .in and .out; -6. calls finish_async(). - -The members .in, .out are used to provide a set of fd's for -communication between the caller and the callee as follows: - -. Specify 0 to have no file descriptor passed. The callee will - receive -1 in the corresponding argument. - -. Specify < 0 to have a pipe allocated; start_async() replaces - it with the pipe FD in the following way: - - .in: Returns the writable pipe end into which the caller - writes; the readable end of the pipe becomes the function's - in argument. - - .out: Returns the readable pipe end from which the caller - reads; the writable end of the pipe becomes the function's - out argument. - - The caller of start_async() must close the returned FDs after it - has completed reading from/writing to them. - -. Specify a file descriptor > 0 to be used by the function: - - .in: The FD must be readable; it becomes the function's in. - .out: The FD must be writable; it becomes the function's out. - - The specified FD is closed by start_async(), even if it fails to - run the function. - -The function pointer in .proc has the following signature: - - int proc(int in, int out, void *data); - -. in, out specify the file descriptors from/to which the function - must read/write the data that it needs/produces. The function - *must* close these descriptors before it returns. A descriptor - may be -1 if the caller did not configure a descriptor for that - direction. - -. data is the value that the caller has specified in the .data member - of struct async. - -. The return value of the function is 0 on success and non-zero - on failure. If the function indicates failure, finish_async() will - report failure as well.
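The caller/callee contract described above can be illustrated with a fork-and-pipe sketch, which corresponds to the fallback case where no threads are used. Everything here is hypothetical and written for this note: `struct async_sketch`, `start_async_sketch()`, `finish_async_sketch()` and `say_hello()` are not Git's names, only the .out direction is wired up, and the callee always receives -1 for `in`, as the text says happens when no descriptor is configured.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*async_proc)(int in, int out, void *data);

/* Hypothetical, cut-down stand-in for struct async. */
struct async_sketch {
	async_proc proc;	/* function to run */
	void *data;		/* passed through to proc */
	int out;		/* filled in: readable pipe end for the caller */
	pid_t pid;		/* filled in: child running proc */
};

static int start_async_sketch(struct async_sketch *as)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;

	as->pid = fork();
	if (as->pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (as->pid == 0) {
		/* Callee: keep only the writable end, pass -1 for "in". */
		close(fds[0]);
		_exit(as->proc(-1, fds[1], as->data) ? 1 : 0);
	}

	/* Caller: keep only the readable end. */
	close(fds[1]);
	as->out = fds[0];
	return 0;
}

/* Returns 0 if proc reported success, -1 otherwise. */
static int finish_async_sketch(struct async_sketch *as)
{
	int status;

	while (waitpid(as->pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Example callee: writes its data string to "out" and closes it. */
static int say_hello(int in, int out, void *data)
{
	const char *msg = data;
	size_t len = strlen(msg);
	int ret = write(out, msg, len) == (ssize_t)len ? 0 : -1;

	(void)in;	/* no input descriptor was configured */
	close(out);	/* the function must close its descriptors */
	return ret;
}
```

The caller sets `.proc` and `.data`, calls `start_async_sketch()`, reads from `.out`, closes it, and then calls `finish_async_sketch()`, mirroring the six-step sequence listed for `struct async`.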
- - -There are serious restrictions on what the asynchronous function can do -because this facility is implemented by a thread in the same address -space on most platforms (when pthreads is available), but by a pipe to -a forked process otherwise: - -. It cannot change the program's state (global variables, environment, - etc.) in a way that the caller notices; in other words, .in and .out - are the only communication channels to the caller. - -. It must not change the program's state that the caller of the - facility also uses. diff --git a/third_party/git/Documentation/technical/api-setup.txt b/third_party/git/Documentation/technical/api-setup.txt deleted file mode 100644 index eb1fa9853e..0000000000 --- a/third_party/git/Documentation/technical/api-setup.txt +++ /dev/null @@ -1,47 +0,0 @@ -setup API -========= - -Talk about - -* setup_git_directory() -* setup_git_directory_gently() -* is_inside_git_dir() -* is_inside_work_tree() -* setup_work_tree() - -(Dscho) - -Pathspec --------- - -See glossary-context.txt for the syntax of pathspec. In memory, a -pathspec set is represented by "struct pathspec" and is prepared by -parse_pathspec(). This function takes several arguments: - -- magic_mask specifies what features that are NOT supported by the - following code. If a user attempts to use such a feature, - parse_pathspec() can reject it early. - -- flags specifies other things that the caller wants parse_pathspec to - perform. - -- prefix and args come from cmd_* functions - -parse_pathspec() helps catch unsupported features and reject them -politely. At a lower level, different pathspec-related functions may -not support the same set of features. Such pathspec-sensitive -functions are guarded with GUARD_PATHSPEC(), which will die in an -unfriendly way when an unsupported feature is requested. - -The command designers are supposed to make sure that GUARD_PATHSPEC() -never dies. 
They have to make sure all unsupported features are caught -by parse_pathspec(), not by GUARD_PATHSPEC. grepping GUARD_PATHSPEC() -should give the designers all pathspec-sensitive codepaths and what -features they support. - -A similar process is applied when a new pathspec magic is added. The -designer lifts the GUARD_PATHSPEC restriction in the functions that -support the new magic. At the same time (s)he has to make sure this -new feature will be caught at parse_pathspec() in commands that cannot -handle the new magic in some cases. grepping parse_pathspec() should -help. diff --git a/third_party/git/Documentation/technical/api-sigchain.txt b/third_party/git/Documentation/technical/api-sigchain.txt deleted file mode 100644 index 9e1189ef01..0000000000 --- a/third_party/git/Documentation/technical/api-sigchain.txt +++ /dev/null @@ -1,41 +0,0 @@ -sigchain API -============ - -Code often wants to set a signal handler to clean up temporary files or -other work-in-progress when we die unexpectedly. For multiple pieces of -code to do this without conflicting, each piece of code must remember -the old value of the handler and restore it either when: - - 1. The work-in-progress is finished, and the handler is no longer - necessary. The handler should revert to the original behavior - (either another handler, SIG_DFL, or SIG_IGN). - - 2. The signal is received. We should then do our cleanup, then chain - to the next handler (or die if it is SIG_DFL). - -Sigchain is a tiny library for keeping a stack of handlers. Your handler -and installation code should look something like: - ------------------------------------------- - void clean_foo_on_signal(int sig) - { - clean_foo(); - sigchain_pop(sig); - raise(sig); - } - - void other_func() - { - sigchain_push_common(clean_foo_on_signal); - mess_up_foo(); - clean_foo(); - } ------------------------------------------- - -Handlers are given the typedef of sigchain_fun. 
This is the same type -that is given to signal() or sigaction(). It is perfectly reasonable to -push SIG_DFL or SIG_IGN onto the stack. - -You can sigchain_push and sigchain_pop individual signals. For -convenience, sigchain_push_common will push the handler onto the stack -for many common signals. diff --git a/third_party/git/Documentation/technical/api-submodule-config.txt b/third_party/git/Documentation/technical/api-submodule-config.txt deleted file mode 100644 index fb06089393..0000000000 --- a/third_party/git/Documentation/technical/api-submodule-config.txt +++ /dev/null @@ -1,66 +0,0 @@ -submodule config cache API -========================== - -The submodule config cache API allows to read submodule -configurations/information from specified revisions. Internally -information is lazily read into a cache that is used to avoid -unnecessary parsing of the same .gitmodules files. Lookups can be done by -submodule path or name. - -Usage ------ - -To initialize the cache with configurations from the worktree the caller -typically first calls `gitmodules_config()` to read values from the -worktree .gitmodules and then to overlay the local git config values -`parse_submodule_config_option()` from the config parsing -infrastructure. - -The caller can look up information about submodules by using the -`submodule_from_path()` or `submodule_from_name()` functions. They return -a `struct submodule` which contains the values. The API automatically -initializes and allocates the needed infrastructure on-demand. If the -caller does only want to lookup values from revisions the initialization -can be skipped. - -If the internal cache might grow too big or when the caller is done with -the API, all internally cached values can be freed with submodule_free(). - -Data Structures ---------------- - -`struct submodule`:: - - This structure is used to return the information about one - submodule for a certain revision. It is returned by the lookup - functions. 
- -Functions ---------- - -`void submodule_free(struct repository *r)`:: - - Use these to free the internally cached values. - -`int parse_submodule_config_option(const char *var, const char *value)`:: - - Can be passed to the config parsing infrastructure to parse - local (worktree) submodule configurations. - -`const struct submodule *submodule_from_path(const unsigned char *treeish_name, const char *path)`:: - - Given a tree-ish in the superproject and a path, return the - submodule that is bound at the path in the named tree. - -`const struct submodule *submodule_from_name(const unsigned char *treeish_name, const char *name)`:: - - The same as above but lookup by name. - -Whenever a submodule configuration is parsed in `parse_submodule_config_option` -via e.g. `gitmodules_config()`, it will overwrite the null_sha1 entry. -So in the normal case, when HEAD:.gitmodules is parsed first and then overlayed -with the repository configuration, the null_sha1 entry contains the local -configuration of a submodule (e.g. consolidated values from local git -configuration and the .gitmodules file in the worktree). - -For an example usage see test-submodule-config.c. diff --git a/third_party/git/Documentation/technical/api-trace.txt b/third_party/git/Documentation/technical/api-trace.txt deleted file mode 100644 index fadb5979c4..0000000000 --- a/third_party/git/Documentation/technical/api-trace.txt +++ /dev/null @@ -1,140 +0,0 @@ -trace API -========= - -The trace API can be used to print debug messages to stderr or a file. Trace -code is inactive unless explicitly enabled by setting `GIT_TRACE*` environment -variables. - -The trace implementation automatically adds `timestamp file:line ... \n` to -all trace messages. E.g.: - ------------- -23:59:59.123456 git.c:312 trace: built-in: git 'foo' -00:00:00.000001 builtin/foo.c:99 foo: some message ------------- - -Data Structures ---------------- - -`struct trace_key`:: - - Defines a trace key (or category). 
The default (for API functions that - don't take a key) is `GIT_TRACE`. -+ -E.g. to define a trace key controlled by environment variable `GIT_TRACE_FOO`: -+ ------------- -static struct trace_key trace_foo = TRACE_KEY_INIT(FOO); - -static void trace_print_foo(const char *message) -{ - trace_printf_key(&trace_foo, "%s", message); -} ------------- -+ -Note: don't use `const` as the trace implementation stores internal state in -the `trace_key` structure. - -Functions ---------- - -`int trace_want(struct trace_key *key)`:: - - Checks whether the trace key is enabled. Used to prevent expensive - string formatting before calling one of the printing APIs. - -`void trace_disable(struct trace_key *key)`:: - - Disables tracing for the specified key, even if the environment - variable was set. - -`void trace_printf(const char *format, ...)`:: -`void trace_printf_key(struct trace_key *key, const char *format, ...)`:: - - Prints a formatted message, similar to printf. - -`void trace_argv_printf(const char **argv, const char *format, ...)``:: - - Prints a formatted message, followed by a quoted list of arguments. - -`void trace_strbuf(struct trace_key *key, const struct strbuf *data)`:: - - Prints the strbuf, without additional formatting (i.e. doesn't - choke on `%` or even `\0`). - -`uint64_t getnanotime(void)`:: - - Returns nanoseconds since the epoch (01/01/1970), typically used - for performance measurements. -+ -Currently there are high precision timer implementations for Linux (using -`clock_gettime(CLOCK_MONOTONIC)`) and Windows (`QueryPerformanceCounter`). -Other platforms use `gettimeofday` as time source. - -`void trace_performance(uint64_t nanos, const char *format, ...)`:: -`void trace_performance_since(uint64_t start, const char *format, ...)`:: - - Prints the elapsed time (in nanoseconds), or elapsed time since - `start`, followed by a formatted message. Enabled via environment - variable `GIT_TRACE_PERFORMANCE`. 
Used for manual profiling, e.g.: -+ ------------- -uint64_t start = getnanotime(); -/* code section to measure */ -trace_performance_since(start, "foobar"); ------------- -+ ------------- -uint64_t t = 0; -for (;;) { - /* ignore */ - t -= getnanotime(); - /* code section to measure */ - t += getnanotime(); - /* ignore */ -} -trace_performance(t, "frotz"); ------------- - -Bugs & Caveats --------------- - -GIT_TRACE_* environment variables can be used to tell Git to show -trace output to its standard error stream. Git can often spawn a pager -internally to run its subcommand and send its standard output and -standard error to it. - -Because GIT_TRACE_PERFORMANCE trace is generated only at the very end -of the program with atexit(), which happens after the pager exits, it -would not work well if you send its log to the standard error output -and let Git spawn the pager at the same time. - -As a work around, you can for example use '--no-pager', or set -GIT_TRACE_PERFORMANCE to another file descriptor which is redirected -to stderr, or set GIT_TRACE_PERFORMANCE to a file specified by its -absolute path. - -For example instead of the following command which by default may not -print any performance information: - ------------- -GIT_TRACE_PERFORMANCE=2 git log -1 ------------- - -you may want to use: - ------------- -GIT_TRACE_PERFORMANCE=2 git --no-pager log -1 ------------- - -or: - ------------- -GIT_TRACE_PERFORMANCE=3 3>&2 git log -1 ------------- - -or: - ------------- -GIT_TRACE_PERFORMANCE=/path/to/log/file git log -1 ------------- diff --git a/third_party/git/Documentation/technical/api-trace2.txt b/third_party/git/Documentation/technical/api-trace2.txt deleted file mode 100644 index 71eb081fed..0000000000 --- a/third_party/git/Documentation/technical/api-trace2.txt +++ /dev/null @@ -1,1378 +0,0 @@ -= Trace2 API - -The Trace2 API can be used to print debug, performance, and telemetry -information to stderr or a file. 
The Trace2 feature is inactive unless -explicitly enabled by enabling one or more Trace2 Targets. - -The Trace2 API is intended to replace the existing (Trace1) -printf-style tracing provided by the existing `GIT_TRACE` and -`GIT_TRACE_PERFORMANCE` facilities. During initial implementation, -Trace2 and Trace1 may operate in parallel. - -The Trace2 API defines a set of high-level messages with known fields, -such as (`start`: `argv`) and (`exit`: {`exit-code`, `elapsed-time`}). - -Trace2 instrumentation throughout the Git code base sends Trace2 -messages to the enabled Trace2 Targets. Targets transform these -messages content into purpose-specific formats and write events to -their data streams. In this manner, the Trace2 API can drive -many different types of analysis. - -Targets are defined using a VTable allowing easy extension to other -formats in the future. This might be used to define a binary format, -for example. - -Trace2 is controlled using `trace2.*` config values in the system and -global config files and `GIT_TRACE2*` environment variables. Trace2 does -not read from repo local or worktree config files or respect `-c` -command line config settings. - -== Trace2 Targets - -Trace2 defines the following set of Trace2 Targets. -Format details are given in a later section. - -=== The Normal Format Target - -The normal format target is a tradition printf format and similar -to GIT_TRACE format. This format is enabled with the `GIT_TRACE2` -environment variable or the `trace2.normalTarget` system or global -config setting. 
- -For example - ------------- -$ export GIT_TRACE2=~/log.normal -$ git version -git version 2.20.1.155.g426c96fcdb ------------- - -or - ------------- -$ git config --global trace2.normalTarget ~/log.normal -$ git version -git version 2.20.1.155.g426c96fcdb ------------- - -yields - ------------- -$ cat ~/log.normal -12:28:42.620009 common-main.c:38 version 2.20.1.155.g426c96fcdb -12:28:42.620989 common-main.c:39 start git version -12:28:42.621101 git.c:432 cmd_name version (version) -12:28:42.621215 git.c:662 exit elapsed:0.001227 code:0 -12:28:42.621250 trace2/tr2_tgt_normal.c:124 atexit elapsed:0.001265 code:0 ------------- - -=== The Performance Format Target - -The performance format target (PERF) is a column-based format to -replace GIT_TRACE_PERFORMANCE and is suitable for development and -testing, possibly to complement tools like gprof. This format is -enabled with the `GIT_TRACE2_PERF` environment variable or the -`trace2.perfTarget` system or global config setting. - -For example - ------------- -$ export GIT_TRACE2_PERF=~/log.perf -$ git version -git version 2.20.1.155.g426c96fcdb ------------- - -or - ------------- -$ git config --global trace2.perfTarget ~/log.perf -$ git version -git version 2.20.1.155.g426c96fcdb ------------- - -yields - ------------- -$ cat ~/log.perf -12:28:42.620675 common-main.c:38 | d0 | main | version | | | | | 2.20.1.155.g426c96fcdb -12:28:42.621001 common-main.c:39 | d0 | main | start | | 0.001173 | | | git version -12:28:42.621111 git.c:432 | d0 | main | cmd_name | | | | | version (version) -12:28:42.621225 git.c:662 | d0 | main | exit | | 0.001227 | | | code:0 -12:28:42.621259 trace2/tr2_tgt_perf.c:211 | d0 | main | atexit | | 0.001265 | | | code:0 ------------- - -=== The Event Format Target - -The event format target is a JSON-based format of event data suitable -for telemetry analysis. This format is enabled with the `GIT_TRACE2_EVENT` -environment variable or the `trace2.eventTarget` system or global config -setting. 
- -For example - ------------- -$ export GIT_TRACE2_EVENT=~/log.event -$ git version -git version 2.20.1.155.g426c96fcdb ------------- - -or - ------------- -$ git config --global trace2.eventTarget ~/log.event -$ git version -git version 2.20.1.155.g426c96fcdb ------------- - -yields - ------------- -$ cat ~/log.event -{"event":"version","sid":"sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.620713Z","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"} -{"event":"start","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621027Z","file":"common-main.c","line":39,"t_abs":0.001173,"argv":["git","version"]} -{"event":"cmd_name","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621122Z","file":"git.c","line":432,"name":"version","hierarchy":"version"} -{"event":"exit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621236Z","file":"git.c","line":662,"t_abs":0.001227,"code":0} -{"event":"atexit","sid":"20190408T191610.507018Z-H9b68c35f-P000059a8","thread":"main","time":"2019-01-16T17:28:42.621268Z","file":"trace2/tr2_tgt_event.c","line":163,"t_abs":0.001265,"code":0} ------------- - -=== Enabling a Target - -To enable a target, set the corresponding environment variable or -system or global config value to one of the following: - -include::../trace2-target-values.txt[] - -If the target already exists and is a directory, the traces will be -written to files (one per process) underneath the given directory. They -will be named according to the last component of the SID (optionally -followed by a counter to avoid filename collisions). - -== Trace2 API - -All public Trace2 functions and macros are defined in `trace2.h` and -`trace2.c`. All public symbols are prefixed with `trace2_`. - -There are no public Trace2 data structures. 
- -The Trace2 code also defines a set of private functions and data types -in the `trace2/` directory. These symbols are prefixed with `tr2_` -and should only be used by functions in `trace2.c`. - -== Conventions for Public Functions and Macros - -The functions defined by the Trace2 API are declared and documented -in `trace2.h`. It defines the API functions and wrapper macros for -Trace2. - -Some functions have a `_fl()` suffix to indicate that they take `file` -and `line-number` arguments. - -Some functions have a `_va_fl()` suffix to indicate that they also -take a `va_list` argument. - -Some functions have a `_printf_fl()` suffix to indicate that they also -take a varargs argument. - -There are CPP wrapper macros and ifdefs to hide most of these details. -See `trace2.h` for more details. The following discussion will only -describe the simplified forms. - -== Public API - -All Trace2 API functions send a messsage to all of the active -Trace2 Targets. This section describes the set of available -messages. - -It helps to divide these functions into groups for discussion -purposes. - -=== Basic Command Messages - -These are concerned with the lifetime of the overall git process. - -`void trace2_initialize_clock()`:: - - Initialize the Trace2 start clock and nothing else. This should - be called at the very top of main() to capture the process start - time and reduce startup order dependencies. - -`void trace2_initialize()`:: - - Determines if any Trace2 Targets should be enabled and - initializes the Trace2 facility. This includes setting up the - Trace2 thread local storage (TLS). -+ -This function emits a "version" message containing the version of git -and the Trace2 protocol. -+ -This function should be called from `main()` as early as possible in -the life of the process after essential process initialization. - -`int trace2_is_enabled()`:: - - Returns 1 if Trace2 is enabled (at least one target is - active). 
- -`void trace2_cmd_start(int argc, const char **argv)`:: - - Emits a "start" message containing the process command line - arguments. - -`int trace2_cmd_exit(int exit_code)`:: - - Emits an "exit" message containing the process exit-code and - elapsed time. -+ -Returns the exit-code. - -`void trace2_cmd_error(const char *fmt, va_list ap)`:: - - Emits an "error" message containing a formatted error message. - -`void trace2_cmd_path(const char *pathname)`:: - - Emits a "cmd_path" message with the full pathname of the - current process. - -=== Command Detail Messages - -These are concerned with describing the specific Git command -after the command line, config, and environment are inspected. - -`void trace2_cmd_name(const char *name)`:: - - Emits a "cmd_name" message with the canonical name of the - command, for example "status" or "checkout". - -`void trace2_cmd_mode(const char *mode)`:: - - Emits a "cmd_mode" message with a qualifier name to further - describe the current git command. -+ -This message is intended to be used with git commands having multiple -major modes. For example, a "checkout" command can checkout a new -branch or it can checkout a single file, so the checkout code could -emit a cmd_mode message of "branch" or "file". - -`void trace2_cmd_alias(const char *alias, const char **argv_expansion)`:: - - Emits an "alias" message containing the alias used and the - argument expansion. - -`void trace2_def_param(const char *parameter, const char *value)`:: - - Emits a "def_param" message containing a key/value pair. -+ -This message is intended to report some global aspect of the current -command, such as a configuration setting or command line switch that -significantly affects program performance or behavior, such as -`core.abbrev`, `status.showUntrackedFiles`, or `--no-ahead-behind`. - -`void trace2_cmd_list_config()`:: - - Emits a "def_param" messages for "important" configuration - settings. 
-+ -The environment variable `GIT_TRACE2_CONFIG_PARAMS` or the `trace2.configParams` -config value can be set to a -list of patterns of important configuration settings, for example: -`core.*,remote.*.url`. This function will iterate over all config -settings and emit a "def_param" message for each match. - -`void trace2_cmd_set_config(const char *key, const char *value)`:: - - Emits a "def_param" message for a new or updated key/value - pair IF `key` is considered important. -+ -This is used to hook into `git_config_set()` and catch any -configuration changes and update a value previously reported by -`trace2_cmd_list_config()`. - -`void trace2_def_repo(struct repository *repo)`:: - - Registers a repository with the Trace2 layer. Assigns a - unique "repo-id" to `repo->trace2_repo_id`. -+ -Emits a "worktree" messages containing the repo-id and the worktree -pathname. -+ -Region and data messages (described later) may refer to this repo-id. -+ -The main/top-level repository will have repo-id value 1 (aka "r1"). -+ -The repo-id field is in anticipation of future in-proc submodule -repositories. - -=== Child Process Messages - -These are concerned with the various spawned child processes, -including shell scripts, git commands, editors, pagers, and hooks. - -`void trace2_child_start(struct child_process *cmd)`:: - - Emits a "child_start" message containing the "child-id", - "child-argv", and "child-classification". -+ -Before calling this, set `cmd->trace2_child_class` to a name -describing the type of child process, for example "editor". -+ -This function assigns a unique "child-id" to `cmd->trace2_child_id`. -This field is used later during the "child_exit" message to associate -it with the "child_start" message. -+ -This function should be called before spawning the child process. - -`void trace2_child_exit(struct child_proess *cmd, int child_exit_code)`:: - - Emits a "child_exit" message containing the "child-id", - the child's elapsed time and exit-code. 
-+ -The reported elapsed time includes the process creation overhead and -time spend waiting for it to exit, so it may be slightly longer than -the time reported by the child itself. -+ -This function should be called after reaping the child process. - -`int trace2_exec(const char *exe, const char **argv)`:: - - Emits a "exec" message containing the "exec-id" and the - argv of the new process. -+ -This function should be called before calling one of the `exec()` -variants, such as `execvp()`. -+ -This function returns a unique "exec-id". This value is used later -if the exec() fails and a "exec-result" message is necessary. - -`void trace2_exec_result(int exec_id, int error_code)`:: - - Emits a "exec_result" message containing the "exec-id" - and the error code. -+ -On Unix-based systems, `exec()` does not return if successful. -This message is used to indicate that the `exec()` failed and -that the current program is continuing. - -=== Git Thread Messages - -These messages are concerned with Git thread usage. - -`void trace2_thread_start(const char *thread_name)`:: - - Emits a "thread_start" message. -+ -The `thread_name` field should be a descriptive name, such as the -unique name of the thread-proc. A unique "thread-id" will be added -to the name to uniquely identify thread instances. -+ -Region and data messages (described later) may refer to this thread -name. -+ -This function must be called by the thread-proc of the new thread -(so that TLS data is properly initialized) and not by the caller -of `pthread_create()`. - -`void trace2_thread_exit()`:: - - Emits a "thread_exit" message containing the thread name - and the thread elapsed time. -+ -This function must be called by the thread-proc before it returns -(so that the coorect TLS data is used and cleaned up. It should -not be called by the caller of `pthread_join()`. - -=== Region and Data Messages - -These are concerned with recording performance data -over regions or spans of code. 
- -`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`:: - -`void trace2_region_enter_printf(const char *category, const char *label, const struct repository *repo, const char *fmt, ...)`:: - -`void trace2_region_enter_printf_va(const char *category, const char *label, const struct repository *repo, const char *fmt, va_list ap)`:: - - Emits a thread-relative "region_enter" message with optional - printf string. -+ -This function pushes a new region nesting stack level on the current -thread and starts a clock for the new stack frame. -+ -The `category` field is an arbitrary category name used to classify -regions by feature area, such as "status" or "index". At this time -it is only just printed along with the rest of the message. It may -be used in the future to filter messages. -+ -The `label` field is an arbitrary label used to describe the activity -being started, such as "read_recursive" or "do_read_index". -+ -The `repo` field, if set, will be used to get the "repo-id", so that -recursive oerations can be attributed to the correct repository. - -`void trace2_region_leave(const char *category, const char *label, const struct repository *repo)`:: - -`void trace2_region_leave_printf(const char *category, const char *label, const struct repository *repo, const char *fmt, ...)`:: - -`void trace2_region_leave_printf_va(const char *category, const char *label, const struct repository *repo, const char *fmt, va_list ap)`:: - - Emits a thread-relative "region_leave" message with optional - printf string. -+ -This function pops the region nesting stack on the current thread -and reports the elapsed time of the stack frame. -+ -The `category`, `label`, and `repo` fields are the same as above. -The `category` and `label` do not need to match the correpsonding -"region_enter" message, but it makes the data stream easier to -understand. 
- -`void trace2_data_string(const char *category, const struct repository *repo, const char *key, const char * value)`:: - -`void trace2_data_intmax(const char *category, const struct repository *repo, const char *key, intmax value)`:: - -`void trace2_data_json(const char *category, const struct repository *repo, const char *key, const struct json_writer *jw)`:: - - Emits a region- and thread-relative "data" or "data_json" message. -+ -This is a key/value pair message containing information about the -current thread, region stack, and repository. This could be used -to print the number of files in a directory during a multi-threaded -recursive tree walk. - -`void trace2_printf(const char *fmt, ...)`:: - -`void trace2_printf_va(const char *fmt, va_list ap)`:: - - Emits a region- and thread-relative "printf" message. - -== Trace2 Target Formats - -=== NORMAL Format - -Events are written as lines of the form: - ------------- -[<time> SP <filename>:<line> SP+] <event-name> [[SP] <event-message>] LF ------------- - -`<event-name>`:: - - is the event name. - -`<event-message>`:: - is a free-form printf message intended for human consumption. -+ -Note that this may contain embedded LF or CRLF characters that are -not escaped, so the event may spill across multiple lines. - -If `GIT_TRACE2_BRIEF` or `trace2.normalBrief` is true, the `time`, `filename`, -and `line` fields are omitted. - -This target is intended to be more of a summary (like GIT_TRACE) and -less detailed than the other targets. It ignores thread, region, and -data messages, for example. - -=== PERF Format - -Events are written as lines of the form: - ------------- -[<time> SP <filename>:<line> SP+ - BAR SP] d<depth> SP - BAR SP <thread-name> SP+ - BAR SP <event-name> SP+ - BAR SP [r<repo-id>] SP+ - BAR SP [<t_abs>] SP+ - BAR SP [<t_rel>] SP+ - BAR SP [<category>] SP+ - BAR SP DOTS* <perf-event-message> - LF ------------- - -`<depth>`:: - is the git process depth. This is the number of parent - git processes. 
A top-level git command has depth value "d0". - A child of it has depth value "d1". A second level child - has depth value "d2" and so on. - -`<thread-name>`:: - is a unique name for the thread. The primary thread - is called "main". Other thread names are of the form "th%d:%s" - and include a unique number and the name of the thread-proc. - -`<event-name>`:: - is the event name. - -`<repo-id>`:: - when present, is a number indicating the repository - in use. A `def_repo` event is emitted when a repository is - opened. This defines the repo-id and associated worktree. - Subsequent repo-specific events will reference this repo-id. -+ -Currently, this is always "r1" for the main repository. -This field is in anticipation of in-proc submodules in the future. - -`<t_abs>`:: - when present, is the absolute time in seconds since the - program started. - -`<t_rel>`:: - when present, is time in seconds relative to the start of - the current region. For a thread-exit event, it is the elapsed - time of the thread. - -`<category>`:: - is present on region and data events and is used to - indicate a broad category, such as "index" or "status". - -`<perf-event-message>`:: - is a free-form printf message intended for human consumption. - ------------- -15:33:33.532712 wt-status.c:2310 | d0 | main | region_enter | r1 | 0.126064 | | status | label:print -15:33:33.532712 wt-status.c:2331 | d0 | main | region_leave | r1 | 0.127568 | 0.001504 | status | label:print ------------- - -If `GIT_TRACE2_PERF_BRIEF` or `trace2.perfBrief` is true, the `time`, `file`, -and `line` fields are omitted. - ------------- -d0 | main | region_leave | r1 | 0.011717 | 0.009122 | index | label:preload ------------- - -The PERF target is intended for interactive performance analysis -during development and is quite noisy. - -=== EVENT Format - -Each event is a JSON-object containing multiple key/value pairs -written as a single line and followed by a LF. 
- ------------- -'{' <key> ':' <value> [',' <key> ':' <value>]* '}' LF ------------- - -Some key/value pairs are common to all events and some are -event-specific. - -==== Common Key/Value Pairs - -The following key/value pairs are common to all events: - ------------- -{ - "event":"version", - "sid":"20190408T191827.272759Z-H9b68c35f-P00003510", - "thread":"main", - "time":"2019-04-08T19:18:27.282761Z", - "file":"common-main.c", - "line":42, - ... -} ------------- - -`"event":<event>`:: - is the event name. - -`"sid":<sid>`:: - is the session-id. This is a unique string to identify the - process instance to allow all events emitted by a process to - be identified. A session-id is used instead of a PID because - PIDs are recycled by the OS. For child git processes, the - session-id is prepended with the session-id of the parent git - process to allow parent-child relationships to be identified - during post-processing. - -`"thread":<thread>`:: - is the thread name. - -`"time":<time>`:: - is the UTC time of the event. - -`"file":<filename>`:: - is source file generating the event. - -`"line":<line-number>`:: - is the integer source line number generating the event. - -`"repo":<repo-id>`:: - when present, is the integer repo-id as described previously. - -If `GIT_TRACE2_EVENT_BRIEF` or `trace2.eventBrief` is true, the `file` -and `line` fields are omitted from all events and the `time` field is -only present on the "start" and "atexit" events. - -==== Event-Specific Key/Value Pairs - -`"version"`:: - This event gives the version of the executable and the EVENT format. -+ ------------- -{ - "event":"version", - ... - "evt":"1", # EVENT format version - "exe":"2.20.1.155.g426c96fcdb" # git version -} ------------- - -`"start"`:: - This event contains the complete argv received by main(). -+ ------------- -{ - "event":"start", - ... 
- "t_abs":0.001227, # elapsed time in seconds - "argv":["git","version"] -} ------------- - -`"exit"`:: - This event is emitted when git calls `exit()`. -+ ------------- -{ - "event":"exit", - ... - "t_abs":0.001227, # elapsed time in seconds - "code":0 # exit code -} ------------- - -`"atexit"`:: - This event is emitted by the Trace2 `atexit` routine during - final shutdown. It should be the last event emitted by the - process. -+ -(The elapsed time reported here is greater than the time reported in -the "exit" event because it runs after all other atexit tasks have -completed.) -+ ------------- -{ - "event":"atexit", - ... - "t_abs":0.001227, # elapsed time in seconds - "code":0 # exit code -} ------------- - -`"signal"`:: - This event is emitted when the program is terminated by a user - signal. Depending on the platform, the signal event may - prevent the "atexit" event from being generated. -+ ------------- -{ - "event":"signal", - ... - "t_abs":0.001227, # elapsed time in seconds - "signo":13 # SIGTERM, SIGINT, etc. -} ------------- - -`"error"`:: - This event is emitted when one of the `error()`, `die()`, - or `usage()` functions are called. -+ ------------- -{ - "event":"error", - ... - "msg":"invalid option: --cahced", # formatted error message - "fmt":"invalid option: %s" # error format string -} ------------- -+ -The error event may be emitted more than once. The format string -allows post-processors to group errors by type without worrying -about specific error arguments. - -`"cmd_path"`:: - This event contains the discovered full path of the git - executable (on platforms that are configured to resolve it). -+ ------------- -{ - "event":"cmd_path", - ... - "path":"C:/work/gfw/git.exe" -} ------------- - -`"cmd_name"`:: - This event contains the command name for this git process - and the hierarchy of commands from parent git processes. -+ ------------- -{ - "event":"cmd_name", - ... 
- "name":"pack-objects", - "hierarchy":"push/pack-objects" -} ------------- -+ -Normally, the "name" field contains the canonical name of the -command. When a canonical name is not available, one of -these special values are used: -+ ------------- -"_query_" # "git --html-path" -"_run_dashed_" # when "git foo" tries to run "git-foo" -"_run_shell_alias_" # alias expansion to a shell command -"_run_git_alias_" # alias expansion to a git command -"_usage_" # usage error ------------- - -`"cmd_mode"`:: - This event, when present, describes the command variant This - event may be emitted more than once. -+ ------------- -{ - "event":"cmd_mode", - ... - "name":"branch" -} ------------- -+ -The "name" field is an arbitrary string to describe the command mode. -For example, checkout can checkout a branch or an individual file. -And these variations typically have different performance -characteristics that are not comparable. - -`"alias"`:: - This event is present when an alias is expanded. -+ ------------- -{ - "event":"alias", - ... - "alias":"l", # registered alias - "argv":["log","--graph"] # alias expansion -} ------------- - -`"child_start"`:: - This event describes a child process that is about to be - spawned. -+ ------------- -{ - "event":"child_start", - ... - "child_id":2, - "child_class":"?", - "use_shell":false, - "argv":["git","rev-list","--objects","--stdin","--not","--all","--quiet"] - - "hook_name":"<hook_name>" # present when child_class is "hook" - "cd":"<path>" # present when cd is required -} ------------- -+ -The "child_id" field can be used to match this child_start with the -corresponding child_exit event. -+ -The "child_class" field is a rough classification, such as "editor", -"pager", "transport/*", and "hook". Unclassified children are classified -with "?". - -`"child_exit"`:: - This event is generated after the current process has returned - from the waitpid() and collected the exit information from the - child. 
-+ ------------- -{ - "event":"child_exit", - ... - "child_id":2, - "pid":14708, # child PID - "code":0, # child exit-code - "t_rel":0.110605 # observed run-time of child process -} ------------- -+ -Note that the session-id of the child process is not available to -the current/spawning process, so the child's PID is reported here as -a hint for post-processing. (But it is only a hint because the child -process may be a shell script which doesn't have a session-id.) -+ -Note that the `t_rel` field contains the observed run time in seconds -for the child process (starting before the fork/exec/spawn and -stopping after the waitpid() and includes OS process creation overhead). -So this time will be slightly larger than the atexit time reported by -the child process itself. - -`"exec"`:: - This event is generated before git attempts to `exec()` - another command rather than starting a child process. -+ ------------- -{ - "event":"exec", - ... - "exec_id":0, - "exe":"git", - "argv":["foo", "bar"] -} ------------- -+ -The "exec_id" field is a command-unique id and is only useful if the -`exec()` fails and a corresponding exec_result event is generated. - -`"exec_result"`:: - This event is generated if the `exec()` fails and control - returns to the current git command. -+ ------------- -{ - "event":"exec_result", - ... - "exec_id":0, - "code":1 # error code (errno) from exec() -} ------------- - -`"thread_start"`:: - This event is generated when a thread is started. It is - generated from *within* the new thread's thread-proc (for TLS - reasons). -+ ------------- -{ - "event":"thread_start", - ... - "thread":"th02:preload_thread" # thread name -} ------------- - -`"thread_exit"`:: - This event is generated when a thread exits. It is generated - from *within* the thread's thread-proc (for TLS reasons). -+ ------------- -{ - "event":"thread_exit", - ... 
- "thread":"th02:preload_thread", # thread name - "t_rel":0.007328 # thread elapsed time -} ------------- - -`"def_param"`:: - This event is generated to log a global parameter. -+ ------------- -{ - "event":"def_param", - ... - "param":"core.abbrev", - "value":"7" -} ------------- - -`"def_repo"`:: - This event defines a repo-id and associates it with the root - of the worktree. -+ ------------- -{ - "event":"def_repo", - ... - "repo":1, - "worktree":"/Users/jeffhost/work/gfw" -} ------------- -+ -As stated earlier, the repo-id is currently always 1, so there will -only be one def_repo event. Later, if in-proc submodules are -supported, a def_repo event should be emitted for each submodule -visited. - -`"region_enter"`:: - This event is generated when entering a region. -+ ------------- -{ - "event":"region_enter", - ... - "repo":1, # optional - "nesting":1, # current region stack depth - "category":"index", # optional - "label":"do_read_index", # optional - "msg":".git/index" # optional -} ------------- -+ -The `category` field may be used in a future enhancement to -do category-based filtering. -+ -`GIT_TRACE2_EVENT_NESTING` or `trace2.eventNesting` can be used to -filter deeply nested regions and data events. It defaults to "2". - -`"region_leave"`:: - This event is generated when leaving a region. -+ ------------- -{ - "event":"region_leave", - ... - "repo":1, # optional - "t_rel":0.002876, # time spent in region in seconds - "nesting":1, # region stack depth - "category":"index", # optional - "label":"do_read_index", # optional - "msg":".git/index" # optional -} ------------- - -`"data"`:: - This event is generated to log a thread- and region-local - key/value pair. -+ ------------- -{ - "event":"data", - ... 
- "repo":1, # optional - "t_abs":0.024107, # absolute elapsed time - "t_rel":0.001031, # elapsed time in region/thread - "nesting":2, # region stack depth - "category":"index", - "key":"read/cache_nr", - "value":"3552" -} ------------- -+ -The "value" field may be an integer or a string. - -`"data_json"`:: - This event is generated to log a pre-formatted JSON string - containing structured data. -+ ------------- -{ - "event":"data_json", - ... - "repo":1, # optional - "t_abs":0.015905, - "t_rel":0.015905, - "nesting":1, - "category":"process", - "key":"windows/ancestry", - "value":["bash.exe","bash.exe"] -} ------------- - -== Example Trace2 API Usage - -Here is a hypothetical usage of the Trace2 API showing the intended -usage (without worrying about the actual Git details). - -Initialization:: - - Initialization happens in `main()`. Behind the scenes, - `atexit` and `signal` handlers are registered. -+ ----------------- -int main(int argc, const char **argv) -{ - int exit_code; - - trace2_initialize(); - trace2_cmd_start(argv); - - exit_code = cmd_main(argc, argv); - - trace2_cmd_exit(exit_code); - - return exit_code; -} ----------------- - -Command Details:: - - After the basics are established, additional command - information can be sent to Trace2 as it is discovered. -+ ----------------- -int cmd_checkout(int argc, const char **argv) -{ - trace2_cmd_name("checkout"); - trace2_cmd_mode("branch"); - trace2_def_repo(the_repository); - - // emit "def_param" messages for "interesting" config settings. - trace2_cmd_list_config(); - - if (do_something()) - trace2_cmd_error("Path '%s': cannot do something", path); - - return 0; -} ----------------- - -Child Processes:: - - Wrap code spawning child processes. -+ ----------------- -void run_child(...) -{ - int child_exit_code; - struct child_process cmd = CHILD_PROCESS_INIT; - ... 
- cmd.trace2_child_class = "editor"; - - trace2_child_start(&cmd); - child_exit_code = spawn_child_and_wait_for_it(); - trace2_child_exit(&cmd, child_exit_code); -} ----------------- -+ -For example, the following fetch command spawned ssh, index-pack, -rev-list, and gc. This example also shows that fetch took -5.199 seconds, of which 4.932 were spent in ssh. -+ ----------------- -$ export GIT_TRACE2_BRIEF=1 -$ export GIT_TRACE2=~/log.normal -$ git fetch origin -... ----------------- -+ ----------------- -$ cat ~/log.normal -version 2.20.1.vfs.1.1.47.g534dbe1ad1 -start git fetch origin -worktree /Users/jeffhost/work/gfw -cmd_name fetch (fetch) -child_start[0] ssh git@github.com ... -child_start[1] git index-pack ... -... (Trace2 events from child processes omitted) -child_exit[1] pid:14707 code:0 elapsed:0.076353 -child_exit[0] pid:14706 code:0 elapsed:4.931869 -child_start[2] git rev-list ... -... (Trace2 events from child process omitted) -child_exit[2] pid:14708 code:0 elapsed:0.110605 -child_start[3] git gc --auto -... (Trace2 events from child process omitted) -child_exit[3] pid:14709 code:0 elapsed:0.006240 -exit elapsed:5.198503 code:0 -atexit elapsed:5.198541 code:0 ----------------- -+ -When a git process is a (direct or indirect) child of another -git process, it inherits Trace2 context information. This -allows the child to print the command hierarchy. This example -shows gc as child[3] of fetch. When the gc process reports -its name as "gc", it also reports the hierarchy as "fetch/gc". -(In this example, trace2 messages from the child process are -indented for clarity.) -+ ----------------- -$ export GIT_TRACE2_BRIEF=1 -$ export GIT_TRACE2=~/log.normal -$ git fetch origin -... ----------------- -+ ----------------- -$ cat ~/log.normal -version 2.20.1.160.g5676107ecd.dirty -start git fetch official -worktree /Users/jeffhost/work/gfw -cmd_name fetch (fetch) -... 
-child_start[3] git gc --auto - version 2.20.1.160.g5676107ecd.dirty - start /Users/jeffhost/work/gfw/git gc --auto - worktree /Users/jeffhost/work/gfw - cmd_name gc (fetch/gc) - exit elapsed:0.001959 code:0 - atexit elapsed:0.001997 code:0 -child_exit[3] pid:20303 code:0 elapsed:0.007564 -exit elapsed:3.868938 code:0 -atexit elapsed:3.868970 code:0 ----------------- - -Regions:: - - Regions can be used to time an interesting section of code. -+ ----------------- -void wt_status_collect(struct wt_status *s) -{ - trace2_region_enter("status", "worktrees", s->repo); - wt_status_collect_changes_worktree(s); - trace2_region_leave("status", "worktrees", s->repo); - - trace2_region_enter("status", "index", s->repo); - wt_status_collect_changes_index(s); - trace2_region_leave("status", "index", s->repo); - - trace2_region_enter("status", "untracked", s->repo); - wt_status_collect_untracked(s); - trace2_region_leave("status", "untracked", s->repo); -} - -void wt_status_print(struct wt_status *s) -{ - trace2_region_enter("status", "print", s->repo); - switch (s->status_format) { - ... - } - trace2_region_leave("status", "print", s->repo); -} ----------------- -+ -In this example, scanning for untracked files ran from +0.012568 to -+0.027149 (since the process started) and took 0.014581 seconds. -+ ----------------- -$ export GIT_TRACE2_PERF_BRIEF=1 -$ export GIT_TRACE2_PERF=~/log.perf -$ git status -... - -$ cat ~/log.perf -d0 | main | version | | | | | 2.20.1.160.g5676107ecd.dirty -d0 | main | start | | 0.001173 | | | git status -d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw -d0 | main | cmd_name | | | | | status (status) -... 
-d0 | main | region_enter | r1 | 0.010988 | | status | label:worktrees -d0 | main | region_leave | r1 | 0.011236 | 0.000248 | status | label:worktrees -d0 | main | region_enter | r1 | 0.011260 | | status | label:index -d0 | main | region_leave | r1 | 0.012542 | 0.001282 | status | label:index -d0 | main | region_enter | r1 | 0.012568 | | status | label:untracked -d0 | main | region_leave | r1 | 0.027149 | 0.014581 | status | label:untracked -d0 | main | region_enter | r1 | 0.027411 | | status | label:print -d0 | main | region_leave | r1 | 0.028741 | 0.001330 | status | label:print -d0 | main | exit | | 0.028778 | | | code:0 -d0 | main | atexit | | 0.028809 | | | code:0 ----------------- -+ -Regions may be nested. This causes messages to be indented in the -PERF target, for example. -Elapsed times are relative to the start of the corresponding nesting -level as expected. For example, if we add region messages to: -+ ----------------- -static enum path_treatment read_directory_recursive(struct dir_struct *dir, - struct index_state *istate, const char *base, int baselen, - struct untracked_cache_dir *untracked, int check_only, - int stop_at_first_file, const struct pathspec *pathspec) -{ - enum path_treatment state, subdir_state, dir_state = path_none; - - trace2_region_enter_printf("dir", "read_recursive", NULL, "%.*s", baselen, base); - ... - trace2_region_leave_printf("dir", "read_recursive", NULL, "%.*s", baselen, base); - return dir_state; -} ----------------- -+ -We can further investigate the time spent scanning for untracked files. -+ ----------------- -$ export GIT_TRACE2_PERF_BRIEF=1 -$ export GIT_TRACE2_PERF=~/log.perf -$ git status -... -$ cat ~/log.perf -d0 | main | version | | | | | 2.20.1.162.gb4ccea44db.dirty -d0 | main | start | | 0.001173 | | | git status -d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw -d0 | main | cmd_name | | | | | status (status) -... 
-d0 | main | region_enter | r1 | 0.015047 | | status | label:untracked -d0 | main | region_enter | | 0.015132 | | dir | ..label:read_recursive -d0 | main | region_enter | | 0.016341 | | dir | ....label:read_recursive vcs-svn/ -d0 | main | region_leave | | 0.016422 | 0.000081 | dir | ....label:read_recursive vcs-svn/ -d0 | main | region_enter | | 0.016446 | | dir | ....label:read_recursive xdiff/ -d0 | main | region_leave | | 0.016522 | 0.000076 | dir | ....label:read_recursive xdiff/ -d0 | main | region_enter | | 0.016612 | | dir | ....label:read_recursive git-gui/ -d0 | main | region_enter | | 0.016698 | | dir | ......label:read_recursive git-gui/po/ -d0 | main | region_enter | | 0.016810 | | dir | ........label:read_recursive git-gui/po/glossary/ -d0 | main | region_leave | | 0.016863 | 0.000053 | dir | ........label:read_recursive git-gui/po/glossary/ -... -d0 | main | region_enter | | 0.031876 | | dir | ....label:read_recursive builtin/ -d0 | main | region_leave | | 0.032270 | 0.000394 | dir | ....label:read_recursive builtin/ -d0 | main | region_leave | | 0.032414 | 0.017282 | dir | ..label:read_recursive -d0 | main | region_leave | r1 | 0.032454 | 0.017407 | status | label:untracked -... -d0 | main | exit | | 0.034279 | | | code:0 -d0 | main | atexit | | 0.034322 | | | code:0 ----------------- -+ -Trace2 regions are similar to the existing trace_performance_enter() -and trace_performance_leave() routines, but are thread safe and -maintain per-thread stacks of timers. - -Data Messages:: - - Data messages added to a region. -+ ----------------- -int read_index_from(struct index_state *istate, const char *path, - const char *gitdir) -{ - trace2_region_enter_printf("index", "do_read_index", the_repository, "%s", path); - - ... 
- - trace2_data_intmax("index", the_repository, "read/version", istate->version); - trace2_data_intmax("index", the_repository, "read/cache_nr", istate->cache_nr); - - trace2_region_leave_printf("index", "do_read_index", the_repository, "%s", path); -} ----------------- -+ -This example shows that the index contained 3552 entries. -+ ----------------- -$ export GIT_TRACE2_PERF_BRIEF=1 -$ export GIT_TRACE2_PERF=~/log.perf -$ git status -... -$ cat ~/log.perf -d0 | main | version | | | | | 2.20.1.156.gf9916ae094.dirty -d0 | main | start | | 0.001173 | | | git status -d0 | main | def_repo | r1 | | | | worktree:/Users/jeffhost/work/gfw -d0 | main | cmd_name | | | | | status (status) -d0 | main | region_enter | r1 | 0.001791 | | index | label:do_read_index .git/index -d0 | main | data | r1 | 0.002494 | 0.000703 | index | ..read/version:2 -d0 | main | data | r1 | 0.002520 | 0.000729 | index | ..read/cache_nr:3552 -d0 | main | region_leave | r1 | 0.002539 | 0.000748 | index | label:do_read_index .git/index -... ----------------- - -Thread Events:: - - Thread messages added to a thread-proc. -+ -For example, the multithreaded preload-index code can be -instrumented with a region around the thread pool and then -per-thread start and exit events within the threadproc. -+ ----------------- -static void *preload_thread(void *_data) -{ - // start the per-thread clock and emit a message. - trace2_thread_start("preload_thread"); - - // report which chunk of the array this thread was assigned. - trace2_data_intmax("index", the_repository, "offset", p->offset); - trace2_data_intmax("index", the_repository, "count", nr); - - do { - ... - } while (--nr > 0); - ... - - // report elapsed time taken by this thread. - trace2_thread_exit(); - return NULL; -} - -void preload_index(struct index_state *index, - const struct pathspec *pathspec, - unsigned int refresh_flags) -{ - trace2_region_enter("index", "preload", the_repository); - - for (i = 0; i < threads; i++) { - ... 
/* create thread */ - } - - for (i = 0; i < threads; i++) { - ... /* join thread */ - } - - trace2_region_leave("index", "preload", the_repository); -} ----------------- -+ -In this example preload_index() was executed by the `main` thread -and started the `preload` region. Seven threads, named -`th01:preload_thread` through `th07:preload_thread`, were started. -Events from each thread are atomically appended to the shared target -stream as they occur so they may appear in random order with respect -to other threads. Finally, the main thread waits for the threads to -finish and leaves the region. -+ -Data events are tagged with the active thread name. They are used -to report the per-thread parameters. -+ ----------------- -$ export GIT_TRACE2_PERF_BRIEF=1 -$ export GIT_TRACE2_PERF=~/log.perf -$ git status -... -$ cat ~/log.perf -... -d0 | main | region_enter | r1 | 0.002595 | | index | label:preload -d0 | th01:preload_thread | thread_start | | 0.002699 | | | -d0 | th02:preload_thread | thread_start | | 0.002721 | | | -d0 | th01:preload_thread | data | r1 | 0.002736 | 0.000037 | index | offset:0 -d0 | th02:preload_thread | data | r1 | 0.002751 | 0.000030 | index | offset:2032 -d0 | th03:preload_thread | thread_start | | 0.002711 | | | -d0 | th06:preload_thread | thread_start | | 0.002739 | | | -d0 | th01:preload_thread | data | r1 | 0.002766 | 0.000067 | index | count:508 -d0 | th06:preload_thread | data | r1 | 0.002856 | 0.000117 | index | offset:2540 -d0 | th03:preload_thread | data | r1 | 0.002824 | 0.000113 | index | offset:1016 -d0 | th04:preload_thread | thread_start | | 0.002710 | | | -d0 | th02:preload_thread | data | r1 | 0.002779 | 0.000058 | index | count:508 -d0 | th06:preload_thread | data | r1 | 0.002966 | 0.000227 | index | count:508 -d0 | th07:preload_thread | thread_start | | 0.002741 | | | -d0 | th07:preload_thread | data | r1 | 0.003017 | 0.000276 | index | offset:3048 -d0 | th05:preload_thread | thread_start | | 0.002712 | | | -d0 | 
th05:preload_thread | data | r1 | 0.003067 | 0.000355 | index | offset:1524 -d0 | th05:preload_thread | data | r1 | 0.003090 | 0.000378 | index | count:508 -d0 | th07:preload_thread | data | r1 | 0.003037 | 0.000296 | index | count:504 -d0 | th03:preload_thread | data | r1 | 0.002971 | 0.000260 | index | count:508 -d0 | th04:preload_thread | data | r1 | 0.002983 | 0.000273 | index | offset:508 -d0 | th04:preload_thread | data | r1 | 0.007311 | 0.004601 | index | count:508 -d0 | th05:preload_thread | thread_exit | | 0.008781 | 0.006069 | | -d0 | th01:preload_thread | thread_exit | | 0.009561 | 0.006862 | | -d0 | th03:preload_thread | thread_exit | | 0.009742 | 0.007031 | | -d0 | th06:preload_thread | thread_exit | | 0.009820 | 0.007081 | | -d0 | th02:preload_thread | thread_exit | | 0.010274 | 0.007553 | | -d0 | th07:preload_thread | thread_exit | | 0.010477 | 0.007736 | | -d0 | th04:preload_thread | thread_exit | | 0.011657 | 0.008947 | | -d0 | main | region_leave | r1 | 0.011717 | 0.009122 | index | label:preload -... -d0 | main | exit | | 0.029996 | | | code:0 -d0 | main | atexit | | 0.030027 | | | code:0 ----------------- -+ -In this example, the preload region took 0.009122 seconds. The 7 threads -took between 0.006069 and 0.008947 seconds to work on their portion of -the index. Thread "th01" worked on 508 items at offset 0. Thread "th02" -worked on 508 items at offset 2032. Thread "th04" worked on 508 items -at offset 508. -+ -This example also shows that thread names are assigned in a racy manner -as each thread starts and allocates TLS storage. - -== Future Work - -=== Relationship to the Existing Trace Api (api-trace.txt) - -There are a few issues to resolve before we can completely -switch to Trace2. - -* Updating existing tests that assume GIT_TRACE format messages. - -* How to best handle custom GIT_TRACE_<key> messages? - -** The GIT_TRACE_<key> mechanism allows each <key> to write to a -different file (in addition to just stderr). 
- -** Do we want to maintain that ability or simply write to the existing -Trace2 targets (and convert <key> to a "category"). diff --git a/third_party/git/Documentation/technical/api-tree-walking.txt b/third_party/git/Documentation/technical/api-tree-walking.txt deleted file mode 100644 index bde18622a8..0000000000 --- a/third_party/git/Documentation/technical/api-tree-walking.txt +++ /dev/null @@ -1,147 +0,0 @@ -tree walking API -================ - -The tree walking API is used to traverse and inspect trees. - -Data Structures ---------------- - -`struct name_entry`:: - - An entry in a tree. Each entry has a sha1 identifier, pathname, and - mode. - -`struct tree_desc`:: - - A semi-opaque data structure used to maintain the current state of the - walk. -+ -* `buffer` is a pointer into the memory representation of the tree. It always -points at the current entry being visited. - -* `size` counts the number of bytes left in the `buffer`. - -* `entry` points to the current entry being visited. - -`struct traverse_info`:: - - A structure used to maintain the state of a traversal. -+ -* `prev` points to the traverse_info which was used to descend into the -current tree. If this is the top-level tree `prev` will point to -a dummy traverse_info. - -* `name` is the entry for the current tree (if the tree is a subtree). - -* `pathlen` is the length of the full path for the current tree. - -* `conflicts` can be used by callbacks to maintain directory-file conflicts. - -* `fn` is a callback called for each entry in the tree. See Traversing for more -information. - -* `data` can be anything the `fn` callback would want to use. - -* `show_all_errors` tells whether to stop at the first error or not. - -Initializing ------------- - -`init_tree_desc`:: - - Initialize a `tree_desc` and decode its first entry. The buffer and - size parameters are assumed to be the same as the buffer and size - members of `struct tree`. 
- -`fill_tree_descriptor`:: - - Initialize a `tree_desc` and decode its first entry given the - object ID of a tree. Returns the `buffer` member if the latter - is a valid tree identifier and NULL otherwise. - -`setup_traverse_info`:: - - Initialize a `traverse_info` given the pathname of the tree to start - traversing from. The `base` argument is assumed to be the `path` - member of the `name_entry` being recursed into unless the tree is a - top-level tree in which case the empty string ("") is used. - -Walking -------- - -`tree_entry`:: - - Visit the next entry in a tree. Returns 1 when there are more entries - left to visit and 0 when all entries have been visited. This is - commonly used in the test of a while loop. - -`tree_entry_len`:: - - Calculate the length of a tree entry's pathname. This utilizes the - memory structure of a tree entry to avoid the overhead of using a - generic strlen(). - -`update_tree_entry`:: - - Walk to the next entry in a tree. This is commonly used in conjunction - with `tree_entry_extract` to inspect the current entry. - -`tree_entry_extract`:: - - Decode the entry currently being visited (the one pointed to by - `tree_desc's` `entry` member) and return the sha1 of the entry. The - `pathp` and `modep` arguments are set to the entry's pathname and mode - respectively. - -`get_tree_entry`:: - - Find an entry in a tree given a pathname and the sha1 of a tree to - search. Returns 0 if the entry is found and -1 otherwise. The third - and fourth parameters are set to the entry's sha1 and mode - respectively. - -Traversing ----------- - -`traverse_trees`:: - - Traverse `n` number of trees in parallel. The `fn` callback member of - `traverse_info` is called once for each tree entry. - -`traverse_callback_t`:: - The arguments passed to the traverse callback are as follows: -+ -* `n` counts the number of trees being traversed. - -* `mask` has its nth bit set if something exists in the nth entry. 
- -* `dirmask` has its nth bit set if the nth tree's entry is a directory. - -* `entry` is an array of size `n` where the nth entry is from the nth tree. - -* `info` maintains the state of the traversal. - -+ -Returning a negative value will terminate the traversal. Otherwise the -return value is treated as an update mask. If the nth bit is set the nth tree -will be updated and if the bit is not set the nth tree entry will be the -same in the next callback invocation. - -`make_traverse_path`:: - - Generate the full pathname of a tree entry based from the root of the - traversal. For example, if the traversal has recursed into another - tree named "bar" the pathname of an entry "baz" in the "bar" - tree would be "bar/baz". - -`traverse_path_len`:: - - Calculate the length of a pathname returned by `make_traverse_path`. - This utilizes the memory structure of a tree entry to avoid the - overhead of using a generic strlen(). - -Authors -------- - -Written by Junio C Hamano <gitster@pobox.com> and Linus Torvalds -<torvalds@linux-foundation.org> diff --git a/third_party/git/Documentation/technical/api-xdiff-interface.txt b/third_party/git/Documentation/technical/api-xdiff-interface.txt deleted file mode 100644 index 6296ecad1d..0000000000 --- a/third_party/git/Documentation/technical/api-xdiff-interface.txt +++ /dev/null @@ -1,7 +0,0 @@ -xdiff interface API -=================== - -Talk about our calling convention to xdiff library, including -xdiff_emit_consume_fn. 
- -(Dscho, JC) diff --git a/third_party/git/Documentation/technical/bitmap-format.txt b/third_party/git/Documentation/technical/bitmap-format.txt deleted file mode 100644 index f8c18a0f7a..0000000000 --- a/third_party/git/Documentation/technical/bitmap-format.txt +++ /dev/null @@ -1,164 +0,0 @@ -GIT bitmap v1 format -==================== - - - A header appears at the beginning: - - 4-byte signature: {'B', 'I', 'T', 'M'} - - 2-byte version number (network byte order) - The current implementation only supports version 1 - of the bitmap index (the same one as JGit). - - 2-byte flags (network byte order) - - The following flags are supported: - - - BITMAP_OPT_FULL_DAG (0x1) REQUIRED - This flag must always be present. It implies that the bitmap - index has been generated for a packfile with full closure - (i.e. where every single object in the packfile can find - its parent links inside the same packfile). This is a - requirement for the bitmap index format, also present in JGit, - that greatly reduces the complexity of the implementation. - - - BITMAP_OPT_HASH_CACHE (0x4) - If present, the end of the bitmap file contains - `N` 32-bit name-hash values, one per object in the - pack. The format and meaning of the name-hash is - described below. - - 4-byte entry count (network byte order) - - The total count of entries (bitmapped commits) in this bitmap index. - - 20-byte checksum - - The SHA1 checksum of the pack this bitmap index belongs to. - - - 4 EWAH bitmaps that act as type indexes - - Type indexes are serialized after the hash cache in the shape - of four EWAH bitmaps stored consecutively (see Appendix A for - the serialization format of an EWAH bitmap). - - There is a bitmap for each Git object type, stored in the following - order: - - - Commits - - Trees - - Blobs - - Tags - - In each bitmap, the `n`th bit is set to true if the `n`th object - in the packfile is of that type. 
- - The obvious consequence is that the OR of all 4 bitmaps will result - in a full set (all bits set), and the AND of all 4 bitmaps will - result in an empty bitmap (no bits set). - - - N entries with compressed bitmaps, one for each indexed commit - - Where `N` is the total amount of entries in this bitmap index. - Each entry contains the following: - - - 4-byte object position (network byte order) - The position **in the index for the packfile** where the - bitmap for this commit is found. - - - 1-byte XOR-offset - The xor offset used to compress this bitmap. For an entry - in position `x`, a XOR offset of `y` means that the actual - bitmap representing this commit is composed by XORing the - bitmap for this entry with the bitmap in entry `x-y` (i.e. - the bitmap `y` entries before this one). - - Note that this compression can be recursive. In order to - XOR this entry with a previous one, the previous entry needs - to be decompressed first, and so on. - - The hard-limit for this offset is 160 (an entry can only be - xor'ed against one of the 160 entries preceding it). This - number is always positive, and hence entries are always xor'ed - with **previous** bitmaps, not bitmaps that will come afterwards - in the index. - - - 1-byte flags for this bitmap - At the moment the only available flag is `0x1`, which hints - that this bitmap can be re-used when rebuilding bitmap indexes - for the repository. - - - The compressed bitmap itself, see Appendix A. - -== Appendix A: Serialization format for an EWAH bitmap - -Ewah bitmaps are serialized in the same protocol as the JAVAEWAH -library, making them backwards compatible with the JGit -implementation: - - - 4-byte number of bits of the resulting UNCOMPRESSED bitmap - - - 4-byte number of words of the COMPRESSED bitmap, when stored - - - N x 8-byte words, as specified by the previous field - - This is the actual content of the compressed bitmap. 
- - - 4-byte position of the current RLW for the compressed - bitmap - -All words are stored in network byte order for their corresponding -sizes. - -The compressed bitmap is stored in a form of run-length encoding, as -follows. It consists of a concatenation of an arbitrary number of -chunks. Each chunk consists of one or more 64-bit words - - H L_1 L_2 L_3 .... L_M - -H is called RLW (run length word). It consists of (from lower to higher -order bits): - - - 1 bit: the repeated bit B - - - 32 bits: repetition count K (unsigned) - - - 31 bits: literal word count M (unsigned) - -The bitstream represented by the above chunk is then: - - - K repetitions of B - - - The bits stored in `L_1` through `L_M`. Within a word, bits at - lower order come earlier in the stream than those at higher - order. - -The next word after `L_M` (if any) must again be a RLW, for the next -chunk. For efficient appending to the bitstream, the EWAH stores a -pointer to the last RLW in the stream. - - -== Appendix B: Optional Bitmap Sections - -These sections may or may not be present in the `.bitmap` file; their -presence is indicated by the header flags section described above. - -Name-hash cache ---------------- - -If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains -a cache of 32-bit values, one per object in the pack. The value at -position `i` is the hash of the pathname at which the `i`th object -(counting in index order) in the pack can be found. This can be fed -into the delta heuristics to compare objects with similar pathnames. - -The hash algorithm used is: - - hash = 0; - while ((c = *name++)) - if (!isspace(c)) - hash = (hash >> 2) + (c << 24); - -Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag. -If implementations want to choose a different hashing scheme, they are -free to do so, but MUST allocate a new header flag (because comparing -hashes made under two different schemes would be pointless). 
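The name-hash loop in Appendix B can be wrapped into a small standalone function. This is a minimal sketch, not Git's actual code: the function name `bitmap_name_hash` is a hypothetical label chosen for illustration, and the use of `uint32_t` assumes the hash is meant to be exactly 32 bits wide, as the cache description suggests.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Hypothetical helper (name chosen for this sketch): computes the
 * 32-bit name-hash of a pathname using the loop from Appendix B.
 * Whitespace characters are skipped; each remaining character shifts
 * the running hash right by two bits and is added into the top byte,
 * so the trailing characters of the pathname dominate the result.
 */
static uint32_t bitmap_name_hash(const char *name)
{
	uint32_t hash = 0;
	unsigned char c;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

Because whitespace is ignored, `"a b"` and `"ab"` hash to the same value under this scheme.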
diff --git a/third_party/git/Documentation/technical/commit-graph-format.txt b/third_party/git/Documentation/technical/commit-graph-format.txt deleted file mode 100644 index a4f17441ae..0000000000 --- a/third_party/git/Documentation/technical/commit-graph-format.txt +++ /dev/null @@ -1,104 +0,0 @@ -Git commit graph format -======================= - -The Git commit graph stores a list of commit OIDs and some associated -metadata, including: - -- The generation number of the commit. Commits with no parents have - generation number 1; commits with parents have generation number - one more than the maximum generation number of their parents. Zero - is reserved as a special value and can be used to mark a generation - number invalid or as "not computed". - -- The root tree OID. - -- The commit date. - -- The parents of the commit, stored using positional references within - the graph file. - -These positional references are stored as unsigned 32-bit integers -corresponding to the array position within the list of commit OIDs. Due -to some special constants we use to track parents, we can store at most -(1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits. - -== Commit graph files have the following format: - -In order to allow extensions that add extra data to the graph, we organize -the body into "chunks" and provide a binary lookup table at the beginning -of the body. The header includes certain values, such as number of chunks -and hash type. - -All 4-byte numbers are in network order. - -HEADER: - - 4-byte signature: - The signature is: {'C', 'G', 'P', 'H'} - - 1-byte version number: - Currently, the only valid version is 1. - - 1-byte Hash Version (1 = SHA-1) - We infer the hash length (H) from this value. - - 1-byte number (C) of "chunks" - - 1-byte number (B) of base commit-graphs - We infer the length (H*B) of the Base Graphs chunk - from this value. 
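The 8-byte header described above can be decoded with a few byte reads. The following is a minimal sketch under stated assumptions: `struct cg_header`, `cg_parse_header` and `cg_hash_len` are hypothetical names invented for this illustration and are not Git's own parser; the only facts taken from the format are the field order, the `CGPH` signature, version 1, and hash version 1 meaning SHA-1 (20 bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CG_HEADER_SIZE 8	/* 4-byte signature + four 1-byte fields */

/* Hypothetical in-memory view of the header (names are illustrative). */
struct cg_header {
	uint8_t version;	/* only 1 is valid */
	uint8_t hash_version;	/* 1 = SHA-1 */
	uint8_t num_chunks;	/* C */
	uint8_t num_base_graphs;	/* B */
};

/*
 * Returns 0 on success, -1 if the buffer is too short, the signature
 * is not "CGPH", or the version is not 1.
 */
static int cg_parse_header(const unsigned char *buf, size_t len,
			   struct cg_header *out)
{
	if (len < CG_HEADER_SIZE)
		return -1;
	if (memcmp(buf, "CGPH", 4) != 0)
		return -1;
	if (buf[4] != 1)
		return -1;

	out->version = buf[4];
	out->hash_version = buf[5];
	out->num_chunks = buf[6];
	out->num_base_graphs = buf[7];
	return 0;
}

/* Hash length H inferred from the hash version; 0 means "unknown". */
static size_t cg_hash_len(const struct cg_header *h)
{
	return h->hash_version == 1 ? 20 : 0;
}
```

With `H` and `B` in hand, a reader can compute the expected size of the Base Graphs chunk as `cg_hash_len(&h) * h.num_base_graphs`.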
- -CHUNK LOOKUP: - - (C + 1) * 12 bytes listing the table of contents for the chunks: - First 4 bytes describe the chunk id. Value 0 is a terminating label. - Other 8 bytes provide the byte-offset in current file for chunk to - start. (Chunks are ordered contiguously in the file, so you can infer - the length using the next chunk position if necessary.) Each chunk - ID appears at most once. - - The remaining data in the body is described one chunk at a time, and - these chunks may be given in any order. Chunks are required unless - otherwise specified. - -CHUNK DATA: - - OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) - The ith entry, F[i], stores the number of OIDs with first - byte at most i. Thus F[255] stores the total - number of commits (N). - - OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) - The OIDs for all commits in the graph, sorted in ascending order. - - Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes) - * The first H bytes are for the OID of the root tree. - * The next 8 bytes are for the positions of the first two parents - of the ith commit. Stores value 0x70000000 if no parent in that - position. If there are more than two parents, the second value - has its most-significant bit on and the other bits store an array - position into the Extra Edge List chunk. - * The next 8 bytes store the generation number of the commit and - the commit time in seconds since EPOCH. The generation number - uses the higher 30 bits of the first 4 bytes, while the commit - time uses the 32 bits of the second 4 bytes, along with the lowest - 2 bits of the lowest byte, storing the 33rd and 34th bit of the - commit time. - - Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] - This list of 4-byte values stores the second through nth parents for - all octopus merges. The second parent value in the commit data stores - an array position within this list along with the most-significant bit - on. 
Starting at that array position, iterate through this list of commit - positions for the parents until reaching a value with the most-significant - bit on. The other bits correspond to the position of the last parent. - - Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] - This list of H-byte hashes describe a set of B commit-graph files that - form a commit-graph chain. The graph position for the ith commit in this - file's OID Lookup chunk is equal to i plus the number of commits in all - base graphs. If B is non-zero, this chunk must exist. - -TRAILER: - - H-byte HASH-checksum of all of the above. diff --git a/third_party/git/Documentation/technical/commit-graph.txt b/third_party/git/Documentation/technical/commit-graph.txt deleted file mode 100644 index 729fbcb32f..0000000000 --- a/third_party/git/Documentation/technical/commit-graph.txt +++ /dev/null @@ -1,350 +0,0 @@ -Git Commit Graph Design Notes -============================= - -Git walks the commit graph for many reasons, including: - -1. Listing and filtering commit history. -2. Computing merge bases. - -These operations can become slow as the commit count grows. The merge -base calculation shows up in many user-facing commands, such as 'merge-base' -or 'status' and can take minutes to compute depending on history shape. - -There are two main costs here: - -1. Decompressing and parsing commits. -2. Walking the entire graph to satisfy topological order constraints. - -The commit-graph file is a supplemental data structure that accelerates -commit graph walks. If a user downgrades or disables the 'core.commitGraph' -config setting, then the existing ODB is sufficient. The file is stored -as "commit-graph" either in the .git/objects/info directory or in the info -directory of an alternate. - -The commit-graph file stores the commit graph structure along with some -extra metadata to speed up graph walks. 
By listing commit OIDs in lexi- -cographic order, we can identify an integer position for each commit and -refer to the parents of a commit using those integer positions. We use -binary search to find initial commits and then use the integer positions -for fast lookups during the walk. - -A consumer may load the following info for a commit from the graph: - -1. The commit OID. -2. The list of parents, along with their integer position. -3. The commit date. -4. The root tree OID. -5. The generation number (see definition below). - -Values 1-4 satisfy the requirements of parse_commit_gently(). - -Define the "generation number" of a commit recursively as follows: - - * A commit with no parents (a root commit) has generation number one. - - * A commit with at least one parent has generation number one more than - the largest generation number among its parents. - -Equivalently, the generation number of a commit A is one more than the -length of a longest path from A to a root commit. The recursive definition -is easier to use for computation and observing the following property: - - If A and B are commits with generation numbers N and M, respectively, - and N <= M, then A cannot reach B. That is, we know without searching - that B is not an ancestor of A because it is further from a root commit - than A. - - Conversely, when checking if A is an ancestor of B, then we only need - to walk commits until all commits on the walk boundary have generation - number at most N. If we walk commits using a priority queue seeded by - generation numbers, then we always expand the boundary commit with highest - generation number and can easily detect the stopping condition. - -This property can be used to significantly reduce the time it takes to -walk commits and determine topological relationships. Without generation -numbers, the general heuristic is the following: - - If A and B are commits with commit time X and Y, respectively, and - X < Y, then A _probably_ cannot reach B. 
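The recursive definition of generation numbers, and the priority-queue walk that uses them to stop early, can be sketched as follows. This is an illustrative sketch under stated assumptions, not Git's code: `generation_numbers` and `is_ancestor` are hypothetical names, the history is given as a dictionary mapping each commit to its list of parents, and the dictionary is assumed to be closed under reachability (every parent is also a key).

```python
import heapq


def generation_numbers(parents: dict) -> dict:
    """Generation 1 for root commits, else 1 + max generation of the parents."""
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Compute the parents first, then revisit this commit.
                stack.extend(missing)
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen


def is_ancestor(target, start, parents: dict, gen: dict) -> bool:
    """Return True if `start` can reach `target` by following parent links."""
    if start == target:
        return True
    heap = [(-gen[start], start)]  # max-heap on generation number
    seen = {start}
    while heap:
        neg_gen, commit = heapq.heappop(heap)
        # Every commit left on the boundary has generation <= gen[target]
        # and is not the target itself, so none of them can reach it.
        if -neg_gen <= gen[target]:
            return False
        for parent in parents[commit]:
            if parent == target:
                return True
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-gen[parent], parent))
    return False
```

The walk always expands the boundary commit with the highest generation number, so the stopping condition only needs to inspect the commit just popped from the heap.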
- -This heuristic is currently used whenever the computation is allowed to -violate topological relationships due to clock skew (such as "git log" -with default order), but is not used when the topological order is -required (such as merge base calculations, "git log --graph"). - -In practice, we expect some commits to be created recently and not stored -in the commit graph. We can treat these commits as having "infinite" -generation number and walk until reaching commits with known generation -number. - -We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not -in the commit-graph file. If a commit-graph file was written by a version -of Git that did not compute generation numbers, then those commits will -have generation number represented by the macro GENERATION_NUMBER_ZERO = 0. - -Since the commit-graph file is closed under reachability, we can guarantee -the following weaker condition on all commits: - - If A and B are commits with generation numbers N and M, respectively, - and N < M, then A cannot reach B. - -Note how the strict inequality differs from the inequality when we have -fully-computed generation numbers. Using strict inequality may result in -walking a few extra commits, but the simplicity in dealing with commits -with generation number *_INFINITY or *_ZERO is valuable. - -We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF for commits whose -generation numbers are computed to be at least this value. We limit at -this value since it is the largest value that can be stored in the -commit-graph file using the 30 bits available to generation numbers. This -presents another case where a commit can have generation number equal to -that of a parent. - -Design Details --------------- - -- The commit-graph file is stored in a file named 'commit-graph' in the - .git/objects/info directory. This could be stored in the info directory - of an alternate. - -- The core.commitGraph config setting must be on to consume graph files.
- -- The file format includes parameters for the object ID hash function, - so a future change of hash algorithm does not require a change in format. - -- Commit grafts and replace objects can change the shape of the commit - history. The latter can also be enabled/disabled on the fly using - `--no-replace-objects`. This leads to difficulty storing both possible - interpretations of a commit id, especially when computing generation - numbers. The commit-graph will not be read or written when - replace-objects or grafts are present. - -- Shallow clones create grafts of commits by dropping their parents. This - leads the commit-graph to think those commits have generation number 1. - If and when those commits are made unshallow, those generation numbers - become invalid. Since shallow clones are intended to restrict the commit - history to a very small set of commits, the commit-graph feature is less - helpful for these clones, anyway. The commit-graph will not be read or - written when shallow commits are present. - -Commit Graphs Chains --------------------- - -Typically, repos grow with near-constant velocity (commits per day). Over time, -the number of commits added by a fetch operation is much smaller than the -number of commits in the full history. By creating a "chain" of commit-graphs, -we enable fast writes of new commit data without rewriting the entire commit -history -- at least, most of the time. - -## File Layout - -A commit-graph chain uses multiple files, and we use a fixed naming convention -to organize these files. Each commit-graph file has a name -`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex- -valued hash stored in the footer of that file (which is a hash of the file's -contents before that hash). For a chain of commit-graph files, a plain-text -file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the -hashes for the files in order from "lowest" to "highest".
- -For example, if the `commit-graph-chain` file contains the lines - -``` - {hash0} - {hash1} - {hash2} -``` - -then the commit-graph chain looks like the following diagram: - - +-----------------------+ - | graph-{hash2}.graph | - +-----------------------+ - | - +-----------------------+ - | | - | graph-{hash1}.graph | - | | - +-----------------------+ - | - +-----------------------+ - | | - | | - | | - | graph-{hash0}.graph | - | | - | | - | | - +-----------------------+ - -Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of -commits in `graph-{hash1}.graph`, and X2 be the number of commits in -`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`, -then we interpret this as being the commit in position (X0 + X1 + i), and that -will be used as its "graph position". The commits in `graph-{hash2}.graph` use these -positions to refer to their parents, which may be in `graph-{hash1}.graph` or -`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking -its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 + -X2). - -Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data -specifying the hashes of all files in the lower layers. In the above example, -`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains -`{hash0}` and `{hash1}`. - -## Merging commit-graph files - -If we only added a new commit-graph file on every write, we would run into a -linear search problem through many commit-graph files. Instead, we use a merge -strategy to decide when the stack should collapse some number of levels. - -The diagram below shows such a collapse. As a set of new commits are added, it -is determined by the merge strategy that the files should collapse to -`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and -the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}` -file. 
- - +---------------------+ - | | - | (new commits) | - | | - +---------------------+ - | | - +-----------------------+ +---------------------+ - | graph-{hash2} |->| | - +-----------------------+ +---------------------+ - | | | - +-----------------------+ +---------------------+ - | | | | - | graph-{hash1} |->| | - | | | | - +-----------------------+ +---------------------+ - | tmp_graphXXX - +-----------------------+ - | | - | | - | | - | graph-{hash0} | - | | - | | - | | - +-----------------------+ - -During this process, the commits to write are combined, sorted and we write the -contents to a temporary file, all while holding a `commit-graph-chain.lock` -lock-file. When the file is flushed, we rename it to `graph-{hash3}` -according to the computed `{hash3}`. Finally, we write the new chain data to -`commit-graph-chain.lock`: - -``` - {hash3} - {hash0} -``` - -We then close the lock-file. - -## Merge Strategy - -When writing a set of commits that do not exist in the commit-graph stack of -height N, we default to creating a new file at level N + 1. We then decide to -merge with the Nth level if one of two conditions hold: - - 1. `--size-multiple=<X>` is specified or X = 2, and the number of commits in - level N is less than X times the number of commits in level N + 1. - - 2. `--max-commits=<C>` is specified with non-zero C and the number of commits - in level N + 1 is more than C commits. - -This decision cascades down the levels: when we merge a level we create a new -set of commits that then compares to the next level. - -The first condition bounds the number of levels to be logarithmic in the total -number of commits. The second condition bounds the total number of commits in -a `graph-{hashN}` file and not in the `commit-graph` file, preventing -significant performance issues when the stack merges and another process only -partially reads the previous stack. 
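The cascading merge decision described above can be sketched as a small loop. This is a simplified illustration, not Git's implementation: `plan_chain` is a hypothetical name, level sizes are plain commit counts listed from the lowest level to the highest, and the sketch assumes the levels hold disjoint sets of commits so that merging simply adds their sizes.

```python
def plan_chain(level_sizes, new_commits, size_multiple=2, max_commits=0):
    """Return the level sizes of the chain after writing `new_commits`.

    size_multiple plays the role of X in --size-multiple=<X>,
    max_commits the role of C in --max-commits=<C> (0 disables it).
    """
    levels = list(level_sizes)
    tip = new_commits  # size of the would-be new level N + 1
    while levels:
        below = levels[-1]
        # Condition 1: level N is less than X times the size of level N + 1.
        too_small = below < size_multiple * tip
        # Condition 2: level N + 1 holds more than C commits.
        too_big = max_commits > 0 and tip > max_commits
        if not (too_small or too_big):
            break
        # Merge with level N; the combined set is then compared to the next level.
        tip += levels.pop()
    return levels + [tip]
```

For example, under these assumptions a chain of sizes `[1000, 100]` receiving 10 new commits simply grows a third level, while receiving 60 new commits collapses the top level into the new file.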
- -The merge strategy values (2 for the size multiple, 64,000 for the maximum -number of commits) could be extracted into config settings for full -flexibility. - -## Deleting graph-{hash} files - -After a new tip file is written, some `graph-{hash}` files may no longer -be part of a chain. It is important to remove these files from disk, eventually. -The main reason to delay removal is that another process could read the -`commit-graph-chain` file before it is rewritten, but then look for the -`graph-{hash}` files after they are deleted. - -To allow holding old split commit-graphs for a while after they are unreferenced, -we update the modified times of the files when they become unreferenced. Then, -we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}` -files whose modified times are older than a given expiry window. This window -defaults to zero, but can be changed using command-line arguments or a config -setting. - -## Chains across multiple object directories - -In a repo with alternates, we look for the `commit-graph-chain` file starting -in the local object directory and then in each alternate. The first file that -exists defines our chain. As we look for the `graph-{hash}` files for -each `{hash}` in the chain file, we follow the same pattern for the host -directories. - -This allows commit-graphs to be split across multiple forks in a fork network. -The typical case is a large "base" repo with many smaller forks. - -As the base repo advances, it will likely update and merge its commit-graph -chain more frequently than the forks. If a fork updates their commit-graph after -the base repo, then it should "reparent" the commit-graph chain onto the new -chain in the base repo. When reading each `graph-{hash}` file, we track -the object directory containing it. 
During a write of a new commit-graph file, -we check for any changes in the source object directory and read the -`commit-graph-chain` file for that source and create a new file based on those -files. During this "reparent" operation, we necessarily need to collapse all -levels in the fork, as all of the files are invalid against the new base file. - -It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph` -files in this scenario. It falls to the user to define the proper settings for -their custom environment: - - 1. When merging levels in the base repo, the unreferenced files may still be - referenced by chains from fork repos. - - 2. The expiry time should be set to a length of time such that every fork has - time to recompute their commit-graph chain to "reparent" onto the new base - file(s). - - 3. If the commit-graph chain is updated in the base, the fork will not have - access to the new chain until its chain is updated to reference those files. - (This may change in the future [5].) - -Related Links -------------- -[0] https://bugs.chromium.org/p/git/issues/detail?id=8 - Chromium work item for: Serialized Commit Graph - -[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/ - An abandoned patch that introduced generation numbers. - -[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/ - Discussion about generation numbers on commits and how they interact - with fsck. - -[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/ - More discussion about generation numbers and not storing them inside - commit objects. A valuable quote: - - "I think we should be moving more in the direction of keeping - repo-local caches for optimizations. Reachability bitmaps have been - a big performance win. I think we should be doing the same with our - properties of commits. 
Not just generation numbers, but making it - cheap to access the graph structure without zlib-inflating whole - commit objects (i.e., packv4 or something like the "metapacks" I - proposed a few years ago)." - -[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u - A patch to remove the ahead-behind calculation from 'status'. - -[5] https://public-inbox.org/git/f27db281-abad-5043-6d71-cbb083b1c877@gmail.com/ - A discussion of a "two-dimensional graph position" that can allow reading - multiple commit-graph chains at the same time. diff --git a/third_party/git/Documentation/technical/directory-rename-detection.txt b/third_party/git/Documentation/technical/directory-rename-detection.txt deleted file mode 100644 index 844629c8c4..0000000000 --- a/third_party/git/Documentation/technical/directory-rename-detection.txt +++ /dev/null @@ -1,115 +0,0 @@ -Directory rename detection -========================== - -Rename detection logic in diffcore-rename that checks for renames of -individual files is aggregated and analyzed in merge-recursive for cases -where combinations of renames indicate that a full directory has been -renamed. - -Scope of abilities ------------------- - -It is perhaps easiest to start with an example: - - * When all of x/a, x/b and x/c have moved to z/a, z/b and z/c, it is - likely that x/d added in the meantime would also want to move to z/d by - taking the hint that the entire directory 'x' moved to 'z'. - -More interesting possibilities exist, though, such as: - - * one side of history renames x -> z, and the other renames some file to - x/e, causing the need for the merge to do a transitive rename. - - * one side of history renames x -> z, but also renames all files within x. - For example, x/a -> z/alpha, x/b -> z/bravo, etc. - - * both 'x' and 'y' being merged into a single directory 'z', with a - directory rename being detected for both x->z and y->z. 
- - * not all files in a directory being renamed to the same location; - i.e. perhaps most the files in 'x' are now found under 'z', but a few - are found under 'w'. - - * a directory being renamed, which also contained a subdirectory that was - renamed to some entirely different location. (And perhaps the inner - directory itself contained inner directories that were renamed to yet - other locations). - - * combinations of the above; see t/t6043-merge-rename-directories.sh for - various interesting cases. - -Limitations -- applicability of directory renames -------------------------------------------------- - -In order to prevent edge and corner cases resulting in either conflicts -that cannot be represented in the index or which might be too complex for -users to try to understand and resolve, a couple basic rules limit when -directory rename detection applies: - - 1) If a given directory still exists on both sides of a merge, we do - not consider it to have been renamed. - - 2) If a subset of to-be-renamed files have a file or directory in the - way (or would be in the way of each other), "turn off" the directory - rename for those specific sub-paths and report the conflict to the - user. - - 3) If the other side of history did a directory rename to a path that - your side of history renamed away, then ignore that particular - rename from the other side of history for any implicit directory - renames (but warn the user). - -Limitations -- detailed rules and testcases -------------------------------------------- - -t/t6043-merge-rename-directories.sh contains extensive tests and commentary -which generate and explore the rules listed above. It also lists a few -additional rules: - - a) If renames split a directory into two or more others, the directory - with the most renames, "wins". - - b) Avoid directory-rename-detection for a path, if that path is the - source of a rename on either side of a merge. 
- - c) Only apply implicit directory renames to directories if the other side - of history is the one doing the renaming. - -Limitations -- support in different commands --------------------------------------------- - -Directory rename detection is supported by 'merge' and 'cherry-pick'. -Other git commands which users might be surprised to see limited or no -directory rename detection support in: - - * diff - - Folks have requested in the past that `git diff` detect directory - renames and somehow simplify its output. It is not clear whether this - would be desirable or how the output should be simplified, so this was - simply not implemented. Further, to implement this, directory rename - detection logic would need to move from merge-recursive to - diffcore-rename. - - * am - - git-am tries to avoid a full three way merge, instead calling - git-apply. That prevents us from detecting renames at all, which may - defeat the directory rename detection. There is a fallback, though; if - the initial git-apply fails and the user has specified the -3 option, - git-am will fall back to a three way merge. However, git-am lacks the - necessary information to do a "real" three way merge. Instead, it has - to use build_fake_ancestor() to get a merge base that is missing files - whose rename may have been important to detect for directory rename - detection to function. - - * rebase - - Since am-based rebases work by first generating a bunch of patches - (which no longer record what the original commits were and thus don't - have the necessary info from which we can find a real merge-base), and - then calling git-am, this implies that am-based rebases will not always - successfully detect directory renames either (see the 'am' section - above). Merge-based rebases (rebase -m) and cherry-pick-based rebases - (rebase -i) are not affected by this shortcoming, and fully support - directory rename detection.
diff --git a/third_party/git/Documentation/technical/hash-function-transition.txt b/third_party/git/Documentation/technical/hash-function-transition.txt deleted file mode 100644 index 2ae8fa470a..0000000000 --- a/third_party/git/Documentation/technical/hash-function-transition.txt +++ /dev/null @@ -1,827 +0,0 @@ -Git hash function transition -============================ - -Objective ---------- -Migrate Git from SHA-1 to a stronger hash function. - -Background ----------- -At its core, the Git version control system is a content addressable -filesystem. It uses the SHA-1 hash function to name content. For -example, files, directories, and revisions are referred to by hash -values unlike in other traditional version control systems where files -or versions are referred to via sequential numbers. The use of a hash -function to address its content delivers a few advantages: - -* Integrity checking is easy. Bit flips, for example, are easily - detected, as the hash of corrupted content does not match its name. -* Lookup of objects is fast. - -Using a cryptographically secure hash function brings additional -advantages: - -* Object names can be signed and third parties can trust the hash to - address the signed object and all objects it references. -* Communication using Git protocol and out of band communication - methods have a short reliable string that can be used to reliably - address stored content. - -Over time some flaws in SHA-1 have been discovered by security -researchers. On 23 February 2017 the SHAttered attack -(https://shattered.io) demonstrated a practical SHA-1 hash collision. - -Git v2.13.0 and later subsequently moved to a hardened SHA-1 -implementation by default, which isn't vulnerable to the SHAttered -attack. 
- -Thus Git has in effect already migrated to a new hash that isn't SHA-1 -and doesn't share its vulnerabilities, its new hash function just -happens to produce exactly the same output for all known inputs, -except two PDFs published by the SHAttered researchers, and the new -implementation (written by those researchers) claims to detect future -cryptanalytic collision attacks. - -Regardless, it's considered prudent to move past any variant of SHA-1 -to a new hash. There's no guarantee that future attacks on SHA-1 won't -be published in the future, and those attacks may not have viable -mitigations. - -If SHA-1 and its variants were to be truly broken, Git's hash function -could not be considered cryptographically secure any more. This would -impact the communication of hash values because we could not trust -that a given hash value represented the known good version of content -that the speaker intended. - -SHA-1 still possesses the other properties such as fast object lookup -and safe error checking, but other hash functions are equally suitable -that are believed to be cryptographically secure. - -Goals ------ -1. The transition to SHA-256 can be done one local repository at a time. - a. Requiring no action by any other party. - b. A SHA-256 repository can communicate with SHA-1 Git servers - (push/fetch). - c. Users can use SHA-1 and SHA-256 identifiers for objects - interchangeably (see "Object names on the command line", below). - d. New signed objects make use of a stronger hash function than - SHA-1 for their security guarantees. -2. Allow a complete transition away from SHA-1. - a. Local metadata for SHA-1 compatibility can be removed from a - repository if compatibility with SHA-1 is no longer needed. -3. Maintainability throughout the process. - a. The object format is kept simple and consistent. - b. Creation of a generalized repository conversion tool. - -Non-Goals ---------- -1. Add SHA-256 support to Git protocol. 
This is valuable and the - logical next step but it is out of scope for this initial design. -2. Transparently improving the security of existing SHA-1 signed - objects. -3. Intermixing objects using multiple hash functions in a single - repository. -4. Taking the opportunity to fix other bugs in Git's formats and - protocols. -5. Shallow clones and fetches into a SHA-256 repository. (This will - change when we add SHA-256 support to Git protocol.) -6. Skip fetching some submodules of a project into a SHA-256 - repository. (This also depends on SHA-256 support in Git - protocol.) - -Overview --------- -We introduce a new repository format extension. Repositories with this -extension enabled use SHA-256 instead of SHA-1 to name their objects. -This affects both object names and object content --- both the names -of objects and all references to other objects within an object are -switched to the new hash function. - -SHA-256 repositories cannot be read by older versions of Git. - -Alongside the packfile, a SHA-256 repository stores a bidirectional -mapping between SHA-256 and SHA-1 object names. The mapping is generated -locally and can be verified using "git fsck". Object lookups use this -mapping to allow naming objects using either their SHA-1 and SHA-256 names -interchangeably. - -"git cat-file" and "git hash-object" gain options to display an object -in its sha1 form and write an object given its sha1 form. This -requires all objects referenced by that object to be present in the -object database so that they can be named using the appropriate name -(using the bidirectional hash mapping). - -Fetches from a SHA-1 based server convert the fetched objects into -SHA-256 form and record the mapping in the bidirectional mapping table -(see below for details). Pushes to a SHA-1 based server convert the -objects being pushed into sha1 form so the server does not have to be -aware of the hash function the client is using. 
- -Detailed Design ---------------- -Repository format extension -~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A SHA-256 repository uses repository format version `1` (see -Documentation/technical/repository-version.txt) with extensions -`objectFormat` and `compatObjectFormat`: - - [core] - repositoryFormatVersion = 1 - [extensions] - objectFormat = sha256 - compatObjectFormat = sha1 - -The combination of setting `core.repositoryFormatVersion=1` and -populating `extensions.*` ensures that all versions of Git later than -`v0.99.9l` will die instead of trying to operate on the SHA-256 -repository, instead producing an error message. - - # Between v0.99.9l and v2.7.0 - $ git status - fatal: Expected git repo version <= 0, found 1 - # After v2.7.0 - $ git status - fatal: unknown repository extensions found: - objectformat - compatobjectformat - -See the "Transition plan" section below for more details on these -repository extensions. - -Object names -~~~~~~~~~~~~ -Objects can be named by their 40 hexadecimal digit sha1-name or 64 -hexadecimal digit sha256-name, plus names derived from those (see -gitrevisions(7)). - -The sha1-name of an object is the SHA-1 of the concatenation of its -type, length, a nul byte, and the object's sha1-content. This is the -traditional <sha1> used in Git to name objects. - -The sha256-name of an object is the SHA-256 of the concatenation of its -type, length, a nul byte, and the object's sha256-content. - -Object format -~~~~~~~~~~~~~ -The content as a byte sequence of a tag, commit, or tree object named -by sha1 and sha256 differ because an object named by sha256-name refers to -other objects by their sha256-names and an object named by sha1-name -refers to other objects by their sha1-names. - -The sha256-content of an object is the same as its sha1-content, except -that objects referenced by the object are named using their sha256-names -instead of sha1-names. 
Because a blob object does not refer to any -other object, its sha1-content and sha256-content are the same. - -The format allows round-trip conversion between sha256-content and -sha1-content. - -Object storage -~~~~~~~~~~~~~~ -Loose objects use zlib compression and packed objects use the packed -format described in Documentation/technical/pack-format.txt, just like -today. The content that is compressed and stored uses sha256-content -instead of sha1-content. - -Pack index -~~~~~~~~~~ -Pack index (.idx) files use a new v3 format that supports multiple -hash functions. They have the following format (all integers are in -network byte order): - -- A header appears at the beginning and consists of the following: - - The 4-byte pack index signature: '\377t0c' - - 4-byte version number: 3 - - 4-byte length of the header section, including the signature and - version number - - 4-byte number of objects contained in the pack - - 4-byte number of object formats in this pack index: 2 - - For each object format: - - 4-byte format identifier (e.g., 'sha1' for SHA-1) - - 4-byte length in bytes of shortened object names. This is the - shortest possible length needed to make names in the shortened - object name table unambiguous. - - 4-byte integer, recording where tables relating to this format - are stored in this index file, as an offset from the beginning. - - 4-byte offset to the trailer from the beginning of this file. - - Zero or more additional key/value pairs (4-byte key, 4-byte - value). Only one key is supported: 'PSRC'. See the "Loose objects - and unreachable objects" section for supported values and how this - is used. All other keys are reserved. Readers must ignore - unrecognized keys. -- Zero or more NUL bytes. This can optionally be used to improve the - alignment of the full object name table below. -- Tables for the first object format: - - A sorted table of shortened object names. 
These are prefixes of - the names of all objects in this pack file, packed together - without offset values to reduce the cache footprint of the binary - search for a specific object name. - - - A table of full object names in pack order. This allows resolving - a reference to "the nth object in the pack file" (from a - reachability bitmap or from the next table of another object - format) to its object name. - - - A table of 4-byte values mapping object name order to pack order. - For an object in the table of sorted shortened object names, the - value at the corresponding index in this table is the index in the - previous table for that same object. - - This can be used to look up the object in reachability bitmaps or - to look up its name in another object format. - - - A table of 4-byte CRC32 values of the packed object data, in the - order that the objects appear in the pack file. This is to allow - compressed data to be copied directly from pack to pack during - repacking without undetected data corruption. - - - A table of 4-byte offset values. For an object in the table of - sorted shortened object names, the value at the corresponding - index in this table indicates where that object can be found in - the pack file. These are usually 31-bit pack file offsets, but - large offsets are encoded as an index into the next table with the - most significant bit set. - - - A table of 8-byte offset entries (empty for pack files less than - 2 GiB). Pack files are organized with heavily used objects toward - the front, so most object references should not need to refer to - this table. -- Zero or more NUL bytes. -- Tables for the second object format, with the same layout as above, - up to and not including the table of CRC32 values. -- Zero or more NUL bytes. -- The trailer consists of the following: - - A copy of the 32-byte SHA-256 checksum at the end of the - corresponding packfile. - - - 32-byte SHA-256 checksum of all of the above.
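The two offset tables in the pack index layout above work together: a 4-byte entry either holds a 31-bit offset directly or, when its most significant bit is set, an index into the 8-byte table. A minimal sketch of that lookup follows. The name `pack_offset` and the idea of passing each table as a separate byte string are assumptions made for illustration; the text only specifies the table contents and network byte order.

```python
import struct


def pack_offset(small_table: bytes, large_table: bytes, i: int) -> int:
    """Return the pack file offset of the i-th object in sorted-name order."""
    # 4-byte big-endian entry from the table of 4-byte offset values.
    (value,) = struct.unpack_from(">I", small_table, 4 * i)
    if value & 0x80000000:
        # MSB set: the low 31 bits index the table of 8-byte offsets.
        index = value & 0x7FFFFFFF
        (value,) = struct.unpack_from(">Q", large_table, 8 * index)
    return value
```

For a pack file smaller than 2 GiB the 8-byte table is empty and the second branch is never taken.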
- -Loose object index -~~~~~~~~~~~~~~~~~~ -A new file $GIT_OBJECT_DIR/loose-object-idx contains information about -all loose objects. Its format is - - # loose-object-idx - (sha256-name SP sha1-name LF)* - -where the object names are in hexadecimal format. The file is not -sorted. - -The loose object index is protected against concurrent writes by a -lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose -object: - -1. Write the loose object to a temporary file, like today. -2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock. -3. Rename the loose object into place. -4. Open loose-object-idx with O_APPEND and write the new object -5. Unlink loose-object-idx.lock to release the lock. - -To remove entries (e.g. in "git pack-refs" or "git-prune"): - -1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the - lock. -2. Write the new content to loose-object-idx.lock. -3. Unlink any loose objects being removed. -4. Rename to replace loose-object-idx, releasing the lock. - -Translation table -~~~~~~~~~~~~~~~~~ -The index files support a bidirectional mapping between sha1-names -and sha256-names. The lookup proceeds similarly to ordinary object -lookups. For example, to convert a sha1-name to a sha256-name: - - 1. Look for the object in idx files. If a match is present in the - idx's sorted list of truncated sha1-names, then: - a. Read the corresponding entry in the sha1-name order to pack - name order mapping. - b. Read the corresponding entry in the full sha1-name table to - verify we found the right object. If it is, then - c. Read the corresponding entry in the full sha256-name table. - That is the object's sha256-name. - 2. Check for a loose object. Read lines from loose-object-idx until - we find a match. - -Step (1) takes the same amount of time as an ordinary object lookup: -O(number of packs * log(objects per pack)). Step (2) takes O(number of -loose objects) time. 
To maintain good performance it will be necessary -to keep the number of loose objects low. See the "Loose objects and -unreachable objects" section below for more details. - -Since all operations that make new objects (e.g., "git commit") add -the new objects to the corresponding index, this mapping is possible -for all objects in the object store. - -Reading an object's sha1-content -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The sha1-content of an object can be read by converting all sha256-names -its sha256-content references to sha1-names using the translation table. - -Fetch -~~~~~ -Fetching from a SHA-1 based server requires translating between SHA-1 -and SHA-256 based representations on the fly. - -SHA-1s named in the ref advertisement that are present on the client -can be translated to SHA-256 and looked up as local objects using the -translation table. - -Negotiation proceeds as today. Any "have"s generated locally are -converted to SHA-1 before being sent to the server, and SHA-1s -mentioned by the server are converted to SHA-256 when looking them up -locally. - -After negotiation, the server sends a packfile containing the -requested objects. We convert the packfile to SHA-256 format using -the following steps: - -1. index-pack: inflate each object in the packfile and compute its - SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against - objects the client has locally. These objects can be looked up - using the translation table and their sha1-content read as - described above to resolve the deltas. -2. topological sort: starting at the "want"s from the negotiation - phase, walk through objects in the pack and emit a list of them, - excluding blobs, in reverse topologically sorted order, with each - object coming later in the list than all objects it references. - (This list only contains objects reachable from the "wants". If the - pack from the server contained additional extraneous objects, then - they will be discarded.) -3. 
convert to sha256: open a new (sha256) packfile. Read the topologically - sorted list just generated. For each object, inflate its - sha1-content, convert to sha256-content, and write it to the sha256 - pack. Record the new sha1<->sha256 mapping entry for use in the idx. -4. sort: reorder entries in the new pack to match the order of objects - in the pack the server generated and include blobs. Write a sha256 idx - file -5. clean up: remove the SHA-1 based pack file, index, and - topologically sorted list obtained from the server in steps 1 - and 2. - -Step 3 requires every object referenced by the new object to be in the -translation table. This is why the topological sort step is necessary. - -As an optimization, step 1 could write a file describing what non-blob -objects each object it has inflated from the packfile references. This -makes the topological sort in step 2 possible without inflating the -objects in the packfile for a second time. The objects need to be -inflated again in step 3, for a total of two inflations. - -Step 4 is probably necessary for good read-time performance. "git -pack-objects" on the server optimizes the pack file for good data -locality (see Documentation/technical/pack-heuristics.txt). - -Details of this process are likely to change. It will take some -experimenting to get this to perform well. - -Push -~~~~ -Push is simpler than fetch because the objects referenced by the -pushed objects are already in the translation table. The sha1-content -of each object being pushed can be read as described in the "Reading -an object's sha1-content" section to generate the pack written by git -send-pack. - -Signed Commits -~~~~~~~~~~~~~~ -We add a new field "gpgsig-sha256" to the commit object format to allow -signing commits without relying on SHA-1. It is similar to the -existing "gpgsig" field. Its signed payload is the sha256-content of the -commit object with any "gpgsig" and "gpgsig-sha256" fields removed. 
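As a rough sketch of how the signed payload described above could be derived, the following removes the "gpgsig" and "gpgsig-sha256" header fields from a commit's content. It assumes that multi-line header values are continued on lines that begin with a single space, as the existing "gpgsig" field is encoded; the function name `signed_payload` is hypothetical and this is not Git's own code.

```python
def signed_payload(commit: bytes, drop=(b"gpgsig", b"gpgsig-sha256")) -> bytes:
    """Return the commit content with the signature header fields removed."""
    # Headers end at the first empty line; the rest is the commit message.
    header, sep, body = commit.partition(b"\n\n")
    kept = []
    skipping = False
    for line in header.split(b"\n"):
        if line.startswith(b" "):
            # Continuation line: it shares the fate of the field it continues.
            if not skipping:
                kept.append(line)
            continue
        key = line.split(b" ", 1)[0]
        skipping = key in drop
        if not skipping:
            kept.append(line)
    return b"\n".join(kept) + sep + body
```

The same payload results whether a commit carries one signature field, both, or neither, which is what lets the two signatures coexist.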
- -This means commits can be signed -1. using SHA-1 only, as in existing signed commit objects -2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig - fields. -3. using only SHA-256, by only using the gpgsig-sha256 field. - -Old versions of "git verify-commit" can verify the gpgsig signature in -cases (1) and (2) without modifications and view case (3) as an -ordinary unsigned commit. - -Signed Tags -~~~~~~~~~~~ -We add a new field "gpgsig-sha256" to the tag object format to allow -signing tags without relying on SHA-1. Its signed payload is the -sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP -SIGNATURE-----" delimited in-body signature removed. - -This means tags can be signed -1. using SHA-1 only, as in existing signed tag objects -2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body - signature. -3. using only SHA-256, by only using the gpgsig-sha256 field. - -Mergetag embedding -~~~~~~~~~~~~~~~~~~ -The mergetag field in the sha1-content of a commit contains the -sha1-content of a tag that was merged by that commit. - -The mergetag field in the sha256-content of the same commit contains the -sha256-content of the same tag. - -Submodules -~~~~~~~~~~ -To convert recorded submodule pointers, you need to have the converted -submodule repository in place. The translation table of the submodule -can be used to look up the new hash. - -Loose objects and unreachable objects -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Fast lookups in the loose-object-idx require that the number of loose -objects not grow too high. - -"git gc --auto" currently waits for there to be 6700 loose objects -present before consolidating them into a packfile. We will need to -measure to find a more appropriate threshold for it to use. - -"git gc --auto" currently waits for there to be 50 packs present -before combining packfiles. Packing loose objects more aggressively -may cause the number of pack files to grow too quickly. 
This can be -mitigated by using a strategy similar to Martin Fick's exponential -rolling garbage collection script: -https://gerrit-review.googlesource.com/c/gerrit/+/35215 - -"git gc" currently expels any unreachable objects it encounters in -pack files to loose objects in an attempt to prevent a race when -pruning them (in case another process is simultaneously writing a new -object that refers to the about-to-be-deleted object). This leads to -an explosion in the number of loose objects present and disk space -usage due to the objects in delta form being replaced with independent -loose objects. Worse, the race is still present for loose objects. - -Instead, "git gc" will need to move unreachable objects to a new -packfile marked as UNREACHABLE_GARBAGE (using the PSRC field; see -below). To avoid the race when writing new objects referring to an -about-to-be-deleted object, code paths that write new objects will -need to copy any objects from UNREACHABLE_GARBAGE packs that they -refer to into new, non-UNREACHABLE_GARBAGE packs (or loose objects). -UNREACHABLE_GARBAGE packs are then safe to delete if their creation time (as -indicated by the file's mtime) is long enough ago. - -To avoid a proliferation of UNREACHABLE_GARBAGE packs, they can be -combined under certain circumstances. If "gc.garbageTtl" is set to -greater than one day, then packs created within a single calendar day, -UTC, can be coalesced together. The resulting packfile would have an -mtime before midnight on that day, so this makes the effective maximum -ttl the garbageTtl + 1 day. If "gc.garbageTtl" is less than one day, -then we divide the calendar day into intervals one-third of that ttl -in duration. Packs created within the same interval can be coalesced -together. The resulting packfile would have an mtime before the end of -the interval, so this makes the effective maximum ttl equal to the -garbageTtl * 4/3.
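The coalescing rule above can be illustrated with a small sketch that computes which window a garbage pack falls into. Times are Unix seconds in UTC. The function name `coalesce_window` is hypothetical, and treating a ttl of exactly one day like the "greater than one day" case is an assumption, since the text does not say which rule applies there.

```python
DAY = 24 * 60 * 60  # seconds in one calendar day

def coalesce_window(created: int, garbage_ttl: int):
    """Return (start, end) of the window whose packs may be coalesced.

    `created` is the pack's creation time and `garbage_ttl` the configured
    ttl, both in seconds. The window is half-open: start <= t < end.
    """
    day_start = created - created % DAY
    if garbage_ttl >= DAY:
        # Long ttl: everything created on the same UTC calendar day.
        return day_start, day_start + DAY
    # Short ttl: split the day into intervals of one-third of the ttl.
    interval = max(garbage_ttl // 3, 1)
    start = day_start + (created - day_start) // interval * interval
    return start, min(start + interval, day_start + DAY)
```

A pack created at the start of a window can end up in a coalesced pack whose mtime is just before the window's end, so it may live for up to the ttl plus one window length. With a window of one-third of the ttl, that is where the 4/3 factor comes from.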
- -This rule comes from Thirumala Reddy Mutchukota's JGit change -https://git.eclipse.org/r/90465. - -The UNREACHABLE_GARBAGE setting goes in the PSRC field of the pack -index. More generally, that field indicates where a pack came from: - - - 1 (PACK_SOURCE_RECEIVE) for a pack received over the network - - 2 (PACK_SOURCE_AUTO) for a pack created by a lightweight - "gc --auto" operation - - 3 (PACK_SOURCE_GC) for a pack created by a full gc - - 4 (PACK_SOURCE_UNREACHABLE_GARBAGE) for potential garbage - discovered by gc - - 5 (PACK_SOURCE_INSERT) for locally created objects that were - written directly to a pack file, e.g. from "git add ." - -This information can be useful for debugging and for "gc --auto" to -make appropriate choices about which packs to coalesce. - -Caveats -------- -Invalid objects -~~~~~~~~~~~~~~~ -The conversion from sha1-content to sha256-content retains any -brokenness in the original object (e.g., tree entry modes encoded with -leading 0, tree objects whose paths are not sorted correctly, and -commit objects without an author or committer). This is a deliberate -feature of the design to allow the conversion to round-trip. - -More profoundly broken objects (e.g., a commit with a truncated "tree" -header line) cannot be converted but were not usable by current Git -anyway. - -Shallow clone and submodules -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Because it requires all referenced objects to be available in the -locally generated translation table, this design does not support -shallow clone or unfetched submodules. Protocol improvements might -allow lifting this restriction. - -Alternates -~~~~~~~~~~ -For the same reason, a sha256 repository cannot borrow objects from a -sha1 repository using objects/info/alternates or -$GIT_ALTERNATE_OBJECT_REPOSITORIES. - -git notes -~~~~~~~~~ -The "git notes" tool annotates objects using their sha1-name as key. -This design does not describe a way to migrate notes trees to use -sha256-names. 
That migration is expected to happen separately (for -example using a file at the root of the notes tree to describe which -hash it uses). - -Server-side cost -~~~~~~~~~~~~~~~~ -Until Git protocol gains SHA-256 support, using SHA-256 based storage -on public-facing Git servers is strongly discouraged. Once Git -protocol gains SHA-256 support, SHA-256 based servers are likely not -to support SHA-1 compatibility, to avoid what may be a very expensive -hash reencode during clone and to encourage peers to modernize. - -The design described here allows fetches by SHA-1 clients of a -personal SHA-256 repository because it's not much more difficult than -allowing pushes from that repository. This support needs to be guarded -by a configuration option --- servers like git.kernel.org that serve a -large number of clients would not be expected to bear that cost. - -Meaning of signatures -~~~~~~~~~~~~~~~~~~~~~ -The signed payload for signed commits and tags does not explicitly -name the hash used to identify objects. If some day Git adopts a new -hash function with the same length as the current SHA-1 (40 -hexadecimal digit) or SHA-256 (64 hexadecimal digit) objects then the -intent behind the PGP signed payload in an object signature is -unclear: - - object e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 - type commit - tag v2.12.0 - tagger Junio C Hamano <gitster@pobox.com> 1487962205 -0800 - - Git 2.12 - -Does this mean Git v2.12.0 is the commit with sha1-name -e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with -new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7? - -Fortunately SHA-256 and SHA-1 have different lengths. If Git starts -using another hash with the same length to name objects, then it will -need to change the format of signed payloads using that hash to -address this issue. 
- -Object names on the command line -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -To support the transition (see Transition plan below), this design -supports four different modes of operation: - - 1. ("dark launch") Treat object names input by the user as SHA-1 and - convert any object names written to output to SHA-1, but store - objects using SHA-256. This allows users to test the code with no - visible behavior change except for performance. This also - allows running even tests that assume the SHA-1 hash function, to - sanity-check the behavior of the new mode. - - 2. ("early transition") Allow both SHA-1 and SHA-256 object names in - input. Any object names written to output use SHA-1. This allows - users to continue to make use of SHA-1 to communicate with peers - (e.g. by email) that have not migrated yet and prepares for mode 3. - - 3. ("late transition") Allow both SHA-1 and SHA-256 object names in - input. Any object names written to output use SHA-256. In this - mode, users are using a more secure object naming method by - default. The disruption is minimal as long as most of their peers - are in mode 2 or mode 3. - - 4. ("post-transition") Treat object names input by the user as - SHA-256 and write output using SHA-256. This is safer than mode 3 - because there is less risk that input is incorrectly interpreted - using the wrong hash function. - -The mode is specified in configuration. - -The user can also explicitly specify which format to use for a -particular revision specifier and for output, overriding the mode. For -example: - -git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256} - -Choice of Hash --------------- -In early 2005, around the time that Git was written, Xiaoyun Wang, -Yiqun Lisa Yin, and Hongbo Yu announced an attack finding SHA-1 -collisions in 2^69 operations. In August they published details. -Luckily, no practical demonstrations of a collision in full SHA-1 were -published until 12 years later, in 2017.
- -Git v2.13.0 and later subsequently moved to a hardened SHA-1 -implementation by default that mitigates the SHAttered attack, but -SHA-1 is still believed to be weak. - -The hash to replace this hardened SHA-1 should be stronger than SHA-1 -was: we would like it to be trustworthy and useful in practice for at -least 10 years. - -Some other relevant properties: - -1. A 256-bit hash (long enough to match common security practice; not - excessively long to hurt performance and disk usage). - -2. High quality implementations should be widely available (e.g., in - OpenSSL and Apple CommonCrypto). - -3. The hash function's properties should match Git's needs (e.g. Git - requires collision and 2nd preimage resistance and does not require - length extension resistance). - -4. As a tiebreaker, the hash should be fast to compute (fortunately - many contenders are faster than SHA-1). - -We choose SHA-256. - -Transition plan ---------------- -Some initial steps can be implemented independently of one another: -- adding a hash function API (vtable) -- teaching fsck to tolerate the gpgsig-sha256 field -- excluding gpgsig-* from the fields copied by "git commit --amend" -- annotating tests that depend on SHA-1 values with a SHA1 test - prerequisite -- using "struct object_id", GIT_MAX_RAWSZ, and GIT_MAX_HEXSZ - consistently instead of "unsigned char *" and the hardcoded - constants 20 and 40. -- introducing index v3 -- adding support for the PSRC field and safer object pruning - - -The first user-visible change is the introduction of the objectFormat -extension (without compatObjectFormat). 
This requires: -- implementing the loose-object-idx -- teaching fsck about this mode of operation -- using the hash function API (vtable) when computing object names -- signing objects and verifying signatures -- rejecting attempts to fetch from or push to an incompatible - repository - -Next comes introduction of compatObjectFormat: -- translating object names between object formats -- translating object content between object formats -- generating and verifying signatures in the compat format -- adding appropriate index entries when adding a new object to the - object store -- --output-format option -- ^{sha1} and ^{sha256} revision notation -- configuration to specify default input and output format (see - "Object names on the command line" above) - -The next step is supporting fetches and pushes to SHA-1 repositories: -- allow pushes to a repository using the compat format -- generate a topologically sorted list of the SHA-1 names of fetched - objects -- convert the fetched packfile to sha256 format and generate an idx - file -- re-sort to match the order of objects in the fetched packfile - -The infrastructure supporting fetch also allows converting an existing -repository. In converted repositories and new clones, end users can -gain support for the new hash function without any visible change in -behavior (see "dark launch" in the "Object names on the command line" -section). In particular this allows users to verify SHA-256 signatures -on objects in the repository, and it should ensure the transition code -is stable in production in preparation for using it more widely. - -Over time projects would encourage their users to adopt the "early -transition" and then "late transition" modes to take advantage of the -new, more futureproof SHA-256 object names. - -When objectFormat and compatObjectFormat are both set, commands -generating signatures would generate both SHA-1 and SHA-256 signatures -by default to support both new and old users. 
- -In projects using SHA-256 heavily, users could be encouraged to adopt -the "post-transition" mode to avoid accidentally making implicit use -of SHA-1 object names. - -Once a critical mass of users have upgraded to a version of Git that -can verify SHA-256 signatures and have converted their existing -repositories to support verifying them, we can add support for a -setting to generate only SHA-256 signatures. This is expected to be at -least a year later. - -That is also a good moment to advertise the ability to convert -repositories to use SHA-256 only, stripping out all SHA-1 related -metadata. This improves performance by eliminating translation -overhead and security by avoiding the possibility of accidentally -relying on the safety of SHA-1. - -Updating Git's protocols to allow a server to specify which hash -functions it supports is also an important part of this transition. It -is not discussed in detail in this document but this transition plan -assumes it happens. :) - -Alternatives considered ------------------------ -Upgrading everyone working on a particular project on a flag day -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Projects like the Linux kernel are large and complex enough that -flipping the switch for all projects based on the repository at once -is infeasible. - -Not only would all developers and server operators supporting -developers have to switch on the same flag day, but supporting tooling -(continuous integration, code review, bug trackers, etc) would have to -be adapted as well. This also makes it difficult to get early feedback -from some project participants testing before it is time for mass -adoption. - -Using hash functions in parallel -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -(e.g. https://public-inbox.org/git/22708.8913.864049.452252@chiark.greenend.org.uk/ ) -Objects newly created would be addressed by the new hash, but inside -such an object (e.g. 
commit) it is still possible to address objects -using the old hash function. -* You cannot trust its history (needed for bisectability) in the - future without further work -* Maintenance burden as the number of supported hash functions grows - (they will never go away, so they accumulate). In this proposal, by - comparison, converted objects lose all references to SHA-1. - -Signed objects with multiple hashes -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Instead of introducing the gpgsig-sha256 field in commit and tag objects -for sha256-content based signatures, an earlier version of this design -added "hash sha256 <sha256-name>" fields to strengthen the existing -sha1-content based signatures. - -In other words, a single signature was used to attest to the object -content using both hash functions. This had some advantages: -* Using one signature instead of two speeds up the signing process. -* Having one signed payload with both hashes allows the signer to - attest to the sha1-name and sha256-name referring to the same object. -* All users consume the same signature. Broken signatures are likely - to be detected quickly using current versions of git. - -However, it also came with disadvantages: -* Verifying a signed object requires access to the sha1-names of all - objects it references, even after the transition is complete and - translation table is no longer needed for anything else. To support - this, the design added fields such as "hash sha1 tree <sha1-name>" - and "hash sha1 parent <sha1-name>" to the sha256-content of a signed - commit, complicating the conversion process. -* Allowing signed objects without a sha1 (for after the transition is - complete) complicated the design further, requiring a "nohash sha1" - field to suppress including "hash sha1" fields in the sha256-content - and signed payload. 
- -Lazily populated translation table -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Some of the work of building the translation table could be deferred to -push time, but that would significantly complicate and slow down pushes. -Calculating the sha1-name at object creation time at the same time it is -being streamed to disk and having its sha256-name calculated should be -an acceptable cost. - -Document History ----------------- - -2017-03-03 -bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com, -sbeller@google.com - -Initial version sent to -http://public-inbox.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com - -2017-03-03 jrnieder@gmail.com -Incorporated suggestions from jonathantanmy and sbeller: -* describe purpose of signed objects with each hash type -* redefine signed object verification using object content under the - first hash function - -2017-03-06 jrnieder@gmail.com -* Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2] -* Make sha3-based signatures a separate field, avoiding the need for - "hash" and "nohash" fields (thanks to peff[3]). -* Add a sorting phase to fetch (thanks to Junio for noticing the need - for this). -* Omit blobs from the topological sort during fetch (thanks to peff). -* Discuss alternates, git notes, and git servers in the caveats - section (thanks to Junio Hamano, brian m. carlson[4], and Shawn - Pearce). -* Clarify language throughout (thanks to various commenters, - especially Junio). - -2017-09-27 jrnieder@gmail.com, sbeller@google.com -* use placeholder NewHash instead of SHA3-256 -* describe criteria for picking a hash function. 
-* include a transition plan (thanks especially to Brandon Williams - for fleshing these ideas out) -* define the translation table (thanks, Shawn Pearce[5], Jonathan - Tan, and Masaya Suzuki) -* avoid loose object overhead by packing more aggressively in - "git gc --auto" - -Later history: - - See the history of this file in git.git for the history of subsequent - edits. This document history is no longer being maintained as it - would now be superfluous to the commit log - -[1] http://public-inbox.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/ -[2] http://public-inbox.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/ -[3] http://public-inbox.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/ -[4] http://public-inbox.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net -[5] https://public-inbox.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/ diff --git a/third_party/git/Documentation/technical/http-protocol.txt b/third_party/git/Documentation/technical/http-protocol.txt deleted file mode 100644 index 9c5b6f0fac..0000000000 --- a/third_party/git/Documentation/technical/http-protocol.txt +++ /dev/null @@ -1,518 +0,0 @@ -HTTP transfer protocols -======================= - -Git supports two HTTP based transfer protocols. A "dumb" protocol -which requires only a standard HTTP server on the server end of the -connection, and a "smart" protocol which requires a Git aware CGI -(or server module). This document describes both protocols. - -As a design feature smart clients can automatically upgrade "dumb" -protocol URLs to smart URLs. This permits all users to have the -same published URL, and the peers automatically select the most -efficient transport available to them. 
- - -URL Format ----------- - -URLs for Git repositories accessed by HTTP use the standard HTTP -URL syntax documented by RFC 1738, so they are of the form: - - http://<host>:<port>/<path>?<searchpart> - -Within this documentation the placeholder `$GIT_URL` will stand for -the http:// repository URL entered by the end-user. - -Servers SHOULD handle all requests to locations matching `$GIT_URL`, as -both the "smart" and "dumb" HTTP protocols used by Git operate -by appending additional path components onto the end of the user -supplied `$GIT_URL` string. - -An example of a dumb client requesting for a loose object: - - $GIT_URL: http://example.com:8080/git/repo.git - URL request: http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355 - -An example of a smart request to a catch-all gateway: - - $GIT_URL: http://example.com/daemon.cgi?svc=git&q= - URL request: http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack - -An example of a request to a submodule: - - $GIT_URL: http://example.com/git/repo.git/path/submodule.git - URL request: http://example.com/git/repo.git/path/submodule.git/info/refs - -Clients MUST strip a trailing `/`, if present, from the user supplied -`$GIT_URL` string to prevent empty path tokens (`//`) from appearing -in any URL sent to a server. Compatible clients MUST expand -`$GIT_URL/info/refs` as `foo/info/refs` and not `foo//info/refs`. - - -Authentication --------------- - -Standard HTTP authentication is used if authentication is required -to access a repository, and MAY be configured and enforced by the -HTTP server software. - -Because Git repositories are accessed by standard path components -server administrators MAY use directory based permissions within -their HTTP server to control repository access. - -Clients SHOULD support Basic authentication as described by RFC 2617. 
-Servers SHOULD support Basic authentication by relying upon the -HTTP server placed in front of the Git server software. - -Servers SHOULD NOT require HTTP cookies for the purposes of -authentication or access control. - -Clients and servers MAY support other common forms of HTTP based -authentication, such as Digest authentication. - - -SSL ---- - -Clients and servers SHOULD support SSL, particularly to protect -passwords when relying on Basic HTTP authentication. - - -Session State -------------- - -The Git over HTTP protocol (much like HTTP itself) is stateless -from the perspective of the HTTP server side. All state MUST be -retained and managed by the client process. This permits simple -round-robin load-balancing on the server side, without needing to -worry about state management. - -Clients MUST NOT require state management on the server side in -order to function correctly. - -Servers MUST NOT require HTTP cookies in order to function correctly. -Clients MAY store and forward HTTP cookies during request processing -as described by RFC 2616 (HTTP/1.1). Servers SHOULD ignore any -cookies sent by a client. - - -General Request Processing --------------------------- - -Except where noted, all standard HTTP behavior SHOULD be assumed -by both client and server. This includes (but is not necessarily -limited to): - -If there is no repository at `$GIT_URL`, or the resource pointed to by a -location matching `$GIT_URL` does not exist, the server MUST NOT respond -with `200 OK` response. A server SHOULD respond with -`404 Not Found`, `410 Gone`, or any other suitable HTTP status code -which does not imply the resource exists as requested. - -If there is a repository at `$GIT_URL`, but access is not currently -permitted, the server MUST respond with the `403 Forbidden` HTTP -status code. - -Servers SHOULD support both HTTP 1.0 and HTTP 1.1. -Servers SHOULD support chunked encoding for both request and response -bodies. 
- -Clients SHOULD support both HTTP 1.0 and HTTP 1.1. -Clients SHOULD support chunked encoding for both request and response -bodies. - -Servers MAY return ETag and/or Last-Modified headers. - -Clients MAY revalidate cached entities by including If-Modified-Since -and/or If-None-Match request headers. - -Servers MAY return `304 Not Modified` if the relevant headers appear -in the request and the entity has not changed. Clients MUST treat -`304 Not Modified` identical to `200 OK` by reusing the cached entity. - -Clients MAY reuse a cached entity without revalidation if the -Cache-Control and/or Expires header permits caching. Clients and -servers MUST follow RFC 2616 for cache controls. - - -Discovering References ----------------------- - -All HTTP clients MUST begin either a fetch or a push exchange by -discovering the references available on the remote repository. - -Dumb Clients -~~~~~~~~~~~~ - -HTTP clients that only support the "dumb" protocol MUST discover -references by making a request for the special info/refs file of -the repository. - -Dumb HTTP clients MUST make a `GET` request to `$GIT_URL/info/refs`, -without any search/query parameters. - - C: GET $GIT_URL/info/refs HTTP/1.0 - - S: 200 OK - S: - S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint - S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master - S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0 - S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{} - -The Content-Type of the returned info/refs entity SHOULD be -`text/plain; charset=utf-8`, but MAY be any content type. -Clients MUST NOT attempt to validate the returned Content-Type. -Dumb servers MUST NOT return a return type starting with -`application/x-git-`. - -Cache-Control headers MAY be returned to disable caching of the -returned entity. - -When examining the response clients SHOULD only examine the HTTP -status code. Valid responses are `200 OK`, or `304 Not Modified`. 
- -The returned content is a UNIX formatted text file describing -each ref and its known value. The file SHOULD be sorted by name -according to the C locale ordering. The file SHOULD NOT include -the default ref named `HEAD`. - - info_refs = *( ref_record ) - ref_record = any_ref / peeled_ref - - any_ref = obj-id HTAB refname LF - peeled_ref = obj-id HTAB refname LF - obj-id HTAB refname "^{}" LF - -Smart Clients -~~~~~~~~~~~~~ - -HTTP clients that support the "smart" protocol (or both the -"smart" and "dumb" protocols) MUST discover references by making -a parameterized request for the info/refs file of the repository. - -The request MUST contain exactly one query parameter, -`service=$servicename`, where `$servicename` MUST be the service -name the client wishes to contact to complete the operation. -The request MUST NOT contain additional query parameters. - - C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0 - -dumb server reply: - - S: 200 OK - S: - S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint - S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master - S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0 - S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{} - -smart server reply: - - S: 200 OK - S: Content-Type: application/x-git-upload-pack-advertisement - S: Cache-Control: no-cache - S: - S: 001e# service=git-upload-pack\n - S: 0000 - S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n - S: 0042d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n - S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n - S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n - S: 0000 - -The client may send Extra Parameters (see -Documentation/technical/pack-protocol.txt) as a colon-separated string -in the Git-Protocol HTTP header. - -Dumb Server Response -^^^^^^^^^^^^^^^^^^^^ -Dumb servers MUST respond with the dumb server reply format. 
- -See the prior section under dumb clients for a more detailed -description of the dumb server response. - -Smart Server Response -^^^^^^^^^^^^^^^^^^^^^ -If the server does not recognize the requested service name, or the -requested service name has been disabled by the server administrator, -the server MUST respond with the `403 Forbidden` HTTP status code. - -Otherwise, smart servers MUST respond with the smart server reply -format for the requested service name. - -Cache-Control headers SHOULD be used to disable caching of the -returned entity. - -The Content-Type MUST be `application/x-$servicename-advertisement`. -Clients SHOULD fall back to the dumb protocol if another content -type is returned. When falling back to the dumb protocol clients -SHOULD NOT make an additional request to `$GIT_URL/info/refs`, but -instead SHOULD use the response already in hand. Clients MUST NOT -continue if they do not support the dumb protocol. - -Clients MUST validate the status code is either `200 OK` or -`304 Not Modified`. - -Clients MUST validate the first five bytes of the response entity -matches the regex `^[0-9a-f]{4}#`. If this test fails, clients -MUST NOT continue. - -Clients MUST parse the entire response as a sequence of pkt-line -records. - -Clients MUST verify the first pkt-line is `# service=$servicename`. -Servers MUST set $servicename to be the request parameter value. -Servers SHOULD include an LF at the end of this line. -Clients MUST ignore an LF at the end of the line. - -Servers MUST terminate the response with the magic `0000` end -pkt-line marker. - -The returned response is a pkt-line stream describing each ref and -its known value. The stream SHOULD be sorted by name according to -the C locale ordering. The stream SHOULD include the default ref -named `HEAD` as the first ref. The stream MUST include capability -declarations behind a NUL on the first ref. - -The returned response contains "version 1" if "version=1" was sent as an -Extra Parameter. 
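The client-side checks listed above (the five-byte prefix test, pkt-line parsing, the service line, and the terminating flush) can be sketched roughly as follows. This is a minimal illustration rather than Git's implementation; the names `parse_pkt_lines`, `check_advertisement`, and `FLUSH` are hypothetical, and lengths 1 through 3 are simply treated as malformed.

```python
import re

FLUSH = None  # hypothetical marker used in this sketch for a "0000" flush packet


def parse_pkt_lines(data: bytes):
    """Split a byte string into pkt-line payloads; FLUSH marks each "0000"."""
    records = []
    pos = 0
    while pos < len(data):
        # The 4 hex digits give the total length, including the digits themselves.
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            records.append(FLUSH)
            pos += 4
            continue
        if length < 4 or pos + length > len(data):
            raise ValueError("malformed pkt-line")
        records.append(data[pos + 4:pos + length])
        pos += length
    return records


def check_advertisement(body: bytes, service: str):
    """Validate a smart server reply and return the payloads after the service line."""
    # The first five bytes must match ^[0-9a-f]{4}#, otherwise the client stops.
    if not re.match(rb"^[0-9a-f]{4}#", body):
        raise ValueError("not a smart server reply")
    records = parse_pkt_lines(body)
    if not records or records[0] is FLUSH:
        raise ValueError("missing service line")
    # A trailing LF on the service line is ignored.
    if records[0].rstrip(b"\n") != b"# service=" + service.encode():
        raise ValueError("unexpected service line")
    if records[-1] is not FLUSH:
        raise ValueError("missing terminating flush")
    return [r for r in records[1:] if r is not FLUSH]
```

A dumb server reply starts with a 40-character object id, so its fifth byte is not `#` and the prefix test rejects it before any pkt-line parsing is attempted.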
- - smart_reply = PKT-LINE("# service=$servicename" LF) - "0000" - *1("version 1") - ref_list - "0000" - ref_list = empty_list / non_empty_list - - empty_list = PKT-LINE(zero-id SP "capabilities^{}" NUL cap-list LF) - - non_empty_list = PKT-LINE(obj-id SP name NUL cap_list LF) - *ref_record - - cap-list = capability *(SP capability) - capability = 1*(LC_ALPHA / DIGIT / "-" / "_") - LC_ALPHA = %x61-7A - - ref_record = any_ref / peeled_ref - any_ref = PKT-LINE(obj-id SP name LF) - peeled_ref = PKT-LINE(obj-id SP name LF) - PKT-LINE(obj-id SP name "^{}" LF - - -Smart Service git-upload-pack ------------------------------- -This service reads from the repository pointed to by `$GIT_URL`. - -Clients MUST first perform ref discovery with -`$GIT_URL/info/refs?service=git-upload-pack`. - - C: POST $GIT_URL/git-upload-pack HTTP/1.0 - C: Content-Type: application/x-git-upload-pack-request - C: - C: 0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7\n - C: 0032have 441b40d833fdfa93eb2908e52742248faf0ee993\n - C: 0000 - - S: 200 OK - S: Content-Type: application/x-git-upload-pack-result - S: Cache-Control: no-cache - S: - S: ....ACK %s, continue - S: ....NAK - -Clients MUST NOT reuse or revalidate a cached response. -Servers MUST include sufficient Cache-Control headers -to prevent caching of the response. - -Servers SHOULD support all capabilities defined here. - -Clients MUST send at least one "want" command in the request body. -Clients MUST NOT reference an id in a "want" command which did not -appear in the response obtained through ref discovery unless the -server advertises capability `allow-tip-sha1-in-want` or -`allow-reachable-sha1-in-want`. - - compute_request = want_list - have_list - request_end - request_end = "0000" / "done" - - want_list = PKT-LINE(want SP cap_list LF) - *(want_pkt) - want_pkt = PKT-LINE(want LF) - want = "want" SP id - cap_list = capability *(SP capability) - - have_list = *PKT-LINE("have" SP id LF) - -TODO: Document this further. 
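As an illustration of the request grammar above, the following sketch assembles a git-upload-pack request body from "want" and "have" ids. It follows the layout shown in this document's example and is not Git's own code; the helper names `pkt_line` and `build_upload_pack_request` are hypothetical.

```python
def pkt_line(payload: str) -> bytes:
    """Frame a payload as one pkt-line: 4 hex digits of total length, then the data."""
    data = payload.encode()
    return f"{len(data) + 4:04x}".encode() + data


def build_upload_pack_request(wants, haves, caps=(), done=False) -> bytes:
    """Build a request body: "want" commands first, then "have", then the end marker."""
    if not wants:
        raise ValueError("at least one want is required")
    lines = []
    for i, want in enumerate(wants):
        # Capabilities, if any, ride on the first "want" line only.
        suffix = " " + " ".join(caps) if i == 0 and caps else ""
        lines.append(pkt_line(f"want {want}{suffix}\n"))
    for have in haves:
        lines.append(pkt_line(f"have {have}\n"))
    # The request ends with either a flush packet or a "done" command.
    lines.append(pkt_line("done\n") if done else b"0000")
    return b"".join(lines)
```

With one 40-character id per command, each "want" or "have" line is 50 bytes long, which is where the `0032` prefix in the examples comes from.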
- -The Negotiation Algorithm -~~~~~~~~~~~~~~~~~~~~~~~~~ -The computation to select the minimal pack proceeds as follows -(C = client, S = server): - -'init step:' - -C: Use ref discovery to obtain the advertised refs. - -C: Place any object seen into set `advertised`. - -C: Build an empty set, `common`, to hold the objects that are later - determined to be on both ends. - -C: Build a set, `want`, of the objects from `advertised` the client - wants to fetch, based on what it saw during ref discovery. - -C: Start a queue, `c_pending`, ordered by commit time (popping newest - first). Add all client refs. When a commit is popped from - the queue its parents SHOULD be automatically inserted back. - Commits MUST only enter the queue once. - -'one compute step:' - -C: Send one `$GIT_URL/git-upload-pack` request: - - C: 0032want <want #1>............................... - C: 0032want <want #2>............................... - .... - C: 0032have <common #1>............................. - C: 0032have <common #2>............................. - .... - C: 0032have <have #1>............................... - C: 0032have <have #2>............................... - .... - C: 0000 - -The stream is organized into "commands", with each command -appearing by itself in a pkt-line. Within a command line, -the text leading up to the first space is the command name, -and the remainder of the line to the first LF is the value. -Command lines are terminated with an LF as the last byte of -the pkt-line value. - -Commands MUST appear in the following order, if they appear -at all in the request stream: - -* "want" -* "have" - -The stream is terminated by a pkt-line flush (`0000`). - -A single "want" or "have" command MUST have one hex formatted -SHA-1 as its value. Multiple SHA-1s MUST be sent by sending -multiple commands. - -The `have` list is created by popping the first 32 commits -from `c_pending`. Less can be supplied if `c_pending` empties. 
- -If the client has sent 256 "have" commits and has not yet -received one of those back from `s_common`, or the client has -emptied `c_pending` it SHOULD include a "done" command to let -the server know it won't proceed: - - C: 0009done - -S: Parse the git-upload-pack request: - -Verify all objects in `want` are directly reachable from refs. - -The server MAY walk backwards through history or through -the reflog to permit slightly stale requests. - -If no "want" objects are received, send an error: -TODO: Define error if no "want" lines are requested. - -If any "want" object is not reachable, send an error: -TODO: Define error if an invalid "want" is requested. - -Create an empty list, `s_common`. - -If "have" was sent: - -Loop through the objects in the order supplied by the client. - -For each object, if the server has the object reachable from -a ref, add it to `s_common`. If a commit is added to `s_common`, -do not add any ancestors, even if they also appear in `have`. - -S: Send the git-upload-pack response: - -If the server has found a closed set of objects to pack or the -request ends with "done", it replies with the pack. -TODO: Document the pack based response - - S: PACK... - -The returned stream is the side-band-64k protocol supported -by the git-upload-pack service, and the pack is embedded into -stream 1. Progress messages from the server side MAY appear -in stream 2. - -Here a "closed set of objects" is defined to have at least -one path from every "want" to at least one "common" object. - -If the server needs more information, it replies with a -status continue response: -TODO: Document the non-pack response - -C: Parse the upload-pack response: - TODO: Document parsing response - -'Do another compute step.' - - -Smart Service git-receive-pack ------------------------------- -This service reads from the repository pointed to by `$GIT_URL`. - -Clients MUST first perform ref discovery with -`$GIT_URL/info/refs?service=git-receive-pack`. 
- - C: POST $GIT_URL/git-receive-pack HTTP/1.0 - C: Content-Type: application/x-git-receive-pack-request - C: - C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status - C: 0000 - C: PACK.... - - S: 200 OK - S: Content-Type: application/x-git-receive-pack-result - S: Cache-Control: no-cache - S: - S: .... - -Clients MUST NOT reuse or revalidate a cached response. -Servers MUST include sufficient Cache-Control headers -to prevent caching of the response. - -Servers SHOULD support all capabilities defined here. - -Clients MUST send at least one command in the request body. -Within the command portion of the request body clients SHOULD send -the id obtained through ref discovery as old_id. - - update_request = command_list - "PACK" <binary data> - - command_list = PKT-LINE(command NUL cap_list LF) - *(command_pkt) - command_pkt = PKT-LINE(command LF) - cap_list = *(SP capability) SP - - command = create / delete / update - create = zero-id SP new_id SP name - delete = old_id SP zero-id SP name - update = old_id SP new_id SP name - -TODO: Document this further. - - -References ----------- - -http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)] -http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1] -link:technical/pack-protocol.html -link:technical/protocol-capabilities.html diff --git a/third_party/git/Documentation/technical/index-format.txt b/third_party/git/Documentation/technical/index-format.txt deleted file mode 100644 index 7c4d67aa6a..0000000000 --- a/third_party/git/Documentation/technical/index-format.txt +++ /dev/null @@ -1,357 +0,0 @@ -Git index format -================ - -== The Git index file has the following format - - All binary numbers are in network byte order. Version 2 is described - here unless stated otherwise. 
- - - A 12-byte header consisting of - - 4-byte signature: - The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache") - - 4-byte version number: - The current supported versions are 2, 3 and 4. - - 32-bit number of index entries. - - - A number of sorted index entries (see below). - - - Extensions - - Extensions are identified by signature. Optional extensions can - be ignored if Git does not understand them. - - Git currently supports cached tree and resolve undo extensions. - - 4-byte extension signature. If the first byte is 'A'..'Z' the - extension is optional and can be ignored. - - 32-bit size of the extension - - Extension data - - - 160-bit SHA-1 over the content of the index file before this - checksum. - -== Index entry - - Index entries are sorted in ascending order on the name field, - interpreted as a string of unsigned bytes (i.e. memcmp() order, no - localization, no special casing of directory separator '/'). Entries - with the same name are sorted by their stage field. - - 32-bit ctime seconds, the last time a file's metadata changed - this is stat(2) data - - 32-bit ctime nanosecond fractions - this is stat(2) data - - 32-bit mtime seconds, the last time a file's data changed - this is stat(2) data - - 32-bit mtime nanosecond fractions - this is stat(2) data - - 32-bit dev - this is stat(2) data - - 32-bit ino - this is stat(2) data - - 32-bit mode, split into (high to low bits) - - 4-bit object type - valid values in binary are 1000 (regular file), 1010 (symbolic link) - and 1110 (gitlink) - - 3-bit unused - - 9-bit unix permission. Only 0755 and 0644 are valid for regular files. - Symbolic links and gitlinks have value 0 in this field. - - 32-bit uid - this is stat(2) data - - 32-bit gid - this is stat(2) data - - 32-bit file size - This is the on-disk size from stat(2), truncated to 32-bit. 
- - 160-bit SHA-1 for the represented object - - A 16-bit 'flags' field split into (high to low bits) - - 1-bit assume-valid flag - - 1-bit extended flag (must be zero in version 2) - - 2-bit stage (during merge) - - 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF - is stored in this field. - - (Version 3 or later) A 16-bit field, only applicable if the - "extended flag" above is 1, split into (high to low bits). - - 1-bit reserved for future - - 1-bit skip-worktree flag (used by sparse checkout) - - 1-bit intent-to-add flag (used by "git add -N") - - 13-bit unused, must be zero - - Entry path name (variable length) relative to top level directory - (without leading slash). '/' is used as path separator. The special - path components ".", ".." and ".git" (without quotes) are disallowed. - Trailing slash is also disallowed. - - The exact encoding is undefined, but the '.' and '/' characters - are encoded in 7-bit ASCII and the encoding cannot contain a NUL - byte (iow, this is a UNIX pathname). - - (Version 4) In version 4, the entry path name is prefix-compressed - relative to the path name for the previous entry (the very first - entry is encoded as if the path name for the previous entry is an - empty string). At the beginning of an entry, an integer N in the - variable width encoding (the same encoding as the offset is encoded - for OFS_DELTA pack entries; see pack-format.txt) is stored, followed - by a NUL-terminated string S. Removing N bytes from the end of the - path name for the previous entry, and replacing it with the string S - yields the path name for this entry. - - 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes - while keeping the name NUL-terminated. - - (Version 4) In version 4, the padding after the pathname does not - exist. - - Interpretation of index entries in split index mode is completely - different. See below for details. 
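The fixed-width fields described above can be decoded with plain bit arithmetic. The sketch below reads the 12-byte index header and splits the 32-bit mode and 16-bit flags fields; the function names and dictionary keys are hypothetical and chosen only for illustration.

```python
import struct


def parse_index_header(data: bytes):
    """Read the 12-byte header: signature, version, entry count (network byte order)."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not an index file")
    if version not in (2, 3, 4):
        raise ValueError("unsupported index version")
    return version, entry_count


def decode_mode(mode: int):
    """Split the 32-bit mode into the 4-bit object type and 9-bit permission."""
    return {"object_type": (mode >> 12) & 0xF, "permissions": mode & 0o777}


def decode_flags(flags: int):
    """Split the 16-bit flags field, listed here from high bits to low bits."""
    return {
        "assume_valid": bool(flags & 0x8000),
        "extended": bool(flags & 0x4000),
        "stage": (flags >> 12) & 0x3,
        "name_length": flags & 0x0FFF,  # 0xFFF means "0xFFF or longer"
    }
```

For a regular file with mode 0100644, `decode_mode` yields object type 0b1000 and permission 0644, matching the valid values listed for the mode field.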
- -== Extensions - -=== Cached tree - - Cached tree extension contains pre-computed hashes for trees that can - be derived from the index. It helps speed up tree object generation - from index for a new commit. - - When a path is updated in index, the path must be invalidated and - removed from tree cache. - - The signature for this extension is { 'T', 'R', 'E', 'E' }. - - A series of entries fill the entire extension; each of which - consists of: - - - NUL-terminated path component (relative to its parent directory); - - - ASCII decimal number of entries in the index that is covered by the - tree this entry represents (entry_count); - - - A space (ASCII 32); - - - ASCII decimal number that represents the number of subtrees this - tree has; - - - A newline (ASCII 10); and - - - 160-bit object name for the object that would result from writing - this span of index as a tree. - - An entry can be in an invalidated state and is represented by having - a negative number in the entry_count field. In this case, there is no - object name and the next entry starts immediately after the newline. - When writing an invalid entry, -1 should always be used as entry_count. - - The entries are written out in the top-down, depth-first order. The - first entry represents the root level of the repository, followed by the - first subtree--let's call this A--of the root level (with its name - relative to the root level), followed by the first subtree of A (with - its name relative to A), ... - -=== Resolve undo - - A conflict is represented in the index as a set of higher stage entries. - When a conflict is resolved (e.g. with "git add path"), these higher - stage entries will be removed and a stage-0 entry with proper resolution - is added. - - When these higher stage entries are removed, they are saved in the - resolve undo extension, so that conflicts can be recreated (e.g. with - "git checkout -m"), in case users want to redo a conflict resolution - from scratch. 
- - The signature for this extension is { 'R', 'E', 'U', 'C' }. - - A series of entries fill the entire extension; each of which - consists of: - - - NUL-terminated pathname the entry describes (relative to the root of - the repository, i.e. full pathname); - - - Three NUL-terminated ASCII octal numbers, entry mode of entries in - stage 1 to 3 (a missing stage is represented by "0" in this field); - and - - - At most three 160-bit object names of the entry in stages from 1 to 3 - (nothing is written for a missing stage). - -=== Split index - - In split index mode, the majority of index entries could be stored - in a separate file. This extension records the changes to be made on - top of that to produce the final index. - - The signature for this extension is { 'l', 'i', 'n', 'k' }. - - The extension consists of: - - - 160-bit SHA-1 of the shared index file. The shared index file path - is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the - index does not require a shared index file. - - - An ewah-encoded delete bitmap, each bit represents an entry in the - shared index. If a bit is set, its corresponding entry in the - shared index will be removed from the final index. Note, because - a delete operation changes index entry positions, but we do need - original positions in replace phase, it's best to just mark - entries for removal, then do a mass deletion after replacement. - - - An ewah-encoded replace bitmap, each bit represents an entry in - the shared index. If a bit is set, its corresponding entry in the - shared index will be replaced with an entry in this index - file. All replaced entries are stored in sorted order in this - index. The first "1" bit in the replace bitmap corresponds to the - first index entry, the second "1" bit to the second entry and so - on. Replaced entries may have empty path names to save space. - - The remaining index entries after replaced ones will be added to the - final index. 
These added entries are also sorted by entry name then - stage. - -== Untracked cache - - Untracked cache saves the untracked file list and necessary data to - verify the cache. The signature for this extension is { 'U', 'N', - 'T', 'R' }. - - The extension starts with - - - A sequence of NUL-terminated strings, preceded by the size of the - sequence in variable width encoding. Each string describes the - environment where the cache can be used. - - - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from - ctime field until "file size". - - - Stat data of core.excludesfile - - - 32-bit dir_flags (see struct dir_struct) - - - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file - does not exist. - - - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does - not exist. - - - NUL-terminated string of per-dir exclude file name. This usually - is ".gitignore". - - - The number of following directory blocks, variable width - encoding. If this number is zero, the extension ends here with a - following NUL. - - - A number of directory blocks in depth-first-search order, each - consists of - - - The number of untracked entries, variable width encoding. - - - The number of sub-directory blocks, variable width encoding. - - - The directory name terminated by NUL. - - - A number of untracked file/dir names terminated by NUL. - -The remaining data of each directory block is grouped by type: - - - An ewah bitmap, the n-th bit marks whether the n-th directory has - valid untracked cache entries. - - - An ewah bitmap, the n-th bit records "check-only" bit of - read_directory_recursive() for the n-th directory. - - - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data - is valid for the n-th directory and exists in the next data. - - - An array of stat data. The n-th data corresponds with the n-th - "one" bit in the previous ewah bitmap. - - - An array of SHA-1. 
The n-th SHA-1 corresponds with the n-th "one" bit - in the previous ewah bitmap. - - - One NUL. - -== File System Monitor cache - - The file system monitor cache tracks files for which the core.fsmonitor - hook has told us about changes. The signature for this extension is - { 'F', 'S', 'M', 'N' }. - - The extension starts with - - - 32-bit version number: the current supported version is 1. - - - 64-bit time: the extension data reflects all changes through the given - time which is stored as the nanoseconds elapsed since midnight, - January 1, 1970. - - - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. - - - An ewah bitmap, the n-th bit indicates whether the n-th index entry - is not CE_FSMONITOR_VALID. - -== End of Index Entry - - The End of Index Entry (EOIE) is used to locate the end of the variable - length index entries and the beginning of the extensions. Code can take - advantage of this to quickly locate the index extensions without having - to parse through all of the index entries. - - Because it must be able to be loaded before the variable length cache - entries and other index extensions, this extension must be written last. - The signature for this extension is { 'E', 'O', 'I', 'E' }. - - The extension consists of: - - - 32-bit offset to the end of the index entries - - - 160-bit SHA-1 over the extension types and their sizes (but not - their contents). E.g. if we have "TREE" extension that is N-bytes - long, "REUC" extension that is M-bytes long, followed by "EOIE", - then the hash would be: - - SHA-1("TREE" + <binary representation of N> + - "REUC" + <binary representation of M>) - -== Index Entry Offset Table - - The Index Entry Offset Table (IEOT) is used to help address the CPU - cost of loading the index by enabling multi-threading the process of - converting cache entries from the on-disk format to the in-memory format. - The signature for this extension is { 'I', 'E', 'O', 'T' }.
- - The extension consists of: - - - 32-bit version (currently 1) - - - A number of index offset entries each consisting of: - - - 32-bit offset from the beginning of the file to the first cache entry - in this block of entries. - - - 32-bit count of cache entries in this block diff --git a/third_party/git/Documentation/technical/long-running-process-protocol.txt b/third_party/git/Documentation/technical/long-running-process-protocol.txt deleted file mode 100644 index aa0aa9af1c..0000000000 --- a/third_party/git/Documentation/technical/long-running-process-protocol.txt +++ /dev/null @@ -1,50 +0,0 @@ -Long-running process protocol -============================= - -This protocol is used when Git needs to communicate with an external -process throughout the entire life of a single Git command. All -communication is in pkt-line format (see technical/protocol-common.txt) -over standard input and standard output. - -Handshake ---------- - -Git starts by sending a welcome message (for example, -"git-filter-client"), a list of supported protocol version numbers, and -a flush packet. Git expects to read the welcome message with "server" -instead of "client" (for example, "git-filter-server"), exactly one -protocol version number from the previously sent list, and a flush -packet. All further communication will be based on the selected version. -The remaining protocol description below documents "version=2". Please -note that "version=42" in the example below does not exist and is only -there to illustrate what the protocol would look like with more than one -version. - -After the version negotiation Git sends a list of all capabilities that -it supports and a flush packet.
Git expects to read a list of desired -capabilities, which must be a subset of the supported capabilities list, -and a flush packet as response: ------------------------- -packet: git> git-filter-client -packet: git> version=2 -packet: git> version=42 -packet: git> 0000 -packet: git< git-filter-server -packet: git< version=2 -packet: git< 0000 -packet: git> capability=clean -packet: git> capability=smudge -packet: git> capability=not-yet-invented -packet: git> 0000 -packet: git< capability=clean -packet: git< capability=smudge -packet: git< 0000 ------------------------- - -Shutdown --------- - -Git will close -the command pipe on exit. The filter is expected to detect EOF -and exit gracefully on its own. Git will wait until the filter -process has stopped. diff --git a/third_party/git/Documentation/technical/multi-pack-index.txt b/third_party/git/Documentation/technical/multi-pack-index.txt deleted file mode 100644 index d7e57639f7..0000000000 --- a/third_party/git/Documentation/technical/multi-pack-index.txt +++ /dev/null @@ -1,109 +0,0 @@ -Multi-Pack-Index (MIDX) Design Notes -==================================== - -The Git object directory contains a 'pack' directory containing -packfiles (with suffix ".pack") and pack-indexes (with suffix -".idx"). The pack-indexes provide a way to lookup objects and -navigate to their offset within the pack, but these must come -in pairs with the packfiles. This pairing depends on the file -names, as the pack-index differs only in suffix with its pack- -file. While the pack-indexes provide fast lookup per packfile, -this performance degrades as the number of packfiles increases, -because abbreviations need to inspect every packfile and we are -more likely to have a miss on our most-recently-used packfile. -For some large repositories, repacking into a single packfile -is not feasible due to storage space or excessive repack times. 
- -The multi-pack-index (MIDX for short) stores a list of objects -and their offsets into multiple packfiles. It contains: - -- A list of packfile names. -- A sorted list of object IDs. -- A list of metadata for the ith object ID including: - - A value j referring to the jth packfile. - - An offset within the jth packfile for the object. -- If large offsets are required, we use another list of large - offsets similar to version 2 pack-indexes. - -Thus, we can provide O(log N) lookup time for any number -of packfiles. - -Design Details --------------- - -- The MIDX is stored in a file named 'multi-pack-index' in the - .git/objects/pack directory. This could be stored in the pack - directory of an alternate. It refers only to packfiles in that - same directory. - -- The pack.multiIndex config setting must be on to consume MIDX files. - -- The file format includes parameters for the object ID hash - function, so a future change of hash algorithm does not require - a change in format. - -- The MIDX keeps only one record per object ID. If an object appears - in multiple packfiles, then the MIDX selects the copy in the most- - recently modified packfile. - -- If there exist packfiles in the pack directory not registered in - the MIDX, then those packfiles are loaded into the `packed_git` - list and `packed_git_mru` cache. - -- The pack-indexes (.idx files) remain in the pack directory so we - can delete the MIDX file, set core.midx to false, or downgrade - without any loss of information. - -- The MIDX file format uses a chunk-based approach (similar to the - commit-graph file) that allows optional data to be added. - -Future Work ------------ - -- Add a 'verify' subcommand to the 'git midx' builtin to verify the - contents of the multi-pack-index file match the offsets listed in - the corresponding pack-indexes. 
- -- The multi-pack-index allows many packfiles, especially in a context - where repacking is expensive (such as a very large repo), or - unexpected maintenance time is unacceptable (such as a high-demand - build machine). However, the multi-pack-index needs to be rewritten - in full every time. We can extend the format to be incremental, so - writes are fast. By storing a small "tip" multi-pack-index that - points to large "base" MIDX files, we can keep writes fast while - still reducing the number of binary searches required for object - lookups. - -- The reachability bitmap is currently paired directly with a single - packfile, using the pack-order as the object order to hopefully - compress the bitmaps well using run-length encoding. This could be - extended to pair a reachability bitmap with a multi-pack-index. If - the multi-pack-index is extended to store a "stable object order" - (a function Order(hash) = integer that is constant for a given hash, - even as the multi-pack-index is updated) then a reachability bitmap - could point to a multi-pack-index and be updated independently. - -- Packfiles can be marked as "special" using empty files that share - the initial name but replace ".pack" with ".keep" or ".promisor". - We can add an optional chunk of data to the multi-pack-index that - records flags of information about the packfiles. This allows new - states, such as 'repacked' or 'redeltified', that can help with - pack maintenance in a multi-pack environment. It may also be - helpful to organize packfiles by object type (commit, tree, blob, - etc.) and use this metadata to help that maintenance. - -- The partial clone feature records special "promisor" packs that - may point to objects that are not stored locally, but available - on request to a server. The multi-pack-index does not currently - track these promisor packs. 
- -Related Links -------------- -[0] https://bugs.chromium.org/p/git/issues/detail?id=6 - Chromium work item for: Multi-Pack Index (MIDX) - -[1] https://public-inbox.org/git/20180107181459.222909-1-dstolee@microsoft.com/ - An earlier RFC for the multi-pack-index feature - -[2] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/ - Git Merge 2018 Contributor's summit notes (includes discussion of MIDX) diff --git a/third_party/git/Documentation/technical/pack-format.txt b/third_party/git/Documentation/technical/pack-format.txt deleted file mode 100644 index cab5bdd2ff..0000000000 --- a/third_party/git/Documentation/technical/pack-format.txt +++ /dev/null @@ -1,331 +0,0 @@ -Git pack format -=============== - -== pack-*.pack files have the following format: - - - A header appears at the beginning and consists of the following: - - 4-byte signature: - The signature is: {'P', 'A', 'C', 'K'} - - 4-byte version number (network byte order): - Git currently accepts version number 2 or 3 but - generates version 2 only. - - 4-byte number of objects contained in the pack (network byte order) - - Observation: we cannot have more than 4G versions ;-) and - more than 4G objects in a pack. - - - The header is followed by number of object entries, each of - which looks like this: - - (undeltified representation) - n-byte type and length (3-bit type, (n-1)*7+4-bit length) - compressed data - - (deltified representation) - n-byte type and length (3-bit type, (n-1)*7+4-bit length) - 20-byte base object name if OBJ_REF_DELTA or a negative relative - offset from the delta object's position in the pack if this - is an OBJ_OFS_DELTA object - compressed delta data - - Observation: length of each object is encoded in a variable - length format and is not constrained to 32-bit or anything. - - - The trailer records 20-byte SHA-1 checksum of all of the above. 
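A rough sketch of reading the pack header and the "n-byte type and length" field of one entry follows. The summary above only gives bit counts, so the exact bit positions used here (continuation flag in the top bit of each byte, type in the next three bits of the first byte, size bits stored least significant first) are stated from the layout Git uses and should be read as an assumption of this sketch. The function names are hypothetical.

```python
import struct


def parse_pack_header(data: bytes):
    """Read the 12-byte pack header: signature, version, object count."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file")
    if version not in (2, 3):
        raise ValueError("unsupported pack version")
    return version, count


def parse_entry_header(data: bytes, pos: int = 0):
    """Decode the variable-length type/size field; return (type, size, next_pos)."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x7   # 3-bit type
    size = byte & 0x0F             # low 4 bits of the size
    shift = 4
    while byte & 0x80:             # top bit set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

Because the size simply keeps accumulating 7 bits per extra byte, it is not limited to 32 bits, which is the point of the observation above.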
- -=== Object types - -Valid object types are: - -- OBJ_COMMIT (1) -- OBJ_TREE (2) -- OBJ_BLOB (3) -- OBJ_TAG (4) -- OBJ_OFS_DELTA (6) -- OBJ_REF_DELTA (7) - -Type 5 is reserved for future expansion. Type 0 is invalid. - -=== Deltified representation - -Conceptually there are only four object types: commit, tree, tag and -blob. However, to save space, an object could be stored as a "delta" of -another "base" object. These representations are assigned new types -ofs-delta and ref-delta, which are only valid in a pack file. - -Both ofs-delta and ref-delta store the "delta" to be applied to -another object (called 'base object') to reconstruct the object. The -difference between them is that ref-delta directly encodes the 20-byte base -object name. If the base object is in the same pack, ofs-delta encodes -the offset of the base object in the pack instead. - -The base object could also be deltified if it's in the same pack. -Ref-delta can also refer to an object outside the pack (i.e. the -so-called "thin pack"). When stored on disk however, the pack should -be self contained to avoid cyclic dependency. - -The delta data is a sequence of instructions to reconstruct an object -from the base object. If the base object is deltified, it must be -converted to canonical form first. Each instruction appends more and -more data to the target object until it's complete. There are two -supported instructions so far: one for copying a byte range from the -source object and one for inserting new data embedded in the -instruction itself. - -Each instruction has variable length. Instruction type is determined -by the seventh bit of the first octet. The following diagrams follow -the convention in RFC 1951 (Deflate compressed data format).
- -==== Instruction to copy from base object - - +----------+---------+---------+---------+---------+-------+-------+-------+ - | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 | - +----------+---------+---------+---------+---------+-------+-------+-------+ - -This is the instruction format to copy a byte range from the source -object. It encodes the offset to copy from and the number of bytes to -copy. Offset and size are in little-endian order. - -All offset and size bytes are optional. This is to reduce the -instruction size when encoding small offsets or sizes. The first seven -bits in the first octet determine which of the next seven octets are -present. If bit zero is set, offset1 is present. If bit one is set, -offset2 is present and so on. - -Note that a more compact instruction does not change offset and size -encoding. For example, if only offset2 is omitted like below, offset3 -still contains bits 16-23. It does not become offset2 and contain -bits 8-15 even if it's right next to offset1. - - +----------+---------+---------+ - | 10000101 | offset1 | offset3 | - +----------+---------+---------+ - -In its most compact form, this instruction only takes up one byte -(0x80) with both offset and size omitted, which will have default -values zero. There is another exception: size zero is automatically -converted to 0x10000. - -==== Instruction to add new data - - +----------+============+ - | 0xxxxxxx | data | - +----------+============+ - -This is the instruction to construct the target object without the base -object. The following data is appended to the target object. The first -seven bits of the first octet determine the size of data in -bytes. The size must be non-zero. - -==== Reserved instruction - - +----------+============ - | 00000000 | - +----------+============ - -This is the instruction reserved for future expansion.
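The three instruction forms above can be sketched as a small interpreter. This is an illustrative Python sketch, not Git's implementation: the name `apply_delta_instructions` is hypothetical, and it assumes its input is the bare instruction sequence, with any size information that precedes the instructions in real delta data already removed.

```python
def apply_delta_instructions(base: bytes, instructions: bytes) -> bytes:
    """Rebuild a target object from `base` and a copy/insert instruction stream."""
    out = bytearray()
    pos = 0
    while pos < len(instructions):
        cmd = instructions[pos]
        pos += 1
        if cmd & 0x80:
            # Copy instruction: bits 0-3 select offset1..offset4,
            # bits 4-6 select size1..size3. Omitted bytes count as zero
            # and do not shift the position of the bytes that follow.
            offset = 0
            size = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= instructions[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    size |= instructions[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # size zero is converted to 0x10000
            if offset + size > len(base):
                raise ValueError("copy range exceeds base object")
            out += base[offset:offset + size]
        elif cmd:
            # Insert instruction: the low seven bits give the data length.
            if pos + cmd > len(instructions):
                raise ValueError("truncated insert data")
            out += instructions[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved instruction 0x00")
    return bytes(out)
```

With `base = b"hello world"`, the stream `91 06 05 02 21 21 90 05` (hex) copies 5 bytes from offset 6, inserts `!!`, then copies 5 bytes from offset 0, which would yield `b"world!!hello"`.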
- -== Original (version 1) pack-*.idx files have the following format: - - - The header consists of 256 4-byte network byte order - integers. N-th entry of this table records the number of - objects in the corresponding pack, the first byte of whose - object name is less than or equal to N. This is called the - 'first-level fan-out' table. - - - The header is followed by sorted 24-byte entries, one entry - per object in the pack. Each entry is: - - 4-byte network byte order integer, recording where the - object is stored in the packfile as the offset from the - beginning. - - 20-byte object name. - - - The file is concluded with a trailer: - - A copy of the 20-byte SHA-1 checksum at the end of - corresponding packfile. - - 20-byte SHA-1-checksum of all of the above. - -Pack Idx file: - - -- +--------------------------------+ -fanout | fanout[0] = 2 (for example) |-. -table +--------------------------------+ | - | fanout[1] | | - +--------------------------------+ | - | fanout[2] | | - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | - | fanout[255] = total objects |---. - -- +--------------------------------+ | | -main | offset | | | -index | object name 00XXXXXXXXXXXXXXXX | | | -table +--------------------------------+ | | - | offset | | | - | object name 00XXXXXXXXXXXXXXXX | | | - +--------------------------------+<+ | - .-| offset | | - | | object name 01XXXXXXXXXXXXXXXX | | - | +--------------------------------+ | - | | offset | | - | | object name 01XXXXXXXXXXXXXXXX | | - | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | - | | offset | | - | | object name FFXXXXXXXXXXXXXXXX | | - --| +--------------------------------+<--+ -trailer | | packfile checksum | - | +--------------------------------+ - | | idxfile checksum | - | +--------------------------------+ - .-------. 
- | -Pack file entry: <+ - - packed object header: - 1-byte size extension bit (MSB) - type (next 3 bit) - size0 (lower 4-bit) - n-byte sizeN (as long as MSB is set, each 7-bit) - size0..sizeN form 4+7+7+..+7 bit integer, size0 - is the least significant part, and sizeN is the - most significant part. - packed object data: - If it is not DELTA, then deflated bytes (the size above - is the size before compression). - If it is REF_DELTA, then - 20-byte base object name SHA-1 (the size above is the - size of the delta data that follows). - delta data, deflated. - If it is OFS_DELTA, then - n-byte offset (see below) interpreted as a negative - offset from the type-byte of the header of the - ofs-delta entry (the size above is the size of - the delta data that follows). - delta data, deflated. - - offset encoding: - n bytes with MSB set in all but the last one. - The offset is then the number constructed by - concatenating the lower 7 bit of each byte, and - for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1)) - to the result. - - - -== Version 2 pack-*.idx files support packs larger than 4 GiB, and - have some other reorganizations. They have the format: - - - A 4-byte magic number '\377tOc' which is an unreasonable - fanout[0] value. - - - A 4-byte version number (= 2) - - - A 256-entry fan-out table just like v1. - - - A table of sorted 20-byte SHA-1 object names. These are - packed together without offset values to reduce the cache - footprint of the binary search for a specific object name. - - - A table of 4-byte CRC32 values of the packed object data. - This is new in v2 so compressed data can be copied directly - from pack to pack during repacking without undetected - data corruption. - - - A table of 4-byte offset values (in network byte order). - These are usually 31-bit pack file offsets, but large - offsets are encoded as an index into the next table with - the msbit set. - - - A table of 8-byte offset entries (empty for pack files less - than 2 GiB). 
Pack files are organized with heavily used - objects toward the front, so most object references should - not need to refer to this table. - - - The same trailer as a v1 pack file: - - A copy of the 20-byte SHA-1 checksum at the end of - corresponding packfile. - - 20-byte SHA-1-checksum of all of the above. - -== multi-pack-index (MIDX) files have the following format: - -The multi-pack-index files refer to multiple pack-files and loose objects. - -In order to allow extensions that add extra data to the MIDX, we organize -the body into "chunks" and provide a lookup table at the beginning of the -body. The header includes certain length values, such as the number of packs, -the number of base MIDX files, hash lengths and types. - -All 4-byte numbers are in network order. - -HEADER: - - 4-byte signature: - The signature is: {'M', 'I', 'D', 'X'} - - 1-byte version number: - Git only writes or recognizes version 1. - - 1-byte Object Id Version - Git only writes or recognizes version 1 (SHA1). - - 1-byte number of "chunks" - - 1-byte number of base multi-pack-index files: - This value is currently always zero. - - 4-byte number of pack files - -CHUNK LOOKUP: - - (C + 1) * 12 bytes providing the chunk offsets: - First 4 bytes describe chunk id. Value 0 is a terminating label. - Other 8 bytes provide offset in current file for chunk to start. - (Chunks are provided in file-order, so you can infer the length - using the next chunk position if necessary.) - - The remaining data in the body is described one chunk at a time, and - these chunks may be given in any order. Chunks are required unless - otherwise specified. - -CHUNK DATA: - - Packfile Names (ID: {'P', 'N', 'A', 'M'}) - Stores the packfile names as concatenated, null-terminated strings. - Packfiles must be listed in lexicographic order for fast lookups by - name. This is the only chunk not guaranteed to be a multiple of four - bytes in length, so should be the last chunk for alignment reasons. 
- - OID Fanout (ID: {'O', 'I', 'D', 'F'}) - The ith entry, F[i], stores the number of OIDs with first - byte at most i. Thus F[255] stores the total - number of objects. - - OID Lookup (ID: {'O', 'I', 'D', 'L'}) - The OIDs for all objects in the MIDX are stored in lexicographic - order in this chunk. - - Object Offsets (ID: {'O', 'O', 'F', 'F'}) - Stores two 4-byte values for every object. - 1: The pack-int-id for the pack storing this object. - 2: The offset within the pack. - If all offsets are less than 2^31, then the large offset chunk - will not exist and offsets are stored as in IDX v1. - If there is at least one offset value larger than 2^32-1, then - the large offset chunk must exist. If the large offset chunk - exists and the 31st bit is on, then removing that bit reveals - the row in the large offsets containing the 8-byte offset of - this object. - - [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'}) - 8-byte offsets into large packfiles. - -TRAILER: - - 20-byte SHA1-checksum of the above contents. diff --git a/third_party/git/Documentation/technical/pack-heuristics.txt b/third_party/git/Documentation/technical/pack-heuristics.txt deleted file mode 100644 index 95a07db6e8..0000000000 --- a/third_party/git/Documentation/technical/pack-heuristics.txt +++ /dev/null @@ -1,460 +0,0 @@ -Concerning Git's Packing Heuristics -=================================== - - Oh, here's a really stupid question: - - Where do I go - to learn the details - of Git's packing heuristics? - -Be careful what you ask! - -Followers of the Git, please open the Git IRC Log and turn to -February 10, 2006. - -It's a rare occasion, and we are joined by the King Git Himself, -Linus Torvalds (linus). Nathaniel Smith, (njs`), has the floor -and seeks enlightenment. Others are present, but silent. - -Let's listen in! - - <njs`> Oh, here's a really stupid question -- where do I go to - learn the details of Git's packing heuristics? 
google avails - me not, reading the source didn't help a lot, and wading - through the whole mailing list seems less efficient than any - of that. - -It is a bold start! A plea for help combined with a simultaneous -tri-part attack on some of the tried and true mainstays in the quest -for enlightenment. Brash accusations of google being useless. Hubris! -Maligning the source. Heresy! Disdain for the mailing list archives. -Woe. - - <pasky> yes, the packing-related delta stuff is somewhat - mysterious even for me ;) - -Ah! Modesty after all. - - <linus> njs, I don't think the docs exist. That's something where - I don't think anybody else than me even really got involved. - Most of the rest of Git others have been busy with (especially - Junio), but packing nobody touched after I did it. - -It's cryptic, yet vague. Linus in style for sure. Wise men -interpret this as an apology. A few argue it is merely a -statement of fact. - - <njs`> I guess the next step is "read the source again", but I - have to build up a certain level of gumption first :-) - -Indeed! On both points. - - <linus> The packing heuristic is actually really really simple. - -Bait... - - <linus> But strange. - -And switch. That ought to do it! - - <linus> Remember: Git really doesn't follow files. So what it does is - - generate a list of all objects - - sort the list according to magic heuristics - - walk the list, using a sliding window, seeing if an object - can be diffed against another object in the window - - write out the list in recency order - -The traditional understatement: - - <njs`> I suspect that what I'm missing is the precise definition of - the word "magic" - -The traditional insight: - - <pasky> yes - -And Babel-like confusion flowed. - - <njs`> oh, hmm, and I'm not sure what this sliding window means either - - <pasky> iirc, it appeared to me to be just the sha1 of the object - when reading the code casually ... - - ... 
which simply doesn't sound as a very good heuristics, though ;) - - <njs`> .....and recency order. okay, I think it's clear I didn't - even realize how much I wasn't realizing :-) - -Ah, grasshopper! And thus the enlightenment begins anew. - - <linus> The "magic" is actually in theory totally arbitrary. - ANY order will give you a working pack, but no, it's not - ordered by SHA-1. - - Before talking about the ordering for the sliding delta - window, let's talk about the recency order. That's more - important in one way. - - <njs`> Right, but if all you want is a working way to pack things - together, you could just use cat and save yourself some - trouble... - -Waaait for it.... - - <linus> The recency ordering (which is basically: put objects - _physically_ into the pack in the order that they are - "reachable" from the head) is important. - - <njs`> okay - - <linus> It's important because that's the thing that gives packs - good locality. It keeps the objects close to the head (whether - they are old or new, but they are _reachable_ from the head) - at the head of the pack. So packs actually have absolutely - _wonderful_ IO patterns. - -Read that again, because it is important. - - <linus> But recency ordering is totally useless for deciding how - to actually generate the deltas, so the delta ordering is - something else. - - The delta ordering is (wait for it): - - first sort by the "basename" of the object, as defined by - the name the object was _first_ reached through when - generating the object list - - within the same basename, sort by size of the object - - but always sort different types separately (commits first). - - That's not exactly it, but it's very close. - - <njs`> The "_first_ reached" thing is not too important, just you - need some way to break ties since the same objects may be - reachable many ways, yes? 
- -And as if to clarify: - - <linus> The point is that it's all really just any random - heuristic, and the ordering is totally unimportant for - correctness, but it helps a lot if the heuristic gives - "clumping" for things that are likely to delta well against - each other. - -It is an important point, so secretly, I did my own research and have -included my results below. To be fair, it has changed some over time. -And through the magic of Revisionistic History, I draw upon this entry -from The Git IRC Logs on my father's birthday, March 1: - - <gitster> The quote from the above linus should be rewritten a - bit (wait for it): - - first sort by type. Different objects never delta with - each other. - - then sort by filename/dirname. hash of the basename - occupies the top BITS_PER_INT-DIR_BITS bits, and bottom - DIR_BITS are for the hash of leading path elements. - - then if we are doing "thin" pack, the objects we are _not_ - going to pack but we know about are sorted earlier than - other objects. - - and finally sort by size, larger to smaller. - -In one swell-foop, clarification and obscurification! Nonetheless, -authoritative. Cryptic, yet concise. It even solicits notions of -quotes from The Source Code. Clearly, more study is needed. - - <gitster> That's the sort order. What this means is: - - we do not delta different object types. - - we prefer to delta the objects with the same full path, but - allow files with the same name from different directories. - - we always prefer to delta against objects we are not going - to send, if there are some. - - we prefer to delta against larger objects, so that we have - lots of removals. - - The penultimate rule is for "thin" packs. It is used when - the other side is known to have such objects. - -There it is again. "Thin" packs. I'm thinking to myself, "What -is a 'thin' pack?" So I ask: - - <jdl> What is a "thin" pack? - - <gitster> Use of --objects-edge to rev-list as the upstream of - pack-objects. 
The pack transfer protocol negotiates that. - -Woo hoo! Cleared that _right_ up! - - <gitster> There are two directions - push and fetch. - -There! Did you see it? It is not '"push" and "pull"'! How often the -confusion has started here. So casually mentioned, too! - - <gitster> For push, git-send-pack invokes git-receive-pack on the - other end. The receive-pack says "I have up to these commits". - send-pack looks at them, and computes what are missing from - the other end. So "thin" could be the default there. - - In the other direction, fetch, git-fetch-pack and - git-clone-pack invokes git-upload-pack on the other end - (via ssh or by talking to the daemon). - - There are two cases: fetch-pack with -k and clone-pack is one, - fetch-pack without -k is the other. clone-pack and fetch-pack - with -k will keep the downloaded packfile without expanded, so - we do not use thin pack transfer. Otherwise, the generated - pack will have delta without base object in the same pack. - - But fetch-pack without -k will explode the received pack into - individual objects, so we automatically ask upload-pack to - give us a thin pack if upload-pack supports it. - -OK then. - -Uh. - -Let's return to the previous conversation still in progress. - - <njs`> and "basename" means something like "the tail of end of - path of file objects and dir objects, as per basename(3), and - we just declare all commit and tag objects to have the same - basename" or something? - -Luckily, that too is a point that gitster clarified for us! - -If I might add, the trick is to make files that _might_ be similar be -located close to each other in the hash buckets based on their file -names. It used to be that "foo/Makefile", "bar/baz/quux/Makefile" and -"Makefile" all landed in the same bucket due to their common basename, -"Makefile". However, now they land in "close" buckets. - -The algorithm allows not just for the _same_ bucket, but for _close_ -buckets to be considered delta candidates. 
The rationale is -essentially that files, like Makefiles, often have very similar -content no matter what directory they live in. - - <linus> I played around with different delta algorithms, and with - making the "delta window" bigger, but having too big of a - sliding window makes it very expensive to generate the pack: - you need to compare every object with a _ton_ of other objects. - - There are a number of other trivial heuristics too, which - basically boil down to "don't bother even trying to delta this - pair" if we can tell before-hand that the delta isn't worth it - (due to size differences, where we can take a previous delta - result into account to decide that "ok, no point in trying - that one, it will be worse"). - - End result: packing is actually very size efficient. It's - somewhat CPU-wasteful, but on the other hand, since you're - really only supposed to do it maybe once a month (and you can - do it during the night), nobody really seems to care. - -Nice Engineering Touch, there. Find when it doesn't matter, and -proclaim it a non-issue. Good style too! - - <njs`> So, just to repeat to see if I'm following, we start by - getting a list of the objects we want to pack, we sort it by - this heuristic (basically lexicographically on the tuple - (type, basename, size)). - - Then we walk through this list, and calculate a delta of - each object against the last n (tunable parameter) objects, - and pick the smallest of these deltas. - -Vastly simplified, but the essence is there! - - <linus> Correct. - - <njs`> And then once we have picked a delta or fulltext to - represent each object, we re-sort by recency, and write them - out in that order. - - <linus> Yup. Some other small details: - -And of course there is the "Other Shoe" Factor too. 
- - <linus> - We limit the delta depth to another magic value (right - now both the window and delta depth magic values are just "10") - - <njs`> Hrm, my intuition is that you'd end up with really _bad_ IO - patterns, because the things you want are near by, but to - actually reconstruct them you may have to jump all over in - random ways. - - <linus> - When we write out a delta, and we haven't yet written - out the object it is a delta against, we write out the base - object first. And no, when we reconstruct them, we actually - get nice IO patterns, because: - - larger objects tend to be "more recent" (Linus' law: files grow) - - we actively try to generate deltas from a larger object to a - smaller one - - this means that the top-of-tree very seldom has deltas - (i.e. deltas in _practice_ are "backwards deltas") - -Again, we should reread that whole paragraph. Not just because -Linus has slipped Linus's Law in there on us, but because it is -important. Let's make sure we clarify some of the points here: - - <njs`> So the point is just that in practice, delta order and - recency order match each other quite well. - - <linus> Yes. There's another nice side to this (and yes, it was - designed that way ;): - - the reason we generate deltas against the larger object is - actually a big space saver too! - - <njs`> Hmm, but your last comment (if "we haven't yet written out - the object it is a delta against, we write out the base object - first"), seems like it would make these facts mostly - irrelevant because even if in practice you would not have to - wander around much, in fact you just brute-force say that in - the cases where you might have to wander, don't do that :-) - - <linus> Yes and no. Notice the rule: we only write out the base - object first if the delta against it was more recent. 
That - means that you can actually have deltas that refer to a base - object that is _not_ close to the delta object, but that only - happens when the delta is needed to generate an _old_ object. - - <linus> See? - -Yeah, no. I missed that on the first two or three readings myself. - - <linus> This keeps the front of the pack dense. The front of the - pack never contains data that isn't relevant to a "recent" - object. The size optimization comes from our use of xdelta - (but is true for many other delta algorithms): removing data - is cheaper (in size) than adding data. - - When you remove data, you only need to say "copy bytes n--m". - In contrast, in a delta that _adds_ data, you have to say "add - these bytes: 'actual data goes here'" - - *** njs` has quit: Read error: 104 (Connection reset by peer) - - <linus> Uhhuh. I hope I didn't blow njs` mind. - - *** njs` has joined channel #git - - <pasky> :) - -The silent observers are amused. Of course. - -And as if njs` was expected to be omniscient: - - <linus> njs - did you miss anything? - -OK, I'll spell it out. That's Geek Humor. If njs` was not actually -connected for a little bit there, how would he know if missed anything -while he was disconnected? He's a benevolent dictator with a sense of -humor! Well noted! - - <njs`> Stupid router. Or gremlins, or whatever. - -It's a cheap shot at Cisco. Take 'em when you can. - - <njs`> Yes and no. Notice the rule: we only write out the base - object first if the delta against it was more recent. - - I'm getting lost in all these orders, let me re-read :-) - So the write-out order is from most recent to least recent? - (Conceivably it could be the opposite way too, I'm not sure if - we've said) though my connection back at home is logging, so I - can just read what you said there :-) - -And for those of you paying attention, the Omniscient Trick has just -been detailed! 
- - <linus> Yes, we always write out most recent first - - <njs`> And, yeah, I got the part about deeper-in-history stuff - having worse IO characteristics, one sort of doesn't care. - - <linus> With the caveat that if the "most recent" needs an older - object to delta against (hey, shrinking sometimes does - happen), we write out the old object with the delta. - - <njs`> (if only it happened more...) - - <linus> Anyway, the pack-file could easily be denser still, but - because it's used both for streaming (the Git protocol) and - for on-disk, it has a few pessimizations. - -Actually, it is a made-up word. But it is a made-up word being -used as setup for a later optimization, which is a real word: - - <linus> In particular, while the pack-file is then compressed, - it's compressed just one object at a time, so the actual - compression factor is less than it could be in theory. But it - means that it's all nice random-access with a simple index to - do "object name->location in packfile" translation. - - <njs`> I'm assuming the real win for delta-ing large->small is - more homogeneous statistics for gzip to run over? - - (You have to put the bytes in one place or another, but - putting them in a larger blob wins on compression) - - Actually, what is the compression strategy -- each delta - individually gzipped, the whole file gzipped, somewhere in - between, no compression at all, ....? - - Right. - -Reality IRC sets in. For example: - - <pasky> I'll read the rest in the morning, I really have to go - sleep or there's no hope whatsoever for me at the today's - exam... g'nite all. - -Heh. - - <linus> pasky: g'nite - - <njs`> pasky: 'luck - - <linus> Right: large->small matters exactly because of compression - behaviour. If it was non-compressed, it probably wouldn't make - any difference. - - <njs`> yeah - - <linus> Anyway: I'm not even trying to claim that the pack-files - are perfect, but they do tend to have a nice balance of - density vs ease-of use. - -Gasp! 
OK, saved. That's a fair Engineering trade off. Close call! -In fact, Linus reflects on some Basic Engineering Fundamentals, -design options, etc. - - <linus> More importantly, they allow Git to still _conceptually_ - never deal with deltas at all, and be a "whole object" store. - - Which has some problems (we discussed bad huge-file - behaviour on the Git lists the other day), but it does mean - that the basic Git concepts are really really simple and - straightforward. - - It's all been quite stable. - - Which I think is very much a result of having very simple - basic ideas, so that there's never any confusion about what's - going on. - - Bugs happen, but they are "simple" bugs. And bugs that - actually get some object store detail wrong are almost always - so obvious that they never go anywhere. - - <njs`> Yeah. - -Nuff said. - - <linus> Anyway. I'm off for bed. It's not 6AM here, but I've got - three kids, and have to get up early in the morning to send - them off. I need my beauty sleep. - - <njs`> :-) - - <njs`> appreciate the infodump, I really was failing to find the - details on Git packs :-) - -And now you know the rest of the story. diff --git a/third_party/git/Documentation/technical/pack-protocol.txt b/third_party/git/Documentation/technical/pack-protocol.txt deleted file mode 100644 index c73e72de0e..0000000000 --- a/third_party/git/Documentation/technical/pack-protocol.txt +++ /dev/null @@ -1,674 +0,0 @@ -Packfile transfer protocols -=========================== - -Git supports transferring data in packfiles over the ssh://, git://, http:// and -file:// transports. There exist two sets of protocols, one for pushing -data from a client to a server and another for fetching data from a -server to a client. The three transports (ssh, git, file) use the same -protocol to transfer data. http is documented in http-protocol.txt. 
- -The processes invoked in the canonical Git implementation are 'upload-pack' -on the server side and 'fetch-pack' on the client side for fetching data; -then 'receive-pack' on the server and 'send-pack' on the client for pushing -data. The protocol functions to have a server tell a client what is -currently on the server, then for the two to negotiate the smallest amount -of data to send in order to fully update one or the other. - -pkt-line Format ---------------- - -The descriptions below build on the pkt-line format described in -protocol-common.txt. When the grammar indicates `PKT-LINE(...)`, unless -otherwise noted the usual pkt-line LF rules apply: the sender SHOULD -include a LF, but the receiver MUST NOT complain if it is not present. - -An error packet is a special pkt-line that contains an error string. - ----- - error-line = PKT-LINE("ERR" SP explanation-text) ----- - -Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY -be sent. Once this packet is sent by a client or a server, the data transfer -process defined in this protocol is terminated. - -Transports ----------- -There are three transports over which the packfile protocol is -initiated. The Git transport is a simple, unauthenticated server that -takes the command (almost always 'upload-pack', though Git -servers can be configured to be globally writable, in which case 'receive- -pack' initiation is also allowed) with which the client wishes to -communicate and executes it and connects it to the requesting -process. - -In the SSH transport, the client just runs the 'upload-pack' -or 'receive-pack' process on the server over the SSH protocol and then -communicates with that invoked process over the SSH connection. - -The file:// transport runs the 'upload-pack' or 'receive-pack' -process locally and communicates with it over a pipe.
- -Extra Parameters ----------------- - -The protocol provides a mechanism in which clients can send additional -information in their first message to the server. These are called "Extra -Parameters", and are supported by the Git, SSH, and HTTP protocols. - -Each Extra Parameter takes the form of `<key>=<value>` or `<key>`. - -Servers that receive any such Extra Parameters MUST ignore all -unrecognized keys. Currently, the only Extra Parameter recognized is -"version" with a value of '1' or '2'. See protocol-v2.txt for more -information on protocol version 2. - -Git Transport -------------- - -The Git transport starts off by sending the command and repository -on the wire using the pkt-line format, followed by a NUL byte and a -hostname parameter, terminated by a NUL byte. - - 0033git-upload-pack /project.git\0host=myserver.com\0 - -The transport may send Extra Parameters by adding an additional NUL -byte, and then adding one or more NUL-terminated strings: - - 003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0 - --- - git-proto-request = request-command SP pathname NUL - [ host-parameter NUL ] [ NUL extra-parameters ] - request-command = "git-upload-pack" / "git-receive-pack" / - "git-upload-archive" ; case sensitive - pathname = *( %x01-ff ) ; exclude NUL - host-parameter = "host=" hostname [ ":" port ] - extra-parameters = 1*extra-parameter - extra-parameter = 1*( %x01-ff ) NUL --- - -host-parameter is used for the -git-daemon name based virtual hosting. See --interpolated-path -option to git daemon, with the %H/%CH format characters. - -Basically what the Git client is doing to connect to an 'upload-pack' -process on the server side over the Git protocol is this: - - $ echo -e -n \ - "003agit-upload-pack /schacon/gitbook.git\0host=example.com\0" | - nc -v example.com 9418 - - -SSH Transport -------------- - -Initiating the upload-pack or receive-pack processes over SSH is -executing the binary on the server via SSH remote execution.
-It is basically equivalent to running this: - - $ ssh git.example.com "git-upload-pack '/project.git'" - -For a server to support Git pushing and pulling for a given user over -SSH, that user needs to be able to execute one or both of those -commands via the SSH shell that they are provided on login. On some -systems, that shell access is limited to only being able to run those -two commands, or even just one of them. - -In an ssh:// format URI, it's absolute in the URI, so the '/' after -the host name (or port number) is sent as an argument, which is then -read by the remote git-upload-pack exactly as is, so it's effectively -an absolute path in the remote filesystem. - - git clone ssh://user@example.com/project.git - | - v - ssh user@example.com "git-upload-pack '/project.git'" - -In a "user@host:path" format URI, its relative to the user's home -directory, because the Git client will run: - - git clone user@example.com:project.git - | - v - ssh user@example.com "git-upload-pack 'project.git'" - -The exception is if a '~' is used, in which case -we execute it without the leading '/'. - - ssh://user@example.com/~alice/project.git, - | - v - ssh user@example.com "git-upload-pack '~alice/project.git'" - -Depending on the value of the `protocol.version` configuration variable, -Git may attempt to send Extra Parameters as a colon-separated string in -the GIT_PROTOCOL environment variable. This is done only if -the `ssh.variant` configuration variable indicates that the ssh command -supports passing environment variables as an argument. - -A few things to remember here: - -- The "command name" is spelled with dash (e.g. git-upload-pack), but - this can be overridden by the client; - -- The repository path is always quoted with single quotes. - -Fetching Data From a Server ---------------------------- - -When one Git repository wants to get data that a second repository -has, the first can 'fetch' from the second. 
This operation determines -what data the server has that the client does not, then streams that -data down to the client in packfile format. - - -Reference Discovery -------------------- - -When the client initially connects, the server will immediately respond -with a version number (if "version=1" is sent as an Extra Parameter), -and a listing of each reference it has (all branches and tags) along -with the object name that each reference currently points to. - - $ echo -e -n "0045git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" | - nc -v example.com 9418 - 000eversion 1 - 00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack - side-band side-band-64k ofs-delta shallow no-progress include-tag - 00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration - 003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master - 003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9 - 003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0 - 003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{} - 0000 - -The returned response is a pkt-line stream describing each ref and -its current value. The stream MUST be sorted by name according to -the C locale ordering. - -If HEAD is a valid ref, HEAD MUST appear as the first advertised -ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the -advertisement list at all, but other refs may still appear. - -The stream MUST include capability declarations behind a NUL on the -first ref. The peeled value of a ref (that is "ref^{}") MUST be -immediately after the ref itself, if presented. A conforming server -MUST peel the ref if it's an annotated tag.
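To make the shape of the advertisement concrete, here is a minimal Python sketch of a client-side parser for the stream described above. It is an illustration under stated assumptions, not the canonical implementation: `read_pkt_lines` and `parse_advertisement` are hypothetical names, the whole response is assumed to be already buffered in memory, and 'shallow' lines and the `capabilities^{}` placeholder are simply skipped.

```python
def read_pkt_lines(data: bytes):
    """Yield each pkt-line payload in data; yield None for a flush-pkt."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:          # "0000" is the flush-pkt
            yield None
            pos += 4
            continue
        yield data[pos + 4:pos + length]
        pos += length


def parse_advertisement(data: bytes):
    """Return (refs, capabilities) from a buffered ref advertisement."""
    refs = {}
    capabilities = []
    seen_first_ref = False
    for payload in read_pkt_lines(data):
        if payload is None:      # flush-pkt ends the advertisement
            break
        line = payload.rstrip(b"\n")
        if line.startswith(b"version ") or line.startswith(b"shallow "):
            continue             # not a ref line; ignored in this sketch
        if not seen_first_ref:
            # Capabilities sit behind a NUL on the first ref only.
            seen_first_ref = True
            line, _, caps = line.partition(b"\0")
            capabilities = caps.decode().split()
        obj_id, _, name = line.partition(b" ")
        if name == b"capabilities^{}":
            continue             # placeholder sent when there are no refs
        refs[name.decode()] = obj_id.decode()
    return refs, capabilities
```

Peeled entries keep their `^{}` suffix as part of the key, so a caller can pair `refs/tags/v1.0` with `refs/tags/v1.0^{}` if it needs the tagged commit.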
- ----- - advertised-refs = *1("version 1") - (no-refs / list-of-refs) - *shallow - flush-pkt - - no-refs = PKT-LINE(zero-id SP "capabilities^{}" - NUL capability-list) - - list-of-refs = first-ref *other-ref - first-ref = PKT-LINE(obj-id SP refname - NUL capability-list) - - other-ref = PKT-LINE(other-tip / other-peeled) - other-tip = obj-id SP refname - other-peeled = obj-id SP refname "^{}" - - shallow = PKT-LINE("shallow" SP obj-id) - - capability-list = capability *(SP capability) - capability = 1*(LC_ALPHA / DIGIT / "-" / "_") - LC_ALPHA = %x61-7A ----- - -Server and client MUST use lowercase for obj-id, both MUST treat obj-id -as case-insensitive. - -See protocol-capabilities.txt for a list of allowed server capabilities -and descriptions. - -Packfile Negotiation --------------------- -After reference and capabilities discovery, the client can decide to -terminate the connection by sending a flush-pkt, telling the server it can -now gracefully terminate, and disconnect, when it does not need any pack -data. This can happen with the ls-remote command, and also can happen when -the client already is up to date. - -Otherwise, it enters the negotiation phase, where the client and -server determine what the minimal packfile necessary for transport is, -by telling the server what objects it wants, its shallow objects -(if any), and the maximum commit depth it wants (if any). The client -will also send a list of the capabilities it wants to be in effect, -out of what the server said it could do with the first 'want' line. 
- ----- - upload-request = want-list - *shallow-line - *1depth-request - [filter-request] - flush-pkt - - want-list = first-want - *additional-want - - shallow-line = PKT-LINE("shallow" SP obj-id) - - depth-request = PKT-LINE("deepen" SP depth) / - PKT-LINE("deepen-since" SP timestamp) / - PKT-LINE("deepen-not" SP ref) - - first-want = PKT-LINE("want" SP obj-id SP capability-list) - additional-want = PKT-LINE("want" SP obj-id) - - depth = 1*DIGIT - - filter-request = PKT-LINE("filter" SP filter-spec) ----- - -Clients MUST send all the obj-ids it wants from the reference -discovery phase as 'want' lines. Clients MUST send at least one -'want' command in the request body. Clients MUST NOT mention an -obj-id in a 'want' command which did not appear in the response -obtained through ref discovery. - -The client MUST write all obj-ids which it only has shallow copies -of (meaning that it does not have the parents of a commit) as -'shallow' lines so that the server is aware of the limitations of -the client's history. - -The client now sends the maximum commit history depth it wants for -this transaction, which is the number of commits it wants from the -tip of the history, if any, as a 'deepen' line. A depth of 0 is the -same as not making a depth request. The client does not want to receive -any commits beyond this depth, nor does it want objects needed only to -complete those commits. Commits whose parents are not received as a -result are defined as shallow and marked as such in the server. This -information is sent back to the client in the next step. - -The client can optionally request that pack-objects omit various -objects from the packfile using one of several filtering techniques. -These are intended for use with partial clone and partial fetch -operations. An object that does not meet a filter-spec value is -omitted unless explicitly requested in a 'want' line. See `rev-list` -for possible filter-spec values. 
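The upload-request grammar above can be sketched in Python as follows. This is a hedged illustration only: `build_upload_request` and its parameters are hypothetical names, only the plain 'deepen' form of the depth request is covered, and the filter-request is left out.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame payload with its 4-hex-digit length prefix."""
    return b"%04x" % (len(payload) + 4) + payload


def build_upload_request(wants, capabilities=(), shallows=(), depth=0):
    """Serialize want/shallow/deepen lines followed by a flush-pkt."""
    if not wants:
        raise ValueError("at least one 'want' is required")
    first, *rest = wants
    first_line = "want " + first
    if capabilities:
        # Requested capabilities ride on the first 'want' line only.
        first_line += " " + " ".join(capabilities)
    out = [pkt_line((first_line + "\n").encode())]
    for oid in rest:
        out.append(pkt_line(f"want {oid}\n".encode()))
    for oid in shallows:
        out.append(pkt_line(f"shallow {oid}\n".encode()))
    if depth > 0:                # a depth of 0 means no depth request
        out.append(pkt_line(f"deepen {depth}\n".encode()))
    out.append(b"0000")          # flush-pkt closes the list
    return b"".join(out)
```

Called with two object names and the capabilities `multi_ack side-band-64k ofs-delta`, this yields a `0054want …` line followed by a `0032want …` line and `0000`.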
- -Once all the 'want's and 'shallow's (and optional 'deepen') are -transferred, clients MUST send a flush-pkt, to tell the server side -that it is done sending the list. - -Otherwise, if the client sent a positive depth request, the server -will determine which commits will and will not be shallow and -send this information to the client. If the client did not request -a positive depth, this step is skipped. - ----- - shallow-update = *shallow-line - *unshallow-line - flush-pkt - - shallow-line = PKT-LINE("shallow" SP obj-id) - - unshallow-line = PKT-LINE("unshallow" SP obj-id) ----- - -If the client has requested a positive depth, the server will compute -the set of commits which are no deeper than the desired depth. The set -of commits start at the client's wants. - -The server writes 'shallow' lines for each -commit whose parents will not be sent as a result. The server writes -an 'unshallow' line for each commit which the client has indicated is -shallow, but is no longer shallow at the currently requested depth -(that is, its parents will now be sent). The server MUST NOT mark -as unshallow anything which the client has not indicated was shallow. - -Now the client will send a list of the obj-ids it has using 'have' -lines, so the server can make a packfile that only contains the objects -that the client needs. In multi_ack mode, the canonical implementation -will send up to 32 of these at a time, then will send a flush-pkt. The -canonical implementation will skip ahead and send the next 32 immediately, -so that there is always a block of 32 "in-flight on the wire" at a time. - ----- - upload-haves = have-list - compute-end - - have-list = *have-line - have-line = PKT-LINE("have" SP obj-id) - compute-end = flush-pkt / PKT-LINE("done") ----- - -If the server reads 'have' lines, it then will respond by ACKing any -of the obj-ids the client said it had that the server also has. 
The -server will ACK obj-ids differently depending on which ack mode is -chosen by the client. - -In multi_ack mode: - - * the server will respond with 'ACK obj-id continue' for any common - commits. - - * once the server has found an acceptable common base commit and is - ready to make a packfile, it will blindly ACK all 'have' obj-ids - back to the client. - - * the server will then send a 'NAK' and then wait for another response - from the client - either a 'done' or another list of 'have' lines. - -In multi_ack_detailed mode: - - * the server will differentiate the ACKs where it is signaling - that it is ready to send data with 'ACK obj-id ready' lines, and - signals the identified common commits with 'ACK obj-id common' lines. - -Without either multi_ack or multi_ack_detailed: - - * upload-pack sends "ACK obj-id" on the first common object it finds. - After that it says nothing until the client gives it a "done". - - * upload-pack sends "NAK" on a flush-pkt if no common object - has been found yet. If one has been found, and thus an ACK - was already sent, it's silent on the flush-pkt. - -After the client has gotten enough ACK responses that it can determine -that the server has enough information to send an efficient packfile -(in the canonical implementation, this is determined when it has received -enough ACKs that it can color everything left in the --date-order queue -as common with the server, or the --date-order queue is empty), or the -client determines that it wants to give up (in the canonical implementation, -this is determined when the client sends 256 'have' lines without getting -any of them ACKed by the server - meaning there is nothing in common and -the server should just send all of its objects), then the client will send -a 'done' command. The 'done' command signals to the server that the client -is ready to receive its packfile data. 
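The three acknowledgement styles above differ only in the optional status word that follows the obj-id. A small Python sketch of how a client might classify one server response payload is shown below; `parse_ack` is a hypothetical helper, and it assumes the pkt-line length prefix has already been stripped.

```python
ACK_STATUSES = (None, "continue", "common", "ready")


def parse_ack(payload: bytes):
    """Return (kind, obj_id, status) for one ACK/NAK pkt-line payload."""
    line = payload.rstrip(b"\n").decode()
    if line == "NAK":
        return ("NAK", None, None)
    parts = line.split(" ")
    if parts[0] != "ACK" or len(parts) not in (2, 3):
        raise ValueError(f"unexpected server response: {line!r}")
    status = parts[2] if len(parts) == 3 else None
    if status not in ACK_STATUSES:
        raise ValueError(f"unknown ACK status: {status!r}")
    # status None is the plain "ACK obj-id" form.
    return ("ACK", parts[1], status)
```

A status of `continue` belongs to multi_ack mode, `common` and `ready` to multi_ack_detailed mode, and a bare ACK to the mode without either capability.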
- -However, the 256 limit *only* turns on in the canonical client -implementation if we have received at least one "ACK %s continue" -during a prior round. This helps to ensure that at least one common -ancestor is found before we give up entirely. - -Once the 'done' line is read from the client, the server will either -send a final 'ACK obj-id' or it will send a 'NAK'. 'obj-id' is the object -name of the last commit determined to be common. The server only sends -ACK after 'done' if there is at least one common base and multi_ack or -multi_ack_detailed is enabled. The server always sends NAK after 'done' -if there is no common base found. - -Instead of 'ACK' or 'NAK', the server may send an error message (for -example, if it does not recognize an object in a 'want' line received -from the client). - -Then the server will start sending its packfile data. - ----- - server-response = *ack_multi ack / nak - ack_multi = PKT-LINE("ACK" SP obj-id ack_status) - ack_status = "continue" / "common" / "ready" - ack = PKT-LINE("ACK" SP obj-id) - nak = PKT-LINE("NAK") ----- - -A simple clone may look like this (with no 'have' lines): - ----- - C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \ - side-band-64k ofs-delta\n - C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n - C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n - C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n - C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n - C: 0000 - C: 0009done\n - - S: 0008NAK\n - S: [PACKFILE] ----- - -An incremental update (fetch) response might look like this: - ----- - C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \ - side-band-64k ofs-delta\n - C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n - C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n - C: 0000 - C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n - C: [30 more have lines] - C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n - C: 0000 - - S: 003aACK 
7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n - S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n - S: 0008NAK\n - - C: 0009done\n - - S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n - S: [PACKFILE] ----- - - -Packfile Data -------------- - -Now that the client and server have finished negotiation about what -the minimal amount of data that needs to be sent to the client is, the server -will construct and send the required data in packfile format. - -See pack-format.txt for what the packfile itself actually looks like. - -If 'side-band' or 'side-band-64k' capabilities have been specified by -the client, the server will send the packfile data multiplexed. - -Each packet starting with the packet-line length of the amount of data -that follows, followed by a single byte specifying the sideband the -following data is coming in on. - -In 'side-band' mode, it will send up to 999 data bytes plus 1 control -code, for a total of up to 1000 bytes in a pkt-line. In 'side-band-64k' -mode it will send up to 65519 data bytes plus 1 control code, for a -total of up to 65520 bytes in a pkt-line. - -The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain -packfile data, sideband '2' will be used for progress information that the -client will generally print to stderr and sideband '3' is used for error -information. - -If no 'side-band' capability was specified, the server will stream the -entire packfile without multiplexing. - - -Pushing Data To a Server ------------------------- - -Pushing data to a server will invoke the 'receive-pack' process on the -server, which will allow the client to tell it which references it should -update and then send all the data the server will need for those new -references to be complete. Once all the data is received and validated, -the server will then update its references to what the client specified. - -Authentication --------------- - -The protocol itself contains no authentication mechanisms. 
That is to be -handled by the transport, such as SSH, before the 'receive-pack' process is -invoked. If 'receive-pack' is configured over the Git transport, those -repositories will be writable by anyone who can access that port (9418) as -that transport is unauthenticated. - -Reference Discovery -------------------- - -The reference discovery phase is done nearly the same way as it is in the -fetching protocol. Each reference obj-id and name on the server is sent -in packet-line format to the client, followed by a flush-pkt. The only -real difference is that the capability listing is different - the only -possible values are 'report-status', 'delete-refs', 'ofs-delta' and -'push-options'. - -Reference Update Request and Packfile Transfer ----------------------------------------------- - -Once the client knows what references the server is at, it can send a -list of reference update requests. For each reference on the server -that it wants to update, it sends a line listing the obj-id currently on -the server, the obj-id the client would like to update it to and the name -of the reference. - -This list is followed by a flush-pkt. 
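The request described above, one line per reference giving the old obj-id, the new obj-id, and the reference name, closed by a flush-pkt, can be sketched in Python like this. The function name `build_update_request` is hypothetical, and attaching the capability list behind a NUL on the first command is optional in this sketch.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame payload with its 4-hex-digit length prefix."""
    return b"%04x" % (len(payload) + 4) + payload


def build_update_request(commands, capabilities=()):
    """Serialize (old_id, new_id, refname) tuples, then a flush-pkt."""
    out = []
    for index, (old_id, new_id, name) in enumerate(commands):
        line = f"{old_id} {new_id} {name}"
        if index == 0 and capabilities:
            # Capabilities, if any, follow a NUL on the first command only.
            line += "\0" + " ".join(capabilities)
        out.append(pkt_line((line + "\n").encode()))
    out.append(b"0000")  # flush-pkt ends the command list
    return b"".join(out)
```

The packfile, when one is required, is sent after the bytes this function returns.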
- ----- - update-requests = *shallow ( command-list | push-cert ) - - shallow = PKT-LINE("shallow" SP obj-id) - - command-list = PKT-LINE(command NUL capability-list) - *PKT-LINE(command) - flush-pkt - - command = create / delete / update - create = zero-id SP new-id SP name - delete = old-id SP zero-id SP name - update = old-id SP new-id SP name - - old-id = obj-id - new-id = obj-id - - push-cert = PKT-LINE("push-cert" NUL capability-list LF) - PKT-LINE("certificate version 0.1" LF) - PKT-LINE("pusher" SP ident LF) - PKT-LINE("pushee" SP url LF) - PKT-LINE("nonce" SP nonce LF) - *PKT-LINE("push-option" SP push-option LF) - PKT-LINE(LF) - *PKT-LINE(command LF) - *PKT-LINE(gpg-signature-lines LF) - PKT-LINE("push-cert-end" LF) - - push-option = 1*( VCHAR | SP ) ----- - -If the server has advertised the 'push-options' capability and the client has -specified 'push-options' as part of the capability list above, the client then -sends its push options followed by a flush-pkt. - ----- - push-options = *PKT-LINE(push-option) flush-pkt ----- - -For backwards compatibility with older Git servers, if the client sends a push -cert and push options, it MUST send its push options both embedded within the -push cert and after the push cert. (Note that the push options within the cert -are prefixed, but the push options after the cert are not.) Both these lists -MUST be the same, modulo the prefix. - -After that the packfile that -should contain all the objects that the server will need to complete the new -references will be sent. - ----- - packfile = "PACK" 28*(OCTET) ----- - -If the receiving end does not support delete-refs, the sending end MUST -NOT ask for delete command. - -If the receiving end does not support push-cert, the sending end -MUST NOT send a push-cert command. When a push-cert command is -sent, command-list MUST NOT be sent; the commands recorded in the -push certificate is used instead. - -The packfile MUST NOT be sent if the only command used is 'delete'. 
- -A packfile MUST be sent if either create or update command is used, -even if the server already has all the necessary objects. In this -case the client MUST send an empty packfile. The only time this -is likely to happen is if the client is creating -a new branch or a tag that points to an existing obj-id. - -The server will receive the packfile, unpack it, then validate each -reference that is being updated that it hasn't changed while the request -was being processed (the obj-id is still the same as the old-id), and -it will run any update hooks to make sure that the update is acceptable. -If all of that is fine, the server will then update the references. - -Push Certificate ----------------- - -A push certificate begins with a set of header lines. After the -header and an empty line, the protocol commands follow, one per -line. Note that the trailing LF in push-cert PKT-LINEs is _not_ -optional; it must be present. - -Currently, the following header fields are defined: - -`pusher` ident:: - Identify the GPG key in "Human Readable Name <email@address>" - format. - -`pushee` url:: - The repository URL (anonymized, if the URL contains - authentication material) the user who ran `git push` - intended to push into. - -`nonce` nonce:: - The 'nonce' string the receiving repository asked the - pushing user to include in the certificate, to prevent - replay attacks. - -The GPG signature lines are a detached signature for the contents -recorded in the push certificate before the signature block begins. -The detached signature is used to certify that the commands were -given by the pusher, who must be the signer. - -Report Status -------------- - -After receiving the pack data from the sender, the receiver sends a -report if 'report-status' capability is in effect. -It is a short listing of what happened in that update. It will first -list the status of the packfile unpacking as either 'unpack ok' or -'unpack [error]'. 
Then it will list the status for each of the references -that it tried to update. Each line is either 'ok [refname]' if the -update was successful, or 'ng [refname] [error]' if the update was not. - ----- - report-status = unpack-status - 1*(command-status) - flush-pkt - - unpack-status = PKT-LINE("unpack" SP unpack-result) - unpack-result = "ok" / error-msg - - command-status = command-ok / command-fail - command-ok = PKT-LINE("ok" SP refname) - command-fail = PKT-LINE("ng" SP refname SP error-msg) - - error-msg = 1*(OCTECT) ; where not "ok" ----- - -Updates can be unsuccessful for a number of reasons. The reference can have -changed since the reference discovery phase was originally sent, meaning -someone pushed in the meantime. The reference being pushed could be a -non-fast-forward reference and the update hooks or configuration could be -set to not allow that, etc. Also, some references can be updated while others -can be rejected. - -An example client/server communication might look like this: - ----- - S: 006274730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n - S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n - S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n - S: 003d74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n - S: 0000 - - C: 00677d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n - C: 006874730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n - C: 0000 - C: [PACKDATA] - - S: 000eunpack ok\n - S: 0018ok refs/heads/debug\n - S: 002ang refs/heads/master non-fast-forward\n ----- diff --git a/third_party/git/Documentation/technical/partial-clone.txt b/third_party/git/Documentation/technical/partial-clone.txt deleted file mode 100644 index 896c7b3878..0000000000 --- a/third_party/git/Documentation/technical/partial-clone.txt +++ /dev/null @@ -1,324 +0,0 @@ -Partial Clone 
Design Notes -========================== - -The "Partial Clone" feature is a performance optimization for Git that -allows Git to function without having a complete copy of the repository. -The goal of this work is to allow Git to better handle extremely large -repositories. - -During clone and fetch operations, Git downloads the complete contents -and history of the repository. This includes all commits, trees, and -blobs for the complete life of the repository. For extremely large -repositories, clones can take hours (or days) and consume 100+GiB of disk -space. - -Often in these repositories there are many blobs and trees that the user -does not need such as: - - 1. files outside of the user's work area in the tree. For example, in - a repository with 500K directories and 3.5M files in every commit, - we can avoid downloading many objects if the user only needs a - narrow "cone" of the source tree. - - 2. large binary assets. For example, in a repository where large build - artifacts are checked into the tree, we can avoid downloading all - previous versions of these non-mergeable binary assets and only - download versions that are actually referenced. - -Partial clone allows us to avoid downloading such unneeded objects *in -advance* during clone and fetch operations and thereby reduce download -times and disk usage. Missing objects can later be "demand fetched" -if/when needed. - -Use of partial clone requires that the user be online and the origin -remote be available for on-demand fetching of missing objects. This may -or may not be problematic for the user. For example, if the user can -stay within the pre-selected subset of the source tree, they may not -encounter any missing objects. Alternatively, the user could try to -pre-fetch various objects if they know that they are going offline.
- - -Non-Goals ---------- - -Partial clone is a mechanism to limit the number of blobs and trees downloaded -*within* a given range of commits -- and is therefore independent of and not -intended to conflict with existing DAG-level mechanisms to limit the set of -requested commits (i.e. shallow clone, single branch, or fetch '<refspec>'). - - -Design Overview ---------------- - -Partial clone logically consists of the following parts: - -- A mechanism for the client to describe unneeded or unwanted objects to - the server. - -- A mechanism for the server to omit such unwanted objects from packfiles - sent to the client. - -- A mechanism for the client to gracefully handle missing objects (that - were previously omitted by the server). - -- A mechanism for the client to backfill missing objects as needed. - - -Design Details --------------- - -- A new pack-protocol capability "filter" is added to the fetch-pack and - upload-pack negotiation. -+ -This uses the existing capability discovery mechanism. -See "filter" in Documentation/technical/pack-protocol.txt. - -- Clients pass a "filter-spec" to clone and fetch which is passed to the - server to request filtering during packfile construction. -+ -There are various filters available to accommodate different situations. -See "--filter=<filter-spec>" in Documentation/rev-list-options.txt. - -- On the server pack-objects applies the requested filter-spec as it - creates "filtered" packfiles for the client. -+ -These filtered packfiles are *incomplete* in the traditional sense because -they may contain objects that reference objects not contained in the -packfile and that the client doesn't already have. For example, the -filtered packfile may contain trees or tags that reference missing blobs -or commits that reference missing trees. - -- On the client these incomplete packfiles are marked as "promisor packfiles" - and treated differently by various commands. 
- -- On the client a repository extension is added to the local config to - prevent older versions of git from failing mid-operation because of - missing objects that they cannot handle. - See "extensions.partialClone" in Documentation/technical/repository-version.txt. - - -Handling Missing Objects ------------------------- - -- An object may be missing due to a partial clone or fetch, or missing due - to repository corruption. To differentiate these cases, the local - repository specially indicates such filtered packfiles obtained from the - promisor remote as "promisor packfiles". -+ -These promisor packfiles consist of a "<name>.promisor" file with -arbitrary contents (like the "<name>.keep" files), in addition to -their "<name>.pack" and "<name>.idx" files. - -- The local repository considers a "promisor object" to be an object that - it knows (to the best of its ability) that the promisor remote has promised - that it has, either because the local repository has that object in one of - its promisor packfiles, or because another promisor object refers to it. -+ -When Git encounters a missing object, Git can see if it is a promisor object -and handle it appropriately. If not, Git can report a corruption. -+ -This means that there is no need for the client to explicitly maintain an -expensive-to-modify list of missing objects.[a] - -- Since almost all Git code currently expects any referenced object to be - present locally and because we do not want to force every command to do - a dry-run first, a fallback mechanism is added to allow Git to attempt - to dynamically fetch missing objects from the promisor remote. -+ -When the normal object lookup fails to find an object, Git invokes -fetch-object to try to get the object from the server and then retry -the object lookup. This allows objects to be "faulted in" without -complicated prediction algorithms.
-+ -For efficiency reasons, no check as to whether the missing object is -actually a promisor object is performed. -+ -Dynamic object fetching tends to be slow as objects are fetched one at -a time. - -- `checkout` (and any other command using `unpack-trees`) has been taught - to bulk pre-fetch all required missing blobs in a single batch. - -- `rev-list` has been taught to print missing objects. -+ -This can be used by other commands to bulk prefetch objects. -For example, a "git log -p A..B" may internally want to first do -something like "git rev-list --objects --quiet --missing=print A..B" -and prefetch those objects in bulk. - -- `fsck` has been updated to be fully aware of promisor objects. - -- `repack` in GC has been updated to not touch promisor packfiles at all, - and to only repack other objects. - -- The global variable "fetch_if_missing" is used to control whether an - object lookup will attempt to dynamically fetch a missing object or - report an error. -+ -We are not happy with this global variable and would like to remove it, -but that requires significant refactoring of the object code to pass an -additional flag. We hope that concurrent efforts to add an ODB API can -encompass this. - - -Fetching Missing Objects ------------------------- - -- Fetching of objects is done using the existing transport mechanism using - transport_fetch_refs(), setting a new transport option - TRANS_OPT_NO_DEPENDENTS to indicate that only the objects themselves are - desired, not any object that they refer to. -+ -Because some transports invoke fetch_pack() in the same process, fetch_pack() -has been updated to not use any object flags when the corresponding argument -(no_dependents) is set. - -- The local repository sends a request with the hashes of all requested - objects as "want" lines, and does not perform any packfile negotiation. - It then receives a packfile. 
- -- Because we are reusing the existing fetch-pack mechanism, fetching - currently fetches all objects referred to by the requested objects, even - though they are not necessary. - - -Current Limitations -------------------- - -- The remote used for a partial clone (or the first partial fetch - following a regular clone) is marked as the "promisor remote". -+ -We are currently limited to a single promisor remote and only that -remote may be used for subsequent partial fetches. -+ -We accept this limitation because we believe initial users of this -feature will be using it on repositories with a strong single central -server. - -- Dynamic object fetching will only ask the promisor remote for missing - objects. We assume that the promisor remote has a complete view of the - repository and can satisfy all such requests. - -- Repack essentially treats promisor and non-promisor packfiles as 2 - distinct partitions and does not mix them. Repack currently only works - on non-promisor packfiles and loose objects. - -- Dynamic object fetching invokes fetch-pack once *for each item* - because most algorithms stumble upon a missing object and need to have - it resolved before continuing their work. This may incur significant - overhead -- and multiple authentication requests -- if many objects are - needed. - -- Dynamic object fetching currently uses the existing pack protocol V0 - which means that each object is requested via fetch-pack. The server - will send a full set of info/refs when the connection is established. - If there are a large number of refs, this may incur significant overhead. - - -Future Work ------------ - -- Allow more than one promisor remote and define a strategy for fetching - missing objects from specific promisor remotes or for iterating over the - set of promisor remotes until a missing object is found.
-+ -A user might want to have multiple geographically-close cache servers -for fetching missing blobs while continuing to do filtered `git-fetch` -commands from the central server, for example. -+ -Or the user might want to work in a triangular work flow with multiple -promisor remotes that each have an incomplete view of the repository. - -- Allow repack to work on promisor packfiles (while keeping them distinct - from non-promisor packfiles). - -- Allow non-pathname-based filters to make use of packfile bitmaps (when - present). This was just an omission during the initial implementation. - -- Investigate use of a long-running process to dynamically fetch a series - of objects, such as proposed in [5,6] to reduce process startup and - overhead costs. -+ -It would be nice if pack protocol V2 could allow that long-running -process to make a series of requests over a single long-running -connection. - -- Investigate pack protocol V2 to avoid the info/refs broadcast on - each connection with the server to dynamically fetch missing objects. - -- Investigate the need to handle loose promisor objects. -+ -Objects in promisor packfiles are allowed to reference missing objects -that can be dynamically fetched from the server. An assumption was -made that loose objects are only created locally and therefore should -not reference a missing object. We may need to revisit that assumption -if, for example, we dynamically fetch a missing tree and store it as a -loose object rather than a single object packfile. -+ -This does not necessarily mean we need to mark loose objects as promisor; -it may be sufficient to relax the object lookup or is-promisor functions. - - -Non-Tasks ---------- - -- Every time the subject of "demand loading blobs" comes up it seems - that someone suggests that the server be allowed to "guess" and send - additional objects that may be related to the requested objects. 
-+ -No work has gone into actually doing that; we're just documenting that -it is a common suggestion. We're not sure how it would work and have -no plans to work on it. -+ -It is valid for the server to send more objects than requested (even -for a dynamic object fetch), but we are not building on that. - - -Footnotes ---------- - -[a] expensive-to-modify list of missing objects: Earlier in the design of - partial clone we discussed the need for a single list of missing objects. - This would essentially be a sorted linear list of OIDs that were - omitted by the server during a clone or subsequent fetches. - -This file would need to be loaded into memory on every object lookup. -It would need to be read, updated, and re-written (like the .git/index) -on every explicit "git fetch" command *and* on any dynamic object fetch. - -The cost to read, update, and write this file could add significant -overhead to every command if there are many missing objects. For example, -if there are 100M missing blobs, this file would be at least 2GiB on disk. - -With the "promisor" concept, we *infer* a missing object based upon the -type of packfile that references it.
- - -Related Links -------------- -[0] https://crbug.com/git/2 - Bug#2: Partial Clone - -[1] https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/ + - Subject: [RFC] Add support for downloading blobs on demand + - Date: Fri, 13 Jan 2017 10:52:53 -0500 - -[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/ + - Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches) + - Date: Fri, 29 Sep 2017 13:11:36 -0700 - -[3] https://public-inbox.org/git/20170426221346.25337-1-jonathantanmy@google.com/ + - Subject: Proposal for missing blob support in Git repos + - Date: Wed, 26 Apr 2017 15:13:46 -0700 - -[4] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/ + - Subject: [PATCH 00/10] RFC Partial Clone and Fetch + - Date: Wed, 8 Mar 2017 18:50:29 +0000 - -[5] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/ + - Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module + - Date: Fri, 5 May 2017 11:27:52 -0400 - -[6] https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/ + - Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand + - Date: Fri, 14 Jul 2017 09:26:50 -0400 diff --git a/third_party/git/Documentation/technical/protocol-capabilities.txt b/third_party/git/Documentation/technical/protocol-capabilities.txt deleted file mode 100644 index 2b267c0da6..0000000000 --- a/third_party/git/Documentation/technical/protocol-capabilities.txt +++ /dev/null @@ -1,337 +0,0 @@ -Git Protocol Capabilities -========================= - -NOTE: this document describes capabilities for versions 0 and 1 of the pack -protocol. For version 2, please refer to the link:protocol-v2.html[protocol-v2] -doc. - -Servers SHOULD support all capabilities defined in this document. 
- -On the very first line of the initial server response of either -receive-pack or upload-pack the first reference is followed by -a NUL byte and then a list of space delimited server capabilities. -These allow the server to declare what it can and cannot support -to the client. - -Client will then send a space separated list of capabilities it wants -to be in effect. The client MUST NOT ask for capabilities the server -did not say it supports. - -Server MUST diagnose and abort if capabilities it does not understand -were sent. Server MUST NOT ignore capabilities that client requested -and server advertised. As a consequence of these rules, server MUST -NOT advertise capabilities it does not understand. - -The 'atomic', 'report-status', 'delete-refs', 'quiet', and 'push-cert' -capabilities are sent and recognized by the receive-pack (push to server) -process. - -The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized -by both upload-pack and receive-pack protocols. The 'agent' capability -may optionally be sent in both protocols. - -All other capabilities are only recognized by the upload-pack (fetch -from server) process. - -multi_ack ---------- - -The 'multi_ack' capability allows the server to return "ACK obj-id -continue" as soon as it finds a commit that it can use as a common -base, between the client's wants and the client's have set. - -By sending this early, the server can potentially head off the client -from walking any further down that particular branch of the client's -repository history. The client may still need to walk down other -branches, sending have lines for those, until the server has a -complete cut across the DAG, or the client has said "done". - -Without multi_ack, a client sends have lines in --date-order until -the server has found a common base.
That means the client will send -have lines that are already known by the server to be common, because -they overlap in time with another branch that the server hasn't found -a common base on yet. - -For example suppose the client has commits in caps that the server -doesn't and the server has commits in lower case that the client -doesn't, as in the following diagram: - - +---- u ---------------------- x - / +----- y - / / - a -- b -- c -- d -- E -- F - \ - +--- Q -- R -- S - -If the client wants x,y and starts out by saying have F,S, the server -doesn't know what F,S is. Eventually the client says "have d" and -the server sends "ACK d continue" to let the client know to stop -walking down that line (so don't send c-b-a), but it's not done yet, -it needs a base for x. The client keeps going with S-R-Q, until a -gets reached, at which point the server has a clear base and it all -ends. - -Without multi_ack the client would have sent that c-b-a chain anyway, -interleaved with S-R-Q. - -multi_ack_detailed ------------------- -This is an extension of multi_ack that permits client to better -understand the server's in-memory state. See pack-protocol.txt, -section "Packfile Negotiation" for more information. - -no-done -------- -This capability should only be used with the smart HTTP protocol. If -multi_ack_detailed and no-done are both present, then the sender is -free to immediately send a pack following its first "ACK obj-id ready" -message. - -Without no-done in the smart HTTP protocol, the server session would -end and the client has to make another trip to send "done" before -the server can send the pack. no-done removes the last round and -thus slightly reduces latency. - -thin-pack ---------- - -A thin pack is one with deltas which reference base objects not -contained within the pack (but are known to exist at the receiving -end). 
This can reduce the network traffic significantly, but it -requires the receiving end to know how to "thicken" these packs by -adding the missing bases to the pack. - -The upload-pack server advertises 'thin-pack' when it can generate -and send a thin pack. A client requests the 'thin-pack' capability -when it understands how to "thicken" it, notifying the server that -it can receive such a pack. A client MUST NOT request the -'thin-pack' capability if it cannot turn a thin pack into a -self-contained pack. - -Receive-pack, on the other hand, is assumed by default to be able to -handle thin packs, but can ask the client not to use the feature by -advertising the 'no-thin' capability. A client MUST NOT send a thin -pack if the server advertises the 'no-thin' capability. - -The reasons for this asymmetry are historical. The receive-pack -program did not exist until after the invention of thin packs, so -historically the reference implementation of receive-pack always -understood thin packs. Adding 'no-thin' later allowed receive-pack -to disable the feature in a backwards-compatible manner. - - -side-band, side-band-64k ------------------------- - -This capability means that the server can send, and the client can understand, multiplexed -progress reports and error info interleaved with the packfile itself. - -These two options are mutually exclusive. A modern client always -favors 'side-band-64k'. - -Either mode indicates that the packfile data will be streamed broken -up into packets of up to either 1000 bytes in the case of 'side-band', -or 65520 bytes in the case of 'side-band-64k'. Each packet is made up -of a leading 4-byte pkt-line length of how much data is in the packet, -followed by a 1-byte stream code, followed by the actual data.
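As a rough illustration of the packet layout just described, the following Python sketch splits an in-memory side-band stream into its channels. The helper names `demux_sideband` and `sideband_pkt` are hypothetical and not part of Git; the stream codes 1, 2 and 3 are the ones this document lists for pack data, progress messages and fatal errors. Treating a flush-pkt ("0000") as the end of the stream is an assumption of this sketch.

```python
def sideband_pkt(code: int, data: bytes) -> bytes:
    """Frame data as one side-band packet: 4-byte hex length, stream code, payload."""
    # The length counts the 4 length bytes and the 1-byte stream code.
    return b"%04x" % (len(data) + 5) + bytes([code]) + data


def demux_sideband(stream: bytes):
    """Split concatenated side-band packets into (pack_data, progress).

    Hypothetical helper for illustration; raises RuntimeError on stream 3.
    """
    pack = bytearray()
    progress = bytearray()
    pos = 0
    while pos < len(stream):
        # 4-byte hexadecimal pkt-line length, which includes itself.
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            break  # flush-pkt: assumed here to end the stream
        if length < 5 or pos + length > len(stream):
            raise ValueError("malformed side-band packet")
        code = stream[pos + 4]              # 1-byte stream code
        data = stream[pos + 5:pos + length]
        if code == 1:
            pack += data                    # pack data
        elif code == 2:
            progress += data                # progress messages
        elif code == 3:
            raise RuntimeError(data.decode("utf-8", "replace"))
        else:
            raise ValueError(f"unknown stream code {code}")
        pos += length
    return bytes(pack), bytes(progress)
```

A caller would typically write the first element to the incoming packfile and the second to the user's terminal.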
- -The stream code can be one of: - - 1 - pack data - 2 - progress messages - 3 - fatal error message just before stream aborts - -The "side-band-64k" capability came about as a way for newer clients -that can handle much larger packets to request packets that are -actually crammed nearly full, while maintaining backward compatibility -for the older clients. - -Further, with side-band and its up to 1000-byte messages, it's actually -999 bytes of payload and 1 byte for the stream code. With side-band-64k, -same deal, you have up to 65519 bytes of data and 1 byte for the stream -code. - -The client MUST send only maximum of one of "side-band" and "side- -band-64k". Server MUST diagnose it as an error if client requests -both. - -ofs-delta ---------- - -Server can send, and client understand PACKv2 with delta referring to -its base by position in pack rather than by an obj-id. That is, they can -send/read OBJ_OFS_DELTA (aka type 6) in a packfile. - -agent ------ - -The server may optionally send a capability of the form `agent=X` to -notify the client that the server is running version `X`. The client may -optionally return its own agent string by responding with an `agent=Y` -capability (but it MUST NOT do so if the server did not mention the -agent capability). The `X` and `Y` strings may contain any printable -ASCII characters except space (i.e., the byte range 32 < x < 127), and -are typically of the form "package/version" (e.g., "git/1.8.3.1"). The -agent strings are purely informative for statistics and debugging -purposes, and MUST NOT be used to programmatically assume the presence -or absence of particular features. - -symref ------- - -This parameterized capability is used to inform the receiver which symbolic ref -points to which ref; for example, "symref=HEAD:refs/heads/master" tells the -receiver that HEAD points to master. This capability can be repeated to -represent multiple symrefs. 
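A receiver could pull the symref mappings out of an advertised capability list roughly as follows. This is only a sketch: `parse_symrefs` is a hypothetical helper, and it assumes the capabilities have already been decoded into one space-separated string.

```python
def parse_symrefs(capabilities: str) -> dict:
    """Map each symbolic ref to its target, e.g. {"HEAD": "refs/heads/master"}.

    Hypothetical helper; input is the space-separated capability list.
    """
    symrefs = {}
    for cap in capabilities.split(" "):
        if not cap.startswith("symref="):
            continue
        # The parameter has the form <symbolic-ref>:<target-ref>.
        name, sep, target = cap[len("symref="):].partition(":")
        if sep:
            symrefs[name] = target
    return symrefs
```

Because the capability can be repeated, the result is a dictionary rather than a single pair.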
- -Servers SHOULD include this capability for the HEAD symref if it is one of the -refs being sent. - -Clients MAY use the parameters from this capability to select the proper initial -branch when cloning a repository. - -shallow -------- - -This capability adds "deepen", "shallow" and "unshallow" commands to -the fetch-pack/upload-pack protocol so clients can request shallow -clones. - -deepen-since ------------- - -This capability adds "deepen-since" command to fetch-pack/upload-pack -protocol so the client can request shallow clones that are cut at a -specific time, instead of depth. Internally it's equivalent of doing -"rev-list --max-age=<timestamp>" on the server side. "deepen-since" -cannot be used with "deepen". - -deepen-not ----------- - -This capability adds "deepen-not" command to fetch-pack/upload-pack -protocol so the client can request shallow clones that are cut at a -specific revision, instead of depth. Internally it's equivalent of -doing "rev-list --not <rev>" on the server side. "deepen-not" -cannot be used with "deepen", but can be used with "deepen-since". - -deepen-relative ---------------- - -If this capability is requested by the client, the semantics of -"deepen" command is changed. The "depth" argument is the depth from -the current shallow boundary, instead of the depth from remote refs. - -no-progress ------------ - -The client was started with "git clone -q" or something, and doesn't -want that side band 2. Basically the client just says "I do not -wish to receive stream 2 on sideband, so do not send it to me, and if -you did, I will drop it on the floor anyway". However, the sideband -channel 3 is still used for error responses. - -include-tag ------------ - -The 'include-tag' capability is about sending annotated tags if we are -sending objects they point to. If we pack an object to the client, and -a tag object points exactly at that object, we pack the tag object too. 
-In general this allows a client to get all new annotated tags when it -fetches a branch, in a single network connection. - -Clients MAY always send include-tag, hardcoding it into a request when -the server advertises this capability. The decision for a client to -request include-tag only has to do with the client's desires for tag -data, whether or not a server had advertised objects in the -refs/tags/* namespace. - -Servers MUST pack the tags if their referent is packed and the client -has requested include-tag. - -Clients MUST be prepared for the case where a server has ignored -include-tag and has not actually sent tags in the pack. In such -cases the client SHOULD issue a subsequent fetch to acquire the tags -that include-tag would have otherwise given the client. - -The server SHOULD send include-tag, if it supports it, regardless -of whether or not there are tags available. - -report-status -------------- - -The receive-pack process can receive a 'report-status' capability, -which tells it that the client wants a report of what happened after -a packfile upload and reference update. If the pushing client requests -this capability, after unpacking and updating references the server -will respond with whether the packfile unpacked successfully and if -each reference was updated successfully. If any of those were not -successful, it will send back an error message. See pack-protocol.txt -for example messages. - -delete-refs ------------ - -If the server sends back the 'delete-refs' capability, it means that -it is capable of accepting a zero-id value as the target -value of a reference update. It is not sent back by the client, it -simply informs the client that it can be sent zero-id values -to delete references. - -quiet ------ - -If the receive-pack server advertises the 'quiet' capability, it is -capable of silencing human-readable progress output which otherwise may -be shown when processing the received pack.
A send-pack client should -respond with the 'quiet' capability to suppress server-side progress -reporting if the local progress reporting is also being suppressed -(e.g., via `push -q`, or if stderr does not go to a tty). - -atomic ------- - -If the server sends the 'atomic' capability it is capable of accepting -atomic pushes. If the pushing client requests this capability, the server -will update the refs in one atomic transaction. Either all refs are -updated or none. - -push-options ------------- - -If the server sends the 'push-options' capability it is able to accept -push options after the update commands have been sent, but before the -packfile is streamed. If the pushing client requests this capability, -the server will pass the options to the pre- and post- receive hooks -that process this push request. - -allow-tip-sha1-in-want ----------------------- - -If the upload-pack server advertises this capability, fetch-pack may -send "want" lines with SHA-1s that exist at the server but are not -advertised by upload-pack. - -allow-reachable-sha1-in-want ----------------------------- - -If the upload-pack server advertises this capability, fetch-pack may -send "want" lines with SHA-1s that exist at the server but are not -advertised by upload-pack. - -push-cert=<nonce> ------------------ - -The receive-pack server that advertises this capability is willing -to accept a signed push certificate, and asks the <nonce> to be -included in the push certificate. A send-pack client MUST NOT -send a push-cert packet unless the receive-pack server advertises -this capability. - -filter ------- - -If the upload-pack server advertises the 'filter' capability, -fetch-pack may send "filter" commands to request a partial clone -or partial fetch and request that the server omit various objects -from the packfile. 
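The negotiation rules at the top of protocol-capabilities.txt (capabilities follow a NUL byte on the first advertised ref, the client must not request anything the server did not advertise, and at most one of side-band and side-band-64k may be requested) can be sketched as below. Both function names are hypothetical, and the `obj-id SP refname` layout before the NUL is taken from pack-protocol.txt rather than from this document.

```python
def split_first_ref_line(line: bytes):
    """Return (obj_id, refname, capabilities) for the first advertised ref.

    Hypothetical helper; `line` is the pkt-line payload without its length prefix.
    """
    ref_part, _, cap_part = line.rstrip(b"\n").partition(b"\x00")
    obj_id, _, refname = ref_part.partition(b" ")
    capabilities = cap_part.decode("ascii").split()
    return obj_id.decode("ascii"), refname.decode("ascii"), capabilities


def choose_capabilities(advertised, wanted):
    """Keep only wanted capabilities whose name the server advertised."""
    advertised_names = {cap.split("=", 1)[0] for cap in advertised}
    chosen = [cap for cap in wanted if cap.split("=", 1)[0] in advertised_names]
    # The client must send at most one of the two side-band modes;
    # this sketch prefers side-band-64k, as a modern client would.
    if "side-band" in chosen and "side-band-64k" in chosen:
        chosen.remove("side-band")
    return chosen
```

Matching on the name before `=` lets a client answer `agent=X` with its own `agent=Y`, which is only allowed when the server mentioned the agent capability.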
diff --git a/third_party/git/Documentation/technical/protocol-common.txt b/third_party/git/Documentation/technical/protocol-common.txt deleted file mode 100644 index ecedb34bba..0000000000 --- a/third_party/git/Documentation/technical/protocol-common.txt +++ /dev/null @@ -1,99 +0,0 @@ -Documentation Common to Pack and Http Protocols -=============================================== - -ABNF Notation -------------- - -ABNF notation as described by RFC 5234 is used within the protocol documents, -except the following replacement core rules are used: ----- - HEXDIG = DIGIT / "a" / "b" / "c" / "d" / "e" / "f" ----- - -We also define the following common rules: ----- - NUL = %x00 - zero-id = 40*"0" - obj-id = 40*(HEXDIGIT) - - refname = "HEAD" - refname /= "refs/" <see discussion below> ----- - -A refname is a hierarchical octet string beginning with "refs/" and -not violating the 'git-check-ref-format' command's validation rules. -More specifically, they: - -. They can include slash `/` for hierarchical (directory) - grouping, but no slash-separated component can begin with a - dot `.`. - -. They must contain at least one `/`. This enforces the presence of a - category like `heads/`, `tags/` etc. but the actual names are not - restricted. - -. They cannot have two consecutive dots `..` anywhere. - -. They cannot have ASCII control characters (i.e. bytes whose - values are lower than \040, or \177 `DEL`), space, tilde `~`, - caret `^`, colon `:`, question-mark `?`, asterisk `*`, - or open bracket `[` anywhere. - -. They cannot end with a slash `/` or a dot `.`. - -. They cannot end with the sequence `.lock`. - -. They cannot contain a sequence `@{`. - -. They cannot contain a `\\`. - - -pkt-line Format ---------------- - -Much (but not all) of the payload is described around pkt-lines. - -A pkt-line is a variable length binary string. The first four bytes -of the line, the pkt-len, indicates the total length of the line, -in hexadecimal. 
The pkt-len includes the 4 bytes used to contain -the length's hexadecimal representation. - -A pkt-line MAY contain binary data, so implementors MUST ensure -pkt-line parsing/formatting routines are 8-bit clean. - -A non-binary line SHOULD BE terminated by an LF, which if present -MUST be included in the total length. Receivers MUST treat pkt-lines -with non-binary data the same whether or not they contain the trailing -LF (stripping the LF if present, and not complaining when it is -missing). - -The maximum length of a pkt-line's data component is 65516 bytes. -Implementations MUST NOT send pkt-line whose length exceeds 65520 -(65516 bytes of payload + 4 bytes of length data). - -Implementations SHOULD NOT send an empty pkt-line ("0004"). - -A pkt-line with a length field of 0 ("0000"), called a flush-pkt, -is a special case and MUST be handled differently than an empty -pkt-line ("0004"). - ----- - pkt-line = data-pkt / flush-pkt - - data-pkt = pkt-len pkt-payload - pkt-len = 4*(HEXDIG) - pkt-payload = (pkt-len - 4)*(OCTET) - - flush-pkt = "0000" ----- - -Examples (as C-style strings): - ----- - pkt-line actual value - --------------------------------- - "0006a\n" "a\n" - "0005a" "a" - "000bfoobar\n" "foobar\n" - "0004" "" ----- diff --git a/third_party/git/Documentation/technical/protocol-v2.txt b/third_party/git/Documentation/technical/protocol-v2.txt deleted file mode 100644 index 40f91f6b1e..0000000000 --- a/third_party/git/Documentation/technical/protocol-v2.txt +++ /dev/null @@ -1,455 +0,0 @@ -Git Wire Protocol, Version 2 -============================ - -This document presents a specification for a version 2 of Git's wire -protocol. 
Protocol v2 will improve upon v1 in the following ways: - - * Instead of multiple service names, multiple commands will be - supported by a single service - * Easily extendable as capabilities are moved into their own section - of the protocol, no longer being hidden behind a NUL byte and - limited by the size of a pkt-line - * Separate out other information hidden behind NUL bytes (e.g. agent - string as a capability and symrefs can be requested using 'ls-refs') - * Reference advertisement will be omitted unless explicitly requested - * ls-refs command to explicitly request some refs - * Designed with http and stateless-rpc in mind. With clear flush - semantics the http remote helper can simply act as a proxy - -In protocol v2 communication is command oriented. When first contacting a -server a list of capabilities will be advertised. Some of these capabilities -will be commands which a client can request be executed. Once a command -has completed, a client can reuse the connection and request that other -commands be executed. - -Packet-Line Framing -------------------- - -All communication is done using packet-line framing, just as in v1. See -`Documentation/technical/pack-protocol.txt` and -`Documentation/technical/protocol-common.txt` for more information. - -In protocol v2 these special packets will have the following semantics: - - * '0000' Flush Packet (flush-pkt) - indicates the end of a message - * '0001' Delimiter Packet (delim-pkt) - separates sections of a message - -Initial Client Request ----------------------- - -In general a client can request to speak protocol v2 by sending -`version=2` through the respective side-channel for the transport being -used which inevitably sets `GIT_PROTOCOL`. More information can be -found in `pack-protocol.txt` and `http-protocol.txt`. In all cases the -response from the server is the capability advertisement.
- -Git Transport -~~~~~~~~~~~~~ - -When using the git:// transport, you can request to use protocol v2 by -sending "version=2" as an extra parameter: - - 003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0 - -SSH and File Transport -~~~~~~~~~~~~~~~~~~~~~~ - -When using either the ssh:// or file:// transport, the GIT_PROTOCOL -environment variable must be set explicitly to include "version=2". - -HTTP Transport -~~~~~~~~~~~~~~ - -When using the http:// or https:// transport a client makes a "smart" -info/refs request as described in `http-protocol.txt` and requests that -v2 be used by supplying "version=2" in the `Git-Protocol` header. - - C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0 - C: Git-Protocol: version=2 - -A v2 server would reply: - - S: 200 OK - S: <Some headers> - S: ... - S: - S: 000eversion 2\n - S: <capability-advertisement> - -Subsequent requests are then made directly to the service -`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack). - -Capability Advertisement ------------------------- - -A server which decides to communicate (based on a request from a client) -using protocol version 2, notifies the client by sending a version string -in its initial response followed by an advertisement of its capabilities. -Each capability is a key with an optional value. Clients must ignore all -unknown keys. Semantics of unknown values are left to the definition of -each key. Some capabilities will describe commands which can be requested -to be executed by the client. 
- - capability-advertisement = protocol-version - capability-list - flush-pkt - - protocol-version = PKT-LINE("version 2" LF) - capability-list = *capability - capability = PKT-LINE(key[=value] LF) - - key = 1*(ALPHA | DIGIT | "-_") - value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;") - -Command Request ---------------- - -After receiving the capability advertisement, a client can then issue a -request to select the command it wants with any particular capabilities -or arguments. There is then an optional section where the client can -provide any command specific parameters or queries. Only a single -command can be requested at a time. - - request = empty-request | command-request - empty-request = flush-pkt - command-request = command - capability-list - [command-args] - flush-pkt - command = PKT-LINE("command=" key LF) - command-args = delim-pkt - *command-specific-arg - - command-specific-args are packet line framed arguments defined by - each individual command. - -The server will then check to ensure that the client's request is -comprised of a valid command as well as valid capabilities which were -advertised. If the request is valid the server will then execute the -command. A server MUST wait till it has received the client's entire -request before issuing a response. The format of the response is -determined by the command being executed, but in all cases a flush-pkt -indicates the end of the response. - -When a command has finished, and the client has received the entire -response from the server, a client can either request that another -command be executed or can terminate the connection. A client may -optionally send an empty request consisting of just a flush-pkt to -indicate that no more requests will be made. 
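The command-request grammar above can be exercised with a short sketch like the following, which frames a request as pkt-lines. The names `pkt_line` and `build_command_request` are hypothetical, and the agent value is a made-up example; the `peel` and `ref-prefix` arguments used in the usage note come from the ls-refs section of this document.

```python
FLUSH_PKT = b"0000"  # end of message
DELIM_PKT = b"0001"  # separates capabilities from command arguments


def pkt_line(text: str) -> bytes:
    """Encode one LF-terminated text line as a pkt-line."""
    payload = text.encode("utf-8") + b"\n"
    # pkt-len is 4 hex digits and counts its own 4 bytes.
    return b"%04x" % (len(payload) + 4) + payload


def build_command_request(command, capabilities=(), args=()):
    """Assemble: command, capability-list, [delim-pkt + args], flush-pkt."""
    parts = [pkt_line("command=" + command)]
    parts += [pkt_line(cap) for cap in capabilities]
    if args:
        parts.append(DELIM_PKT)
        parts += [pkt_line(arg) for arg in args]
    parts.append(FLUSH_PKT)
    return b"".join(parts)
```

For example, `build_command_request("ls-refs", ["agent=example/0.1"], ["peel", "ref-prefix refs/heads/"])` yields one pkt-line per element, with the delimiter packet between the capability and the arguments and a flush packet at the end.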
- -Capabilities ------------- - -There are two different types of capabilities: normal capabilities, -which can be used to convey information or alter the behavior of a -request, and commands, which are the core actions that a client wants to -perform (fetch, push, etc). - -Protocol version 2 is stateless by default. This means that all commands -must only last a single round and be stateless from the perspective of the -server side, unless the client has requested a capability indicating that -state should be maintained by the server. Clients MUST NOT require state -management on the server side in order to function correctly. This -permits simple round-robin load-balancing on the server side, without -needing to worry about state management. - -agent -~~~~~ - -The server can advertise the `agent` capability with a value `X` (in the -form `agent=X`) to notify the client that the server is running version -`X`. The client may optionally send its own agent string by including -the `agent` capability with a value `Y` (in the form `agent=Y`) in its -request to the server (but it MUST NOT do so if the server did not -advertise the agent capability). The `X` and `Y` strings may contain any -printable ASCII characters except space (i.e., the byte range 32 < x < -127), and are typically of the form "package/version" (e.g., -"git/1.8.3.1"). The agent strings are purely informative for statistics -and debugging purposes, and MUST NOT be used to programmatically assume -the presence or absence of particular features. - -ls-refs -~~~~~~~ - -`ls-refs` is the command used to request a reference advertisement in v2. -Unlike the current reference advertisement, ls-refs takes in arguments -which can be used to limit the refs sent from the server. 
- -Additional features not supported in the base command will be advertised -as the value of the command in the capability advertisement in the form -of a space separated list of features: "<command>=<feature 1> <feature 2>" - -ls-refs takes in the following arguments: - - symrefs - In addition to the object pointed by it, show the underlying ref - pointed by it when showing a symbolic ref. - peel - Show peeled tags. - ref-prefix <prefix> - When specified, only references having a prefix matching one of - the provided prefixes are displayed. - -The output of ls-refs is as follows: - - output = *ref - flush-pkt - ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF) - ref-attribute = (symref | peeled) - symref = "symref-target:" symref-target - peeled = "peeled:" obj-id - -fetch -~~~~~ - -`fetch` is the command used to fetch a packfile in v2. It can be looked -at as a modified version of the v1 fetch where the ref-advertisement is -stripped out (since the `ls-refs` command fills that role) and the -message format is tweaked to eliminate redundancies and permit easy -addition of future extensions. - -Additional features not supported in the base command will be advertised -as the value of the command in the capability advertisement in the form -of a space separated list of features: "<command>=<feature 1> <feature 2>" - -A `fetch` request can take the following arguments: - - want <oid> - Indicates to the server an object which the client wants to - retrieve. Wants can be anything and are not limited to - advertised objects. - - have <oid> - Indicates to the server an object which the client has locally. - This allows the server to make a packfile which only contains - the objects that the client needs. Multiple 'have' lines can be - supplied. - - done - Indicates to the server that negotiation should terminate (or - not even begin if performing a clone) and that the server should - use the information supplied in the request to construct the - packfile. 
- - thin-pack - Request that a thin pack be sent, which is a pack with deltas - which reference base objects not contained within the pack (but - are known to exist at the receiving end). This can reduce the - network traffic significantly, but it requires the receiving end - to know how to "thicken" these packs by adding the missing bases - to the pack. - - no-progress - Request that progress information that would normally be sent on - side-band channel 2, during the packfile transfer, should not be - sent. However, the side-band channel 3 is still used for error - responses. - - include-tag - Request that annotated tags should be sent if the objects they - point to are being sent. - - ofs-delta - Indicate that the client understands PACKv2 with delta referring - to its base by position in pack rather than by an oid. That is, - they can read OBJ_OFS_DELTA (aka type 6) in a packfile. - -If the 'shallow' feature is advertised the following arguments can be -included in the client's request as well as the potential addition of the -'shallow-info' section in the server's response as explained below. - - shallow <oid> - A client must notify the server of all commits for which it only - has shallow copies (meaning that it doesn't have the parents of - a commit) by supplying a 'shallow <oid>' line for each such - object so that the server is aware of the limitations of the - client's history. This is so that the server is aware that the - client may not have all objects reachable from such commits. - - deepen <depth> - Requests that the fetch/clone should be shallow having a commit - depth of <depth> relative to the remote side. - - deepen-relative - Requests that the semantics of the "deepen" command be changed - to indicate that the depth requested is relative to the client's - current shallow boundary, instead of relative to the requested - commits.
- - deepen-since <timestamp> - Requests that the shallow clone/fetch should be cut at a - specific time, instead of depth. Internally it's equivalent to - doing "git rev-list --max-age=<timestamp>". Cannot be used with - "deepen". - - deepen-not <rev> - Requests that the shallow clone/fetch should be cut at a - specific revision specified by '<rev>', instead of a depth. - Internally it's equivalent of doing "git rev-list --not <rev>". - Cannot be used with "deepen", but can be used with - "deepen-since". - -If the 'filter' feature is advertised, the following argument can be -included in the client's request: - - filter <filter-spec> - Request that various objects from the packfile be omitted - using one of several filtering techniques. These are intended - for use with partial clone and partial fetch operations. See - `rev-list` for possible "filter-spec" values. When communicating - with other processes, senders SHOULD translate scaled integers - (e.g. "1k") into a fully-expanded form (e.g. "1024") to aid - interoperability with older receivers that may not understand - newly-invented scaling suffixes. However, receivers SHOULD - accept the following suffixes: 'k', 'm', and 'g' for 1024, - 1048576, and 1073741824, respectively. - -If the 'ref-in-want' feature is advertised, the following argument can -be included in the client's request as well as the potential addition of -the 'wanted-refs' section in the server's response as explained below. - - want-ref <ref> - Indicates to the server that the client wants to retrieve a - particular ref, where <ref> is the full name of a ref on the - server. - -If the 'sideband-all' feature is advertised, the following argument can be -included in the client's request: - - sideband-all - Instruct the server to send the whole response multiplexed, not just - the packfile section. 
All non-flush and non-delim PKT-LINE in the - response (not only in the packfile section) will then start with a byte - indicating its sideband (1, 2, or 3), and the server may send "0005\2" - (a PKT-LINE of sideband 2 with no payload) as a keepalive packet. - -The response of `fetch` is broken into a number of sections separated by -delimiter packets (0001), with each section beginning with its section -header. - - output = *section - section = (acknowledgments | shallow-info | wanted-refs | packfile) - (flush-pkt | delim-pkt) - - acknowledgments = PKT-LINE("acknowledgments" LF) - (nak | *ack) - (ready) - ready = PKT-LINE("ready" LF) - nak = PKT-LINE("NAK" LF) - ack = PKT-LINE("ACK" SP obj-id LF) - - shallow-info = PKT-LINE("shallow-info" LF) - *PKT-LINE((shallow | unshallow) LF) - shallow = "shallow" SP obj-id - unshallow = "unshallow" SP obj-id - - wanted-refs = PKT-LINE("wanted-refs" LF) - *PKT-LINE(wanted-ref LF) - wanted-ref = obj-id SP refname - - packfile = PKT-LINE("packfile" LF) - *PKT-LINE(%x01-03 *%x00-ff) - - acknowledgments section - * If the client determines that it is finished with negotiations - by sending a "done" line, the acknowledgments sections MUST be - omitted from the server's response. - - * Always begins with the section header "acknowledgments" - - * The server will respond with "NAK" if none of the object ids sent - as have lines were common. - - * The server will respond with "ACK obj-id" for all of the - object ids sent as have lines which are common. - - * A response cannot have both "ACK" lines as well as a "NAK" - line. 
- - * The server will respond with a "ready" line indicating that - the server has found an acceptable common base and is ready to - make and send a packfile (which will be found in the packfile - section of the same response) - - * If the server has found a suitable cut point and has decided - to send a "ready" line, then the server can decide to (as an - optimization) omit any "ACK" lines it would have sent during - its response. This is because the server will have already - determined the objects it plans to send to the client and no - further negotiation is needed. - - shallow-info section - * If the client has requested a shallow fetch/clone, a shallow - client requests a fetch, or the server is shallow, then the - server's response may include a shallow-info section. The - shallow-info section will be included if (due to one of the - above conditions) the server needs to inform the client of any - shallow boundaries or adjustments to the client's already - existing shallow boundaries. - - * Always begins with the section header "shallow-info" - - * If a positive depth is requested, the server will compute the - set of commits which are no deeper than the desired depth. - - * The server sends a "shallow obj-id" line for each commit whose - parents will not be sent in the following packfile. - - * The server sends an "unshallow obj-id" line for each commit - which the client has indicated is shallow, but is no longer - shallow as a result of the fetch (due to its parents being - sent in the following packfile). - - * The server MUST NOT send any "unshallow" lines for anything - which the client has not indicated was shallow as a part of - its request. - - * This section is only included if a packfile section is also - included in the response. - - wanted-refs section - * This section is only included if the client has requested a - ref using a 'want-ref' line and if a packfile section is also - included in the response.
- - * Always begins with the section header "wanted-refs". - - * The server will send a ref listing ("<oid> <refname>") for - each reference requested using 'want-ref' lines. - - * The server MUST NOT send any refs which were not requested - using 'want-ref' lines. - - packfile section - * This section is only included if the client has sent 'want' - lines in its request and either requested that no more - negotiation be done by sending 'done' or if the server has - decided it has found a sufficient cut point to produce a - packfile. - - * Always begins with the section header "packfile" - - * The transmission of the packfile begins immediately after the - section header - - * The data transfer of the packfile is always multiplexed, using - the same semantics of the 'side-band-64k' capability from - protocol version 1. This means that each packet, during the - packfile data stream, is made up of a leading 4-byte pkt-line - length (typical of the pkt-line format), followed by a 1-byte - stream code, followed by the actual data. - - The stream code can be one of: - 1 - pack data - 2 - progress messages - 3 - fatal error message just before stream aborts - -server-option -~~~~~~~~~~~~~ - -If advertised, indicates that any number of server specific options can be -included in a request. This is done by sending each option as a -"server-option=<option>" capability line in the capability-list section of -a request. - -The provided options must not contain a NUL or LF character. diff --git a/third_party/git/Documentation/technical/racy-git.txt b/third_party/git/Documentation/technical/racy-git.txt deleted file mode 100644 index 4a8be4d144..0000000000 --- a/third_party/git/Documentation/technical/racy-git.txt +++ /dev/null @@ -1,201 +0,0 @@ -Use of index and Racy Git problem -================================= - -Background ----------- - -The index is one of the most important data structures in Git. 
-It represents a virtual working tree state by recording list of -paths and their object names and serves as a staging area to -write out the next tree object to be committed. The state is -"virtual" in the sense that it does not necessarily have to, and -often does not, match the files in the working tree. - -There are cases Git needs to examine the differences between the -virtual working tree state in the index and the files in the -working tree. The most obvious case is when the user asks `git -diff` (or its low level implementation, `git diff-files`) or -`git-ls-files --modified`. In addition, Git internally checks -if the files in the working tree are different from what are -recorded in the index to avoid stomping on local changes in them -during patch application, switching branches, and merging. - -In order to speed up this comparison between the files in the -working tree and the index entries, the index entries record the -information obtained from the filesystem via `lstat(2)` system -call when they were last updated. When checking if they differ, -Git first runs `lstat(2)` on the files and compares the result -with this information (this is what was originally done by the -`ce_match_stat()` function, but the current code does it in -`ce_match_stat_basic()` function). If some of these "cached -stat information" fields do not match, Git can tell that the -files are modified without even looking at their contents. - -Note: not all members in `struct stat` obtained via `lstat(2)` -are used for this comparison. For example, `st_atime` obviously -is not useful. Currently, Git compares the file type (regular -files vs symbolic links) and executable bits (only for regular -files) from `st_mode` member, `st_mtime` and `st_ctime` -timestamps, `st_uid`, `st_gid`, `st_ino`, and `st_size` members. -With a `USE_STDEV` compile-time option, `st_dev` is also -compared, but this is not enabled by default because this member -is not stable on network filesystems. 
With `USE_NSEC` -compile-time option, `st_mtim.tv_nsec` and `st_ctim.tv_nsec` -members are also compared. On Linux, this is not enabled by default -because in-core timestamps can have finer granularity than -on-disk timestamps, resulting in meaningless changes when an -inode is evicted from the inode cache. See commit 8ce13b0 -of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git -([PATCH] Sync in core time granularity with filesystems, -2005-01-04). This patch is included in kernel 2.6.11 and newer, but -only fixes the issue for file systems with exactly 1 ns or 1 s -resolution. Other file systems are still broken in current Linux -kernels (e.g. CEPH, CIFS, NTFS, UDF), see -https://lkml.org/lkml/2015/6/9/714 - -Racy Git --------- - -There is one slight problem with the optimization based on the -cached stat information. Consider this sequence: - - : modify 'foo' - $ git update-index 'foo' - : modify 'foo' again, in-place, without changing its size - -The first `update-index` computes the object name of the -contents of file `foo` and updates the index entry for `foo` -along with the `struct stat` information. If the modification -that follows it happens very fast so that the file's `st_mtime` -timestamp does not change, after this sequence, the cached stat -information the index entry records still exactly match what you -would see in the filesystem, even though the file `foo` is now -different. -This way, Git can incorrectly think files in the working tree -are unmodified even though they actually are. This is called -the "racy Git" problem (discovered by Pasky), and the entries -that appear clean when they may not be because of this problem -are called "racily clean". - -To avoid this problem, Git does two things: - -. 
When the cached stat information says the file has not been - modified, and the `st_mtime` is the same as (or newer than) - the timestamp of the index file itself (which is the time `git - update-index foo` finished running in the above example), it - also compares the contents with the object registered in the - index entry to make sure they match. - -. When the index file is updated that contains racily clean - entries, cached `st_size` information is truncated to zero - before writing a new version of the index file. - -Because the index file itself is written after collecting all -the stat information from updated paths, `st_mtime` timestamp of -it is usually the same as or newer than any of the paths the -index contains. And no matter how quick the modification that -follows `git update-index foo` finishes, the resulting -`st_mtime` timestamp on `foo` cannot get a value earlier -than the index file. Therefore, index entries that can be -racily clean are limited to the ones that have the same -timestamp as the index file itself. - -The callers that want to check if an index entry matches the -corresponding file in the working tree continue to call -`ce_match_stat()`, but with this change, `ce_match_stat()` uses -`ce_modified_check_fs()` to see if racily clean ones are -actually clean after comparing the cached stat information using -`ce_match_stat_basic()`. - -The problem the latter solves is this sequence: - - $ git update-index 'foo' - : modify 'foo' in-place without changing its size - : wait for enough time - $ git update-index 'bar' - -Without the latter, the timestamp of the index file gets a newer -value, and falsely clean entry `foo` would not be caught by the -timestamp comparison check done with the former logic anymore. 
-The latter makes sure that the cached stat information for `foo` -would never match with the file in the working tree, so later -checks by `ce_match_stat_basic()` would report that the index entry -does not match the file and Git does not have to fall back on the more -expensive `ce_modified_check_fs()`. - - -Runtime penalty ---------------- - -Falling back to `ce_modified_check_fs()` -from `ce_match_stat()` can be very expensive when there are many -racily clean entries. An obvious way to artificially create -this situation is to give the same timestamp to all the files in -the working tree in a large project, run `git update-index` on -them, and give the same timestamp to the index file: - - $ date >.datestamp - $ git ls-files | xargs touch -r .datestamp - $ git ls-files | git update-index --stdin - $ touch -r .datestamp .git/index - -This will make all index entries racily clean. In the linux project, for -example, there are over 20,000 files in the working tree. On my -Athlon 64 X2 3800+, after the above: - - $ /usr/bin/time git diff-files - 1.68user 0.54system 0:02.22elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k - 0inputs+0outputs (0major+67111minor)pagefaults 0swaps - $ git update-index MAINTAINERS - $ /usr/bin/time git diff-files - 0.02user 0.12system 0:00.14elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k - 0inputs+0outputs (0major+935minor)pagefaults 0swaps - -Running `git update-index` in the middle checked the racily -clean entries, and left the cached `st_mtime` for all the paths -intact because they were actually clean (so this step took about -the same amount of time as the first `git diff-files`). After -that, they are not racily clean anymore but are truly clean, so -the second invocation of `git diff-files` fully took advantage -of the cached stat information.
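
The two safeguards described in the "Racy Git" section can be sketched in a few lines. This is only an illustration of the reasoning, not Git's implementation: `StatInfo`, `is_modified`, and `smudge_for_write` are hypothetical names, only `st_mtime` and `st_size` are modelled, and the content comparison is passed in as a callback. It also assumes, following the timing discussion above, that an entry's size is zeroed only when a racily clean entry turns out to be actually modified.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class StatInfo:
    """Hypothetical subset of the cached stat fields (mtime and size only)."""
    mtime: int
    size: int


def is_modified(cached: StatInfo, current: StatInfo, index_mtime: int,
                contents_differ: Callable[[], bool]) -> bool:
    """Decide whether a working tree file differs from its index entry."""
    if cached != current:
        # Cheap path: the stat information alone proves a change.
        return True
    if cached.mtime >= index_mtime:
        # Racily clean: the stat data cannot be trusted, compare contents.
        return contents_differ()
    return False


def smudge_for_write(cached: StatInfo, index_mtime: int,
                     contents_differ: Callable[[], bool]) -> StatInfo:
    """Before writing a new index, zero the size of racy, modified entries."""
    if cached.mtime >= index_mtime and contents_differ():
        return replace(cached, size=0)
    return cached


# 'foo' was modified in place within the same second as the index write.
foo = StatInfo(mtime=100, size=5)
print(is_modified(foo, foo, index_mtime=100, contents_differ=lambda: True))
# A later index write would hide the race unless the entry is smudged first.
smudged = smudge_for_write(foo, index_mtime=100, contents_differ=lambda: True)
print(is_modified(smudged, foo, index_mtime=200, contents_differ=lambda: True))
```

Once the size is zeroed, the cheap stat comparison alone reports the entry as modified, so the later call never reaches the content comparison.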
- - -Avoiding runtime penalty ------------------------- - -In order to avoid the above runtime penalty, post 1.4.2 Git used -to have code that made sure the index file -got a timestamp newer than the youngest files in the index when -there are many young files with the same timestamp as the -resulting index file would otherwise have, by waiting -before finishing writing the index file out. - -I suspected that in practice the situation where many paths in the -index are all racily clean was quite rare. The only code paths -that can record a recent timestamp for a large number of paths are: - -. Initial `git add .` of a large project. - -. `git checkout` of a large project from an empty index into an - unpopulated working tree. - -Note: switching branches with `git checkout` keeps the cached -stat information of existing working tree files that are the -same between the current branch and the new branch, which are -all older than the resulting index file, and they will not -become racily clean. Only the files that are actually checked -out can become racily clean. - -In a large project where raciness avoidance cost really matters, -however, the initial computation of all object names in the -index takes more than one second, and the index file is written -out after all that happens. Therefore the timestamp of the -index file will be more than one second later than the -youngest file in the working tree. This means that in these -cases there actually will not be any racily clean entry in -the resulting index. - -Based on this discussion, the current code does not use the -"workaround" to avoid the runtime penalty that does not exist in -practice anymore. This was done with commit 0fc82cff on Aug 15, -2006.
diff --git a/third_party/git/Documentation/technical/repository-version.txt b/third_party/git/Documentation/technical/repository-version.txt deleted file mode 100644 index 7844ef30ff..0000000000 --- a/third_party/git/Documentation/technical/repository-version.txt +++ /dev/null @@ -1,102 +0,0 @@ -== Git Repository Format Versions - -Every git repository is marked with a numeric version in the -`core.repositoryformatversion` key of its `config` file. This version -specifies the rules for operating on the on-disk repository data. An -implementation of git which does not understand a particular version -advertised by an on-disk repository MUST NOT operate on that repository; -doing so risks not only producing wrong results, but actually losing -data. - -Because of this rule, version bumps should be kept to an absolute -minimum. Instead, we generally prefer these strategies: - - - bumping format version numbers of individual data files (e.g., - index, packfiles, etc). This restricts the incompatibilities only to - those files. - - - introducing new data that gracefully degrades when used by older - clients (e.g., pack bitmap files are ignored by older clients, which - simply do not take advantage of the optimization they provide). - -A whole-repository format version bump should only be part of a change -that cannot be independently versioned. For instance, if one were to -change the reachability rules for objects, or the rules for locking -refs, that would require a bump of the repository format version. - -Note that this applies only to accessing the repository's disk contents -directly. An older client which understands only format `0` may still -connect via `git://` to a repository using format `1`, as long as the -server process understands format `1`. 
- -The preferred strategy for rolling out a version bump (whether whole -repository or for a single file) is to teach git to read the new format, -and allow writing the new format with a config switch or command line -option (for experimentation or for those who do not care about backwards -compatibility with older gits). Then after a long period to allow the -reading capability to become common, we may switch to writing the new -format by default. - -The currently defined format versions are: - -=== Version `0` - -This is the format defined by the initial version of git, including but -not limited to the format of the repository directory, the repository -configuration file, and the object and ref storage. Specifying the -complete behavior of git is beyond the scope of this document. - -=== Version `1` - -This format is identical to version `0`, with the following exceptions: - - 1. When reading the `core.repositoryformatversion` variable, a git - implementation which supports version 1 MUST also read any - configuration keys found in the `extensions` section of the - configuration file. - - 2. If a version-1 repository specifies any `extensions.*` keys that - the running git has not implemented, the operation MUST NOT - proceed. Similarly, if the value of any known key is not understood - by the implementation, the operation MUST NOT proceed. - -Note that if no extensions are specified in the config file, then -`core.repositoryformatversion` SHOULD be set to `0` (setting it to `1` -provides no benefit, and makes the repository incompatible with older -implementations of git). - -This document will serve as the master list for extensions. Any -implementation wishing to define a new extension should make a note of -it here, in order to claim the name. - -The defined extensions are: - -==== `noop` - -This extension does not change git's behavior at all. It is useful only -for testing format-1 compatibility. 
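
The version-1 rules above amount to a small gate that an implementation runs before touching the repository. The sketch below is a hedged illustration only: `may_operate` and the `known_extensions` table are hypothetical names, extension names are matched exactly, and treating version `0` as ignoring the `extensions` section is an assumption that this document does not state explicitly.

```python
from typing import Callable, Dict


def may_operate(format_version: int,
                extensions: Dict[str, str],
                known_extensions: Dict[str, Callable[[str], bool]]) -> bool:
    """Return True if this (hypothetical) implementation may use the repo."""
    if format_version == 0:
        # Assumption: a version-0 repository is not gated on extensions.
        return True
    if format_version != 1:
        # Unknown repository format: MUST NOT operate.
        return False
    for name, value in extensions.items():
        validator = known_extensions.get(name)
        if validator is None:
            return False  # extension not implemented
        if not validator(value):
            return False  # known extension, but value not understood
    return True


# A toy implementation that only knows the `noop` extension.
KNOWN = {"noop": lambda value: True}

print(may_operate(1, {"noop": "true"}, KNOWN))
print(may_operate(1, {"somethingNew": "1"}, KNOWN))
```

Keeping the validators in a table mirrors the idea of a master list of extensions: supporting a new extension means adding one entry rather than changing the gate itself.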
- -==== `preciousObjects` - -When the config key `extensions.preciousObjects` is set to `true`, -objects in the repository MUST NOT be deleted (e.g., by `git-prune` or -`git repack -d`). - -==== `partialclone` - -When the config key `extensions.partialclone` is set, it indicates -that the repo was created with a partial clone (or later performed -a partial fetch) and that the remote may have omitted sending -certain unwanted objects. Such a remote is called a "promisor remote" -and it promises that all such omitted objects can be fetched from it -in the future. - -The value of this key is the name of the promisor remote. - -==== `worktreeConfig` - -If set, by default "git config" reads from both "config" and -"config.worktree" file from GIT_DIR in that order. In -multiple working directory mode, "config" file is shared while -"config.worktree" is per-working directory (i.e., it's in -GIT_COMMON_DIR/worktrees/<id>/config.worktree) diff --git a/third_party/git/Documentation/technical/rerere.txt b/third_party/git/Documentation/technical/rerere.txt deleted file mode 100644 index aa22d7ace8..0000000000 --- a/third_party/git/Documentation/technical/rerere.txt +++ /dev/null @@ -1,186 +0,0 @@ -Rerere -====== - -This document describes the rerere logic. - -Conflict normalization ----------------------- - -To ensure recorded conflict resolutions can be looked up in the rerere -database, even when branches are merged in a different order, -different branches are merged that result in the same conflict, or -when different conflict style settings are used, rerere normalizes the -conflicts before writing them to the rerere database. - -Different conflict styles and branch names are normalized by stripping -the labels from the conflict markers, and removing the common ancestor -version from the `diff3` conflict style. Branches that are merged -in different order are normalized by sorting the conflict hunks. More -on each of those steps in the following sections. 
- -Once these two normalization operations are applied, a conflict ID is -calculated based on the normalized conflict, which is later used by -rerere to look up the conflict in the rerere database. - -Removing the common ancestor version -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Say we have three branches AB, AC and AC2. The common ancestor of -these branches has a file with a line containing the string "A" (for -brevity this is called "line A" in the rest of the document). In -branch AB this line is changed to "B", in AC, this line is changed to -"C", and branch AC2 is forked off of AC, after the line was changed to -"C". - -Forking a branch ABAC off of branch AB and then merging AC into it, we -get a conflict like the following: - - <<<<<<< HEAD - B - ======= - C - >>>>>>> AC - -Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB -and then merging branch AC2 into it), using the diff3 conflict style, -we get a conflict like the following: - - <<<<<<< HEAD - B - ||||||| merged common ancestors - A - ======= - C - >>>>>>> AC2 - -By resolving this conflict, to leave line D, the user declares: - - After examining what branches AB and AC did, I believe that making - line A into line D is the best thing to do that is compatible with - what AB and AC wanted to do. - -As branch AC2 refers to the same commit as AC, the above implies that -this is also compatible with what AB and AC2 wanted to do. - -By extension, this means that rerere should recognize that the above -conflicts are the same. To do this, the labels on the conflict -markers are stripped, and the common ancestor version is removed. The above -examples would both result in the following normalized conflict: - - <<<<<<< - B - ======= - C - >>>>>>> - -Sorting hunks -~~~~~~~~~~~~~ - -As before, let's imagine that a common ancestor had a file with line A in -its early part, and line X in its late part.
And then four branches -are forked that do these things: - - - AB: changes A to B - - AC: changes A to C - - XY: changes X to Y - - XZ: changes X to Z - -Now, forking a branch ABAC off of branch AB and then merging AC into -it, and forking a branch ACAB off of branch AC and then merging AB -into it, would yield the conflict in a different order. The former -would say "A became B or C, what now?" while the latter would say "A -became C or B, what now?" - -As a reminder, the act of merging AC into ABAC and resolving the -conflict to leave line D means that the user declares: - - After examining what branches AB and AC did, I believe that - making line A into line D is the best thing to do that is - compatible with what AB and AC wanted to do. - -So the conflict we would see when merging AB into ACAB should be -resolved the same way---it is the resolution that is in line with that -declaration. - -Similarly, imagine that a branch XYXZ was previously forked from XY, -and XZ was merged into it, and resolved "X became Y or Z" into "X -became W". - -Now, if a branch ABXY was forked from AB and then merged XY, then ABXY -would have line B in its early part and line Y in its later part. -Such a merge would be quite clean. We can construct 4 combinations -using these four branches ((AB, AC) x (XY, XZ)). - -Merging ABXY and ACXZ would make "an early A became B or C, a late X -became Y or Z" conflict, while merging ACXY and ABXZ would make "an -early A became C or B, a late X became Y or Z". We can see there are -4 combinations of ("B or C", "C or B") x ("Y or Z", "Z or Y"). - -By sorting, the conflict is given its canonical name, namely, "an -early part became B or C, a late part became Y or Z", and whenever -any of these four patterns appears, we can get to the same conflict -and resolution that we saw earlier. - -Without the sorting, we'd have to somehow find a previous resolution -from combinatorial explosion.
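
The normalization steps described so far (strip marker labels, drop the common ancestor version, sort the two sides) can be sketched for a single, non-nested conflict as follows. This is a simplified illustration, not rerere's actual parser: `normalize_conflict` is a hypothetical helper, markers are recognized purely by their 7-character prefix, and text outside the conflict is ignored.

```python
from typing import List


def normalize_conflict(lines: List[str]) -> List[str]:
    """Normalize one non-nested conflict given as a list of lines."""
    ours: List[str] = []
    theirs: List[str] = []
    state = None  # which part of the conflict we are currently reading
    for line in lines:
        if line.startswith("<<<<<<<"):
            state = "ours"
        elif line.startswith("|||||||"):
            state = "base"      # diff3 common ancestor: dropped entirely
        elif line.startswith("======="):
            state = "theirs"
        elif line.startswith(">>>>>>>"):
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    # Sorting makes "B or C" and "C or B" the same conflict.
    first, second = sorted([ours, theirs])
    return ["<<<<<<<", *first, "=======", *second, ">>>>>>>"]


merge_style = ["<<<<<<< HEAD", "B", "=======", "C", ">>>>>>> AC"]
diff3_style = ["<<<<<<< HEAD", "B", "||||||| merged common ancestors", "A",
               "=======", "C", ">>>>>>> AC2"]
reversed_order = ["<<<<<<< HEAD", "C", "=======", "B", ">>>>>>> AB"]

for conflict in (merge_style, diff3_style, reversed_order):
    print(normalize_conflict(conflict))
```

All three inputs normalize to the same five lines, which is what lets a resolution recorded for one of them be found again for the others.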
- -Conflict ID calculation -~~~~~~~~~~~~~~~~~~~~~~~ - -Once the conflict normalization is done, the conflict ID is calculated -as the sha1 hash of the conflict hunks appended to each other, -separated by <NUL> characters. The conflict markers are stripped out -before the sha1 is calculated. So in the example above, where we -merge branch AC which changes line A to line C, into branch AB, which -changes line A to line B, the conflict ID would be -SHA1('B<NUL>C<NUL>'). - -If there are multiple conflicts in one file, the sha1 is calculated -the same way with all hunks appended to each other, in the order in -which they appear in the file, separated by a <NUL> character. - -Nested conflicts -~~~~~~~~~~~~~~~~ - -Nested conflicts are handled very similarly to "simple" conflicts. -Similar to simple conflicts, the conflict is first normalized by -stripping the labels from conflict markers, stripping the common ancestor -version, and then sorting the conflict hunks, both for the outer and the -inner conflict. This is done recursively, so any number of nested -conflicts can be handled. - -Note that this only works for conflict markers that "cleanly nest". If -there are any unmatched conflict markers, rerere will fail both to handle -the conflict and to record a conflict resolution. - -The only difference is in how the conflict ID is calculated. For the -inner conflict, the conflict markers themselves are not stripped out -before calculating the sha1.
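
For a simple (non-nested) conflict, the ID calculation from the "Conflict ID calculation" section can be sketched with the standard `hashlib` module. The helper name `conflict_id` is hypothetical, and the sketch follows the simplified notation of this document, in which each hunk is followed by a NUL byte; how the real implementation treats trailing newlines inside a hunk is not modelled here.

```python
import hashlib
from typing import Iterable


def conflict_id(hunks: Iterable[str]) -> str:
    """SHA-1 over the normalized hunks, each one followed by a NUL byte."""
    digest = hashlib.sha1()
    for hunk in hunks:
        digest.update(hunk.encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


# One conflict: the sorted sides "B" and "C", i.e. SHA1('B<NUL>C<NUL>').
print(conflict_id(["B", "C"]))

# Two conflicts in one file: all hunks in file order, same separator.
print(conflict_id(["B", "C", "Y", "Z"]))
```

Because the input is the normalized conflict, merges that differ only in branch labels, conflict style, or merge order produce the same ID.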
- -Say we have the following conflict for example: - - <<<<<<< HEAD - 1 - ======= - <<<<<<< HEAD - 3 - ======= - 2 - >>>>>>> branch-2 - >>>>>>> branch-3~ - -After stripping out the labels of the conflict markers, and sorting -the hunks, the conflict would look as follows: - - <<<<<<< - 1 - ======= - <<<<<<< - 2 - ======= - 3 - >>>>>>> - >>>>>>> - -and finally the conflict ID would be calculated as: -`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')` diff --git a/third_party/git/Documentation/technical/send-pack-pipeline.txt b/third_party/git/Documentation/technical/send-pack-pipeline.txt deleted file mode 100644 index 9b5a0bc186..0000000000 --- a/third_party/git/Documentation/technical/send-pack-pipeline.txt +++ /dev/null @@ -1,63 +0,0 @@ -Git-send-pack internals -======================= - -Overall operation ------------------ - -. Connects to the remote side and invokes git-receive-pack. - -. Learns what refs the remote has and what commit they point at. - Matches them to the refspecs we are pushing. - -. Checks if there are non-fast-forwards. Unlike fetch-pack, - the repository send-pack runs in is supposed to be a superset - of the recipient in fast-forward cases, so there is no need - for want/have exchanges, and fast-forward check can be done - locally. Tell the result to the other end. - -. Calls pack_objects() which generates a packfile and sends it - over to the other end. - -. If the remote side is new enough (v1.1.0 or later), wait for - the unpack and hook status from the other end. - -. Exit with appropriate error codes. - - -Pack_objects pipeline ---------------------- - -This function gets one file descriptor (`fd`) which is either a -socket (over the network) or a pipe (local). What's written to -this fd goes to git-receive-pack to be unpacked. - - send-pack ---> fd ---> receive-pack - -The function pack_objects creates a pipe and then forks. The -forked child execs pack-objects with --revs to receive revision -parameters from its standard input. 
This process will write the -packfile to the other end. - - send-pack - | - pack_objects() ---> fd ---> receive-pack - | ^ (pipe) - v | - (child) - -The child dup2's to arrange its standard output to go back to -the other end, and read its standard input to come from the -pipe. After that it exec's pack-objects. On the other hand, -the parent process, before starting to feed the child pipeline, -closes the reading side of the pipe and fd to receive-pack. - - send-pack - | - pack_objects(parent) - | - v [0] - pack-objects [0] ---> receive-pack - - -[jc: the pipeline was much more complex and needed documentation before - I understood an earlier bug, but now it is trivial and straightforward.] diff --git a/third_party/git/Documentation/technical/shallow.txt b/third_party/git/Documentation/technical/shallow.txt deleted file mode 100644 index 01dedfe9ff..0000000000 --- a/third_party/git/Documentation/technical/shallow.txt +++ /dev/null @@ -1,60 +0,0 @@ -Shallow commits -=============== - -.Definition -********************************************************* -Shallow commits do have parents, but not in the shallow -repo, and therefore grafts are introduced pretending that -these commits have no parents. -********************************************************* - -$GIT_DIR/shallow lists commit object names and tells Git to -pretend as if they are root commits (e.g. "git log" traversal -stops after showing them; "git fsck" does not complain saying -the commits listed on their "parent" lines do not exist). - -Each line contains exactly one SHA-1. When read, a commit_graft -will be constructed, which has nr_parent < 0 to make it easier -to discern from user provided grafts. - -Note that the shallow feature could not be changed easily to -use replace refs: a commit containing a `mergetag` is not allowed -to be replaced, not even by a root commit. Such a commit can be -made shallow, though. 
Also, having a `shallow` file explicitly -listing all the commits made shallow makes it a *lot* easier to -do shallow-specific things such as to deepen the history. - -Since fsck-objects relies on the library to read the objects, -it honours shallow commits automatically. - -There are some unfinished ends of the whole shallow business: - -- maybe we have to force non-thin packs when fetching into a - shallow repo (ATM they are forced non-thin). - -- A special handling of a shallow upstream is needed. At some - stage, upload-pack has to check if it sends a shallow commit, - and it should send that information early (or fail, if the - client does not support shallow repositories). There is no - support at all for this in this patch series. - -- Instead of locking $GIT_DIR/shallow at the start, just - the timestamp of it is noted, and when it comes to writing it, - a check is performed if the mtime is still the same, dying if - it is not. - -- It is unclear how "push into/from a shallow repo" should behave. - -- If you deepen a history, you'd want to get the tags of the - newly stored (but older!) commits. This does not work right now. - -To make a shallow clone, you can call "git-clone --depth 20 repo". -The result contains only commit chains with a length of at most 20. -It also writes an appropriate $GIT_DIR/shallow. - -You can deepen a shallow repository with "git-fetch --depth 20 -repo branch", which will fetch branch from repo, but stop at depth -20, updating $GIT_DIR/shallow. - -The special depth 2147483647 (or 0x7fffffff, the largest positive -number a signed 32-bit integer can contain) means infinite depth. 
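
As a rough illustration of how the `shallow` file affects history traversal, the sketch below parses the one-object-name-per-line format and walks a toy commit graph, treating every listed commit as if it were a root. The names `parse_shallow`, `walk_history`, and `INFINITE_DEPTH`, as well as the short commit names, are hypothetical and chosen only for readability; real entries are full object names.

```python
from typing import Dict, List, Set

# Special depth value meaning "infinite depth" (0x7fffffff).
INFINITE_DEPTH = 2147483647


def parse_shallow(text: str) -> Set[str]:
    """Each non-empty line of $GIT_DIR/shallow holds exactly one object name."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def walk_history(start: str, parents: Dict[str, List[str]],
                 shallow: Set[str]) -> List[str]:
    """Depth-first walk that pretends shallow commits have no parents."""
    seen: Set[str] = set()
    order: List[str] = []
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        if commit in shallow:
            continue  # treated as a root commit: do not follow parents
        stack.extend(parents.get(commit, []))
    return order


# Toy linear history c3 -> c2 -> c1, with c2 recorded as shallow.
parents = {"c3": ["c2"], "c2": ["c1"], "c1": []}
shallow = parse_shallow("c2\n")
print(walk_history("c3", parents, shallow))
```

The walk stops after showing `c2`, which matches the behaviour described for "git log" at the top of this file.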
diff --git a/third_party/git/Documentation/technical/signature-format.txt b/third_party/git/Documentation/technical/signature-format.txt deleted file mode 100644 index 2c9406a56a..0000000000 --- a/third_party/git/Documentation/technical/signature-format.txt +++ /dev/null @@ -1,186 +0,0 @@ -Git signature format -==================== - -== Overview - -Git uses cryptographic signatures in various places, currently objects (tags, -commits, mergetags) and transactions (pushes). In every case, the command which -is about to create an object or transaction determines a payload from that, -calls gpg to obtain a detached signature for the payload (`gpg -bsa`) and -embeds the signature into the object or transaction. - -Signatures always begin with `-----BEGIN PGP SIGNATURE-----` -and end with `-----END PGP SIGNATURE-----`, unless gpg is told to -produce RFC1991 signatures which use `MESSAGE` instead of `SIGNATURE`. - -The signed payload and the way the signature is embedded depends -on the type of the object resp. transaction. 
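
Because every embedded signature is delimited by the same begin and end lines, a consumer can locate it with a simple pattern match. The following sketch uses only the standard `re` module; `find_signature_block` is a hypothetical helper, and it returns the block exactly as it appears in the text, so any per-line indentation added by the embedding is left in place.

```python
import re
from typing import Optional

# Matches SIGNATURE blocks as well as the RFC1991 MESSAGE variant.
_SIGNATURE_RE = re.compile(
    r"-----BEGIN PGP (SIGNATURE|MESSAGE)-----.*?-----END PGP \1-----",
    re.DOTALL,
)


def find_signature_block(text: str) -> Optional[str]:
    """Return the first armored signature block in text, or None."""
    match = _SIGNATURE_RE.search(text)
    return match.group(0) if match else None


# Hypothetical object text with a shortened, fake signature body.
example = (
    "signed tag\n"
    "\n"
    "signed tag message body\n"
    "-----BEGIN PGP SIGNATURE-----\n"
    "Version: GnuPG v1\n"
    "\n"
    "ZmFrZSBzaWduYXR1cmU=\n"
    "-----END PGP SIGNATURE-----\n"
)

block = find_signature_block(example)
print(block is not None and block.startswith("-----BEGIN PGP SIGNATURE-----"))
print(find_signature_block("unsigned message\n"))
```

The back-reference `\1` ensures that a block opened with `SIGNATURE` is closed with `SIGNATURE`, and likewise for `MESSAGE`.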
-
-== Tag signatures
-
-- created by: `git tag -s`
-- payload: annotated tag object
-- embedding: append the signature to the unsigned tag object
-- example: tag `signedtag` with subject `signed tag`
-
-----
-object 04b871796dc0420f8e7561a895b52484b701d51a
-type commit
-tag signedtag
-tagger C O Mitter <committer@example.com> 1465981006 +0000
-
-signed tag
-
-signed tag message body
------BEGIN PGP SIGNATURE-----
-Version: GnuPG v1
-
-iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
-rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
-8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
-q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
-rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
-lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
-=jpXa
------END PGP SIGNATURE-----
-----
-
-- verify with: `git verify-tag [-v]` or `git tag -v`
-
-----
-gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
-gpg: Good signature from "Eris Discordia <discord@example.net>"
-gpg: WARNING: This key is not certified with a trusted signature!
-gpg: There is no indication that the signature belongs to the owner.
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-object 04b871796dc0420f8e7561a895b52484b701d51a
-type commit
-tag signedtag
-tagger C O Mitter <committer@example.com> 1465981006 +0000
-
-signed tag
-
-signed tag message body
-----
-
-== Commit signatures
-
-- created by: `git commit -S`
-- payload: commit object
-- embedding: header entry `gpgsig`
-  (content is preceded by a space)
-- example: commit with subject `signed commit`
-
-----
-tree eebfed94e75e7760540d1485c740902590a00332
-parent 04b871796dc0420f8e7561a895b52484b701d51a
-author A U Thor <author@example.com> 1465981137 +0000
-committer C O Mitter <committer@example.com> 1465981137 +0000
-gpgsig -----BEGIN PGP SIGNATURE-----
- Version: GnuPG v1
- 
- iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/
- HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7
- DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA
- zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4
- HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1
- EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I=
- =jKHM
- -----END PGP SIGNATURE-----
-
-signed commit
-
-signed commit message body
-----
-
-- verify with: `git verify-commit [-v]` (or `git show --show-signature`)
-
-----
-gpg: Signature made Wed Jun 15 10:58:57 2016 CEST using RSA key ID B7227189
-gpg: Good signature from "Eris Discordia <discord@example.net>"
-gpg: WARNING: This key is not certified with a trusted signature!
-gpg: There is no indication that the signature belongs to the owner.
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-tree eebfed94e75e7760540d1485c740902590a00332
-parent 04b871796dc0420f8e7561a895b52484b701d51a
-author A U Thor <author@example.com> 1465981137 +0000
-committer C O Mitter <committer@example.com> 1465981137 +0000
-
-signed commit
-
-signed commit message body
-----
-
-== Mergetag signatures
-
-- created by: `git merge` on signed tag
-- payload/embedding: the whole signed tag object is embedded into
-  the (merge) commit object as header entry `mergetag`
-- example: merge of the signed tag `signedtag` as above
-
-----
-tree c7b1cff039a93f3600a1d18b82d26688668c7dea
-parent c33429be94b5f2d3ee9b0adad223f877f174b05d
-parent 04b871796dc0420f8e7561a895b52484b701d51a
-author A U Thor <author@example.com> 1465982009 +0000
-committer C O Mitter <committer@example.com> 1465982009 +0000
-mergetag object 04b871796dc0420f8e7561a895b52484b701d51a
- type commit
- tag signedtag
- tagger C O Mitter <committer@example.com> 1465981006 +0000
- 
- signed tag
- 
- signed tag message body
- -----BEGIN PGP SIGNATURE-----
- Version: GnuPG v1
- 
- iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
- rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
- 8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
- q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
- rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
- lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
- =jpXa
- -----END PGP SIGNATURE-----
-
-Merge tag 'signedtag' into downstream
-
-signed tag
-
-signed tag message body
-
-# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
-# gpg: Good signature from "Eris Discordia <discord@example.net>"
-# gpg: WARNING: This key is not certified with a trusted signature!
-# gpg: There is no indication that the signature belongs to the owner.
-# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-----
-
-- verify with: verification is embedded in merge commit message by default,
-  alternatively with `git show --show-signature`:
-
-----
-commit 9863f0c76ff78712b6800e199a46aa56afbcbd49
-merged tag 'signedtag'
-gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189
-gpg: Good signature from "Eris Discordia <discord@example.net>"
-gpg: WARNING: This key is not certified with a trusted signature!
-gpg: There is no indication that the signature belongs to the owner.
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-Merge: c33429b 04b8717
-Author: A U Thor <author@example.com>
-Date: Wed Jun 15 09:13:29 2016 +0000
-
-    Merge tag 'signedtag' into downstream
-
-    signed tag
-
-    signed tag message body
-
-    # gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189
-    # gpg: Good signature from "Eris Discordia <discord@example.net>"
-    # gpg: WARNING: This key is not certified with a trusted signature!
-    # gpg: There is no indication that the signature belongs to the owner.
-    # Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189
-----
diff --git a/third_party/git/Documentation/technical/trivial-merge.txt b/third_party/git/Documentation/technical/trivial-merge.txt
deleted file mode 100644
index 1f1c33d0da..0000000000
--- a/third_party/git/Documentation/technical/trivial-merge.txt
+++ /dev/null
@@ -1,121 +0,0 @@
-Trivial merge rules
-===================
-
-This document describes the outcomes of the trivial merge logic in read-tree.
-
-One-way merge
--------------
-
-This replaces the index with a different tree, keeping the stat info
-for entries that don't change, and allowing -u to make the minimum
-required changes to the working tree to have it match.
-
-Entries marked '+' have stat information. Spaces marked '*' don't
-affect the result.
-
-   index   tree    result
-   -----------------------
-   *       (empty) (empty)
-   (empty) tree    tree
-   index+  tree    tree
-   index+  index   index+
-
-Two-way merge
--------------
-
-It is permitted for the index to lack an entry; this does not prevent
-any case from applying.
-
-If the index exists, it is an error for it not to match either the old
-or the result.
-
-If multiple cases apply, the one used is listed first.
-
-A result which changes the index is an error if the index is not empty
-and not up to date.
-
-Entries marked '+' have stat information. Spaces marked '*' don't
-affect the result.
-
- case  index   old     new     result
- -------------------------------------
- 0/2   (empty) *       (empty) (empty)
- 1/3   (empty) *       new     new
- 4/5   index+  (empty) (empty) index+
- 6/7   index+  (empty) index   index+
- 10    index+  index   (empty) (empty)
- 14/15 index+  old     old     index+
- 18/19 index+  old     index   index+
- 20    index+  index   new     new
-
-Three-way merge
----------------
-
-It is permitted for the index to lack an entry; this does not prevent
-any case from applying.
-
-If the index exists, it is an error for it not to match either the
-head or (if the merge is trivial) the result.
-
-If multiple cases apply, the one used is listed first.
-
-A result of "no merge" means that index is left in stage 0, ancest in
-stage 1, head in stage 2, and remote in stage 3 (if any of these are
-empty, no entry is left for that stage). Otherwise, the given entry is
-left in stage 0, and there are no other entries.
-
-A result of "no merge" is an error if the index is not empty and not
-up to date.
-
-*empty* means that the tree must not have a directory-file conflict
-  with the entry.
-
-For multiple ancestors, a '+' means that this case applies even if
-only one ancestor or remote fits; a '^' means all of the ancestors
-must be the same.
-
- case  ancest    head    remote    result
- ----------------------------------------
- 1     (empty)+  (empty) (empty)   (empty)
- 2ALT  (empty)+  *empty* remote    remote
- 2     (empty)^  (empty) remote    no merge
- 3ALT  (empty)+  head    *empty*   head
- 3     (empty)^  head    (empty)   no merge
- 4     (empty)^  head    remote    no merge
- 5ALT  *         head    head      head
- 6     ancest+   (empty) (empty)   no merge
- 8     ancest^   (empty) ancest    no merge
- 7     ancest+   (empty) remote    no merge
- 10    ancest^   ancest  (empty)   no merge
- 9     ancest+   head    (empty)   no merge
- 16    anc1/anc2 anc1    anc2      no merge
- 13    ancest+   head    ancest    head
- 14    ancest+   ancest  remote    remote
- 11    ancest+   head    remote    no merge
-
-Only #2ALT and #3ALT use *empty*, because these are the only cases
-where there can be conflicts that didn't exist before. Note that we
-allow directory-file conflicts between things in different stages
-after the trivial merge.
-
-A possible alternative for #6 is (empty), which would make it like
-#1. This is not used, due to the likelihood that it arises due to
-moving the file to multiple different locations or moving and deleting
-it in different branches.
-
-Case #1 is included for completeness, and also in case we decide to
-put on '+' markings; any path that is never mentioned at all isn't
-handled.
-
-Note that #16 is when both #13 and #14 apply; in this case, we refuse
-the trivial merge, because we can't tell from this data which is
-right. This is a case of a reverted patch (in some direction, maybe
-multiple times), and the right answer depends on looking at crossings
-of history or common ancestors of the ancestors.
-
-Note that, between #6, #7, #9, and #11, all cases not otherwise
-covered are handled in this table.
-
-For #8 and #10, there is alternative behavior, not currently
-implemented, where the result is (empty). As currently implemented,
-the automatic merge will generally give this effect.
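The three-way table above can be read as a small decision function. The Python sketch below is a simplified illustration, not the read-tree code: it assumes a single ancestor (so the '+' and '^' markings and case #16 do not arise), ignores the index and the directory-file check behind *empty*, and treats the table's `head`, `remote`, and `ancest` symbols as distinct values unless a row repeats one. `None` stands for (empty); the names `trivial_three_way` and `NO_MERGE` are hypothetical.

```python
# Sentinel for the table's "no merge" result; None means (empty).
NO_MERGE = object()


def trivial_three_way(ancest, head, remote):
    """Return the stage-0 result for one path, or NO_MERGE.

    Each argument is an opaque entry identifier (for example a blob
    name) or None when the path is absent on that side.
    """
    if head == remote:
        if head is not None:
            return head                                  # 5ALT
        return None if ancest is None else NO_MERGE      # 1, 6
    if ancest == head:
        # Only the remote side changed: take it unless it deleted the path.
        return remote if remote is not None else NO_MERGE  # 2ALT, 14, 10
    if ancest == remote:
        # Only the head side changed: keep it unless it deleted the path.
        return head if head is not None else NO_MERGE      # 3ALT, 13, 8
    return NO_MERGE                                        # 4, 7, 9, 11
```

For instance, `trivial_three_way("a", "a", "r")` corresponds to case #14 and yields `"r"`, while `trivial_three_way("a", "h", "r")` corresponds to case #11 and yields `NO_MERGE`.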