about summary refs log tree commit diff
path: root/third_party/git/Documentation/technical/reftable.txt
diff options
context:
space:
mode:
Diffstat (limited to 'third_party/git/Documentation/technical/reftable.txt')
-rw-r--r--third_party/git/Documentation/technical/reftable.txt1083
1 files changed, 0 insertions, 1083 deletions
diff --git a/third_party/git/Documentation/technical/reftable.txt b/third_party/git/Documentation/technical/reftable.txt
deleted file mode 100644
index 2951840e9c..0000000000
--- a/third_party/git/Documentation/technical/reftable.txt
+++ /dev/null
@@ -1,1083 +0,0 @@
-reftable
---------
-
-Overview
-~~~~~~~~
-
-Problem statement
-^^^^^^^^^^^^^^^^^
-
-Some repositories contain a lot of references (e.g. android at 866k,
-rails at 31k). The existing packed-refs format takes up a lot of space
-(e.g. 62M), and does not scale with additional references. Lookup of a
-single reference requires linearly scanning the file.
-
-Atomic pushes modifying multiple references require copying the entire
-packed-refs file, which can be a considerable amount of data moved
-(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
-
-Repositories with many loose references occupy a large number of disk
-blocks from the local file system, as each reference is its own file
-storing 41 bytes (and another file for the corresponding reflog). This
-negatively affects the number of inodes available when a large number of
-repositories are stored on the same filesystem. Readers can be penalized
-due to the larger number of syscalls required to traverse and read the
-`$GIT_DIR/refs` directory.
-
-
-Objectives
-^^^^^^^^^^
-
-* Near constant time lookup for any single reference, even when the
-repository is cold and not in process or kernel cache.
-* Near constant time verification if an object name is referred to by at least
-one reference (for allow-tip-sha1-in-want).
-* Efficient enumeration of an entire namespace, such as `refs/tags/`.
-* Support atomic push with `O(size_of_update)` operations.
-* Combine reflog storage with ref storage for small transactions.
-* Separate reflog storage for base refs and historical logs.
-
-Description
-^^^^^^^^^^^
-
-A reftable file is a portable binary file format customized for
-reference storage. References are sorted, enabling linear scans, binary
-search lookup, and range scans.
-
-Storage in the file is organized into variable sized blocks. Prefix
-compression is used within a single block to reduce disk space. Block
-size and alignment is tunable by the writer.
-
-Performance
-^^^^^^^^^^^
-
-Space used, packed-refs vs. reftable:
-
-[cols=",>,>,>,>,>",options="header",]
-|===============================================================
-|repository |packed-refs |reftable |% original |avg ref |avg obj
-|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
-|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
-|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
-|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
-|===============================================================
-
-Scan (read 866k refs), by reference name lookup (single ref from 866k
-refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
-
-[cols=",>,>,>,>",options="header",]
-|=========================================================
-|format |cache |scan |by name |by SHA-1
-|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
-|packed-refs |hot | |6,844.6 usec |20,110.1 usec
-|reftable |cold |112 ms |33.9 usec |323.2 usec
-|reftable |hot | |20.2 usec |320.8 usec
-|=========================================================
-
-Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
-
-[cols=",>,>",options="header",]
-|================================
-|format |size |avg entry
-|$GIT_DIR/logs |173 M |1209 bytes
-|reftable |5 M |37 bytes
-|================================
-
-Details
-~~~~~~~
-
-Peeling
-^^^^^^^
-
-References stored in a reftable are peeled, a record for an annotated
-(or signed) tag records both the tag object, and the object it refers
-to. This is analogous to storage in the packed-refs format.
-
-Reference name encoding
-^^^^^^^^^^^^^^^^^^^^^^^
-
-Reference names are an uninterpreted sequence of bytes that must pass
-linkgit:git-check-ref-format[1] as a valid reference name.
-
-Key unicity
-^^^^^^^^^^^
-
-Each entry must have a unique key; repeated keys are disallowed.
-
-Network byte order
-^^^^^^^^^^^^^^^^^^
-
-All multi-byte, fixed width fields are in network byte order.
-
-Varint encoding
-^^^^^^^^^^^^^^^
-
-Varint encoding is identical to the ofs-delta encoding method used
-within pack files.
-
-Decoder works such as:
-
-....
-val = buf[ptr] & 0x7f
-while (buf[ptr] & 0x80) {
-  ptr++
-  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
-}
-....
-
-Ordering
-^^^^^^^^
-
-Blocks are lexicographically ordered by their first reference.
-
-Directory/file conflicts
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-The reftable format accepts both `refs/heads/foo` and
-`refs/heads/foo/bar` as distinct references.
-
-This property is useful for retaining log records in reftable, but may
-confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
-references. Users of reftable may choose to continue to reject `foo` and
-`foo/bar` type conflicts to prevent problems for peers.
-
-File format
-~~~~~~~~~~~
-
-Structure
-^^^^^^^^^
-
-A reftable file has the following high-level structure:
-
-....
-first_block {
-  header
-  first_ref_block
-}
-ref_block*
-ref_index*
-obj_block*
-obj_index*
-log_block*
-log_index*
-footer
-....
-
-A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
-`obj_index` sections, containing only the file header and log block:
-
-....
-first_block {
-  header
-}
-log_block*
-log_index*
-footer
-....
-
-in a log-only file the first log block immediately follows the file
-header, without padding to block alignment.
-
-Block size
-^^^^^^^^^^
-
-The file's block size is arbitrarily determined by the writer, and does
-not have to be a power of 2. The block size must be larger than the
-longest reference name or log entry used in the repository, as
-references cannot span blocks.
-
-Powers of two that are friendly to the virtual memory system or
-filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
-yield better compression, with a possible increased cost incurred by
-readers during access.
-
-The largest block size is `16777215` bytes (15.99 MiB).
-
-Block alignment
-^^^^^^^^^^^^^^^
-
-Writers may choose to align blocks at multiples of the block size by
-including `padding` filled with NUL bytes at the end of a block to round
-out to the chosen alignment. When alignment is used, writers must
-specify the alignment with the file header's `block_size` field.
-
-Block alignment is not required by the file format. Unaligned files must
-set `block_size = 0` in the file header, and omit `padding`. Unaligned
-files with more than one ref block must include the link:#Ref-index[ref
-index] to support fast lookup. Readers must be able to read both aligned
-and non-aligned files.
-
-Very small files (e.g. a single ref block) may omit `padding` and the ref
-index to reduce total file size.
-
-Header (version 1)
-^^^^^^^^^^^^^^^^^^
-
-A 24-byte header appears at the beginning of the file:
-
-....
-'REFT'
-uint8( version_number = 1 )
-uint24( block_size )
-uint64( min_update_index )
-uint64( max_update_index )
-....
-
-Aligned files must specify `block_size` to configure readers with the
-expected block alignment. Unaligned files must set `block_size = 0`.
-
-The `min_update_index` and `max_update_index` describe bounds for the
-`update_index` field of all log records in this file. When reftables are
-used in a stack for link:#Update-transactions[transactions], these
-fields can order the files such that the prior file's
-`max_update_index + 1` is the next file's `min_update_index`.
-
-Header (version 2)
-^^^^^^^^^^^^^^^^^^
-
-A 28-byte header appears at the beginning of the file:
-
-....
-'REFT'
-uint8( version_number = 2 )
-uint24( block_size )
-uint64( min_update_index )
-uint64( max_update_index )
-uint32( hash_id )
-....
-
-The header is identical to `version_number=1`, with the 4-byte hash ID
-("sha1" for SHA1 and "s256" for SHA-256) append to the header.
-
-For maximum backward compatibility, it is recommended to use version 1 when
-writing SHA1 reftables.
-
-First ref block
-^^^^^^^^^^^^^^^
-
-The first ref block shares the same block as the file header, and is 24
-bytes smaller than all other blocks in the file. The first block
-immediately begins after the file header, at position 24.
-
-If the first block is a log block (a log-only file), its block header
-begins immediately at position 24.
-
-Ref block format
-^^^^^^^^^^^^^^^^
-
-A ref block is written as:
-
-....
-'r'
-uint24( block_len )
-ref_record+
-uint24( restart_offset )+
-uint16( restart_count )
-
-padding?
-....
-
-Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
-encodes the number of bytes in the block up to, but not including the
-optional `padding`. This is always less than or equal to the file's
-block size. In the first ref block, `block_len` includes 24 bytes for
-the file header.
-
-The 2-byte `restart_count` stores the number of entries in the
-`restart_offset` list, which must not be empty. Readers can use
-`restart_count` to binary search between restarts before starting a
-linear scan.
-
-Exactly `restart_count` 3-byte `restart_offset` values precedes the
-`restart_count`. Offsets are relative to the start of the block and
-refer to the first byte of any `ref_record` whose name has not been
-prefix compressed. Entries in the `restart_offset` list must be sorted,
-ascending. Readers can start linear scans from any of these records.
-
-A variable number of `ref_record` fill the middle of the block,
-describing reference names and values. The format is described below.
-
-As the first ref block shares the first file block with the file header,
-all `restart_offset` in the first block are relative to the start of the
-file (position 0), and include the file header. This forces the first
-`restart_offset` to be `28`.
-
-ref record
-++++++++++
-
-A `ref_record` describes a single reference, storing both the name and
-its value(s). Records are formatted as:
-
-....
-varint( prefix_length )
-varint( (suffix_length << 3) | value_type )
-suffix
-varint( update_index_delta )
-value?
-....
-
-The `prefix_length` field specifies how many leading bytes of the prior
-reference record's name should be copied to obtain this reference's
-name. This must be 0 for the first reference in any block, and also must
-be 0 for any `ref_record` whose offset is listed in the `restart_offset`
-table at the end of the block.
-
-Recovering a reference name from any `ref_record` is a simple concat:
-
-....
-this_name = prior_name[0..prefix_length] + suffix
-....
-
-The `suffix_length` value provides the number of bytes available in
-`suffix` to copy from `suffix` to complete the reference name.
-
-The `update_index` that last modified the reference can be obtained by
-adding `update_index_delta` to the `min_update_index` from the file
-header: `min_update_index + update_index_delta`.
-
-The `value` follows. Its format is determined by `value_type`, one of
-the following:
-
-* `0x0`: deletion; no value data (see transactions, below)
-* `0x1`: one object name; value of the ref
-* `0x2`: two object names; value of the ref, peeled target
-* `0x3`: symbolic reference: `varint( target_len ) target`
-
-Symbolic references use `0x3`, followed by the complete name of the
-reference target. No compression is applied to the target name.
-
-Types `0x4..0x7` are reserved for future use.
-
-Ref index
-^^^^^^^^^
-
-The ref index stores the name of the last reference from every ref block
-in the file, enabling reduced disk seeks for lookups. Any reference can
-be found by searching the index, identifying the containing block, and
-searching within that block.
-
-The index may be organized into a multi-level index, where the 1st level
-index block points to additional ref index blocks (2nd level), which may
-in turn point to either additional index blocks (e.g. 3rd level) or ref
-blocks (leaf level). Disk reads required to access a ref go up with
-higher index levels. Multi-level indexes may be required to ensure no
-single index block exceeds the file format's max block size of
-`16777215` bytes (15.99 MiB). To achieve constant O(1) disk seeks for
-lookups the index must be a single level, which is permitted to exceed
-the file's configured block size, but not the format's max block size of
-15.99 MiB.
-
-If present, the ref index block(s) appears after the last ref block.
-
-If there are at least 4 ref blocks, a ref index block should be written
-to improve lookup times. Cold reads using the index require 2 disk reads
-(read index, read block), and binary searching < 4 blocks also requires
-<= 2 reads. Omitting the index block from smaller files saves space.
-
-If the file is unaligned and contains more than one ref block, the ref
-index must be written.
-
-Index block format:
-
-....
-'i'
-uint24( block_len )
-index_record+
-uint24( restart_offset )+
-uint16( restart_count )
-
-padding?
-....
-
-The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
-which encodes the number of bytes in the block, up to but not including
-the optional `padding`.
-
-The `restart_offset` and `restart_count` fields are identical in format,
-meaning and usage as in ref blocks.
-
-To reduce the number of reads required for random access in very large
-files the index block may be larger than other blocks. However, readers
-must hold the entire index in memory to benefit from this, so it's a
-time-space tradeoff in both file size and reader memory.
-
-Increasing the file's block size decreases the index size. Alternatively
-a multi-level index may be used, keeping index blocks within the file's
-block size, but increasing the number of blocks that need to be
-accessed.
-
-index record
-++++++++++++
-
-An index record describes the last entry in another block. Index records
-are written as:
-
-....
-varint( prefix_length )
-varint( (suffix_length << 3) | 0 )
-suffix
-varint( block_position )
-....
-
-Index records use prefix compression exactly like `ref_record`.
-
-Index records store `block_position` after the suffix, specifying the
-absolute position in bytes (from the start of the file) of the block
-that ends with this reference. Readers can seek to `block_position` to
-begin reading the block header.
-
-Readers must examine the block header at `block_position` to determine
-if the next block is another level index block, or the leaf-level ref
-block.
-
-Reading the index
-+++++++++++++++++
-
-Readers loading the ref index must first read the footer (below) to
-obtain `ref_index_position`. If not present, the position will be 0. The
-`ref_index_position` is for the 1st level root of the ref index.
-
-Obj block format
-^^^^^^^^^^^^^^^^
-
-Object blocks are optional. Writers may choose to omit object blocks,
-especially if readers will not use the object name to ref mapping.
-
-Object blocks use unique, abbreviated 2-32 object name keys, mapping to
-ref blocks containing references pointing to that object directly, or as
-the peeled value of an annotated tag. Like ref blocks, object blocks use
-the file's standard block size. The abbrevation length is available in
-the footer as `obj_id_len`.
-
-To save space in small files, object blocks may be omitted if the ref
-index is not present, as brute force search will only need to read a few
-ref blocks. When missing, readers should brute force a linear search of
-all references to lookup by object name.
-
-An object block is written as:
-
-....
-'o'
-uint24( block_len )
-obj_record+
-uint24( restart_offset )+
-uint16( restart_count )
-
-padding?
-....
-
-Fields are identical to ref block. Binary search using the restart table
-works the same as in reference blocks.
-
-Because object names are abbreviated by writers to the shortest unique
-abbreviation within the reftable, obj key lengths have a variable length. Their
-length must be at least 2 bytes. Readers must compare only for common prefix
-match within an obj block or obj index.
-
-obj record
-++++++++++
-
-An `obj_record` describes a single object abbreviation, and the blocks
-containing references using that unique abbreviation:
-
-....
-varint( prefix_length )
-varint( (suffix_length << 3) | cnt_3 )
-suffix
-varint( cnt_large )?
-varint( position_delta )*
-....
-
-Like in reference blocks, abbreviations are prefix compressed within an
-obj block. On large reftables with many unique objects, higher block
-sizes (64k), and higher restart interval (128), a `prefix_length` of 2
-or 3 and `suffix_length` of 3 may be common in obj records (unique
-abbreviation of 5-6 raw bytes, 10-12 hex digits).
-
-Each record contains `position_count` number of positions for matching
-ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
-`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
-
-The use of `cnt_3` bets most objects are pointed to by only a single
-reference, some may be pointed to by a couple of references, and very
-few (if any) are pointed to by more than 7 references.
-
-A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
-`position_delta`, but at least one reference starts with this
-abbreviation. A reader that needs exact reference names must scan all
-references to find which specific references have the desired object.
-Writers should use this format when the `position_delta` list would have
-overflowed the file's block size due to a high number of references
-pointing to the same object.
-
-The first `position_delta` is the position from the start of the file.
-Additional `position_delta` entries are sorted ascending and relative to
-the prior entry, e.g. a reader would perform:
-
-....
-pos = position_delta[0]
-prior = pos
-for (j = 1; j < position_count; j++) {
-  pos = prior + position_delta[j]
-  prior = pos
-}
-....
-
-With a position in hand, a reader must linearly scan the ref block,
-starting from the first `ref_record`, testing each reference's object names
-(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
-object name within a single ref block is not supported by the reftable format.
-Smaller block sizes reduce the number of candidates this step must
-consider.
-
-Obj index
-^^^^^^^^^
-
-The obj index stores the abbreviation from the last entry for every obj
-block in the file, enabling reduced disk seeks for all lookups. It is
-formatted exactly the same as the ref index, but refers to obj blocks.
-
-The obj index should be present if obj blocks are present, as obj blocks
-should only be written in larger files.
-
-Readers loading the obj index must first read the footer (below) to
-obtain `obj_index_position`. If not present, the position will be 0.
-
-Log block format
-^^^^^^^^^^^^^^^^
-
-Unlike ref and obj blocks, log blocks are always unaligned.
-
-Log blocks are variable in size, and do not match the `block_size`
-specified in the file header or footer. Writers should choose an
-appropriate buffer size to prepare a log block for deflation, such as
-`2 * block_size`.
-
-A log block is written as:
-
-....
-'g'
-uint24( block_len )
-zlib_deflate {
-  log_record+
-  uint24( restart_offset )+
-  uint16( restart_count )
-}
-....
-
-Log blocks look similar to ref blocks, except `block_type = 'g'`.
-
-The 4-byte block header is followed by the deflated block contents using
-zlib deflate. The `block_len` in the header is the inflated size
-(including 4-byte block header), and should be used by readers to
-preallocate the inflation output buffer. A log block's `block_len` may
-exceed the file's block size.
-
-Offsets within the log block (e.g. `restart_offset`) still include the
-4-byte header. Readers may prefer prefixing the inflation output buffer
-with the 4-byte header.
-
-Within the deflate container, a variable number of `log_record` describe
-reference changes. The log record format is described below. See ref
-block format (above) for a description of `restart_offset` and
-`restart_count`.
-
-Because log blocks have no alignment or padding between blocks, readers
-must keep track of the bytes consumed by the inflater to know where the
-next log block begins.
-
-log record
-++++++++++
-
-Log record keys are structured as:
-
-....
-ref_name '\0' reverse_int64( update_index )
-....
-
-where `update_index` is the unique transaction identifier. The
-`update_index` field must be unique within the scope of a `ref_name`.
-See the update transactions section below for further details.
-
-The `reverse_int64` function inverses the value so lexicographical
-ordering the network byte order encoding sorts the more recent records
-with higher `update_index` values first:
-
-....
-reverse_int64(int64 t) {
-  return 0xffffffffffffffff - t;
-}
-....
-
-Log records have a similar starting structure to ref and index records,
-utilizing the same prefix compression scheme applied to the log record
-key described above.
-
-....
-    varint( prefix_length )
-    varint( (suffix_length << 3) | log_type )
-    suffix
-    log_data {
-      old_id
-      new_id
-      varint( name_length    )  name
-      varint( email_length   )  email
-      varint( time_seconds )
-      sint16( tz_offset )
-      varint( message_length )  message
-    }?
-....
-
-Log record entries use `log_type` to indicate what follows:
-
-* `0x0`: deletion; no log data.
-* `0x1`: standard git reflog data using `log_data` above.
-
-The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
-entry from the reflog of `refs/stash` in a transaction file (below),
-without needing to rewrite larger files. Readers reading a stack of
-reflogs must treat this as a deletion.
-
-For `log_type = 0x1`, the `log_data` section follows
-linkgit:git-update-ref[1] logging and includes:
-
-* two object names (old id, new id)
-* varint string of committer's name
-* varint string of committer's email
-* varint time in seconds since epoch (Jan 1, 1970)
-* 2-byte timezone offset in minutes (signed)
-* varint string of message
-
-`tz_offset` is the absolute number of minutes from GMT the committer was
-at the time of the update. For example `GMT-0800` is encoded in reftable
-as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
-
-The committer email does not contain `<` or `>`, it's the value normally
-found between the `<>` in a git commit object header.
-
-The `message_length` may be 0, in which case there was no message
-supplied for the update.
-
-Contrary to traditional reflog (which is a file), renames are encoded as
-a combination of ref deletion and ref creation.  A deletion is a log
-record with a zero new_id, and a creation is a log record with a zero old_id.
-
-Reading the log
-+++++++++++++++
-
-Readers accessing the log must first read the footer (below) to
-determine the `log_position`. The first block of the log begins at
-`log_position` bytes since the start of the file. The `log_position` is
-not block aligned.
-
-Importing logs
-++++++++++++++
-
-When importing from `$GIT_DIR/logs` writers should globally order all
-log records roughly by timestamp while preserving file order, and assign
-unique, increasing `update_index` values for each log line. Newer log
-records get higher `update_index` values.
-
-Although an import may write only a single reftable file, the reftable
-file must span many unique `update_index`, as each log line requires its
-own `update_index` to preserve semantics.
-
-Log index
-^^^^^^^^^
-
-The log index stores the log key
-(`refname \0 reverse_int64(update_index)`) for the last log record of
-every log block in the file, supporting bounded-time lookup.
-
-A log index block must be written if 2 or more log blocks are written to
-the file. If present, the log index appears after the last log block.
-There is no padding used to align the log index to block alignment.
-
-Log index format is identical to ref index, except the keys are 9 bytes
-longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
-Records use `block_position` to refer to the start of a log block.
-
-Reading the index
-+++++++++++++++++
-
-Readers loading the log index must first read the footer (below) to
-obtain `log_index_position`. If not present, the position will be 0.
-
-Footer
-^^^^^^
-
-After the last block of the file, a file footer is written. It begins
-like the file header, but is extended with additional data.
-
-....
-    HEADER
-
-    uint64( ref_index_position )
-    uint64( (obj_position << 5) | obj_id_len )
-    uint64( obj_index_position )
-
-    uint64( log_position )
-    uint64( log_index_position )
-
-    uint32( CRC-32 of above )
-....
-
-If a section is missing (e.g. ref index) the corresponding position
-field (e.g. `ref_index_position`) will be 0.
-
-* `obj_position`: byte position for the first obj block.
-* `obj_id_len`: number of bytes used to abbreviate object names in
-obj blocks.
-* `log_position`: byte position for the first log block.
-* `ref_index_position`: byte position for the start of the ref index.
-* `obj_index_position`: byte position for the start of the obj index.
-* `log_index_position`: byte position for the start of the log index.
-
-The size of the footer is 68 bytes for version 1, and 72 bytes for
-version 2.
-
-Reading the footer
-++++++++++++++++++
-
-Readers must first read the file start to determine the version
-number. Then they seek to `file_length - FOOTER_LENGTH` to access the
-footer. A trusted external source (such as `stat(2)`) is necessary to
-obtain `file_length`. When reading the footer, readers must verify:
-
-* 4-byte magic is correct
-* 1-byte version number is recognized
-* 4-byte CRC-32 matches the other 64 bytes (including magic, and
-version)
-
-Once verified, the other fields of the footer can be accessed.
-
-Empty tables
-++++++++++++
-
-A reftable may be empty. In this case, the file starts with a header
-and is immediately followed by a footer.
-
-Binary search
-^^^^^^^^^^^^^
-
-Binary search within a block is supported by the `restart_offset` fields
-at the end of the block. Readers can binary search through the restart
-table to locate between which two restart points the sought reference or
-key should appear.
-
-Each record identified by a `restart_offset` stores the complete key in
-the `suffix` field of the record, making the compare operation during
-binary search straightforward.
-
-Once a restart point lexicographically before the sought reference has
-been identified, readers can linearly scan through the following record
-entries to locate the sought record, terminating if the current record
-sorts after (and therefore the sought key is not present).
-
-Restart point selection
-+++++++++++++++++++++++
-
-Writers determine the restart points at file creation. The process is
-arbitrary, but every 16 or 64 records is recommended. Every 16 may be
-more suitable for smaller block sizes (4k or 8k), every 64 for larger
-block sizes (64k).
-
-More frequent restart points reduces prefix compression and increases
-space consumed by the restart table, both of which increase file size.
-
-Less frequent restart points makes prefix compression more effective,
-decreasing overall file size, with increased penalties for readers
-walking through more records after the binary search step.
-
-A maximum of `65535` restart points per block is supported.
-
-Considerations
-~~~~~~~~~~~~~~
-
-Lightweight refs dominate
-^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The reftable format assumes the vast majority of references are single
-object names valued with common prefixes, such as Gerrit Code Review's
-`refs/changes/` namespace, GitHub's `refs/pulls/` namespace, or many
-lightweight tags in the `refs/tags/` namespace.
-
-Annotated tags storing the peeled object cost an additional object name per
-reference.
-
-Low overhead
-^^^^^^^^^^^^
-
-A reftable with very few references (e.g. git.git with 5 heads) is 269
-bytes for reftable, vs. 332 bytes for packed-refs. This supports
-reftable scaling down for transaction logs (below).
-
-Block size
-^^^^^^^^^^
-
-For a Gerrit Code Review type repository with many change refs, larger
-block sizes (64 KiB) and less frequent restart points (every 64) yield
-better compression due to more references within the block compressing
-against the prior reference.
-
-Larger block sizes reduce the index size, as the reftable will require
-fewer blocks to store the same number of references.
-
-Minimal disk seeks
-^^^^^^^^^^^^^^^^^^
-
-Assuming the index block has been loaded into memory, binary searching
-for any single reference requires exactly 1 disk seek to load the
-containing block.
-
-Scans and lookups dominate
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Scanning all references and lookup by name (or namespace such as
-`refs/heads/`) are the most common activities performed on repositories.
-Object names are stored directly with references to optimize this use case.
-
-Logs are infrequently read
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Logs are infrequently accessed, but can be large. Deflating log blocks
-saves disk space, with some increased penalty at read time.
-
-Logs are stored in an isolated section from refs, reducing the burden on
-reference readers that want to ignore logs. Further, historical logs can
-be isolated into log-only files.
-
-Logs are read backwards
-^^^^^^^^^^^^^^^^^^^^^^^
-
-Logs are frequently accessed backwards (most recent N records for master
-to answer `master@{4}`), so log records are grouped by reference, and
-sorted descending by update index.
-
-Repository format
-~~~~~~~~~~~~~~~~~
-
-Version 1
-^^^^^^^^^
-
-A repository must set its `$GIT_DIR/config` to configure reftable:
-
-....
-[core]
-    repositoryformatversion = 1
-[extensions]
-    refStorage = reftable
-....
-
-Layout
-^^^^^^
-
-A collection of reftable files are stored in the `$GIT_DIR/reftable/`
-directory:
-
-....
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
-....
-
-where reftable files are named by a unique name such as produced by the
-function `${min_update_index}-${max_update_index}.ref`.
-
-Log-only files use the `.log` extension, while ref-only and mixed ref
-and log files use `.ref`. extension.
-
-The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
-current files, one per line, in order, from oldest (base) to newest
-(most recent):
-
-....
-$ cat .git/reftable/tables.list
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
-....
-
-Readers must read `$GIT_DIR/reftable/tables.list` to determine which
-files are relevant right now, and search through the stack in reverse
-order (last reftable is examined first).
-
-Reftable files not listed in `tables.list` may be new (and about to be
-added to the stack by the active writer), or ancient and ready to be
-pruned.
-
-Backward compatibility
-^^^^^^^^^^^^^^^^^^^^^^
-
-Older clients should continue to recognize the directory as a git
-repository so they don't look for an enclosing repository in parent
-directories. To this end, a reftable-enabled repository must contain the
-following dummy files
-
-* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
-* `.git/refs/`, a directory
-* `.git/refs/heads`, a regular file
-
-Readers
-^^^^^^^
-
-Readers can obtain a consistent snapshot of the reference space by
-following:
-
-1.  Open and read the `tables.list` file.
-2.  Open each of the reftable files that it mentions.
-3.  If any of the files is missing, goto 1.
-4.  Read from the now-open files as long as necessary.
-
-Update transactions
-^^^^^^^^^^^^^^^^^^^
-
-Although reftables are immutable, mutations are supported by writing a
-new reftable and atomically appending it to the stack:
-
-1.  Acquire `tables.list.lock`.
-2.  Read `tables.list` to determine current reftables.
-3.  Select `update_index` to be most recent file's
-`max_update_index + 1`.
-4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
-5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
-6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
-7.  Rename `tables.list.lock` to `tables.list`.
-
-During step 4 the new file's `min_update_index` and `max_update_index`
-are both set to the `update_index` selected by step 3. All log records
-for the transaction use the same `update_index` in their keys. This
-enables later correlation of which references were updated by the same
-transaction.
-
-Because a single `tables.list.lock` file is used to manage locking, the
-repository is single-threaded for writers. Writers may have to busy-spin
-(with backoff) around creating `tables.list.lock`, for up to an
-acceptable wait period, aborting if the repository is too busy to
-mutate. Application servers wrapped around repositories (e.g. Gerrit
-Code Review) can layer their own lock/wait queue to improve fairness to
-writers.
-
-Reference deletions
-^^^^^^^^^^^^^^^^^^^
-
-Deletion of any reference can be explicitly stored by setting the `type`
-to `0x0` and omitting the `value` field of the `ref_record`. This serves
-as a tombstone, overriding any assertions about the existence of the
-reference from earlier files in the stack.
-
-Compaction
-^^^^^^^^^^
-
-A partial stack of reftables can be compacted by merging references
-using a straightforward merge join across reftables, selecting the most
-recent value for output, and omitting deleted references that do not
-appear in remaining, lower reftables.
-
-A compacted reftable should set its `min_update_index` to the smallest
-of the input files' `min_update_index`, and its `max_update_index`
-likewise to the largest input `max_update_index`.
-
-For sake of illustration, assume the stack currently consists of
-reftable files (from oldest to newest): A, B, C, and D. The compactor is
-going to compact B and C, leaving A and D alone.
-
-1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
-2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
-prevents other processes from trying to compact these files.
-3.  Release `tables.list.lock`.
-4.  Compact `B` and `C` into a temp file
-`${min_update_index}-${max_update_index}_XXXXXX`.
-5.  Reacquire lock `tables.list.lock`.
-6.  Verify that `B` and `C` are still in the stack, in that order. This
-should always be the case, assuming that other processes are adhering to
-the locking protocol.
-7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
-`${min_update_index}-${max_update_index}.ref`.
-8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
-with the file from (4).
-9.  Rename `tables.list.lock` to `tables.list`.
-10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
-readers to backtrack.
-
-This strategy permits compactions to proceed independently of updates.
-
-Each reftable (compacted or not) is uniquely identified by its name, so
-open reftables can be cached by their name.
-
-Alternatives considered
-~~~~~~~~~~~~~~~~~~~~~~~
-
-bzip packed-refs
-^^^^^^^^^^^^^^^^
-
-`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
-compresses to 23 MiB, 37%). However the bzip format does not support
-random access to a single reference. Readers must inflate and discard
-while performing a linear scan.
-
-Breaking packed-refs into chunks (individually compressing each chunk)
-would reduce the amount of data a reader must inflate, but still leaves
-the problem of indexing chunks to support readers efficiently locating
-the correct chunk.
-
-Given the compression achieved by reftable's encoding, it does not seem
-necessary to add the complexity of bzip/gzip/zlib.
-
-Michael Haggerty's alternate format
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Michael Haggerty proposed
-link:https://lore.kernel.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe%2BwGvEJ7A%40mail.gmail.com/[an
-alternate] format to reftable on the Git mailing list. This format uses
-smaller chunks, without the restart table, and avoids block alignment
-with padding. Reflog entries immediately follow each ref, and are thus
-interleaved between refs.
-
-Performance testing indicates reftable is faster for lookups (51%
-faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
-larger file (+ ~3.2%, 28.3M vs 29.2M):
-
-[cols=">,>,>,>",options="header",]
-|=====================================
-|format |size |seek cold |seek hot
-|mh-alt |28.3 M |23.4 usec |11.2 usec
-|reftable |29.2 M |19.9 usec |5.4 usec
-|=====================================
-
-JGit Ketch RefTree
-^^^^^^^^^^^^^^^^^^
-
-https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
-proposed
-link:https://lore.kernel.org/git/CAJo%3DhJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg%40mail.gmail.com/[RefTree],
-an encoding of references inside Git tree objects stored as part of the
-repository's object database.
-
-The RefTree format adds additional load on the object database storage
-layer (more loose objects, more objects in packs), and relies heavily on
-the packer's delta compression to save space. Namespaces which are flat
-(e.g. thousands of tags in refs/tags) initially create very large loose
-objects, and so RefTree does not address the problem of copying many
-references to modify a handful.
-
-Flat namespaces are not efficiently searchable in RefTree, as tree
-objects in canonical formatting cannot be binary searched. This fails
-the need to handle a large number of references in a single namespace,
-such as GitHub's `refs/pulls`, or a project with many tags.
-
-LMDB
-^^^^
-
-David Turner proposed
-https://lore.kernel.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
-LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
-license.
-
-A downside of LMDB is its reliance on a single C implementation. This
-makes embedding inside JGit (a popular reimplementation of Git)
-difficult, and hoisting onto virtual storage (for JGit DFS) virtually
-impossible.
-
-A common format that can be supported by all major Git implementations
-(git-core, JGit, libgit2) is strongly preferred.