Diffstat (limited to 'third_party/git/Documentation/technical/hash-function-transition.txt')
-rw-r--r-- third_party/git/Documentation/technical/hash-function-transition.txt 827
1 files changed, 0 insertions, 827 deletions
diff --git a/third_party/git/Documentation/technical/hash-function-transition.txt b/third_party/git/Documentation/technical/hash-function-transition.txt
deleted file mode 100644
index 2ae8fa470a..0000000000
--- a/third_party/git/Documentation/technical/hash-function-transition.txt
+++ /dev/null
@@ -1,827 +0,0 @@
-Git hash function transition
-============================
-
-Objective
----------
-Migrate Git from SHA-1 to a stronger hash function.
-
-Background
-----------
-At its core, the Git version control system is a content addressable
-filesystem. It uses the SHA-1 hash function to name content. For
-example, files, directories, and revisions are referred to by hash
-values unlike in other traditional version control systems where files
-or versions are referred to via sequential numbers. The use of a hash
-function to address its content delivers a few advantages:
-
-* Integrity checking is easy. Bit flips, for example, are easily
-  detected, as the hash of corrupted content does not match its name.
-* Lookup of objects is fast.
-
-Using a cryptographically secure hash function brings additional
-advantages:
-
-* Object names can be signed and third parties can trust the hash to
-  address the signed object and all objects it references.
-* Communication over the Git protocol and out-of-band communication
-  methods can use a short, reliable string to address stored content.
-
-Over time some flaws in SHA-1 have been discovered by security
-researchers. On 23 February 2017 the SHAttered attack
-(https://shattered.io) demonstrated a practical SHA-1 hash collision.
-
-Git v2.13.0 and later use a hardened SHA-1 implementation by default,
-which isn't vulnerable to the SHAttered attack.
-
-Thus Git has in effect already migrated to a new hash that isn't SHA-1
-and doesn't share its vulnerabilities; its new hash function just
-happens to produce exactly the same output for all known inputs,
-except two PDFs published by the SHAttered researchers, and the new
-implementation (written by those researchers) claims to detect future
-cryptanalytic collision attacks.
-
-Regardless, it's considered prudent to move past any variant of SHA-1
-to a new hash. There's no guarantee that further attacks on SHA-1 won't
-be published, and those attacks may not have viable mitigations.
-
-If SHA-1 and its variants were to be truly broken, Git's hash function
-could not be considered cryptographically secure any more. This would
-impact the communication of hash values because we could not trust
-that a given hash value represented the known good version of content
-that the speaker intended.
-
-SHA-1 still provides the other properties, such as fast object lookup
-and safe error checking, but other hash functions that are believed to
-be cryptographically secure are equally suitable for those purposes.
-
-Goals
------
-1. The transition to SHA-256 can be done one local repository at a time.
-   a. Requiring no action by any other party.
-   b. A SHA-256 repository can communicate with SHA-1 Git servers
-      (push/fetch).
-   c. Users can use SHA-1 and SHA-256 identifiers for objects
-      interchangeably (see "Object names on the command line", below).
-   d. New signed objects make use of a stronger hash function than
-      SHA-1 for their security guarantees.
-2. Allow a complete transition away from SHA-1.
-   a. Local metadata for SHA-1 compatibility can be removed from a
-      repository if compatibility with SHA-1 is no longer needed.
-3. Maintainability throughout the process.
-   a. The object format is kept simple and consistent.
-   b. Creation of a generalized repository conversion tool.
-
-Non-Goals
----------
-1. Adding SHA-256 support to Git protocol. This is valuable and the
-   logical next step, but it is out of scope for this initial design.
-2. Transparently improving the security of existing SHA-1 signed
-   objects.
-3. Intermixing objects using multiple hash functions in a single
-   repository.
-4. Taking the opportunity to fix other bugs in Git's formats and
-   protocols.
-5. Shallow clones and fetches into a SHA-256 repository. (This will
-   change when we add SHA-256 support to Git protocol.)
-6. Skipping the fetch of some submodules of a project into a SHA-256
-   repository. (This also depends on SHA-256 support in Git
-   protocol.)
-
-Overview
---------
-We introduce a new repository format extension. Repositories with this
-extension enabled use SHA-256 instead of SHA-1 to name their objects.
-This affects both object names and object content --- both the names
-of objects and all references to other objects within an object are
-switched to the new hash function.
-
-SHA-256 repositories cannot be read by older versions of Git.
-
-Alongside the packfile, a SHA-256 repository stores a bidirectional
-mapping between SHA-256 and SHA-1 object names. The mapping is generated
-locally and can be verified using "git fsck". Object lookups use this
-mapping to allow naming objects by either their SHA-1 or their SHA-256
-names interchangeably.
-
-"git cat-file" and "git hash-object" gain options to display an object
-in its sha1 form and write an object given its sha1 form. This
-requires all objects referenced by that object to be present in the
-object database so that they can be named using the appropriate name
-(using the bidirectional hash mapping).
-
-Fetches from a SHA-1 based server convert the fetched objects into
-SHA-256 form and record the mapping in the bidirectional mapping table
-(see below for details). Pushes to a SHA-1 based server convert the
-objects being pushed into sha1 form so the server does not have to be
-aware of the hash function the client is using.
-
-Detailed Design
----------------
-Repository format extension
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-A SHA-256 repository uses repository format version `1` (see
-Documentation/technical/repository-version.txt) with extensions
-`objectFormat` and `compatObjectFormat`:
-
-	[core]
-		repositoryFormatVersion = 1
-	[extensions]
-		objectFormat = sha256
-		compatObjectFormat = sha1
-
-The combination of setting `core.repositoryFormatVersion=1` and
-populating `extensions.*` ensures that all versions of Git later than
-`v0.99.9l` will die with an error message instead of trying to operate
-on the SHA-256 repository.
-
-	# Between v0.99.9l and v2.7.0
-	$ git status
-	fatal: Expected git repo version <= 0, found 1
-	# After v2.7.0
-	$ git status
-	fatal: unknown repository extensions found:
-		objectformat
-		compatobjectformat
-
-See the "Transition plan" section below for more details on these
-repository extensions.
-
-Object names
-~~~~~~~~~~~~
-Objects can be named by their 40 hexadecimal digit sha1-name or 64
-hexadecimal digit sha256-name, plus names derived from those (see
-gitrevisions(7)).
-
-The sha1-name of an object is the SHA-1 of the concatenation of its
-type, a space, the length of its sha1-content in decimal, a NUL byte,
-and the object's sha1-content. This is the traditional <sha1> used in
-Git to name objects.
-
-The sha256-name of an object is the SHA-256 of the concatenation of its
-type, a space, the length of its sha256-content in decimal, a NUL byte,
-and the object's sha256-content.
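The naming rule can be illustrated with a short Python sketch using the standard `hashlib` module. `object_name` is a hypothetical helper written for this illustration, not part of Git. It hashes a blob, whose sha1-content and sha256-content are identical, so only the hash function differs between its two names.

```python
import hashlib


def object_name(obj_type: str, content: bytes, algo: str) -> str:
    """Hypothetical helper: hash "<type> SP <decimal length> NUL" + content."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


blob = b"hello world\n"
print(object_name("blob", blob, "sha1"))    # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(object_name("blob", blob, "sha256"))  # a 64-hex-digit sha256-name
```

For trees, commits, and tags the two names are computed over different content (and therefore different lengths), which is why a translation table is needed.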
-
-Object format
-~~~~~~~~~~~~~
-The byte content of a tag, commit, or tree object differs depending on
-whether it is named by sha1 or sha256, because an object named by its
-sha256-name refers to other objects by their sha256-names and an object
-named by its sha1-name refers to other objects by their sha1-names.
-
-The sha256-content of an object is the same as its sha1-content, except
-that objects referenced by the object are named using their sha256-names
-instead of sha1-names. Because a blob object does not refer to any
-other object, its sha1-content and sha256-content are the same.
-
-The format allows round-trip conversion between sha256-content and
-sha1-content.
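As a minimal sketch of that round trip, the Python below rewrites the "tree" and "parent" header lines of a commit. It assumes the usual commit layout (hex object names on those lines, header ending at the first blank line) and uses a plain dictionary as a stand-in for the translation table. `convert_commit` and the placeholder object names are hypothetical, and embedded mergetag content and signature fields are deliberately not handled.

```python
from typing import Dict


def convert_commit(content: bytes, name_map: Dict[str, str]) -> bytes:
    """Hypothetical helper: rewrite the names on "tree" and "parent" lines.

    name_map maps names in the source format to names in the target format
    (for example sha256-name -> sha1-name). Only the header, which ends at
    the first blank line, is rewritten; the message is copied unchanged.
    """
    header, sep, message = content.partition(b"\n\n")
    out = []
    for line in header.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key in (b"tree", b"parent"):
            line = key + b" " + name_map[value.decode("ascii")].encode("ascii")
        out.append(line)
    return b"\n".join(out) + sep + message


sha256_commit = b"".join([
    b"tree " + b"b" * 64 + b"\n",
    b"parent " + b"c" * 64 + b"\n",
    b"author A U Thor <author@example.com> 0 +0000\n",
    b"committer A U Thor <author@example.com> 0 +0000\n",
    b"\n",
    b"Example commit\n",
])
to_sha1 = {"b" * 64: "1" * 40, "c" * 64: "2" * 40}  # placeholder names
sha1_commit = convert_commit(sha256_commit, to_sha1)
to_sha256 = {v: k for k, v in to_sha1.items()}
assert convert_commit(sha1_commit, to_sha256) == sha256_commit  # round trip
```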
-
-Object storage
-~~~~~~~~~~~~~~
-Loose objects use zlib compression and packed objects use the packed
-format described in Documentation/technical/pack-format.txt, just like
-today. The content that is compressed and stored uses sha256-content
-instead of sha1-content.
-
-Pack index
-~~~~~~~~~~
-Pack index (.idx) files use a new v3 format that supports multiple
-hash functions. They have the following format (all integers are in
-network byte order):
-
-- A header appears at the beginning and consists of the following:
-  - The 4-byte pack index signature: '\377t0c'
-  - 4-byte version number: 3
-  - 4-byte length of the header section, including the signature and
-    version number
-  - 4-byte number of objects contained in the pack
-  - 4-byte number of object formats in this pack index: 2
-  - For each object format:
-    - 4-byte format identifier (e.g., 'sha1' for SHA-1)
-    - 4-byte length in bytes of shortened object names. This is the
-      shortest possible length needed to make names in the shortened
-      object name table unambiguous.
-    - 4-byte integer, recording where tables relating to this format
-      are stored in this index file, as an offset from the beginning.
-  - 4-byte offset to the trailer from the beginning of this file.
-  - Zero or more additional key/value pairs (4-byte key, 4-byte
-    value). Only one key is supported: 'PSRC'. See the "Loose objects
-    and unreachable objects" section for supported values and how this
-    is used.  All other keys are reserved. Readers must ignore
-    unrecognized keys.
-- Zero or more NUL bytes. This can optionally be used to improve the
-  alignment of the full object name table below.
-- Tables for the first object format:
-  - A sorted table of shortened object names.  These are prefixes of
-    the names of all objects in this pack file, packed together
-    without offset values to reduce the cache footprint of the binary
-    search for a specific object name.
-
-  - A table of full object names in pack order. This allows resolving
-    a reference to "the nth object in the pack file" (from a
-    reachability bitmap or from the next table of another object
-    format) to its object name.
-
-  - A table of 4-byte values mapping object name order to pack order.
-    For an object in the table of sorted shortened object names, the
-    value at the corresponding index in this table is the index in the
-    previous table for that same object.
-
-    This can be used to look up the object in reachability bitmaps or
-    to look up its name in another object format.
-
-  - A table of 4-byte CRC32 values of the packed object data, in the
-    order that the objects appear in the pack file. This is to allow
-    compressed data to be copied directly from pack to pack during
-    repacking without undetected data corruption.
-
-  - A table of 4-byte offset values. For an object in the table of
-    sorted shortened object names, the value at the corresponding
-    index in this table indicates where that object can be found in
-    the pack file. These are usually 31-bit pack file offsets, but
-    large offsets are encoded as an index into the next table with the
-    most significant bit set.
-
-  - A table of 8-byte offset entries (empty for pack files less than
-    2 GiB). Pack files are organized with heavily used objects toward
-    the front, so most object references should not need to refer to
-    this table.
-- Zero or more NUL bytes.
-- Tables for the second object format, with the same layout as above,
-  up to and not including the table of CRC32 values.
-- Zero or more NUL bytes.
-- The trailer consists of the following:
-  - A copy of the 32-byte SHA-256 checksum at the end of the
-    corresponding packfile.
-
-  - 32-byte SHA-256 checksum of all of the above.
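As a sketch of the header layout described above, the Python below packs and unpacks a v3 header with the standard `struct` module in network byte order. The function names, the 's256' format identifier, and the offsets in the usage line are illustrative assumptions (the layout only gives 'sha1' as an example identifier); the tables, NUL padding, and trailer are not modeled.

```python
import struct

SIGNATURE = b"\377t0c"  # the '\377t0c' pack index signature


def build_idx_v3_header(num_objects, formats, trailer_offset, extras=()):
    """Hypothetical helper.

    formats: (4-byte id, shortened-name length, table offset) per format.
    extras:  optional (4-byte key, value) pairs such as (b"PSRC", 1).
    """
    body = struct.pack(">II", num_objects, len(formats))
    for fmt_id, short_len, table_offset in formats:
        body += struct.pack(">4sII", fmt_id, short_len, table_offset)
    body += struct.pack(">I", trailer_offset)
    for key, value in extras:
        body += struct.pack(">4sI", key, value)
    # The length field counts the signature, version and itself as well.
    header_len = 12 + len(body)
    return SIGNATURE + struct.pack(">II", 3, header_len) + body


def parse_idx_v3_header(data):
    """Hypothetical helper: inverse of build_idx_v3_header."""
    sig, version, header_len = struct.unpack_from(">4sII", data, 0)
    if sig != SIGNATURE or version != 3:
        raise ValueError("not a v3 pack index")
    num_objects, num_formats = struct.unpack_from(">II", data, 12)
    pos = 20
    formats = []
    for _ in range(num_formats):
        formats.append(struct.unpack_from(">4sII", data, pos))
        pos += 12
    (trailer_offset,) = struct.unpack_from(">I", data, pos)
    pos += 4
    extras = {}
    while pos + 8 <= header_len:  # remaining key/value pairs
        key, value = struct.unpack_from(">4sI", data, pos)
        extras[key] = value  # callers are expected to ignore unknown keys
        pos += 8
    return num_objects, formats, trailer_offset, extras


header = build_idx_v3_header(2, [(b"sha1", 4, 64), (b"s256", 4, 512)], 1024,
                             [(b"PSRC", 1)])
print(parse_idx_v3_header(header))
# (2, [(b'sha1', 4, 64), (b's256', 4, 512)], 1024, {b'PSRC': 1})
```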
-
-Loose object index
-~~~~~~~~~~~~~~~~~~
-A new file $GIT_OBJECT_DIR/loose-object-idx contains information about
-all loose objects. Its format is
-
-  # loose-object-idx
-  (sha256-name SP sha1-name LF)*
-
-where the object names are in hexadecimal format. The file is not
-sorted.
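A minimal Python sketch of reading this file follows. It assumes the first line is the literal `# loose-object-idx` header shown above and that every following line holds exactly one `sha256-name SP sha1-name` pair; both helper names are hypothetical. Because the file is unsorted, finding one entry means scanning it line by line.

```python
def read_loose_object_idx(text: str) -> dict:
    """Hypothetical helper: map sha256-name -> sha1-name."""
    lines = text.splitlines()
    if not lines or lines[0] != "# loose-object-idx":
        raise ValueError("missing '# loose-object-idx' header")
    mapping = {}
    for line in lines[1:]:
        sha256_name, sha1_name = line.split(" ")
        mapping[sha256_name] = sha1_name
    return mapping


def sha1_to_sha256_loose(text: str, sha1_name: str):
    """Hypothetical helper: scan the unsorted lines until a match is found."""
    for line in text.splitlines()[1:]:
        sha256_name, candidate = line.split(" ")
        if candidate == sha1_name:
            return sha256_name
    return None
```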
-
-The loose object index is protected against concurrent writes by a
-lock file $GIT_OBJECT_DIR/loose-object-idx.lock. To add a new loose
-object:
-
-1. Write the loose object to a temporary file, like today.
-2. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the lock.
-3. Rename the loose object into place.
-4. Open loose-object-idx with O_APPEND and write the new object's entry.
-5. Unlink loose-object-idx.lock to release the lock.
-
-To remove entries (e.g. in "git prune-packed" or "git prune"):
-
-1. Open loose-object-idx.lock with O_CREAT | O_EXCL to acquire the
-   lock.
-2. Write the new content to loose-object-idx.lock.
-3. Unlink any loose objects being removed.
-4. Rename to replace loose-object-idx, releasing the lock.
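The two locking procedures above might look roughly like this in Python, assuming POSIX rename semantics. The function names and arguments are hypothetical, step 1 of the add procedure (writing the temporary file) is left to the caller, and real code would also need fsync calls and more careful error recovery.

```python
import os

HEADER = "# loose-object-idx\n"


def _acquire(lock_path):
    # O_CREAT | O_EXCL raises FileExistsError if another process holds the lock.
    return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)


def add_loose_object(objdir, tmp_path, final_path, sha256_name, sha1_name):
    """Hypothetical helper: steps 2-5 of adding a loose object."""
    idx = os.path.join(objdir, "loose-object-idx")
    lock = idx + ".lock"
    os.close(_acquire(lock))                      # step 2
    try:
        os.replace(tmp_path, final_path)          # step 3
        is_new = not os.path.exists(idx)
        with open(idx, "a") as f:                 # step 4 ("a" opens with O_APPEND)
            if is_new:
                f.write(HEADER)
            f.write(f"{sha256_name} {sha1_name}\n")
    finally:
        os.unlink(lock)                           # step 5


def remove_loose_objects(objdir, doomed):
    """Hypothetical helper: doomed maps sha256-name -> loose object path."""
    idx = os.path.join(objdir, "loose-object-idx")
    lock = idx + ".lock"
    fd = _acquire(lock)                           # step 1
    try:
        with os.fdopen(fd, "w") as new, open(idx) as old:   # step 2
            for line in old:
                if line.split(" ", 1)[0] not in doomed:
                    new.write(line)
        for path in doomed.values():              # step 3
            os.unlink(path)
    except BaseException:
        os.unlink(lock)
        raise
    os.replace(lock, idx)                         # step 4, releases the lock
```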
-
-Translation table
-~~~~~~~~~~~~~~~~~
-The index files support a bidirectional mapping between sha1-names
-and sha256-names. The lookup proceeds similarly to ordinary object
-lookups. For example, to convert a sha1-name to a sha256-name:
-
- 1. Look for the object in idx files. If a match is present in the
-    idx's sorted list of truncated sha1-names, then:
-    a. Read the corresponding entry in the sha1-name order to pack
-       order mapping.
-    b. Read the corresponding entry in the full sha1-name table to
-       verify we found the right object. If it is, then
-    c. Read the corresponding entry in the full sha256-name table.
-       That is the object's sha256-name.
- 2. Check for a loose object. Read lines from loose-object-idx until
-    we find a match.
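Step 1 can be sketched in Python over in-memory copies of the idx tables. This assumes raw (binary) object names and a single pack, and the helper names are hypothetical; `build_tables` only exists to show how the sorted shortened names, the name-order-to-pack-order mapping, and the two full-name tables relate to one another.

```python
import bisect


def build_tables(pairs, short_len):
    """Hypothetical helper. pairs: (sha1_name, sha256_name) in pack order."""
    full_sha1 = [p[0] for p in pairs]
    full_sha256 = [p[1] for p in pairs]
    # Entry i of order_map is the pack position of the i-th name in sorted order.
    order_map = sorted(range(len(pairs)), key=lambda pos: full_sha1[pos])
    short_names = [full_sha1[pos][:short_len] for pos in order_map]
    return short_names, order_map, full_sha1, full_sha256


def sha1_to_sha256(sha1_name, short_len, tables):
    """Hypothetical helper: translate one sha1-name, or return None."""
    short_names, order_map, full_sha1, full_sha256 = tables
    prefix = sha1_name[:short_len]
    i = bisect.bisect_left(short_names, prefix)   # binary search
    if i == len(short_names) or short_names[i] != prefix:
        return None
    pack_pos = order_map[i]                       # step 1a
    if full_sha1[pack_pos] != sha1_name:          # step 1b
        return None
    return full_sha256[pack_pos]                  # step 1c
```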
-
-Step (1) takes the same amount of time as an ordinary object lookup:
-O(number of packs * log(objects per pack)). Step (2) takes O(number of
-loose objects) time. To maintain good performance it will be necessary
-to keep the number of loose objects low. See the "Loose objects and
-unreachable objects" section below for more details.
-
-Since all operations that make new objects (e.g., "git commit") add
-the new objects to the corresponding index, this mapping is possible
-for all objects in the object store.
-
-Reading an object's sha1-content
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The sha1-content of an object can be read by converting all sha256-names
-its sha256-content references to sha1-names using the translation table.
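For tree objects, reading the sha1-content means swapping the raw object name stored in each entry. The sketch below assumes the usual tree entry layout `<mode> SP <path> NUL <raw object name>`, 32-byte raw names in and 20-byte raw names out, and a plain dictionary in place of the translation table; `tree_sha256_to_sha1` is a hypothetical name. Entry order does not change, because tree entries are sorted by path rather than by object name.

```python
def tree_sha256_to_sha1(content: bytes, to_sha1: dict) -> bytes:
    """Hypothetical helper: convert a tree's sha256-content to sha1-content.

    to_sha1 maps raw 32-byte sha256-names to raw 20-byte sha1-names; a
    KeyError means a referenced object is missing from the table.
    """
    out = bytearray()
    pos = 0
    while pos < len(content):
        nul = content.index(b"\0", pos)    # end of "<mode> SP <path>"
        out += content[pos:nul + 1]        # copy mode, path and NUL as-is
        name = content[nul + 1:nul + 33]   # 32-byte sha256-name
        out += to_sha1[name]
        pos = nul + 33
    return bytes(out)
```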
-
-Fetch
-~~~~~
-Fetching from a SHA-1 based server requires translating between SHA-1
-and SHA-256 based representations on the fly.
-
-SHA-1s named in the ref advertisement that are present on the client
-can be translated to SHA-256 and looked up as local objects using the
-translation table.
-
-Negotiation proceeds as today. Any "have"s generated locally are
-converted to SHA-1 before being sent to the server, and SHA-1s
-mentioned by the server are converted to SHA-256 when looking them up
-locally.
-
-After negotiation, the server sends a packfile containing the
-requested objects. We convert the packfile to SHA-256 format using
-the following steps:
-
-1. index-pack: inflate each object in the packfile and compute its
-   SHA-1. Objects can contain deltas in OBJ_REF_DELTA format against
-   objects the client has locally. These objects can be looked up
-   using the translation table and their sha1-content read as
-   described above to resolve the deltas.
-2. topological sort: starting at the "want"s from the negotiation
-   phase, walk through objects in the pack and emit a list of them,
-   excluding blobs, in reverse topologically sorted order, with each
-   object coming later in the list than all objects it references.
-   (This list only contains objects reachable from the "wants". If the
-   pack from the server contained additional extraneous objects, then
-   they will be discarded.)
-3. convert to sha256: open a new (sha256) packfile. Read the topologically
-   sorted list just generated. For each object, inflate its
-   sha1-content, convert to sha256-content, and write it to the sha256
-   pack. Record the new sha1<->sha256 mapping entry for use in the idx.
-4. sort: reorder entries in the new pack to match the order of objects
-   in the pack the server generated and include blobs. Write a sha256 idx
-   file.
-5. clean up: remove the SHA-1 based pack file, index, and
-   topologically sorted list obtained from the server in steps 1
-   and 2.
-
-Step 3 requires every object referenced by the new object to be in the
-translation table. This is why the topological sort step is necessary.
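Step 2 can be sketched as an iterative depth-first post-order walk in Python. Here `refs` (which objects each object references), `blobs`, and `in_pack` stand in for information step 1 would record; these names, and the use of an explicit stack instead of recursion so that long histories do not exhaust the call stack, are assumptions of this sketch. Objects the client already has are not in `in_pack`, so the walk stops at them; their names are already in the translation table.

```python
def topo_order(wants, refs, blobs, in_pack):
    """Hypothetical helper: list non-blob objects, each after all it references.

    Objects not reachable from `wants` are never visited, so they are dropped.
    """
    order, seen = [], set()
    for want in wants:
        stack = [(want, False)]
        while stack:
            name, children_done = stack.pop()
            if children_done:
                if name not in blobs:
                    order.append(name)
                continue
            if name in seen or name not in in_pack:
                continue
            seen.add(name)
            stack.append((name, True))     # revisit after its references
            for child in refs.get(name, ()):
                stack.append((child, False))
    return order
```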
-
-As an optimization, step 1 could write a file describing what non-blob
-objects each object it has inflated from the packfile references. This
-makes the topological sort in step 2 possible without inflating the
-objects in the packfile for a second time. The objects need to be
-inflated again in step 3, for a total of two inflations.
-
-Step 4 is probably necessary for good read-time performance: "git
-pack-objects" on the server optimizes the pack file for good data
-locality (see Documentation/technical/pack-heuristics.txt), and keeping
-the server's object order preserves that locality.
-
-Details of this process are likely to change. It will take some
-experimenting to get this to perform well.
-
-Push
-~~~~
-Push is simpler than fetch because the objects referenced by the
-pushed objects are already in the translation table. The sha1-content
-of each object being pushed can be read as described in the "Reading
-an object's sha1-content" section to generate the pack written by git
-send-pack.
-
-Signed Commits
-~~~~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the commit object format to allow
-signing commits without relying on SHA-1. It is similar to the
-existing "gpgsig" field. Its signed payload is the sha256-content of the
-commit object with any "gpgsig" and "gpgsig-sha256" fields removed.
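A minimal Python sketch of computing that payload follows, assuming the usual commit header conventions: the header ends at the first blank line, and a multi-line field such as a signature continues on lines beginning with a single space. `commit_signed_payload` is a hypothetical name, and the sketch does not verify any signature.

```python
def commit_signed_payload(commit: bytes) -> bytes:
    """Hypothetical helper: drop "gpgsig" and "gpgsig-sha256" header fields."""
    header, sep, message = commit.partition(b"\n\n")
    kept, skipping = [], False
    for line in header.split(b"\n"):
        if line.startswith(b" "):          # continuation of the previous field
            if not skipping:
                kept.append(line)
            continue
        skipping = line.split(b" ", 1)[0] in (b"gpgsig", b"gpgsig-sha256")
        if not skipping:
            kept.append(line)
    return b"\n".join(kept) + sep + message
```

Continuation lines of other fields, such as an embedded mergetag, are kept as part of the payload.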
-
-This means commits can be signed
-1. using SHA-1 only, as in existing signed commit objects
-2. using both SHA-1 and SHA-256, by using both gpgsig-sha256 and gpgsig
-   fields.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
-
-Old versions of "git verify-commit" can verify the gpgsig signature in
-cases (1) and (2) without modifications and view case (3) as an
-ordinary unsigned commit.
-
-Signed Tags
-~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-sha256-content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
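A minimal Python sketch of computing this payload follows. It assumes the in-body signature starts at the last line beginning with `-----BEGIN PGP SIGNATURE-----` and runs to the end of the object, and that the `gpgsig-sha256` header field continues on space-prefixed lines; `tag_signed_payload` is a hypothetical name.

```python
SIG_START = b"-----BEGIN PGP SIGNATURE-----"


def tag_signed_payload(tag: bytes) -> bytes:
    """Hypothetical helper: the bytes covered by a tag's gpgsig-sha256 signature."""
    header, sep, body = tag.partition(b"\n\n")
    kept, skipping = [], False
    for line in header.split(b"\n"):
        if line.startswith(b" "):          # continuation of the previous field
            if not skipping:
                kept.append(line)
            continue
        skipping = line.split(b" ", 1)[0] == b"gpgsig-sha256"
        if not skipping:
            kept.append(line)
    start = body.rfind(SIG_START)
    if start != -1 and (start == 0 or body[start - 1:start] == b"\n"):
        body = body[:start]                # drop the in-body signature
    return b"\n".join(kept) + sep + body
```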
-
-This means tags can be signed
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
-   signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
-
-Mergetag embedding
-~~~~~~~~~~~~~~~~~~
-The mergetag field in the sha1-content of a commit contains the
-sha1-content of a tag that was merged by that commit.
-
-The mergetag field in the sha256-content of the same commit contains the
-sha256-content of the same tag.
-
-Submodules
-~~~~~~~~~~
-To convert recorded submodule pointers, you need to have the converted
-submodule repository in place. The translation table of the submodule
-can be used to look up the new hash.
-
-Loose objects and unreachable objects
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Fast lookups in the loose-object-idx require that the number of loose
-objects not grow too high.
-
-"git gc --auto" currently waits for there to be 6700 loose objects
-present before consolidating them into a packfile. We will need to
-measure to find a more appropriate threshold for it to use.
-
-"git gc --auto" currently waits for there to be 50 packs present
-before combining packfiles. Packing loose objects more aggressively
-may cause the number of pack files to grow too quickly. This can be
-mitigated by using a strategy similar to Martin Fick's exponential
-rolling garbage collection script:
-https://gerrit-review.googlesource.com/c/gerrit/+/35215
-
-"git gc" currently expels any unreachable objects it encounters in
-pack files to loose objects in an attempt to prevent a race when
-pruning them (in case another process is simultaneously writing a new
-object that refers to the about-to-be-deleted object). This leads to
-an explosion in the number of loose objects present and disk space
-usage due to the objects in delta form being replaced with independent
-loose objects.  Worse, the race is still present for loose objects.
-
-Instead, "git gc" will need to move unreachable objects to a new
-packfile marked as UNREACHABLE_GARBAGE (using the PSRC field; see
-below). To avoid the race when writing new objects referring to an
-about-to-be-deleted object, code paths that write new objects will
-need to copy any objects they refer to from UNREACHABLE_GARBAGE packs
-into new, non-UNREACHABLE_GARBAGE packs (or loose objects).
-UNREACHABLE_GARBAGE packs are then safe to delete if their creation
-time (as indicated by the file's mtime) is long enough ago.
-
-To avoid a proliferation of UNREACHABLE_GARBAGE packs, they can be
-combined under certain circumstances. If "gc.garbageTtl" is set to
-greater than one day, then packs created within a single calendar day,
-UTC, can be coalesced together. The resulting packfile would have an
-mtime before midnight on that day, so this makes the effective maximum
-ttl the garbageTtl + 1 day. If "gc.garbageTtl" is less than one day,
-then we divide the calendar day into intervals one-third of that ttl
-in duration. Packs created within the same interval can be coalesced
-together. The resulting packfile would have an mtime before the end of
-the interval, so this makes the effective maximum ttl equal to the
-garbageTtl * 4/3.
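The bucketing rule can be expressed as a small Python function: two UNREACHABLE_GARBAGE packs may be coalesced when they map to the same key. The names are hypothetical, the TTL is taken in seconds, and treating a TTL of exactly one day like the shorter case is an assumption, since the text only covers "greater than" and "less than" one day. `effective_max_ttl` restates the arithmetic above.

```python
from datetime import datetime, timezone

DAY = 24 * 60 * 60  # seconds


def coalesce_key(created: datetime, garbage_ttl: int):
    """Hypothetical helper: packs with equal keys may be coalesced.

    `created` must be timezone-aware; `garbage_ttl` is in seconds.
    """
    utc = created.astimezone(timezone.utc)
    if garbage_ttl > DAY:
        return (utc.date(), 0)             # one bucket per UTC calendar day
    interval = garbage_ttl / 3             # intervals one-third of the TTL long
    since_midnight = utc.hour * 3600 + utc.minute * 60 + utc.second
    return (utc.date(), int(since_midnight // interval))


def effective_max_ttl(garbage_ttl: int):
    """Hypothetical helper: longest time a garbage pack may survive."""
    return garbage_ttl + DAY if garbage_ttl > DAY else garbage_ttl * 4 / 3


t = datetime(2024, 1, 1, 23, 50, tzinfo=timezone.utc)
print(coalesce_key(t, 3 * 3600))  # (datetime.date(2024, 1, 1), 23)
```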
-
-This rule comes from Thirumala Reddy Mutchukota's JGit change
-https://git.eclipse.org/r/90465.
-
-The UNREACHABLE_GARBAGE setting goes in the PSRC field of the pack
-index. More generally, that field indicates where a pack came from:
-
- - 1 (PACK_SOURCE_RECEIVE) for a pack received over the network
- - 2 (PACK_SOURCE_AUTO) for a pack created by a lightweight
-   "gc --auto" operation
- - 3 (PACK_SOURCE_GC) for a pack created by a full gc
- - 4 (PACK_SOURCE_UNREACHABLE_GARBAGE) for potential garbage
-   discovered by gc
- - 5 (PACK_SOURCE_INSERT) for locally created objects that were
-   written directly to a pack file, e.g. from "git add ."
-
-This information can be useful for debugging and for "gc --auto" to
-make appropriate choices about which packs to coalesce.
-
-Caveats
--------
-Invalid objects
-~~~~~~~~~~~~~~~
-The conversion from sha1-content to sha256-content retains any
-brokenness in the original object (e.g., tree entry modes encoded with
-leading 0, tree objects whose paths are not sorted correctly, and
-commit objects without an author or committer). This is a deliberate
-feature of the design to allow the conversion to round-trip.
-
-More profoundly broken objects (e.g., a commit with a truncated "tree"
-header line) cannot be converted but were not usable by current Git
-anyway.
-
-Shallow clone and submodules
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Because it requires all referenced objects to be available in the
-locally generated translation table, this design does not support
-shallow clone or unfetched submodules. Protocol improvements might
-allow lifting this restriction.
-
-Alternates
-~~~~~~~~~~
-For the same reason, a sha256 repository cannot borrow objects from a
-sha1 repository using objects/info/alternates or
-$GIT_ALTERNATE_OBJECT_REPOSITORIES.
-
-git notes
-~~~~~~~~~
-The "git notes" tool annotates objects using their sha1-name as key.
-This design does not describe a way to migrate notes trees to use
-sha256-names. That migration is expected to happen separately (for
-example using a file at the root of the notes tree to describe which
-hash it uses).
-
-Server-side cost
-~~~~~~~~~~~~~~~~
-Until Git protocol gains SHA-256 support, using SHA-256 based storage
-on public-facing Git servers is strongly discouraged. Once Git
-protocol gains SHA-256 support, SHA-256 based servers are likely not
-to support SHA-1 compatibility, to avoid what may be a very expensive
-hash reencode during clone and to encourage peers to modernize.
-
-The design described here allows fetches by SHA-1 clients of a
-personal SHA-256 repository because it's not much more difficult than
-allowing pushes from that repository. This support needs to be guarded
-by a configuration option --- servers like git.kernel.org that serve a
-large number of clients would not be expected to bear that cost.
-
-Meaning of signatures
-~~~~~~~~~~~~~~~~~~~~~
-The signed payload for signed commits and tags does not explicitly
-name the hash used to identify objects. If some day Git adopts a new
-hash function with the same length as the current SHA-1 (40
-hexadecimal digit) or SHA-256 (64 hexadecimal digit) objects then the
-intent behind the PGP signed payload in an object signature is
-unclear:
-
-	object e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7
-	type commit
-	tag v2.12.0
-	tagger Junio C Hamano <gitster@pobox.com> 1487962205 -0800
-
-	Git 2.12
-
-Does this mean Git v2.12.0 is the commit with sha1-name
-e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7 or the commit with
-new-40-digit-hash-name e7e07d5a4fcc2a203d9873968ad3e6bd4d7419d7?
-
-Fortunately SHA-256 and SHA-1 have different lengths. If Git starts
-using another hash with the same length to name objects, then it will
-need to change the format of signed payloads using that hash to
-address this issue.
-
-Object names on the command line
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-To support the transition (see Transition plan below), this design
-supports four different modes of operation:
-
- 1. ("dark launch") Treat object names input by the user as SHA-1 and
-    convert any object names written to output to SHA-1, but store
-    objects using SHA-256.  This allows users to test the code with no
-    visible behavior change except for performance.  It also allows
-    running even tests that assume the SHA-1 hash function, to
-    sanity-check the behavior of the new mode.
-
- 2. ("early transition") Allow both SHA-1 and SHA-256 object names in
-    input. Any object names written to output use SHA-1. This allows
-    users to continue to make use of SHA-1 to communicate with peers
-    (e.g. by email) that have not migrated yet and prepares for mode 3.
-
- 3. ("late transition") Allow both SHA-1 and SHA-256 object names in
-    input. Any object names written to output use SHA-256. In this
-    mode, users are using a more secure object naming method by
-    default.  The disruption is minimal as long as most of their peers
-    are in mode 2 or mode 3.
-
- 4. ("post-transition") Treat object names input by the user as
-    SHA-256 and write output using SHA-256. This is safer than mode 3
-    because there is less risk that input is incorrectly interpreted
-    using the wrong hash function.
-
-The mode is specified in configuration.
-
-The user can also explicitly specify which format to use for a
-particular revision specifier and for output, overriding the mode. For
-example:
-
-	git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
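One way to model how the four modes and the explicit `^{sha1}`/`^{sha256}` suffixes decide how a hex name is interpreted is sketched below in Python. The `MODES` table, the helper names, and the rule that a name longer than 40 hex digits cannot be a sha1-name are illustrative assumptions; the sketch only covers hex names and `..` ranges, not the full gitrevisions(7) syntax, and it does not resolve abbreviations against any object store.

```python
import re

# mode -> (hash functions accepted for input, hash function used for output)
MODES = {
    1: ({"sha1"}, "sha1"),              # dark launch
    2: ({"sha1", "sha256"}, "sha1"),    # early transition
    3: ({"sha1", "sha256"}, "sha256"),  # late transition
    4: ({"sha256"}, "sha256"),          # post-transition
}

HEX_SPEC = re.compile(r"(?P<name>[0-9a-f]+)(?:\^\{(?P<algo>sha1|sha256)\})?")


def input_hashes(spec: str, mode: int) -> set:
    """Hypothetical helper: hash functions that may be used to resolve `spec`."""
    m = HEX_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a hex object name: {spec!r}")
    if m.group("algo"):
        return {m.group("algo")}           # explicit suffix overrides the mode
    candidates = set(MODES[mode][0])
    if len(m.group("name")) > 40:          # too long to be a sha1-name
        candidates.discard("sha1")
    return candidates                      # empty: invalid in this mode


def range_hashes(spec: str, mode: int) -> list:
    """Hypothetical helper: apply input_hashes to each end of an A..B range."""
    return [input_hashes(side, mode) for side in spec.split("..")]


print(range_hashes("abac87a^{sha1}..f787cac^{sha256}", 4))  # [{'sha1'}, {'sha256'}]
```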
-
-Choice of Hash
---------------
-In early 2005, around the time that Git was written, Xiaoyun Wang,
-Yiqun Lisa Yin, and Hongbo Yu announced an attack finding SHA-1
-collisions in 2^69 operations. In August they published details.
-Luckily, no practical demonstrations of a collision in full SHA-1 were
-published until 10 years later, in 2017.
-
-Git v2.13.0 and later use a hardened SHA-1 implementation by default
-that mitigates the SHAttered attack, but SHA-1 is still believed to be
-weak.
-
-The hash to replace this hardened SHA-1 should be stronger than SHA-1
-was: we would like it to be trustworthy and useful in practice for at
-least 10 years.
-
-Some other relevant properties:
-
-1. A 256-bit hash (long enough to match common security practice; not
-   excessively long to hurt performance and disk usage).
-
-2. High quality implementations should be widely available (e.g., in
-   OpenSSL and Apple CommonCrypto).
-
-3. The hash function's properties should match Git's needs (e.g. Git
-   requires collision and 2nd preimage resistance and does not require
-   length extension resistance).
-
-4. As a tiebreaker, the hash should be fast to compute (fortunately
-   many contenders are faster than SHA-1).
-
-We choose SHA-256.
-
-Transition plan
----------------
-Some initial steps can be implemented independently of one another:
-- adding a hash function API (vtable)
-- teaching fsck to tolerate the gpgsig-sha256 field
-- excluding gpgsig-* from the fields copied by "git commit --amend"
-- annotating tests that depend on SHA-1 values with a SHA1 test
-  prerequisite
-- using "struct object_id", GIT_MAX_RAWSZ, and GIT_MAX_HEXSZ
-  consistently instead of "unsigned char *" and the hardcoded
-  constants 20 and 40.
-- introducing index v3
-- adding support for the PSRC field and safer object pruning
-
-
-The first user-visible change is the introduction of the objectFormat
-extension (without compatObjectFormat). This requires:
-- implementing the loose-object-idx
-- teaching fsck about this mode of operation
-- using the hash function API (vtable) when computing object names
-- signing objects and verifying signatures
-- rejecting attempts to fetch from or push to an incompatible
-  repository
-
-Next comes introduction of compatObjectFormat:
-- translating object names between object formats
-- translating object content between object formats
-- generating and verifying signatures in the compat format
-- adding appropriate index entries when adding a new object to the
-  object store
-- --output-format option
-- ^{sha1} and ^{sha256} revision notation
-- configuration to specify default input and output format (see
-  "Object names on the command line" above)
-
-The next step is supporting fetches and pushes to SHA-1 repositories:
-- allow pushes to a repository using the compat format
-- generate a topologically sorted list of the SHA-1 names of fetched
-  objects
-- convert the fetched packfile to sha256 format and generate an idx
-  file
-- re-sort to match the order of objects in the fetched packfile
-
-The infrastructure supporting fetch also allows converting an existing
-repository. In converted repositories and new clones, end users can
-gain support for the new hash function without any visible change in
-behavior (see "dark launch" in the "Object names on the command line"
-section). In particular this allows users to verify SHA-256 signatures
-on objects in the repository, and it should ensure the transition code
-is stable in production in preparation for using it more widely.
-
-Over time projects would encourage their users to adopt the "early
-transition" and then "late transition" modes to take advantage of the
-new, more future-proof SHA-256 object names.
-
-When objectFormat and compatObjectFormat are both set, commands
-generating signatures would generate both SHA-1 and SHA-256 signatures
-by default to support both new and old users.
-
-In projects using SHA-256 heavily, users could be encouraged to adopt
-the "post-transition" mode to avoid accidentally making implicit use
-of SHA-1 object names.
-
-Once a critical mass of users have upgraded to a version of Git that
-can verify SHA-256 signatures and have converted their existing
-repositories to support verifying them, we can add support for a
-setting to generate only SHA-256 signatures. This is expected to be at
-least a year later.
-
-That is also a good moment to advertise the ability to convert
-repositories to use SHA-256 only, stripping out all SHA-1 related
-metadata. This improves performance by eliminating translation
-overhead, and it improves security by avoiding the possibility of
-accidentally relying on the safety of SHA-1.
-
-Updating Git's protocols to allow a server to specify which hash
-functions it supports is also an important part of this transition. It
-is not discussed in detail in this document, but this transition plan
-assumes it happens. :)
-
-Alternatives considered
------------------------
-Upgrading everyone working on a particular project on a flag day
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Projects like the Linux kernel are large and complex enough that
-flipping the switch for all projects based on the repository at once
-is infeasible.
-
-Not only would all developers and server operators supporting
-developers have to switch on the same flag day, but supporting tooling
-(continuous integration, code review, bug trackers, etc.) would have to
-be adapted as well. A flag day also makes it difficult to get early
-feedback from project participants who want to test the new hash
-function before it is time for mass adoption.
-
-Using hash functions in parallel
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-(e.g. https://public-inbox.org/git/22708.8913.864049.452252@chiark.greenend.org.uk/ )
-Objects newly created would be addressed by the new hash, but inside
-such an object (e.g. commit) it is still possible to address objects
-using the old hash function.
-* You cannot trust the history of such an object (needed for
-  bisectability) in the future without further work
-* Maintenance burden as the number of supported hash functions grows
-  (they will never go away, so they accumulate). In this proposal, by
-  comparison, converted objects lose all references to SHA-1.
-
-Signed objects with multiple hashes
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Instead of introducing the gpgsig-sha256 field in commit and tag objects
-for sha256-content based signatures, an earlier version of this design
-added "hash sha256 <sha256-name>" fields to strengthen the existing
-sha1-content based signatures.
-
-In other words, a single signature was used to attest to the object
-content using both hash functions. This had some advantages:
-* Using one signature instead of two speeds up the signing process.
-* Having one signed payload with both hashes allows the signer to
-  attest to the sha1-name and sha256-name referring to the same object.
-* All users consume the same signature. Broken signatures are likely
-  to be detected quickly using current versions of git.
-
-However, it also came with disadvantages:
-* Verifying a signed object requires access to the sha1-names of all
-  objects it references, even after the transition is complete and
-  the translation table is no longer needed for anything else. To support
-  this, the design added fields such as "hash sha1 tree <sha1-name>"
-  and "hash sha1 parent <sha1-name>" to the sha256-content of a signed
-  commit, complicating the conversion process.
-* Allowing signed objects without a sha1 (for after the transition is
-  complete) complicated the design further, requiring a "nohash sha1"
-  field to suppress including "hash sha1" fields in the sha256-content
-  and signed payload.
-
-Lazily populated translation table
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Some of the work of building the translation table could be deferred to
-push time, but that would significantly complicate and slow down pushes.
-Calculating an object's sha1-name when the object is created, while it
-is being streamed to disk and its sha256-name is being calculated,
-should be an acceptable cost.
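-
-For blobs, the sha1-content and sha256-content are identical. Both names
-can therefore be computed in a single pass while the object is streamed.
-The sketch below assumes the usual "blob <size>\0" object header. The
-`blob_names` helper is hypothetical. Trees, commits and tags need their
-references translated before hashing, so this sketch does not cover
-them.
-
-```python
-import hashlib
-import io
-
-
-def blob_names(stream, size, chunk_size=65536):
-    """Hypothetical helper: hash a blob once, feeding SHA-1 and SHA-256
-    from the same chunks. Returns (sha1_name, sha256_name) as hex."""
-    header = b"blob %d\0" % size
-    sha1 = hashlib.sha1(header)
-    sha256 = hashlib.sha256(header)
-    seen = 0
-    while True:
-        chunk = stream.read(chunk_size)
-        if not chunk:
-            break
-        # A real object writer would also deflate and write the chunk here.
-        sha1.update(chunk)
-        sha256.update(chunk)
-        seen += len(chunk)
-    if seen != size:
-        raise ValueError(f"expected {size} bytes, read {seen}")
-    return sha1.hexdigest(), sha256.hexdigest()
-```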
-
-Document History
-----------------
-
-2017-03-03
-bmwill@google.com, jonathantanmy@google.com, jrnieder@gmail.com,
-sbeller@google.com
-
-Initial version sent to
-http://public-inbox.org/git/20170304011251.GA26789@aiede.mtv.corp.google.com
-
-2017-03-03 jrnieder@gmail.com
-Incorporated suggestions from jonathantanmy and sbeller:
-* describe purpose of signed objects with each hash type
-* redefine signed object verification using object content under the
-  first hash function
-
-2017-03-06 jrnieder@gmail.com
-* Use SHA3-256 instead of SHA2 (thanks, Linus and brian m. carlson).[1][2]
-* Make sha3-based signatures a separate field, avoiding the need for
-  "hash" and "nohash" fields (thanks to peff[3]).
-* Add a sorting phase to fetch (thanks to Junio for noticing the need
-  for this).
-* Omit blobs from the topological sort during fetch (thanks to peff).
-* Discuss alternates, git notes, and git servers in the caveats
-  section (thanks to Junio Hamano, brian m. carlson[4], and Shawn
-  Pearce).
-* Clarify language throughout (thanks to various commenters,
-  especially Junio).
-
-2017-09-27 jrnieder@gmail.com, sbeller@google.com
-* use placeholder NewHash instead of SHA3-256
-* describe criteria for picking a hash function.
-* include a transition plan (thanks especially to Brandon Williams
-  for fleshing these ideas out)
-* define the translation table (thanks, Shawn Pearce[5], Jonathan
-  Tan, and Masaya Suzuki)
-* avoid loose object overhead by packing more aggressively in
-  "git gc --auto"
-
-Later history:
-
- See the history of this file in git.git for the history of subsequent
- edits. This document history is no longer being maintained as it
- would now be superfluous to the commit log.
-
-[1] http://public-inbox.org/git/CA+55aFzJtejiCjV0e43+9oR3QuJK2PiFiLQemytoLpyJWe6P9w@mail.gmail.com/
-[2] http://public-inbox.org/git/CA+55aFz+gkAsDZ24zmePQuEs1XPS9BP_s8O7Q4wQ7LV7X5-oDA@mail.gmail.com/
-[3] http://public-inbox.org/git/20170306084353.nrns455dvkdsfgo5@sigill.intra.peff.net/
-[4] http://public-inbox.org/git/20170304224936.rqqtkdvfjgyezsht@genre.crustytoothpaste.net
-[5] https://public-inbox.org/git/CAJo=hJtoX9=AyLHHpUJS7fueV9ciZ_MNpnEPHUz8Whui6g9F0A@mail.gmail.com/