From 1b593e1ea4d2af0f6444d9a7788d5d99abd6fde5 Mon Sep 17 00:00:00 2001
From: Vincent Ambo
Date: Sat, 11 Jan 2020 23:36:56 +0000
Subject: Squashed 'third_party/git/' content from commit cb71568594

git-subtree-dir: third_party/git
git-subtree-split: cb715685942260375e1eb8153b0768a376e4ece7
---
 Documentation/git-fast-import.txt | 1508 +++++++++++++++++++++++++++++++++++++
 1 file changed, 1508 insertions(+)
 create mode 100644 Documentation/git-fast-import.txt

(limited to 'Documentation/git-fast-import.txt')

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
new file mode 100644
index 000000000000..fad327aecc1b
--- /dev/null
+++ b/Documentation/git-fast-import.txt
@@ -0,0 +1,1508 @@
+git-fast-import(1)
+==================
+
+NAME
+----
+git-fast-import - Backend for fast Git data importers
+
+
+SYNOPSIS
+--------
+[verse]
+frontend | 'git fast-import' [<options>]
+
+DESCRIPTION
+-----------
+This program is usually not what the end user wants to run directly.
+Most end users want to use one of the existing frontend programs,
+which parses a specific type of foreign source and feeds the contents
+stored there to 'git fast-import'.
+
+fast-import reads a mixed command/data stream from standard input and
+writes one or more packfiles directly into the current repository.
+When EOF is received on standard input, fast import writes out
+updated branch and tag refs, fully updating the current repository
+with the newly imported data.
+
+The fast-import backend itself can import into an empty repository (one that
+has already been initialized by 'git init') or incrementally
+update an existing populated repository. Whether or not incremental
+imports are supported from a particular foreign source depends on
+the frontend program in use.
+
+
+OPTIONS
+-------
+
+--force::
+	Force updating modified existing branches, even if doing
+	so would cause commits to be lost (as the new commit does
+	not contain the old commit).
+
+--quiet::
+	Disable the output shown by --stats, making fast-import usually
+	be silent when it is successful. However, if the import stream
+	has directives intended to show user output (e.g. `progress`
+	directives), the corresponding messages will still be shown.
+
+--stats::
+	Display some basic statistics about the objects fast-import has
+	created, the packfiles they were stored into, and the
+	memory used by fast-import during this run. Showing this output
+	is currently the default, but can be disabled with --quiet.
+
+Options for Frontends
+~~~~~~~~~~~~~~~~~~~~~
+
+--cat-blob-fd=<fd>::
+	Write responses to `get-mark`, `cat-blob`, and `ls` queries to the
+	file descriptor <fd> instead of `stdout`. Allows `progress`
+	output intended for the end-user to be separated from other
+	output.
+
+--date-format=<fmt>::
+	Specify the type of dates the frontend will supply to
+	fast-import within `author`, `committer` and `tagger` commands.
+	See ``Date Formats'' below for details about which formats
+	are supported, and their syntax.
+
+--done::
+	Terminate with error if there is no `done` command at the end of
+	the stream. This option might be useful for detecting errors
+	that cause the frontend to terminate before it has started to
+	write a stream.
+
+Locations of Marks Files
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+--export-marks=<file>::
+	Dumps the internal marks table to <file> when complete.
+	Marks are written one per line as `:markid SHA-1`.
+	Frontends can use this file to validate imports after they
+	have been completed, or to save the marks table across
+	incremental runs. As <file> is only opened and truncated
+	at checkpoint (or completion) the same path can also be
+	safely given to --import-marks.
+
+--import-marks=<file>::
+	Before processing any input, load the marks specified in
+	<file>. The input file must exist, must be readable, and
+	must use the same format as produced by --export-marks.
+	Multiple options may be supplied to import more than one
+	set of marks.
+	If a mark is defined to different values,
+	the last file wins.
+
+--import-marks-if-exists=<file>::
+	Like --import-marks but instead of erroring out, silently
+	skips the file if it does not exist.
+
+--[no-]relative-marks::
+	After specifying --relative-marks the paths specified
+	with --import-marks= and --export-marks= are relative
+	to an internal directory in the current repository.
+	In git-fast-import this means that the paths are relative
+	to the .git/info/fast-import directory. However, other
+	importers may use a different location.
++
+Relative and non-relative marks may be combined by interweaving
+--(no-)-relative-marks with the --(import|export)-marks= options.
+
+Performance and Compression Tuning
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+--active-branches=<n>::
+	Maximum number of branches to maintain active at once.
+	See ``Memory Utilization'' below for details. Default is 5.
+
+--big-file-threshold=<n>::
+	Maximum size of a blob that fast-import will attempt to
+	create a delta for, expressed in bytes. The default is 512m
+	(512 MiB). Some importers may wish to lower this on systems
+	with constrained memory.
+
+--depth=<n>::
+	Maximum delta depth, for blob and tree deltification.
+	Default is 50.
+
+--export-pack-edges=<file>::
+	After creating a packfile, print a line of data to
+	<file> listing the filename of the packfile and the last
+	commit on each branch that was written to that packfile.
+	This information may be useful after importing projects
+	whose total object set exceeds the 4 GiB packfile limit,
+	as these commits can be used as edge points during calls
+	to 'git pack-objects'.
+
+--max-pack-size=<n>::
+	Maximum size of each output packfile.
+	The default is unlimited.
+
+fastimport.unpackLimit::
+	See linkgit:git-config[1]
+
+PERFORMANCE
+-----------
+The design of fast-import allows it to import large projects in a minimum
+amount of memory usage and processing time.
+Assuming the frontend
+is able to keep up with fast-import and feed it a constant stream of data,
+import times for projects holding 10+ years of history and containing
+100,000+ individual commits are generally completed in just 1-2
+hours on quite modest (~$2,000 USD) hardware.
+
+Most bottlenecks appear to be in foreign source data access (the
+source just cannot extract revisions fast enough) or disk IO (fast-import
+writes as fast as the disk will take the data). Imports will run
+faster if the source data is stored on a different drive than the
+destination Git repository (due to less IO contention).
+
+
+DEVELOPMENT COST
+----------------
+A typical frontend for fast-import tends to weigh in at approximately 200
+lines of Perl/Python/Ruby code. Most developers have been able to
+create working importers in just a couple of hours, even though it
+is their first exposure to fast-import, and sometimes even to Git. This is
+an ideal situation, given that most conversion tools are throw-away
+(use once, and never look back).
+
+
+PARALLEL OPERATION
+------------------
+Like 'git push' or 'git fetch', imports handled by fast-import are safe to
+run alongside parallel `git repack -a -d` or `git gc` invocations,
+or any other Git operation (including 'git prune', as loose objects
+are never used by fast-import).
+
+fast-import does not lock the branch or tag refs it is actively importing.
+After the import, during its ref update phase, fast-import tests each
+existing branch ref to verify the update will be a fast-forward
+update (the commit stored in the ref is contained in the new
+history of the commit to be written). If the update is not a
+fast-forward update, fast-import will skip updating that ref and instead
+prints a warning message. fast-import will always attempt to update all
+branch refs, and does not stop on the first failure.
+
+Branch updates can be forced with --force, but it's recommended that
+this only be used on an otherwise quiet repository.
+Using --force
+is not necessary for an initial import into an empty repository.
+
+
+TECHNICAL DISCUSSION
+--------------------
+fast-import tracks a set of branches in memory. Any branch can be created
+or modified at any point during the import process by sending a
+`commit` command on the input stream. This design allows a frontend
+program to process an unlimited number of branches simultaneously,
+generating commits in the order they are available from the source
+data. It also simplifies the frontend programs considerably.
+
+fast-import does not use or alter the current working directory, or any
+file within it. (It does however update the current Git repository,
+as referenced by `GIT_DIR`.) Therefore an import frontend may use
+the working directory for its own purposes, such as extracting file
+revisions from the foreign source. This ignorance of the working
+directory also allows fast-import to run very quickly, as it does not
+need to perform any costly file update operations when switching
+between branches.
+
+INPUT FORMAT
+------------
+With the exception of raw file data (which Git does not interpret)
+the fast-import input format is text (ASCII) based. This text based
+format simplifies development and debugging of frontend programs,
+especially when a higher level language such as Perl, Python or
+Ruby is being used.
+
+fast-import is very strict about its input. Where we say SP below we mean
+*exactly* one space. Likewise LF means one (and only one) linefeed
+and HT one (and only one) horizontal tab.
+Supplying additional whitespace characters will cause unexpected
+results, such as branch names or file names with leading or trailing
+spaces in their name, or early termination of fast-import when it encounters
+unexpected input.
+
+Stream Comments
+~~~~~~~~~~~~~~~
+To aid in debugging frontends fast-import ignores any line that
+begins with `#` (ASCII pound/hash) up to and including the line
+ending `LF`.
+A comment line may contain any sequence of bytes
+that does not contain an LF and therefore may be used to include
+any detailed debugging information that might be specific to the
+frontend and useful when inspecting a fast-import data stream.
+
+Date Formats
+~~~~~~~~~~~~
+The following date formats are supported. A frontend should select
+the format it will use for this import by passing the format name
+in the --date-format=<fmt> command-line option.
+
+`raw`::
+	This is the Git native format and is `