Al O'Connor committed on
Commit
5b9a38e
1 Parent(s): 9b7ed79

Add new folders/files

Files changed (6)
  1. .DS_Store +0 -0
  2. temp1/README.md +567 -0
  3. temp2/README.md +567 -0
  4. temp3/README.md +567 -0
  5. temp4/README.md +567 -0
  6. temp5/README.md +567 -0
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
temp1/README.md ADDED
@@ -0,0 +1,567 @@
1
+ git filter-repo is a versatile tool for rewriting history, which includes
2
+ [capabilities I have not found anywhere
3
+ else](#design-rationale-behind-filter-repo). It roughly falls into the
4
+ same space of tool as [git
5
+ filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
6
+ capitulation-inducing poor
7
+ [performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
8
+ with far more capabilities, and with a design that scales usability-wise
9
+ beyond trivial rewriting cases. [git filter-repo is now recommended by the
10
+ git project](https://git-scm.com/docs/git-filter-branch#_warning) instead
11
+ of git filter-branch.
12
+
13
+ While most users will probably just use filter-repo as a simple command
14
+ line tool (and likely only use a few of its flags), at its core filter-repo
15
+ contains a library for creating history rewriting tools. As such, users
16
+ with specialized needs can leverage it to quickly create [entirely new
17
+ history rewriting tools](contrib/filter-repo-demos).
18
+
19
+ # Table of Contents
20
+
21
+ * [Prerequisites](#prerequisites)
22
+ * [How do I install it?](#how-do-i-install-it)
23
+ * [How do I use it?](#how-do-i-use-it)
24
+ * [Why filter-repo instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
25
+ * [filter-branch](#filter-branch)
26
+ * [BFG Repo Cleaner](#bfg-repo-cleaner)
27
+ * [Simple example, with comparisons](#simple-example-with-comparisons)
28
+ * [Solving this with filter-repo](#solving-this-with-filter-repo)
29
+ * [Solving this with BFG Repo Cleaner](#solving-this-with-bfg-repo-cleaner)
30
+ * [Solving this with filter-branch](#solving-this-with-filter-branch)
31
+ * [Solving this with fast-export/fast-import](#solving-this-with-fast-exportfast-import)
32
+ * [Design rationale behind filter-repo](#design-rationale-behind-filter-repo)
33
+ * [How do I contribute?](#how-do-i-contribute)
34
+ * [Is there a Code of Conduct?](#is-there-a-code-of-conduct)
35
+ * [Upstream Improvements](#upstream-improvements)
36
+
37
+ # Prerequisites
38
+
39
+ filter-repo requires:
40
+
41
+ * git >= 2.22.0 at a minimum; [some features](#upstream-improvements)
42
+ require git >= 2.24.0
43
+ * python3 >= 3.5
44
+
45
+ # How do I install it?
46
+
47
+ `git-filter-repo` is a single-file python script, which was done to make
48
+ installation for basic use on many systems trivial: just place that
49
+ file into your $PATH.
50
+
51
+ See [INSTALL.md](INSTALL.md) for things beyond basic usage or special
52
+ cases. The more involved instructions are only needed if one of the
53
+ following apply:
54
+
55
+ * you do not find the above comment about trivial installation intuitively
56
+ obvious
57
+ * you are working with a python3 executable named something other than
58
+ "python3"
59
+ * you want to install documentation (beyond the builtin docs shown with -h)
60
+ * you want to run some of the [contrib](contrib/filter-repo-demos/) examples
61
+ * you want to create your own python filtering scripts using filter-repo as
62
+ a module/library
63
+
64
+ # How do I use it?
65
+
66
+ For comprehensive documentation:
67
+ * see the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html)
68
+ * alternative formatting of the user manual is available on various
69
+ external sites
70
+ ([example](https://www.mankier.com/1/git-filter-repo)), for those
71
+ that don't like the htmlpreview.github.io layout, though it may
72
+ only be up-to-date as of the latest release
73
+
74
+ If you prefer learning from examples:
75
+ * there is a [cheat sheet for converting filter-branch
76
+ commands](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage),
77
+ which covers every example from the filter-branch manual
78
+ * there is a [cheat sheet for converting BFG Repo Cleaner
79
+ commands](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg),
80
+ which covers every example from the BFG website
81
+ * the [simple example](#simple-example-with-comparisons) below may
82
+ be of interest
83
+ * the user manual has an extensive [examples
84
+ section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
85
+
86
+ # Why filter-repo instead of other alternatives?
87
+
88
+ This was covered in more detail in a [Git Rev News article on
89
+ filter-repo](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
90
+ but some highlights for the main competitors:
91
+
92
+ ## filter-branch
93
+
94
+ * filter-branch is [extremely to unusably
95
+ slow](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/)
96
+ ([multiple orders of magnitude slower than it should
97
+ be](https://git-scm.com/docs/git-filter-branch#PERFORMANCE))
98
+ for non-trivial repositories.
99
+
100
+ * [filter-branch is riddled with
101
+ gotchas](https://git-scm.com/docs/git-filter-branch#SAFETY) that can
102
+ silently corrupt your rewrite or at least thwart your "cleanup"
103
+ efforts by giving you something more problematic and messy than what
104
+ you started with.
105
+
106
+ * filter-branch is [very onerous](#simple-example-with-comparisons)
107
+ [to
108
+ use](https://github.com/newren/git-filter-repo/blob/a6a6a1b0f62d365bbe2e76f823e1621857ec4dbd/contrib/filter-repo-demos/filter-lamely#L9-L61)
109
+ for any rewrite which is even slightly non-trivial.
110
+
111
+ * the git project has stated that the above issues with filter-branch
112
+ cannot be backward compatibly fixed; they recommend that you [stop
113
+ using
114
+ filter-branch](https://git-scm.com/docs/git-filter-branch#_warning)
115
+
116
+ * die-hard fans of filter-branch may be interested in
117
+ [filter-lamely](contrib/filter-repo-demos/filter-lamely)
118
+ (a.k.a. [filter-branch-ish](contrib/filter-repo-demos/filter-branch-ish)),
119
+ a reimplementation of filter-branch based on filter-repo which is
120
+ more performant (though not nearly as fast or safe as
121
+ filter-repo).
122
+
123
+ * a [cheat
124
+ sheet](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
125
+ is available showing how to convert example commands from the manual of
126
+ filter-branch into filter-repo commands.
127
+
128
+ ## BFG Repo Cleaner
129
+
130
+ * great tool for its time, but while it makes some things simple, it
131
+ is limited to a few kinds of rewrites.
132
+
133
+ * its architecture is not amenable to handling more types of
134
+ rewrites.
135
+
136
+ * its architecture presents some shortcomings and bugs even for its
137
+ intended use case.
138
+
139
+ * fans of bfg may be interested in
140
+ [bfg-ish](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
141
+ based on filter-repo which includes several new features and bugfixes
142
+ relative to bfg.
143
+
144
+ * a [cheat
145
+ sheet](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg)
146
+ is available showing how to convert example commands from the manual of
147
+ BFG Repo Cleaner into filter-repo commands.
148
+
149
+ # Simple example, with comparisons
150
+
151
+ Let's say that we want to extract a piece of a repository, with the intent
152
+ of merging just that piece into some other bigger repo. For extraction, we
153
+ want to:
154
+
155
+ * extract the history of a single directory, src/. This means that only
156
+ paths under src/ remain in the repo, and any commits that only touched
157
+ paths outside this directory will be removed.
158
+ * rename all files to have a new leading directory, my-module/ (e.g. so that
159
+ src/foo.c becomes my-module/src/foo.c)
160
+ * rename any tags in the extracted repository to have a 'my-module-'
161
+ prefix (to avoid any conflicts when we later merge this repo into
162
+ something else)
163
+
164
+ ## Solving this with filter-repo
165
+
166
+ Doing this with filter-repo is as simple as the following command:
167
+ ```shell
168
+ git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
169
+ ```
170
+ (the single quotes are unnecessary, but make it clearer to a human that we
171
+ are replacing the empty string as a prefix with `my-module-`)
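
To make the three requested changes concrete, here is a minimal Python sketch of the path and tag-name transformations described above. It is an illustration only and not filter-repo's actual implementation; the function names `filter_path` and `rename_tag` are hypothetical, and the behavior for tags that do not start with the old prefix (left unchanged) is an assumption.

```python
from typing import Optional


def filter_path(path: str, keep_prefix: str = "src/",
                new_root: str = "my-module/") -> Optional[str]:
    """Return the rewritten path, or None if the path is dropped."""
    if not path.startswith(keep_prefix):
        # Corresponds to keeping only paths under src/.
        return None
    # Corresponds to moving everything under a new leading directory.
    return new_root + path


def rename_tag(tag: str, old_prefix: str = "",
               new_prefix: str = "my-module-") -> str:
    """Replace old_prefix at the start of a tag name with new_prefix."""
    if tag.startswith(old_prefix):
        return new_prefix + tag[len(old_prefix):]
    # Assumption: tags without the old prefix are left untouched.
    return tag


if __name__ == "__main__":
    for p in ["src/foo.c", "docs/index.md", "src/lib/bar.h"]:
        print(p, "->", filter_path(p))
    print("v1.0", "->", rename_tag("v1.0"))
```

With an empty old prefix every tag matches, which is why `'':'my-module-'` amounts to "prepend `my-module-` to every tag".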
172
+
173
+ ## Solving this with BFG Repo Cleaner
174
+
175
+ BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all
176
+ three types of wanted changes are outside of its capabilities.
177
+
178
+ ## Solving this with filter-branch
179
+
180
+ filter-branch comes with a pile of caveats (more on that below) even
181
+ once you figure out the necessary invocation(s):
182
+
183
+ ```shell
184
+ git filter-branch \
185
+ --tree-filter 'mkdir -p my-module && \
186
+ git ls-files \
187
+ | grep -v ^src/ \
188
+ | xargs git rm -f -q && \
189
+ ls -d * \
190
+ | grep -v my-module \
191
+ | xargs -I files mv files my-module/' \
192
+ --tag-name-filter 'echo "my-module-$(cat)"' \
193
+ --prune-empty -- --all
194
+ git clone file://$(pwd) newcopy
195
+ cd newcopy
196
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
197
+ | grep -v refs/tags/my-module- \
198
+ | git update-ref --stdin
199
+ git gc --prune=now
200
+ ```
201
+
202
+ Some might notice that the above filter-branch invocation will be really
203
+ slow due to using --tree-filter; you could alternatively use the
204
+ --index-filter option of filter-branch, changing the above commands to:
205
+
206
+ ```shell
207
+ git filter-branch \
208
+ --index-filter 'git ls-files \
209
+ | grep -v ^src/ \
210
+ | xargs git rm -q --cached;
211
+ git ls-files -s \
212
+ | sed "s%$(printf \\t)%&my-module/%" \
213
+ | git update-index --index-info;
214
+ git ls-files \
215
+ | grep -v ^my-module/ \
216
+ | xargs git rm -q --cached' \
217
+ --tag-name-filter 'echo "my-module-$(cat)"' \
218
+ --prune-empty -- --all
219
+ git clone file://$(pwd) newcopy
220
+ cd newcopy
221
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
222
+ | grep -v refs/tags/my-module- \
223
+ | git update-ref --stdin
224
+ git gc --prune=now
225
+ ```
226
+
227
+ However, for either filter-branch command there is a pile of caveats.
228
+ First, some may be wondering why I list five commands here for
229
+ filter-branch. Despite the use of --all and --tag-name-filter, and
230
+ filter-branch's manpage claiming that a clone is enough to get rid of
231
+ old objects, the extra steps to delete the other tags and do another
232
+ gc are still required to clean out the old objects and avoid mixing
233
+ new and old history before pushing somewhere. Other caveats:
234
+ * Commit messages are not rewritten; so if some of your commit
235
+ messages refer to prior commits by (abbreviated) sha1, after the
236
+ rewrite those messages will now refer to commits that are no longer
237
+ part of the history. It would be better to rewrite those
238
+ (abbreviated) sha1 references to refer to the new commit ids.
239
+ * The --prune-empty flag sometimes misses commits that should be
240
+ pruned, and it will also prune commits that *started* empty rather
241
+ than just ended empty due to filtering. For repositories that
242
+ intentionally use empty commits for versioning and publishing
243
+ related purposes, this can be detrimental.
244
+ * The commands above are OS-specific. GNU vs. BSD issues for sed,
245
+ xargs, and other commands often trip up users; I think I failed to
246
+ get most folks to use --index-filter since the only example in the
247
+ filter-branch manpage that both uses it and shows how to move
248
+ everything into a subdirectory is linux-specific, and it is not
249
+ obvious to the reader that it has a portability issue since it
250
+ silently misbehaves rather than failing loudly.
251
+ * The --index-filter version of the filter-branch command may be two to
252
+ three times faster than the --tree-filter version, but both
253
+ filter-branch commands are going to be multiple orders of magnitude
254
+ slower than filter-repo.
255
+ * Both commands assume all filenames are composed entirely of ascii
256
+ characters (even special ascii characters such as tabs or double
257
+ quotes will wreak havoc and likely result in missing files or
258
+ misnamed files)
259
+
260
+ ## Solving this with fast-export/fast-import
261
+
262
+ One can kind of hack this together with something like:
263
+
264
+ ```shell
265
+ git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
266
+ --signed-tags=strip --tag-of-filtered-object=rewrite --all \
267
+ | grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
268
+ | grep -vP '^D (?!src/)' \
269
+ | perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
270
+ | perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
271
+ | perl -pe s%refs/tags/%refs/tags/my-module-% \
272
+ | git -c core.ignorecase=false fast-import --date-format=raw-permissive \
273
+ --force --quiet
274
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
275
+ | grep -v refs/tags/my-module- \
276
+ | git update-ref --stdin
277
+ git reset --hard
278
+ git reflog expire --expire=now --all
279
+ git gc --prune=now
280
+ ```
281
+
282
+ But this comes with some nasty caveats and limitations:
283
+ * The various greps and regex replacements operate on the entire
284
+ fast-export stream and thus might accidentally corrupt unintended
285
+ portions of it, such as commit messages. If you needed to edit
286
+ file contents and thus dropped the --no-data flag, it could also
287
+ end up corrupting file contents.
288
+ * This command assumes all filenames in the repository are composed
289
+ entirely of ascii characters, and that they also exclude special characters
290
+ such as tabs or double quotes. If such a special filename exists
291
+ within the old src/ directory, it will be pruned even though it
292
+ was intended to be kept. (In slightly different repository
293
+ rewrites, this type of editing also risks corrupting filenames
294
+ with special characters by adding extra double quotes near the end
295
+ of the filename and in some leading directory name.)
296
+ * This command will leave behind huge numbers of useless empty
297
+ commits, and has no realistic way of pruning them. (And if you
298
+ tried to combine this technique with another tool to prune the
299
+ empty commits, then you now have no way to distinguish between
300
+ commits which were made empty by the filtering that you want to
301
+ remove, and commits which were empty before the filtering process
302
+ and which you thus may want to keep.)
303
+ * Commit messages which reference other commits by hash will now
304
+ reference old commits that no longer exist. Attempting to edit
305
+ the commit messages to update them is extraordinarily difficult to
306
+ add to this kind of direct rewrite.
307
+
308
+ # Design rationale behind filter-repo
309
+
310
+ None of the existing repository filtering tools did what I wanted;
311
+ they all came up short for my needs. No tool provided any of the
312
+ first eight traits below I wanted, and no tool provided more than
313
+ two of the last four traits either:
314
+
315
+ 1. [Starting report] Provide user an analysis of their repo to help
316
+ them get started on what to prune or rename, instead of expecting
317
+ them to guess or find other tools to figure it out. (Triggered, e.g.
318
+ by running the first time with a special flag, such as --analyze.)
319
+
320
+ 1. [Keep vs. remove] Instead of just providing a way for users to
321
+ easily remove selected paths, also provide flags for users to
322
+ only *keep* certain paths. Sure, users could work around this by
323
+ specifying to remove all paths other than the ones they want to
324
+ keep, but the need to specify all paths that *ever* existed in
325
+ **any** version of the repository could sometimes be quite
326
+ painful. For filter-branch, using pipelines like `git ls-files |
327
+ grep -v ... | xargs -r git rm` might be a reasonable workaround
328
+ but can get unwieldy and isn't as straightforward for users; plus
329
+ those commands are often operating-system specific (can you spot
330
+ the GNUism in the snippet I provided?).
331
+
332
+ 1. [Renaming] It should be easy to rename paths. For example, in
333
+ addition to allowing one to treat some subdirectory as the root
334
+ of the repository, also provide options for users to make the
335
+ root of the repository just become a subdirectory. And more
336
+ generally allow files and directories to be easily renamed.
337
+ Provide sanity checks if renaming causes multiple files to exist
338
+ at the same path. (And add special handling so that if a commit
339
+ merely copied oldname->newname without modification, then
340
+ filtering oldname->newname doesn't trigger the sanity check and
341
+ die on that commit.)
342
+
343
+ 1. [More intelligent safety] Writing copies of the original refs to
344
+ a special namespace within the repo does not provide a
345
+ user-friendly recovery mechanism. Many would struggle to recover
346
+ using that. Almost everyone I've ever seen do a repository
347
+ filtering operation has done so with a fresh clone, because
348
+ wiping out the clone in case of error is a vastly easier recovery
349
+ mechanism. Strongly encourage that workflow by [detecting and
350
+ bailing if we're not in a fresh
351
+ clone](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#FRESHCLONE),
352
+ unless the user overrides with --force.
353
+
354
+ 1. [Auto shrink] Automatically remove old cruft and repack the
355
+ repository for the user after filtering (unless overridden); this
356
+ simplifies things for the user, helps avoid mixing old and new
357
+ history together, and avoids problems where the multi-step
358
+ process for shrinking the repo documented in the manpage doesn't
359
+ actually work in some cases. (I'm looking at you,
360
+ filter-branch.)
361
+
362
+ 1. [Clean separation] Avoid confusing users (and prevent accidental
363
+ re-pushing of old stuff) due to mixing old repo and rewritten
364
+ repo together. (This is particularly a problem with filter-branch
365
+ when using the --tag-name-filter option, and sometimes also an
366
+ issue when only filtering a subset of branches.)
367
+
368
+ 1. [Versatility] Provide the user the ability to extend the tool or
369
+ even write new tools that leverage existing capabilities, and
370
+ provide this extensibility in a way that (a) avoids the need to
371
+ fork separate processes (which would destroy performance), (b)
372
+ avoids making the user specify OS-dependent shell commands (which
373
+ would prevent users from sharing commands with each other), (c)
374
+ takes advantage of rich data structures (because hashes, dicts,
375
+ lists, and arrays are prohibitively difficult in shell) and (d)
376
+ provides reasonable string manipulation capabilities (which are
377
+ sorely lacking in shell).
378
+
379
+ 1. [Old commit references] Provide a way for users to use old commit
380
+ IDs with the new repository (in particular via mapping from old to
381
+ new hashes with refs/replace/ references).
382
+
383
+ 1. [Commit message consistency] If commit messages refer to other
384
+ commits by ID (e.g. "this reverts commit 01234567890abcdef", "In
385
+ commit 0013deadbeef9a..."), those commit messages should be
386
+ rewritten to refer to the new commit IDs.
387
+
388
+ 1. [Become-empty pruning] Commits which become empty due to filtering
389
+ should be pruned. If the parent of a commit is pruned, the first
390
+ non-pruned ancestor needs to become the new parent. If no
391
+ non-pruned ancestor exists and the commit was not a merge, then it
392
+ becomes a new root commit. If no non-pruned ancestor exists and
393
+ the commit was a merge, then the merge will have one less parent
394
+ (and thus make it likely to become a non-merge commit which would
395
+ itself be pruned if it had no file changes of its own). One
396
+ special thing to note here is that we prune commits which become
397
+ empty, NOT commits which start empty. Some projects intentionally
398
+ create empty commits for versioning or publishing reasons, and
399
+ these should not be removed. (As a special case, commits which
400
+ started empty but whose parent was pruned away will also be
401
+ considered to have "become empty".)
402
+
403
+ 1. [Become-degenerate pruning] Pruning of commits which become empty
404
+ can potentially cause topology changes, and there are lots of
405
+ special cases. Normally, merge commits are not removed since they
406
+ are needed to preserve the graph topology, but the pruning of
407
+ parents and other ancestors can ultimately result in the loss of
408
+ one or more parents. A simple case was already noted above: if a
409
+ merge commit loses enough parents to become a non-merge commit and
410
+ it has no file changes, then it too can be pruned. Merge commits
411
+ can also have a topology that becomes degenerate: it could end up
412
+ with the merge_base serving as both parents (if all intervening
413
+ commits from the original repo were pruned), or it could end up
414
+ with one parent which is an ancestor of its other parent. In such
415
+ cases, if the merge has no file changes of its own, then the merge
416
+ commit can also be pruned. However, much as we do with empty
417
+ pruning we do not prune merge commits that started degenerate
418
+ (which indicates it may have been intentional, such as with --no-ff
419
+ merges) but only merge commits that become degenerate and have no
420
+ file changes of their own.
421
+
422
+ 1. [Speed] Filtering should be reasonably fast
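
The [Commit message consistency] trait in the list above can be sketched in a few lines of Python. This is a hedged illustration, not filter-repo's code: the helper name `rewrite_message`, the 7-to-40 character hex pattern, and the rule of leaving unknown or ambiguous abbreviations untouched are all assumptions made for the example.

```python
import re
from typing import Dict

# Assumption: a commit reference is a run of 7 to 40 lowercase hex digits.
HEX_RUN = re.compile(r"\b[0-9a-f]{7,40}\b")


def rewrite_message(message: str, old_to_new: Dict[str, str]) -> str:
    """Replace (abbreviated) old commit IDs in a message with new IDs."""

    def replace(match: "re.Match[str]") -> str:
        abbrev = match.group(0)
        # Find every old full ID that the abbreviation could refer to.
        candidates = [new for old, new in old_to_new.items()
                      if old.startswith(abbrev)]
        if len(candidates) != 1:
            return abbrev  # unknown or ambiguous: leave the text as-is
        # Keep the abbreviation length the author originally used.
        return candidates[0][:len(abbrev)]

    return HEX_RUN.sub(replace, message)


if __name__ == "__main__":
    mapping = {
        "0123456789abcdef0123456789abcdef01234567":
        "fedcba9876543210fedcba9876543210fedcba98",
    }
    print(rewrite_message("This reverts commit 0123456789ab.", mapping))
```

Words that merely look like hex (for example `deadbeef`) are only rewritten if they prefix-match exactly one known old commit ID, which keeps ordinary prose from being corrupted.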
423
+
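
The [Become-empty pruning] rules in the list above can also be sketched in Python. The sketch makes several simplifying assumptions that are not taken from filter-repo itself: commits arrive parents-first, each commit carries a count of file changes before and after filtering, and the `Commit` class and `prune_became_empty` function are hypothetical names. It only handles the degenerate-merge case where parents collapse to the same commit, not the case where one parent is an ancestor of the other.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    cid: str
    parents: List[str]
    changes_before: int  # file changes before filtering
    changes_after: int   # file changes left after filtering


def prune_became_empty(commits: List[Commit]) -> Dict[str, List[str]]:
    """Map each surviving commit ID to its rewritten parent list."""
    # Pruned commit -> nearest surviving ancestor (None if there is none).
    replacement: Dict[str, Optional[str]] = {}
    kept: Dict[str, List[str]] = {}
    for c in commits:  # assumed to be in parents-first order
        new_parents: List[str] = []
        parent_pruned = False
        for p in c.parents:
            target: Optional[str] = p
            if p in replacement:
                parent_pruned = True
                target = replacement[p]
            # Drop vanished parents and collapse duplicate parents.
            if target is not None and target not in new_parents:
                new_parents.append(target)
        started_empty = c.changes_before == 0
        # Commits that started empty are kept unless a parent was pruned.
        became_empty = c.changes_after == 0 and (
            not started_empty or parent_pruned)
        if became_empty and len(new_parents) <= 1:
            replacement[c.cid] = new_parents[0] if new_parents else None
        else:
            kept[c.cid] = new_parents
    return kept


if __name__ == "__main__":
    history = [
        Commit("A", [], 1, 0),     # only touched filtered-out paths
        Commit("B", ["A"], 1, 1),  # becomes the new root
        Commit("C", ["B"], 0, 0),  # intentionally empty, kept
        Commit("D", ["C"], 1, 0),  # became empty, pruned
        Commit("E", ["D"], 1, 1),  # reparented onto C
    ]
    print(prune_became_empty(history))
```

Because every entry in `replacement` already points at a surviving commit (or at nothing), chains of pruned commits resolve in a single lookup.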
424
+ # How do I contribute?
425
+
426
+ See the [contributing guidelines](Documentation/Contributing.md).
427
+
428
+ # Is there a Code of Conduct?
429
+
430
+ Participants in the filter-repo community are expected to adhere to
431
+ the same standards as for the git project, so the [git Code of
432
+ Conduct](https://git.kernel.org/pub/scm/git/git.git/tree/CODE_OF_CONDUCT.md)
433
+ applies.
434
+
435
+ # Upstream Improvements
436
+
437
+ Work on filter-repo and [its
438
+ predecessor](https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/)
439
+ has also driven numerous improvements to fast-export and fast-import
440
+ (and occasionally other commands) in core git, based on things
441
+ filter-repo needs to do its work:
442
+
443
+ * git-2.28.0
444
+ * [fast-import: add new --date-format=raw-permissive format](
445
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d42a2fb72f)
446
+ * git-2.24.0
447
+ * [fast-export: handle nested tags](
448
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=941790d7de)
449
+ * [t9350: add tests for tags of things other than a commit](
450
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8d7d33c1ce)
451
+ * [fast-export: allow user to request tags be marked with --mark-tags](
452
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a1638cfe12)
453
+ * [fast-export: add support for --import-marks-if-exists](
454
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=208d69246e)
455
+ * [fast-import: add support for new 'alias' command](
456
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b8f50e5b60)
457
+ * [fast-import: allow tags to be identified by mark labels](
458
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f73b2aba05)
459
+ * [fast-import: fix handling of deleted tags](
460
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3164e6bd24)
461
+ * [fast-export: fix exporting a tag and nothing else](
462
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=af2abd870b)
463
+ * [git-fast-import.txt: clarify that multiple merge commits are allowed](
464
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d1387d3895)
465
+ * git-2.23.0
466
+ * [t9350: fix encoding test to actually test reencoding](
467
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32615ce762)
468
+ * [fast-import: support 'encoding' commit header](
469
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3edfcc65fd)
470
+ * [fast-export: avoid stripping encoding header if we cannot reencode](
471
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ccbfc96dc4)
472
+ * [fast-export: differentiate between explicitly UTF-8 and implicitly
473
+ UTF-8](
474
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=57a8be2cb0)
475
+ * [fast-export: do automatic reencoding of commit messages only if
476
+ requested](
477
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e80001f8fd)
478
+ * git-2.22.0
479
+ * [log,diff-tree: add --combined-all-paths option](
480
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d76ce4f734)
481
+ * [t9300: demonstrate bug with get-mark and empty orphan commits](
482
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=62edbec7de)
483
+ * [git-fast-import.txt: fix wording about where ls command can appear](
484
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a63c54a019)
485
+ * [fast-import: check most prominent commands first](
486
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=5056bb7646)
487
+ * [fast-import: only allow cat-blob requests where it makes sense](
488
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7ffde293f2)
489
+ * [fast-import: fix erroneous handling of get-mark with empty orphan
490
+ commits](
491
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cf7b857a77)
492
+ * [Honor core.precomposeUnicode in more places](
493
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8e712ef6fc)
494
+ * git-2.21.0
495
+ * [fast-export: convert sha1 to oid](
496
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=843b9e6d48)
497
+ * [git-fast-import.txt: fix documentation for --quiet option](
498
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f55c979b14)
499
+ * [git-fast-export.txt: clarify misleading documentation about rev-list
500
+ args](
501
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4532be7cba)
502
+ * [fast-export: use value from correct enum](
503
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b93b81e799)
504
+ * [fast-export: avoid dying when filtering by paths and old tags exist](
505
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=1f30c904b3)
506
+ * [fast-export: move commit rewriting logic into a function for reuse](
507
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f129c4275c)
508
+ * [fast-export: when using paths, avoid corrupt stream with non-existent
509
+ mark](
510
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cd13762d8f)
511
+ * [fast-export: ensure we export requested refs](
512
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=fdf31b6369)
513
+ * [fast-export: add --reference-excluded-parents option](
514
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=530ca19c02)
515
+ * [fast-import: remove unmaintained duplicate documentation](
516
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25dd3e4889)
517
+ * [fast-export: add a --show-original-ids option to show
518
+ original names](
519
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a965bb3116)
520
+ * [git-show-ref.txt: fix order of flags](
521
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=bd8d6f0def)
522
+ * git-2.20.0
523
+ * [update-ref: fix type of update_flags variable to
524
+ match its usage](
525
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e4c34855a2)
526
+ * [update-ref: allow --no-deref with --stdin](
527
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d345e9fbe7)
528
+ * git-1.7.3
529
+ * [fast-export: Fix dropping of files with --import-marks and path
530
+ limiting](
531
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4087a02e45)
532
+ * [fast-export: Add a --full-tree option](
533
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7f40ab0916)
534
+ * [fast-export: Fix output order of D/F changes](
535
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=060df62422)
536
+ * [fast-import: Improve robustness when D->F changes provided in wrong
537
+ order](
538
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=253fb5f889)
539
+ * git-1.6.4
540
+ * [fast-export: Set revs.topo_order before calling setup_revisions](
541
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=668f3aa776)
542
+ * [fast-export: Omit tags that tag trees](
543
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=02c48cd69b)
544
+ * [fast-export: Make sure we show actual ref names instead of "(null)"](
545
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2374502c6c)
546
+ * [fast-export: Do parent rewriting to avoid dropping relevant commits](
547
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32164131db)
548
+ * [fast-export: Add a --tag-of-filtered-object option for newly
549
+ dangling tags](
550
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2d8ad46919)
551
+ * [Add new fast-export testcases](
552
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25e0ca5dd6)
553
+ * [fast-export: Document the fact that git-rev-list arguments are
554
+ accepted](
555
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8af15d282e)
556
+ * git-1.6.3
557
+ * [git-filter-branch: avoid collisions with variables in eval'ed
558
+ commands](
559
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d5b0c97d13)
560
+ * [Correct missing SP characters in grammar comment at top of
561
+ fast-import.c](
562
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=98e1a4186a)
563
+ * [fast-export: Avoid dropping files from commits](
564
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ebeec7dbc5)
565
+ * git-1.6.1.4
566
+ * [fast-export: ensure we traverse commits in topological order](
567
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=784f8affe4)
temp2/README.md ADDED
@@ -0,0 +1,567 @@
1
+ git filter-repo is a versatile tool for rewriting history, which includes
2
+ [capabilities I have not found anywhere
3
+ else](#design-rationale-behind-filter-repo). It roughly falls into the
4
+ same space of tool as [git
5
+ filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
6
+ capitulation-inducing poor
7
+ [performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
8
+ with far more capabilities, and with a design that scales usability-wise
9
+ beyond trivial rewriting cases. [git filter-repo is now recommended by the
10
+ git project](https://git-scm.com/docs/git-filter-branch#_warning) instead
11
+ of git filter-branch.
12
+
13
+ While most users will probably just use filter-repo as a simple command
14
+ line tool (and likely only use a few of its flags), at its core filter-repo
15
+ contains a library for creating history rewriting tools. As such, users
16
+ with specialized needs can leverage it to quickly create [entirely new
17
+ history rewriting tools](contrib/filter-repo-demos).
18
+
19
+ # Table of Contents
20
+
21
+ * [Prerequisites](#prerequisites)
22
+ * [How do I install it?](#how-do-i-install-it)
23
+ * [How do I use it?](#how-do-i-use-it)
24
+ * [Why filter-repo instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
25
+ * [filter-branch](#filter-branch)
26
+ * [BFG Repo Cleaner](#bfg-repo-cleaner)
27
+ * [Simple example, with comparisons](#simple-example-with-comparisons)
28
+ * [Solving this with filter-repo](#solving-this-with-filter-repo)
29
+ * [Solving this with BFG Repo Cleaner](#solving-this-with-bfg-repo-cleaner)
30
+ * [Solving this with filter-branch](#solving-this-with-filter-branch)
31
+ * [Solving this with fast-export/fast-import](#solving-this-with-fast-exportfast-import)
32
+ * [Design rationale behind filter-repo](#design-rationale-behind-filter-repo)
33
+ * [How do I contribute?](#how-do-i-contribute)
34
+ * [Is there a Code of Conduct?](#is-there-a-code-of-conduct)
35
+ * [Upstream Improvements](#upstream-improvements)
36
+
37
+ # Prerequisites
38
+
39
+ filter-repo requires:
40
+
41
+ * git >= 2.22.0 at a minimum; [some features](#upstream-improvements)
42
+ require git >= 2.24.0
43
+ * python3 >= 3.5
44
+
45
+ # How do I install it?
46
+
47
+ `git-filter-repo` is a single-file python script, which was done to make
48
+ installation for basic use on many systems trivial: just place that
49
+ file into your $PATH.
50
+
51
+ See [INSTALL.md](INSTALL.md) for things beyond basic usage or special
52
+ cases. The more involved instructions are only needed if one of the
53
+ following apply:
54
+
55
+ * you do not find the above comment about trivial installation intuitively
56
+ obvious
57
+ * you are working with a python3 executable named something other than
58
+ "python3"
59
+ * you want to install documentation (beyond the builtin docs shown with -h)
60
+ * you want to run some of the [contrib](contrib/filter-repo-demos/) examples
61
+ * you want to create your own python filtering scripts using filter-repo as
62
+ a module/library
63
+
64
+ # How do I use it?
65
+
66
+ For comprehensive documentation:
67
+ * see the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html)
68
+ * alternative formatting of the user manual is available on various
69
+ external sites
70
+ ([example](https://www.mankier.com/1/git-filter-repo)), for those
71
+ that don't like the htmlpreview.github.io layout, though it may
72
+ only be up-to-date as of the latest release
73
+
74
+ If you prefer learning from examples:
75
+ * there is a [cheat sheet for converting filter-branch
76
+ commands](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage),
77
+ which covers every example from the filter-branch manual
78
+ * there is a [cheat sheet for converting BFG Repo Cleaner
79
+ commands](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg),
80
+ which covers every example from the BFG website
81
+ * the [simple example](#simple-example-with-comparisons) below may
82
+ be of interest
83
+ * the user manual has an extensive [examples
84
+ section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
85
+
86
+ # Why filter-repo instead of other alternatives?
87
+
88
+ This was covered in more detail in a [Git Rev News article on
89
+ filter-repo](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
90
+ but some highlights for the main competitors:
91
+
92
+ ## filter-branch
93
+
94
+ * filter-branch is [extremely to unusably
95
+ slow](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/)
96
+ ([multiple orders of magnitude slower than it should
97
+ be](https://git-scm.com/docs/git-filter-branch#PERFORMANCE))
98
+ for non-trivial repositories.
99
+
100
+ * [filter-branch is riddled with
101
+ gotchas](https://git-scm.com/docs/git-filter-branch#SAFETY) that can
102
+ silently corrupt your rewrite or at least thwart your "cleanup"
103
+ efforts by giving you something more problematic and messy than what
104
+ you started with.
105
+
106
+ * filter-branch is [very onerous](#simple-example-with-comparisons)
107
+ [to
108
+ use](https://github.com/newren/git-filter-repo/blob/a6a6a1b0f62d365bbe2e76f823e1621857ec4dbd/contrib/filter-repo-demos/filter-lamely#L9-L61)
109
+ for any rewrite which is even slightly non-trivial.
110
+
111
+ * the git project has stated that the above issues with filter-branch
112
+ cannot be backward compatibly fixed; they recommend that you [stop
113
+ using
114
+ filter-branch](https://git-scm.com/docs/git-filter-branch#_warning)
115
+
116
+ * die-hard fans of filter-branch may be interested in
117
+ [filter-lamely](contrib/filter-repo-demos/filter-lamely)
118
+ (a.k.a. [filter-branch-ish](contrib/filter-repo-demos/filter-branch-ish)),
119
+ a reimplementation of filter-branch based on filter-repo which is
120
+ more performant (though not nearly as fast or safe as
121
+ filter-repo).
122
+
123
+ * a [cheat
124
+ sheet](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
125
+ is available showing how to convert example commands from the manual of
126
+ filter-branch into filter-repo commands.
127
+
128
+ ## BFG Repo Cleaner
129
+
130
+ * great tool for its time, but while it makes some things simple, it
131
+ is limited to a few kinds of rewrites.
132
+
133
+ * its architecture is not amenable to handling more types of
134
+ rewrites.
135
+
136
+ * its architecture presents some shortcomings and bugs even for its
137
+ intended use case.
138
+
139
+ * fans of bfg may be interested in
140
+ [bfg-ish](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
141
+ based on filter-repo which includes several new features and bugfixes
142
+ relative to bfg.
143
+
144
+ * a [cheat
145
+ sheet](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg)
146
+ is available showing how to convert example commands from the manual of
147
+ BFG Repo Cleaner into filter-repo commands.
148
+
149
+ # Simple example, with comparisons
150
+
151
+ Let's say that we want to extract a piece of a repository, with the intent
152
+ of merging just that piece into some other bigger repo. For extraction, we
153
+ want to:
154
+
155
+ * extract the history of a single directory, src/. This means that only
156
+ paths under src/ remain in the repo, and any commits that only touched
157
+ paths outside this directory will be removed.
158
+ * rename all files to have a new leading directory, my-module/ (e.g. so that
159
+ src/foo.c becomes my-module/src/foo.c)
160
+ * rename any tags in the extracted repository to have a 'my-module-'
161
+ prefix (to avoid any conflicts when we later merge this repo into
162
+ something else)
163
+
164
+ ## Solving this with filter-repo
165
+
166
+ Doing this with filter-repo is as simple as the following command:
167
+ ```shell
168
+ git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
169
+ ```
170
+ (the single quotes are unnecessary, but make it clearer to a human that we
171
+ are replacing the empty string as a prefix with `my-module-`)
172
+
173
+ ## Solving this with BFG Repo Cleaner
174
+
175
+ BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all
176
+ three types of wanted changes are outside of its capabilities.
177
+
178
+ ## Solving this with filter-branch
179
+
180
+ filter-branch comes with a pile of caveats (more on that below) even
181
+ once you figure out the necessary invocation(s):
182
+
183
+ ```shell
184
+ git filter-branch \
185
+ --tree-filter 'mkdir -p my-module && \
186
+ git ls-files \
187
+ | grep -v ^src/ \
188
+ | xargs git rm -f -q && \
189
+ ls -d * \
190
+ | grep -v my-module \
191
+ | xargs -I files mv files my-module/' \
192
+ --tag-name-filter 'echo "my-module-$(cat)"' \
193
+ --prune-empty -- --all
194
+ git clone file://$(pwd) newcopy
195
+ cd newcopy
196
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
197
+ | grep -v refs/tags/my-module- \
198
+ | git update-ref --stdin
199
+ git gc --prune=now
200
+ ```
201
+
202
+ Some might notice that the above filter-branch invocation will be really
203
+ slow due to using --tree-filter; you could alternatively use the
204
+ --index-filter option of filter-branch, changing the above commands to:
205
+
206
+ ```shell
207
+ git filter-branch \
208
+ --index-filter 'git ls-files \
209
+ | grep -v ^src/ \
210
+ | xargs git rm -q --cached;
211
+ git ls-files -s \
212
+ | sed "s%$(printf \\t)%&my-module/%" \
213
+ | git update-index --index-info;
214
+ git ls-files \
215
+ | grep -v ^my-module/ \
216
+ | xargs git rm -q --cached' \
217
+ --tag-name-filter 'echo "my-module-$(cat)"' \
218
+ --prune-empty -- --all
219
+ git clone file://$(pwd) newcopy
220
+ cd newcopy
221
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
222
+ | grep -v refs/tags/my-module- \
223
+ | git update-ref --stdin
224
+ git gc --prune=now
225
+ ```
226
+
227
+ However, for either filter-branch command there are a pile of caveats.
228
+ First, some may be wondering why I list five commands here for
229
+ filter-branch. Despite the use of --all and --tag-name-filter, and
230
+ filter-branch's manpage claiming that a clone is enough to get rid of
231
+ old objects, the extra steps to delete the other tags and do another
232
+ gc are still required to clean out the old objects and avoid mixing
233
+ new and old history before pushing somewhere. Other caveats:
234
+ * Commit messages are not rewritten; so if some of your commit
235
+ messages refer to prior commits by (abbreviated) sha1, after the
236
+ rewrite those messages will now refer to commits that are no longer
237
+ part of the history. It would be better to rewrite those
238
+ (abbreviated) sha1 references to refer to the new commit ids.
239
+ * The --prune-empty flag sometimes misses commits that should be
240
+ pruned, and it will also prune commits that *started* empty rather
241
+ than just ended empty due to filtering. For repositories that
242
+ intentionally use empty commits for versioning and publishing
243
+ related purposes, this can be detrimental.
244
+ * The commands above are OS-specific. GNU vs. BSD issues for sed,
245
+ xargs, and other commands often trip up users; I think I failed to
246
+ get most folks to use --index-filter since the only example in the
247
+ filter-branch manpage that both uses it and shows how to move
248
+ everything into a subdirectory is linux-specific, and it is not
249
+ obvious to the reader that it has a portability issue since it
250
+ silently misbehaves rather than failing loudly.
251
+ * The --index-filter version of the filter-branch command may be two to
252
+ three times faster than the --tree-filter version, but both
253
+ filter-branch commands are going to be multiple orders of magnitude
254
+ slower than filter-repo.
255
+ * Both commands assume all filenames are composed entirely of ascii
256
+ characters (even special ascii characters such as tabs or double
257
+ quotes will wreak havoc and likely result in missing files or
258
+ misnamed files)
259
+
260
+ ## Solving this with fast-export/fast-import
261
+
262
+ One can kind of hack this together with something like:
263
+
264
+ ```shell
265
+ git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
266
+ --signed-tags=strip --tag-of-filtered-object=rewrite --all \
267
+ | grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
268
+ | grep -vP '^D (?!src/)' \
269
+ | perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
270
+ | perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
271
+ | perl -pe s%refs/tags/%refs/tags/my-module-% \
272
+ | git -c core.ignorecase=false fast-import --date-format=raw-permissive \
273
+ --force --quiet
274
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
275
+ | grep -v refs/tags/my-module- \
276
+ | git update-ref --stdin
277
+ git reset --hard
278
+ git reflog expire --expire=now --all
279
+ git gc --prune=now
280
+ ```
281
+
282
+ But this comes with some nasty caveats and limitations:
283
+ * The various greps and regex replacements operate on the entire
284
+ fast-export stream and thus might accidentally corrupt unintended
285
+ portions of it, such as commit messages. If you needed to edit
286
+ file contents and thus dropped the --no-data flag, it could also
287
+ end up corrupting file contents.
288
+ * This command assumes all filenames in the repository are composed
289
+ entirely of ascii characters, and also exclude special characters
290
+ such as tabs or double quotes. If such a special filename exists
291
+ within the old src/ directory, it will be pruned even though it
292
+ was intended to be kept. (In slightly different repository
293
+ rewrites, this type of editing also risks corrupting filenames
294
+ with special characters by adding extra double quotes near the end
295
+ of the filename and in some leading directory name.)
296
+ * This command will leave behind huge numbers of useless empty
297
+ commits, and has no realistic way of pruning them. (And if you
298
+ tried to combine this technique with another tool to prune the
299
+ empty commits, then you now have no way to distinguish between
300
+ commits which were made empty by the filtering that you want to
301
+ remove, and commits which were empty before the filtering process
302
+ and which you thus may want to keep.)
303
+ * Commit messages which reference other commits by hash will now
304
+ reference old commits that no longer exist. Attempting to edit
305
+ the commit messages to update them is extraordinarily difficult to
306
+ add to this kind of direct rewrite.
307
+
308
+ # Design rationale behind filter-repo
309
+
310
+ None of the existing repository filtering tools did what I wanted;
311
+ they all came up short for my needs. No tool provided any of the
312
+ first eight traits below I wanted, and no tool provided more than
313
+ two of the last four traits either:
314
+
315
+ 1. [Starting report] Provide user an analysis of their repo to help
316
+ them get started on what to prune or rename, instead of expecting
317
+ them to guess or find other tools to figure it out. (Triggered, e.g.
318
+ by running the first time with a special flag, such as --analyze.)
319
+
320
+ 1. [Keep vs. remove] Instead of just providing a way for users to
321
+ easily remove selected paths, also provide flags for users to
322
+ only *keep* certain paths. Sure, users could work around this by
323
+ specifying to remove all paths other than the ones they want to
324
+ keep, but the need to specify all paths that *ever* existed in
325
+ **any** version of the repository could sometimes be quite
326
+ painful. For filter-branch, using pipelines like `git ls-files |
327
+ grep -v ... | xargs -r git rm` might be a reasonable workaround
328
+ but can get unwieldy and isn't as straightforward for users; plus
329
+ those commands are often operating-system specific (can you spot
330
+ the GNUism in the snippet I provided?).
331
+
332
+ 1. [Renaming] It should be easy to rename paths. For example, in
333
+ addition to allowing one to treat some subdirectory as the root
334
+ of the repository, also provide options for users to make the
335
+ root of the repository just become a subdirectory. And more
336
+ generally allow files and directories to be easily renamed.
337
+ Provide sanity checks if renaming causes multiple files to exist
338
+ at the same path. (And add special handling so that if a commit
339
+ merely copied oldname->newname without modification, then
340
+ filtering oldname->newname doesn't trigger the sanity check and
341
+ die on that commit.)
342
+
343
+ 1. [More intelligent safety] Writing copies of the original refs to
344
+ a special namespace within the repo does not provide a
345
+ user-friendly recovery mechanism. Many would struggle to recover
346
+ using that. Almost everyone I've ever seen do a repository
347
+ filtering operation has done so with a fresh clone, because
348
+ wiping out the clone in case of error is a vastly easier recovery
349
+ mechanism. Strongly encourage that workflow by [detecting and
350
+ bailing if we're not in a fresh
351
+ clone](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#FRESHCLONE),
352
+ unless the user overrides with --force.
353
+
354
+ 1. [Auto shrink] Automatically remove old cruft and repack the
355
+ repository for the user after filtering (unless overridden); this
356
+ simplifies things for the user, helps avoid mixing old and new
357
+ history together, and avoids problems where the multi-step
358
+ process for shrinking the repo documented in the manpage doesn't
359
+ actually work in some cases. (I'm looking at you,
360
+ filter-branch.)
361
+
362
+ 1. [Clean separation] Avoid confusing users (and prevent accidental
363
+ re-pushing of old stuff) due to mixing old repo and rewritten
364
+ repo together. (This is particularly a problem with filter-branch
365
+ when using the --tag-name-filter option, and sometimes also an
366
+ issue when only filtering a subset of branches.)
367
+
368
+ 1. [Versatility] Provide the user the ability to extend the tool or
369
+ even write new tools that leverage existing capabilities, and
370
+ provide this extensibility in a way that (a) avoids the need to
371
+ fork separate processes (which would destroy performance), (b)
372
+ avoids making the user specify OS-dependent shell commands (which
373
+ would prevent users from sharing commands with each other), (c)
374
+ takes advantage of rich data structures (because hashes, dicts,
375
+ lists, and arrays are prohibitively difficult in shell) and (d)
376
+ provides reasonable string manipulation capabilities (which are
377
+ sorely lacking in shell).
378
+
379
+ 1. [Old commit references] Provide a way for users to use old commit
380
+ IDs with the new repository (in particular via mapping from old to
381
+ new hashes with refs/replace/ references).
382
+
383
+ 1. [Commit message consistency] If commit messages refer to other
384
+ commits by ID (e.g. "this reverts commit 01234567890abcdef", "In
385
+ commit 0013deadbeef9a..."), those commit messages should be
386
+ rewritten to refer to the new commit IDs.
387
+
388
+ 1. [Become-empty pruning] Commits which become empty due to filtering
389
+ should be pruned. If the parent of a commit is pruned, the first
390
+ non-pruned ancestor needs to become the new parent. If no
391
+ non-pruned ancestor exists and the commit was not a merge, then it
392
+ becomes a new root commit. If no non-pruned ancestor exists and
393
+ the commit was a merge, then the merge will have one less parent
394
+ (and thus make it likely to become a non-merge commit which would
395
+ itself be pruned if it had no file changes of its own). One
396
+ special thing to note here is that we prune commits which become
397
+ empty, NOT commits which start empty. Some projects intentionally
398
+ create empty commits for versioning or publishing reasons, and
399
+ these should not be removed. (As a special case, commits which
400
+ started empty but whose parent was pruned away will also be
401
+ considered to have "become empty".)
402
+
403
+ 1. [Become-degenerate pruning] Pruning of commits which become empty
404
+ can potentially cause topology changes, and there are lots of
405
+ special cases. Normally, merge commits are not removed since they
406
+ are needed to preserve the graph topology, but the pruning of
407
+ parents and other ancestors can ultimately result in the loss of
408
+ one or more parents. A simple case was already noted above: if a
409
+ merge commit loses enough parents to become a non-merge commit and
410
+ it has no file changes, then it too can be pruned. Merge commits
411
+ can also have a topology that becomes degenerate: it could end up
412
+ with the merge_base serving as both parents (if all intervening
413
+ commits from the original repo were pruned), or it could end up
414
+ with one parent which is an ancestor of its other parent. In such
415
+ cases, if the merge has no file changes of its own, then the merge
416
+ commit can also be pruned. However, much as we do with empty
417
+ pruning we do not prune merge commits that started degenerate
418
+ (which indicates it may have been intentional, such as with --no-ff
419
+ merges) but only merge commits that become degenerate and have no
420
+ file changes of their own.
421
+
422
+ 1. [Speed] Filtering should be reasonably fast
423
+
424
+ # How do I contribute?
425
+
426
+ See the [contributing guidelines](Documentation/Contributing.md).
427
+
428
+ # Is there a Code of Conduct?
429
+
430
+ Participants in the filter-repo community are expected to adhere to
431
+ the same standards as for the git project, so the [git Code of
432
+ Conduct](https://git.kernel.org/pub/scm/git/git.git/tree/CODE_OF_CONDUCT.md)
433
+ applies.
434
+
435
+ # Upstream Improvements
436
+
437
+ Work on filter-repo and [its
438
+ predecessor](https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/)
439
+ has also driven numerous improvements to fast-export and fast-import
440
+ (and occasionally other commands) in core git, based on things
441
+ filter-repo needs to do its work:
442
+
443
+ * git-2.28.0
444
+ * [fast-import: add new --date-format=raw-permissive format](
445
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d42a2fb72f)
446
+ * git-2.24.0
447
+ * [fast-export: handle nested tags](
448
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=941790d7de)
449
+ * [t9350: add tests for tags of things other than a commit](
450
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8d7d33c1ce)
451
+ * [fast-export: allow user to request tags be marked with --mark-tags](
452
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a1638cfe12)
453
+ * [fast-export: add support for --import-marks-if-exists](
454
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=208d69246e)
455
+ * [fast-import: add support for new 'alias' command](
456
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b8f50e5b60)
457
+ * [fast-import: allow tags to be identified by mark labels](
458
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f73b2aba05)
459
+ * [fast-import: fix handling of deleted tags](
460
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3164e6bd24)
461
+ * [fast-export: fix exporting a tag and nothing else](
462
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=af2abd870b)
463
+ * [git-fast-import.txt: clarify that multiple merge commits are allowed](
464
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d1387d3895)
465
+ * git-2.23.0
466
+ * [t9350: fix encoding test to actually test reencoding](
467
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32615ce762)
468
+ * [fast-import: support 'encoding' commit header](
469
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3edfcc65fd)
470
+ * [fast-export: avoid stripping encoding header if we cannot reencode](
471
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ccbfc96dc4)
472
+ * [fast-export: differentiate between explicitly UTF-8 and implicitly
473
+ UTF-8](
474
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=57a8be2cb0)
475
+ * [fast-export: do automatic reencoding of commit messages only if
476
+ requested](
477
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e80001f8fd)
478
+ * git-2.22.0
479
+ * [log,diff-tree: add --combined-all-paths option](
480
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d76ce4f734)
481
+ * [t9300: demonstrate bug with get-mark and empty orphan commits](
482
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=62edbec7de)
483
+ * [git-fast-import.txt: fix wording about where ls command can appear](
484
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a63c54a019)
485
+ * [fast-import: check most prominent commands first](
486
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=5056bb7646)
487
+ * [fast-import: only allow cat-blob requests where it makes sense](
488
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7ffde293f2)
489
+ * [fast-import: fix erroneous handling of get-mark with empty orphan
490
+ commits](
491
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cf7b857a77)
492
+ * [Honor core.precomposeUnicode in more places](
493
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8e712ef6fc)
494
+ * git-2.21.0
495
+ * [fast-export: convert sha1 to oid](
496
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=843b9e6d48)
497
+ * [git-fast-import.txt: fix documentation for --quiet option](
498
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f55c979b14)
499
+ * [git-fast-export.txt: clarify misleading documentation about rev-list
500
+ args](
501
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4532be7cba)
502
+ * [fast-export: use value from correct enum](
503
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b93b81e799)
504
+ * [fast-export: avoid dying when filtering by paths and old tags exist](
505
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=1f30c904b3)
506
+ * [fast-export: move commit rewriting logic into a function for reuse](
507
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f129c4275c)
508
+ * [fast-export: when using paths, avoid corrupt stream with non-existent
509
+ mark](
510
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cd13762d8f)
511
+ * [fast-export: ensure we export requested refs](
512
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=fdf31b6369)
513
+ * [fast-export: add --reference-excluded-parents option](
514
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=530ca19c02)
515
+ * [fast-import: remove unmaintained duplicate documentation](
516
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25dd3e4889)
517
+ * [fast-export: add a --show-original-ids option to show
518
+ original names](
519
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a965bb3116)
520
+ * [git-show-ref.txt: fix order of flags](
521
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=bd8d6f0def)
522
+ * git-2.20.0
523
+ * [update-ref: fix type of update_flags variable to
524
+ match its usage](
525
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e4c34855a2)
526
+ * [update-ref: allow --no-deref with --stdin](
527
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d345e9fbe7)
528
+ * git-1.7.3
529
+ * [fast-export: Fix dropping of files with --import-marks and path
530
+ limiting](
531
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4087a02e45)
532
+ * [fast-export: Add a --full-tree option](
533
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7f40ab0916)
534
+ * [fast-export: Fix output order of D/F changes](
535
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=060df62422)
536
+ * [fast-import: Improve robustness when D->F changes provided in wrong
537
+ order](
538
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=253fb5f889)
539
+ * git-1.6.4
540
+ * [fast-export: Set revs.topo_order before calling setup_revisions](
541
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=668f3aa776)
542
+ * [fast-export: Omit tags that tag trees](
543
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=02c48cd69b)
544
+ * [fast-export: Make sure we show actual ref names instead of "(null)"](
545
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2374502c6c)
546
+ * [fast-export: Do parent rewriting to avoid dropping relevant commits](
547
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32164131db)
548
+ * [fast-export: Add a --tag-of-filtered-object option for newly
549
+ dangling tags](
550
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2d8ad46919)
551
+ * [Add new fast-export testcases](
552
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25e0ca5dd6)
553
+ * [fast-export: Document the fact that git-rev-list arguments are
554
+ accepted](
555
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8af15d282e)
556
+ * git-1.6.3
557
+ * [git-filter-branch: avoid collisions with variables in eval'ed
558
+ commands](
559
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d5b0c97d13)
560
+ * [Correct missing SP characters in grammar comment at top of
561
+ fast-import.c](
562
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=98e1a4186a)
563
+ * [fast-export: Avoid dropping files from commits](
564
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ebeec7dbc5)
565
+ * git-1.6.1.4
566
+ * [fast-export: ensure we traverse commits in topological order](
567
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=784f8affe4)
temp3/README.md ADDED
@@ -0,0 +1,567 @@
1
+ git filter-repo is a versatile tool for rewriting history, which includes
2
+ [capabilities I have not found anywhere
3
+ else](#design-rationale-behind-filter-repo). It roughly falls into the
4
+ same space of tool as [git
5
+ filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
6
+ capitulation-inducing poor
7
+ [performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
8
+ with far more capabilities, and with a design that scales usability-wise
9
+ beyond trivial rewriting cases. [git filter-repo is now recommended by the
10
+ git project](https://git-scm.com/docs/git-filter-branch#_warning) instead
11
+ of git filter-branch.
12
+
13
+ While most users will probably just use filter-repo as a simple command
14
+ line tool (and likely only use a few of its flags), at its core filter-repo
15
+ contains a library for creating history rewriting tools. As such, users
16
+ with specialized needs can leverage it to quickly create [entirely new
17
+ history rewriting tools](contrib/filter-repo-demos).
18
+
19
+ # Table of Contents
20
+
21
+ * [Prerequisites](#prerequisites)
22
+ * [How do I install it?](#how-do-i-install-it)
23
+ * [How do I use it?](#how-do-i-use-it)
24
+ * [Why filter-repo instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
25
+ * [filter-branch](#filter-branch)
26
+ * [BFG Repo Cleaner](#bfg-repo-cleaner)
27
+ * [Simple example, with comparisons](#simple-example-with-comparisons)
28
+ * [Solving this with filter-repo](#solving-this-with-filter-repo)
29
+ * [Solving this with BFG Repo Cleaner](#solving-this-with-bfg-repo-cleaner)
30
+ * [Solving this with filter-branch](#solving-this-with-filter-branch)
31
+ * [Solving this with fast-export/fast-import](#solving-this-with-fast-exportfast-import)
32
+ * [Design rationale behind filter-repo](#design-rationale-behind-filter-repo)
33
+ * [How do I contribute?](#how-do-i-contribute)
34
+ * [Is there a Code of Conduct?](#is-there-a-code-of-conduct)
35
+ * [Upstream Improvements](#upstream-improvements)
36
+
37
+ # Prerequisites
38
+
39
+ filter-repo requires:
40
+
41
+ * git >= 2.22.0 at a minimum; [some features](#upstream-improvements)
42
+ require git >= 2.24.0
43
+ * python3 >= 3.5
44
+
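The version requirements above can be checked mechanically. The following is a minimal sketch, not something shipped with filter-repo: `version_ge` and `check_prereqs` are hypothetical helper names, and the parsing assumes that `git --version` prints a line of the form `git version X.Y.Z`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of filter-repo).

# version_ge A B: succeed if dotted version A >= B, comparing fields numerically.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0   # missing or non-numeric fields count as 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_prereqs: report whether the installed git and python3 meet the minimums.
check_prereqs() {
    git_ver=$(git --version | awk '{print $3}')
    py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    version_ge "$git_ver" 2.22.0 || { echo "git $git_ver is older than 2.22.0" >&2; return 1; }
    version_ge "$git_ver" 2.24.0 || echo "note: some features need git >= 2.24.0" >&2
    version_ge "$py_ver" 3.5 || { echo "python3 $py_ver is older than 3.5" >&2; return 1; }
}
```

Calling `check_prereqs` in the shell you intend to use prints nothing when every requirement is met. Fields are compared numerically so that, for example, python 3.10 is correctly treated as newer than 3.5.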
45
+ # How do I install it?
46
+
47
+ `git-filter-repo` is a single-file python script, which was done to make
48
+ installation for basic use on many systems trivial: just place that
49
+ file into your $PATH.
50
+
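Placing the script on `$PATH` might look like the sketch below. `install_script` and `dir_in_path` are hypothetical helper names introduced only for illustration, and the directory in the usage note is just an example; INSTALL.md remains the authority for anything beyond this basic case.

```shell
#!/bin/sh
# Hypothetical helpers (not part of filter-repo).

# install_script SRC DEST_DIR: copy SRC to DEST_DIR/git-filter-repo and make it executable.
install_script() {
    mkdir -p "$2" &&
    cp "$1" "$2/git-filter-repo" &&
    chmod +x "$2/git-filter-repo"
}

# dir_in_path DIR: succeed if DIR is one of the entries in $PATH.
dir_in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

For example, `install_script ./git-filter-repo "$HOME/.local/bin"` followed by `dir_in_path "$HOME/.local/bin"` tells you whether git will be able to find the script as the `filter-repo` subcommand.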
51
+ See [INSTALL.md](INSTALL.md) for things beyond basic usage or special
52
+ cases. The more involved instructions are only needed if one of the
53
+ following apply:
54
+
55
+ * you do not find the above comment about trivial installation intuitively
56
+ obvious
57
+ * you are working with a python3 executable named something other than
58
+ "python3"
59
+ * you want to install documentation (beyond the builtin docs shown with -h)
60
+ * you want to run some of the [contrib](contrib/filter-repo-demos/) examples
61
+ * you want to create your own python filtering scripts using filter-repo as
62
+ a module/library
63
+
64
+ # How do I use it?
65
+
66
+ For comprehensive documentation:
67
+ * see the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html)
68
+ * alternative formatting of the user manual is available on various
69
+ external sites
70
+ ([example](https://www.mankier.com/1/git-filter-repo)), for those
71
+ that don't like the htmlpreview.github.io layout, though it may
72
+ only be up-to-date as of the latest release
73
+
74
+ If you prefer learning from examples:
75
+ * there is a [cheat sheet for converting filter-branch
76
+ commands](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage),
77
+ which covers every example from the filter-branch manual
78
+ * there is a [cheat sheet for converting BFG Repo Cleaner
79
+ commands](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg),
80
+ which covers every example from the BFG website
81
+ * the [simple example](#simple-example-with-comparisons) below may
82
+ be of interest
83
+ * the user manual has an extensive [examples
84
+ section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
85
+
86
+ # Why filter-repo instead of other alternatives?
87
+
88
+ This was covered in more detail in a [Git Rev News article on
89
+ filter-repo](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
90
+ but some highlights for the main competitors:
91
+
92
+ ## filter-branch
93
+
94
+ * filter-branch is [extremely to unusably
95
+ slow](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/)
96
+ ([multiple orders of magnitude slower than it should
97
+ be](https://git-scm.com/docs/git-filter-branch#PERFORMANCE))
98
+ for non-trivial repositories.
99
+
100
+ * [filter-branch is riddled with
101
+ gotchas](https://git-scm.com/docs/git-filter-branch#SAFETY) that can
102
+ silently corrupt your rewrite or at least thwart your "cleanup"
103
+ efforts by giving you something more problematic and messy than what
104
+ you started with.
105
+
106
+ * filter-branch is [very onerous](#simple-example-with-comparisons)
107
+ [to
108
+ use](https://github.com/newren/git-filter-repo/blob/a6a6a1b0f62d365bbe2e76f823e1621857ec4dbd/contrib/filter-repo-demos/filter-lamely#L9-L61)
109
+ for any rewrite which is even slightly non-trivial.
110
+
111
+ * the git project has stated that the above issues with filter-branch
112
+ cannot be backward compatibly fixed; they recommend that you [stop
113
+ using
114
+ filter-branch](https://git-scm.com/docs/git-filter-branch#_warning)
115
+
116
+ * die-hard fans of filter-branch may be interested in
117
+ [filter-lamely](contrib/filter-repo-demos/filter-lamely)
118
+ (a.k.a. [filter-branch-ish](contrib/filter-repo-demos/filter-branch-ish)),
119
+ a reimplementation of filter-branch based on filter-repo which is
120
+ more performant (though not nearly as fast or safe as
121
+ filter-repo).
122
+
123
+ * a [cheat
124
+ sheet](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
125
+ is available showing how to convert example commands from the manual of
126
+ filter-branch into filter-repo commands.
127
+
128
+ ## BFG Repo Cleaner
129
+
130
+ * great tool for its time, but while it makes some things simple, it
131
+ is limited to a few kinds of rewrites.
132
+
133
+ * its architecture is not amenable to handling more types of
134
+ rewrites.
135
+
136
+ * its architecture presents some shortcomings and bugs even for its
137
+ intended use case.
138
+
139
+ * fans of bfg may be interested in
140
+ [bfg-ish](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
141
+ based on filter-repo which includes several new features and bugfixes
142
+ relative to bfg.
143
+
144
+ * a [cheat
145
+ sheet](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg)
146
+ is available showing how to convert example commands from the manual of
147
+ BFG Repo Cleaner into filter-repo commands.
148
+
149
+ # Simple example, with comparisons
150
+
151
+ Let's say that we want to extract a piece of a repository, with the intent
152
+ of merging just that piece into some other bigger repo. For extraction, we
153
+ want to:
154
+
155
+ * extract the history of a single directory, src/. This means that only
156
+ paths under src/ remain in the repo, and any commits that only touched
157
+ paths outside this directory will be removed.
158
+ * rename all files to have a new leading directory, my-module/ (e.g. so that
159
+ src/foo.c becomes my-module/src/foo.c)
160
+ * rename any tags in the extracted repository to have a 'my-module-'
161
+ prefix (to avoid any conflicts when we later merge this repo into
162
+ something else)
163
+
164
+ ## Solving this with filter-repo
165
+
166
+ Doing this with filter-repo is as simple as the following command:
167
+ ```shell
168
+ git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
169
+ ```
170
+ (the single quotes are unnecessary, but make it clearer to a human that we
171
+ are replacing the empty string as a prefix with `my-module-`)
172
+
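To sanity-check the result of the command above, it can help to spell out what each name should become. The sketch below only models the expected renames and does not touch a repository. `rename_prefix` and `expected_path` are hypothetical helper names, and the prefix-swap behaviour reflects my reading of the `OLD:NEW` form of `--tag-rename` described above, not filter-repo's actual implementation.

```shell
#!/bin/sh
# Hypothetical helpers that model the expected renames (assumption, not filter-repo code).

# rename_prefix OLD NEW NAME: if NAME starts with OLD, swap that prefix for NEW;
# otherwise print NAME unchanged. An empty OLD matches every name.
rename_prefix() {
    case "$3" in
        "$1"*) printf '%s%s\n' "$2" "${3#"$1"}" ;;
        *)     printf '%s\n' "$3" ;;
    esac
}

# expected_path PATH: a kept path such as src/foo.c should move under my-module/.
expected_path() {
    printf 'my-module/%s\n' "$1"
}
```

After running the filter-repo command in a fresh clone, `git tag -l` should then list only names that `rename_prefix '' 'my-module-'` would produce, and `git ls-files` should show only paths under `my-module/src/`.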
173
+ ## Solving this with BFG Repo Cleaner
174
+
175
+ BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all
176
+ three types of wanted changes are outside of its capabilities.
177
+
178
+ ## Solving this with filter-branch
179
+
180
+ filter-branch comes with a pile of caveats (more on that below) even
181
+ once you figure out the necessary invocation(s):
182
+
183
+ ```shell
184
+ git filter-branch \
185
+ --tree-filter 'mkdir -p my-module && \
186
+ git ls-files \
187
+ | grep -v ^src/ \
188
+ | xargs git rm -f -q && \
189
+ ls -d * \
190
+ | grep -v my-module \
191
+ | xargs -I files mv files my-module/' \
192
+ --tag-name-filter 'echo "my-module-$(cat)"' \
193
+ --prune-empty -- --all
194
+ git clone file://$(pwd) newcopy
195
+ cd newcopy
196
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
197
+ | grep -v refs/tags/my-module- \
198
+ | git update-ref --stdin
199
+ git gc --prune=now
200
+ ```
201
+
202
+ Some might notice that the above filter-branch invocation will be really
203
+ slow due to using --tree-filter; you could alternatively use the
204
+ --index-filter option of filter-branch, changing the above commands to:
205
+
206
+ ```shell
207
+ git filter-branch \
208
+ --index-filter 'git ls-files \
209
+ | grep -v ^src/ \
210
+ | xargs git rm -q --cached;
211
+ git ls-files -s \
212
+ | sed "s%$(printf \\t)%&my-module/%" \
213
+ | git update-index --index-info;
214
+ git ls-files \
215
+ | grep -v ^my-module/ \
216
+ | xargs git rm -q --cached' \
217
+ --tag-name-filter 'echo "my-module-$(cat)"' \
218
+ --prune-empty -- --all
219
+ git clone file://$(pwd) newcopy
220
+ cd newcopy
221
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
222
+ | grep -v refs/tags/my-module- \
223
+ | git update-ref --stdin
224
+ git gc --prune=now
225
+ ```
226
+
227
+ However, for either filter-branch command there is a pile of caveats.
228
+ First, some may be wondering why I list five commands here for
229
+ filter-branch. Despite the use of --all and --tag-name-filter, and
230
+ filter-branch's manpage claiming that a clone is enough to get rid of
231
+ old objects, the extra steps to delete the other tags and do another
232
+ gc are still required to clean out the old objects and avoid mixing
233
+ new and old history before pushing somewhere. Other caveats:
234
+ * Commit messages are not rewritten; so if some of your commit
235
+ messages refer to prior commits by (abbreviated) sha1, after the
236
+ rewrite those messages will now refer to commits that are no longer
237
+ part of the history. It would be better to rewrite those
238
+ (abbreviated) sha1 references to refer to the new commit ids.
239
+ * The --prune-empty flag sometimes misses commits that should be
240
+ pruned, and it will also prune commits that *started* empty rather
241
+ than just ended empty due to filtering. For repositories that
242
+ intentionally use empty commits for versioning and publishing
243
+ related purposes, this can be detrimental.
244
+ * The commands above are OS-specific. GNU vs. BSD issues for sed,
245
+ xargs, and other commands often trip up users; I think I failed to
246
+ get most folks to use --index-filter since the only example in the
247
+ filter-branch manpage that both uses it and shows how to move
248
+ everything into a subdirectory is linux-specific, and it is not
249
+ obvious to the reader that it has a portability issue since it
250
+ silently misbehaves rather than failing loudly.
251
+ * The --index-filter version of the filter-branch command may be two to
252
+ three times faster than the --tree-filter version, but both
253
+ filter-branch commands are going to be multiple orders of magnitude
254
+ slower than filter-repo.
255
+ * Both commands assume all filenames are composed entirely of ascii
256
+ characters (even special ascii characters such as tabs or double
257
+ quotes will wreak havoc and likely result in missing files or
258
+ misnamed files)
259
+
260
+ ## Solving this with fast-export/fast-import
261
+
262
+ One can kind of hack this together with something like:
263
+
264
+ ```shell
265
+ git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
266
+ --signed-tags=strip --tag-of-filtered-object=rewrite --all \
267
+ | grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
268
+ | grep -vP '^D (?!src/)' \
269
+ | perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
270
+ | perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
271
+ | perl -pe s%refs/tags/%refs/tags/my-module-% \
272
+ | git -c core.ignorecase=false fast-import --date-format=raw-permissive \
273
+ --force --quiet
274
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
275
+ | grep -v refs/tags/my-module- \
276
+ | git update-ref --stdin
277
+ git reset --hard
278
+ git reflog expire --expire=now --all
279
+ git gc --prune=now
280
+ ```
281
+
282
+ But this comes with some nasty caveats and limitations:
283
+ * The various greps and regex replacements operate on the entire
284
+ fast-export stream and thus might accidentally corrupt unintended
285
+ portions of it, such as commit messages. If you needed to edit
286
+ file contents and thus dropped the --no-data flag, it could also
287
+ end up corrupting file contents.
288
+ * This command assumes all filenames in the repository are composed
289
+ entirely of ascii characters, and also exclude special characters
290
+ such as tabs or double quotes. If such a special filename exists
291
+ within the old src/ directory, it will be pruned even though it
292
+ was intended to be kept. (In slightly different repository
293
+ rewrites, this type of editing also risks corrupting filenames
294
+ with special characters by adding extra double quotes near the end
295
+ of the filename and in some leading directory name.)
296
+ * This command will leave behind huge numbers of useless empty
297
+ commits, and has no realistic way of pruning them. (And if you
298
+ tried to combine this technique with another tool to prune the
299
+ empty commits, then you now have no way to distinguish between
300
+ commits which were made empty by the filtering that you want to
301
+ remove, and commits which were empty before the filtering process
302
+ and which you thus may want to keep.)
303
+ * Commit messages which reference other commits by hash will now
304
+ reference old commits that no longer exist. Attempting to edit
305
+ the commit messages to update them is extraordinarily difficult to
306
+ add to this kind of direct rewrite.
307
+
308
+ # Design rationale behind filter-repo
309
+
310
+ None of the existing repository filtering tools did what I wanted;
311
+ they all came up short for my needs. No tool provided any of the
312
+ first eight traits below I wanted, and no tool provided more than
313
+ two of the last four traits either:
314
+
315
+ 1. [Starting report] Provide users with an analysis of their repo to help
316
+ them get started on what to prune or rename, instead of expecting
317
+ them to guess or find other tools to figure it out. (Triggered, e.g.
318
+ by running the first time with a special flag, such as --analyze.)
319
+
320
+ 1. [Keep vs. remove] Instead of just providing a way for users to
321
+ easily remove selected paths, also provide flags for users to
322
+ only *keep* certain paths. Sure, users could work around this by
323
+ specifying to remove all paths other than the ones they want to
324
+ keep, but the need to specify all paths that *ever* existed in
325
+ **any** version of the repository could sometimes be quite
326
+ painful. For filter-branch, using pipelines like `git ls-files |
327
+ grep -v ... | xargs -r git rm` might be a reasonable workaround
328
+ but can get unwieldy and isn't as straightforward for users; plus
329
+ those commands are often operating-system specific (can you spot
330
+ the GNUism in the snippet I provided?).
331
+
332
+ 1. [Renaming] It should be easy to rename paths. For example, in
333
+ addition to allowing one to treat some subdirectory as the root
334
+ of the repository, also provide options for users to make the
335
+ root of the repository just become a subdirectory. And more
336
+ generally allow files and directories to be easily renamed.
337
+ Provide sanity checks if renaming causes multiple files to exist
338
+ at the same path. (And add special handling so that if a commit
339
+ merely copied oldname->newname without modification, then
340
+ filtering oldname->newname doesn't trigger the sanity check and
341
+ die on that commit.)
342
+
343
+ 1. [More intelligent safety] Writing copies of the original refs to
344
+ a special namespace within the repo does not provide a
345
+ user-friendly recovery mechanism. Many would struggle to recover
346
+ using that. Almost everyone I've ever seen do a repository
347
+ filtering operation has done so with a fresh clone, because
348
+ wiping out the clone in case of error is a vastly easier recovery
349
+ mechanism. Strongly encourage that workflow by [detecting and
350
+ bailing if we're not in a fresh
351
+ clone](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#FRESHCLONE),
352
+ unless the user overrides with --force.
353
+
354
+ 1. [Auto shrink] Automatically remove old cruft and repack the
355
+ repository for the user after filtering (unless overridden); this
356
+ simplifies things for the user, helps avoid mixing old and new
357
+ history together, and avoids problems where the multi-step
358
+ process for shrinking the repo documented in the manpage doesn't
359
+ actually work in some cases. (I'm looking at you,
360
+ filter-branch.)
361
+
362
+ 1. [Clean separation] Avoid confusing users (and prevent accidental
363
+ re-pushing of old stuff) due to mixing old repo and rewritten
364
+ repo together. (This is particularly a problem with filter-branch
365
+ when using the --tag-name-filter option, and sometimes also an
366
+ issue when only filtering a subset of branches.)
367
+
368
+ 1. [Versatility] Provide the user the ability to extend the tool or
369
+ even write new tools that leverage existing capabilities, and
370
+ provide this extensibility in a way that (a) avoids the need to
371
+ fork separate processes (which would destroy performance), (b)
372
+ avoids making the user specify OS-dependent shell commands (which
373
+ would prevent users from sharing commands with each other), (c)
374
+ takes advantage of rich data structures (because hashes, dicts,
375
+ lists, and arrays are prohibitively difficult in shell) and (d)
376
+ provides reasonable string manipulation capabilities (which are
377
+ sorely lacking in shell).
378
+
379
+ 1. [Old commit references] Provide a way for users to use old commit
380
+ IDs with the new repository (in particular via mapping from old to
381
+ new hashes with refs/replace/ references).
382
+
383
+ 1. [Commit message consistency] If commit messages refer to other
384
+ commits by ID (e.g. "this reverts commit 01234567890abcdef", "In
385
+ commit 0013deadbeef9a..."), those commit messages should be
386
+ rewritten to refer to the new commit IDs.
387
+
388
+ 1. [Become-empty pruning] Commits which become empty due to filtering
389
+ should be pruned. If the parent of a commit is pruned, the first
390
+ non-pruned ancestor needs to become the new parent. If no
391
+ non-pruned ancestor exists and the commit was not a merge, then it
392
+ becomes a new root commit. If no non-pruned ancestor exists and
393
+ the commit was a merge, then the merge will have one less parent
394
+ (and thus make it likely to become a non-merge commit which would
395
+ itself be pruned if it had no file changes of its own). One
396
+ special thing to note here is that we prune commits which become
397
+ empty, NOT commits which start empty. Some projects intentionally
398
+ create empty commits for versioning or publishing reasons, and
399
+ these should not be removed. (As a special case, commits which
400
+ started empty but whose parent was pruned away will also be
401
+ considered to have "become empty".)
402
+
403
+ 1. [Become-degenerate pruning] Pruning of commits which become empty
404
+ can potentially cause topology changes, and there are lots of
405
+ special cases. Normally, merge commits are not removed since they
406
+ are needed to preserve the graph topology, but the pruning of
407
+ parents and other ancestors can ultimately result in the loss of
408
+ one or more parents. A simple case was already noted above: if a
409
+ merge commit loses enough parents to become a non-merge commit and
410
+ it has no file changes, then it too can be pruned. Merge commits
411
+ can also have a topology that becomes degenerate: it could end up
412
+ with the merge_base serving as both parents (if all intervening
413
+ commits from the original repo were pruned), or it could end up
414
+ with one parent which is an ancestor of its other parent. In such
415
+ cases, if the merge has no file changes of its own, then the merge
416
+ commit can also be pruned. However, much as we do with empty
417
+ pruning, we do not prune merge commits that started degenerate
418
+ (which indicates it may have been intentional, such as with --no-ff
419
+ merges) but only merge commits that become degenerate and have no
420
+ file changes of their own.
421
+
422
+ 1. [Speed] Filtering should be reasonably fast.
423
+
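On the pipeline quoted in the [Keep vs. remove] item above: the GNUism is presumably the `-r` flag to `xargs`, which POSIX does not specify. A portable way to get the same "do nothing on empty input" behaviour can be sketched as follows. `run_unless_empty` is a hypothetical helper name, and the sketch inherits the filename caveats mentioned elsewhere in this README (whitespace and quotes in paths will still confuse `xargs`).

```shell
#!/bin/sh
# Hypothetical helper (not part of filter-repo or git).

# run_unless_empty CMD [ARGS...]: read lines from stdin; if there are none, do
# nothing and succeed; otherwise pass them to CMD via plain POSIX xargs.
run_unless_empty() {
    input=$(cat)
    [ -n "$input" ] || return 0
    printf '%s\n' "$input" | xargs "$@"
}
```

With this helper, a filter body could read `git ls-files | grep -v ^src/ | run_unless_empty git rm -q --cached`. That this needs a helper at all is a fair illustration of why path selection belongs in the tool itself.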
424
+ # How do I contribute?
425
+
426
+ See the [contributing guidelines](Documentation/Contributing.md).
427
+
428
+ # Is there a Code of Conduct?
429
+
430
+ Participants in the filter-repo community are expected to adhere to
431
+ the same standards as for the git project, so the [git Code of
432
+ Conduct](https://git.kernel.org/pub/scm/git/git.git/tree/CODE_OF_CONDUCT.md)
433
+ applies.
434
+
435
+ # Upstream Improvements
436
+
437
+ Work on filter-repo and [its
438
+ predecessor](https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/)
439
+ has also driven numerous improvements to fast-export and fast-import
440
+ (and occasionally other commands) in core git, based on things
441
+ filter-repo needs to do its work:
442
+
443
+ * git-2.28.0
444
+ * [fast-import: add new --date-format=raw-permissive format](
445
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d42a2fb72f)
446
+ * git-2.24.0
447
+ * [fast-export: handle nested tags](
448
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=941790d7de)
449
+ * [t9350: add tests for tags of things other than a commit](
450
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8d7d33c1ce)
451
+ * [fast-export: allow user to request tags be marked with --mark-tags](
452
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a1638cfe12)
453
+ * [fast-export: add support for --import-marks-if-exists](
454
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=208d69246e)
455
+ * [fast-import: add support for new 'alias' command](
456
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b8f50e5b60)
457
+ * [fast-import: allow tags to be identified by mark labels](
458
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f73b2aba05)
459
+ * [fast-import: fix handling of deleted tags](
460
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3164e6bd24)
461
+ * [fast-export: fix exporting a tag and nothing else](
462
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=af2abd870b)
463
+ * [git-fast-import.txt: clarify that multiple merge commits are allowed](
464
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d1387d3895)
465
+ * git-2.23.0
466
+ * [t9350: fix encoding test to actually test reencoding](
467
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32615ce762)
468
+ * [fast-import: support 'encoding' commit header](
469
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3edfcc65fd)
470
+ * [fast-export: avoid stripping encoding header if we cannot reencode](
471
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ccbfc96dc4)
472
+ * [fast-export: differentiate between explicitly UTF-8 and implicitly
473
+ UTF-8](
474
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=57a8be2cb0)
475
+ * [fast-export: do automatic reencoding of commit messages only if
476
+ requested](
477
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e80001f8fd)
478
+ * git-2.22.0
479
+ * [log,diff-tree: add --combined-all-paths option](
480
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d76ce4f734)
481
+ * [t9300: demonstrate bug with get-mark and empty orphan commits](
482
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=62edbec7de)
483
+ * [git-fast-import.txt: fix wording about where ls command can appear](
484
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a63c54a019)
485
+ * [fast-import: check most prominent commands first](
486
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=5056bb7646)
487
+ * [fast-import: only allow cat-blob requests where it makes sense](
488
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7ffde293f2)
489
+ * [fast-import: fix erroneous handling of get-mark with empty orphan
490
+ commits](
491
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cf7b857a77)
492
+ * [Honor core.precomposeUnicode in more places](
493
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8e712ef6fc)
494
+ * git-2.21.0
495
+ * [fast-export: convert sha1 to oid](
496
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=843b9e6d48)
497
+ * [git-fast-import.txt: fix documentation for --quiet option](
498
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f55c979b14)
499
+ * [git-fast-export.txt: clarify misleading documentation about rev-list
500
+ args](
501
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4532be7cba)
502
+ * [fast-export: use value from correct enum](
503
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b93b81e799)
504
+ * [fast-export: avoid dying when filtering by paths and old tags exist](
505
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=1f30c904b3)
506
+ * [fast-export: move commit rewriting logic into a function for reuse](
507
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f129c4275c)
508
+ * [fast-export: when using paths, avoid corrupt stream with non-existent
509
+ mark](
510
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cd13762d8f)
511
+ * [fast-export: ensure we export requested refs](
512
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=fdf31b6369)
513
+ * [fast-export: add --reference-excluded-parents option](
514
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=530ca19c02)
515
+ * [fast-import: remove unmaintained duplicate documentation](
516
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25dd3e4889)
517
+ * [fast-export: add a --show-original-ids option to show
518
+ original names](
519
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a965bb3116)
520
+ * [git-show-ref.txt: fix order of flags](
521
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=bd8d6f0def)
522
+ * git-2.20.0
523
+ * [update-ref: fix type of update_flags variable to
524
+ match its usage](
525
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e4c34855a2)
526
+ * [update-ref: allow --no-deref with --stdin](
527
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d345e9fbe7)
528
+ * git-1.7.3
529
+ * [fast-export: Fix dropping of files with --import-marks and path
530
+ limiting](
531
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4087a02e45)
532
+ * [fast-export: Add a --full-tree option](
533
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7f40ab0916)
534
+ * [fast-export: Fix output order of D/F changes](
535
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=060df62422)
536
+ * [fast-import: Improve robustness when D->F changes provided in wrong
537
+ order](
538
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=253fb5f889)
539
+ * git-1.6.4:
540
+ * [fast-export: Set revs.topo_order before calling setup_revisions](
541
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=668f3aa776)
542
+ * [fast-export: Omit tags that tag trees](
543
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=02c48cd69b)
544
+ * [fast-export: Make sure we show actual ref names instead of "(null)"](
545
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2374502c6c)
546
+ * [fast-export: Do parent rewriting to avoid dropping relevant commits](
547
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32164131db)
548
+ * [fast-export: Add a --tag-of-filtered-object option for newly
549
+ dangling tags](
550
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2d8ad46919)
551
+ * [Add new fast-export testcases](
552
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25e0ca5dd6)
553
+ * [fast-export: Document the fact that git-rev-list arguments are
554
+ accepted](
555
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8af15d282e)
556
+ * git-1.6.3:
557
+ * [git-filter-branch: avoid collisions with variables in eval'ed
558
+ commands](
559
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d5b0c97d13)
560
+ * [Correct missing SP characters in grammar comment at top of
561
+ fast-import.c](
562
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=98e1a4186a)
563
+ * [fast-export: Avoid dropping files from commits](
564
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ebeec7dbc5)
565
+ * git-1.6.1.4:
566
+ * [fast-export: ensure we traverse commits in topological order](
567
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=784f8affe4)
temp4/README.md ADDED
@@ -0,0 +1,567 @@
1
+ git filter-repo is a versatile tool for rewriting history, which includes
2
+ [capabilities I have not found anywhere
3
+ else](#design-rationale-behind-filter-repo). It roughly falls into the
4
+ same space of tool as [git
5
+ filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
6
+ capitulation-inducing poor
7
+ [performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
8
+ with far more capabilities, and with a design that scales usability-wise
9
+ beyond trivial rewriting cases. [git filter-repo is now recommended by the
10
+ git project](https://git-scm.com/docs/git-filter-branch#_warning) instead
11
+ of git filter-branch.
12
+
13
+ While most users will probably just use filter-repo as a simple command
14
+ line tool (and likely only use a few of its flags), at its core filter-repo
15
+ contains a library for creating history rewriting tools. As such, users
16
+ with specialized needs can leverage it to quickly create [entirely new
17
+ history rewriting tools](contrib/filter-repo-demos).
18
+
19
+ # Table of Contents
20
+
21
+ * [Prerequisites](#prerequisites)
22
+ * [How do I install it?](#how-do-i-install-it)
23
+ * [How do I use it?](#how-do-i-use-it)
24
+ * [Why filter-repo instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
25
+ * [filter-branch](#filter-branch)
26
+ * [BFG Repo Cleaner](#bfg-repo-cleaner)
27
+ * [Simple example, with comparisons](#simple-example-with-comparisons)
28
+ * [Solving this with filter-repo](#solving-this-with-filter-repo)
29
+ * [Solving this with BFG Repo Cleaner](#solving-this-with-bfg-repo-cleaner)
30
+ * [Solving this with filter-branch](#solving-this-with-filter-branch)
31
+ * [Solving this with fast-export/fast-import](#solving-this-with-fast-exportfast-import)
32
+ * [Design rationale behind filter-repo](#design-rationale-behind-filter-repo)
33
+ * [How do I contribute?](#how-do-i-contribute)
34
+ * [Is there a Code of Conduct?](#is-there-a-code-of-conduct)
35
+ * [Upstream Improvements](#upstream-improvements)
36
+
37
+ # Prerequisites
38
+
39
+ filter-repo requires:
40
+
41
+ * git >= 2.22.0 at a minimum; [some features](#upstream-improvements)
42
+ require git >= 2.24.0
43
+ * python3 >= 3.5
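A minimal sketch of checking the git prerequisite from a shell. The `version_ge` helper is a hypothetical name introduced here, not part of filter-repo, and the sketch assumes `git --version` prints the version number as its third whitespace-separated field:

```shell
# version_ge A B: succeed when dotted version A is >= dotted version B.
# Missing or non-numeric components are treated as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Only query git when it is actually installed.
if command -v git >/dev/null 2>&1; then
  git_version="$(git --version | awk '{print $3}')"
  if version_ge "$git_version" 2.22.0; then
    echo "git $git_version satisfies the 2.22.0 minimum"
  else
    echo "git $git_version is older than the 2.22.0 minimum"
  fi
fi
```

The same helper can be called with `2.24.0` as the second argument to check for the features that need the newer release.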
44
+
45
+ # How do I install it?
46
+
47
+ `git-filter-repo` is a single-file python script, which was done to make
48
+ installation for basic use on many systems trivial: just place that
49
+ file into your $PATH.
50
+
51
+ See [INSTALL.md](INSTALL.md) for things beyond basic usage or special
52
+ cases. The more involved instructions are only needed if one of the
53
+ following applies:
54
+
55
+ * you do not find the above comment about trivial installation intuitively
56
+ obvious
57
+ * you are working with a python3 executable named something other than
58
+ "python3"
59
+ * you want to install documentation (beyond the builtin docs shown with -h)
60
+ * you want to run some of the [contrib](contrib/filter-repo-demos/) examples
61
+ * you want to create your own python filtering scripts using filter-repo as
62
+ a module/library
63
+
64
+ # How do I use it?
65
+
66
+ For comprehensive documentation:
67
+ * see the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html)
68
+ * alternative formatting of the user manual is available on various
69
+ external sites
70
+ ([example](https://www.mankier.com/1/git-filter-repo)), for those
71
+ that don't like the htmlpreview.github.io layout, though it may
72
+ only be up-to-date as of the latest release
73
+
74
+ If you prefer learning from examples:
75
+ * there is a [cheat sheet for converting filter-branch
76
+ commands](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage),
77
+ which covers every example from the filter-branch manual
78
+ * there is a [cheat sheet for converting BFG Repo Cleaner
79
+ commands](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg),
80
+ which covers every example from the BFG website
81
+ * the [simple example](#simple-example-with-comparisons) below may
82
+ be of interest
83
+ * the user manual has an extensive [examples
84
+ section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
85
+
86
+ # Why filter-repo instead of other alternatives?
87
+
88
+ This was covered in more detail in a [Git Rev News article on
89
+ filter-repo](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
90
+ but some highlights for the main competitors:
91
+
92
+ ## filter-branch
93
+
94
+ * filter-branch is [extremely to unusably
95
+ slow](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/)
96
+ ([multiple orders of magnitude slower than it should
97
+ be](https://git-scm.com/docs/git-filter-branch#PERFORMANCE))
98
+ for non-trivial repositories.
99
+
100
+ * [filter-branch is riddled with
101
+ gotchas](https://git-scm.com/docs/git-filter-branch#SAFETY) that can
102
+ silently corrupt your rewrite or at least thwart your "cleanup"
103
+ efforts by giving you something more problematic and messy than what
104
+ you started with.
105
+
106
+ * filter-branch is [very onerous](#simple-example-with-comparisons)
107
+ [to
108
+ use](https://github.com/newren/git-filter-repo/blob/a6a6a1b0f62d365bbe2e76f823e1621857ec4dbd/contrib/filter-repo-demos/filter-lamely#L9-L61)
109
+ for any rewrite which is even slightly non-trivial.
110
+
111
+ * the git project has stated that the above issues with filter-branch
112
+ cannot be backward compatibly fixed; they recommend that you [stop
113
+ using
114
+ filter-branch](https://git-scm.com/docs/git-filter-branch#_warning)
115
+
116
+ * die-hard fans of filter-branch may be interested in
117
+ [filter-lamely](contrib/filter-repo-demos/filter-lamely)
118
+ (a.k.a. [filter-branch-ish](contrib/filter-repo-demos/filter-branch-ish)),
119
+ a reimplementation of filter-branch based on filter-repo which is
120
+ more performant (though not nearly as fast or safe as
121
+ filter-repo).
122
+
123
+ * a [cheat
124
+ sheet](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
125
+ is available showing how to convert example commands from the manual of
126
+ filter-branch into filter-repo commands.
127
+
128
+ ## BFG Repo Cleaner
129
+
130
+ * great tool for its time, but while it makes some things simple, it
131
+ is limited to a few kinds of rewrites.
132
+
133
+ * its architecture is not amenable to handling more types of
134
+ rewrites.
135
+
136
+ * its architecture presents some shortcomings and bugs even for its
137
+ intended use case.
138
+
139
+ * fans of bfg may be interested in
140
+ [bfg-ish](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
141
+ based on filter-repo which includes several new features and bugfixes
142
+ relative to bfg.
143
+
144
+ * a [cheat
145
+ sheet](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg)
146
+ is available showing how to convert example commands from the manual of
147
+ BFG Repo Cleaner into filter-repo commands.
148
+
149
+ # Simple example, with comparisons
150
+
151
+ Let's say that we want to extract a piece of a repository, with the intent
152
+ of merging just that piece into some other bigger repo. For extraction, we
153
+ want to:
154
+
155
+ * extract the history of a single directory, src/. This means that only
156
+ paths under src/ remain in the repo, and any commits that only touched
157
+ paths outside this directory will be removed.
158
+ * rename all files to have a new leading directory, my-module/ (e.g. so that
159
+ src/foo.c becomes my-module/src/foo.c)
160
+ * rename any tags in the extracted repository to have a 'my-module-'
161
+ prefix (to avoid any conflicts when we later merge this repo into
162
+ something else)
163
+
164
+ ## Solving this with filter-repo
165
+
166
+ Doing this with filter-repo is as simple as the following command:
167
+ ```shell
168
+ git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
169
+ ```
170
+ (the single quotes are unnecessary, but make it clearer to a human that we
171
+ are replacing the empty string as a prefix with `my-module-`)
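The remark about the quotes can be checked without touching a repository. This small sketch only shows how the shell assembles the argument; the variable names are illustrative:

```shell
# The shell joins adjacent quoted and unquoted pieces into a single word,
# so both spellings produce the same argument text.
quoted=$(printf '%s' '':'my-module-')
bare=$(printf '%s' :my-module-)

if [ "$quoted" = "$bare" ]; then
  echo "identical: $quoted"
fi
```

Either way the value passed to `--tag-rename` is `:my-module-`, that is, an empty old prefix followed by the new prefix.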
172
+
173
+ ## Solving this with BFG Repo Cleaner
174
+
175
+ BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all
176
+ three types of wanted changes are outside of its capabilities.
177
+
178
+ ## Solving this with filter-branch
179
+
180
+ filter-branch comes with a pile of caveats (more on that below) even
181
+ once you figure out the necessary invocation(s):
182
+
183
+ ```shell
184
+ git filter-branch \
185
+ --tree-filter 'mkdir -p my-module && \
186
+ git ls-files \
187
+ | grep -v ^src/ \
188
+ | xargs git rm -f -q && \
189
+ ls -d * \
190
+ | grep -v my-module \
191
+ | xargs -I files mv files my-module/' \
192
+ --tag-name-filter 'echo "my-module-$(cat)"' \
193
+ --prune-empty -- --all
194
+ git clone file://$(pwd) newcopy
195
+ cd newcopy
196
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
197
+ | grep -v refs/tags/my-module- \
198
+ | git update-ref --stdin
199
+ git gc --prune=now
200
+ ```
201
+
202
+ Some might notice that the above filter-branch invocation will be really
203
+ slow due to using --tree-filter; you could alternatively use the
204
+ --index-filter option of filter-branch, changing the above commands to:
205
+
206
+ ```shell
207
+ git filter-branch \
208
+ --index-filter 'git ls-files \
209
+ | grep -v ^src/ \
210
+ | xargs git rm -q --cached;
211
+ git ls-files -s \
212
+ | sed "s%$(printf \\t)%&my-module/%" \
213
+ | git update-index --index-info;
214
+ git ls-files \
215
+ | grep -v ^my-module/ \
216
+ | xargs git rm -q --cached' \
217
+ --tag-name-filter 'echo "my-module-$(cat)"' \
218
+ --prune-empty -- --all
219
+ git clone file://$(pwd) newcopy
220
+ cd newcopy
221
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
222
+ | grep -v refs/tags/my-module- \
223
+ | git update-ref --stdin
224
+ git gc --prune=now
225
+ ```
226
+
227
+ However, for either filter-branch command there are a pile of caveats.
228
+ First, some may be wondering why I list five commands here for
229
+ filter-branch. Despite the use of --all and --tag-name-filter, and
230
+ filter-branch's manpage claiming that a clone is enough to get rid of
231
+ old objects, the extra steps to delete the other tags and do another
232
+ gc are still required to clean out the old objects and avoid mixing
233
+ new and old history before pushing somewhere. Other caveats:
234
+ * Commit messages are not rewritten; so if some of your commit
235
+ messages refer to prior commits by (abbreviated) sha1, after the
236
+ rewrite those messages will now refer to commits that are no longer
237
+ part of the history. It would be better to rewrite those
238
+ (abbreviated) sha1 references to refer to the new commit ids.
239
+ * The --prune-empty flag sometimes misses commits that should be
240
+ pruned, and it will also prune commits that *started* empty rather
241
+ than just ended empty due to filtering. For repositories that
242
+ intentionally use empty commits for versioning and publishing
243
+ related purposes, this can be detrimental.
244
+ * The commands above are OS-specific. GNU vs. BSD issues for sed,
245
+ xargs, and other commands often trip up users; I think I failed to
246
+ get most folks to use --index-filter since the only example in the
247
+ filter-branch manpage that both uses it and shows how to move
248
+ everything into a subdirectory is linux-specific, and it is not
249
+ obvious to the reader that it has a portability issue since it
250
+ silently misbehaves rather than failing loudly.
251
+ * The --index-filter version of the filter-branch command may be two to
252
+ three times faster than the --tree-filter version, but both
253
+ filter-branch commands are going to be multiple orders of magnitude
254
+ slower than filter-repo.
255
+ * Both commands assume all filenames are composed entirely of ascii
256
+ characters (even special ascii characters such as tabs or double
257
+ quotes will wreak havoc and likely result in missing files or
258
+ misnamed files)
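To make the last caveat concrete, here is a sketch of what the `sed` step of the `--index-filter` variant does to individual lines. The sample lines imitate the `<mode> <object> <stage><TAB><path>` layout of `git ls-files -s`; the object ID is a made-up placeholder, and the second line assumes that git prints a path containing a tab in its double-quoted, escaped form:

```shell
# Same substitution as in the --index-filter above, written with
# printf '\t': match the first tab and append "my-module/" right after it.
add_prefix() {
  sed "s%$(printf '\t')%&my-module/%"
}

oid=0123456789abcdef0123456789abcdef01234567   # placeholder object ID

# Ordinary path: the prefix lands where intended.
plain="$(printf '100644 %s 0\tsrc/foo.c\n' "$oid" | add_prefix)"
printf '%s\n' "$plain"

# Path shown in quoted form (assumed output for a filename containing a
# tab): the prefix lands in front of the opening quote, so the entry no
# longer describes the original file name.
quoted="$(printf '100644 %s 0\t"src/a\\tb.c"\n' "$oid" | add_prefix)"
printf '%s\n' "$quoted"
```

In the second case the path field begins with `my-module/"src/`, which is one way such a filename can end up misnamed.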
259
+
260
+ ## Solving this with fast-export/fast-import
261
+
262
+ One can kind of hack this together with something like:
263
+
264
+ ```shell
265
+ git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
266
+ --signed-tags=strip --tag-of-filtered-object=rewrite --all \
267
+ | grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
268
+ | grep -vP '^D (?!src/)' \
269
+ | perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
270
+ | perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
271
+ | perl -pe s%refs/tags/%refs/tags/my-module-% \
272
+ | git -c core.ignorecase=false fast-import --date-format=raw-permissive \
273
+ --force --quiet
274
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
275
+ | grep -v refs/tags/my-module- \
276
+ | git update-ref --stdin
277
+ git reset --hard
278
+ git reflog expire --expire=now --all
279
+ git gc --prune=now
280
+ ```
281
+
282
+ But this comes with some nasty caveats and limitations:
283
+ * The various greps and regex replacements operate on the entire
284
+ fast-export stream and thus might accidentally corrupt unintended
285
+ portions of it, such as commit messages. If you needed to edit
286
+ file contents and thus dropped the --no-data flag, it could also
287
+ end up corrupting file contents.
288
+ * This command assumes all filenames in the repository are composed
289
+ entirely of ascii characters, and also exclude special characters
290
+ such as tabs or double quotes. If such a special filename exists
291
+ within the old src/ directory, it will be pruned even though it
292
+ was intended to be kept. (In slightly different repository
293
+ rewrites, this type of editing also risks corrupting filenames
294
+ with special characters by adding extra double quotes near the end
295
+ of the filename and in some leading directory name.)
296
+ * This command will leave behind huge numbers of useless empty
297
+ commits, and has no realistic way of pruning them. (And if you
298
+ tried to combine this technique with another tool to prune the
299
+ empty commits, then you now have no way to distinguish between
300
+ commits which were made empty by the filtering that you want to
301
+ remove, and commits which were empty before the filtering process
302
+ and which you thus may want to keep.)
303
+ * Commit messages which reference other commits by hash will now
304
+ reference old commits that no longer exist. Attempting to edit
305
+ the commit messages to update them is extraordinarily difficult to
306
+ add to this kind of direct rewrite.
307
+
308
+ # Design rationale behind filter-repo
309
+
310
+ None of the existing repository filtering tools did what I wanted;
311
+ they all came up short for my needs. No tool provided any of the
312
+ first eight traits below I wanted, and no tool provided more than
313
+ two of the last four traits either:
314
+
315
+ 1. [Starting report] Provide user an analysis of their repo to help
316
+ them get started on what to prune or rename, instead of expecting
317
+ them to guess or find other tools to figure it out. (Triggered, e.g.
318
+ by running the first time with a special flag, such as --analyze.)
319
+
320
+ 1. [Keep vs. remove] Instead of just providing a way for users to
321
+ easily remove selected paths, also provide flags for users to
322
+ only *keep* certain paths. Sure, users could work around this by
323
+ specifying to remove all paths other than the ones they want to
324
+ keep, but the need to specify all paths that *ever* existed in
325
+ **any** version of the repository could sometimes be quite
326
+ painful. For filter-branch, using pipelines like `git ls-files |
327
+ grep -v ... | xargs -r git rm` might be a reasonable workaround
328
+ but can get unwieldy and isn't as straightforward for users; plus
329
+ those commands are often operating-system specific (can you spot
330
+ the GNUism in the snippet I provided?).
331
+
332
+ 1. [Renaming] It should be easy to rename paths. For example, in
333
+ addition to allowing one to treat some subdirectory as the root
334
+ of the repository, also provide options for users to make the
335
+ root of the repository just become a subdirectory. And more
336
+ generally allow files and directories to be easily renamed.
337
+ Provide sanity checks if renaming causes multiple files to exist
338
+ at the same path. (And add special handling so that if a commit
339
+ merely copied oldname->newname without modification, then
340
+ filtering oldname->newname doesn't trigger the sanity check and
341
+ die on that commit.)
342
+
343
+ 1. [More intelligent safety] Writing copies of the original refs to
344
+ a special namespace within the repo does not provide a
345
+ user-friendly recovery mechanism. Many would struggle to recover
346
+ using that. Almost everyone I've ever seen do a repository
347
+ filtering operation has done so with a fresh clone, because
348
+ wiping out the clone in case of error is a vastly easier recovery
349
+ mechanism. Strongly encourage that workflow by [detecting and
350
+ bailing if we're not in a fresh
351
+ clone](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#FRESHCLONE),
352
+ unless the user overrides with --force.
353
+
354
+ 1. [Auto shrink] Automatically remove old cruft and repack the
355
+ repository for the user after filtering (unless overridden); this
356
+ simplifies things for the user, helps avoid mixing old and new
357
+ history together, and avoids problems where the multi-step
358
+ process for shrinking the repo documented in the manpage doesn't
359
+ actually work in some cases. (I'm looking at you,
360
+ filter-branch.)
361
+
362
+ 1. [Clean separation] Avoid confusing users (and prevent accidental
363
+ re-pushing of old stuff) due to mixing old repo and rewritten
364
+ repo together. (This is particularly a problem with filter-branch
365
+ when using the --tag-name-filter option, and sometimes also an
366
+ issue when only filtering a subset of branches.)
367
+
368
+ 1. [Versatility] Provide the user the ability to extend the tool or
369
+ even write new tools that leverage existing capabilities, and
370
+ provide this extensibility in a way that (a) avoids the need to
371
+ fork separate processes (which would destroy performance), (b)
372
+ avoids making the user specify OS-dependent shell commands (which
373
+ would prevent users from sharing commands with each other), (c)
374
+ takes advantage of rich data structures (because hashes, dicts,
375
+ lists, and arrays are prohibitively difficult in shell) and (d)
376
+ provides reasonable string manipulation capabilities (which are
377
+ sorely lacking in shell).
378
+
379
+ 1. [Old commit references] Provide a way for users to use old commit
380
+ IDs with the new repository (in particular via mapping from old to
381
+ new hashes with refs/replace/ references).
382
+
383
+ 1. [Commit message consistency] If commit messages refer to other
384
+ commits by ID (e.g. "this reverts commit 01234567890abcdef", "In
385
+ commit 0013deadbeef9a..."), those commit messages should be
386
+ rewritten to refer to the new commit IDs.
387
+
388
+ 1. [Become-empty pruning] Commits which become empty due to filtering
389
+ should be pruned. If the parent of a commit is pruned, the first
390
+ non-pruned ancestor needs to become the new parent. If no
391
+ non-pruned ancestor exists and the commit was not a merge, then it
392
+ becomes a new root commit. If no non-pruned ancestor exists and
393
+ the commit was a merge, then the merge will have one less parent
394
+ (and thus make it likely to become a non-merge commit which would
395
+ itself be pruned if it had no file changes of its own). One
396
+ special thing to note here is that we prune commits which become
397
+ empty, NOT commits which start empty. Some projects intentionally
398
+ create empty commits for versioning or publishing reasons, and
399
+ these should not be removed. (As a special case, commits which
400
+ started empty but whose parent was pruned away will also be
401
+ considered to have "become empty".)
402
+
403
+ 1. [Become-degenerate pruning] Pruning of commits which become empty
404
+ can potentially cause topology changes, and there are lots of
405
+ special cases. Normally, merge commits are not removed since they
406
+ are needed to preserve the graph topology, but the pruning of
407
+ parents and other ancestors can ultimately result in the loss of
408
+ one or more parents. A simple case was already noted above: if a
409
+ merge commit loses enough parents to become a non-merge commit and
410
+ it has no file changes, then it too can be pruned. Merge commits
411
+ can also have a topology that becomes degenerate: it could end up
412
+ with the merge_base serving as both parents (if all intervening
413
+ commits from the original repo were pruned), or it could end up
414
+ with one parent which is an ancestor of its other parent. In such
415
+ cases, if the merge has no file changes of its own, then the merge
416
+ commit can also be pruned. However, much as we do with empty
417
+ pruning we do not prune merge commits that started degenerate
418
+ (which indicates it may have been intentional, such as with --no-ff
419
+ merges) but only merge commits that become degenerate and have no
420
+ file changes of their own.
421
+
422
+ 1. [Speed] Filtering should be reasonably fast
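As a rough illustration of the commit message consistency trait listed above, the sketch below rewrites full commit IDs in a message using a two-column map. It is not how filter-repo implements this. The `rewrite_message` helper is hypothetical, and the map layout (an `old new` header followed by one pair per line) is an assumption loosely modeled on the `commit-map` file that filter-repo, as far as I know, writes under `.git/filter-repo/`. Abbreviated IDs are not handled here:

```shell
# rewrite_message MAP: read a commit message on stdin and replace every
# full old commit ID listed in MAP with the corresponding new ID.
rewrite_message() {
  msg="$(cat)"
  while read -r old new; do
    # Skip the header line and blank lines.
    case "$old" in old|'') continue ;; esac
    msg="$(printf '%s\n' "$msg" | sed "s/$old/$new/g")"
  done < "$1"
  printf '%s\n' "$msg"
}

# Made-up map and message to exercise the function.
map="$(mktemp)"
printf 'old new\n%s %s\n' \
  0123456789abcdef0123456789abcdef01234567 \
  89abcdef0123456789abcdef0123456789abcdef > "$map"

result="$(printf 'This reverts commit %s.\n' \
  0123456789abcdef0123456789abcdef01234567 | rewrite_message "$map")"
printf '%s\n' "$result"
rm -f "$map"
```

A real implementation also has to recognize abbreviated IDs and commits that were pruned, which is part of why this is awkward to bolt onto the shell pipelines shown earlier.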
423
+
424
+ # How do I contribute?
425
+
426
+ See the [contributing guidelines](Documentation/Contributing.md).
427
+
428
+ # Is there a Code of Conduct?
429
+
430
+ Participants in the filter-repo community are expected to adhere to
431
+ the same standards as for the git project, so the [git Code of
432
+ Conduct](https://git.kernel.org/pub/scm/git/git.git/tree/CODE_OF_CONDUCT.md)
433
+ applies.
434
+
435
+ # Upstream Improvements
436
+
437
+ Work on filter-repo and [its
438
+ predecessor](https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/)
439
+ has also driven numerous improvements to fast-export and fast-import
440
+ (and occasionally other commands) in core git, based on things
441
+ filter-repo needs to do its work:
442
+
443
+ * git-2.28.0
444
+ * [fast-import: add new --date-format=raw-permissive format](
445
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d42a2fb72f)
446
+ * git-2.24.0
447
+ * [fast-export: handle nested tags](
448
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=941790d7de)
449
+ * [t9350: add tests for tags of things other than a commit](
450
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8d7d33c1ce)
451
+ * [fast-export: allow user to request tags be marked with --mark-tags](
452
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a1638cfe12)
453
+ * [fast-export: add support for --import-marks-if-exists](
454
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=208d69246e)
455
+ * [fast-import: add support for new 'alias' command](
456
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b8f50e5b60)
457
+ * [fast-import: allow tags to be identified by mark labels](
458
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f73b2aba05)
459
+ * [fast-import: fix handling of deleted tags](
460
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3164e6bd24)
461
+ * [fast-export: fix exporting a tag and nothing else](
462
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=af2abd870b)
463
+ * [git-fast-import.txt: clarify that multiple merge commits are allowed](
464
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d1387d3895)
465
+ * git-2.23.0
466
+ * [t9350: fix encoding test to actually test reencoding](
467
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32615ce762)
468
+ * [fast-import: support 'encoding' commit header](
469
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3edfcc65fd)
470
+ * [fast-export: avoid stripping encoding header if we cannot reencode](
471
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ccbfc96dc4)
472
+ * [fast-export: differentiate between explicitly UTF-8 and implicitly
473
+ UTF-8](
474
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=57a8be2cb0)
475
+ * [fast-export: do automatic reencoding of commit messages only if
476
+ requested](
477
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e80001f8fd)
478
+ * git-2.22.0
479
+ * [log,diff-tree: add --combined-all-paths option](
480
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d76ce4f734)
481
+ * [t9300: demonstrate bug with get-mark and empty orphan commits](
482
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=62edbec7de)
483
+ * [git-fast-import.txt: fix wording about where ls command can appear](
484
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a63c54a019)
485
+ * [fast-import: check most prominent commands first](
486
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=5056bb7646)
487
+ * [fast-import: only allow cat-blob requests where it makes sense](
488
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7ffde293f2)
489
+ * [fast-import: fix erroneous handling of get-mark with empty orphan
490
+ commits](
491
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cf7b857a77)
492
+ * [Honor core.precomposeUnicode in more places](
493
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8e712ef6fc)
494
+ * git-2.21.0
495
+ * [fast-export: convert sha1 to oid](
496
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=843b9e6d48)
497
+ * [git-fast-import.txt: fix documentation for --quiet option](
498
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f55c979b14)
499
+ * [git-fast-export.txt: clarify misleading documentation about rev-list
500
+ args](
501
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4532be7cba)
502
+ * [fast-export: use value from correct enum](
503
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b93b81e799)
504
+ * [fast-export: avoid dying when filtering by paths and old tags exist](
505
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=1f30c904b3)
506
+ * [fast-export: move commit rewriting logic into a function for reuse](
507
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f129c4275c)
508
+ * [fast-export: when using paths, avoid corrupt stream with non-existent
509
+ mark](
510
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cd13762d8f)
511
+ * [fast-export: ensure we export requested refs](
512
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=fdf31b6369)
513
+ * [fast-export: add --reference-excluded-parents option](
514
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=530ca19c02)
515
+ * [fast-import: remove unmaintained duplicate documentation](
516
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25dd3e4889)
517
+ * [fast-export: add a --show-original-ids option to show
518
+ original names](
519
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a965bb3116)
520
+ * [git-show-ref.txt: fix order of flags](
521
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=bd8d6f0def)
522
+ * git-2.20.0
523
+ * [update-ref: fix type of update_flags variable to
524
+ match its usage](
525
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e4c34855a2)
526
+ * [update-ref: allow --no-deref with --stdin](
527
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d345e9fbe7)
528
+ * git-1.7.3
529
+ * [fast-export: Fix dropping of files with --import-marks and path
530
+ limiting](
531
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4087a02e45)
532
+ * [fast-export: Add a --full-tree option](
533
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7f40ab0916)
534
+ * [fast-export: Fix output order of D/F changes](
535
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=060df62422)
536
+ * [fast-import: Improve robustness when D->F changes provided in wrong
537
+ order](
538
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=253fb5f889)
539
+ * git-1.6.4:
540
+ * [fast-export: Set revs.topo_order before calling setup_revisions](
541
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=668f3aa776)
542
+ * [fast-export: Omit tags that tag trees](
543
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=02c48cd69b)
544
+ * [fast-export: Make sure we show actual ref names instead of "(null)"](
545
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2374502c6c)
546
+ * [fast-export: Do parent rewriting to avoid dropping relevant commits](
547
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32164131db)
548
+ * [fast-export: Add a --tag-of-filtered-object option for newly
549
+ dangling tags](
550
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2d8ad46919)
551
+ * [Add new fast-export testcases](
552
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25e0ca5dd6)
553
+ * [fast-export: Document the fact that git-rev-list arguments are
554
+ accepted](
555
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8af15d282e)
556
+ * git-1.6.3:
557
+ * [git-filter-branch: avoid collisions with variables in eval'ed
558
+ commands](
559
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d5b0c97d13)
560
+ * [Correct missing SP characters in grammar comment at top of
561
+ fast-import.c](
562
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=98e1a4186a)
563
+ * [fast-export: Avoid dropping files from commits](
564
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ebeec7dbc5)
565
+ * git-1.6.1.4:
566
+ * [fast-export: ensure we traverse commits in topological order](
567
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=784f8affe4)
temp5/README.md ADDED
@@ -0,0 +1,567 @@
1
+ git filter-repo is a versatile tool for rewriting history, which includes
2
+ [capabilities I have not found anywhere
3
+ else](#design-rationale-behind-filter-repo). It roughly falls into the
4
+ same space of tool as [git
5
+ filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
6
+ capitulation-inducing poor
7
+ [performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
8
+ with far more capabilities, and with a design that scales usability-wise
9
+ beyond trivial rewriting cases. [git filter-repo is now recommended by the
10
+ git project](https://git-scm.com/docs/git-filter-branch#_warning) instead
11
+ of git filter-branch.
12
+
13
+ While most users will probably just use filter-repo as a simple command
14
+ line tool (and likely only use a few of its flags), at its core filter-repo
15
+ contains a library for creating history rewriting tools. As such, users
16
+ with specialized needs can leverage it to quickly create [entirely new
17
+ history rewriting tools](contrib/filter-repo-demos).
18
+
19
+ # Table of Contents
20
+
21
+ * [Prerequisites](#prerequisites)
22
+ * [How do I install it?](#how-do-i-install-it)
23
+ * [How do I use it?](#how-do-i-use-it)
24
+ * [Why filter-repo instead of other alternatives?](#why-filter-repo-instead-of-other-alternatives)
25
+ * [filter-branch](#filter-branch)
26
+ * [BFG Repo Cleaner](#bfg-repo-cleaner)
27
+ * [Simple example, with comparisons](#simple-example-with-comparisons)
28
+ * [Solving this with filter-repo](#solving-this-with-filter-repo)
29
+ * [Solving this with BFG Repo Cleaner](#solving-this-with-bfg-repo-cleaner)
30
+ * [Solving this with filter-branch](#solving-this-with-filter-branch)
31
+ * [Solving this with fast-export/fast-import](#solving-this-with-fast-exportfast-import)
32
+ * [Design rationale behind filter-repo](#design-rationale-behind-filter-repo)
33
+ * [How do I contribute?](#how-do-i-contribute)
34
+ * [Is there a Code of Conduct?](#is-there-a-code-of-conduct)
35
+ * [Upstream Improvements](#upstream-improvements)
36
+
37
+ # Prerequisites
38
+
39
+ filter-repo requires:
40
+
41
+ * git >= 2.22.0 at a minimum; [some features](#upstream-improvements)
42
+ require git >= 2.24.0
43
+ * python3 >= 3.5
44
+
45
+ # How do I install it?
46
+
47
+ `git-filter-repo` is a single-file python script, which was done to make
48
+ installation for basic use on many systems trivial: just place that
49
+ file into your $PATH.
50
+
51
+ See [INSTALL.md](INSTALL.md) for things beyond basic usage or special
52
+ cases. The more involved instructions are only needed if one of the
53
+ following apply:
54
+
55
+ * you do not find the above comment about trivial installation intuitively
56
+ obvious
57
+ * you are working with a python3 executable named something other than
58
+ "python3"
59
+ * you want to install documentation (beyond the builtin docs shown with -h)
60
+ * you want to run some of the [contrib](contrib/filter-repo-demos/) examples
61
+ * you want to create your own python filtering scripts using filter-repo as
62
+ a module/library
63
+
64
+ # How do I use it?
65
+
66
+ For comprehensive documentation:
67
+ * see the [user manual](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html)
68
+ * alternative formatting of the user manual is available on various
69
+ external sites
70
+ ([example](https://www.mankier.com/1/git-filter-repo)), for those
71
+ that don't like the htmlpreview.github.io layout, though it may
72
+ only be up-to-date as of the latest release
73
+
74
+ If you prefer learning from examples:
75
+ * there is a [cheat sheet for converting filter-branch
76
+ commands](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage),
77
+ which covers every example from the filter-branch manual
78
+ * there is a [cheat sheet for converting BFG Repo Cleaner
79
+ commands](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg),
80
+ which covers every example from the BFG website
81
+ * the [simple example](#simple-example-with-comparisons) below may
82
+ be of interest
83
+ * the user manual has an extensive [examples
84
+ section](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
85
+
86
+ # Why filter-repo instead of other alternatives?
87
+
88
+ This was covered in more detail in a [Git Rev News article on
89
+ filter-repo](https://git.github.io/rev_news/2019/08/21/edition-54/#an-introduction-to-git-filter-repo--written-by-elijah-newren),
90
+ but some highlights for the main competitors:
91
+
92
+ ## filter-branch
93
+
94
+ * filter-branch is [extremely to unusably
95
+ slow](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/)
96
+ ([multiple orders of magnitude slower than it should
97
+ be](https://git-scm.com/docs/git-filter-branch#PERFORMANCE))
98
+ for non-trivial repositories.
99
+
100
+ * [filter-branch is riddled with
101
+ gotchas](https://git-scm.com/docs/git-filter-branch#SAFETY) that can
102
+ silently corrupt your rewrite or at least thwart your "cleanup"
103
+ efforts by giving you something more problematic and messy than what
104
+ you started with.
105
+
106
+ * filter-branch is [very onerous](#simple-example-with-comparisons)
107
+ [to
108
+ use](https://github.com/newren/git-filter-repo/blob/a6a6a1b0f62d365bbe2e76f823e1621857ec4dbd/contrib/filter-repo-demos/filter-lamely#L9-L61)
109
+ for any rewrite which is even slightly non-trivial.
110
+
111
+ * the git project has stated that the above issues with filter-branch
112
+ cannot be backward compatibly fixed; they recommend that you [stop
113
+ using
114
+ filter-branch](https://git-scm.com/docs/git-filter-branch#_warning)
115
+
116
+ * die-hard fans of filter-branch may be interested in
117
+ [filter-lamely](contrib/filter-repo-demos/filter-lamely)
118
+ (a.k.a. [filter-branch-ish](contrib/filter-repo-demos/filter-branch-ish)),
119
+ a reimplementation of filter-branch based on filter-repo which is
120
+ more performant (though not nearly as fast or safe as
121
+ filter-repo).
122
+
123
+ * a [cheat
124
+ sheet](Documentation/converting-from-filter-branch.md#cheat-sheet-conversion-of-examples-from-the-filter-branch-manpage)
125
+ is available showing how to convert example commands from the manual of
126
+ filter-branch into filter-repo commands.
127
+
128
+ ## BFG Repo Cleaner
129
+
130
+ * great tool for its time, but while it makes some things simple, it
131
+ is limited to a few kinds of rewrites.
132
+
133
+ * its architecture is not amenable to handling more types of
134
+ rewrites.
135
+
136
+ * its architecture presents some shortcomings and bugs even for its
137
+ intended use case.
138
+
139
+ * fans of bfg may be interested in
140
+ [bfg-ish](contrib/filter-repo-demos/bfg-ish), a reimplementation of bfg
141
+ based on filter-repo which includes several new features and bugfixes
142
+ relative to bfg.
143
+
144
+ * a [cheat
145
+ sheet](Documentation/converting-from-bfg-repo-cleaner.md#cheat-sheet-conversion-of-examples-from-bfg)
146
+ is available showing how to convert example commands from the manual of
147
+ BFG Repo Cleaner into filter-repo commands.
148
+
149
+ # Simple example, with comparisons
150
+
151
+ Let's say that we want to extract a piece of a repository, with the intent
152
+ of merging just that piece into some other bigger repo. For extraction, we
153
+ want to:
154
+
155
+ * extract the history of a single directory, src/. This means that only
156
+ paths under src/ remain in the repo, and any commits that only touched
157
+ paths outside this directory will be removed.
158
+ * rename all files to have a new leading directory, my-module/ (e.g. so that
159
+ src/foo.c becomes my-module/src/foo.c)
160
+ * rename any tags in the extracted repository to have a 'my-module-'
161
+ prefix (to avoid any conflicts when we later merge this repo into
162
+ something else)
163
+
164
+ ## Solving this with filter-repo
165
+
166
+ Doing this with filter-repo is as simple as the following command:
167
+ ```shell
168
+ git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-'
169
+ ```
170
+ (the single quotes are unnecessary, but make it clearer to a human that we
171
+ are replacing the empty string as a prefix with `my-module-`)
172
+
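To make the three rules concrete, here is a minimal Python sketch of what they do to individual path and tag names. It only illustrates the intended behavior; it is not how filter-repo is implemented, and `rewrite_path` and `rename_tag` are hypothetical helper names invented for this example.

```python
from typing import Optional


def rewrite_path(path: str, keep_prefix: str = "src/",
                 new_root: str = "my-module/") -> Optional[str]:
    """Return the rewritten path, or None if the path is filtered out.

    Hypothetical helper: models `--path src/` followed by
    `--to-subdirectory-filter my-module`.
    """
    if not path.startswith(keep_prefix):
        return None  # paths outside src/ are dropped from history
    return new_root + path


def rename_tag(tag: str, old_prefix: str = "",
               new_prefix: str = "my-module-") -> str:
    """Replace old_prefix at the start of a tag name with new_prefix.

    Hypothetical helper: models `--tag-rename '':'my-module-'`.
    """
    if tag.startswith(old_prefix):
        return new_prefix + tag[len(old_prefix):]
    return tag


paths = ["src/foo.c", "docs/README", "src/lib/bar.h", "srcfile"]
kept = [p for p in (rewrite_path(p) for p in paths) if p is not None]
print(kept)                # ['my-module/src/foo.c', 'my-module/src/lib/bar.h']
print(rename_tag("v1.0"))  # my-module-v1.0
```

Note that `srcfile` is dropped: the trailing slash in `src/` means only paths inside that directory match.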
173
+ ## Solving this with BFG Repo Cleaner
174
+
175
+ BFG Repo Cleaner is not capable of this kind of rewrite; in fact, all
176
+ three types of wanted changes are outside of its capabilities.
177
+
178
+ ## Solving this with filter-branch
179
+
180
+ filter-branch comes with a pile of caveats (more on that below) even
181
+ once you figure out the necessary invocation(s):
182
+
183
+ ```shell
184
+ git filter-branch \
185
+ --tree-filter 'mkdir -p my-module && \
186
+ git ls-files \
187
+ | grep -v ^src/ \
188
+ | xargs git rm -f -q && \
189
+ ls -d * \
190
+ | grep -v my-module \
191
+ | xargs -I files mv files my-module/' \
192
+ --tag-name-filter 'echo "my-module-$(cat)"' \
193
+ --prune-empty -- --all
194
+ git clone file://$(pwd) newcopy
195
+ cd newcopy
196
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
197
+ | grep -v refs/tags/my-module- \
198
+ | git update-ref --stdin
199
+ git gc --prune=now
200
+ ```
201
+
202
+ Some might notice that the above filter-branch invocation will be really
203
+ slow due to using --tree-filter; you could alternatively use the
204
+ --index-filter option of filter-branch, changing the above commands to:
205
+
206
+ ```shell
207
+ git filter-branch \
208
+ --index-filter 'git ls-files \
209
+ | grep -v ^src/ \
210
+ | xargs git rm -q --cached;
211
+ git ls-files -s \
212
+ | sed "s%$(printf \\t)%&my-module/%" \
213
+ | git update-index --index-info;
214
+ git ls-files \
215
+ | grep -v ^my-module/ \
216
+ | xargs git rm -q --cached' \
217
+ --tag-name-filter 'echo "my-module-$(cat)"' \
218
+ --prune-empty -- --all
219
+ git clone file://$(pwd) newcopy
220
+ cd newcopy
221
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
222
+ | grep -v refs/tags/my-module- \
223
+ | git update-ref --stdin
224
+ git gc --prune=now
225
+ ```
226
+
227
+ However, for either filter-branch command there is a pile of caveats.
228
+ First, some may be wondering why I list five commands here for
229
+ filter-branch. Despite the use of --all and --tag-name-filter, and
230
+ filter-branch's manpage claiming that a clone is enough to get rid of
231
+ old objects, the extra steps to delete the other tags and do another
232
+ gc are still required to clean out the old objects and avoid mixing
233
+ new and old history before pushing somewhere. Other caveats:
234
+ * Commit messages are not rewritten; so if some of your commit
235
+ messages refer to prior commits by (abbreviated) sha1, after the
236
+ rewrite those messages will now refer to commits that are no longer
237
+ part of the history. It would be better to rewrite those
238
+ (abbreviated) sha1 references to refer to the new commit ids.
239
+ * The --prune-empty flag sometimes misses commits that should be
240
+ pruned, and it will also prune commits that *started* empty rather
241
+ than just ended empty due to filtering. For repositories that
242
+ intentionally use empty commits for versioning and publishing
243
+ related purposes, this can be detrimental.
244
+ * The commands above are OS-specific. GNU vs. BSD issues for sed,
245
+ xargs, and other commands often trip up users; I think I failed to
246
+ get most folks to use --index-filter since the only example in the
247
+ filter-branch manpage that both uses it and shows how to move
248
+ everything into a subdirectory is linux-specific, and it is not
249
+ obvious to the reader that it has a portability issue since it
250
+ silently misbehaves rather than failing loudly.
251
+ * The --index-filter version of the filter-branch command may be two to
252
+ three times faster than the --tree-filter version, but both
253
+ filter-branch commands are going to be multiple orders of magnitude
254
+ slower than filter-repo.
255
+ * Both commands assume all filenames are composed entirely of ascii
256
+ characters (even special ascii characters such as tabs or double
257
+ quotes will wreak havoc and likely result in missing files or
258
+ misnamed files)
259
+
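The first caveat above (stale commit IDs in commit messages) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rewrite_hashes` is a hypothetical function, the old-to-new mapping is assumed to be available as a plain dict of full 40-character IDs, and the IDs shown are made up.

```python
import re


def rewrite_hashes(message: str, old_to_new: dict) -> str:
    """Replace (abbreviated) old commit IDs in a message with new ones.

    Each replacement keeps the abbreviation length used in the message.
    Hex-looking words that match no old commit, or match more than one,
    are left untouched.
    """
    def replace(match):
        abbrev = match.group(0)
        candidates = [new for old, new in old_to_new.items()
                      if old.startswith(abbrev)]
        if len(candidates) != 1:
            return abbrev  # unknown or ambiguous: leave as-is
        return candidates[0][:len(abbrev)]

    # 7 to 40 hex digits delimited by word boundaries
    return re.sub(r"\b[0-9a-f]{7,40}\b", replace, message)


# Made-up IDs, for illustration only.
mapping = {
    "0123456789abcdef0123456789abcdef01234567":
    "fedcba9876543210fedcba9876543210fedcba98",
}
print(rewrite_hashes("This reverts commit 0123456789ab.", mapping))
# This reverts commit fedcba987654.
```

Doing the same from inside a filter-branch shell snippet would require carrying this mapping across separately forked processes, which is part of why the caveat exists.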
260
+ ## Solving this with fast-export/fast-import
261
+
262
+ One can kind of hack this together with something like:
263
+
264
+ ```shell
265
+ git fast-export --no-data --reencode=yes --mark-tags --fake-missing-tagger \
266
+ --signed-tags=strip --tag-of-filtered-object=rewrite --all \
267
+ | grep -vP '^M [0-9]+ [0-9a-f]+ (?!src/)' \
268
+ | grep -vP '^D (?!src/)' \
269
+ | perl -pe 's%^(M [0-9]+ [0-9a-f]+ )(.*)$%\1my-module/\2%' \
270
+ | perl -pe 's%^(D )(.*)$%\1my-module/\2%' \
271
+ | perl -pe s%refs/tags/%refs/tags/my-module-% \
272
+ | git -c core.ignorecase=false fast-import --date-format=raw-permissive \
273
+ --force --quiet
274
+ git for-each-ref --format="delete %(refname)" refs/tags/ \
275
+ | grep -v refs/tags/my-module- \
276
+ | git update-ref --stdin
277
+ git reset --hard
278
+ git reflog expire --expire=now --all
279
+ git gc --prune=now
280
+ ```
281
+
282
+ But this comes with some nasty caveats and limitations:
283
+ * The various greps and regex replacements operate on the entire
284
+ fast-export stream and thus might accidentally corrupt unintended
285
+ portions of it, such as commit messages. If you needed to edit
286
+ file contents and thus dropped the --no-data flag, it could also
287
+ end up corrupting file contents.
288
+ * This command assumes all filenames in the repository are composed
289
+ entirely of ascii characters, and also exclude special characters
290
+ such as tabs or double quotes. If such a special filename exists
291
+ within the old src/ directory, it will be pruned even though it
292
+ was intended to be kept. (In slightly different repository
293
+ rewrites, this type of editing also risks corrupting filenames
294
+ with special characters by adding extra double quotes near the end
295
+ of the filename and in some leading directory name.)
296
+ * This command will leave behind huge numbers of useless empty
297
+ commits, and has no realistic way of pruning them. (And if you
298
+ tried to combine this technique with another tool to prune the
299
+ empty commits, then you now have no way to distinguish between
300
+ commits which were made empty by the filtering that you want to
301
+ remove, and commits which were empty before the filtering process
302
+ and which you thus may want to keep.)
303
+ * Commit messages which reference other commits by hash will now
304
+ reference old commits that no longer exist. Attempting to edit
305
+ the commit messages to update them is extraordinarily difficult to
306
+ add to this kind of direct rewrite.
307
+
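The first caveat in this list is easy to reproduce. The Python sketch below applies the same two "drop" patterns used in the pipeline above to a small hand-written fragment that merely resembles a fast-export stream (it is abbreviated and is not real `git fast-export` output). Because the filter is line-based and has no notion of where a commit message starts or ends, it also deletes a commit message line that happens to begin with `D `.

```python
import re

# Hand-written, abbreviated fragment resembling a fast-export stream.
# e69de29... is the well-known ID of the empty blob.
stream = [
    "commit refs/heads/main",
    "data 27",
    "D is for deleting old docs",  # a line of the commit message
    "M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 src/foo.c",
    "M 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 docs/guide.txt",
    "D docs/old.txt",
]

# Same patterns as the two `grep -vP` stages in the pipeline above.
drop_m = re.compile(r"^M [0-9]+ [0-9a-f]+ (?!src/)")
drop_d = re.compile(r"^D (?!src/)")


def naive_filter(lines):
    """Drop every line matching either pattern, wherever it occurs."""
    return [line for line in lines
            if not drop_m.search(line) and not drop_d.search(line)]


kept = naive_filter(stream)
lost = [line for line in stream if line not in kept]
for line in lost:
    print("dropped:", line)
```

The commit message line is lost along with the two intended file changes, and the `data 27` length header no longer matches what follows it, so the stream handed to fast-import is no longer well-formed.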
308
+ # Design rationale behind filter-repo
309
+
310
+ None of the existing repository filtering tools did what I wanted;
311
+ they all came up short for my needs. No tool provided any of the
312
+ first eight traits below I wanted, and no tool provided more than
313
+ two of the last four traits either:
314
+
315
+ 1. [Starting report] Provide the user an analysis of their repo to help
316
+ them get started on what to prune or rename, instead of expecting
317
+ them to guess or find other tools to figure it out. (Triggered, e.g.
318
+ by running the first time with a special flag, such as --analyze.)
319
+
320
+ 1. [Keep vs. remove] Instead of just providing a way for users to
321
+ easily remove selected paths, also provide flags for users to
322
+ only *keep* certain paths. Sure, users could work around this by
323
+ specifying to remove all paths other than the ones they want to
324
+ keep, but the need to specify all paths that *ever* existed in
325
+ **any** version of the repository could sometimes be quite
326
+ painful. For filter-branch, using pipelines like `git ls-files |
327
+ grep -v ... | xargs -r git rm` might be a reasonable workaround
328
+ but can get unwieldy and isn't as straightforward for users; plus
329
+ those commands are often operating-system specific (can you spot
330
+ the GNUism in the snippet I provided?).
331
+
332
+ 1. [Renaming] It should be easy to rename paths. For example, in
333
+ addition to allowing one to treat some subdirectory as the root
334
+ of the repository, also provide options for users to make the
335
+ root of the repository just become a subdirectory. And more
336
+ generally allow files and directories to be easily renamed.
337
+ Provide sanity checks if renaming causes multiple files to exist
338
+ at the same path. (And add special handling so that if a commit
339
+ merely copied oldname->newname without modification, then
340
+ filtering oldname->newname doesn't trigger the sanity check and
341
+ die on that commit.)
342
+
343
+ 1. [More intelligent safety] Writing copies of the original refs to
344
+ a special namespace within the repo does not provide a
345
+ user-friendly recovery mechanism. Many would struggle to recover
346
+ using that. Almost everyone I've ever seen do a repository
347
+ filtering operation has done so with a fresh clone, because
348
+ wiping out the clone in case of error is a vastly easier recovery
349
+ mechanism. Strongly encourage that workflow by [detecting and
350
+ bailing if we're not in a fresh
351
+ clone](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#FRESHCLONE),
352
+ unless the user overrides with --force.
353
+
354
+ 1. [Auto shrink] Automatically remove old cruft and repack the
355
+ repository for the user after filtering (unless overridden); this
356
+ simplifies things for the user, helps avoid mixing old and new
357
+ history together, and avoids problems where the multi-step
358
+ process for shrinking the repo documented in the manpage doesn't
359
+ actually work in some cases. (I'm looking at you,
360
+ filter-branch.)
361
+
362
+ 1. [Clean separation] Avoid confusing users (and prevent accidental
363
+ re-pushing of old stuff) due to mixing old repo and rewritten
364
+ repo together. (This is particularly a problem with filter-branch
365
+ when using the --tag-name-filter option, and sometimes also an
366
+ issue when only filtering a subset of branches.)
367
+
368
+ 1. [Versatility] Provide the user the ability to extend the tool or
369
+ even write new tools that leverage existing capabilities, and
370
+ provide this extensibility in a way that (a) avoids the need to
371
+ fork separate processes (which would destroy performance), (b)
372
+ avoids making the user specify OS-dependent shell commands (which
373
+ would prevent users from sharing commands with each other), (c)
374
+ takes advantage of rich data structures (because hashes, dicts,
375
+ lists, and arrays are prohibitively difficult in shell) and (d)
376
+ provides reasonable string manipulation capabilities (which are
377
+ sorely lacking in shell).
378
+
379
+ 1. [Old commit references] Provide a way for users to use old commit
380
+ IDs with the new repository (in particular via mapping from old to
381
+ new hashes with refs/replace/ references).
382
+
383
+ 1. [Commit message consistency] If commit messages refer to other
384
+ commits by ID (e.g. "this reverts commit 01234567890abcdef", "In
385
+ commit 0013deadbeef9a..."), those commit messages should be
386
+ rewritten to refer to the new commit IDs.
387
+
388
+ 1. [Become-empty pruning] Commits which become empty due to filtering
389
+ should be pruned. If the parent of a commit is pruned, the first
390
+ non-pruned ancestor needs to become the new parent. If no
391
+ non-pruned ancestor exists and the commit was not a merge, then it
392
+ becomes a new root commit. If no non-pruned ancestor exists and
393
+ the commit was a merge, then the merge will have one less parent
394
+ (and thus make it likely to become a non-merge commit which would
395
+ itself be pruned if it had no file changes of its own). One
396
+ special thing to note here is that we prune commits which become
397
+ empty, NOT commits which start empty. Some projects intentionally
398
+ create empty commits for versioning or publishing reasons, and
399
+ these should not be removed. (As a special case, commits which
400
+ started empty but whose parent was pruned away will also be
401
+ considered to have "become empty".)
402
+
403
+ 1. [Become-degenerate pruning] Pruning of commits which become empty
404
+ can potentially cause topology changes, and there are lots of
405
+ special cases. Normally, merge commits are not removed since they
406
+ are needed to preserve the graph topology, but the pruning of
407
+ parents and other ancestors can ultimately result in the loss of
408
+ one or more parents. A simple case was already noted above: if a
409
+ merge commit loses enough parents to become a non-merge commit and
410
+ it has no file changes, then it too can be pruned. Merge commits
411
+ can also have a topology that becomes degenerate: it could end up
412
+ with the merge_base serving as both parents (if all intervening
413
+ commits from the original repo were pruned), or it could end up
414
+ with one parent which is an ancestor of its other parent. In such
415
+ cases, if the merge has no file changes of its own, then the merge
416
+ commit can also be pruned. However, much as we do with empty
417
+ pruning, we do not prune merge commits that started degenerate
418
+ (which indicates it may have been intentional, such as with --no-ff
419
+ merges) but only merge commits that become degenerate and have no
420
+ file changes of their own.
421
+
422
+ 1. [Speed] Filtering should be reasonably fast.
423
+
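The become-empty pruning rules in the list above can be sketched as a single pass over the commits in topological order. This is a simplified illustration, not filter-repo's implementation: the function name, the dict-based commit records, and the `had_changes` / `has_changes` flags (file changes before and after filtering) are all assumptions made for this example, and the ancestor-of-other-parent case of become-degenerate pruning is not handled.

```python
def prune_became_empty(commits):
    """Return {commit id: new parent list} for the commits that survive.

    `commits` must list parents before children. A commit is pruned when
    it has no file changes after filtering, it either had changes before
    filtering or lost a parent to pruning, and it is left with at most
    one surviving parent.
    """
    nearest = {}      # id -> itself if kept, else nearest kept ancestor (or None)
    new_parents = {}
    for c in commits:
        parents = []
        for p in c["parents"]:
            kept = nearest[p]
            if kept is not None and kept not in parents:
                parents.append(kept)
        parent_pruned = any(nearest[p] != p for p in c["parents"])
        became_empty = (not c["has_changes"]
                        and (c["had_changes"] or parent_pruned))
        if became_empty and len(parents) <= 1:
            # Children of this commit get re-parented to its survivor.
            nearest[c["id"]] = parents[0] if parents else None
        else:
            nearest[c["id"]] = c["id"]
            new_parents[c["id"]] = parents
    return new_parents


history = [
    {"id": "A", "parents": [],    "had_changes": True,  "has_changes": True},
    {"id": "B", "parents": ["A"], "had_changes": True,  "has_changes": False},
    {"id": "C", "parents": ["B"], "had_changes": True,  "has_changes": True},
    {"id": "D", "parents": ["C"], "had_changes": False, "has_changes": False},
]
print(prune_became_empty(history))
# {'A': [], 'C': ['A'], 'D': ['C']}
```

Here `B` only touched filtered-out paths, so it is pruned and `C` is re-parented onto `A`, while `D` started empty and is therefore kept.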
424
+ # How do I contribute?
425
+
426
+ See the [contributing guidelines](Documentation/Contributing.md).
427
+
428
+ # Is there a Code of Conduct?
429
+
430
+ Participants in the filter-repo community are expected to adhere to
431
+ the same standards as for the git project, so the [git Code of
432
+ Conduct](https://git.kernel.org/pub/scm/git/git.git/tree/CODE_OF_CONDUCT.md)
433
+ applies.
434
+
435
+ # Upstream Improvements
436
+
437
+ Work on filter-repo and [its
438
+ predecessor](https://public-inbox.org/git/51419b2c0904072035u1182b507o836a67ac308d32b9@mail.gmail.com/)
439
+ has also driven numerous improvements to fast-export and fast-import
440
+ (and occasionally other commands) in core git, based on things
441
+ filter-repo needs to do its work:
442
+
443
+ * git-2.28.0
444
+ * [fast-import: add new --date-format=raw-permissive format](
445
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d42a2fb72f)
446
+ * git-2.24.0
447
+ * [fast-export: handle nested tags](
448
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=941790d7de)
449
+ * [t9350: add tests for tags of things other than a commit](
450
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8d7d33c1ce)
451
+ * [fast-export: allow user to request tags be marked with --mark-tags](
452
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a1638cfe12)
453
+ * [fast-export: add support for --import-marks-if-exists](
454
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=208d69246e)
455
+ * [fast-import: add support for new 'alias' command](
456
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b8f50e5b60)
457
+ * [fast-import: allow tags to be identified by mark labels](
458
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f73b2aba05)
459
+ * [fast-import: fix handling of deleted tags](
460
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3164e6bd24)
461
+ * [fast-export: fix exporting a tag and nothing else](
462
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=af2abd870b)
463
+ * [git-fast-import.txt: clarify that multiple merge commits are allowed](
464
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d1387d3895)
465
+ * git-2.23.0
466
+ * [t9350: fix encoding test to actually test reencoding](
467
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32615ce762)
468
+ * [fast-import: support 'encoding' commit header](
469
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=3edfcc65fd)
470
+ * [fast-export: avoid stripping encoding header if we cannot reencode](
471
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ccbfc96dc4)
472
+ * [fast-export: differentiate between explicitly UTF-8 and implicitly
473
+ UTF-8](
474
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=57a8be2cb0)
475
+ * [fast-export: do automatic reencoding of commit messages only if
476
+ requested](
477
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e80001f8fd)
478
+ * git-2.22.0
479
+ * [log,diff-tree: add --combined-all-paths option](
480
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d76ce4f734)
481
+ * [t9300: demonstrate bug with get-mark and empty orphan commits](
482
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=62edbec7de)
483
+ * [git-fast-import.txt: fix wording about where ls command can appear](
484
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a63c54a019)
485
+ * [fast-import: check most prominent commands first](
486
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=5056bb7646)
487
+ * [fast-import: only allow cat-blob requests where it makes sense](
488
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7ffde293f2)
489
+ * [fast-import: fix erroneous handling of get-mark with empty orphan
490
+ commits](
491
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cf7b857a77)
492
+ * [Honor core.precomposeUnicode in more places](
493
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8e712ef6fc)
494
+ * git-2.21.0
495
+ * [fast-export: convert sha1 to oid](
496
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=843b9e6d48)
497
+ * [git-fast-import.txt: fix documentation for --quiet option](
498
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f55c979b14)
499
+ * [git-fast-export.txt: clarify misleading documentation about rev-list
500
+ args](
501
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4532be7cba)
502
+ * [fast-export: use value from correct enum](
503
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=b93b81e799)
504
+ * [fast-export: avoid dying when filtering by paths and old tags exist](
505
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=1f30c904b3)
506
+ * [fast-export: move commit rewriting logic into a function for reuse](
507
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=f129c4275c)
508
+ * [fast-export: when using paths, avoid corrupt stream with non-existent
509
+ mark](
510
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=cd13762d8f)
511
+ * [fast-export: ensure we export requested refs](
512
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=fdf31b6369)
513
+ * [fast-export: add --reference-excluded-parents option](
514
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=530ca19c02)
515
+ * [fast-import: remove unmaintained duplicate documentation](
516
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25dd3e4889)
517
+ * [fast-export: add a --show-original-ids option to show
518
+ original names](
519
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=a965bb3116)
520
+ * [git-show-ref.txt: fix order of flags](
521
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=bd8d6f0def)
522
+ * git-2.20.0
523
+ * [update-ref: fix type of update_flags variable to
524
+ match its usage](
525
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=e4c34855a2)
526
+ * [update-ref: allow --no-deref with --stdin](
527
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d345e9fbe7)
528
+ * git-1.7.3
529
+ * [fast-export: Fix dropping of files with --import-marks and path
530
+ limiting](
531
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=4087a02e45)
532
+ * [fast-export: Add a --full-tree option](
533
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=7f40ab0916)
534
+ * [fast-export: Fix output order of D/F changes](
535
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=060df62422)
536
+ * [fast-import: Improve robustness when D->F changes provided in wrong
537
+ order](
538
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=253fb5f889)
539
+ * git-1.6.4
540
+ * [fast-export: Set revs.topo_order before calling setup_revisions](
541
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=668f3aa776)
542
+ * [fast-export: Omit tags that tag trees](
543
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=02c48cd69b)
544
+ * [fast-export: Make sure we show actual ref names instead of "(null)"](
545
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2374502c6c)
546
+ * [fast-export: Do parent rewriting to avoid dropping relevant commits](
547
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=32164131db)
548
+ * [fast-export: Add a --tag-of-filtered-object option for newly
549
+ dangling tags](
550
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=2d8ad46919)
551
+ * [Add new fast-export testcases](
552
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=25e0ca5dd6)
553
+ * [fast-export: Document the fact that git-rev-list arguments are
554
+ accepted](
555
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=8af15d282e)
556
+ * git-1.6.3
557
+ * [git-filter-branch: avoid collisions with variables in eval'ed
558
+ commands](
559
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=d5b0c97d13)
560
+ * [Correct missing SP characters in grammar comment at top of
561
+ fast-import.c](
562
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=98e1a4186a)
563
+ * [fast-export: Avoid dropping files from commits](
564
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=ebeec7dbc5)
565
+ * git-1.6.1.4
566
+ * [fast-export: ensure we traverse commits in topological order](
567
+ https://git.kernel.org/pub/scm/git/git.git/commit/?id=784f8affe4)