
Re: GIT for pdiff generation



>>   As we are no git gurus ourselves: Does anyone out there see any trouble
>>   doing it this way? It means storing something around 7GB of
>>   uncompressed text files in git, plus the daily changes happening to
>>   them, then diffing them in the way described above, however the
>>   archive will only need to go back for a couple of weeks and therefore
>>   we should be able to apply git gc --prune (coupled with whatever way
>>   to actually tell git that everything before $DATE can be removed) to
>>   keep the size down.
> AFAIK, there can be trouble.  It all depends on how you're structuring
> the data in git, and the size of the largest data object you will want
> to commit to the repository.

Right now the source contents file of unstable is 220MB unpacked.
(Gzipped it's 28MB, while the binary contents files are 18MB each,
packed.)

Let's add a safety margin: 350MB is a good guess for the largest object.
A Packages file hardly counts compared to them; unpacked it's just
some 34MB.

> There is an alternative: git can rewrite the entire history
> (invalidating all commit IDs from the start pointing up to all the
> branch heads in the process).  You can use that facility to drop old
> commits.  Given the intended use, where you don't seem to need the
> commit ids to be constant across runs and you will rewrite the history
> of the entire repo at once and drop everything that was not rewritten,
> this is likely the less ugly way of doing what you want.  Refer to git
> filter-branch.

It's the one and only case I have ever seen where "history rewrite" is
actually something one wants to do.
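
For illustration, here is a minimal sketch of how dropping everything
before $DATE might look. It is not a tested recipe. It assumes a single
branch with linear history, and the helper names (git, oldest_kept,
has_parents, drop_history_before) and the default branch name "master"
are made up for this example. It swaps in one possible technique: graft
the oldest commit to keep into a root commit with git replace, then let
git filter-branch make that graft permanent.

```python
import subprocess
from typing import Optional


def git(repo: str, *args: str) -> str:
    """Run one git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def oldest_kept(rev_list_output: str) -> Optional[str]:
    """Pick the first id from `git rev-list --reverse` output (oldest first)."""
    ids = rev_list_output.split()
    return ids[0] if ids else None


def has_parents(repo: str, commit: str) -> bool:
    """`rev-list --parents -n 1 X` prints X followed by its parent ids."""
    return len(git(repo, "rev-list", "--parents", "-n", "1", commit).split()) > 1


def drop_history_before(repo: str, cutoff: str, branch: str = "master") -> None:
    """Rewrite `branch` so it starts at the oldest commit newer than `cutoff`.

    Assumption: linear history on one branch; all commit ids change.
    """
    listing = git(repo, "rev-list", "--reverse", "--since=" + cutoff, branch)
    root = oldest_kept(listing)
    if root is None:
        raise ValueError("no commits newer than " + cutoff)
    if not has_parents(repo, root):
        return  # nothing older than the cutoff is left to drop

    # Temporarily pretend the oldest kept commit has no parents ...
    git(repo, "replace", "--graft", root)
    # ... and let filter-branch bake that into real, rewritten commits.
    git(repo, "filter-branch", "-f", "--", branch)

    # Remove filter-branch's backup refs and the replacement ref, so the
    # old commits are no longer reachable from any ref.
    refs = git(repo, "for-each-ref", "--format=%(refname)",
               "refs/original", "refs/replace").split()
    for ref in refs:
        git(repo, "update-ref", "-d", ref)

    # Expire reflog entries and prune the now-unreachable objects.
    git(repo, "reflog", "expire", "--expire=now", "--all")
    git(repo, "gc", "--prune=now")
```

Something like drop_history_before("/path/to/repo", "3 weeks ago") would
then run after the daily commit. The path and the cutoff are
placeholders.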

> Other than that, git loads entire objects to memory to manipulate them,
> which AFAIK CAN cause problems in datasets with very large files (the
> problem is not usually the size of the repository, but rather the size
> of the largest object).  You probably want to test your use case with
> several worst-case files AND a large safety margin to ensure it won't
> break on us anytime soon, using something to track git memory usage.

Well, yes.
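
One way to track it, as a rough sketch: run the git command as a child
process and read the peak resident set size the kernel recorded for
children. The function names below are hypothetical. It assumes a Unix
system, since the resource module is not available elsewhere. Note that
RUSAGE_CHILDREN reports a high-water mark over all children waited for
so far, so each measurement is best taken in a fresh Python process.

```python
import resource
import subprocess
import sys
from typing import List


def peak_child_rss(argv: List[str]) -> int:
    """Run `argv` to completion and return ru_maxrss for waited-for children.

    The unit is platform dependent: kilobytes on Linux, bytes on macOS.
    The value never decreases within one Python process.
    """
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


def maxrss_to_mib(maxrss: int, platform: str = sys.platform) -> float:
    """Convert a raw ru_maxrss value to MiB for the given platform."""
    if platform == "darwin":
        return maxrss / (1024 * 1024)  # macOS reports bytes
    return maxrss / 1024  # Linux reports kilobytes
```

For example, maxrss_to_mib(peak_child_rss(["git", "-C", repo, "gc"]))
with repo pointing at a test repository holding a few 350MB files would
show whether the largest object is a problem.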

-- 
bye, Joerg
Some NM:
> FTBFS=Fails to Build from Start
Err, yes? How do you start in the middle?

