Re: [GSoC] Status Report: Java Project Dependency Builder
On 10/06/14 19:20, Andrew Schurman wrote:
> On Monday, June 9, 2014, Daniel Pocock <email@example.com> wrote:
>> On 09/06/14 11:16, Andrew Schurman wrote:
>>> On Sat, Jun 7, 2014 at 1:08 AM, Daniel Pocock <email@example.com> wrote:
>>>> On 07/06/14 09:33, Andrew Schurman wrote:
>>>> For some projects you will need to patch their pom file or build.xml or
>>>> whatever and it would be ideal if you forked their Git repository and
>>>> created your changes on a branch.
>>>> If the project is in an SVN repo, you will probably need my sync2git
>>>> Please have a look at the issues in github too, #2 is quite easy:
>>> Instead of cloning the entire history, what if we just take a snapshot
>>> of the files at that particular version? It will save us from
>>> translating between VCSs. We could do something different for git, but
>>> I think doing the same thing for everything would make things easier.
>> I've been using this approach for some very big projects like
>> reSIProcate and sipXtapi for a while now - the sync2git script runs
>> automatically from cron. For over 90% of SVN projects that use the
>> standard layout (trunk/branches/tags) this approach is likely to just
>> work. The only manual work is tweaking authors.txt (e.g. inserting the
>> names). Some people don't bother with that, but I personally feel it is
>> good to properly attribute Git commits to their authors so they will
>> show up in Github reports, etc.
>> Doing VCS conversion does add some latency to the whole build process.
>> However, it means you can benefit from using VCS-like tools to inspect
>> the changes. For projects that are not in the 90%, e.g. those where
>> some manual effort is needed to get them to build, the VCS will provide
>> a consistent way for people to make the manual changes before the
>> automated build resumes again, e.g.:
>> - build fails because foo.jar is missing
>> - foo SVN is found from foo.pom or some other clue
>> - foo-svn-mirror is created in Github using sync2git and a call to the
>> Github API
>> - using the Github API, a fork is created in Github, from foo-svn-mirror
>> -> foo-dfsg
>> - automated changes are made on a dfsg branch in foo-dfsg (e.g. removing
>> copies of junit.jar or other binary or non-free artifacts in the
>> - now for the manual step - the developer can manually tweak the
>> build.xml or commit a build.properties or whatever is needed. These
>> changes are also committed on the dfsg branch in the foo-dfsg fork
>> - finally, the developer sends some signal to kick off the automated
>> build again (e.g. with a command line or web-based UI)
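To make the automated dfsg step above a bit more concrete, here is a minimal sketch of how bundled binaries might be located and removed from a checkout before the changes are committed on the dfsg branch. The suffix list and the function names are my own assumptions for illustration, not part of any existing tool, and a real implementation would also have to deal with non-free files that are not binaries.

```python
from pathlib import Path

# Assumed list of suffixes treated as binary artifacts (hypothetical).
BINARY_SUFFIXES = {".jar", ".class", ".war", ".ear"}


def find_binary_artifacts(root):
    """Return sorted relative paths of binary artifacts under root, skipping .git."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue  # never touch the repository metadata
        if path.is_file() and path.suffix.lower() in BINARY_SUFFIXES:
            found.append(rel.as_posix())
    return sorted(found)


def strip_binary_artifacts(root):
    """Delete the artifacts found under root and return what was removed."""
    removed = find_binary_artifacts(root)
    for rel in removed:
        (Path(root) / rel).unlink()
    return removed
```

The returned list could go into the commit message on the dfsg branch, so the developer doing the manual step can see what was stripped.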
> I'm still thinking that tracking only the changes made between the
> released versions is the better way to go. It still supports the
> workflow you describe above, and it ignores churn such as a file that was
> added and then removed between two releases. Yes, you will lose the context
> of why some changes were made, though.
> In the case of reSIProcate and sipXtapi, you would maybe spend 10min
> cloning the repo, but if you had something like the apache svn repo, it
> could take hours, albeit a rare case.
Actually, reSIProcate has over 10,000 commits - it takes the better part
of a day for git-svn to run the first time.
> Plus if you start dealing with tracking multiple versions, you need to
> handle the case when a repo has been relocated. You might be able to
> work something out when working with two svn repos, but going from svn
> to something like git would be trouble.
> I think I could spend a lot of time here on the corner cases, and I think
> it would be best to finish the rest of the system first with the approach
> I've described. If this is something we really want, we could always
> tackle it as a wishlist item near the end of the project.
Definitely don't try to handle every corner case just now.
If you can deal with some repositories effectively (e.g. if you can see
they have fewer than 1,000 commits and they follow the standard layout)
then that would be nice - but I'm not going to insist on this approach.
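As a rough illustration of that check, something like the following could decide whether a repository is worth a full conversion. It assumes the commit count and the top-level directory listing have already been obtained somehow; the function names and the max_commits parameter are hypothetical.

```python
# The standard SVN layout mentioned above.
STANDARD_DIRS = {"trunk", "branches", "tags"}


def has_standard_layout(top_level_entries):
    """True if the top-level listing contains trunk, branches and tags."""
    names = {entry.strip("/") for entry in top_level_entries}
    return STANDARD_DIRS <= names


def is_easy_candidate(commit_count, top_level_entries, max_commits=1000):
    """True for small repositories that follow the standard layout."""
    return commit_count < max_commits and has_standard_layout(top_level_entries)
```

Anything that fails the check would simply fall back to the snapshot approach.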
If you just want to take snapshots of the releases and commit them into
a git repository, the Debian tool git-buildpackage (see the
git-import-orig script) may be worth having a look at - although it
assumes that the code on the "upstream" branch is already dfsg (free of
binary artifacts) before you import it.
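For the snapshot approach itself, here is a small sketch of comparing two unpacked release trees so that only the net change between releases is recorded. This is not how git-import-orig works internally; it is just an assumed illustration, and snapshot and diff_snapshots are made-up names.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Return (added, removed, changed) relative paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed
```

A file that was added and then deleted again between the two releases never shows up in either snapshot, which is the trade-off Andrew describes: less noise, but also no history explaining why a change was made.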