[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH] proposed v3 source format using .git.tar.gz



Thanks a lot for the code review. Any comments on the big picture or design?

Frank Lichtenheld wrote:
> On Fri, Oct 05, 2007 at 07:16:13PM -0400, Joey Hess wrote:
> > I have a sourcev3 branch with my changes at <git://kitenet.net/dpkg>,
> > and have also attached a diff to this mail. I feel that this is ready
> > for review and hopefully merging into dpkg now. Looking forward to your
> > comments.
> 
> A little code review follows.
> 
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write to the Free Software
> > +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
> 
> old FSF address (not really important, but while we're at it ;)

Copied from elsewhere in dpkg source. :-)

> > +sub sanity_check {
> > +	my $srcdir=shift;
> > +
> > +	if (! -s "$srcdir/.git") {
> > +		main::error(sprintf(_g("%s is not the top directory of a git repository (%s/.git not present), but Format git was specified"), $srcdir, $srcdir));
> 
> you probably mean -e or -d here? -s on a directory is kinda strange.
> printing $srcdir twice might bloat the error message.

Yes, I meant -d, the -s snuck in from the other test.

ACK on the duplication.

> > +	}
> > +	if (-s "$srcdir/.gitmodules") {
> > +		main::error(sprintf(_g("git repository %s uses submodules. This is not yet supported."), $srcdir));
> > +	}
> > +
> > +	# Symlinks from .git to outside could cause unpack failures, or
> > +	# point to files they shouldn't, so check for and don't allow.
> > +	if (-l "$srcdir/.git") {
> > +		main::error(sprintf(_g("%s is a symlink"), "$srcdir/.git"));
> > +	}
> > +	my $abs_srcdir=Cwd::abs_path($srcdir);
> > +	find(sub {
> > +		if (-l $_) {
> > +			if (Cwd::abs_path(readlink($_)) !~ /^\Q$abs_srcdir\E(\/|$)/) {
> > +				main::error(sprintf(_g("%s is a symlink to outside %s"), $File::Find::name, $srcdir));
> > +			}
> > +		}
> > +	}, "$srcdir/.git");
> 
> Maybe it would be easier to just disallow symlinks completly? Or are
> there important use cases for that?

I've tried to not make dpkg have to know too much about git internals.
(As you can see I've not been 100% successful, but have kept it to about
the level someone with a week's knowledge of git would be comfortable
with.) So while I don't see any symlinks in my git repos, if git decides
to use symlinks, I don't want dpkg to have to be updated. (I think git
did historically use symlinks in the repo).

There are probably semi-valid reasons to manually add symlinks inside a .git
directory today, too.

> > +}
> > +
> > +# Called before a tarball is created, to prepare the tar directory.
> > +sub prep_tar {
> > +	my $srcdir=shift;
> > +	my $tardir=shift;
> > +	
> > +	sanity_check($srcdir);
> > +
> > +	if (! -e "$srcdir/.git") {
> > +		main::error(sprintf(_g("%s is not a git repository, but Format git was specified"), $srcdir));
> > +	}
> > +	if (-e "$srcdir/.gitmodules") {
> > +		main::error(sprintf(_g("git repository %s uses submodules. This is not yet supported."), $srcdir));
> > +	}
> 
> Duplicated code from sanity_check

Doh!

> > +
> > +	# Check for uncommitted files.
> > +	open(GIT_STATUS, "LANG=C cd $srcdir && git-status |") ||
> > +		main::subprocerr("cd $srcdir && git-status");
> 
> you make a lot "cd $srcdir". Maybe you should just chdir() in the parent
> process?

I could make it do that, I suppose it would be safe as long as I cd back
(dpkg-source in general assumes it's in the parent dir of the source
tree).

> This would also take care of funny things in $srcdir like
> whitespaces...

> If you get rid of the cd you could use the '-|', @array form of open
> here which would be preferable imho.

Wow, you've taught me something new, I only knew about the much more
clumsy manual fork and open("-|") approach. I'll do this, but it will
take a little while.

> > +	my $clean=0;
> > +	my $status="";
> > +	while (<GIT_STATUS>) {
> > +		if (/^\Qnothing to commit (working directory clean)\E$/) {
> > +			$clean=1;
> > +		}
> > +		else {
> > +			$status.="git-status: $_";
> > +		}
> > +	}
> > +	close GIT_STATUS;
> > +	# git-status exits 1 if there are uncommitted changes or if
> > +	# the repo is clean, and 0 if there are uncommitted changes
> > +	# listed in the index.
> > +	if ($? && $? >> 8 != 1) {
> > +		main::subprocerr("cd $srcdir && git status");
> > +	}
> > +	if (! $clean) {
> > +		# To support dpkg-buildpackage -i, get a list of files
> 
> dpkg-source -i would be the proper attribution here. dpkg-buildpackage
> implements -i only as a pass-through option.

True.

> > +		# eqivilant to the ones git-status finds, and remove any
> 
> is that an English word?

Even better, a common typo of one. :-)

> > +		# ignored files from it.
> > +		my @ignores="--exclude-per-directory=.gitignore";
> > +		my $core_excludesfile=`cd $srcdir && git-config --get core.excludesfile`;
> > +		chomp $core_excludesfile;
> > +		if (length $core_excludesfile && -e "$srcdir/$core_excludesfile") {
> > +			push @ignores, "--exclude-from='$core_excludesfile'";
> > +		}
> > +		if (-e "$srcdir/.git/info/exclude") {
> > +			push @ignores, "--exclude-from=.git/info/exclude";
> > +		}
> > +		open(GIT_LS_FILES, "cd $srcdir && git-ls-files -m -d -o @ignores |") ||
> > +			main::subprocerr("cd $srcdir && git-ls-files");
> 
> This is essentially running git-status again without the output
> beautification... Can't we avoid doing the work twice?

The git-status output is not designed to be machine parseable.
Conversely, the git-ls-files output is not designed to be human
parseable (the command is not part of the "porcelin"). dpkg-source could
just list the non-ignored modified files and not output the git-status
and leave the user to do that by hand.

> Also I would prefer using long options where available. It's not like
> anyone has to type them more than once ;)

Done.

> > +		my @files;
> > +		while (<GIT_LS_FILES>) {
> > +			chomp;
> > +			if (! length $main::diff_ignore_regexp ||
> > +			    ! m/$main::diff_ignore_regexp/o) {
> > +				push @files, $_;
> > +			}
> > +		}
> > +		close(GIT_LS_FILES) || main::syserr("git-ls-files exited nonzero");
> > +
> > +		if (@files) {
> > +			print $status;
> > +			main::error(sprintf(_g("uncommitted, not-ignored changes in working directory: %s"),
> > +				join(" ", @files)));
> 
> That intendation looks wrong.

I assume you mean of the join part? I typically write this one of two
ways, I've switched it to the other way.

> > +		}
> > +	}
> > +
> > +	# garbage collect the repo
> > +	system("cd $srcdir && git-gc --prune");
> > +	$? && main::subprocerr("cd $srcdir && git-gc --prune");
> 
> Again, dropping the cd would make it possible to use the @array form.
> git gc --prune is also a very rude thing to do implicitly. Maybe this
> should better be done in the copy?

Agreed, done.

> We could be even more rude then and
> also delete the reflog which would make git gc way more efficient in
> some cases.

I had been meaning to look at that but forgot. You think I should run
git-reflog expire --expire=<NOW> before git-gc? git-reflog expire --all does
not in fact seem to delete all reflog entries. I haven't figured out what
to use for <NOW> to make it expire everything.

> > +	# TODO support for creating a shallow clone for those cases where
> > +	# uploading the whole repo history is not desired
> > +
> > +	mkdir($tardir,0755) ||
> > +            &syserr(sprintf(_g("unable to create `%s'"), $tardir));
> > +	system("cp -a $srcdir/.git $tardir");
> > +	$? && main::subprocerr("cp -a $srcdir/.git $tardir");
> 
> you actually can use the @array form here.

Rodger.

> > +			main::warning(sprintf(_g("executable bit set on %s; clearing"), $hook));
> > +			chmod(0644 &~ umask(), $hook) ||
> > +				main::syserr(sprintf(_g("unable to change permission of `%s'"), $hook));
> 
> why these very cautios permissions here (i.e. why not 0666)?

Your're right, that should be 666 &~umask.

> > +	# This needs to be an absolute path, for some reason.
> > +	my $configfile=Cwd::abs_path("$srcdir/.git/config");
> > +	open(GIT_CONFIG, "git-config --file $configfile -l |") ||
> > +		main::subprocerr("git-config");
> 
> you can use the @array form here. And using git-config --null
> would preferable here. This was introduced only recently, but since
> git-config isn't available in etch's version at all...

Ok, I'll write that wnen I have more energy.

> > +	my @config=<GIT_CONFIG>;
> > +	close(GIT_CONFIG) || main::syserr("git-config exited nonzero");
> > +	my %removed_fields;
> > +	foreach (@config) {
> > +		chomp;
> > +		my ($field, $value)=split(/=/, $_, 2);
> > +		if ($field !~ /$safe_fields/) {
> > +			if (! exists $removed_fields{$field}) {
> > +				system("git-config", "--file", $configfile,
> > +					"--unset-all", $field);
> > +				$? && main::subprocerr("git-config --file $configfile --unset-all $field");
> > +			}
> > +			push @{$removed_fields{$field}}, $value;
> 
> Have you tested what happens if you try to unset something that
> git-config -l listed but that is really from the global or the system
> config file?

That's why I use --file $configfile, it really doesn't list things
from anywhere except the specified file.

> > +		}
> > +	}
> > +	if (%removed_fields) {
> > +		open(GIT_CONFIG, ">>", $configfile) ||
> > +			main::syserr(sprintf(_g("unstable to append to %s", $configfile)));
> > +		print GIT_CONFIG "\n# "._g("The following setting(s) were disabled by dpkg-source").":\n";
> > +		foreach my $field (sort keys %removed_fields) {
> > +			foreach my $value (@{$removed_fields{$field}}) {
> > +				print GIT_CONFIG "# $field=$value\n";
> > +			}
> > +		}
> > +		close GIT_CONFIG;
> > +		main::warning(_g(_g("modifying .git/config to comment out some settings")));
> > +	}
> 
> nested _g

D'oh.

> > +
> > +	# Note that git-reset is used to repopulate the WC with files.
> > +	# git-clone isn't used because the repo might be an unclonable
> > +	# shallow copy. git-reset also recreates the index.
> > +	# XXX git-reset should be made to run in quiet mode here, but
> > +	# lacks a good way to do it. Bug filed.
> > +	system("cd $srcdir && git-reset --hard");
> 
> again a chdir + system(@array) would be better imho.
> Why not git checkout -f ?

Because I didn't know about it (< 1 week ;-).

It does seem to work equivilantly and avoid ugly messages, so I've made
the change.

> > +sub loadvcs {
> > +	my $vcs = shift;
> > +	my $mod = "Dpkg::Source::VCS::$vcs";
> > +	eval qq{use $mod};
> > +	return ! $@;
> > +}
> 
> Since we never ever want to import something from $mod, maybe we should
> require it?

Well, I can't see any reason to import it right now, so I don't mind
making that change.

> > -
> > +	
> 
> Whitespace changes?

Fixed.


Patch attached with changes from this review (or see my repo). I'll have
a second patch later to remove shell exposures, and use git-config --null.

-- 
see shy jo
diff --git a/scripts/Dpkg/Source/VCS/git.pm b/scripts/Dpkg/Source/VCS/git.pm
index cac7d05..6eb4894 100644
--- a/scripts/Dpkg/Source/VCS/git.pm
+++ b/scripts/Dpkg/Source/VCS/git.pm
@@ -40,8 +40,8 @@ delete $ENV{GIT_WORK_TREE};
 sub sanity_check {
 	my $srcdir=shift;
 
-	if (! -s "$srcdir/.git") {
-		main::error(sprintf(_g("%s is not the top directory of a git repository (%s/.git not present), but Format git was specified"), $srcdir, $srcdir));
+	if (! -d "$srcdir/.git") {
+		main::error(sprintf(_g("source directory is not the top directory of a git repository (%s/.git not present), but Format git was specified"), $srcdir));
 	}
 	if (-s "$srcdir/.gitmodules") {
 		main::error(sprintf(_g("git repository %s uses submodules. This is not yet supported."), $srcdir));
@@ -98,8 +98,8 @@ sub prep_tar {
 		main::subprocerr("cd $srcdir && git status");
 	}
 	if (! $clean) {
-		# To support dpkg-buildpackage -i, get a list of files
-		# eqivilant to the ones git-status finds, and remove any
+		# To support dpkg-source -i, get a list of files
+		# equivalent to the ones git-status finds, and remove any
 		# ignored files from it.
 		my @ignores="--exclude-per-directory=.gitignore";
 		my $core_excludesfile=`cd $srcdir && git-config --get core.excludesfile`;
@@ -110,7 +110,7 @@ sub prep_tar {
 		if (-e "$srcdir/.git/info/exclude") {
 			push @ignores, "--exclude-from=.git/info/exclude";
 		}
-		open(GIT_LS_FILES, "cd $srcdir && git-ls-files -m -d -o @ignores |") ||
+		open(GIT_LS_FILES, "cd $srcdir && git-ls-files --modified --deleted --others @ignores |") ||
 			main::subprocerr("cd $srcdir && git-ls-files");
 		my @files;
 		while (<GIT_LS_FILES>) {
@@ -125,22 +125,22 @@ sub prep_tar {
 		if (@files) {
 			print $status;
 			main::error(sprintf(_g("uncommitted, not-ignored changes in working directory: %s"),
-				join(" ", @files)));
+			            join(" ", @files)));
 		}
 	}
 
-	# garbage collect the repo
-	system("cd $srcdir && git-gc --prune");
-	$? && main::subprocerr("cd $srcdir && git-gc --prune");
-
 	# TODO support for creating a shallow clone for those cases where
 	# uploading the whole repo history is not desired
 
 	mkdir($tardir,0755) ||
             &syserr(sprintf(_g("unable to create `%s'"), $tardir));
-	system("cp -a $srcdir/.git $tardir");
+	system("cp", "-a", "$srcdir/.git", $tardir);
 	$? && main::subprocerr("cp -a $srcdir/.git $tardir");
 
+	# garbage collect the new repo
+	system("cd $tardir && git-gc --prune");
+	$? && main::subprocerr("cd $tardir && git-gc --prune");
+
 	# As an optimisation, remove the index. It will be recreated by git
 	# reset during unpack. It's probably small, but you never know, this
 	# might save a lot of space.
@@ -158,7 +158,7 @@ sub post_unpack_tar {
 	foreach my $hook (glob("$srcdir/.git/hooks/*")) {
 		if (-x $hook) {
 			main::warning(sprintf(_g("executable bit set on %s; clearing"), $hook));
-			chmod(0644 &~ umask(), $hook) ||
+			chmod(0666 &~ umask(), $hook) ||
 				main::syserr(sprintf(_g("unable to change permission of `%s'"), $hook));
 		}
 	}
@@ -211,16 +211,15 @@ sub post_unpack_tar {
 			}
 		}
 		close GIT_CONFIG;
-		main::warning(_g(_g("modifying .git/config to comment out some settings")));
+		main::warning(_g("modifying .git/config to comment out some settings"));
 	}
 
-	# Note that git-reset is used to repopulate the WC with files.
-	# git-clone isn't used because the repo might be an unclonable
-	# shallow copy. git-reset also recreates the index.
-	# XXX git-reset should be made to run in quiet mode here, but
-	# lacks a good way to do it. Bug filed.
-	system("cd $srcdir && git-reset --hard");
-	$? && main::subprocerr("cd $srcdir && git-reset --hard");
+	# git-checkout is used to repopulate the WC with files
+	# and recreate the index.
+	# (git-clone isn't used because the repo might be an unclonable
+	# shallow copy.)
+	system("cd $srcdir && git-clone -f");
+	$? && main::subprocerr("cd $srcdir && git-clone -f");
 }
 
 1
diff --git a/scripts/dpkg-source.pl b/scripts/dpkg-source.pl
index 6c823c8..6774814 100755
--- a/scripts/dpkg-source.pl
+++ b/scripts/dpkg-source.pl
@@ -193,7 +193,7 @@ sub handleformat {
 sub loadvcs {
 	my $vcs = shift;
 	my $mod = "Dpkg::Source::VCS::$vcs";
-	eval qq{use $mod};
+	eval qq{require $mod};
 	return ! $@;
 }
 
@@ -535,7 +535,7 @@ if ($opmode eq 'build') {
             &syserr(sprintf(_g("unable to rename `%s' (newly created) to `%s'"), $newtar, $tarname));
 	chmod(0666 &~ umask(), $tarname) ||
 	    &syserr(sprintf(_g("unable to change permission of `%s'"), $tarname));
-	
+
     } else {
         
         printf(_g("%s: building %s using existing %s")."\n",

Attachment: signature.asc
Description: Digital signature


Reply to: