[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Reimplement mksplit.pl in C



On Sat, Apr 03, 2010 at 01:19:33AM -0500, Jonathan Nieder wrote:

> I could only find approx one reference for this goal [1].

 Ah, yes, that was it, thanks!  Removing Perl from the essential
 set.  While going to bed, I actually figured that was the reason
 why I wasn't finding perl in the dependencies, but I didn't
 remember this was the reason I had read about for the
 reimplementation.

> Luckily, even if that doesn’t pan out, there are plenty of
> other benefits to well written C code.  Most importantly, it
> makes code sharing a little easier.

 Sure.  I'd like to take this implementation and factor out
 whatever is reusable.  There are a couple of maybe-useful
 gimmicks in there, for example the Perl idiom `program`
 transliterated into C.  I cheated and took a guess regarding how
 large should a buffer be to be "large enough" for this use case:
 my guess is one page.  There are other hard-coded buffer length
 limits, too.  Memory concious it ain't right now, this code.

> I guess this would also make repairing a broken perl
> installation a little simpler on some system with only access
> to small media, though I haven’t encountered a need for
> dpkg-split in a long while.  Thanks for working on it.

 Same here.  The last time I used dpkg-split for something was
 back in ... 1996?

> >  I did compare the output with the Perl version in several
> >  test cases and it's identical.
> 
> Some test cases (or methods for producing them) would make me
> very happy.  dpkg doesn’t have enough automated tests of basic
> functionality.  Automake provides a rudimentary test harness
> [2].

 I was thinking about that.  The problem with mksplit is that the
 generated files contain an embedded timestamp, so when I say
 "identical" I actually mean "identical modulo the embedded
 timestamp".  I mean, you can't just cmp two sets of files.

 Beyond that, mksplit has several bits of code identifiable as
 functions that ought to be separated and those could have unit
 tests.  I would have to look in more detail at dpkg-split to see
 if it's possible to devise a good automated test for this
 functionality.  The two cases I can think of are splitting an
 archive that splits to a single piece, splitting to multiple
 pieces and some failure cases.  There's a check in mksplit.pl
 (also implemented in mksplit.c) that I'm still trying to figure
 out how to trigger (the corresponding error message refers to
 the header being too large).

> I think this should be
> 
>  bin_PROGRAMS = dpkg-split
>  pkglib_PROGRAMS = mksplit
> 
> so that mksplit stays in /usr/lib/dpkg.

 Ah, right, thanks, updated patch attached.

 Marcelo
>From acbd7b59d0d48ec47191d67846029aedc3c49afe Mon Sep 17 00:00:00 2001
From: Marcelo E. Magallon <marcelo.magallon@gmail.com>
Date: Fri, 2 Apr 2010 22:27:50 -0600
Subject: [PATCH] Reimplement mksplit in C

My original intention was to do without some of the pipes, but for the
initial implementation this is just a translation of the original Perl
program into C, without much in the way of optimization.  It produces
the same output and exits in the same way as the original program (mod
exit status probably).

The reimplementation can use some refactoring love.
---
 dpkg-split/Makefile.am |   12 ++-
 dpkg-split/mksplit.c   |  279 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 288 insertions(+), 3 deletions(-)
 create mode 100644 dpkg-split/mksplit.c

diff --git a/dpkg-split/Makefile.am b/dpkg-split/Makefile.am
index 6f6043c..3ea9d00 100644
--- a/dpkg-split/Makefile.am
+++ b/dpkg-split/Makefile.am
@@ -24,10 +24,16 @@ dpkg_split_LDADD = \
 	../lib/compat/libcompat.a \
 	$(LIBINTL)
 
+mksplit_SOURCES = \
+	mksplit.c
 
-pkglib_SCRIPTS = mksplit
-EXTRA_DIST = mksplit.pl
-CLEANFILES = $(pkglib_SCRIPTS)
+mksplit_LDADD = \
+	../lib/dpkg/libdpkg.a \
+	../lib/compat/libcompat.a \
+	$(LIBINTL)
+
+pkglib_PROGRAMS = mksplit
+CLEANFILES = $(pkglib_PROGRAMS)
 
 
 do_perl_subst = $(AM_V_GEN) sed -e "s:^\#![:space:]*/usr/bin/perl:\#!$(PERL):"
diff --git a/dpkg-split/mksplit.c b/dpkg-split/mksplit.c
new file mode 100644
index 0000000..5ca34d6
--- /dev/null
+++ b/dpkg-split/mksplit.c
@@ -0,0 +1,279 @@
+/* This program is only supposed to be called by dpkg-split.
+ * Its arguments are:
+ * <sourcefile> <partsize> <prefix> <totalsize> <partsizeallow> <msdostruncyesno>
+ * Stdin is also redirected from the source archive by dpkg-split.
+ *
+ * Copyright © 1995 Ian Jackson <ian@chiark.greenend.org.uk>
+ * Copyright © 2010 Marcelo E. Magallon <mmagallo@debian.org>
+ *
+ * This is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+#include <limits.h>
+#include <error.h>
+#include <string.h>
+#include <ctype.h>
+#include <errno.h>
+#include <unistd.h>
+
+#define MSDOS_PATH_MAX 10
+
+void checked_free(void *ptr)
+{
+    if (ptr) free(ptr);
+}
+
+void add(FILE *stream, char* id, char* data, size_t len)
+{
+    time_t current_time;
+
+    current_time = time(NULL);
+    fprintf(stream, "%-16s%-12d0     0     100644  %-10d%c\n",
+            id, (int)current_time, (int)len, 0140);
+    fwrite(data, 1, len, stream);
+    fprintf(stream, "%s", (len & 1) ? "\n" : "");
+}
+
+char* output(char *cmd)
+{
+    FILE * file = NULL;
+    int len = getpagesize();
+    char *result = (char *)malloc(len);
+    char *p = result;
+
+    if (result == NULL)
+        goto error_out;
+
+    if ((file = popen(cmd, "r")) == NULL)
+        goto error_out;
+
+    while (len > 0 && !feof(file))
+    {
+        int n;
+        n = fread(p, 1, len, file);
+        if (ferror(file))
+            goto error_out;
+        p += n;
+        len -= n;
+    }
+    goto out;
+
+error_out:
+    if (result)
+    {
+        free(result);
+        result = NULL;
+    }
+
+out:
+    if (file)
+        pclose(file);
+    return result;
+}
+
+char* get_field(char *filename, char *field)
+{
+    char *result;
+    char cmd[PATH_MAX];
+    sprintf(cmd, "dpkg-deb --field %s %s", filename, field);
+
+    result = output(cmd);
+    if (result)
+    {
+        char *p;
+        for (p = result + strlen(result) - 1; p >= result && isspace(*p); --p)
+            *p = 0;
+    }
+
+    return result;
+}
+
+char* get_md5sum(char *filename)
+{
+    char *result;
+    char cmd[PATH_MAX];
+    sprintf(cmd, "md5sum <%s", filename);
+
+    result = output(cmd);
+    if (result)
+    {
+        char *p;
+        for(p = result; *p && *p != ' '; ++p);
+        if (*p) *p = 0;
+    }
+
+    return result;
+}
+
+int main(int argc, char* argv[])
+{
+    char *sourcefile;
+    char *prefix;
+    char basename[PATH_MAX];
+    char *prefixdir = NULL;
+    char *cleanprefix = NULL;
+    char *package = NULL;
+    char *myversion = "2.1";
+    char *version = NULL;
+    char *revision = NULL;
+    char *csum = NULL;
+    char *data = NULL;
+    int partsize;
+    int orgsize;
+    size_t partsizeallow;
+    int msdos; /* yes = true, anything else = false */
+    int nparts;
+    int startat;
+    int showpartnum = 1;
+
+    if (argc != 7)
+        error(1, 0, "bad invocation\n");
+
+    sourcefile = argv[1];
+    partsize = atoi(argv[2]);
+    prefix = argv[3];
+    orgsize = atoi(argv[4]);
+    partsizeallow = (size_t)atoi(argv[5]);
+    msdos = strcmp(argv[6], "yes") == 0 ? 1 : 0;
+
+    package = get_field(sourcefile, "Package");
+    version = get_field(sourcefile, "Version");
+    revision = get_field(sourcefile, "Package_Revision");
+
+    if (version && revision && strlen(revision) > 0)
+    {
+        int n = strlen(version) + 1 + strlen(revision) + 1;
+        char *tmp = (char *)malloc(n*sizeof(char));
+        sprintf(tmp, "%s-%s", version, revision);
+        free(version);
+        version = tmp;
+    }
+
+    csum = get_md5sum(sourcefile);
+    nparts=(orgsize+partsize-1)/partsize;
+
+    printf("Splitting package %s into %d parts: ", package, nparts);
+
+    if (msdos)
+    {
+        char *p, *q;
+        prefixdir = strdup(prefix);
+        p = strrchr(prefixdir, '/');
+        if (p)
+        {
+            ++p;
+            cleanprefix = strdup(p);
+            *p = 0;
+        }
+        else
+        {
+            cleanprefix = strdup(prefix);
+            *prefixdir = 0;
+        }
+
+        for(p = q = cleanprefix; *p; ++p)
+        {
+            if (*p == '+')
+                *q++ = *p = 'x';
+            else if (isupper(*p))
+                *q++ = *p = tolower(*p);
+            else if (islower(*p) || isdigit(*p))
+                *q++ = *p;
+        }
+
+        *q = 0;
+    }
+
+    data = malloc(partsize);
+
+    for(startat = 0; startat < orgsize; startat += partsize)
+    {
+        char tmp[17];
+        char dsp[1024];
+        FILE *output;
+        size_t thispartreallen;
+
+        if (msdos)
+        {
+            int len;
+            char *p = basename;
+            len = snprintf(basename, MSDOS_PATH_MAX,
+                    "%dof%d", showpartnum, nparts);
+            p += snprintf(p, PATH_MAX, "%s/", prefixdir);
+            p += snprintf(p, MSDOS_PATH_MAX - len, "%s", cleanprefix);
+            snprintf(p, PATH_MAX - (p - basename),
+                    "%dof%d.deb", showpartnum, nparts);
+        }
+        else
+        {
+            snprintf(basename, PATH_MAX, "%s.%dof%d.deb",
+                    prefix, showpartnum, nparts);
+        }
+
+        output = fopen(basename, "w");
+
+        if (!output)
+        {
+            error(1, errno, "open %s", basename);
+        }
+
+        fprintf(output, "!<arch>\n");
+        printf("%d ", showpartnum);
+
+        snprintf(dsp, sizeof(dsp)/sizeof(*dsp),
+                 "%s\n"
+                 "%s\n"
+                 "%s\n"
+                 "%s\n"
+                 "%d\n"
+                 "%d\n"
+                 "%d/%d\n",
+                 myversion, package, version, csum, orgsize, partsize,
+                 showpartnum, nparts);
+        add(output, "debian-split", dsp, strlen(dsp));
+
+        thispartreallen = fread(data, 1, partsize, stdin);
+
+        snprintf(tmp, sizeof(tmp)/sizeof(*tmp), "data.%d", showpartnum);
+        add(output, tmp, data, thispartreallen);
+
+        if (thispartreallen > partsizeallow)
+        {
+            error(1, 0,
+                    "Header is too long, making part too long.  "
+                    "Your package name or version\n"
+                    "numbers must be extraordinarily long, or something.  "
+                    "Giving up.\n");
+        }
+
+        fclose(output);
+
+        ++showpartnum;
+    }
+
+    printf("done\n");
+
+    checked_free(data);
+    checked_free(prefixdir);
+    checked_free(cleanprefix);
+    checked_free(package);
+    checked_free(version);
+    checked_free(revision);
+    checked_free(csum);
+
+    return 0;
+}
-- 
1.7.0.3


Reply to: