
Re: Pulling data in for package build



Hi Andreas,

   Thanks for your quick response.

Andreas writes:

> >    Under normal circumstances, one builds the database this way:
> >
> >         1) data is pulled down off of a public website
> >         2) software is built

> Does this build process depend on the data installed on the box?

   No.

   The "software" mentioned above is a set of source files that are
compiled into an executable. Once compiled, the executable is run; it
reads the data and builds the database. That is basically the
executable's entire function, although it does fill a small niche.
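For illustration, the transform step amounts to something like the
following sketch (in Python rather than the actual loader, and with an
invented "key|value" record format and file names; the real data layout
will differ):

```python
import dbm

# Hypothetical sketch of the loader's job: read records from the
# downloaded text files and write them into a database file.
# The "key|value" line format and all names are invented.
def build_database(text_files, db_path):
    with dbm.open(db_path, "c") as db:  # "c" creates the file if needed
        for path in text_files:
            with open(path) as fh:
                for line in fh:
                    key, _, value = line.rstrip("\n").partition("|")
                    if key:
                        db[key.encode()] = value.encode()

# Usage sketch:
# build_database(["fda_listing.txt"], "fda.db")
```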

> >         3) software reads data fetched in (1) and builds database

> Does "Software" mean the data are needed in the installation process
> (in a {pre/post}inst script), or are the data just needed when the
> software is running?

   The data is needed for the installation because the data is
essentially all that is installed, except that it is transformed into
a different format: instead of installing the series of text files
from the FDA's website, the database file(s) are installed on the end
user's machine.

   The "Software" - or executables - are not installed on the
end-user's machine; the binary package consists solely of a database and
its support files (mostly documentation), all of which should work on
any architecture to which Berkeley DB 3 has been ported.
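For what it's worth, that split could be expressed in the package's
debian/control roughly as follows (the package name and description are
invented for illustration):

```
Package: fda-db
Architecture: all
Description: FDA drug data as a Berkeley DB 3 database
 Contains only the database and its support files (mostly
 documentation); no executables are shipped.
```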

> > While this may seem well and good, I wonder if (1) is such a great
> > step and if it would be better to simply ship the data with the
> > sources. Two reasons come to mind:
> >
> >         1) The data occasionally changes, meaning that some tweaking
> >            of the software's config file may be necessary.

> You could perhaps use versioned depends to solve this.

   I don't consider this a big problem, except that manual supervision
seems necessary for each new dataset (they update the data
quarterly). But yes, that would handle a situation in which the data
changes dramatically. Generally, however, its overall format is fairly
consistent, though I'm concerned a few caveats may show up on occasion.

> >         2) Fetching the data in an automated build environment seems
> >            overly redundant.

> I would vote against fetching the data while building the package
> (did I understand this right).  The build process should work on an
> offline machine.  So you could provide the data in the following ways:
> 
>   1. data_u.v.orig.tar.gz -> data_u.v-w_all.deb
>       (Perhaps: "Suggests: prog")
> 
>      prog_x.y.orig.tar.gz -> prog_x.y-z_arch.deb
>       Depending on the questions above:
>         Build-Depends: data (= u.v-w)  [if this versioning makes sense]

Yes. This one makes sense because the raw data - the text files from
the website - are only needed for the build.

> 
>         Depends: data (= u.v-w)  [if this versioning makes sense]

>   2. Add data fetched from the web as a patch to the upstream source of prog.
>      Please make sure to build two binary packages from it, to separate
>      the architecture-independent data from the architecture-dependent binary.
> 
>   3. Create a new upstream source prog+data_x.y.orig.tar.gz and explain
>      in debian/README.Debian in detail why this makes sense.
> 
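If it helps to spell out option 1, the relevant fields in the two
debian/control files would look roughly like this (only the fields that
matter here; "data", "prog", and the u.v-w version are the placeholders
from your sketch):

```
# In the source package built from data_u.v.orig.tar.gz:
Package: data
Architecture: all
Suggests: prog

# In the source package built from prog_x.y.orig.tar.gz:
Source: prog
Build-Depends: data (= u.v-w)
```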

Thank you for your input,

Elizabeth


