
Re: Bug#990302: ITP: bulk-extractor -- A stream-based forensics tool for triage and cross-evidence analysis, which scans the media and extracts recognizable content



Hello Samuel,

thanks for your message.

On 03.07.21 01:39, Samuel Henrique wrote:
Hello Jan,

This would be a great package to have in Debian,

Happy to hear that!

I usually do a quick review to see if I spot any noticeable issues
before I do a deep dive on it (which I would do during this weekend),
and I noticed an issue in d/rules: there are some commands doing:
"test -d foo || git clone bar"

Unfortunately, the repository you reviewed is very much a work in progress, and I did not intend for anyone else to look at it yet. I originally posted on the mailing list because of the licensing issues, which could be resolved by either

    a) discussing with upstream, or
    b) reimplementing the JSON-scanner-code with libjson-c

This is an issue because it goes against our policy of not using
network during the build process[0], you can read a recent discussion
about it on LWN as well[1]:
"For packages in the main archive, no required targets may attempt
network access, except, via the loopback interface, to services on the
build host that have been started by the build."
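
To illustrate the difference, here is a minimal sketch of an offline-safe replacement for the `test -d foo || git clone bar` pattern: instead of fetching a missing directory, the build stops with an error. The function name `require_vendored` is hypothetical and not taken from the actual d/rules; `foo` is the placeholder from the command above.

```shell
#!/bin/sh
# Hypothetical helper: fail the build instead of cloning when a bundled
# source directory is missing, so no network access is ever attempted.
require_vendored() {
    dir=$1
    if [ -d "$dir" ]; then
        return 0
    fi
    echo "error: vendored directory '$dir' is missing; refusing to fetch it" >&2
    return 1
}
```

In a build script this could be called as `require_vendored foo || exit 1`, so a missing directory is reported as a packaging error rather than papered over by a download.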

In order to fix this issue you have two options:
1) Package those projects separately and add them to B-D.
2) Repack the upstream tarball and vendor/bundle them in.

You would usually prefer option 1 when the libraries could be reused
by other packages and option 2 when they are likely to only be used by
your package (usually means the same upstream).
But sometimes, even if the library could be used by another package in
the future (but currently is not), you can go with option 2 if it
makes more sense. Beware that there is no clear consensus on this
matter so some arguing might be needed (even though we have examples
of packages vendoring libraries which are already available in a
standalone manner on main).

Looking at the three libraries we are talking about:
simsong/be13_api
simsong/dfxml (watch out, because it looks like this one has just been
moved to a different repo)
nbeebe/sceadan

It looks like it's totally fine to vendor be13_api and dfxml. It seems
like sceadan is generic enough to be used by other projects, but I
didn't do a proper check.
I suggest you consider the options here and let us know what you think
is best.

I am aware that the build must not use the network. I had simply copied the rules file from the Kali repo without changing anything else. Sorry that you ended up reviewing it so thoroughly, which I did not intend, but thanks for that anyway.

My plan is to package sceadan and dfxml as separate Debian packages, since they are generic and useful to others.

If you don't know about dfxml, here is a short quote from the abstract of the original research paper:

    "Digital Forensics XML (DFXML) is an XML language that enables the exchange of structured
    forensic information. DFXML can represent the provenance of data subject to forensic
    investigation, document the presence and location of file systems, files, Microsoft
    Windows Registry entries, JPEG EXIFs, and other technical information of interest to the
    forensic analyst." [0]

Furthermore, NIST also deals with dfxml [1]. Dfxml is currently used primarily by universities and analysts looking at the traces of applications, so I think it would be a valuable addition to Debian, independent of bulk_extractor. Don't you think so?

Right now I am discussing and working with upstream on the organisation of the dfxml project [2]. Simson Garfinkel and Alex Nelson -- the upstream authors -- decided, following my proposal, to set up language-specific repositories and relocated the projects under a GitHub group account called dfxml-working-group [3].

There is still some work to do before we are ready to create a package from it; a first step was to build it as a dynamically linked shared library. I currently plan to create the following packages for Debian's package archive:

- python3-dfxml: containing the Python implementation of dfxml
- python3-dfxml-tools: containing helpful tools building on the Python dfxml implementation, like fiwalk, idifference and so on
- libdfxml: containing the C++ implementation of dfxml as a shared library, as it is used by bulk_extractor (and maybe future software?!?)
- sceadan: a package needed by bulk_extractor

What do you think about it? Do you think this is reasonable and that I will find sponsors for those packages? If so, I will file the corresponding ITPs in the course of the next week.

Oh, and since you are in contact with upstream, this sort of issue is
sometimes solved by upstream providing a release tarball that includes
the submodules. The issue is that, as far as I know, GitHub does not
provide this feature, so they have to use a script to generate the
tarball and attach it to the release.
This makes the tarball easier to work on and package for other
distros as well[2], but it's also easy for us to work around, so this is
a tradeoff between bothering upstream and repacking on our side.
Considering upstream is focused on a rewrite of bulk_extractor, it
might be a good idea to repack it ourselves, I just wanted to let you
know so you're aware of the ideal fix for this if it happens again in
the future.
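
A minimal sketch of such a script, assuming the sources were already checked out with `git clone --recurse-submodules` so that the submodule directories are populated. The function name `make_release_tarball` and its arguments are hypothetical; this is not upstream's actual release process.

```shell
#!/bin/sh
# Hypothetical helper: pack an already checked-out source tree, including
# its populated submodule directories, into a release tarball while
# leaving out git metadata (.git directories and .git link files).
make_release_tarball() {
    src_dir=$1   # checkout with submodules populated
    name=$2      # project name used for the tarball file name
    version=$3   # release version used for the tarball file name
    out="${name}-${version}.tar.gz"
    # -C switches to the parent directory so the archive contains a
    # single top-level directory named after the checkout.
    tar -czf "$out" --exclude=.git \
        -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
    printf '%s\n' "$out"
}
```

The resulting file would then be attached to the release by hand or by a release script; on the packaging side the same idea applies when repacking.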

This is a great hint, and it concerns be13_api. So if I understand correctly, I could just add the be13_api submodule to the Salsa repo, right?

Thanks for your work!

[0] https://www.debian.org/doc/debian-policy/ch-source.html#main-building-script-debian-rules
[1] https://lwn.net/Articles/700465/
[2] And I guess it's also easier for users who want to build it
themselves, as a plain git clone will not check out the submodules.



--
Samuel Henrique <samueloph>

Thanks a lot, Samuel, for getting back to me and for the thorough advice! Much appreciated.

Best regards,
    Jan

---
[0] Garfinkel, S. (2012). Digital forensics XML and the DFXML toolset. Digital Investigation, 8(3-4), 161-174.
        https://core.ac.uk/download/pdf/36736443.pdf
[1] https://www.nist.gov/itl/ssd/software-quality-group/national-software-reference-library-nsrl/technical-information/dfxml
[2] https://github.com/simsong/dfxml/pull/69
[3] https://github.com/dfxml-working-group

