
Re: Fwd: Packaging TensorFlow for Debian



Hi Wookey,

Thanks for your work and the updates. I think I can comment
on some of the details.
I once wrote a hacky build system for tensorflow,
and I still remember some details of it.

On Fri, 2021-05-28 at 02:07 +0100, Wookey wrote:
> 
> * Collected .h files into -dev package (this is done horribly with
>   rsync because tensorflow/bazel doesn't have a 'make install' I can
>   just use - but it does know the list of headers so I'm sure there
>   is a better way).

IIRC the list of header files that should be installed varies
according to the configuration. For example, if we enable
component A but disable component B when compiling the shared
objects, then the headers for component B should be filtered out.

Bazel is able to resolve the file-level dependencies of a given
target, and those "files" include headers. (Traditional Makefiles
do the same thing: headers are listed as dependencies so that
rebuilds happen correctly.)

We can dump that dependency list using bazel, and grep the headers out.
I've forgotten the concrete command to achieve that. Oops.
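
A minimal sketch of what such a dump could look like, not the command
from the old build system. It assumes a query of the form
kind("source file", deps(TARGET)), built from Bazel's standard query
functions, and that the output is one label per line. The helper names
and any target passed in (for example //tensorflow:tensorflow_cc) are
illustrative assumptions.

```python
import subprocess


def header_paths(labels):
    """Turn Bazel source-file labels into workspace-relative header paths."""
    paths = set()
    for label in labels:
        label = label.strip()
        # Skip blank lines and files from external repositories (@repo//...).
        if not label.startswith("//"):
            continue
        # Keep headers only; the suffix list is an assumption.
        if not label.endswith((".h", ".hpp")):
            continue
        package, _, name = label[2:].partition(":")
        if not name:
            continue
        paths.add(f"{package}/{name}" if package else name)
    return sorted(paths)


def query_headers(target, bazel="bazel"):
    """Ask Bazel for every source file `target` depends on, keep the headers."""
    expr = f'kind("source file", deps({target}))'
    result = subprocess.run(
        [bazel, "query", expr],
        check=True,
        capture_output=True,
        text=True,
    )
    return header_paths(result.stdout.splitlines())
```

Plain "bazel query" walks the unconfigured dependency graph, so it may
still list headers of components that are disabled in the actual build;
"bazel cquery" is configuration-aware and may be closer to what is
needed here, though that has not been checked against tensorflow.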

> * Create symlinks to .so files (bazel does it for
>   libtensorflow_framework2.so.* but not libtensorflow_cc2.so.* - I
>   don't know why yet)

Maybe the corresponding symlink target is not included in the
dependency tree of the targets currently being built.

> * Got it to use system copy of libpng, rather than statically
>   embedding a copy

Great. When a statically linked library is also used by many other
system components, we should be cautious about potential symbol
clashes if the underlying libpng versions are not aligned.

Tensorflow itself is already very good at introducing CVEs, e.g.
https://github.com/tensorflow/tensorflow/releases/tag/v2.5.0
and we won't have the energy to deal with the CVEs in its embedded
libraries.

> What guarantees does upstream make about backwards/forwards
> compatibility? They are putting SONAMEs in and managing major, minor,
> patch versioning, which is better than many projects these days.
> 
> I'm wondering what the right strategy is for abi/api versioning. I
> presume we will have quite a lot of packages using this so we should
> try and do it right.

If upstream sticks tightly to semantic versioning, we should
probably use the upstream SONAMEs directly. I'm doing this for two
other packages I maintain: opencv and pytorch. Their soversions are both
<MAJOR.MINOR>. Although we have to pass the NEW queue every time
there is a MINOR bump ... at least we won't easily break reverse deps.
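
To illustrate why a MINOR bump means a trip through NEW: the binary
package name is derived from the SONAME, so a new soversion gives a new
package name. A small sketch under the usual Debian naming convention
(library name plus soversion, with a hyphen in between when the name
ends in a digit). The function name and the tensorflow SONAMEs in the
usage note are illustrative assumptions, not what upstream ships.

```python
import re


def package_name(soname):
    """Derive a Debian-style runtime package name from a SONAME."""
    match = re.fullmatch(r"(lib[\w.+-]+?)\.so\.([\w.]+)", soname)
    if match is None:
        raise ValueError(f"not a versioned SONAME: {soname!r}")
    name, soversion = match.groups()
    # Package names are lowercase and cannot contain underscores.
    name = name.lower().replace("_", "-")
    # Avoid gluing two numbers together (libfoo2 + 3 -> libfoo2-3).
    separator = "-" if name[-1].isdigit() else ""
    return f"{name}{separator}{soversion}"
```

With a <MAJOR.MINOR> soversion, package_name("libtensorflow_cc.so.2.4")
and package_name("libtensorflow_cc.so.2.5") give two different package
names, and the second one is new to the archive.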

> However then this question of ABIs gets sidetracked by something I
> noticed whilst looking at the symbols situation: The symbols file for
> libtensorflow_cc2 is 24MB (that's really quite fat) Is it worth
> putting that in the package? I'm not sure anyone is going to actually
> 'maintain' it beyond autogenerating a new one each version. Symbols
> files work OK for C but are bloated and awkward for C++. Even so 24MB
> seems huge.

C++ symbols are known to be hard to track. Since we currently don't
expect many reverse dependencies of libtensorflow, maybe we should
not track the symbols manually, at least for now.

> lintian only complained about an embedded libpng, but now
> I look I am pretty sure there is a still a range of embedded
> statically-linked libs hiding in there.
> 
> We have lots of symbols like:
> ZN6google8protobuf3MapINSt7__*
> _ZN4absl14lts_2020_02_*
> AES_decrypt@Base
> BORINGSSL_self_test@Base
> _ZN3Aws22AmazonWebService*

Maybe it's using static libraries for some of them?
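
One way to get an overview of what is embedded is to count symbols per
suspected library in the generated symbols file. A rough sketch: the
prefix table below is only an assumption built from the symbols quoted
above (Itanium-mangled namespaces google::protobuf, absl and Aws, plus
the BORINGSSL_ prefix) and would need extending for a real run.

```python
from collections import Counter

# Hypothetical prefix table, derived from the symbols quoted in this thread.
# Only plain "_ZN" names are covered; "_ZNK" and friends need extra entries.
EMBEDDED_PREFIXES = {
    "_ZN6google8protobuf": "protobuf",
    "_ZN4absl": "abseil",
    "_ZN3Aws": "aws-sdk",
    "BORINGSSL_": "boringssl",
}


def count_embedded(symbol_lines):
    """Count symbols per suspected embedded library."""
    counts = Counter()
    for line in symbol_lines:
        fields = line.split()
        if not fields:
            continue
        # Symbols file entries look like " name@Base version"; drop "@Base".
        symbol = fields[0].split("@", 1)[0]
        for prefix, library in EMBEDDED_PREFIXES.items():
            if symbol.startswith(prefix):
                counts[library] += 1
                break
    return counts
```

Feeding it the lines of the symbols file, for example
count_embedded(open(path)) with path pointing at the generated file,
gives a per-library tally that shows which embedded copies dominate.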

> So I think that means that despite turning off network downloads it's
> still embedding protobuf, boringssl, google_abls, highwayhash,
> farmhash and some AWS stuff (at least). I'm not sure where it is
> getting them from... Some of this is the stuff Yun told us about at
> the start of the thread... But it shouldn't be embedding
> com_google_protobuf or gif, because those are already listed in
> --repo_env=TF_SYSTEM_LIBS=<list> bazel command line in the rules
> file. I guess I'll have to pore over the logs some more and see how
> the workspace is getting set up.
> 
> The build log is here:      
> http://wookware.org/software/tensorflow/tensorflow_2.3.1-1_amd64.build
> 
> Most of this should be fixable in due course, but what is our view on
> uploading sooner vs expunging all embedded libs?  I am normally
> something of a purist on this, but there is some demand for this so
> maybe some embedded libs are OK for the time being?
> Not sure if the ftpmasters will agree, even if we do...

My recommendation is to pass the NEW queue first, because the package
is expected to stay in experimental for a while, and a first upload
would boost everybody's morale.

Actually there are many details to improve in the pytorch package,
and I'm still fixing them bit by bit...

BTW, please make sure to split the tensorflow shared objects
into separate binary packages, e.g.

bin:libtensorflow-framework.*
bin:libtensorflow.*
bin:libtensorflow_cc.*
bin:libtensorflow-dev

This is because some customized Ops/Kernels only NEED
libtensorflow_framework.so.*

