
Re: Fwd: Packaging TensorFlow for Debian



On 2021-03-31 15:45 +0100, Wookey wrote:

OK. Time for an update as I've made some progress.

I now have a build for
libtensorflow-cc2
libtensorflow-framework2
libtensorflow-dev

Which completes with no lintian errors (there are still some warnings).

Various things done:

* Collected .h files into -dev package (this is done horribly with
  rsync because tensorflow/bazel doesn't have a 'make install' I can
  just use - but it does know the list of headers so I'm sure there is
  a better way).
* Created symlinks to .so files (bazel does it for
  libtensorflow_framework.so.* but not libtensorflow_cc.so.* - I
  don't know why yet)
* Updated symbols files and fixed version errors
* Removed rpaths (the ugly way for now - see 'rpaths' thread on debian-bazel)
* Enabled tests to build, then disabled them again as the build has errors
* Turned on verbose build logs so we can see what's going on (and
  comply with policy; 'terse' now works too)
* Got it to use system copy of libpng, rather than statically embedding a copy
* debian/rules clean actually cleans the bazel cache (but only by using rm
  -rf /tmp/.cache/bazel because bazel clean --expunge seems not to
  work - see 'clean builds' thread on debian-bazel list)
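
As an illustration of the symlink step above, here is a minimal Python
sketch of building the usual link chain (real file <- SONAME link <-
dev link) for a versioned shared library. It is not what the rules file
currently does; the function name link_chain and the assumption that
the SONAME is "<stem>.so.<major>" are hypothetical choices made for the
example.

```python
import os
import tempfile


def link_chain(libdir, real_name):
    """Create lib<X>.so.<major> and lib<X>.so symlinks next to real_name.

    Assumes real_name looks like 'libfoo.so.MAJOR[.MINOR[.PATCH]]' and that
    the SONAME is 'libfoo.so.MAJOR' (an assumption for this sketch).
    Returns the names of the links that were actually created.
    """
    stem, sep, version = real_name.partition(".so.")
    if not sep or not version:
        raise ValueError(f"{real_name!r} has no version suffix after '.so.'")
    major = version.split(".")[0]
    soname = f"{stem}.so.{major}"
    dev_link = f"{stem}.so"

    created = []
    # Each link points at the next, more specific name, using a relative target.
    for link, target in ((soname, real_name), (dev_link, soname)):
        path = os.path.join(libdir, link)
        if os.path.lexists(path):
            continue  # leave existing files and links (e.g. bazel-made ones) alone
        os.symlink(target, path)
        created.append(link)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Stand-in for the real library produced by the build.
        open(os.path.join(d, "libtensorflow_cc.so.2.3.1"), "w").close()
        print(link_chain(d, "libtensorflow_cc.so.2.3.1"))
```

Skipping names that already exist means the same call can be used for
both libraries, whether or not bazel has already created the links.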

So as you can see there is a theme of hacking about in the rules file
(because I understand that stuff) rather than trying to work out how
to get bazel to do the build and install the way we want it (because I
mostly don't understand/am not familiar with that stuff). We can
improve this over time, and send fixes upstream for a more
distro-friendly build process.

The only glaring omission now is that the -debug packages are empty of
debug symbols, because we are doing the 'opt' build, which optimises
and throws away all the debug stuff. There is a 'dbg' build, but I
guess that turns all the optimisation off, which we don't want either.
I am trying to get it to use -g (keep debug info) instead of -g0
(create no debug info) and then dh_dwz/debhelper should just DTRT.

It seems that the 'dbg' build is what we want, as that's the same as
'opt' but with '-g'. Perfect. However, trying that means that the
final link command fails:
ERROR: /home/wookey/packages/tensorflow/salsa/tensorflow/BUILD:754:20: Couldn't build file tensorflow/libtensorflow_cc.so.2.3.1: Linking of rule '//tensorflow:libtensorflow_cc.so.2.3.1' failed (Exit 1): gcc failed: error executing command 
  (cd /tmp/.cache/bazel/_bazel_wookey/5a73853b1764682f4fdcfb56b63560fb/execroot/org_tensorflow && \
  exec env - \
    PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11 \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/lib/python3/dist-packages \
    TF2_BEHAVIOR=1 \
    TF_CONFIGURE_IOS=0 \
    TF_ENABLE_XLA=1 \
  /usr/bin/gcc @bazel-out/k8-dbg/bin/tensorflow/libtensorflow_cc.so.2.3.1-2.params)
Execution platform: @local_execution_config_platform//:platform
tensorflow/core/kernels/data/experimental/io_ops.cc:120: error: undefined reference to 'tensorflow::data::experimental::SaveDatasetOp::kFileFormatVersion'
tensorflow/core/kernels/data/experimental/io_ops.cc:234: error: undefined reference to 'tensorflow::data::experimental::LoadDatasetOp::kCompression'
tensorflow/core/kernels/data/experimental/io_ops.cc:234: error: undefined reference to 'tensorflow::data::experimental::LoadDatasetOp::kReaderFunc'
tensorflow/core/kernels/data/experimental/io_ops.cc:234: error: undefined reference to 'tensorflow::data::experimental::LoadDatasetOp::kReaderFuncTarguments'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kCompression'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kReaderFunc'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kShardFunc'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kReaderFuncTarguments'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kShardFuncTarguments'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:696: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kFileFormatVersion'

The exact same build with config=opt works fine. The params file is here:
http://wookware.org/software/tensorflow/libtensorflow_cc.so.2.3.1-2.params
Anyone got any ideas why this might be failing?

Bazel has a concept of --fission builds where it does the splitting of
binaries and debug info files (that dh_dwz does). In theory that's
nice but I'm not sure how to interface it with the debian automatic
-dbg package machinery.

I need to get my commits and fixups into reasonable order before
checking them in. (I have learned to drive magit this week,
which has dramatically reduced the amount of frustration git gives me:
it's extremely nice)

Once that is done we may be ready for an upload of this initial package to
NEW. That will be next week (I'm away for a long weekend in a few hours).

However see below about embedded libs.

The next jobs are to sort out the googleapis package so we can build
the C library (also waiting on me checking-in my half-arsed work so
far into salsa), work out how to build tflite in the debian context,
and build the python bindings.

Then there is stuff like ensuring the hardening flags are set right,
seeing what our reproducibility is like and getting bazel to do more
of the right things so there is less fixing-up in debian/rules.

I also have a question about symbols and ABIs:

What guarantees does upstream make about backwards/forwards
compatibility? They are putting SONAMEs in and managing major, minor,
patch versioning, which is better than many projects these days.

I'm wondering what the right strategy is for abi/api versioning. I
presume we will have quite a lot of packages using this so we should
try and do it right.

However then this question of ABIs gets sidetracked by something I
noticed whilst looking at the symbols situation: The symbols file for
libtensorflow_cc2 is 24MB (that's really quite fat). Is it worth
putting that in the package? I'm not sure anyone is going to actually
'maintain' it beyond autogenerating a new one each version. Symbols
files work OK for C but are bloated and awkward for C++. Even so 24MB
seems huge. lintian only complained about an embedded libpng, but now
I look I am pretty sure there is still a range of embedded
statically-linked libs hiding in there.

We have lots of symbols like:
_ZN6google8protobuf3MapINSt7__*
_ZN4absl14lts_2020_02_*
AES_decrypt@Base
BORINGSSL_self_test@Base
_ZN3Aws22AmazonWebService*

So I think that means that despite turning off network downloads it's
still embedding protobuf, boringssl, google_absl, highwayhash,
farmhash and some AWS stuff (at least). I'm not sure where it is
getting them from... Some of this is the stuff Yun told us about at
the start of the thread... But it shouldn't be embedding
com_google_protobuf or gif, because those are already listed in the
--repo_env=TF_SYSTEM_LIBS=<list> option on the bazel command line in the rules
file. I guess I'll have to pore over the logs some more and see how
the workspace is getting set up.
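
One rough way to see which embedded libraries show up in the symbols
file is to count exported symbols by name prefix. The sketch below is
only that: the prefix table is a hand-written guess based on the
symbols listed above (for instance, AES_* is attributed to boringssl
here, which is an assumption), the function name embedded_libs is
hypothetical, and the sample symbol names in the demo are made up for
illustration.

```python
from collections import Counter

# Hypothetical prefix -> library table, derived from the symbols quoted above.
PREFIXES = [
    ("_ZN6google8protobuf", "protobuf"),
    ("_ZN4absl", "abseil"),
    ("_ZN3Aws", "aws-sdk"),
    ("BORINGSSL_", "boringssl"),
    ("AES_", "boringssl"),  # assumption: could equally be another libcrypto
]


def embedded_libs(symbol_lines):
    """Count symbols per suspected embedded library.

    Expects lines in the style of a debian symbols file, i.e.
    ' <symbol>@<version-node> <min-version>'. Lines that match no prefix
    (including the header line) are ignored.
    """
    counts = Counter()
    for line in symbol_lines:
        fields = line.split()
        if not fields:
            continue
        symbol = fields[0].split("@", 1)[0]  # drop the '@Base' suffix
        for prefix, lib in PREFIXES:
            if symbol.startswith(prefix):
                counts[lib] += 1
                break
    return counts


if __name__ == "__main__":
    # Made-up sample lines; real input would be the generated symbols file.
    sample = [
        "libtensorflow_cc.so.2 libtensorflow-cc2 #MINVER#",
        " _ZN4absl14lts_2020_02_253FooEv@Base 2.3.1",
        " AES_decrypt@Base 2.3.1",
        " BORINGSSL_self_test@Base 2.3.1",
        " _ZN10tensorflow3BazEv@Base 2.3.1",
    ]
    for lib, n in embedded_libs(sample).most_common():
        print(lib, n)
```

Run over the real symbols file, the counts would give a quick idea of
how much of the 24MB comes from each embedded copy.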

The build log is here: http://wookware.org/software/tensorflow/tensorflow_2.3.1-1_amd64.build

Most of this should be fixable in due course, but what is our view on
uploading sooner vs expunging all embedded libs?  I am normally
something of a purist on this, but there is some demand for this so
maybe some embedded libs are OK for the time being?
Not sure if the ftpmasters will agree, even if we do...


Wookey
-- 
Principal hats:  Linaro, Debian, Wookware, ARM
http://wookware.org/
