On 2021-03-31 15:45 +0100, Wookey wrote:

OK. Time for an update as I've made some progress.

I now have a build for
  libtensorflow-cc2
  libtensorflow-framework2
  libtensorflow-dev
which completes with no lintian errors (there are still some warnings).

Various things done:
 * Collected .h files into the -dev package (this is done horribly with rsync because tensorflow/bazel doesn't have a 'make install' I can just use - but it does know the list of headers so I'm sure there is a better way).
 * Created symlinks to .so files (bazel does it for libtensorflow_framework2.so.* but not libtensorflow_cc2.so.* - I don't know why yet).
 * Updated symbols files and fixed version errors.
 * Removed rpaths (the ugly way for now - see 'rpaths' thread on debian-bazel).
 * Enabled tests to build, then disabled them again as the build has errors.
 * Turned on verbose build logs so we can see what's going on (and comply with policy; 'terse' now works too).
 * Got it to use the system copy of libpng, rather than statically embedding a copy.
 * debian/rules clean actually cleans the bazel cache (but only by using rm -rf /tmp/.cache/bazel because bazel clean --expunge seems not to work - see 'clean builds' thread on debian-bazel list).

So as you can see there is a theme of hacking about in the rules file (because I understand that stuff) rather than trying to work out how to get bazel to do the build and install the way we want it (because I mostly don't understand/am not familiar with that stuff). We can improve this over time, and send fixes upstream for a more distro-friendly build process.

The only glaring omission now is that the -debug packages are empty of debug symbols, because we are doing the 'opt' build, which optimises and throws away all the debug stuff. There is a 'dbg' build, but I guess that turns all the optimisation off, which we don't want either. I am trying to get it to use -g (keep debug info) instead of -g0 (create no debug info) and then dh_dwz/debhelper should just DTRT.
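For the symlink item in the list above, the usual shared-library link chain can be sketched as follows. This is only an illustration, not what debian/rules currently does: the helper names soname_links and create_links are made up, and it assumes the SONAME carries just the major version (libtensorflow_cc.so.2 for the real file libtensorflow_cc.so.2.3.1 that appears in the build output).

```python
import os
import tempfile


def soname_links(real_name):
    """Return (link, target) pairs for lib.so.X.Y.Z -> lib.so.X -> lib.so.

    Assumption: the SONAME uses only the major version number.
    """
    base, sep, version = real_name.partition(".so.")
    if not sep or not version:
        raise ValueError("expected a name like libfoo.so.X.Y.Z: %r" % real_name)
    major = version.split(".")[0]
    soname = "%s.so.%s" % (base, major)
    dev_link = "%s.so" % base
    # Runtime link points at the real file; the unversioned link points at the SONAME.
    return [(soname, real_name), (dev_link, soname)]


def create_links(directory, real_name):
    """Create the relative symlinks next to the real library file."""
    for link, target in soname_links(real_name):
        path = os.path.join(directory, link)
        if os.path.islink(path):
            os.remove(path)  # replace a stale link from an earlier build
        os.symlink(target, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Empty stand-in for the library that bazel produces.
        open(os.path.join(tmp, "libtensorflow_cc.so.2.3.1"), "w").close()
        create_links(tmp, "libtensorflow_cc.so.2.3.1")
        for name in sorted(os.listdir(tmp)):
            print(name)
```

In a Debian package the unversioned .so link would normally ship in the -dev package and the versioned link in the library package, so a real implementation would split the two.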
It seems that the 'dbg' build is what we want as that's the same as 'opt' but with '-g'. Perfect.

However trying that means that the final link command fails:

ERROR: /home/wookey/packages/tensorflow/salsa/tensorflow/BUILD:754:20: Couldn't build file tensorflow/libtensorflow_cc.so.2.3.1: Linking of rule '//tensorflow:libtensorflow_cc.so.2.3.1' failed (Exit 1): gcc failed: error executing command
  (cd /tmp/.cache/bazel/_bazel_wookey/5a73853b1764682f4fdcfb56b63560fb/execroot/org_tensorflow && \
  exec env - \
    PATH=/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11 \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/lib/python3/dist-packages \
    TF2_BEHAVIOR=1 \
    TF_CONFIGURE_IOS=0 \
    TF_ENABLE_XLA=1 \
  /usr/bin/gcc @bazel-out/k8-dbg/bin/tensorflow/libtensorflow_cc.so.2.3.1-2.params)
Execution platform: @local_execution_config_platform//:platform
tensorflow/core/kernels/data/experimental/io_ops.cc:120: error: undefined reference to 'tensorflow::data::experimental::SaveDatasetOp::kFileFormatVersion'
tensorflow/core/kernels/data/experimental/io_ops.cc:234: error: undefined reference to 'tensorflow::data::experimental::LoadDatasetOp::kCompression'
tensorflow/core/kernels/data/experimental/io_ops.cc:234: error: undefined reference to 'tensorflow::data::experimental::LoadDatasetOp::kReaderFunc'
tensorflow/core/kernels/data/experimental/io_ops.cc:234: error: undefined reference to 'tensorflow::data::experimental::LoadDatasetOp::kReaderFuncTarguments'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kCompression'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kReaderFunc'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kShardFunc'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kReaderFuncTarguments'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:372: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kShardFuncTarguments'
tensorflow/core/kernels/data/experimental/snapshot_dataset_op.cc:696: error: undefined reference to 'tensorflow::data::experimental::SnapshotDatasetV2Op::kFileFormatVersion'

The exact same build with config=opt works fine. The params file is here:
http://wookware.org/software/tensorflow/libtensorflow_cc.so.2.3.1-2.params

Anyone got any ideas why this might be failing?

Bazel has a concept of --fission builds where it does the splitting of binaries and debug info files (that dh_dwz does). In theory that's nice but I'm not sure how to interface it with the Debian automatic -dbg package machinery.

I need to get my commits and fixups into reasonable order before checking them in. (I have learned to drive magit this week, which has dramatically reduced the amount of frustration git gives me: it's extremely nice.)

Once that is done we may be ready for an upload of this initial package to NEW, probably next week (I'm away for a long weekend in a few hours). However, see below about embedded libs.

The next jobs are to sort out the googleapis package so we can build the C library (also waiting on me checking my half-arsed work so far into salsa), work out how to build tflite in the Debian context, and build the python bindings. Then there is stuff like ensuring the hardening flags are set right, seeing what our reproducibility is like and getting bazel to do more of the right things so there is less fixing-up in debian/rules.

I also have a question about symbols and ABIs: What guarantees does upstream make about backwards/forwards compatibility?
They are putting SONAMEs in and managing major, minor, patch versioning, which is better than many projects these days. I'm wondering what the right strategy is for ABI/API versioning. I presume we will have quite a lot of packages using this so we should try and do it right.

However, this question of ABIs gets sidetracked by something I noticed whilst looking at the symbols situation: the symbols file for libtensorflow_cc2 is 24MB (that's really quite fat). Is it worth putting that in the package? I'm not sure anyone is going to actually 'maintain' it beyond autogenerating a new one each version. Symbols files work OK for C but are bloated and awkward for C++. Even so, 24MB seems huge.

lintian only complained about an embedded libpng, but now I look I am pretty sure there is still a range of embedded statically-linked libs hiding in there. We have lots of symbols like:
  _ZN6google8protobuf3MapINSt7__*
  _ZN4absl14lts_2020_02_*
  AES_decrypt@Base
  BORINGSSL_self_test@Base
  _ZN3Aws22AmazonWebService*

So I think that means that despite turning off network downloads it's still embedding protobuf, boringssl, google_absl, highwayhash, farmhash and some AWS stuff (at least). I'm not sure where it is getting them from...

Some of this is the stuff Yun told us about at the start of the thread... But it shouldn't be embedding com_google_protobuf or gif, because those are already listed in the --repo_env=TF_SYSTEM_LIBS=<list> option on the bazel command line in the rules file.

I guess I'll have to pore over the logs some more and see how the workspace is getting set up. The build log is here:
http://wookware.org/software/tensorflow/tensorflow_2.3.1-1_amd64.build

Most of this should be fixable in due course, but what is our view on uploading sooner vs expunging all embedded libs? I am normally something of a purist on this, but there is some demand for this so maybe some embedded libs are OK for the time being? Not sure if the ftpmasters will agree, even if we do...
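As a rough way to see how much of the symbols file comes from embedded copies, the entries can be bucketed by prefix. The sketch below is only an illustration: the prefix table is built from the handful of examples quoted above (it is not a complete list, and AES_* could equally come from another crypto library), the function names are made up, and the sample entries are invented stand-ins rather than real lines from the 24MB file.

```python
from collections import Counter

# Hypothetical prefix table, taken only from the symbol examples quoted above.
PREFIXES = [
    ("_ZN6google8protobuf", "protobuf"),
    ("_ZN4absl", "abseil"),
    ("_ZN3Aws", "aws-sdk"),
    ("AES_", "boringssl"),        # assumption: could also be another crypto lib
    ("BORINGSSL_", "boringssl"),
]


def classify(symbol):
    """Map a (mangled) symbol name to the library it probably belongs to."""
    for prefix, library in PREFIXES:
        if symbol.startswith(prefix):
            return library
    return "other"


def count_embedded(symbols_lines):
    """Count symbols per library in the lines of a Debian symbols file."""
    counts = Counter()
    for line in symbols_lines:
        # Symbol entries are indented; the library header line is not.
        if not line.startswith(" "):
            continue
        fields = line.split()
        if not fields:
            continue
        symbol = fields[0].split("@", 1)[0]  # drop the @Base version tag
        counts[classify(symbol)] += 1
    return counts


# Invented sample in symbols-file layout; the mangled names are placeholders.
sample = [
    "libtensorflow_cc.so.2 libtensorflow-cc2 #MINVER#",
    " AES_decrypt@Base 2.3.1",
    " BORINGSSL_self_test@Base 2.3.1",
    " _ZN4absl14lts_2020_02_EXAMPLE@Base 2.3.1",
    " _ZN6google8protobuf3MapEXAMPLE@Base 2.3.1",
    " _ZN3Aws22AmazonWebServiceEXAMPLE@Base 2.3.1",
    " _ZN10tensorflow6StatusEXAMPLE@Base 2.3.1",
]

if __name__ == "__main__":
    for library, count in sorted(count_embedded(sample).items()):
        print(library, count)
```

Run against the real symbols file (for example with open(path) as f: count_embedded(f)), the per-library totals would give a first idea of which embedded copies account for most of the size.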
Wookey
--
Principal hats: Linaro, Debian, Wookware, ARM
http://wookware.org/