Hurd and Linux timing differences
Good evening everyone,
Over the last few days I built the benchmark-1.9.1 Debian package on hurd-i386
and linux-i386 on the same machine: a Toshiba netbook with an Atom N455 CPU and
2 GiB of RAM. The Hurd installation is sid (gcc-15) and the Linux installation is
bookworm (gcc-12). At the end of the build several tests are run, which are,
well, benchmarks. The timings of many tests are similar between Hurd and Linux,
but several tests take disproportionately longer to run on the Hurd. Below I
paste the output of test 1 of 77, which took 914.35 sec on the Hurd and 11.58 sec
on Linux.
I was wondering whether anyone has any insight into what may be causing this
difference. I note that on Linux more information about the CPU caches is
reported, so could it be that on the Hurd CPU features relevant to the tests are
not picked up? I also notice that the timing numbers on the Hurd are usually
round numbers, so could there be a problem with the timers? I would think that
on this CPU the lack of SMP in gnumach would not explain such a difference.
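One way to test the timer question directly would be a small probe that reads
a clock in a tight loop and records the smallest step it ever sees. The sketch
below does that in Python for three clocks. It is only an illustration: the
function name smallest_step_ns and the choice of clocks are mine, and which
clocks the benchmark library itself reads on each system is an assumption I
have not checked.

```python
import time


def smallest_step_ns(clock_ns, samples=20, timeout_s=2.0):
    """Smallest positive increment seen between consecutive clock readings.

    Returns None if the clock never advanced before the timeout.
    """
    deadline = time.monotonic() + timeout_s
    smallest = None
    seen = 0
    last = clock_ns()
    while seen < samples and time.monotonic() < deadline:
        now = clock_ns()
        if now > last:
            step = now - last
            if smallest is None or step < smallest:
                smallest = step
            last = now
            seen += 1
    return smallest


# Clocks to probe (hypothetical selection, not taken from the benchmark source).
CLOCKS = {
    "monotonic": time.monotonic_ns,
    "perf_counter": time.perf_counter_ns,
    "process_time": time.process_time_ns,
}


def report():
    for name, clock_ns in CLOCKS.items():
        info = time.get_clock_info(name)
        step = smallest_step_ns(clock_ns)
        print(f"{name}: advertised resolution {info.resolution:g} s, "
              f"smallest observed step {step} ns")


if __name__ == "__main__":
    report()
```

Running the same script on both installations and comparing the observed steps
should show whether one system only advances its clocks in coarse increments.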
Best regards,
João
Test running on hurd-i386 (gcc-15)
test 1
Start 1: benchmark
1: Test command: /root/benchmark-1.9.1/obj-i686-gnu/test/benchmark_test "--benchmark_min_time=0.01s"
1: Working Directory: /root/benchmark-1.9.1/obj-i686-gnu/test
1: Test timeout computed to be: 10000000
1: ***WARNING*** Failed to set thread affinity. Estimated CPU frequency may be incorrect.
1: 2026-01-10T13:17:37+00:00
1: Running /root/benchmark-1.9.1/obj-i686-gnu/test/benchmark_test
1: Run on (1 X 1651.07 MHz CPU )
1: ***WARNING*** Library was built as DEBUG. Timings may be affected.
1: -----------------------------------------------------------------------------------
1: Benchmark Time CPU Iterations
1: -----------------------------------------------------------------------------------
1: BM_Factorial 0.000 ns 0.000 ns 1000000000000 40320
1: BM_Factorial/real_time 0.000 ns 0.000 ns 1000000000000 40320
1: BM_CalculatePiRange/1 100.0 ns 100 ns 100000 0
1: BM_CalculatePiRange/8 1000 ns 1000 ns 10000 3.28374
1: BM_CalculatePiRange/64 5000 ns 5000 ns 10000 3.15746
1: BM_CalculatePiRange/512 40000 ns 40000 ns 1000 3.14355
1: BM_CalculatePiRange/4096 300000 ns 300000 ns 100 3.14184
1: BM_CalculatePiRange/32768 2000022 ns 2000000 ns 10 3.14162
1: BM_CalculatePiRange/262144 19999743 ns 20000000 ns 1 3.1416
1: BM_CalculatePiRange/1048576 79999924 ns 80000000 ns 1 3.14159
1: BM_CalculatePi/threads:8 62500 ns 62500 ns 800
1: BM_CalculatePi/threads:1 99998 ns 100000 ns 100
1: BM_CalculatePi/threads:2 70000 ns 70000 ns 2000
1: BM_CalculatePi/threads:4 24999 ns 25000 ns 400
1: BM_CalculatePi/threads:8 62500 ns 62500 ns 800
1: BM_CalculatePi/threads:16 62500 ns 62500 ns 160
1: BM_CalculatePi/threads:32 62500 ns 62500 ns 320
1: BM_CalculatePi/threads:1 100000 ns 100000 ns 100
1: BM_SetInsert/1024/128 100002 ns 100000 ns 100 bytes_per_second=4.88281Mi/s items_per_second=1.28M/s
1: BM_SetInsert/4096/128 102043 ns 102041 ns 196 bytes_per_second=4.78516Mi/s items_per_second=1.2544M/s
1: BM_SetInsert/8192/128 99998 ns 100000 ns 100 bytes_per_second=4.88281Mi/s items_per_second=1.28M/s
1: BM_SetInsert/1024/512 200000 ns 200000 ns 100 bytes_per_second=9.76563Mi/s items_per_second=2.56M/s
1: BM_SetInsert/4096/512 300002 ns 300000 ns 100 bytes_per_second=6.51042Mi/s items_per_second=1.70667M/s
1: BM_SetInsert/8192/512 428574 ns 428571 ns 140 bytes_per_second=4.55729Mi/s items_per_second=1.19467M/s
1: BM_Sequential<std::vector<int>,int>/1 20.0 ns 20.0 ns 1000000 bytes_per_second=190.735Mi/s items_per_second=50M/s
1: BM_Sequential<std::vector<int>,int>/8 714 ns 714 ns 14000 bytes_per_second=42.7246Mi/s items_per_second=11.2M/s
1: BM_Sequential<std::vector<int>,int>/64 2000 ns 2000 ns 10000 bytes_per_second=122.07Mi/s items_per_second=32M/s
1: BM_Sequential<std::vector<int>,int>/512 10000 ns 10000 ns 1000 bytes_per_second=195.312Mi/s items_per_second=51.2M/s
1: BM_Sequential<std::vector<int>,int>/1024 7143 ns 7143 ns 1400 bytes_per_second=546.875Mi/s items_per_second=143.36M/s
1: BM_Sequential<std::list<int>>/1 20.0 ns 20.0 ns 1000000 bytes_per_second=190.735Mi/s items_per_second=50M/s
1: BM_Sequential<std::list<int>>/8 1000 ns 1000 ns 10000 bytes_per_second=30.5176Mi/s items_per_second=8M/s
1: BM_Sequential<std::list<int>>/64 20000 ns 20000 ns 1000 bytes_per_second=12.207Mi/s items_per_second=3.2M/s
1: BM_Sequential<std::list<int>>/512 99998 ns 100000 ns 100 bytes_per_second=19.5312Mi/s items_per_second=5.12M/s
1: BM_Sequential<std::list<int>>/1024 300002 ns 300000 ns 100 bytes_per_second=13.0208Mi/s items_per_second=3.41333M/s
1: BM_Sequential<std::vector<int>, int>/512 6000 ns 6000 ns 10000 bytes_per_second=325.521Mi/s items_per_second=85.3333M/s
1: BM_StringCompare/1 30.0 ns 30.0 ns 1000000
1: BM_StringCompare/8 35.7 ns 35.7 ns 1400000
1: BM_StringCompare/64 100.0 ns 100 ns 100000
1: BM_StringCompare/512 100.0 ns 100 ns 100000
1: BM_StringCompare/4096 714 ns 714 ns 14000
1: BM_StringCompare/32768 10000 ns 10000 ns 1000
1: BM_StringCompare/262144 71429 ns 71429 ns 140
1: BM_StringCompare/1048576 999975 ns 1000000 ns 10
1: BM_SetupTeardown/threads:1 100 ns 100 ns 100000
1: BM_LongTest/65536 1428553 ns 1428571 ns 14
1: BM_LongTest/262144 4999995 ns 5000000 ns 10
1: BM_LongTest/2097152 39999962 ns 40000000 ns 1
1: BM_LongTest/16777216 309999943 ns 310000000 ns 1
1: BM_LongTest/134217728 2490000248 ns 2490000000 ns 1
1: BM_LongTest/268435456 4960000038 ns 4960000000 ns 1
1: BM_ParallelMemset/1048576/threads:1 1999998 ns 2000000 ns 10
1: BM_ParallelMemset/1048576/threads:2 500000 ns 500000 ns 20
1: BM_ParallelMemset/1048576/threads:4 400000 ns 400000 ns 400
1: BM_ManualTiming/1/real_time 19999981 ns 0.000 ns 1 items_per_second=50/s
1: BM_ManualTiming/8/real_time 20000219 ns 0.000 ns 1 items_per_second=399.996/s
1: BM_ManualTiming/64/real_time 19999743 ns 0.000 ns 1 items_per_second=3.20004k/s
1: BM_ManualTiming/512/real_time 20000219 ns 0.000 ns 1 items_per_second=25.5997k/s
1: BM_ManualTiming/4096/real_time 19999743 ns 0.000 ns 1 items_per_second=204.803k/s
1: BM_ManualTiming/16384/real_time 29999971 ns 0.000 ns 1 items_per_second=546.134k/s
1: BM_ManualTiming/1/manual_time 20000000 ns 0.000 ns 1 items_per_second=50/s
1: BM_ManualTiming/8/manual_time 20000000 ns 0.000 ns 1 items_per_second=400/s
1: BM_ManualTiming/64/manual_time 20000000 ns 0.000 ns 1 items_per_second=3.2k/s
1: BM_ManualTiming/512/manual_time 20000000 ns 0.000 ns 1 items_per_second=25.6k/s
1: BM_ManualTiming/4096/manual_time 20000000 ns 0.000 ns 1 items_per_second=204.8k/s
1: BM_ManualTiming/16384/manual_time 30000000 ns 0.000 ns 1 items_per_second=546.133k/s
1: BM_with_args/int_test 0.000 ns 0.000 ns 1000000000000
1: BM_with_args/string_and_pair_test 0.000 ns 0.000 ns 1000000000000
1: BM_non_template_args/basic_test 6.00 ns 6.00 ns 10000000
1: BM_template2_capture<void,char*>/foo 0.000 ns 0.000 ns 1000000000000
1: (BM_template2_capture<void, char*>)/foo 0.000 ns 0.000 ns 1000000000000
1: BM_template1_capture<void>/foo 0.000 ns 0.000 ns 1000000000000
1: BM_template1_capture<void>/foo 0.000 ns 0.000 ns 1000000000000
1: BM_DenseThreadRanges/1/threads:1 7.14 ns 7.14 ns 1400000
1: BM_DenseThreadRanges/1/threads:2 5.50 ns 5.50 ns 20000000
1: BM_DenseThreadRanges/1/threads:3 5.33 ns 5.33 ns 30000000
1: BM_DenseThreadRanges/2/threads:1 5.10 ns 5.10 ns 1960000
1: BM_DenseThreadRanges/2/threads:3 5.33 ns 5.33 ns 30000000
1: BM_DenseThreadRanges/2/threads:4 2.50 ns 2.50 ns 4000000
1: BM_DenseThreadRanges/3/threads:5 4.00 ns 4.00 ns 5000000
1: BM_DenseThreadRanges/3/threads:8 5.00 ns 5.00 ns 8000000
1: BM_DenseThreadRanges/3/threads:11 5.45 ns 5.45 ns 11000000
1: BM_DenseThreadRanges/3/threads:14 5.00 ns 5.00 ns 14000000
1: BM_BenchmarkName 0.000 ns 0.000 ns 1000000000000
1: BM_templated_test_double 5.00 ns 5.00 ns 14000000
1/77 Test #1: benchmark .................................. Passed 914.35 sec
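The BM_ManualTiming rows above report about 20 ms of real time for every
argument from 1 to 4096, and about 30 ms for 16384. If that benchmark sleeps for
the given number of microseconds per iteration (an assumption about the test
source, although the items_per_second values are consistent with it), then even
a 1 microsecond sleep costs about 20 ms here. A minimal sketch to measure that
outside the benchmark, with a hypothetical helper name:

```python
import time


def measure_sleep_ns(requested_us, repeats=5):
    """Wall-clock durations, in ns, of `repeats` sleeps of requested_us each."""
    requested_s = requested_us / 1_000_000
    durations = []
    for _ in range(repeats):
        start = time.monotonic_ns()
        time.sleep(requested_s)
        durations.append(time.monotonic_ns() - start)
    return durations


if __name__ == "__main__":
    # Same arguments as the BM_ManualTiming rows in the log.
    for us in (1, 8, 64, 512, 4096, 16384):
        shortest = min(measure_sleep_ns(us))
        print(f"requested {us} us, shortest observed sleep {shortest} ns")
```

If short sleeps come back at roughly the same duration regardless of the
request, the sleep granularity rather than the CPU would account for those rows.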
Test running on linux-i386 (gcc-12)
test 1
Start 1: benchmark
1: Test command: /root/benchmark-1.9.1/obj-i686-linux-gnu/test/benchmark_test "--benchmark_min_time=0.01s"
1: Working Directory: /root/benchmark-1.9.1/obj-i686-linux-gnu/test
1: Test timeout computed to be: 10000000
1: 2026-01-10T12:27:08+00:00
1: Running /root/benchmark-1.9.1/obj-i686-linux-gnu/test/benchmark_test
1: Run on (2 X 1667 MHz CPU s)
1: CPU Caches:
1: L1 Data 24 KiB (x1)
1: L1 Instruction 32 KiB (x1)
1: L2 Unified 512 KiB (x1)
1: Load Average: 2.20, 2.04, 1.36
1: ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
1: ***WARNING*** Library was built as DEBUG. Timings may be affected.
1: -----------------------------------------------------------------------------------
1: Benchmark Time CPU Iterations
1: -----------------------------------------------------------------------------------
1: BM_Factorial 0.000 ns 0.000 ns 1000000000000 40320
1: BM_Factorial/real_time 0.000 ns 0.000 ns 1000000000000 40320
1: BM_CalculatePiRange/1 91.4 ns 91.2 ns 125352 0
1: BM_CalculatePiRange/8 586 ns 586 ns 20650 3.28374
1: BM_CalculatePiRange/64 4443 ns 4192 ns 3359 3.15746
1: BM_CalculatePiRange/512 32820 ns 32698 ns 432 3.14355
1: BM_CalculatePiRange/4096 260536 ns 259665 ns 54 3.14184
1: BM_CalculatePiRange/32768 2077010 ns 2069820 ns 7 3.14162
1: BM_CalculatePiRange/262144 16634448 ns 16609300 ns 1 3.1416
1: BM_CalculatePiRange/1048576 66680197 ns 66394292 ns 1 3.14159
1: BM_CalculatePi/threads:8 66049 ns 66179 ns 208
1: BM_CalculatePi/threads:1 66063 ns 65603 ns 214
1: BM_CalculatePi/threads:2 65323 ns 65223 ns 214
1: BM_CalculatePi/threads:4 65661 ns 65715 ns 208
1: BM_CalculatePi/threads:8 65642 ns 65805 ns 168
1: BM_CalculatePi/threads:16 85200 ns 85144 ns 144
1: BM_CalculatePi/threads:32 80880 ns 80787 ns 192
1: BM_CalculatePi/threads:2 111614 ns 82884 ns 162
1: BM_SetInsert/1024/128 147820 ns 143518 ns 98 bytes_per_second=3.40223Mi/s items_per_second=891.874k/s
1: BM_SetInsert/4096/128 139526 ns 134286 ns 100 bytes_per_second=3.63614Mi/s items_per_second=953.192k/s
1: BM_SetInsert/8192/128 106201 ns 102341 ns 136 bytes_per_second=4.77111Mi/s items_per_second=1.25072M/s
1: BM_SetInsert/1024/512 324200 ns 323643 ns 44 bytes_per_second=6.03481Mi/s items_per_second=1.58199M/s
1: BM_SetInsert/4096/512 354812 ns 353169 ns 39 bytes_per_second=5.53029Mi/s items_per_second=1.44973M/s
1: BM_SetInsert/8192/512 385214 ns 381479 ns 37 bytes_per_second=5.11988Mi/s items_per_second=1.34215M/s
1: BM_Sequential<std::vector<int>,int>/1 15.8 ns 15.8 ns 892447 bytes_per_second=242.043Mi/s items_per_second=63.4501M/s
1: BM_Sequential<std::vector<int>,int>/8 1203 ns 1198 ns 11567 bytes_per_second=25.4637Mi/s items_per_second=6.67515M/s
1: BM_Sequential<std::vector<int>,int>/64 2502 ns 2502 ns 5544 bytes_per_second=97.576Mi/s items_per_second=25.579M/s
1: BM_Sequential<std::vector<int>,int>/512 6961 ns 6961 ns 2000 bytes_per_second=280.583Mi/s items_per_second=73.553M/s
1: BM_Sequential<std::vector<int>,int>/1024 10736 ns 10737 ns 1298 bytes_per_second=363.819Mi/s items_per_second=95.3729M/s
1: BM_Sequential<std::list<int>>/1 15.1 ns 15.1 ns 907665 bytes_per_second=252.262Mi/s items_per_second=66.1289M/s
1: BM_Sequential<std::list<int>>/8 1361 ns 1361 ns 10113 bytes_per_second=22.4166Mi/s items_per_second=5.87638M/s
1: BM_Sequential<std::list<int>>/64 16198 ns 16198 ns 851 bytes_per_second=15.0721Mi/s items_per_second=3.95107M/s
1: BM_Sequential<std::list<int>>/512 134870 ns 134641 ns 102 bytes_per_second=14.5062Mi/s items_per_second=3.80271M/s
1: BM_Sequential<std::list<int>>/1024 270160 ns 270165 ns 51 bytes_per_second=14.4588Mi/s items_per_second=3.79028M/s
1: BM_Sequential<std::vector<int>, int>/512 6979 ns 6979 ns 2015 bytes_per_second=279.872Mi/s items_per_second=73.3667M/s
1: BM_StringCompare/1 26.0 ns 26.0 ns 534243
1: BM_StringCompare/8 30.9 ns 30.9 ns 449312
1: BM_StringCompare/64 54.1 ns 54.0 ns 260019
1: BM_StringCompare/512 132 ns 132 ns 106356
1: BM_StringCompare/4096 675 ns 675 ns 20791
1: BM_StringCompare/32768 10043 ns 10043 ns 1396
1: BM_StringCompare/262144 110957 ns 110943 ns 112
1: BM_StringCompare/1048576 557859 ns 557843 ns 25
1: BM_SetupTeardown/threads:2 1092 ns 1090 ns 10942
1: BM_LongTest/65536 1073740 ns 1073773 ns 13
1: BM_LongTest/262144 4276777 ns 4276822 ns 3
1: BM_LongTest/2097152 34245975 ns 34244507 ns 1
1: BM_LongTest/16777216 275434048 ns 275372861 ns 1
1: BM_LongTest/134217728 2630417458 ns 2586242143 ns 1
1: BM_LongTest/268435456 5539464480 ns 5225060798 ns 1
1: BM_ParallelMemset/1048576/threads:1 2193051 ns 2192765 ns 6
1: BM_ParallelMemset/1048576/threads:2 1097364 ns 1097254 ns 12
1: BM_ParallelMemset/1048576/threads:4 549098 ns 549086 ns 24
1: BM_ManualTiming/1/real_time 90250 ns 30143 ns 154 items_per_second=11.0803k/s
1: BM_ManualTiming/8/real_time 97199 ns 29628 ns 144 items_per_second=82.3056k/s
1: BM_ManualTiming/64/real_time 153697 ns 29726 ns 91 items_per_second=416.404k/s
1: BM_ManualTiming/512/real_time 603577 ns 31587 ns 23 items_per_second=848.276k/s
1: BM_ManualTiming/4096/real_time 4196344 ns 38117 ns 3 items_per_second=976.088k/s
1: BM_ManualTiming/16384/real_time 16494556 ns 48356 ns 1 items_per_second=993.297k/s
1: BM_ManualTiming/1/manual_time 87257 ns 29966 ns 163 items_per_second=11.4605k/s
1: BM_ManualTiming/8/manual_time 93921 ns 29618 ns 148 items_per_second=85.1779k/s
1: BM_ManualTiming/64/manual_time 149465 ns 29562 ns 95 items_per_second=428.195k/s
1: BM_ManualTiming/512/manual_time 600334 ns 31126 ns 23 items_per_second=852.858k/s
1: BM_ManualTiming/4096/manual_time 4188661 ns 36994 ns 3 items_per_second=977.878k/s
1: BM_ManualTiming/16384/manual_time 16485895 ns 53997 ns 1 items_per_second=993.819k/s
1: BM_with_args/int_test 0.000 ns 0.000 ns 1000000000000
1: BM_with_args/string_and_pair_test 0.000 ns 0.000 ns 1000000000000
1: BM_non_template_args/basic_test 7.48 ns 7.48 ns 1867947
1: BM_template2_capture<void,char*>/foo 0.000 ns 0.000 ns 1000000000000
1: (BM_template2_capture<void, char*>)/foo 0.000 ns 0.000 ns 1000000000000
1: BM_template1_capture<void>/foo 0.000 ns 0.000 ns 1000000000000
1: BM_template1_capture<void>/foo 0.000 ns 0.000 ns 1000000000000
1: BM_DenseThreadRanges/1/threads:1 7.51 ns 7.50 ns 1868828
1: BM_DenseThreadRanges/1/threads:2 7.53 ns 7.52 ns 1857232
1: BM_DenseThreadRanges/1/threads:3 7.56 ns 7.56 ns 1826112
1: BM_DenseThreadRanges/2/threads:1 7.57 ns 7.53 ns 1860386
1: BM_DenseThreadRanges/2/threads:3 7.53 ns 7.53 ns 1860381
1: BM_DenseThreadRanges/2/threads:4 7.57 ns 7.57 ns 1835544
1: BM_DenseThreadRanges/3/threads:5 7.59 ns 7.59 ns 1838820
1: BM_DenseThreadRanges/3/threads:8 7.65 ns 7.65 ns 1824576
1: BM_DenseThreadRanges/3/threads:11 7.35 ns 7.35 ns 1783067
1: BM_DenseThreadRanges/3/threads:14 9.17 ns 7.55 ns 1665608
1: BM_BenchmarkName 0.000 ns 0.000 ns 1000000000000
1: BM_templated_test_double 39.1 ns 39.1 ns 350009
1/77 Test #1: benchmark .................................. Passed 11.58 sec
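The round-number observation can be made more concrete with a little
arithmetic. Multiplying the reported CPU time per iteration by the iteration
count gives the total CPU time measured for a benchmark. The sketch below does
this for a few single-threaded rows copied from both logs and expresses each
total in units of 10 ms. The 10 ms tick is my guess, nothing in the output
states it.

```python
TICK_NS = 10_000_000  # assumed 10 ms tick; a guess, not reported in the logs

# (benchmark, CPU ns per iteration, iterations), copied from the logs above.
HURD_ROWS = [
    ("BM_CalculatePiRange/1", 100, 100000),
    ("BM_SetInsert/4096/128", 102041, 196),
    ("BM_SetInsert/8192/512", 428571, 140),
    ("BM_Sequential<std::vector<int>,int>/8", 714, 14000),
    ("BM_StringCompare/8", 35.7, 1400000),
    ("BM_LongTest/65536", 1428571, 14),
]
LINUX_ROWS = [
    ("BM_CalculatePiRange/1", 91.2, 125352),
    ("BM_SetInsert/4096/128", 134286, 100),
    ("BM_SetInsert/8192/512", 381479, 37),
    ("BM_Sequential<std::vector<int>,int>/8", 1198, 11567),
    ("BM_StringCompare/8", 30.9, 449312),
    ("BM_LongTest/65536", 1073773, 13),
]


def ticks_and_error(cpu_ns, iterations, tick_ns=TICK_NS):
    """Total CPU time in ticks, and its distance from the nearest whole tick."""
    ticks = cpu_ns * iterations / tick_ns
    return ticks, abs(ticks - round(ticks))


if __name__ == "__main__":
    for label, rows in (("hurd", HURD_ROWS), ("linux", LINUX_ROWS)):
        for name, cpu_ns, iterations in rows:
            ticks, err = ticks_and_error(cpu_ns, iterations)
            print(f"{label:5} {name:40} {ticks:8.4f} ticks (off by {err:.4f})")
```

For the Hurd rows the totals land within a fraction of a percent of a whole
number of 10 ms ticks, while the Linux totals do not, which would fit a CPU-time
clock that only advances in 10 ms steps on the Hurd.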