
rsync seems to be broken on sparc64



I posted about this on the kernel lists a few months ago, to no avail. I see it on Gentoo as well with any kernel newer than 3.18. I came across it when using LXC on sparc64: the Debian template uses rsync to move the cache's rootfs to the actual container directory.

I've since modified the template to use "cp -a" instead of rsync, which works. However, this could be an issue for the many people who use rsync as a backup solution. It really needs to be addressed if we want sparc64 to be a release platform.

Here's the gist of it from back then...

On Sun, Feb 21, 2016 at 01:52:55PM -0500, Alex McWhirter wrote:
On 02/14/2016 07:02 PM, Alex McWhirter wrote:
> I'm having a strange issue where using any 4.X kernel causes rsync to
> appear to die on a select syscall. Not sure why, maybe it's getting a
> wrong file descriptor or something. Unfortunately this starts pushing
> outside of my knowledge level of Linux, so bear with me. This is on a Sun
> V215, but I have also tested it on a Sun Blade 150 and a Sun Ultra 45
> with the same results. These are all sun4u boxes of course; I haven't
> tried any sun4v boxes. I'll try to spin up a T5120 this week and find
> out if it's also an issue on sun4v.
>
> Here's what I've tested.
>
> 3.14.58  "gentoo"   - Works
> 3.18.26  "vanilla"  - Works
> 4.1.15   "gentoo"   - Dead
> 4.1.17   "vanilla"  - Dead
> 4.4.1    "vanilla"  - Dead
>
> I don't mind hacking away at kernel sources if anyone can point me in
> the right direction. It's also worth noting that this only happens when
> the folder I am attempting to rsync contains a large number of
> sub-folders and files, the Gentoo portage tree in particular.
>
> Attached is the strace output of a failing rsync job.
>
>
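
Since the quoted report says the failure only shows up on trees with a large number of sub-folders and files, a synthetic tree may be handy for anyone trying to reproduce this without a portage checkout. The following is only a rough sketch: the function name make_tree, the directory and file counts, and the file sizes are all assumptions of mine, not values known to trigger the problem, so they will likely need to be scaled up.

```python
import os
import tempfile


def make_tree(root, n_dirs=50, files_per_dir=20, payload=b"x" * 128):
    """Create n_dirs sub-directories under root, each with files_per_dir small files.

    Returns the number of files written. All sizes are illustrative guesses.
    """
    count = 0
    for d in range(n_dirs):
        subdir = os.path.join(root, f"dir{d:04d}")
        os.makedirs(subdir, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(subdir, f"file{f:04d}"), "wb") as fh:
                fh.write(payload)
            count += 1
    return count


if __name__ == "__main__":
    # A fresh temporary directory keeps the test data away from real files.
    root = tempfile.mkdtemp(prefix="rsync-repro-")
    total = make_tree(root)
    print(f"created {total} files under {root}")
```

The resulting directory can then be used as the source of an "rsync -a" run on an affected kernel, raising n_dirs and files_per_dir until the behaviour changes.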

I've traced this down a bit further.

Kernel 3.18.26 is working but 3.19.0 is not. Git bisect traced it down
to this commit.

e5a4b0bb803b39a36478451eae53a880d2663d5b is the first bad commit
commit e5a4b0bb803b39a36478451eae53a880d2663d5b

Here is the gist of that commit:

https://lkml.org/lkml/2014/12/5/25

Here is the output of rsync when the error occurs:

root@Magi-01:~# rsync -a /export/test/* /export/test2
rsync: [sender] write error: Broken pipe (32)
rsync error: error in socket IO (code 10) at io.c(820) [sender=3.1.1]
root@Magi-01:~#

Here is the output of rsync when executed via gdb:

root@Magi-01:~# gdb /usr/bin/rsync
GNU gdb (Debian 7.11.1-2) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
/root/.gdbinit:1: Error in sourced command file:
No executable file specified.
Use the "file" or "exec-file" command.
Reading symbols from /usr/bin/rsync...(no debugging symbols found)...done.
(gdb) set args -a /export/test/* /export/test2
(gdb) run
Starting program: /usr/bin/rsync -a /export/test/* /export/test2

Program received signal SIGPIPE, Broken pipe.
0xfffff80100528fb4 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:84
84	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb)
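
For anyone less familiar with this error: both the "Broken pipe (32)" message and the SIGPIPE seen in gdb are what a process gets when it writes to a pipe or socket whose other end has already been closed, which would be consistent with the sending rsync process losing its peer. The sketch below only illustrates that generic condition with a plain pipe; it does not reproduce the kernel problem, and the function name write_to_closed_pipe is made up for the example. Python ignores SIGPIPE by default, so the failed write surfaces as an exception carrying EPIPE rather than killing the process.

```python
import errno
import os


def write_to_closed_pipe():
    """Write into a pipe whose read end is already closed; return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the reading side going away
    try:
        os.write(write_fd, b"data")
    except BrokenPipeError as exc:
        # The kernel rejects the write with EPIPE because nobody can read it.
        return exc.errno
    finally:
        os.close(write_fd)
    return None


if __name__ == "__main__":
    err = write_to_closed_pipe()
    print(err == errno.EPIPE, os.strerror(err))
```

So the interesting question is probably not the failed write itself but why the other end goes away first on the affected kernels.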


Adrian, I believe you chimed in on this earlier. Do you have any ideas? I am up to date with the latest packages in the debian-ports repository.

