
rumpdisk device timeouts



I've been using the 32-bit Hurd for my stress-ng testing, so as to isolate it from developments in the 64-bit version. My virtual machines use 'QEMU HARDDISK' drives, and I have repeatedly hit read and write timeouts when using rumpdisk, but have seen no evidence of similar failures when using the Linux block driver without rumpdisk.

For example:

[ 2011.9300050] wd0d: device timeout reading fsbn 10184704 of 10184704-10184711 (wd0 bn 10184704; cn 10103 tn 13 sn 61), xfer d20, retry 0
[ 2011.9300050] wd0d: device timeout reading fsbn 449552 of 449552-449559 (wd0 bn 449552; cn 445 tn 15 sn 47), xfer ec0, retry 0
[ 2011.9300050] wd0d: device timeout reading fsbn 440176 of 440176-440183 (wd0 bn 440176; cn 436 tn 10 sn 58), xfer b80, retry 0
[ 2011.9300050] wd0d: device timeout reading fsbn 1502104 of 1502104-1502111 (wd0 bn 1502104; cn 1490 tn 2 sn 58), xfer f90, retry 0
[ 10508.3700050] wd0d: device timeout writing fsbn 176616 of 176616-176623 (wd0 bn 176616; cn 175 tn 3 sn 27), xfer d88, retry 0
[ 10508.3700050] wd0d: device timeout writing fsbn 176624 of 176624-176631 (wd0 bn 176624; cn 175 tn 3 sn 35), xfer f90, retry 0
[ 10518.8700050] wd0d: device timeout writing fsbn 176624 of 176624-176631 (wd0 bn 176624; cn 175 tn 3 sn 35), xfer f90, retry 1

I've also had a number of occasions where the rumpdisk task was seemingly the central figure in a system-wide lockup, with the kernel debugger's user-space stack trace showing, for example:

thread: 32
Continuation mach_msg_continue
>>>>> user space <<<<<
mach_msg_trap 0x822daec(0x81b2360(7b518b4,2,0,18,ae)
__pthread_block 0x81e0369(7c04320,20118840,7b518f8,8120e02,0)
__pthread_cond_timedwait_internal 0x81e0ce4(7c04260,7c04b40,ffffffff,0,81e0e09)
pthread_cond_wait 0x81e0e21(7c04260,7c04b40,7c04b40,812c2bd,0)
rumpuser_cv_wait 0x812cadd(7c04260,7c04b40,7b519e8,811bd11,7c04b40)
0x811bd7c(200aefa0,200aef9c,1000,1,0)
rumpns_physio 0x818a37c(806f110,0,303,0,0)
0x8070512(303,0,7b51c64,10,1)
rumpns_cdev_write 0x80d666b(303,0,7b51c64,10,812c2a9)
rumpns_spec_write 0x8160ccb(7b51bac,8354d74,819c98b,8354d74,0)
0x819b5ba(20092000,7b51c64,10,20048040,0)
0x816d399(2011e0c0,7b51cd8,7b51c64,20048040,0)
rumpns_dofilewrite 0x8096a44(3,2011e0c0,71fb000,1000,7b51cd8)
0x817d105(20118840,7b51d58,7b51d50,8124fbe,0)
0x812500e(ae,7b51d58,18,7b51d50,0)
0x8117dac(3,71fb000,1000,20b7b000,0)
rumpdisk_device_write 0x804a413(20013f80,b2,12,0,105bd8)
_Xdevice_write 0x804d371(7b51ee0,7b53ef0,819febb,cb87f3c,7b53ef0)
0x804ab9b(7b51ee0,7b53ef0,7b51e94,0,7b53ef0)
0x804e21b(7b51ee0,7b53ef0,0,0,80001712)
0x81b26da(7b55f98,2000,10,900,1d4c0)
0x804e34b(0,8354d74,7b55fe8,819e545,0)
0x819e589(7c04320,cb87f78,0,0,

The above might not be abnormal, since I haven't looked through the code yet, but the task was nevertheless stalled, and the cause was not related to page-in (page wiring does seem to be functioning correctly), as it often is with other tasks during this test case. I have a virtual machine snapshot with rumpdisk in the above state if more information would be helpful.

With an additional improvement to libports interruptions (which I'll mail about separately), I switched off the non-rumpdisk Hurd test case after 20 successful hours, whereas with rumpdisk I can only sometimes achieve 90 minutes at most.

Is a 64-bit rumpdisk virtual machine likely to be any more stable than the 32-bit one?

There's no indication that my host machine has any hardware issues, but I suppose there could be a bug in QEMU's q35/SATA emulation rather than in the i440FX/IDE setup used by the non-rumpdisk guest. I did try a 64-bit Hurd install on an old physical PC, but without any success so far. My second PC has UEFI-only firmware, so I believe that won't get very far.



