[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#591362: linux-image-2.6.26-2-xen-686: domU hang and are unresponsive (was #534880)



On Mon, 2010-08-09 at 01:18 +0100, Ben Hutchings wrote:
> On Wed, 2010-08-04 at 15:05 +0200, Zdenek Salvet wrote:
> > On Wed, Aug 04, 2010 at 03:23:52AM +0100, Ben Hutchings wrote:
> > > > I found root cause of the problem; after I added following fix to lenny 
> > > > xen kernel, none of 56 domU froze again in one week of testing:
> > > [...]
> > > 
> > > That sounds promising.  Is this patch based on a change made by the
> > > upstream Xen or Linux developers?
> > 
> > No, it is my own fix and I have not reported it anywhere else yet.
> 
> OK.  It is clearly not applicable to the pvops version of Linux-for-Xen
> so it probably doesn't make sense to send upstream.
> 
> > The deadlock it fixes is very similar to that fixed by
> > bugfix/all/printk-robustify-printk.patch .
> 
> So you reckon xtime_lock is lower in the lock hierarchy than run-queue
> locks?  I can't see any place where xtime_lock is obtained after a
> run-queue lock, but this change nevertheless looks reasonable and safe.
> 
> Ian, any comment on this?

It seems sane to me, particularly based on the comparison with b845b51
(AKA bugfix/all/printk-robustify-printk.patch) which is concerned with
the same deadlock (albeit a different source though).

Jan, the Novell kernels might want this, it still seems to be present in
the 2.6.32 based tree. (the complete discussion can be seen at
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=591362)

It's not entirely clear to me why the Xen timer interrupt needs to call
clock_was_set anyway -- the native timer interrupts certainly do not (it
maybe that this behaviour is a hangover from an ancient kernel when
time.c was forked into time-xen.c). I didn't look too deeply since I
think the proposed patch is most likely to be the safest option for a
stable branch anyway.

Ian.

> 
> > --- source_amd64_xen/arch/x86/kernel/time_32-xen.c      2010-07-24 07:28:32.162719094 +0200
> > +++ source_amd64_xen.new/arch/x86/kernel/time_32-xen.c  2010-07-24 07:26:32.416076711 +0200
> > @@ -466,6 +466,7 @@
> >  {
> >         s64 delta, delta_cpu, stolen, blocked;
> >         unsigned int i, cpu = smp_processor_id();
> > +       int schedule_clock_was_set_work = 0;
> >         struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
> >         struct vcpu_runstate_info runstate;
> >  
> > @@ -525,12 +526,13 @@
> >  
> >         if (shadow_tv_version != HYPERVISOR_shared_info->wc_version) {
> >                 update_wallclock();
> > -               if (keventd_up())
> > -                       schedule_work(&clock_was_set_work);
> > +               schedule_clock_was_set_work = 1;
> >         }
> >  
> >         write_sequnlock(&xtime_lock);
> >  
> > +       if (schedule_clock_was_set_work && keventd_up())
> > +               schedule_work(&clock_was_set_work);
> >         /*
> >          * Account stolen ticks.
> >          * HACK: Passing NULL to account_steal_time()
> 
> Ben.
> 

-- 
Ian Campbell
Current Noise: Pearl Jam - MFC

Confidence is simply that quiet, assured feeling you have before you
fall flat on your face.
		-- Dr. L. Binder




Reply to: