
Re: The Internet locks up Buster



	Hi.

On Thu, Jun 07, 2018 at 11:05:27AM +0300, Abdullah Ramazanoğlu wrote:
> On Thu, 7 Jun 2018 10:52:01 +0300 Reco said:
> 
> [--8<--]
> 
> > I.e. the 12309 bug is back. It's an obscure and supposedly fixed (at
> > least four times) bug that happens with a relatively slow filesystem
> > (be it SSD/HDD/NFS or whatever) and a large amount of free RAM. I first
> > encountered the thing back in the 2.6.18 days, where it was presumably
> > implemented (as in - nobody complained before ;).
> > 
> > The idea behind that bug is simple - first, the kernel accumulates a
> > certain amount of 'dirty' (i.e. changed) filesystem blocks. Since the
> > amount of free RAM is large, the amount of such blocks is huge too.
> > Next, the kernel realizes that it's time for a 'barrier write' -
> > everything that happened before the barrier must be written onto
> > persistent storage. And since it's 'barrier write time', everyone in
> > userspace is blocked from making new changes to the existing
> > filesystems, i.e. everyone is blocked on I/O.
> > Since the amount of dirty blocks is huge, and the filesystem is slow,
> > the kernel takes its time writing the dirty blocks. But it writes them
> > slowly, and new I/O requests accumulate faster than the kernel can
> > write them out. Hence the lockup.
> > 
> > > So, as I think you suggested, it seems that OOM-killer isn't getting
> > > in quickly enough to kill a program and/or not working correctly.
> > > 
> > > Suggestions?  
> > 
> > Limit the size of the dirty block cache. Kernel defaults are insanely large.
> > What I'm using here is:
> > 
> > $ cat /etc/sysctl.d/12309.conf
> > vm.dirty_ratio=5
> > vm.dirty_background_ratio=5
> 
> I have added the line below to /etc/crontab for different reasons (better FS
> resilience), but it might help to circumvent this bug too.
> 
> * * * * * root /bin/sync

Here our approaches differ. I'm a strong believer in the 'kernel does not
need userspace kludges' principle.
Yours is what Red Hat is using these days - 'we wrote a userspace tool
for that'.

To each his own, I guess.
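
For anyone who wants to see what the two ratios quoted above amount to
on their own machine, here is a rough sketch in Python. It is an
illustration only, not something I run here: the function names are made
up, and it uses MemAvailable from /proc/meminfo as a stand-in for the
memory the kernel actually applies the ratios to (an assumption, the
kernel's own accounting differs somewhat). If vm.dirty_bytes or
vm.dirty_background_bytes is set, the matching ratio reads as 0 and the
estimate is meaningless.

```python
#!/usr/bin/env python3
"""Rough estimate of the dirty-page thresholds implied by the vm.dirty_* ratios."""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Most fields are in kB; the HugePages_* counters are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_thresholds_kb(meminfo, dirty_ratio, background_ratio):
    """Return (background_kb, hard_kb) as an approximation.

    Assumption: MemAvailable (falling back to MemTotal) stands in for the
    memory the kernel really applies the ratios to.
    """
    base = meminfo.get("MemAvailable", meminfo.get("MemTotal", 0))
    return base * background_ratio // 100, base * dirty_ratio // 100


def read_ratio(name):
    """Read an integer sysctl from /proc/sys/vm/ (e.g. 'dirty_ratio')."""
    with open("/proc/sys/vm/" + name) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    background_kb, hard_kb = dirty_thresholds_kb(
        info, read_ratio("dirty_ratio"), read_ratio("dirty_background_ratio")
    )
    print("currently dirty:     ", info.get("Dirty", 0), "kB")
    print("under writeback:     ", info.get("Writeback", 0), "kB")
    print("background threshold: ~%d kB" % background_kb)
    print("blocking threshold:   ~%d kB" % hard_kb)
```

Run it before and after changing the sysctl values to compare the
estimates. Watching the Dirty and Writeback numbers during a large copy
shows how much unwritten data piles up.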

Reco

