
Re: The Internet locks up Buster



On Thu, 7 Jun 2018 10:52:01 +0300 Reco said:

[--8<--]

> I.e. the 12309 bug is back. It's an obscure and presumably fixed (at
> least four times) bug that happens with a relatively slow filesystem
> (be it SSD/HDD/NFS or whatever) and a large amount of free RAM. I first
> encountered the thing back in the 2.6.18 days, when it was presumably
> implemented (as in - nobody complained before ;).
> 
> The idea behind that bug is simple - first, the kernel accumulates a
> certain amount of 'dirty' (i.e. changed) filesystem blocks. Since the
> amount of free RAM is large, the amount of such blocks is huge too.
> Next, the kernel realizes that it's time for a 'barrier write' -
> everything that happened before the barrier must be written to
> persistent storage. And since it's 'barrier write time', everyone in
> userspace is blocked from making new changes to the existing
> filesystems, i.e. everyone is blocked on I/O.
> Since the amount of dirty blocks is huge, and the filesystem is slow -
> the kernel takes its time writing the dirty blocks. But - it writes them
> slowly, and new I/O requests accumulate faster than the kernel can
> write them out. Hence the lockup.
> 
> > So, as I think you suggested, it seems that OOM-killer isn't getting
> > in quickly enough to kill a program and/or not working correctly.
> > 
> > Suggestions?  
> 
> Limit the size of the dirty block cache. Kernel defaults are insanely
> large. What I'm using here is:
> 
> $ cat /etc/sysctl.d/12309.conf
> vm.dirty_ratio=5
> vm.dirty_background_ratio=5
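
To put those percentages in perspective, here is a rough sketch (mine, not
from the quoted message) that turns the current ratios into approximate
sizes. The function names dirty_limit_kib and show_dirty_limits are made up
for illustration, and applying the ratios to MemAvailable is an assumption:
the kernel uses its own count of dirtyable memory, so treat the numbers as
ballpark figures only. Stock kernels typically ship with vm.dirty_ratio=20
and vm.dirty_background_ratio=10.

```shell
# Ballpark only: the kernel applies the ratios to its own notion of
# "dirtyable" memory; MemAvailable is used here as an approximation.

# dirty_limit_kib MEM_KIB RATIO_PERCENT -> threshold in KiB (integer math)
dirty_limit_kib() {
    echo $(( $1 * $2 / 100 ))
}

# Print the current ratios and the approximate thresholds they imply.
show_dirty_limits() {
    mem_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    [ -n "$mem_kib" ] || return 0   # very old kernels lack MemAvailable
    for knob in dirty_background_ratio dirty_ratio; do
        ratio=$(cat "/proc/sys/vm/$knob")
        echo "vm.$knob=$ratio -> ~$(dirty_limit_kib "$mem_kib" "$ratio") KiB"
    done
}

# Only run where the knobs actually exist (i.e. on Linux).
if [ -r /proc/sys/vm/dirty_ratio ]; then
    show_dirty_limits
fi
```

After dropping the file into /etc/sysctl.d/, "sysctl --system" (as root)
loads it without a reboot.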

I added the line below to /etc/crontab for a different reason (better FS
resilience), but it might also help to work around this bug.

* * * * * root /bin/sync
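
That entry runs sync once a minute. To check whether it actually keeps the
backlog small, one can watch the Dirty and Writeback counters in
/proc/meminfo. The small helper below is only a sketch; the name
dirty_stats is made up, and it takes an optional file argument purely so
it can be tried against a saved copy of meminfo.

```shell
# Print the Dirty and Writeback lines from a meminfo-style file
# (defaults to the live /proc/meminfo).
dirty_stats() {
    awk '/^(Dirty|Writeback):/ {print $1, $2, $3}' "${1:-/proc/meminfo}"
}
```

Running "watch -n 5 cat /proc/meminfo" during a large copy shows the same
counters live; if Dirty keeps climbing into the gigabytes between syncs,
the ratios above are probably the more effective fix.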

Regards
-- 
Abdullah Ramazanoğlu


