Bug#703356: megasas: Failed to alloc kernel SGL buffer for IOCTL (ref.#688198)
Jean-Francois Chevrette <jf.cron0@gmail.com> writes:
> Package: src:linux
> Version: 3.2.39-2
> Severity: important
>
> (first time submitting a bug report, sorry if I missed anything)
>
> We are still affected by bug #688198
Yes, I see that it was closed after applying a related bugfix.  But as I
noted in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=688198#25 the
reported bug would not be fixed by this after all.  The fixed bug was
real, but unrelated to the reported one.
> We have other seemingly identical servers (hardware & software) and not
> all of them have this problem.
>
> Is there anything else I can provide to help?
The message indicates a memory allocation problem related to sending
management commands from userspace to the driver/controller.  Management
commands are e.g. requests from smartctl, RAID monitoring tools etc.
All data transferred between these userspace applications and the
controller must be copied to/from dma-coherent buffers for transfer to
the controller, and it is the allocation of these buffers which fails.
This happens either because the requests are so bogus (too many or too
big) that they just cannot be serviced, or because the system is out of
memory in the appropriate pool.
Maybe we can get some ideas about why this fails if you describe the
conditions you experience the problem under.  I believe the fact that
you only see this on some of the otherwise identical servers is very
interesting. If we could find some pattern here, then that would help.
Is there some special monitoring application running on the failing
servers?  Are there other devices in these servers which may have
drivers eating memory?
I can't, but maybe the Debian kernel gurus can read something out of 
 /proc/slabinfo 
 /proc/buddyinfo
 /proc/pagetypeinfo
Comparing those files on a failing server and a non-failing server would
certainly be interesting.
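
If it helps, here is a rough sketch of how the /proc/buddyinfo part of
that comparison could be done.  It only assumes the usual buddyinfo
layout ("Node N, zone NAME" followed by the free block count per order)
and a 4 KiB page size; the function names and the min_order cutoff are
my own invention, and I do not know whether fragmentation is actually
what makes the allocation fail.

```python
import sys


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node 0, zone DMA 1 1 0 0 2 ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        result[(node, fields[3])] = [int(n) for n in fields[4:]]
    return result


def free_kib_at_or_above(counts, min_order, page_kib=4):
    """Free memory in KiB held in blocks of at least 2**min_order pages.

    page_kib=4 is an assumption (typical for x86).
    """
    return sum(
        count * (1 << order) * page_kib
        for order, count in enumerate(counts)
        if order >= min_order
    )


def compare(text_a, text_b, min_order=4):
    """Return rows of (node, zone, KiB on server A, KiB on server B).

    A value is None when that zone is missing from one of the inputs.
    """
    a, b = parse_buddyinfo(text_a), parse_buddyinfo(text_b)
    rows = []
    for key in sorted(set(a) | set(b)):
        kib_a = free_kib_at_or_above(a[key], min_order) if key in a else None
        kib_b = free_kib_at_or_above(b[key], min_order) if key in b else None
        rows.append((key[0], key[1], kib_a, kib_b))
    return rows


# Usage (hypothetical file names): python3 cmp.py failing.txt working.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for node, zone, kib_a, kib_b in compare(fa.read(), fb.read()):
            print(node, zone, kib_a, kib_b)
```

Save a copy of /proc/buddyinfo from a failing and a non-failing server
and pass both files to the script.  A large difference in the
higher-order free memory of one zone would at least be a hint worth
following up.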
Bjørn