Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

To: RobJE Debian ARM <f2985a762f2ea45f9f1ce1339ad4318b@renf.us>
Cc: debian-arm@lists.debian.org, Timo Jyrinki <timo.jyrinki@gmail.com>, Martin Michlmayr <tbm@cyrius.com>, Ian Campbell <ijc@hellion.org.uk>
Subject: Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
From: Andrew Lunn <andrew@lunn.ch>
Date: Mon, 28 May 2018 18:00:33 +0200
Message-id: <20180528160033.GB27177@lunn.ch>
In-reply-to: <34b43db9-7574-843b-8ecf-674b74367ce9@renf.us>
References: <1500801941.22097.24.camel@hellion.org.uk> <20170729155031.GB19012@lunn.ch> <53be01e4-d17e-c3af-2b8d-0e4d2007cb8b@renf.us> <20180425111628.2uif7xwekc6le5s5@jirafa.cyrius.com> <CAJtFfxnA7_9L1ybrzDYzksXaMnm_+HECMr0KuHXf2B19c4_zpA@mail.gmail.com> <20180524123026.GA24557@lunn.ch> <34b43db9-7574-843b-8ecf-674b74367ce9@renf.us>

On Sun, May 27, 2018 at 01:39:35PM +0200, RobJE Debian ARM wrote:
> On 24-05-18 14:30, Andrew Lunn wrote:
> > On Thu, May 24, 2018 at 12:40:06PM +0300, Timo Jyrinki wrote:
> >> 2018-04-25 14:16 GMT+03:00 Martin Michlmayr <tbm@cyrius.com>:
> >>> Timo Jyrinki is happy to run some tests.  He's affected and has a
> >>> serial console.  The bug is still there in the 4.9 kernel we're
> >>> shipping with Debian.
> >>>
> >>> Andrew, what information or access do you need so this can be tracked
> >>> down?
> >>
> >> Yesterday I tried booting with mem=512M added to the u-boot's setenv
> >> bootargs, and wasn't able to reproduce the problem. Booting again
> >> without the parameter it was there again. I repeated a couple of times
> >> with same results, although sometimes it took some time for the
> >> problem to occur in the normal 1GB RAM use case so I'm not 100% sure
> >> of how bullet proof the workaround is. I tried to use at least some
> >> memory by starting Debian installer fetching, logging into it via ssh
> >> etc.
> >>
> >> Could someone else try it out? Double-check the parameter worked with
> >> 'free'. I'm tempted to make a backup of my current / + flash
> >> partitions and dist-upgrade to stretch. On that note, what would be
> >> the easiest way to set the mem=512M as the default for normal boots?
> >>
> >> Andrew wasn't able to reproduce the problem on his 6282 machine. Would
> >> it be that he has QNAP TS-219P+ or similar that has only 512MB RAM?
> >> (https://www.cyrius.com/debian/kirkwood/qnap/ts-219/specs/)
> > 
> > Hi Timo
> > 
> > root@qnap:~# cat /proc/meminfo 
> > MemTotal:         511516 kB
> > 
> > So let's think about what this could mean...
> > 
> > Is the 1G implemented using two RAM chips? Do you have photos of your
> > board? Can you identify the chips? Does u-boot say anything useful
> > about the RAM?
> > 
> > Could the u-boot you have not be correctly initialising the second RAM
> > chip? Are you using the stock QNAP/marvell u-boot, or have you
> > upgraded u-boot?
> > 
> > Is there a hole in the address range between the two RAMs? The kernel
> > should be able to handle that, but I don't know if you have to tell
> > it, or if it can figure it out itself. Can you see anything about this
> > in the kernel logs, or u-boot?
> > 
> > Do we see the physical address being accessed when we get the abort?
> > Is it in the top 1/2 of the RAM? Could it be a DMA operation which has
> > gone over the border between the end of the first RAM and the
> > beginning of the second RAM? Seems a bit unlikely....
> > 
> >    Andrew
> 
> Timo's remark about memory triggered me.
> 
> I am not convinced it is related to u-boot or memory chips. Specifically
> because kernel lenny 4.3.0-0.bpo.1-kirkwood (4.3.5-1~bpo8+1) does not
> have these issues. For me the issues started after the flavour change
> from kirkwood to marvell.
> 
> I tried running stretch 4.16.0-0.bpo.1-marvell (4.16.5-1~bpo9+1) with
> mem=512M which was stable for more than 24 hours. Comparing dmesg output
> one interesting line was missing in the 512M version:
> 
> 	HighMem zone: 65536 pages, LIFO batch:15
> 
> With mem=768M the kernel also boots with no bug or error reports. 768M is
> the border where (according to dmesg) HighMem starts. With no mem= (i.e.
> using the full 1024M) just booting already prints a lot of error
> messages for me.

Hi Rob

Since my QNAP only has 512M, there is not much experimentation I
can do.

Could you try changing the "Memory split" kernel config option to
"3G/1G user/kernel split (for full 1G low memory)"? You should then
see that the lowmem in the Virtual kernel memory layout table goes
from starting at 0xc0000000 to starting at 0xb0000000. I hope it will
then not use high mem, and still give you the full 1G of RAM.
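
The arithmetic behind that hope can be sketched roughly as below. This
is only a back-of-the-envelope check, not something taken from the
kernel source: it assumes 4 KiB pages, and it assumes that everything
the kernel maps above lowmem keeps the same size when the split moves.
The function name and its parameters are made up for illustration; the
65536 pages figure is the HighMem zone line from Rob's dmesg.

```python
MIB = 1024 * 1024


def lowmem_after_split(ram, highmem_pages,
                       old_offset=0xC0000000, new_offset=0xB0000000,
                       page_size=4096):
    """Estimate lowmem before and after moving the user/kernel split.

    Hypothetical helper. Assumes 4 KiB pages by default and that the
    kernel address space above lowmem does not change in size.
    Returns (lowmem_now, lowmem_new, highmem_new) in bytes.
    """
    highmem_now = highmem_pages * page_size   # size of the HighMem zone
    lowmem_now = ram - highmem_now            # RAM mapped directly today
    extra = old_offset - new_offset           # kernel address space gained
    lowmem_new = min(ram, lowmem_now + extra) # cannot exceed installed RAM
    highmem_new = ram - lowmem_new            # what would still be highmem
    return lowmem_now, lowmem_new, highmem_new


if __name__ == "__main__":
    now, new, high = lowmem_after_split(1024 * MIB, 65536)
    print("lowmem now: %d MiB" % (now // MIB))
    print("lowmem new: %d MiB" % (new // MIB))
    print("highmem new: %d MiB" % (high // MIB))
```

Under those assumptions the current lowmem comes out at 768 MiB, which
matches the border Rob found with mem=768M, and the extra 256 MiB of
kernel address space would be just enough to cover the full 1G.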

    Andrew

Follow-Ups:
- Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  - From: Jonathan Medhurst <tixy@yxit.co.uk>

References:
- Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  - From: Timo Jyrinki <timo.jyrinki@gmail.com>
- Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  - From: Andrew Lunn <andrew@lunn.ch>
- Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC
  - From: RobJE Debian ARM <f2985a762f2ea45f9f1ce1339ad4318b@renf.us>
