
Re: rx2660 + debian




> On 2022/Apr/27, at 00:27, Sergei Trofimovich <slyich@gmail.com> wrote:
> 
> On Tue, 26 Apr 2022 08:18:25 +0000
> Pedro Miguel Justo <pmsjt@texair.net> wrote:
> 
>>> On 2022/Apr/26, at 01:01, Sergei Trofimovich <slyich@gmail.com> wrote:
>>> 
>>> On Tue, 26 Apr 2022 02:43:00 +0000
>>> Pedro Miguel Justo <pmsjt@texair.net> wrote:
>>> 
>>>>> On 2022/Apr/25, at 14:27, Pedro Miguel Justo <pmsjt@texair.net> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 2022/Apr/25, at 14:09, Sergei Trofimovich <slyich@gmail.com> wrote:
>>>>>> 
>>>>>> On Mon, 25 Apr 2022 15:07:58 +0000
>>>>>> Pedro Miguel Justo <pmsjt@texair.net> wrote:
>>>>>> 
>>>>>>>> On 2022/Apr/25, at 01:22, Pedro Miguel Justo <pmsjt@texair.net> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 2022/Apr/25, at 01:14, Frank Scheiner <frank.scheiner@web.de> wrote:
>>>>>>>>> 
>>>>>>>>> Hi guys,
>>>>>>>>> 
>>>>>>>>> On 25.04.22 10:09, John Paul Adrian Glaubitz wrote:      
>>>>>>>>>>> From what I can understand by the information in the bugcheck, this is somewhat related to a violation
>>>>>>>>>>> in parameter copy from user to kernel during some boot-time, crypto, self-test. Does that sound right?
>>>>>>>>>>> If that is the case, how would this be related to FW?      
>>>>>>>>>> 
>>>>>>>>>> I'm not claiming that it must be related to the firmware, I'm just saying that I don't see this problem
>>>>>>>>>> on my RX2660 at all and I have even reinstalled it recently with one of the latest firmware images
>>>>>>>>>> without having to pass any parameter to the command line.      
>>>>>>>>> 
>>>>>>>>> A difference between Adrian's rx2660 and Pedro's rx2660 is the CPU:
>>>>>>>>> Montecito in the former and Montvale in the latter.
>>>>>>>>> 
>>>>>>>>> But there could still be multiple other reasons we haven't yet looked
>>>>>>>>> at in detail:
>>>>>>>>> 
>>>>>>>>> * amount of memory installed
>>>>>>>>> * SMT enabled or not
>>>>>>>>> * number of processor modules installed
>>>>>>>>> 
>>>>>>>>> It might be possible for me to check on my rx2660s (one with Montvale
>>>>>>>>> and one with Montecito(s)) tomorrow. I will then also look at my other
>>>>>>>>> Itanium gear and gather relevant information.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Yes, this sounds more likely to me too.
>>>>>>>> 
>>>>>>>> The crypto self-tests seem to be an innocent bystander here. I tried booting the most recent kernel with the option “cryptomgr.notests” and it went much farther. Alas it still failed with another buffer copy validation for a different caller altogether:
>>>>>>>> 
>>>>>>>> [    3.836466]  [<a000000101353690>] usercopy_abort+0x120/0x130
>>>>>>>> [    3.836466]                                 sp=e0000001000cfdf0 bsp=e0000001000c9388
>>>>>>>> [    3.836466]  [<a0000001004c5660>] __check_object_size+0x3c0/0x420
>>>>>>>> [    3.836466]                                 sp=e0000001000cfe00 bsp=e0000001000c9350
>>>>>>>> [    3.836466]  [<a000000100570030>] sys_getcwd+0x250/0x420
>>>>>>>> [    3.836466]                                 sp=e0000001000cfe00 bsp=e0000001000c92c8
>>>>>>>> [    3.836466]  [<a00000010000c860>] ia64_ret_from_syscall+0x0/0x20
>>>>>>>> [    3.836466]                                 sp=e0000001000cfe30 bsp=e0000001000c92c8
>>>>>>>> [    3.836466]  [<a000000000040720>] ia64_ivt+0xffffffff00040720/0x400
>>>>>>>> [    3.836466]                                 sp=e0000001000d0000 bsp=e0000001000c92c8
>>>>>>>> 
>>>>>>>> This suggests the bug might be in the logic validating these buffers against the allocations (heap, span, etc).
>>>>>>>> 
>>>>>>>> I don’t know why hardened_usercopy=off is not being observed by the kernel. As a work-around I am compiling myself a new kernel with CONFIG_HARDENED_USERCOPY disabled at the source.     
>>>> 
>>>> So, I finished compiling my kernel with CONFIG_HARDENED_USERCOPY disabled. Guess what:
>>>> 
>>>> pmsjt@debian:~$ uname -a
>>>> Linux debian 5.17.3-rt17 #2 SMP Mon Apr 25 16:55:00 PDT 2022 ia64 GNU/Linux
>>>> 
>>>> Yup, the system starts just fine with the most recent kernel. So, two things we can infer from this:
>>>> - Yes, usercopy validation appears to be broken. The contours of how broken it is are yet unknown but we’ll have to investigate to see what part of the validation is failing.  
>>> 
>>> My vague memory tells me that the breakage mechanics are the following:
>>> 
>>> 1. [ok] mm/usercopy.c:check_kernel_text_object() gets called on various kernel addresses
>>> 2. [ok] ia64 address space has at least two widely used kernel addresses:
>>>       - 0xe0000000_00000000-0xe000..ff-ffffffff - proper linear mapping, 0xe0000000_00000000 + phys
>>>                                                   (so called, RGN_KERNEL = 7)
>>>       - 0xa0000001_00000000-0xa0000001_00ffffff - kernel ~linear mapping, 0xa0000001_00000000 + phys
>>>                                                   (so called, RGN_GATE = 5)
>>> 
>>> Both are easy to translate to proper linear address, but the translations
>>> are not identical. Note that RGN_GATE has an offset.
>>> 
>>> ia64's stack for some reason lives in RGN_KERNEL space (probably for
>>> historical reasons, from before the kernel switched to vmapped stacks on other arches).
>>> You can see these addresses in backtraces as
>>>   sp=e0000001000cfdf0 bsp=e0000001000c9388
>>> 
>>> ia64's kernel code on the other hand lives in RGN_GATE.
>>> You can see these addresses in backtraces as
>>>   [<a000000101353690>] usercopy_abort+0x120/0x130
>>> 
>>> Now, check_kernel_text_object() compares passed kernel address for 2 ranges:
>>> - RGN_GATE:
>>>   unsigned long textlow = (unsigned long)_stext;
>>>   unsigned long texthigh = (unsigned long)_etext;
>>>   if (overlaps(ptr, n, textlow, texthigh))
>>>       usercopy_abort("kernel text", NULL, to_user, ptr - textlow, n);
>>> 
>>>   I think this check is correct. There is no address translation.
>>> 
>>> - RGN_KERNEL:
>>>   textlow_linear = (unsigned long)lm_alias(textlow);
>>>   texthigh_linear = (unsigned long)lm_alias(texthigh);
>>>   if (overlaps(ptr, n, textlow_linear, texthigh_linear))
>>>       usercopy_abort("linear kernel text", NULL, to_user,...
>>> 
>>>   I think this one is wrong. Looking at the definition of lm_alias(),
>>>   it can be used only on RGN_KERNEL addresses (which looks useless,
>>>   as they are already in the correct form) and on literal symbol names:
>>> 
>>>     include/linux/mm.h:#define lm_alias(x)  __va(__pa_symbol(x))
>>>     include/linux/mm.h:#define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
>>>     arch/ia64/include/asm/page.h:#define __pa(x)            ({ia64_va _v; _v.l = (long) (x); _v.f.reg = 0; _v.l;})
>>>     #define __va(x)            ({ia64_va _v; _v.l = (long) (x); _v.f.reg = -1; _v.p;})
>>> 
>>> I think that means __pa_symbol(x) against local variables like textlow
>>> is completely broken. 
>>> 
>>> If lm_alias() works at all on ia64, maybe that is enough to unbreak usercopy:
>>> 
>>> --- a/mm/usercopy.c
>>> +++ b/mm/usercopy.c
>>> @@ -133,13 +133,13 @@ static inline void check_kernel_text_object(const unsigned long ptr,
>>>        * __pa() is not just the reverse of __va(). This can be detected
>>>        * and checked:
>>>        */
>>> -       textlow_linear = (unsigned long)lm_alias(textlow);
>>> +       textlow_linear = (unsigned long)lm_alias(_stext);
>>>       /* No different mapping: we're done. */
>>>       if (textlow_linear == textlow)
>>>               return;
>>> 
>>>       /* Check the secondary mapping... */
>>> -       texthigh_linear = (unsigned long)lm_alias(texthigh);
>>> +       texthigh_linear = (unsigned long)lm_alias(_etext);
>>>       if (overlaps(ptr, n, textlow_linear, texthigh_linear))
>>>               usercopy_abort("linear kernel text", NULL, to_user,
>>>                              ptr - textlow_linear, n);
>>> 
>>> If not here is an ia64-specific workaround:
>>> 
>>> --- a/mm/usercopy.c
>>> +++ b/mm/usercopy.c
>>> @@ -133,13 +133,13 @@ static inline void check_kernel_text_object(const unsigned long ptr,
>>>        * __pa() is not just the reverse of __va(). This can be detected
>>>        * and checked:
>>>        */
>>> -       textlow_linear = (unsigned long)lm_alias(textlow);
>>> +       textlow_linear = (unsigned long)ia64_imva(textlow);
>>>       /* No different mapping: we're done. */
>>>       if (textlow_linear == textlow)
>>>               return;
>>> 
>>>       /* Check the secondary mapping... */
>>> -       texthigh_linear = (unsigned long)lm_alias(texthigh);
>>> +       texthigh_linear = (unsigned long)ia64_imva(texthigh);
>>>       if (overlaps(ptr, n, textlow_linear, texthigh_linear))
>>>               usercopy_abort("linear kernel text", NULL, to_user,
>>>                              ptr - textlow_linear, n);
>>> 
>>> CAVEAT: I did not compile- or run- test it. Might need a bit of pointer/long casting around.
>>> 
>> 
>> I don’t mind at all giving this a try. Sounds quite plausible.
>> 
>> Do you think the stacks (and RSE) will eventually have to move from RGN_KERNEL to vmapped the same as other archs to avoid issues like this and other to come?
> 
> I'd say the move to virtual mapping area would not change things much
> even for this specific bug. It would move more addresses into a place
> where __pa() macro does not work and thus requires proper resolution
> to get linear address.
> 

I’ll take your word for it. I was just wondering if this was happening as a result of the lack of consistency, with it introducing unexpected conditions. If it doesn’t help, forget I mentioned it.

> Moving to vmap on ia64 does not have a benefit of guarding against stack
> overflows that other targets have: backing store clashes into memory stack
> first on overflows somewhere in the middle of the stack area. On the other
> hand vmap does help in getting rid of higher order allocations for stacks.
> 

Oh my. You are right. It was right in front of my nose but I never realized it: on Linux, SP and BSTORE start at opposite edges of the stack and march on towards each other, colliding in an overflow condition. And there is no guard page in the middle? Or is there? I always assumed Linux used the same design as Windows, where SP and BSTORE start at the same address - in the middle of the stack memory block - and grow apart, towards two stack guard pages, one at each end of the stack memory block.

> Implementing vmapped stacks is probably a bit tricky: ia64 is an architecture
> where TLB cache registers (itc*/dtc*) are maintained in software (with some
> hardware assistance from VHPT walker). To make things simpler ia64 sets static
> TLB registers (itr*/dtr*) to kernel as is: code around IA64_TR_KERNEL and
> IA64_TR_CURRENT_STACK has a few assumptions around the stack layout.
> 

Maybe that’s true. I am not sure why direct software management of the static and cached TLB entries changes the design much. Yes, there is an advantage in using fewer slots by piling more into the static entries that are there for the kernel anyway.

> I'm not sure if TLB fault handlers would just work for vmapped stack. Maybe?
> 

Are vmapped stacks not represented by laid-out page-table entries that can be fetched either by the VHPT walker or, if it can’t be bothered (or has been disabled), collected by the software vectors? I don’t know - I am honestly asking. On Itanium there are so many different possibilities to serve a TLB fault that one has to ask :).

> -- 
> 
>  Sergei

