[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [PATCH ] Fix capability check to allow privileged CLONE_NEWUSER from nested user namespaces



On 1/31/18 9:01 AM, Serge E. Hallyn wrote:
> Quoting Srivatsa S. Bhat (srivatsa@csail.mit.edu):
>> From: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
>>
>> The existing patch which disallows unprivileged CLONE_NEWUSER applies
>> the check for CAP_SYS_ADMIN capability on the 'init_user_ns'
>> namespace, which is not entirely correct. Consider the following sequence:
>>
>> 1. A process with root privileges calls
>>    clone(child_fn, ..., CLONE_NEWUSER, ...) to create a new user namespace.
>>
>> 2. child_fn, now running in the newly created user namespace enjoys the
>>    full set of capabilities in the new user namespace, but has lost
>>    its capabilities in the old user namespace (init_user_ns in this
>>    case).
>>
>> 3. child_fn now calls
>>    clone(child_fn2, ..., CLONE_NEWUSER, ...) to create a new (nested)
>>    user namespace.
>>
>> Step 3 should have succeeded because child_fn has full privileges
> 
> No, it should not.  If the host has unprivileged_userns_clone=false,
> then it should require privilege against the init_user_ns to change
> or bypass that.  Yes the userns was *created* by someone with that
> privilege, but likely they did so precisely so that they could run
> more sandboxed code.

Ah, I see. Thanks a lot for clarifying that!

> 
>> (including CAP_SYS_ADMIN) in its user namespace, but this step fails
>> because the CAP_SYS_ADMIN capability is checked against init_user_ns,
>> as opposed to child_fn's user namespace.
>>
>> So fix this by checking for CAP_SYS_ADMIN using ns_capable() on the
>> current task's user namespace.
>>
>> This also helps the userns07 testcase from LTP
>> (testcases/kernel/containers/userns/userns07.c) to pass when running
>> with root privileges.
> 
> If you want to run that test (which checks for >=32 level nesting depth)
> with privilege, surely you can just set unprivileged_userns_clone=true?
> 

Indeed! (I referred to LTP above just to give a concrete example of a
scenario that had brought my attention to your patch. But given your
explanation above, I see that your patch addresses a slightly different
goal/scenario from this one).

Thank you!

Regards,
Srivatsa

>> Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
>> ---
>> Applies on debian kernel's master branch.
>>
>>  ...low-unprivileged-CLONE_NEWUSER-by-default.patch |   11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
>> index 55edbc7..fc978ea 100644
>> --- a/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
>> +++ b/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch
>> @@ -12,6 +12,9 @@ issues are found, we have a fail-safe.
>>  
>>  Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
>>  [bwh: Remove unneeded binary sysctl bits]
>> +[Srivatsa: Fix capability checks when running nested user namespaces
>> +by using ns_capable() on the current task's user namespace.]
>> +Signed-off-by: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
>>  ---
>>  --- a/kernel/fork.c
>>  +++ b/kernel/fork.c
>> @@ -27,24 +30,24 @@ Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
>>   
>>   /*
>>    * Minimum number of threads to boot the kernel
>> -@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_stru
>> +@@ -1550,6 +1555,10 @@ static __latent_entropy struct task_struct *copy_process(
>>   	if ((clone_flags & (CLONE_NEWUSER|CLONE_FS)) == (CLONE_NEWUSER|CLONE_FS))
>>   		return ERR_PTR(-EINVAL);
>>   
>>  +	if ((clone_flags & CLONE_NEWUSER) && !unprivileged_userns_clone)
>> -+		if (!capable(CAP_SYS_ADMIN))
>> ++		if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>>  +			return ERR_PTR(-EPERM);
>>  +
>>   	/*
>>   	 * Thread groups must share signals as well, and detached threads
>>   	 * can only be started up within the thread group.
>> -@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long,
>> +@@ -2343,6 +2352,12 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
>>   	if (unshare_flags & CLONE_NEWNS)
>>   		unshare_flags |= CLONE_FS;
>>   
>>  +	if ((unshare_flags & CLONE_NEWUSER) && !unprivileged_userns_clone) {
>>  +		err = -EPERM;
>> -+		if (!capable(CAP_SYS_ADMIN))
>> ++		if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>>  +			goto bad_unshare_out;
>>  +	}
>>  +


Reply to: