Bug#986431: marked as done (unblock: mksh/59c-4)

To: Sebastian Ramacher <sramacher@respighi.debian.org>
Subject: Bug#986431: marked as done (unblock: mksh/59c-4)
From: "Debian Bug Tracking System" <owner@bugs.debian.org>
Date: Wed, 07 Apr 2021 20:33:09 +0000
Message-id: <handler.986431.D986431.161782747523814.ackdone@bugs.debian.org>
Reply-to: 986431@bugs.debian.org
References: <E1lUEpU-0005NN-GX@respighi.debian.org> <161765350199.1243.14882163773313020530.reportbug@tglase-nb.lan.tarent.de>

Your message dated Wed, 07 Apr 2021 20:31:12 +0000
with message-id <E1lUEpU-0005NN-GX@respighi.debian.org>
and subject line unblock mksh
has caused the Debian Bug report #986431,
regarding unblock: mksh/59c-4
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
986431: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=986431
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems

--- Begin Message ---

To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: unblock: mksh/59c-4
From: Thorsten Glaser <tg@mirbsd.de>
Date: Mon, 05 Apr 2021 22:11:41 +0200
Message-id: <161765350199.1243.14882163773313020530.reportbug@tglase-nb.lan.tarent.de>

Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock
X-Debbugs-Cc: tg@mirbsd.de

Please unblock package mksh

[ Reason ]
This change was made under the impression that it would migrate
thanks to autopkgtests… I was taken by surprise that mksh is a
key package. I looked into that and found that debhelper Build-Depends
on shunit2, which in turn Build-Depends on mksh; this is what makes
mksh a key package.

Now, shunit2 only Build-Depends on mksh because it uses mksh at build
time to run its tests, but shunit2’s build only fails if the last
(shell, test) tuple fails, and the last shell is zsh.
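
A minimal sketch of how a build can end up depending only on the last
(shell, test) tuple. This is not shunit2's actual build code; run_one
and the list of shells are hypothetical stand-ins. It only illustrates
the general POSIX sh rule that a loop's exit status is that of the last
command it ran.

```sh
# Hypothetical stand-in for one (shell, test) run: fails only for "mksh".
run_one() {
	[ "$1" != mksh ]
}

# Pitfall: the loop's exit status comes from the last iteration only,
# so the mksh failure in the middle goes unnoticed.
last_only() {
	for sh in dash mksh zsh; do
		run_one "$sh"
	done
}

# Alternative: remember any failure and report it at the end.
any_failure() {
	rc=0
	for sh in dash mksh zsh; do
		run_one "$sh" || rc=1
	done
	return "$rc"
}

last_only && echo "last_only: success despite the mksh failure"
any_failure || echo "any_failure: failure reported"
```

Under that assumption, a regression in mksh would not break such a
build as long as the final tuple still passes.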

The change (aside from updating the documentation, especially
wrt. future compatibility guarantees) brings mksh closer to the
other shells in that it implements a form of POSIX locale tracking
to enable/disable UTF-8 mode (instead of mostly defaulting
to C). Furthermore, a few Emacs input editing mode commands are
extended to be able to operate on bigwords (like in Vi mode),
not only words.
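
To illustrate the word/bigword distinction, here is a rough sketch in
POSIX sh using sed. It follows the definitions in the manual page part
of the debdiff (bigwords are separated by spaces or tabs; words consist
of alphanumerics, underscore or dollar sign). The function names are
made up for this example, and the input is assumed to have no trailing
blanks; the shell's own editing code works on the edit buffer instead.

```sh
# Last bigword: everything after the final space or tab.
last_bigword() {
	printf '%s\n' "$1" | sed 's/.*[[:blank:]]//'
}

# Last word: the trailing run of alphanumerics, underscore or dollar sign.
last_word() {
	printf '%s\n' "$1" | sed 's/.*[^A-Za-z0-9_$]//'
}

line='cp foo.txt /tmp/bar-baz.$ext'
last_bigword "$line"	# prints /tmp/bar-baz.$ext
last_word "$line"	# prints $ext
```

This is the difference between what a word-based and a bigword-based
backward delete would remove at the end of such a line.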

[ Impact ]
Without the unblock, the user who requested the Emacs mode extension
will be sad and have to wait for another release (unless I backport
mksh, which I may or may not do later).
The shell will behave more like the previous releases’ mksh
and less like a future mksh and most other shells wrt. handling
the POSIX locale.
I would consider that unfortunate.

[ Tests ]
mksh has an extensive regression test suite which exercises a good
portion of the code (and the compiler, toolchain and libc/kernel).
The testsuite was updated to match the expectations of locale tracking
as implemented (a minor change) and the changes were reviewed. I’ve run
the shell, even as /bin/sh (supported, but only as a manual step in
Debian), for more than three weeks now, using it extensively, and
found no problems. I’ve verified the interactive mode change as well.
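
A quick manual check along the lines of the updated utf8opt-1 test can
be sketched in POSIX sh as follows. It assumes that the locale decision
depends only on LC_ALL, LC_CTYPE and LANG (in that order of precedence)
naming a UTF-8 codeset; the shell itself may consult setlocale(3)
instead. Both function names are made up for this example.

```sh
# Does the effective LC_CTYPE setting name a UTF-8 codeset?
locale_is_utf8() {
	case ${LC_ALL:-${LC_CTYPE:-${LANG:-}}} in
	*[Uu][Tt][Ff]8*|*[Uu][Tt][Ff]-8*) return 0 ;;
	*) return 1 ;;
	esac
}

# Report whether a flags string (default: the current $-) contains U,
# which is how the test suite detects utf8-mode.
utf8_flag_state() {
	case ${1-$-} in
	*U*) echo "is set" ;;
	*) echo "is not set" ;;
	esac
}

if locale_is_utf8; then
	echo "locale asks for UTF-8 mode"
else
	echo "locale asks for non-UTF-8 mode"
fi
utf8_flag_state
```

Run as a script under mksh 59c-4, the two lines of output should agree
(the utf8opt-1 test now expects the flag to be set even at
non-interactive startup); other shells have no U flag and will always
report it as not set.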

[ Risks ]
As stated above, the risk is minimal, considering what shunit2 does
(and, incidentally, when it was last changed). Few packages depend
on mksh directly, and while it’s installed on 2454
systems, few would install it as /bin/sh, and those people know what
they’re doing. A standard Debian installation will not even have mksh
installed.

[ Checklist ]
  [x] all changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [↓] attach debdiff against the package in testing

I’ve edited the debdiff so that, instead of a diff of a patch, it shows
a diff between the patched files, to make it more readable. (The
packaging uses single-debian-patch because I normally work on a fully
patched tree with git, and with every new upstream release the
Debian-local changes go down to 0.) The last time I did that it was
accepted.

[ Other info ]
In a clean bullseye chroot, the autopkgtests pass.

unblock mksh/59c-4

diff -Nru mksh-59c/debian/changelog mksh-59c/debian/changelog
--- mksh-59c/debian/changelog	2021-02-07 02:57:12.000000000 +0100
+++ mksh-59c/debian/changelog	2021-03-13 19:09:48.000000000 +0100
@@ -1,3 +1,27 @@
+mksh (59c-4) unstable; urgency=low
+
+  * Update to upstream CVS HEAD
+    - [tg] Make "C" the implementation-specified default locale for
+      early-locale-tracking (note full locale tracking will have to
+      use whatever the underlying OS’ is, if no setlocale(3) it’ll
+      be just "C" again) and document possibly removing turning on
+      POSIX mode disabling and presence of a BOM enabling UTF-8 mode
+      with full locale tracking
+    - [tg] Document OPTU-16 (U+EF80‥U+EFFF) mapping for raw octets will not
+      be present once mksh will have switched to full 21-bit UCS / UTF-8
+    - [tg] Add several bigword-based editing commands to Emacs mode
+    - [tg] Improve documentation wrt. $ENV
+  * Bring locale tracking code somewhat closer to what will eventually be
+    in upstream code (once I manage to do the related changes around MirBSD
+    base and its scripts, which needs some more time and tuits):
+    - drop BOM enabling UTF-8 mode code
+    - no longer deactivate UTF-8 mode on entering POSIX mode
+      (rationale: the POSIX locale parameters will be the only deciding
+      factor; even if, nominally, only the POSIX locale is compliant)
+  * Apply locale tracking to nōn-interactive shells as well
+
+ -- Thorsten Glaser <tg@mirbsd.de>  Sat, 13 Mar 2021 19:09:48 +0100
+
 mksh (59c-3) unstable; urgency=medium
 
   * Update to upstream CVS HEAD
diff --git mksh_59c-3/check.t mksh_59c-4/check.t
index e8f96af..7c601fb 100644
--- mksh_59c-3/check.t
+++ mksh_59c-4/check.t
@@ -31,7 +31,7 @@
 # (2013/12/02 20:39:44) http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/regress/bin/ksh/?sortby=date
 
 expected-stdout:
-	KSH R59 2021/02/07
+	KSH R59 2021/03/13
 description:
 	Check base version of full shell
 stdin:
@@ -8890,7 +8890,7 @@ expected-stderr-pattern:
 ---
 name: utf8opt-1
 description:
-	Check that the utf8-mode flag is not set at non-interactive startup
+	Check that the utf8-mode flag *is* set at non-interactive startup
 env-setup: !PS1=!PS2=!LC_CTYPE=@utflocale@!
 stdin:
 	if [[ $- = *U* ]]; then
@@ -8899,7 +8899,7 @@ stdin:
 		echo is not set
 	fi
 expected-stdout:
-	is not set
+	is set
 ---
 name: utf8opt-2
 description:
diff --git mksh_59c-3/edit.c mksh_59c-4/edit.c
index ebccfc6..29c8494 100644
--- mksh_59c-3/edit.c
+++ mksh_59c-4/edit.c
@@ -29,7 +29,7 @@
 
 #ifndef MKSH_NO_CMDLINE_EDITING
 
-__RCSID("$MirOS: src/bin/mksh/edit.c,v 1.360 2021/01/24 18:14:40 tg Exp $");
+__RCSID("$MirOS: src/bin/mksh/edit.c,v 1.362 2021/02/26 11:51:07 tg Exp $");
 
 /*
  * in later versions we might use libtermcap for this, but since external
@@ -983,8 +983,8 @@ static int x_col;		/* current column on line */
 
 static int x_ins(const char *);
 static void x_delete(size_t, bool);
-static size_t x_bword(void);
-static size_t x_fword(bool);
+static void x_bword(uint32_t, bool);
+static void x_fword(uint32_t, bool);
 static void x_goto(char *);
 static char *x_bs0(char *, char *) MKSH_A_PURE;
 static void x_bs3(char **);
@@ -1007,7 +1007,7 @@ static void x_e_putc2(int);
 static void x_e_putc3(const char **);
 static void x_e_puts(const char *);
 #ifndef MKSH_SMALL
-static int x_fold_case(int);
+static int x_fold_case(int, uint32_t);
 #endif
 static char *x_lastcp(void);
 static void x_lastpos(void);
@@ -1043,6 +1043,12 @@ static struct x_defbindings const x_defbindings[] = {
 	{ XFUNC_mv_bword,		1,	'b'	},
 	{ XFUNC_mv_fword,		1,	'f'	},
 	{ XFUNC_del_fword,		1,	'd'	},
+#ifndef MKSH_SMALL
+	{ XFUNC_del_bbigword,		1,	'H'	},
+	{ XFUNC_mv_bbigword,		1,	'B'	},
+	{ XFUNC_mv_fbigword,		1,	'F'	},
+	{ XFUNC_del_fbigword,		1,	'D'	},
+#endif
 	{ XFUNC_mv_back,		0,  CTRL_B	},
 	{ XFUNC_mv_forw,		0,  CTRL_F	},
 	{ XFUNC_search_char_forw,	0,  CTRL_BC	},
@@ -1097,11 +1103,11 @@ static struct x_defbindings const x_defbindings[] = {
 	{ XFUNC_set_arg,		1,	'8'	},
 	{ XFUNC_set_arg,		1,	'9'	},
 #ifndef MKSH_SMALL
-	{ XFUNC_fold_upper,		1,	'U'	},
+	{ XFUNC_foldb_upper,		1,	'U'	},
 	{ XFUNC_fold_upper,		1,	'u'	},
-	{ XFUNC_fold_lower,		1,	'L'	},
+	{ XFUNC_foldb_lower,		1,	'L'	},
 	{ XFUNC_fold_lower,		1,	'l'	},
-	{ XFUNC_fold_capitalise,	1,	'C'	},
+	{ XFUNC_foldb_capitalise,	1,	'C'	},
 	{ XFUNC_fold_capitalise,	1,	'c'	},
 #endif
 	/*
@@ -1525,75 +1531,105 @@ x_delete(size_t nc, bool push)
 static int
 x_del_bword(int c MKSH_A_UNUSED)
 {
-	x_delete(x_bword(), true);
+	x_bword(C_MFS, true);
 	return (KSTD);
 }
 
 static int
 x_mv_bword(int c MKSH_A_UNUSED)
 {
-	x_bword();
+	x_bword(C_MFS, false);
 	return (KSTD);
 }
 
 static int
 x_mv_fword(int c MKSH_A_UNUSED)
 {
-	x_fword(true);
+	x_fword(C_MFS, false);
 	return (KSTD);
 }
 
 static int
 x_del_fword(int c MKSH_A_UNUSED)
 {
-	x_delete(x_fword(false), true);
+	x_fword(C_MFS, true);
 	return (KSTD);
 }
 
-static size_t
-x_bword(void)
+#ifndef MKSH_SMALL
+static int
+x_del_bbigword(int c MKSH_A_UNUSED)
+{
+	x_bword(C_BLANK, true);
+	return (KSTD);
+}
+
+static int
+x_mv_bbigword(int c MKSH_A_UNUSED)
+{
+	x_bword(C_BLANK, false);
+	return (KSTD);
+}
+
+static int
+x_mv_fbigword(int c MKSH_A_UNUSED)
+{
+	x_fword(C_BLANK, false);
+	return (KSTD);
+}
+
+static int
+x_del_fbigword(int c MKSH_A_UNUSED)
+{
+	x_fword(C_BLANK, true);
+	return (KSTD);
+}
+#endif
+
+static void
+x_bword(uint32_t separator, bool erase)
 {
 	size_t nb = 0;
 	char *cp = xcp;
 
 	if (cp == xbuf) {
 		x_e_putc2(KSH_BEL);
-		return (0);
+		return;
 	}
 	while (x_arg--) {
-		while (cp != xbuf && ctype(cp[-1], C_MFS)) {
+		while (cp != xbuf && ctype(cp[-1], separator)) {
 			cp--;
 			nb++;
 		}
-		while (cp != xbuf && !ctype(cp[-1], C_MFS)) {
+		while (cp != xbuf && !ctype(cp[-1], separator)) {
 			cp--;
 			nb++;
 		}
 	}
 	x_goto(cp);
-	return (x_nb2nc(nb));
+	if (erase)
+		x_delete(x_nb2nc(nb), true);
 }
 
-static size_t
-x_fword(bool move)
+static void
+x_fword(uint32_t separator, bool erase)
 {
-	size_t nc;
 	char *cp = xcp;
 
 	if (cp == xep) {
 		x_e_putc2(KSH_BEL);
-		return (0);
+		return;
 	}
 	while (x_arg--) {
-		while (cp != xep && ctype(*cp, C_MFS))
+		while (cp != xep && ctype(*cp, separator))
 			cp++;
-		while (cp != xep && !ctype(*cp, C_MFS))
+		while (cp != xep && !ctype(*cp, separator))
 			cp++;
 	}
-	nc = x_nb2nc(cp - xcp);
-	if (move)
+	if (erase)
+		x_delete(x_nb2nc(cp - xcp), true);
+	else
 		x_goto(cp);
-	return (nc);
 }
 
 static void
@@ -3157,12 +3193,12 @@ x_edit_line(int c MKSH_A_UNUSED)
 
 /*-
  * NAME:
- *	x_prev_histword - recover word from prev command
+ *	x_prev_histword - recover bigword from prev command
  *
  * DESCRIPTION:
- *	This function recovers the last word from the previous
+ *	This function recovers the last bigword from the previous
  *	command and inserts it into the current edit line. If a
- *	numeric arg is supplied then the n'th word from the
+ *	numeric arg is supplied then the n'th bigword from the
  *	start of the previous command is used.
  *	As a side effect, trashes the mark in order to achieve
  *	being called in a repeatable fashion.
@@ -3200,13 +3236,13 @@ x_prev_histword(int c MKSH_A_UNUSED)
 
 		rcp = &cp[strlen(cp) - 1];
 		/*
-		 * ignore white-space after the last word
+		 * ignore whitespace after the last bigword
 		 */
-		while (rcp > cp && ctype(*rcp, C_CFS))
+		while (rcp > cp && ctype(*rcp, C_BLANK))
 			rcp--;
-		while (rcp > cp && !ctype(*rcp, C_CFS))
+		while (rcp > cp && !ctype(*rcp, C_BLANK))
 			rcp--;
-		if (ctype(*rcp, C_CFS))
+		if (ctype(*rcp, C_BLANK))
 			rcp++;
 		x_ins(rcp);
 	} else {
@@ -3215,18 +3251,18 @@ x_prev_histword(int c MKSH_A_UNUSED)
 
 		rcp = cp;
 		/*
-		 * ignore white-space at start of line
+		 * ignore whitespace at start of line
 		 */
-		while (*rcp && ctype(*rcp, C_CFS))
+		while (*rcp && ctype(*rcp, C_BLANK))
 			rcp++;
 		while (x_arg-- > 0) {
-			while (*rcp && !ctype(*rcp, C_CFS))
+			while (*rcp && !ctype(*rcp, C_BLANK))
 				rcp++;
-			while (*rcp && ctype(*rcp, C_CFS))
+			while (*rcp && ctype(*rcp, C_BLANK))
 				rcp++;
 		}
 		cp = rcp;
-		while (*rcp && !ctype(*rcp, C_CFS))
+		while (*rcp && !ctype(*rcp, C_BLANK))
 			rcp++;
 		ch = *rcp;
 		*rcp = '\0';
@@ -3244,21 +3280,42 @@ x_prev_histword(int c MKSH_A_UNUSED)
 static int
 x_fold_upper(int c MKSH_A_UNUSED)
 {
-	return (x_fold_case('U'));
+	return (x_fold_case('U', C_MFS));
 }
 
 /* Lowercase N(1) words */
 static int
 x_fold_lower(int c MKSH_A_UNUSED)
 {
-	return (x_fold_case('L'));
+	return (x_fold_case('L', C_MFS));
 }
 
 /* Titlecase N(1) words */
 static int
 x_fold_capitalise(int c MKSH_A_UNUSED)
 {
-	return (x_fold_case('C'));
+	return (x_fold_case('C', C_MFS));
+}
+
+/* Uppercase N(1) bigwords */
+static int
+x_foldb_upper(int c MKSH_A_UNUSED)
+{
+	return (x_fold_case('U', C_BLANK));
+}
+
+/* Lowercase N(1) bigwords */
+static int
+x_foldb_lower(int c MKSH_A_UNUSED)
+{
+	return (x_fold_case('L', C_BLANK));
+}
+
+/* Titlecase N(1) bigwords */
+static int
+x_foldb_capitalise(int c MKSH_A_UNUSED)
+{
+	return (x_fold_case('C', C_BLANK));
 }
 
 /*-
@@ -3267,13 +3324,13 @@ x_fold_capitalise(int c MKSH_A_UNUSED)
  *
  * DESCRIPTION:
  *	This function is used to implement M-U/M-u, M-L/M-l, M-C/M-c
- *	to UPPER CASE, lower case or Capitalise Words.
+ *	to UPPER CASE, lower case or Capitalise words and bigwords.
  *
  * RETURN VALUE:
  *	None
  */
 static int
-x_fold_case(int c)
+x_fold_case(int c, uint32_t separator)
 {
 	char *cp = xcp;
 
@@ -3285,7 +3342,7 @@ x_fold_case(int c)
 		/*
 		 * first skip over any white-space
 		 */
-		while (cp != xep && ctype(*cp, C_MFS))
+		while (cp != xep && ctype(*cp, separator))
 			cp++;
 		/*
 		 * do the first char on its own since it may be
@@ -3303,7 +3360,7 @@ x_fold_case(int c)
 		/*
 		 * now for the rest of the word
 		 */
-		while (cp != xep && !ctype(*cp, C_MFS)) {
+		while (cp != xep && !ctype(*cp, separator)) {
 			if (c == 'U')
 				/* uppercase */
 				*cp = ksh_toupper(*cp);
diff --git mksh_59c-3/emacsfn.h mksh_59c-4/emacsfn.h
index 6162987..1a9c183 100644
--- mksh_59c-3/emacsfn.h
+++ mksh_59c-4/emacsfn.h
@@ -1,5 +1,5 @@
 /*-
- * Copyright (c) 2009, 2010, 2015, 2016, 2020
+ * Copyright (c) 2009, 2010, 2015, 2016, 2020, 2021
  *	mirabilos <m@mirbsd.org>
  *
  * Provided that these terms and disclaimer and all copyright notices
@@ -19,7 +19,7 @@
  */
 
 #if defined(EMACSFN_DEFNS)
-__RCSID("$MirOS: src/bin/mksh/emacsfn.h,v 1.11 2020/04/13 20:46:39 tg Exp $");
+__RCSID("$MirOS: src/bin/mksh/emacsfn.h,v 1.15 2021/02/26 11:51:08 tg Exp $");
 #define FN(cname,sname,flags)	static int x_##cname(int);
 #elif defined(EMACSFN_ENUMS)
 #define FN(cname,sname,flags)	XFUNC_##cname,
@@ -42,8 +42,14 @@ FN(comp_list, "complete-list", 0)
 FN(complete, "complete", 0)
 FN(del_back, "delete-char-backward", XF_ARG)
 FN(del_bword, "delete-word-backward", XF_ARG)
+#ifndef MKSH_SMALL
+FN(del_bbigword, "delete-bigword-backward", XF_ARG)
+#endif
 FN(del_char, "delete-char-forward", XF_ARG)
 FN(del_fword, "delete-word-forward", XF_ARG)
+#ifndef MKSH_SMALL
+FN(del_fbigword, "delete-bigword-forward", XF_ARG)
+#endif
 FN(del_line, "kill-line", 0)
 FN(draw_line, "redraw", 0)
 #ifndef MKSH_SMALL
@@ -59,8 +65,11 @@ FN(eval_region, "evaluate-region", 0)
 #endif
 FN(expand, "expand-file", 0)
 #ifndef MKSH_SMALL
-FN(fold_capitalise, "capitalize-word", XF_ARG)
+FN(foldb_capitalise, "capitalise-bigword", XF_ARG)
+FN(fold_capitalise, "capitalise-word", XF_ARG)
+FN(foldb_lower, "downcase-bigword", XF_ARG)
 FN(fold_lower, "downcase-word", XF_ARG)
+FN(foldb_upper, "upcase-bigword", XF_ARG)
 FN(fold_upper, "upcase-word", XF_ARG)
 #endif
 FN(goto_hist, "goto-history", XF_ARG)
@@ -80,15 +89,21 @@ FN(meta_yank, "yank-pop", 0)
 FN(mv_back, "backward-char", XF_ARG)
 FN(mv_beg, "beginning-of-line", 0)
 FN(mv_bword, "backward-word", XF_ARG)
+#ifndef MKSH_SMALL
+FN(mv_bbigword, "backward-bigword", XF_ARG)
+#endif
 FN(mv_end, "end-of-line", 0)
 FN(mv_forw, "forward-char", XF_ARG)
 FN(mv_fword, "forward-word", XF_ARG)
+#ifndef MKSH_SMALL
+FN(mv_fbigword, "forward-bigword", XF_ARG)
+#endif
 FN(newline, "newline", 0)
 FN(next_com, "down-history", XF_ARG)
 FN(nl_next_com, "newline-and-next", 0)
 FN(noop, "no-op", 0)
 FN(prev_com, "up-history", XF_ARG)
-FN(prev_histword, "prev-hist-word", XF_ARG)
+FN(prev_histword, "prev-hist-bigword", XF_ARG)
 #ifndef MKSH_SMALL
 FN(quote_region, "quote-region", 0)
 #endif
diff --git mksh_59c-3/lex.c mksh_59c-4/lex.c
index 9abd0ae..21b145b 100644
--- mksh_59c-3/lex.c
+++ mksh_59c-4/lex.c
@@ -1792,7 +1792,6 @@ yyskiputf8bom(void)
 		ungetsc_i(asc2rtt(0xEF));
 		return;
 	}
-	UTFMODE |= 8;
 }
 
 static Lex_state *
diff --git mksh_59c-3/main.c mksh_59c-4/main.c
index 0e9e176..5e5d5cf 100644
--- mksh_59c-3/main.c
+++ mksh_59c-4/main.c
@@ -6,7 +6,7 @@
 /*-
  * Copyright (c) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
  *		 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018,
- *		 2019, 2020
+ *		 2019, 2020, 2021
  *	mirabilos <m@mirbsd.org>
  *
  * Provided that these terms and disclaimer and all copyright notices
@@ -35,7 +35,7 @@
 #include <locale.h>
 #endif
 
-__RCSID("$MirOS: src/bin/mksh/main.c,v 1.376 2021/01/24 23:03:11 tg Exp $");
+__RCSID("$MirOS: src/bin/mksh/main.c,v 1.377 2021/02/07 02:02:26 tg Exp $");
 
 #ifndef MKSHRC_PATH
 #define MKSHRC_PATH	"~/.mkshrc"
@@ -611,6 +611,8 @@ main_init(int argc, const char *argv[], Source **sp, struct block **lp)
 	ccp = null;
 	switch (utf_flag) {
 
+	/* not set on command line, not FTALKING */
+	case 2:
 	/* auto-detect from locale or environment */
 	case 4:
 #if HAVE_SETLOCALE_CTYPE
@@ -636,8 +638,6 @@ main_init(int argc, const char *argv[], Source **sp, struct block **lp)
 		UTFMODE = isuc(ccp);
 		break;
 
-	/* not set on command line, not FTALKING */
-	case 2:
 	/* unknown values */
 	default:
 		utf_flag = 0;
diff --git mksh_59c-3/misc.c mksh_59c-4/misc.c
index dc653d4..7f3ac41 100644
--- mksh_59c-3/misc.c
+++ mksh_59c-4/misc.c
@@ -292,10 +292,6 @@ change_flag(enum sh_flag f, int what, bool newset)
 
 		/* +++ privs changed +++ */
 	} else if ((f == FPOSIX || f == FSH) && newval) {
-		/* Turning on -o posix? */
-		if (f == FPOSIX)
-			/* C locale required for compliance */
-			UTFMODE = 0;
 		/* Turning on -o posix or -o sh? */
 		Flag(FBRACEEXPAND) = 0;
 #ifndef MKSH_NO_CMDLINE_EDITING
diff --git mksh_59c-3/mksh.1 mksh_59c-4/mksh.1
index 3fadd8d..8017850 100644
--- mksh_59c-3/mksh.1
+++ mksh_59c-4/mksh.1
@@ -1,4 +1,4 @@
-.\" $MirOS: src/bin/mksh/mksh.1,v 1.502+locale-tracking 2021/01/24 23:04:39 tg Exp $
+.\" $MirOS: src/bin/mksh/mksh.1,v 1.506+locale-tracking 2021/02/26 11:51:08 tg Exp $
 .\" $OpenBSD: ksh.1,v 1.160 2015/07/04 13:27:04 feinerer Exp $
 .\"-
 .\" Copyright © 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
@@ -84,7 +84,7 @@
 .\" with -mandoc, it might implement .Mx itself, but we want to
 .\" use our own definition. And .Dd must come *first*, always.
 .\"
-.Dd $Mdocdate: February 7 2021 $
+.Dd March 13, 2021
 .\"
 .\" Check which macro package we use, and do other -mdoc setup.
 .\"
@@ -4443,11 +4443,9 @@ and at least one of these returns something that matches
 or
 .Dq utf8
 case-insensitively; for direct builtin calls depending on the
-aforementioned environment variables; or for stdin or scripts,
-if the input begins with a UTF-8 Byte Order Mark.
-Do not rely on BOM triggering, it might be removed with full locale tracking.
+aforementioned environment variables.
 .Pp
-This build of the shell implements (early) locale tracking, that is,
+This build of the shell implements semi-early locale tracking, that is,
 .Ic set Fl +U
 is changed whenever one of the
 .Tn POSIX
@@ -4558,9 +4556,7 @@ and
 this autodetection feature is compiled in.
 As a side effect, setting this flag turns off the
 .Ic braceexpand
-and
-.Ic utf8\-mode
-flags, which can be turned back on manually, and
+flag, which can be turned back on manually, and
 .Pq unless both are set in the same command
 .Ic sh
 mode.
@@ -5890,21 +5886,28 @@ command.
 .Pp
 The following is a list of available editing commands.
 Each description starts with the name of the command,
-suffixed with a colon;
-an
+suffixed with a colon; a
 .Op Ar n
 (if the command can be prefixed with a count); and any keys the command is
 bound to by default, written using caret notation
-e.g. the ASCII Esc character is written as \*(ha[.
-These control sequences are not case sensitive.
+(e.g. the ASCII Esc character is written as
+.Li \*(ha[ )
+or terminal-specific indications.
 A count prefix for a command is entered using the sequence
 .Pf \*(ha[ Ns Ar n ,
 where
 .Ar n
-is a sequence of 1 or more digits.
+is one or more digits.
 Unless otherwise specified, if a count is
 omitted, it defaults to 1.
 .Pp
+Bigwords, as used below, are separated by spaces or tabs;
+words consist of alphanumerics, underscore
+.Pq Ql _
+or dollar sign
+.Pq Ql $
+characters.
+.Pp
 Note that editing command names are used only with the
 .Ic bind
 command.
@@ -5927,8 +5930,13 @@ Emacs key bindings:
 Abort the current command, save it to the history, empty the line buffer and
 set the exit state to interrupted.
 .It auto\-insert: Op Ar n
+.Pq Most ordinary characters are bound to this command.
 Simply causes the character to appear as literal input.
-Most ordinary characters are bound to this.
+.It Xo backward\-bigword:
+.Op Ar n
+.No \*(ha[B
+.Xc
+Moves the cursor backward to the beginning of the bigword.
 .It Xo backward\-char:
 .Op Ar n
 .No \*(haB , \*(haXD , ANSI-CurLeft , PC-CurLeft
@@ -5940,19 +5948,21 @@ characters.
 .Op Ar n
 .No \*(ha[b , ANSI-Ctrl-CurLeft , ANSI-Alt-CurLeft
 .Xc
-Moves the cursor backward to the beginning of the word; words consist of
-alphanumerics, underscore
-.Pq Ql _
-and dollar sign
-.Pq Ql $
-characters.
+Moves the cursor backward to the beginning of the word.
 .It beginning\-of\-history: \*(ha[\*(Lt
 Moves to the beginning of the history.
 .It beginning\-of\-line: \*(haA, ANSI-Home, PC-Home
 Moves the cursor to the beginning of the edited input line.
+.It Xo capitalise\-bigword:
+.Op Ar n
+.No \*(ha[C
+.Xc
+Uppercase the first character in the next
+.Ar n
+bigwords as below.
 .It Xo capitalise\-word:
 .Op Ar n
-.No \*(ha[C , \*(ha[c
+.No \*(ha[c
 .Xc
 Uppercase the first ASCII character in the next
 .Ar n
@@ -5996,6 +6006,20 @@ match as in the
 .Ic complete
 command above.
 Note that \*(haI is usually generated by the Tab (tabulator) key.
+.It Xo delete\-bigword\-backward:
+.Op Ar n
+.No \*(ha[H
+.Xc
+Deletes
+.Ar n
+bigwords before the cursor.
+.It Xo delete\-bigword\-forward:
+.Op Ar n
+.No \*(ha[D
+.Xc
+Deletes characters after the cursor up to the end of
+.Ar n
+bigwords.
 .It Xo delete\-char\-backward:
 .Op Ar n
 .No ERASE Pq \*(haH ,
@@ -6043,9 +6067,16 @@ is not useful until either
 or
 .Ic up\-history
 has been performed.
+.It Xo downcase\-bigword:
+.Op Ar n
+.No \*(ha[L
+.Xc
+Lowercases the next
+.Ar n
+bigwords.
 .It Xo downcase\-word:
 .Op Ar n
-.No \*(ha[L , \*(ha[l
+.No \*(ha[l
 .Xc
 Lowercases the next
 .Ar n
@@ -6096,6 +6127,13 @@ Appends a
 to the current word and replaces the word with the result of performing file
 globbing on the word.
 If no files match the pattern, the bell is rung.
+.It Xo forward\-bigword:
+.Op Ar n
+.No \*(ha[F
+.Xc
+Moves the cursor forward to the end of the
+.Ar n Ns th
+bigword.
 .It Xo forward\-char:
 .Op Ar n
 .No \*(haF , \*(haXC , ANSI-CurRight , PC-CurRight
@@ -6164,15 +6202,18 @@ This does nothing.
 Introduces a 2-character command sequence.
 .It prefix\-2: \*(haX , \*(ha[[ , \*(ha[O
 Introduces a multi-character command sequence.
-.It Xo prev\-hist\-word:
+.It prefix\-3: \*(ha@
+Introduces a PC keyboard scancode.
+.It Xo prev\-hist\-bigword:
 .Op Ar n
 .No \*(ha[. , \*(ha[_
 .Xc
-The last word or, if given, the
-.Ar n Ns th
-word (zero-based) of the previous (on repeated execution, second-last,
-third-last, etc.) command is inserted at the cursor.
-Use of this editing command trashes the mark.
+If no count is given, the last bigword, otherwise the
+.No ( Ar n Ns +1)th
+bigword of the previous line is inserted at the cursor,
+and the mark is set to the beginning of the inserted word.
+When invoked repeatedly, the inserted text is replaced by the corresponding
+bigword from the second-last, third-last, etc. line.
 .It quote: \*(ha\*(ha , \*(haV
 The following character is taken literally rather than as an editing command.
 .It quote\-region: \*(ha[Q
@@ -6202,7 +6243,7 @@ The internal history list is searched
 backwards for commands matching the input.
 An initial
 .Ql \*(ha
-in the search string anchors the search.
+in the search string anchors the search at the beginning of the line.
 The escape key will leave search mode.
 Other commands, including sequences of escape as
 .Ic prefix\-1
@@ -6210,31 +6251,33 @@ followed by a
 .Ic prefix\-1
 or
 .Ic prefix\-2
-key will be executed after leaving search mode.
+key, will be executed after leaving search mode.
 The
 .Ic abort Pq \*(haG
-command will restore the input line before search started.
+command will restore the input line from before search started.
 Successive
 .Ic search\-history
-commands continue searching backward to the next previous occurrence of the
-pattern.
+commands continue searching backward to the following previous occurrence
+of the pattern.
 The history buffer retains only a finite number of lines; the oldest
 are discarded as necessary.
-.It search\-history\-up: ANSI-PgUp, PC-PgUp
-Search backwards through the history buffer for commands whose beginning match
-the portion of the input line before the cursor.
-When used on an empty line, this has the same effect as
-.Ic up\-history .
 .It search\-history\-down: ANSI-PgDn, PC-PgDn
-Search forwards through the history buffer for commands whose beginning match
+Search forwards (this command is only useful after an
+.Ic up\-history ,
+.Ic search\-history\-up
+or
+.Ic search\-history )
+through the history buffer for commands whose beginning matches
 the portion of the input line before the cursor.
 When used on an empty line, this has the same effect as
 .Ic down\-history .
-This is only useful after an
-.Ic up\-history ,
-.Ic search\-history
-or
-.Ic search\-history\-up .
+.It search\-history\-up: ANSI-PgUp, PC-PgUp
+Search backwards through the history buffer for commands whose beginning
+matches the portion of the input line before the cursor.
+When used on an empty line, this has the same effect as
+.Ic up\-history .
+.It set\-arg: \*(ha[0 .. \*(ha[9
+Mapped to begin prefixing a count to a command.
 .It set\-mark\-command: \*(ha[ Ns Aq space
 Set the mark at the cursor position.
 .It transpose\-chars: \*(haT
@@ -6250,9 +6293,16 @@ character to the right.
 Scrolls the history buffer backward
 .Ar n
 lines (earlier).
+.It Xo upcase\-bigword:
+.Op Ar n
+.No \*(ha[U
+.Xc
+Uppercase the next
+.Ar n
+bigwords.
 .It Xo upcase\-word:
 .Op Ar n
-.No \*(ha[U , \*(ha[u
+.No \*(ha[u
 .Xc
 Uppercase the next
 .Ar n
@@ -6262,6 +6312,8 @@ Display the version of
 .Nm mksh .
 The current edit buffer is restored as soon as a key is pressed.
 The restoring keypress is processed, unless it is a space.
+.It vt100\-hack: \*(ha[[1
+Mapped to internally represent some longer key sequences.
 .It yank: \*(haY
 Inserts the most recently killed text string at the current cursor position.
 .It yank\-pop: \*(ha[y
@@ -7049,6 +7101,12 @@ and wraparound defined, even (defying POSIX) on 36-bit and 64-bit systems.
 currently uses OPTU-16 internally, which is the same as UTF-8 and CESU-8
 with 0000..FFFD being valid codepoints; raw octets are mapped into the
 PUA range EF80..EFFF, which is assigned by CSUR for this purpose.
+.Em Future compatibility note :
+there's work underway to use full 21-bit UTF-8 in mksh R60 or so.
+Raw octet mapping will almost certainly be moved out of the PUA and into
+some range outside of UCS, such as 0x00400000 with the lower bits
+corresponding to the octet; high-bit7 octets only to keep ASCII unambiguous
+(EBCDIC will have to see, perhaps using the extant ASCII mapping).
 .Sh BUGS
 Suspending (using \*(haZ) pipelines like the one below will only suspend
 the currently running part of the pipeline; in this example,
diff --git mksh_59c-3/mksh.faq mksh_59c-4/mksh.faq
index 72979b1..9557bdc 100644
--- mksh_59c-3/mksh.faq
+++ mksh_59c-4/mksh.faq
@@ -1,4 +1,4 @@
-RCSID: $MirOS: src/bin/mksh/mksh.faq,v 1.12+locale-tracking 2021/01/30 05:56:15 tg Exp $
+RCSID: $MirOS: src/bin/mksh/mksh.faq,v 1.17+locale-tracking 2021/03/11 14:16:08 tg Exp $
 ToC: spelling
 Title: How do you spell <tt>mksh</tt>? How do you pronounce it?
 
@@ -263,7 +263,7 @@ Title: My prompt is weird!
  (This was agreed upon as suggestion in a discussion between bash, zsh and
  Korn shell developers.) The feature set of different shells vastly differs
  and each shell should use its default PS1 or from its startup files.</li>
-<li><tt>$ENV</tt> <a href="#env">is set and/or <tt>export</tt>ed</a>.</li>
+<li><tt>$ENV</tt> <a href="#env">is set and probably <tt>export</tt>ed</a>.</li>
 <li>Your prompt is just “<tt># </tt>”: you’re entering a root shell, and
  <tt>$PS1</tt> does not contain the ‘#’ character, in which case the shell
  forces this prompt, making extra privileges obvious.</li>
@@ -287,7 +287,7 @@ Title: My prompt is weird!
 ToC: env
 Title: On startup files and <tt>$ENV</tt> across and detecting various shells
 
-Interactive shells look at <tt>~/.mkshrc</tt> (or <tt>/system/etc/mkshrc</tt>
+<p>Interactive shells look at <tt>~/.mkshrc</tt> (or <tt>/system/etc/mkshrc</tt>
 on Android and <tt>/etc/mkshrc</tt> on FreeWRT and OpenWrt) by default. This
 location can, however, be overridden by setting the <tt>ENV</tt> environment
 variable. (FreeBSD is rumoured to set it in their system profile.) It’s better
@@ -298,7 +298,30 @@ or “MIRBSD KSH” for mksh, “PD KSH” for ancient mirbsdksh/oksh/pdksh, “
 for ksh93); <tt>$NETBSD_SHELL</tt> (NetBSD ash); <tt>POSH_VERSION</tt> (posh, a
 pdksh derivative); <tt>$SH_VERSION</tt> (“PD KSH” as sh), <tt>$YASH_VERSION</tt>
 (yash), <tt>$ZSH_VERSION</tt> (or if <tt>$VERSION</tt> begins with “zsh”); a <a
-href="@@RELPATH@@ksh-chan.htm#which-shell">list of more approaches</a> exists.
+href="@@RELPATH@@ksh-chan.htm#which-shell">list of more approaches</a> exists.</p>
+
+<p>Note that, in some scenarios, it might be very useful to actually set
+ <tt>$ENV</tt>: the regular interactive shell startup file lies in the
+ user’s home directory, relying on being copied from <tt>/etc/skel/</tt>
+ which normally is only done at user creation time. If mksh was installed
+ later, the user often won’t get it at all, and delivering updates is
+ challenging. One way of partially working around this is to ship an
+ <tt>/etc/skel/.mkshrc</tt> that reads <tt>/etc/mkshrc</tt> by default
+ (but the user can change it of course) and ship the <tt>dot.mkshrc</tt>
+ file as <tt>/etc/mkshrc</tt>, but that won’t fully help. This is where
+ <tt>$ENV</tt> comes into play:</p><ul>
+  <li>In <tt>/etc/profile</tt>, set <tt>ENV</tt> to a, say, <tt>shrc</tt>
+   file shipped in <tt>/etc/</tt> and export it.</li>
+  <li>In that new file, which must use only constructs compatible with
+   all shells, usually a subset of POSIX, read the various rc files
+   (<tt>.mkshrc</tt> for mksh, <tt>.kshrc</tt> for AT&amp;T ksh93, etc.)
+   from the user’s home if they exist, from <tt>/etc/skel/</tt> otherwise.</li>
+</ul><p>This may very well be <em>required</em> if the alternative would
+ be <a href="#ps1weird">to <del><tt>export PS1</tt></del>[sic!]</a>. <a
+  href="https://gitlab.alpinelinux.org/alpine/aports/-/issues/12398#note_146574"
+ >alpine Linux</a> encountered this very problem, and the linked post is
+ a (draft) solution using the <tt>$ENV</tt> method and looks at various
+ other shells’ startup file situation as well.</p>
 ----
 ToC: ctrl-x-e
 Title: Multiline command editing
@@ -332,9 +355,15 @@ Use ^[^L (Escape+Ctrl-L) or rebind it:<br />
 ToC: ctrl-u-pico
 Title: ^U (Ctrl-U) clears the entire line
 
-If it should only delete the line up to the cursor, use:<br />
+If you want it to only delete the line up to the cursor, use:<br />
 <tt>bind -m ^U='^[0^K'</tt>
 ----
+ToC: ctrl-w-bash
+Title: ^W (Ctrl-W) deletes a word, not a bigword
+
+If you want it to delete more, with R60 you can use:<br />
+<tt>bind '^W=delete-bigword-backward'</tt>
+----
 ToC: cur-up-zsh
 Title: Cursor Up behaves differently from zsh
 
@@ -402,7 +431,11 @@ Title: How POSIX compliant is mksh? Also, UTF-8 vs. locales?
  UCS and maps raw octets into the U+EF80‥U+EFFF wide character range; see
  <tt>Arithmetic expressions</tt> in mksh(1) for details) <em>must</em> stay
  disabled in POSIX mode (it is disabled upon enabling POSIX mode in R56+ but
- don’t depend on this with full locale tracking).</p>
+ don’t depend on this to stay once locale tracking will be implemented; the
+ disabling code is not present in this build).</p>
+<p><strong>Future compatibility note:</strong> there’s work underway to use
+ full 21-bit UTF-8 in mksh R60 or so. Raw octet mapping will almost certainly
+ be moved out of the PUA and to some range outside of UCS.</p>
 <p class="boxhead">The following POSIX sh-compatible code toggles the
  <tt>utf8-mode</tt> option dependent on the current POSIX locale, for mksh
  to allow using the UTF-8 mode, within the constraints outlined above, in
diff --git mksh_59c-3/sh.h mksh_59c-4/sh.h
index a6c3594..f115a6f 100644
--- mksh_59c-3/sh.h
+++ mksh_59c-4/sh.h
@@ -195,7 +195,7 @@
 #ifdef EXTERN
 __RCSID("$MirOS: src/bin/mksh/sh.h,v 1.906 2021/01/24 19:37:31 tg Exp $");
 #endif
-#define MKSH_VERSION "R59 2021/02/07"
+#define MKSH_VERSION "R59 2021/03/13"
 
 /* arithmetic types: C implementation */
 #if !HAVE_CAN_INTTYPES

--- End Message ---

--- Begin Message ---

To: 986431-done@bugs.debian.org
Subject: unblock mksh
From: Sebastian Ramacher <sramacher@respighi.debian.org>
Date: Wed, 07 Apr 2021 20:31:12 +0000
Message-id: <E1lUEpU-0005NN-GX@respighi.debian.org>

Unblocked.
--- End Message ---
