
Bug#989314: marked as done (unblock: mksh/59c-8)



Your message dated Wed, 02 Jun 2021 18:22:44 +0000
with message-id <E1loVVs-00067K-UN@respighi.debian.org>
and subject line unblock mksh
has caused the Debian Bug report #989314,
regarding unblock: mksh/59c-8
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
989314: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989314
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock
X-Debbugs-Cc: tg@mirbsd.de
Control: block -1 by 989279

Please unblock package mksh

[ Reason ]
This upload addresses the following issues:
• Work around #988027 in klibc (which is a POSIX violation but
  apparently deliberate by upstream) by using {set,long}jmp
  instead of sig{set,long}jmp when not saving/restoring signals
  (cherry-pick from upstream)
• Rebuild against klibc with #943425 ({set,long}jmp on s390x use
  wrong registers) fixed
• Cherry-pick another two upstream memory leak fixes
• Backport just enough (for Debian) of upstream patch fixing the
  way control characters are escaped when showing variable contents
  (for reentrancy or when deliberately escaping with ${varname@Q});
  specifically, escape C1 control characters dependent on whether
  utf8-mode is on (UTF8-encoded) or off (bytewise), catching some
  situations in which they were not escaped properly, make the
  escaped output match the UTF-8 mode better, and add a shell option
  “asis” to allow \x80‥\x9F unescaped outside(!) of UTF-8 mode only
  for when the user uses a codepage that has them as printable, not
  control, characters

(There’s also a one-line d-ports-only change of no relevance to
the release architectures.)
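
To illustrate the first item above, here is a minimal sketch (not the
actual mksh code) of selecting plain setjmp/longjmp instead of
sigsetjmp(…, 0)/siglongjmp at compile time when the signal mask does
not need to be saved or restored. DEMO_NO_SIGSETJMP and all demo_*
names are made up for this example; the real switch in the diff is
-DMKSH_NO_SIGSETJMP, which Build.sh adds for klibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>

/* Hypothetical switch, loosely modelled on -DMKSH_NO_SIGSETJMP. */
#ifdef DEMO_NO_SIGSETJMP
typedef jmp_buf demo_jmp_buf;
#define demo_setjmp(jb)		setjmp(jb)
#define demo_longjmp(jb, v)	longjmp((jb), (v))
#else
typedef sigjmp_buf demo_jmp_buf;
/* savemask == 0: the signal mask is neither saved nor restored */
#define demo_setjmp(jb)		sigsetjmp((jb), 0)
#define demo_longjmp(jb, v)	siglongjmp((jb), (v))
#endif

static demo_jmp_buf demo_env;
/* carries the error code across the jump */
static volatile int demo_caught;

static void
demo_fail(int code)
{
	demo_caught = code;
	demo_longjmp(demo_env, 1);
}

/* Returns 0 if no jump happened, else the code given to demo_fail(). */
int
demo_run(int code)
{
	demo_caught = 0;
	if (demo_setjmp(demo_env) != 0)
		return (demo_caught);
	if (code != 0)
		demo_fail(code);
	return (0);
}
```

Both variants behave the same for the caller as long as no signal mask
handling is expected; building with -DDEMO_NO_SIGSETJMP merely avoids
the sigsetjmp family altogether.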

[ Impact ]
• Potential (but minor; except on s390x, the testsuite didn’t catch
  anything) misbehaviour of the klibc-built binaries; outdated
  Built-Using for klibc once 2.0.8-6.1 migrates
• Minor memory leaks
• Attempting to display a variable escaped (“typeset -p varname”,
  “set | grep ^varname”) may send control sequences to the terminal,
  including sequences that cause xterm to, for example, dump the
  current terminal contents to files
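
As a rough illustration of the last item, the following sketch escapes
control octets of a byte string in $'...' form outside of UTF-8 mode.
The two predicates mirror ksh_isctrl and ksh_asisctrl from the sh.h
hunk in the attached diff (ASCII builds); everything else, including
the demo_* names and the buffer handling, is an assumption made for the
example. Unlike the real code, it uses octal escapes throughout rather
than \a, \n, \E and friends.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* asis == false: C0, DEL and C1 (0x80..0x9F) count as control octets;
 * asis == true: only C0 and DEL do, C1 octets pass through unescaped */
static bool
demo_isctrl(unsigned char c, bool asis)
{
	if (asis)
		return (c < 0x20U || c == 0x7FU);
	return ((c & 0x7FU) < 0x20U || c == 0x7FU);
}

/* Writes s to out as $'...'; needs outsz >= 4 * strlen(s) + 4. */
bool
demo_escape8(char *out, size_t outsz, const char *s, bool asis)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t pos = 0;

	if (outsz < strlen(s) * 4 + 4)
		return (false);
	out[pos++] = '$';
	out[pos++] = '\'';
	for (; *p != 0; ++p) {
		if (*p == '\\') {
			/* a backslash is doubled */
			out[pos++] = '\\';
			out[pos++] = '\\';
		} else if (*p == '\'' || demo_isctrl(*p, asis)) {
			/* single quote and control octets become \ooo */
			pos += (size_t)sprintf(out + pos, "\\%03o",
			    (unsigned int)*p);
		} else
			out[pos++] = (char)*p;
	}
	out[pos++] = '\'';
	out[pos] = '\0';
	return (true);
}
```

With asis off, an octet such as 0x9B (CSI in the ISO 8859 series) never
reaches the terminal raw; with asis on it does, which is only sensible
for codepages where that range is printable.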

[ Tests ]
The testsuite is very thorough (it did catch the s390x/klibc issue,
and the mksh-static and lksh binaries were switched to musl because of
it); it also proves the klibc change works. I’ve manually verified the
escaping-related changes. I’ve not verified the memory leak fixes
separately, but the codepaths are such that, if they were wrong (e.g.
a use-after-free), the testsuite (especially on MirBSD with malloc
hardening) would’ve caught it crashing.
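
For reviewers, the shape of the tree.c leak fix can be sketched as
follows. This is a standalone approximation, not mksh code: malloc and
free stand in for the ATEMP area allocator, demo_close_buffer is a
hypothetical stand-in for shf_sclose (which hands back a string the
caller owns), and the counter exists only to make the leak observable.

```c
#include <stdlib.h>
#include <string.h>

static int demo_live;	/* allocations not yet freed */

/* stand-in for shf_sclose(): returns a heap string owned by the caller */
static char *
demo_close_buffer(const char *content)
{
	size_t n = strlen(content) + 1;
	char *copy = malloc(n);

	if (copy != NULL) {
		memcpy(copy, content, n);
		++demo_live;
	}
	return (copy);
}

static void
demo_free(char *p)
{
	if (p != NULL) {
		free(p);
		--demo_live;
	}
}

/* stand-in for the consumer (shf_puts in the diff) */
static size_t
demo_consume(const char *s)
{
	return (s == NULL ? 0 : strlen(s));
}

/* old shape: the returned pointer is passed on and then lost */
size_t
demo_emit_leaky(const char *content)
{
	return (demo_consume(demo_close_buffer(content)));
}

/* new shape: keep the pointer in a temporary, use it, then free it */
size_t
demo_emit_fixed(const char *content)
{
	char *tmp = demo_close_buffer(content);
	size_t n = demo_consume(tmp);

	demo_free(tmp);
	return (n);
}

int
demo_outstanding(void)
{
	return (demo_live);
}
```

The fixed variant frees only after the last use of the temporary, which
is the property a use-after-free would violate.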

I’ve run a number of scripts and compared their output with the
previous binaries and with the binaries from this upload installed.

[ Risks ]
As I wrote in earlier unblock requests (#987975, #986431), mksh is
effectively a leaf package in Debian, and changes like these are low
risk.

I’ve reduced the upstream commits related to escaping to include
only the necessary parts to make reviewing easier. This carries
some, but very low, risk. The tests would have caught mistakes
made while trimming (incidentally, they did, when I removed a hunk
which I at first thought unnecessary).


[ Checklist ]
  [x] all changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in testing

[ Other info ]
I’m attaching a diff of the unpacked trees instead of debdiff(1)
output again because I use single-debian-patch here and that’d
be a nightmare to review. I’ve commented in the diff which hunks
match which issue.

I’m also attaching a “diff -w” of the file misc.c to make review
easier; a huge part of the escaping code lost one level of indent.

I’ll be uploading another escaping-related fix revision for similar
issues (C0/C1 control characters and DEL) in the command line editor
and tab completion code. I have yet to find the time to actually fix
those, though, and I think that getting what we’ve already got in sid
into testing right now is a good thing.
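
For reference, the character ranges in question can be told apart as
in the sketch below. The C1 test is the same XOR comparison used in
the dollarqU hunk of the attached diff; the enum and the function name
are made up for the example and do not exist in mksh.

```c
/* hypothetical classification of a decoded code point */
enum demo_class { DEMO_OTHER, DEMO_C0, DEMO_DEL, DEMO_C1 };

enum demo_class
demo_classify(unsigned int wc)
{
	if (wc < 0x20U)
		return (DEMO_C0);	/* U+0000..U+001F */
	if (wc == 0x7FU)
		return (DEMO_DEL);	/* U+007F */
	/* XOR folds 0x80..0x9F onto 0x00..0x1F; anything else ends up >= 0x20 */
	if ((wc ^ 0x80U) < 0x20U)
		return (DEMO_C1);	/* U+0080..U+009F */
	return (DEMO_OTHER);
}
```

Note that U+00A0 and above fall into DEMO_OTHER here; whether those are
actually printable is a separate question the sketch does not address.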


unblock mksh/59c-8
diff -pruN mksh_59c-6/debian/changelog mksh_59c-8/debian/changelog
--- mksh_59c-6/debian/changelog	2021-05-03 03:26:28.000000000 +0200
+++ mksh_59c-8/debian/changelog	2021-05-31 02:42:55.000000000 +0200
@@ -1,3 +1,26 @@
+mksh (59c-8) unstable; urgency=medium
+
+  * Fix a -Wpointer-sign in escaping code
+  * Shrink escape diff (algorithm unchanged) for easier review
+
+ -- Thorsten Glaser <tg@mirbsd.de>  Mon, 31 May 2021 02:42:55 +0200
+
+mksh (59c-7) unstable; urgency=medium
+
+  * Do not use sigsetjmp(…, 0) with klibc (cf. #988027)
+  * Cherry-pick upstream memory leak fixes
+  * Apply just enough upstream changes to address more escaping
+    issues: for ${var@Q} and “typeset -p”, take UTF-8 mode (on/off)
+    into account; don’t issue \uNNNN escapes outside of UTF-8 mode,
+    don’t escape nōn-ASCII printable, that is, nōn-control characters;
+    always escape C0 controls and DEL; escape C1 controls by default,
+    but add an option “asis” to disable that (e.g. for DOS codepages)
+    in nōn-UTF8 mode – note this will need fixing for tab completion,
+    command line editing (in a subsequent upload)…
+  * Work around hppa buildd issue (same as m68k, sh4)
+
+ -- Thorsten Glaser <tg@mirbsd.de>  Sun, 30 May 2021 23:57:59 +0200
+
 mksh (59c-6) unstable; urgency=medium
 
   * Clear “taint” on most actions mutating a variable
diff -pruN mksh_59c-6/debian/meat mksh_59c-8/debian/meat
--- mksh_59c-6/debian/meat	2021-05-03 03:25:11.000000000 +0200
+++ mksh_59c-8/debian/meat	2021-05-31 02:40:08.000000000 +0200
@@ -352,7 +352,7 @@ done
 
 test x"$iscross" = x"1" || case $DEB_BUILD_OPTIONS:$DEB_HOST_ARCH in
 (*reallynocheck*) ;;
-(*:m68k|*:sh4)
+(*:hppa|*:m68k|*:sh4)
 	# ignore nocheck on these architectures
 	nocheck=0 ;;
 esac

NOTE the following works around #988027 in klibc

diff -pruN mksh_59c-6/Build.sh mksh_59c-8/Build.sh
--- mksh_59c-6/Build.sh	2021-06-01 01:09:30.000000000 +0200
+++ mksh_59c-8/Build.sh	2021-06-01 01:09:41.000000000 +0200
@@ -1375,6 +1375,7 @@ esac
 etd=" on $et"
 case $et in
 klibc)
+	add_cppflags -DMKSH_NO_SIGSETJMP -D_setjmp=setjmp -D_longjmp=longjmp
 	: "${MKSH_UNLIMITED=1}"
 	;;
 unknown)

NOTE the following are the memory leak fixes

diff -pruN mksh_59c-6/lex.c mksh_59c-8/lex.c
--- mksh_59c-6/lex.c	2021-06-01 01:09:30.000000000 +0200
+++ mksh_59c-8/lex.c	2021-06-01 01:09:41.000000000 +0200
@@ -1487,6 +1487,8 @@ set_prompt(int to, Source *s)
 			Area *saved_atemp;
 			int saved_lineno;
 
+			saved_atemp = ATEMP;
+			newenv(E_ERRH);
 			ps1 = str_val(global("PS1"));
 			shf = shf_sopen(NULL, strlen(ps1) * 2,
 			    SHF_WR | SHF_DYNAMIC, NULL);
@@ -1500,8 +1502,6 @@ set_prompt(int to, Source *s)
 			saved_lineno = current_lineno;
 			if (s)
 				current_lineno = s->line + 1;
-			saved_atemp = ATEMP;
-			newenv(E_ERRH);
 			if (kshsetjmp(e->jbuf)) {
 				prompt = safe_prompt;
 				/*
@@ -1516,6 +1516,7 @@ set_prompt(int to, Source *s)
 				strdupx(prompt, cp, saved_atemp);
 			}
 			current_lineno = saved_lineno;
+			/* frees everything in post-newenv ATEMP */
 			quitenv(NULL);
 		}
 		break;
diff -pruN mksh_59c-6/tree.c mksh_59c-8/tree.c
--- mksh_59c-6/tree.c	2020-10-31 05:29:21.000000000 +0100
+++ mksh_59c-8/tree.c	2021-06-01 01:09:41.000000000 +0200
@@ -43,8 +43,12 @@ static bool ptree_hashere;
 static struct shf ptree_heredoc;
 #define ptree_outhere(shf) do {					\
 	if (ptree_hashere) {					\
-		shf_puts(shf_sclose(&ptree_heredoc), (shf));	\
+		char *ptree_thehere;				\
+								\
+		ptree_thehere = shf_sclose(&ptree_heredoc);	\
+		shf_puts(ptree_thehere, (shf));			\
 		shf_putc('\n', (shf));				\
+		afree(ptree_thehere, ATEMP);			\
 		ptree_hashere = false;				\
 		/*prevent_semicolon = true;*/			\
 	}							\

NOTE everything else (below) is the escaping fixes
(might want to read misc.c.-w.diff which is 'diff -uw'
for misc.c to make reviewing easier)

diff -pruN mksh_59c-6/check.t mksh_59c-8/check.t
--- mksh_59c-6/check.t	2021-06-01 01:09:30.000000000 +0200
+++ mksh_59c-8/check.t	2021-06-01 01:09:41.000000000 +0200
@@ -31,7 +31,7 @@
 # (2013/12/02 20:39:44) http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/regress/bin/ksh/?sortby=date
 
 expected-stdout:
-	KSH R59 2021/05/03
+	KSH R59 2021/05/30
 description:
 	Check base version of full shell
 stdin:
@@ -9933,13 +9933,19 @@ stdin:
 	print -r -- "s=\"$s\""
 	eval "$s"
 	typeset -p u v w
+	set -o asis
+	typeset -p w
+	set -U
+	typeset -p w
 expected-stdout:
 	<i=x j=a b k=c
 	d eâ?¬f>
-	s="u=x v='a b' w=$'c\nd\240e\u20ACf'"
+	s="u=x v='a b' w=$'c\nd eâ\202¬f'"
 	typeset u=x
 	typeset v='a b'
-	typeset w=$'c\nd\240e\u20ACf'
+	typeset w=$'c\nd eâ\202¬f'
+	typeset w=$'c\nd eâ?¬f'
+	typeset w=$'c\nd\240eâ?¬f'
 ---
 name: varexpand-special-quote-faux-EBCDIC
 description:
@@ -13787,8 +13793,14 @@ stdin:
 	done
 	s+=$'\xC2\xA0\xE2\x82\xAC\xEF\xBF\xBD\xEF\xBF\xBE\xEF\xBF\xBF\xF0\x90\x80\x80.'
 	typeset -p s
+	set -o asis
+	typeset -p s
+	set -U
+	typeset -p s
 expected-stdout:
-	typeset s=$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\E\034\035\036\037 !"#$%&\047()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377\u00A0\u20AC\uFFFD\357\277\276\357\277\277\360\220\200\200.'
+	typeset s=$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\E\034\035\036\037 !"#$%&\047()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237 ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ â\202¬ï¿½ï¿¾ï¿¿ð\220\200\200.'
+	typeset s=$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\E\034\035\036\037 !"#$%&\047()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177???????????????????????????????? ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ â?¬ï¿½ï¿¾ï¿¿ð???.'
+	typeset s=$'\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\E\034\035\036\037 !"#$%&\047()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\177\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377 ��\357\277\276\357\277\277\360\220\200\200.'
 ---
 name: duffs-device-ebcdic
 description:
diff -pruN mksh_59c-6/misc.c mksh_59c-8/misc.c
--- mksh_59c-6/misc.c	2021-06-01 01:09:30.000000000 +0200
+++ mksh_59c-8/misc.c	2021-06-01 01:09:41.000000000 +0200
@@ -57,6 +57,8 @@ static const unsigned char *gmatch_cclas
 #ifdef KSH_CHVT_CODE
 static void chvt(const Getopt *);
 #endif
+static unsigned int dollarqU(struct shf *, const unsigned char *);
+#define dollarq8 dollarqU
 
 /*XXX this should go away */
 static int make_path(const char *, const char *, char **, XString *, int *);
@@ -1402,116 +1404,180 @@ print_value_quoted(struct shf *shf, cons
 	}
 
 	/* non-empty; check whether any quotes are needed */
-	while (rtt2asc(c = *p++) >= 32)
-		if (ctype(c, C_QUOTE | C_SPC))
-			inquote = false;
+	if (UTFMODE) {
+		/* C1 always escaped; multibyte makes this tricky */
+		while ((c = *p++) != 0) {
+			if (ctype(c, C_CNTRL)) {
+				dollarqU(shf, (const unsigned char *)s);
+				return;
+			}
+			/* anything out of ASCII present? */
+			if (rtt2asc(c) > 0x7EU) {
+				/* dollar-quote, but be prepared to redo */
+				char *guess;
+				struct shf to;
+
+				shf_sopen(NULL, 0, SHF_WR | SHF_DYNAMIC, &to);
+				c = dollarqU(&to, (const unsigned char *)s);
+				guess = shf_sclose(&to);
+				/* output guess if it was right */
+				if (c > 1)
+					shf_puts(guess, shf);
+				afree(guess, ATEMP);
+				if (c == 1)
+					goto always_single;
+				if (c == 0)
+ noquoteneeded:
+					shf_puts(s, shf);
+				return;
+			}
+			if (ctype(c, C_QUOTE | C_SPC))
+				inquote = false;
+		}
+		/* assert: c == 0; all chars in [20;7E] ASCII */
+#ifndef MKSH_EBCDIC
+	} else if (Flag(FASIS)) {
+		while ((c = *p++), !ksh_asisctrl(c))
+			if (ctype(c, C_QUOTE | C_SPC))
+				inquote = false;
+#endif
+	} else {
+		while ((c = *p++), !ksh_isctrl(c))
+			if (ctype(c, C_QUOTE | C_SPC))
+				inquote = false;
+	}
+	/* state: if c == 0, all chars printable, inquote shortcuts */
+
+	if (c) {
+		/* otherwise, escape control chars */
+		dollarq8(shf, (const unsigned char *)s);
+		return;
+	}
 
+	/* can we shortcut? */
+	if (inquote)
+		goto noquoteneeded;
+	/* no */
+ always_single:
+	/* all chars printable, no control chars, quote nicely */
+	inquote = false;
 	p = (const unsigned char *)s;
-	if (c == 0) {
-		if (inquote) {
-			/* nope, use the shortcut */
-			shf_puts(s, shf);
-			return;
-		}
 
-		/* otherwise, quote nicely via state machine */
-		while ((c = *p++) != 0) {
-			if (c == '\'') {
-				/*
-				 * multiple single quotes or any of them
-				 * at the beginning of a string look nicer
-				 * this way than when simply substituting
-				 */
-				if (inquote) {
-					shf_putc('\'', shf);
-					inquote = false;
-				}
-				shf_putc('\\', shf);
-			} else if (!inquote) {
+	while ((c = *p++) != 0) {
+		if (c == '\'') {
+			if (inquote) {
 				shf_putc('\'', shf);
-				inquote = true;
+				inquote = false;
 			}
-			shf_putc(c, shf);
+			shf_putc('\\', shf);
+		} else if (!inquote) {
+			shf_putc('\'', shf);
+			inquote = true;
 		}
-	} else {
-		unsigned int wc;
-		size_t n;
-
-		/* use $'...' quote format */
-		shf_putc('$', shf);
+		shf_putc(c, shf);
+	}
+	if (inquote)
 		shf_putc('\'', shf);
-		while ((c = *p) != 0) {
-#ifndef MKSH_EBCDIC
-			if (c >= 0xC2) {
-				n = utf_mbtowc(&wc, (const char *)p);
-				if (n != (size_t)-1) {
-					p += n;
-					shf_fprintf(shf, "\\u%04X", wc);
-					continue;
-				}
-			}
-#endif
-			++p;
-			switch (c) {
-			/* see unbksl() in this file for comments */
-			case KSH_BEL:
-				c = 'a';
-				if (0)
-					/* FALLTHROUGH */
-			case '\b':
-				  c = 'b';
-				if (0)
-					/* FALLTHROUGH */
-			case '\f':
-				  c = 'f';
-				if (0)
-					/* FALLTHROUGH */
-			case '\n':
-				  c = 'n';
-				if (0)
-					/* FALLTHROUGH */
-			case '\r':
-				  c = 'r';
-				if (0)
-					/* FALLTHROUGH */
-			case '\t':
-				  c = 't';
-				if (0)
-					/* FALLTHROUGH */
-			case KSH_VTAB:
-				  c = 'v';
-				if (0)
-					/* FALLTHROUGH */
-			case KSH_ESC:
-				/* take E not e because \e is \ in *roff */
-				  c = 'E';
-				/* FALLTHROUGH */
-			case '\\':
-				shf_putc('\\', shf);
+}
 
-				if (0)
-					/* FALLTHROUGH */
-			default:
-#if defined(MKSH_EBCDIC) || defined(MKSH_FAUX_EBCDIC)
-				  if (ksh_isctrl(c))
+#ifdef MKSH_EBCDIC
+#define dollarq_isctrl8(c)	ksh_isctrl(c)
 #else
-				  if (!ctype(c, C_PRINT))
+#define dollarq_isctrl8(c)	Flag(FASIS) ? ksh_asisctrl(c) : ksh_isctrl(c)
 #endif
-				    {
-					/* FALLTHROUGH */
-			case '\'':
-					shf_fprintf(shf, "\\%03o", c);
-					break;
-				}
 
-				shf_putc(c, shf);
+#define dollarq_Uctrl(c)	!ctype(c, C_PRINT)
+#define dollarq_isctrlU(c)	UTFMODE ? dollarq_Uctrl(c) : dollarq_isctrl8(c)
+
+/* escape with $'...' (!MKSH_SMALL: in UTFMODE) */
+static unsigned int
+dollarqU(struct shf *shf, const unsigned char *s)
+{
+	unsigned char c;
+	unsigned int wc;
+	size_t n;
+	unsigned int rv = 0;
+
+	shf_putc('$', shf);
+	shf_putc('\'', shf);
+	while ((c = *s) != 0) {
+		if (UTFMODE &&
+		    rtt2asc(c) >= 0xC2U && (n = utf_mbtowc(&wc,
+		    (const char *)s)) != (size_t)-1) {
+			/* valid UTF-8 multibyte character > 0x7F */
+			if ((wc ^ 0x80U) < 0x20U) {
+				/* C1 control character */
+				shf_fprintf(shf, "\\u%04X", wc);
+				rv = 2;
+			} else {
+				/*
+				 * print as-is; we assume the tty DTRT for
+				 * interlinear annotations, LTR/RTL mark,
+				 * U+2028, U+2029, U+2066..U+206F, etc.
+				 */
+				shf_write((const char *)s, n, shf);
+			}
+			s += n;
+			continue;
+		}
+		++s;
+		/* single octet */
+		rv |= ctype(c, C_QUOTE | C_SPC);
+		switch (c) {
+		/* see unbksl() in this file for comments */
+		case KSH_BEL:
+			c = 'a';
+			if (0)
+				/* FALLTHROUGH */
+		case '\b':
+			  c = 'b';
+			if (0)
+				/* FALLTHROUGH */
+		case '\f':
+			  c = 'f';
+			if (0)
+				/* FALLTHROUGH */
+		case '\n':
+			  c = 'n';
+			if (0)
+				/* FALLTHROUGH */
+		case '\r':
+			  c = 'r';
+			if (0)
+				/* FALLTHROUGH */
+		case '\t':
+			  c = 't';
+			if (0)
+				/* FALLTHROUGH */
+		case KSH_VTAB:
+			  c = 'v';
+			if (0)
+				/* FALLTHROUGH */
+		case KSH_ESC:
+			/* take E not e because \e is \ in *roff */
+			  c = 'E';
+			rv = 2;
+			/* FALLTHROUGH */
+		case '\\':
+			shf_putc('\\', shf);
+
+			if (0)
+				/* FALLTHROUGH */
+		default:
+			  if (dollarq_isctrlU(c)) {
+				rv = 2;
+				/* FALLTHROUGH */
+		case '\'':
+				shf_fprintf(shf, "\\%03o", c);
 				break;
 			}
+
+			shf_putc(c, shf);
+			break;
 		}
-		inquote = true;
 	}
-	if (inquote)
-		shf_putc('\'', shf);
+	shf_putc('\'', shf);
+	return (rv);
 }
 
 /*
diff -pruN mksh_59c-6/mksh.1 mksh_59c-8/mksh.1
--- mksh_59c-6/mksh.1	2021-06-01 01:09:30.000000000 +0200
+++ mksh_59c-8/mksh.1	2021-06-01 01:09:41.000000000 +0200
@@ -4470,6 +4470,15 @@ during globbing.
 .It Fl x \*(Ba Fl o Ic xtrace
 Print commands when they are executed, preceded by
 .Ev PS4 .
+.It Fl o Ic asis
+When quoting output, if not in EBCDIC mode and
+.Ic utf8\-mode
+is disabled, show C1 control characters
+.Dq as is ,
+that is, do not escape them.
+Use with codepages where the range 0x80..0x9F contains printable
+characters (such as 437, 850, 1252, etc. but not the ISO\ 8859
+series, for example).
 .It Fl o Ic bgnice
 Background jobs are run with lower priority.
 .It Fl o Ic braceexpand
diff -pruN mksh_59c-6/sh.h mksh_59c-8/sh.h
--- mksh_59c-6/sh.h	2021-06-01 01:09:30.000000000 +0200
+++ mksh_59c-8/sh.h	2021-06-01 01:09:41.000000000 +0200
@@ -195,7 +195,7 @@
 #ifdef EXTERN
 __RCSID("$MirOS: src/bin/mksh/sh.h,v 1.906 2021/01/24 19:37:31 tg Exp $");
 #endif
-#define MKSH_VERSION "R59 2021/05/03"
+#define MKSH_VERSION "R59 2021/05/30"
 
 /* arithmetic types: C implementation */
 #if !HAVE_CAN_INTTYPES
@@ -1576,6 +1576,7 @@ extern void ebcdic_init(void);
 #define ksh_isctrl(c)	(ord(c) < 0x40 || ord(c) == 0xFF)
 #else
 #define ksh_isctrl(c)	((ord(c) & 0x7F) < 0x20 || ord(c) == 0x7F)
+#define ksh_asisctrl(c)	(ord(c) < 0x20U || ord(c) == 0x7FU)
 #endif
 /* new fast character classes */
 #define ctype(c,t)	tobool(ksh_ctypes[ord(c)] & (t))
diff -pruN mksh_59c-6/sh_flags.opt mksh_59c-8/sh_flags.opt
--- mksh_59c-6/sh_flags.opt	2020-05-17 00:38:52.000000000 +0200
+++ mksh_59c-8/sh_flags.opt	2021-06-01 01:09:41.000000000 +0200
@@ -43,6 +43,10 @@ __RCSID("$MirOS: src/bin/mksh/sh_flags.o
 >a|
 F0("allexport", FEXPORT, OF_ANY
 
+/* ./.	when quoting, show C1 control characters as-is; +U only */
+>|
+FN("asis", FASIS, OF_ANY
+
 /* ./.	bgnice */
 >| HAVE_NICE
 FN("bgnice", FBGNICE, OF_ANY
--- mksh_59c-6/misc.c	2021-06-01 01:09:30.000000000 +0200
+++ mksh_59c-8/misc.c	2021-06-01 01:09:41.000000000 +0200
@@ -57,6 +57,8 @@ static const unsigned char *gmatch_cclas
 #ifdef KSH_CHVT_CODE
 static void chvt(const Getopt *);
 #endif
+static unsigned int dollarqU(struct shf *, const unsigned char *);
+#define dollarq8 dollarqU
 
 /*XXX this should go away */
 static int make_path(const char *, const char *, char **, XString *, int *);
@@ -1402,26 +1404,67 @@ print_value_quoted(struct shf *shf, cons
 	}
 
 	/* non-empty; check whether any quotes are needed */
-	while (rtt2asc(c = *p++) >= 32)
+	if (UTFMODE) {
+		/* C1 always escaped; multibyte makes this tricky */
+		while ((c = *p++) != 0) {
+			if (ctype(c, C_CNTRL)) {
+				dollarqU(shf, (const unsigned char *)s);
+				return;
+			}
+			/* anything out of ASCII present? */
+			if (rtt2asc(c) > 0x7EU) {
+				/* dollar-quote, but be prepared to redo */
+				char *guess;
+				struct shf to;
+
+				shf_sopen(NULL, 0, SHF_WR | SHF_DYNAMIC, &to);
+				c = dollarqU(&to, (const unsigned char *)s);
+				guess = shf_sclose(&to);
+				/* output guess if it was right */
+				if (c > 1)
+					shf_puts(guess, shf);
+				afree(guess, ATEMP);
+				if (c == 1)
+					goto always_single;
+				if (c == 0)
+ noquoteneeded:
+					shf_puts(s, shf);
+				return;
+			}
+			if (ctype(c, C_QUOTE | C_SPC))
+				inquote = false;
+		}
+		/* assert: c == 0; all chars in [20;7E] ASCII */
+#ifndef MKSH_EBCDIC
+	} else if (Flag(FASIS)) {
+		while ((c = *p++), !ksh_asisctrl(c))
+			if (ctype(c, C_QUOTE | C_SPC))
+				inquote = false;
+#endif
+	} else {
+		while ((c = *p++), !ksh_isctrl(c))
 		if (ctype(c, C_QUOTE | C_SPC))
 			inquote = false;
+	}
+	/* state: if c == 0, all chars printable, inquote shortcuts */
 
-	p = (const unsigned char *)s;
-	if (c == 0) {
-		if (inquote) {
-			/* nope, use the shortcut */
-			shf_puts(s, shf);
+	if (c) {
+		/* otherwise, escape control chars */
+		dollarq8(shf, (const unsigned char *)s);
 			return;
 		}
 
-		/* otherwise, quote nicely via state machine */
+	/* can we shortcut? */
+	if (inquote)
+		goto noquoteneeded;
+	/* no */
+ always_single:
+	/* all chars printable, no control chars, quote nicely */
+	inquote = false;
+	p = (const unsigned char *)s;
+
 		while ((c = *p++) != 0) {
 			if (c == '\'') {
-				/*
-				 * multiple single quotes or any of them
-				 * at the beginning of a string look nicer
-				 * this way than when simply substituting
-				 */
 				if (inquote) {
 					shf_putc('\'', shf);
 					inquote = false;
@@ -1433,25 +1476,53 @@ print_value_quoted(struct shf *shf, cons
 			}
 			shf_putc(c, shf);
 		}
-	} else {
+	if (inquote)
+		shf_putc('\'', shf);
+}
+
+#ifdef MKSH_EBCDIC
+#define dollarq_isctrl8(c)	ksh_isctrl(c)
+#else
+#define dollarq_isctrl8(c)	Flag(FASIS) ? ksh_asisctrl(c) : ksh_isctrl(c)
+#endif
+
+#define dollarq_Uctrl(c)	!ctype(c, C_PRINT)
+#define dollarq_isctrlU(c)	UTFMODE ? dollarq_Uctrl(c) : dollarq_isctrl8(c)
+
+/* escape with $'...' (!MKSH_SMALL: in UTFMODE) */
+static unsigned int
+dollarqU(struct shf *shf, const unsigned char *s)
+{
+	unsigned char c;
 		unsigned int wc;
 		size_t n;
+	unsigned int rv = 0;
 
-		/* use $'...' quote format */
 		shf_putc('$', shf);
 		shf_putc('\'', shf);
-		while ((c = *p) != 0) {
-#ifndef MKSH_EBCDIC
-			if (c >= 0xC2) {
-				n = utf_mbtowc(&wc, (const char *)p);
-				if (n != (size_t)-1) {
-					p += n;
+	while ((c = *s) != 0) {
+		if (UTFMODE &&
+		    rtt2asc(c) >= 0xC2U && (n = utf_mbtowc(&wc,
+		    (const char *)s)) != (size_t)-1) {
+			/* valid UTF-8 multibyte character > 0x7F */
+			if ((wc ^ 0x80U) < 0x20U) {
+				/* C1 control character */
 					shf_fprintf(shf, "\\u%04X", wc);
-					continue;
+				rv = 2;
+			} else {
+				/*
+				 * print as-is; we assume the tty DTRT for
+				 * interlinear annotations, LTR/RTL mark,
+				 * U+2028, U+2029, U+2066..U+206F, etc.
+				 */
+				shf_write((const char *)s, n, shf);
 				}
+			s += n;
+			continue;
 			}
-#endif
-			++p;
+		++s;
+		/* single octet */
+		rv |= ctype(c, C_QUOTE | C_SPC);
 			switch (c) {
 			/* see unbksl() in this file for comments */
 			case KSH_BEL:
@@ -1485,6 +1556,7 @@ print_value_quoted(struct shf *shf, cons
 			case KSH_ESC:
 				/* take E not e because \e is \ in *roff */
 				  c = 'E';
+			rv = 2;
 				/* FALLTHROUGH */
 			case '\\':
 				shf_putc('\\', shf);
@@ -1492,12 +1564,8 @@ print_value_quoted(struct shf *shf, cons
 				if (0)
 					/* FALLTHROUGH */
 			default:
-#if defined(MKSH_EBCDIC) || defined(MKSH_FAUX_EBCDIC)
-				  if (ksh_isctrl(c))
-#else
-				  if (!ctype(c, C_PRINT))
-#endif
-				    {
+			  if (dollarq_isctrlU(c)) {
+				rv = 2;
 					/* FALLTHROUGH */
 			case '\'':
 					shf_fprintf(shf, "\\%03o", c);
@@ -1508,10 +1576,8 @@ print_value_quoted(struct shf *shf, cons
 				break;
 			}
 		}
-		inquote = true;
-	}
-	if (inquote)
 		shf_putc('\'', shf);
+	return (rv);
 }
 
 /*

--- End Message ---
--- Begin Message ---
Unblocked.

--- End Message ---
