Re: Bug#38107: new patch for man-db japanese support (need help for other langs)



[crossposted to debian-i18n because I need people using other charsets to
complete the table in the patch and to test whether, and how well, it works
for their languages/charsets.]

Here is an attempt to make man-db display pages written in charsets
other than latin1 correctly, each in its own charset.
Apply the patch to the man-db -69i sources and recompile.
You don't need to build a new package or change your machine: run
debian/rules debug (sudo must be installed) and you'll get the new binary
as src/man with the correct permissions.
If you want to debug it, use   sudo gdb src/man   to get it working, as
gdb croaks on setgid binaries.

To update the table, add a new line _before_ the one with the asterisk:
put your LANG value in the first column, the groff driver you use
(the device in man -Tdevice ...) in the second, and the value of the
LESSCHARSET environment variable in the third.
With that, if man chooses to display a page under .../man/ja/man1/
(a choice driven by LANG), it will apply the values from the corresponding
line of the table, and 
	man foo
should work as if you issued 
	LESSCHARSET=ja man -Tnippon foo
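
To make the intended lookup concrete, here is a minimal stand-alone sketch
of it. It is not the code from the patch: the struct name lang_entry and
the function lookup_lang are made up for illustration, and only the three
table lines I already have are included. The first line whose LANG column
is a prefix of the language wins, and the "*" line catches everything else,
which is why new lines must go before it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone copy of the table; names are illustrative. */
struct lang_entry {
	const char *lang;	/* language directory / LANG prefix */
	const char *device;	/* groff output device */
	const char *charset;	/* value for LESSCHARSET */
};

static const struct lang_entry lang_table[] = {
	{ "ja", "nippon", "ja"     },
	{ "cs", "latin2", "latin2" },
	{ "*",  "latin1", "latin1" },	/* catch-all, must stay last */
	{ NULL, NULL,     NULL     }
};

/*
 * Return the first entry whose lang column is a prefix of `lang`,
 * or the "*" entry if no earlier line matches.
 */
static const struct lang_entry *lookup_lang(const char *lang)
{
	const struct lang_entry *e;

	assert(lang != NULL);
	for (e = lang_table; e->lang; e++) {
		if (e->lang[0] == '*'
		    || strncmp(e->lang, lang, strlen(e->lang)) == 0)
			return e;	/* stop at the first match */
	}
	return NULL;	/* only reachable if the "*" line is removed */
}
```

With this table, lookup_lang("ja_JP.ujis") returns the nippon line, since
only the leading "ja" is compared, and lookup_lang("fr") falls through to
the latin1 line.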


On Fri, Sep 10, 1999 at 03:46:57PM +0900, Fumitoshi UKAI wrote:
> At Fri, 10 Sep 1999 07:00:45 +0300, Fabrizio Polacco <fab@prosa.it> wrote:
> 
> > The problem here is that we are using the roff_device for specifying the
> > charset.
> > The roff_device is the renderer of the page.
> > It is not that the roff_device is "latin1" because the page is written in
> > ascii, but because the page will be displayed on an ascii device.
> > The same page can be rendered in postscript (which is the default) and
> > displayed on a postscript display or printer.
> 
> Ah, I understand.  The roff_device is used to select output device.
> It should not be selected by the charset of manpages as I said. Sorry.
> 

So, here is a new patch, completely different from the previous one.
It uses the same table, but the table is now consulted directly while
building the roff command line, and can be overridden by options and
environment variables.

Please try it for Japanese pages. It should use the nippon device for pages
found under a /man/ja/man?/ path, and latin1 for the /man/man?/ path,
independently of the LANG variable.
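
As an illustration of the path-based selection, here is a simplified
sketch of how the language element can be pulled out of a page's path.
It is not the lang_dir() function from the patch below: the name
lang_from_path and the caller-supplied buffer are assumptions made to keep
the example self-contained, and it only handles a language directory
sitting directly between /man/ and the section directory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the language directory of a man page path into buf and return 1,
 * or leave buf empty and return 0 when the path has no language element.
 * Hypothetical helper, not the function from the patch.
 */
static int lang_from_path(const char *path, char *buf, size_t buflen)
{
	const char *start, *end;
	size_t len;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	if (!path)
		return 0;

	start = strstr(path, "/man/");
	if (!start)
		return 0;
	start += 5;			/* skip "/man/" */

	end = strchr(start, '/');	/* end of the candidate element */
	if (!end || end == start)
		return 0;

	/* the next element must be a section directory such as "man1/" */
	if (strncmp(end, "/man", 4) != 0)
		return 0;
	if (end[4] == '\0' || !strchr("123456789lno", end[4]) || end[5] != '/')
		return 0;

	len = (size_t)(end - start);
	if (len >= buflen)
		return 0;		/* does not fit, treat as no language */
	memcpy(buf, start, len);
	buf[len] = '\0';
	return 1;
}
```

For /usr/share/man/ja/man1/ls.1 this yields "ja"; for
/usr/share/man/man1/ls.1 it reports no language element, which corresponds
to the latin1 default.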

The problem is: will this approach work also for other charsets, like
latin2, greek, cyrillic or korean?
Do they have a different groff driver?

As you see, I need info to fill in the table.

fab
-- 
| fab@pukki.ntc.nokia.com     fpolacco@prosa.it    fpolacco@debian.org
| 6F7267F5 fingerprint 57 16 C4 ED C9 86 40 7B 1A 69 A1 66 EC FB D2 5E
| fabrizio.polacco@nokia.com                  gsm: +358 (0)40 707 2468
--- /var/tmp/man.c	Wed Sep  8 16:15:44 1999
+++ src/man.c	Sun Sep 12 20:51:07 1999
@@ -142,6 +142,18 @@
 #  define STDERR_FILENO 2
 #endif
 
+char * lang;
+struct {
+	char *	lang;
+	char *	device;
+	char *	charset;
+} lang_table[] =	{
+	/* LANG		roff_device	LESSCHARSET */
+	{ "ja"		, "nippon"	, "ja"		},
+	{ "cs"		, "latin2"	, "latin2"	},
+	{ "*"		, "latin1"	, "latin1"	},
+	{ 0		, 0		, 0		} };
+
 /* external formatter programs, one for use without -t, and one with -t */
 #define NFMT_PROG "./mandb_nfmt"
 #define TFMT_PROG "./mandb_tfmt"
@@ -317,6 +329,37 @@
 }
 #endif /* MAN_CATS */
 
+char * lang_dir( char * filename)
+{
+	char *ld;	/* the lang dir: point to static data */
+	char *fm;	/* the first "/man/" dir */
+	char *sm;	/* the second "/man?/" dir */
+
+	ld = "";
+	if ( ! filename ) 
+		return ld;
+
+	if ( ! (fm = strstr( filename, "/man/")) )
+		return ld;
+	if ( ! (sm = strstr( 3+fm, "/man")) )
+		return ld;
+	if ( sm == 4+fm )
+		return ld;
+	if ( ! sm[4] || ! strchr( "123456789lno", sm[4]) )
+		return ld;
+	if ( sm[5] != '/' )
+		return ld;
+	/* found a lang dir */
+	fm += 5;
+	if ( ! (sm = strchr( fm, '/')) )
+		return ld;
+	ld = xstrdup ( fm);
+	ld[sm-fm] = '\0';
+	if (debug)
+		fprintf (stderr, "found lang dir element %s\n", ld);
+	return ld;
+}
+
 static __inline__ void gripe_system (char *command, int status)
 {
 	error (CHILD_FAIL, 0, _( "command exited with status %d: %s"), status, command);
@@ -841,8 +884,6 @@
 	if (optind == argc)
 		gripe_no_name (NULL);
 
-	putenv("LESSCHARSET=latin1");
-
 	signal( SIGINT, int_handler);
 
 	/* man issued with `-l' option */
@@ -1285,6 +1326,27 @@
 		char *dev;	/* either " -T<mumble>" or "" */
 		int using_tbl = 0;
 
+		/* load the roff_device value dependent on the language dir in path */
+		if ( ! roff_device ) {
+			if ( ! lang || ! *lang ) {
+				roff_device = "latin1";
+			} else {
+				int j;
+				for ( j=0; lang_table[j].lang; j++ ) {
+					if (( strncmp( lang_table[j].lang, lang
+						, strlen( lang_table[j].lang)) == 0 )
+					||  ( lang_table[j].lang[0] == '*' )) {
+						roff_device = lang_table[j].device;
+						troff = 1;
+						putenv( strappend ( NULL
+							,"LESSCHARSET="
+							, lang_table[j].charset
+							, NULL));
+						break;
+					}
+				}
+			}
+		}
 		/* tell grops to guess the page size */
 		if ( roff_device && strcmp( roff_device, "ps") == 0 )
 			roff_device = strappend( NULL, "ps -P-g ", NULL);
@@ -2028,6 +2090,7 @@
 
 			if (debug)
 				fprintf (stderr, "found ultimate source file %s\n", man_file);
+			lang = lang_dir (man_file);
 
 			cat_file = find_cat_file (path, man_file, sec);
 			found += display (path, man_file, cat_file, title);
@@ -2135,6 +2198,7 @@
 
 			if (debug)
 				fprintf (stderr, "found ultimate source file %s\n", man_file);
+			lang = lang_dir (man_file);
 
 			cat_file = find_cat_file (path, man_file, in->ext);
 			found += display (path, man_file, cat_file, title);
