Re: detect shell script language

To: Lorenzo Bettini <bettini@dsi.unifi.it>
Cc: Debian User Mailing List <debian-user@lists.debian.org>
Subject: Re: detect shell script language
From: Bob McGowan <bob_mcgowan@symantec.com>
Date: Tue, 05 Sep 2006 12:22:17 -0700
Message-id: <44FDCE69.6010206@symantec.com>
In-reply-to: <44FB091E.4020007@dsi.unifi.it>
References: <44F7E442.4080503@dsi.unifi.it> <44F7F0AA.2070000@cox.net> <44F8015D.3060006@dsi.unifi.it> <b400c69a0609011145o499265b6g2be2bbb59869e356@mail.gmail.com> <44FB091E.4020007@dsi.unifi.it>

Some general comments about how I understand this stuff to work. They are a bit long, perhaps, but they should put us all on the same starting line. Also, because I was out of the office, I missed some of the earlier mails, so I will most likely repeat some things that have already been said. My apologies in advance for the possibly excess verbiage.

1. The she/bang (#!) first line. This is a 'magic' value, used by the system's 'exec' family of system calls to determine how to execute the file. In the 'old' days, before '#!' was understood by the system (I'm familiar with AT&T Version 7 UNIX for this), when a user typed in a command, the interactive shell would immediately pass it off to 'exec' to execute. This is fine for binary executables, but would fail on a script (or any text type) file. So, on return from exec with an error status, the shell would fork a copy of itself to try to run the script.
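
For a highlighter, the useful part of this is simply reading the first line. A minimal sketch in plain sh, assuming that only the interpreter's basename matters; the helper name shebang_interp is made up for illustration and is not part of any existing tool:

```shell
# Print the basename of the interpreter named on FILE's "#!" line.
# Returns non-zero if the first line is not a "#!" line.
# shebang_interp is a made-up helper name for this sketch.
shebang_interp() {
    first=
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    case $first in
        '#!'*) ;;
        *) return 1 ;;
    esac
    set -f                    # no filename expansion while splitting
    set -- ${first#??}        # drop "#!" and split the rest into words
    set +f
    [ $# -gt 0 ] || return 1
    interp=${1##*/}
    # "#!/usr/bin/env perl": the interpreter is the word after env
    if [ "$interp" = env ] && [ $# -gt 1 ]; then
        interp=${2##*/}
    fi
    printf '%s\n' "$interp"
}
```

The 'env' case is handled because many scripts name their interpreter that way; anything more exotic on the first line is left alone.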

1a. As a result of the above, it was hard to tell whether a script was written for the Bourne shell (sh) or the C shell (csh), so the convention was introduced of using the Bourne shell no-op command (:) as the first line of a Bourne shell script. This convention can still be found in Oracle's Bourne scripts, even as recent as Oracle 10.2 for Solaris (a Linux Oracle 10.2 install has mostly she/bang format, but at least one came up with a colon character on the first line).
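
Checking for the colon convention is a one-line test on the first character. A small sketch, again in plain sh, with a made-up helper name (has_colon_header):

```shell
# Succeed if FILE's first character is the Bourne no-op ':'.
# has_colon_header is a made-up helper name for this sketch.
has_colon_header() {
    first=
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    case $first in
        :*) return 0 ;;
        *)  return 1 ;;
    esac
}

# ':' does nothing and succeeds, so it is harmless as a first line.
: this text is only arguments to the no-op
```

A file that passes this test can reasonably be highlighted as Bourne shell, assuming the convention described above was followed.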

2. Similar tactics are used by Perl and Tcl/Tk (tclsh/wish) scripts to cause execution of the correct interpreter: the script begins as shell code with an 'exec' line that re-runs the file under the desired interpreter. This is based on the fact that, at least for the Bourne shell and its derivatives, execution and script validation are essentially concurrent, so the shell never even gets to the line following the 'exec' line. The exec is a perfectly legal script command, which causes the desired scripting language engine (perl, tclsh, wish) to get run. Since these languages allow script commands to cover multiple lines, they see the 'exec' as part of something that is never run (an 'if' test that fails or, for Tcl, a continued comment), so they never execute the 'exec', and proceed to interpret the rest of the script.
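
The claim that the shell never reaches the line after the 'exec' can be checked with plain sh, without Perl or Tcl installed. In this sketch the function name exec_handoff_demo and the temporary file are made up; 'echo' stands in for the real interpreter:

```shell
# exec_handoff_demo is a made-up name; it writes a two-line script whose
# second line is deliberately not valid shell, then runs it with sh.
exec_handoff_demo() {
    demo=$(mktemp) || return 1
    cat > "$demo" <<'EOF'
exec echo "handed off before line 2 was read"
if 0; this line is never parsed by the shell (((
EOF
    sh "$demo"
    status=$?
    rm -f "$demo"
    return "$status"
}

exec_handoff_demo
```

The script exits cleanly because 'exec' replaces the shell before the second line is ever read. Checking the same file with 'sh -n' would report a syntax error, which is exactly why a whole-file shell parse is of little use on these mixed scripts.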

3. Modern Bourne-derived shells are designed to be as compatible as possible with the original 'sh', so there is no easy way to differentiate between them. The same applies to any shell derived from 'csh' (tcsh, etc.). zsh, on the other hand, is a beast I know little about, but based on the man page, it appears to be a Bourne-compatible shell. In any case, the highlighting for these should be the same, so there is no need to sweat over differentiating them.


Enough background.

Since all modern UNIX/Linux systems support the she/bang functionality, I think you'll find your best option is to use it to begin with. But it would be a good idea, I believe, to also look for that archaic ':' as the first character of a file (the file command reports these as 'shell archive or script for antique kernel text'). A suggestion in one of the emails I did see, to use 'file' to help sort things out, is a good idea, as the command is pretty good at this (of course, as you have access to the source for 'file', you may be able to incorporate its heuristics directly in your code). But this is not a panacea; 'file' can be confused. A file with this content:


  exec "/usr/local/bin/perl" $0

and with execute permission set, will run (legal shell code, but illegal Perl, so there's an error from Perl about it). And 'file' just calls it an 'ASCII text file'.

And the above is no help for the cases mentioned in paragraph 2. Looking for a line with 'exec' alone is not enough; you would need to check whether the text following it looks like a command to execute. This is because Bourne-style shells allow you to open/close/reopen files and file descriptors using 'exec', for example:


  #!/bin/sh
  exec 3<message.file 4>errors.out
  ...
  echo error condition >&4

  while read input
  do
    ...
  done <&3
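
To make the distinction concrete, here is a rough sketch that classifies a single line. The name classify_exec is made up, and the word splitting is deliberately naive (it ignores shell quoting rules), so treat it as a starting point rather than a parser:

```shell
# Classify one line of shell text (classify_exec is a made-up name):
#   "redirect"      exec used only to open/close file descriptors
#   "command:NAME"  exec followed by something that looks like a command
#   "other"         the line does not start with exec
classify_exec() {
    set -f                       # no filename expansion while splitting
    set -- $1                    # naive split on whitespace (ignores quoting)
    set +f
    if [ "${1-}" != exec ]; then
        echo other
        return
    fi
    shift
    skip_next=0
    for word in "$@"; do
        if [ "$skip_next" -eq 1 ]; then
            skip_next=0
            continue
        fi
        case $word in
            *'<' | *'>')   skip_next=1 ;;  # "3<" with the file as next word
            *'<'* | *'>'*) ;;              # "3<message.file", "4>errors.out"
            *)
                word=$(printf '%s' "$word" | tr -d "\"'")
                echo "command:${word##*/}"
                return
                ;;
        esac
    done
    echo redirect
}
```

With this, the 'exec 3<message.file 4>errors.out' line above is reported as a redirect, while the earlier 'exec "/usr/local/bin/perl" $0' line is reported as a command named perl. An exec of a variable comes back as the unexpanded variable name.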

And, of course, there are the special cases where the script is for two interpreters. I use this to first run a shell script environment to set things up for Perl (ORACLE_HOME, LD_LIBRARY_PATH, etc.) for different systems (Linux, Solaris, Cygwin), and then do an 'exec $PERL' at a later point (around line 90, IIRC). So, in this case, most of the script is Perl code, but it starts out as shell code. And the 'exec' line uses a variable, so it's not clear from the line alone what is being exec'd.
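
One partial workaround for the variable case is to look back through the file for a literal assignment to that variable. A sketch under strong assumptions: the variable is set with a plain NAME=value line, and the last such line is the one that counts, which will not hold if the value is computed or chosen per system. The name resolve_var is made up:

```shell
# Print the basename of the last literal value assigned to NAME in FILE.
# Usage: resolve_var FILE NAME   (resolve_var is a made-up helper name)
resolve_var() {
    line=$(grep -E "^[[:space:]]*(export[[:space:]]+)?$2=" "$1" | tail -n 1)
    [ -n "$line" ] || return 1
    value=${line#*=}                               # text after the first '='
    value=$(printf '%s' "$value" | tr -d "\"';")   # drop quotes and ';'
    set -f
    set -- $value                                  # keep the first word only
    set +f
    [ $# -gt 0 ] || return 1
    printf '%s\n' "${1##*/}"
}
```

If no assignment is found the function fails, and the highlighter would have to fall back on some other heuristic.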

But now, enough is enough. I hope this is helpful to you in figuring out what you need to do, and that it points out some of the pitfalls to watch out for.


Good luck,

Bob

Lorenzo Bettini wrote:

> Maxim Vexler wrote:
> > (I'm thinking out loud here)
>
> go ahead :-)
>
> > How about identifying patterns specific to each shell, and then
> > implementing an algorithm that would produce a score for each shell
> > match. The one with the highest score will be the one used by
> > src-highlite. This perhaps should be a standalone utility/lib, a fact
> > that would allow it to be used in other implementations besides
> > src-highlite.
>
> indeed I was thinking about something similar; the problem is that I should restrict it to shell scripts, since otherwise I should check against all the possible languages handled by source-highlight and that would be inefficient.
> I should know more about script languages though, which is not the case (shame on me! ;-). However, I was thinking also of letting the user provide his own regular expressions to detect a language, and that could be then enjoyed also by other users.
>
> > BTW, src-highlite is great. Thank you Lorenzo for adding another tool
> > to my already unbelievably huge free software tools arsenal.
>
> WOW!  Thank you!  :-D
>
> I'll let you know when I release this new version of source-highlight!
> And by the way, if you use some language which is still not handled by source-highlight, and would like to add it, please let me know and we can work it out!
> cheers
>     Lorenzo
