Re: SPlus balked at Linux; R: statistician's dream



This message answers several private questions about R prompted by my earlier
mail to the Debian users mailing list.
Of course, the real experts, who gave us R, know the answers better than I do;
I answer these questions from the perspective of a novice.
I include here a little about the R developers, where to get R,
some documentation sources, and at the end some R commands
to get a quick feel for R.

The primary site of the Comprehensive R Archive Network (CRAN) resides at
	http://www.ci.tuwien.ac.at/R/contents.html
because of communication costs at the primary development site in New Zealand
[sorry, I originally credited Australia].
There, 
	Ross Ihaka 		ihaka@stat.auckland.ac.nz
and
	Robert Gentleman 	rgentlem@stat.auckland.ac.nz
primarily develop R at the University of Auckland.
The core group of developers now extends to 
	Peter Dalgaard		p.dalgaard@biostat.ku.dk
	Kurt Hornik		Kurt.Hornik@ci.tuwien.ac.at
	Friedrich Leisch 	Friedrich.Leisch@ci.tuwien.ac.at
	Thomas Lumley		thomas@biostat.washington.edu
	Martin Maechler		maechler@stat.math.ethz.ch
	Paul Murrell, 
	Heiner Schwarte, 
and
	Luke Tierney		luke@stat.umn.edu

I repeatedly see the names of others who actively help.
The number of people contributing to R probably equals the number actually
developing SPlus (R is so much like SPlus that our SPlus code has run under R
with nary a change).

I understand SPlus has 90 people, many of whom sell or work on authorization
schemes.  I understand perhaps only 10 people actually work on the SPlus code.
They must put much effort into what the SPlus Installation Manual
reflects: its dozens of pages cover almost solely various schemes
to keep you from going over your licensed limit.
Even then, their manual gave no hint of how to handle serving just
two licenses to users who could be on either SunOS or Solaris.
Only a Unix-like guess and a shell script of our own resolved that problem.

As a statistician, I judge the viability of software by the count of its
developers and their breadth (across countries).
I suspect R development efforts match those of SPlus.
Indeed, it was the mere existence of R that indicated to me that
SPlus must be useful (rather like the existence of Octave indicates
the usefulness of Matlab).
No similar software ever appeared for the often-used "sas"
(Robert Morrison, Oklahoma State University, still has the 76,000
computer cards from just before commercial-sas took the publicly created
sas code).

I see in this "R" development group a level of organization and public access
analogous to that of Debian Linux.  Of course, the "R" group solves
statistical/mathematical problems as much as it solves computer problems.
They will be converting to GNU coding standards.

The computer community often funds projects like Debian Linux, and the 
community will probably support word-processor and spreadsheet development.  
Something like R represents more of a niche market, though every graduate
student must take at least one statistics course.  As a niche market, R should
be supported by us statisticians more than other computer software.  Such
support would further the ideas behind GNU and not leave the niche field of
statistics barren of GNU software.

R and SPlus run like a Maserati, while other statistical packages like sas
run like elephants, lacking flexibility and cutting-edge procedures. I heard
of one engineering college that abandoned sas for SPlus.  I am thankful for 
statistical packages; I am ecstatic about R.

#########################
The following answers some questions about getting and using "R".
Since I sent my original message to the Debian Users' Mailing List,
I presume you use Debian Linux.
It's difficult searching for software having but one letter, "r", since 
you can't reasonably search for "r" or "r-", though you might search 
for "r-base".
You can get "R" from a math directory of a Debian mirror site in the 
hamm distribution,
sometimes in .../hamm/hamm, sometimes in .../hamm/contrib.
In particular, I installed from the site
	 ftp://ftp.debian.org/pub/linux/distributions/debian/hamm/
the following four "R" packages
	.../hamm/binary-i386/math/r-base_0.61.1-3.deb
	.../hamm/binary-i386/math/r-cran_0.61-1.deb
	.../hamm/binary-i386/math/r-mlbench_0.61-1.deb
	.../non-free/binary-i386/math/r-cran-non-free_0.61-2.deb

You can also get the latest debianized packages from the primary site
	 ftp://franz.stat.wisc.edu/pub/R/bin/i386-linux/Debian-2.0/*
or from CRAN archive sites (which have more than just Debian packages)
	http://www.ci.tuwien.ac.at/R  #master site
	http://lib.stat.cmu.edu/R/CRAN
	ftp://franz.stat.wisc.edu/pub/R
which have the directories
	src/base
	src/contrib
	doc
	bin     #has binaries for *.deb and *.rpm.
On installation of the Debian packages, most of these "R" packages install
libraries in /usr/lib/R/library, so you have a non-"R" way to see 
available libraries.
In R, you load these installed libraries with, eg,
	library(stepfun)
and get information about a library with
	library(help=stepfun)
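A minimal sketch of finding out, from inside R, which libraries you can load. It assumes a reasonably current R in which .libPaths() and .packages(all.available = TRUE) are available; the names "lib.dirs" and "installed" are only illustrative, and the directories reported on your machine may differ from /usr/lib/R/library.

```r
# Directories that R searches for installed libraries
lib.dirs <- .libPaths()
print(lib.dirs)

# Names of every library installed in those directories
installed <- .packages(all.available = TRUE)
print(sort(installed))

# Load a library only if it is actually installed
if ("stats" %in% installed) {
    library(stats)
}
```

Checking against "installed" first avoids the error that library() raises when asked for something that is not there.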
You can access the very good online manual by entering, eg,
	 help(read.table)
or see commands having the word "print",
	 apropos("print")
or an html version of help
	 help.start()
or the latest html version via internet from	
	 http://www.stat.math.ethz.ch/R/manual
When starting, you really need a brief introduction to R.

Many people prefer the 47-page
	Introductory Guide to S-Plus   by B.D.Ripley
	ftp://markov.stats.ox.ac.uk/SGuide.ps1.z
This is dated 1994, and asks the reader to work with the practice
library "ripley", parts of which appear to come with the r-base Debian package.
You won't need the "library(ripley)" mentioned in Ripley's guide,
but to get data like "trees" (from the "ripley" library)
you need to use instead, e.g.,
	data(trees)
You can then use
	attach(trees)
where Ripley's guide calls for it.

While the online manual itself is probably the official documentation,
a better beginner's guide is the 85-page document,
	Notes on R:  A Programming Environment for Data Analysis and Graphics 
	by Bill Venables & Dave Smith
	ftp://franz.stat.wisc.edu/pub/R/doc/Rnotes.ps.gz
The authors are still converting this document, dated 1997, from S-Plus to R,
so a few comments pertain only to S-Plus.

Note: Ripley and Venables, authors of the above two documents, also wrote the
Springer-Verlag book "Modern Applied Statistics with S-Plus".  All S-Plus and R
authors refer to the standard texts "The New S Language" by Becker, R.A.,
Chambers, J.M. and Wilks, A.R., 1988; and "Statistical Models in S" by
Chambers, J.M. and Hastie, T.J. eds, 1992.





Answers to Frequently Asked Questions can be found at
        http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html

You can subscribe to the mailing list  r-help@stat.math.ethz.ch  
by sending the word "subscribe" in the "body" (NOT the "subject") of a message to
	r-help-request@stat.math.ethz.ch
This mailing list runs about 10 messages a day.
An archive of them can be found at
	http://www.ci.tuwien.ac.at/R/doc/mail-archives/



#########################
R is Unix-like, so vi users can enter <esc><k> to go up to previous commands
and then do vi-like editing, just as they do in the "bash" shell.
When you become familiar with R, you will probably prefer to store data in what
R calls a "data frame", often using the function "read.table()".
For example, if you have the file /tmp/zz
	alpha,beta,gamma
	5,4,3
	2,1,10
	9,8,7
Then in R you can read in this data, with the first row as labels,
	 aa <- read.table("/tmp/zz",header=TRUE,sep=",")
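To try this without creating /tmp/zz by hand, the following sketch writes the same three rows to a temporary file first. The use of tempfile() and writeLines(), and the name "zz.file", are my assumptions for illustration, not part of the example above.

```r
# Write the example data to a temporary file (stand-in for /tmp/zz)
zz.file <- tempfile()
writeLines(c("alpha,beta,gamma",
             "5,4,3",
             "2,1,10",
             "9,8,7"), zz.file)

# Read it back, taking the first row as the column labels
aa <- read.table(zz.file, header = TRUE, sep = ",")

names(aa)      # the column labels: alpha, beta, gamma
dim(aa)        # 3 rows, 3 columns
aa$gamma       # one column, as a vector
aa[2, ]        # the second row
mean(aa$beta)  # mean of the beta column

unlink(zz.file)  # remove the temporary file
```

Each column of the data frame is reachable by name with "$", which is what makes data frames convenient for the regression commands shown later.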
You can change the decimals printed (though internal precision remains high) 
with
	options(digits=10)
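A short sketch of the effect, using the built-in constant pi. options() returns the previous settings, so they can be restored afterwards; the names "old", "pi.long" and "pi.short" are illustrative.

```r
old <- options(digits = 10)   # remember the previous setting
pi.long <- format(pi)         # "3.141592654"
print(pi.long)

options(digits = 4)
pi.short <- format(pi)        # "3.142"
print(pi.short)

options(old)                  # restore the previous setting
```

Only the printed form changes; the stored value of pi is the same in both cases.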
You can edit an existing R object "trees" with
	trees.new <- vi(trees)
or
	options(editor="vim")
	trees.new <- edit(trees)
On exiting, "R" optionally stores your workspace in 
	.RData
If that workspace contains a function named .First, "R" runs it on startup.
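As a sketch of that startup hook: .First is an ordinary function kept in the workspace, so you define it like any other object and save the workspace on exit. The body shown here (a digits setting and a greeting) is purely illustrative.

```r
# Define a startup function; R calls it when a saved workspace
# containing it is restored at startup
.First <- function() {
    options(digits = 5)
    cat("Workspace restored at", date(), "\n")
}

# You can also call it by hand to check that it works
old <- options("digits")   # remember the current setting before testing
.First()
options(old)               # put the setting back
```

If you answer yes when q() asks whether to save the workspace, .First is saved along with your data.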




So that you might get an immediate sense of usability, once installed,
start R with
	R
then at the R prompt,
	1/3
	sqrt(2*pi)
	aa <- c(1,3,9)	#"c" concatenates its arguments into the vector (1,3,9)
	aa		# display the aa vector
	x <- rnorm(50)	#generates 50 pseudo-random normal numbers
	y <- rnorm(x)	#another 50: the length of the vector x sets the count
	plot(x,y)	#plots on a separate window
	help(plot)	#help on "plot"
	identify(x,y)	#then click points on the graph to identify them 
	data()		#lists available data sets with which to muck about 
	data(trees)	#includes a trees dataset like in B.D.Ripley 
	trees		#prints a data-frame for trees data
	attach(trees)	#make trees variables accessible without "trees$Volume"
	hist(Girth)	#histogram of variable Girth in trees data
	pairs(trees)	#plots all possible pairwise plots 
	plot(Girth,Volume)	#plot of Girth by Volume 
	dummy.results <- lm(Volume ~ Girth)	#linear regression
	summary(dummy.results)	#print results stored in dummy.results list
	q()		#quit
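Once the regression above has run, you will probably want more than the printed summary. The sketch below is one way to pull numbers out of the fitted object; it passes the data frame through the "data" argument instead of using attach(), and the names "fit", "new.girth" and "pred" are illustrative. It assumes the "trees" data set that ships with R, with columns Girth and Volume.

```r
data(trees)

# Fit the same regression without attaching the data frame
fit <- lm(Volume ~ Girth, data = trees)

coef(fit)                # intercept and slope
residuals(fit)[1:5]      # first few residuals
summary(fit)$r.squared   # proportion of variance explained

# Predicted Volume for some new Girth values
new.girth <- data.frame(Girth = c(10, 15))
pred <- predict(fit, newdata = new.girth)
print(pred)
```

Using "data = trees" keeps the variable names out of your search path, which avoids surprises if you already have objects called Girth or Volume.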





-- 
Jim Burt, NJ9L,		Fairfax, Virginia, USA
jameson@mnsinc.com	http://www.mnsinc.com/jameson
jameson@pressroom.com

"It is not the shortcomings of others, nor what others have done or not
 done that one should think about, but what one has done or not done oneself."
--Dhammapada   ["dp" command for quotes from the Dhammapada, in Linux]




