Re: SPlus balked at Linux; R: statistician's dream
This message addresses several private questions about R from my earlier mail
to the Debian users mail-list.
Of course, the real experts gave us R, knowing answers better than I.
Indeed, I answer these questions from the perspective of a novice.
I include here a little about the R developers, where to get R,
some documentation sources, and at the end some R entries
to get a quick feel for R.
The Comprehensive R Archive Network (CRAN) primary site resides at
http://www.ci.tuwien.ac.at/R/contents.html
because of communication costs at the primary development site in New Zealand
[sorry, I originally credited Australia].
There,
Ross Ihaka ihaka@stat.auckland.ac.nz
and
Robert Gentleman rgentlem@stat.auckland.ac.nz
primarily develop R at the University of Auckland.
The core group of developers now extends to
Peter Dalgaard p.dalgaard@biostat.ku.dk
Kurt Hornik Kurt.Hornik@ci.tuwien.ac.at
Friedrich Leisch Friedrich.Leisch@ci.tuwien.ac.at
Thomas Lumley thomas@biostat.washington.edu
Martin Maechler maechler@stat.math.ethz.ch
Paul Murrell,
Heiner Schwarte,
and
Luke Tierney luke@stat.umn.edu
I repeatedly see the names of others who actively help.
The number of people contributing to R probably equals the number actually
developing SPlus (R is so much like SPlus that our SPlus code has run under R
with nary a change).
I understand SPlus has 90 people, many of whom sell and work on authorization
schemes. I understand perhaps only 10 people actually work on the SPlus code.
They must put much effort into what the SPlus Installation Manual
reflects: its dozens of pages cover almost solely various schemes
to avoid going over your licensed limit.
Even then, their manual did not give a hint how to handle serving but
two licenses to what could be either SunOS or Solaris users.
Only a Unix-like guess and our writing a shell script resolved that problem.
As a statistician, I consider the viability of software by the breadth
(across countries) and count of developers.
I suspect R development efforts match those of SPlus.
Indeed, it was the mere existence of R that indicated to me that
SPlus must be useful (rather like the existence of Octave indicates
the usefulness of Matlab).
There never appeared similar software for the often used "sas"
(Robert Morrison, Oklahoma State University, still has the 76,000
computer cards from just before commercial-sas took the publicly created
sas code).
I see in this "R" development group a level of organization and public access
analogous to that of Debian Linux. Of course, the "R" group solves
statistical/mathematical problems as much as it solves computer problems.
They will be converting to GNU coding standards.
The computer community often funds projects like Debian Linux, and the
community will probably support word-processor and spreadsheet development.
Something like R represents more of a niche market, though every graduate
student must take at least one statistics course. As a niche market, R should
be supported by us statisticians more than other computer software. Such
support would further the ideas behind GNU and not leave barren GNU software
for the niche field of statistics.
R and SPlus run like a Maserati, while other statistical packages like sas
run like elephants, lacking flexibility and cutting edge procedures. I heard
of one engineering college that abandoned sas for SPlus. I am thankful for
statistical packages; I am ecstatic about R.
#########################
The following answers some questions about getting and using "R".
Since I sent my original message to the Debian Users' Mailing List,
I presume you use Debian Linux.
It's difficult searching for software having but one letter, "r", since
you can't reasonably search for "r" or "r-", though you might search
for "r-base".
You can get "R" from a math directory of a Debian mirror site in the
hamm distribution,
sometimes in .../hamm/hamm, sometimes in .../hamm/contrib.
In particular, I installed from the site
ftp://ftp.debian.org/pub/linux/distributions/debian/hamm/
the following four "R" packages
.../hamm/binary-i386/math/r-base_0.61.1-3.deb
.../hamm/binary-i386/math/r-cran_0.61-1.deb
.../hamm/binary-i386/math/r-mlbench_0.61-1.deb
.../non-free/binary-i386/math/r-cran-non-free_0.61-2.deb
You can also get the latest debianized packages from the primary site
ftp://franz.stat.wisc.edu/pub/R/bin/i386-linux/Debian-2.0/*
or from CRAN archive sites (which have more than just Debian packages)
http://www.ci.tuwien.ac.at/R #master site
http://lib.stat.cmu.edu/R/CRAN
ftp://franz.stat.wisc.edu/pub/R
which have the directories
src/base
src/contrib
doc
bin #has binaries for *.deb and *.rpm.
On installation of the Debian packages, most of these "R" packages install
libraries in /usr/lib/R/library, so you have a non-"R" way to see
available libraries.
In R, you load these installed libraries with, eg,
library(stepfun)
and get information about a library with
library(help=stepfun)
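For those who would rather stay inside R than look in /usr/lib/R/library, here is a minimal sketch of listing and loading libraries. I use "stats" only because it ships with current R releases (an assumption about your version); substitute stepfun or whatever you installed.

```r
# List every library (package) R can find; the result is a character vector.
installed <- .packages(all.available = TRUE)
print(installed)

# Show the directories R searches for installed libraries on this machine.
print(.libPaths())

# Load one of them; "stats" is assumed present because it ships with R.
library(stats)
```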
You can access the very good online manual by entering, eg,
help(read.table)
or see commands having the word "print",
apropos("print")
or an html version of help
help.start()
or the latest html version via internet from
http://www.stat.math.ethz.ch/R/manual
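Since apropos() returns an ordinary character vector, you can store and filter its result. The following is only a sketch; the regular-expression patterns are my own illustrations and assume a reasonably current R.

```r
# Store the matching object names rather than just printing them.
hits <- apropos("print")
length(hits)                     # how many object names contain "print"
hits[grep("^print\\.", hits)]    # only the names starting with "print."

# apropos() also accepts a regular expression: names starting with "read."
apropos("^read\\.")
```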
When starting, you really need a brief introduction to R.
Many people prefer the 47 page
Introductory Guide to S-Plus by B.D.Ripley
ftp://markov.stats.ox.ac.uk/SGuide.ps1.z
This is dated 1994, and asks that the reader work with the practice
library "ripley", whose parts appear to come with the r-base debian package.
You won't need "library(ripley)" mentioned in the B.D.Ripley documentation,
but to get data like "trees" (from the "ripley" library)
mentioned by Ripley you need to use instead, eg,
data(trees)
You can then use
attach(trees)
when mentioned in B.D.Ripley.
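As a small sketch of what data(trees) gives you, the lines below inspect the data frame and reach its columns without attach(). The with() function comes from later R releases, so treat its availability as an assumption about your version.

```r
data(trees)       # load the trees data frame into the workspace
names(trees)      # column names: Girth, Height, Volume
dim(trees)        # number of rows and columns

# Without attach(): name the data frame explicitly ...
mean(trees$Girth)

# ... or evaluate an expression inside the data frame using with()
with(trees, mean(Volume / Height))
```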
While the online manual itself is probably the official documentation,
a better beginners guide is the 85 page document,
Notes on R: A Programming Environment for Data Analysis and Graphics
by Bill Venables & Dave Smith
ftp://franz.stat.wisc.edu/pub/R/doc/Rnotes.ps.gz
They still work at converting this document, dated 1997, from S-Plus to R,
so a few comments pertain only to S-Plus.
Note: the authors of the above two documents, Ripley and Venables, produced the
Springer-Verlag "Modern Applied Statistics with S-Plus". All S-Plus and R
authors refer to the standard texts "The New S Language" by Becker, R.A.,
Chambers, J.M., and Wilks, A.R., 1988; and "Statistical Models in S" by
Chambers, J.M. and Hastie, T.J., eds, 1992.
Answers to Frequently Asked Questions can be found at
http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
You can subscribe to the mailing-list r-help@stat.math.ethz.ch
by sending in the "body" (NOT the "subject") "subscribe" to
r-help-request@stat.math.ethz.ch
This mail-list runs about 10 messages a day.
An archive of them can be found at
http://www.ci.tuwien.ac.at/R/doc/mail-archives/
#########################
R is Unix-like, so vi users can enter <esc><k> to go up to previous commands
and then do vi-like editing, just as they do in the "bash" shell.
When you become familiar with R, you will probably prefer to store data in what
R calls a "data frame", often using the function "read.table()".
For example, if you have the file /tmp/zz
alpha,beta,gamma
5,4,3
2,1,10
9,8,7
Then in R you can read in this data, with the first row as labels,
aa <- read.table("/tmp/zz",header=TRUE,sep=",")
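To try this without creating /tmp/zz by hand, here is a self-contained sketch that writes the same three rows to a temporary file (the tempfile is my stand-in for /tmp/zz) and then looks at the resulting data frame.

```r
# Write the three-column example to a temporary file (stand-in for /tmp/zz).
zz <- tempfile(fileext = ".csv")
writeLines(c("alpha,beta,gamma", "5,4,3", "2,1,10", "9,8,7"), zz)

# Read it back, treating the first row as column labels.
aa <- read.table(zz, header = TRUE, sep = ",")
aa                    # print the data frame
str(aa)               # column names and types
aa$gamma              # one column as a vector
aa[aa$alpha > 4, ]    # only the rows where alpha exceeds 4
```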
You can change the decimals printed (though internal precision remains high)
with
options(digits=10)
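A quick sketch of the effect: options() returns the previous settings, so you can restore them afterwards. Only the printed form changes, not the stored value.

```r
old <- options(digits = 3)            # keep the previous setting
short <- capture.output(print(pi))    # pi printed with 3 significant digits
options(digits = 10)
long <- capture.output(print(pi))     # pi printed with 10 significant digits
options(old)                          # restore the earlier setting

short    # "[1] 3.14"
long     # "[1] 3.141592654"
```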
You can edit an existing R object "trees" with
trees.new <- vi(trees)
or
options(editor="vim")
trees.new <- edit(trees)
On exiting, "R" optionally stores your data in
.RData
In this file can also reside the function .First, which runs on "R" startup.
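As a hedged sketch of how a .First function travels with a saved workspace, the lines below save one to a temporary image and load it back. The temporary file and the greeting text are my own illustrations, chosen so that no real .RData gets overwritten.

```r
# Define a startup function; R calls .First() after restoring a workspace.
.First <- function() cat("Workspace restored on", date(), "\n")

# Save it to a workspace image (a temporary file, for illustration only).
img <- tempfile(fileext = ".RData")
save(".First", file = img)

# Load the image into a fresh environment and confirm .First came along.
e <- new.env()
load(img, envir = e)
ls(e, all.names = TRUE)    # all.names=TRUE because the name starts with a dot
e$.First()                 # run it by hand to see the greeting
```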
So that you might get an immediate sense of usability, once installed,
start R with
R
then at the R prompt,
1/3
sqrt(2*pi)
aa <- c(1,3,9) #"c" concatenate to create the vector (1,3,9)
aa # display the aa vector
x <- rnorm(50) #generates 50 pseudo-random numbers
y <- rnorm(x) #generates length(x), ie 50, more pseudo-random numbers
plot(x,y) #plots on a separate window
help(plot) #help on "plot"
identify(x,y) #then click points on the graph to identify them
data() #lists available data sets with which to muck about
data(trees) #includes a trees dataset like in B.D.Ripley
trees #prints a data-frame for trees data
attach(trees) #make trees variables accessible without "trees$Volume"
hist(Girth) #histogram of variable Girth in trees data
pairs(trees) #plots all possible pairwise plots
plot(Girth,Volume) #plot of Girth by Volume
dummy.results <- lm(Volume ~ Girth) #linear regression
summary(dummy.results) #print results stored in dummy.results list
q() #quit
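Once the regression above has run, you will likely want pieces of the fit rather than the whole summary. A minimal sketch follows; the name "fit" and the two girth values are hypothetical, chosen only for illustration.

```r
data(trees)
fit <- lm(Volume ~ Girth, data = trees)   # data= avoids the need for attach()

coef(fit)                  # intercept and slope as a named vector
summary(fit)$r.squared     # proportion of variance explained

# Predicted Volume for two hypothetical girths
new.girths <- data.frame(Girth = c(10, 15))
predict(fit, newdata = new.girths)
```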
--
Jim Burt, NJ9L, Fairfax, Virginia, USA
jameson@mnsinc.com http://www.mnsinc.com/jameson
jameson@pressroom.com
"It is not the shortcomings of others, nor what others have done or not
done that one should think about, but what one has done or not done oneself."
--Dhammapada ["dp" command for quotes from the Dhammapada, in Linux]