
Re: good multivariate regression packages?



On 26 November 2005 at 00:42, Maciej Kalisiak wrote:
| On 11/25/05, Dirk Eddelbuettel <edd@debian.org> wrote:
| just the one whose plot I've linked to, and so in general I will not
| have a good idea of the underlying model.  I would expect that quite
| often it will not be linear.

That would point in favour of R, as you can experiment easily ...

| I should mention that this has led me to believe I should be looking
| at nonparametric models, and have thus looked at using libsvm (not in
| Debian), a Support Vector Machine library/tools.  These seemed

The e1071 package on CRAN has (some) SVM support. As I recall, there was also
an R News article on it.

Also, the 'CRAN Task Views' may be of interest, see

  http://cran.r-project.org/src/contrib/Views
  http://cran.r-project.org/src/contrib/Views/MachineLearning.html

which mentions the kernlab and klaR packages as having SVM code.

| regression, but I encountered two issues.  1) I had the feeling that R
| is not particularly lean, in that it is voracious with regards to
| memory, especially when dealing with large data sets, and is not

R is designed as an interactive environment for programming with data, as
well as a language. Leanest possible resource usage isn't a goal; some batch
processors may suit you better there. That said, R will have more methods.
Resource consumption has improved vastly over the last few releases, the
language is pretty fast when used well (vectorise, vectorise, vectorise :),
and it offers easy, well-documented ways to link in C/C++/Fortran/... code.

| particularly speedy; 2) there is an outright overabundance of
| regression methods in R, to the point where I am drowning in
| information; with my meagre knowledge of regression methods I cannot
| assess which methods would be most appropriate to my task, what the
| tradeoffs, advantages and disadvantages are, etc.  I made a similar

Oh come on, that makes R even better: the methods are there, as are the
documentation and pointers to further references.

If you took your argument to its conclusion, you'd draw a straight regression
line and be done with it, "because that's all we know and care about". Not.
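To make the point concrete, here is a minimal sketch in Python rather than R,
assuming hypothetical noise-free data from y = sin(x). It fits an ordinary
least-squares line and a Nadaraya-Watson kernel smoother (one simple
nonparametric method, picked here purely for illustration; the Gaussian
kernel and the bandwidth of 0.3 are arbitrary assumptions) and compares
their mean squared errors. On nonlinear data the straight line misses badly.

```python
import math

def ols_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def kernel_smoother(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a Gaussian-weighted average of ys."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def mse(pred, ys):
    """Mean squared error between predictions and observed values."""
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)

# Hypothetical data: 100 evenly spaced points of y = sin(x) on [0, 2*pi].
xs = [2 * math.pi * i / 99 for i in range(100)]
ys = [math.sin(x) for x in xs]

a, b = ols_line(xs, ys)
line_pred = [a + b * x for x in xs]
smooth_pred = [kernel_smoother(xs, ys, x, bandwidth=0.3) for x in xs]

print(f"straight line MSE:   {mse(line_pred, ys):.4f}")
print(f"kernel smoother MSE: {mse(smooth_pred, ys):.4f}")
```

The smoother makes no assumption about the shape of the curve, only that
nearby x values have similar y values. The bandwidth controls the trade-off
between bias (too wide) and variance (too narrow), which is exactly the kind
of tuning question the references below discuss.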

| posting to the R users list, but it did not yield any replies.

Well-articulated queries always get responses, whereas scatter-shot questions
("I have this data set, can you tell me what model to use?") have a harder
time, understandably.

| Because of these two issues I was thinking I might have more luck with
| a more specialized package, one which centers only on regression, as
| it might be more optimized, and have better regression-specific
| documentation...

The statisticians on the r-help list would probably advise you to consult
with a statistician. They do have a point ...
 
Lacking that, you could always peruse books like Hastie, Tibshirani,
Friedman, "The Elements of Statistical Learning" but you probably already
knew that :)

| > Greetings back to Ontario, and good luck,  Dirk
| 
| Whereabouts in Ontario?  Just curious.  Hmm, is there a map

Kingston 93/94/95 to 97, Toronto 1997 to 2000. I lived right north of U of T
in the Annex :) Now it's Chicago.

| illustrating the geo-distribution of Debian developers?  Wondering how
| many of you are hiding in this neck of the woods...  :)

Yes, there have been maps though I don't have a URL ready.

Cheers, Dirk

-- 
Statistics: The (futile) attempt to offer certainty about uncertainty.
         -- Roger Koenker, 'Dictionary of Received Ideas of Statistics'
