
Re: Request for Comments/Feedback



Okay, people don't like tarballs.  :-)

I don't know if "testing" is an adequate description of the 
feedback I was looking for.  Testing is certainly part of it.  
I will test with what data I have, but finding other data sets 
would be nice.

Perl and CPAN have established names and namespaces for many kinds 
of modules.  I am not sure if the names I am proposing are 
correct:

Map::MQTree
Map::IDW
Math::BrentOO
Statistics::Outlier::Peirce
Statistics::CV::LOO

While Inverse Distance Weighting is a generic means of 
interpolation, and is often used in GIS work, I am not really a 
fan of the algorithm.  If it were a good interpolation method, it 
would probably belong in the Math:: or Algorithm:: namespaces.  If it 
is only going to be used for maps, having it in Map:: makes sense.  
And I would consider replacing Map:: with GIS::.  However, I can 
see this work also being useful for working with surface chemistry 
in materials science.  It is probably useful in other places as 
well.
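
For anyone not familiar with Inverse Distance Weighting, the idea 
can be sketched in a few lines.  This is only an illustration in 
Python, not the Map::IDW code or its interface; the function name, 
the (x, y, z) sample layout and the default power of 2 are all 
assumptions made for the example.

```python
import math

def idw(samples, x, y, power=2.0):
    """Estimate z at (x, y) from (sx, sy, sz) samples by inverse distance weighting."""
    num = 0.0
    den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            # Exactly on a sample point: return its value and avoid dividing by zero.
            return sz
        w = 1.0 / d ** power   # nearer samples get larger weights
        num += w * sz
        den += w
    return num / den
```

A query point midway between two samples gets the average of their 
values, and a query point on top of a sample gets that sample's 
value back.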

The main module of this work is Map::MQTree.  Instead of pivoting 
on the centre of a region, and dividing things into 4 equally 
sized sub-regions, this pivots on the median (X,Y), and divides 
things (almost) equally population-wise (quadrant 1 and 3 have 
about the same population, and quadrant 2 and 4 have about the 
same population).  This is an extension of an idea from an existing 
module in the Algorithm:: namespace which deals with QuadTrees.
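
A minimal sketch of one level of that median split, written in 
Python purely for illustration.  It is not the Map::MQTree code; 
the function name, the quadrant numbering (1 = upper right, then 
counter-clockwise) and the handling of points that fall exactly on 
the pivot are assumptions.

```python
from statistics import median

def median_split(points):
    """Split (x, y) points into four quadrants around the median (x, y)."""
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    quads = {1: [], 2: [], 3: [], 4: []}
    for x, y in points:
        if x >= mx and y >= my:
            quads[1].append((x, y))   # upper right
        elif x < mx and y >= my:
            quads[2].append((x, y))   # upper left
        elif x < mx and y < my:
            quads[3].append((x, y))   # lower left
        else:
            quads[4].append((x, y))   # lower right
    return (mx, my), quads
```

Because each median halves the points along its own axis, the left 
and right halves and the upper and lower halves are about equal in 
size, which is what forces the diagonal pairs (1 and 3, 2 and 4) to 
have about the same population.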

The data problem I am first looking at is similar to what farmers 
are going to have: they have traversed the field N times with GPS 
(maybe even DGPS), and they want to build an accurate topographic 
model of a piece of land (not necessarily rectangular).  I just 
happened to be using a lawnmower (6 foot cut) on about 1.8 
hectares.  And after cutting that lawn 7 times, I have about 18k 
points, which is about 1 point per square meter.

I want to iteratively remove the elevation component from the data 
as the data is processed in the QuadTree development.  To do this, 
I will fit a thin plate spline to selected data, at each level of 
the quad tree.  By the time the number of data points in a 
quadrant drops below 61 points, hopefully most of the elevation 
data has become near Gaussian, so that applying Peirce's Criterion 
for detecting outliers works (once per subtree).  In order to fit 
the thin plate spline to selected data in the quadrant, I am going 
to use Leave One Out Cross Validation.  And I'll use Brent's 
Method to find the best value for the smoothing constant 
(regularization constant) of the thin plate spline.
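
The Leave One Out step can be sketched generically as below.  This 
is Python for illustration only, not Statistics::CV::LOO; the names 
are hypothetical, and the "model" is a deliberately trivial 
stand-in (a constant surface shrunk toward zero) rather than a thin 
plate spline, so that the example stays self-contained.

```python
def loo_cv_score(samples, fit, lam):
    """Mean squared leave-one-out prediction error for smoothing value lam."""
    total = 0.0
    for i, (x, y, z) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]   # every sample except the i-th
        predict = fit(train, lam)               # refit without the held-out point
        total += (predict(x, y) - z) ** 2       # error at the held-out point
    return total / len(samples)

def fit_shrunken_mean(train, lam):
    """Stand-in model (an assumption): constant surface, shrunk as lam grows."""
    level = sum(z for _, _, z in train) / (len(train) + lam)
    return lambda x, y: level
```

The score is then treated as a function of the smoothing constant 
alone, and a one-dimensional minimizer is asked for the value that 
makes it smallest.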

The above set of proposed modules is complete, except for the 
cross validation module.  I couldn't see how I could use 
J.A.R. Williams' original Math::Brent to find the best smoothing 
constant in its current form, especially since there was no way 
to limit the bracketing process.  And as near as I can tell, there 
is no meaning to a negative smoothing constant for thin plate 
splines.
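
To show what I mean by limiting the search, here is a sketch of a 
minimizer that never evaluates the function outside a given 
interval.  It is Python for illustration, and it uses plain golden 
section search rather than Brent's Method (which adds parabolic 
interpolation on top of golden section steps).  The function name 
and the tolerance are assumptions, and this is not the interface 
of Math::Brent or Math::BrentOO.

```python
import math

def minimize_bounded(f, lo, hi, tol=1e-6):
    """Golden-section search for a minimum of f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```

With lo set to 0, a function whose unconstrained minimum is 
negative ends up at (very nearly) 0 instead of wandering into 
negative smoothing constants.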

Brent is done, so now I can go back to cross validation, to see if 
I can figure out how to make this work through inheritance.  But, 
being an old FORTRAN programmer from way back, OO often drives me 
nuts.

As mentioned in the first note, if people know of data (nominally 
uniform, or highly condensed (such as travelling bicycle paths)) 
that has different error characteristics than the tracklogs from 
my Garmin Summit produce (which are not a directed random walk, 
since the unit feeds magnetic compass and barometric sensor inputs 
into its Kalman filter solution), that would be useful.

Thanks.
Gord

