Re: Request for Comments/Feedback
Okay, people don't like tarballs. :-)
I don't know if testing is an adequate description for the
feedback I was looking for. Testing is certainly part of what
would be nice. I will test with what data I have, but finding
other data sets would be nice.
Perl and CPAN have established names and namespaces for many kinds
of modules. I am not sure if the names I am proposing are
correct:
Map::MQTree
Map::IDW
Math::BrentOO
Statistics::Outlier::Peirce
Statistics::CV::LOO
While Inverse Distance Weighting is a generic means of
interpolation, and is often used in GIS work, I am not really a
fan of the algorithm. If it were a good interpolation method, it
would probably belong in the Math:: or Algorithm:: namespaces. If it
is only going to be used for maps, having it in Map:: makes sense.
And I would consider replacing Map:: with GIS::. However, I can
see this work also being useful for working with surface chemistry
in materials science. It is probably useful in other places as
well.
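For anyone not familiar with IDW, here is a minimal sketch of the
idea. It is written in Python purely for illustration and is not the
code in Map::IDW; the function name idw and the default power of 2
are assumptions made for this example.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    points is a list of (px, py, value) samples. The name, the
    argument order and the default power are illustrative only.
    """
    num = 0.0
    den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            # Query point coincides with a sample: return it directly.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
estimate = idw(samples, 1.0, 0.0)  # midway between the two samples
```

Nothing in this depends on the data being a map, which is the reason
for the namespace question.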
The main module of this work is Map::MQTree. Instead of pivoting
on the centre of a region, and dividing things into 4 equally
sized sub-regions, this pivots on the median (X,Y), and divides
things (almost) equally population-wise (quadrant 1 and 3 have
about the same population, and quadrant 2 and 4 have about the
same population). This is an extension of an idea from an existing
module in the Algorithm:: namespace which deals with QuadTrees.
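A rough sketch of one level of the median split, in Python for
illustration only. This is not the Map::MQTree code; the function
name median_split and the rule for points lying exactly on a median
line (they go to the higher side) are assumptions.

```python
from statistics import median

def median_split(points):
    """Split (x, y) points into four quadrants about the median (X, Y).

    Quadrants are numbered counter-clockwise from the upper right.
    Points on a median line are assigned to the >= side, which is an
    arbitrary choice made for this sketch.
    """
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    quads = {1: [], 2: [], 3: [], 4: []}
    for x, y in points:
        if x >= mx and y >= my:
            quads[1].append((x, y))
        elif x < mx and y >= my:
            quads[2].append((x, y))
        elif x < mx and y < my:
            quads[3].append((x, y))
        else:
            quads[4].append((x, y))
    return (mx, my), quads

pivot, quads = median_split([(0, 0), (1, 3), (3, 1), (4, 4)])
```

Half the points lie on each side of the median X and half on each
side of the median Y, and that is what forces the diagonal pairs of
quadrants to have (almost) equal populations.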
The data problem I am first looking at is similar to what farmers
are going to have: they have traversed the field N times with GPS
(maybe even DGPS), and they want to build an accurate topographic
model of a piece of land (not necessarily rectangular). I just
happened to be using a lawnmower (6 foot cut) on about 1.8
hectares. And after cutting that lawn 7 times, I have about 18k
points, which is about 1 point per square meter.
I want to iteratively remove the elevation component from the data
as the data is processed in the QuadTree development. To do this,
I will fit a thin plate spline to selected data, at each level of
the quad tree. By the time the number of data points in a
quadrant drops below 61 points, hopefully most of the elevation
data has become near Gaussian, so that applying Peirce's criterion
for detecting outliers works (once per subtree). In order to fit
the thin plate spline to selected data in the quadrant, I am going
to use Leave One Out Cross Validation. And I'll use Brent's
Method to find the best value for the smoothing constant
(regularization constant) of the thin plate spline.
The above set of proposed modules is complete, except for the
cross validation module. I couldn't see how I could use
J. A. R. Williams's original Math::Brent to find the best smoothing
constant in its current form. Especially since there was no way
to limit the bracketing process. And as near as I can tell, there
is no meaning to a negative smoothing constant for thin plate
splines.
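To make the idea concrete, here is a small Python sketch of picking a
smoothing constant by Leave One Out Cross Validation with a search
confined to a non-negative interval. Two things are swapped in to
keep it short, and neither is what the modules do: a ridge-style
straight-line fit through the origin stands in for the thin plate
spline, and a plain golden-section search stands in for Brent's
Method (which adds parabolic interpolation on top of golden-section
steps). All names and the sample data are made up for illustration.

```python
import math

def loo_score(xs, ys, lam):
    """Mean squared leave-one-out error of the fit y = b * x, where
    b = sum(x*y) / (sum(x*x) + lam) and lam is the smoothing constant."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    total = 0.0
    for x, y in zip(xs, ys):
        # Refit without this point, then predict it.
        slope = (sxy - x * y) / (sxx - x * x + lam)
        total += (y - slope * x) ** 2
    return total / len(xs)

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise f on [lo, hi]; the search never leaves the interval."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum is in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum is in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up sample data
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
best_lam = golden_section_min(lambda lam: loo_score(xs, ys, lam), 0.0, 10.0)
```

Because the lower end of the interval is 0, the search can never
propose a negative smoothing constant, which is the kind of limit on
the search that I was missing.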
Brent is done, so now I can go back to cross validation, to see if
I can figure out how to make this work through inheritance. But,
being an old FORTRAN programmer from way back, OO often drives me
nuts.
As mentioned in the first note, if people know of data (nominally
uniform, or highly condensed (such as travelling bicycle paths))
that has different error characteristics than the tracklogs
my Garmin Summit produces (which are not a directed random
walk, as the unit has magnetic compass and barometric sensor inputs
to its Kalman filter solution), that would be useful.
Thanks.
Gord