[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: simple database solution without root access



kamaraju kusumanchi wrote:
> I have some data in text format organized as follows
> 
> field_1,field_2,field_3,...,field_9
> val_1_1,val_1_2,val_1_3,...,val_1_9
> val_2_1,val_2_2,val_2_3,...,val_2_9
> ...
> val_100_1,val_100_2,val_100_3,...,val_100_9
> 
> 
> I want to do database (sql) like operations on this data. For example,

It sounds like you what you want is what many call "data munging".
Such as "data munging with X" where X is awk, perl, python, ruby, or
other programming language.  There is even a classic book Data Munging
with Perl that specializes in such things.  I tend to use awk for such
things since they can be done as a one-liner fairly easily.

>         - I want to find all the entries that have a particular value in
> field_2 and then use the hits to sum up the values in field_8

  awk -F, '$2==5{sum+=$8}END{printf("%d\n",sum);}' datafile

Of course this assumes a simple csv format.  If there are quoted
commas in the data then full CSV parsing is needed.  But if not then
this works well.

>         - I want to find all the entries that have a particular value in
> field_2 and filed_7

  awk -F, '$2==42 && $7 == 37' datafile

[The default action is to print so no explicit action is needed.]

>         - If the values in the first column are dates, I want to find all
> the entries in field1 that are in a given date range

  awk -F, '$1 ~ /PATTERN1/ && $1 ~ /PATTERN2/' datafile

> These are just a few examples... and the actual operations can be more
> complex.

And so for for me with the above.  All of those can be done as
one-liners with perl, python, ruby, other too.

> The data is used only by myself. I will be the only person
> updating/deleting the entries.
> 
> Is there a simple way to achieve what I am after?

Personally I would use the "data munging with X language" paradigm.  :-)

> I looked at mysql (http://zetcode.com/databases/mysqltutorial/), but there
> the configuration seems to focus on having a central database that could be
> shared across different users. This does not work for me for three reasons:

Are you aware of SQLite?

> 1) I want the data to be stored in my home directory. So, I can backup the
> data as I please.

With SQLite you don't have a central server.  You just have a local
database file.  So if you must have a database then I suggest SQLite.

However I don't think you need a database yet at this time.

> 2) I want the data to be in text format. Since some times, I do
> sed/grep/awk the data myself. Also, this helps me to edit the data file
> directly using editors such as vim.

If you are already sed/grep/awk then stop at awk.  :-)

Seriously though what do you want to do that can't be done easily with awk?

> 3) I do not have root access and am trying to avoid compiling large
> software applications.

Of course!  Everything is already readily available to you without
compiling anything using the basic utilities.

> Any ideas, pointers to existing code (perl scripts?) are much appreciated.

If you like perl then:

  perl -F, -lane 'END{print $t} $t += $F[7] if $F[1] == 5'

And similar for other languages.  You could even load a full CSV
module if needed.

Bob

Attachment: signature.asc
Description: Digital signature


Reply to: