
Re: OT: file system versus databases



Paul:

Thanks for the response.

> I will guess that, at your company, there are very few updates of this
> transaction data. New transactions are added to the record as they
> happen.  The *.txt files may contain references to prior transactions,
> but these are human-readable text, not some sort of computer
> actionable links, or pointers.

Correct, most of this data is actually sorted by date.

For instance, country/2005/01/01/foo.txt

So, the data is never duplicated.

I was just curious why this method is so much faster than a DBMS for
access. It seems that when we know what data we are looking for by date,
data retrieval is very fast when it's on a Unix filesystem.

For instance, grep "something" country/2005/??/01/foo.txt

It gives an instant result. That's how we are using it and we love it.
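
My guess at why this is quick: the directory layout acts as an index on
the date, so only the handful of files whose path matches are ever
opened. Here is a rough Python sketch of the same lookup as the grep
above. The function name grep_by_date and its parameters are made up
for illustration; the layout root/year/month/day/filename is assumed
from the example path.

```python
import glob
import os


def grep_by_date(root, year, month, day, filename, needle):
    """Return (path, line) pairs for lines that contain `needle`.

    `month` and `day` may be glob patterns such as "??", which mirrors
    the shell wildcard used in the grep command line.
    """
    # The path itself encodes the date, so the glob narrows the search
    # to matching files before any file content is read.
    pattern = os.path.join(root, year, month, day, filename)
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going on odd bytes.
        with open(path, "r", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits
```

Calling grep_by_date("country", "2005", "??", "01", "foo.txt",
"something") would scan the first day of every month of 2005. A search
that cannot be narrowed by path (say, by customer name) would still have
to read every file, which is where a DBMS index might help.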

TIA



On Tue, Feb 24, 2009 at 10:06 AM, Paul E Condon
<pecondon@mesanetworks.net> wrote:
> On 2009-02-23_23:28:22, Mag Gam wrote:
>> I was curious why this was faster:
>>
>> At our company we store close to 50TB of certain transaction data and
>> we store it raw on a UNIX filesystem without any DBMS help.
>
> I will guess that, at your company, there are very few updates of this
> transaction data. New transactions are added to the record as they
> happen.  The *.txt files may contain references to prior transactions,
> but these are human-readable text, not some sort of computer
> actionable links, or pointers.
>
> But are you sure that your current system never loses any transaction
> history because of glitches in adding to the record? Are there
> back-references (in text) to transactions that can't be found in the
> record when someone decides to look for them? Does your company ever
> have a disagreement with a customer or a supplier in which your
> counter-party claims that your transaction record is not a record of
> what really happened, but just wishful thinking of your management?
>
> If not, you really don't need to trouble yourself about DBMS, but ...
>
> In a real, full-up DBMS, there is something called 'ACID'.  This
> stands for four features that distinguish a real DBMS from a
> not-so-real DBMS. They concern DBMS transactions: not real business
> transactions, but changes in the database that DBMS gurus call
> transactions.
>
> These DBMS transactions must be Atomic, Consistent, Isolated, and
> Durable. Those DBMS features are very difficult to implement,
> especially once one understands the full implications of what these
> words mean to real DBMS gurus. Google 'ACID' and follow the
> links. Read the docs of PostgreSQL, which is a real open-source DBMS.
> Be cautious about MySQL. It has a long history of not being
> 'ACID', and of struggling to understand it.
>
> But, by all means, don't get involved in DBMS if you don't really
> need it. OTOH, if the company really does need it, but doesn't
> realize the danger of not having it --- perhaps you can be a savior.
>
> HTH
>>
>> For example:
>> country/A/name/A.txt
>> country/B/name/B.txt
>> country/C/name/C.txt
>> and so on...
>> We have close to 500 million entries in this format.
>>
>>
>> When we do a read() on a file, it's very fast and we enjoy it. Would we
>> get similar performance if we used a database and indexed it?
>>
>> TIA
>>
>>
>> --
>> To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
>> with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
>>
>>
>
> --
> Paul E Condon
> pecondon@mesanetworks.net
>
>

