Re: OT: file system versus databases
On 2009-02-23_23:28:22, Mag Gam wrote:
> I was curious why this was faster:
>
> At our company we store close to 50TB of certain transaction data and
> we stored it on a UNIX filesystem raw without any DBMS help.
I will guess that, at your company, there are very few updates of this
transaction data. New transactions are added to the record as they
happen. The *.txt files may contain references to prior transactions,
but these are human-readable text, not some sort of computer-actionable
links or pointers.
But are you sure that your current system never loses any transaction
history because of glitches in adding to the record? Are there
back-references (in text) to transactions that can't be found in the
record when someone decides to look for them? Does your company ever
have a disagreement with a customer or a supplier in which your
counter-party claims that your transaction record is not a record of
what really happened, but just wishful thinking of your management?
If not, you really don't need to trouble yourself about DBMS, but ...
In a real, full-up DBMS, there is something called 'ACID'. This
stands for four features that distinguish a real DBMS from a
not-so-real DBMS. They concern DBMS transactions: not real business
transactions, but groups of changes in the database that DBMS gurus
call transactions.
These DBMS transactions must be Atomic, Consistent, Isolated, and
Durable. Those DBMS features are very difficult to implement,
especially once one understands the full implications of what these
words mean to real DBMS gurus. Google 'ACID' and follow the
links. Read the docs of PostgreSQL, which is an open-source 'real DBMS'.
Be cautious about MySQL. It has a long history of not being
'ACID', and of struggling to understand what the term means.
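
To make 'Atomic' a little more concrete, here is a minimal sketch in
Python. It uses the sqlite3 module only because it ships with Python;
I am not suggesting SQLite for 50TB. The table, the account names, and
the helper functions are all made up for illustration. The point is
that the two UPDATEs either both happen or neither does.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the initial rows as one transaction
        conn.executemany(
            "INSERT INTO account VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single DBMS transaction."""
    try:
        # 'with conn' commits if the block finishes and rolls back
        # if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back along with it.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

After a failed transfer(), balances() shows the same figures as before
the call; the half-finished credit never becomes visible. A pile of
*.txt files gives you nothing of the kind unless you build it yourself.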
But, by all means, don't get involved in DBMS if you don't really
need it. OTOH, if the company really does need it, but doesn't
realize the danger of not having it --- perhaps you can be a savior.
HTH
>
> For example:
> country/A/name/A.txt
> country/B/name/B.txt
> country/C/name/C.txt
> and so on...
> We have close to 500 million entries in this format.
>
>
> When we do a read() on a file, its very fast and we enjoy it. Would we
> get a similar performance if we use a database and index it?
>
> TIA
--
Paul E Condon
pecondon@mesanetworks.net