
RE: Virtual Hosting and the FHS



Do you find it difficult to manage your text file database when you have
programs on different machines needing access to the data?  I use mysql
extensively in our shop because it makes the data easy to access from any of
our servers, and it makes reporting easy.  I'd rather spend a few minutes
crafting an SQL query than half an hour writing code to perform the same
task over text files.
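
For example, something like the following (the host, database, and column
names are all invented here, but it's the kind of one-liner I mean)
replaces a parsing script on every box:

    # report straight from the central mysql box instead of shipping a
    # parsing script to every server; all names below are made up.
    mysql -h dbhost -e 'SELECT server, COUNT(*) AS vhosts
                        FROM virtual_hosts
                        GROUP BY server' hosting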

I am of the opinion that storing http hit logs in a database is stupid.  I
doubt anyone on this list has log analysis needs so complex and dynamic that
they must keep traffic logs in a database rather than analyze the text
logfile and store summaries.  Not only does it waste disk and make it
complicated to run common log analysis software on your logs (or to let
customers do so), it also reduces the overall throughput of your web
servers.  That's a no-no in my book.
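
By "store summaries" I mean something as simple as this, assuming Common
Log Format (the path is made up):

    # one pass of awk over the raw log yields a hits/bytes summary you can
    # keep; the raw file can then be compressed or handed to the customer.
    awk '{ hits++; bytes += $10 }
         END { printf "%d hits, %d bytes\n", hits, bytes }' \
        /var/log/apache/www.example.com-access.log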

I would be interested in the motivations and arguments anyone on the list
has to contradict my opinion.  I'm sure it looks like I'm trying to start a
flame war, but I just cannot understand why anyone would wish to log to a
database.  Perhaps someone can enlighten me.

As far as file descriptor limits are concerned, my understanding of Apache
2.0's architecture is that it will reduce the FD problem by using kernel
threads, which share file descriptors within a single process.  I don't know
how that fits into the mod_perl / php / etc. picture, though; I really have
not investigated Apache 2.0 extensively.  To be honest, threading makes me
afraid my good old tools won't work anymore, or that they will work but I'll
have to live without the benefit of the new thread model.
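
For what it's worth, here is a rough way to see how close a server is to
the limit today (this assumes Linux with /proc and running as root; the
process may be named httpd rather than apache on your systems):

    # show the per-process fd limit, then the open fds of each apache process
    ulimit -n
    for pid in `pidof apache`; do
        echo "$pid: `ls /proc/$pid/fd | wc -l` fds open"
    done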

So perhaps Apache 2.0's threading benefits will only shine for static
content?  If that is the case, I'll be disappointed, as products like
Zeus and thttpd seem to be superior to Apache in that arena, and probably
will continue to be.

- jsw


-----Original Message-----
From: Craig Sanders [mailto:cas@taz.net.au]
Sent: Thursday, July 12, 2001 5:05 PM
To: Haim Dimermanas
Cc: Russell Coker; debian-isp@lists.debian.org
Subject: Re: Virtual Hosting and the FHS


On Thu, Jul 12, 2001 at 10:00:57AM -0500, Haim Dimermanas wrote:
> > any script i need to write can just open the virtual-hosts.conf file
> > and parse it (it's a single line, colon-delimited format) to find
> > out everything it needs to know about every virtual host.
>
>  I used to do it that way and then I discovered something called a
>  database.

i've considered using postgres for this but am resisting it until the
advantages greatly outweigh the disadvantages.

why complicate a simple job with a database? plain text configuration is
perfect for a task of this size.
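
something like this is all any script here ever needs (the field layout -
domain:owner:docroot - is made up for the example, but the idea is the
same):

    # pull one field for one vhost straight out of the text file;
    # the field layout is invented for this example.
    awk -F: '$1 == "www.example.com" { print $3 }' virtual-hosts.conf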

it takes a lot longer to edit a database entry than it does to edit a
text file with vi.

i'd lose the ability to check-in all changes to RCS if i used a database
instead of a text file.

to get these features, i'd have to write a wrapper script to dump the
config database to a text file, run vi, and then import the database
from the edited file. that still wouldn't get around the fact that you
can put comments in text files - you can't in databases.
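
the wrapper would end up looking something like this (table and database
names are invented, and it's only a sketch - no locking, no transaction
around the reload):

    #!/bin/sh
    # dump the config table to a temp file, edit it, load it back.
    set -e
    tmp=`mktemp /tmp/vhosts.XXXXXX`
    psql -c "COPY vhosts TO STDOUT" hosting > "$tmp"
    ${EDITOR:-vi} "$tmp"
    psql -c "DELETE FROM vhosts" hosting
    psql -c "COPY vhosts FROM STDIN" hosting < "$tmp"
    rm -f "$tmp"

and i'd still have lost the comments.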

in short: databases are appropriate for some tasks, but not all.

> It makes it a lot easier to delete an entry and prevent duplicates.

huh? it takes no time at all to run "vi virtual-hosts.conf" and comment
out or delete a line.


> > i need to split up the log files so that each virtual domain can
> > download their raw access logs at any time. having separate error
> > log files is necessary for debugging scripts too (and preserving
> > privacy - don't want user A having access to user B's error logs).
>
>  I strongly suggest you invest some time looking into a
> way to put the access log into a database. Something like
> http://freshmeat.net/projects/apachedb/.

i wrote my own code a year ago to store logs in postgres (mysql is a
toy). it had its uses, but i decided it was a waste of disk space and
it made archiving old logs a pain. it greatly complicated the task of
allowing users to download their log files.

i went back to log files.
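
with plain files, archiving is a couple of lines of cron (the paths and
retention periods here are made up):

    # compress week-old rotated per-vhost logs, drop anything past 90 days
    cd /var/log/apache/vhosts || exit 1
    find . -name '*.log.*' ! -name '*.gz' -mtime +7 -exec gzip -9 {} \;
    find . -name '*.gz' -mtime +90 -exec rm -f {} \;

and "letting a customer download their logs" is just an ftp account
pointed at their own directory.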

i'm a strong believer in the KISS principle, and see no need to add
unnecessary complication, especially for such little benefit.


> My research showed that web hosting customers don't look at their
> stats every day. Even if they did, your stats are generated
> daily. Having the logs in a database allows you to generate the stats
> on the fly. Now with a simple caching system that keeps the stats
> until midnight, you can save yourself a lot of machine power.

not relevant.

1. my customers want raw log files. the fact that i run webalizer
for them is a nice bonus, but what they insist on having is the raw
logs, downloadable by ftp whenever they want (within a time limit -
we don't keep old logs forever). that's fine by me - stats are their
responsibility.

2. cpu usage is basically irrelevant on a machine which is I/O bound.

3. caching the stats pages defeats the purpose of generating them on the
fly.

4. generating stats on the fly is more expensive, CPU- and I/O-wise, than
running webalizer once a night and generating static html stats pages
(roughly the cron job sketched below this list).

5. adding more boxes to the web farm is pretty easy with a properly
designed load-balancer system.
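
for what it's worth, the nightly run referred to in point 4 amounts to no
more than this (the one-config-per-vhost layout is made up for the
example):

    # crontab entry: run webalizer once per vhost each night, writing
    # static html pages into each customer's stats directory.
    30 3 * * *  for c in /etc/webalizer/*.conf; do webalizer -c "$c"; done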


> > the only trouble is that means at least 2 log files open per vhost
> > per apache process...on one of my machines, that means 344 log files
> > open per process, * 50 processes (average) = 17,200 log files open.
>
>  Read http://httpd.apache.org/docs/vhosts/fd-limits.html

i read it years ago. i'm fully aware of the issues regarding
file-descriptor limits.

> > that obviously is not very scalable.
>
>  That's a nice way to put it. Another way to put it would be "it's not
> gonna work".

no. it does work. it's working right now, with that many log files open.

it's not scalable. looking at current growth patterns, i reckon i've got
a few months to come up with a long-term solution before it becomes a
serious problem.

craig

--
craig sanders <cas@taz.net.au>

Fabricati Diem, PVNC.
 -- motto of the Ankh-Morpork City Watch




