Real-time statistics: are there any issues?
Hello,
I have been working on a hosting software project for three years. Domain
Technologie Control (DTC) is a GPL web control panel for administering and
accounting for hosting services.
http://www.gplhost.com/?rub=softwares&sousrub=dtc
My bandwidth usage is calculated each day, and my webalizer statistics
are generated at the end of each month. Currently, all my logs are saved
using mod_log_sql; I dump them to a file, then run webalizer on that
generated access.log file. Finally, the log file is compressed with
bzip2, so my customers have access to the raw logs (which is something
many users need).
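For readers unfamiliar with this kind of setup, a monthly job for the pipeline described above could be sketched roughly as follows. This is only an illustration: the paths, database name, table layout, and webalizer options are my assumptions, not DTC's actual configuration.

```shell
# Hypothetical cron entry (runs on the 1st of each month at 03:00):
# 0 3 1 * *  /usr/local/sbin/dtc-monthly-stats

# dtc-monthly-stats might look like this (all names are placeholders):
# 1. Dump the mod_log_sql table to a plain access log.
mysql -N apachelogs \
  -e "SELECT remote_host, request_uri, ... FROM access_log WHERE virtual_host = 'example.com'" \
  > /var/www/example.com/logs/access.log

# 2. Feed the generated log to webalizer.
webalizer -o /var/www/example.com/stats /var/www/example.com/logs/access.log

# 3. Compress the raw log so customers can download it.
bzip2 -9 /var/www/example.com/logs/access.log
```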
I think I can handle real-time bandwidth usage rather easily by
modifying mod_log_sql and my MTA loggers (mysqmail). But I'm not sure
whether I should do the same for webalizer.
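Since the logs already live in MySQL via mod_log_sql, the real-time bandwidth part can be a periodic aggregation query plus a running total per virtual host. Here is a minimal sketch; the table and column names (access_log, virtual_host, bytes_sent, time_stamp) follow mod_log_sql's defaults but should be checked against the actual schema, and the 5-minute window is an arbitrary choice.

```python
from collections import defaultdict

# The query a cron job might run every few minutes (hypothetical
# 300-second window; assumes mod_log_sql's default column names):
QUERY = """
SELECT virtual_host, SUM(bytes_sent)
FROM access_log
WHERE time_stamp >= UNIX_TIMESTAMP() - 300
GROUP BY virtual_host
"""

def add_usage(totals, rows):
    """Fold a batch of (virtual_host, bytes) rows into running totals."""
    for vhost, nbytes in rows:
        totals[vhost] += nbytes
    return totals

# Simulated query results from two successive windows:
totals = defaultdict(int)
add_usage(totals, [("example.com", 512), ("other.org", 2048)])
add_usage(totals, [("example.com", 1024)])
print(totals["example.com"])  # running byte count for one vhost
```

The point of accumulating in small windows is that each query only scans recent rows, so the cost stays flat no matter how large the log table grows.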
I was wondering if any of you have experience running near real-time
webalizer statistics, and what issues you encountered. I'm expecting
rather high CPU usage. How can I handle it? How often can I run a
calculation?
Remember, I want to be able to handle high loads, like 150 GB of traffic
a month on a single server (that's millions of web requests). For
example, for 100 GB of traffic you can expect about 2 GB of Apache logs
in /var/lib/mysql/apachelogs. That's quite something!
Has anyone benchmarked webalizer? How long does it take to calculate
statistics with, say, 1 million entries in the log file? What kind of
curve can I expect? Is it exponential?
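In the absence of published numbers, a rough feel for the per-line cost can be had by timing a toy parser over synthetic log lines. This measures only my simple parser, not webalizer itself (which also maintains site/URL/referrer tables), so treat the result as a lower bound; webalizer processes each log line once, so its cost per run should grow roughly linearly with the number of entries rather than exponentially.

```python
import time

# One synthetic common-log-format line (the trailing field is bytes sent).
LINE = ('192.0.2.1 - - [01/Jan/2005:00:00:00 +0000] '
        '"GET /index.html HTTP/1.0" 200 4321\n')

def parse_bytes(line):
    # The last whitespace-separated field is the response size.
    return int(line.rsplit(None, 1)[1])

n = 100_000  # scale toward 1 million on real hardware
start = time.time()
total = sum(parse_bytes(LINE) for _ in range(n))
elapsed = time.time() - start
print(f"{n} lines in {elapsed:.3f}s, {total} bytes accounted")
```

Doubling `n` should roughly double the elapsed time, which is the linear behaviour to check for before worrying about worse growth curves.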
Bye, and thanks in advance to anyone who replies,
Thomas