
Realtime statistics: are there any issues?



Hello,

I have been working on a hosting software project for 3 years. Domain Technologie Control (DTC) is a GPL web control panel for the administration and accounting of hosting services.

http://www.gplhost.com/?rub=softwares&sousrub=dtc

My bandwidth usage is calculated each day, and my webalizer statistics are generated at the end of each month. Currently, all my logs are stored with mod_log_sql; I dump them to a file, then run webalizer on that generated access.log file. Finally, the log file is compressed with bzip2, so my customers have access to the raw logs (which is something many users need).
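
To make that concrete, here is roughly what the pipeline looks like as a Python sketch, using the python-mysqldb module and assuming mod_log_sql's default access_log table and column names (remote_host, remote_user, time_stamp, request_line, status, bytes_sent, referer, agent); the credentials, vhost and paths are placeholders, not DTC code. It dumps one month of one vhost in Combined Log Format, runs webalizer on it, then compresses the raw log with bzip2:

    import subprocess
    import time
    import MySQLdb

    def dump_month(vhost, year, month, logfile):
        # Dump one month of one vhost from the mod_log_sql table to a
        # Combined Log Format file that webalizer can read.
        db = MySQLdb.connect(user="dtc", passwd="secret", db="apachelogs")
        cur = db.cursor()
        start = int(time.mktime((year, month, 1, 0, 0, 0, 0, 0, -1)))
        if month == 12:
            end = int(time.mktime((year + 1, 1, 1, 0, 0, 0, 0, 0, -1)))
        else:
            end = int(time.mktime((year, month + 1, 1, 0, 0, 0, 0, 0, -1)))
        cur.execute(
            "SELECT remote_host, remote_user, time_stamp, request_line,"
            " status, bytes_sent, referer, agent FROM access_log"
            " WHERE virtual_host = %s AND time_stamp >= %s AND time_stamp < %s"
            " ORDER BY time_stamp",  # webalizer wants chronological input
            (vhost, start, end))
        out = open(logfile, "w")
        for host, user, ts, req, status, size, ref, agent in cur:
            when = time.strftime("%d/%b/%Y:%H:%M:%S +0000", time.gmtime(ts))
            out.write('%s - %s [%s] "%s" %s %s "%s" "%s"\n' %
                      (host, user or "-", when, req, status,
                       size or 0, ref or "-", agent or "-"))
        out.close()
        db.close()

    def monthly_stats(vhost, logfile, outdir):
        # Generate the HTML report, then keep the raw log for the customer.
        subprocess.check_call(["webalizer", "-n", vhost, "-o", outdir, logfile])
        subprocess.check_call(["bzip2", "-f", logfile])

    if __name__ == "__main__":
        dump_month("www.example.com", 2005, 4, "/tmp/www.example.com-access.log")
        monthly_stats("www.example.com", "/tmp/www.example.com-access.log",
                      "/var/www/stats/www.example.com")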

I think I can handle real-time bandwidth usage rather easily by modifying mod_log_sql and my MTA loggers (mysqmail). But I'm not sure whether I should do the same with webalizer.
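
For the HTTP side, I imagine it can be as simple as a SUM() over the same (assumed) access_log table, grouped by virtual host and run every few minutes; a minimal sketch, with the mysqmail tables to be summed the same way:

    import time
    import MySQLdb

    def todays_http_bandwidth(db):
        # Bytes sent per vhost since local midnight, straight from the table.
        midnight = int(time.mktime(time.strptime(time.strftime("%Y-%m-%d"), "%Y-%m-%d")))
        cur = db.cursor()
        cur.execute(
            "SELECT virtual_host, SUM(bytes_sent) FROM access_log"
            " WHERE time_stamp >= %s GROUP BY virtual_host", (midnight,))
        return dict(cur.fetchall())

    if __name__ == "__main__":
        db = MySQLdb.connect(user="dtc", passwd="secret", db="apachelogs")
        for vhost, nbytes in sorted(todays_http_bandwidth(db).items()):
            print("%-40s %12d bytes" % (vhost, int(nbytes)))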

I was wondering if some of you have experience running near real-time webalizer statistics, and what issues you may have encountered. I'm expecting rather high CPU usage. How can I handle it? How often can I run a calculation?
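
What I have in mind for webalizer itself is its incremental mode (the -p flag, or "Incremental yes" in the config file), which saves its state in webalizer.current inside the output directory, so each pass only parses the entries added since the previous one. A rough sketch of an hourly cron pass, assuming the dump step above has already written the new lines to a chunk file (names and paths are placeholders):

    import subprocess

    VHOST = "www.example.com"
    CHUNK = "/tmp/%s-incr.log" % VHOST      # lines added since the last pass
    OUTDIR = "/var/www/stats/%s" % VHOST    # where the HTML report lives

    # -p preserves state between runs, so CPU cost is proportional to the
    # new entries instead of the whole month's log.
    subprocess.check_call(["webalizer", "-p", "-n", VHOST, "-o", OUTDIR, CHUNK])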

Remember that I want to be able to handle high loads, like 150 GB of traffic a month on a single server (that's millions of web requests). For example, for 100 GB of traffic you can expect about 2 GB of Apache logs in /var/lib/mysql/apachelogs. That's quite something!

Has anyone done any benchmarking of webalizer? How long does it take to calculate statistics when you have, let's say, 1 million entries in the log file? What kind of curve can I expect? Is it exponential?
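
If nobody has figures, I will probably measure it myself with something like the sketch below: time webalizer on growing prefixes of a dumped access.log and see whether the runtime grows roughly linearly with the number of entries (paths are placeholders, and each run gets its own output directory so the state files do not interfere):

    import itertools
    import subprocess
    import time

    LOG = "/tmp/www.example.com-access.log"

    def time_webalizer(nlines, base="/tmp/webalizer-bench"):
        # Copy the first nlines of LOG into its own directory and time
        # a plain (non-incremental) webalizer run over it.
        workdir = "%s/%d" % (base, nlines)
        subprocess.check_call(["mkdir", "-p", workdir])
        prefix = "%s/access.log" % workdir
        src = open(LOG)
        dst = open(prefix, "w")
        dst.writelines(itertools.islice(src, nlines))
        src.close()
        dst.close()
        start = time.time()
        subprocess.check_call(["webalizer", "-q", "-o", workdir, prefix])
        return time.time() - start

    if __name__ == "__main__":
        for n in (100000, 250000, 500000, 1000000):
            print("%8d lines: %.1f s" % (n, time_webalizer(n)))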

Bye, and thanks in advance to those who reply,

   Thomas






