Hi all,

Alioth has become an important part of the Debian infrastructure in recent years; it has been used by more and more people and teams inside Debian, as well as by some upstream projects. This growth in usage wasn't closely followed by an increase in resources, however, and the server hosting Alioth was getting more and more overloaded as time passed. It was time for Something to be Done. My name is Roland Mas, and I'm your host for this report of the Something that got Done.

The admins of Alioth (that would be Stephen Gran, Tollef Fog Heen and yours truly) got bold and submitted a proposal for a sprint to the Debian project leader. Zack, being the cunning DPL that he is, promptly agreed to it, and there was no way we could renege on our proposal. We tried to invite others to join us, but basically nobody fell for it, and so the three of us got together in sunny Cambridge, England from the 20th of May to the 22nd. We were provided with our basic requirements (meeting room, power, networking, whiteboard and coffee) by Collabora, whom we would like to thank for their hospitality.

We started by getting our hands on vasks.debian.org, setting it up with Squeeze, and copying most of the data over from the old Alioth (which was hosted in a Xen domU on wagner.d.o). Actually, we started by stopping all services on old-alioth, in order to free some I/O bandwidth for the data transfer; even so, that took quite some time. Old-alioth was just across the Ethernet switch, but its disk setup was very sub-optimal.

Once the data transfer was started and we had removed every bottleneck we could have an influence on, we got down to setting up the FusionForge instance (a version based on the 5.1 upstream branch, with a few Alioth-specific patches). The database and web interface were mostly operational by the time we called it a day. We left the data transfer to its own devices, and adjourned to the nearest curry house.
There was pub time afterwards, although you'll have to get the details from Tollef and/or Stephen, because your editor decided he needed sleep more than alcohol.

Saturday morning. The data transfer was done, so we got down to serious sysadminning and bikeshedding about names and sizes for LVM volumes, hostnames, where each hostname would point, what the URLs would look like, and so on. Once that was decided, I, being the official FusionForge guy, focused on fixing the problems we encountered on that front (on vasks.d.o) while my honoured colleagues wondered how to do a remote reinstallation of wagner.d.o without a remote console. I wasn't the most attentive of spectators, being busy out-quirking PHP quirks at that time, but from what I got it was akin to sawing the branch you're sitting on while it is suspended (with bits of string) from the branch above it, which you're then going to saw off too. And there are spikes on the snake-infested ground.

Anyway, after some deep magic, Qemu, three levels of Grub chainloading, deconstruction of running RAID arrays hosting root filesystems and all, wagner.d.o was running Squeeze too, no longer virtualized, and it survived a reboot. So we started setting up the parts of Alioth that would run on wagner.d.o; namely, the read-only anonymous access to SCM repositories, the repository browsers, and the project websites.

We also got a visit from Bazaar developer Jelmer Vernooij, whom we failed to task with interesting jobs, so he ended up working on backports of Bazaar-related packages for us; these will help keep Alioth responsive, so many thanks to him.

Then it was Saturday evening, and even I couldn't weasel out of the traditional British occupation for the night. Beer and pub food were had and enjoyed, for which we were joined by local Debian release manager Neil Williams. We discussed politics, cat-herding, and "if you're lucky enough to look under 21 you will have to prove you're over 18" signs.

Sunday morning happened, as it usually does, and the three of us reconvened; we thought we'd firmly attach the last few remaining dangling ends, but there were more of them than we thought. NFS mounts, DNS hacks, HTTP proxying, SMTP configuration, bug-fixing all over the place, and so on. We gradually opened up the services again, fixed stuff as we got notified of it on IRC, sent status emails, yada yada.

Lunch^WFinal debriefing was had in a pub with Colin Watson and family, and interspersed with discussions of some aspects of WWII-era history and a behavioral study of the rail replacement bus and its predators in the wild. And then it was time to go back to our respective homes and countries.

The following is a rough description of the new setup. vasks and wagner both run Squeeze directly (no virtualization); both have roughly the same amount of disk space (around 500 GB after RAIDing); vasks has slightly faster CPUs (four 3 GHz cores compared to four 2.2 GHz ones), wagner has a bit more RAM (16 GiB instead of 6). The load is therefore split so that the "real-time" tasks (SCM access for developers, FusionForge web interface, database) run on vasks, while the "lower priority" tasks (SCM repository browsers, projects' websites, email, local cronjobs and whatever random stuff runs on Alioth) are on wagner. The projects' "Sources" tab in the FusionForge web interface should give correct URLs for the repositories of various kinds, for read/write developer access and for read-only anonymous access.

After a week of running, the benefits seem apparent, and the load average is down to very reasonable levels on both hosts. There are still some things to fix or amend: it would be nice to preserve the old URLs as much as possible, some synchronization of homes across the servers would be desirable, we didn't necessarily reinstall all the packages that used to be there on old-alioth, and so on.

While we're on this subject: we'd like to take this opportunity to remind our users not to consider Alioth as a generic and infinitely elastic hosting service. Please be considerate about what you run there, and about the amount of disk space you use. The disks are large enough that we don't meet the limits quite yet, but there's some data that is clearly outdated and very probably useless. ISO images of multiple daily CDs, however small, aren't necessarily bad, but keeping them for years can't be right. Ditto for years-old tarballs of the SCM repositories, and for 2004-era package repositories, and so on. We're considering setting up a way to mark some files as expirable and automatically remove them after a while, but in the meantime you might want to have a look at your data and clean up what's obsolete.

A few final words: we would like to apologize for the poor communication and the lack of reminders that the sprint was going to happen; we'd also like to thank the DPL and Collabora for respectively triggering and hosting this sprint. More apologies for the continued inconvenience as we keep on fixing glitches; and more thanks for bearing with us in the meantime and reporting problems. We even received patches fixing some of the problems, for which we're very grateful. Please drop by and say hi on #alioth on IRC. Only don't be *too* helpful, otherwise we might be tempted to add you to the team :-)

Thank you for reading so far. This report was sent to you on behalf of the Alioth admin team, Stephen Gran, Tollef Fog Heen, and Roland Mas.

-- 
Roland Mas

Shyumiribirikku ga susunde imashyou ka ?
  -- Le Schmilblick en japonais