On Thu, 17 Sep 2009 22:49:07 +0200 Hector Oron <hector.oron@gmail.com> wrote:

Sending this to the list as the issues are now quite general. The background is a set of issues relating to how Grip builds packages, how long it all takes and how to mirror Grip. The issues are mostly to do with the space available on the machines running emdebian-grip-server and the amount of work that emdebian-grip-server now appears to demand. ('ant' refers to the www.emdebian.org server.)

If particular issues "hit a nerve", please file bugs against the emdebian-grip-server package in Debian.

> >> rsync was already running, so it was locked.
> >
> > rsyncd is running, rsync is not.
>
> Sure. I think you misunderstood me, ./signal is a <1sec task which
> just tells ftp.tw.debian.org to update the repository using rsync
> (above ftp.tw.debian.org was already running rsync that is why it says
> it is locked, but this was the test we did to prove that push_mirror
> was working.

OK but the signal should probably not happen every 2 seconds for 7 hours per day. The Grip process is not staged (staging would take up even more temporary space): each .deb is put into reprepro as soon as it is built, the temporary files are all deleted, then we go on to build the next .deb of the same package for the next architecture, and so on. In other words, we don't have incoming.emdebian.org because we don't have that much temporary space available.

'Emdebian Grip unstable' changes many hundreds of times per day. This is probably not ideal but we'd need another 30GB of temporary space to have a dedicated incoming.emdebian.org area. 'Emdebian Grip testing' changes a few dozen times a day, depending on how many packages are migrating and how many new packages need to be gripped.
This is particularly noticeable during the evenings because grip-cron can only do the migrations after unstable has finished updating, so testing is being updated just when most people want to run 'apt-get update'. The result is that apt reports that certain combinations cannot be installed, only for them to become installable a little later on.

We are working reprepro quite hard right now, causing a lot of churn in the Packages files and reprepro databases, and that churn takes longer the more packages get involved.

With a lot more temporary space, we could make things a lot more like standard Debian, with twice-daily runs to move packages from incoming into the real pool. (Our incoming area would be subdivided according to each component and each suite we support so that reprepro could be given a path to all the .debs to put into the same suite and the same component in one operation. This means that the Packages file changes only once per push, as each component and each suite have their own Packages files.)

IIRC this requires someone providing a new hard drive to go into ant - check with Simon. If we do that, I'd suggest adding a lot more than just 30GB as there seems to be little point searching for a new drive that small - may as well go for 120GB or more.

Having said that, using an incoming area and supporting that in emdebian-grip-server also means that any other machine running emdebian-grip-server either needs a similar amount of temporary space, or we introduce some form of configuration option (a debconf question) that either retains the current behaviour or uses an incoming path. (So even once the new space is available, the use of that space requires a new release of emdebian-grip-server.) If this is what users want, please file a bug report to ensure it happens.

> >> Do you mind to run (with user emdebian)
> >> the /home/emdebian/bin/signal script every time you add new
> >> packages to the repository?
This would only work with Crush which builds one entire set of packages for one source at a time, puts them all into the repo in one lump and then waits for the autobuilder to finish the next entire batch of builds before uploading the next lot. Grip doesn't work that way, it is incremental and continuous. Each actual grip process can take such a short space of time that there would (usually) not be time for the rsync process to start before the repo changed again.

Unfortunately, this even happens with testing which is quite unusual for how Debian normally works. The majority of migrations into testing happen instantaneously with reprepro but the repo is still growing and each time a new package is added (or each time testing-proposed-updates is used once Debian gets into the Squeeze release freeze), if there is a different version in testing, that version has to be gripped afresh. We're still adding packages quite often - mostly missing dependencies of existing packages where a new version adds another dependency.

One issue right now is that we aren't handling removals *AT ALL*. I haven't got code to handle what ftp.debian.org does manually (and because ftpmasters do do this manually, there probably isn't code to do it automatically). I don't think this is a particular problem - removing a package from unstable and/or testing doesn't affect the size of the archive pool/ (because we have a stable release that retains it and we'll have an oldstable after that) and as the package has been removed from Debian unstable and testing, we aren't going to need to Grip the package again so it doesn't cost us runtime either. However, I'm sure this will bite us eventually and I'm not sure how to fix it other than to trust most of this to reprepro so that when oldstable is removed prior to receiving packages from stable during a release, the old versions will be removed from pool/.
Having the package listed in unstable when Debian no longer lists it does bloat our Packages file (which isn't good) and possibly complicates dependency resolution on systems running Grip (because Debian does not expect libfoo0 to still be around).

If we had an incoming area, we could use triggers as you initially requested but removals would still not be handled explicitly.

> > Not practical. With Grip, the packages are added by the
> > emdebian-grip-server package. The only real way is to do the call at
> > the end. Right now, grip-cron.sh appears to need more than several
> > hours to run - about 7-10hrs at the moment. I think it had
> > something to do with the outage because it usually takes quite a
> > bit less and the scripts are still trying to catch up with
> > unstable. It could also be simply dealing with much, much larger
> > packages which take a lot of time to Grip. I'm hoping to take a
> > look at what is happening tomorrow but the time required means that
> > I can't actually do anything with the repository in the evenings as
> > reprepro is almost constantly locked (for additions/updates).
>
> About your lags, I have been building toolchains on ant, that could
> explain your timings.
> According to Simon, the machine was down because KVM did not attach
> to the right network interface, but else it is possible too.

I'll check on that later. I'm certainly hoping that the combination of other loads and the extra package workload due to the outage is the reason why the times have grown so much.

It does show that Grip has technical limits on the number of packages per archive (i.e. per machine). It becomes necessary to have multiple archives (on multiple machines) - one for the base packages and others (which do not have to use mirroring from base) to add alternative optional sets of packages. The reality is that any one machine can only cope with so many Grip packages (or so many Grip architectures).
Debian has at least one buildd per architecture; we have one buildd for seven architectures - we're going to need several partial buildds to complete the package set. We don't need mirrors for Grip, we need partial builders that augment the packages available from the base machine.

One option is to have a "behind-the-scenes" machine with the same internet connection as the "frontend" but which does the grunt work of processing the packages, so that all the frontend (ant) needs to do is sync the mirror twice a day. This would prevent things like toolchains, apache and other tasks prolonging the build process. Another option would be to separate out the architectures but this has less direct support in the scripts - where to put the Arch:all packages, for one thing.

> I am not aware (yet) how emdebian-grip-server and grip-cron.sh works,

1. Use normal reprepro methods to update the local filter repo, which saves time in the later stages but does take up as much space again.
2. Identify the relevant packages from the Packages file of the filter repo.
   (These stages take a very short time but stage 1 is very network intensive and stage 2 is very CPU intensive.)
3. Iterate through the list of packages, passing each .deb and .dsc through the grip processing. Each run doesn't take that much time but the .deb has to be unpacked and repacked - with very large packages (java and gcc), this can take a noticeable period of time.

The problem is that we are not compiling the package from source (where you only unpack the source once): we unpack each compiled .deb for each architecture, process it and then repack it. If the source builds 30 architecture-dependent binaries and Grip includes 12 of those binary packages, we unpack and repack 84 .debs (12 binaries for each of 7 architectures).

grip-cron wraps this process to first handle unstable, then migrate packages into testing. A separate run (which hasn't even been running recently) then handles updates to stable-proposed-updates.
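
The count in the example above is simply the number of gripped binaries multiplied by the number of architectures. A minimal sketch of that arithmetic and of the per-package, per-architecture loop in stage 3 follows. The package and architecture names are placeholders rather than the real Grip lists, and `grip_jobs` is a hypothetical helper, not something found in emdebian-grip-server:

```python
from itertools import product


def grip_jobs(binaries, architectures):
    """Return one (binary, architecture) pair for every .deb that has to be
    unpacked, processed and repacked."""
    return list(product(binaries, architectures))


# Placeholder names: 12 gripped binaries from one source, 7 architectures.
binaries = [f"libfoo-bin{n}" for n in range(1, 13)]
architectures = [f"arch{n}" for n in range(1, 8)]

jobs = grip_jobs(binaries, architectures)
print(len(jobs))  # 12 * 7 = 84
```

Every extra architecture or every extra gripped binary adds a whole row or column of jobs, which is why the large packages dominate the run time.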
Where the package to be migrated already exists, it is copied using reprepro, which is trivial. When a specific version has been uploaded for that suite (typically stable-proposed-updates but testing can have dedicated queues too), the grip process needs to be run against that version of the package.

Each version of each architecture of each package is only "gripped" once but there are a few corner cases to do with translations and Arch: all packages. These do result in some duplication of effort but I haven't found a satisfactory way to handle those yet. It is possible that Emdebian decides *not* to bother about the endianness of .mo files - we need some data on whether the current setup does give any performance gain (especially when loading the GUI in a non-English locale that is well supported in the translations) and then debate whether that gain is sufficient reason to keep the Emdebian TDebs architecture-specific.

Changing to Arch:all TDebs is a lot of work but would save a very large amount of space and cut out a significant chunk of time from the Grip processing. The implications for Crush also need to be considered - indeed the measurement of the performance gain should be done using Crush because Grip only uses this method because it was deemed appropriate for Crush to have architecture-specific TDebs. The original idea was that by the time we had to decide this, Debian would have Arch:all TDebs that could simply be put directly into Grip. There are issues here that need wider discussion on the mailing list.

> but the ./signal it is just a trigger.
> Anyone from emdebian server can talk to ftp.tw.debian.org and trigger
> the update by running such script in not much time, then the work is
> done by tw machine.

:-) I'm just expecting the trigger to be run fewer times per day. The current setup means that before rsync has worked out which files have been updated, the next package has been built and reprepro is busy "Exporting Indices...".
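
One way to get the trigger down to once per run, rather than once per package, is a small wrapper that runs the whole grip-cron job and only then signals the mirror. A minimal sketch, assuming a wrapper started from cron: the grip-cron path is a placeholder, `run_then_trigger` is a hypothetical name, and triggering even after a failed run is an assumption (a partial run may still have changed the repository).

```python
import subprocess

# Placeholder path: the real location of grip-cron.sh is not given here.
GRIP_CRON = ["/path/to/grip-cron.sh"]
# Trigger script path as given in the quoted message.
TRIGGER = ["/home/emdebian/bin/signal"]


def run_then_trigger(grip_cron=GRIP_CRON, trigger=TRIGGER, run=subprocess.call):
    """Run grip-cron to completion, then fire the mirror trigger once.

    Returns grip-cron's exit status so that cron still sees failures.
    """
    status = run(grip_cron)   # blocks until grip-cron has exited
    run(trigger)              # assumption: trigger even after a failed run
    return status
```

A script run by cron could end with `sys.exit(run_then_trigger())`. The `run` parameter exists only so the ordering can be checked without executing anything.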
> If you still think this model does not fit, I'll just set a cron task
> which triggers the update.

I think a once-daily cron task is going to be necessary, synchronised to run after the grip-cron task on ant (which currently starts at 3pm UTC).

One way is to wrap grip-cron itself with a new script that calls grip-cron. When grip-cron finally exits, the wrapper can call the trigger. The trigger cannot be executed before grip-cron as the mirroring that follows will add load to ant just when grip-cron itself is wanting to use a lot of CPU and a lot of network connections.

Alternatively, set the cron task to run long enough before 3pm that there is no chance of the mirroring still running when 3pm comes around, but long enough after the previous 3pm start that grip-cron itself has finished. (That time will, inevitably, land right in the zone where the normal maintenance cron tasks run on ant (6am UTC), so liaise with Simon for that timing.)

I don't think grip-cron itself should know about triggers until we know how other machines using emdebian-grip-server want to handle such issues.

-- 
Neil Williams
=============
http://www.data-freedom.org/
http://www.linux.codehelp.co.uk/
http://e-mail.is-not-s.ms/