Over the last two weeks I've been running some stress tests in the classrooms where I use LTSP, and I'd like to share the results and my conclusions. Obviously I would also like to hear about your experience, possible solutions to the problems, and anything I may have done wrong. This email is going to be long, so if you're not technically interested in LTSP, you can stop reading now.

Some background first: I'm using Debian lenny amd64, fully updated, with a stock 2.6.24-amd64 kernel, in a classroom with 15 PCs. I used two servers for the tests:

- a Quad Core with 4 GB of RAM, one 32-bit PCI 1 Gbps network card and one 100 Mbps network card on the motherboard
- a Xeon with four cores, 4 GB of RAM, and two 1 Gbps network cards on the motherboard.

Both have the same 160 GB SATA disk, from the same manufacturer, and very similar features (except the motherboard and the processor, obviously).

For the tests I'm loading the thin client image over NFS, to avoid regenerating the image for every change, but in the end I'll use NBD to load the image, which saves some time when running many clients.

The switch is pretty good. I used a VLAN to isolate the thin-client network from the rest of the school network. The switch has two 1 Gbps ports, and I connected the cable from the server to the thin-client VLAN to one of them.

The main test was this: using wakeonlan I started 15 "old" computers (Pentium IV, 1.4 GHz, 256 MB RAM, 8 MB video RAM, 10/100 Mbps network card) as thin clients, with autologin in ldm and autostart of a very big OpenOffice Impress presentation with a lot of heavy graphics and animations.

Results when using a 100 Mbps port on the switch to connect the server to the thin clients:

- There was a sustained 89 Mbps of NFS traffic while the clients were booting, and a sustained 89 Mbps of ssh traffic while OpenOffice was starting. Adding the remaining traffic (udp, nfs, etc.), the total is about 100 Mbps, so the link is saturated.
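As a rough sanity check of these figures, here is a minimal Python sketch of the arithmetic. It only uses the numbers reported above (15 clients, a 100 Mbps or 1 Gbps server link, 89 Mbps of sustained NFS traffic); the function and constant names are hypothetical, and the even split between clients is an idealisation, not something that was measured.

```python
# Back-of-envelope bandwidth figures for the thin-client tests.
# All numbers come from the measurements described in this email;
# the even split between clients is an assumption, not a measurement.

CLIENTS = 15              # thin clients started at the same time
SLOW_LINK_MBPS = 100      # server connected to a 100 Mbps switch port
FAST_LINK_MBPS = 1000     # server connected to a 1 Gbps switch port
NFS_SUSTAINED_MBPS = 89   # sustained NFS traffic seen while booting


def fair_share_mbps(link_mbps: float, clients: int) -> float:
    """Bandwidth per client if the link were shared evenly."""
    return link_mbps / clients


def link_utilisation(traffic_mbps: float, link_mbps: float) -> float:
    """Fraction of the link capacity used by the given traffic."""
    return traffic_mbps / link_mbps


if __name__ == "__main__":
    for link in (SLOW_LINK_MBPS, FAST_LINK_MBPS):
        share = fair_share_mbps(link, CLIENTS)
        used = link_utilisation(NFS_SUSTAINED_MBPS, link)
        print(f"{link:>4} Mbps link: {share:5.1f} Mbps per client, "
              f"NFS alone uses {used:.0%} of the link")
```

On the 100 Mbps port each client would get well under a tenth of the link even with perfectly fair sharing, and the NFS traffic alone already takes most of it.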
In this 100 Mbps case, several of the clients did not start or started very slowly: as soon as the first ones begin to load over NFS or to open the OpenOffice presentation, the delayed ones cannot make progress. So it seems that a few clients take all the bandwidth.

Results when using a 1 Gbps port on the switch to connect the server to the thin clients:

- There were some 98 Mbps NFS peaks and some 124 Mbps ssh peaks, but they didn't last. All the clients started; none stalled. The startup time, with the OpenOffice application running, dropped from 1'40" to 45".
- There was a funny (?) result: the Xeon machine's CPU usage rose to 100% when OpenOffice started, while on the Quad Core the CPU never went above 20%. Because of this, with the Xeon server some of the clients stalled for several seconds when starting X. Even so, the startup time was about 5" shorter with the Xeon for the clients that didn't stall.

Conclusions:

- There seems to be a need for some QoS on the client network traffic. It could be done in the switch, but then it would depend on the switch model and manufacturer, so I would prefer to see it in the LTSP structure. Has anybody worked on this?
- The Xeon server seems to behave better in network use, but worse when starting big desktop applications with a lot of graphics.
- RAM is not very important. RAM usage never exceeded 2 GB, leaving more than 1.8 GB unused.
- When there's no concurrency, neither the Xeon nor the Quad Core has CPU problems: less than 5% when starting OpenOffice, even if some other instances are already running. But when there is concurrency, the Xeon server behaves much worse. I don't know the reasons and would welcome any opinion.

In the medium term I'll have to start 30 clients per server, but we have to decide which kind of LTSP server we're going to use as soon as possible, so any advice is welcome.

That's all.

Regards,
José L.