Re: non-standard TCP tunings in EC2 images
On 19/07/16 at 19:10 -0400, Sam Hartman wrote:
> >>>>> "Lucas" == Lucas Nussbaum <email@example.com> writes:
> Lucas> by this. One could argue that those test suites are a bit
> Lucas> fragile, but on the other hand, I would expect an image
> Lucas> labelled as "Official Debian" on
> Lucas> https://wiki.debian.org/Cloud/AmazonEC2Image to not differ
> Lucas> from default Debian configuration in subtle ways like this.
> I'm not sure that's true.
> We had some interesting discussion of this in the cloud BOF, at debconf.
> we had strong agreement there (at least as I read it) that the official
> images must include software only from Debian main.
> There were people pushing for the idea that it have the entirely
> standard config as well, although I and some others pointed out that
> isn't going to work.
> The most obvious issue is that you need to establish the appropriate
> cloud-init data sources for the platform in question unless you want to
> introduce significant boot delays.
> Also, some folks who've worked with the cloud providers pointed out that
> unless people can produce images that work well with their cloud,
> they're not going to use the official images.
> Tuning things like networking and disk performance to be right for a
> given cloud seems like a discussion worth having, and at least in my
> mind "debian official," doesn't obviously say no.
I agree that there are some changes that are required (typically
cloud-init-related). I also agree that it might make sense to
work-around known (performance) bugs in cloud infrastructure by using
However, I think that, unless justified, we should strive to make Debian
images for cloud environments as similar as possible to what one
would get from a standard Debian installation, to provide consistent
behaviour to users across all environments. Currently there are some
undocumented changes made to the EC2 images that can:
- result in different application behaviour (as shown by the FTBFS
bugs I pointed to)
- mislead the user about the exact dependencies of an application
To be more concrete, according to debfoster, the jessie AMI includes:
- cloud-init: OK
- cloud-utils and awscli: I wonder if those really need to be
installed by default. I guess that the rationale is that they could be
needed by scripts executed via cloud-init, but couldn't they be
installed in that script?
- dkms, which also pulls linux-headers-amd64, gcc 4.8 and 4.9, make,
patch, sudo: Apparently, this is required to build Intel's ixgbevf
driver, which is built and shipped inside the image. The 'task' in
bootstrapvz doesn't explain in which cases this driver is needed.
Maybe dkms and its dependencies could be removed after the kernel
module is built?
- openssh-server: OK
- grub-pc, linux-image-amd64: OK
- lvm2: maybe, even though it could probably be installed when needed
- parted, gdisk: probably needed to grow partitions at boot (I did not
- bootlogd: maybe (but it's not really needed anymore with systemd)
- python3-boto, python-boto: same as cloud-utils/awscli
- sysvinit: not needed
- apt-transport-https: I don't see the point. None of the configured
repositories use https. People could just install it if needed.
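One way to keep such a list reviewable would be to diff the image's package
set against a reference installation. Below is a minimal sketch in Python,
assuming each list was saved with one package name per line (for example from
`dpkg-query -W -f='${Package}\n'`, which lists all installed packages rather
than only the top-level ones debfoster reports). The helper name and the
sample data are purely illustrative.

```python
def extra_packages(image_list, baseline_list):
    """Return packages present in the image but absent from the baseline.

    Both arguments are strings with one package name per line.
    """
    image = {p.strip() for p in image_list.splitlines() if p.strip()}
    baseline = {p.strip() for p in baseline_list.splitlines() if p.strip()}
    return sorted(image - baseline)


if __name__ == "__main__":
    # Hypothetical sample data, not the real content of any image.
    image = "cloud-init\nopenssh-server\nawscli\ndkms\n"
    baseline = "openssh-server\n"
    for name in extra_packages(image, baseline):
        print(name)
```

Publishing the resulting list next to each image would at least make the
differences documented rather than something users have to discover.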
Additionally, I could spot the following configuration changes using
'cruft' and picking bits from bootstrapvz:
- backports are enabled.
- backports use cloudfront.debian.net. It would be great to work with
DSA to make cloudfront.debian.net official (part of deb.debian.org).
In the meantime, I would be more comfortable with using deb.debian.org.
- there's a customized version of growpart installed in
- there's a number of sysctl changes in /etc/sysctl.d/01_ec2.conf:
vm.swappiness = 0
vm.dirty_ratio = 80
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 12000
net.core.somaxconn = 1000
net.core.netdev_max_backlog = 5000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535
kernel.sysrq = 0
I don't see a reason for changing those: it might make sense in some
environments, sure, but several of them can have a visible impact on
application behaviour (tcp_tw_reuse, tcp_slow_start_after_idle,
ip_local_port_range), and/or have downsides, e.g. wrt memory usage
(all the rmem/wmem settings). I would be much more comfortable with
sticking with the default upstream (kernel) values.
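To make the comparison with upstream values concrete, here is a minimal
sketch in Python that lists which settings in such a file differ from a
baseline. It assumes the plain `key = value` syntax shown above (a
simplification of the full sysctl.d syntax), and that the baseline values
were captured separately, e.g. from /proc/sys on a stock installation. The
function names are mine, not part of bootstrapvz or any existing tool, and
the baseline in the example is only a placeholder.

```python
def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line; ignored in this sketch
        # Normalise whitespace so multi-value settings compare cleanly.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key):
    """Map a sysctl key to its /proc/sys path (dots become slashes)."""
    return "/proc/sys/" + key.replace(".", "/")


def read_running_value(key):
    """Read the current value of a sysctl key on the running system."""
    with open(proc_path(key)) as f:
        return " ".join(f.read().split())


def diff_settings(image, baseline):
    """Return {key: (image_value, baseline_value)} for keys that differ."""
    return {
        key: (value, baseline.get(key))
        for key, value in image.items()
        if baseline.get(key) != value
    }


if __name__ == "__main__":
    image = parse_sysctl_conf("vm.swappiness = 0\nkernel.sysrq = 0\n")
    # Placeholder baseline; capture real values from a reference system.
    baseline = {"vm.swappiness": "60", "kernel.sysrq": "0"}
    for key, (ours, theirs) in sorted(diff_settings(image, baseline).items()):
        print(f"{key}: image={ours} baseline={theirs}")
```

On a reference machine, the baseline dict could be built by calling
read_running_value() for each key found in the image's file.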
Also, it seems that most of those changes are done in bootstrapvz's ec2
subdirectory. Shouldn't they be done for all cloud providers?
Have you thought about packaging those customizations? They could go in
'cloud-image-ec2-required' and 'cloud-image-ec2-recommended' packages
that users could install if they want the additional tunings.