Re: Possibly moving Debian services to a CDN
]] Ingo Jürgensmann
> 1) Privacy concerns: Debian would deliver much more data to business
> companies than necessary. Keep in mind that personalized data is one
> of the most valuable things to data miners. Currently I choose one
> mirror site to pull my packages from. I can freely choose that mirror
> on basis of location, bandwidth, personal likes or, let's say, privacy
> reasons because I know that this specific mirror doesn't log my IPs.
> When using a CDN, at least in that way I understood your proposal, I'm
> not free to choose anymore. The company running that CDN will obtain
> all of data like how many machines are behind a subnet or IP, what
> kind of machines (intel, sparc, powerpc, m68k, ...) and might know if
> I forget to update a machine (security).
This is absolutely a valid concern. I have a few mitigation strategies
and one observation:
- You can still run your own mirror. We need that ourselves and like I
wrote in the initial email, we need to find a way that keeps rsync
- You can use an IP anonymizing service such as Tor.
- You can use a local proxy that hides the details of how many nodes,
etc. you have.
- I would like us to have agreements with any donors that they're not
allowed to use the information for anything but operational issues. We
can't tell them not to log (because that's really hard on a technical
level), but we can restrict what they can do with the logs.
The observation is that we currently don't have any such control over
mirror operators. They are, AFAIK, free to use whatever information
they collect for whatever purpose they would like.
> 2) Integrity concerns: although Debian uses signed package lists and
> hashed packages, using a CDN would raise the chances that there might
> be attack vectors by manipulating the traffic. Maybe not be the will
> of the running company, but there are other groups that might have
> interest and the power to intercept traffic and manipulating it. This
> is, of course, also true to current mirror sites, but a centralized
> CDN will be more convenient to such kind of attackers.
Given that we don't use HTTPS today, you don't know if the traffic
is actually going to the mirror you think it's going to, so this isn't
really different from today. With a CDN we could actually push more of
the traffic to HTTPS if we wanted. This isn't feasible with today's
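
To illustrate the integrity point: the client checks each downloaded
package against a hash taken from the signed package list, so tampering
is detected whether it happens at a mirror, at a CDN, or on the wire.
Here is a minimal Python sketch of that check. It assumes the index text
has already passed signature verification (not shown). The function
names and the stanza layout are hypothetical, chosen for illustration,
and this is not how apt itself is implemented.

```python
import hashlib


def parse_sha256_index(packages_text):
    """Build a {filename: sha256 hex digest} map from Packages-style stanzas.

    Assumes stanzas are separated by blank lines and carry "Filename" and
    "SHA256" fields (an assumption made for this sketch).
    """
    index = {}
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines and anything that is not "Key: value".
            if line.startswith((" ", "\t")) or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Filename" in fields and "SHA256" in fields:
            index[fields["Filename"]] = fields["SHA256"].lower()
    return index


def verify_package(data, filename, index):
    """Return True only if the bytes match the hash listed for filename."""
    expected = index.get(filename)
    if expected is None:
        # A file the trusted index does not list is never accepted.
        return False
    return hashlib.sha256(data).hexdigest() == expected
```

A package altered in transit, or one that the index does not list at
all, fails the check no matter which host served it.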
> 3) Surveillance concerns: together with 1) and 2) goes this
> one... Using a CDN would make it easier to secret services to collect
> data, because they have a single point where they can get all wanted
> data from instead of monitoring several providers and connections.
CDNs generally don't have central logging at the request level. There's
just no way for them to do that with the data rates you're looking at.
Also, this can be mitigated by chucking HTTPS at the problem.
> 4) Dependency concerns: as a project Debian should be as independent
> as possible. Using a CDN provider will create a big dependency to a
> specific company, although we might be able to shift companies from
> time to time. Using multiple CDN providers will mitigate that concern
> a little bit, but only to a certain degree. Having too many CDN
> providers will be as difficult to handle as now the many FTP mirror
> donators. So, there's some trade-off anyway.
As I wrote in the initial email: CDNs are becoming a
commodity. Switching from one provider to another isn't hard, and we
already have offers from multiple CDNs, so I'm not particularly worried
about this. Were it harder to switch, it would be different, but
luckily, it's fairly easy.
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are