Re: Possibly moving Debian services to a CDN
On 14.10.2013 at 07:29, Tollef Fog Heen <firstname.lastname@example.org> wrote:
>> 1) Privacy concerns: Debian would deliver much more data to business
>> companies than necessary. Keep in mind that personalized data is one
>> of the most valuable things to data miners. Currently I choose one
>> mirror site to pull my packages from. I can freely choose that mirror
>> on basis of location, bandwidth, personal likes or, let's say, privacy
>> reasons because I know that this specific mirror doesn't log my IPs.
>> When using a CDN, at least in that way I understood your proposal, I'm
>> not free to choose anymore. The company running that CDN will obtain
>> all of data like how many machines are behind a subnet or IP, what
>> kind of machines (intel, sparc, powerpc, m68k, ...) and might know if
>> I forget to update a machine (security).
> This is absolutely a valid concern. I have a few mitigation strategies
> and one observation:
> - You can still run your own mirror. We need that ourselves and like I
> wrote in the initial email, we need to find a way that keeps rsync
Yeah, running my own mirror is an option for me. I did run a backports.org mirror in the past and was thinking of expanding it to a full-blown mirror.
But that, of course, is not an option for Joe Average User.
> - You can use an IP anonymizing service such as Tor.
We know that the NSA and GCHQ are running Tor exit nodes. So far they apparently don't have the capacity to track all Tor traffic globally; only single users can be tracked, and only at great cost and effort. But that might just be a matter of time, in a race against the countermeasures from the Tor project.
> - You can use a local proxy that hides the details of how many nodes,
> etc. you have.
There are ways to distinguish nodes/users behind a proxy, for example fingerprinting and latency checks.
Yes, I know it is rather unlikely that someone will do that for Debian updates, but until some weeks ago I couldn't imagine secret services doing a Full Take of intercontinental sea cables like GCHQ is doing. The lesson from that is: if a secret service like the NSA or GCHQ wants to know something, no effort is too big.
All the Debian project can do is drive up the costs of this kind of surveillance. Or, to put it the other way around: Debian should avoid making it easier for them.
> - I would like us to have agreements with any donors that they're not
> allowed to use the information for anything but operational issues. We
> can't tell them not to log (because that's really hard on a technical
> level), but we can restrict what they can do with the logs.
True. You can request agreements, but as the whole NSA affair is showing, they don't matter when it comes down to NSA & Co. There are secret courts with secret decisions, and National Security Letters for silencing the providers, even though international agreements like Safe Harbor do exist.
So, while agreements can be made, there will be no way for Debian to check whether they are being honored or not.
> The observation is that we currently don't have any such control over
> mirror operators. They are, AFAIK, free to use whatever information
> they collect for whatever purpose they would like.
Granted. Maybe that's something Debian can address as well in the future.
But having many mirror operators results in:
- higher "costs" for controlling them
- each mirror operator only sees its own traffic
- each mirror site will be subject to the specific law in that country (higher data protection level in Germany for example)
Well, I think you got the point already... ;-)
>> 2) Integrity concerns: although Debian uses signed package lists and
>> hashed packages, using a CDN would raise the chances that there might
>> be attack vectors by manipulating the traffic. Maybe not be the will
>> of the running company, but there are other groups that might have
>> interest and the power to intercept traffic and manipulating it. This
>> is, of course, also true to current mirror sites, but a centralized
>> CDN will be more convenient to such kind of attackers.
> Given we don't use HTTPs and such today, you don't know if the traffic
> is actually going to the mirror you think it's going to, so this isn't
> really different from today. With a CDN we could actually push more of
> the traffic to HTTPS if we wanted. This isn't feasible with today's
> mirror network.
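As a side note to the "signed package lists and hashed packages" in 2): here is a minimal sketch of the hash check a client can do, no matter whether the file came from a mirror or a CDN. It assumes a Packages-style index with Filename and SHA256 fields; the function names are made up for illustration, this is not how apt implements it, and it does not cover the signature check on the Release file that makes the index itself trustworthy.

```python
import hashlib


def parse_sha256_entries(packages_text):
    """Map Filename -> SHA256 from a Packages-style index.

    Assumes stanzas are separated by blank lines and that each stanza
    carries 'Filename' and 'SHA256' fields (illustrative parser only).
    """
    entries = {}
    for stanza in packages_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Skip continuation lines (e.g. long descriptions).
            if line.startswith((" ", "\t")) or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Filename" in fields and "SHA256" in fields:
            entries[fields["Filename"]] = fields["SHA256"].lower()
    return entries


def package_is_intact(data, filename, entries):
    """Return True only if the downloaded bytes match the listed hash."""
    expected = entries.get(filename)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected
```

A check like this catches manipulated package files, but it says nothing about who can observe the download, which is what my concerns 1) and 3) are about.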
That's a valid point, thanks! The use of HTTPS should be encouraged, of course. But how would HTTPS with a CDN work? I assume that the CDN provider will use some kind of SSL proxy or SSL interception technique. Otherwise you would have the same problems managing HTTPS as with the current mirror network.
There are probably two possible ways:
a) The CDN provides an HTTPS entry point, but connects to the underlying mirror by plain HTTP.
b) The CDN uses DPI and SSL interception to break end-to-end encryption.
For example, Cisco WAAS is a nice and decent way to minimize TCP traffic at a low protocol layer, but WAAS cannot handle SSL connections for obvious reasons, so SSL connections will bypass the cache. OTOH, Cisco DPI will break SSL connections by intercepting them, and the user cannot verify the validity of the SSL certs himself. Those are two common problems with this kind of network "enhancement". I expect similar problems with CDN providers.
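To make the interception problem a bit more concrete: one way a user could notice that a different certificate is being presented is to compare its fingerprint against a value obtained out of band. The following is only a sketch under that assumption; the function names are hypothetical, and ssl.get_server_certificate() merely fetches the certificate without validating the chain.

```python
import hashlib
import hmac
import ssl


def cert_fingerprint(der_bytes):
    """SHA-256 fingerprint (lowercase hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def matches_pin(der_bytes, pinned_hex):
    """Compare a certificate against a fingerprint obtained out of band.

    Accepts pins written with or without colons, in any letter case.
    """
    normalized = pinned_hex.replace(":", "").lower()
    return hmac.compare_digest(cert_fingerprint(der_bytes), normalized)


def fetch_server_cert_der(host, port=443):
    """Fetch the certificate a server presents (no chain validation here)."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

Note that with a CDN terminating HTTPS, the certificate you see is the one deployed at the CDN edge, so a matching pin only tells you who terminates the connection. It does not restore end-to-end encryption to the origin mirror.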
>> 3) Surveillance concerns: together with 1) and 2) goes this
>> one... Using a CDN would make it easier to secret services to collect
>> data, because they have a single point where they can get all wanted
>> data from instead of monitoring several providers and connections.
> CDNs generally don't have central logging at the request level. There's
> just no way for them to do that with the data rates you're looking at.
> Also, can be mitigated with chucking HTTPS at the problem.
Well, as said above: if you had told me some months ago that the traffic of several sea cables would be stored in GCHQ's data centers for about 3 days in a Full Take approach, I would not have believed it. ;)
The problem is: the more you centralize the service, the easier it becomes to surveil that traffic, no matter how big the traffic is. Simply because you only have to deal with a handful of companies.
If there is one thing to learn from Edward Snowden's disclosures, it is that centralized services are easier to control (for NSA & Co) and that decentralized and small services are better for you and your privacy.
>> 4) Dependency concerns: as a project Debian should be as independent
>> as possible. Using a CDN provider will create a big dependency to a
>> specific company, although we might be able to shift companies from
>> time to time. Using multiple CDN providers will mitigate that concern
>> a little bit, but only to a certain degree. Having too many CDN
>> providers will be as difficult to handle as now the many FTP mirror
>> donators. So, there's some trade-off anyway.
> As I wrote in the initial email: CDNs are becoming a
> commodity. Switching from one provider to another isn't hard, and we
> already have offers from multiple CDNs, so I'm not particularly worried
> about this. Were it harder to switch, it would be different, but
> luckily, it's fairly easy.
Yeah, and Cloud computing is becoming a commodity as well. And you might have guessed it: I'm no friend of Cloud computing either. ;-)
Anyway, I think the discussion about using a CDN is not about technical aspects. It's a political debate that needs to be held, and finally a political decision has to be made whether Debian as a Free/Libre Software project/distribution wants to use a CDN and accept the risks that come with it or not.
Personally, I believe that using a CDN would make life easier for DSA (you wrote in a different mail today that the current CDN breaks on a weekly basis. Can you elaborate on this, maybe on wiki.d.o?) and it might be easier for users. OTOH, I have great privacy concerns about using a CDN. And if the current mirror network will still be maintained, where is the benefit for DSA and the users at all? Having freedom of choice is always good, so I'd be fine with keeping the current mirror network and having a cdn.debian.org in parallel. When doing fresh installations, people should be made aware of the privacy concerns of using the CDN (like: "Using a CDN might be easier and faster for you, but Debian doesn't control the CDN and cannot guarantee privacy and data protection").
Ciao... // Fon: 0381-2744150
Ingo \X/ http://blog.windfluechter.net
gpg pubkey: http://www.juergensmann.de/ij_public_key.asc