On 21/11/2013 12:37 AM, Peter Palfrader wrote:
Be aware that snapshot includes the old debian-non-US archive, which contains things we might not be able to export from the US, once it's there. As such, the data should only be outside of the US if we want to link to it freely.
Hm I'd forgotten those bad old days! ;) Where is this source now? Would you prefer to keep it outside of the US? I'm only syncing the first folder "00" (53k files, 84 GB) right now.
I've set a 4-day 'archive to Glacier' policy on this bucket. All objects we transfer in will go from standard S3 (live) to archive storage 4 days after being ingested. This is tunable, but we want older files (less likely to be recalled) to be in the cheaper tier of storage. On demand we can initiate a pullback of those files from archive (3-5 hours to complete), which brings a copy of the file back into "live S3" (using the Reduced Redundancy tier of storage for this duplicate live copy) for a number of days.
It's nice to have the backup, but I'm unsure how to use a backend that has that amount of latency. Ideas welcome.
My thought was that this would eventually go out to a 1 year policy, but in the short term, since everything is being ingested with a date of now(), that would mean none of it lands in the archive tier. Perhaps even a 2 year policy, so that we reduce the number of times we're doing a recall from archive.
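For reference, here is a rough sketch of what the tunable policy and the pullback request look like as data, assuming the dictionary shapes that the S3 lifecycle and restore APIs accept. The rule name, the function names and the "Standard" retrieval tier are my own placeholders, not what is configured on the bucket, and is_archived() only approximates the timing (S3 does the transition on its own schedule).

```python
from datetime import date


def lifecycle_config(days_before_archive, rule_id="archive-to-glacier"):
    """Build a lifecycle configuration that moves objects to Glacier.

    rule_id is a hypothetical name; days_before_archive is the tunable
    (4 today, possibly 365 or 730 later).
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": ""},  # empty prefix: every object in the bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def restore_request(keep_days):
    """Build a request to pull an archived object back for keep_days days."""
    return {
        "Days": keep_days,
        # Assumed retrieval tier; this is the one that takes a few hours.
        "GlacierJobParameters": {"Tier": "Standard"},
    }


def is_archived(ingested_on, today, days_before_archive):
    """Approximate check: has the object aged past the policy threshold?"""
    return (today - ingested_on).days >= days_before_archive
```

With boto3 these would be passed as LifecycleConfiguration= to put_bucket_lifecycle_configuration() and as RestoreRequest= to restore_object(). The is_archived() helper shows why a 1 year policy leaves everything live for now: every object's age counts from its ingest date, not from the original file date.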
* While this first sync is happening, we can look at the Postgres database. AWS announced last week that the managed database environment, RDS (Relational Database Service), now supports Postgres as its 4th engine, with PG 9.3.1. I have brought up one of these instances in US-East ready for this database (snapshot-prod.cjaijq7ayn5u.us-east-1.rds.amazonaws.com:5432).
I wonder if we can replicate to a running postgres instance. If not, we might have to feed it individually, importing the dumps that the master produces. Thoughts?
A dump from the current master would be a good start. What size are they (is it the 2.1 GB file I saw in there)? Peter, would you like the credentials for this DB (also in US-East right now)? If so, can you give me the IPv4 address you'll be accessing it from?
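If we do end up feeding it from dumps, the import could be as simple as the sketch below. It assumes the master produces plain SQL dumps (a custom-format dump would need pg_restore instead), and the database name, user name and dump file name are placeholders I made up; only the host and port are the real RDS endpoint.

```python
def psql_import_args(host, port, dbname, user, dump_path):
    """Build the psql command line that loads a plain SQL dump.

    ON_ERROR_STOP makes psql abort on the first failed statement
    instead of carrying on with a half-loaded database.
    """
    return [
        "psql",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--dbname", dbname,
        "--set", "ON_ERROR_STOP=1",
        "--file", dump_path,
    ]


# Hypothetical database, user and file names, for illustration only.
args = psql_import_args(
    "snapshot-prod.cjaijq7ayn5u.us-east-1.rds.amazonaws.com",
    5432,
    "snapshot",
    "snapshot_admin",
    "snapshot.sql",
)
```

The list can be handed to subprocess.run(args, check=True) from any host whose IPv4 address has been allowed through to the instance.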
Mobile: +61 422 166 708, Email: james_AT_rcpt.to