On 5/12/2013 1:54 AM, Peter Palfrader wrote:
I wonder if we can somehow, somewhere tag files that we got from non-US or archive.d.o (which also covers non-US) and no other tree. Somebody would have to write code for that.
Do you have a record somewhere of where the files originated? We can add arbitrary metadata as headers to objects if we want to. That metadata tag gets served back as headers when you GET/HEAD the object (file).
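As a rough sketch of how that could look with the same boto library used in the attached script (untested; the bucket, key and 'origin' values here are hypothetical), user metadata set at upload time comes back as an x-amz-meta-* header on every subsequent GET/HEAD:

```python
# Sketch, untested: tagging an S3 object with its origin as user metadata.
# The key name and the 'origin' value are hypothetical examples.

def origin_metadata(origin):
    # S3 serves user metadata back as 'x-amz-meta-<name>' response headers
    return {'x-amz-meta-origin': origin}

def upload_with_origin(bucket, key_name, file_name, origin):
    # 'bucket' is a boto Bucket object, as in the attached restore script
    from boto.s3.key import Key
    key = Key(bucket, key_name)
    key.set_metadata('origin', origin)  # stored as x-amz-meta-origin
    key.set_contents_from_filename(file_name)
    return key
```

A later GET or HEAD on the object would then include a header like `x-amz-meta-origin: non-US` in the response, so the provenance travels with the file.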
I wonder if we can replicate to a running postgres instance. If not, we might have to feed it individually, importing the dumps that the master produces. Thoughts?

A dump from the current master would be a good start. What size are they (is it the 2.1 GB file I saw in there)? Peter, would you like the credentials for this DB (also in US-East right now)? If so, can you give me an IPv4 address you'll be accessing it from?

I'm not sure I can make use of DB access right now, thanks. When we still had a mirror at UBC, we used PostgreSQL's DB replication feature to keep that mirror in sync. Is that an option with this instance?
Not right now - the only replication supported is currently wholly within the AWS environment. Multi-AZ is the feature: synchronous block-level replication from a host in one cluster of data centres (an Availability Zone - AZ) to a standby host in a second AZ.
Absolutely - completely your choice.

OTOH, we may not necessarily need a DB at Amazon. It should certainly be possible to separate backend hosts from frontend hosts from database hosts.
See attached for an untested example to 'restore' a given file that has been archived; this is not a working example, just an initial sketch. The user credentials I sent you (offlist, naturally) for data ingest to S3 do not have access to call this restore at this time; we'll set up a separate user with access only to restore (and not ingest) for use in this script - but the concept is simple:
* We'll default to restoring a file to live storage for 14 days (after which the duplicate in live storage is automatically removed)
* We'll limit restores from the archive to 100 files per day
Feel free to edit parameters as above - but hopefully it shows you how this plugs together.
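The per-day cap itself is simple bookkeeping; as an illustration, here is the same logic as a pure in-memory helper (a hypothetical stand-in for the SimpleDB counter the attached script keeps - the class and method names are made up):

```python
import datetime

class RestoreQuota(object):
    """In-memory stand-in for the per-day restore counter kept in SimpleDB."""

    def __init__(self, limit=100):  # 100 restores per day, as above
        self.limit = limit
        self.counts = {}  # datetime.date -> restores started that day

    def allow(self, day=None):
        # True while today's count is still under the limit
        day = day or datetime.date.today()
        return self.counts.get(day, 0) < self.limit

    def record(self, day=None):
        # Call after scheduling a restore to consume one slot
        day = day or datetime.date.today()
        self.counts[day] = self.counts.get(day, 0) + 1
```

Each restore would call allow() before starting and record() after scheduling; the count resets naturally because each date keys its own counter.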
(Going to bed now - 12:30am now here at AWST+0800)
Mobile: +61 422 166 708, Email: james_AT_rcpt.to
#!/usr/bin/python
# vi: ft=python

import datetime
import sys

import boto.sdb
from boto.s3.connection import S3Connection

restore_time_days = 14
restores_per_day = 100
bucket_name = 'aws.snapshot.debian.org'
simpledb_table = 'snapshot.debian.org'

s3_conn = S3Connection('<aws access key>', '<aws secret key>')
bucket = s3_conn.create_bucket(bucket_name)

sdb_conn = boto.sdb.connect_to_region('us-east-1',
        aws_access_key_id='<aws access key>',
        aws_secret_access_key='<aws secret key>')
sdb_conn.create_domain(simpledb_table)

def check_file_is_archived(file_name):
    key = bucket.get_key(file_name)
    return key is not None and key.storage_class == "GLACIER"

def check_file_already_being_restored(file_name):
    return bucket.get_key(file_name).ongoing_restore

def check_daily_restores():
    today = str(datetime.date.today())
    todays_sdb_data = sdb_conn.get_domain(simpledb_table).get_item(today)
    return todays_sdb_data is None or int(todays_sdb_data['count']) < restores_per_day

def restore_file(file_name, days):
    bucket.get_key(file_name).restore(days=days)

def update_daily_restores():
    today = str(datetime.date.today())
    sdb_domain = sdb_conn.get_domain(simpledb_table)
    todays_sdb_data = sdb_domain.get_item(today)
    if todays_sdb_data is None:
        todays_sdb_data = sdb_domain.new_item(today)
        todays_sdb_data['count'] = '1'
    else:
        todays_sdb_data['count'] = str(int(todays_sdb_data['count']) + 1)
    todays_sdb_data.save()

def process(file_name):
    if not check_file_is_archived(file_name):
        return "File is not archived."
    elif check_file_already_being_restored(file_name):
        return "File is already being restored - please wait 3 - 5 hours from the initial restore"
    elif not check_daily_restores():
        return "Too many restores have been done today - come back tomorrow"
    else:
        restore_file(file_name, restore_time_days)
        update_daily_restores()
        return "Your file has been scheduled for restore - please try and access it in 3 - 5 hours"

def main():
    # If this is web accessible, probably drop some HTML in here...
    print process(sys.argv[1])

if __name__ == '__main__':
    main()