
Bug#843758: ITP: s4cmd -- Super Amazon S3 command line tool



Package: wnpp
Severity: wishlist
Owner: Sascha Steinbiss <satta@debian.org>

* Package name    : s4cmd
  Version         : 2.0.1
  Upstream Author : BloomReach Inc.
* URL             : https://github.com/bloomreach/s4cmd
* License         : Apache
  Programming Lang: Python
  Description     : Super Amazon S3 command line tool

The s4cmd tool is intended as an alternative to s3cmd, offering enhanced
performance, better handling of large files, and a number of additional
features and fixes.

It strives to be compatible with the most common usage scenarios of
s3cmd. It is not an exact drop-in replacement, due to a number of corner
cases where different behavior seems preferable, and due to bugfixes.

The main features that distinguish s4cmd are:

- Simple (less than 1500 lines of code) and implemented in pure Python,
  based on the widely used Boto3 library.
- Multi-threaded/multi-connection implementation for enhanced performance
  on all commands. As with many network-intensive applications (such as web
  browsers), accessing S3 in a single-threaded way is often significantly
  less efficient than having multiple connections actively transferring
  data at once. In general this gives roughly a 2x boost to upload/download
  speeds (see the concurrency sketch after this list).
- Path handling: S3 is not a traditional filesystem with built-in support
  for directory structure: internally, there are only objects, not
  directories or folders. However, most people use S3 in a hierarchical
  structure, with paths separated by slashes, to emulate traditional
  filesystems. S4cmd follows conventions to more closely replicate the
  behavior of traditional filesystems in certain corner cases. For example,
  "ls" and "cp" work much like in Unix shells, to avoid odd surprises.
- Wildcard support: wildcards, including multiple levels of wildcards, are
  handled as in Unix shells.
  For example: s3://my-bucket/my-folder/20120512/*/*chunk00?1?
- Automatic retry: failed tasks are retried after a delay.
- Multi-part upload support for files larger than 5 GB (the size limit for
  a single S3 upload).
- Proper handling of MD5 checksums with respect to multi-part uploads (see
  the checksum sketch after this list).
- Miscellaneous enhancements and bugfixes:
  - Partial file creation: avoids creating empty target files if the source
    does not exist, and avoids leaving partial output files behind when
    commands are interrupted.
  - General thread safety: the tool can be interrupted or killed at any time
    without being blocked by child threads or leaving incomplete or corrupt
    files in place.
  - Ensures the exit code is nonzero in all failure scenarios.
  - Expected handling of symlinks (they are followed).
  - Support for both s3:// and s3n:// prefixes (the latter is common with
    Amazon Elastic MapReduce).
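
To illustrate the multi-threading point above, here is a rough Python
sketch (not s4cmd's actual code) that downloads several objects
concurrently with Boto3 and a thread pool; bucket and key names are
placeholders:

  import boto3
  from concurrent.futures import ThreadPoolExecutor

  s3 = boto3.client("s3")

  def download(key):
      # Each thread issues its own GET request; aggregate throughput
      # scales with the number of concurrent connections, up to a point.
      s3.download_file("my-bucket", key, key.rsplit("/", 1)[-1])

  keys = ["my-folder/chunk0011", "my-folder/chunk0012"]
  with ThreadPoolExecutor(max_workers=4) as pool:
      list(pool.map(download, keys))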

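Regarding the MD5 point: for objects uploaded in multiple parts, S3's
ETag is typically not the MD5 of the whole file but is derived from the
per-part MD5 digests plus a part count, so a tool has to recompute it
the same way to verify a transfer. A minimal sketch of that calculation
(assuming a fixed part size; not necessarily s4cmd's exact
implementation):

  import hashlib

  def multipart_etag(path, part_size=8 * 1024 * 1024):
      # Hash each part separately, exactly as it would be uploaded.
      digests = []
      with open(path, "rb") as f:
          while True:
              chunk = f.read(part_size)
              if not chunk:
                  break
              digests.append(hashlib.md5(chunk).digest())
      if len(digests) <= 1:
          # Small files go up in one request; the ETag is the plain MD5.
          return digests[0].hex() if digests else hashlib.md5(b"").hexdigest()
      # Multi-part ETag: MD5 of the concatenated part digests + part count.
      return "%s-%d" % (hashlib.md5(b"".join(digests)).hexdigest(),
                        len(digests))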
