Bug#843758: ITP: s4cmd -- Super Amazon S3 command line tool
Package: wnpp
Severity: wishlist
Owner: Sascha Steinbiss <satta@debian.org>
* Package name    : s4cmd
  Version         : 2.0.1
  Upstream Author : BloomReach Inc.
* URL             : https://github.com/bloomreach/s4cmd
* License         : Apache
  Programming Lang: Python
  Description     : Super Amazon S3 command line tool
The s4cmd tool is intended as an alternative to s3cmd, offering enhanced
performance, better handling of large files, and a number of additional
features and fixes.
It strives to be compatible with the most common usage scenarios of s3cmd,
but does not offer exact drop-in compatibility, due to a number of corner
cases where different behavior seems preferable, or where bugs have been fixed.
The main features that distinguish s4cmd are:
- Simple (less than 1500 lines of code) and implemented in pure Python,
based on the widely used Boto3 library.
- Multi-threaded/multi-connection implementation for enhanced performance
on all commands. As with many network-intensive applications (like web
browsers), accessing S3 in a single-threaded way is often significantly
less efficient than having multiple connections actively transferring
data at once. In general, this yields roughly a 2x boost to upload/download
speeds (see the boto3 transfer sketch after this list).
- Path handling: S3 is not a traditional filesystem with built-in support
for directory structure: internally, there are only objects, not
directories or folders. However, most people use S3 in a hierarchical
structure, with paths separated by slashes, to emulate traditional
filesystems. S4cmd follows conventions to more closely replicate the
behavior of traditional filesystems in certain corner cases. For example,
"ls" and "cp" work much like in Unix shells, to avoid odd surprises.
- Wildcard support: Wildcards, including multiple levels of wildcards, like
in Unix shells, are handled (see the wildcard sketch after this list).
For example: s3://my-bucket/my-folder/20120512/*/*chunk00?1?
- Automatic retry: failed tasks are re-executed after a delay.
- Multi-part upload support for files larger than 5GB (the boto3 transfer
sketch after this list shows the underlying mechanism).
- Proper handling of MD5 checksums with respect to multi-part uploads.
- Miscellaneous enhancements and bugfixes:
  - Partial file creation: avoid creating empty target files if the source
    does not exist, and avoid leaving partial output files behind when
    commands are interrupted.
  - General thread safety: the tool can be interrupted or killed at any time
    without being blocked by child threads or leaving incomplete or corrupt
    files in place.
  - Ensure the exit code is nonzero in all failure scenarios.
  - Expected handling of symlinks (they are followed).
  - Support for both s3:// and s3n:// prefixes (the latter is common with
    Amazon Elastic MapReduce).
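
To illustrate a few of the points above, here is a minimal sketch of the
multi-connection and multi-part transfer idea using boto3's TransferConfig
(the library the description says s4cmd is based on). The bucket, file name
and tuning values are hypothetical placeholders, not s4cmd internals:

  # Hypothetical sketch: boto3 can split a large upload into parts and push
  # several parts over concurrent connections, which is the mechanism behind
  # the multi-connection and multi-part points above.
  import boto3
  from boto3.s3.transfer import TransferConfig

  s3 = boto3.client("s3")

  config = TransferConfig(
      multipart_threshold=64 * 1024 * 1024,  # switch to multi-part above 64 MB
      multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts
      max_concurrency=8,                     # up to 8 connections in flight
  )

  # "big-file.bin" and "my-bucket" are placeholder names.
  s3.upload_file("big-file.bin", "my-bucket", "my-folder/big-file.bin",
                 Config=config)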
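
The hierarchical view described under "Path handling" can be emulated on top
of flat S3 keys; an "ls"-style command has to do roughly the following
(again a sketch with a placeholder bucket, not s4cmd's actual code):

  # Hypothetical sketch: S3 stores only flat object keys; a directory-like
  # listing falls out of querying with a Prefix and a "/" Delimiter.
  import boto3

  s3 = boto3.client("s3")
  resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="my-folder/",
                            Delimiter="/")

  for prefix in resp.get("CommonPrefixes", []):  # emulated sub-directories
      print(prefix["Prefix"])
  for obj in resp.get("Contents", []):           # objects directly under the prefix
      print(obj["Key"], obj["Size"])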
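
The wildcard behaviour can be approximated by listing keys and filtering them
with shell-style patterns; the sketch below uses Python's fnmatch and the
placeholder bucket and pattern from the example above, and is not s4cmd's
actual implementation:

  # Hypothetical sketch: list keys under a fixed prefix, then filter them with
  # a shell-style pattern (fnmatch's "*" also matches across "/").
  import fnmatch
  import boto3

  s3 = boto3.client("s3")
  pattern = "my-folder/20120512/*/*chunk00?1?"

  paginator = s3.get_paginator("list_objects_v2")
  for page in paginator.paginate(Bucket="my-bucket",
                                 Prefix="my-folder/20120512/"):
      for obj in page.get("Contents", []):
          if fnmatch.fnmatch(obj["Key"], pattern):
              print("s3://my-bucket/" + obj["Key"])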