Bug#950369: buster-pu: package python-internetarchive/1.8.1-1
Package: release.debian.org
Severity: normal
Tags: buster
User: release.debian.org@packages.debian.org
Usertags: pu
The version of "internetarchive" in Debian has some serious
scalability and reliability issues. In particular, it fails with
"Too many open files" when uploading a folder with more than 1024
files (#950289), and upstream has also made a few other changes which
we might want to merge in. We're shipping a modified version of 1.8.1
in buster, and upstream has published several releases since then, up
to 1.8.5 and 1.9.0, the latter of which is in Debian.
I just uploaded a patch for #950289 to unstable. I'm hoping this patch
can also be shipped in stable, but I wonder whether we would be better
off syncing the entire package with upstream instead, either to 1.8.5
or even to what will become 1.9.1 once the fix for #950289 is released
upstream.
In the meantime, I'm including the debdiff for a hotfix on 1.8.1. I'm
also including a debdiff that upgrades to 1.8.5 *and* applies the
hotfix, in case that would also be acceptable. Finally, I'd be happy
to wait a little longer and coordinate with upstream to get 1.9.1
synchronized in all suites.
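
For reviewers who want the gist of the hotfix without reading the
debdiff: it closes each file right after its checksum is computed,
instead of leaving the file handle open until garbage collection. The
sketch below only illustrates that pattern. The names md5_of_path and
count_unique_files are hypothetical and are not the package's actual
get_md5 or recursive_file_count.

```python
import hashlib


def md5_of_path(path, chunk_size=1 << 20):
    """Return the hex MD5 of the file at `path` (hypothetical helper).

    The `with` block closes the file descriptor as soon as hashing is
    done, so hashing many files in a loop never holds more than one
    descriptor open at a time.
    """
    digest = hashlib.md5()
    with open(path, 'rb') as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def count_unique_files(paths):
    """Count files with distinct contents (hypothetical helper)."""
    seen = set()
    for path in paths:
        seen.add(md5_of_path(path))
    return len(seen)
```

With this pattern the number of open descriptors stays constant no
matter how many files are in the folder, which is why the 1024 limit
from "ulimit -n" should no longer be reached.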
Thanks for your work!
a.
-- System Information:
Debian Release: 10.2
APT prefers stable-debug
APT policy: (500, 'stable-debug'), (500, 'stable'), (1, 'experimental'), (1, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 4.19.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=fr_CA.UTF-8, LC_CTYPE=fr_CA.UTF-8 (charmap=UTF-8), LANGUAGE=fr_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
diff -Nru python-internetarchive-1.8.1/debian/changelog python-internetarchive-1.8.1/debian/changelog
--- python-internetarchive-1.8.1/debian/changelog 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.1/debian/changelog 2020-01-31 15:00:57.000000000 -0500
@@ -1,3 +1,9 @@
+python-internetarchive (1.8.1-1+deb10u1) buster; urgency=medium
+
+ * hotfix: close file after getting md5 (Closes: #950289)
+
+ -- Antoine Beaupré <anarcat@debian.org> Fri, 31 Jan 2020 15:00:57 -0500
+
python-internetarchive (1.8.1-1) unstable; urgency=low
* Package internetarchive library for Debian (Closes: #909550)
diff -Nru python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch
--- python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch 1969-12-31 19:00:00.000000000 -0500
+++ python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch 2020-01-31 15:00:57.000000000 -0500
@@ -0,0 +1,55 @@
+From 086e2e65fc840fd827b02e1022fad084ee700d7c Mon Sep 17 00:00:00 2001
+From: kpcyrd <git@rxv.cc>
+Date: Fri, 31 Jan 2020 14:53:05 -0500
+Subject: [PATCH] close file after getting md5
+
+I've tried to upload to archive.org and noticed ia crashes on
+large folders.
+
+ $ ulimit -n
+ 1024
+ $ ia upload asdf ./folder-with-more-than-1024-files/
+ [...]
+ OSError: [Errno 24] Too many open files
+ [...]
+ $
+
+The bug is present in src:python-internetarchive, I found a patch that
+resolves the issue from 2018 that was never applied. You can find a
+patch that cleanly applies to the current debian/sid below. The original
+author is github.com/Arkiver2.
+
+Upstream patch:
+https://github.com/jjjake/internetarchive/commit/4e4120f07c98ea98c61791293835df2797bfee61
+
+Debian Bug: #950289
+---
+ internetarchive/utils.py | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/internetarchive/utils.py b/internetarchive/utils.py
+index db8412a..2f3e04e 100644
+--- a/internetarchive/utils.py
++++ b/internetarchive/utils.py
+@@ -235,14 +235,16 @@ def recursive_file_count(files, item=None, checksum=False):
+ is_dir = False
+ if is_dir:
+ for x, _ in iter_directory(f):
+- lmd5 = get_md5(open(x, 'rb'))
++ with open(x, 'rb') as f_:
++ lmd5 = get_md5(f_)
+ if lmd5 in md5s:
+ continue
+ else:
+ total_files += 1
+ else:
+ try:
+- lmd5 = get_md5(open(f, 'rb'))
++ with open(f, 'rb') as f_:
++ lmd5 = get_md5(f_)
+ except TypeError:
+ # Support file-like objects.
+ lmd5 = get_md5(f)
+--
+2.20.1
+
diff -Nru python-internetarchive-1.8.1/debian/patches/series python-internetarchive-1.8.1/debian/patches/series
--- python-internetarchive-1.8.1/debian/patches/series 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.1/debian/patches/series 2020-01-31 15:00:57.000000000 -0500
@@ -1 +1,2 @@
0001-v1.8.1.patch
+0001-close-file-after-getting-md5.patch
diff -Nru python-internetarchive-1.8.1/debian/changelog python-internetarchive-1.8.5/debian/changelog
--- python-internetarchive-1.8.1/debian/changelog 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.5/debian/changelog 2020-01-31 15:00:57.000000000 -0500
@@ -1,3 +1,20 @@
+python-internetarchive (1.8.5-1+deb10u1) buster; urgency=medium
+
+ * hotfix: close file after getting md5 (Closes: #950289)
+
+ -- Antoine Beaupré <anarcat@debian.org> Fri, 31 Jan 2020 15:00:57 -0500
+
+python-internetarchive (1.8.5-1) unstable; urgency=medium
+
+ [ Ondřej Nový ]
+ * Use debhelper-compat instead of debian/compat.
+
+ [ Antoine Beaupré]
+ * new upstream release (Closes: #922357)
+ * remove patches merged upstream
+
+ -- Antoine Beaupré <anarcat@debian.org> Tue, 15 Oct 2019 20:29:12 -0400
+
python-internetarchive (1.8.1-1) unstable; urgency=low
* Package internetarchive library for Debian (Closes: #909550)
diff -Nru python-internetarchive-1.8.1/debian/compat python-internetarchive-1.8.5/debian/compat
--- python-internetarchive-1.8.1/debian/compat 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.5/debian/compat 1969-12-31 19:00:00.000000000 -0500
@@ -1 +0,0 @@
-11
diff -Nru python-internetarchive-1.8.1/debian/control python-internetarchive-1.8.5/debian/control
--- python-internetarchive-1.8.1/debian/control 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.5/debian/control 2020-01-31 15:00:57.000000000 -0500
@@ -2,7 +2,7 @@
Section: python
Priority: optional
Maintainer: Antoine Beaupré <anarcat@debian.org>
-Build-Depends: debhelper (>= 11~), dh-python,
+Build-Depends: debhelper-compat (= 11), dh-python,
python3-all,
python3-clint,
python3-docopt,
diff -Nru python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch python-internetarchive-1.8.5/debian/patches/0001-close-file-after-getting-md5.patch
--- python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch 1969-12-31 19:00:00.000000000 -0500
+++ python-internetarchive-1.8.5/debian/patches/0001-close-file-after-getting-md5.patch 2020-01-31 15:00:57.000000000 -0500
@@ -0,0 +1,55 @@
+From 086e2e65fc840fd827b02e1022fad084ee700d7c Mon Sep 17 00:00:00 2001
+From: kpcyrd <git@rxv.cc>
+Date: Fri, 31 Jan 2020 14:53:05 -0500
+Subject: [PATCH] close file after getting md5
+
+I've tried to upload to archive.org and noticed ia crashes on
+large folders.
+
+ $ ulimit -n
+ 1024
+ $ ia upload asdf ./folder-with-more-than-1024-files/
+ [...]
+ OSError: [Errno 24] Too many open files
+ [...]
+ $
+
+The bug is present in src:python-internetarchive, I found a patch that
+resolves the issue from 2018 that was never applied. You can find a
+patch that cleanly applies to the current debian/sid below. The original
+author is github.com/Arkiver2.
+
+Upstream patch:
+https://github.com/jjjake/internetarchive/commit/4e4120f07c98ea98c61791293835df2797bfee61
+
+Debian Bug: #950289
+---
+ internetarchive/utils.py | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/internetarchive/utils.py b/internetarchive/utils.py
+index db8412a..2f3e04e 100644
+--- a/internetarchive/utils.py
++++ b/internetarchive/utils.py
+@@ -235,14 +235,16 @@ def recursive_file_count(files, item=None, checksum=False):
+ is_dir = False
+ if is_dir:
+ for x, _ in iter_directory(f):
+- lmd5 = get_md5(open(x, 'rb'))
++ with open(x, 'rb') as f_:
++ lmd5 = get_md5(f_)
+ if lmd5 in md5s:
+ continue
+ else:
+ total_files += 1
+ else:
+ try:
+- lmd5 = get_md5(open(f, 'rb'))
++ with open(f, 'rb') as f_:
++ lmd5 = get_md5(f_)
+ except TypeError:
+ # Support file-like objects.
+ lmd5 = get_md5(f)
+--
+2.20.1
+
diff -Nru python-internetarchive-1.8.1/debian/patches/0001-v1.8.1.patch python-internetarchive-1.8.5/debian/patches/0001-v1.8.1.patch
--- python-internetarchive-1.8.1/debian/patches/0001-v1.8.1.patch 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.5/debian/patches/0001-v1.8.1.patch 1969-12-31 19:00:00.000000000 -0500
@@ -1,46 +0,0 @@
-Forwarded: https://github.com/jjjake/internetarchive/issues/271
-Origin: upstream
-From eb4d1d7821b20368ac3e43836062a74ad960baf9 Mon Sep 17 00:00:00 2001
-From: jake <jake@archive.org>
-Date: Mon, 2 Jul 2018 11:01:11 -0700
-Subject: [PATCH] v1.8.1
-
----
- HISTORY.rst | 7 +++++++
- internetarchive/__init__.py | 2 +-
- 2 files changed, 8 insertions(+), 1 deletion(-)
-
-diff --git a/HISTORY.rst b/HISTORY.rst
-index c064b68..415ceb1 100644
---- a/HISTORY.rst
-+++ b/HISTORY.rst
-@@ -3,6 +3,13 @@
- Release History
- ---------------
-
-+1.8.1 (2018-06-28)
-+++++++++++++++++++
-+
-+**Bugfixes**
-+
-+- Fixed bug in ``ia tasks --get-task-log`` that was returning an unable to parse JSON error.
-+
- 1.8.0 (2018-06-28)
- ++++++++++++++++++
-
-diff --git a/internetarchive/__init__.py b/internetarchive/__init__.py
-index 4aa5d2d..0b81dd5 100644
---- a/internetarchive/__init__.py
-+++ b/internetarchive/__init__.py
-@@ -37,7 +37,7 @@
- from __future__ import absolute_import
-
- __title__ = 'internetarchive'
--__version__ = '1.8.0'
-+__version__ = '1.8.1'
- __author__ = 'Jacob M. Johnson'
- __license__ = 'AGPL 3'
- __copyright__ = 'Copyright (C) 2012-2017 Internet Archive'
---
-2.19.0
-
diff -Nru python-internetarchive-1.8.1/debian/patches/series python-internetarchive-1.8.5/debian/patches/series
--- python-internetarchive-1.8.1/debian/patches/series 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.5/debian/patches/series 2020-01-31 15:00:57.000000000 -0500
@@ -1 +1 @@
-0001-v1.8.1.patch
+0001-close-file-after-getting-md5.patch
diff -Nru python-internetarchive-1.8.1/docs/source/api.rst python-internetarchive-1.8.5/docs/source/api.rst
--- python-internetarchive-1.8.1/docs/source/api.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/docs/source/api.rst 2019-06-07 17:28:42.000000000 -0400
@@ -117,7 +117,7 @@
Item Objects
------------
-:class:`Item` objects represent `Internet Archive items <items.html>`_.
+:class:`Item` objects represent `Internet Archive items <//archive.org/services/docs/api/items.html>`_.
From the :class:`Item` object you can create new items, upload files to existing items, read and write metadata, and download or delete files.
.. autofunction:: get_item
@@ -137,7 +137,7 @@
The item will automatically be created if it does not exist.
-Refer to `archive.org Identifiers <metadata.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers.
+Refer to `archive.org Identifiers <//archive.org/services/docs/api/metadata-schema/index.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers.
Setting Remote Filenames
^^^^^^^^^^^^^^^^^^^^^^^^
diff -Nru python-internetarchive-1.8.1/docs/source/cli.rst python-internetarchive-1.8.5/docs/source/cli.rst
--- python-internetarchive-1.8.1/docs/source/cli.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/docs/source/cli.rst 2019-06-07 17:28:42.000000000 -0400
@@ -64,7 +64,7 @@
Modifying Metadata
^^^^^^^^^^^^^^^^^^
-Once ``ia`` has been `configured <quickstart.html#configuring>`_, you can modify metadata:
+Once ``ia`` has been `configured <quickstart.html#configuring>`_, you can modify `metadata <//archive.org/services/docs/api/metadata-schema>`_:
.. code:: bash
@@ -115,7 +115,7 @@
This would remove ``another subject`` from the items subject field, regardless of whether or not the field is a single or multi-value field.
-Refer to `Internet Archive Metadata <metadata.html>`_ for more specific details regarding metadata and archive.org.
+Refer to `Internet Archive Metadata <//archive.org/services/docs/api/metadata-schema/index.html>`_ for more specific details regarding metadata and archive.org.
Modifying Metadata in Bulk
@@ -142,7 +142,7 @@
$ ia upload <identifier> file1 file2 --metadata="mediatype:texts" --metadata="blah:arg"
-Please note that, unless specified otherwise, items will be uploaded with a ``data`` mediatype. **This cannot be changed afterwards.** Therefore, you should specify a mediatype when uploading, eg. ``--metadata="mediatype:movies"``
+.. warning:: Please note that, unless specified otherwise, items will be uploaded with a ``data`` mediatype. **This cannot be changed afterwards.** Therefore, you should specify a mediatype when uploading, eg. ``--metadata="mediatype:movies"``. Similarly, if you want your upload to end up somewhere else than the default collection (currently `community texts <//archive.org/details/opensource>`_), you should also specify a collection with ``--metadata="collection:foo"``. See `metadata documentation <//archive.org/services/docs/api/metadata-schema>`_ for more information.
You can upload files from ``stdin``:
@@ -163,8 +163,8 @@
These files can be deleted like normal files.
You can also prevent the backup from happening on clobbers by adding ``-H x-archive-keep-old-version:0`` to your command.
-Refer to `archive.org Identifiers <metadata.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers.
-Please also read the `Internet Archive Items <items.html>`_ page before getting started.
+Refer to `archive.org Identifiers <//archive.org/services/docs/api/metadata-schema/index.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers.
+Please also read the `Internet Archive Items <//archive.org/services/docs/api/items.html>`_ page before getting started.
Bulk Uploading
^^^^^^^^^^^^^^
diff -Nru python-internetarchive-1.8.1/docs/source/index.rst python-internetarchive-1.8.5/docs/source/index.rst
--- python-internetarchive-1.8.1/docs/source/index.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/docs/source/index.rst 2019-06-07 17:28:42.000000000 -0400
@@ -28,8 +28,6 @@
installation
quickstart
cli
- items
- metadata
api
updates
troubleshooting
diff -Nru python-internetarchive-1.8.1/docs/source/items.rst python-internetarchive-1.8.5/docs/source/items.rst
--- python-internetarchive-1.8.1/docs/source/items.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/docs/source/items.rst 1969-12-31 19:00:00.000000000 -0500
@@ -1,74 +0,0 @@
-Internet Archive Items
-======================
-
-What Is an Item?
-----------------
-
-Archive.org is made up of "items".
-An item is a logical "thing" that we represent on one web page on archive.org.
-An item can be considered as a group of files that deserve their own metadata.
-If the files in an item have separate metadata, the files should probably be in different items.
-An item can be a book, a song, an album, a dataset, a movie, an image or set of images, etc.
-Every item has an `identifier <metadata.html#archive-org-identifiers>`_ that is unique across archive.org.
-
-How Items Are Structured
-------------------------
-
-An item is just a directory of files and possibly subdirectories.
-Every item has at least two files named in the following format (see `metadata page <metadata.html#archive-org-identifiers>`_ for more context on what an identifier is):
-
- - ``<identifier>_files.xml``
- - ``<identifier>_meta.xml``
-
-The ``_meta.xml`` file is an XML file containing all of the `metadata describing the item <metadata.html>`_.
-The ``_files.xml`` file is an XML file containing all of the file-level metadata.
-There can only be one ``_meta.xml`` file and one ``_files.xml`` file per item.
-
-Alongside these metadata files and the original files uploaded to the item, the item may also contain `derivative files automatically generated by archive.org <https://archive.org/help/derivatives.php>`_.
-
-Item Limitations
-----------------
-
-As a rule of thumb, items should:
-
- - **not** be over 100GB
- - **not** contain more than 10,000 files.
-
-Collections
------------
-
-All items must be part of a collection.
-A collection is simply an item with special characteristics.
-Besides an image file for the collection logo, files should **never** be uploaded directly to a collection item.
-Items can be assigned to a collection at the time of creation, or after the item has been created by modifying the ``collection`` element in an item's metadata to contain the identifier for the given collection (i.e. ``ia metadata <identifier> -m collection:<collection-identifier>``.
-Currently collections can only be created by archive.org staff.
-Please contact `info@archive.org <mailto:info@archive.org>`_ if you need a collection.
-
-Archival URLs
--------------
-
-An item's "details" page will always be available at::
-
- https://archive.org/details/<identifier>
-
-The item directory is always available at::
-
- https://archive.org/download/<identifier>
-
-A particular file can always be downloaded from::
-
- https://archive.org/download/<identifier>/<filename>
-
-**Note**: Archival URLs may redirect to an actual server that contains the content.
-The resultant URL is **not** a permalink.
-For example, the archival URL::
-
- https://archive.org/download/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml
-
-currently redirects to::
-
- https://ia802304.us.archive.org/30/items/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml
-
-**DO NOT LINK** to any archive.org URL that begins with numbers like this.
-This refers to the particular machine that we're serving the file from right now, but we move items to new servers all the time.
-If you link to this sort of URL, instead of the archival URL, your link **WILL** break at some point.
diff -Nru python-internetarchive-1.8.1/docs/source/jq.rst python-internetarchive-1.8.5/docs/source/jq.rst
--- python-internetarchive-1.8.1/docs/source/jq.rst 1969-12-31 19:00:00.000000000 -0500
+++ python-internetarchive-1.8.5/docs/source/jq.rst 2019-06-07 17:28:42.000000000 -0400
@@ -0,0 +1,344 @@
+.. _jq:
+
+Using jq with ia
+================
+
+`jq <https://stedolan.github.io/jq/>`_ is a lightweight and flexible command-line JSON processor.
+It's a great tool for processing the JSON output of ``ia``.
+This document will go over how to install or download ``jq`` and how to use it with ``ia``.
+
+If you have a tip you'd like to add to this page, please email `jake@archive.org <mailto:jake@archive.org>`_ or send a pull request.
+If you're unable to figure out a ``jq`` command to do what you need and don't see it on this page, please email `jake@archive.org <mailto:jake@archive.org>`_ for help.
+
+Installation
+------------
+
+Downloading a binary
+^^^^^^^^^^^^^^^^^^^^
+
+The easiest way to get started with ``jq`` is to download a binary.
+Binaries for Linux, OS X, and Windows are available at `https://stedolan.github.io/jq/download/ <https://stedolan.github.io/jq/download/>`_.
+Once you find the binary for your OS, you could right-click the hypertext and copy the link to the binary.
+Then you could paste it into your terminal and download it like so:
+
+.. code:: bash
+
+ $ curl -Ls https://github.com/stedolan/jq/releases/download/jq-1.5/jq-osx-amd64 > jq
+ $ chmod +x jq # make it executable
+
+To confirm it's working, simply run the following.
+You should see the help page.
+
+.. code:: bash
+
+ $ ./jq
+ jq - commandline JSON processor [version 1.5]
+ Usage: ./jq [options] <jq filter> [file...]
+
+ jq is a tool for processing JSON inputs, applying the
+ given filter to its JSON text inputs and producing the
+ filter's results as JSON on standard output.
+ The simplest filter is ., which is the identity filter,
+ copying jq's input to its output unmodified (except for
+ formatting).
+ For more advanced filters see the jq(1) manpage ("man jq")
+ and/or https://stedolan.github.io/jq
+
+ Some of the options include:
+ -c compact instead of pretty-printed output;
+ -n use `null` as the single input value;
+ -e set the exit status code based on the output;
+ -s read (slurp) all inputs into an array; apply filter to it;
+ -r output raw strings, not JSON texts;
+ -R read raw strings, not JSON texts;
+ -C colorize JSON;
+ -M monochrome (don't colorize JSON);
+ -S sort keys of objects on output;
+ --tab use tabs for indentation;
+ --arg a v set variable $a to value <v>;
+ --argjson a v set variable $a to JSON value <v>;
+ --slurpfile a f set variable $a to an array of JSON texts read from <f>;
+ See the manpage for more options.
+
+Just like the ``ia`` binary, downloading the ``jq`` binary does not install it to your system.
+It's simply an executable binary.
+To use it, you'll have to use either a relative or absolute path. For example:
+
+.. code:: bash
+
+ $ ~/jq --help
+ $ ./jq --help
+ $ /Users/jake/jq --help
+
+Installing with a package manager
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``jq`` can also be installed with most popular package managers:
+
+.. code:: bash
+
+ # Linux
+ $ sudo apt-get install jq
+
+ # OS X
+ $ brew install jq
+
+ # FreeBSD
+ $ pkg install jq
+
+ # Solaris
+ $ pkgutil -i jq
+
+ # Windows
+ $ chocolately install jq
+
+Please refer to `https://stedolan.github.io/jq/download/ <https://stedolan.github.io/jq/download/>`_ for more details.
+
+
+
+Getting started
+---------------
+
+``jq`` can seem a bit overwhelming at first, so let's get started with some basic examples.
+A good way to make sense of how you can access a specific metadata field is to use ``jq 'keys'``.
+This will show you the top-level keys that exist in the JSON document.
+
+.. code:: bash
+
+ $ ia metadata nasa | jq 'keys'
+ [
+ "created",
+ "d1",
+ "d2",
+ "dir",
+ "files",
+ "files_count",
+ "is_collection",
+ "item_size",
+ "metadata",
+ "reviews",
+ "server",
+ "uniq",
+ "workable_servers"
+ ]
+
+To access the value of a given key, you can simply do:
+
+.. code:: bash
+
+ $ ia metadata nasa | jq '.files_count'
+ 8
+
+As you can see, the command above returns the value for the ``files_count`` key.
+There are 8 files in the item.
+
+When working with ``ia metadata`` the ``metadata`` and ``files`` keys are likely to be the targets you'll want to access most.
+Let's take a look at ``metadata``:
+
+.. code:: bash
+
+ $ ia metadata | jq '.metadata | keys'
+ [
+ "addeddate",
+ "backup_location",
+ "collection",
+ "description",
+ "hidden",
+ "homepage",
+ "identifier",
+ "mediatype",
+ "num_recent_reviews",
+ "num_subcollections",
+ "num_top_dl",
+ "publicdate",
+ "related_collection",
+ "rights",
+ "show_browse_by_date",
+ "show_hidden_subcollections",
+ "show_search_by_year",
+ "spotlight_identifier",
+ "title",
+ "updatedate",
+ "updater",
+ "uploader"
+ ]
+
+As you might notice, this is all of the item-level metadata (i.e. the JSON equivalent of an item's ``<identifier>_meta.xml`` file).
+We can decend deeper into the JSON document like so:
+
+.. code:: bash
+
+ $ ia metadata nasa | jq '.metadata.title'
+ "NASA Images"
+
+``jq`` returns JSON by default.
+In this case, a quoted string.
+To access the raw value, you can use the ``-r`` option:
+
+.. code:: bash
+
+ $ ia metadata nasa | jq -r '.metadata.title'
+ NASA Images
+
+Search
+------
+
+``ia search`` outputs JSONL.
+JSONL is series of JSON documents separated by a newline.
+In this case, one JSON document is returned per search document reutrned.
+
+
+Converting search results to CSV and other formats
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``jq`` can be used to parse the JSON returned by ``ia search`` into CSV or TSV files:
+
+.. code:: bash
+
+ $ ia search 'identifier:nasa OR identifier:stairs' --field title,date,subject | jq -r '[.identifier, .title, .date, .subject] | @csv'
+ "nasa","NASA Images",,
+ "stairs","stairs where i worked","2004-01-01T00:00:00Z","test"
+
+If you'd prefer a tab-separated spreadsheet, you can replace ``@csv`` with ``@tsv`` in the command above.
+More options can be found in the *Format strings and escaping* section in the `jq manual <https://stedolan.github.io/jq/manual/>`_.
+
+Catalog
+-------
+
+Get info on all of your IA-S3 tasks:
+
+.. code:: bash
+
+ $ ia tasks --json | jq 'select(.args.comment == "s3-put")'
+
+Or, output a link to the tasklog for each S3 task you currently have queued or running:
+
+.. code:: bash
+
+ $ ia tasks nasa --json \
+ | jq -r 'select(.args.comment == "s3-put") | "https://archive.org/log/\(.task_id)"'
+ https://archive.org/log/469558161
+ https://archive.org/log/400818482
+
+Get the identifiers for all of your redrows:
+
+.. code:: bash
+
+ $ ia tasks --json | jq -r 'select(.row_type == "red").identifier'
+
+TODO
+____
+
+Recipes to document, work in progress...
+
+
+Select files of a specific format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: bash
+
+ $ ia metadata nasa | jq '.files[] | select(.format == "JPEG")'
+ {
+ "name": "globe_west_540.jpg",
+ "source": "original",
+ "size": "66065",
+ "format": "JPEG",
+ "mtime": "1245274910",
+ "md5": "9366a4b09386bf673c447e33d806d904",
+ "crc32": "2283b5fd",
+ "sha1": "3e20a009994405f535cdf07cdc2974cef2fce8f2",
+ "rotation": "0"
+ }
+
+Select a file by name
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: bash
+
+ $ ia metadata nasa | jq '.files[] | select(.name == "nasa_meta.xml")'
+ {
+ "name": "nasa_meta.xml",
+ "source": "metadata",
+ "size": "7968",
+ "format": "Metadata",
+ "mtime": "1530756295",
+ "md5": "06cd95343d60df0f10fb8518b349a795",
+ "crc32": "6b9c6e24",
+ "sha1": "c0dc994eeba245671ef53e2f6c52612722bf51d3"
+ }
+
+
+Get the size of a collection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: bash
+
+ » ia search 'collection:georgeblood' -f item_size | jq '.item_size' | paste -sd+ - | bc
+ 51677834206186
+
+Getting checksums for all files in an item
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: bash
+
+ $ ia metadata nasa | jq -r '.metadata.identifier as $id | .files[] | [$id, .name, .md5] | @tsv'
+ nasa NASAarchiveLogo.jpg 64dcc1092df36142eb4aab7cc255a4a6
+ nasa __ia_thumb.jpg c354f821954f80516d163c23135e7dd7
+ nasa globe_west_540.jpg 9366a4b09386bf673c447e33d806d904
+ nasa globe_west_540_thumb.jpg d3dab682c56058c8af0df5a2073b1dd1
+ nasa nasa_archive.torrent 70a7b2b44c318bac381c25febca3b2ca
+ nasa nasa_files.xml 5b8a61ea930ce04d093deebe260fd5f8
+ nasa nasa_meta.xml 06cd95343d60df0f10fb8518b349a795
+ nasa nasa_reviews.xml 711ba65d49383a25657640716c45e840
+
+Creating histograms
+^^^^^^^^^^^^^^^^^^^
+
+This example creates a histogram of publisher's grouped by item_size.
+
+.. code:: bash
+
+ » ia search 'collection:georgeblood' -f publisher,item_size \
+ | jq -r '"\(.publisher) \(.item_size)"' \
+ | awk '{arr[$1]+=$2} END {for (i in arr) {print i,arr[i]}}' \
+ | sort -rn -k2 \
+ | head
+ Decca 9518737758200
+ Victor 8067854677756
+ Columbia 7221975357654
+ Capitol 1944338651172
+ Brunswick 1574280922547
+ Bluebird 1058465142211
+ Mercury 1003001910967
+ MGM 898067089555
+ Okeh 808308437878
+ Vocalion 608766709327
+
+Get total imagecount of a collection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code:: bash
+
+ $ ia search 'scanningcenter:uoft AND shiptracking:ace54704' -f imagecount | jq '.imagecount' | paste -sd+ - | bc
+ 8172
+
+Selecting files based on filesize
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Get the filenames of every file in ``goodytwoshoes00newyiala`` that is larger than 3000 bytes:
+
+.. code:: bash
+
+ $ ia metadata goodytwoshoes00newyiala \
+ | jq -r '.files[] | select(.name | endswith(".pdf")) | select((.size | tonumber) > 3000) | .name'
+ goodytwoshoes00newyiala.pdf
+ goodytwoshoes00newyiala_bw.pdf
+
+You can also include the identifier in the output like so:
+
+.. code:: bash
+
+ $ ia metadata goodytwoshoes00newyiala \
+ | jq -r '.metadata.identifier as $i | .files[] | select(.name | endswith(".pdf")) | select((.size | tonumber) > 3000) | "\($i)/\(.name)"'
+ goodytwoshoes00newyiala/goodytwoshoes00newyiala.pdf
+ goodytwoshoes00newyiala/goodytwoshoes00newyiala_bw.pdf
diff -Nru python-internetarchive-1.8.1/docs/source/metadata.rst python-internetarchive-1.8.5/docs/source/metadata.rst
--- python-internetarchive-1.8.1/docs/source/metadata.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/docs/source/metadata.rst 2019-06-07 17:28:42.000000000 -0400
@@ -63,8 +63,15 @@
Please `contact Internet Archive <mailto:info@archive.org?subject=[Collection Creation Request]>`_ if you need a collection created.
All items **should** belong to a collection.
-If a collection is not specified at the time of upload, it will be added to the ``opensource`` collection.
-For testing purposes, you may upload to the ``test_collection`` collection.
+If a collection is not specified at the time of upload, it will be added to the `Community texts <https://archive.org/details/opensource>`_ collection.
+For testing purposes, you may upload to the ``test_collection`` collection. The following collections are also available to the public at the time of writing:
+
+ * `Community Audio <https://archive.org/details/opensource_audio>`_
+ * `Community Media <https://archive.org/details/opensource_media>`_
+ * `Community Software <https://archive.org/details/open_source_software>`_
+ * `Community Texts <https://archive.org/details/opensource>`_ (default collection)
+ * `Community Video <https://archive.org/details/opensource_movies>`_
+ * `Test collection <https://archive.org/details/test_collection>`_
contributor
^^^^^^^^^^^
diff -Nru python-internetarchive-1.8.1/docs/source/quickstart.rst python-internetarchive-1.8.5/docs/source/quickstart.rst
--- python-internetarchive-1.8.1/docs/source/quickstart.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/docs/source/quickstart.rst 2019-06-07 17:28:42.000000000 -0400
@@ -27,7 +27,7 @@
Uploading
---------
-Creating a new `item on archive.org <items.html>`_ and uploading files to it is as easy as::
+Creating a new `item on archive.org <//archive.org/services/docs/api/items.html>`_ and uploading files to it is as easy as::
>>> from internetarchive import upload
>>> md = dict(collection='test_collection', title='My New Item', mediatype='movies')
@@ -67,9 +67,9 @@
You can access all of an item's metadata via the :class:`Item <internetarchive.Item>` object::
>>> from internetarchive import get_item
- >>> item = get_item('iacli-test-item301')
+ >>> item = get_item('nasa')
>>> item.item_metadata['metadata']['title']
- 'My Title'
+ 'NASA Images'
:func:`get_item <internetarchive.get_item>` retrieves all of an item's metadata via the `Internet Archive Metadata API <http://blog.archive.org/2013/07/04/metadata-api/>`_. This metadata can be accessed via the ``Item.item_metadata`` attribute::
@@ -79,13 +79,13 @@
All of the top-level keys in ``item.item_metadata`` are available as attributes::
>>> item.server
- 'ia801507.us.archive.org'
+ 'ia802606.us.archive.org'
>>> item.item_size
- 161752024
+ 126586
>>> item.files[0]['name']
- 'blank.txt'
+ 'NASAarchiveLogo.jpg'
>>> item.metadata['identifier']
- 'iacli-test-item301'
+ 'nasa'
Writing Metadata
@@ -120,7 +120,7 @@
>>> f.title
'My File Title'
-Refer to `Internet Archive Metadata <metadata.html>`_ for more specific details regarding metadata and archive.org.
+Refer to `Internet Archive Metadata <//archive.org/services/docs/api/metadata-schema/index.html>`_ for more specific details regarding metadata and archive.org.
Downloading
diff -Nru python-internetarchive-1.8.1/.gitignore python-internetarchive-1.8.5/.gitignore
--- python-internetarchive-1.8.1/.gitignore 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/.gitignore 2019-06-07 17:28:42.000000000 -0400
@@ -7,13 +7,20 @@
itemlist.txt
.tox
TAGS
-*csv
+*.csv
htmlcov
-*log
+*.log
*.pex
+pex/
wheelhouse
*gz
.venv*
.cache
.vagrant
-.idea
\ No newline at end of file
+.idea
+v2/
+v3.6/
+v3.7/
+.pytest_cache/
+.python-version
+trash/
diff -Nru python-internetarchive-1.8.1/HISTORY.rst python-internetarchive-1.8.5/HISTORY.rst
--- python-internetarchive-1.8.1/HISTORY.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/HISTORY.rst 2019-06-07 17:28:42.000000000 -0400
@@ -3,10 +3,65 @@
Release History
---------------
+1.8.5 (2019-06-07)
+++++++++++++++++++
+
+**Features and Improvements**
+
+- Improved timeout logging and exceptions.
+- Added support for arbitrary targets to metadata write.
+- IA-S3 keys now supported for auth in download.
+- Authorization (i.e. ``ia configure``) now uses the archive.org xauthn endpoint.
+
+**Bugfixes**
+
+- Fixed encoding error in --get-task-log
+- Fixed bug in upload where connections were not being closed.
+
+1.8.4 (2019-04-11)
+++++++++++++++++++
+
+**Features and Improvements**
+
+- It's now possible to retrieve task logs, given a task id, without first retrieving the item's task history.
+- Added examples to ``ia tasks`` help.
+
+1.8.3 (2019-03-29)
+++++++++++++++++++
+
+**Features and Improvements**
+
+- Increased search timeout from 24 to 300 seconds.
+
+**Bugfixes**
+
+- Fixed bug in setup.py where backports.csv wasn't being installed when installing from pypi.
+
+1.8.2 (2019-03-21)
+++++++++++++++++++
+
+**Features and Improvements**
+
+- Documentation updates.
+- Added support for write-many to modify_metadata.
+
+**Bugfixes**
+
+- Fixed bug in ``ia tasks --task-id`` where no task was being returned.
+- Fixed bug in ``internetarchive.get_tasks()`` where it was not possible to query by ``task_id``.
+- Fixed TypeError bug in upload when uploading with checksum=True.
+
+1.8.1 (2018-06-28)
+++++++++++++++++++
+
+**Bugfixes**
+
+- Fixed bug in ``ia tasks --get-task-log`` that was returning an unable to parse JSON error.
+
1.8.0 (2018-06-28)
++++++++++++++++++
-**Feautres and Improvements**
+**Features and Improvements**
- Only use backports.csv for python2 in support of FreeBDS port.
- Added a nicer error message to ``ia search`` for authentication errors.
@@ -26,7 +81,7 @@
1.7.7 (2018-03-05)
++++++++++++++++++
-**Feautres and Improvements**
+**Features and Improvements**
- Added support for downloading on-the-fly archive_marc.xml files.
@@ -39,7 +94,7 @@
1.7.6 (2018-01-05)
++++++++++++++++++
-**Feautres and Improvements**
+**Features and Improvements**
- Added ability to set the remote-name for a directory in ``ia upload`` (previously you could only do this for single files).
@@ -50,7 +105,7 @@
1.7.5 (2017-12-07)
++++++++++++++++++
-**Feautres and Improvements**
+**Features and Improvements**
- Turned on ``x-archive-keep-old-version`` S3 header by default for all ``ia upload``, ``ia delete``, ``ia copy``, and ``ia move`` commands.
This means that any ``ia`` command that clobbers or deletes a command, will save a version of the file in ``<identifier>/history/files/$key.~N~``.
@@ -60,7 +115,7 @@
1.7.4 (2017-11-06)
++++++++++++++++++
-**Feautres and Improvements**
+**Features and Improvements**
- Increased timeout in search from 12 seconds to 24.
- Added ability to set the ``max_retries`` in :func:`internetarchive.search_items`.
@@ -83,7 +138,7 @@
1.7.2 (2017-09-11)
++++++++++++++++++
-**Feautres and Improvements**
+**Features and Improvements**
- Added support for adding custom headers to ``ia search``.
@@ -114,7 +169,7 @@
1.7.0 (2017-07-25)
++++++++++++++++++
-**Feautres and Improvements**
+**Features and Improvements**
- Loosened up ``jsonpatch`` requirements, as the metadata API now supports more recent versions of the JSON Patch standard.
- Added support for building "snap" packages (https://snapcraft.io/).
diff -Nru python-internetarchive-1.8.1/internetarchive/api.py python-internetarchive-1.8.5/internetarchive/api.py
--- python-internetarchive-1.8.1/internetarchive/api.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/api.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -23,7 +23,7 @@
This module implements the Internetarchive API.
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import
@@ -447,7 +447,7 @@
def get_tasks(identifier=None,
- task_ids=None,
+ task_id=None,
task_type=None,
params=None,
config=None,
@@ -464,8 +464,8 @@
:param identifier: (optional) The Archive.org identifier for which to retrieve tasks
for.
- :type task_ids: int or str
- :param task_ids: (optional) The task_ids to retrieve from the Archive.org catalog.
+ :type task_id: int or str
+ :param task_id: (optional) The task_id to retrieve from the Archive.org catalog.
:type task_type: str
:param task_type: (optional) The type of tasks to retrieve from the Archive.org
@@ -489,7 +489,7 @@
if not archive_session:
archive_session = get_session(config, config_file, http_adapter_kwargs)
return archive_session.get_tasks(identifier=identifier,
- task_ids=task_ids,
+ task_id=task_id,
params=params,
config=config,
verbose=verbose,
diff -Nru python-internetarchive-1.8.1/internetarchive/auth.py python-internetarchive-1.8.5/internetarchive/auth.py
--- python-internetarchive-1.8.1/internetarchive/auth.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/auth.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -23,7 +23,7 @@
This module contains the Archive.org authentication handlers for Requests.
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from requests.auth import AuthBase
diff -Nru python-internetarchive-1.8.1/internetarchive/catalog.py python-internetarchive-1.8.5/internetarchive/catalog.py
--- python-internetarchive-1.8.1/internetarchive/catalog.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/catalog.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -23,7 +23,7 @@
This module contains objects for interacting with the Archive.org catalog.
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import
@@ -128,6 +128,8 @@
self.params['justme'] = 1
if task_id:
+ if isinstance(task_id, list):
+ task_id = task_id[0]
task_id = str(task_id)
self.params.update(dict(
search_task_id=task_id,
@@ -217,9 +219,31 @@
"""
if self.task_id is None:
raise ValueError('task_id is None')
- url = '{0}//catalogd.archive.org/log/{1}'.format(self.session.protocol,
- self.task_id)
+ return self.get_task_log(self.task_id, self.session, self.request_kwargs)
+
+ @staticmethod
+ def get_task_log(task_id, session, request_kwargs=None):
+ """Static method for getting a task log, given a task_id.
+
+ This method exists so a task log can be retrieved without
+ retrieving the item's task history first.
+
+ :type task_id: str or int
+ :param task_id: The task id for the task log you'd like to fetch.
+
+ :type archive_session: :class:`ArchiveSession <ArchiveSession>`
+
+ :type request_kwargs: dict
+ :param request_kwargs: (optional) Keyword arguments that
+ :py:class:`requests.Request` takes.
+
+ :rtype: str
+ :returns: The task log as a string.
+
+ """
+ request_kwargs = request_kwargs if request_kwargs else dict()
+ url = '{0}//catalogd.archive.org/log/{1}'.format(session.protocol, task_id)
p = dict(full=1)
- r = self.session.get(url, params=p, **self.request_kwargs)
+ r = session.get(url, params=p, **request_kwargs)
r.raise_for_status()
return r.content.decode('utf-8')
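Reviewer note: the new static `get_task_log` in the catalog.py hunk above only needs a task id and a session, which is what lets `ia tasks --get-task-log` skip the task-history lookup. The following is a minimal sketch of that call shape, not the upstream code. `FakeSession` and `FakeResponse` are hypothetical stand-ins so the example runs without network access, and the `'https:'` protocol value is an assumption.

```python
class FakeResponse(object):
    """Hypothetical stand-in for a requests response object."""

    def __init__(self, text):
        self.content = text.encode('utf-8')

    def raise_for_status(self):
        # A real response would raise HTTPError on 4xx/5xx; this one never fails.
        pass


class FakeSession(object):
    """Hypothetical stand-in for ArchiveSession; records every GET it receives."""

    protocol = 'https:'  # assumed value, not taken from the diff

    def __init__(self):
        self.calls = []

    def get(self, url, params=None, **kwargs):
        self.calls.append((url, params, kwargs))
        return FakeResponse('log for ' + url)


def get_task_log(task_id, session, request_kwargs=None):
    """Same logic as the static method in the hunk, lifted out of its class."""
    request_kwargs = request_kwargs if request_kwargs else dict()
    url = '{0}//catalogd.archive.org/log/{1}'.format(session.protocol, task_id)
    r = session.get(url, params=dict(full=1), **request_kwargs)
    r.raise_for_status()
    return r.content.decode('utf-8')


if __name__ == '__main__':
    session = FakeSession()
    print(get_task_log(1178878475, session, {'timeout': 12}))
    print(session.calls)
```

No task object is built at any point: the task id goes straight into the log URL, and any extra request keyword arguments are passed through to the session unchanged.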
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/argparser.py python-internetarchive-1.8.5/internetarchive/cli/argparser.py
--- python-internetarchive-1.8.1/internetarchive/cli/argparser.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/argparser.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
internetarchive.cli.argparser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-:copyright: (C) 2012-2016 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from collections import defaultdict
@@ -54,6 +54,18 @@
return metadata
+def get_args_dict_many_write(metadata):
+ changes = defaultdict(list)
+ for key in metadata:
+ target = '/'.join(key.split('/')[:-1])
+ field = key.split('/')[-1]
+ if not changes[target]:
+ changes[target] = {field: metadata[key]}
+ else:
+ changes[target][field] = metadata[key]
+ return changes
+
+
def convert_str_list_to_unicode(str_list):
unicode_list = list()
for x in str_list:
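Reviewer note: to make the write-many change easier to assess, here is the `get_args_dict_many_write` helper from the argparser.py hunk above, copied into a standalone script with a sample input. The sample keys (`metadata/title`, `files/foo.txt/title`, and so on) are made up for illustration. Only the grouping behaviour is shown.

```python
from collections import defaultdict


def get_args_dict_many_write(metadata):
    """Group 'target/field' keys into one dict of changes per target."""
    changes = defaultdict(list)
    for key in metadata:
        # Everything before the last '/' is the target, the rest is the field.
        target = '/'.join(key.split('/')[:-1])
        field = key.split('/')[-1]
        if not changes[target]:
            changes[target] = {field: metadata[key]}
        else:
            changes[target][field] = metadata[key]
    return changes


if __name__ == '__main__':
    # Hypothetical input, shaped like what --modify=metadata/title:... would yield.
    parsed = {
        'metadata/title': 'New title',
        'metadata/creator': 'Someone',
        'files/foo.txt/title': 'File title',
    }
    for target, fields in sorted(get_args_dict_many_write(parsed).items()):
        print(target, fields)
```

A key with no slash ends up under the empty-string target. This is presumably why ia_metadata.py only calls the helper when at least one key contains a `/`.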
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_configure.py python-internetarchive-1.8.5/internetarchive/cli/ia_configure.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_configure.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_configure.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_copy.py python-internetarchive-1.8.5/internetarchive/cli/ia_copy.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_copy.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_copy.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -69,12 +69,12 @@
s = Schema({
str: Use(bool),
'<src-identifier>/<src-file>': And(str, And(And(str, lambda x: '/' in x,
- error='Destiantion not formatted correctly. See usage example.'),
+ error='Destination not formatted correctly. See usage example.'),
assert_src_file_exists, error=(
'https://archive.org/download/{} does not exist. '
'Please check the identifier and filepath and retry.'.format(src_path)))),
'<dest-identifier>/<dest-file>': And(str, lambda x: '/' in x,
- error='Destiantion not formatted correctly. See usage example.'),
+ error='Destination not formatted correctly. See usage example.'),
'--metadata': Or(None, And(Use(get_args_dict), dict),
error='--metadata must be formatted as --metadata="key:value"'),
'--header': Or(None, And(Use(get_args_dict), dict),
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_delete.py python-internetarchive-1.8.5/internetarchive/cli/ia_delete.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_delete.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_delete.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_download.py python-internetarchive-1.8.5/internetarchive/cli/ia_download.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_download.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_download.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -37,7 +37,7 @@
-I, --itemlist=<file> Download items from a specified file. Itemlists should
be a plain text file with one identifier per line.
-S, --search=<query> Download items returned from a specified search query.
- -p, --search-parameters=<key:value>... Download items returned from a specified search query.
+ -P, --search-parameters=<key:value>... Download items returned from a specified search query.
-g, --glob=<pattern> Only download files whose filename matches the
given glob pattern.
-f, --format=<format>... Only download files of the specified format(s).
@@ -57,6 +57,7 @@
-s, --stdout Write file contents to stdout.
--no-change-timestamp Don't change the timestamp of downloaded files to reflect
the source material.
+ -p, --parameters=<key:value>... Parameters to send with your query (e.g. `cnt=0`).
"""
from __future__ import print_function, absolute_import
import os
@@ -96,7 +97,8 @@
'--retries': Use(lambda x: x[0]),
'--search-parameters': Use(lambda x: get_args_dict(x, query_string=True)),
'--on-the-fly': Use(bool),
- '--no-change-timestamp': Use(bool)
+ '--no-change-timestamp': Use(bool),
+ '--parameters': Use(lambda x: get_args_dict(x, query_string=True)),
})
# Filenames should be unicode literals. Support PY2 and PY3.
@@ -166,7 +168,9 @@
stdout_buf = sys.stdout
else:
stdout_buf = sys.stdout.buffer
- f[0].download(retries=args['--retries'], fileobj=stdout_buf)
+ f[0].download(retries=args['--retries'],
+ fileobj=stdout_buf,
+ params=args['--parameters'])
sys.exit(0)
try:
identifier = identifier.strip()
@@ -204,7 +208,8 @@
item_index=item_index,
ignore_errors=True,
on_the_fly=args['--on-the-fly'],
- no_change_timestamp=args['--no-change-timestamp']
+ no_change_timestamp=args['--no-change-timestamp'],
+ params=args['--parameters']
)
if _errors:
errors.append(_errors)
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_list.py python-internetarchive-1.8.5/internetarchive/cli/ia_list.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_list.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_list.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_metadata.py python-internetarchive-1.8.5/internetarchive/cli/ia_metadata.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_metadata.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_metadata.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -25,7 +25,7 @@
[--priority=<priority>]
ia metadata <identifier>... --remove=<key:value>... [--priority=<priority>]
ia metadata <identifier>... [--append=<key:value>... | --append-list=<key:value>...]
- [--priority=<priority>]
+ [--priority=<priority>] [--target=<target>]
ia metadata --spreadsheet=<metadata.csv> [--priority=<priority>]
[--modify=<key:value>...]
ia metadata --help
@@ -60,7 +60,7 @@
from schema import Schema, SchemaError, Or, And, Use
import six
-from internetarchive.cli.argparser import get_args_dict
+from internetarchive.cli.argparser import get_args_dict, get_args_dict_many_write
# Only import backports.csv for Python2 (in support of FreeBSD port).
PY2 = sys.version_info[0] == 2
@@ -188,6 +188,8 @@
metadata_args = args['--remove']
try:
metadata = get_args_dict(metadata_args)
+ if any('/' in k for k in metadata):
+ metadata = get_args_dict_many_write(metadata)
except ValueError:
print("error: The value of --modify, --remove, --append or --append-list "
"is invalid. It must be formatted as: --modify=key:value",
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_move.py python-internetarchive-1.8.5/internetarchive/cli/ia_move.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_move.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_move.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -53,7 +53,7 @@
'<src-identifier>/<src-file>': And(str, lambda x: '/' in x,
error='Source not formatted correctly. See usage example.'),
'<dest-identifier>/<dest-file>': And(str, lambda x: '/' in x,
- error='Destiantion not formatted correctly. See usage example.'),
+ error='Destination not formatted correctly. See usage example.'),
})
try:
args = s.validate(args)
@@ -63,7 +63,7 @@
# Add keep-old-version by default.
if 'x-archive-keep-old-version' not in args['--header']:
- headers['x-archive-keep-old-version'] = '1'
+ args['--header']['x-archive-keep-old-version'] = '1'
# First we use ia_copy, prep argv for ia_copy.
argv.pop(0)
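Reviewer note: the second ia_move.py hunk above changes where the default `x-archive-keep-old-version` header is stored. The removed line assigned to a name `headers` that the hunk does not show being defined, while the new line writes into `args['--header']`. Below is a small sketch of the intended behaviour. `with_keep_old_version` is a hypothetical helper, not upstream code, and it returns a copy instead of mutating `args` in place as the patch does.

```python
def with_keep_old_version(args):
    """Return a copy of the parsed CLI args with the default header applied.

    Hypothetical helper for illustration; the patched code mutates
    args['--header'] directly.
    """
    headers = dict(args.get('--header') or {})
    # Only add the default when the user did not pass the header explicitly.
    headers.setdefault('x-archive-keep-old-version', '1')
    updated = dict(args)
    updated['--header'] = headers
    return updated


if __name__ == '__main__':
    print(with_keep_old_version({'--header': {}}))
    print(with_keep_old_version({'--header': {'x-archive-keep-old-version': '0'}}))
```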
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia.py python-internetarchive-1.8.5/internetarchive/cli/ia.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia.py 2019-06-07 17:28:42.000000000 -0400
@@ -3,7 +3,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_search.py python-internetarchive-1.8.5/internetarchive/cli/ia_search.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_search.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_search.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -33,7 +33,7 @@
-i, --itemlist Output identifiers only.
-f, --field=<field>... Metadata fields to return.
-n, --num-found Print the number of results to stdout.
- -t, --timeout=<seconds> Set the timeout in seconds [default: 24].
+ -t, --timeout=<seconds> Set the timeout in seconds [default: 300].
"""
from __future__ import absolute_import, print_function, unicode_literals
import sys
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_tasks.py python-internetarchive-1.8.5/internetarchive/cli/ia_tasks.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_tasks.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_tasks.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -36,15 +36,22 @@
-g, --green-rows Return information about tasks that have not run.
-b, --blue-rows Return information about running tasks.
-r, --red-rows Return information about tasks that have failed.
- -p, --parameter=<k:v>... Return tasks matching the given parameter.
+ -p, --parameter=<k:v>... URL parameters passed to catalog.php.
-j, --json Output detailed information in JSON.
+examples:
+ ia tasks nasa
+ ia tasks nasa -p cmds:derive.php # only return derive.php tasks
+ ia tasks -p mode:s3 # return all S3 tasks
+ ia tasks --get-task-log 1178878475 # get a task log for a specific task
"""
from __future__ import absolute_import, print_function
import sys
import json
from docopt import docopt
+from requests.exceptions import HTTPError
+import six
from internetarchive.cli.argparser import get_args_dict
@@ -76,13 +83,16 @@
task_type=task_type,
params=params)
elif args['--get-task-log']:
- task = session.get_tasks(task_id=args['--get-task-log'], params=params)
- if task:
- log = task[0].task_log()
- sys.exit(print(log))
- else:
+ try:
+ log = session.get_task_log(args['--get-task-log'], params)
+ if six.PY2:
+ print(log.encode('utf-8'))
+ else:
+ print(log)
+ sys.exit(0)
+ except HTTPError:
print('error retrieving task-log '
- 'for {0}\n'.format(args['--get-task-log']), file=sys.stderr)
+ 'for {0}'.format(args['--get-task-log']), file=sys.stderr)
sys.exit(1)
elif args['--task']:
tasks = session.get_tasks(task_id=args['--task'], params=params)
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_upload.py python-internetarchive-1.8.5/internetarchive/cli/ia_upload.py
--- python-internetarchive-1.8.1/internetarchive/cli/ia_upload.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/ia_upload.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -102,7 +102,7 @@
# Format error message for any non 200 responses that
# we haven't caught yet,and write to stderr.
- if responses and responses[-1] and responses[-1].status_code != 200:
+ if responses and responses[-1].status_code and responses[-1].status_code != 200:
if not responses[-1].status_code:
return responses
filename = responses[-1].request.url.split('/')[-1]
@@ -110,7 +110,6 @@
msg = get_s3_xml_text(responses[-1].content)
except:
msg = responses[-1].content
- print(' error uploading {0}: {2}'.format(filename, msg), file=sys.stderr)
return responses
@@ -225,7 +224,7 @@
for _r in _upload_files(item, files, upload_kwargs):
if args['--debug']:
break
- if (not _r) or (not _r.ok):
+ if (not _r.status_code) or (not _r.ok):
ERRORS = True
# Bulk upload using spreadsheet.
diff -Nru python-internetarchive-1.8.1/internetarchive/cli/__init__.py python-internetarchive-1.8.5/internetarchive/cli/__init__.py
--- python-internetarchive-1.8.1/internetarchive/cli/__init__.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/cli/__init__.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2016 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
internetarchive.cli
~~~~~~~~~~~~~~~~~~~
-:copyright: (C) 2012-2016 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from internetarchive.cli import ia, ia_configure, ia_delete, ia_download, ia_list, \
diff -Nru python-internetarchive-1.8.1/internetarchive/config.py python-internetarchive-1.8.5/internetarchive/config.py
--- python-internetarchive-1.8.1/internetarchive/config.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/config.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
internetarchive.config
~~~~~~~~~~~~~~~~~~~~~~
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import
@@ -37,57 +37,37 @@
from internetarchive import auth
-def get_auth_config(username, password):
- payload = dict(
- username=username,
- password=password,
- remember='CHECKED',
- action='login',
- )
-
- with requests.Session() as s:
- # Attache logged-in-* cookies to Session.
- u = 'https://archive.org/account/login.php'
- r = s.post(u, data=payload, cookies={'test-cookie': '1'})
- if 'logged-in-sig' not in s.cookies:
- raise AuthenticationError('Authentication failed. '
- 'Please check your credentials and try again.')
-
- # Get S3 keys.
- u = 'https://archive.org/account/s3.php'
- p = dict(output_json=1)
- r = s.get(u, params=p)
- j = r.json()
- access_key = j['key']['s3accesskey']
- secret_key = j['key']['s3secretkey']
- if not j or not j.get('key'):
- raise AuthenticationError('Authentication failed. '
- 'Please check your credentials and try again.')
-
- # Get user info (screenname).
- u = 'https://s3.us.archive.org'
- p = dict(check_auth=1)
- r = requests.get(u, params=p, auth=auth.S3Auth(access_key, secret_key))
- r.raise_for_status()
- j = r.json()
- if j.get('error'):
- raise AuthenticationError(j.get('error'))
- user_info = j['screenname']
-
- auth_config = {
- 's3': {
- 'access': access_key,
- 'secret': secret_key,
- },
- 'cookies': {
- 'logged-in-user': s.cookies['logged-in-user'],
- 'logged-in-sig': s.cookies['logged-in-sig'],
- },
- 'general': {
- 'screenname': user_info,
- }
+def get_auth_config(email, password):
+ u = 'https://archive.org/services/xauthn/'
+ p = dict(op='login')
+ d = dict(email=email, password=password)
+ r = requests.post(u, params=p, data=d)
+ j = r.json()
+ if not j.get('success'):
+ try:
+ msg = j['values']['reason']
+ except KeyError:
+ msg = j['error']
+ if msg == 'account_not_found':
+ msg = 'Account not found, check your email and try again.'
+ elif msg == 'account_bad_password':
+ msg = 'Incorrect password, try again.'
+ else:
+ msg = 'Authentication failed: {}'.format(msg)
+ raise AuthenticationError(msg)
+ auth_config = {
+ 's3': {
+ 'access': j['values']['s3']['access'],
+ 'secret': j['values']['s3']['secret'],
+ },
+ 'cookies': {
+ 'logged-in-user': j['values']['cookies']['logged-in-user'],
+ 'logged-in-sig': j['values']['cookies']['logged-in-sig'],
+ },
+ 'general': {
+ 'screenname': j['values']['screenname'],
}
-
+ }
return auth_config
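Reviewer note: the rewritten `get_auth_config` in the config.py hunk above makes a single POST to the xauthn endpoint and turns failure responses into readable messages. The error-mapping part can be exercised without network access, as sketched below. `check_xauthn_response` is a hypothetical name, the local `AuthenticationError` stands in for the library's exception, and the sample payloads are assumptions based only on the keys the patch reads.

```python
class AuthenticationError(Exception):
    """Stand-in for internetarchive.exceptions.AuthenticationError."""


def check_xauthn_response(j):
    """Return j['values'] on success, else raise with a readable message."""
    if j.get('success'):
        return j['values']
    try:
        msg = j['values']['reason']
    except KeyError:
        # No structured reason available; fall back to the top-level error.
        msg = j['error']
    if msg == 'account_not_found':
        msg = 'Account not found, check your email and try again.'
    elif msg == 'account_bad_password':
        msg = 'Incorrect password, try again.'
    else:
        msg = 'Authentication failed: {}'.format(msg)
    raise AuthenticationError(msg)


if __name__ == '__main__':
    # Hypothetical payloads, shaped after the keys used in the patch.
    samples = [
        {'success': False, 'values': {'reason': 'account_bad_password'}},
        {'success': False, 'error': 'some_other_error'},
    ]
    for payload in samples:
        try:
            check_xauthn_response(payload)
        except AuthenticationError as exc:
            print(exc)
```

As in the patch, a failure response that has neither `values.reason` nor `error` would still raise a bare `KeyError`.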
diff -Nru python-internetarchive-1.8.1/internetarchive/exceptions.py python-internetarchive-1.8.5/internetarchive/exceptions.py
--- python-internetarchive-1.8.1/internetarchive/exceptions.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/exceptions.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
internetarchive.exceptions
~~~~~~~~~~~~~~~~~~~~~~~~~~
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
diff -Nru python-internetarchive-1.8.1/internetarchive/files.py python-internetarchive-1.8.5/internetarchive/files.py
--- python-internetarchive-1.8.1/internetarchive/files.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/files.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
internetarchive.files
~~~~~~~~~~~~~~~~~~~~~
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import, unicode_literals, print_function
@@ -35,7 +35,7 @@
from requests.exceptions import HTTPError, RetryError, ConnectTimeout, \
ConnectionError, ReadTimeout
-from internetarchive import iarequest, utils
+from internetarchive import iarequest, utils, auth
log = logging.getLogger(__name__)
@@ -117,6 +117,11 @@
name=urllib.parse.quote(name.encode('utf-8')),
)
self.url = '{protocol}//archive.org/download/{id}/{name}'.format(**url_parts)
+ if self.item.session.access_key and self.item.session.secret_key:
+ self.auth = auth.S3Auth(self.item.session.access_key,
+ self.item.session.secret_key)
+ else:
+ self.auth = None
def __repr__(self):
return ('File(identifier={identifier!r}, '
@@ -126,7 +131,8 @@
def download(self, file_path=None, verbose=None, silent=None, ignore_existing=None,
checksum=None, destdir=None, retries=None, ignore_errors=None,
- fileobj=None, return_responses=None, no_change_timestamp=None):
+ fileobj=None, return_responses=None, no_change_timestamp=None,
+ params=None):
"""Download the file into the current working directory.
:type file_path: str
@@ -169,6 +175,10 @@
current time instead of changing it to that given in
the original archive.
+ :type params: dict
+ :param params: (optional) URL parameters to send with
+ download request (e.g. `cnt=0`).
+
:rtype: bool
:returns: True if file was successfully downloaded.
"""
@@ -179,6 +189,7 @@
ignore_errors = False if not ignore_errors else ignore_errors
return_responses = False if not return_responses else return_responses
no_change_timestamp = False if not no_change_timestamp else no_change_timestamp
+ params = None if not params else params
if (fileobj and silent is None) or silent is not False:
silent = True
@@ -240,7 +251,11 @@
os.makedirs(parent_dir)
try:
- response = self.item.session.get(self.url, stream=True, timeout=12)
+ response = self.item.session.get(self.url,
+ stream=True,
+ timeout=12,
+ auth=self.auth,
+ params=params)
response.raise_for_status()
if return_responses:
return response
diff -Nru python-internetarchive-1.8.1/internetarchive/iarequest.py python-internetarchive-1.8.5/internetarchive/iarequest.py
--- python-internetarchive-1.8.1/internetarchive/iarequest.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/iarequest.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
internetarchive.iarequest
~~~~~~~~~~~~~~~~~~~~~~~~~
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import
@@ -32,6 +32,7 @@
import json
import re
import copy
+import logging
from six.moves import urllib
import requests.models
@@ -40,7 +41,10 @@
import six
from internetarchive import auth, __version__
-from internetarchive.utils import needs_quote
+from internetarchive.utils import needs_quote, delete_items_from_dict
+
+
+logger = logging.getLogger(__name__)
class S3Request(requests.models.Request):
@@ -232,40 +236,120 @@
if not source_metadata:
r = requests.get(self.url)
- source_metadata = r.json().get(target.split('/')[0], {})
- if 'metadata' in target:
- destination_metadata = source_metadata.copy()
- prepared_metadata = prepare_metadata(metadata, source_metadata, append,
- append_list)
- destination_metadata.update(prepared_metadata)
- elif 'files' in target:
- filename = '/'.join(target.split('/')[1:])
- for f in source_metadata:
- if f.get('name') == filename:
- source_metadata = f
- break
- destination_metadata = source_metadata.copy()
- prepared_metadata = prepare_metadata(metadata, source_metadata, append)
- destination_metadata.update(prepared_metadata)
+ source_metadata = r.json()
+
+ # Write to many targets
+ if isinstance(metadata, list) \
+ or any('/' in k for k in metadata) \
+ or all(isinstance(k, dict) for k in metadata.values()):
+ changes = list()
+
+ if any(not k for k in metadata):
+ raise ValueError('Invalid metadata provided, '
+ 'check your input and try again')
+
+ if target:
+ metadata = {target: metadata}
+ for key in metadata:
+ if key == 'metadata':
+ patch = prepare_patch(metadata[key],
+ source_metadata['metadata'],
+ append,
+ append_list)
+ elif key.startswith('files'):
+ patch = prepare_files_patch(metadata[key],
+ source_metadata['files'],
+ append,
+ key,
+ append_list)
+ else:
+ key = key.split('/')[0]
+ patch = prepare_target_patch(metadata, source_metadata, append,
+ target, append_list, key)
+ changes.append({'target': key, 'patch': patch})
+ self.data = {
+ '-changes': json.dumps(changes),
+ 'priority': priority,
+ }
+ logger.debug('submitting metadata request: {}'.format(self.data))
+ # Write to single target
else:
- destination_metadata = source_metadata.copy()
- prepared_metadata = prepare_metadata(metadata, source_metadata, append)
- destination_metadata.update(prepared_metadata)
-
- # Delete metadata items where value is REMOVE_TAG.
- destination_metadata = dict(
- (k, v) for (k, v) in destination_metadata.items() if v != 'REMOVE_TAG'
- )
+ if not target or 'metadata' in target:
+ target = 'metadata'
+ patch = prepare_patch(metadata, source_metadata['metadata'], append,
+ append_list)
+ elif 'files' in target:
+ patch = prepare_files_patch(metadata, source_metadata['files'], append,
+ target, append_list)
+ else:
+ metadata = {target: metadata}
+ patch = prepare_target_patch(metadata, source_metadata, append,
+ target, append_list, target)
+ self.data = {
+ '-patch': json.dumps(patch),
+ '-target': target,
+ 'priority': priority,
+ }
+ logger.debug('submitting metadata request: {}'.format(self.data))
+ super(MetadataPreparedRequest, self).prepare_body(self.data, None)
- patch = json.dumps(make_patch(source_metadata, destination_metadata).patch)
- self.data = {
- '-patch': patch,
- '-target': target,
- 'priority': priority,
- }
+def prepare_patch(metadata, source_metadata, append, append_list=None):
+ destination_metadata = source_metadata.copy()
+ if isinstance(metadata, list):
+ prepared_metadata = metadata
+ if not destination_metadata:
+ destination_metadata = list()
+ else:
+ prepared_metadata = prepare_metadata(metadata, source_metadata, append,
+ append_list)
+ if isinstance(destination_metadata, dict):
+ destination_metadata.update(prepared_metadata)
+ elif isinstance(metadata, list) and not destination_metadata:
+ destination_metadata = metadata
+ else:
+ if isinstance(prepared_metadata, list):
+ if append_list:
+ destination_metadata += prepared_metadata
+ else:
+ destination_metadata = prepared_metadata
+ else:
+ destination_metadata.append(prepared_metadata)
+ # Delete metadata items where value is REMOVE_TAG.
+ destination_metadata = delete_items_from_dict(destination_metadata, 'REMOVE_TAG')
+ patch = make_patch(source_metadata, destination_metadata).patch
+ return patch
+
+
+def prepare_target_patch(metadata, source_metadata, append, target, append_list, key):
+
+ def dictify(lst, key=None, value=None):
+ if not lst:
+ return value
+ sub_dict = dictify(lst[1:], key, value)
+ for i, v in enumerate(lst):
+ md = {v: copy.deepcopy(sub_dict)}
+ return md
+
+ for _k in metadata:
+ metadata = dictify(_k.split('/')[1:], _k.split('/')[-1], metadata[_k])
+ for i, _k in enumerate(key.split('/')):
+ if i == 0:
+ source_metadata = source_metadata.get(_k, dict())
+ else:
+ source_metadata[_k] = source_metadata.get(_k, dict()).get(_k, dict())
+ patch = prepare_patch(metadata, source_metadata, append, append_list)
+ return patch
- super(MetadataPreparedRequest, self).prepare_body(self.data, None)
+
+def prepare_files_patch(metadata, source_metadata, append, target, append_list):
+ filename = '/'.join(target.split('/')[1:])
+ for f in source_metadata:
+ if f.get('name') == filename:
+ source_metadata = f
+ break
+ patch = prepare_patch(metadata, source_metadata, append, append_list)
+ return patch
def prepare_metadata(metadata, source_metadata=None, append=False, append_list=False):
@@ -338,8 +422,12 @@
if not isinstance(metadata[key], list):
metadata[key] = [metadata[key]]
for v in metadata[key]:
- if v in source_metadata[key]:
- continue
+ if not isinstance(source_metadata[key], list):
+ if v in [source_metadata[key]]:
+ continue
+ else:
+ if v in source_metadata[key]:
+ continue
if not isinstance(source_metadata[key], list):
prepared_metadata[key] = [source_metadata[key]]
else:
diff -Nru python-internetarchive-1.8.1/internetarchive/__init__.py python-internetarchive-1.8.5/internetarchive/__init__.py
--- python-internetarchive-1.8.1/internetarchive/__init__.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/__init__.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -30,17 +30,17 @@
>>> item.exists
True
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import
__title__ = 'internetarchive'
-__version__ = '1.8.0'
+__version__ = '1.8.5'
__author__ = 'Jacob M. Johnson'
__license__ = 'AGPL 3'
-__copyright__ = 'Copyright (C) 2012-2017 Internet Archive'
+__copyright__ = 'Copyright (C) 2012-2019 Internet Archive'
from internetarchive.item import Item
from internetarchive.files import File
diff -Nru python-internetarchive-1.8.1/internetarchive/item.py python-internetarchive-1.8.5/internetarchive/item.py
--- python-internetarchive-1.8.1/internetarchive/item.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/item.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
internetarchive.item
~~~~~~~~~~~~~~~~~~~~
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import, unicode_literals, print_function
@@ -283,7 +283,8 @@
ignore_errors=None,
on_the_fly=None,
return_responses=None,
- no_change_timestamp=None):
+ no_change_timestamp=None,
+ params=None):
"""Download files from an item.
:param files: (optional) Only download files matching given file names.
@@ -345,6 +346,10 @@
current time instead of changing it to that given in
the original archive.
+ :type params: dict
+ :param params: (optional) URL parameters to send with
+ download request (e.g. `cnt=0`).
+
:rtype: bool
:returns: True if if all files have been downloaded successfully.
"""
@@ -357,6 +362,7 @@
no_directory = False if no_directory is None else no_directory
return_responses = False if not return_responses else True
no_change_timestamp = False if not no_change_timestamp else no_change_timestamp
+ params = None if not params else params
if not dry_run:
if item_index and verbose is True:
@@ -415,7 +421,7 @@
continue
r = f.download(path, verbose, silent, ignore_existing, checksum, destdir,
retries, ignore_errors, None, return_responses,
- no_change_timestamp)
+ no_change_timestamp, params)
if return_responses:
responses.append(r)
if r is False:
@@ -473,7 +479,6 @@
:returns: A dictionary containing the status_code and response
returned from the Metadata API.
"""
- target = 'metadata' if target is None else target
append = False if append is None else append
access_key = self.session.access_key if not access_key else access_key
secret_key = self.session.secret_key if not secret_key else secret_key
@@ -483,12 +488,15 @@
url = '{protocol}//archive.org/metadata/{identifier}'.format(
protocol=self.session.protocol,
identifier=self.identifier)
+ # TODO: currently files and metadata targets do not support dict's,
+ # but they might someday?? refactor this check.
+ source_metadata = self.item_metadata
request = MetadataRequest(
method='POST',
url=url,
metadata=metadata,
headers=self.session.headers,
- source_metadata=self.item_metadata.get(target.split('/')[0], {}),
+ source_metadata=source_metadata,
target=target,
priority=priority,
access_key=access_key,
@@ -729,6 +737,7 @@
body.close()
os.remove(filename)
body.close()
+ response.close()
return response
except HTTPError as exc:
body.close()
@@ -758,12 +767,10 @@
"""Upload files to an item. The item will be created if it
does not exist.
- :type files: list
+ :type files: str, file, list, tuple, dict
:param files: The filepaths or file-like objects to upload.
- :type kwargs: dict
- :param kwargs: The keyword arguments from the call to
- upload_file().
+ :param \*\*kwargs: Optional arguments that :func:`Item.upload_file()` takes.
Usage::
@@ -771,10 +778,32 @@
>>> item = internetarchive.Item('identifier')
>>> md = dict(mediatype='image', creator='Jake Johnson')
>>> item.upload('/path/to/image.jpg', metadata=md, queue_derive=False)
- True
+ [<Response [200]>]
+
+ Uploading multiple files::
+
+ >>> r = item.upload(['file1.txt', 'file2.txt'])
+ >>> r = item.upload([fileobj, fileobj2])
+ >>> r = item.upload(('file1.txt', 'file2.txt'))
+
+ Uploading file objects:
+
+ >>> import io
+ >>> f = io.BytesIO(b"some initial binary data: \\x00\\x01")
+ >>> r = item.upload({'remote-name.txt': f})
+ >>> f = io.BytesIO(b"some more binary data: \\x00\\x01")
+ >>> f.name = 'remote-name.txt'
+ >>> r = item.upload(f)
+
+ *Note: file objects must either have a name attribute, or be uploaded in a
+ dict where the key is the remote-name*
+
+ Setting the remote filename with a dict::
+
+ >>> r = item.upload({'remote-name.txt': '/path/to/local/file.txt'})
:rtype: list
- :returns: A list of requests.Response objects.
+ :returns: A list of :class:`requests.Response` objects.
"""
queue_derive = True if queue_derive is None else queue_derive
remote_dir_name = None
diff -Nru python-internetarchive-1.8.1/internetarchive/search.py python-internetarchive-1.8.5/internetarchive/search.py
--- python-internetarchive-1.8.1/internetarchive/search.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/search.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -24,7 +24,7 @@
This module provides objects for interacting with the Archive.org
search engine.
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import, unicode_literals
@@ -92,7 +92,7 @@
# Set timeout.
if 'timeout' not in self.request_kwargs:
- self.request_kwargs['timeout'] = 24
+ self.request_kwargs['timeout'] = 300
# Set retries.
self.session.mount_http_adapter(max_retries=self.max_retries)
diff -Nru python-internetarchive-1.8.1/internetarchive/session.py python-internetarchive-1.8.5/internetarchive/session.py
--- python-internetarchive-1.8.1/internetarchive/session.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/session.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -24,7 +24,7 @@
This module provides an ArchiveSession object to manage and persist
settings across the internetarchive package.
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
from __future__ import absolute_import, unicode_literals
@@ -40,12 +40,14 @@
from requests.utils import default_headers
from requests.adapters import HTTPAdapter
from requests.packages.urllib3 import Retry
+from six.moves.urllib.parse import urlparse
from internetarchive import __version__
from internetarchive.config import get_config
from internetarchive.item import Item, Collection
from internetarchive.search import Search
-from internetarchive.catalog import Catalog
+from internetarchive.catalog import Catalog, CatalogTask
+from internetarchive.utils import reraise_modify
logger = logging.getLogger(__name__)
@@ -118,7 +120,7 @@
if debug or (logger.level <= 10):
self.set_file_logger(logging_config.get('level', 'NOTSET'),
logging_config.get('file', 'internetarchive.log'),
- 'requests.packages.urllib3')
+ 'urllib3')
def _get_user_agent_string(self):
"""Generate a User-Agent string to be sent with every request."""
@@ -131,6 +133,14 @@
return 'internetarchive/{0} ({1} {2}; N; {3}; {4}) Python/{5}'.format(
__version__, uname[0], uname[-1], lang, self.access_key, py_version)
+ def rebuild_auth(self, prepared_request, response):
+ """Never rebuild auth for archive.org URLs.
+ """
+ u = urlparse(prepared_request.url)
+ if u.netloc.endswith('archive.org'):
+ return
+ super(ArchiveSession, self).rebuild_auth(prepared_request, response)
+
def mount_http_adapter(self, protocol=None, max_retries=None,
status_forcelist=None, host=None):
"""Mount an HTTP adapter to the
@@ -287,6 +297,9 @@
request_kwargs=request_kwargs,
max_retries=max_retries)
+ def get_task_log(self, task_id, request_kwargs=None):
+ return CatalogTask.get_task_log(task_id, self, request_kwargs)
+
def get_tasks(self,
identifier=None,
task_id=None,
@@ -367,7 +380,14 @@
insecure = False
with warnings.catch_warnings(record=True) as w:
warnings.filterwarnings('always')
- r = super(ArchiveSession, self).send(request, **kwargs)
+ try:
+ r = super(ArchiveSession, self).send(request, **kwargs)
+ except Exception as e:
+ try:
+ reraise_modify(e, e.request.url, prepend=False)
+ except:
+ logger.error(e)
+ raise e
if self.protocol == 'http:':
return r
insecure_warnings = ['SNIMissingWarning', 'InsecurePlatformWarning']
diff -Nru python-internetarchive-1.8.1/internetarchive/utils.py python-internetarchive-1.8.5/internetarchive/utils.py
--- python-internetarchive-1.8.1/internetarchive/utils.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/utils.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
#
# The internetarchive module is a Python/CLI interface to Archive.org.
#
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
@@ -23,7 +23,7 @@
This module provides utility functions for the internetarchive library.
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
:license: AGPL 3, see LICENSE for more details.
"""
import hashlib
@@ -260,3 +260,85 @@
return os.path.isdir(obj)
except TypeError as exc:
return False
+
+
+def reraise_modify(caught_exc, append_msg, prepend=False):
+ """Append message to exception while preserving attributes.
+
+ Preserves exception class, and exception traceback.
+
+ Note:
+ This function needs to be called inside an except because
+ `sys.exc_info()` requires the exception context.
+
+ Args:
+ caught_exc(Exception): The caught exception object
+ append_msg(str): The message to append to the caught exception
+ prepend(bool): If True prepend the message to args instead of appending
+
+ Returns:
+ None
+
+ Side Effects:
+ Re-raises the exception with the preserved data / trace but
+ modified message
+ """
+ ExceptClass = type(caught_exc)
+ # Keep old traceback
+ traceback = sys.exc_info()[2]
+ if not caught_exc.args:
+ # If no args, create our own tuple
+ arg_list = [append_msg]
+ else:
+ # Take the last arg
+ # If it is a string
+ # append your message.
+ # Otherwise append it to the
+ # arg list(Not as pretty)
+ arg_list = list(caught_exc.args[:-1])
+ last_arg = caught_exc.args[-1]
+ if isinstance(last_arg, str):
+ if prepend:
+ arg_list.append(append_msg + last_arg)
+ else:
+ arg_list.append(last_arg + append_msg)
+ else:
+ arg_list += [last_arg, append_msg]
+ caught_exc.args = tuple(arg_list)
+ six.reraise(ExceptClass,
+ caught_exc,
+ traceback)
+
+
+def remove_none(obj):
+ if isinstance(obj, (list, tuple, set)):
+ l = type(obj)(remove_none(x) for x in obj if x)
+ try:
+ return [dict(t) for t in {tuple(sorted(d.items())) for d in l}]
+ except (AttributeError, TypeError):
+ return l
+ elif isinstance(obj, dict):
+ return type(obj)((remove_none(k), remove_none(v))
+ for k, v in obj.items() if k is not None and v)
+ else:
+ return obj
+
+
+def delete_items_from_dict(d, to_delete):
+ """Recursively deletes items from a dict,
+ if the item's value(s) is in ``to_delete``.
+ """
+ if not isinstance(to_delete, list):
+ to_delete = [to_delete]
+ if isinstance(d, dict):
+ for single_to_delete in set(to_delete):
+ if single_to_delete in d.values():
+ for k, v in d.copy().items():
+ if v == single_to_delete:
+ del d[k]
+ for k, v in d.items():
+ delete_items_from_dict(v, to_delete)
+ elif isinstance(d, list):
+ for i in d:
+ delete_items_from_dict(i, to_delete)
+ return remove_none(d)
diff -Nru python-internetarchive-1.8.1/Makefile python-internetarchive-1.8.5/Makefile
--- python-internetarchive-1.8.1/Makefile 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/Makefile 2019-06-07 17:28:42.000000000 -0400
@@ -31,7 +31,9 @@
binary:
# This requires using https://github.com/jjjake/pex which has been hacked for multi-platform support.
- pex . --python python3.6 --python python2 --python-shebang='/usr/bin/env python' -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex
+ pex . --python python3.7 --python python2 --python-shebang='/usr/bin/env python' --platform=linux-x86_64 --platform=macosx_10_11 -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex -r pex-requirements.txt # make with py2???
+ # Use pex==1.4.0
+ #pex . --python python3 --python /usr/bin/python --python-shebang='/usr/bin/env python' --platform=linux-x86_64 --platform=macosx_10_11 -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex -f wheelhouse/ --no-pypi
publish-binary:
./ia-$(VERSION)-py2.py3-none-any.pex upload ia-pex ia-$(VERSION)-py2.py3-none-any.pex --no-derive
diff -Nru python-internetarchive-1.8.1/pex-requirements.txt python-internetarchive-1.8.5/pex-requirements.txt
--- python-internetarchive-1.8.1/pex-requirements.txt 1969-12-31 19:00:00.000000000 -0500
+++ python-internetarchive-1.8.5/pex-requirements.txt 2019-06-07 17:28:42.000000000 -0400
@@ -0,0 +1,8 @@
+requests>=2.9.1,<3.0.0
+jsonpatch>=0.4
+docopt>=0.6.0,<0.7.0
+clint>=0.4.0,<0.6.0
+six>=1.0.0,<2.0.0
+schema>=0.4.0
+total-ordering
+backports.csv
diff -Nru python-internetarchive-1.8.1/README.rst python-internetarchive-1.8.5/README.rst
--- python-internetarchive-1.8.1/README.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/README.rst 2019-06-07 17:28:42.000000000 -0400
@@ -11,7 +11,7 @@
This package installs a command-line tool named ``ia`` for using Archive.org from the command-line.
It also installs the ``internetarchive`` Python module for programatic access to archive.org.
-Please report all bugs and issues on `Github <https://github.com/jjjake/ia-wrapper/issues>`__.
+Please report all bugs and issues on `Github <https://github.com/jjjake/internetarchive/issues>`__.
Installation
@@ -35,10 +35,10 @@
Documentation
-------------
-Documentation is available at `https://internetarchive.readthedocs.io <https://internetarchive.readthedocs.io>`_.
+Documentation is available at `https://archive.org/services/docs/api/internetarchive <https://archive.org/services/docs/api/internetarchive>`_.
Contributing
------------
-All contributions are welcome and appreciated. Please see `https://internetarchive.readthedocs.io/en/latest/contributing.html <https://internetarchive.readthedocs.io/en/latest/contributing.html>`_ for more details.
+All contributions are welcome and appreciated. Please see `https://archive.org/services/docs/api/internetarchive/contributing.html <https://archive.org/services/docs/api/internetarchive/contributing.html>`_ for more details.
diff -Nru python-internetarchive-1.8.1/setup.py python-internetarchive-1.8.5/setup.py
--- python-internetarchive-1.8.1/setup.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/setup.py 2019-06-07 17:28:42.000000000 -0400
@@ -43,8 +43,12 @@
'clint>=0.4.0,<0.6.0',
'six>=1.0.0,<2.0.0',
'schema>=0.4.0',
- ] + (['total-ordering'] if sys.version_info < (2, 7) else []) +
- (['backports.csv'] if sys.version_info < (3, 0) else []),
+ 'backports.csv < 1.07;python_version<"2.7"',
+ 'backports.csv < 1.07;python_version<"3.4"',
+ 'backports.csv;python_version>="2.7"',
+ 'backports.csv;python_version>="3.4"',
+ 'total-ordering;python_version<"2.7"',
+ ],
classifiers=[
'Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers',
diff -Nru python-internetarchive-1.8.1/tests/cli/test_ia_download.py python-internetarchive-1.8.5/tests/cli/test_ia_download.py
--- python-internetarchive-1.8.1/tests/cli/test_ia_download.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/cli/test_ia_download.py 2019-06-07 17:28:42.000000000 -0400
@@ -34,7 +34,8 @@
expected_files = set([
'globe_west_540.jpg',
'NASAarchiveLogo.jpg',
- 'globe_west_540_thumb.jpg'
+ 'globe_west_540_thumb.jpg',
+ '__ia_thumb.jpg',
])
call_cmd('ia --insecure download --glob="*jpg" nasa')
diff -Nru python-internetarchive-1.8.1/tests/cli/test_ia_metadata.py python-internetarchive-1.8.5/tests/cli/test_ia_metadata.py
--- python-internetarchive-1.8.1/tests/cli/test_ia_metadata.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/cli/test_ia_metadata.py 2019-06-07 17:28:42.000000000 -0400
@@ -9,9 +9,10 @@
def test_ia_metadata_exists(capsys):
with IaRequestsMock() as rsps:
rsps.add_metadata_mock('nasa')
- ia_call(['ia', 'metadata', '--exists', 'nasa'])
+ ia_call(['ia', 'metadata', '--exists', 'nasa'], expected_exit_code=0)
out, err = capsys.readouterr()
assert out == 'nasa exists\n'
+ rsps.reset()
rsps.add_metadata_mock('nasa', '{}')
sys.argv = ['ia', 'metadata', '--exists', 'nasa']
ia_call(['ia', 'metadata', '--exists', 'nasa'], expected_exit_code=1)
diff -Nru python-internetarchive-1.8.1/tests/cli/test_ia_upload.py python-internetarchive-1.8.5/tests/cli/test_ia_upload.py
--- python-internetarchive-1.8.1/tests/cli/test_ia_upload.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/cli/test_ia_upload.py 2019-06-07 17:28:42.000000000 -0400
@@ -36,6 +36,7 @@
j = json.loads(STATUS_CHECK_RESPONSE)
j['over_limit'] = 1
+ rsps.reset()
rsps.add(responses.GET, '{0}//s3.us.archive.org'.format(PROTOCOL),
body=json.dumps(j),
content_type='application/json')
diff -Nru python-internetarchive-1.8.1/tests/conftest.py python-internetarchive-1.8.5/tests/conftest.py
--- python-internetarchive-1.8.1/tests/conftest.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/conftest.py 2019-06-07 17:28:42.000000000 -0400
@@ -38,7 +38,8 @@
'nasa_meta.xml',
'nasa_reviews.xml',
'NASAarchiveLogo.jpg',
- 'globe_west_540_thumb.jpg'
+ 'globe_west_540_thumb.jpg',
+ '__ia_thumb.jpg',
])
diff -Nru python-internetarchive-1.8.1/tests/requirements.txt python-internetarchive-1.8.5/tests/requirements.txt
--- python-internetarchive-1.8.1/tests/requirements.txt 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/requirements.txt 2019-06-07 17:28:42.000000000 -0400
@@ -1,3 +1,3 @@
pytest>=3.3.1
pytest-pep8
-responses==0.5.0
+responses==0.10.6
diff -Nru python-internetarchive-1.8.1/tests/test_api.py python-internetarchive-1.8.5/tests/test_api.py
--- python-internetarchive-1.8.1/tests/test_api.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/test_api.py 2019-06-07 17:28:42.000000000 -0400
@@ -178,12 +178,12 @@
def test_modify_metadata():
with IaRequestsMock(assert_all_requests_are_fired=False) as rsps:
- rsps.add(responses.GET, '{0}//archive.org/metadata/test'.format(PROTOCOL),
- body='{}')
- rsps.add(responses.POST, '{0}//archive.org/metadata/test'.format(PROTOCOL),
+ rsps.add(responses.GET, '{0}//archive.org/metadata/nasa'.format(PROTOCOL),
+ body='{"metadata":{"title":"foo"}}')
+ rsps.add(responses.POST, '{0}//archive.org/metadata/nasa'.format(PROTOCOL),
body=('{"success":true,"task_id":423444944,'
'"log":"https://catalogd.archive.org/log/423444944"}'))
- r = modify_metadata('test', dict(foo=1))
+ r = modify_metadata('nasa', dict(foo=1))
assert r.status_code == 200
assert r.json() == {
'task_id': 423444944,
diff -Nru python-internetarchive-1.8.1/tests/test_config.py python-internetarchive-1.8.5/tests/test_config.py
--- python-internetarchive-1.8.1/tests/test_config.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/test_config.py 2019-06-07 17:28:42.000000000 -0400
@@ -17,62 +17,40 @@
@responses.activate
def test_get_auth_config():
- headers = {'set-cookie': 'logged-in-user=test@archive.org',
- 'set-cookie2': 'logged-in-sig=test-sig; version=0'}
- # set-cookie2: Ugly hack to workaround responses lack of support for multiple headers
- responses.add(responses.POST, 'https://archive.org/account/login.php',
- adding_headers=headers)
-
test_body = """{
- "key": {
- "s3secretkey": "test-secret",
- "s3accesskey": "test-access"
+ "success": true,
+ "values": {
+ "cookies": {
+ "logged-in-sig": "foo-sig",
+ "logged-in-user": "foo%40example.com"
+ },
+ "email": "foo@example.com",
+ "itemname": "@jakej",
+ "s3": {
+ "access": "Ac3ssK3y",
+ "secret": "S3cretK3y"
+ },
+ "screenname":"jakej"
},
- "screenname": "foo",
- "success": 1
- }"""
- responses.add(responses.GET, 'https://archive.org/account/s3.php',
- body=test_body, adding_headers=headers,
- content_type='application/json')
- responses.add(responses.GET, 'https://s3.us.archive.org',
- body=test_body, adding_headers=headers,
- content_type='application/json')
-
- class UglyHack(httplib.HTTPResponse):
- def __init__(self, headers):
- self.fp = True
- if six.PY2:
- self.msg = httplib.HTTPMessage(StringIO())
- else:
- self.msg = httplib.HTTPMessage()
- for (k, v) in headers.items():
- self.msg[k] = v
-
- original_func = requests.adapters.HTTPAdapter.build_response
-
- def ugly_hack_build_response(self, req, resp):
- resp._original_response = UglyHack(resp.getheaders())
- response = original_func(self, req, resp)
- return response
-
- ugly_hack = mock.patch('requests.adapters.HTTPAdapter.build_response',
- ugly_hack_build_response)
- ugly_hack.start()
+ "version": 1}"""
+ responses.add(responses.POST, 'https://archive.org/services/xauthn/',
+ body=test_body)
r = internetarchive.config.get_auth_config('test@example.com', 'password1')
- ugly_hack.stop()
- assert r['s3']['access'] == 'test-access'
- assert r['s3']['secret'] == 'test-secret'
- assert r['cookies']['logged-in-user'] == 'test@archive.org'
- assert r['cookies']['logged-in-sig'] == 'test-sig'
+ assert r['s3']['access'] == 'Ac3ssK3y'
+ assert r['s3']['secret'] == 'S3cretK3y'
+ assert r['cookies']['logged-in-user'] == 'foo%40example.com'
+ assert r['cookies']['logged-in-sig'] == 'foo-sig'
@responses.activate
def test_get_auth_config_auth_fail():
# No logged-in-sig cookie set raises AuthenticationError.
- responses.add(responses.POST, 'https://archive.org/account/login.php')
+ responses.add(responses.POST, 'https://archive.org/services/xauthn/',
+ body='{"error": "failed"}')
try:
- internetarchive.config.get_auth_config('test@example.com', 'password1')
+ r = internetarchive.config.get_auth_config('test@example.com', 'password1')
except AuthenticationError as exc:
+ return
assert str(exc) == ('Authentication failed. Please check your credentials '
'and try again.')
diff -Nru python-internetarchive-1.8.1/tests/test_item.py python-internetarchive-1.8.5/tests/test_item.py
--- python-internetarchive-1.8.1/tests/test_item.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/test_item.py 2019-06-07 17:28:42.000000000 -0400
@@ -147,6 +147,7 @@
with IaRequestsMock() as rsps:
rsps.add(responses.GET, DOWNLOAD_URL_RE, body='test content')
nasa_item.download(files='nasa_meta.xml')
+ rsps.reset()
with pytest.raises(ConnectionError):
nasa_item.download(files='nasa_meta.xml')
@@ -179,6 +180,7 @@
rsps.add(responses.GET, DOWNLOAD_URL_RE, body='test content')
nasa_item.download(files='nasa_meta.xml')
+ rsps.reset()
rsps.add(responses.GET, DOWNLOAD_URL_RE, body='new test content')
nasa_item.download(files='nasa_meta.xml')
load_file('nasa/nasa_meta.xml') == 'new test content'
@@ -200,6 +202,7 @@
assert load_file('nasa/nasa_meta.xml') == 'overwrite based on md5'
# test no overwrite based on checksum.
+ rsps.reset()
rsps.add(responses.GET, DOWNLOAD_URL_RE,
body=load_test_data_file('nasa_meta.xml'))
nasa_item.download(files='nasa_meta.xml', checksum=True)
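
Note for reviewers (not part of the debdiff): the prepare_metadata hunk in internetarchive/iarequest.py above appears to change the duplicate check used when appending values. With the old `v in source_metadata[key]`, a scalar string on the right-hand side makes `in` a substring test, so a new value that happens to be contained in the existing one would be skipped. The sketch below only illustrates that difference; `already_present` is a hypothetical helper name of mine and does not exist in the package.

```python
def already_present(value, existing):
    """Return True if `value` is already stored under a metadata key.

    `existing` may be a scalar string or a list of strings, mirroring the
    two branches added in the hunk. Hypothetical helper, for illustration only.
    """
    if not isinstance(existing, list):
        # Wrap the scalar so that `in` compares whole values, not substrings.
        existing = [existing]
    return value in existing


# Old behaviour: `in` against a plain string is a substring test.
old_check = 'foo' in 'foobar'

# New behaviour: only an exact match counts as "already present".
new_check = already_present('foo', 'foobar')
exact_match = already_present('foobar', 'foobar')
list_match = already_present('foo', ['foo', 'bar'])
```

Under these assumptions, `old_check` is true while `new_check` is false, which is the case where the 1.8.1 code would silently drop an appended value.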