[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1086998: marked as done (ITP: borghash -- Memory-efficient hash table (implemented in Cython))



Your message dated Fri, 08 Nov 2024 19:00:10 +0000
with message-id <E1t9UDK-005EcN-VC@fasolo.debian.org>
and subject line Bug#1086998: fixed in borghash 0.0.1-1
has caused the Debian Bug report #1086998,
regarding ITP: borghash -- Memory-efficient hash table (implemented in Cython)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1086998: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1086998
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: wnpp
Severity: wishlist
Owner: Gianfranco Costamagna <locutusofborg@debian.org>

* Package name    : borghash
  Version         : 0.0.1
  Upstream Author : The Borg Collective (see AUTHORS file) <None>
* URL             : None
* License         : BSD-3-clause
  Programming Lang: Python
  Description     : Memory-efficient hash table (implemented in Cython)

Binary package names: python3-borghash


Needed for borgbackup2

BorgHash
=========

Memory-efficient hashtable implementations as a Python library,
implemented in Cython.

HashTable
---------

``HashTable`` is a rather low-level implementation, usually one rather wants to
use the ``HashTableNT`` wrapper. But read on to get the basics...

Keys and Values
~~~~~~~~~~~~~~~

The keys MUST be perfectly random ``bytes`` of arbitrary, but constant length,
like from a cryptographic hash (sha256, hmac-sha256, ...).
The implementation relies on this "perfectly random" property and does not
implement an own hash function, but just takes 32 bits from the given key.

The values are binary ``bytes`` of arbitrary, but constant length.

The length of the keys and values is defined when creating a ``HashTable``
instance (after that, the length must always match that defined length).

Implementation details
~~~~~~~~~~~~~~~~~~~~~~

To have little memory overhead overall, the hashtable only stores uint32_t
indexes into separate keys and values arrays (short: kv arrays).

A new key just gets appended to the keys array. The corresponding value gets
appended to the values array. After that, the key and value do not change their
index as long as they exist in the hashtable and the ht and kv arrays are in
memory. Even when kv pairs are deleted from ``HashTable``, the kv arrays never
shrink and the indexes of other kv pairs don't change.

This is because we want to have stable array indexes for the keys/values so the
indexes can be used outside of ``HashTable`` as memory-efficient references.

Memory allocated
~~~~~~~~~~~~~~~~

For a hashtable load factor of 0.1 - 0.5, a kv array grow factor of 1.3 and
N kv pairs, memory usage in bytes is approximately:

- Hashtable: from ``N * 4 / 0.5`` to ``N * 4 / 0.1``
- Keys/Values: from ``N * len(key+value) * 1.0`` to ``N * len(key+value) * 1.3``
- Overall: from ``N * (8 + len(key+value))`` to ``N * (40 + len(key+value) * 1.3)``

When the hashtable or the kv arrays are resized, there will be short memory
usage spikes. For the kv arrays, ``realloc()`` is used to avoid copying of
data and memory usage spikes, if possible.

HashTableNT
-----------

``HashTableNT`` is a convenience wrapper around ``HashTable``:

- accepts and returns ``namedtuple`` values
- implements persistence: can read (write) the hashtable from (to) a file.

Keys and Values
~~~~~~~~~~~~~~~

Keys: ``bytes``, see ``HashTable``.

Values: any fixed type of ``namedtuple`` that can be serialized to ``bytes``
by Python's ``struct`` module using a given format string.

When setting a value, it is automatically serialized. When a value is returned,
it will be a ``namedtuple`` of the given type.

Persistence
~~~~~~~~~~~

``HashTableNT`` has ``.write()`` and ``.read()`` methods to save/load its
content to/from a file, using an efficient binary format.

When a ``HashTableNT`` is saved to disk, only the non-deleted entries are
persisted and when it is loaded from disk, a new hashtable and new, dense
kv arrays are built - thus, kv indexes will be different!

API
---

HashTable / HashTableNT have an API similar to a dict:

- ``__setitem__`` / ``__getitem__`` / ``__delitem__`` / ``__contains__``
- ``get()``, ``pop()``, ``setdefault()``
- ``items()``, ``len()``
- ``read()``, ``write()``, ``size()``

Example code
------------

::

    # HashTableNT mapping 256bit key [bytes] --> Chunk value [namedtuple]
    Chunk = namedtuple("Chunk", ["refcount", "size"])
    # 256bit (32Byte) key, 2x 32bit (4Byte) values
    ht = HashTableNT(key_size=32, value_format="<II", value_type=Chunk)

    key = b"x" * 32  # the key is usually from a cryptographic hash fn
    value = Chunk(refcount=1, size=42)
    ht[key] = value
    assert ht[key] == value

    for key, value in ht.items():
        assert isinstance(key, bytes)
        assert isinstance(value, Chunk)

    file = "dump.bin"  # giving an fd of a file opened in binary mode also works
    ht.write(file)
    ht = HashTableNT.read(file)

Building / Installing
---------------------
::

    python setup.py build_ext --inplace
    python -m build
    pip install dist/borghash*.tar.gz


Want a demo?
------------

Run ``borghash-demo`` after installing the ``borghash`` package.

It will show you the demo code, run it and print the results for your machine.

Results on an Apple MacBook Pro (M3 Pro CPU) are like:

::

    HashTableNT in-memory ops (count=50000): insert: 0.062s, lookup: 0.066s, pop: 0.061s.
    HashTableNT serialization (count=50000): write: 0.020s, read: 0.021s.


State of this project
---------------------

**API is still unstable and expected to change as development goes on.**

**As long as the API is unstable, there will be no data migration tools,
like e.g. for reading an existing serialized hashtable.**

There might be missing features or optimization potential, feedback welcome!

Borg?
-----

Please note that this code is currently **not** used by the stable release of
BorgBackup (aka "borg"), but might be used by borg master branch in the future.

License
-------

BSD license.

--- End Message ---
--- Begin Message ---
Source: borghash
Source-Version: 0.0.1-1
Done: Gianfranco Costamagna <locutusofborg@debian.org>

We believe that the bug you reported is fixed in the latest version of
borghash, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1086998@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Gianfranco Costamagna <locutusofborg@debian.org> (supplier of updated borghash package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Thu, 07 Nov 2024 22:14:57 +0000
Source: borghash
Binary: python3-borghash python3-borghash-dbgsym
Architecture: source amd64
Version: 0.0.1-1
Distribution: unstable
Urgency: low
Maintainer: Gianfranco Costamagna <locutusofborg@debian.org>
Changed-By: Gianfranco Costamagna <locutusofborg@debian.org>
Description:
 python3-borghash - Memory-efficient hash table (implemented in Cython)
Closes: 1086998
Changes:
 borghash (0.0.1-1) unstable; urgency=low
 .
   * Initial release (Closes: #1086998)
Checksums-Sha1:
 00b1794c75043b3485f5dec4fcf26b205840914f 1915 borghash_0.0.1-1.dsc
 2d6f7105504b5cc01dba93626fabb7b3abaebbe9 215842 borghash_0.0.1.orig.tar.gz
 f33d6621744b020e00270b6551cae85bc0741be3 1980 borghash_0.0.1-1.debian.tar.xz
 08415795a95d0a450f21a81edd6caf37534f5ca2 7822 borghash_0.0.1-1_amd64.buildinfo
 2c97b5b6b37af0b649cbb3c92236a6b7988c38f2 575432 python3-borghash-dbgsym_0.0.1-1_amd64.deb
 f7445284c9dc8ae1dfa2ab6e32273ce813c802db 205316 python3-borghash_0.0.1-1_amd64.deb
Checksums-Sha256:
 0fbda357e2535393908746feca443bfc1b8e9b9f44c09e6ca3bec2aa82d8e2a2 1915 borghash_0.0.1-1.dsc
 e5aaf0a1fbad76cbdabc3bdcd442b43f22025d7afd4288ab3f77c894ecba204d 215842 borghash_0.0.1.orig.tar.gz
 ac3f020b7f0b29dd346027cf625522863f637d86a1df7a6ac928ccff5eee2e4d 1980 borghash_0.0.1-1.debian.tar.xz
 76bfffab3bf6517104e7611fb8c4c44ed4ecbb661585ace0efcb6305134c2935 7822 borghash_0.0.1-1_amd64.buildinfo
 da45b257ad487a257a05dfed18459edaef0d85959572e7a5b10e064950220aad 575432 python3-borghash-dbgsym_0.0.1-1_amd64.deb
 5ea5cf19ce38cf0b9ee773b0ecf84faccbde990d343198b0ccf41c50836e7588 205316 python3-borghash_0.0.1-1_amd64.deb
Files:
 95bdf8d40cdba722bfa6336dbf6bbf8c 1915 python optional borghash_0.0.1-1.dsc
 ac09ff1ecf999d2832673a9c8e3443cb 215842 python optional borghash_0.0.1.orig.tar.gz
 66fe31764f0fbc3641282420af2fd4b3 1980 python optional borghash_0.0.1-1.debian.tar.xz
 f7dcb5d21af1260db66315c8bc71c287 7822 python optional borghash_0.0.1-1_amd64.buildinfo
 8c1a4995c95933ef9983dd820bb40a8f 575432 debug optional python3-borghash-dbgsym_0.0.1-1_amd64.deb
 2c21feae966e8a7d99a0a14bf7634f19 205316 python optional python3-borghash_0.0.1-1_amd64.deb

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEkpeKbhleSSGCX3/w808JdE6fXdkFAmctQN8ACgkQ808JdE6f
XdndsQ//fTodapE2P7TcvOWWIO8/QlensNDs9JaruoxWj/4Wr+xZN0z17K8naXY0
sqj+wpItcZBnNdUdiVPItmIVFAA1lDMPtQjH1YDlSdyhw2T2cbNFKBEnCVS/vifD
NI/e86vQdh633TxHSrd7JGhzdWPfMqVD5aFXzRJtvwl3w3sZOzZtA2LEwot3Gz+2
QXWX1VkzJb8j3bszaw/CU45cQLukULQ6EO1HBTj4ykSR5NOScyxnI4JC5lDyzHIx
nnNC0MwBN9E3LFEF7NiGsV/fjhAWqdoNUB406BMgbLhSOTHMaQsdEkpzqSU03KyM
UlWgGRKQtBKSbRDDzTWeCmDmCSErDflcHVbdOXtB7H+92Fo14RW80VzF/O3hQgg6
aox8Rg92N4s7UAdgM+Q+Fv/D4LXlcmQQ9IeaLpDosacgJU+L6oyuHKiRvSI9wORE
JCIc4AsyNdUPddnloNl/U2d6sA2I97A8EAkC9FdbCX2ys/B7sMdgitgxnR/AOHog
KkI2hTZucHYFcXnLDOqHGkJDiGSYQOTc1kUQQDzTnnCXhnog8nHunBiTIQMEmGUq
vC/HCl6/4yCUuV+6mhs7nPwR1qtdsZW7JzpYLuGuhSNRieemcnPpaJGqCzhae7JF
H+/ELp3akaZn8OhZqhxcFLz01m/WpS3WNbL5DBOmNZ8hzTW5uIE=
=Bpty
-----END PGP SIGNATURE-----

Attachment: pgpQMTlC4U8Kf.pgp
Description: PGP signature


--- End Message ---

Reply to: