Crate foldhash


This crate provides foldhash, a fast, non-cryptographic, minimally DoS-resistant hashing algorithm designed for computational uses such as hashmaps, bloom filters, count sketching, etc.

When should you not use foldhash:

  • You are afraid of people studying your long-running program’s behavior to reverse engineer its internal random state and using this knowledge to create many colliding inputs for computational complexity attacks.

  • You expect foldhash to have a consistent output across versions or platforms, such as for persistent file formats or communication protocols.

  • You are relying on foldhash’s properties for any kind of security. Foldhash is not appropriate for any cryptographic purpose.

Foldhash has two variants, one optimized for speed which is ideal for data structures such as hash maps and bloom filters, and one optimized for statistical quality which is ideal for algorithms such as HyperLogLog and MinHash.

Foldhash can be used in a #![no_std] environment by disabling its default "std" feature.

§Usage

The easiest way to use this crate with the standard library HashMap or HashSet is to import them from foldhash instead, along with the extension traits that make HashMap::new and HashMap::with_capacity work out of the box:

use foldhash::{HashMap, HashMapExt};

let mut hm = HashMap::new();
hm.insert(42, "hello");

You can also avoid the convenience types and do it manually by initializing a RandomState, for example if you are using a different hash map implementation like hashbrown:

use hashbrown::HashMap;
use foldhash::fast::RandomState;

let mut hm = HashMap::with_hasher(RandomState::default());
hm.insert("foo", "bar");
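
Because the hasher is just a type parameter of the map, code can also be written generically over it. The following is a minimal sketch and not part of foldhash: count_words is a hypothetical helper, and the only assumption it makes about the hasher state is that it implements BuildHasher and Default, which the RandomState::default() call above relies on as well.

```rust
use std::collections::HashMap;
use std::hash::BuildHasher;

/// Hypothetical helper: counts whitespace-separated words in a map whose
/// hasher state `S` is chosen by the caller.
pub fn count_words<'a, S>(text: &'a str) -> HashMap<&'a str, usize, S>
where
    S: BuildHasher + Default,
{
    // Build the map from an explicitly constructed hasher state,
    // mirroring the `HashMap::with_hasher(...)` call shown above.
    let mut counts = HashMap::with_hasher(S::default());
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

Under that assumption, count_words::<foldhash::fast::RandomState>("a b a") would select foldhash, and any other BuildHasher + Default type can be substituted without touching the function body.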

The above methods are the recommended way to use foldhash; they automatically create a randomly seeded hasher instance for you. If you absolutely must have determinism, you can use FixedState instead, but note that this makes you trivially vulnerable to HashDoS attacks and might lead to quadratic runtime when moving data from one hashmap/set into another:

use std::collections::HashSet;
use foldhash::fast::FixedState;

let mut hs = HashSet::with_hasher(FixedState::with_seed(42));
hs.insert([1, 10, 100]);

If you rely on statistical properties of the hash for the correctness of your algorithm, such as in HyperLogLog, it is suggested to use the RandomState or FixedState from the quality module instead of the fast module. The fast module is optimized purely for speed in hash tables and has known statistical imperfections.
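
To illustrate why such algorithms depend on hash quality, here is a minimal HyperLogLog sketch written generically over BuildHasher. It is an illustration only and not part of foldhash: Hll, the register count, and the textbook estimator constants are all assumptions of this sketch. Every bit of the hash feeds directly into the estimate, so statistical imperfections in the hash become estimation error.

```rust
use std::hash::{BuildHasher, Hash};

/// Number of index bits; the sketch uses 2^P = 1024 registers.
const P: u32 = 10;

/// Illustrative HyperLogLog sketch, generic over the hasher state `S`.
pub struct Hll<S> {
    state: S,
    registers: Vec<u8>,
}

impl<S: BuildHasher> Hll<S> {
    pub fn new(state: S) -> Self {
        Hll { state, registers: vec![0; 1 << P] }
    }

    pub fn insert<T: Hash>(&mut self, item: T) {
        let h = self.state.hash_one(item);
        // The top P bits select a register.
        let idx = (h >> (64 - P)) as usize;
        // The remaining bits, left-aligned; the rank is the position of
        // their first 1 bit, capped when all of them are zero.
        let rest = h << P;
        let rank = (rest.leading_zeros() + 1).min(64 - P + 1) as u8;
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    /// Textbook estimator with the linear-counting small-range correction.
    pub fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Any BuildHasher can be passed to Hll::new; following the advice above, a state from the quality module would be the natural choice for this kind of use.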

Finally, you can also directly use the RandomState or FixedState to manually hash items using the BuildHasher trait:

use std::hash::BuildHasher;
use foldhash::quality::RandomState;

let random_state = RandomState::default();
let hash = random_state.hash_one("hello world");
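
A manually computed hash can then be put to work directly, for example to derive bit positions for a bloom filter. The sketch below is one possible approach and not something foldhash provides: bloom_positions is a hypothetical helper that splits a single 64-bit hash into two halves and combines them by double hashing. It accepts any BuildHasher, so it makes no assumption about which hasher state is used.

```rust
use std::hash::{BuildHasher, Hash};

/// Hypothetical helper: derives `k` bit positions in a filter of `num_bits`
/// bits from a single 64-bit hash, using double hashing
/// (position_i = h1 + i * h2 mod num_bits).
pub fn bloom_positions<S, T>(state: &S, item: &T, k: u32, num_bits: u64) -> Vec<u64>
where
    S: BuildHasher,
    T: Hash + ?Sized,
{
    assert!(num_bits > 0, "filter must have at least one bit");
    let h = state.hash_one(item);
    // Low 32 bits are the base position, high 32 bits the step.
    let h1 = h & 0xffff_ffff;
    // Force the step to be odd so it is never zero.
    let h2 = (h >> 32) | 1;
    (0..k as u64)
        .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % num_bits)
        .collect()
}
```

With the same hasher state, the same item always maps to the same positions, which is what lets a filter answer membership queries after insertion.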

Modules§

  • fast: The foldhash implementation optimized for speed.
  • quality: The foldhash implementation optimized for quality.