From 94e93ba7e6fef1fd0257c7e3f84539e3ed828e70 Mon Sep 17 00:00:00 2001
From: Thomas Waldmann
Date: Sun, 16 Jan 2022 20:39:29 +0100
Subject: [PATCH] formula is only approximately correct

The movement of the start of the hashing window stops at
(file_size - window_size), thus THAT would be the factor in that
formula, not just file_size.

For medium and big files, window_size is much smaller than file_size,
so I guess we can just say "approximately" for the general case.
---
 docs/internals/data-structures.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/internals/data-structures.rst b/docs/internals/data-structures.rst
index 08b0b84d9..6d1b4ab07 100644
--- a/docs/internals/data-structures.rst
+++ b/docs/internals/data-structures.rst
@@ -633,7 +633,7 @@ This results in a high chance that a single cluster of changes to a file will on
 result in 1-2 new chunks, aiding deduplication.
 
 Using normal hash functions this would be extremely slow,
-requiring hashing ``window size * file size`` bytes.
+requiring hashing approximately ``window size * file size`` bytes.
 A rolling hash is used instead, which allows to add a new input byte
 and compute a new hash as well as *remove* a previously added input
 byte from the computed hash. This makes the cost of computing a hash for each
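
To make the corrected formula concrete: rehashing every window from scratch
costs on the order of ``window_size * (file_size - window_size)`` byte
operations, while a rolling hash pays ``O(window_size)`` once for the first
window and then ``O(1)`` per one-byte slide. Below is a minimal, illustrative
sketch of a buzhash-style rolling hash; the names and the randomly generated
table are assumptions for the example, not borg's actual (C-implemented)
chunker code::

    import random

    # Illustration only: a buzhash-style rolling hash. The 256-entry
    # table is just random here; borg's docs describe XORing a
    # per-repository seed into its table.
    BITS = 32
    MASK = (1 << BITS) - 1
    TABLE = [random.getrandbits(BITS) for _ in range(256)]

    def rol(x, n):
        """Rotate the 32-bit word x left by n bits (n taken mod 32)."""
        n %= BITS
        return ((x << n) | (x >> (BITS - n))) & MASK

    def buzhash(window):
        """Hash one window from scratch: costs O(window_size)."""
        h = 0
        k = len(window)
        for i, b in enumerate(window):
            # each byte's table entry is rotated by its distance from the end
            h ^= rol(TABLE[b], k - 1 - i)
        return h

    def buzhash_update(h, leave, enter, window_size):
        """Slide the window one byte: drop `leave`, add `enter`. Costs O(1)."""
        return rol(h, 1) ^ rol(TABLE[leave], window_size) ^ TABLE[enter]

    # Rolling over a file costs ~(file_size - window_size) O(1) updates
    # instead of rehashing window_size bytes at every position.
    data = bytes(random.getrandbits(8) for _ in range(4096))
    w = 16
    h = buzhash(data[:w])
    for i in range(1, len(data) - w + 1):
        h = buzhash_update(h, data[i - 1], data[i + w - 1], w)
        assert h == buzhash(data[i:i + w])  # update agrees with full rehash

The assertion in the loop checks the defining property of a rolling hash:
removing the leaving byte and adding the entering byte yields the same value
as hashing the new window in full, so the per-byte cost stays independent of
the window size.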