__collisionsTest__ is a brute force hash analyzer
which measures a 64-bit hash algorithm's collision rate
by generating billions of hashes
and comparing the result to an "ideal" target.

The test requires a very large amount of memory.
By default, it will generate 24 billion 64-bit hashes,
requiring __192 GB of RAM__ for their storage.
The number of hashes can be modified with the command line parameter `--nbh=`.
Be aware that testing the collision ratio of 64-bit hashes
requires a very large number of hashes (several billion) for meaningful measurements.
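
The 192 GB figure is simply the raw storage cost of keeping every 64-bit hash (a quick check, assuming 8 bytes per stored hash):

```
24 Gi hashes × 8 bytes/hash = 192 GiB
```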

To reduce RAM usage, an optional filter can be requested with `--filter`.
It reduces the number of candidates to analyze, hence the associated RAM budget.
Note that the filter itself requires a lot of RAM
(32 GB by default; this can be changed with `--filterlog=`;
a filter that is too small will not be efficient, aim for ~2 bytes per hash),
and reading and writing into the filter costs a significant CPU budget,
so this method is slower.
It also doesn't allow advanced analysis of partial bitfields,
since most hashes will be discarded rather than stored.
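
For illustration, a filtered run of the default test might look like the line below. The hash name is just one of the names listed by `./collisionsTest -h`, and the `--filterlog=` value is an assumption: if, as the name suggests, it is the log2 of the filter size in bytes, then 35 corresponds to the default 32 GB filter.

```
./collisionsTest --nbh=24G --filter --filterlog=35 XXH3
```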

When using the filter, the RAM budget consists of the filter plus a list of candidates,
which will be a fraction of the original hash list.
Using default settings (24 billion hashes, 32 GB filter),
the number of potential candidates should be reduced to less than 2 billion,
requiring ~14 GB for their storage.
Such a result also depends on the hash algorithm's quality.
The number of effective candidates is likely to be lower, at ~1 billion,
but storage must allocate for an upper bound.

For the default test, the expected "optimal" collision rate for a 64-bit hash function is ~18 collisions.
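
This target comes from the birthday-paradox approximation: for `n` hashes drawn uniformly from `2^64` possible values, the expected number of colliding pairs is roughly `n^2 / 2^65`. A quick check for the default test size:

```
n = 24 billion = 24 × 2^30 hashes
expected collisions ≈ n^2 / 2^65
                    = (24 × 2^30)^2 / 2^65
                    = 576 × 2^60 / 2^65
                    = 18
```

The same formula produces the "Expected" column of the result tables below (e.g. 100 Gi hashes give 10000 × 2^60 / 2^65 ≈ 312.5).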

#### How to build

```
make
```

Note: the code is a mix of C99 and C++14;
it's not compatible with a C90-only compiler.

#### Build modifiers

- `SLAB5`: use an alternative pattern generator, friendlier for weak hash algorithms
- `POOL_MT`: if `=0`, disable multi-threading code (enabled by default)
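
For example, a build combining both modifiers might look like this (a sketch, assuming the modifiers are passed as `make` variables, which is the usual convention for this kind of switch):

```
make SLAB5=1 POOL_MT=0
```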

#### How to integrate any hash in the tester

The build script will compile files found in `./allcodecs`.
Put the source code there.
This also works if the hash is a single `*.h` file.

The glue happens in `hashes.h`.
In this file, there are 2 sections to update:
- Add the required `#include "header.h"`, and create a wrapper
  that respects the format expected by the function pointer.
- Add the wrapper, along with the name and an indication of the output width,
  to the table at the end of `hashes.h`.
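
As an illustration, the glue for a hypothetical hash could look like the sketch below. All names are made up (`myhash.h`, `MYHASH64`, `myhash_wrapper`), and the exact function pointer signature and table layout must be taken from `hashes.h` itself; this only shows the general shape of the two additions:

```
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint64_t */
#include "myhash.h"   /* 1) include the candidate hash's header (hypothetical name) */

/* Wrapper giving the candidate hash the shape expected by the tester's
 * function pointer (assumed here: 64-bit result from a buffer + length). */
static uint64_t myhash_wrapper(const void* data, size_t len)
{
    return MYHASH64(data, len, 0 /* seed */);   /* hypothetical hash entry point */
}

/* 2) register the wrapper in the table at the end of hashes.h,
 *    with its display name and output width (64 bits here), e.g.:
 *    { "myhash", myhash_wrapper, 64 },
 */
```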

Build with `make`. Locate your new hash with `./collisionsTest -h`;
it should be listed.

#### Usage

```
usage: ./collisionsTest [hashName] [opt]

list of hashNames: (...)

Optional parameters:
--nbh=NB        Select nb of hashes to generate (25769803776 by default)
--filter        Enable the filter. Slower, but reduces memory usage for same nb of hashes.
--threadlog=NB  Use 2^NB threads
--len=NB        Select length of input (255 bytes by default)
```
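
As an illustration of the remaining options (the hash name is one of those listed by `-h`, and `--threadlog=3` splits the work across 2^3 = 8 threads):

```
./collisionsTest --nbh=14G --filter --threadlog=3 --len=16 XXH3
```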

#### Some advice on how to set up a collision test

Most tests are primarily driven by the amount of RAM available.
Here's a method to decide the size of the test.

Presuming that the RAM budget is not plentiful (for this example, 32 GB),
the `--filter` mode is effectively compulsory to measure anything meaningful.
Let's plan 50% of memory for the filter, that's 16 GB.
This will be good enough to filter a number of hashes about 10% below this size.
Let's round down to 14 G.

By requesting 14G, the expectation is that the program will automatically
size the filter to 16 GB and store ~1G candidates,
leaving enough breathing room for the system.

The command line becomes:
```
./collisionsTest --nbh=14G --filter NameOfHash
```
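
At this size, the approximation given earlier predicts (14 × 2^30)^2 / 2^65 ≈ 6.1 collisions for an ideal 64-bit hash, which is the value shown in the "Expected" column for the 14 Gi runs in the tables below.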

#### Examples

Here are a few results produced with this tester:

| Algorithm | Input Len | Nb Hashes | Expected | Nb Collisions | Notes |
| --- | --- | --- | --- | --- | --- |
| __XXH3__ | 255 | 100 Gi | 312.5 | 326 | |
| __XXH64__ | 255 | 100 Gi | 312.5 | 294 | |
| __XXH128__ low64 | 512 | 100 Gi | 312.5 | 321 | |
| __XXH128__ high64 | 512 | 100 Gi | 312.5 | 325 | |
| __XXH128__ | 255 | 100 Gi | 0.0 | 0 | a 128-bit hash is expected to generate 0 collisions |

Test on small inputs:

| Algorithm | Input Len | Nb Hashes | Expected | Nb Collisions | Notes |
| --- | --- | --- | --- | --- | --- |
| __XXH64__ | 8 | 100 Gi | 312.5 | __0__ | `XXH64` is bijective for `len==8` |
| __XXH3__ | 8 | 100 Gi | 312.5 | __0__ | `XXH3` is also bijective for `len==8` |
| __XXH3__ | 16 | 100 Gi | 312.5 | 332 | |
| __XXH3__ | 32 | 14 Gi | 6.1 | 3 | |
| __XXH128__ | 16 | 25 Gi | 0.0 | 0 | test range 9-16 |
| __XXH128__ | 32 | 25 Gi | 0.0 | 0 | test range 17-128 |
| __XXH128__ | 100 | 13 Gi | 0.0 | 0 | test range 17-128 |
| __XXH128__ | 200 | 13 Gi | 0.0 | 0 | test range 129-240 |