This way if only one of key and value is 64bit (eg. pointer), and other is 32bit,
the whole item will fit in 128bit, whereas before it would have been bumped up to
196 if only value was 64bit (a common use-case for us.)
This gives best performance for short strings, now that we have a h-advance cache as well.
The en-words benchmark in particular, now ot-font is faster than ft.
Second to last line is of interest:
Before:
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------------------------
BM_Shape/en-words.txt/Roboto-Regular.ttf/hb 29.8 ms 29.8 ms 23
BM_Shape/en-words.txt/Roboto-Regular.ttf/ft 30.4 ms 30.4 ms 23
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/hb 16.3 ms 16.3 ms 43
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/ft 16.5 ms 16.5 ms 42
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/var/hb 18.0 ms 18.0 ms 39
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/var/ft 17.8 ms 17.8 ms 39
After:
behdad@Behdads-MacBook-Pro harfbuzz % ninja -Cbuild && build/perf/benchmark-shape --benchmark_filter=en-words
-----------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------------------------
BM_Shape/en-words.txt/Roboto-Regular.ttf/hb 30.0 ms 30.0 ms 23
BM_Shape/en-words.txt/Roboto-Regular.ttf/ft 30.3 ms 30.3 ms 23
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/hb 16.3 ms 16.3 ms 43
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/ft 16.4 ms 16.4 ms 43
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/var/hb 17.6 ms 17.6 ms 40
BM_Shape/en-words.txt/SourceSerifVariable-Roman.ttf/var/ft 17.8 ms 17.8 ms 39