Bug 70509 - Candrabindu+Visarga doesn't work in Devanagari
https://bugs.freedesktop.org/show_bug.cgi?id=70509
We categorize both bindus and visarga as syllable-modifiers.
OT spec doesn't actually say what characters go in the syllable
modifier category, and allows one. We just allow up to two now.
Test case: U+0930,U+0941,U+0901,U+0903
Uniscribe currently doesn't support that and produces a
dotted circle.
More like Uniscribe... We still allow user-defined features to
work across syllables, but not pres,blws,abs,psts,etc.
This "regressed" Sinhala numbers by 11. These are cases were
there's Consonant followed by Ra,Halant,ZWJ at the of text.
The Ra,Halant,ZWJ ends up forming reph, which is wrong...
But before we were also ligating that reph with the previous
consonant. That's even more wrong. That's also what Uniscribe
does.
Current numbers:
BENGALI: 353732 out of 354188 tests passed. 456 failed (0.128745%)
DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60732 out of 60747 tests passed. 15 failed (0.0246926%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
MALAYALAM: 1048140 out of 1048334 tests passed. 194 failed (0.0185056%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271655 out of 271847 tests passed. 192 failed (0.070628%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
The new test suite runs tests included under
hb/test/shaping/tests/*.tests, which themselves reference
font files stored by sha1sum under hb/test/shaping/fonts/sha1sum.
The fonts are produced using a subsetter to only include glyphs
needed to run the test.
Four initial tests are added for (Chain)Context matching,
of which three currently fail.
After the Ngapi hackfest work, we were assuming that fonts
won't use presentation features to choose specific forms
(eg. conjuncts). As such, we were using auto-joiner behavior
for such features. It proved to be troublesome as many fonts
used presentation forms ('pres') for example to form conjuncts,
which need to be disabled when a ZWJ is inserted.
Two examples:
U+0D2F,U+200D,U+0D4D,U+0D2F with kartika.ttf
U+0995,U+09CD,U+200D,U+09B7 with vrinda.ttf
What we do now is to never do magic to ZWJ during GSUB's main input
match for Indic-style shapers. Note that backtrack/lookahead are still
matched liberally, as is GPOS. This seems to be an acceptable
compromise.
As to the bug that initially started this work, that one needs to
be fixed differently:
Bug 58714 - Kannada u+0cb0 u+200d u+0ccd u+0c95 u+0cbe does not
provide same results as Windows8
https://bugs.freedesktop.org/show_bug.cgi?id=58714
New numbers:
BENGALI: 353689 out of 354188 tests passed. 499 failed (0.140886%)
DEVANAGARI: 707305 out of 707394 tests passed. 89 failed (0.0125814%)
GUJARATI: 366349 out of 366457 tests passed. 108 failed (0.0294714%)
GURMUKHI: 60706 out of 60747 tests passed. 41 failed (0.067493%)
KANNADA: 951030 out of 951913 tests passed. 883 failed (0.0927606%)
KHMER: 299070 out of 299124 tests passed. 54 failed (0.0180527%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048102 out of 1048334 tests passed. 232 failed (0.0221304%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091753 out of 1091754 tests passed. 1 failed (9.15957e-05%)
TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
The code was confused because it was expecting left matra to have
POS_PRE_M, like we do in the Myanmar shaper, but that is not what
we were doing in this shaper. Rewrite to rely on category only.
Test case: U+AA06,U+AA34,U+AA2F
Ouch, how did things ever work without this?! The added test that has a
dot-reph as well as a pre-base reordering Ra perfectly demonstrates the
bug (tested with Nirmala font from Win8 for example). Testing suggests
that Win8 shaper has the *exact* same bug / behavior that we used to
have. Odd.
This sequence: U+120B,U+135F,U+120B with the Nyala font from Win7
exposes a GPOS bug in Uniscribe, in that the positioned mark is wrongly
moved as a result a following kern.
This is the one "failure" in the Ethiopic test suite :-).
ETHIOPIC: 118900 out of 118901 tests passed. 1 failed (0.000841036%)
With FreeSerif, it seems that the 'ccmp' feature does ligature
substituttions. That was then causing syllable match failures. We now
find syllables before any features have been applied.
Test sequence: U+0D9A,U+0DCA,U+200D,U+0DBB,U+0DCF
After we implemented dotted-circle, we were still ignoring any tests
that had dottedcircle in it for any of the shapers. That meant that if
we wrongly outputted dottedcircle, the test was being ignored. Ouch!
Fixing that shows regressions across the board. Most are Uniscribe
bugs: NOT inserting dotted-circle when it should. Some are arou
machine bugs. This is in fact a nice way to catch Indic-machine
deficiencies and when I fix the regressions, our clusters should be
much closer to Uniscribe. For now, we regressed from:
BENGALI: 353997 out of 354285 tests passed. 288 failed (0.0812905%)
DEVANAGARI: 707339 out of 707394 tests passed. 55 failed (0.00777502%)
GUJARATI: 366489 out of 366506 tests passed. 17 failed (0.0046384%)
GURMUKHI: 60769 out of 60809 tests passed. 40 failed (0.0657797%)
KANNADA: 951086 out of 951913 tests passed. 827 failed (0.0868777%)
KHMER: 299106 out of 299124 tests passed. 18 failed (0.00601757%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048104 out of 1048416 tests passed. 312 failed (0.0297592%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271747 out of 271847 tests passed. 100 failed (0.0367854%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970558 out of 970573 tests passed. 15 failed (0.00154548%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
To:
BENGALI: 353990 out of 354285 tests passed. 295 failed (0.0832663%)
DEVANAGARI: 707315 out of 707394 tests passed. 79 failed (0.0111678%)
GUJARATI: 366447 out of 366506 tests passed. 59 failed (0.016098%)
GURMUKHI: 60707 out of 60809 tests passed. 102 failed (0.167738%)
KANNADA: 951042 out of 951913 tests passed. 871 failed (0.0915%)
KHMER: 298962 out of 299124 tests passed. 162 failed (0.0541581%)
LAO: 53611 out of 53644 tests passed. 33 failed (0.0615167%)
MALAYALAM: 1048074 out of 1048416 tests passed. 342 failed (0.0326206%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271666 out of 271847 tests passed. 181 failed (0.0665816%)
TAMIL: 1091835 out of 1091837 tests passed. 2 failed (0.000183178%)
TELUGU: 970553 out of 970573 tests passed. 20 failed (0.00206064%)
TIBETAN: 208469 out of 208469 tests passed. 0 failed (0%)
Investigating.