After 763e5466c0, one doesn't
need to set flags for different pieces of text. The flags now
are something the client sets up once, depending on how it
actually uses the buffer. As such, don't clear it in
clear_contents().
Tests updated.
With this change, we now by default replace broken UTF-8/16/32 bits
with U+FFFD. This can be changed by calling new API on the buffer.
Previously the replacement value used to be (hb_codepoint_t)-1.
Note that hb_buffer_clear_contents() does NOT reset the replacement
character.
See discussion here:
6f13b6d62d
New API:
hb_buffer_set_replacement_codepoint()
hb_buffer_get_replacement_codepoint()
New API:
hb_buffer_flags_t
HB_BUFFER_FLAGS_DEFAULT
HB_BUFFER_FLAG_BOT
HB_BUFFER_FLAG_EOT
HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES
hb_buffer_set_flags()
hb_buffer_get_flags()
We use the BOT flag to decide whether to insert dottedcircle if the
first char in the buffer is a combining mark.
The PRESERVE_DEFAULT_IGNORABLES flag prevents removal of characters like
ZWNJ/ZWJ/...
Can be -1 for NUL-terminated string. This is useful for passing parts
of a larger string to a function without having to copy or modify the
string first.
Affected functions:
hb_tag_t hb_tag_from_string()
hb_direction_from_string()
hb_language_from_string()
hb_script_from_string()
For two reasons:
1. User can always call hb_buffer_pre_allocate() themselves, and
2. Now we do a pre_alloc in add_utfX anyway, so the total number of
reallocs is limited to a small number (~3) anyway. This just makes the
API cleaner.
This includes HB_DIRECTION_INVALID and HB_SCRIPT_INVALID.
The INVALID will cause a "guess whatever from the text" in hb_shape().
While it's not ideal, it works better than the previous defaults at
least (HB_DIRECTION_LTR and HB_SCRIPT_COMMON).