<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="getting-started">
  <title>Getting started with HarfBuzz</title>
  <section id="an-overview-of-the-harfbuzz-shaping-api">
    <title>An overview of the HarfBuzz shaping API</title>
    <para>
      The core of the HarfBuzz shaping API is the function
      <function>hb_shape()</function>. This function takes a font, a
      buffer containing a string of Unicode codepoints and
      (optionally) a list of font features as its input. It replaces
      the codepoints in the buffer with the corresponding glyphs from
      the font, correctly ordered and positioned, and with any of the
      optional font features applied.
    </para>
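    <para>
      As a rough illustration of the optional feature list, the
      sketch below parses two feature strings with
      <function>hb_feature_from_string()</function> and passes the
      result to <function>hb_shape()</function>. The helper names
      <function>make_example_features()</function> and
      <function>shape_with_features()</function>, and the particular
      features chosen, are assumptions made for this example and are
      not part of HarfBuzz.
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: fill out[] with two illustrative features.
       * "-liga" switches standard ligatures off; "smcp" switches small
       * caps on. Returns false if either string fails to parse. */
      static hb_bool_t make_example_features(hb_feature_t out[2])
      {
        return hb_feature_from_string("-liga", -1, &out[0]) &&
               hb_feature_from_string("smcp", -1, &out[1]);
      }

      /* Hypothetical helper: shape buf with the features above. The
       * buffer is assumed to already hold text and to have its
       * direction, script, and language set. */
      static void shape_with_features(hb_font_t *font, hb_buffer_t *buf)
      {
        hb_feature_t features[2];

        if (make_example_features(features))
          hb_shape(font, buf, features, 2);
      }
    ]]></programlisting>
    <para>
      Passing <literal>NULL</literal> and <literal>0</literal> in
      place of the feature array requests no user features, so only
      the shaper's default features are applied.
    </para>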
    <para>
      In addition to holding the pre-shaping input (the Unicode
      codepoints that comprise the input string) and the post-shaping
      output (the glyphs and positions), a HarfBuzz buffer has several
      properties that affect shaping. The most important are the
      text-flow direction (e.g., left-to-right, right-to-left,
      top-to-bottom, or bottom-to-top), the script tag, and the
      language tag.
    </para>

    <para>
      For input string buffers, flags are available to denote when the
      buffer represents the beginning or end of a paragraph, to
      indicate whether or not to visibly render Unicode
      <literal>Default Ignorable</literal> codepoints, and to modify
      the cluster-merging behavior for the buffer. For shaped output
      buffers, the individual X and Y offsets and
      <literal>advances</literal> (the logical dimensions) of each
      glyph are accessible. HarfBuzz also flags glyphs as
      <literal>UNSAFE_TO_BREAK</literal> if breaking the string at
      that glyph (e.g., in a line-breaking or hyphenation process)
      would require re-shaping the text.
    </para>
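    <para>
      The following sketch shows one way the buffer flags and the
      per-glyph flags might be used. The helper names
      <function>mark_whole_paragraph()</function> and
      <function>count_unsafe_to_break()</function> are hypothetical;
      the HarfBuzz calls and constants they use are
      <function>hb_buffer_set_flags()</function>,
      <function>hb_glyph_info_get_glyph_flags()</function>,
      <literal>HB_BUFFER_FLAG_BOT</literal>,
      <literal>HB_BUFFER_FLAG_EOT</literal>, and
      <literal>HB_GLYPH_FLAG_UNSAFE_TO_BREAK</literal>.
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: before shaping, tell HarfBuzz that buf
       * holds a complete paragraph (beginning and end of text). */
      static void mark_whole_paragraph(hb_buffer_t *buf)
      {
        hb_buffer_set_flags(buf,
            (hb_buffer_flags_t) (HB_BUFFER_FLAG_BOT | HB_BUFFER_FLAG_EOT));
      }

      /* Hypothetical helper: after shaping, count the glyphs at which
       * the text cannot be broken without re-shaping. */
      static unsigned int count_unsafe_to_break(hb_buffer_t *buf)
      {
        unsigned int glyph_count;
        unsigned int unsafe = 0;
        hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &glyph_count);

        for (unsigned int i = 0; i < glyph_count; i++) {
          if (hb_glyph_info_get_glyph_flags(&info[i]) &
              HB_GLYPH_FLAG_UNSAFE_TO_BREAK)
            unsafe++;
        }
        return unsafe;
      }
    ]]></programlisting>
    <para>
      A line breaker could treat every glyph that does not carry the
      flag as a position where the shaped output can be split and
      reused as-is.
    </para>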

    <para>
      HarfBuzz also provides functions to compare the contents of
      buffers, join buffers, normalize buffer contents, and handle
      invalid codepoints, as well as to determine the state of a
      buffer (e.g., input codepoints or output glyphs). All buffers
      are reference-counted, and a buffer is freed when its last
      reference is released.
    </para>
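    <para>
      As a small sketch of the last two points, the code below
      reports a buffer's state with
      <function>hb_buffer_get_content_type()</function> and takes an
      extra reference with
      <function>hb_buffer_reference()</function>. The helper names
      <function>buffer_state_name()</function> and
      <function>share_buffer()</function> are invented for this
      illustration.
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: describe what a buffer currently holds. */
      static const char *buffer_state_name(hb_buffer_t *buf)
      {
        switch (hb_buffer_get_content_type(buf)) {
          case HB_BUFFER_CONTENT_TYPE_UNICODE: return "unicode"; /* input text   */
          case HB_BUFFER_CONTENT_TYPE_GLYPHS:  return "glyphs";  /* shaped output */
          default:                             return "invalid"; /* empty/reset  */
        }
      }

      /* Hypothetical helper: give a second owner its own reference.
       * Each owner later calls hb_buffer_destroy(); the buffer is freed
       * only when the final reference is dropped. */
      static hb_buffer_t *share_buffer(hb_buffer_t *buf)
      {
        return hb_buffer_reference(buf);
      }
    ]]></programlisting>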

    <para>
      Although the default <function>hb_shape()</function> function is
      sufficient for most use cases, a variant,
      <function>hb_shape_full()</function>, is also provided that
      lets you specify which of HarfBuzz's shapers to use on a buffer.
    </para>
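    <para>
      A minimal sketch of selecting a shaper explicitly follows. It
      relies on <function>hb_shape_full()</function>, which accepts a
      <literal>NULL</literal>-terminated list of shaper names, and on
      <function>hb_shape_list_shapers()</function>, which lists the
      shapers compiled into the library. The helper names and the
      choice of the <literal>"ot"</literal> shaper are assumptions
      made for illustration.
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>
      #include <string.h>

      /* Hypothetical helper: is the named shaper built into this copy
       * of HarfBuzz? */
      static hb_bool_t shaper_is_available(const char *name)
      {
        const char **shapers = hb_shape_list_shapers();

        for (; shapers && *shapers; shapers++) {
          if (strcmp(*shapers, name) == 0)
            return 1;
        }
        return 0;
      }

      /* Hypothetical helper: shape with the OpenType shaper only.
       * Returns false if that shaper could not shape the buffer. The
       * buffer is assumed to have its segment properties set. */
      static hb_bool_t shape_with_ot_only(hb_font_t *font, hb_buffer_t *buf)
      {
        const char *const shaper_list[] = { "ot", NULL };

        return hb_shape_full(font, buf, NULL, 0, shaper_list);
      }
    ]]></programlisting>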

    <para>
      HarfBuzz can read TrueType fonts, TrueType collections, OpenType
      fonts, and OpenType collections. Functions are provided to query
      font objects about metrics, Unicode coverage, available tables and
      features, and variation selectors. Individual glyphs can also be
      queried for metrics, variations, and glyph names. OpenType
      variable fonts are supported, and HarfBuzz allows you to set
      variation-axis coordinates on font objects.
    </para>
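    <para>
      Setting a variation-axis coordinate might look like the sketch
      below, which parses a setting such as
      <literal>"wght=700"</literal> with
      <function>hb_variation_from_string()</function> and applies it
      with <function>hb_font_set_variations()</function>. The helper
      name <function>apply_variation()</function> and the example
      axis value are hypothetical.
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: apply one variation setting, for example
       * "wght=700", to a font object. Returns false if the string
       * cannot be parsed. Axes that the font does not define are
       * expected to have no effect. */
      static hb_bool_t apply_variation(hb_font_t *font, const char *setting)
      {
        hb_variation_t variation;

        if (!hb_variation_from_string(setting, -1, &variation))
          return 0;

        hb_font_set_variations(font, &variation, 1);
        return 1;
      }
    ]]></programlisting>
    <para>
      Variations are set on the font object rather than on the face,
      so several fonts created from one face can carry different axis
      settings.
    </para>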

    <para>
      HarfBuzz provides glue code to integrate with various other
      libraries, including FreeType, GObject, and CoreText. Support
      for integrating with Uniscribe and DirectWrite is experimental
      at present.
    </para>
  </section>

  <section id="terminology">
    <title>Terminology</title>
    <para>

    </para>
    <variablelist>
      <?dbfo list-presentation="blocks"?>
      <varlistentry>
        <term>script</term>
        <listitem>
          <para>
            In text shaping, a <emphasis>script</emphasis> is a
            writing system: a set of symbols, rules, and conventions
            that is used to represent a language or multiple
            languages.
          </para>
          <para>
            In general computing lingo, the word "script" can also
            be used to mean an executable program (usually one
            written in a human-readable programming language). For
            the sake of clarity, HarfBuzz documents will always use
            more specific terminology when referring to this
            meaning, such as "Python script" or "shell script." In
            all other instances, "script" refers to a writing system.
          </para>
          <para>
            For developers using HarfBuzz, it is important to note
            the distinction between a script and a language. Most
            scripts are used to write a variety of different
            languages, and many languages may be written in more
            than one script.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>shaper</term>
        <listitem>
          <para>
            In HarfBuzz, a <emphasis>shaper</emphasis> is a
            handler for a specific script-shaping model. HarfBuzz
            implements separate shapers for Indic, Arabic, Thai and
            Lao, Khmer, Myanmar, Tibetan, Hangul, Hebrew, the
            Universal Shaping Engine (USE), and a default shaper for
            non-complex scripts.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>cluster</term>
        <listitem>
          <para>
            In text shaping, a <emphasis>cluster</emphasis> is a
            sequence of codepoints that must be treated as an
            indivisible unit. Clusters can include code-point
            sequences that form a ligature or base-and-mark
            sequences. Tracking and preserving clusters is important
            when shaping operations might separate or reorder
            code points.
          </para>
          <para>
            HarfBuzz provides three cluster
            <emphasis>levels</emphasis> that implement different
            approaches to the problem of preserving clusters during
            shaping operations.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>grapheme</term>
        <listitem>
          <para>
            In linguistics, a <emphasis>grapheme</emphasis> is one
            of the indivisible units that make up a writing system or
            script. Often, graphemes are individual symbols (letters,
            numbers, punctuation marks, logograms, etc.) but,
            depending on the writing system, a particular grapheme
            might correspond to a sequence of several Unicode code
            points.
          </para>
          <para>
            In practice, HarfBuzz and other text-shaping engines
            are not generally concerned with graphemes. However, it
            is important for developers using HarfBuzz to recognize
            that there is a difference between graphemes and shaping
            clusters (see above). The two concepts may overlap
            frequently, but there is no guarantee that they will be
            identical.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term>syllable</term>
        <listitem>
          <para>
            In linguistics, a <emphasis>syllable</emphasis> is
            a sequence of sounds that makes up a building block of a
            particular language. Every language has its own set of
            rules describing what constitutes a valid syllable.
          </para>
          <para>
            For text-shaping purposes, the various definitions of
            "syllable" are important because script-specific shaping
            operations may be applied at the syllable level. For
            example, a reordering rule might specify that a vowel
            mark be reordered to the beginning of the syllable.
          </para>
          <para>
            Syllables will consist of one or more Unicode code
            points. The definition of a syllable for a particular
            writing system might correspond to how HarfBuzz
            identifies clusters (see above) for the same writing
            system. However, it is important for developers using
            HarfBuzz to recognize that there is a difference between
            syllables and shaping clusters. The two concepts may
            overlap frequently, but there is no guarantee that they
            will be identical.
          </para>
        </listitem>
      </varlistentry>
    </variablelist>
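    <para>
      To make the <emphasis>cluster</emphasis> entry above more
      concrete, the sketch below selects a cluster level with
      <function>hb_buffer_set_cluster_level()</function> and prints
      the cluster value stored with each buffer item. When text is
      added with <function>hb_buffer_add_utf8()</function> and an
      item offset of zero, the initial cluster values are offsets
      into the UTF-8 input. The helper names and the choice of
      <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>
      are assumptions made for this illustration.
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>
      #include <stdio.h>

      /* Hypothetical helper: ask HarfBuzz to keep cluster values as
       * fine-grained as it can while still keeping them monotonic. */
      static void use_character_clusters(hb_buffer_t *buf)
      {
        hb_buffer_set_cluster_level(buf,
            HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS);
      }

      /* Hypothetical helper: print each item's cluster value. Before
       * shaping, the codepoint field holds a Unicode codepoint; after
       * shaping, it holds a glyph ID. */
      static void print_clusters(hb_buffer_t *buf)
      {
        unsigned int count;
        hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &count);

        for (unsigned int i = 0; i < count; i++)
          printf("item %u: value %u, cluster %u\n",
                 i, (unsigned int) info[i].codepoint,
                 (unsigned int) info[i].cluster);
      }
    ]]></programlisting>
    <para>
      Glyphs that report the same cluster value after shaping belong
      to the same cluster, which is how a caller can map shaped
      output back to ranges of the input text.
    </para>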

  </section>

  <section id="a-simple-shaping-example">
    <title>A simple shaping example</title>

    <para>
      Below is the simplest HarfBuzz shaping example possible.
    </para>
    <orderedlist numeration="arabic">
      <listitem>
        <para>
          Create a buffer and put your text in it.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      #include &lt;hb.h&gt;

      /* text is assumed to be a NUL-terminated UTF-8 string supplied
       * by the caller; -1 means "use the whole string". */
      hb_buffer_t *buf;
      buf = hb_buffer_create();
      hb_buffer_add_utf8(buf, text, -1, 0, -1);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="2">
        <para>
          Set the script, language and direction of the buffer.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
      hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
      hb_buffer_set_language(buf, hb_language_from_string("en", -1));
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="3">
        <para>
          Create a face and a font from a font file.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_blob_t *blob = hb_blob_create_from_file(filename); /* or hb_blob_create_from_file_or_fail() */
      hb_face_t *face = hb_face_create(blob, 0);
      hb_font_t *font = hb_font_create(face);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="4">
        <para>
          Shape!
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_shape(font, buf, NULL, 0);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="5">
        <para>
          Get the glyph and position information.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      unsigned int glyph_count;
      hb_glyph_info_t *glyph_info = hb_buffer_get_glyph_infos(buf, &amp;glyph_count);
      hb_glyph_position_t *glyph_pos = hb_buffer_get_glyph_positions(buf, &amp;glyph_count);
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="6">
        <para>
          Iterate over each glyph.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_position_t cursor_x = 0;
      hb_position_t cursor_y = 0;
      for (unsigned int i = 0; i &lt; glyph_count; i++) {
          hb_codepoint_t glyphid  = glyph_info[i].codepoint;
          hb_position_t x_offset  = glyph_pos[i].x_offset;
          hb_position_t y_offset  = glyph_pos[i].y_offset;
          hb_position_t x_advance = glyph_pos[i].x_advance;
          hb_position_t y_advance = glyph_pos[i].y_advance;
          /* draw_glyph(glyphid, cursor_x + x_offset, cursor_y + y_offset); */
          cursor_x += x_advance;
          cursor_y += y_advance;
      }
    </programlisting>
    <orderedlist numeration="arabic">
      <listitem override="7">
        <para>
          Tidy up.
        </para>
      </listitem>
    </orderedlist>
    <programlisting language="C">
      hb_buffer_destroy(buf);
      hb_font_destroy(font);
      hb_face_destroy(face);
      hb_blob_destroy(blob);
    </programlisting>
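    <para>
      Taken together, the steps above could be assembled into a
      single function along the following lines. This is only a
      sketch: the function name
      <function>shape_and_print()</function> is hypothetical, it
      loads the font with
      <function>hb_blob_create_from_file_or_fail()</function> so that
      a missing file can be detected, and it lets
      <function>hb_buffer_guess_segment_properties()</function> pick
      the direction, script, and language instead of setting them by
      hand.
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>
      #include <stdio.h>

      /* Hypothetical helper: shape text with the font at font_path and
       * print each glyph's ID and pen position. Returns 0 on success,
       * or -1 if the font file cannot be read. */
      static int shape_and_print(const char *font_path, const char *text)
      {
        hb_blob_t *blob = hb_blob_create_from_file_or_fail(font_path);
        if (!blob)
          return -1;

        hb_face_t *face = hb_face_create(blob, 0); /* first face in the file */
        hb_font_t *font = hb_font_create(face);

        hb_buffer_t *buf = hb_buffer_create();
        hb_buffer_add_utf8(buf, text, -1, 0, -1);
        hb_buffer_guess_segment_properties(buf);

        hb_shape(font, buf, NULL, 0);

        unsigned int glyph_count;
        hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &glyph_count);
        hb_glyph_position_t *pos =
            hb_buffer_get_glyph_positions(buf, &glyph_count);

        hb_position_t cursor_x = 0;
        hb_position_t cursor_y = 0;
        for (unsigned int i = 0; i < glyph_count; i++) {
          printf("glyph %u at (%d, %d)\n",
                 (unsigned int) info[i].codepoint,
                 (int) (cursor_x + pos[i].x_offset),
                 (int) (cursor_y + pos[i].y_offset));
          cursor_x += pos[i].x_advance;
          cursor_y += pos[i].y_advance;
        }

        hb_buffer_destroy(buf);
        hb_font_destroy(font);
        hb_face_destroy(face);
        hb_blob_destroy(blob);
        return 0;
      }
    ]]></programlisting>
    <para>
      Guessing the segment properties is convenient for a quick test,
      but setting them explicitly, as in step 2, is the more
      predictable choice when the script and language are already
      known.
    </para>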

    <para>
      This example shows enough to get us started using HarfBuzz. In
      the sections that follow, we will use the remainder of
      HarfBuzz's API to refine and extend the example and improve its
      text-shaping capabilities.
    </para>
  </section>
</chapter>