|
|
|
@ -1,115 +1,176 @@ |
|
|
|
|
<chapter id="what-is-harfbuzz"> |
|
|
|
|
<title>What is HarfBuzz?</title> |
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
HarfBuzz is a <emphasis>text shaping engine</emphasis>. If you |
|
|
|
|
give HarfBuzz a font and a string containing a sequence of Unicode |
|
|
|
|
codepoints, HarfBuzz selects and positions the corresponding |
|
|
|
|
glyphs from the font, applying all of the necessary layout rules |
|
|
|
|
and font features. HarfBuzz then returns the string to you in the |
|
|
|
|
form that is correctly arranged for the language and writing |
|
|
|
|
system. |
|
|
|
|
</para> |
|
|
|
|
|
|
|
|
<para> |
|
|
|
|
HarfBuzz can properly shape all of the world's major writing |
|
|
|
|
systems. It runs on virtually all operating systems and software |
|
|
|
|
platforms, and it supports all of the standard font formats in use |
|
|
|
|
today. |
|
|
|
|
</para> |
|
|
|
|
<section id="why-do-i-need-a-shaping-engine"> |
|
|
|
|
<title>Why do I need a shaping engine?</title> |
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
Text shaping is an integral part of preparing text for |
|
|
|
|
display. Before a Unicode sequence can be rendered, the |
|
|
|
|
codepoints in the sequence must be mapped to the glyphs |
|
|
|
|
provided in the font, and the glyphs must be positioned |
|
|
|
|
correctly relative to each other. For many of the scripts |
|
|
|
|
supported in Unicode, these steps involve script-specific layout |
|
|
|
|
rules. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Text shaping is a fairly low-level operation. HarfBuzz is |
|
|
|
|
used directly by graphic rendering libraries such as Pango, as |
|
|
|
|
well as by the layout engines in Firefox, LibreOffice, and |
|
|
|
|
Chromium. Unless you are <emphasis>writing</emphasis> one of |
|
|
|
|
these layout engines yourself, you will probably not need to use |
|
|
|
|
HarfBuzz: normally, higher-level libraries will turn text into
|
|
|
|
glyphs for you. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
However, if you <emphasis>are</emphasis> writing a layout engine |
|
|
|
|
or graphics library yourself, you will need to perform text |
|
|
|
|
|
|
|
|
shaping, and this is where HarfBuzz can help you. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Here are some specific scenarios where a text-shaping engine |
|
|
|
|
like HarfBuzz helps you: |
|
|
|
|
</para> |
|
|
|
|
<itemizedlist> |
|
|
|
|
<listitem> |
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
OpenType fonts contain a set of glyphs (that is, shapes |
|
|
|
|
to represent the letters, numbers, punctuation marks, and |
|
|
|
|
all other symbols), which are indexed by a <literal>glyph ID</literal>. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
The glyph ID within the font does not necessarily correlate |
|
|
|
|
to a predictable Unicode codepoint. For instance, some fonts |
|
|
|
|
have the letter "a" as glyph ID 1, but many others do |
|
|
|
|
not. To pull the right glyph out of the font in order to |
|
|
|
|
display "a", you need to consult the table inside |
|
|
|
|
the font (the <literal>cmap</literal> table) that maps Unicode |
|
|
|
|
codepoints to glyph IDs. In other words, <emphasis>text shaping turns |
|
|
|
|
codepoints into glyph IDs</emphasis>. |
|
|
|
|
</para> |
|
|
|
|
</listitem> |
|
|
|
|
<listitem> |
|
|
|
|
<para> |
|
|
|
|
Many OpenType fonts contain ligatures: combinations of |
|
|
|
|
|
|
|
|
characters that are rendered as a single unit. For instance, |
|
|
|
|
it is common for the <literal>fi</literal> letter |
|
|
|
|
combination to appear in print as the single ligature glyph |
|
|
|
|
"fi". |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Whether you should render an "f, i" sequence |
|
|
|
|
as <literal>fi</literal> or as "fi" does not |
|
|
|
|
depend on the input text. Rather, it depends on whether
|
|
|
|
or not the font includes an "fi" glyph and on the |
|
|
|
|
level of ligature application you wish to perform. The font |
|
|
|
|
and the amount of ligature application used are under your |
|
|
|
|
control. In other words, <emphasis>text shaping involves |
|
|
|
|
querying the font's ligature tables and determining what |
|
|
|
|
substitutions should be made</emphasis>. |
|
|
|
|
</para> |
|
|
|
|
</listitem> |
|
|
|
|
<listitem> |
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
While ligatures like "fi" are optional typographic |
|
|
|
|
refinements, some languages <emphasis>require</emphasis> certain |
|
|
|
|
substitutions to be made in order to display text correctly. |
|
|
|
|
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
For example, in Tamil, when the letter "TTA" (ட)
is followed by the vowel sign "U" (ு), the pair
must be replaced by the single glyph "டு". The
sequence of Unicode characters "ட,ு" needs to be
substituted with a single "டு" glyph from the
font.
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
But "டு" does not have a Unicode codepoint. To |
|
|
|
|
find this glyph, you need to consult the table inside |
|
|
|
|
the font (the <literal>GSUB</literal> table) that contains |
|
|
|
|
substitution information. In other words, <emphasis>text shaping |
|
|
|
|
chooses the correct glyph for the sequence of characters
|
|
|
|
provided</emphasis>. |
|
|
|
|
</para> |
|
|
|
|
</listitem> |
|
|
|
|
<listitem> |
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
Similarly, each Arabic character has four different variants |
|
|
|
|
corresponding to the different positions it might appear in
|
|
|
|
within a sequence. Inside a font, there will be separate |
|
|
|
|
glyphs for the initial, medial, final, and isolated forms of |
|
|
|
|
each letter, each at a different glyph ID. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Unicode only assigns one codepoint per character, so a |
|
|
|
|
Unicode string will not tell you which glyph variant to use |
|
|
|
|
for each character. To decide, you need to analyze the whole |
|
|
|
|
string and determine the appropriate glyph for each character |
|
|
|
|
based on its position. In other words, <emphasis>text |
|
|
|
|
shaping chooses the correct form of the letter by its |
|
|
|
|
position and returns the correct glyph from the font</emphasis>. |
|
|
|
|
</para> |
|
|
|
|
</listitem> |
|
|
|
|
<listitem> |
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
Other languages involve marks and accents that need to be |
|
|
|
|
rendered in specific positions relative to a base character. For
|
|
|
|
instance, the Moldovan language includes the Cyrillic letter |
|
|
|
|
"zhe" (ж) with a breve accent, like so: "ӂ". |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
Some fonts will provide this character as a single |
|
|
|
|
zhe-with-breve glyph, but other fonts will not and, instead, |
|
|
|
|
will expect the rendering engine to form the character by |
|
|
|
|
superimposing the separate "ж" and "˘" |
|
|
|
|
glyphs. |
|
|
|
|
</para> |
|
|
|
|
<para> |
|
|
|
|
But exactly where you should draw the breve depends on the |
|
|
|
|
height and width of the preceding zhe glyph. To find the |
|
|
|
|
right position, you need to consult the table inside |
|
|
|
|
the font (the <literal>GPOS</literal> table) that contains |
|
|
|
|
positioning information. |
|
|
|
|
In other words, <emphasis>text shaping tells you whether you have a |
|
|
|
|
precomposed glyph within your font or if you need to compose a |
|
|
|
|
glyph yourself out of combining marks and, if so, where to
position those marks.</emphasis>
|
|
|
|
</para> |
|
|
|
|
</listitem> |
|
|
|
|
</itemizedlist> |
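<para>
The first two scenarios above (codepoint-to-glyph mapping and
ligature substitution) can be sketched in a few lines of Python.
This is a toy illustration, not how HarfBuzz is implemented: the
tables <literal>TOY_CMAP</literal> and <literal>TOY_LIGATURES</literal>,
the glyph IDs in them, and the function names are all invented for
the example and are not read from any real font.
</para>

```python
# Toy illustration only: these tables are invented and are NOT read
# from a real font, and this is not HarfBuzz's own algorithm.

NOTDEF = 0  # glyph ID 0 is conventionally the "missing glyph" slot

# Hypothetical cmap-like table: Unicode codepoint -> glyph ID.
TOY_CMAP = {
    ord("a"): 1,
    ord("f"): 12,
    ord("i"): 15,
    ord("n"): 20,
}

# Hypothetical GSUB-like ligature rules: pair of glyph IDs -> one glyph ID.
TOY_LIGATURES = {
    (12, 15): 90,  # "f" followed by "i" becomes the "fi" ligature glyph
}


def map_codepoints(text, cmap):
    """Step 1: turn each codepoint into a glyph ID via the cmap-like table."""
    return [cmap.get(ord(ch), NOTDEF) for ch in text]


def apply_ligatures(glyph_ids, rules):
    """Step 2: replace each matching pair of glyph IDs with its ligature."""
    shaped = []
    index = 0
    while index != len(glyph_ids):
        pair = tuple(glyph_ids[index:index + 2])
        if pair in rules:
            shaped.append(rules[pair])
            index += 2  # the ligature consumed two input glyphs
        else:
            shaped.append(glyph_ids[index])
            index += 1
    return shaped


def toy_shape(text, use_ligatures=True):
    """Map codepoints to glyph IDs, then optionally apply ligatures."""
    glyph_ids = map_codepoints(text, TOY_CMAP)
    if use_ligatures:
        glyph_ids = apply_ligatures(glyph_ids, TOY_LIGATURES)
    return glyph_ids
```

<para>
With these made-up tables, <literal>toy_shape("fin")</literal> yields
two glyph IDs (the ligature and "n"), while
<literal>toy_shape("fin", use_ligatures=False)</literal> yields three.
The input text is the same in both cases; only the font data and the
requested level of ligature application differ.
</para>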
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
If tasks like these are something that you need to do, then you need a text |
|
|
|
|
shaping engine. You could use Uniscribe if you are writing |
|
|
|
|
Windows software; you could use CoreText on macOS; or you could |
|
|
|
|
use HarfBuzz. |
|
|
|
|
</para> |
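<para>
To give a sense of the work a shaping engine takes off your hands,
the Arabic scenario above can be sketched as follows. This is a
deliberately simplified sketch under stated assumptions: the
<literal>JOINING_TYPE</literal> table covers only five letters, the
function name <literal>positional_forms</literal> is hypothetical, and
transparent marks and join-causing characters are ignored. The
returned strings follow the OpenType feature tags
<literal>isol</literal>, <literal>init</literal>,
<literal>medi</literal>, and <literal>fina</literal>.
</para>

```python
# Simplified sketch, not HarfBuzz's implementation: a real shaper also
# handles transparent marks, join-causing characters, and every
# letter of the script. Text is processed in logical (typing) order.

# Tiny subset of joining types:
#   "D" joins to the letters on both sides,
#   "R" joins only to the letter that precedes it.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u0646": "D",  # ARABIC LETTER NOON
}


def positional_forms(text):
    """Return one form tag per character, based on its neighbours."""
    forms = []
    for index, ch in enumerate(text):
        own = JOINING_TYPE.get(ch)
        before = JOINING_TYPE.get(text[index - 1]) if index != 0 else None
        after = (
            JOINING_TYPE.get(text[index + 1])
            if index + 1 != len(text)
            else None
        )
        # A letter connects backwards only if the previous letter is
        # able to connect forwards, which requires type "D".
        joins_before = own in ("D", "R") and before == "D"
        # Only type "D" letters can connect to the following letter.
        joins_after = own == "D" and after in ("D", "R")
        if joins_before and joins_after:
            forms.append("medi")
        elif joins_before:
            forms.append("fina")
        elif joins_after:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

<para>
For the three-letter sequence LAM, BEH, NOON
(<literal>"\u0644\u0628\u0646"</literal>), this sketch selects the
initial, medial, and final forms in that order. The codepoints
themselves carry no form information; the choice comes entirely from
analyzing each letter's neighbours.
</para>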
|
|
|
|
<para> |
|
|
|
|
In the rest of this manual, we are going to assume that you are the |
|
|
|
|
implementor of a text-layout engine. |
|
|
|
|
</para> |
|
|
|
|
</section> |
|
|
|
|
<section id="why-is-it-called-harfbuzz"> |
|
|
|
|
<title>Why is it called HarfBuzz?</title> |
|
|
|
|
<para> |
|
|
|
|
|
|
|
|
HarfBuzz began its life as text-shaping code within the FreeType |
|
|
|
|
project (and you will see references to the FreeType authors |
|
|
|
|
within the source code copyright declarations), but was then |
|
|
|
|
extracted into its own project. This project is maintained by
Behdad Esfahbod and is named HarfBuzz. Originally, it was a shaping
engine for OpenType fonts: "HarfBuzz" is the Persian
for "open type".
|
|
|
|
</para> |
|
|
|
|
</section> |
|
|
|
|
</chapter> |
|
|
|
|
|
|
|
|