<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="buffers-language-script-and-direction">
  <title>Buffers, language, script and direction</title>
  <para>
    The input to the HarfBuzz shaper is a series of Unicode characters, stored in a
    buffer. In this chapter, we'll look at how to set up a buffer with
    the text that we want and how to customize the properties of the
    buffer. We'll also look at a piece of lower-level machinery that
    you will need to understand before proceeding: the functions that
    HarfBuzz uses to retrieve Unicode information.
  </para>
  <para>
    After shaping is complete, HarfBuzz puts its output back
    into the buffer. But getting that output requires setting up a
    face and a font first, so we will look at that in the next chapter
    instead of here.
  </para>
  <section id="creating-and-destroying-buffers">
    <title>Creating and destroying buffers</title>
    <para>
      As we saw in our <emphasis>Getting Started</emphasis> example, a
      buffer is created and 
      initialized with <function>hb_buffer_create()</function>. This
      produces a new, empty buffer object, instantiated with some
      default values and ready to accept your Unicode strings.
    </para>
    <para>
      HarfBuzz manages the memory of objects (such as buffers) that it
      creates, so you don't have to. When you have finished working on 
      a buffer, you can call <function>hb_buffer_destroy()</function>:
    </para>
    <programlisting language="C">
      hb_buffer_t *buf = hb_buffer_create();
      ...
      hb_buffer_destroy(buf);
    </programlisting>
    <para>
      This will destroy the object and free its associated memory,
      unless some other part of the program holds a reference to this
      buffer. If you acquire a HarfBuzz buffer from another subsystem
      and want to ensure that it is not freed when someone else
      destroys it, you should increase its reference count:
    </para>
    <programlisting language="C">
      void somefunc(hb_buffer_t *buf) {
        buf = hb_buffer_reference(buf);
        ...
    </programlisting>
    <para>
      And then decrease it once you're done with it:
    </para>
    <programlisting language="C">
        hb_buffer_destroy(buf);
      }
    </programlisting>
    <para>
      While we are on the subject of reference-counting buffers, it is
      worth noting that an individual buffer can only meaningfully be
      used by one thread at a time.
    </para>
    <para>
      To throw away all the data in your buffer and start from scratch,
      call <function>hb_buffer_reset(buf)</function>. If you want to
      throw away the string in the buffer but keep the options, you can
      instead call <function>hb_buffer_clear_contents(buf)</function>.
    </para>
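    <para>
      As a minimal sketch of the difference between the two calls, the
      hypothetical helper <function>make_cleared_buffer()</function>
      below (it is not part of HarfBuzz) sets a non-default
      replacement code point, adds some text, and then clears only the
      contents. It assumes, following the API reference, that the
      replacement code point is among the settings that
      <function>hb_buffer_clear_contents()</function> leaves alone:
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper, for illustration only. */
      static hb_buffer_t *
      make_cleared_buffer (const char *text)
      {
        hb_buffer_t *buf = hb_buffer_create ();

        /* A buffer-level setting that should survive clear_contents. */
        hb_buffer_set_replacement_codepoint (buf, '?');

        /* -1 twice: the text is NUL-terminated, and we add all of it. */
        hb_buffer_add_utf8 (buf, text, -1, 0, -1);

        /* Drop the text; the buffer object itself stays usable. */
        hb_buffer_clear_contents (buf);
        return buf;
      }
    ]]></programlisting>
    <para>
      On the returned buffer, <function>hb_buffer_get_length()</function>
      should report zero items while
      <function>hb_buffer_get_replacement_codepoint()</function> still
      reports <literal>'?'</literal>. Calling
      <function>hb_buffer_reset()</function> afterwards should restore
      the default, <literal>U+FFFD</literal>. The caller remains
      responsible for calling <function>hb_buffer_destroy()</function>.
    </para>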
  </section>
  
  <section id="adding-text-to-the-buffer">
    <title>Adding text to the buffer</title>
    <para>
      Now we have a brand new HarfBuzz buffer. Let's start filling it
      with text! From HarfBuzz's perspective, a buffer is just a stream
      of Unicode code points, but your input string is probably in one of
      the standard Unicode character encodings (UTF-8, UTF-16, or
      UTF-32). HarfBuzz provides convenience functions that accept
      each of these encodings:
      <function>hb_buffer_add_utf8()</function>,
      <function>hb_buffer_add_utf16()</function>, and
      <function>hb_buffer_add_utf32()</function>. Other than the
      character encoding they accept, they function identically.
    </para>
    <para>
      You can add UTF-8 text to a buffer by passing in the text array,
      the array's length, an offset into the array for the first
      character to add, and the length of the segment to add:
    </para>
    <programlisting language="C">
      void
      hb_buffer_add_utf8 (hb_buffer_t  *buf,
                          const char   *text,
                          int           text_length,
                          unsigned int  item_offset,
                          int           item_length);
    </programlisting>
    <para>
      So, in practice, you can say:
    </para>
    <programlisting language="C">
      hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
    </programlisting>
    <para>
      This will append your new characters to
      <parameter>buf</parameter>, not replace its existing
      contents. Also, note that you can use <literal>-1</literal> in
      place of the first instance of <function>strlen(text)</function>
      if your text array is NUL-terminated. Similarly, you can also use
      <literal>-1</literal> as the final argument if you want to add
      its full contents.
    </para>
    <para>
      Whatever <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> you provide, HarfBuzz will also
      attempt to grab the five characters <emphasis>before</emphasis>
      the offset point and the five characters
      <emphasis>after</emphasis> the designated end. These are the
      before and after "context" segments, which are used internally
      for HarfBuzz to make shaping decisions. They will not be part of
      the final output, but they ensure that HarfBuzz's
      script-specific shaping operations are correct. If there are
      fewer than five characters available for the before or after
      contexts, HarfBuzz will just grab what is there.
    </para>
    <para>
      For longer text runs, such as full paragraphs, it might be
      tempting to only add smaller sub-segments to a buffer and
      shape them in piecemeal fashion. Generally, this is not a good
      idea, however, because a lot of shaping decisions are
      dependent on this context information. For example, in Arabic
      and other connected scripts, HarfBuzz needs to know the code
      points before and after each character in order to correctly
      determine which glyph to return.
    </para>
    <para>
      The safest approach is to add all of the text available, then
      use <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> to indicate which characters you
      want shaped, so that HarfBuzz has access to any context.
    </para>
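    <para>
      One possible way to do this is sketched below. The helper name
      <function>add_segment_with_context()</function> is made up for
      this example. Note that the offset and length are counted in
      code units of the input encoding, which means bytes for UTF-8:
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: hands HarfBuzz the whole paragraph but
         selects only one segment of it for shaping. Returns the number
         of items now in the buffer. */
      static unsigned int
      add_segment_with_context (hb_buffer_t  *buf,
                                const char   *paragraph,
                                unsigned int  item_offset,
                                int           item_length)
      {
        /* text_length is -1 because the paragraph is NUL-terminated. */
        hb_buffer_add_utf8 (buf, paragraph, -1, item_offset, item_length);

        /* Only the selected items count toward the length; the
           surrounding context is stored separately. */
        return hb_buffer_get_length (buf);
      }
    ]]></programlisting>
    <para>
      For example, calling this helper on an empty buffer with the
      text <literal>"hello world"</literal>, an offset of
      <literal>6</literal>, and a length of <literal>5</literal>
      selects the word "world", while the preceding characters remain
      available to HarfBuzz as context.
    </para>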
    <para>
      You can also add Unicode code points directly with
      <function>hb_buffer_add_codepoints()</function>. The arguments
      to this function are the same as those for the UTF
      encodings. But it is particularly important to note that
      HarfBuzz does not do validity checking on text that is added
      this way. The UTF functions replace invalid input with the
      replacement code point, but with
      <function>hb_buffer_add_codepoints()</function> it is up
      to you to do any sanity checking necessary.
    </para>
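    <para>
      A short sketch of adding code points directly follows. The
      helper <function>add_salam()</function> and its hard-coded
      array are invented for illustration; the array holds the four
      code points of the Arabic word "salam":
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: appends four Arabic letters as raw code
         points and returns the resulting buffer length. */
      static unsigned int
      add_salam (hb_buffer_t *buf)
      {
        /* SEEN, LAM, ALEF, MEEM */
        static const hb_codepoint_t salam[] = { 0x0633, 0x0644, 0x0627, 0x0645 };
        const int count = (int) (sizeof salam / sizeof salam[0]);

        /* Here the lengths are counted in code points, not bytes. */
        hb_buffer_add_codepoints (buf, salam, count, 0, count);
        return hb_buffer_get_length (buf);
      }
    ]]></programlisting>
    <para>
      Because no validation is performed on this path, a real program
      would need to make sure that every value in the array is a
      valid Unicode scalar value before adding it.
    </para>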
    
  </section>
  
  <section id="setting-buffer-properties">
    <title>Setting buffer properties</title>
    <para>
      Buffers containing input characters still need several
      properties set before HarfBuzz can shape their text correctly.
    </para>
    <para>
      Initially, all buffers are set to the
      <literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content
      type. After adding text, the buffer should be set to
      <literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which
      indicates that it contains un-shaped input
      characters. After shaping, the buffer will have the
      <literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.
    </para>
    <para>
      <function>hb_buffer_add_utf8()</function> and the
      other UTF functions set the content type of their buffer
      automatically. But if you are reusing a buffer, you may want to
      check its state with
      <function>hb_buffer_get_content_type(buf)</function>. If
      necessary, you can set the content type with
    </para>
    <programlisting language="C">
      hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
    </programlisting>
    <para>
      to prepare for shaping.
    </para>
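    <para>
      As a small sketch of such a check, the hypothetical helper
      <function>buffer_holds_unicode()</function> below (the name is
      an assumption of this example) reports whether a buffer
      currently contains un-shaped input:
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: returns 1 if the buffer holds un-shaped
         Unicode input, and 0 if it is empty or already shaped. */
      static int
      buffer_holds_unicode (hb_buffer_t *buf)
      {
        return hb_buffer_get_content_type (buf)
               == HB_BUFFER_CONTENT_TYPE_UNICODE;
      }
    ]]></programlisting>
    <para>
      A freshly created buffer should fail this check, a buffer that
      has just received text from
      <function>hb_buffer_add_utf8()</function> should pass it, and a
      buffer whose contents have been cleared should fail it again.
    </para>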
    <para>
      Buffers also need to carry information about the script,
      language, and text direction of their contents. You can set
      these properties individually:
    </para>
    <programlisting language="C">
      hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
      hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
      hb_buffer_set_language(buf, hb_language_from_string("en", -1));
    </programlisting>
    <para>
      However, since these properties are often repeated for
      multiple text runs, you can also save them in a
      <literal>hb_segment_properties_t</literal> for reuse:
    </para>
    <programlisting language="C">
      hb_segment_properties_t savedprops;
      hb_buffer_get_segment_properties (buf, &amp;savedprops);
      ...
      hb_buffer_set_segment_properties (buf2, &amp;savedprops);
    </programlisting>
    <para>
      HarfBuzz also provides getter functions to retrieve a buffer's
      direction, script, and language properties individually.
    </para>
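    <para>
      The individual getters can be used, for example, to compare two
      buffers. The helper below is a sketch with an invented name,
      <function>same_segment_properties()</function>. It relies on
      <type>hb_language_t</type> values being comparable with
      <literal>==</literal>:
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: returns 1 if both buffers carry the same
         direction, script, and language. */
      static int
      same_segment_properties (hb_buffer_t *a, hb_buffer_t *b)
      {
        return hb_buffer_get_direction (a) == hb_buffer_get_direction (b)
            && hb_buffer_get_script (a) == hb_buffer_get_script (b)
            && hb_buffer_get_language (a) == hb_buffer_get_language (b);
      }
    ]]></programlisting>
    <para>
      HarfBuzz also offers
      <function>hb_segment_properties_equal()</function> for comparing
      two <type>hb_segment_properties_t</type> structures directly,
      which may be more convenient when the properties have already
      been saved.
    </para>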
    <para>
      HarfBuzz recognizes four text directions in
      <type>hb_direction_t</type>: left-to-right
      (<literal>HB_DIRECTION_LTR</literal>), right-to-left (<literal>HB_DIRECTION_RTL</literal>),
      top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and
      bottom-to-top (<literal>HB_DIRECTION_BTT</literal>).  For the
      script property, HarfBuzz uses identifiers based on the
      <ulink
      url="https://unicode.org/iso15924/">ISO 15924
      standard</ulink>. For languages, HarfBuzz uses tags based on the
      <ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.
    </para>
    <para>
      Helper functions are provided to convert character strings into
      the necessary script and language tag types.
    </para>
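    <para>
      The following sketch shows how those helpers might be combined
      when the properties arrive as strings, for instance from a
      configuration file. The wrapper
      <function>set_properties_from_strings()</function> is
      hypothetical; the three conversion functions it calls are part
      of HarfBuzz:
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical wrapper: converts three strings and applies the
         results to the buffer. A length of -1 means NUL-terminated. */
      static void
      set_properties_from_strings (hb_buffer_t *buf,
                                   const char  *direction,
                                   const char  *script,
                                   const char  *language)
      {
        /* e.g. "ltr" or "rtl" */
        hb_buffer_set_direction (buf, hb_direction_from_string (direction, -1));
        /* a four-letter ISO 15924 code, e.g. "Arab" */
        hb_buffer_set_script (buf, hb_script_from_string (script, -1));
        /* a BCP 47 tag, e.g. "ar" */
        hb_buffer_set_language (buf, hb_language_from_string (language, -1));
      }
    ]]></programlisting>
    <para>
      Calling the wrapper with <literal>"rtl"</literal>,
      <literal>"Arab"</literal>, and <literal>"ar"</literal> should
      leave the buffer with <literal>HB_DIRECTION_RTL</literal> and
      <literal>HB_SCRIPT_ARABIC</literal> set.
    </para>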
    <para>
      Two additional buffer properties to be aware of are the
      "invisible glyph" and the replacement code point. The
      replacement code point is inserted into buffer output in place of
      any invalid code points encountered in the input. By default, it
      is the Unicode <literal>REPLACEMENT CHARACTER</literal> code
      point, <literal>U+FFFD</literal> "&#xFFFD;". You can change this with
    </para>
    <programlisting language="C">
      hb_buffer_set_replacement_codepoint(buf, replacement);
    </programlisting>
    <para>
      passing in the replacement Unicode code point as the
      <parameter>replacement</parameter> parameter.
    </para>
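    <para>
      To see the replacement code point at work, the sketch below
      feeds a buffer some UTF-8 text and returns the first code point
      that ends up in it. The helper
      <function>first_codepoint_with_replacement()</function> is
      invented for this example:
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: adds UTF-8 text to a temporary buffer that
         uses the given replacement code point, and returns the first
         code point stored (or 0 if the buffer stays empty). */
      static hb_codepoint_t
      first_codepoint_with_replacement (const char     *text,
                                        hb_codepoint_t  replacement)
      {
        hb_buffer_t *buf = hb_buffer_create ();
        unsigned int count = 0;
        hb_glyph_info_t *infos;
        hb_codepoint_t first;

        hb_buffer_set_replacement_codepoint (buf, replacement);
        hb_buffer_add_utf8 (buf, text, -1, 0, -1);

        /* Before shaping, each item's codepoint field holds the
           Unicode code point that was added. */
        infos = hb_buffer_get_glyph_infos (buf, &count);
        first = count > 0 ? infos[0].codepoint : 0;

        hb_buffer_destroy (buf);
        return first;
      }
    ]]></programlisting>
    <para>
      Passing the single invalid byte <literal>"\xFF"</literal> with a
      replacement of <literal>'?'</literal> should yield
      <literal>'?'</literal>, whereas valid input is stored unchanged.
    </para>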
    <para>
      The invisible glyph is used to replace all output glyphs that
      are invisible. By default, the standard space character
      <literal>U+0020</literal> is used; you can replace this (for
      example, when using a font that provides script-specific
      spaces) with 
    </para>
    <programlisting language="C">
      hb_buffer_set_invisible_glyph(buf, replacement_glyph);
    </programlisting>
    <para>
      Do note that in the <parameter>replacement_glyph</parameter>
      parameter, you must provide the glyph ID of the replacement you
      wish to use, not the Unicode code point.
    </para>
    <para>
      HarfBuzz supports a few additional flags you might want to set
      on your buffer under certain circumstances. The
      <literal>HB_BUFFER_FLAG_BOT</literal> and
      <literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz
      that the buffer represents the beginning or end (respectively)
      of a text element (such as a paragraph or other block). Knowing
      this allows HarfBuzz to apply certain contextual font features
      when shaping, such as initial or final variants in connected
      scripts.
    </para>
    <para>
      <literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>
      tells HarfBuzz not to hide glyphs with the
      <literal>Default_Ignorable</literal> property in Unicode. This 
      property designates control characters and other non-printing
      code points, such as joiners and variation selectors. Normally
      HarfBuzz replaces them in the output buffer with zero-width
      space glyphs (using the "invisible glyph" property discussed
      above); setting this flag causes them to be printed, which can
      be helpful for troubleshooting.
    </para>
    <para>
      Conversely, setting the
      <literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag
      tells HarfBuzz to remove <literal>Default_Ignorable</literal>
      glyphs from the output buffer entirely. Finally, setting the
      <literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>
      flag tells HarfBuzz not to insert the dotted-circle glyph
      (<literal>U+25CC</literal>, "&#x25CC;"), which is normally
      inserted into buffer output when broken character sequences are
      encountered (such as combining marks that are not attached to a
      base character).
    </para>
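    <para>
      These flags are set with
      <function>hb_buffer_set_flags()</function>, which replaces the
      buffer's whole flag set rather than adding to it. The sketch
      below wraps it in a hypothetical helper,
      <function>add_buffer_flags()</function>, that preserves any
      flags already set:
    </para>
    <programlisting language="C"><![CDATA[
      #include <hb.h>

      /* Hypothetical helper: turns on extra flags while keeping the
         ones that the buffer already has. */
      static void
      add_buffer_flags (hb_buffer_t *buf, hb_buffer_flags_t extra)
      {
        hb_buffer_flags_t current = hb_buffer_get_flags (buf);

        /* The cast keeps the bitwise OR acceptable to C++ compilers. */
        hb_buffer_set_flags (buf, (hb_buffer_flags_t) (current | extra));
      }
    ]]></programlisting>
    <para>
      A buffer holding a complete paragraph might, for instance, be
      given <literal>HB_BUFFER_FLAG_BOT | HB_BUFFER_FLAG_EOT</literal>
      in one call and
      <literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> in a
      later one.
    </para>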
  </section>
  
  <section id="customizing-unicode-functions">
    <title>Customizing Unicode functions</title>
    <para>
      HarfBuzz requires some simple functions for accessing
      information from the Unicode Character Database (such as the
      <literal>General_Category</literal> (gc) and
      <literal>Script</literal> (sc) properties) that is useful
      for shaping, as well as some useful operations like composing and
      decomposing code points.
    </para>
    <para>
      HarfBuzz includes its own internal, lightweight set of Unicode
      functions. At build time, it is also possible to compile support
      for some other options, such as the Unicode functions provided
      by GLib or the International Components for Unicode (ICU)
      library. Generally, this option is only of interest for client
      programs that have specific integration requirements or that do
      a significant amount of customization.
    </para>
    <para>
      If your program has access to other Unicode functions, however,
      such as through a system library or application framework, you
      might prefer to use those instead of the built-in
      options. HarfBuzz supports this by implementing its Unicode
      functions as a set of virtual methods that you can replace —
      without otherwise affecting HarfBuzz's functionality.
    </para>
    <para>
      The Unicode functions are specified in a structure called
      <literal>unicode_funcs</literal> which is attached to each
      buffer. But even though <literal>unicode_funcs</literal> is
      associated with a <type>hb_buffer_t</type>, the functions
      themselves are called by other HarfBuzz APIs that access
      buffers, so it would be unwise for you to hook different
      functions into different buffers.
    </para>
    <para>
      In addition, you can mark your <literal>unicode_funcs</literal>
      as immutable by calling
      <function>hb_unicode_funcs_make_immutable (ufuncs)</function>.
      This is especially useful if your code is a
      library or framework that will have its own client programs. By
      marking your Unicode function choices as immutable, you prevent
      your own client programs from changing the
      <literal>unicode_funcs</literal> configuration and introducing
      inconsistencies and errors downstream.
    </para>
    <para>
      You can retrieve the Unicode-functions configuration for
      your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:
    </para>
    <programlisting language="C">
      hb_unicode_funcs_t *ufunctions;
      ufunctions = hb_buffer_get_unicode_funcs(buf);
    </programlisting>
    <para>
      The current version of <literal>unicode_funcs</literal> uses six functions:
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  <function>hb_unicode_combining_class_func_t</function>:
	  returns the Canonical Combining Class of a code point.
      	</para>
      </listitem>
      <listitem>
	<para>
	  <function>hb_unicode_general_category_func_t</function>:
	  returns the General Category (gc) of a code point.
      	</para>
      </listitem>
      <listitem>
	<para>
	  <function>hb_unicode_mirroring_func_t</function>: returns
	  the Mirroring Glyph code point (for bi-directional
	  replacement) of a code point.
      	</para>
      </listitem>
      <listitem>
	<para>
	  <function>hb_unicode_script_func_t</function>: returns the
	  Script (sc) property of a code point.
      	</para>
      </listitem>
      <listitem>
	<para>
	  <function>hb_unicode_compose_func_t</function>: returns the
	  canonical composition of a sequence of two code points.
	</para>
      </listitem>
      <listitem>
	<para>
	  <function>hb_unicode_decompose_func_t</function>: returns
	  the canonical decomposition of a code point.
	</para>
      </listitem>
    </itemizedlist>
    <para>
      Note, however, that future HarfBuzz releases may alter this set.
    </para>
    <para>
      Each Unicode function has a corresponding setter, with which you
      can assign a callback to your replacement function. For example,
      to replace
      <function>hb_unicode_general_category_func_t</function>, you can call
    </para>
    <programlisting language="C">
      hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);
    </programlisting>
    <para>
      Virtualizing this set of Unicode functions is primarily intended
      to improve portability. There is no need for every client
      program to make the effort to replace the default options, so if
      you are unsure, do not feel any pressure to customize
      <literal>unicode_funcs</literal>. 
    </para>
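    <para>
      For readers who do want to customize, the following is a
      minimal sketch rather than a recommended setup. The callback
      <function>my_general_category()</function> and the helper
      <function>install_custom_unicode_funcs()</function> are
      invented for this example, as is the choice to treat
      <literal>U+E000</literal> as a letter. The sketch derives a new
      set of functions from the defaults, overrides one of them, marks
      the result immutable, and attaches it to a buffer:
    </para>
    <programlisting language="C"><![CDATA[
      #include <stddef.h>
      #include <hb.h>

      /* Hypothetical callback: reports U+E000 (a Private Use code
         point) as a letter and defers to the parent functions for
         every other code point. */
      static hb_unicode_general_category_t
      my_general_category (hb_unicode_funcs_t *ufuncs,
                           hb_codepoint_t      unicode,
                           void               *user_data)
      {
        (void) user_data;
        if (unicode == 0xE000)
          return HB_UNICODE_GENERAL_CATEGORY_OTHER_LETTER;
        return hb_unicode_general_category (hb_unicode_funcs_get_parent (ufuncs),
                                            unicode);
      }

      /* Hypothetical helper: builds the customized functions and
         attaches them to the given buffer. */
      static void
      install_custom_unicode_funcs (hb_buffer_t *buf)
      {
        /* Start from the defaults so that the functions we do not
           override are inherited from the parent. */
        hb_unicode_funcs_t *ufuncs =
          hb_unicode_funcs_create (hb_unicode_funcs_get_default ());

        /* No user data and no destroy callback are needed here. */
        hb_unicode_funcs_set_general_category_func (ufuncs, my_general_category,
                                                    NULL, NULL);
        hb_unicode_funcs_make_immutable (ufuncs);

        /* The buffer takes its own reference, so ours can be released. */
        hb_buffer_set_unicode_funcs (buf, ufuncs);
        hb_unicode_funcs_destroy (ufuncs);
      }
    ]]></programlisting>
    <para>
      In keeping with the advice earlier in this section, a real
      program would normally attach the same functions to every
      buffer it creates instead of customizing buffers one by one.
    </para>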
  </section>
  
</chapter>
Usermanual: minor wording updates, build fixes. 6 years ago			`<?xml version="1.0"?>`
			`<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"`
			`"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [`
			`<!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">`
			`<!ENTITY version SYSTEM "version.xml">`
			`]>`
Correct tag hierarchy, to allow for table-of-contents entries. 9 years ago			`<chapter id="buffers-language-script-and-direction">`
Current state and skeleton outline 9 years ago			`<title>Buffers, language, script and direction</title>`
			`<para>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`The input to the HarfBuzz shaper is a series of Unicode characters, stored in a`
Current state and skeleton outline 9 years ago			`buffer. In this chapter, we'll look at how to set up a buffer with`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`the text that we want and how to customize the properties of the`
			`buffer. We'll also look at a piece of lower-level machinery that`
			`you will need to understand before proceeding: the functions that`
			`HarfBuzz uses to retrieve Unicode information.`
			`</para>`
			`<para>`
			`After shaping is complete, HarfBuzz puts its output back`
			`into the buffer. But getting that output requires setting up a`
			`face and a font first, so we will look at that in the next chapter`
			`instead of here.`
Current state and skeleton outline 9 years ago			`</para>`
Correct tag hierarchy, to allow for table-of-contents entries. 9 years ago			`<section id="creating-and-destroying-buffers">`
Current state and skeleton outline 9 years ago			`<title>Creating and destroying buffers</title>`
			`<para>`
Usermanual: small updates. 6 years ago			`As we saw in our <emphasis>Getting Started</emphasis> example, a`
			`buffer is created and`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`initialized with <function>hb_buffer_create()</function>. This`
Current state and skeleton outline 9 years ago			`produces a new, empty buffer object, instantiated with some`
			`default values and ready to accept your Unicode strings.`
			`</para>`
			`<para>`
Usermanual: small updates. 6 years ago			`HarfBuzz manages the memory of objects (such as buffers) that it`
			`creates, so you don't have to. When you have finished working on`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`a buffer, you can call <function>hb_buffer_destroy()</function>:`
Current state and skeleton outline 9 years ago			`</para>`
			`<programlisting language="C">`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`hb_buffer_t *buf = hb_buffer_create();`
			`...`
			`hb_buffer_destroy(buf);`
			`</programlisting>`
Current state and skeleton outline 9 years ago			`<para>`
			`This will destroy the object and free its associated memory -`
			`unless some other part of the program holds a reference to this`
[docs] s/Harfbuzz/HarfBuzz/g 7 years ago			`buffer. If you acquire a HarfBuzz buffer from another subsystem`
Current state and skeleton outline 9 years ago			`and want to ensure that it is not garbage collected by someone`
			`else destroying it, you should increase its reference count:`
			`</para>`
			`<programlisting language="C">`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`void somefunc(hb_buffer_t *buf) {`
			`buf = hb_buffer_reference(buf);`
			`...`
			`</programlisting>`
Current state and skeleton outline 9 years ago			`<para>`
			`And then decrease it once you're done with it:`
			`</para>`
			`<programlisting language="C">`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`hb_buffer_destroy(buf);`
			`}`
			`</programlisting>`
			`<para>`
			`While we are on the subject of reference-counting buffers, it is`
			`worth noting that an individual buffer can only meaningfully be`
			`used by one thread at a time.`
			`</para>`
Current state and skeleton outline 9 years ago			`<para>`
			`To throw away all the data in your buffer and start from scratch,`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`call <function>hb_buffer_reset(buf)</function>. If you want to`
Current state and skeleton outline 9 years ago			`throw away the string in the buffer but keep the options, you can`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`instead call <function>hb_buffer_clear_contents(buf)</function>.`
Current state and skeleton outline 9 years ago			`</para>`
Correct tag hierarchy, to allow for table-of-contents entries. 9 years ago			`</section>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago
Correct tag hierarchy, to allow for table-of-contents entries. 9 years ago			`<section id="adding-text-to-the-buffer">`
Current state and skeleton outline 9 years ago			`<title>Adding text to the buffer</title>`
			`<para>`
[docs] s/Harfbuzz/HarfBuzz/g 7 years ago			`Now we have a brand new HarfBuzz buffer. Let's start filling it`
			`with text! From HarfBuzz's perspective, a buffer is just a stream`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`of Unicode code points, but your input string is probably in one of`
			`the standard Unicode character encodings (UTF-8, UTF-16, or`
			`UTF-32). HarfBuzz provides convenience functions that accept`
			`each of these encodings:`
			`<function>hb_buffer_add_utf8()</function>,`
			`<function>hb_buffer_add_utf16()</function>, and`
			`<function>hb_buffer_add_utf32()</function>. Other than the`
			`character encoding they accept, they function identically.`
			`</para>`
			`<para>`
			`You can add UTF-8 text to a buffer by passing in the text array,`
			`the array's length, an offset into the array for the first`
			`character to add, and the length of the segment to add:`
			`</para>`
			`<programlisting language="C">`
			`hb_buffer_add_utf8 (hb_buffer_t *buf,`
			`const char *text,`
			`int text_length,`
			`unsigned int item_offset,`
			`int item_length)`
			`</programlisting>`
			`<para>`
			`So, in practice, you can say:`
			`</para>`
			`<programlisting language="C">`
			`hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));`
			`</programlisting>`
			`<para>`
			`This will append your new characters to`
			`<parameter>buf</parameter>, not replace its existing`
			`contents. Also, note that you can use <literal>-1</literal> in`
			`place of the first instance of <function>strlen(text)</function>`
			`if your text array is NULL-terminated. Similarly, you can also use`
			`<literal>-1</literal> as the final argument want to add its full`
			`contents.`
			`</para>`
			`<para>`
			`Whatever start <parameter>item_offset</parameter> and`
			`<parameter>item_length</parameter> you provide, HarfBuzz will also`
			`attempt to grab the five characters <emphasis>before</emphasis>`
			`the offset point and the five characters`
			`<emphasis>after</emphasis> the designated end. These are the`
			`before and after "context" segments, which are used internally`
			`for HarfBuzz to make shaping decisions. They will not be part of`
			`the final output, but they ensure that HarfBuzz's`
			`script-specific shaping operations are correct. If there are`
			`fewer than five characters available for the before or after`
			`contexts, HarfBuzz will just grab what is there.`
			`</para>`
			`<para>`
			`For longer text runs, such as full paragraphs, it might be`
			`tempting to only add smaller sub-segments to a buffer and`
			`shape them in piecemeal fashion. Generally, this is not a good`
			`idea, however, because a lot of shaping decisions are`
			`dependent on this context information. For example, in Arabic`
			`and other connected scripts, HarfBuzz needs to know the code`
			`points before and after each character in order to correctly`
			`determine which glyph to return.`
			`</para>`
			`<para>`
			`The safest approach is to add all of the text available, then`
			`use <parameter>item_offset</parameter> and`
			`<parameter>item_length</parameter> to indicate which characters you`
			`want shaped, so that HarfBuzz has access to any context.`
Current state and skeleton outline 9 years ago			`</para>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`<para>`
			`You can also add Unicode code points directly with`
			`<function>hb_buffer_add_codepoints()</function>. The arguments`
			`to this function are the same as those for the UTF`
			`encodings. But it is particularly important to note that`
			`HarfBuzz does not do validity checking on the text that is added`
			`to a buffer. Invalid code points will be replaced, but it is up`
			`to you to do any deep-sanity checking necessary.`
			`</para>`

Correct tag hierarchy, to allow for table-of-contents entries. 9 years ago			`</section>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago
Correct tag hierarchy, to allow for table-of-contents entries. 9 years ago			`<section id="setting-buffer-properties">`
Current state and skeleton outline 9 years ago			`<title>Setting buffer properties</title>`
			`<para>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`Buffers containing input characters still need several`
			`properties set before HarfBuzz can shape their text correctly.`
Current state and skeleton outline 9 years ago			`</para>`
			`<para>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`Initially, all buffers are set to the`
			`<literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content`
			`type. After adding text, the buffer should be set to`
			`<literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which`
			`indicates that it contains un-shaped input`
			`characters. After shaping, the buffer will have the`
			`<literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.`
			`</para>`
			`<para>`
			`<function>hb_buffer_add_utf8()</function> and the`
			`other UTF functions set the content type of their buffer`
			`automatically. But if you are reusing a buffer you may want to`
			`check its state with`
			`<function>hb_buffer_get_content_type(buffer)</function>. If`
			`necessary you can set the content type with`
			`</para>`
			`<programlisting language="C">`
			`hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);`
			`</programlisting>`
			`<para>`
			`to prepare for shaping.`
			`</para>`
			`<para>`
			`Buffers also need to carry information about the script,`
			`language, and text direction of their contents. You can set`
			`these properties individually:`
			`</para>`
			`<programlisting language="C">`
			`hb_buffer_set_direction(buf, HB_DIRECTION_LTR);`
			`hb_buffer_set_script(buf, HB_SCRIPT_LATIN);`
			`hb_buffer_set_language(buf, hb_language_from_string("en", -1));`
			`</programlisting>`
			`<para>`
			`However, since these properties are often the repeated for`
			`multiple text runs, you can also save them in a`
			`<literal>hb_segment_properties_t</literal> for reuse:`
			`</para>`
			`<programlisting language="C">`
			`hb_segment_properties_t *savedprops;`
			`hb_buffer_get_segment_properties (buf, savedprops);`
			`...`
			`hb_buffer_set_segment_properties (buf2, savedprops);`
			`</programlisting>`
			`<para>`
			`HarfBuzz also provides getter functions to retrieve a buffer's`
			`direction, script, and language properties individually.`
			`</para>`
			`<para>`
			`HarfBuzz recognizes four text directions in`
			`<type>hb_direction_t</type>: left-to-right`
			`(<literal>HB_DIRECTION_LTR</literal>), right-to-left (<literal>HB_DIRECTION_RTL</literal>),`
			`top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and`
			`bottom-to-top (<literal>HB_DIRECTION_BTT</literal>). For the`
			`script property, HarfBuzz uses identifiers based on the`
			`<ulink`
Usermanual; minor. 6 years ago			`url="https://unicode.org/iso15924/">ISO 15924`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`standard</ulink>. For languages, HarfBuzz uses tags based on the`
			`<ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.`
			`</para>`
			`<para>`
			`Helper functions are provided to convert character strings into`
			`the necessary script and language tag types.`
			`</para>`
			`<para>`
			`Two additional buffer properties to be aware of are the`
			`"invisible glyph" and the replacement code point. The`
			`replacement code point is inserted into buffer output in place of`
			`any invalid code points encountered in the input. By default, it`
			`is the Unicode <literal>REPLACEMENT CHARACTER</literal> code`
			`point, <literal>U+FFFD</literal> "�". You can change this with`
			`</para>`
			`<programlisting language="C">`
			`hb_buffer_set_replacement_codepoint(buf, replacement);`
			`</programlisting>`
Usermanual, minor: flesh out invisible-glyph discussion in buffers chapter. 6 years ago			`<para>`
			`passing in the replacement Unicode code point as the`
			`<parameter>replacement</parameter> parameter.`
			`</para>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`<para>`
			`The invisible glyph is used to replace all output glyphs that`
			`are invisible. By default, the standard space character`
			`<literal>U+0020</literal> is used; you can replace this (for`
			`example, when using a font that provides script-specific`
			`spaces) with`
			`</para>`
			`<programlisting language="C">`
Usermanual, minor: flesh out invisible-glyph discussion in buffers chapter. 6 years ago			`hb_buffer_set_invisible_glyph(buf, replacement_glyph);`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`</programlisting>`
Usermanual, minor: flesh out invisible-glyph discussion in buffers chapter. 6 years ago			`<para>`
			`Do note that in the <parameter>replacement_glyph</parameter>`
			`parameter, you must provide the glyph ID of the replacement you`
			`wish to use, not the Unicode code point.`
			`</para>`
[Docs] Usermanual; fill out Buffers chapter. 6 years ago			`<para>`
			`HarfBuzz supports a few additional flags you might want to set`
			`on your buffer under certain circumstances. The`
			`<literal>HB_BUFFER_FLAG_BOT</literal> and`
			`<literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz`
			`that the buffer represents the beginning or end (respectively)`
			`of a text element (such as a paragraph or other block). Knowing`
			`this allows HarfBuzz to apply certain contextual font features`
			`when shaping, such as initial or final variants in connected`
			`scripts.`
			`</para>`
			`<para>`
			`<literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>`
			`tells HarfBuzz not to hide glyphs with the`
			`<literal>Default_Ignorable</literal> property in Unicode. This`
			`property designates control characters and other non-printing`
			`code points, such as joiners and variation selectors. Normally`
			`HarfBuzz replaces them in the output buffer with zero-width`
			`space glyphs (using the "invisible glyph" property discussed`
			`above); setting this flag causes them to be printed, which can`
			`be helpful for troubleshooting.`
			`</para>`
			`<para>`
			`Conversely, setting the`
			`<literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag`
			`tells HarfBuzz to remove <literal>Default_Ignorable</literal>`
			`glyphs from the output buffer entirely. Finally, setting the`
			`<literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>`
			`flag tells HarfBuzz not to insert the dotted-circle glyph`
			`(<literal>U+25CC</literal>, "◌"), which is normally`
			`inserted into buffer output when broken character sequences are`
			`encountered (such as combining marks that are not attached to a`
			`base character).`
			`</para>`
			`</section>`

			`<section id="customizing-unicode-functions">`
			`<title>Customizing Unicode functions</title>`
			`<para>`
			`HarfBuzz requires some simple functions for accessing`
			`information from the Unicode Character Database (such as the`
			`<literal>General_Category</literal> (gc) and`
			`<literal>Script</literal> (sc) properties) that is useful`
			`for shaping, as well as some useful operations like composing and`
			`decomposing code points.`
			`</para>`
			`<para>`
			`HarfBuzz includes its own internal, lightweight set of Unicode`
			`functions. At build time, it is also possible to compile support`
			`for some other options, such as the Unicode functions provided`
			`by GLib or the International Components for Unicode (ICU)`
			`library. Generally, these build-time alternatives are only of`
			`interest for client programs that have specific integration`
			`requirements or that do a significant amount of customization.`
			`</para>`
			`<para>`
			`If your program has access to other Unicode functions, however,`
			`such as through a system library or application framework, you`
			`might prefer to use those instead of the built-in`
			`options. HarfBuzz supports this by implementing its Unicode`
			`functions as a set of virtual methods that you can replace`
			`without otherwise affecting HarfBuzz's functionality.`
			`</para>`
			`<para>`
			`The Unicode functions are specified in a structure called`
			`<literal>unicode_funcs</literal> which is attached to each`
			`buffer. But even though <literal>unicode_funcs</literal> is`
			`associated with a <type>hb_buffer_t</type>, the functions`
			`themselves are called by other HarfBuzz APIs that access`
			`buffers, so it would be unwise for you to hook different`
			`functions into different buffers.`
			`</para>`
			`<para>`
			`In addition, you can mark your <literal>unicode_funcs</literal>`
			`as immutable by calling`
			`<function>hb_unicode_funcs_make_immutable (ufuncs)</function>.`
			`This is especially useful if your code is a`
			`library or framework that will have its own client programs. By`
			`marking your Unicode function choices as immutable, you prevent`
			`your own client programs from changing the`
			`<literal>unicode_funcs</literal> configuration and introducing`
			`inconsistencies and errors downstream.`
			`</para>`
			`<para>`
			`You can retrieve the Unicode-functions configuration for`
			`your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:`
			`</para>`
			`<programlisting language="C">`
			`hb_unicode_funcs_t *ufunctions;`
			`ufunctions = hb_buffer_get_unicode_funcs(buf);`
			`</programlisting>`
			`<para>`
			`The current version of <literal>unicode_funcs</literal> uses six functions:`
			`</para>`
			`<itemizedlist>`
			`<listitem>`
			`<para>`
			`<function>hb_unicode_combining_class_func_t</function>:`
			`returns the Canonical Combining Class of a code point.`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para>`
			`<function>hb_unicode_general_category_func_t</function>:`
			`returns the General Category (gc) of a code point.`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para>`
			`<function>hb_unicode_mirroring_func_t</function>: returns`
			`the Mirroring Glyph code point (for bi-directional`
			`replacement) of a code point.`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para>`
			`<function>hb_unicode_script_func_t</function>: returns the`
			`Script (sc) property of a code point.`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para>`
			`<function>hb_unicode_compose_func_t</function>: returns the`
			`canonical composition of a sequence of two code points.`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para>`
			`<function>hb_unicode_decompose_func_t</function>: returns`
			`the canonical decomposition of a code point.`
			`</para>`
			`</listitem>`
			`</itemizedlist>`
			`<para>`
			`Note, however, that future HarfBuzz releases may alter this set.`
			`</para>`
			`<para>`
			`Each Unicode function has a corresponding setter, with which you`
			`can install your replacement function as the callback. For`
			`example, to replace the General Category function (of type`
			`<function>hb_unicode_general_category_func_t</function>), you can call`
			`</para>`
			`<programlisting language="C">`
			`hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);`
			`</programlisting>`
			`<para>`
			`Virtualizing this set of Unicode functions is primarily intended`
			`to improve portability. There is no need for every client`
			`program to make the effort to replace the default options, so if`
			`you are unsure, do not feel any pressure to customize`
			`<literal>unicode_funcs</literal>.`
			`</para>`
			`</section>`

			`</chapter>`