lwsync() is a full read/write-barrier. That's all we need, never
need sync(). I'm not sure why an isync() was used in fetch_and_add,
but since that's a read-modify-write, I just changed it to have
lwsync() on both sides.
Originally, glib's atomic_get was implemented as "memory_barrier; load".
I copied this into cairo, fontconfig, and harfbuzz. However, that's
wrong. Correct way is "load; memory_barrier". The details are long
and hard to fully grasp. Best to read:
https://www.kernel.org/doc/Documentation/memory-barriers.txt
Also see my report against GNOME:
https://gitlab.gnome.org/GNOME/glib/issues/1449
Note that this is irrelevant if C++11-like atomic ops are available.
This reverts commit b1e5650c67.
After lots of head-scratching and finally finding the only truly
readable source to be the good old:
https://www.kernel.org/doc/Documentation/memory-barriers.txt
I've convinced myself that we need consume memory-ordering on get().
The location of memory-barrier in a load should be after, not before
the load. That needs fixing. I'll do that separately.
In macOS 10.12, the `OSMemoryBarrier` and related APIs were deprecated
in favor of using `std::atomic`. On the way to supporting `std::atomic`,
we can favor using the "Intel primitives" which are also available on
macOS.
s/atomic_int/atomic_int_impl/ and s/atomic_ptr/atomic_ptr_impl/
to bring it in par with hb_mutex_impl_t, then re-introduce
hb_atomic_int_t as a wrapper around hb_atomic_int_impl_t.
In hb_reference_count_t, make it clear the non-atomic get and set
are intentional due to nature of the cases they are used in
(comparison to -1 and the debug output/tracing).
I added these because the older mingw32 toolchain didn't have
MemoryBarrier(). The newer mingw-w64 toolchain however has.
As reported by John Emmas this was causing build failure with
MSVC (on glib) because of inline issues. But that reminded me
that we may be taking this path even if the system implements
MemoryBarrier as a function, which is a waste. So, just remove
it.