alignment options (for cache-alignment).
We shrink by:
1) Removing an unnecessary zone pointer.
2) Replacing gpr_mu (40 bytes when using pthread_mutex_t) with
std::atomic_flag.
We also header-inline the fastpath alloc (ie. when not doing a zone
alloc) and move the malloc() for a zone alloc outside of the mutex
critical zone, which allows us to replace the mutex with a spinlock.
We also cache-align created arenas.