From 14ea772bfddf7bc9a93be8e4bbbcd0c911ab86f0 Mon Sep 17 00:00:00 2001
From: Vijay Pai <vpai@google.com>
Date: Fri, 17 Mar 2017 10:29:20 -0700
Subject: [PATCH] Explain combiner

---
 doc/combiner-explainer.md | 158 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 doc/combiner-explainer.md

diff --git a/doc/combiner-explainer.md b/doc/combiner-explainer.md
new file mode 100644
index 00000000000..9d0a4f30f22
--- /dev/null
+++ b/doc/combiner-explainer.md
@@ -0,0 +1,158 @@
+# Combiner Explanation
+## Talk by ctiller, notes by vjpai
+
+Typical way of doing critical section
+
+```
+mu.lock()
+do_stuff()
+mu.unlock()
+```
+
+An alternative way of doing it is
+
+```
+class combiner {
+  run(f) {
+    mu.lock()
+    f()
+    mu.unlock()
+  }
+  mutex mu;
+}
+
+combiner.run(do_stuff)
+```
+
+If you have two threads calling combiner, there will be some kind of
+queuing in place. It's called `combiner` because you can pass in more
+than one do_stuff at once and they will run under a common `mu`.
+
+The implementation described above has the issue that you're blocking a thread
+for a period of time, and this is considered harmful because it's an application thread that you're blocking.
+
+Instead, get a new property:
+* Keep things running in serial execution
+* Don't ever sleep the thread
+* But maybe allow things to end up running on a different thread from where they were started
+* This means that `do_stuff` doesn't necessarily run to completion when `combiner.run` is invoked
+
+```
+class combiner {
+  mpscq q; // multi-producer single-consumer queue can be made non-blocking
+  state s; // is it empty or executing
+  
+  run(f) {
+    if (q.push(f)) { 
+      // q.push returns true if it's the first thing
+      while (q.pop(&f)) { // modulo some extra work to avoid races
+        f();
+      }
+    }
+  }
+}
+```
+
+The basic idea is that the first one to push onto the combiner
+executes the work and then keeps executing functions from the queue
+until the combiner is drained.
+
+Our combiner does some additional work, with the motivation of write-batching.
+
+We have a second tier of `run` called `run_finally`. Anything queued
+onto `run_finally` runs after we have drained the queue. That means
+that there is essentially a finally-queue. This is not guaranteed to
+be final, but it's best-effort. In the process of running the finally
+item, we might put something onto the main combiner queue and so we'll
+need to re-enter.
+
+`chttp2` runs all ops in the run state except if it sees a write it puts that into a finally. That way anything else that gets put into the combiner can add to that write.
+
+```
+class combiner {
+  mpscq q; // multi-producer single-consumer queue can be made non-blocking
+  state s; // is it empty or executing
+  queue finally; // you can only do run_finally when you are already running something from the combiner
+  
+  run(f) {
+    if (q.push(f)) { 
+      // q.push returns true if it's the first thing
+      loop:
+      while (q.pop(&f)) { // modulo some extra work to avoid races
+        f();
+      }
+      while (finally.pop(&f)) {
+        f();
+      }
+      goto loop;
+    }
+  }
+}
+```
+
+So that explains how combiners work in general. In gRPC, there is
+`start_batch(..., tag)` and then work only gets activated by somebody
+calling `cq::next` which returns a tag. This gives and API-level
+guarantee that there will be a thread doing polling to actually make
+work happen. However, some operations are not covered by a poller
+thread, such as cancellation that doesn't have a completion. Other
+callbacks that don't have a completion are the internal work that gets
+done before the batch gets completed. We need a condition called
+`covered_by_poller` that means that the item will definitely need some
+thread at some point to call `cq::next` . This includes those
+callbacks that directly cause a completion but also those that are
+indirectly required before getting a completion. If we can't tell for
+sure for a specific path, we have to assumed it is not covered by
+poller.
+
+The above combiner has the problem that it keeps draining for a
+potentially infinite amount of time and that can lead to a huge tail
+latency for some operations. So we can tweak it by returning to the application
+if we know that it is valid to do so:
+
+```
+while (q.pop(&f)) {
+  f();
+  if (control_can_be_returned && some_still_queued_thing_is_covered_by_poller) {
+    offload_combiner_work_to_some_other_thread();
+  }
+}
+```
+
+`offload` is more than `break`; it does `break` but also causes some
+other thread that is currently waiting on a poll to break out of its
+poll. This is done by setting up a per-polling-island work-queue
+(distributor) wakeup FD. The work-queue is the converse of the combiner; it
+tries to spray events onto as many threads as possible to get as much concurrency as possible.
+
+So `offload` really does:
+
+``` 
+  distributor.run(continue_from_while_loop);
+  break;
+```
+
+This needs us to add another class variable for a `distributor`
+(called a `workqueue` in the code).
+
+```
+workqueue::run(f) {
+  q.push(f)
+  eventfd.wakeup()
+}
+
+workqueue::readable() {
+  eventfd.consume();
+  q.pop(&f);
+  f();
+  if (!q.empty()) {
+    eventfd.wakeup(); // spray across as many threads as are waiting on this workqueue
+  }
+}
+```
+
+In principle, `run_finally` could get starved, but this hasn't
+happened in practice. If we were concerned about this, we could put a
+limit on how many things come off the regular `q` before the `finally`
+queue gets processed.
+