Theory and Practice: Tree Buffers

Circular buffers are one of the most fundamental and pervasive data structures. They are an efficient implementation for buffering linear sequences. Tree buffers are a more general data structure.

You heard of circular buffers: Alice reads out a sequence $x_0$, $x_1$, $x_2$, … of facts. Bob's memory has size $h$, and he stores $x_k$ at address $k \bmod h$. So, Bob always remembers the last $h$ facts that Alice read.

Let's see one way to generalize the problem. Instead of reading out a sequence of facts, Alice gives reasons for the stated facts. In other words, Alice describes a tree, not a sequence. For example, Alice could describe the tree

by saying $$\begin{align*} &\text{$0:a$} \\ &\text{$1:b$, follows from $0$}\\ &\text{$2:b$, follows from $0$}\\ &\text{$3:c$, follows from $1$} \end{align*}$$ Each node has an identifier and a label. In this example, the identifiers are $0$, $1$, $2$, and $3$, but they could be anything that doesn't repeat. The labels put on nodes, on the other hand, could repeat. Indeed, the label $b$ repeats in the example above. You may think of identifiers as memory addresses, and of labels as memory content.

Alice could describe the same tree by saying $$\begin{align*} &\text{$0:a$} \\ &\text{$2:b$, follows from $0$}\\ &\text{$1:b$, follows from $0$}\\ &\text{$3:c$, follows from $1$} \end{align*}$$

In the case of the circular buffer, Bob recalls only the $h$ most recent facts. What would the equivalent be for trees? However we define the equivalent, the following properties are desirable:

What constitutes a recent fact should depend on the structure of the tree, not on the order in which Alice decided to describe the tree.
The case of a skinny, chain-like tree should simplify to the case handled by circular buffers.

For example, if the last thing Alice said was ‘$y$ follows from $x$’ (omitting the label of $y$), then we'd expect to have in memory $h$ ancestors of $y$. But, this implies that one timestep ahead, just before Alice said ‘$y$ follows from $x$’, we must have had in memory $h-1$ ancestors of $x$. If $x$ can be any of the nodes that Alice mentioned previously, then we must keep in memory all the nodes that Alice mentioned! We just can't do better unless we ask for a little help from Alice: We'd like her to also tell Bob if she isn't going to add any more children to a node.

A node is implicitly active when Alice adds it to the tree, then the node possibly receives some children, then the node is explicitly made inactive, and then the node receives no more children. In particular, a node is always active immediately after being added to the tree. In this context, a natural generalization of the linear case is the following: Ensure that Bob keeps in memory $h$ recent ancestors for each active node. Another way to say this is that we want to keep in memory those nodes that are at distance $\le h$ from some active node.

Of course, Bob would like to not keep in memory much more than necessary. Also, Bob wants to process each message from Alice in constant time. Can he do this? The task is difficult because one message from Alice may change the height of many nodes — too many to process in constant time. You can check this by playing with the tree below.

(Update: The slides from CAV2015 have a better click-and-see interface.)

Instructions: Click on a green node to add a child. Control-click a green node to deactivate it. The $h$ is fixed to be $3$. The nodes are labeled by their distance to an active node. The green nodes are active, the yellow nodes are inactive but needed, the red nodes are not needed. Example: To simulate a linear buffer, repeatedly do the following: (1) click on the green node, (2) control-click on the same node.

Yet, Bob can do it! For details, see the paper and the code. There, you will also find a generic application for this data structure: tracing nondeterministic automata, not necessarily finite. This application has special cases like

providing error traces during runtime verification, and
providing functionality similar to that of regexp capturing groups

Theory and Practice

27 April 2015

Tree Buffers

No comments:

Post a Comment