Tic-tac-toe, (or noughts and crosses or Xs and Ox), is a turn-based game for two players who alternately tag the spaces of a $3 \times 3$ grid with their respective marker: an X or an O. The object of the game is to place three markers in a row, either horizontally, vertically, or diagonally. Given only the mechanics of Tic-tac-toe, the game can be expressed as Combinatorial Group by defining a set $A$ of generators $\{a_i\}$ which describe the actions that can be taken by either player. The Cayley Graph of this group can be constructed which will express all the possible ways the game can be played. Using the Cayley Graph as a model, it should be possible to learn the Tic-tac-toe game tree using dynamic programming techniques (hint: the game tree is a sub-graph of the Cayley Graph).
Before going any further, it is important to understand the structure of the Tic-tac-toe group. Tic-tac-toe is expressed as a finite combinatorial group on the set, $S$, of $4^9$ possible board positions: the 9 grid locations which can be empty or contain an X, an O, or the superposition of X and O, $\ast$. The generator set, $A$, is a proper subset of $S$ with a cardinality of 10; the tagging of each of the 9 grid locations with a marker, and the empty grid (not playing is also a valid play). The identity element of the group is the empty grid, $\varnothing$, which is also the initial configuration in the game. The group law is the bijective group operation which combines an initial state with an action to produce the final state, and is expressed as follows:
$$ p: S \times S \mapsto S $$
with
$$ p(S,S) = \{ s, s' \in S : s \cdot s' = s_{ij} \cdot s'_{ij} \} $$
In other words, the application of the group law will evaluate the dot-product of each grid cell location. The dot-product of grid cells is defined as follows:
$$ s_{ij} \cdot s'_{ij} = \left\{
\begin{array}{lr}
s_{ij} & \quad s_{ij} \neq \varnothing \land s'_{ij} = \varnothing \\
s'_{ij} & \quad s_{ij} = \varnothing \land s'_{ij} \neq \varnothing \\
\ast & \quad s_{ij} = \overline{s'_{ij}} \\
\varnothing & \quad s_{ij} = s'_{ij} \\
\overline{s_{ij}} & s'_{ij} = \ast \land s_{ij} \neq \varnothing
\end{array}
\right .
$$
The product of a marker with an empty cell tags the cell with the marker, two different markers will tag the cell with the superposition of both ($\ast$). The product of two similar markers will tag the cell as empty, therefore the group law described here is an autoinverse; this means that applying the law to a position with itself will result in the identity element.
The group $E$ is expressed as $\langle A|p \rangle$, and its full state space is specified by repeated applications of the generator. The fact that $E$ is a group can be asserted by verifying that it satisfies the group axioms:
- Totality: The set is closed under the operation $p$.
- Associativity: The operation $p$ will combine any two positions in $S$ and yield another position in $S$.
- Identity: There exists an identity element.
- Divisibility: For each element in the group, there exists an inverse which yields the identity element when the group law is applied thereto.
The proof that the group satisfies these axioms should be pretty evident. A formal proof of this fact is left as a future exercise.
- NOTE:
- The state space can be further constrained by defining a more intelligent group law. The state set $S$ could be partitioned into two sub-sets: $S = X \cup O$; where $X$ is the set of positions which allow X to play, and $O$ is the set of positions which allow O to play (note that the intersection of $X$ and $O$ is not empty). This would simplify the Cayley Graph and thus reduce the time required to learn the game tree. However, this would greatly increase the complexity of the group law, making it more prone to error.
The abstract structure of the Tic-tac-toe group can be encoded with a Cayley graph, $\Gamma$, where each of vertices represents a position, and the edges represent that possible transitions resulting from an agent making a move.
The Cayley graph of the Tic-tac-toe group is isomorphic to the backup diagram of the approximate value function, $V^\pi(s)$. By extending the graph -- associating values for each of the vertices (states), and weights for the edges -- it can be used as an initial approximation of the value function. Dynamic programming algorithms will iteratively update the values and weights to obtain a better approximation of the optimal value function. By removing the edges that tend toward a zero probability of being followed, the resulting graph should be isomorphic to the game tree.
Initially, the value of each state will be set to zero with the exception of winning states which will have high values, and losing states which have low values. Given the sets $W$ and $L$ which contain all the winning and losing positions respectively (note: $W \cap L = \emptyset$), the initial values could be assigned as follows:
$$\forall s \in S \quad : \quad V^\pi(s) = \left\{
\begin{array}{lr}
\gg 0 & \quad s \in W \\
\ll 0 & \quad s \in L \\
0 & \quad s \notin W \cup L
\end{array}
\right .
$$
The Tic-tac-toe group allows for positions that are not valid in a regular game (i.e. the states with superpositions). These moves should be suppressed in the process of iteratively improving the approximation of the value function. To do this, the transitions leading to invalid positions could be assigned a very small weight, ensuring that the probability of following the edge tends toward zero. The same could be done to prevent actions which place a marker in a previously occupied grid cell:
$$
P( s \cdot a = s') = \left\{
\begin{array}{lr}
0 & \quad \exists i,j \in \mathbb{Z}/3 : \quad s'_{ij} \neq \varnothing \land a_{ij} \neq \varnothing \\
>0 & \quad \forall i,j \in \mathbb{Z}/3 : \quad s_{ij} = \varnothing \lor a_{ij} = \varnothing
\end{array}
\right .
$$
This will ensure that an agent using the Cayley graph as a value function approximation will generally not take actions leading to invalid states (which would be seen as a newbie error or an attempt at cheating by an opponent).
The simplicity of the Tic-tac-toe problem make it a good pedagogical tool to learn about reinforcement learning.It is straightforward to write a computer program to play Tic-tac-toe perfectly, to enumerate the 765 essentially different positions (the state space complexity), or the 26,830 possible games up to rotations and reflections (the game tree complexity) on this space.[1] However, by designing a program which learns how to play rather than manually building the game tree, the relatively small state space makes it easier to validate the techniques and algorithms used. Additionally, the theoretical foundations should also be applicable to more complex problems with state spaces that are too large to hand build the associated game tree.
In this article, the Tic-tac-toe problem was expressed in group theoretic terms. There is an entire body of work on group theory which may provide valuable tools for reasoning about dynamic programming algorithms used to learn approximations of the solutions to modelled problems. In future articles, the ideas developed herein will be tested by implementing them using the Didactronic toolkit. The goals of this endeavour are two-fold: 1) to validate the hypothesis that group theory provides a useful formalism for expressing reinforcement learning systems, and 2) to drive the development of the Didactronic Toolkit to make it more useful as a generalized machine learning framework.
Add new comment