Links/Notes: (Restricted) Boltzmann Machines
What I’ve been reading about lately: Restricted Boltzmann machines.
Summary notes follow:
Boltzmann machines
A Boltzmann machine is a certain kind of probabilistic undirected graphical model / stochastic recurrent neural network / Markov random field (gotta love how terminology allows characterizing the exact same thing three ways), applied to machine learning tasks as an energy-based method. The exact definition:
An (unrestricted) Boltzmann machine is a network of units $i = 1, \dots, N$, each with a binary state $s_i \in \{0, 1\}$.
The network of units has an energy, defined as a function of the joint state:

$$E = -\left( \sum_{i < j} w_{ij} \, s_i s_j + \sum_i \theta_i s_i \right)$$

where $w_{ij}$ is the symmetric connection weight between units $i$ and $j$, and $\theta_i$ is the bias of unit $i$.
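A minimal sketch of the energy computation in NumPy (the parameters are arbitrary, just to make the formula concrete):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5                                   # number of units
s = rng.integers(0, 2, size=N)          # binary states s_i in {0, 1}
theta = rng.normal(size=N)              # biases theta_i

# Arbitrary symmetric weights with zero diagonal (no self-connections)
w = rng.normal(size=(N, N))
w = np.triu(w, k=1)
w = w + w.T

def energy(s, w, theta):
    """E = -(sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i)."""
    pairwise = 0.5 * s @ w @ s          # 0.5 undoes double counting over i != j
    return -(pairwise + theta @ s)

print(energy(s, w, theta))
```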
Related (non-stochastic) concept: Hopfield network.
For a learning task, the units of a Boltzmann machine are divided into a visible (input) layer and a hidden layer. The general unrestricted Boltzmann machine can be used for learning with an energy-based method (simulated annealing based on the Boltzmann distribution; this is the reason for the name “Boltzmann machine”, I guess?). However, in practical settings Boltzmann machines are more useful when their connectivity is restricted, since that is what makes training tractable.
Restricted Boltzmann machines
The classical restricted Boltzmann machine (RBM) is structured as a bipartite graph of a hidden layer and a visible layer of units, i.e. the units in each layer are connected only to units of the other layer (no intra-layer connections).
Each possible state (configuration vector $(v, h)$ of the visible units $v$ and hidden units $h$) is given a probability

$$P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad E(v, h) = -a^\top v - b^\top h - v^\top W h,$$

where $Z = \sum_{v', h'} e^{-E(v', h')}$ is the normalizing constant (partition function), $a, b$ are the visible and hidden biases, and $W$ is the weight matrix.
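Since $Z$ sums over all $2^{n_v + n_h}$ configurations, it is intractable at realistic sizes, but for a toy RBM it can be brute-forced. A minimal sketch, again with arbitrary parameters:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

n_visible, n_hidden = 3, 2
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = rng.normal(scale=0.1, size=n_visible)   # visible biases
b = rng.normal(scale=0.1, size=n_hidden)    # hidden biases

def energy(v, h):
    """E(v, h) = -a.v - b.h - v.W.h"""
    return -(a @ v + b @ h + v @ W @ h)

# Brute-force partition function Z over all 2^(n_visible + n_hidden) states
states = [(np.array(v), np.array(h))
          for v in itertools.product([0, 1], repeat=n_visible)
          for h in itertools.product([0, 1], repeat=n_hidden)]
Z = sum(np.exp(-energy(v, h)) for v, h in states)

v = np.array([1, 0, 1])
h = np.array([0, 1])
print(np.exp(-energy(v, h)) / Z)            # P(v, h)
```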
This is still an energy-based model (not unlike simulated annealing), but the actual training algorithms are not annealing. Instead, stochastic gradient descent on a log-likelihood loss function is preferred, along the lines of

$$\mathcal{L} = -\sum_{v \in \text{data}} \log P(v), \qquad P(v) = \frac{1}{Z} \sum_h e^{-E(v, h)}.$$

The derivatives can be derived for the training data; for the weights they take the form

$$\frac{\partial \log P(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}.$$

(Details are slightly more complicated: the model expectation is intractable and has to be approximated.)
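In practice the model expectation is usually approximated with contrastive divergence (CD-k), i.e. a few steps of Gibbs sampling started from the data. A minimal CD-1 sketch for a binary RBM, assuming the energy parameterization above (the hyperparameters and toy data are illustrative, not from any source):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1):
    """One CD-1 update on a batch of visible vectors v0 (shape: batch x n_visible)."""
    # Positive phase: hidden probabilities and samples given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step (reconstruct v, then resample h probabilities)
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Gradient approximation: <v h>_data - <v h>_reconstruction
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

# Toy usage: 6 visible units, 4 hidden units, random binary "data"
n_v, n_h = 6, 4
W = 0.01 * rng.normal(size=(n_v, n_h))
a = np.zeros(n_v)
b = np.zeros(n_h)
data = (rng.random((32, n_v)) < 0.5).astype(float)
for _ in range(100):
    cd1_step(data, W, a, b)
```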
References / Resources
Other
- related concept: Kullback-Leibler divergence
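This is relevant here because maximizing the RBM's log-likelihood over the training data is equivalent to minimizing the KL divergence from the empirical data distribution to the model distribution:

$$\mathrm{KL}(p_{\text{data}} \,\|\, p_{\text{model}}) = \sum_v p_{\text{data}}(v) \log \frac{p_{\text{data}}(v)}{p_{\text{model}}(v)} = \text{const} - \mathbb{E}_{p_{\text{data}}}\!\left[\log p_{\text{model}}(v)\right].$$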