What I’ve been reading about lately: Restricted Boltzmann machines.

Summary notes follow:

Boltzmann machines

A Boltzmann machine is a certain kind of probabilistic undirected graphical model / stochastic recurrent neural network / Markov random field (gotta love how terminology allows one to characterize exactly the same thing in several ways), applied to machine learning tasks with an energy-based method. The exact definition:

An (unrestricted) Boltzmann machine is a network of units $u_i$, each with a binary state $s_i \in \{0, 1\}$.

The network of units has an energy, defined as a function

$$E = -\sum_{i < j} w_{ij}\, s_i s_j - \sum_i b_i s_i,$$

where $w_{ij}$ is the symmetric connection weight between units $i$ and $j$, and $b_i$ is the bias of unit $i$.
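
To make the definition concrete, here is a minimal NumPy sketch of that energy function (bm_energy and the variable names are my own illustrative choices, not from any library):

```python
import numpy as np

def bm_energy(s, W, b):
    """Energy of a Boltzmann machine state.

    s: binary state vector, shape (n,)
    W: symmetric weight matrix with zero diagonal, shape (n, n)
    b: per-unit bias vector, shape (n,)
    """
    # 0.5 * s @ W @ s sums w_ij * s_i * s_j over pairs i < j,
    # since each pair is counted twice in the symmetric matrix.
    return -0.5 * (s @ W @ s) - b @ s

# Example: units 0 and 2 are on, so only w_02 and the biases contribute.
s = np.array([1.0, 0.0, 1.0])
W = np.array([[0.0, 1.0, -2.0],
              [1.0, 0.0, 3.0],
              [-2.0, 3.0, 0.0]])
b = np.array([0.5, -1.0, 0.0])
print(bm_energy(s, W, b))  # -(-2) - 0.5 = 1.5
```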

Related (non-stochastic) concept: Hopfield network.

For learning tasks, the units of a Boltzmann machine are divided into a visible (input) layer and a hidden layer. The general unrestricted Boltzmann machine can be used for learning with an energy-based method (simulated annealing based on the Boltzmann distribution; here is the reason for the name “Boltzmann machine”, I guess?). However, in practical settings, Boltzmann machines are far more tractable when their connectivity is restricted.
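
For the record, the stochastic update rule that ties the network to the Boltzmann distribution (standard material; $T$ is the temperature, lowered during annealing): a unit turns on with probability

$$p(s_i = 1) = \frac{1}{1 + e^{-\Delta E_i / T}}, \qquad \Delta E_i = \sum_j w_{ij} s_j + b_i,$$

where $\Delta E_i$ is the energy gap from flipping unit $i$ off to on. Run long enough, these updates bring the network to equilibrium, where state probabilities follow the Boltzmann distribution over the energy defined above.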

Restricted Boltzmann machines

The classical restricted Boltzmann machine (RBM) is structured as a bipartite graph of a hidden layer and a visible layer of units, i.e. the units in each layer are connected only to the units of the other layer, never to each other.

Each possible state (configuration vector $(v, h)$ of visible and hidden units) is given a probability

$$P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v', h'} e^{-E(v', h')},$$

where the RBM's energy function is

$$E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i, j} v_i w_{ij} h_j$$

with visible biases $a_i$, hidden biases $b_j$, and weights $w_{ij}$.
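
A consequence of the bipartite structure worth spelling out (standard textbook material, $\sigma$ being the logistic sigmoid): given one layer, the units of the other layer are conditionally independent, so both conditionals factorize into per-unit sigmoids:

$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j w_{ij} h_j\Big).$$

This is what makes block Gibbs sampling, and hence training, in an RBM cheap.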

This is still an energy-based model (not unlike simulated annealing), but the actual training algorithms do not anneal. Instead, performing stochastic gradient descent on the negative log-likelihood of the training data, along the lines of

$$\mathcal{L} = -\sum_{v \in \text{data}} \log P(v), \qquad P(v) = \frac{1}{Z} \sum_h e^{-E(v, h)},$$

is preferred. The derivatives can be derived for the training data; for a weight $w_{ij}$ the gradient takes the form $\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$, and the second, intractable expectation has to be approximated in practice. (Details are slightly more complicated.)
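
Since the exact gradient is intractable, the usual workaround is contrastive divergence. Here is a hedged NumPy sketch of one CD-1 step; cd1_update and all names are my own illustrative choices, not from any particular library:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1):
    """One contrastive-divergence (CD-1) step on a batch of visible vectors.

    v0: batch of binary visible vectors, shape (batch, n_visible)
    W:  weights, shape (n_visible, n_hidden); a, b: visible / hidden biases
    """
    # Positive phase: hidden activations driven by the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of block Gibbs sampling back and forth.
    pv1 = sigmoid(h0 @ W.T + a)   # reconstruction of the visibles
    ph1 = sigmoid(pv1 @ W + b)
    # Approximate gradient: <v h>_data - <v h>_model, averaged over the batch.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
```

Note the design shortcut that defines CD-1: the model expectation is replaced by a single Gibbs step started from the data, rather than a chain run to equilibrium.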
