Links/Notes: (Restricted) Boltzmann Machines
What I’ve been reading about lately: Restricted Boltzmann machines.
Summary notes follow:
Boltzmann machines
A Boltzmann machine is a certain kind of probabilistic undirected graphical model / stochastic recurrent neural network / Markov random field (gotta love how terminology allows characterizing the exact same thing three ways), applied to machine learning tasks as an energy-based method. The exact definition:
An (unrestricted) Boltzmann machine is a network of units $i = 1, \dots, N$, each with a binary state $s_i \in \{0, 1\}$.
The network of units has an energy, defined as a function of the joint state:

$$E = -\left( \sum_{i < j} w_{ij} \, s_i s_j + \sum_i \theta_i s_i \right)$$

where $w_{ij}$ is the symmetric connection weight between units $i$ and $j$, and $\theta_i$ is the bias of unit $i$.
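A minimal sketch of the energy computation in NumPy (the parameters are arbitrary, just to make the formula concrete):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5                                   # number of units
s = rng.integers(0, 2, size=N)          # binary states s_i in {0, 1}
theta = rng.normal(size=N)              # biases theta_i

# Arbitrary symmetric weights with zero diagonal (no self-connections)
w = rng.normal(size=(N, N))
w = np.triu(w, k=1)
w = w + w.T

def energy(s, w, theta):
    """E = -(sum_{i<j} w_ij s_i s_j + sum_i theta_i s_i)."""
    pairwise = 0.5 * s @ w @ s          # 0.5 undoes double counting over i != j
    return -(pairwise + theta @ s)

print(energy(s, w, theta))
```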
Related (non-stochastic) concept: Hopfield network.
For a learning task, the units of a Boltzmann machine are divided into a visible (input) layer and a hidden layer. The general unrestricted Boltzmann machine can be used for learning with an energy-based method (simulated annealing based on the Boltzmann distribution; this is the reason for the name “Boltzmann machine”, I guess?). However, in practical settings Boltzmann machines are more useful when their connectivity is restricted, since that is what makes training tractable.
Restricted Boltzmann machines
The classical restricted Boltzmann machine (RBM) is structured as a bipartite graph of a hidden layer and a visible layer of units, i.e. the units in each layer are connected only to units of the other layer (no intra-layer connections).
Each possible state (configuration vector $(v, h)$ of the visible units $v$ and hidden units $h$) is given a probability

$$P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad E(v, h) = -a^\top v - b^\top h - v^\top W h,$$

where $Z = \sum_{v', h'} e^{-E(v', h')}$ is the normalizing constant (partition function), $a, b$ are the visible and hidden biases, and $W$ is the weight matrix.
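Since $Z$ sums over all $2^{n_v + n_h}$ configurations, it is intractable at realistic sizes, but for a toy RBM it can be brute-forced. A minimal sketch, again with arbitrary parameters:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

n_visible, n_hidden = 3, 2
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = rng.normal(scale=0.1, size=n_visible)   # visible biases
b = rng.normal(scale=0.1, size=n_hidden)    # hidden biases

def energy(v, h):
    """E(v, h) = -a.v - b.h - v.W.h"""
    return -(a @ v + b @ h + v @ W @ h)

# Brute-force partition function Z over all 2^(n_visible + n_hidden) states
states = [(np.array(v), np.array(h))
          for v in itertools.product([0, 1], repeat=n_visible)
          for h in itertools.product([0, 1], repeat=n_hidden)]
Z = sum(np.exp(-energy(v, h)) for v, h in states)

v = np.array([1, 0, 1])
h = np.array([0, 1])
print(np.exp(-energy(v, h)) / Z)            # P(v, h)
```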
This is still an energy-based model (not unlike simulated annealing), but the actual training algorithms are not annealing. Instead, stochastic gradient descent on a log-likelihood loss function is preferred, along the lines of

$$\mathcal{L} = -\sum_{v \in \text{data}} \log P(v), \qquad P(v) = \frac{1}{Z} \sum_h e^{-E(v, h)}.$$

The derivatives can be derived for the training data; for the weights they take the form

$$\frac{\partial \log P(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}.$$

(Details are slightly more complicated: the model expectation is intractable and has to be approximated.)
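In practice the model expectation is usually approximated with contrastive divergence (CD-k), i.e. a few steps of Gibbs sampling started from the data. A minimal CD-1 sketch for a binary RBM, assuming the energy parameterization above (the hyperparameters and toy data are illustrative, not from any source):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.1):
    """One CD-1 update on a batch of visible vectors v0 (shape: batch x n_visible)."""
    # Positive phase: hidden probabilities and samples given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step (reconstruct v, then resample h probabilities)
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Gradient approximation: <v h>_data - <v h>_reconstruction
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

# Toy usage: 6 visible units, 4 hidden units, random binary "data"
n_v, n_h = 6, 4
W = 0.01 * rng.normal(size=(n_v, n_h))
a = np.zeros(n_v)
b = np.zeros(n_h)
data = (rng.random((32, n_v)) < 0.5).astype(float)
for _ in range(100):
    cd1_step(data, W, a, b)
```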
References / Resources
Other
- related concept: Kullback-Leibler divergence
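This is relevant here because maximizing the RBM's log-likelihood over the training data is equivalent to minimizing the KL divergence from the empirical data distribution to the model distribution:

$$\mathrm{KL}(p_{\text{data}} \,\|\, p_{\text{model}}) = \sum_v p_{\text{data}}(v) \log \frac{p_{\text{data}}(v)}{p_{\text{model}}(v)} = \text{const} - \mathbb{E}_{p_{\text{data}}}\!\left[\log p_{\text{model}}(v)\right].$$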