Short blog post accompanying the paper Hopfield Networks is All You Need.
For a much longer and more detailed version, see our official blog post.
GitHub link: https://github.com/ml-jku/hopfield-layers
Main contributions
We introduce a new energy function and a corresponding new update rule that is guaranteed to converge to a local minimum of the energy function.
The new energy function is a generalization (discrete states -> continuous states) of the modern Hopfield networks introduced by Krotov & Hopfield and Demircigil et al.
The new Hopfield network with continuous states keeps the characteristics of its discrete counterparts: exponential storage capacity and extremely fast convergence.
Surprisingly, the new update rule is the attention mechanism of transformer networks, see the "Attention Is All You Need" paper by Vaswani et al.
We use these new insights to analyze transformer models. We find that they have different operating modes and prefer to operate in higher-energy minima, which are metastable states.
We therefore choose the title Hopfield Networks is All You Need.

A new energy function and a new update rule
We introduce a new energy function using the log-sum-exp function (lse):
\displaystyle \text{E} = - \text{lse}\left( \beta, \boldsymbol{X}^T \boldsymbol{\xi} \right) + \frac{1}{2} \boldsymbol{\xi}^T \boldsymbol{\xi} + \beta^{-1} \log N + \frac{1}{2} M^2 \ ,
which is constructed from the N continuous patterns collected in the matrix \boldsymbol{X} = (\boldsymbol{x}_1, ..., \boldsymbol{x}_N), where M is the largest norm of all patterns.
The state \boldsymbol{\xi} is updated by the following new update rule:
\displaystyle \boldsymbol{\xi}^{\text{new}} = \boldsymbol{X} \, \text{softmax}\left( \beta \boldsymbol{X}^T \boldsymbol{\xi} \right) \ .
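To make this concrete, here is a minimal PyTorch sketch of the energy and the update rule; the variable names and toy dimensions are our illustrative choices, not part of the paper or the repo:

```python
import math
import torch

torch.manual_seed(0)
beta, d, N = 8.0, 16, 32
X = torch.randn(d, N)            # columns are the N stored continuous patterns
xi = torch.randn(d)              # state (query) pattern
M = X.norm(dim=0).max()          # largest norm of all patterns

def energy(xi):
    # E = -lse(beta, X^T xi) + 1/2 xi^T xi + beta^{-1} log N + 1/2 M^2
    lse = torch.logsumexp(beta * (X.T @ xi), dim=0) / beta
    return -lse + 0.5 * (xi @ xi) + math.log(N) / beta + 0.5 * M**2

def update(xi):
    # xi_new = X softmax(beta X^T xi)
    return X @ torch.softmax(beta * (X.T @ xi), dim=0)

xi_new = update(xi)
print(energy(xi_new) <= energy(xi))   # the update never increases the energy
# with well-separated patterns, one step typically already reaches a fixed point:
print(torch.allclose(update(xi_new), xi_new, atol=1e-3))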
We can now compare our new energy function to the discrete counterparts of Krotov & Hopfield and Demircigil et al., which are likewise composed of a sum of a function of the dot products of the patterns \boldsymbol{x}_i with the state \boldsymbol{\xi}:
\displaystyle \text{E} = - \sum_{i=1}^{N} F\left(\boldsymbol{x}_i^T \boldsymbol{\xi}\right) \quad \text{and} \quad \text{E} = -\exp\left( \text{lse}\left(1, \boldsymbol{X}^T \boldsymbol{\xi}\right) \right) \ .
The most important properties of our new energy function are:
- Global convergence to a local minimum (Theorem 2)
- Exponential storage capacity (Theorem 3)
- Convergence after one update step (Theorem 4)
Exponential storage capacity and convergence after one update are inherited from Demircigil et al.
If we now (i) generalize the new update rule to multiple updates at once (\boldsymbol{\xi} is replaced by a query matrix \boldsymbol{Q}), (ii) denote the stored patterns \boldsymbol{X} by \boldsymbol{K}, and (iii) multiply the result by \boldsymbol{W}_V, setting \boldsymbol{V} = \boldsymbol{W}_V \boldsymbol{K}, we arrive at the self-attention of transformer networks.
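This correspondence is easy to check numerically. The sketch below uses the row-vector convention of the transformer literature (so step (iii) reads \boldsymbol{V} = \boldsymbol{K} \boldsymbol{W}_V); all tensor names are illustrative:

```python
import math
import torch

torch.manual_seed(0)
d_k, N, S = 64, 32, 8            # key dimension, stored patterns, queries
beta = 1.0 / math.sqrt(d_k)      # the transformer choice of beta

Q = torch.randn(S, d_k)          # (i) multiple states at once -> query matrix Q
K = torch.randn(N, d_k)          # (ii) stored patterns X denoted by K
W_V = torch.randn(d_k, d_k)
V = K @ W_V                      # (iii) values as a projection of the keys

# Generalized update rule applied to all queries at once ...
Z = torch.softmax(beta * Q @ K.T, dim=-1) @ V
# ... equals the attention of Vaswani et al. (single head, no masking):
Z_attn = torch.softmax(Q @ K.T / math.sqrt(d_k), dim=-1) @ V
print(torch.allclose(Z, Z_attn))
```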
Versatile Hopfield layer (beyond self-attention)
The new insights allow us to introduce a new PyTorch Hopfield layer which can be used as a plug-in replacement for existing layers as well as for applications like multiple instance learning, set-based and permutation-invariant learning, associative learning, and many more.
Additional functionalities of the new Hopfield layer compared to the transformer self-attention layer are (a usage sketch follows the list):
- Association of two sets
- Variable Beta that determines the kind of fixed points
- Multiple Updates for precise fixed points
- Dimension of the associative space for controlling the storage capacity
- Static Patterns for fixed pattern search
- Pattern Normalization to control the fixed point dynamics by norm and shift of the patterns
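A minimal sketch of how this might look in code is given below. It assumes the `hflayers` package from our GitHub repo is installed; the exact argument names and the tuple-based forward convention are assumptions that should be checked against the repo's README:

```python
import torch
from hflayers import Hopfield   # package from github.com/ml-jku/hopfield-layers

# Association of two different sets; argument names are assumptions to be
# verified against the repo's documentation.
hopfield = Hopfield(
    input_size=8,           # dimension of the input patterns
    hidden_size=16,         # dimension of the associative space
    scaling=4.0,            # beta, determines the kind of fixed points
    update_steps_max=3,     # multiple updates for more precise fixed points
)

stored = torch.randn(1, 32, 8)   # stored (key) patterns
state = torch.randn(1, 4, 8)     # state (query) patterns
# Assumed input convention: (stored patterns, state patterns, pattern projection).
out = hopfield((stored, state, stored))
print(out.shape)
```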
If you want to test all these new functionalities in transformer models, you can pass the Hopfield encoder layer and the Hopfield decoder layer to the transformer encoder and transformer decoder modules, for example as sketched below.
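The following sketch shows the idea for the encoder side; the module path and constructor signature are assumptions, so consult the repo for the actual interface:

```python
import torch.nn as nn
from hflayers import Hopfield
from hflayers.transformer import HopfieldEncoderLayer  # assumed module path

# Replace the self-attention of a standard encoder layer with a Hopfield layer;
# nn.TransformerEncoder clones the given layer num_layers times.
encoder_layer = HopfieldEncoderLayer(Hopfield(input_size=512))
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
```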
For more information, see Appendix C of our paper Hopfield Networks is All You Need and our GitHub repo: https://github.com/ml-jku/hopfield-layers