Today, the most popular method for training multi-layer perceptrons (MLPs) is backpropagation. The term "back-propagating errors" was introduced in 1962 by Frank Rosenblatt, but he did not know how to implement it, although Henry J. Kelley had already described a continuous precursor of backpropagation in 1960 in the context of control theory. Modern backpropagation is actually Seppo Linnainmaa's general reverse mode of automatic differentiation (1970) for discrete connected networks of nested differentiable functions. It is an efficient application of the chain rule (derived by Gottfried Wilhelm Leibniz in 1673) to networks of differentiable nodes. In 1982, Paul Werbos applied backpropagation to MLPs in the way that has since become standard. In 1985, David E. Rumelhart et al. published an experimental analysis of the technique. Many improvements have followed in subsequent decades.

During backpropagation, the output values are compared with the correct answer to compute the value of some predefined error function. The error is then fed back through the network. Using this information, the algorithm adjusts the weights of each connection so as to reduce the value of the error function by some small amount. After repeating this process for a sufficiently large number of training cycles, the network usually converges to a state where the error of the calculations is small; in this case, one says that the network has learned a certain target function. To adjust the weights properly, one applies a general method for non-linear optimization called gradient descent, due to Augustin-Louis Cauchy, who first suggested it in 1847. The network calculates the derivative of the error function with respect to the network weights and changes the weights so that the error decreases (thus going downhill on the surface of the error function). For this reason, backpropagation can only be applied to networks with differentiable activation functions.
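The following is a minimal sketch of this procedure for a one-hidden-layer MLP with sigmoid activations, trained on the XOR task with plain gradient descent. The network sizes, learning rate, iteration count, and variable names are illustrative assumptions, not details taken from the text above; the intent is only to show the forward pass, the error computation, the chain-rule backward pass, and the downhill weight update.

```python
# Illustrative sketch of backpropagation with gradient descent (assumed
# hyperparameters and a toy XOR dataset; not a canonical implementation).
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR inputs and their correct answers.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases for a 2-4-1 network.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    # Differentiable activation function, as backpropagation requires.
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # step size for gradient descent (assumed)

for epoch in range(10000):
    # Forward pass: compute activations layer by layer.
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network outputs

    # Predefined error function: here, squared error between output and target.
    err = out - y

    # Backward pass: apply the chain rule to get the derivative of the error
    # with respect to each weight, feeding the error back through the network.
    d_out = err * out * (1 - out)           # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)      # error signal fed back to the hidden layer

    # Gradient-descent step: move each weight downhill on the error surface.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))  # after many training cycles, outputs typically approach [0, 1, 1, 0]
```

Repeating the forward/backward cycle many times is exactly the "sufficiently large number of training cycles" described above: each update shrinks the error a little, and the network usually converges to a state where it has learned the target function.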