Here’s how: prior to the transformer, you essentially had a set of weighted inputs. You had LSTMs (long short-term ...