Deep Learning With The Wolf
Day 8: "Sequence to Sequence Learning with Neural Networks." (When Two LSTMs Started Speaking in Tongues.)

The encoder-decoder breakthrough that turned raw text into bilingual banter.

Paper: Sequence to Sequence Learning with Neural Networks — Ilya Sutskever, Oriol Vinyals & Quoc Le (2014)

The one-sentence summary: From ‘I ❤ Cats’ to ‘J'♥ les chats’ — how two LSTMs started talking to each other and taught the world machine translation.


What It’s About

Picture a relay race where Runner #1 takes a message in English, hands the baton to Runner #2, and—without tripping—Runner #2 sprints across the language barrier to deliver it in fault-free French. That, in spirit, is what Sutskever and friends pulled off in 2014: an encoder–decoder LSTM pipeline that transformed sequences into … well, other sequences. It was the first time a single neural network family tree could listen, remember, and speak—no hand-crafted phrase tables required.
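
To make the relay race concrete, here is a minimal PyTorch sketch of the encoder–decoder idea, with toy vocabulary sizes, a single LSTM layer, and invented names; the paper's actual setup was four layers of 1,000 units each, trained on reversed source sentences.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb_dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            # Runner #1: the encoder reads the source tokens and compresses
            # them into a fixed-size (h, c) state -- the "baton".
            _, state = self.encoder(self.src_emb(src_ids))
            # Runner #2: the decoder starts from that state and predicts the
            # target sequence one token at a time (teacher forcing here).
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
            return self.out(dec_out)  # logits over the target vocabulary

    # Toy usage: a batch of 2 source sentences of length 5.
    model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
    src = torch.randint(0, 1000, (2, 5))
    tgt = torch.randint(0, 1200, (2, 6))
    logits = model(src, tgt)  # shape: (2, 6, 1200)

That single (h, c) "baton" is exactly the bottleneck that later motivated attention: every word of the source sentence has to survive the trip through one fixed-size vector.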


Key Takeaways for Busy Humans

  • End-to-End Everything — Say goodbye to hand-engineered pipelines; data in, translation out.

  • Universal Interface — Any input/output that can be serialized (audio, code, protein sequences) is fair game.

  • Foundation for Attention — The pain of squeezing long sentences into a single vector motivated Bahdanau-style attention one year later, and ultimately the Transformer.

  • Encoder–Decoder as a Mindset — Prompts + completions, image captions, even humanoid-robot task planning all echo this two-brain pattern.


“Wolf Bites” — Skimmable Nuggets

  • The model beat the phrase-based SMT baseline (33.3 BLEU) on the WMT’14 English→French benchmark with a score of 34.8, legendary at the time. (BLEU itself is sketched in the snippet after this list.)

  • Training over the full 12M sentence-pair corpus took about ten days on an eight-GPU machine. Today you could likely replicate the experiment in an afternoon on a single RTX 4090.

  • Google Translate quietly adopted seq2seq in late 2016, causing users worldwide to wonder if the product had been possessed by fluent spirits overnight.
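
For readers who have never met BLEU: it scores a system translation by its n-gram overlap with human reference translations, on a 0–100 scale. A minimal sketch using the sacrebleu package (toy sentences below, not the paper's WMT’14 scoring setup):

    import sacrebleu

    hypotheses = ["the cat sits on the mat"]    # system output, one line per sentence
    references = [["the cat sat on the mat"]]   # one reference stream

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.1f}")           # corpus-level score on a 0-100 scale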


Notes: The podcasts for this series are generated with Google NotebookLM, and the two podcasters you hear are AI-generated. The sources used to generate today’s “notebook” were: 1) the original paper and 2) this article.


Read the original paper here.

Sources

  1. Sutskever, I.; Vinyals, O.; Le, Q. V. “Sequence to Sequence Learning with Neural Networks.” Advances in Neural Information Processing Systems 27 (2014).

  2. Google AI Blog. “A Neural Network for Machine Translation, at Production Scale.” (2016).

  3. Kilcher, Y. “Seq2Seq Explained.” YouTube, 2020.

#Seq2Seq #MachineTranslation #DeepLearning #AIHistory #TheWolfReadsAI #deeplearningwiththewolf #dianawolftorres #deeplearning #sutskever #sequencetosequencelearning #ilyasutskever
