Deep Learning With The Wolf
Day 8: "Sequence to Sequence Learning with Neural Networks." (When Two LSTMs Started Speaking in Tongues.)

The encoder-decoder breakthrough that turned raw text into bilingual banter.

Paper: Sequence to Sequence Learning with Neural Networks — Ilya Sutskever, Oriol Vinyals & Quoc Le (2014)

The one-sentence summary: From ‘I ❤ Cats’ to ‘J'♥ les chats’ — how two LSTMs started talking to each other and taught the world machine translation.


What It’s About

Picture a relay race where Runner #1 takes a message in English, hands the baton to Runner #2, and—without tripping—Runner #2 sprints across the language barrier to deliver it in fault-free French. That, in spirit, is what Sutskever and friends pulled off in 2014: an encoder–decoder LSTM pipeline that transformed sequences into … well, other sequences. It was the first time a single neural network family tree could listen, remember, and speak—no hand-crafted phrase tables required.
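
To make the relay race concrete, here is a minimal PyTorch sketch of the encoder–decoder idea, with toy vocabulary sizes, a single LSTM layer, and invented names; the paper's actual setup was four layers of 1,000 units each, trained on reversed source sentences.

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb_dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            # Runner #1: the encoder reads the source tokens and compresses
            # them into a fixed-size (h, c) state -- the "baton".
            _, state = self.encoder(self.src_emb(src_ids))
            # Runner #2: the decoder starts from that state and predicts the
            # target sequence one token at a time (teacher forcing here).
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
            return self.out(dec_out)  # logits over the target vocabulary

    # Toy usage: a batch of 2 source sentences of length 5.
    model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
    src = torch.randint(0, 1000, (2, 5))
    tgt = torch.randint(0, 1200, (2, 6))
    logits = model(src, tgt)  # shape: (2, 6, 1200)

That single (h, c) "baton" is exactly the bottleneck that later motivated attention: every word of the source sentence has to survive the trip through one fixed-size vector.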


Key Takeaways for Busy Humans

  • End-to-End Everything — Say goodbye to hand-engineered pipelines; data in, translation out.

  • Universal Interface — Any input/output that can be serialized (audio, code, protein sequences) is fair game.

  • Foundation for Attention — The pain of squeezing long sentences into a single vector motivated Bahdanau-style attention one year later, and ultimately the Transformer.

  • Encoder–Decoder as a Mindset — Prompts + completions, image captions, even humanoid-robot task planning all echo this two-brain pattern.


“Wolf Bites” — Skimmable Nuggets

  • The model beat the phrase-based SMT baseline (33.3 BLEU) on the WMT’14 English→French benchmark with a score of 34.8, legendary at the time. (BLEU itself is sketched in the snippet after this list.)

  • Training over the full 12M sentence-pair corpus took about ten days on an eight-GPU machine. Today you could likely replicate the experiment in an afternoon on a single RTX 4090.

  • Google Translate quietly adopted seq2seq in late 2016, causing users worldwide to wonder if the product had been possessed by fluent spirits overnight.
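
For readers who have never met BLEU: it scores a system translation by its n-gram overlap with human reference translations, on a 0–100 scale. A minimal sketch using the sacrebleu package (toy sentences below, not the paper's WMT’14 scoring setup):

    import sacrebleu

    hypotheses = ["the cat sits on the mat"]    # system output, one line per sentence
    references = [["the cat sat on the mat"]]   # one reference stream

    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.1f}")           # corpus-level score on a 0-100 scale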


Notes: The podcasts for this series are generated with Google NotebookLM, and the two podcasters you hear are AI-generated. The sources used to generate today’s “notebook” were: 1) the original paper and 2) this article.


Read the original paper here.

Sources

  1. Sutskever, I.; Vinyals, O.; Le, Q. V. “Sequence to Sequence Learning with Neural Networks.” Advances in Neural Information Processing Systems 27 (2014).

  2. Google AI Blog. “A Neural Network for Machine Translation, at Production Scale.” (2016).

  3. Kilcher, Y. “Seq2Seq Explained.” YouTube, 2020.

#Seq2Seq #MachineTranslation #DeepLearning #AIHistory #TheWolfReadsAI #deeplearningwiththewolf #dianawolftorres #deeplearning #sutskever #sequencetosequencelearning #ilyasutskever
