📜 Paper: Order Matters: Sequence to Sequence for Sets
✍️ Authors: Oriol Vinyals, Samy Bengio, Manjunath Kudlur
🏛️ Institution: Google Brain
📆 Date: 2015
What This Paper Is About
We use sequence-to-sequence models all the time—for translation, summarization, and code generation. They assume the input and output are ordered sequences. But here’s the problem:
Not all data is ordered.
Not all tasks care about order.
But our models always do.
This 2015 paper challenged that assumption and posed a fascinating question:
What happens when you use sequence-to-sequence models to predict sets?
Sets have no natural order. So, if your model insists on choosing one, it might (a toy sketch after this list makes the mismatch concrete):
Overfit to arbitrary patterns in the order
Penalize correct predictions just because the order is different
Fail to generalize, even when it “understands” the data
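Here is that mismatch in miniature. This is my own toy illustration, not code from the paper: a per-position sequence loss punishes a prediction that contains exactly the right set elements but emits them in a different order, while an order-free comparison does not.

```python
# Toy illustration (not from the paper): a position-wise sequence loss
# penalizes a prediction that is the correct *set* in a different *order*.

target = [2, 5, 9]            # the set {2, 5, 9}, written down in one arbitrary order
pred_same_order = [2, 5, 9]
pred_other_order = [9, 2, 5]  # same set, different order

def positional_loss(pred, target):
    """Count position-wise mismatches, the way a token-level loss would."""
    return sum(p != t for p, t in zip(pred, target))

def set_loss(pred, target):
    """Order-free comparison: zero whenever the two sets coincide."""
    return 0 if set(pred) == set(target) else 1

print(positional_loss(pred_same_order, target))   # 0 -> rewarded
print(positional_loss(pred_other_order, target))  # 3 -> penalized despite being "right"
print(set_loss(pred_other_order, target))         # 0 -> an order-free metric agrees it's correct
```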

Why It Still Matters
We often say machine learning models are “brittle” or “opaque.”
This paper shows one reason why: sometimes it's not the architecture.
It's that we're asking the model to care about something that shouldn't matter.
By exploring tasks where order is irrelevant—like predicting the members of a set, or classifying unordered features—Vinyals et al. revealed a critical blind spot in deep learning:
Sequence models are sensitive to permutations, even when they shouldn’t be.
And if you’re not careful, they’ll learn to solve the wrong problem really well.
What They Did
They ran experiments on synthetic data and real-world tasks, like:
Predicting numbers in an unordered list
Sorting digits
Classifying set membership
And they tested three strategies:
Random Order: Train on arbitrary permutations.
Fixed Order: Always present data in the same (possibly meaningless) order.
Learned Order: Let the model decide the optimal order during training (sketched in code after the findings below).
They found that:
Models trained with random or fixed orders performed worse.
Allowing the model to learn an order improved generalization and accuracy.
Permutation-invariance is hard to teach with sequential models—but essential in certain tasks.
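A minimal sketch of the "learned order" idea, under my own simplifying assumptions (a stand-in scoring function instead of a real decoder, and exhaustive enumeration of permutations, which only works for tiny sets): during training, score every ordering of the target set and keep the one the current model finds cheapest.

```python
# A minimal sketch (not the authors' code) of letting the model pick the order:
# score every permutation of the target set under the current model and
# train only on the cheapest one.
from itertools import permutations

def sequence_loss(model_score, ordered_target):
    """Per-ordering loss: lower means the model prefers this ordering."""
    return -model_score(ordered_target)

def best_order_loss(model_score, target_set):
    """Pick the target ordering the current model finds easiest (min loss)."""
    return min(sequence_loss(model_score, list(order))
               for order in permutations(target_set))

def toy_model_score(seq):
    # Hypothetical stand-in for a decoder's log-probability: this "model"
    # happens to prefer ascending order.
    return -sum(abs(a - b) for a, b in zip(seq, sorted(seq)))

print(best_order_loss(toy_model_score, {3, 1, 2}))  # 0: the ascending ordering wins
```

Enumerating orderings is factorial in the set size, so this only illustrates the shape of the objective; any practical version has to search or sample orderings rather than list them all.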
Core Insight
“Sequence models implicitly assume an order. If your task doesn’t, you’re introducing a modeling bug.”
In modern parlance: You’ve added spurious inductive bias—a bias toward something irrelevant to the actual task.
Modern Relevance
This paper helped spark new directions in:
Set-based learning (e.g., Deep Sets, PointNet; a minimal sum-pooling sketch follows this list)
Permutation-invariant architectures
Attention models that aggregate unordered input
Graph networks and transformers designed for structure rather than sequence
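For flavor, here is a permutation-invariant aggregation in the Deep Sets spirit. This is a toy sketch of the general recipe, not the published model: every element passes through the same embedding function, the embeddings are pooled with a symmetric sum, and only the pooled result reaches the readout, so the output cannot depend on input order.

```python
# Toy Deep Sets-style aggregation: shared per-element map, symmetric pooling.

def phi(x):
    """Shared per-element 'embedding' (a toy 2-feature map)."""
    return (x, x * x)

def rho(pooled):
    """Readout applied to the pooled representation."""
    return sum(pooled)

def set_representation(elements):
    # Sum-pooling is symmetric, so any permutation of `elements` gives the same output.
    pooled = [sum(feature) for feature in zip(*(phi(e) for e in elements))]
    return rho(pooled)

print(set_representation([1, 2, 3]))  # 20
print(set_representation([3, 1, 2]))  # 20, identical regardless of order
```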
Even in today’s era of LLMs, it’s still a cautionary tale:
Transformers love order. But the world isn’t always a sentence.
Memorable Quote
“We empirically show that the order of the target sequence can make a significant difference in model performance.”
Or more bluntly:
“Your model might fail, not because it’s dumb—but because it’s obedient.”
Podcast Note:
🎙️ Today’s podcast was generated using Google NotebookLM and features AI podcasters.
Editor’s Note
This paper changed the way I think about training objectives. It’s not enough to give your model the right input and hope for the best—you also have to make sure you’re not sneaking in the wrong incentives.
It’s like giving someone a recipe and grading them on how fast they stir, instead of whether the soup tastes good.
Read the original paper here.
Additional Resources for Inquisitive Minds:
Bash Content: Order Matters: Sequence to Sequence for Sets Summary (19 May 2024).
SciSpace Open Access: Order Matters: Sequence to Sequence for Sets.
Distilled AI (Aman.AI), Primers: Order Matters.
Coming Tomorrow: Day 30 🎉
🧠 Machine Super Intelligence Discussion
A reflective ending to the series: What happens when the models don’t just help us think—but start thinking bigger than we do?
#WolfReadsAI #SequenceModels #DeepLearningBias #OriolVinyals #GoogleDeepMind #PermutationInvariance #SetLearning #DeepSets #AIModelBehavior #InductiveBias