On GPT-3: meta-learning, scaling, implications, and deep theory. The scaling hypothesis: neural nets absorb data & compute, generalizing and becoming more Bayesian as problems get harder, manifesting new abilities even at trivial-by-global-standards scale. The deep learning revolution has begun as foretold.
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
The wooden monk, a little over two feet tall, ambles in a circle. Periodically, he raises a gripped cross and rosary towards his lips and his jaw drops like a marionette’s, affixing a kiss to the crucifix. Throughout his supplications, those same lips seem to mumble, as if he’s quietly uttering penitential prayers, and occasionally the tiny monk will raise his empty fist to his torso as he beats his breast. His head is finely detailed, a tawny chestnut colour with a regal Roman nose and dark hooded eyes, his pate scraped clean of even a tonsure. For almost five centuries, the carved clergyman has made his rounds, wound up by an ingenious internal mechanism hidden underneath his carved Franciscan robes, a monastic robot making his clockwork prayers…
“Ain’t I a woman?” asked American abolitionist Sojourner Truth in 1851. In terms of human worth and value, was she not the equal of any white woman? Well, no. Not according to modern face recognition software. As computer scientist Joy Buolamwini discovered when researching AI bias some 170 years later, the algorithm thought Sojourner was a man.
When Sherrington described the human brain as the enchanted loom in the mid-20th century, the Jacquard loom featured in his prose had been one of the most complex mechanical devices ever invented for over a hundred years. It used a system of punched cards that encoded complex patterns to be woven into textiles. The punched cards devised for the Jacquard loom would later find wider use in early computers, also programmed with punched cards. This is where software was born.
Given the place the Jacquard loom held in the history of mechanical craftsmanship, it’s no surprise that Sherrington imagined the human brain, still the most complex system we know of in the universe, as a system of looms weaving ephemeral patterns into memories and cognition.
Decades have passed, and the complexity of microprocessors used in Internet-scale computing systems dwarfs the complexity of even the largest Jacquard patterns. Today, we imagine usurping the capabilities of the human brain with software instead of looms.
We’ve gotten closer, but I’m not sure that computers as we know them will get us there yet. Inventing the computer and the deep neural network may still be one of the first few steps in replicating the magic of the enchanted loom, and in this post, I want to explore the future that I imagine in our steps forward.
We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.
Artificial intelligence in healthcare is often a story of percentages. One 2017 study predicted AI could broadly improve patient outcomes by 30 to 40 percent. Which makes a manifold improvement in results particularly noteworthy.
In this case, according to one Israeli machine learning startup, AI has the potential to boost the success rate of in vitro fertilization (IVF) by as much as 3x compared to traditional methods. In other words, at least according to these results, couples struggling to conceive who use the right AI system could be multiple times more likely to get pregnant.
The Centers for Disease Control and Prevention defines assisted reproductive technology (ART) as the process of removing eggs from a woman’s ovaries, fertilizing them with sperm, and then implanting them back in the body.
The overall success rate of traditional ART is less than 30%, according to a recent study in the journal Acta Informatica Medica…
Once you unleash it on large data, deep learning has its own dynamics, it does its own repair and its own optimization, and it gives you the right results most of the time. But when it doesn’t, you don’t have a clue about what went wrong and what should be fixed. In particular, you do not know if the fault is in the program, in the method, or because things have changed in the environment.
What makes us different from all these things? What makes us different is the particulars of our history, which gives us our notions of purpose and goals. That’s a long way of saying when we have the box on the desk that thinks as well as any brain does, the thing it doesn’t have, intrinsically, is the goals and purposes that we have. Those are defined by our particulars—our particular biology, our particular psychology, our particular cultural history.
Recent results in language understanding using neural networks have required training hardware of unprecedented scale, with thousands of chips cooperating on a single training run. This paper presents techniques to scale ML models on the Google TPU Multipod, a mesh with 4096 TPU-v3 chips. We discuss model parallelism to overcome scaling limitations from the fixed batch size in data parallelism, communication/collective optimizations, distributed evaluation of training metrics, and host input processing scaling optimizations. These techniques are demonstrated in both the TensorFlow and JAX programming frameworks. We also present performance results from the recent Google submission to the MLPerf-v0.7 benchmark contest, achieving record training times from 16 to 28 seconds in four MLPerf models on the Google TPU-v3 Multipod machine.
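The abstract's motivation for model parallelism can be made concrete with a back-of-the-envelope sketch. Under pure data parallelism, a convergence-limited global batch size caps the number of usable replicas (each replica needs at least one example per step); sharding each replica's model across several chips lifts that cap. The numbers and function names below are illustrative assumptions, not figures from the paper:

```python
# Sketch: why a fixed global batch size caps pure data parallelism,
# and how adding model parallelism restores scaling to more chips.

def max_data_parallel_replicas(global_batch_size: int,
                               min_per_replica_batch: int = 1) -> int:
    """With data parallelism alone, each replica must see at least
    `min_per_replica_batch` examples per step, so the replica count
    is bounded by the (fixed) global batch size."""
    return global_batch_size // min_per_replica_batch

def chips_usable(global_batch_size: int,
                 model_parallel_degree: int,
                 min_per_replica_batch: int = 1) -> int:
    """Each data-parallel replica is itself sharded across
    `model_parallel_degree` chips, multiplying the usable chips
    without changing the global batch size."""
    replicas = max_data_parallel_replicas(global_batch_size,
                                          min_per_replica_batch)
    return replicas * model_parallel_degree

# A convergence-limited global batch of 1024 saturates at 1024 chips
# under pure data parallelism...
print(chips_usable(1024, model_parallel_degree=1))  # 1024
# ...but sharding each replica's model across 4 chips fills a
# 4096-chip mesh like the TPU-v3 Multipod.
print(chips_usable(1024, model_parallel_degree=4))  # 4096
```

This is the arithmetic behind the abstract's claim: once the per-replica batch hits its floor, only model parallelism (or larger batches, which can hurt convergence) lets additional chips contribute.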
In surveys of AI “experts” on when we are going to get to human level intelligence in our AI systems, I am usually an outlier, predicting it will take ten or twenty times longer than the second most pessimistic person surveyed. Others have a hard time believing that it is not right around the corner given how much action we have seen in AI over the last decade.
Could I be completely wrong? I don’t think so (surprise!), and I have come up with an analogy that justifies my beliefs. Note, I started with the beliefs, and then found an analogy that works. But I think it actually captures why I am uneasy with the predictions of just two, or three, or seven decades, that abound for getting to human level intelligence. It’s a more sophisticated and detailed version of the story about how building longer and longer ladders will not get us to the Moon.
The analogy is to heavier than air flight.
All the language that follows is expressing that analogy.
I’m going to consider a fairly unpopular idea: most efforts towards “explainable AI” are essentially pointless. Useful as an academic pursuit and topic for philosophical debate, but not much else.
Consider this article a generator of interesting intuitions and viewpoints, rather than an authoritative take-down of explainability techniques.
InstaHide (a recent method that claims to give a way to train neural networks while preserving training data privacy) was just awarded the 2nd place Bell Labs Prize (an award for “finding solutions to some of the greatest challenges facing the information and telecommunications industry.”). This is a grave error.
For many scientists, including myself, having a black-box structure prediction tool is not sufficient to declare the protein folding problem solved. A solution requires an in-depth understanding of the mechanisms that determine protein structure. Whether or not AlphaFold can contribute to identifying these mechanisms is a question that scientists can only start to examine, and only if AlphaFold becomes sufficiently accessible and inspectable for critical examination by outside experts. I hope this will happen, and in fact I am optimistic that it will happen: the problem is important enough to deserve a serious effort by everyone involved. AlphaFold is not the end of the quest for a solution of the protein folding problem, but it could well turn out to be the beginning of a new chapter in the story.