Beauty Is a Trap?
Mathematical beauty, anyway...
One of the signs that you’re getting to be an Old Professor is when your former students start showing up as distinguished lecturers. This happened to me on Tuesday night, when Nikhil Srivastava ‘05 gave a public lecture on campus about “AI” generated math proofs. Nikhil started at Union the same term I did, and while he was ultimately a Math and CS major and English minor, he did take several physics classes with me1, so I skipped one of The Pip’s school baseball games to go say hi and hear his talk.

His title was “The Uncanny Valley of Proof,” and concerned the First Proof Project to test the ability of “AI” systems to generate research-level math proofs. There’s been a ton of hype about this, but as he noted in setting up the topic, this has mostly consisted of AI companies feeding questions into their proprietary models and generating press releases any time they got a successful result. But, of course, nobody knows how many overall attempts they made, or what other resources were used, so it’s difficult to assess What It All Means.
First Proof took an interesting approach to this problem, recruiting working mathematicians to sign up with recent proofs that they had completed but not yet published (so the proofs themselves could not possibly be in the training data for the models). They posted ten research problems and encrypted proofs (to establish priority), and issued a public call for attempts to generate proofs of these in a one-week window (after which they would decrypt and publish the proofs).
After collecting a bunch of submissions, they had the resulting proofs evaluated by working mathematicians, who checked the validity of the proofs, and also assessed them in a qualitative manner. It’s weirdly difficult to find a summary of the results— this blog post is the best I’ve come across that isn’t paywalled— but the submissions produced valid proofs for about seven of the ten research problems.
Nikhil’s talk focused on looking in some detail at the proof for the question he submitted2, which is the reason for the “uncanny valley” reference in his title. He acknowledged that it was a correct proof, but described his response to it with a collection of terms that included “repulsed.” In contrast to the elegant proof he and his students had found, which after one big jump redefining the problem in terms of a different type of mathematical object has a kind of “inevitable” logic, he characterized it as a series of steps that are all valid and reasonable, but lack the sort of overarching coherent narrative found in the best human-generated proofs. He described it as producing a bunch of “Yes, you can do that, but why?” steps, that end up at the correct result.
He contrasted this to another recent success, the AI-generated solution of an “Erdős Problem,” which I’m fairly certain was #1196; I forgot to copy down the number, and it’s remarkably difficult to find a non-paywalled straight news story about this that’s readable through the cacaphonous grinding of axes on the part of the author. The best I’ve come across is this article, which characterizes the solution in approximately the same terms Nikhil used in his talk. Apologies if I’ve gotten the wrong number…
He described this proof, which was apparently produced by an LLM with some prompting by an undergrad math major, as more aesthetically pleasing in terms of having the “right” kind of narrative logic. The more interesting note to me, though, was that the key step of the proof involved taking the less beautiful path3. As he described it, there are two “obvious” moves you can make to start attacking the problem, but one of them seems more elegant and attractive to human mathematicians because it involves some cool tricks in the early stages. The successful approach, on the other hand, doesn’t make that move, and just grinds through some tricky but unsexy techniques.
What jumped out to me about these was an analogy to the history of quantum physics4, specifically the development of quantum electrodynamics (QED). When measurements of the Lamb Shift and the anomalous magnetic moment of the electron made it clear that the divergent results of early attempts at QED couldn’t simply be waved off, Niels Bohr was more or less openly rooting for some really radical revision of the underlying laws of physics5. The answer to the problem, however, turned out to be just a bunch of tedious math to show that, in fact, the infinite quantities were of the right type to be canceled out— Schwinger and Tomonaga using tricky but relatively conventional techniques, Feynman via an unconventional and more intuitive route, with Dyson tying the three together. (And it turns out that Ernst Stueckelberg had had the solution a decade earlier, but nobody read it…)
There are a bunch of long-running arguments over whether the QED approach is “ugly” or “beautiful” in mathematical terms, carrying over into similar arguments about the further extensions of field theory, which I don’t really want to get into. I do think there’s potentially an interesting parallel in that Bohr and company were looking in the wrong place for a solution because of essentially aesthetic concerns (it would be much cooler if this required a radical revision of physics) in the same way that a bunch of mathematicians went down blind alleys on Erdős 1196 because they were drawn to shiny cool techniques. And then I wonder about the difference in approaches to the First Proof problem— does the “repulsive” but successful computer-generated proof suggest that the solution to this problem was delayed by the human need for elegance and narrative logic? Are we getting trapped and passing up potential proofs for essentially aesthetic reasons?
This inevitably wanders off into deep waters of “What is the purpose of high-level math research, anyway?” that I don’t really have the time (or mathematical chops) to explore in detail. (That won’t stop me giving this post a slightly clickbait-y title, of course…) It’s interesting to consider, though, especially as tools that seem more likely to generate less satisfying but correct approaches become more powerful and more readily available.
Anyway, the talk ended with a note that First Proof is doing a second round with tighter constraints— AI companies will provide code to be run by the First Proof team so they control the actual prompts and available resources. (The first round was completely unconstrained, and most of the solutions involved “limited human supervision,” which doesn’t really clear things up.) I’ll be interested to see where things go from here, even if I can’t really follow any of the problems being solved…
There was a good deal more to the talk, including a lot of discussion of heuristic models of what the LLMs might be doing to generate these proofs, and that sort of thing. I’m focusing on a narrow slice of it that aligns with my particular interests, but the whole thing was pretty interesting, and it’s always nice to see a former student doing well. Even if it does make me feel like I’m about the crumble into dust…
Addendum: I hadn’t seen this result publicized by OpenAI yesterday before I wrote this (and obviously not before the talk, which was two days ago). I mention it because not long after hitting publish on this post, I saw this ex-Twitter thread which says similar-ish things about the approach taken by the AI in this case.
If you enjoyed this, and would like to receive more of this kind of thing in between various wibbling about the state of academia, here’s a button:
And if you feel so moved, the comments will be open:
It was clear even then that he was destined for big things— he used to doze off in the front row of my Modern Physics course, then wake up and ask really difficult questions.
It’s #4 in the list of problems in the First Proof paper, if you want to look it up.
[robert_frost_420 has entered the chat]
Come on, you knew this was coming…
This was a bit of a pattern for Bohr, who was willing to discard conservation of energy as a general principle in order to explain beta decay, and was sort of disappointed when Pauli solved the problem by introducing the neutrino.


For human mathematicians and scientists, elegance may be not just an aesthetic preference but a practical heuristic.
A human exploring the immense space of possible proofs "by hand" (by brain?) takes a long time to work through any given approach. That means going down a blind alley is very costly, and the need for heuristics is high; approaches that can be explored quickly and discarded right away if they don't pan out.
What makes a proof "elegant?" I'm not a mathematician (my dad was), but I would guess there's a strong correlation with simplicity; ease of understanding; an outcome that seems to flow naturally and inevitably from the premises; "one weird trick" that, once grasped, makes the rest just fall into place. All of those seem to me like fair proxies for "quicker to explore."
But if the cost of exploration falls, then the need for heuristics is reduced, and it becomes more feasible to take a brute-force approach where you just let the machines bash their way through. Our instincts rebel at the results because we're used to a world where cognition is a scarce resource and it feels wasteful.
There's a whole lot more I want to add about parallels in software and the benefits and drawbacks of abstractions, but I think I better stop or I'll be at this all day. :)