Two Cultures and the Academic Literature

A different kind of search history

Apr 27, 2023

Yesterday was an exceedingly long day, and I slept very poorly, so I’m going for something relatively lightweight today, namely a longer version of my response to a brief Twitter thread about library use in academia (screenshotted below because Twitter and Substack have an ongoing beef so embedding doesn’t work):

I ended up replying to this because it runs up against one of the handful of topics where I become Cranky Two Cultures Guy. Specifically, this is a (very mild) example of people in not-STEM fields assuming that the way they do research is completely universal, and that those of us in STEM fields are just Doing It Wrong. In fact the culture and practices involved are very different, and as a result the core skills that need to be part of an undergraduate curriculum are different.

I’m not saying that the assertion here is wrong: we definitely don’t spend much time teaching students how to use library databases. But that’s because the whole structure of the way research works is fundamentally different, in ways that reduce the salience of database-searching skills to the point where they’re not something we would emphasize for STEM majors.

There are two main structural factors here that come into play, which I’ll discuss here in the opposite order from the Twitter thread linked above, just for kicks. The first is that research in the sciences does not place very much importance on the distinction between “primary” and “secondary” sources. My very-slightly-exaggerated attempt to get this point across to colleagues from the other side of campus is that the only “primary” source in science is nature: an experiment or observation you made yourself or a calculation or proof you did on your own. Anything I can find through a library is, more or less by definition, a “secondary source.”

As a result, there’s very little need in research science to track down and scrutinize the original founding documents of any particular line of research, because all published treatments have more or less the same status. And it can be actively counterproductive to go all the way back to the beginning, because most modern discussions are better for learning the basics, because subsequent work has refined our conceptual understanding and the apparatus used to put the original ideas into practice. You wouldn’t want to try to understand electromagnetism by reading the original papers that introduced Maxwell’s Equations, not least because the first published version involves something like 20 individual equations. The modern notation that renders them as four simple and elegant expressions was developed something like 30 years after Maxwell introduced the ideas.
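For reference, the four modern expressions are usually written in vector form as below. The choice of SI units and differential form here is my assumption for illustration, not something the argument depends on.

```latex
\begin{align}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\varepsilon_0} \\
\nabla \cdot \mathbf{B} &= 0 \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \\
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
\end{align}
```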

Unless you’re specifically interested in the historical development of some concept, you’re much better off grabbing a modern textbook or a review article to learn the key ideas, neither of which should require a sophisticated database search to find.

On top of that, the second factor is that research in the sciences tends to be cumulative in a very immediate and short-term sense. That is, a new student will generally join an existing project aimed at some huge and long-term goal, and they will learn enough about how it works to make an incremental advance toward that goal. After which they hand off the long-term project to the next student, who learns and makes their own incremental advance, and so on. This is especially true at the undergraduate level, where the time available for learning and advancing the project is very short.

That kind of progress-by-concatenation does not generally require broad or deep searching of the past literature in the field. The most immediately relevant resources for a new student joining a scientific research group are the thesis of the previous student (and references therein), plus any parallel developments reported by other groups in the very recent past. Most of the searching that needs to be done is tracking direct chains of reference forwards and backwards, which is why tools like the Astrophysics Data System are optimized for that.
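To make "tracking chains of reference forwards and backwards" concrete, here is a minimal sketch in Python. It is not how the Astrophysics Data System is implemented: the paper names, the `REFERENCES` table, and the helper functions are all hypothetical, and the point is only that this kind of search is a short walk along citation links rather than a broad keyword query.

```python
from collections import deque

# Hypothetical toy data: each key cites the papers in its list.
REFERENCES = {
    "thesis_2022": ["thesis_2019", "groupB_2021"],
    "thesis_2019": ["thesis_2016", "review_2010"],
    "groupB_2021": ["review_2010"],
    "thesis_2016": ["review_2010"],
    "review_2010": [],
}


def invert(references):
    """Build the 'cited by' map from the 'cites' map."""
    cited_by = {paper: [] for paper in references}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)
    return cited_by


def follow_chain(start, links, max_hops):
    """Breadth-first walk along links; return papers within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        paper, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for nxt in links.get(paper, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found


# Backwards: what does the previous student's thesis build on?
backward = follow_chain("thesis_2022", REFERENCES, max_hops=2)

# Forwards: who has cited the old review article since?
forward = follow_chain("review_2010", invert(REFERENCES), max_hops=1)

print(backward)
print(forward)
```

A new student starting from the most recent thesis needs only a couple of hops in each direction to cover most of what is immediately relevant.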

This is not to say that there aren’t cases where broader and deeper searches are necessary, but they’re mostly necessary when embarking on a new project, something that’s relatively rare. Especially at the undergraduate level— those kinds of large-scale shifts are generally decided by the PI of a research group, not the most junior students. So, again, given the limited time available, it’s just not a very high priority to teach search skills in the undergraduate curriculum.

Of course, the lack of facility with literature searching does sometimes bite scientists in the ass when launching new projects. There’s a joke that I’ve mostly heard from medicinal-chemistry types about how “a month in the lab can save you a few hours in the library.” There are times when a whole prospective line of research has been superseded by a publication that’s outside the immediately obvious chains of references, and a better search capability might’ve avoided a bunch of wasted effort.

On the other hand, though, there are often issues with past research that make it still worthwhile to check directly. I know a fair number of people who do precision spectroscopy experiments who have run across shockingly large errors in the previously published values of various quantities. It’s important not to give too much weight to the kinds of things you can search up through the library: the only truly authoritative source is nature itself.

So, there’s a deep structural difference here that leads to a lot of confusion across the disciplinary divide. Which, to be fair, goes both ways: one of my previous roles here was Director of Undergraduate Research, and in that capacity, I evaluated a bunch of proposals from and attended talks by students in not-STEM fields, which I tended to find frustratingly incomplete. Many of them seemed like very good examples of preliminary work to me, something that in my world would be passed off to another student to carry on a new analysis that would lead to something more coherent and convincing down the road. But the next year, there would be an entirely different set of students starting from zero on an entirely different set of research questions, and doing adequate to very good preliminary analyses, but no more than that.

That’s just a reflection of the different research practices in STEM and not-STEM, though: on the other side of the gap, there’s more importance attached to deciding what question to investigate, and less investment in the continuity of a project. (This is probably a result of the higher cost of doing scientific research, particularly in experimental and computational fields.) Faculty have long-term research agendas, where they make their own slow and cumulative progress, but they’re not generally bringing students on to make incremental progress in the same way we do in the sciences.

All of this amounts to basically a pair of disciplinary quirks, which ought to be met with a “Hunh. How ’bout that?” followed by a live-and-let-live approach. All too often, though, I run into a dogged insistence that institutional structures should be narrowly designed to support one model or the other, that incentives (financial and otherwise) should be targeted toward rewarding one particular set of practices, and that curricula should be designed to emphasize skills relevant to one group of disciplinary practices. Which is enormously frustrating because a deep dive into academic database searching is about as relevant to the needs of most physics majors as learning to typeset documents in LaTeX would be for literature majors.

There are, as with any two-kinds-of-thing-in-the-world split, edge cases and exceptions to this, but having spent years explaining this to colleagues over and over again, I think this gets at the broad cultural difference between groups of research disciplines. I might possibly come back to the edge cases and exceptions at some later time.

If you simply can’t wait and want to castigate me for not discussing some particular edge case in the above, the comments will be open.

Counting Atoms
