Wednesday, September 30, 2015

Descriptive vs. optimal Bayesian modeling

In the past fifteen years, Bayesian models have fast become one of the most important tools in cognitive science. They have been used to create quantitative models of psychological data across a wide variety of domains, from perception and motor learning all the way to categorization and communication. But these models have also had their critics, and one of the recurring critiques of the models has been their entanglement with claims that the mind is rational or optimal. How can optimal models of mind be right when we also have so much evidence for the sub-optimality of human cognition?*

An exciting new manuscript by Tauber, Navarro, Perfors, and Steyvers makes a provocative claim: you can give up on the optimal foundations of Bayesian modeling and still make use of the framework as an explicit toolkit for describing cognition.** I really like this idea. For the last several years, I've been arguing for decoupling optimality from the Bayesian project. I even wrote a paper called "throwing out the Bayesian baby with the optimal bathwater" (which was about Bayesian models of baby data, clever right?).

In this post, I want to highlight two things about the TNPS paper, which I generally really liked and enjoyed reading. First, it contains an innovative fusion of Bayesian cognitive modeling and Bayesian data analysis. BDA has been a growing and largely independent strand of the literature; fusing BDA with cognitive models makes a lot of really rich new theoretical development possible. Second, it contains two direct replications that succeed spectacularly, and it does so without making any fuss whatsoever – this is, in my view, what observers of the "replication crisis" should be aspiring to.

1. Bayesian cognitive modeling meets Bayesian data analysis.

The meat of the TNPS paper revolves around three case studies in which they use the toolkit of Bayesian data analysis to fit cognitive models to rich experimental datasets. In each case they argue that taking an optimal perspective – in which the structure of the model is argued to be normative relative to some specified task – is overly restrictive. Instead, they specify a more flexible set of models with more parameters. Some settings of these parameters may be "suboptimal" for many tasks but have a better chance of fitting the human data. And the fitted parameters of these models then can reveal aspects of how human learners treat the data – for example, how heavily they weight new observations or what sampling assumptions they make.

This fusion of Bayesian cognitive modeling and Bayesian data analysis is really exciting to me because it allows the underlying theory to be much more responsive to the data. I've been doing less cognitive modeling in recent years in part because my experience was that my models weren't as responsive as I liked to the data that I and others collected. I often came to a point where I would have to do something awful to my elegant and simple cognitive model in order to make it fit the human data.

One example of this awfulness comes from a paper I wrote on word segmentation. We found that an optimal model from the computational linguistics literature did a really good job fitting human data – if you assumed that it observed data equivalent to something between a tenth and a hundredth of the data the humans observed. I chalked this problem up to "memory limitations" but didn't have much more to say about it. In fact, nearly all my work on statistical learning has included some kind of memory limitation parameter, more or less – a knob that I'd twiddle to make the model look like the data.***

In their first case study, TNPS estimate the posterior distribution of this "data discounting" parameter as part of their descriptive Bayesian analysis. That may not seem like a big advance from the outside, but in fact it opens the door to putting into place much more psychologically-inspired memory models as part of the analytic framework. (Dan Yurovsky and I played with something a bit like this in a recent paper on cross-situational word learning – where we estimated a power-law memory decay on top of an ideal observer word learning model – but without the clear theoretical grounding that TNPS provide.) I would love to see this kind of work really try to understand what this sort of data discounting means, and how it integrates with our broader understanding of memory.
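To make the idea of estimating a discounting parameter concrete, here is a minimal sketch. It is not TNPS's model: it assumes a toy Beta-Bernoulli learner who treats each observation as only `discount` of an observation, Gaussian response noise, and a uniform prior over a grid of discount values. All function names, the noise level, and the example counts are hypothetical.

```python
import numpy as np

def discounted_prediction(n_pos, n_neg, discount, a=1.0, b=1.0):
    """Posterior mean of a Beta(a, b)-Bernoulli learner that counts
    each observation as only `discount` of an observation."""
    return (a + discount * n_pos) / (a + b + discount * (n_pos + n_neg))

def discount_posterior(data, judgments, grid, noise_sd=0.05):
    """Grid approximation to the posterior over the discount parameter,
    assuming Gaussian response noise and a uniform prior on the grid."""
    judgments = np.asarray(judgments, dtype=float)
    log_post = np.empty(len(grid))
    for i, d in enumerate(grid):
        preds = np.array([discounted_prediction(p, n, d) for p, n in data])
        log_post[i] = -0.5 * np.sum(((judgments - preds) / noise_sd) ** 2)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    return post / post.sum()

# Hypothetical (positive, negative) counts shown to a simulated learner
data = [(8, 2), (15, 5), (3, 7), (40, 10)]
# Simulated judgments from a learner who keeps a tenth of the data
fake_judgments = [discounted_prediction(p, n, 0.1) for p, n in data]

grid = np.linspace(0.01, 1.0, 100)
post = discount_posterior(data, fake_judgments, grid)
best = grid[np.argmax(post)]  # most probable discount value on the grid
```

The point of the sketch is only that the discount stops being a hand-tuned knob and becomes a quantity with a posterior distribution, which a richer memory model could then replace.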

2. The role of replication.

Something that flies completely under the radar in this paper is how closely TNPS replicate previously reported empirical findings. Their Figure 1 tells a great story:

Panel (a) shows the original data and model fits from Griffiths & Tenenbaum (2007), and panel (b) shows their own data and replicated fits. This is awesome. Sure, the model doesn't perfectly fit the data – and that's TNPS's eventual point (along with a related point about individual variation). But clearly GT measured a true effect, and they measured it with high precision.

The same thing was true of Griffiths & Tenenbaum (2006) – the second case study in TNPS. GT2006 was a study about estimating conditional distributions for different processes, e.g. given that you've lived X years, how likely is it that you live to Y. At the risk of belaboring the point, I'll show you three datasets on this question: the first from GT2006, the second from TNPS, and the third a new, unreported dataset from my replication class a couple of years ago.**** The conditions (panels) are plotted in different orders in each plot, but if you take the time to trace one, say lifespans or poems, you will see just how closely these three datasets replicate one another. Not just the shape of the curve but also the precise numerical values:
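For readers who haven't seen the GT2006 task, here is a rough sketch of the optimal prediction that the human data get compared against. The posterior over the total is proportional to the prior times a likelihood of 1/t_total (the current value t is assumed to be sampled uniformly from the total span), and the prediction is the posterior median. The Gaussian lifespan prior and its parameters below are placeholder assumptions for illustration, not the values used in any of the three datasets.

```python
import numpy as np

def predict_total(t, prior, totals):
    """Posterior median of t_total given the current value t.
    Likelihood: p(t | t_total) = 1 / t_total for t_total >= t, else 0."""
    likelihood = np.where(totals >= t, 1.0 / totals, 0.0)
    post = likelihood * prior
    post = post / post.sum()
    cdf = np.cumsum(post)
    # first grid value at which the cumulative posterior reaches 0.5
    return totals[np.searchsorted(cdf, 0.5)]

# Candidate total lifespans in years (hypothetical grid)
totals = np.arange(1, 121, dtype=float)
# Assumed Gaussian prior over lifespans; mean and sd are placeholders
prior = np.exp(-0.5 * ((totals - 75.0) / 16.0) ** 2)
prior = prior / prior.sum()

# Predicted total lifespan for people currently aged 18, 50, 80, and 100
predictions = [predict_total(t, prior, totals) for t in (18, 50, 80, 100)]
```

Swapping in a different prior (say, a power law for a quantity like box office grosses) changes the shape of the prediction curve, which is why each condition in the plots looks different.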

This result is the ideal outcome to strive for in our responses to the reproducibility crisis. Quantitative theory requires precise measurement – you just can't get anywhere fitting a model to a small number of noisily estimated conditions. So you have to strive to get precise measures – and this leads to a virtuous cycle. Your critics can disagree with your model precisely because they have a wealth of data to fit their more complex models to (that's exactly TNPS's move here).

I think it's no coincidence that quite a few of the first big data, Mechanical Turk studies I saw were done by computational cognitive scientists. Not only were they technically oriented and happy to port their experiments to the web, they also were motivated by a severe need for more measurement precision. And that kind of precision leads to exactly the kind of reproducibility we're all striving for.

* Think Tversky & Kahneman, but there are many many issues with this argument...
** Many thanks to Josh Tenenbaum for telling me about the paper; thanks also to the authors for posting the manuscript.
*** I'm not saying the models were in general overfit to the data – just that they needed some parameter that wasn't directly derived from the optimal task analysis.
**** Replication conducted by Calvin Wang.

Monday, September 14, 2015

Marr's attacks and more: Discussion of TopiCS special issue

In David Marr's pioneering book, Vision, he proposed that no single analysis provides a complete understanding of an information processing device. Instead, you really need to have a theory at three different levels, answering three different sets of questions; only together do these three analyses constitute a full understanding. Here's his summary of the three levels of analysis that he proposed:

Since 1982, when the book came out, this framework has been extremely influential in cognitive science, but it has also spurred substantial debate. One reason these debates have been especially noticeable lately is the increasing popularity of Bayesian approaches to cognitive science, which are often posed as analyses at the computational theory level. Critiques of Bayesian approaches (e.g., Jones & Love; Bowers & Davis; Endress; Marcus & Davis) often take implicit or explicit aim at computational theory analyses, claiming that they neglect critical psychological facts and that analyses at only the computational level run the risk of being unconstrained "just so" stories.*

In a recent special issue of Topics in Cognitive Science, a wide variety of commentators re-examined the notion of levels of analysis. The papers range from questioning the utility of separate levels all the way to proposals of new, intermediate levels. Folks in my lab were very interested in the topic, so we split up the papers amongst us and each read one, with everyone reading this nice exposition by Bechtel & Shagrir. The papers vary widely, and I haven't read all of them. That said, the lab meeting discussion was really interesting and so I thought I would summarize three points from it that made contact with several of the articles.

1. The role of iteration between levels in the practice of research. 

Something that felt missing from a lot of the articles we read was a sense of the actual practice of science. There was a lot of talk about the independence of levels of analysis or the need for other levels (e.g., in rational process models). But something I didn't see at all in the articles we discussed was any notion of how these philosophical stances would interact with the day-to-day practice of science. In my own work, I often iterate between hypotheses about cognitive constraints (e.g., memory and attention) and the actual structure of the information processing mechanisms I'm interested in. If I predict a particular effect and then I don't observe it, I often wonder if my task was too demanding cognitively. I'll then try to test that question by removing some sort of memory demand.

An example of this strategy comes from a paper I wrote a couple of years ago. I had noticed several important "failures" in artificial language learning experiments and wondered to what extent these should be taken as revealing hard limits on our learning abilities, or whether they were basically just results of softer memory constraints. So I tried to reproduce the same learning problems in contexts with reduced memory demands, e.g. by giving participants unlimited access to the stimulus materials in an audio loop or even a list of sentences written on index cards. For some problems, this manipulation was all it took to raise performance to ceiling level. But other problems were still fairly difficult for many learners even when they could see all the materials laid out in front of them! This set of findings then allowed me to distinguish which phenomena were a product of memory demands and which might constrain the representation or computation beyond those processing limitations.**

2. The differences between rational analysis and computational level analysis. 

I'm a strong proponent of the view that a computational level analysis that uses normative or optimal (e.g. Bayesian) inference tools doesn't have to imply that the agent is optimal. In a debate a couple of years ago about an infant learning model I made, I tried to separate the baggage of rational analysis from the useful tools that come from the computational level examination of the task that the agent faces. The article was called "throwing out the Bayesian baby with the optimal bathwater," and it still summarizes my position on the topic pretty well. But I didn't see this distinction between rational analysis and computational level analysis being made consistently in the special issue articles I looked at.

I generally worry that critiques of the computational level tend to end up leveling arguments against rational analysis instead, because of its stronger optimality assumptions. In contrast, the ideal observer framework – which is used much more in perception studies – is a way of posing computational level analyses that doesn't presuppose that the actual observer is ideal. Rather, the ideal observer is a model that is created to make ideal use of the available information; the output of this model can then be compared to the empirical data, and its assumptions can be rejected when they do not fit performance. I really like the statement of this position that's given in this textbook chapter by William Geisler.

3. The question of why representation should be grouped with algorithm. 

I had never really thought about this before, perhaps because it'd been a while since I went back to Marr. Marr calls level 2 "representation and algorithm." If we reflect on the modern practice of probabilistic cognitive modeling, that label doesn't work at all – we almost always describe the goal of computational level analysis as discovering the representation. Consider Kemp & Tenenbaum's "discovery of structural form" – this paper is a beautiful example of representation-finding, and is definitely posed at the highest level of analysis.

Maybe here's one way to think about this issue: Marr's idea of representation in level 2 was that the scientist took for granted that many representations of a stimulus were possible and was interested in the particular type that best explained human performance. In contrast, in a lot of the hard problems that probabilistic cognitive models get applied to – physical simulation, social goal inference, language comprehension, etc. – the challenge is to design any representation that in principle has the expressivity to capture the problem space. And that's really a question of understanding what the problem is that's being solved, which is after all the primary goal of computational level analysis on Marr's account.


My own take on Marr's levels is much more pragmatic than that of the majority of authors in the special issue. I tend to see the levels as a guide for different approaches, perhaps more like Dennett's stances than like true ontological distinctions. An investigator can try on whichever one seems like it will give the most leverage on the problem at hand, and swapping or discarding a level in a particular case doesn't require reconsidering your ideological commitments. From that perspective, it has always seemed rather odd or shortsighted for people to critique someone on the level of analysis they are interested in at the moment. A more useful move is just to point out a phenomenon that their theorizing doesn't explain...

* Of course, we also have plenty of responses to these critiques....
** I'm painfully aware that this discussion presupposes that there is some distinction between storage and computation, but that's a topic for another day perhaps.

Thanks to everyone in the lab for a great discussion – Kyle MacDonald, Erica Yoon, Rose Schneider, Ann Nordmeyer, Molly Lewis, Dan Yurovsky, Gabe Doyle, and Okko Räsänen.

Peebles, D., & Cooper, R. (2015). Thirty Years After Marr's Vision: Levels of Analysis in Cognitive Science. Topics in Cognitive Science, 7(2), 187-190. DOI: 10.1111/tops.12137