Monday, June 29, 2015

A one-trial replication of Chemla & Spector (2011)

tl;dr: Replication of a somewhat controversial finding in experimental semantics/pragmatics.

How do we go beyond the literal semantics of what someone says to infer what they actually meant? Pragmatic inferences – inferences about language use in context – are an important part of language comprehension, and one of the topics I'm most interested in these days. The case study for much of the experimental work on pragmatics has been scalar implicature (in fact, I taught an entire course on this topic last winter). For example, if I say "some of the students passed the test," you can infer that some but not all of the students passed the test. (If I had meant "all passed the test," I probably would have said that.)

Although these have been taken as canonical examples of pragmatic inferences, things have gotten a bit more complicated in recent years. A number of linguists have argued that these implicatures are actually generated automatically and are part of the grammar, rather than being generated based on expectations about speakers' intended meanings in context. I won't review the whole literature on this issue (it's quite complicated) but one particularly important phenomenon in the debate is the existence of what are called "local" scalar implicatures – that is, implicatures that are generated within an utterance rather than at the level of the entire utterance.

Here's an example, from a very nice paper by Chemla & Spector (2011). C&S showed participants displays like these:

Then they asked participants to make a graded judgment about the truth of sentences with respect to these pictures. The key sentence (translated into English; the original was in French) is "Exactly one letter is connected with some of its circles." Critically, the different pictures were designed so as to be congruent with different interpretations of the experimental items. C&S posited three such readings:

  • "literal" reading: "exactly one letter is connected with some or all of its circles" (C&S say that the others also must be connected with none, but I'm not sure why);
  • "local" reading: "exactly one letter is connected with some but not all of its circles";
  • "global" reading: "exactly one letter is connected with some but not all of its circles, the others are connected with none".

The local interpretation was the critical one for their purposes, because it required the scalar implicature within the sentence (strengthening "some" to "some but not all") but no implicature at the global level, e.g. that the others are connected with none. As an experimental linking hypothesis, they claimed that participants' degree of truth judgment would be proportional to the number of readings that a particular picture supported. As shown above, the different displays that they used rendered different combinations of readings true.
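To make the three readings concrete, here is a minimal Python sketch that encodes them as predicates over a display. The encoding is my own assumption for illustration, not C&S's materials: each letter is labeled "none", "some" (some but not all of its circles), or "all", and the example displays are hypothetical stand-ins for the four picture types.

```python
from typing import Callable, Dict, List

# Hypothetical encoding: one label per letter in the display.
# "none" = connected with no circles, "some" = some but not all, "all" = all.
Display = List[str]


def literal(display: Display) -> bool:
    # Exactly one letter is connected with some or all of its circles.
    return sum(label in ("some", "all") for label in display) == 1


def local(display: Display) -> bool:
    # Exactly one letter is connected with some but not all of its circles.
    return display.count("some") == 1


def global_(display: Display) -> bool:
    # Exactly one letter is connected with some but not all of its circles,
    # and the others are connected with none.
    return display.count("some") == 1 and display.count("all") == 0


READINGS: Dict[str, Callable[[Display], bool]] = {
    "literal": literal,
    "local": local,
    "global": global_,
}


def supported_readings(display: Display) -> List[str]:
    """Names of the readings under which the sentence is true of the display."""
    return [name for name, holds in READINGS.items() if holds(display)]


# Hypothetical displays, one per picture type.
examples = {
    "false": ["none", "none", "none"],
    "literal only": ["all", "none", "none"],
    "local only": ["some", "all", "all"],
    "all readings": ["some", "none", "none"],
}

for name, display in examples.items():
    print(name, supported_readings(display))
```

Under the linking hypothesis described above, the length of the list returned by `supported_readings` is what the graded truth judgment would track.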

C&S found data that strongly supported the availability of local readings. In fact, the local reading was even stronger than the literal reading (this wasn't necessarily a prediction, but made their case stronger):

This experimental finding has been controversial, however. Geurts & van Tiel (2014), following previous work by Geurts that didn't find embedded (local) implicatures, have critiqued this and other papers. And a paper I was involved in, Potts et al. (under review), has a much more extensive take on this issue, as well as a different, more naturalistic paradigm.

But in addition to theoretical questions, every time I talk to people about the C&S finding, they bring up doubts about the paradigm that C&S used, whether various replications show order effects, and whether this effect is general across languages (in French, the original language, their "some of its" was certain de, which isn't even a quantifier, technically). In this post, I'm reporting what I say to people when they mention these worries. In particular, I have replicated C&S's basic finding several times in various classes, often in ways that address the critiques above. Here I'll present a version I ran last summer for a course at ESSLLI 2014.

This was a class demo of Amazon Mechanical Turk, so my method was extremely basic. I took exactly the four images above and showed them to four independent groups of 50 US-based participants (total N = 200), who made judgments about whether the target sentence was true of the picture using a seven-point Likert scale. So this was a one-trial, completely between-subjects design. Note that there were two manipulation checks relating to descriptions of the display (31 participants failed), and we excluded 39 more for doing more than one trial. Final N was 130.
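For readers who want to apply similar exclusions to their own Mechanical Turk data, here is a rough sketch. The field names (`worker_id`, `passed_checks`) and the toy records are hypothetical, and the rule of dropping every trial from a repeat worker is an assumption on my part rather than a description of the actual analysis script.

```python
from collections import Counter
from typing import Dict, List


def apply_exclusions(trials: List[Dict]) -> List[Dict]:
    """Keep trials that passed the checks and come from one-time workers."""
    # Count how many trials each worker submitted (assumed exclusion rule:
    # any worker with more than one trial is dropped entirely).
    trials_per_worker = Counter(t["worker_id"] for t in trials)
    return [
        t
        for t in trials
        if t["passed_checks"] and trials_per_worker[t["worker_id"]] == 1
    ]


# Toy records, made up purely to exercise the function.
toy_trials = [
    {"worker_id": "w1", "passed_checks": True},
    {"worker_id": "w2", "passed_checks": False},  # failed a manipulation check
    {"worker_id": "w3", "passed_checks": True},   # repeat worker
    {"worker_id": "w3", "passed_checks": True},   # repeat worker
    {"worker_id": "w4", "passed_checks": True},
]

kept = apply_exclusions(toy_trials)
print([t["worker_id"] for t in kept])

# The counts reported in the post add up the same way.
assert 200 - 31 - 39 == 130
```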

Here are the data, with 95% CIs:
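If you want to compute intervals like these for your own ratings, one common option is a percentile bootstrap over participants. The sketch below assumes that method for illustration (it is not necessarily how the intervals in the plot were computed), and the example ratings are invented.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci(
    ratings: List[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(ratings)
    # Resample participants with replacement and record each resample's mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(ratings, k=n)) for _ in range(n_boot)
    )
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return statistics.fmean(ratings), boot_means[lo_idx], boot_means[hi_idx]


# Invented 7-point ratings for one condition, for illustration only.
toy_ratings = [1, 1, 7, 7, 7, 1, 7, 1, 1, 7]
mean, lower, upper = bootstrap_ci(toy_ratings)
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With a between-subjects design like this one, each condition's ratings would be passed to `bootstrap_ci` separately.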

We replicate the finding that local implicatures were available at detectable levels (i.e., ratings clearly better than those for false sentences). The magnitude is different from C&S, though: the sentences were judged to be far better for the literal pictures than the local ones. Another interesting aspect of this discussion has been about various different response formats. As I mentioned, we used a 7-point Likert scale, but participants essentially only used the endpoints (as in the Potts et al. paper above). It seems that participants either "see" a reading or don't. They don't seem to be finding multiple readings and judging the picture to match the sentence to a certain extent, or at least they are not doing this in any substantial number. Here's the histogram:

In sum, we replicate Chemla & Spector, in English, with a standard Likert scale, without any fillers, order effects, or extra items to be compared against one another. Some – but not all – participants found an interpretation consistent with local scalar implicature. Code and data available here.