# Unimaginable Mathematics

## May 5, 2015

It’s Springtime in NYC, and Guttman statistics students are learning about Sampling Distributions and the Central Limit Theorem.

A sampling distribution.

This has been a tough semester. It’s the first time I’ve taught Statistics B at Guttman. I’ve been reminded repeatedly of Mathews and Clark’s paper where they used APOS Theory to assess students’ understanding of the Central Limit Theorem. The short version of their story is that none of the students they interviewed had any clue what was going on. The authors had hoped to identify the objects and processes that students must construct to understand the Central Limit Theorem. Instead, they found that “none of these eight students was able to discuss the Central Limit Theorem in a meaningful way.” This included A+ students and at least one class where the instructor said they put “major emphasis” on the Central Limit Theorem.

Part of the problem here is that we tend to avoid the difficult problems in education. In most classrooms, statistics is taught procedurally. Procedural understanding is easier to measure, and it’s what our students expect. Can you calculate the standard error? Did you use the correct formula? Did you plug in the correct numbers? Those are easy questions to answer for both the student and the professor. Here’s a more difficult question: do you know what those things mean?

The other problem is that the Central Limit Theorem is really hard. At its core, it’s a theorem about the sampling distribution, which is in itself a very difficult topic.

My class meets 4 times a week, and we spent two weeks discussing the sampling distribution and Central Limit Theorem. In our first discussion, we took a “population” of 5 students in the class, asked them what borough they lived in, constructed all possible samples of size 3, and calculated the proportion from Brooklyn (the estimator) in each of these samples. The students used a carefully scaffolded handout to construct the sampling distribution from this information, and then reflected on how the standard deviation and mean might be used to measure error and bias.

This took about two days. The hardest part for them was just delineating all of the samples of size 3. They had no algorithm to enumerate the samples in an orderly fashion so many students repeated samples or stopped when they ran out of ideas without any clear way to confirm they’d exhausted all the options.
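For what it's worth, here is a minimal sketch of the kind of orderly enumeration the students were missing. The student labels A through E are hypothetical stand-ins (the actual boroughs aren't recorded here), and Python's `itertools.combinations` is just one convenient way to list every sample exactly once.

```python
from itertools import combinations

# Hypothetical labels for the five students in the "population".
population = ["A", "B", "C", "D", "E"]

# combinations() yields each subset of size 3 exactly once, in
# lexicographic order, so no sample is repeated or skipped.
samples = list(combinations(population, 3))

for sample in samples:
    print(sample)

print(len(samples))  # 10 samples in total
```

Because the list comes out in a fixed order, there is a clear way to confirm that all the options have been exhausted: the last sample printed is always the one made of the final three labels.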

Next, we imagined a population of six M&M’s, 3 blue, 1 green, and 2 red. This time, I gave a list of all samples of size 4, and asked them to calculate the proportions that were blue and green in each sample. Again, through a carefully scaffolded process, they constructed sampling distributions for the proportion of green and blue M&M’s and reflected on the error and bias in these distributions.
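The M&M exercise can be sketched in a few lines of Python. This is only an illustration of the counting, not the handout itself; the function name `sampling_distribution` is made up for this example, and each M&M is tracked by its position in the list so that two blue candies count as different objects.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# The population from the exercise: 3 blue, 1 green, 2 red.
population = ["blue", "blue", "blue", "green", "red", "red"]
SAMPLE_SIZE = 4

def sampling_distribution(color):
    """Count how many samples of size 4 give each proportion of `color`."""
    counts = Counter()
    # Sample by index so identical colors are still distinct M&M's.
    for indices in combinations(range(len(population)), SAMPLE_SIZE):
        hits = sum(1 for i in indices if population[i] == color)
        counts[Fraction(hits, SAMPLE_SIZE)] += 1
    return dict(sorted(counts.items()))

blue = sampling_distribution("blue")
green = sampling_distribution("green")

print(blue)   # proportions of blue across the 15 samples
print(green)  # proportions of green across the 15 samples
```

The objects being counted here are samples, not M&M's: each entry says how many of the 15 samples land on a given proportion.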

When you roll a 6-sided die, it’s easy to think of this as a random process with a list of different outcomes. Students don’t think of selecting a sample as a random process with a list of different outcomes. Part of this is because they usually only get to see one sample whereas a die can be rolled several times.

After our first two discussions, I was thinking that students needed some assistance in seeing the sample selection process as something random. I gave them a population of 2 “cat people” and 5 “dog people.” Students were asked to calculate the parameter, and to think about possible outcomes. For example, is it possible for 25% of the sample to be cat people? What about 100% of the sample? Can you describe a sample with an estimator of 50%?

I wrote each of the 35 possible samples on a piece of paper, and came around the class with the samples in a cup. Each student randomly drew one of the samples from the cup, calculated its estimator, and wrote this information on a post-it note. They were then asked to attach their post-it note to the whiteboard so that we created one big sampling distribution together as a class. We repeated this process with different colored post-it notes and a different, more biased sampling method, and then we discussed how the shape and standard error of the two distributions were different.
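As a rough check on what the post-it notes should add up to, the full set of 35 samples can be built in code and summarized. One loud assumption: the sample size is not stated above, and a population of 7 gives 35 samples for either size 3 or size 4. The sketch below assumes 4, which is the size that makes estimators of 25% and 50% possible.

```python
from itertools import combinations
from math import sqrt

# Population from the activity: 2 cat people and 5 dog people.
population = ["cat"] * 2 + ["dog"] * 5
SAMPLE_SIZE = 4  # assumption: not stated in the post (3 also gives 35 samples)

# The parameter: the true proportion of cat people in the population.
parameter = population.count("cat") / len(population)

# One estimator (sample proportion of cat people) per possible sample.
estimators = [
    sum(population[i] == "cat" for i in indices) / SAMPLE_SIZE
    for indices in combinations(range(len(population)), SAMPLE_SIZE)
]

mean = sum(estimators) / len(estimators)
sd = sqrt(sum((e - mean) ** 2 for e in estimators) / len(estimators))

print(len(estimators))                        # number of possible samples
print(round(parameter, 4), round(mean, 4))    # parameter vs. mean of estimators
print(round(sd, 4))                           # spread of the estimators
```

When every sample is equally likely to be drawn from the cup, the mean of the estimators matches the parameter, which is what "no bias" looks like; the standard deviation measures the typical error of a single draw.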

You’d think students were getting it pretty good by now, right? They weren’t, but I had to go on so we launched into a discussion about the Central Limit Theorem. I was tired of creating sampling distributions by hand; it was tedious and sucked the life out of class so we used this Central Limit Theorem app. The setup was to imagine creating random samples of NBA basketball players. According to the NBA Tattoos Tumblr blog, 55% of the 442 players in the 2013-2014 season had tattoos. That gave us a parameter of p = 0.55, and a population of 442 players.

I’m a Trail Blazers fan.

I gave students a handout that helped orient themselves to the information on the website, setting the slider for p = 0.55, and making sense of the different graphs. Once they were comfortable with the app, I had them adjust the slider, slowly increasing the sample size, and taking notes about the standard deviation and the shape of the sampling distribution as they went. The goal was for them to see for themselves that the distribution becomes normal and the standard deviation becomes equal to the standard error.
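What students watched in the app can be approximated with a short simulation. This is a sketch under stated assumptions, not the app's actual implementation: each draw is treated as an independent yes/no with p = 0.55 (ignoring that the real population is only 442 players), and the number of simulated samples, the sample sizes, and the random seed are all arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import pstdev

P = 0.55            # parameter from the post: share of players with tattoos
NUM_SAMPLES = 5000  # assumed number of simulated samples per sample size

def simulate_proportions(n, rng):
    """Draw NUM_SAMPLES samples of size n; return each sample proportion."""
    return [
        sum(rng.random() < P for _ in range(n)) / n
        for _ in range(NUM_SAMPLES)
    ]

rng = random.Random(2015)  # fixed seed so the run is repeatable
results = {}
for n in (5, 20, 80, 320):  # illustrative sample sizes
    proportions = simulate_proportions(n, rng)
    observed_sd = pstdev(proportions)        # spread of simulated estimators
    standard_error = sqrt(P * (1 - P) / n)   # formula students are taught
    results[n] = (observed_sd, standard_error)
    print(f"n={n:>3}  simulated SD={observed_sd:.4f}  "
          f"standard error={standard_error:.4f}")
```

The simulated standard deviation should track the standard error formula closely and shrink as the sample size grows; plotting a histogram of `proportions` for each `n` would show the shape filling in toward a bell curve.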

The app shows you estimators for a random 8 of the samples. This turned out to be the best part of the lesson. As students looked at their friends’ computers, they wanted to know why they’d “gotten it wrong,” why their friends had different answers. Of course, the point is that the selection of samples is random so they didn’t do it wrong. Just like the roll of a die comes up different every time, no two students are going to get the exact same random selection of samples. They were at least starting to understand that fact.

Finally, we had a discussion more focused on when you can and cannot use the Central Limit Theorem. I laid out the 3 conditions of the theorem and gave students different scenarios in which they could check the conditions.

As you can see, the sampling distribution was a “major emphasis” in my class, but the depressing fact is that students still don’t understand it. After some dismal test performances, I decided to return to the topic one last time on Friday. I prepared a carefully laid out worksheet showing all of the samples piling up on a graph so that students could literally see the sampling distribution. One table of especially diligent students got through the handout and made sense of it, but I was shocked by just how difficult it was for them.

Even my best students had trouble graphing the probabilities in the sampling distribution. Instead of counting the 12 samples in which 67% of the respondents said yes, they counted the 24 respondents who said yes. In other words, they were thinking of respondents rather than samples as the objects in the sampling distribution. They got hung up on some very simple calculations, and it seemed like two parallel narratives developed. On the one hand, there was a narrative about calculating numbers, a sort of obstacle course of formulas and complicated hoops to jump through in order to get the “right” answer. Parallel to this was a narrative about statistical meaning. It was like these two narratives were on separate paths that never crossed. They got further down the calculation path, but never stopped to ask how the calculations they had just done might inform or give meaning to the statistics.

In hindsight, I should have broken chapter 7 into two tests, the first on the sampling distribution and the second on the Central Limit Theorem and confidence intervals. It’s crazy that all three of those concepts are in one chapter. I also could have made my lessons more focused, introducing a single idea each class and pounding home that one point one day at a time. I could have spent two classes having students construct sampling distributions without discussing error, bias, sampling methods, etc. Later, we could have used sampling distributions to talk about error and bias.

There are always things you could have done differently, but there is also something especially difficult about the sampling distribution. From the perspective of APOS theory, the sampling distribution treats samples as objects. Until this point, the objects in a probability distribution are always respondents, but now they are sets of respondents. According to Dubinsky and crew, this means that students have to develop an object understanding of a sample.

I wonder if this is really the issue. My students are comfortable acting on samples, removing or adding respondents to create new samples. Plus, a sample is really just a subset of a fixed size. Is that such a difficult concept to reify? It seems like the bigger issue is that students need to be able to conceive of the set of all possible samples of a fixed size. That is a very big set.
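To get a feel for how big that set is, here is a small sketch that counts the possible samples with `math.comb`. The population of 442 comes from the NBA example above; the sample sizes and the helper name `count_samples` are arbitrary choices for illustration.

```python
from math import comb

POPULATION_SIZE = 442  # NBA players in the 2013-2014 season, per the post

def count_samples(population_size, sample_size):
    """Number of distinct samples of a fixed size (order ignored)."""
    return comb(population_size, sample_size)

# A classroom-sized example: 7 people, samples of 3.
print(count_samples(7, 3))  # 35

# The same count for the NBA population at a few illustrative sample sizes.
for sample_size in (3, 10, 30):
    count = count_samples(POPULATION_SIZE, sample_size)
    print(f"samples of size {sample_size}: a {len(str(count))}-digit number")
```

Even samples of size 3 from 442 players number in the millions, and the count grows explosively from there, so nobody is going to write these on slips of paper and put them in a cup.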

APOS theory is based on Piaget’s concept of reflective abstraction. Its founder, Ed Dubinsky, has long been an advocate of using the programming language ISETL as a way to teach mathematics. Seymour Papert is another pupil of Piaget who has also advocated for the use of programming in math education. Papert’s version of Piaget is a little strange; at times he sounds more like a Vygotskyite, especially in the significance he attributes to cultural artifacts. For example, in Mindstorms Papert describes a “typical experiment in combinatorial thinking” in which children are asked to form “all possible combinations of beads of assorted colors.” He notes that “it is really quite remarkable that most children are unable to do this systematically and accurately until they are in the fifth or sixth grades,” and then provides the following analysis.

The task of making families of beads can be looked at as constructing and executing a program, a very common sort of program, in which two loops are nested: Fix a first color and run through all possible second colors: then repeat until all possible first colors have been run through. For someone who is thoroughly used to computers and programming there is nothing “formal” or abstract about this task. For a child in a computer culture it would be as concrete as matching up knives and forks at the dinner table. Even the common “bug” of including some families twice (for example red-blue and blue-red) would be well-known. Our culture is rich in pairs, couples, and one-to-one correspondences of all sorts, and it is rich in language for talking about such things. This richness provides both incentive and a supply of models and tools for children to build ways to think about such issues as whether three large pieces of candy are more or less than four much smaller pieces. For such problems our children acquire an excellent intuitive sense of quantity. But our culture is relatively poor in models of systematic procedures. Until recently there was not even a name in popular language for programming let alone for the ideas needed to do so successfully. There is no word for “nested loops” and no word for the double-counting bug. (pg. 22)

In short, students have never encountered anything like a set of all combinations before. It is difficult for them to imagine this set.
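Papert's two nested loops, and the double-counting "bug" he mentions, look something like this in Python. The three bead colors are hypothetical; this is a sketch of the idea rather than anything taken from Mindstorms.

```python
colors = ["red", "blue", "green"]  # hypothetical bead colors

# Naive nested loops: fix a first color, run through all second colors.
# Both red-blue and blue-red appear, which is the double-counting "bug".
ordered_pairs = []
for first in colors:
    for second in colors:
        if first != second:
            ordered_pairs.append((first, second))

# Starting the inner loop just past the outer index visits each
# unordered pair exactly once.
unordered_pairs = []
for i in range(len(colors)):
    for j in range(i + 1, len(colors)):
        unordered_pairs.append((colors[i], colors[j]))

print(len(ordered_pairs))    # 6
print(len(unordered_pairs))  # 3
```

The only difference between the two versions is where the inner loop starts, which is exactly the kind of systematic procedure that has no name in everyday language.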

In his short story, The Library of Babel, Borges describes a library “composed of an indefinite, perhaps infinite number of hexagonal galleries.” The books in this library are composed of “twenty-five orthographic symbols.” William Goldbloom Bloch has a wonderful book of essays exposing the “unimaginable mathematics” behind Borges’ story. In the first of these essays, he works through a rough estimate of the number of books in the library: 25^1,312,000 (each book has 410 pages of 40 lines with about 80 characters per line, or 1,312,000 characters in all), a number that he notes is many orders of magnitude larger than the number of particles in the universe.

Somehow, it feels all too easy, even anticlimactic, as though instead we should have had to write pages and pages of dense, technical high-level mathematics, overcoming one complex puzzle after another, before arriving at the answer. But most of the beauty–the elegance–of mathematics is this: applying potent ideas and clean notation to a problem much as the precise taps of a diamond-cutter cleave and husk the dispensable parts of the crystal, ultimately revealing the fire within. (pg. 17)

As Bloch notes, “the number of books in the library, although easily notated, is unimaginable.”

It seems that the problem with the sampling distribution is just the same. Although easily notated, it is unimaginable. At best, I can show my students a small sampling distribution for a population of 7 and sample sizes of 3.

A “small” sampling distribution.

This example completely contradicts the point of the sampling distribution. If there were only 7 people in the population then you would just talk to all 7 of them. There would be no need to select a sample and use statistical inference. The precise point of the sampling distribution is that we cannot survey every person in the population so we need to know how reliable a small sample of these people will be. Just as children struggle to systematically arrange beads of different colors in Papert’s example, my students struggle to conceive of the sampling distribution. It’s unimaginably large and they have no cultural artifacts or prior knowledge to build upon. What we really need is a programming language that is visual and accessible in which students could develop a concrete understanding of the sampling distribution through systematic thinking.