Analysis: Why this tutoring ‘moment’ could die if we don’t tighten up the models
Mike Goldstein | November 27, 2023
In a new Aspen Economic Strategy Group report, Jonathan Guryan and Jens Ludwig argue schools are bungling the rollout of high-dosage tutoring: “When schools are faced with the possibility of change, they tend to do fewer of the hard things that will help students and more of the easier things.”
So what happens next?
In a March column in The 74, Kevin Huffman warned: “I worry that policymakers will pretend high-dosage tutoring is happening at scale and then, when student outcomes do not measurably improve, declare that it hasn’t worked.”
So what’s the answer for scaling up at quality? Proven good models need to become great, so when they scale and inevitably dilute, they “merely” retreat back to: good. We must make it easier to be a good or great tutor. And that requires unusual “within program” research and development. In an essay published just last week, the Overdeck Foundation’s Pete Lavorini made that very case, noting there are “a number of exciting innovations underway to lessen the implementation burden without sacrificing effectiveness, by adjusting the high-impact tutoring ‘formula.’ ”
Let me describe what tutor innovation looks like in real life. First, you need decent scale. When I started Match Tutoring in 2004, we had 45 tutors (living literally inside the school, on our top floor). My friend, economist Matt Kraft, wrote in The 74 about how measuring that program’s impact launched his career studying tutoring. But 45 people is just not enough educators to easily A/B test “what works for individual tutors.”
Last year, I met a math educator, Manan Khurma, who founded a math tutoring company in India called Cuemath, with 3,300 tutors. I asked whether I could, with a few colleagues, (carefully) try new ideas to see what works for his thousands of students across the world. Manan said yes; he was interested in anything empirically valid that made tutoring better.
Second, you need a “problem of practice.” We zoomed in on a common problem, familiar to many educators: student talk! Some kids, especially if confused, are reluctant to speak up, to share what they’re thinking. Common Core and the National Council of Teachers of Mathematics both emphasize the need for math discourse, but teacher training in this area hasn’t led to kids speaking up more.
How to change this?
My colleague Carol Yu wondered whether a Fitbit-type device, a “Talk Meter,” might help, or whether it would annoy kids or teachers.
We started small, enlisting a few kids and tutors to try a prototype. An AI bot would patrol a tutorial and, roughly 20 minutes in, a little box would pop up on the screen. It told teacher and student what the talk ratio was, just like a Fitbit offers your step count when you glance at it. If either party was talking too much, they’d adjust.
The early signals were promising! So we ran a rigorous randomized controlled trial with 742 Cuemath teachers and enlisted some research help from Stanford’s Dora Demszky. This is often a third step: Enlist a scholar to bolster your measurement efforts.
The results were strong. In a forthcoming journal article, Dr. Demszky will describe the full experiment, but the punchline is student reasoning increased by 24%, and the talk ratio converged on 50-50 between kid and tutor — exactly what we wanted. Tutors asked better questions, and “built” on what kids said. Both students and tutors liked the Talk Meter (it led to lighthearted, warm interactions as well). Introverts particularly improved.
Fourth, you can layer experiments on top of one another. One we’re trying now tests whether one-on-one coaching would build on the TalkMeter’s success.
Should other programs build their own TalkMeters or tutor coaching efforts? That’s not our claim (though when I shared the TalkMeter result with friends leading other prominent tutoring organizations, several said, “OMG, we should do this”). There’s a key distinction that matters for scale. A technology intervention like TalkMeter is context-specific. And a human intervention like coaching is talent-specific.
I learned that lesson 14 years ago. We launched a teacher coaching program in New Orleans, with a wonderful educator named Erica. I enlisted Matt Kraft to measure it. He found large gains for teachers. Then we added coaches. The impact was diluted, a finding he has written about.
The point here is that high-quality experiments, often in partnership with scholars, can help specific program models vault to greatness, as a way to counteract inevitable dilution at scale.
While we co-sign the Guryan/Ludwig desire to “push” schools to do hard things, we should also make hard things easier, so that “great programs” combined with “merely solid” execution still have “good” impact. (Of course, nothing can overcome shoddy execution.)
That’s the only way this high-dosage tutoring movement will survive and expand.
Mike Goldstein is co-founder of the Math Learning Lab at Cuemath in India and the founder of Match Education in Boston.