How Artificial Intelligence Could Change Schools & Change How We Test Students
Greg Toppo | February 19, 2025
Among other distinctions, Kristen DiCerbo can lay claim to being one of the first people on the planet to come face-to-face with the educational potential of generative artificial intelligence.
In the fall of 2022, months before the public got a glimpse of ChatGPT, DiCerbo, a learning scientist and chief learning officer at Khan Academy, got access to a beta version of OpenAI’s GPT-4 model. The startup needed Khan Academy’s help training it to pass the Advanced Placement biology exam, a requirement dreamed up by Microsoft co-founder Bill Gates, who wanted improved performance as a condition of handing OpenAI more funding.
Khan Academy founder Sal Khan and DiCerbo negotiated a partnership with OpenAI, and just five months later, their AI-powered Khanmigo tutoring bot debuted. Last summer, Khan Academy launched an AI writing coach.
Nearly two years in, DiCerbo remains bullish on the possibilities of AI tutoring, cheerfully engaging critics about the limitations of the technology, even as by all measures it evolves and improves.
Much of the press for Khanmigo has been positive: late last year, 60 Minutes produced an upbeat feature on Khan Academy’s efforts — host Anderson Cooper called Khanmigo’s potential “staggering,” but tempered the observation by adding, “It’s still very much a work in progress.”
Other media accounts have challenged Khan’s predictions that AI will revolutionize education anytime soon, with a Wall Street Journal reporter a year ago observing that Khanmigo didn’t consistently know how to round answers or calculate square roots and “typically didn’t correct mistakes when asked to double-check solutions.”
Khan Academy has said improvements are ongoing, but that at least a few errors are likely to persist. The organization stresses that Khanmigo remains “imperfect” and “still evolving.”
In March, DiCerbo will appear at South by Southwest EDU, alongside Curriculum Associates’ Kristen Huff and Akisha Osei Sarfo of the Council of the Great City Schools to discuss how AI can improve school assessments. The panel will be moderated by The 74’s Greg Toppo, who spoke recently with DiCerbo in a wide-ranging interview.
They talked about Khanmigo, its critics and why she feels “cautiously optimistic” about the role of AI in education.
The interview has been edited for length and clarity.
You’ve been with Khan Academy now for almost five years, and it’s been an eventful time. You’ve spent a lot of that time creating and improving Khanmigo. What are the latest developments?
We have learned a lot in what is coming up on two years since Khanmigo launched. In terms of what students are doing, we definitely see some interesting things we didn’t necessarily expect. Students who are English language learners really like and use the supports in other languages. We probably shouldn’t have been surprised, but we always need to be reminded that it’s important to give explicit instruction on how to use new technology and tools, and what that looks like. For students, how do you ask good questions? And for teachers, how do you integrate it? So both professional learning for teachers and supports for students have been important things that we’ve added over time.
The other thing is that we have found that Khanmigo as a tutor works best when it is paired with educational content we have already created. It is better integrated and has lower error rates when it’s using, and has reference to, the existing problems that were written and verified by people — and not just the problems, but the [step-by-step] hints and the answers that already exist in our system. When it can reference those, Khanmigo is better. And when students are just working on the practice that is part of Khan Academy generally, they are using Khanmigo as an assistant and as a help to get unstuck.
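For readers curious what “pairing the tutor with verified content” can look like in practice, here is a minimal sketch of the retrieval-grounding idea in Python. The item bank, function name and prompt wording are invented for illustration; Khan Academy has not published Khanmigo’s actual implementation.

```python
# A minimal sketch of retrieval-grounded tutoring: before the model answers,
# the prompt is packed with the human-written problem, hints, and answer,
# so the model paraphrases verified material instead of improvising.
# All names here (VERIFIED_ITEMS, ground_tutor_prompt) are hypothetical.

VERIFIED_ITEMS = {
    "alg1-linear-07": {
        "problem": "Solve for x: 3x + 5 = 20",
        "hints": ["Subtract 5 from both sides.", "Divide both sides by 3."],
        "answer": "x = 5",
    }
}

def ground_tutor_prompt(item_id: str, student_message: str) -> str:
    """Build a tutoring prompt anchored to verified, human-written content."""
    item = VERIFIED_ITEMS[item_id]
    hints = "\n".join(f"- {h}" for h in item["hints"])
    return (
        "You are a math tutor. Guide the student with questions; "
        "do not just give the answer.\n"
        f"Problem: {item['problem']}\n"
        f"Verified hints (use these, in order):\n{hints}\n"
        f"Verified answer (for checking only): {item['answer']}\n"
        f"Student says: {student_message}"
    )

print(ground_tutor_prompt("alg1-linear-07", "I'm stuck on the first step."))
```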
When we talked last year, you used that word “unstuck.” You guys have come in for some criticism from critics like Ben Riley and Dan Meyer, who say Khanmigo gets math wrong, among other things. Meyer last year said he’s become a kind of pro bono consultant for you guys. [DiCerbo laughs.] You’re familiar with the criticisms, and I wonder: How have they landed? And have they had an effect on the product?
Dan has very good classroom experience and is extremely knowledgeable about teaching math. So many times, the things he says align to conversations internally that we’re already having. And the things he says are things that we end up changing and doing. We always appreciate criticism that helps us improve and move on. A lot of our work has been on things like working to better evaluate math accuracy, improve it, and get the balance right between how much Khanmigo gives help versus asks questions — all of the things we’re working to tune and get right in that sweet spot for what the student needs.
Dan actually just this week had another Khanmigo post. The thing he misinterprets about us is that he thinks we’re trying to replace teachers, and he thinks we don’t value teachers. That’s what that whole post was about. And that is just not how we see what we’re doing. We see Khanmigo as a tutor that’s also working in the same ecosystem, but the teacher is fundamental to this whole process.
I saw the 60 Minutes piece with Anderson Cooper a while back and I wonder how that landed.
First of all, the writing piece they highlighted is something we’re pretty excited about. Very often in schools, kids do writing assignments and teachers end up with a huge stack of writing. As Sarah [Robertson], who’s our product manager, said in the piece, she had to limit herself to only 10 minutes of feedback per essay, and still it would take her hours and hours as a secondary school teacher to grade all of these essays — and then the students get them back two weeks later. That’s not immediate feedback. So the idea that we can potentially provide more immediate feedback to students on their writing is pretty interesting to us.
There’s a lot of concern over cheating.
We can say, “Hey, we’re going to flag this piece,” which Anderson did in his demo — he just cut and pasted in a whole bunch of content. We can say, “Hey, we don’t know where this content came from. We’re not going to make the judgment, but teacher, here’s a flag for you to check on.” Not surprisingly, we got a lot of queries from school districts asking about getting access.
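As a rough illustration of the kind of flag DiCerbo describes, a writing tool can treat any single edit that inserts an implausibly large block of text as a possible paste. The sketch below is hypothetical: the event format, threshold and wording are invented, not Khan Academy’s.

```python
# A crude sketch of paste flagging: if a single input event inserts far more
# text than a person plausibly types in one burst, raise a flag for the
# teacher to review. Event shape and threshold are hypothetical.

PASTE_THRESHOLD = 200  # characters inserted in one event

def flag_large_pastes(events: list[dict]) -> list[str]:
    """Return a review note for every suspiciously large single insertion."""
    flags = []
    for e in events:
        if e["type"] == "insert" and len(e["text"]) > PASTE_THRESHOLD:
            flags.append(
                f"{len(e['text'])} characters inserted at once at position "
                f"{e['pos']} -- origin unknown, please review."
            )
    return flags

events = [
    {"type": "insert", "text": "The mitochondria ", "pos": 0},
    {"type": "insert", "text": "x" * 1200, "pos": 17},  # simulated bulk paste
]
print(flag_large_pastes(events))
```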
When I was writing the piece last year about IBM Watson and the effort to make it into a tutor, you expressed a cautious optimism that despite all the failures we’ve seen, this time was different. It’s been almost a year now. I wonder if your feelings have changed about AI tutoring generally and Khanmigo specifically?
I would still characterize how I feel as cautiously optimistic. I don’t think this is The Golden Ticket that’s going to save us all and be the sole reason that educational outcomes improve. I do think it still can be an important tool in the toolbox.
Does the change in presidential administrations have any bearing on your work, given that President Trump’s got an apparent interest in AI and support from big tech, specifically OpenAI and Meta?
There is a lot of noise about what may or may not happen. We are basically sticking to “What are our technology partners doing, and what are we able to then partner with them to build?” And we will see what actually comes to fruition and deal with it if and when anything actually happens. We’re not counting on anything either way.
My last question about this topic is the earthquake that happened with the Chinese AI startup DeepSeek. The interpretation that I’ve been hearing is that it has caused supreme havoc at places like OpenAI. I wonder if any of this has redounded to you guys?
Not specifically the DeepSeek piece, but it’s just part of what we have thought is likely to be the future — it’s just a little bit sooner than perhaps we thought. The models themselves become a commodity. Even since we launched, prices have come down dramatically, and we’re able to offer what we do at significantly lower cost; that’s just likely to continue. And it’s not going to be the models themselves that are the “moat” or the differentiator — it’s going to be what people build with them.
Is it even in the realm of possibility that you would work with a company like DeepSeek?
Well, DeepSeek’s model is open-source, so you can install it on your own machine. And that’s part of the concern about security and privacy with the app that, of course, has ties to the Chinese government. Then there’s the question about the model itself: as an open-source model, how does it perform? I would not rule out us using open-source models from different sources, but they would have to be evaluated, like all our models are, for security and privacy and their performance.
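The evaluation gate she mentions can be prototyped in a few lines. The sketch below uses Hugging Face’s transformers pipeline with a placeholder model and a toy two-item math spot check; a real review would also cover security, privacy and far larger item banks.

```python
# A sketch of gating an open-source model behind an accuracy check before
# adoption. The model name and test items are placeholders for illustration.
from transformers import pipeline

MATH_ITEMS = [
    {"prompt": "Q: What is 7 * 8? A:", "answer": "56"},
    {"prompt": "Q: What is the square root of 144? A:", "answer": "12"},
]

def passes_spot_check(model_name: str, threshold: float = 0.9) -> bool:
    """Score the model on a tiny item bank and compare to a pass threshold."""
    pipe = pipeline("text-generation", model=model_name)
    correct = 0
    for item in MATH_ITEMS:
        out = pipe(item["prompt"], max_new_tokens=8)[0]["generated_text"]
        # The pipeline echoes the prompt; check only the continuation.
        if item["answer"] in out[len(item["prompt"]):]:
            correct += 1
    score = correct / len(MATH_ITEMS)
    print(f"{model_name}: {score:.0%} on spot-check items")
    return score >= threshold

# Swap in any open-source checkpoint, e.g. a locally downloaded DeepSeek
# model, once security and privacy review clears it.
passes_spot_check("distilgpt2")
```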
Let’s talk about South by Southwest. The session we’re doing is titled “How AI Makes Assessment More Actionable in Instruction,” which doesn’t exactly roll off the tongue. But it gets to an interesting idea, which is that AI can make assessments better: more invisible, more customizable, and help teachers adapt instruction. I wonder what you’re seeing in terms of the ways AI is moving into that field?
It’s interesting, because the assessment conversation has lagged a bit behind the learning conversation when it comes to AI. But it seems to be picking up speed this year, both at South by Southwest and at ASU+GSV.
Traditionally we’ve had multiple-choice tests. You and I know there’s the whole game-based, simulation-based movement. What does AI let us do? The idea of a conversation-based assessment is interesting. What if the assessment looks like what happens when a teacher sits down next to a student and says, “Explain your thinking. How did you get to that?” There’s a conversation there. And that could potentially be an interesting way of adding to assessments that we already have. Of course, there would be questions: Is that standardized? Because different kids might get different questions as they engage in this conversation. How do we deal with that when we’re talking about high-stakes assessment?
The last thing I think is interesting is helping teachers and parents make sense of assessment data and get recommendations. Can AI help with that? Instead of getting this printout that says, “Your student got a 580 on this,” and you’re like, “What does that even mean? What should I do?” If you could have a conversation about that, that might be an interesting piece. We’ve been exploring that in something we have called Class Snapshots and recommendations that allow teachers to talk about their students’ Khan Academy performance. What else might they assign? How might they group students based on those kinds of things?
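A quick way to prototype the “explain this score” conversation is to hand the raw number and a description of the scale to a chat model. The sketch below uses OpenAI’s chat API; the student name, the 580 scale note and the system prompt are invented for illustration and do not describe any real assessment’s scoring guide.

```python
# A sketch of turning a raw score report into a plain-language explanation
# for a parent, with one suggested next step. Requires OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

def explain_score(student: str, score: int, scale_note: str) -> str:
    """Ask a chat model to interpret a score given a description of its scale."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You explain assessment results to parents in plain "
                        "language and suggest one concrete next step."},
            {"role": "user",
             "content": f"{student} scored {score}. Scale: {scale_note}. "
                        "What does this mean, and what should we do?"},
        ],
    )
    return resp.choices[0].message.content

print(explain_score("Jordan", 580,
                    "hypothetical 100-800 scale, grade-level mean around 550"))
```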
In the past couple of months I’ve been playing around with AI tools that summarize and analyze big chunks of text and YouTube videos and whatnot. It strikes me that we are going to become so used to having a tool like this break things down for us that if schools can’t help us break our students’ performance down, we’re going to be disappointed. Is my cart ahead of the horse?
I always try to figure out if I’m in a bubble or not, because I feel the same way. I know lots of people who are similarly getting into the habit of putting any large amount of information into an AI tool and getting a summary. I’m not quite sure how broad-based that is when we think about all of the parents out there and all the schools, but that is what I’m seeing, and it might become an expectation in the near future.
Is there something on the horizon that you are looking at that maybe others aren’t paying attention to — good, bad or other?
Quite a while ago, there was a video of Sal and his son doing that little demo [of Khanmigo]. We’re starting to get to a place where the AI is seeing what the student is working on, and is able to interact with that and move forward. I’m pretty excited about that.