Reflections of a CS251 TA

2024-12-13

For context, CS 251 is a course at Carnegie Mellon about theoretical computer science. Topics covered include DFAs, Turing Machines, undecidability, P-NP, Cook and Karp reductions, and a bunch more miscellany. Though the content of the course did impact my teaching and my experience, I will try to steer away from actually discussing mathematics here: my main goal is to focus on the “soft skills” side of teaching.

I want to start off by saying that I am so grateful for the opportunity to work with such wonderful students, fellow TAs, and an amazing professor. It may sound cliche (and it is), but TAing CS251 has genuinely been such a life-changing experience. I have made so many new friends through TAing and deepened some connections with some old friends, and I’m really excited to see what the next semester has in store.

Expectations and surprises

Of course, the main impetus for me to TA the course was work with students. And there were a lot of surprises in store for me in this regard. Having selected quite a busy schedule this semester, I frankly wanted to minimize the number of hours I could put in while still doing a good job for my students.

To establish some background, there are three difficulty levels of recitation that students can self-select in. Cantor is the slowest recitation, meant to reinforce the definitions and emphasize the process needed to solve common genres of problems in the course. Godel is the fastest, with a heavy focus on developing intuition for the kinds of arguments made in the course rather than working through some of the more boring details. And Turing is somewhere in-between. (These are just my interpretations of the roles each level should play. But I strongly believe this is the correct perspective, even if it is not universally understood.)

Obviously I picked the fastest level. This was for several reasons:

I thought that Godel students would be more prepared, meaning I would not have to spend as much time helping them out with mentor meetings and reviewing the fundamentals of the course.
I hate going through boring details and trivial verifications, it is far more interesting to emphasize the core ideas than to formally write out “This transformation is polynomial time because adding a constant number of vertices is polynomial time” on the board for every problem. This is the kind of thing I think would be really helpful to explicitly write out for Cantor students, but I do not think it is interesting. I personally would rather not teach this way, though I can if the need arises.
There are a few people every semester who are really mathematically mature and take the time to learn interesting results outside of the course syllabus, such as Rice’s Theorem. I find talking with these students to be a delight, and the vast majority (if not all) of them sign up for Godel.

Having taught a semester, I want to look back at my initial reasoning:

While this might be true on average (the jury is still out on this one), it certainly was not the case for me. Without delving into too much detail: I expected most of my students to find the course either easy or at worst moderately difficult. I was kind of blindsided to find out that, despite signing up for the most advanced recitation section, many students still found the course somewhat difficult. In retrospect, it should not have been that surprising. Two of my very intelligent friends struggled in the course, one of them clinching a B after a very intensive final review session I taught, and another one dropping the course entirely to try again next semester. It is by no means an easy course, no matter how well prepared you are.

In some part, my experience TAing a Godel section was a bit anomalous. Most other TAs did not have as rough an initial start as I did. But no TA had the easy-going experience that I hoped and expected would be tied to Godel. Nobody will have an entire mentee class that doesn’t struggle with the course, the laws of probability and the difficulty of the course prevent that. So going forwards, this will not be as much of a factor when deciding which level to teach. The difference is not as dramatic as I expected or hoped for.

It goes without saying I am very proud of my students. Despite a rough start (which is a very common experience in the first few weeks of CS251, no matter what level of recitation), they worked hard to learn the content and improved their performance dramatically. It is very rewarding to see your students grow, to see negative trends turn into positive ones, and I am glad that I got to experience it even while teaching the fastest level of recitation.
Theoretically I should put more emphasis on the proof templates for e.g. reductions in the course. But my co-TA (Sebastian) would go through these verifications in what I found to be excruciating detail, which is part of why I could get away with flying through the problems and just paying homage to the various details. (Although to be fair, the first time we introduced poly-time checks, I said “make sure to check your transformation is poly-time, it is free points” about 7 times with great emphasis. So these homages are not at all brief.)
This was the part of the course that went most as expected. I had a student who really knew a lot about mathematical logic, and talking about computation theory was quite refreshing.

Depth of the course

Segueing from Point 3, now I have to leverage my first criticism against the course (and unfortunately will have to discuss a little bit of mathematics here). The course is quite difficult because it has difficult problems, but it has a shocking lack of depth. Many of my conversations with these mathematically mature people were about this issue. I understand and even agree with the perspective that the course should not be too deep, since it is merely an introductory course. (It is not CS251’s fault that we do not run a graduate level computation theory course more often, and these students would have been much better served taking such an unfortunately nonexistent course.)

It is quite a shame, because CS251 has such a good presentation of the basics: DFAs, Turing Machines, undecidability, reductions. But very often, we stop before discussing the really deep or interesting results. We start with surface level definitions, and besides the fact that we can show specifically that HALTs is undecidable, we do not prove many shocking, interesting, or powerful results. And if the goal was to make the course more approachable, then this would be an understandable motive. But we throw away depth to make this course accessible, only to haphazardly throw away accessibility for a few cool Putnam-flavored problems.

Don’t get me wrong, I find the Putnam to be fun (even if I am quite terrible at it). But I really do not see a good reason to put hard homework problems that can take an upwards of two hours, when these problems do not develop the core ideas of the course, and omit fundamental results such as Rice’s Theorem. We could be making the course easier and yet more cohesive by putting standard results as homework exercises, rather than random miscellany.

This leads to CS251 being a very frustrating experience for people already familiar with this kind of mathematics.¹ A lot of effort is spent to solve some admittedly cool problems, but very little additional understanding of the core ideas of the course is gained. (I actually have more mixed feelings about the last third of the course just being random miscellany, since putting on filler means the exams are easier to study for, something which makes a lot of sense for a CS core.)

Content

Professor Ada loves to say that CS251 is about the people and not the content. I think this rings doubly true for me, because frankly the content is not my cup of tea. To be clear, I think it is very cool! But it is like milk tea to me: I don’t really drink it. I’d rather enjoy a nice cup of fruit tea with some lychee jelly.

But to continue the analogy, I’m not really drinking the milk tea myself, I’m the one serving it. Of course, to serve good tea, I have to taste it myself (which means I do spend a little bit of time thinking about all the content of CS251). But it is a much different experience, and it is very fun to see people acknowledge that boba tea is fire, even if they’re drinking a very different kind of boba tea and won’t be inhaling it every day in the quantities that I am.

You know, if I could magically shape the world to my every whim, I would be teaching algebra rather than e.g. Karp reductions. But this is just not the reality we live in, and frankly the difference in content is not that important to me. It would be a much different story if God decided to punish me and make me, say, a CS210 TA and I had to manually grade code (ugh). But the content of CS251 is actually cool even if it’s not literally my absolute favorite (algebra). I enjoy teaching it, and the social benefits of TAing CS251 more than make up for the content of an algebra course being slightly more interesting.

Speaking of…

The social aspect is the best part of TAing a large intro-level CS course. Hanging out with other TAs during socials and getting to be an absolute menace on Slack are experiences that you just don’t get when you’re the sole TA of an upper-div math or CS course. And getting to know the students has also been really nice.

We’ve had houseparties, gone climbing, baked together,² and held a puzzlehunt.³ I’ve gotten to know so many cool people and had so much fun over these events. There are a lot of funny stories here, but perhaps I will share them at a different time.

Office Hours

Teaching office hours was quite surprising. For context, I taught Sunday office hours, the second earliest time after the homework released that they were offered. So compared to (say) Tuesday, the night right before the homework was due, there was a lot less traffic.

The first surprise was the regularity at which students showed up to office hours. I’d have imagined people would irregularly show up once or twice to office hours, popping in whenever convenient. However, quite the opposite happened: most of the people at Sunday office hours were regulars. As someone who uses office hours on a “as needed” basis⁴ I found that quite surprising. But objectively this is quite effective, because usually people like me only realize they need help when they are beyond cooked for a class.⁵

The second surprise was how hard the problems could be. Let me be frank, when I took the course I did not find the problem sets to be unreasonably hard. In retrospect this is because I was fed so many of the solutions and the structure of the course meant that, even if I couldn’t solve a problem, I would still be fine. And the point is that the course is structured in a way where the problems are objectively hard (I mean come on, what is a Putnam A6 doing here), but because of the support systems they don’t feel impossible. But having TAed for a semester, these problems are actually quite hard. Having to explain them and also explain how I’d come up with this sort of solution is quite a doozy.

And a lot of times, I’d have to engage with creative attempts at solutions by students. Many times they didn’t work. Sometimes they would’ve theoretically worked, but these approaches were overly complicated and missed the key insights that the problem was trying to deliver. Occasionally they would work and even be better than the official solution. And having to tell these apart in maybe ten seconds can be intellectually demanding.

On the flip side, I do not have to have the solution and all its details in mind when I respond to a student. My students are very smart people, after all. And sometimes it is just about saying the right string of words to give a student the little push they need. I must confess there were times I suggested ideas whose conclusions I did not know (I hope I was clear about it when this was the case). But with such talented students, sometimes a hint about the kind of ideas that might work is enough. Sometimes, I deliberately do not give too much away and suggest the conclusions of ideas when I think it would be more helpful for the student’s growth.

And I really hope I have made the right judgment calls here. Giving too much away can deprive students of valuable learning opportunities, while giving too little detail or being too cryptic could cause a student to waste an hour of their life. And these are split second decisions: there is only so long I can say, “give me a second to think”, before I have to give an answer.

Students I have talked to have universally come away with the impression that I seem very confident in what I am saying when I teach office hours (and recitation). I am glad that it looks like I know what I am doing, but to be honest, I am never too sure. I’ve erred on the side of giving too much away because I think the problems in this course are particularly hard. But I think I should dispel this myth somewhere, if only for my own self-satisfaction.

One final note about actually teaching the problems. I think the most important trait of teaching is empathy, in the sense that for some given problem, you should be able to imagine the first things a student might try and common mistakes they might make. And working with students during office hours, trying to predict where their approaches might fail and responding to their insights on the fly, really helped me develop this skill.

Grading

Here is where my approach diverges from the average TA’s approach in this course, and of course, I think my approach is better.

I firmly believe that people should be graded proportional to how much of the problem they understood. In other words, “could someone clueless have written the same thing as you?” If you are asked to show some problem $L$ is NP-Hard, and you just write “reduce $L$ to $K$ ” for some NP-Hard $K$ and nothing else, you clearly ran out of time to actually solve the problem. This is clearly not a claim that you know how to do the problem, it is just initial setup.

Yet we give e.g. 4 points out of 10 for writing the absolute trivial “reduction” of the identity map. This does not show anything besides that the student was conscious for at least 2 of the last 8 weeks of the course, quite honestly. I would be much less inclined to give partial like this for two reasons:

It rewards structural understanding over conceptual understanding far too much. Our specific template of “how to solve a reduction problem” should not matter, all that should matter is that the student can correctly state a reduction and prove it. Yet here we are rewarding students massively for knowing how we write reduction arguments even if they have little to no insight on the actual reduction required for this problem.
Actually realizing that the rubrics are designed this way would change a student’s behavior a lot. It would lead to students trying to game the rubric by making sure to write the right flavor of incorrect solutions without the key insights. I think we should just encourage students to solve as many questions as they can, and not have them try to game the test for as much partial as they possibly could. And the best way to do that is not to give large amounts of partial.

I will also say that having a rubric may even be antithetical to this perspective of awarding a student based on how much of the solution they understood. But with so many students and so many different graders, I think a rubric is necessary for consistency. (But they are usually too narrowly designed, in my opinion.)

On the other hand, I think we are far too liberal in handing out deductions. Trivial off-by-one or similar errors, particularly those easily fixable, should not be cause for any deductions in my eyes. I am far more generous towards essentially correct solutions, because exam stress can cause people to mix up small details they wouldn’t otherwise. In these scenarios I think we ought to offer a little more grace and leniency when the error does not actually affect the proof.

However, I am only one TA among twenty-something. And in a course staff where majority and precedent rule, all I can do is try to argue for leniency on deductions. So of course, I do grade with the same rubrics as the other TAs, because a TA who disagrees with a rubric is fine, a TA who has gone rogue is not.

Conclusions

TAing for CS251 was really fun. When I started, I expected this to be something I only did for one semester just to see what it was like. While I may have my disagreements with some parts of how the course is run, I really respect the time and care put into it, and that people with more experience than me have very good reasons for their preferences and beliefs. (And I think it would be a shame if, after TAing for an entire semester, I had no thoughts whatsoever about the way the course was run.) All in all, the course is run very effectively. If CS251 has gotten a lot of students to appreciate some of the ideas of theoretical computer science and gotten some very strong students to think deeply about the best way to teach it, then by every metric CS251 has been a huge success. The professor is great, my co-TAs are great, and most importantly of all, the students are fantastic.

I’m really forward to TAing again in the spring! See you all next semester!