I started working at Netflix in February 2025. Like most new engineers in a new role, I was given an onboarding task, something simple and straightforward designed to get me familiar with the codebase. Onboarding tasks are meant to be lightweight: work you can finish in a relatively short amount of time and walk away from with a quick win. My onboarding task fit all those criteria. It was well-scoped and the context was clear.

I dove in, got my development environment set up, and made the code change. I wrote unit tests and integration tests to verify everything worked as expected. All green. Everything looked good.

While working on the task, I noticed something odd. The API I was modifying took two arguments that appeared to be duplicates. It looked like redundant code that could be cleaned up. So I thought, “Why not fix this while I’m here? Kill two birds with one stone and make a good impression.” So I did. All the existing tests still passed after I made this optimization, and everything looked good.

I opened my first pull request, and honestly, I was proud of it. The PR was reviewed and approved by everyone on the team who had context on the service. I merged it after approval and the change went live. As someone who loves building things that reach people in real time, I was thrilled. This wasn’t just any service; it was part of Netflix’s streaming infrastructure. My code was out there, running, affecting millions of users.

But that excitement didn’t last long. As the change rolled out to different regions, we got paged. Some users were experiencing streaming issues. The engineer on call investigated and traced the problem back to my pull request, specifically to that “optimization” I had made. My PR was immediately rolled back, and I was notified. My heart sank.

After my own investigation, I discovered that the “redundant” argument wasn’t a mistake. It was there by design.
An upstream service depended on that exact structure, and when I removed it, I broke their integration, which in turn broke streaming for some users. No one on my team knew that the dependency on the upstream team was structured that way, which is why the PR had sailed through review.

I had to present the incident to my team and later to my org. That was my first presentation at Netflix, by the way. Talk about an introduction. During the presentation, I focused on what I could have done differently: how I shouldn’t have assumed the code was redundant, how I should have asked more questions, and so on. I showed up taking full responsibility and blame for everything that had happened.

Right after that first presentation, I got feedback urging me to present future incidents differently. While taking responsibility is expected, the goal of the review was not to assign blame. Multiple people reviewed the PR, so this wasn’t just on me. What mattered more was asking: What can we put in place to prevent this from happening again? How can we work better with upstream teams to understand how they’re using our service?

That shift in perspective stuck with me. I walked away with several takeaways that have shaped how I approach development ever since:

Keep changes small. One PR should do one thing. If you’re making a large change, break it into small chunks. This limits the blast radius of potential issues, makes rollbacks easier, and makes reviews more effective. It’s much easier for teammates to carefully review 3-4 files than 54.

Don’t make assumptions. That “obviously redundant” code might be there for a reason you don’t know about. Always ask. Reach out to people who might have context.

Mistakes will still happen. You can follow all the best practices and still cause incidents. That’s software development.
What matters is that you take responsibility, learn from it, and work to prevent repeating it.

My first PR at Netflix didn’t go how I expected. But looking back, I’m grateful for how my team and I handled it. The incident taught me more in two weeks than I could have learned from months of smooth sailing.

If you’ve ever shipped something that broke production, you’re not alone. We’ve all been there. The key is to grow from it and use it as a learning opportunity, not just for you but for your whole team and company.

Uma