My First Code At Netflix Caused An Incident

I started working at Netflix in February 2025.

Like most new engineers in a new role, I was given an onboarding task, something simple and straightforward designed to get me familiar with the codebase and get a quick win.

February 2025. Los Gatos, California. First time in the Netflix office.

Onboarding tasks are meant to be simple and lightweight: essentially a task you can finish in a relatively short amount of time for a quick win.

My onboarding task fit all those criteria. It was well-scoped and the context was clear.

I dove in, got my development environment set up and made the code change. I wrote unit tests and integration tests to verify everything worked as expected. All green. Everything looked good.

While working on the task, I noticed something odd. The API I was modifying was taking two arguments that appeared to be duplicates. It looked like redundant code that could be cleaned up.

So I thought, “Why not fix this while I’m here? Kill two birds with one stone and make a good impression.” So I did. All previous tests were passing after I made this optimization and everything looked good.

I opened my first pull request, and honestly, I was proud of it. The PR was reviewed and approved by everyone on the team who had context on the service. I merged it after approval and the change went live.

As someone who loves building things that reach people in real time, I was thrilled. This wasn’t just any service—it was part of Netflix’s streaming infrastructure. My code was out there, running, affecting millions of users.

But that excitement didn’t last long.

As the change rolled out to different regions, we got paged. Some users were experiencing streaming issues.

The engineer on-call investigated and traced the problem back to my pull request—specifically, to that “optimization” I had made.

My PR was immediately rolled back, and I was notified.

My heart sank.

After my own investigation, I discovered that the “redundant” argument wasn’t a mistake. It was there by design. An upstream service depended on that exact structure, and when I removed it, I broke their integration, which in turn broke streaming for some users.

No one on my team knew that the dependency on the upstream team was structured that way, which is why the PR had sailed through review.

I had to present the incident to my team and later to my org. That was my first presentation at Netflix, by the way. Talk about an introduction.

During the presentation, I focused on what I could have done differently: how I shouldn’t have assumed the code was redundant, how I should have asked more questions, and so on. I showed up taking full responsibility and blame for everything that had happened.

Right after that first presentation, I got feedback urging me to present future incidents differently. Taking responsibility is expected, but the goal of the review was not to assign blame. Multiple people had reviewed the PR, so this wasn’t just on me.

What mattered more was asking: What can we put in place to prevent this from happening again? How can we work better with upstream teams to understand how they’re using our service?

That shift in perspective stuck with me.

I walked away with several takeaways that have shaped how I approach development ever since:

Keep changes small. One PR should do one thing. If you’re making a large change, break it into small chunks. This limits the blast radius of potential issues, makes rollbacks easier and makes reviews more effective. It’s much easier for teammates to carefully review 3-4 files than 54.

Don’t make assumptions. That “obviously redundant” code might be there for a reason you don’t know about. Always ask. Reach out to people who might have context.

Mistakes will still happen. You can follow all the best practices and still cause incidents. That’s software development. What matters is that you take responsibility, learn from it, and work to prevent repeating it.
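One guardrail that speaks to the "what can we put in place" question is a small consumer contract check: write down which arguments each known caller relies on, and fail the build if a change drops one of them. Below is a minimal Python sketch, not the actual Netflix code. The handler, its parameter names, and the consumer name are all hypothetical, made up purely for illustration.

```python
import inspect

# Hypothetical handler: the two arguments look like duplicates,
# but a consuming service relies on both being accepted.
def get_playback_info(title_id: str, content_id: str) -> dict:
    return {"title_id": title_id, "content_id": content_id}

# Hypothetical "cleaned up" version that drops the seemingly redundant argument.
def get_playback_info_cleaned(title_id: str) -> dict:
    return {"title_id": title_id}

# Hypothetical contract registry: the parameters each known consumer passes.
CONSUMER_CONTRACTS = {
    "upstream-team": ["title_id", "content_id"],
}

def check_contracts(handler, contracts):
    """Return (consumer, parameter) pairs the handler no longer accepts."""
    accepted = set(inspect.signature(handler).parameters)
    violations = []
    for consumer, params in contracts.items():
        for param in params:
            if param not in accepted:
                violations.append((consumer, param))
    return violations
```

Run as part of the test suite, `check_contracts(get_playback_info, CONSUMER_CONTRACTS)` returns an empty list, while the cleaned-up handler is flagged for dropping `content_id`. A check like this only helps if the consuming team's expectations are recorded somewhere, which is one more reason to ask before deleting.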

My first PR at Netflix didn’t go how I expected. But looking back, I’m grateful for how my team and I handled it. The incident taught me more in two weeks than I could have learned from months of smooth sailing.

If you’ve ever shipped something that broke production, you’re not alone. We’ve all been there. The key is to grow from it and use it as a learning opportunity, not just for you but for your whole team and company.

Uma

Uma Abu (umacodes)
