WEBVTT

00:00.320 --> 00:04.040
Today I resigned from GitHub after spending almost four years there and

00:04.540 --> 00:08.400
growing to senior engineer. I left to help improve AI safety because

00:08.900 --> 00:12.320
there is a dangerous trend in how AI coding agents are being deployed right

00:12.820 --> 00:16.440
now and this is going to affect you personally. I'm going

00:16.940 --> 00:20.920
to break down what I observed, why it made me walk away, and what

00:21.420 --> 00:24.760
I'm doing about this problem. So first of all, I didn't

00:25.260 --> 00:28.620
leave GitHub because I hated my job. It was genuinely the best job

00:29.120 --> 00:32.860
I've had so far. I built the first version of newer respectful AI

00:33.360 --> 00:36.460
support systems with my team, helped other engineers across the org

00:36.960 --> 00:40.660
figure out how to build mature LLM solutions, and those experiences

00:41.160 --> 00:45.180
shaped who I am as an engineer today. I've also experienced the birth

00:45.680 --> 00:49.260
of agentic coding from more traditional AI autocomplete to newer

00:49.760 --> 00:52.980
agentic systems like Claude Code, GitHub Copilot, and others.

00:53.840 --> 00:57.640
Now we as programmers and non-programmers experience the benefits of agentic

00:58.140 --> 01:01.600
coding daily, and it has been a massive productivity boost.

01:02.080 --> 01:06.120
But over the past year I started noticing an extremely dangerous

01:06.620 --> 01:09.920
trend across the wider industry that I just couldn't ignore

01:10.420 --> 01:13.560
anymore. Companies are not just using AI for code

01:14.060 --> 01:17.400
generation, which I'm a huge fan of, just like many of you. They are now

01:17.900 --> 01:21.620
replacing entire parts of the software development lifecycle with AI

01:22.120 --> 01:25.660
agents. Review, testing, deployment decisions,

01:26.160 --> 01:29.620
and even architectural choices are being handed

01:30.120 --> 01:33.660
to agents with less and less human oversight. And I'm

01:34.160 --> 01:37.660
not only talking about small startups here that can afford this

01:38.160 --> 01:42.060
risk. The industry pressure has become so high that

01:42.560 --> 01:45.820
companies with millions of users are shipping code that a

01:46.320 --> 01:48.660
human barely touches before it goes into production.

01:49.710 --> 01:53.110
Now, before you misinterpret me, I actually just want to say that at GitHub

01:53.610 --> 01:56.750
I found there to still be a lot of respect for good, high-quality software

01:56.990 --> 02:00.710
every single day. It's just that the industry trend

02:01.210 --> 02:04.350
overall is causing software quality to drop quickly.

02:04.590 --> 02:08.430
There are already projects and platforms trying to test the limits of

02:08.930 --> 02:11.790
how far we can push coding agents to automate everything.

02:12.350 --> 02:15.910
And the direction here is clear. Some organizations are starting

02:16.410 --> 02:19.950
to point at pull requests as an annoying bottleneck that

02:20.450 --> 02:23.950
slows down their AI-generated output. Now think about that for just

02:24.450 --> 02:28.350
a second. The one remaining checkpoint where a human looks

02:28.850 --> 02:32.510
at the code before it reaches your users. And the industry is calling that a

02:33.010 --> 02:36.670
bottleneck. The ultimate step here seems to be to displace

02:37.170 --> 02:40.830
human review altogether. So why does

02:41.330 --> 02:45.160
this problem make me quit my great job at GitHub? Well,

02:45.310 --> 02:49.030
it has been one of the best jobs I've had and I will easily

02:49.530 --> 02:53.150
admit this: the AI that I've used at my job very often writes

02:53.230 --> 02:56.790
better code than me, and it outputs code faster

02:57.290 --> 03:01.349
than me. One of the ways this has excited executives in the tech industry is

03:01.849 --> 03:05.990
that it promises that every developer can be a 10x or even a 100x

03:06.490 --> 03:09.150
developer, because 10x is not enough anymore. Now,

03:09.390 --> 03:12.810
before we get too enthusiastic, think about what that actually means for

03:13.310 --> 03:16.970
a moment. If one developer can generate 100

03:17.470 --> 03:21.570
times the code, they can no longer meaningfully review what

03:22.070 --> 03:25.610
was produced. And I'm noticing this every single day as I

03:26.110 --> 03:29.650
work as an engineer. Instead of the industry seeing this as a problem,

03:29.970 --> 03:33.970
the collective response has been that agents should just take care of reviewing

03:34.470 --> 03:38.090
code as well. The critical issue with this is that AI

03:38.590 --> 03:41.930
writes better code often, as I just said, but it does

03:42.430 --> 03:46.130
not always write better code. It will continue to

03:46.630 --> 03:50.010
hallucinate, it will continue to make mistakes, and when tasks are

03:50.510 --> 03:53.850
long and complex enough, it will create incomplete solutions.

03:54.570 --> 03:58.090
But it does always write more code than me. Way,

03:58.410 --> 04:01.770
way more code. In a normal workday in the past,

04:01.930 --> 04:04.650
I might have had days where I wrote a few hundred lines of code.

04:05.090 --> 04:08.530
And now it's easy to write thousands of lines in a single session.

04:09.330 --> 04:13.490
So what happens when you have an imperfect system generating

04:13.990 --> 04:16.370
enormous amounts of almost perfect code?

04:16.930 --> 04:20.809
You get a statistical guarantee that there will be bugs

04:21.309 --> 04:25.130
in the system that you, as a human, do not have the capacity to

04:25.630 --> 04:28.850
check for anymore. This is not a gut feeling,

04:29.010 --> 04:32.410
it is a mathematical fact. And the industry's

04:32.910 --> 04:36.810
current response to this problem is to just throw more agents at

04:37.310 --> 04:41.250
it. Oh, let's just evaluate agent output with more agents with just

04:41.410 --> 04:44.530
slightly different system prompts. Let's write automated tests

04:45.030 --> 04:48.450
with agents. Let's let agents test software end to end, stack more

04:48.950 --> 04:52.490
AI calls with slightly different prompts on top of each other, and just

04:52.990 --> 04:56.330
hope for the best. Now, in my brutal opinion, this is the

04:56.830 --> 05:00.390
software industry's equivalent of just playing dollhouse and pretending

05:00.890 --> 05:04.350
that there is a solution to this problem. Because we are currently trusting

05:04.590 --> 05:08.190
the same type of technology that's creating the mistakes to

05:08.690 --> 05:12.670
catch its own mistakes. Imperfect systems stacked

05:13.170 --> 05:16.270
on imperfect systems do not converge to perfection.

05:16.830 --> 05:19.910
The industry right now is actively choosing to

05:20.410 --> 05:24.280
prioritize raw code output over everything else. The amount of

05:24.780 --> 05:28.000
code generated can no longer be stopped by humans. And this is

05:28.500 --> 05:32.600
not because of some fictional rogue AI like you might read about in some fiction books.

05:33.100 --> 05:36.920
No, this is just because the industry is letting this happen on purpose as

05:37.420 --> 05:40.920
part of a strategic decision. Now, unlike many statements made

05:41.420 --> 05:45.480
in the AI space, which are often based on hype or ulterior motives,

05:45.720 --> 05:49.040
I am confident in stating the following: the future

05:49.540 --> 05:53.470
deployment of the fully autonomous AI software development lifecycle will

05:53.970 --> 05:57.230
lead to an absolute increase in broken software.

05:58.270 --> 06:01.230
Now I want to be clear about something, because you might be thinking: Xen,

06:01.730 --> 06:05.230
do you think that we should be stopping AI coding altogether? No,

06:05.730 --> 06:08.870
I'm not saying that we need to stop deploying coding agents because this is

06:09.370 --> 06:12.790
highly unrealistic and I'm quite progressive. And honestly, I never want

06:13.290 --> 06:16.590
to go back to manually writing all my code myself anymore. And to be

06:17.090 --> 06:20.510
honest, some drop in quality is acceptable because some bugs just don't

06:21.010 --> 06:24.190
matter that much. Even in professional settings, not everything needs to be perfect.

06:24.350 --> 06:28.130
Nobody is going to get hurt if, say, a support portal button

06:28.630 --> 06:32.130
breaks in an edge case condition. In general, being able to ship way

06:32.630 --> 06:35.930
more and break a few small things along the way is a net win

06:36.430 --> 06:40.490
for the majority of software out there. And I also understand that vibe coding

06:40.990 --> 06:44.290
is awesome. It's very cool to one-shot your favorite childhood game, and that as

06:44.790 --> 06:48.170
a non-programmer you can get into software more easily nowadays. It's very nice.

06:48.730 --> 06:52.250
The problem is that the push to use these agents is so pervasive

06:52.750 --> 06:56.230
across every industry that bugs will inevitably appear, and in

06:56.730 --> 07:00.110
software where we cannot afford it: in healthcare systems,

07:00.190 --> 07:03.470
on financial platforms and infrastructure that millions

07:03.970 --> 07:07.990
of people depend on every day. And without fundamentally

07:08.490 --> 07:12.270
new forms of monitoring, AI coding agents will cause

07:12.770 --> 07:15.390
more harm than good in these kinds of critical systems.

07:16.510 --> 07:19.830
So it's a big problem, but what am I doing about it? Because it's

07:20.330 --> 07:24.200
easy to complain from the sidelines, right? Well, that is what made me resign

07:24.700 --> 07:28.360
from GitHub. It is very easy to just complain about problems that AI

07:28.860 --> 07:32.000
is causing, but if an opportunity presents itself to actually contribute

07:32.500 --> 07:35.960
to a possible solution, it is only right to try it, even if it

07:36.460 --> 07:39.840
can be difficult. I happened to find an opportunity at a research

07:40.340 --> 07:43.440
lab to work on exactly this problem: a new role

07:43.520 --> 07:47.600
building monitoring systems specifically designed for AI coding agents.

07:48.580 --> 07:51.940
Now next week I'll share the full details on this new role

07:52.100 --> 07:54.820
on my LinkedIn. So if you want to be the first to know,

07:54.980 --> 07:57.700
connect with me using the link in the description down below.

07:58.500 --> 08:02.380
But I'm also sharing this story because four years ago I was starting

08:02.880 --> 08:05.380
out as a junior and now I get to work on problems that I think

08:05.880 --> 08:09.500
genuinely matter for the wider industry. And this shows that it is

08:10.000 --> 08:13.300
still possible to get a high-paying career while making

08:13.800 --> 08:18.000
a positive change or at least trying to do so. And also genuinely loving

08:18.500 --> 08:22.120
your job. And this is also why I push and help others towards their own

08:22.620 --> 08:25.920
ideal AI career as a way to give back. I'll be happy

08:26.420 --> 08:29.920
to answer any questions you have about the path that got me here. Make sure

08:30.420 --> 08:33.280
to leave a comment down below and connect with me on LinkedIn.