Two decades of Git: A conversation with creator Linus Torvalds - https://www.youtube.com/watch?v=sCr_gb8rdEI

It's been 20 years, almost to the hour, since Git was self-hosted enough to write its initial commit. Did you expect to be sitting here 20 years later still using it and talking about it?

Still using it, yes. Maybe not talking about it. It has been one of the big surprises, just how much it basically took over the whole SCM world, because I saw it as a solution to my problems. And I obviously thought it was superior. Even literally 20 years ago to the day, I thought that first version, which was pretty raw, to be honest, was superior to CVS. But at the same time, I'd seen CVS just hold on to the market for many, many decades. I mean, SVN came around, but it's just CVS in another guise, right? So I was like, okay, this market is very sticky. I can't use CVS because I hate it with a passion, so I'll do my own thing. I couldn't use BitKeeper anymore, obviously, so I was like, okay, I'll do something that works for me and I won't care about anybody else. And that showed immediately: in the first few months and years, people were complaining that it was kind of hard to use, not intuitive enough. Then something happened, like there was a switch.

Yeah. Well, you mentioned BitKeeper. Maybe we can talk about that a little bit. Pretty famously, you wrote the initial version of Git in around 10 or so days as a replacement for BitKeeper in the kernel.

Yes and no. It was actually fewer than... well, it was about 10 days until I could use it for the kernel, yes. But to be fair, the whole process started around November or December the year before, so 2004. What happened was that BitKeeper had always worked fairly well for me. It wasn't perfect, but it was light years ahead of anything else I tried.
But BitKeeper was never entirely welcomed by the kernel community, because it was commercial. It was free for open source use, because Larry McVoy, who I knew, really liked open source, right? But at the same time he was making a business around it, and he wanted to sell BitKeeper to big companies. Not being open source, and being used for one of the biggest open source projects around, was kind of a sticking point for a lot of people. And it was for me too. To some degree I really wanted to use open source, but at the same time I'm very pragmatic, and there was nothing open source that was remotely good enough. So I was kind of hoping that something better would come up. But what did come up was that Tridge in Australia basically reverse-engineered BitKeeper, which wasn't that hard, because BitKeeper internally was basically a good wrapper around SCCS, which goes back to like the 60s. SCCS is almost worse than CVS. But that was explicitly against the license rules for BitKeeper. BitKeeper was like, you can use this for open source, but you can't reverse engineer it and you can't try to clone BitKeeper. And that made for huge issues. And this was something private: I was talking to Larry and I was emailing with Tridge, and we were trying to come up with a solution, but Tridge and Larry were really on completely opposite ends of the spectrum, and no solution was coming up. So by the time I started writing Git, I had actually been thinking about the issue for four months, thinking about what worked for me and about how to do something that does even better than BitKeeper does, but doesn't do it the way BitKeeper does it. Because I did not want to be in the situation where Larry would say, hey, you did the one thing you were not supposed to do, right?
So yes, the writing part was maybe 10 days until I started using Git for the kernel, but there was a lot of mentally going over what the ideas should be.

Well, I want to talk about maybe both of those things. We can start with that 10-day period. As I understand it, you had taken that period as time away from the kernel and had mostly focused on Git in isolation. What was that transition like for you, to just be working on Git and not thinking about the kernel?

Since it was only two weeks, it ended up being that way. It wasn't actually a huge deal. In the last 35 years I've been on vacation a couple of times, right? Not very many times, but I have been away from the kernel for two weeks at a time before. And it was kind of interesting, because one of my reactions was how much easier it is to do programming in user space, right? There's so much less you need to care about. You don't need to worry about memory allocations, you don't need to worry about a lot of things. And debugging is so much easier when you have all this infrastructure around what you're writing. So it was actually somewhat... I mean, I wouldn't say relaxing, but it was fun to do something user-spacey where I had a fairly clear goal of what I wanted. Clear goal in the sense that I knew the direction; I didn't know the details.

Well, that's the other thing I actually want to talk about. One of the things I find so interesting about Git, especially 20 years on, is that the development model it encourages seems so simple to me that it's almost obvious at this point. But I don't say that in a reductive way. I think there must have been quite a lot of thought in distilling the universe of source control ideas down into something that became Git. Tell me, what were the non-obvious choices you made at the time to get what we have?
The fact that you say it's obvious now... I think it wasn't obvious at the time. I think one of the reasons people found Git very hard to use was that most people who started out using Git were coming from a background of something CVS-like, and the Git mindset... I came at it from a filesystem person's standpoint, where I had this disdain and almost hatred of most source control management projects. So I was not at all interested in maintaining the status quo. The biggest issue for me was... well, there were two huge issues. One was performance, because back then I still applied a lot of patches, which Git has made almost go away, because now I just merge other people's code. But for me one of the goals was that I could apply a patch series in basically half a minute, even when it was 50 or 100 patches.

You shouldn't need a coffee to...

Right, exactly. And that was important to me because it's actually a quality-of-life thing. It's one of those things where, if things are just instant and some mistake happens, you see the result immediately and you just go on and fix it. And some of the other projects I had been looking at took half a minute per patch, which was not acceptable to me. That was because the kernel is a fairly large project, and a lot of these SCMs were just not designed to be scalable. So that was one of the issues. But the other issue really was that I knew I needed it to be distributed, and I needed it to be really, really stable. People kind of think that using SHA-1 hashes was a huge mistake. But to me, SHA-1 hashes were never about security; they were about finding corruption, because we'd actually had some of that during the BitKeeper thing. BitKeeper used CRCs and MD5s, right? But it didn't use them for everything. So one of the early designs for me was that absolutely everything was protected by a really good hash.
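The point that SHA-1 was about catching corruption rather than security can be made concrete with a small sketch. This is a minimal Python illustration, not Git's own code, assuming the loose-object framing Git uses today (a `"<type> <size>"` header plus a NUL byte, followed by the raw bytes); the helper names `git_object_id` and `verify` are hypothetical. Because an object's name is the hash of its content, re-hashing whatever was read back and comparing it to the name catches even a single changed byte.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object name Git would give `content`.

    Git hashes a header ("<type> <size>" plus a NUL byte) followed by the raw
    bytes, so the name depends on every byte of the object.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def verify(expected_id: str, obj_type: str, content: bytes) -> bool:
    """Detect corruption: re-hash the bytes we read back and compare names."""
    return git_object_id(obj_type, content) == expected_id


if __name__ == "__main__":
    stored = b"test content\n"
    oid = git_object_id("blob", stored)
    print(oid)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

    damaged = b"test contenu\n"  # one changed byte, as from a bad disk or bad copy
    print(verify(oid, "blob", stored))   # True
    print(verify(oid, "blob", damaged))  # False
```

The printed name should match what `echo 'test content' | git hash-object --stdin` reports in a SHA-1 repository.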
And that kind of drove the whole project: having two or three really fundamental design ideas, which is why at a low level Git is actually fairly simple. The complexities are in the details and user interfaces, and in all the things it has to be able to do, because everybody wants it to do crazy things. But having a low-level design with a few core concepts made it easier to write, much easier to think about, and to some degree easier to explain to people what the ideas are. I kind of compare it to Unix. Unix has a core philosophy: everything is a process, everything is a file, you pipe things between things. And then the reality is it's not actually simple. There are the simple concepts that underlie the philosophy, but then all the details are very complicated. And I think that's what made me appreciate Unix in the first place. And I think Git has some of the same kind of... there's a fundamental core simplicity to the design, and then there is the complexity of implementation.

There's a through line from Unix into the way that Git was designed.

Yes.

You mentioned SHA-1. One of the things I think about in this week or two when you were developing the first version of Git is that you made a lot of decisions that have stuck with us. Were there any, including SHA-1 or not, that you regretted or wish you had done differently?

Well, I regret SHA-1 in the sense that I think it caused a lot of pointless churn with the whole effort to support SHA-256 as well as SHA-1. I understand why it happened, but I do think it was mostly pointless. I don't think there was a huge real need for it, but people were worried, so it was done. So I think there's a lot of wasted effort there. There are a number of other small issues. I think I made a mistake in how the index file entries are sorted. I think there are these stupid details that made things harder than they should be.
But at the same time, many of those things could be fixed, but they're small enough that it doesn't really matter. All the complexities are elsewhere.

So it sounds like you have few regrets. I think that's good. I'm curious also, in that two-week period, were there any moments where you weren't sure what you were trying to achieve was going to work, or come together, or be usable? Or did you always have a pretty clear...

I had a clear idea of the initial stages, but I wasn't sure how it would work in the long run. Honestly, after the first week I had something that was good for applying patches, but not so much for everything else. I had the basics for doing merges, and the data structures were in place for that, but I think it took an additional week before I did my first merge. There were a number of things where I had the big-picture end result in mind, but I wasn't sure if I was going to get there. The first steps, I mean the first week or two, you can go and look at the code, and people have. And it is not complicated code.

No, I think the first version was 10,000 lines or something. You can more or less read it in a single sitting.

Yeah. And it's fairly straightforward and doesn't do a lot of error checking and stuff like that. It's really a "let's get this working, because I have another project that I consider more important that I need to get back to." And it happened that I would hit issues that required me to make some changes. You can tell the first version is not... I think we ended up doing a backwards-incompatible object store change at one point, at least.

fsck complains about some of the old objects we had, because...

I changed the data format.

I didn't know that's where that came from.

Yeah. So there were things where the first version just was not doing everything it needed to do.
And I forget if I actually did a conversion or not. I may never have needed to convert.

Yeah. We just have a few warnings for a few objects in the kernel, where fsck will say, hey, this is an old, no-longer-supported format, that kind of thing.

But on the whole, it really worked, surprisingly well. The big issue was always people's acceptance of it, and that took a long time.

Well, we talked a little bit about how merging was put in place but not functional until maybe week two or week three. What were the other features that you left out of the initial version that you later realized were actually quite essential to the project?

Well, it wasn't so much that I later realized it; it was stuff that I didn't care about, but I knew that if this was going to go anywhere, somebody else would. The first week when I was using it for the kernel, I was literally using the raw, what are now called plumbing commands by hand. There was no so-called porcelain. There was nothing above that to make it usable. So to make a commit, you'd go do these very arcane things.

commit-tree.

Yeah, commit-tree, right. And that just returns a SHA that you write by hand into the head file, and that was it.

Did hash-object exist in the first version?

I think that was one of the first binaries I had, where I could just check that I could hash everything by hand and it would return the hash to standard out. Then you could do whatever you wanted with it. But the early porcelain was me writing shell scripts around these very hard-to-use things. And honestly, it wasn't easy to use even with my shell scripts. But to be fair, the initial target audience for this was pretty hardcore kernel people who had been using BitKeeper, so they at least knew a lot of the concepts I was aiming for, and people picked it up.
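The plumbing workflow described here, where commit-tree prints a SHA that you then write into the head file by hand, can be approximated in a few lines. This is a hedged Python sketch, assuming today's commit object layout (a `tree` line, optional `parent` lines, `author` and `committer` lines, a blank line, then the message); the helpers `hash_object`, `commit_tree`, and `write_head`, the author identity, and the throwaway directory standing in for `.git` are all hypothetical. It only computes names and does not store any objects.

```python
import hashlib
import os
import tempfile


def hash_object(obj_type: str, content: bytes) -> str:
    """SHA-1 of Git's "<type> <size>\\0" header plus the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def commit_tree(tree_id: str, parents: list, ident: str, when: int, tz: str, message: str) -> str:
    """Build a commit object (modern layout) and return its SHA-1, as commit-tree prints."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {ident} {when} {tz}")
    lines.append(f"committer {ident} {when} {tz}")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return hash_object("commit", body.encode())


def write_head(git_dir: str, commit_id: str) -> str:
    """The manual step: write the printed SHA into a head file (hypothetical path).

    Modern Git usually stores a symbolic ref ("ref: refs/heads/<branch>") in HEAD instead.
    """
    path = os.path.join(git_dir, "HEAD")
    with open(path, "w") as f:
        f.write(commit_id + "\n")
    return path


if __name__ == "__main__":
    empty_tree = hash_object("tree", b"")  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    ident = "A U Thor <author@example.com>"  # hypothetical identity
    first = commit_tree(empty_tree, [], ident, 1112911993, "-0700", "Initial revision")
    second = commit_tree(empty_tree, [first], ident, 1112912000, "-0700", "Second commit")
    with tempfile.TemporaryDirectory() as git_dir:
        head = write_head(git_dir, second)
        with open(head) as f:
            print(f.read().strip() == second)  # True
```

Each commit names its parent by hash, so the history is chained together by the same content addressing that protects the file contents.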
It didn't take that long before some other kernel developers actually started using it. And I was actually surprised by how quickly some source control people started coming in. I started getting patches from outside within days of the first Git version being public.

So we've talked a lot about the first couple of weeks with Git. I want to move forward a bit. You made the decision to hand off maintainership to Junio pretty early on in the project. I wonder if you could tell me a little bit about what it's been like to watch him run the project, and to watch the community interact with it, from a little bit of a distance after all these years.

To be honest, I maintained Git for three or four months. I think I handed it off in August or something like that. And when I handed it off, I truly just handed it off. I was still around, I was still reading the Git mailing list, which I don't do anymore. Junio wanted to make sure that if he asked me anything, I'd be okay. But at the same time I was like, this is not what I want to do. I still feel silly: my oldest daughter went off to college, and two months later she sends me this text saying that I'm more well known at the computer science lab for Git than for Linux, because they actually use Git for everything there. And I was like, Git was never a big thing for me. It was "I need to get this done to do the kernel." And it's kind of ridiculous: yes, I used four months of my life maintaining it, but now, 20 years later, you should definitely talk to Junio, not to me, because he's been doing a great job and I'm very happy it worked out so well. But to be honest, I'll take credit for having worked with people on the Internet long enough that, during the four months I was maintaining it, I was pretty good at picking up on who had good taste.

To be a good maintainer, that's what it's about for you: taste?

For me, it's...
It's hard to describe, but you can see it in patches. You can see it in how they react to other people's code, how they think, that kind of thing. And he was not the first person in the project, but he was one of the early ones who was around from pretty much week one after I had made it public. But it wasn't like "you were the first one, tag, you're it." It was more like, okay, I have now seen this person work for three months, and I don't want to maintain this project; I will ask him if he wants to be the maintainer. I think he was a bit nervous at first, but it really has been working out.

Yeah, he's certainly run the project very admirably.

Yeah. So taste is very important to me. But practically speaking, the fact that he's stuck around the project for 20 years, that's the even more important part, right?

Yeah, he's knowledgeable about almost every area of the tree to a surprising degree.

Okay, so we've talked a lot about early Git. I want to talk a little bit about the middle period of Git. One of the things I find so interesting about the tool is how ubiquitous it's become. It's clearly been effective at aiding the kernel's development, but it's also been really effective for university students writing little class projects on their laptops. What do you think was unique about Git that made it effective at both extremes of the software engineering spectrum?

The distributed nature really ends up making so many things so easy. That was one big part that set Git apart from pretty much all SCMs before. There had been distributed SCMs, but as far as I know there had never been something where it was the number one design goal.
I mean, along with the other number one goals. You can work with Git purely locally, and then later, if you want to make it available in any other place, it's so easy. That's very different from, say, CVS, where in order to work with it you have to set up this kind of centralized repository, and if you ever want to move it anywhere else, it's just very, very painful, and you can't share it with somebody else without losing track of it. There's always going to be one special repository when you're using a traditional SCM. And the fact that Git didn't do that, very much by design, is what made services like GitHub trivial. I'm trivializing GitHub, because I realize there's a lot of work in making all the infrastructure around Git, but at the same time the basic Git hosting side is basically nothing, because the whole design of Git is built around making it easy to copy, and every repository is the same and equal. And I think that ended up being what made it so easy to use as an individual developer. When you make a new Git repository, it's not a big deal: you do git init and you're done, and you don't need to set up any infrastructure or do any of the stuff that you traditionally needed to do with an SCM. And then if that project ever grows into something where you decide, oh, maybe I want other people to work on it, that works too. And again, you don't have to do anything about it; you just push it to GitHub and, again, you're done. That was something I very much wanted, and I didn't realize how many other people wanted it too. I thought people were happy with CVS and SVN, right? Well, I didn't really think that, but I thought they were sufficient for most people at that time.

We talked a little bit just now about how Git has applicability at both extremes of software engineering.
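The idea that "every repository is the same and equal" is easiest to see in a toy model. The following Python sketch is a deliberately simplified illustration, not how Git's transport actually works: it assumes a store of content-addressed blobs plus a single head pointer, with no commits, refs, packfiles, or fast-forward checks, and the class `Repo` and its methods are hypothetical. Cloning copies everything, pushing sends only the objects the other side lacks, and nothing makes the "hosted" copy structurally special.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


def object_id(content: bytes) -> str:
    """Content address for a blob, using Git's header framing."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


@dataclass
class Repo:
    """Toy repository: content-addressed objects plus one head pointer.

    Every copy holds the full set of objects; none is structurally special.
    """
    objects: Dict[str, bytes] = field(default_factory=dict)
    head: Optional[str] = None

    def add(self, content: bytes) -> str:
        oid = object_id(content)
        self.objects[oid] = content
        self.head = oid
        return oid

    def clone(self) -> "Repo":
        # A clone is just a full copy; it can later push to anyone else.
        return Repo(dict(self.objects), self.head)

    def push_to(self, other: "Repo") -> int:
        # Send only the objects the other side lacks (same name means same bytes).
        missing = {k: v for k, v in self.objects.items() if k not in other.objects}
        other.objects.update(missing)
        other.head = self.head  # no fast-forward check, unlike a real `git push`
        return len(missing)


if __name__ == "__main__":
    laptop = Repo()                # like `git init`: no server needed
    laptop.add(b"class project v1\n")
    hosted = Repo()                # stands in for an empty hosted repository
    print(laptop.push_to(hosted))  # 1
    print(laptop.push_to(hosted))  # 0, nothing new to send
    friend = hosted.clone()
    friend.add(b"class project v2\n")
    print(friend.push_to(laptop))  # 1, any copy can push to any other
```

Because objects are named by their content, two repositories can decide what to transfer just by comparing names, which is one reason the hosting side can stay simple.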
I've lived my whole life with version control as part of software development. One of the things I'm curious about is how you see Git's role in shaping how software development gets done today.

That's too big of a question for me. I don't know. It wasn't why I wrote Git; I wrote it for my own issues. I think GitHub and the other hosting services have made it clear how easy it is now to make all these random small projects, in ways that it didn't used to be. That has resulted in a lot of dead projects too: you find these one-off things where somebody did something and left it behind, and it's still there. But does that really change how software development is done in the big picture? I don't know. It changes the details. It makes collaboration easier to some degree. It makes it easier to do these throwaway projects, and if they don't work, they don't work, and if they do work, now you can work together with other people. But I'm not sure it changed anything fundamental in software development.

Moving ahead a little bit: modern software development has never been changing faster than it is today.

Are you going to say the AI word?

I'm not going to say the AI word. Unless you want to.

No, no, no.

What are some of the areas of the tool that you think have evolved, or maybe still need to evolve, to continue to support the new and demanding workflows that people are using it for?

I'd love to see more bug tracking stuff. Everybody is doing that, whether you call it bug tracking or issues or whatever you want to call it. I'd love to see that be more unified, because right now it's very fragmented: every single hosting site does its own version of it. And I understand why they do it. A, there is no standard good base, and B, it's also a way to add value and keep people in that ecosystem, even when Git itself means that it's really easy to move the code.
But I do wish there was a more unified thing, where bug tracking and issues in general would be something more shared among the hosting sites.

Sure. You mentioned earlier that it's been a while since you regularly followed the mailing list. In fact, it's been a little bit of time since you even committed to the project. By my count, August 2022 was the last time we had a commit from you.

Yeah. I have a few experimental patches in my tree that I just keep around. These days I do a pull of the Git sources, and I have, I think, four or five patches that I use myself. I think I've posted a couple of them to the Git mailing list, but they're not very important; they're details that tend to be very specific to my workflow. But honestly, this is true of the Linux kernel too: I've been doing Linux for 35 years, and it did everything I needed in the first year, right? The things that keep me going on the kernel side are, A, that hardware keeps evolving and a kernel needs to evolve with it, of course, but B, all the needs of other people; never in my life would I need all of the features that the kernel has. But I'm interested in kernels, and I'm still doing that 35 years later. When it came to Git, Git did what I needed within the first year, in fact mostly within the first few months. And when it did what I needed, I lost interest. When it comes to kernels, I'm really interested in how they work, and this is what I do. But when it comes to SCMs, I'm not at all interested.

Have there been any features from the project in the past handful of years that you've followed and found interesting?

I liked how the merge strategies got slightly smarter. I liked how some of the scripts were finally rewritten in C, just to make them faster.
Because I saw that even though I don't apply 100-patch series anymore, I do end up doing things like rebasing for test trees and stuff like that, and having some of those performance improvements helps. But those are fairly small implementation details in the end; they're not the kind of big changes... I think the biggest change that I was still tracking a few years ago was the whole multiple-hashes thing, which really looks very painful to me.

Have there been any tools in the ecosystem that you've used alongside Git? I'm a huge tig user myself.

I've never... no. Even early on, when Git was really hard to use and there were these add-on UIs, the only wrapper around Git I ever used was gitk, and that was obviously integrated into Git fairly quickly. But I still use the command line entirely. I don't use any of the editor integration stuff. I don't do any of that, because my editor is too stupid to integrate with anything. I occasionally do statistics on my Git command usage, just because I'm like, what commands do I use? And it turns out I use five Git commands, and git merge, git blame, and git log are three of them, pretty much. So I'm a very casual user of Git in that sense.

I have to ask what the other two are.

I mean, obviously git commit and git pull. I did this top-five thing at some point, and it may have changed, but there's not a lot of... I do have a few scripts that use git rev-list and go really low-level to do statistics for the project. But in terms of...

In terms of your interaction with the project.

Yeah, yeah.

What do you feel have been some of the features in the project, either from early on or in the time since, that maybe haven't gotten the appreciation they deserve?

Oh. I mean, it has gotten so much more appreciation than it deserves. That's the reverse of what I would ask.
A big thing for me was when people actually started appreciating what Git could do instead of complaining about how different it was. That was several years after the initial Git. I think it was these strange web developers who started using Git in a big way. It was Ruby on Rails, I think, which I had no idea about. I still don't know what Ruby even is, right? But Ruby on Rails people started using Git sometime in the 2008 time frame, something like that. And it was strange, because it brought in a completely new kind of Git user, at least one that I hadn't seen before. It must have existed in the background; it just made it very obvious that suddenly you had all these young people who had never used an SCM in their life before. Git was the first thing they ever used, and it was what the project they were using was using, so it was kind of the default thing. And I think it changed the dynamics when you didn't have these old-timers who had used a very different SCM their whole life. Suddenly you had young people who had never seen anything else and appreciated it. And instead of people saying Git is so hard, I started seeing these people complaining about how do I do this when this old project is in CVS? So that was funny. But yeah, people are appreciating Git way more than I ever thought. Especially considering the first few years, when I got a lot of hate for the Git interface. Oh, the complaints kept coming.

Tell me about it.

Oh, I mean... I can't point to it; you'd have to google it. But the number of people who sent me "why does it do this?", and the flame wars over my choice of names. For example, I didn't have git status, which is actually one of the commands I use fairly regularly. It's in the top five... well, it's probably not in the top five, but it's still something fairly common.
I don't think I'd ever used it with CVS, because it was so slow, and people had all these expectations. So I just remember, in the first few years, the complaints about why the names of the subcommands were different for no good reason. And the main reason was that I just didn't like CVS very much, so I did things differently on purpose sometimes. And the shift, literally between 2007 and 2010, those years, when people went from complaining about how hard Git was to use to really appreciating some of its power, was interesting to me.

Sure. We've talked about the very early days and the inception of the project. We've talked a little bit about how Git is used in the wild today. I want to spend maybe just a moment thinking about the future of the project. To start, I wonder: in your mind, what are the biggest challenges that Git either is facing or will face?

I don't even know. It has just been so much more successful than I ever... The statistics are insane. Git went from being used for the kernel and a couple of other projects, to being fairly popular, to now being something like 98% of the SCM market. That's a number I saw in some report from last year. I don't know how true that is, but it's big. And in that sense I wouldn't worry about challenges, because with SCMs there is a very strong network effect, and that's probably why, once it took off, it took off in a big way. When every other project is using Git by default, all the new projects will use Git too, because the pain of having two different SCMs for two different projects you work on is just not worth it. So I would not see that as a challenge for Git as much as a challenge for anybody else who thinks they have something better. And honestly, because Git does everything that I need, the challenges would likely come from new uses. I mean, we saw some of that.
We saw some of that with people who used Git in ways that were explicitly things I considered the wrong approach, like Microsoft with the monorepo for everything, which showed scalability issues. I'm not saying Microsoft was wrong to do that. I'm saying this is literally what Git was not designed to do. I assume most of those problems have been solved, because I'm not seeing any complaints, but at the same time I'm not following the Git mailing list as much as I used to. I don't even know if the large file issue is considered solved. If you want to put a DVD image in Git, it's like, why would you ever want to do that? But that's the challenge when Git is everywhere: you find all these people who do strange things that I never imagined and that I considered actively wrong. But hey, that's a personal opinion. Clearly other people have very different personal opinions. So that's always a challenge. That's something I see in the kernel too, where I go, why the hell are you doing that? That shouldn't work. But you're clearly doing it.

We talked about how, whether it's 98% or whatever the statistic is, Git is obviously a hugely dominant component in software development. At the same time, there are new version control upstarts that seem to pop up: Pijul comes to mind, Jujutsu, Piper, and things like that. I'm curious if you've ever tried any of them.

No, I don't. Literally, since I came from being completely uninterested in source control, why would I look at alternatives now that I have something that works for me? I really came into Git not liking source control, and now I don't hate it anymore. I think databases are my particular "most boring thing in life" thing. But SCMs still haven't been something I'm really interested in.

You've given me a little bit of an opening to my last question for you.
So, on schedule: Linux came about 34 years ago, Git 20. So we're maybe five or so years overdue for the next big thing.

No, no, I see it the other way around. All the projects that I've had to make, I had to make because I couldn't find anything better that somebody else did. I much prefer other people solving my problems for me, right? So me having to come up with a project is actually a failure of the world, and the world just hasn't failed me in the last 20 years. I started doing Linux because I needed an operating system and there was nothing that served my needs. And I started doing Git for the same reason I started Subsurface, which is my dive log software. Well, no longer my dive log software, but that was so specialized that it never took off in a big way; it solved one particular problem. But my computer use is actually so limited that I think I've solved all the problems. Part of it is probably that I've been doing it so long that I can only do things in certain ways. I'm still using the same editor that I used when I was in college, because my fingers have learned one thing and there's no going back. I know the editor is crap, and I maintain it because it's a dead project that nobody else uses. So I have a source tree and I compile my own version every time I install a new machine. I would suggest nobody ever use that editor, but I can't stop. I've tried multiple times to find an editor that is more modern and does fancy things like colorizing my source code, and every time I try it, I'm like, yeah, these hands are too old for this. So I really hope no project comes along that makes me go, I have to do this.

Well, on that note, thank you for 20 years of Git.
Well, hey, I did it for my own very selfish reasons. And really, this is the point to say again that, out of the 20 years, I spent four months on it, and really all the credit goes to Junio and all the other people involved in Git, who have by now done so much more than I ever did.

Thank you.

You're welcome.