Delivering Safe C++ - Bjarne Stroustrup - CppCon 2023 - https://www.youtube.com/watch?v=I8UvQKvOSSw

I'm going to talk about how to write C++ for some definition of safety. And the general idea is: what is safety? What kinds of safety are there? What do we need? That's the first quarter. Then I'm going to show you that we've been creeping up on that for a few decades; it's part of the initial aims of C++. Then I'm going to talk about how to write good contemporary C++ under the label of the C++ Core Guidelines. And then I'm going to talk about profiles, which is about how to guarantee safety, because guidelines and being careful are not sufficient in all areas.

So one of the reasons there's so much talk about safety is that parts of the US government started going on about safety, which is quite reasonable, but they're talking about the whole community, which may or may not be true, and they're talking about the mythical language C/C++, which I have something to say about. Anyway, you can look it up. This is a serious concern that we have to deal with. On the other hand, there's no reason to panic. C++ is doing well in general. I mean, TIOBE measures noise, not usage, so those numbers don't show anything precisely, but they do show that maybe a billion or two people depend on what we are doing, so we had better do it well.

So we have to address the safety issue. It's a real, serious problem. I mean, I really don't want my brakes not to work when I press them, if I had a car. And there are other things: if you're in finance, you don't want a transaction to disappear, especially if it was going into your account. So there are a lot of aspects to this. And the interesting thing is that massive improvements really are possible in a lot of areas. One of my messages in this talk is: don't write C/C++. Write C++. We can do much, much better than some of the problems that have been documented. And, well, if we don't do it, somebody else will tell us what to do, and we'd like that even less. So ignoring safety issues would hurt the community, and offering guaranteed safety would be in the best tradition of C++. This is actually an opportunity: don't let a problem stop you from doing something good.

The idea of complete type and resource safety was part of C++ from the beginning. Simula was one of the completely safe languages, except for the bugs; I was pretty good at breaking it. But one thing that we know is that we couldn't have complete safety with the hardware we had then, and we can't now, for all languages and for all uses. And being careful doesn't scale. So we have to use judicious programming techniques, supported by libraries and enforced by language rules and static analysis. I wrote up a basic model for how to do that a few years ago; I actually presented it here, but not much happened. We need it to be C++. That is, there shouldn't be restrictions on what we can do, even though there might be restrictions on how we do it, and there shouldn't be any decline in performance. This is C++, and actually some of the techniques for writing safe code improve performance. I'm talking mostly about what can be done by a compiler and by static checking, because that is free, or actually gives improvements in performance. But of course you need range checking to deal with things that cannot be dealt with statically.

So basically I'm talking about type and resource safety, and I think this is pretty well defined. Every object is accessed according to the type with which it was defined. That's type safety.
And every object is properly constructed and destroyed. That's resource safety; you can manage resources that way. If you don't initialize things, you're breaking a rule. And every pointer either points to a valid object or is the null pointer. That's memory safety; that's harder to achieve, but we can do it. And every dereference of a pointer is not through a null pointer — that is, we have to check for null pointers before we dereference, so that we only dereference valid pointers. And every access through a subscripted pointer is in range; that requires a runtime check, and I'll get to that.

So basically, this is just what the rules require. Read the standard, and that's what it requires. Read Dennis Ritchie's 45-page C manual from '78; that's what it requires. It's just that we haven't been doing it. So the rules I'm putting forward here are more deduced than invented: they are deduced from what it takes to do what's on this slide. And the enforcement rules that I'm suggesting are mutually dependent. You can't just take one thing out of context and expect to get an easily specified benefit out of it. You have to have a framework for what you are doing, to see what you are planning to get out of it.

And C++ still has to serve a wide variety of users and areas. We have millions of users and billions of people relying on us, and one size doesn't fit all. We can't just say: this is what safety is, everybody do it this way. That doesn't quite work. C++ is also a systems programming language. We manipulate hardware in various ways; we use unusual hardware that is not known to the language specification; and we can't outsource this to other languages. A lot of the so-called safe languages outsource all the low-level stuff to C and C++. If we couldn't do it, there would basically just be C left. Somebody has to do the dirty job here.

And we can't break billions of lines of existing code. I say can't, not shouldn't, because we can't. If we try, people will use the old compilers, they will go to a different subset, they will just ignore us. This can't be done. And we can't upgrade the millions of developers quickly; it takes a long time. I keep bumping into people who learned C++ from videos and books that are ten or twenty years old. It's tragic — it could have been so much easier, and the result could have been so much better, if they had up-to-date materials. Teaching up-to-date C++ matters. But getting a whole community like this to move forward is much harder than most people imagine. Okay, these are difficulties, but we must improve, and we can.

So the challenge is to describe what we mean by type-safe C++ use: no violations of the static type system, no resource leaks. If a system leaks a resource — memory, locks, file handles, and such — I wouldn't consider it safe, because I can crash it with the equivalent of a denial-of-service attack. Or I could just be sloppy, and it crashes when it runs out of resources. So I'm very keen on resource safety, and this is actually one of the things that came into C++ in the first two weeks. And we have to convince developers to do this. There's a lot of belief out there that if you do grubby, complicated, low-level stuff, it must be faster — and furthermore, I can write this grubby, low-level, complicated stuff to show how smart I am. This does not work. We have to raise the level of programming and get this to work at scale. I mean, if I have a class of students, I can get them to do what I tell them to, because otherwise they get a bad grade.
If you are a manager, you can get people to write code the way you'd like them to, because otherwise they don't get their pay raise or they get fired. But at scale, we actually have to convince people; they have to believe it's true. So that's an important thing. And stability over decades matters. The only reason people will believe that the code they write today will run in ten years is that the code written ten years ago runs today.

And safety is not just type safety. I think most of the errors I see are logic errors. I have seen a less-than-or-equal where a greater-than was expected, and the cost was very large. And if you want to prove that a program does exactly what it is supposed to do, you get a very restricted language; there's some Ada code that goes in that direction. Resource leaks — I mentioned those. Concurrency errors: we are doing a lot of concurrency to be able to scale our problems, and we have to make sure that works. You could consider any concurrency error a type violation, because an object wasn't used in its proper way, but it's well worth separating it out so that you can analyze and address it directly. Memory corruption: we just have to eliminate that. Type errors: if you use low-level code with a lot of casts and void*s and other tricky stuff like that — void** is even worse — we have to avoid that. And timing errors: if a response is needed in, say, 1.2 milliseconds, then being late is simply not good enough in a lot of real-time control applications. And allocation unpredictability: there's a ban on the use of the free store in flight software. You cannot allocate something after the engine starts, and you can't deallocate something at any point, because you might get fragmentation and things like that. This means that separately managed chunks of memory are very important in C++. All the significant applications have something on the side where they manage their memory themselves. Vector is the first and simplest example — we're all using that; it takes a chunk of memory and manages it for you. And then termination errors: I've dealt with systems where termination is not acceptable. Now, if you have, say, 40,000 processors, you have to take into account that something will crash roughly every day, or at least every week, and therefore you can have a strategy that says: well, if a processor has a problem, just crash it, because we have written our software to cope with that. But what if there isn't such a spare? What if you are not allowed to crash? A friend of mine pointed out that he was programming scuba controllers. There's only one processor in them; no, crashing is not an option. In some financial systems you legally can't crash, because you would lose a transaction, and that's not allowed.

So there are many things here; I'm sure you could find another slide or half a slide of examples of this kind of thing. We have to be able to say what it is we're trying to do. We're not trying to do everything everywhere, because that is not necessary for a lot of people, and anyway it's not feasible at the scale we are talking about.

And security is also not just memory safety. There's been some confusion in places between security and safety. Type safety is not security. I was at a security course at Bell Labs many, many years ago. The first class was lockpicking. In the second class, you were not allowed to use your badge to get into the building.
If I can get your computer, all your backup tapes, all your memory sticks, I've got your stuff. Spies, inside attacks: if you're a large organization, there will be a few people who can be bought, or who are idealistic about something that's not the company's thing. Spear phishing apparently works very well. Door rattling: if you try enough places, somebody will not have done the right thing. Denial of service, SQL injection, corrupted input data. So if you want to attack something, you always attack the weakest link. I was told: how do you avoid getting your car stolen at Newark airport? The answer is, park next to a nicer car. So, what I'm pointing out is that I'm talking about type safety and memory safety and things like that, but don't confuse them with security. Security is a system property, and the system involves computers, people, storage areas, physical things — lots of stuff. I'm not talking about all of that, but remember it if somebody comes and screams "security" because you could have a bug in your program.

Languages are not safe against that kind of thing — like writing less-than instead of greater-than. And all safe general-purpose languages have escape clauses. You have to access hardware resources. You have to improve the efficiency of key abstractions. Doing a safe linked-list implementation is very, very hard if you want it to be verified safe; I think we are up against the halting problem, but let's say it's just close to impossible. So there are things that we want to do where we have to use techniques that are not verifiably safe, and then you have trusted code segments. Those are trouble spots, of course, but so are libraries — code written under less strict rules. You have to call something. How about the operating system, which is written in C? You can't verify the operating system; it's far too complicated. And often the escape clause is C. So we can't just close off all unsafe areas. Being safe where it matters, having it guaranteed where it matters, and preferably by default, is really good. Pointing out that you can't get absolute safety is not an excuse for ignoring safety issues; it is an argument for focusing our efforts where we can actually make progress.

So, going back in history. One of the reasons I started on C++ was that I wanted to deal with hardware, and I wanted to abstract from it so that I could write better code. And static type safety was the ideal. I had been writing in languages that were statically type safe for a while, including Simula, which is the root of a lot of the higher-level stuff in C++ — we had classes, encapsulation, things like that. But it's an elusive ideal, because sometimes we need progress in what we understand, and sometimes we need better hardware. So: efficient use of hardware, that's where C is; managing complexity, that's where Simula is. And I've been trying to move us more towards the Simula side, where we can afford it, where we can do it.

Okay, so when I started, if you wanted to call a square root, you could crash the machine. Square root of two would crash the machine if you were lucky; otherwise it would just give you a bad result. The point was that the compiler didn't know that you required a double, and the argument wasn't converted. So one of the first things I did was to make sure that such things didn't crash. Since crashes are considered a safety problem, you could say that I started on this right away, actually.
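A minimal sketch of the kind of failure being described — illustrative only, not the original example:

    #include <cmath>

    // With a full prototype in scope -- which C++ has always required --
    // the compiler knows that sqrt takes a double, so the int argument
    // is checked and converted:
    double f()
    {
        return std::sqrt(2);    // 2 is converted to 2.0; well defined
    }

    // In early C there was no such check.  With only a K&R-style
    // declaration,
    //
    //     double sqrt();        /* parameter types unknown */
    //     double d = sqrt(2);   /* the int 2 is passed where a double is
    //                              expected: garbage, or a crash */
    //
    // That is roughly the failure described above.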
What is interesting, in the context of tightening up the language today, is that I got ten years of trouble out of that little fix. I mean, even C today is like that. But it was controversial: do you mean I have to look at the declaration to see what it means? I can't just look at the code? That was an argument I heard a lot. It's incompatible. So whenever people didn't like C++ in the early days, they pointed to: oh, it's incompatible, it's different, you haven't done your job right. Well, I couldn't get my type safety without having an incompatibility. And one thing I've learned is that when people don't want to use a language like C++, they pick on something and say: oh, it doesn't do that. Today it is safety — it's not safe. Okay. Anyway, it was essential. Type checking, overloading, user-defined types, consistent linking, type-safe linking: all of these things require that the language supports argument checking properly. Fine. Sometimes you just have to do it; it's important. And I think we'll get to that again on safety issues.

And the basic idea was to represent things in code and create abstractions that you could use. That makes your code simpler and actually safer: vector, string, file handle, concurrent task, message queue, and so on. Unfortunately they're not all standard today, but most of them are; things take time. I think if we want to address safety, we need abstractions that support that notion.

And immutability came in early: const. I felt I needed it. I was coming out of operating systems and things like that, and I knew there were things that couldn't happen — things that I didn't want to happen — if I had constants. Actually, what I really wanted was read-only and write-only, but I talked to the C guys and they would take const; they wouldn't take read-only and write-only. We would have been better off with those, but fine. Basically you can do constants and you can specify interfaces this way, and that way we get better interfaces. That's also something that keeps going on.

RAII — this was in the first two weeks: have a constructor that constructs the object, initializes the object, establishes the invariant of the object, if nothing else. If you have encapsulation, you need it. If you don't have encapsulation, you still need it to be able to think about your code. And then when you are finished, if the object has acquired resources, you have to give them back again; otherwise you have a resource leak. Any librarian can tell you that people will take out books and forget to give them back again. It's sort of human nature, and we have to do these things at scale, so resource release has to be automatic. It was phrased rather differently in those days, because I hadn't invented the terminology, but today we have it, and I hope we are all using it. Examples of non-memory resources: file handles, locks, sockets, shaders, things like that.

Okay, so here is an example of a resource leak. This is naive, unsafe code, and I used to see it a lot. People open a file, they acquire the file handle, they use it, and they get out again. And real code is not actually that nice: it's sometimes two pages long, and there's a return statement or a longjmp or an exception somewhere in that "use f" part, and you never get to the fclose. We need to not have such code. All the compiler sees, all the analyzer sees, all the programmer sees, is that a pointer comes out of a function, and the manual says it has to be given back to another function.
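The pattern being described looks roughly like this — a sketch, not the actual slide:

    #include <cstdio>

    // Naive use of a C-style resource: the FILE* has to be handed back to
    // fclose() by hand, and any early exit between fopen() and fclose()
    // leaks the file handle.
    void use_file(const char* name)
    {
        FILE* f = std::fopen(name, "r");
        if (!f) return;

        // ... two pages of code using f ...
        // if any of it returns early, throws, or longjmps,
        // control never reaches the fclose() below

        std::fclose(f);
    }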
This is not good code, and it was a source of bugs. And so what we do is represent things directly in code: if I talk about a resource, it should be in the code. Whoops, did I? Huh, strange — out of order. Anyway, object-oriented programming came out of Simula: well-defined interfaces, classes, abstract base classes, overloading, all that kind of stuff. That's Ole-Johan Dahl in the checkered jacket and Kristen Nygaard in the other jacket. They invented inheritance and basically object-oriented programming. But somewhere I lost a slide that showed how to handle the file handle thing. Obviously, you create a file handle class, and its destructor closes the file — problem solved (a sketch of that follows below). And we should just do more of that.

So the evolution of the language, of course, continued. Templates allowed us to have compile-time selection of implementations, and we finally got concepts, so we have precisely defined interfaces; we really should use those consistently. If I had a time machine, I would go back and tell myself about concepts back in '88, and we would have had much better templates, and I would have had an easier time designing and implementing them. Containers, so that we don't have to fiddle with arrays and pointers — and they enable range checking. There's enough information available to range check containers, and the major implementations have range-checked vectors. Unfortunately there's no standard for it, and you can't just walk up to a computer and, without doing anything else, get range checking. That's sad, I think, and relevant in the context of safety. And algorithms: we had the traditional STL begin/end stuff, and now we can actually just sort the vector, or any container. That again means there are bugs you can't make — you can't sort from end to begin, or things like that. It gets simpler, it gets clearer, and safer. And smart pointers we can use, but they're still pointers.

So we have range-for and span. The loops, the C-style loops there, are sort of suspect — I mean, is n really the number of elements there? And you can't range check it, because from the language's point of view there is not enough information in a pointer to do a range check. Now we have span, which knows the size. We can range check where it's needed, or just go over the whole thing, like in that example there. I'll again point out that Dennis Ritchie and I were discussing this problem back in the eighties, and he wanted fat pointers, which is what span is. So did I, but it took us some time to get it. If Dennis had been in control of C, we would have had this 20 years ago.

And here was the slide that had disappeared — I will put it in the right place — but basically you solve the problem by having the appropriate abstraction that does the appropriate checks and releases the appropriate resources. Fine. Better late than never.

Okay, so I wrote up the ideals for C++, and basically one of the first ones is no implicit violations of the static type system. There are quite a few things there. The point is, this has been documented for a while: what I'm talking about — safety and safer things and higher level — didn't start yesterday. It's the best tradition of the language, and the benefits come from using the language well. Use the facilities we're talking about here, and avoid raw pointers that you're supposed to delete — owning raw pointers. Don't subscript a raw pointer, because you don't actually know what the range is.
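And a minimal sketch of that file handle fix — the class name and details are illustrative, not the actual slide:

    #include <cstdio>
    #include <stdexcept>

    // The constructor acquires the resource, the destructor releases it,
    // so no path out of a scope can leak the FILE*.
    class file_handle {
        FILE* p;
    public:
        file_handle(const char* name, const char* mode)
            : p{std::fopen(name, mode)}
        {
            if (!p) throw std::runtime_error{"could not open file"};
        }
        ~file_handle() { std::fclose(p); }            // released on every path

        file_handle(const file_handle&) = delete;     // one owner only
        file_handle& operator=(const file_handle&) = delete;

        FILE* get() const { return p; }               // for C-style APIs
    };

    void use_file(const char* name)
    {
        file_handle f {name, "r"};
        // ... use f.get() ...
    }   // the file is closed here, even on return, throw, or error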
And don't dereference a raw pointer before you check whether it points to something. Don't have uninitialized variables. And you can do it. That book there, which I wrote for beginners in 2001, I think, doesn't show any pointers or arrays until chapter 17, after I show them how to do graphics. This can be done. The good C++, the higher-level C++, is a consistent set of features. You don't actually have to go and fiddle with the low-level stuff unless you really, really need to. You can write perfectly good code, perfectly good applications, without going down there.

So if you want safety, don't write C/C++. There's no such language, but there certainly are uses where you'd think there was. They say it's C++, and they do all the things I said on the previous slides that you shouldn't, and they avoid the higher-level stuff, usually claiming performance problems — and they don't measure. I sometimes teach graduate students, and they still don't know how to measure stuff and they still talk about efficiency. This shouldn't happen. So anyway, we have to evolve our style towards what's provably safe, because provably safe is the easiest thing to deal with; it's the easiest thing to think about.

And as usual I'm trying to talk people into it, but that has its limits: being careful doesn't scale. We have to formulate the rules for safe use. We have to provide ways of verifying that people actually do what they are trying to do. I mean, I don't think there's anybody here — I'm not excluded — who hasn't written a piece of code they thought did one thing when it did another. I would say it happens, well, probably every week. To me. That's why I like compilers. And we have to articulate the guidelines so that we can understand what we are saying; long, long lists of complicated rules, or Greek letters, are actually not going to be easy to follow. And then we have to enforce the guidelines where needed — and "where needed" means we have to be able to say: this is what I want here, in my code. We'll get to that.

So the state of affairs: essentially everything I've described so far, and what I will describe in the rest of the talk, has been tried, and much of it at scale — for instance, range-checked strings, vectors, and span. But nowhere has it all been integrated into a consistent, coherent whole, and that's what I'm arguing we should do. Much of what I'm talking about is influenced by the work on the Core Guidelines, but we have to go further than guidelines. As I said, being careful doesn't scale, and the aim is still guaranteed type- and resource-safe C++ — and then other things too. Type and resource safety is pretty fundamental, but there are other things we want, and therefore we have to specify what those other things are. And a lot of this can be done today; we don't have to wait for a new release of the standard or something like that.

And this is not just about safety. I have seen code speed up by going to a higher level, by expressing more clearly what should be done. If I can understand what's going to happen, what should be done, so can the optimizer, and quite often you get benefits. One of my best ways of debugging and speeding things up these days is to rip out the complicated stuff; what is left, the optimizer can do a good job on.

Okay, the C++ Core Guidelines. Just for information: that is the machine that makes your high-end chips — high-end processors, essentially all of them — and of course it's programmed in C++. They contacted me to see if I could help them hire good C++ programmers.
No, I'm not in that business. But really good C++ programmers can do really interesting things.

General strategy: we rely on static analysis to eliminate potential errors, and this is impossible for general code. It's easy enough to write a piece of code that cannot be proven correct, and global static analysis is just unaffordable at scale. So basically we need rules that simplify what we write down to something that can be analyzed efficiently and cheaply — local static analysis — and there are some analyzers for the Core Guidelines that do things like that. And then we provide libraries to make relying on these rules feasible. If we had to do everything at the language level, things would get slow and unpleasant to write, and then we wouldn't do it. With the right abstractions, with the right libraries, just about everything becomes pleasant.

Okay, there's a philosophy here. This is a slide I made, I don't know, almost ten years ago; I highlighted a few things in red: statically type safe, can be checked at compile time; don't leak resources; prefer immutable data. All of this is safety related. So again, this is in the best tradition of C++ and of raising the level of programming. And then there are low-level rules that we use to implement these ideals. So basically we state what we want, and then we make long lists of rules that allow us to approximate getting it; those are the low-level things.

And the strategy of the Core Guidelines, and of the profiles that I'm going to talk about next, rests on the observation that simple subsetting doesn't work. If you want to subset C++ to something safe, or something elegant, or whatever, the first things you have to get rid of are subscripting of raw pointers and explicit memory management without implicit release from destructors. Well, if you take those away, you can't build just about anything. So what you do instead is build some better abstractions — vectors, smart pointers, file handle classes, things like that — and once that is done, you can cut away a lot of the complicated and dangerous stuff at the bottom. This strategy seems to work. We use the STL, the standard library, of course, and then there's a small support library, the GSL — that was, by the way, where span came from, and a lot of people still use gsl::span because it's guaranteed range checked. But the point is, we add no new language features. I want the result of using all of this to be ISO standard C++. I don't want to design a new language; it's too much hard work and it takes too long. I want to write code, good code, now. So what we want is C++ on steroids, not some pitiful subset.

Okay. If I had ten hours instead of one and a bit, I could go through this list and argue that we can do all of that. Of course there's no time for that; this is a keynote, not a deep technical dive. So: we can eliminate uninitialized variables, range errors, null pointer dereferencing, resource leaks, and dangling references. We can do more things. Unions: use variant (see the sketch below). Casts: just don't do them; overloading and template programming eliminate most casts if you do it right. Underflow and overflow: I have not worked on that myself, but I showed you a picture of an engine earlier — a marine diesel engine so big that, if you look carefully, you can see the engineer by cylinder head number five. And that is run by some generic programming that checks for underflow and overflow, so it can be done; I just don't have a specific proposal for it.
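Picking up the "unions: use variant" point — a minimal sketch of the difference, with illustrative names:

    #include <iostream>
    #include <string>
    #include <variant>

    // A naked union leaves it to the programmer to remember which member
    // is active; reading the wrong one is an error no compiler can see.
    union Raw {
        int i;
        const char* s;
    };

    // std::variant carries the discriminant with it and checks every access.
    using Value = std::variant<int, std::string>;

    void print(const Value& v)
    {
        if (std::holds_alternative<int>(v))
            std::cout << std::get<int>(v) << '\n';
        else
            std::cout << std::get<std::string>(v) << '\n';  // throws if wrong,
                                                            // never misreads memory
    }

    int main()
    {
        print(Value{42});
        print(Value{std::string{"hello"}});
    }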
And we have some ways of dealing with data races, but that takes a whole day at least. So that's it.

Uninitialized variables: just initialize. There are many clever ways of having delayed initialization that act like initialization, but the code becomes brittle and it becomes hard for a human to see what is going on, even if the compiler has no problem making sure that every object is properly initialized. I am in favor of just initializing everything. There's one exceptional case — at least one, probably more — which is a huge buffer. If you're doing low-latency stuff, you don't want to initialize the buffer first and then fill it with good stuff second; then you've just doubled the time for filling that buffer. I can think of quite a few industries where you would need an escape clause from the initialization rules, where you can mark things: this one is deliberately uninitialized. Then it stands out in the code, so you can check it, and things like that. For a lot of these rules to become realistic, you have to have some kind of escape clause, and you have to have it explicit, so that the humans and the static analyzers can understand what is going on.

Range checking: of course I want range checking, and I don't want subscripting of raw pointers. If you are writing C-style code you may be horrified, but we can do without it. Vector and span are the two main examples where range checking can be done, and sometimes is being done, simply because there's sufficient information available to do the check; with a pointer, there isn't. We have to trust that we have checked the ranges correctly, and sometimes we don't get it right, so it's much better to use the abstraction for it. Range-for is really nice; it eliminates a lot of the checking, because you only have to check the beginning and the end. Algorithms are good. I had a bunch of students do some measurements of loops two weeks ago, and they were really deeply shocked: for_each and accumulate beat their hand-coded C-style loops noticeably. You should have seen my students — and rightfully so; they've been told so often that low-level stuff is important and efficient.

Null pointer dereferencing: it's fairly easy to check that you have a non-null pointer. There's an abstraction in the GSL for it. Or you could have the static analyzer check that there is a test close enough to the use that you can verify it's been done. Again, I don't want analyzers to be overly clever, because I want to understand it too.

No resource leaks: be consistent in using abstractions that follow RAII. We have a whole bunch of them. A naked new is a code smell — don't have it. A naked delete is the same — don't have it. Not in application code; they belong in the implementations of abstractions. And I see a fair amount of code like this, mostly from people coming in from Java and C#, where you have to say new to get an object of a user-defined class. So you have a gadget there. I don't know what's inside the gadget: it may grab locks, it may grab file handles and things like that. And then you write some code, and then you have to remember to delete it. And people quite reasonably say: well, I don't want to have to write that delete — I want my garbage collector. But the garbage collector is not going to help you, because of what was inside the gadget. You don't know what it is.
It may be something that has to be explicitly freed. And that code, of course, can either throw and get out without reaching the delete, or it can return — perfectly ordinary stuff. This has nothing to do with exceptions, though; I love exceptions, and they're really good for this kind of stuff, but you can get into this trouble without them. And so, resource handles: we use a lot of unique_ptrs and shared_ptrs these days, and that sort of solves the problem. We are now guaranteed that the destructor is called and things are properly released — simple and cheap. But you know, it's still a pointer, and it still uses the free store; there are allocations. So what we really should do is use more local objects. Then everything works without added notation and without added allocations. Of course, if you are writing code for an embedded processor with very limited stack space, you have to be careful about this, but in general programming — actually in most programming — this is what's preferred. It's not just the naked new that should make you worried — analyzers easily warn you about that — you should also be a little bit suspicious of the smart pointers, because they're really inelegant.

Okay, so dangling pointers. You really have to eliminate dangling pointers. There are all kinds of bad things that can happen with a dangling pointer: you can write through it into somebody else's memory, you can read from it and get garbage data, you can scramble things. This is really bad. And by pointer I mean anything that directly refers to an object. It could be a smart pointer, if it's allowed to dangle; it could be a reference; you name it. We have to eliminate them. We can eliminate them with a combination of rules, and by assuming that raw pointers are not owners, so that we don't get interference from the memory management problems here.

Okay, so here is a piece of code that's not okay. It looks innocent: I get a pointer and I delete it. Well, under the rules I'm crafting, that is a naked delete, and the analyzer will reject it. And that is good, because when you look at g: it makes an object, passes it to f, and then uses it. The use down there is a use of a dangling pointer. And of course, in a real piece of code, that f is not visible like that; it's probably off in some library somewhere that you've never seen. But the static analyzers can handle this — and should. I really would like the guarantee that this doesn't happen in my code. And the way you do it is: every object has one and only one owner. That means we know who is supposed to do the delete, the close, or whatever you call it. And then there can be as many pointers as you like, as long as none of them is used in the execution after the owner has gone.

This is a fairly simple model, and it can be enforced. Here's an example: I have a function here that has a local variable, and I call g with the address of it, and g then stores it in a global. This has to be stopped. We have to analyze it and say that it's not okay to take something that comes in as an argument and store it in something global, because you don't know if it will still be valid after the function exits. On the other hand, pointers that we get as arguments are actually pretty good, because under this rule they will be valid when they come into the function, and they will stay valid until we leave the function again, because they came from outside the scope. And what have we got here? Remember: containers, pointers, or anything.
The pointers can be in containers. So here I have a vector of pointers to ints — res, or something — and that is okay. We create collections of pointers — iterators, whatever we call them — pass them to some function to be called, and when the function comes back again, everything is fine. The fact that the pointers were valid in the calling code, f here, means that they are valid in the called code. We're fine; we don't actually have to forbid all kinds of stuff — this works. On the other hand, it is not okay to return that res, because there are pointers to local elements in it that will go out of scope, that will become invalid, when we leave.

Invalidation can be a serious problem, so we need to eliminate it. Here we have a traditional vector use: a push_back of 9, which can relocate all the elements. That's usually fine, but g uses it in a bad way: it starts by grabbing a pointer to the first element — an iterator to the first element — then it calls the function that does the push_back, and then it tries to use the pointer it kept (a small sketch of this appears below). I mean, this is the reason for using reserve, and for using linked containers like a list, because then things don't relocate. We can, however, detect that this piece of code is bad; it has been done. So what we actually do is have the notion of "invalidated": when a pointer points to an element of a container that may have relocated its elements, it's called invalidated — it may or may not still be valid — and then we can handle it by a fairly simple rule. And no, a const member function doesn't invalidate, because it would have to do something nasty to the const — it would have to use a cast or something like that — so that's easily validated by an analyzer. On the other hand, we must assume that any non-const member function could invalidate: it has access to the data and it can relocate, it can move things around. I would think that we need a new annotation, in the Core Guidelines and in the profiles, which basically says: this one doesn't invalidate, even though it is non-const. The standard example is vector's operator subscript — it doesn't reallocate, but it can't be const. That's also easily validated by an analyzer; it is a provable thing, so it is perfectly safe and easy to do.

So, represent ownership. We use ownership abstractions as usual: stay high level, don't mess with the low-level things if you don't have to. But we always have to consider how to deal with code that's not written according to our rules — like code with C-style interfaces with lots of pointers in them. And so we have an annotation in the Core Guidelines, primarily to deal with the fact that the world isn't that simple and we don't own it all, and that's called owner. I can say: this pointer is an owner; I'm passing it over to you, and I know it's your job to delete this thing. It can be done. There's been a fair amount of work done on this, and there are rules about how you can copy owners and how you can't — you can deduce them from what is safe — so I'm not going to go through them here. If you're looking at the slides, you can freeze this and see what it is.

So now I want to get to future stuff. Where do we go from here? We can write good C++ — I hope I've made a reasonable argument for that — but too many developers don't: they write C/C++, this mythical language. Guidelines are not enough. We need guarantees, partly because we make mistakes, and partly because we are more comfortable if we have a real proof. And we need new standards for what the static analyzers, what the compilers, check.
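A minimal sketch of the invalidation pitfall mentioned above — the names are illustrative, not the actual slide:

    #include <vector>

    void g(std::vector<int>& v)       // assume v is non-empty
    {
        int* p = &v[0];   // pointer (or iterator) into the vector's buffer
        v.push_back(9);   // may reallocate: every pointer and iterator into v
                          // is now potentially invalidated
        *p = 7;           // use of a possibly dangling pointer -- not okay
    }

    // Reserving first (or using a container that never relocates, such as a
    // std::list) keeps the pointer valid; an analyzer following the rule
    // above can flag the version without the reserve.
    void g_fixed(std::vector<int>& v)  // again, assume v is non-empty
    {
        v.reserve(v.size() + 1);
        int* p = &v[0];
        v.push_back(9);   // no reallocation: capacity was reserved
        *p = 7;           // fine
    }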
So we have some alternatives for how to proceed from here to get the guarantees: we can change C++; we can start using another language, which is of course what the proponents of all the other new languages are suggesting; or we can enforce a variety of guidelines — that's the profiles I'm arguing for.

So, fixing C++ — how? There are many different and incompatible ideas about how to change C++, and to change C++ in ways that address the problems I've been talking about, so that it will not look like C++; and there are several suggestions. So we get years of delay and chaos, and a single cleaned-up language would provide a single kind of safety — unless, of course, it adopts the profile approach, and that is not what a lot of people are talking about. And the cleaned-up C++ would have to interoperate with classical C++ code forever. That is just a fact: those billions of lines of code are not going to disappear, and much of it is critical, and much of it is really high quality. So gradual adoption is essential; partial adoption is essential. And, well, C++ was always meant to evolve — but meant to evolve in a reasonably compatible way. If you accept that last statement, it won't be "fixed" in the way people talk about.

Then we could try using another language. Safety is used as an argument, often against C/C++, and very typically ignoring C++'s strengths and very often ignoring the weaknesses of the alternatives. Often the safety dimension is just memory safety — that's not enough — and the need for unsafe constructs is not heavily featured. The need to interoperate with other languages, including C and C++, tends not to be mentioned, and the cost of conversion can be ferocious; that's rarely mentioned. And of course this is natural, this is human nature: you argue for your case, and you overestimate the virtues of what you have and underestimate the strengths of the opposition. But still, we have to be more serious about how we have these arguments, and we need to get some numbers. And anyway, which other language? The way I have seen it argued, we are going to have C++ replaced by about seven different languages, as of the suggestions of about now. By the time it happens, say 40 years from now, we'll probably have 20 different ones, and they'll have to interoperate. This is going to be difficult.

Anyway, let's look. There are new languages, new resources, new expert users. Every new language, of course, is claimed to be simpler, cleaner, safer, and more productive than C++. On the other hand, we heard that one from Java at the time. I said: yeah, if it survives, it will become three times bigger and be better for it. I was right about that. You can apply that argument to most new languages, and often the claimed superiority is in a limited domain — and again, often compared to C/C++. And one thing I have noticed: when you have a new language — this happened with C++ also — you get a bunch of enthusiasts writing code. They are just much better than average: they are much more enthusiastic, they are better informed, they know all the latest stuff. And then you come up against the law of large numbers. When you have a language that is used at large scale, your developers are of average quality, average enthusiasm, et cetera. Which means that numbers comparing a small new community with a large older community are going to be skewed.

Let's see — I was thinking about what it would take to convert something. So consider converting a 10-million-line system. There are lots of those.
It needs high reliability and high performance, because if it didn't, why would you bother converting it for safety reasons? And say a good developer completes N lines of production-quality code a day. I don't know what N is; it depends on what kind of code we're talking about. But for the kind of critical software we are talking about now — maybe 5, 10, 100 a day — let's say 2,000 lines a year; that's not too far off. And let's assume that the 10-million-line system can be written in the new language, in a new way, from scratch, in 5 million lines — half the size. That is very optimistic, but if it's true, then at 2,000 lines per developer per year that's 2,500 developer-years, so it'll take 500 developers five years to complete the new system. And the old system will have to be maintained for those five years, until you can replace it. So what is the loaded salary of a good developer? That's the cost of the salary, pension funds, buildings, heating, cooling, computing, all of that kind of stuff. Let's say half a million dollars in the US; it's not that far off — we can argue about $100,000 either way. So it would cost roughly a billion dollars of added cost over those five years. For a 1-million-line system you can divide; for 100 million lines you can multiply. And I made the assumption of US developers — if you can find developers with relevant experience in the new language, which is often a problem. And maybe outsourcing could cut costs; that doesn't always happen when you outsource, and it has its own problems, especially when you start talking about security and such. It's not obvious how you can cut that cost dramatically. But anyway, I said I assumed the new system would be half the size of the old one — that doesn't happen very often, but it certainly is possible if you have a better understanding of the problem, a better language, better tooling, et cetera. And there would have to be no feature creep for five years; that is hard. And the new system would have to work and be delivered on time.

What is going on with this stupid system? Oh, I see what went on with that system — I was not seeing what you are seeing. Now I am.

Okay, so there are of course people for whom a billion dollars, or $100 million, is not a lot of money, so let's not just dismiss the whole thing for that reason. There are people who think that a 10-million-line system is medium-sized; I did some asking around here and there. But I think these kinds of numbers are an argument for an incremental and evolutionary approach, as opposed to just going to something brand new. And obviously that could be C++, or a combination of C++ and other languages that are able to interoperate smoothly with C++, and you can have a community of both. I think it's better to stay with C++. Anyway, C++ will play a major role for decades to come, and we have to keep working on it; the standards committee has to focus and make C++ better.

And then I was looking up something I knew, to find a quote — people like quotes. There's something called Gall's law: a complex system that works is invariably found to have evolved from a simple system that worked. In other words, this idea of just building the new system over on the side, without any of the problems of the old one, is a fantasy. But it's a very popular fantasy. I agree with that last statement down there.

So, profiles: how can we guarantee safety with standard C++ code? Guidelines are not enough. The profiles summary here is that everything is ISO standard C++. The most fundamental guarantee that we want is complete type and resource safety.
Ownership has to constitute a DAG, otherwise the model breaks down. You can handle loops with, say, shared_ptrs and weak_ptrs, but for easy proof, stick to a DAG. Pointers are dealt with as I said before. Gradual conversion offering guarantees has to be supported. The set of guarantees is open: I do not know everything that everybody wants as guarantees. I can see somebody wanting an "unsafe" guarantee where they can use all the unsafe features and the latest optimizations. In other words, it's an open set, and we want some fundamental guarantees to be standard. I'm imagining type and resource safety, memory safety, range safety, arithmetic safety — things like that could be standardized. Nothing's easy to standardize, but this is possible. And the set of guarantees assumed is stated in the code; otherwise we don't know what kind of analysis to apply to our code, and as users we don't know what kind of analysis was applied to somebody else's code.

And there are many notions of safety. I've said this before, but I think it's worth showing the list again, because people tend to forget it and focus on the thing they're looking at just now in their everyday problems. And here's the type and resource safety definition: basically, every object is accessed according to the type with which it was defined. Boom. That's a nice one. And we don't have any leaks.

So what about C/C++? Well, it's not a language, but it is a style of usage — one that I happen not to like. It's probably also unavoidable, because you have to call C libraries, and then we have to think about how to call such things while keeping the guarantees; the owner annotation and not_null checking are examples of that. We can't just focus on making the new shiny version of everything.

It's too complex for static analysis: if we can write arbitrary code, we are up against dynamic linking and the halting problem. Arbitrary C++ forces us to deal with low-level abstractions. So we have to raise the abstraction level to the point where analysis is possible, and where humans can understand the code. And we care about performance and type and resource safety. So the idea is: guarantees through static analysis, coding rules to make sure that the analysis is feasible, and libraries to make it reasonable to write such code. A profile is a coherent set of guarantees, not just a set of unrelated tests. A profile is specified as a set of guarantees and then implemented with a set of rules that yields the guarantees we want.

And I've had people say: this is just too novel, it's too complicated, it's too new — what we want is something simple and new, a shiny new language that you can define in 45 pages, just like the first version of C. I don't believe that. There's a quote there: you're always supposed to not do something new, you're supposed to just put some kind of facade on something old. I don't believe this is feasible. I think we have to have an overall approach, a general approach. Each individual technique and feature has been tried before, many times — and that's good — but for specific tasks, and none of them solves all the problems. So a combined and coherent approach is necessary. And profiles have to go beyond guidelines, because we want them validated; we have to provide code annotations; and we have to deal with mixing of profiles, different sets of guarantees. In a large system, you can't imagine a million lines of code that all follow exactly the same rules.
And if you can, I'll just up the number to 10 million. So we mix profiles, and I think we have to find the tightest specification of where we can use the non-validated, non-guaranteed features. We can't just say: these hundred thousand lines are not checked. That's not good enough. It is fine to have, say, the implementation of the linked list — two pages of code — specified that way. So there's some design work going on here, still under development. I'm suggesting module-based controls — memory safety, type safety — and in-code controls: suppress type safety in this area, enforce type safety in that area. I could have a piece of code that is just too complicated or too difficult to convert, so I can't do it, but I can say: for this section here, I would like the analysis to be done. And I suggest standard profiles; type safety, range, and arithmetic would be a good initial set. And here's a summary of what it would take to do this in terms of syntax. This is work in progress; there are papers written about it that you can go and look up, and there are also talks about it. And basically, no, we're not there yet.

We've come a long, long way from classic C, from C with Classes, and from C++11. And we have to, first of all, get people who are not up to date with C++ to become up to date. That does not mean using every new feature exclusively and every neat little new thing; it means defining what you want and finding the simplest way of getting it. The Core Guidelines are an attempt at that, and we need to start standardizing profiles as described. And of course this is something that can't be done by one or two people. Maybe it could in five years, but I don't think we have five years.

So how can you help? How do we refine profiles? What profiles do we need? How do we best formalize the specification of a profile? And what can we do now? I'm dreaming of something like "profiles lite" that provides most of the guarantees of a profile but can't do the last bits because, say, the static analyzer isn't up to it yet. And what library components can we supply to simplify use? I've been using a dyn_array that doesn't have push_back, so that I can use it in a concurrent system without worrying that somebody does a push_back on another thread; maybe we need more of that. And I'm setting up a GitHub where people can put suggestions and where I'm going to put my drafts and such, so that we can create a community working on getting this kind of stuff done in a reasonable time. The GitHub is not live yet, but should become live this afternoon. Okay, thank you.

Do we have questions? Do we have time for questions, or have I run over so badly that we can't? If you have a question, please come up to the microphone; there's one over there also.

Hey, thanks a lot. You mentioned, for example, a vector without push_back in certain profiles. Do you think that a profile would eliminate member functions in some cases, or just make those member functions behave differently?

I don't quite understand the question, sorry.

You mentioned that in a certain context you might want a vector class that doesn't have push_back.

Yes.

So there could be a profile that eliminates certain member functions.

That's one way of doing it. That's not the way I imagined doing it, because, well, I did it some other way — I couldn't actually mess with our standard implementation of the vector. So, for instance, my students get a range-checked vector.
I can't do that without messing with the implementation, and there are several implementations, so I don't do that. Similarly, what I did in the case of invalidation was to use another class, which I called dyn_array — which I believe you can find in the GSL — which simply doesn't have that operation. That's the other way of doing it. And if you are not the standards committee or a vendor of a standard library, that's the way you must do it. And so I did.

Okay, a question on immutable data. You said prefer immutable data over mutable data. At the same time, famously, the string class in C++ is mutable. And I saw, you know, some discussion about an immutable string class, but it didn't really catch on.

I hear a lot about immutable string classes, but you know, I like mutating some strings some of the time, and I have not seen a coherent argument for why it would be better to have strings immutable by default. What are the benefits? What are the problems? In the absence of such an argument, we're not going to get one. Furthermore, there's so much code out there that it would almost certainly have to be a different type that was immutable. But somebody has to make a thorough, logical, and hopefully data-driven argument for why immutable strings are good. Is it an issue of correctness? Is it an issue of performance? All of that. I haven't seen that kind of argument.

Thank you. I noticed a lot of the various suggestions you're making are enforced by static analyzers or other tooling, separately from the compiler. One of the issues I've observed when setting up new C++ projects, on my own or at work, is that getting additional tooling into the build chain, into the CI, takes effort, and it's not trivial effort at production scale. Do you have any suggestions on how to make that process easier, how to make tooling integration simpler?

There are a couple of problems related to tool chains, and yes, it's hard to get something new into the tool chains. And the usual problem with C++ is that we have many of everything, never just one. It's not that C++ doesn't have a graphics system; it has many graphics systems. We have many build systems, we have many static analysis systems. So what I am hoping is that the profiles will encourage people to build static analyzers that support a specific profile, and that the build system, the compilation system, sees that this is required and uses the installed static analyzer. Now, a lot of what I'm describing can actually be put into the compiler itself, because a compiler is a static analyzer — these days quite a sophisticated one. The simplest would be the no-uninitialized-data rule; compilers already do more complicated things than that to detect whether something has been initialized or not. But I think the idea of profile annotations should help with that problem.

Thanks. So it seems that the new improvements on safety in C++ are additions to the language — new annotations, new types. Do you think there will be a time when the default is actually the safe one, and you need an addition for the unsafe one?

If you want to maintain compatibility, you have to work through extensions, and so you cannot effectively eliminate features from the language. We tried several times with deprecations and such, and even incompatible changes have caused a lot of grief. So I'm not going that way. What I actually am going for is to simplify the use of the language through the guidelines. And then, if you follow the guidelines, you have a simpler language to deal with.
You don't actually have to know the horrors of pointers to pointers to void, because the analysis will stop you from writing one. So: extend the language, then make a simpler subset of the language that you can enforce.

Thank you. Hi, you briefly mentioned that you wanted to add "write only" when you were talking about const as a language feature. You mentioned that it would have created better interfaces or something like that. So what were you thinking of as a use case for write-only, and is there an alternative we can use — anything about the write-only case?

I'm afraid I didn't understand that.

Oh — when you were mentioning const, you talked about read-only and write-only, and said it would have created better interfaces if there was a write-only like const.

The const example was just an example of how, from the very earliest days, we have tried to create a safer language, a language where it's harder to make mistakes and easier to understand. And I just threw in the historical fact that I didn't actually just want const, which is read-only; I also wanted write-only, but I couldn't get it. Okay. All right, thank you.

So do you think, like, sometimes you might want to express that a function is thread safe, or that it does I/O operations — do you think maybe under the hood you could use tag dispatching to implement the profiles you were talking about?

I have not thought about this, so this is an off-the-cuff comment. But yes, I could imagine there being a thread-safety annotation that might be part of a profile. Profiles will invariably partly be built out of other profiles. Currently in the Core Guidelines we have memory safety and type safety; I think there are three of them that would add up to type and resource safety. And I would imagine that thread safety would be part of a profile. What I really would not like to see is a total of, sort of, 50 or 100 things you can request, because then you get chaos. And if you go and look at the current static analyzers, they are very prone to giving you a free choice, and that very often doesn't add up to a coherent set of guarantees. So yes, it would be a candidate for something, but I'm not quite sure what.

Do you think that if we had compile-time reflection as part of the language, we would be able to implement most of the static analysis checks by using concepts, or concepts for functions, instead of external analyzers?

No, I don't think so. I would like to see static reflection, but I think it addresses a different set of problems.

Yeah, thank you for the talk. I think this works well when you're in a controlled environment, when you are the author of the code. But what do you think — how can you improve safety when you're dealing with libraries, you know, C and C++ libraries that pass you pointers? What can be done?

That is the problem which I refer to as mixing profiles. You can annotate the import, or the include, with something about what assumptions you may make about what you're importing, and that's about all you can do. You can handle subsetting of profiles, but arbitrary and disjoint sets of guarantees are just too hard to deal with mechanically. So we'll have to have some annotations, and have some humans actually try to understand and see what they're dealing with. And the problem will not go away for decades. Basically, we have to know it's a problem.
We have to try and deal with it as best we can, but we can't eliminate it.

Thank you. Hi. So I noticed that you mentioned concepts in your talk, and we try to use them in our libraries at work. I guess the problem we have is that it makes the code more complex and harder to understand for other people when they try to maintain it, and we haven't found any actual benefit of using concepts. So I wonder if you can elaborate on the benefits, and on some resources or guidelines for using them correctly.

Sure. I mean, I have used them successfully in code that is not, sort of, computer-science code — especially communications software — and we found them very helpful. One thing to avoid is people using raw requires clauses — requires requires and that sort of low-level use of requirements. That creates a filthy mess. I don't know if that's what you did, but that's one thing I have seen failing. People think they're using concepts; they're not — they're using the assembly code for creating concepts, and, well, it's as successful as assembler. And it takes some experience to build proper concepts. I've seen people try to build concepts on the idea that they have to be the absolute minimal constraints for everything; again, you get far too many concepts and you can't remember them. When I write about concepts, or speak about concepts, I say that concepts have to be designed to encourage composition. That is, they have to be sufficiently high level that a concept can be used in many places, in many algorithms. So if people build a less-than concept, and somebody else uses a less-than-and-equals concept, you are on the way to something chaotic; whereas if you build an ordering concept, as there is in the standard, with all four operations, you are on the way to something better. It may be that your problem area has something that we haven't addressed, but in most cases where I have seen problems like that, it has been immaturity of use. It's like when people start writing code: they don't write good code, they make beginner's mistakes. And I hope you don't think I'm rude when I suggest that that might be a possibility. If you analyze your stuff and can be precise about what it was that didn't seem to work, I would like to hear about it.

Yeah, thank you. And I think this is it. Thank you.