Graceful Shutdown
Let's picture a very realistic scenario. You're in the middle of processing a critical payment transaction and suddenly your server needs to restart for a deployment: someone pushed new code to production. Of course, we have techniques like zero-downtime deployment that make sure the existing server doesn't go down before the new server, the one running the new code, comes up and is ready to receive traffic. Those mechanisms are there. But at some point, when the new server is ready to go online, the old server has to shut down. It has to stop receiving traffic so the transition to the new server can happen. That critical moment is what we're talking about, and you're in the middle of a transaction. Say it's an e-commerce transaction: you're buying something from Amazon or Flipkart, and their server needs to restart for a deployment. What exactly happens to that payment? Does it get lost in the digital world? Do you get charged twice because of some race condition? As a backend engineer, you have to think about all of these scenarios.

This is not a new problem. It has existed since the start of servers and backends, and of course we already have a solution: graceful shutdown. It is exactly what it sounds like. We want to stop our server gracefully, not abruptly. That's the whole idea, and there are some surrounding concepts to understand so your foundations are clear on why we do it.

If we want to oversimplify what graceful shutdown means, we can say we want to teach our backend good manners: it cannot stop abruptly in the middle of a transition to a new deployment, and it cannot just slam the door when it's time to leave. Instead, your application politely finishes whatever it is doing, wraps up its ongoing conversations, says goodbye to all the guests, cleans up after itself, and only then closes the door. That's what we mean by good manners. When you have guests over and it's 9:00 and time to sleep, you don't shove them out and slam the door in their faces; there are steps to follow. It's similar here. In this video, we'll talk about the art and the science of giving your backend those good manners. Caring about graceful shutdown gives your application a much better user experience and avoids issues like the data corruption we just talked about: if you're in the middle of a payment transaction, it helps you avoid double-charging the customer, losing the transaction, processing refunds, and so on. Let's
start with the first concept, which is process lifecycle management. We're talking about this because your backend is an application that runs as a process on some server, some computer. Every application runs inside a process; that's the important bit. If you're familiar with operating system concepts, this will already make sense. If not, no big deal, just learn the term: everything that runs in an operating system runs as a process. And like all living things, every process has a life cycle: when it starts, how it starts, when it ends, and how it ends. In a way, a process is born when it starts, it lives while it's executing, and it dies when it's terminated. That whole arc is called the life cycle of a process.

To implement, or even understand, graceful shutdown, we have to understand this life cycle, because it's closely connected to how graceful shutdown is implemented. So here is our OS, the operating system where our process, our application, is running. When the operating system decides it's time for your application to stop, it does not just pull the plug and kill the process, right? It follows an established protocol of communication: a defined way of telling the process "it's time for you to stop, and these are the steps we'll go through before stopping you." You can imagine it as a conversation between your operating system and your application. The OS sends a message like "hey, it's time for you to stop," and your application replies "okay, give me a few minutes, or realistically a few seconds, then I'll stop myself, or you can stop me." Of course, this conversation doesn't happen through text; we're talking about programs here, and they don't understand text. The whole communication between your application and your operating system happens through a concept called signals. Now
signals are an important concept in Unix operating systems. By Unix we mean all the Linux distributions you may be familiar with, Arch Linux, Ubuntu, and so on, and also macOS, which originated from a Unix kernel. And when we talk about servers, we almost always mean Linux: 99% of the time, when you deploy your application with some cloud provider, it runs on a Linux operating system. You'll rarely see Windows outside of specialized use cases like Windows Server; mostly we use Linux-based operating systems for deploying our servers. Now, Unix operating systems have this concept called signals, which is used for IPC. If you're a computer science student, you'll know this term: inter-process communication. Simply speaking, it's a technique through which two processes can communicate with each other over an established protocol that you don't have to worry about.

The way it works is this: your application is running inside a process, and it registers some handlers. What do we mean by handlers? Handlers are basically code that runs continuously behind the scenes, waiting for a signal, one of those operating-system messages for communicating between two processes, to arrive. Through these handlers, your application is essentially telling the operating system: "when you want me to stop, send me this specific message, and I'll handle it appropriately; I'll stop myself using predefined steps." Of course, the OS can't just say "stop" in plain text, which is only human-readable; it has to be a specific kind of message, and we'll talk about what kind. Let's look at the signals themselves.
There are two signals to meet first: SIGTERM and SIGKILL. The SIG part simply means "signal"; the rest names the action, one for terminating, one for killing. We'll get to the difference between them shortly.

Let's start with SIGTERM, where, as you'd guess, SIG means signal and TERM means terminate. SIGTERM is the polite way for your operating system to ask your application to shut down. It's not extreme; it's a gentle nudge. Imagine you're standing somewhere and someone comes up behind you and taps you on the shoulder: SIGTERM is something like that. Whenever your operating system sends a SIGTERM to your application, it means "hey, excuse me, could you please finish up and leave?" It's a gentle request, and because of that, your application has an opportunity to complete whatever it is already doing. It doesn't have to leave that very moment; it has a window, a few seconds (we'll talk about how big that window is), to finish its work.

And what might it be doing? Since we're talking about an HTTP backend, it's probably processing requests; that's the primary thing a backend does. Your clients, whether a mobile app, a web app, a Chrome extension, whatever, send HTTP requests, and your backend processes them and returns responses. At a random point in time your backend might be processing, say, 10 or 12 requests, or if your application is big enough, maybe 500 or 600 concurrently. So when it gets a SIGTERM, it's time to finish processing those. Let's write down what it does. First, finish existing requests. Second, clean up resources (we'll talk about what cleanup actually means for your backend). Third, exit. We'll go deep into steps one and two a bit later, but for now let's keep it high level.

So we understand that SIGTERM is a gentle request from your operating system to your backend: finish whatever you're doing, clean up your resources, and leave gracefully. Who exactly sends this signal? Mostly deployment systems, process managers, and orchestration platforms like Kubernetes; basically, any system you've set up to manage your processes.
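The three steps above can be sketched as a skeleton; these are purely hypothetical function names standing in for the real work:

```go
package main

import "fmt"

// Hypothetical stand-ins for the real work: a real backend would
// drain its HTTP listener and close DB/cache connections here.
func finishExistingRequests() string { return "finish existing requests" }
func cleanupResources() string       { return "clean up resources" }

// onSigterm performs the three steps, in order, once SIGTERM arrives,
// and reports what it did.
func onSigterm() []string {
	return []string{
		finishExistingRequests(), // 1. let in-flight work complete
		cleanupResources(),       // 2. release files, sockets, DB handles
		"exit",                   // 3. only now does the process stop
	}
}

func main() {
	for i, step := range onSigterm() {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

In a real backend, step 1 would drain the HTTP listener and step 2 would close database and cache connections; we'll see a realistic version of this later.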
Concretely, that can be Kubernetes, or something like systemd, or PM2 if you're familiar with that process manager. These systems use SIGTERM to properly let your application finish what it's doing, clean up, and leave gracefully.

I almost forgot another important signal: SIGINT. As before, the first part means signal, and the second part, INT, stands for interrupt. When does this one happen? If you're a developer, you're probably already using it: the most famous use case of SIGINT is Ctrl+C. If you've ever worked with command-line or terminal-based applications, you've done this; some process or task is running, you want to stop it right then, so you press Ctrl+C and the process stops. For example, here I'm running a backend process locally, a Go-based backend; it's up, ready to accept requests, and serving any client that sends them. If I want to close it, I just press Ctrl+C. You can see we've implemented graceful shutdown for this application, which is why we're logging everything that happens in the app. For now, focus on the part of the log that says a signal has been received and the signal type is interrupt. As we saw, this requires a user, a developer, pressing a key combination (here, Ctrl+C) on a keyboard. So SIGINT is mostly used in development environments, by us developers; in process-to-process communication it's normally not used, since it implies a key press. That's why it's also called a user-initiated shutdown.

In pretty much all cases, you want to handle SIGINT the same way you handle SIGTERM. If you think about it, that makes sense. It doesn't matter whether your backend is running locally in development and you stop it with Ctrl+C, or it's running at your cloud provider, say inside an AWS EC2 instance managed by the PM2 process manager, where, when it's time to stop your application, PM2 (triggered programmatically or manually) sends your backend a SIGTERM. As I said, SIGINT is used by developers because it requires a key press, and SIGTERM is used by programs. In both cases we want to handle the shutdown the same way. It doesn't matter whether a human or a program initiates it; what matters is the intention: we want to shut down in a clean, graceful way.
Now for the final signal, the one worth writing in red: SIGKILL. As before, the first part means signal, and the second part is the actual action, kill. It is exactly what it sounds like: we want to kill the application instantly. The interesting thing about this signal is that it cannot be caught, and it cannot be ignored. "Cannot be caught" means our application cannot register a handler that performs cleanup tasks when SIGKILL arrives; the application is simply not given that capability, so it can't detect the signal at all. "Cannot be ignored" means you also don't get to say "well, since I couldn't detect it, I'll just ignore it and keep running." That doesn't happen either. If your application is sent a SIGKILL, it cannot detect it, and it stops at that very moment. Nothing else happens; it just stops. That's why it's called the kill signal.

You can think of SIGKILL as the nuclear option. It's the equivalent of skipping the shutdown button in your system menu and instead walking over to the wall and pulling the power plug: your computer just dies. That is exactly how this signal works, and it's why graceful shutdown is an important concept. If you don't respect the polite signals, and by polite signals I mean SIGTERM and SIGINT, the ones that let you finish what you're doing, clean up, and exit gracefully, then eventually you will receive a SIGKILL, and you'll be forced to stop without any opportunity to clean up after yourself.

Now that we have a high-level idea of the three kinds of signals, the polite ones and the impolite one, and what exactly
happens. Let's talk about the two important things we mentioned briefly: what it means to finish existing requests, and what it means to clean up resources. These are the two important steps of a graceful shutdown.

The first important part of a graceful shutdown is handling in-flight requests. What do we mean by in-flight requests? Your HTTP server can cater to, or process, multiple requests at the same time; concurrently, basically. So when it's time to stop your backend, it's quite possible that it's already processing a bunch of requests: 10 or 12, or hundreds, or thousands, depending on the scale of your server. Those requests, the ones already being processed at that particular moment, are the in-flight requests.

To understand this, imagine a restaurant. Say you've gone there with your friends and, for whatever reason, the restaurant has to close; maybe it's simply closing time, 10:30 or 11 at night. The owners can't just come over, turn off all the lights, and throw you out of the restaurant, right? Not a good idea. Instead, first they ask someone at the reception or the gate to stop letting new customers in. That's the first step: no new customers to deal with, no one new to turn away. Second, they announce to all the existing customers, the people already having their meal, that it's time to close up: "you have, say, 15 or 20 minutes; take your time, finish your food, pay your bill (and your tips, of course), and then please head out."

That's exactly how it plays out for our backend, and we call the process connection draining. When your application receives a shutdown signal, whether a SIGTERM sent by some process or a SIGINT from a developer's Ctrl+C, the first thing it has to do is stop accepting new connections, just like the restaurant stops letting new customers in; otherwise the whole situation only gets messier and harder to deal with. So: stop accepting new connections and new requests from any client, then deal with the existing connections, the requests already being processed by the server, and let them finish as soon as possible.

The implementation of connection draining will obviously differ depending on the architecture of the application. For HTTP servers, our core backend, it means stop accepting new HTTP requests from clients and allow the in-flight requests to finish. For a database application (and as we discussed in previous videos, a database can also be thought of as a backend; just not the HTTP kind, but still an application), it means finish all the existing queries and transactions and stop taking new ones into execution before closing the connection. For WebSocket connections, it means first notifying the clients that you're closing, then closing the socket; you can't just slam the socket shut. So the technical steps of implementing graceful shutdown differ across HTTP backends, databases, and WebSockets, but the high-level idea is the same three-step process: stop accepting new connections and requests, finish the existing ones, and then close up.

Now, the challenge with connection draining is timing, because you want to give the existing connections enough
time to complete their work, but you can't wait as long as they might need; there has to be some limit. So most production backend systems implement a timeout mechanism: for example 30 seconds, or maybe 60. It's up to you, though 30 seconds is the most common choice. That's the maximum duration your system will wait. In effect it says: "you get 30 seconds to finish whatever requests you're processing." And most of the time that's fine: if you're no longer accepting new requests or connections, 30 seconds should be more than enough to finish the existing ones. But if, because of some blocking operation, you can't finish within that window, you get forcefully stopped. That's the backup plan. We can't let the backend take as long as it wants to process everything; there has to be a limit, and the timeout is that hard limit: you have exactly this much time to finish what you're doing.

Choosing this timeout also creates an interesting design consideration, because the question is: how long exactly should you wait? If it's too short, you risk interrupting legitimate operations. If it's too long, the whole shutdown process becomes sluggish, which eventually hurts your deployment speed and your system's responsiveness.
So the right timeout depends on your application's typical request duration and your operational requirements. There's no hard and fast rule that it must be 30 or 60 seconds; you have to understand your system and the kinds of requests you process. For a traditional backend, 30 seconds should be more than enough; for WebSockets or other, more complicated architectures, you have to understand your system and choose a timeout that suits it.

Connection draining also requires some coordination between your load balancers and your service discovery systems, which means it has to work with your health checks and with registering and deregistering from service discovery. This is slightly more advanced stuff. Service discovery, briefly, is the part of the workflow that handles how a set of deployed applications, say your backend, your database, your Elasticsearch instance, find and communicate with each other after deployment; that's the responsibility of your service discovery tool or mechanism.

Anyway, moving on: the second thing we talked about is cleanup, resource cleanup. If you're working at your desk and it's time to leave the house, or time to go to
sleep. Before you leave your desk, you do some kind of cleanup, right? If you've been drinking coffee, you take the cup to the sink; maybe you sort out some cables. We all have some tiny cleanup ritual before leaving our desks. It's the same in the context of our backend application. When we say resources, we mean things like file handles, network connections, database connections, temporary files, caches, or any other system resources the application acquired during its execution. It has to let go of them.

For example, while your backend was running, it accessed a particular location in the file system. The way file system access works for a program is that you ask the operating system, and it provides you a handle to that file. There's a whole protocol for how a process accesses the underlying file system, but at a high level, you get a handle, and you have to let go of that handle, clean it up, at some point. Otherwise those handles accumulate and keep consuming more and more memory; you keep eating RAM, and at some point you run out of it. So cleaning up your file handles is important.

The most common kind of resource cleanup, though, is cleaning up network connections. Our operating system is the mediator: your application, your backend, is here, and the internet is over there. You receive all kinds of requests, but every request goes through your operating system before it reaches your application; the OS is the actual driver that receives requests from your network card and hands them to you. So it naturally knows about every network connection you have, and operating systems typically limit the number of file handles, as I already said, and likewise network connections, that a single process can have open simultaneously. The same way unreleased file handles eat your memory, if you don't give up network connections after you're done with them, you'll eventually run out of resources or face performance issues.

And if we're talking about database connections: before your application shuts down, before your backend process is stopped, the database transactions your backend was dealing with have to be either committed or rolled back explicitly by your application. If you don't do that, those transactions can end up in an inconsistent state, which can lead to things like deadlocks, data corruption, and all kinds of other issues. These are the things we mean by resource cleanup: file handles, database connections, network connections, and so on. And one thing to keep in mind is that when we
are cleaning up our resources in the graceful shutdown workflow, we want to release them in the reverse order of the way we acquired them. Say you established a Redis connection first, then your DB connection, and so on: when giving up those resources, go in the reverse order. Why? Because it prevents situations where we tear down a resource that a later operation, a later-acquired resource, still depends on. That's the reason we clean up resources in reverse acquisition order, and it's another thing to keep in mind when implementing your graceful shutdown.

I know we typically avoid looking at code in this series. But just to give you some context, and you don't have to understand this code, I want to show a practical, realistic example so we don't end up with a hollow understanding of the graceful shutdown procedure.
We'll quickly go through everything we just discussed, this time looking at the code. As I said, the first thing we do is register a handler that waits for signals, and that's what's happening here. Again, this is Go code; you don't have to understand it, just follow the narrative. At this step we register a handler, using a Go concept called a context, and wait for interrupts from our operating system. If we receive one, we call this function: the graceful shutdown function.

And what does "gracefully shut down" mean here? Let's go inside. The first thing we do is shut down the HTTP server. Since this is a backend application, the HTTP engine is its core. We call a method provided by the framework; typically, whatever framework or library you use for your HTTP server provides a function which, when called, makes the library internally stop accepting additional connections and clean up and finish the existing ones. When that function returns, we close our database. Same idea: the database client stops taking additional queries and transactions, finishes the existing ones, and gives up whatever handles and connections it holds.

As you already know, our backend and our database connect over TCP. Picture the app, our backend, on one side and the database on the other; the only way they can communicate is through an active TCP connection. And when we talk about database connection pooling, we keep a number of these connections active and use a particular connection from the pool whenever we want to talk to the database. So when we close the database connections, the client first stops issuing new queries over those paths, finishes whatever queries and transactions are already in progress, and then closes the paths one by one. That is exactly what we mean by cleaning up our database resources.

In the end, we also clean up our background job processing server, which uses Redis; internally, that function closes our Redis connection too. After all of this, we have successfully, gracefully shut down. Let me show it: I'll clear the terminal, start our server with task run, and press Ctrl+C, which sends an interrupt signal to our backend so it can initiate the graceful
shutdown procedure. So I pressed Ctrl+C, and it took a moment. Since there are no in-flight requests (this is a backend running locally), it was able to shut down almost instantly, but we still saw it take around one second to shut down properly. If you look at the logs: on startup, we connected to the database, started our background job server, and finally started our HTTP server; up to this point, those were our start logs. Then, at the point where we pressed Ctrl+C, the first thing we did was close our database connection, and then we stopped our background job processing server. All of those logs come from asynq, the background job processing library we're using: when it received the shutdown signal, when we called the method that shuts down all its Redis connections, it logged messages like "starting graceful shutdown", "waiting for all workers to finish", "all workers have finished", and "exiting". And finally, we log that the server has exited properly. That's how we stop our server.

This whole workflow is the process life cycle: the starting phase, the running phase, and the stopping phase; and that last part is the graceful shutdown procedure. It matters especially when we're deploying our applications, when we don't want to risk corrupting in-flight requests or any of our workflows, and when we want to give our users a delightful experience. That's pretty much all you need to understand about graceful shutdown. You don't really have to understand the code, because you'll most likely be using some framework, in Node.js, Go, Rust, or Python, and most frameworks have this code readily available for you to copy and adapt for your graceful shutdown workflow. But it is important to understand what exactly happens during a graceful shutdown and why it matters. That's why we talked about it at such length in this video.