
Concurrency & Parallelism: IO Bound vs CPU Bound

Every backend system that you will ever build has one common requirement: it needs to be able to handle multiple things at once. By a backend application, in this context, we mean an HTTP-based web server. If your server can only process

one request at a time, then all the other, say, thousand users who are also trying to send their requests either have to wait or will get an error that the server is busy processing one request. That is of course an impossible situation, and it will not work in an actual production application where we have thousands of users. So understanding how our servers, and in turn our operating system, can do multiple things at once, and all the

different ways that different programming languages and runtimes like Node.js, Python, Rust, or Go let you handle this, or let you express this requirement of handling multiple things at once, plays a very important role in forming your mental model while you're building your application. This particular video has nothing to do with any concrete technique, tips, or actionable items that you can

just watch and go implement. But it is important in the sense that it helps you form a mental model of how your backend processes all the requests coming from your clients and how it can perform multiple tasks. Having that level of understanding will help you debug things, structure your application a little better, and make the bigger architectural decisions down the line. Most of us, when we are learning to create these backend systems, we

learn from our respective programming languages' APIs or documentation that keywords like async and await will help us make things concurrent, and that threads help us parallelize our executions. But we don't really understand what these keywords actually do behind the scenes, and how they help us on a mechanical

level. We know that they help us, it works, and we can go a long way without understanding the mechanics. But it's always good to understand how they work, to get one step ahead in understanding your systems. So in this video, what I want to do is try to give a clear mental model of how things work from the ground up, starting from the root level all the way to your application code. Okay. So with

that premise, we can start. One particular scenario that you will face in your day-to-day life as a backend engineer is dealing with requests, because of course that's where all of it starts. So let's start from there. We have a browser, a single client: a user using Chrome to make a request to our server. This is our backend server, and we have the routing layer, service layer, handler layer, repository layer that you already know. But the gist is, we receive a request from our client and

we need to process it. And what do we mean by processing it? We need to make some kind of database interaction. This is our database. We need to make a query, wait for the response while our database does its own processing, and send the response back. That's what a typical API call looks like. Now, let's talk about this time: the time between sending a query to our database and the response coming back, the time we spend waiting while the database processes the request. On a local network, while

you are running your server on localhost, this processing time might take somewhere around 1 to 2 milliseconds, assuming it's a very simple database query. Going further with that logic, let's say we have deployed to production and our database is in a different availability zone, so the same query might take somewhere around 20 to 30 milliseconds. And if our database is in a different region, then the same query might take somewhere around 90 to 100 milliseconds, if it's a very far region. Now my question is: while the

server is waiting this time, whether 2 milliseconds or 30 milliseconds or 100 milliseconds, what is our server doing? That is the question we are trying to answer in this video. Going by our naive approach, assuming our server can handle only one request at a time, which was the premise we established at the start of this video: if our server is processing synchronously, meaning line-by-line execution without any kind of

concurrency, then during these 2 or 30 or 100 milliseconds our server is doing nothing. The CPU of our server is completely idle, as we say in technical terms, while it waits for our database's network packets to arrive. Right? So let's try to quantify the amount of waste we cause by keeping our server's CPU idle. A modern

CPU, the kind of CPU we have access to today, can execute, roughly speaking, around 3 billion instructions per second, which means 3 million instructions per millisecond. So if our server sits idle for 100 milliseconds straight while waiting for the response from our database, it could have executed 300 million instructions.
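The video doesn't show code for this, but the arithmetic can be sketched in a few lines of Python. The 3-billion-instructions-per-second figure is the same rough simplification used above (it ignores pipelining, superscalar execution, cache stalls, and so on), not a measurement of any real CPU:

```python
# Rough simplification: a ~3 GHz core retiring ~3 billion instructions/sec.
INSTRUCTIONS_PER_SECOND = 3_000_000_000
INSTRUCTIONS_PER_MS = INSTRUCTIONS_PER_SECOND // 1_000  # 3 million per millisecond

def instructions_wasted(io_wait_ms: int) -> int:
    """Instructions a single idle core could have executed
    while the server blocks on one IO call."""
    return io_wait_ms * INSTRUCTIONS_PER_MS

print(instructions_wasted(2))    # localhost query:      6,000,000
print(instructions_wasted(30))   # other AZ:            90,000,000
print(instructions_wasted(100))  # far region:         300,000,000
```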

300 million instructions it could have executed, but instead, since we are only processing one request at a time, how many instructions did it end up executing? Zero. That is the amount of waste you cause by not making your program concurrent, by not doing multiple things at once. Now let's quantify it even further. A typical mid-to-complex-level API call will involve three to five database queries. It might also call one or two external services, like sending emails or

interacting with Redis to read from or write to the cache. So in total, let's say there are five operations involving external services, whether it's the database, an email provider, or Redis. The API call involved five different network operations, and if on average it spent 50 milliseconds waiting for each network response, then in total it spent around 250 milliseconds

waiting for network responses, or in other words, waiting for IO: input/output. All these operations, whether we are dealing with the network, with files, or with standard input and output (like input from a keyboard or showing output on a display), are categorized as input/output operations. But this is only the time your server spent waiting for the responses of the different network calls. Your server also did some amount of CPU-level processing, right?
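To see that waiting in action, here is a small illustrative Python sketch. The five 50 ms `time.sleep` calls are stand-ins for the five network operations from the example above, not real database or Redis calls:

```python
import time

def fake_network_call(wait_ms: float) -> None:
    # Stand-in for a DB query / cache lookup / email API call:
    # the thread blocks here and the CPU does no useful work for us.
    time.sleep(wait_ms / 1000)

def handle_request() -> float:
    """Simulate one API call that makes five external calls of ~50 ms each."""
    start = time.monotonic()
    for _ in range(5):
        fake_network_call(50)
    return (time.monotonic() - start) * 1000  # elapsed wall time in ms

elapsed = handle_request()
print(f"request took ~{elapsed:.0f} ms, almost all of it waiting on IO")
```

Almost the entire ~250 ms elapses with the thread blocked; a synchronous server built like this can serve nothing else in the meantime.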

And that was only 10 milliseconds. It only used the CPU for 10 milliseconds, but for the other 250 milliseconds it just waited for the responses of the different network calls. Which means your server's hardware resources, CPU and memory, were idle roughly 95% of the time. You were not using your computer's hardware 95% of the time, and this is exactly the problem concurrency

tries to solve while we are building our backend applications. If you notice, most of the time, or more than half of the time, our program either waits for the database response, or waits for external API calls (email, cache, etc.), or waits for the file system if it is dealing with temporary files, file uploads, file downloads, and so on. All of this is what we

call IO. More than 70% of the time in a typical backend application is spent on IO. And what is the use of concurrency? Using this mechanism called concurrency, which, to oversimplify, means doing multiple things at once, we are able to make use of our CPU and memory to do other things while we are waiting on the network. So for the 70% of the time when we are waiting for the responses of the

network calls, we say: while we wait for these network calls, you can use the CPU and the memory for other things. We'll talk about what those other things are, but that's the problem concurrency tries to solve. That's one piece of terminology out of the way. Now let's tackle the other one, parallelism, which is often confused with concurrency, and for good reason. So on one side we have parallelism, and on the other side concurrency. What does parallelism mean? It is exactly what it

sounds like. We say that a program is parallel when it is able to execute multiple instructions at the same moment, which means that to achieve parallelism we need hardware-level support. If you say that your program is executing instructions in parallel, it needs at least two CPU cores, so that at the same moment it can execute two different instructions, because one CPU core can only execute one

instruction at one moment. That is the constraint we have. It is not really a limitation, and you will not see it as one once we are done with this video. But that's the constraint: one CPU core can only execute one instruction at one moment in time. So to achieve parallelism you need hardware-level support, meaning at least two CPU cores executing two different instructions at the same time. That's what we mean by parallelism. It is very intuitive to understand. Now concurrency

is a little tricky. Concurrency also means that we are doing multiple things at once. But the difference is, we can achieve concurrency with only one CPU core, because concurrency is primarily about structuring our program. It is about creating our program in a way that we can start, pause, and resume multiple executions at any moment, so that it feels like we are doing multiple things, even though at any one moment we are only using one CPU

core to execute one instruction. Since our operating system or programming language runtime enables us to structure our programs this way, one piece of work can start here, then step aside while something else processes, and then come back. So it is all about structure; that is the difference between parallelism and concurrency. We'll talk more about this and try to clear up as much confusion as possible. But

that is the most simplified definition of parallelism and concurrency. To express it one way: parallelism is about doing multiple things at once, while concurrency is about dealing with multiple things at once. Even though we are doing one thing at a time, we are dealing with multiple things at once. Let's visualize it. Say this is our timeline and these are our millisecond marks: 10,

20, 30, 40, 50, and 60. This is how time passes, in one direction. Now we have two different requests, request A and request B, which, let's say, arrived at the same time. This is the timeline of request A and this is the timeline of request B. The moment request A arrived at our server, we started processing it. So we did, let's say, some

validation here, some routing logic, some JSON deserialization, and so on. All of this requires CPU work. So for the first 5 milliseconds (let's say this is the 5-millisecond mark), we used our CPU. Let's imagine for this example that we have only one CPU core. We used our CPU for the first 5 milliseconds, and meanwhile request B was waiting. After this, we wanted to make a database query; of course, in most API calls we have to

make some amount of database interaction. So after 5 milliseconds of processing, we needed a response from our database, and let's say that took until around the 40-millisecond mark. This is the time we are waiting for our database's response. Now, after those first 5 milliseconds, we don't need the CPU anymore. The moment we leave the CPU, because we are waiting for an IO response, a network response, that's when

our operating system scheduler, or a programming language runtime, whatever is handling this scheduling, takes request B and gives it access to the CPU so that it can start processing. So from the 5-millisecond point, request B got access to the CPU. Let's say it needed the CPU until the 15-millisecond mark, since it had a lot of processing: some validation, some long recursion or loop, whatever. So it used its CPU slice until the 15-millisecond mark, and

after that it also had to make some kind of database interaction, so it too started waiting on IO, until the 50-millisecond mark, say. This is the time it is waiting for the database to respond. Now, at 40 milliseconds, our database returned the response for request A. But it's not as if the CPU is handed back to request A the instant the database responds; there will be some scheduling logic, some priority-based logic. So up to this point,

even though our database was done, request A did not yet get the CPU. At exactly the 50-millisecond point, request A gets the CPU back and again does some processing using the CPU; let's say it filters out some values. Now it is done, and it just responds back to our client. In the same way, after request A is done, the database response for request B has also returned, and it too does some processing and returns its response. Now, the point I want to bring your

attention to is: while request A was given access to the CPU, request B was waiting. It did not get any CPU access, and only after request A was paused, or rather suspended because it was waiting on IO, a network response, was request B given access to the CPU. Only then did it start using the CPU for any processing. That's the point I wanted to make: even though we had one core, we still did not waste

any CPU. Of course, we have oversimplified this with two requests. But at any given time, our server is not just dealing with requests. It's also trying to log messages to standard output, which is also an IO operation. It is trying to send telemetry data, which means external API calls. It is trying to run background jobs: hundreds of different things. You can imagine this CPU access being divided across all these demands

for CPU. The requests, or any tasks that require some network or IO response, just wait, while the CPU jumps around doing the processing work; the moment a particular task or job or process, whatever you want to call it, starts requiring an IO response, the CPU jumps away and goes to some other one. This is how concurrency works, and this is how we don't waste our CPU cycles. Now, since we

had only one core: even though, from a client's perspective, both of these requests are currently processing (request B has not ended yet, it has not thrown any error; from a high-level perspective both of them are in progress), the moment you zoom in and look at one time slice, one time window, only one request is

being processed by the CPU. Even though both of them are in progress, even though we are dealing with multiple things, at any single point in time we are only doing one. That is exactly the definition of concurrency. But if we were to imagine what parallelism looks like compared to concurrency, then this picture changes. Request A arrived and got the CPU, and then request B arrived. Now imagine, for the sake of the parallelism example, that we have two cores. The moment request B arrives, since we have another core available, request B also gets a CPU slice and starts processing. So

in the case of parallelism too, both of the requests are in progress. But the difference is, in this particular time window we are not doing just one thing; we are doing both of these things in the same time window. That's what parallelism looks like. Now, coming back to the practical question: why does it matter? Why do we need to understand this concept of IO and CPU, of what is IO bound and what is CPU bound? Because, as we have already said, most of

the things that happen in a request's life cycle, in a typical API call, are IO bound. That means things that are waiting for some external resource rather than doing actual computation. The only time we use the CPU is when we are doing actual computation, when we are executing instructions on a CPU core. The rest, things like interacting with the database, the file system, logging, standard input and output, everything else

requires IO: sending the request and waiting for the response to come back. So we call those kinds of things IO bound. And when we are actually doing computation, let's say you are validating, so you are actually going through the JSON bytes, checking values, comparing things, adding things, then you are doing actual computation, using CPU cycles; we call that CPU bound. That is the difference between IO-bound and CPU-bound, and most of our backend applications are IO bound.

For IO-bound tasks, one thing that is mandatory is concurrency; otherwise, as we have already said, the system cannot function. If it only processes one request at a time, you're wasting 95% of your resources, and that's just not reasonable, because you want to handle as many requests simultaneously as possible, and while doing that, you want to keep the CPU busy with other requests' actual computing. So it does not matter whether you have one CPU core or eight: the bottleneck is always going to be IO when we are dealing with backend applications. But there are some kinds of

workloads that we call CPU bound. The things I already mentioned, like validation or JSON deserialization, are CPU bound, but not heavily so. You will barely notice them; they'll mostly finish in 1 or 2 milliseconds, because our CPUs are that fast. But some heavy CPU-bound operations are things like image processing, because in image processing and graphics there is a lot of matrix multiplication and heavy computation that uses many CPU cycles. In the same way, if you are dealing with

cryptography, say in our authentication middleware we are verifying a JWT token, then we are dealing with encryption and very low-level computation. That is also CPU bound, because it uses a lot of CPU cycles to crunch numbers. So for CPU-bound tasks, what is more beneficial is parallelism, because you want to finish executing all those instructions as fast as possible, and parallelism, using multiple

CPU cores to finish these executions, is always faster than concurrency, where you only use the CPU for one instruction at a time. In your backend system, of course, you need both. You need concurrency so that you can handle your IO-bound workloads: multiple database connections, external API calls, the file system, logging, background jobs, and so on. But you also need parallelism so that you can do all the heavy, computation-based

operations as fast as possible. But one thing to remember: most backend applications are mostly IO bound. They are not CPU bound unless you are doing niche work like video encoding or heavy encryption. Now let's talk about how computers actually do multiple different things at once. What are the technical elements that make this possible? Fundamentally, there are really only two ways: the first is threads,

and the second is event loops. Every concurrency feature that you encounter in any of your programming languages, if you work with Go it's goroutines, if you work with JavaScript or TypeScript on Node.js it's async/await, if you work with Java it's virtual threads, all of these concurrency primitives in the languages we work with day to day build on these two mechanisms. It's not that they use threads directly, but the primitives that have

been built are either built on top of threads or on the concept of event loops. We'll talk about what these two mean; that's what it all boils down to. So let's talk about threads first. To simplify, a thread can be considered an independent piece of execution managed by your operating system. We are not talking in the context of programming languages yet. A thread is a feature provided by your operating system to run a single, independent piece of execution on your CPU. So when you create a thread, a piece of

logic, a piece of program that will run independently on top of your operating system, the OS allocates a couple of things. First is a stack. What do we use the stack for? For keeping track of our function calls. For example, if the program starts with a main function, and main calls another function like getUsers, we keep pushing function calls, in the order they were called, onto the stack. In the same way, we also

keep our local variables there. If we have a main function and in it we write let a = 3, then the memory for that variable lives on the stack. Apart from the stack, the OS also maintains an instruction pointer. What is the instruction pointer for? To keep track of where exactly we are in our execution, where we are in the code, so that the thread can come back to that point after a context switch. Now one

critical property, one behavior of threads, and it is important because this is where threads differ from other models of concurrency like event loops, is the scheduler: the piece of logic that decides, if we have multiple threads, say 1, 2, 3, which thread stops, which thread starts, and which thread blocks or resumes. All of this is decided by a scheduler, and the important property is that this scheduler is controlled by our operating system. That matters because in

the later part of this video we'll discuss how certain programming language runtimes also have a scheduler of their own, on top of the operating system scheduler, to make things more efficient and more lightweight. But traditionally speaking, threads are scheduled by a scheduler controlled by our operating system. So what does the scheduler actually do? It gives each thread a time slice. For what? For some amount of processing on our CPU. So each thread is given a time slice of a

particular amount of CPU processing time, usually measured in milliseconds (it might be less, but we measure it in milliseconds). Let's say the scheduler assigns around 2 milliseconds to each thread. Then after 2 milliseconds it will pause the running thread, save its state (like the last instruction it was executing), and switch to another thread to start processing it, and after another 2 milliseconds it will switch to thread 3.
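As a rough illustration of this forced switching, the Python sketch below starts two threads that never sleep or yield voluntarily; both still make progress and finish, because the scheduler interleaves them on its own. (In CPython, the interpreter's own switch interval also drives the interleaving, but the idea is the same.)

```python
import threading

progress = {"t1": 0, "t2": 0}

def busy_work(name: str) -> None:
    # A pure CPU loop: it never blocks or yields on its own.
    # It only ever pauses because the scheduler stops it when
    # its time slice runs out, then resumes it later.
    for _ in range(1_000_000):
        progress[name] += 1

threads = [threading.Thread(target=busy_work, args=(n,)) for n in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(progress)  # both loops completed despite competing for the CPU
```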

So that's how a basic scheduling algorithm works. We will not go into operating system scheduling algorithms here; it's a huge topic, and people have written books about it. If you are interested in learning more about threads, OS scheduling, and everything around them, you can read a book. One of the most popular is Operating Systems: Three Easy Pieces, a very good book to give you that intuition of how our operating systems actually work. Then you can make

more sense of all the abstractions in our programming languages' concurrency. But that's optional; it's your choice whether you want to go down that path. One more thing: this type of scheduling, where after a given time the scheduler pauses a thread and starts executing another, is called preemptive scheduling, because threads, or executions, are preempted, stopped before they're finished, whether they like it or not. It's not in their control. So, coming back to our IO problem: when a thread

performs a blocking operation, what do we mean? Any kind of operation where the CPU cannot do anything useful: any input/output operation, or any operation that deals with devices. For example, we've already talked about things like logging, or network calls, which basically means dealing with your network card, with packets, TCP, HTTP, and all of that. The same goes for database calls; a database call is also

a kind of network call, though it can be considered its own major category. All of these, we already know, are blocking calls, IO calls, where the CPU is useless. The CPU can only do raw processing: crunching numbers, crunching information. So whenever a thread encounters a blocking operation, it tells the operating system: I have encountered a blocking operation, an IO operation, so I cannot really do anything until it completes. What the operating system does is mark that

particular thread, let's say thread number one, as blocked, meaning it cannot be run on the CPU at the moment, and it switches to a different thread, thread number two. When the IO of thread number one completes, say it was a database query and the database came back with a response, the thread becomes runnable again. It's not that it instantly gets the CPU the moment the response arrives, but it is put into a state where the scheduler can pick it up

again when it gets the opportunity. Another important property of threads, which I should also mention: consider two processes. I don't think we've talked about processes yet. Let's say we have process one and process two. Every time you run your program, it runs as a process of your operating system. If you have two processes, even if they come from the same program, the same code, then a thread from process one is isolated from a thread

from process two. That makes sense for security reasons and to prevent memory corruption: they cannot see each other's memory. That is a very important point to remember. But if we have two threads in process one, thread one and thread two, then those two can share memory. And what do we mean by memory in the first place? Memory like the heap, or

global variables in your program, or anything that gets allocated dynamically. The interesting thing here is, if thread one allocates, let's say, an object, or a struct if you're dealing with Go, in the heap, then thread two can also access that particular object using a pointer, the address of that object. So, another important feature of threads: two threads can access the memory of

each other as long as they are in the same process. This memory sharing is very powerful, because two different threads can communicate by simply reading and writing the same memory location, the same data structure, the same variable: it can be a map, a global variable, any kind of storage. The communication happens through shared memory, and since all of it happens through pointers, because both threads know the

exact address of that variable or map in memory, there is no copying and no serialization, which is why thread access is also pretty fast. But this property of threads, the fact that threads communicate by sharing memory, can become very dangerous, and we'll discuss the dangerous part later in this video. The point here is: if you have multiple CPU cores, let's say two,

then multiple threads, thread 1 and thread 2, can actually execute in parallel, in the same time window, say 0 to 50 milliseconds. Both of these threads can run on two different CPU cores at the same time. We're not talking about concurrency now; we are talking about parallelism. In the same way, if you have four cores, then four threads can run in parallel. And as we have already discussed, this kind of true parallelism can speed up any CPU-bound work that your program is

trying to perform. But one discussion that always comes up when we talk about threads is the cost, also called the overhead, of using them. What kinds of overhead are we talking about? The first one is memory. As we have already discussed, whenever our operating system creates a new thread, it also creates a new stack for that thread, for pushing function calls, variables, and so on. And this stack size, on an operating system like Linux,

can be somewhere around 8 megabytes. Most of that 8 megabytes is just virtual memory; the actual physical space is not assigned until it is needed. That feature is provided by the operating system. But each thread still ends up costing some real memory. So even if we talk about only hundreds of kilobytes of stack actually touched, suppose your programming language or framework creates a new thread for each request it gets.
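If a framework really did create one OS thread per request, the stack memory alone adds up fast. Here's a quick sketch of that arithmetic; the per-thread stack sizes are illustrative assumptions, not measurements:

```python
def thread_stack_memory_gb(num_threads: int, stack_kb_per_thread: int) -> float:
    """Total stack memory for num_threads OS threads, in GB (1 GB = 1024*1024 KB)."""
    return num_threads * stack_kb_per_thread / (1024 * 1024)

# 10,000 threads at the ~8 MB default virtual stack on Linux:
print(thread_stack_memory_gb(10_000, 8 * 1024))  # 78.125 GB of virtual address space
# 10,000 threads each actually touching ~1 MB of stack:
print(thread_stack_memory_gb(10_000, 1024))      # ~9.8 GB of physical memory
```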

Even though no kind of framework does that but assuming it deals with HTTP requests that way using a new thread so that they can work concurrently and sometimes they can also work parallelly. So if you get 10,000 requests during a traffic spike, which means 10,000 threads, 10,000 native operating system threads and assuming each one has around like 500 KB or even 1 MB, we're talking about 10,000 and more than around 8 or 9

GB of memory just to deal with HTTP requests. Your server also needs memory for a lot of different kinds of tasks, hundreds of different kinds of tasks. they will quickly run out of memory and your program is eventually going to crash or your server might also eventually going to crash. So this thing is already clear that thread is very memory heavy because of the requirement that each thread comes with a cost of creating a particular stack of some kilobytes or some megabytes. Second kind of overhead is the creation overhead. So

every time our programming language or framework wants to create a thread, it has to make a system call to the operating system kernel. And once the kernel receives this system call, it does a couple of things: setting up the stack, as we already discussed; allocating the various data structures it needs to internally manage this particular thread, things like the instruction pointer; and, at last, adding it to the scheduler so it can be scheduled, paused, blocked and so on in different phases. Now

all of this can take somewhere from a few microseconds to a few milliseconds. That does not sound like a lot of time in terms of human perception, but when we are talking about function calls, about running our programs, this is a major overhead; it can be a major cause of latency. That's the creation overhead of threads. Then the last one, the most important one and the major cause of latency most of the time, is the context switch. So imagine we have three threads running and we have a

scheduler here. Every time our scheduler wants to switch from thread one to thread two, or thread two to thread three, or any switch from one thread to another, the first thing it has to do is save the current thread's CPU registers. Then it also has to update all kinds of bookkeeping to keep track of the state. Then it has to select the next thread it wants to run, then fetch all the

saved registers of that thread, the thread it wants to run next, along with its bookkeeping, its state, and restore all of that. This can take somewhere around 1 to 10 microseconds, at least on modern hardware. So if you have four CPU cores and around 100 threads, our operating system scheduler has to constantly juggle between them, constantly switch between them. And we are talking about microseconds of context-switching time for each switch, depending

on whether your data is in the CPU cache or not. And if we are talking about a thousand threads, then this context-switch time can add up to many milliseconds. And context-switch time is fundamentally unproductive: we are not really doing anything, it's just maintenance work, work that needs to happen before the actual processing happens. So all these milliseconds of CPU time wasted on context switches could have been spent on actual processing, making the program faster,

and this is one of the reasons the thread-per-request model has always struggled with high-concurrency requirements. Now, the second method available to do multiple things at once. The first one we discussed was threading: using threads to do multiple things at once. The second one is the event loop model. The number one difference in the event loop model is that, instead of having multiple threads to do multiple things at once, in the event

loop model we usually have one thread, and that one thread, using this mechanism of waiting and callbacks and all the things we'll discuss, does all these multiple things; it does not depend on the number of threads to do multiple things at once. So usually we have only one thread, and the key differentiator, the number one thing we have to keep in mind, is that we can never block the event

loop. And what do we mean by not blocking the event loop? When a particular task, let's say request A, needs to do some kind of database query, then instead of blocking and waiting for the response from the database, request A tells the event loop, using the primitives of your language (mostly the async syntax): I want the result of this particular database query from this function, and when we finally get the response

back from our database, only then resume my function; until then, I am handing over control of CPU execution so that you can run other things. The moment it does that, it is pushed into a queue. Now, the event loop has multiple queues and multiple algorithms, priorities and so on; that is a completely different topic of discussion, but you can imagine it as a simple queue. So the moment some kind of IO happens under the event loop mechanism, the task hands control back to the event loop and

it pauses until the IO is complete. So the event loop keeps track of all the tasks that want CPU processing but are blocked on some kind of IO operation. Programming languages cannot implement this functionality by themselves; the operating system has to support it. On Linux we use something called epoll, provided by the Linux kernel. In the same way, on macOS we use something called kqueue. So all these things that are

provided by our operating system make the event loop possible, make concurrency easy, basically; using them, the event loop can monitor a thousand different connections at the same time, in a loop. Every time an iteration of the loop runs, it checks whether each IO operation is complete or not. If one is complete, it puts that task back into the queue (depending on priority, of course) and resumes it by giving it more CPU processing time. So on every loop iteration: check for IO

operation completion, then run the callback, then go back to the loop. This is how the event loop works, in a very oversimplified way. And because of this setup, this single-threaded, callback-and-loop-based setup of the event loop architecture, we get a lot of efficiency when it comes to IO-bound problems. You have to remember: for CPU-bound problems, the more threads, the faster our program executes. But when it comes to IO-bound problems,

it does not matter how many threads you have. In fact, the more threads, the more context switching happens, and eventually your program becomes slower. So for concurrent workloads, as we have already discussed, for IO-bound workloads, the event loop works better; it is more efficient than the threading model, than threading-based concurrency. No context switching happens here, because you only have a single thread. There are no multiple megabytes or gigabytes of stack memory,

because you only have a single thread. So a lot of the problems around memory, latency and context switching go away, because we are only using a single thread to run our event loop. But what is the trade-off? The trade-off is that you can never block the event loop. For example, if your code does something CPU-intensive, a CPU-bound task which takes, let's say, 100 milliseconds or more, then during this time your event loop completely stops.
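This is easy to demonstrate with a tiny sketch. Here Python's asyncio stands in for any event-loop runtime (the timings are arbitrary choices for illustration): a ticker coroutine is supposed to fire every 10 ms, but a single blocking 100 ms call on the loop's thread freezes everything.

```python
import asyncio
import time

# A ticker that should fire every 10 ms, and one blocking call that
# freezes the single-threaded event loop for 100 ms.

async def ticker(ticks):
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)   # hand control back to the loop

async def main():
    ticks = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)   # let the ticker record its first tick
    time.sleep(0.1)          # BLOCKING call: the event loop is stuck here
    await task
    return ticks

ticks = asyncio.run(main())
gap = ticks[1] - ticks[0]
print(f"gap between first two ticks: {gap * 1000:.0f} ms")  # ~100 ms, not ~10
```

The fix in real servers is to push CPU-heavy work off the loop (worker threads or processes), which is exactly why the "never block the event loop" rule exists.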

It cannot do anything; it just waits for that particular task to finish. That's why, in an event loop based architecture, in an event loop based concurrency handling mechanism, our priority is always to never block our event loop. Anyway, coming back: when we are dealing with an event loop based architecture, our callbacks, whatever we execute after IO, the CPU processing part, have to be fast. And realistically, in most of our

backend applications, in SaaS backend applications, we do not really have tasks which take 100 milliseconds of CPU, tasks which are that CPU-bound. Mostly only if you are dealing with animations and images, video rendering or ML workloads do you take this much CPU time; in a typical backend SaaS workload, the latency is almost always from IO-bound stuff. And this is the reason, because of this limitation,

because of this constraint of event loop based architectures, any language which uses an event loop, for example JavaScript, makes use of callbacks, promises and async/await. All these primitives are basically syntactic sugar, convenience functions, so that you have the right mental model of how your program is going to work and you do not write any blocking operation that gets in the way of your event loop. So all these primitives provided to us by JavaScript, and it can be the same for

other programming languages too, these async primitives just help you give up control at particular IO-bound operations, at particular points. For example, the moment you start a database query, you write await, or the moment you make some kind of API call, you write await. And the moment you write await, you're saying: okay, I'm giving up my slice of CPU processing, so that when this particular network operation is complete, I will be given the CPU back

so that I can finish this particular function. So these are all syntactic sugar, convenience syntax, to make it easy for you to deal with the event loop. Let's understand how a backend dealing with different requests at the same time, and doing different IO operations like database queries, works in the different concurrency models: how it works in the threading model, how it works in the event loop model, and how it works in languages like Go, which use a

different kind of setup, which we'll discuss. So this is how it works with the first kind of setup, the threading model. We use multiple threads to do multiple things at once. Imagine we are dealing with two requests, request A and request B. The first thing that happens is that our backend receives request A. And the first thing we do after receiving a request is parse whatever JSON body or query parameters are being sent

in that particular request, which is a CPU operation, because it deals with parsing bytes, loading them into memory and all these things. Okay, so this is pretty much straightforward. Then, after we are done with all the parsing and validation and so on, we want to make a database call, and this is what our query looks like: SELECT * FROM users. This particular handler returns all the users from our database and sends them back to the client, back to the frontend, and for that we want to run this particular database query, SELECT * FROM

users. Now, to go a little deeper into how this query works, in phases: in the first phase, we establish a TCP connection with the database, or, if we have a pool, we get a connection from the pool, and we use that connection to send some bytes, the query we want to run. We have a database which our backend talks to, and we send this query over to it. Now, after we send it, we have a socket through which we are communicating with

our database, and we use that socket for the read operation; basically, we are waiting for a response from our database using this read function on the database socket, which is a blocking operation, an IO operation, the kind of operation we've been discussing for so long: an operation our CPU has no power over, where it has no capacity to do anything, because it mostly depends on external things. So in this case, this is one of those functions which is IO-bound. We have to wait for the response from

our database, and as we have already discussed, the moment our thread blocks in threading-model-based concurrency, the task tells our operating system scheduler that since thread one is blocked, we have to switch to a different thread. Since we are trying to do multiple things at once, the moment something blocks because of IO, we have to switch to a different task. So while we are dealing with request A, our backend also receives request B. So we switch to request B and start executing it. And the same thing we

do: we get the request, we do the routing, we do the body parsing, load it into memory, into a struct or object or whatever your programming language provides, then we do the validation, transformation, whatever we want, and then we want to do a database query. This time we want to fetch all the orders from our system and return them to our frontend. So, same thing: we establish a connection to our database and we send this particular query, in the form of bytes, over a TCP connection. Then

on the same connection socket we call read, and we wait for the response. Again, this is an IO-blocking operation, an IO-bound operation, and our CPU cannot do anything. So again this task tells the scheduler that we are blocked, so we have to switch to a different thread, and our OS scheduler checks whether we have another thread, thread number three. So in this case, if you are creating a new thread per request, or if you have another thread available, it will switch to thread three and will

do the same kind of thing again. If there are no threads, then it will just wait and block everything until one of these IO operations completes, until one of these database responses is received. The moment we receive some network packets from our network card, the OS scheduler, whichever part is dealing with all this pausing and resuming of tasks and threads, goes ahead and wakes up thread one: okay, we have received our response back, you can start executing the rest of your logic, which is a CPU operation. So at

this point the database has returned the response. Now it continues the rest of the function, which processes the query results. It receives some bytes from the database; it takes those bytes, reads them, parses them and loads them into some kind of in-memory, native data structure of that programming language. So if you're dealing with JavaScript, it might load them into an object or an array of objects. In Python, it might load them into a dictionary. In Go, it might load them into

a struct. So that's what we do: deserialize it into our own data structure so that we can read it in the processing phase. Then we take that data structure of ours, that struct, object or dictionary, whatever our programming language gives us, and we serialize it into JSON, since we have to send it over the network, in an HTTP response, assuming that's what you're using as your serialization standard. After that, we send the response back, and this

particular thread, which was executing request A, goes back to the pool of threads that we have. If we try to imagine it as code, let's say we are using Python as our programming language, this is what it would look like: we have a function called handle_request; it is the handler which deals with this particular endpoint, and here we make a database query. We want to fetch a single user from our database, so we are saying SELECT * FROM users WHERE id = ?, and as a parameter we are passing the ID of

the user. And the moment we do this, it becomes an IO operation, a network call. So at this point our thread blocks, assuming we are doing threading-based execution, not the async/await-based execution that you might be more familiar with. This is the threading model, so we don't use async/await or the event loop based architecture. The moment we do it, whichever thread was executing this particular function blocks and switches over to another thread, if one exists, or it waits for

the response from our database. And the moment it gets the response back from the database, it gets CPU processing time again, starts executing, and returns the response back to whatever the parent function is. So this blocking operation: if we look at the code as a human reader, it looks like this executes and then this executes, and since it all happens in microseconds or milliseconds, it looks like we are just going line by line, all sequential. But the blocking operation is not visible to us, and the

switch between different threads is not visible to us either; it is happening behind the scenes. So this is how threading-model-based handling of multiple requests works. Now let's look at what event-loop-based handling of multiple requests looks like. First we have request A, a similar setup to the one we had with threading. There is not much difference in the sequence of operations that happen, or in the results we get using threading

and the results we'll get using the event loop; there is not much difference there. The difference lies in the pausing, the blocking and the resuming of operations, and in how efficient that is. So first we get request A, and as usual we parse that request for any body, query parameters and so on, so that we understand what exactly the client wants from us. Then we start our database query. Same kind of setup: we establish a connection with our database

over a TCP connection, we send all those bytes, and we register a callback. Now this is where it differentiates itself; we'll look at what the code for this looks like in a bit. But in an event loop based architecture, at the point where we give up control, we are saying: this operation is now waiting for an IO response, and that is the reason I'm giving up control, so that you can execute other tasks on the CPU while I am waiting for IO. And it does that by registering a callback. So

it calls something like db.query, and it also passes a function, a piece of logic, and we call this a callback: once the result of db.query comes back, we want to run this piece of function; that's why we call it a callback. That's how the event loop based architecture works: it is all built on top of callbacks. Okay, now the moment it does that, it returns. And what do we mean by return? It's not like we will

return the response of that HTTP request, no: it basically switches to another task. While we were parsing request A, let's imagine request B also arrives. So the moment request A pauses for IO, the moment request A hands over control of CPU processing because of the IO block, we pick up request B; as in, the event loop picks up request B. Now it does the same thing: it parses the HTTP request, it makes a different

database query over the TCP connection, and it registers a different callback, a different piece of logic it wants to run for request B, and again it switches. Now, while this is happening, how does it check whether the database response for request A has completed or not? That's why we call it a loop. You can imagine a loop running; in each iteration it checks whether the IO of A is complete, whether the IO of B is complete. So each iteration of that loop, you can

imagine it as an infinite loop that keeps running; that's why we call it an event loop. With each iteration it checks whether the IO operations are complete or not, the ones for which we have registered our callbacks, for which we have a particular piece of function we want to run when that particular IO response is back. And how does it monitor this, how does it wait for all these IO responses, how does it know whether they are complete or not? Using OS-level functions, as we have already mentioned:

on Linux it uses epoll, on macOS it uses kqueue, and Windows has its own equivalent (IOCP). Using these, it becomes very efficient to check all these IO responses on each iteration. So let's say that after a couple of iterations of the event loop, while it is doing other things like logging to standard output or sending some emails, it sees that the DB response for request B has completed. So what happens? The epoll connection monitoring

returns and says that socket B is now readable. So the database socket from which we were expecting some bytes back, some kind of response, is now readable. The moment that happens, the callback that was registered, the piece of function we said we want to run after the database query returns the response for request B, starts running on the event loop, and it does all the different kinds of processing, loading data into a data structure as we did

in our threading model, and it returns the result back to the client after doing the JSON serialization and all the boilerplate stuff. After that, the database response for request A also comes back, and we run the callback for request A and send it back as a response to the browser. Now, it all looks kind of the same. The only difference is that instead of pausing the thread and resuming the operation from the instruction we were executing just before pausing, the way we pause and unpause in

an event loop based architecture is using callbacks. When we use the threading model, we make use of the operating system's native data structures, its instruction pointers, all these heavy mechanisms, to resume our operation. But in an event loop based architecture, we use callbacks to resume our operation. Now, if you look at what event-loop-based concurrency looks like in code, this is JavaScript code where we are using the event loop to perform concurrency, basically

dealing with multiple requests at once. This is what it looked like before ES6: this is how we used to do concurrency. In the upper part we have a function receiving the request, plus a function to send the response back, some abstraction over the request and response JSON. We call this function on our database driver, which holds the database connection and the TCP connection handling logic. So this is our query, SELECT * FROM users, and we want to fetch a particular user

ID, which we receive in the request. Now, in the third parameter, as you can see, we are passing a function, an anonymous function, and we are saying that after this particular IO response is received, after our event loop receives the response back from our database, the result is going to be available in this function: any error will be in the first parameter, the result will be in the second parameter, and we can use that value inside this callback. So this is what we

mean by a callback: a particular piece of logic that we hand to our event loop beforehand, so that the moment this particular operation returns a response, we run this function. So once we get the user, we send it back to our client using this send-response function with these parameters. Right? This is what it used to look like. Now, with modern JavaScript, after ES6, this is what it looks like using async/await. We have a similar function, and we run this database query and pass the

user ID, but there are no callbacks. We are just using two more keywords: one is async, the second is await. This async/await syntax is just syntactic sugar on top of callbacks. I don't want to go too deep into the JavaScript ecosystem, but the problem with the callback-based setup, the reason we came up with the async/await syntax, is this: imagine you want to do another database query and another API call after you get the response from this

particular database query. Inside the callback you have to write another database query, SELECT * FROM orders or whatever, and for that you again have to pass a callback; and if you have, let's say, five or six of these async operations, six of these IO-bound operations, then for each of them you'll have to pass a callback. After a point it starts looking like this: more and more functions nested inside each other. And that continued for a long time before async/await arrived. So previously we used to have these kinds of functions

with a lot of these nested callbacks. To prevent that, to make it more readable, they came up with the async/await syntax, which is just syntactic sugar. And by that I mean: the moment you write await, whatever comes after it, at least in the context of this function (not in the context of the whole program), from that point until the last line of the function, you can imagine being passed as a callback

to this IO-blocking operation. So it is the same as writing the callback version, but written this way we can do multiple IO-blocking operations without any nested callbacks. The idea is the same: the moment you write await, you hand control over to the event loop, saying that until we get the response back from the database, you can do other stuff, and the moment we get the response back, we continue from this point. And your event loop handles all that using a single thread. There is no context switching and no expensive stack memory

allocation by your operating system, so everything happens in a very efficient, very lightweight manner. That's why the event loop is considered more efficient than the threading model, as long as we are talking about IO-bound operations, not CPU-bound operations. For CPU-bound operations, threading will always be faster than an event loop. Next, let's take a look at another interesting concurrency model, which is not exactly threading and not an event loop either; it

uses something else. We can consider it a virtual thread. And we call it a virtual thread, not a thread, because when we say thread, we mean an entity which is managed natively by your operating system, not something your programming language or runtime handles. That facility is managed by your operating system, and we already know it is considered heavy because there is a lot of overhead around memory, creation and switching. So, Go uses a different

approach which we consider as a virtual thread. But you will not see this term being used when you're working with Go or when you're reading documentation of Go or anywhere. They don't call it virtual thread. they call it a go routine. Okay. So the conceptually they can be considered as a virtual thread but they are called go routines and the moment you start a go program it it runs as a go routine and that go routine the main go routine can also create other go routines while it is running. Okay. So

that's how the Go programming language works. We'll not go too deep into Go in this video; we'll just focus on how Go's concurrency model works. One interesting thing with Go is that, when you use it as a backend language, the standard library creates a new goroutine, a virtual thread, for each request. And at the start of this video, when we were talking about the threading model, we discussed that if you create a

separate thread for each request you're receiving, you might run out of memory, because each thread comes with some megabytes of overhead; if you create, let's say, 10,000 or 20,000 threads, eventually you'll run out of memory, because threads are very expensive. But if you take a look at Go's standard library, the source code of the Go programming language, there is this serve function, the function which accepts all the incoming connections, listens for new connections,

and creates a new goroutine for each new HTTP request it receives, and it assigns a handler, the particular handler you configured beforehand while writing your code, to deal with that request. So if you go to this line at the end, you can see that it is creating a new goroutine: in the Go programming language you can use the keyword go to create a new goroutine, a new virtual thread, and whatever function you pass after this keyword will run as a new

virtual thread, a new goroutine. So the Go programming language by default creates a new goroutine for each request it receives; we are talking in the context of a backend now. So imagine this scenario: we have two requests, A and B. First we receive request A, and our language runtime creates a new goroutine; then we get request B, and it creates another goroutine. Now we do the same kind of stuff: we parse the request, we do

validation and transformation, we load it into memory by deserializing into a struct and all that. Then we get to the database stuff. Same thing: we do the TCP connection work, we convert our SQL query into bytes, we send those bytes and wait on the network. The moment that happens, we know this is an IO-blocking operation. Right? Same for goroutine 2: we do all this stuff, and the moment we run the database query, it is an IO-blocking operation. The moment that happens, the Go runtime scheduler (we'll talk about what this

is; we're not talking about the OS scheduler, it is not a scheduler provided by our operating system, it is the Go runtime's own scheduler) pauses this particular goroutine. The same way our OS scheduler pauses a particular thread, switches the context, goes to the next thread and starts executing it, the Go runtime scheduler does something similar, but instead of dealing with actual operating-system-level threads, it deals with goroutines, which are a kind of virtual thread that only exists in the context of the Go runtime.
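The Go scheduler itself is native runtime code, but the core idea, park a task at an IO point, run another task, resume the first one later, can be sketched with Python generators. Everything below, the handler and the run queue, is a toy illustration of the concept, not how Go actually implements it:

```python
from collections import deque

# Toy cooperative scheduler: each yield is a point where a "goroutine"
# parks (e.g. waiting on a database) and the scheduler runs someone else.

def handler(request_id):
    yield f"{request_id}: sent query"    # parked: waiting on IO
    yield f"{request_id}: got response"  # resumed after the IO completed

log = []
run_queue = deque(handler(r) for r in ("A", "B"))
while run_queue:
    task = run_queue.popleft()
    try:
        log.append(next(task))   # run the task until it parks again
        run_queue.append(task)   # requeue it; another task runs meanwhile
    except StopIteration:
        pass                     # task finished, drop it

print(log)
# ['A: sent query', 'B: sent query', 'A: got response', 'B: got response']
```

Notice how the two requests interleave: A parks at its "query", B runs, and both later resume, which is the same shape of behavior the Go runtime scheduler provides, just without the OS-level thread machinery.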

Okay, now the same thing happens: the moment some IO operation starts and we start waiting for the response, the Go scheduler pauses that function and moves on to some other goroutine, maybe goroutine 2, or goroutine 3; if there are a thousand goroutines, because for each request we are spinning up a new goroutine, we may switch to any of those thousand goroutines, that does not matter. Okay, now let's talk a little bit about the Go runtime scheduler: how does

it work? If you remember the threading model, which you should because we literally talked about it 10 minutes back, exactly the same thing happens at this layer, the operating-system-level threads, which we only had a limited number of, for good reason, because threads are expensive: they come with a lot of overhead, like creation overhead, switching overhead and the memory overhead of maintaining all the stacks. So we already know this model of execution, creating and using different

operating-system-level threads to do multiple things, and it usually relates to how many CPU cores you have, because on one CPU core only one thread can execute at a time. That said, the number of threads is not directly tied to the number of CPU cores: even if you have one CPU core, you can have four threads, but at any moment only one thread will be executing on that CPU. We already know this layer of execution: how threads work, how the context switch happens, how threads are created, a new stack is allocated, the instruction pointer is

created and the data structures are created to maintain the state of the thread and all these things. Right? This layer now comes another layer, the go routines layer or the go runtime scheduleuler. Now this is not related to our operating system. This is something which comes with the Go programming language. This is embedded. Now this is where the magic happens and this is why Go is such an efficient language when it comes to both CPU and IO. Even though it

is not using an event loop based architecture, because of this Go runtime scheduler based architecture it manages IO operations very efficiently. So this is how it works. There is a setting you can adjust when starting a Go program, GOMAXPROCS, which controls how many operating system threads can execute your Go code; by default it equals the number of CPU cores you have, so that many threads are created. So let's imagine you have four CPU cores, and four operating system threads are created. Now

you're capped at four threads for your Go program; that's the first fact. These are those four threads: M1, M2, M3, and M4. Now, each time you create a new goroutine, the Go runtime scheduler assigns that particular goroutine, or pushes it, onto one of these queues. So you can imagine each operating system thread as having a queue, and depending on its scheduling algorithm the Go runtime pushes goroutines onto these queues. So one operating system thread will be dealing with multiple goroutines at once. So

instead of creating, pausing, and switching operating system level threads, you're now dealing with goroutines. You're still doing the resuming, the switching, and the memory bookkeeping, but it's much more lightweight because the Go runtime is handling it. And when you actually switch from one goroutine to another, it is usually just a pointer switch, which means you take a pointer and point it at something else. Okay? And that is pretty lightweight. So for

that reason we can spin up thousands, even millions, of goroutines at a time. Of course, you'll run out of memory at some point, but compared to the number of operating system level threads you can create, you can create a hundred times more goroutines, because the mapping happens like this. So going back to our earlier example, let's say there are two HTTP requests. We create a new goroutine for each, and this is the reason the Go HTTP package creates a new

goroutine for each request: goroutines are so lightweight that we do not care, we can afford to create a new goroutine for each request. So suppose G1 is blocked on some kind of IO operation, say a database operation, on the M1 OS level thread. The execution, of course, happens on the OS level thread, which means at a single point in time the OS level thread M1 takes one of these goroutines and runs it. The moment G1, goroutine one, blocks, the

Go runtime pauses G1, and M1 starts running G2. Okay. And if we take a look at what the code of a Go handler would look like, this is it: we have a function here that takes the request and returns a response, and this is the database query. We run DB.Query with our query, SELECT * FROM users WHERE id = ?, passing the user ID as the second parameter. The moment we do this, whatever goroutine was executing this function pauses, and

the Go scheduler picks up another goroutine. Maybe it's another request, or maybe it's a logging operation, whatever, and it starts executing that. And when the response comes back, this particular goroutine, which was paused because of the IO operation, starts executing again and sends the response back to the client. Now, okay. So, the primary distinction here is that the virtual thread model is kind of the same as the threading model, but we are dealing with

another layer of abstraction. Instead of the operating system scheduler, we are using the Go scheduler; instead of operating system threads, we are using virtual threads, also called goroutines. And that is the reason it is so lightweight. Another thing that causes a lot of confusion, at least in the event loop model, is how the async await keyword works, because it exists across different programming languages like Python, JavaScript, Rust, and so on. So let me talk a little bit about that. Consider this particular function. We have a function here which

is fetchUserData, and the parameter is a user ID. We take that user ID and run the first query, the database query to get the user: db.getUser, passing the user ID, and we await it. This is a blocking operation, and by writing await in front of this function we are saying that we are giving up the processor to our event loop until this is completed, and all the code after this should be put into a callback which will only run after we get the response back from this

function. Okay. Now, the moment we get the response back from this function, this callback, all the code after this line, starts executing again. We have another database call here, so we hand control back to the event loop and start waiting for its response. This, too, will be put into a callback which only gets executed after we get the response back from this function, and after we get that response, we take the result and return it. So this is what it means, from all the discussions

that we have done so far. To make it a little more technical, how does our programming language deal with it? You can imagine it as a state machine. So the same function, we can convert it into this one. The signature is the same: we have a function, the name is fetchUserData, and the parameter is a user ID. So we can imagine an async function as a state machine which interacts with the event loop. Let's say, while running this function, we passed the user ID 123, and we start

executing. In the first step our state is zero: we initialize a variable called state, and starting out it is zero. We also have two more variables, one to store the user and one to store the orders. Then we define another function inside this function, which we will keep returning to again and again. And since in JavaScript functions are first class citizens, you can return functions from other functions. So we define this function, which is called step, and we return it after defining it. Now, when this function runs, what happens?

We have a switch statement here, and depending on which state we are in, we run that particular case. Okay. So by default, on the first execution run, the value is zero, so we enter case zero. The moment we enter it, we set the state to one; we increment the value of state to one. And this is where the blocking IO operation comes: we call the database function and we pass it the parameter. And in the then callback,

we assign whatever result comes back from the database query to user, since we are trying to fetch the user in the first place, and we go to the next step. And since this is a state machine kind of setup, during this state transition, the first await, the moment we write the first await keyword, we are running this part, the first state, state zero. And after this completes, we hand over the

control back to the event loop. When this part completes, and before the state machine's execution of step one starts, during this window our event loop is free; it is not executing this function. Of course, it is executing other async functions, but it is not executing this function. And after we get the result of the database call, after we get the user back, we transition into state machine execution one. Then again: we set the value to two, we call the second async function,

and in the callback we assign the result to the orders variable, and we transition into the next state. The event loop pauses this function again, and when the result of this call returns, we go to execution two and finally return user and orders. So you can imagine this is kind of how your programming language runtime executes async functions: as a state machine, going from one state to another every time it hits an await keyword.
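The desugaring described above can be sketched in Python, where a generator makes the suspension points explicit. The db.getUser and db.getOrders requests are hypothetical stand-ins, and instead of a real event loop we drive the generator by hand:

```python
def fetch_user_data(user_id):
    # each yield is one "await": the function suspends here and hands a
    # pending IO request to whoever is driving it (the "event loop")
    user = yield ("db.getUser", user_id)      # state 0 -> state 1
    orders = yield ("db.getOrders", user_id)  # state 1 -> state 2
    return {"user": user, "orders": orders}

# play the role of the event loop by driving the state machine manually
gen = fetch_user_data(123)
request1 = next(gen)              # runs state 0, pauses at the first "await"
request2 = gen.send({"id": 123})  # first IO result arrives: resume, pause again
try:
    gen.send(["order-1"])         # second result: function runs to completion
except StopIteration as done:
    result = done.value

print(request1)  # ('db.getUser', 123)
print(result)    # {'user': {'id': 123}, 'orders': ['order-1']}
```

Between `next` and each `send`, the generator is frozen exactly like the paused state machine: whoever is driving it (normally the event loop) is free to run something else.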

If you understand this state machine setup, a couple of things become clear. One: why can we only use await inside async functions? Why? Because the moment you write async in front of a function, it has to be transformed into a state machine. That's why, if a function is not marked as async, you cannot use the await keyword; the two work together as a state machine. Second: why is blocking the event loop so bad? Because if you block the event

loop, that state machine never gets to go to the next state; it cannot go from state zero to state one, from state one to state two. So it's important that we never block the event loop. Right? Of course, you don't have to internalize all these things deeply: how your programming language runtime converts async functions into state machines, how all these concurrency primitives work behind the scenes. But it helps if you do understand it. And like any

system, concurrency also comes with its own set of problems. And most of the time, all the different categories of problems caused by concurrency come down to this single reason: shared state, or shared memory. To understand a few of these problems, let's take this very simple example, a counter. What we are trying to do: we have two different threads, and they are trying to update a variable, this counter

variable, which starts with the value of zero, and they try to update this variable at the same time, simultaneously, in a concurrent way. Now, to take a variable and increment its value by one, a thread has to perform three different operations. First, it has to read the current value of the counter into a CPU register; before it can perform any kind of operation, it has to read the current value. Then, once it has read the value

of the counter into the register, it needs to add one; now the value of that register becomes one. Once that is done, finally it can take the value of this register, which is one, and write it back to the counter variable. So to increment the value by one, it has to perform three different operations. Now imagine this kind of scenario, where two different threads are trying to increment the value at the same time, on a timeline. You can imagine it as millisecond 1, 2,

3, or second 1, second 2, second 3, or any time unit you want. So at millisecond 1, thread A started first and, as step one, read the value of this counter variable into its register, which is the value zero. At millisecond 2, before thread A could even increment the value in its register, thread B read the value of the counter variable into its own register, also zero. At millisecond 3, thread A incremented its own register's value to one. And at millisecond 4, thread

B did the same thing: it added one to its register value. Then, at millisecond 5, thread A wrote the value one into the counter variable, so the value of the counter is 1 at millisecond 5. At last, at millisecond 6, the final operation happens: thread B, after having increased the value of its own register, takes that value, which is one, and writes it back to the counter variable. So ideally, since both of these

threads are working on this single variable and trying to increment its value, after running all these operations we should expect the counter variable to hold two. But because these two threads are trying to increment the value of a single shared variable, and their operations are interleaved across different milliseconds, we get a final value of one. This problem is called a lost update, which happens when two different threads try

to operate on a single shared variable; because of it, the increment of one of the threads was completely lost. This is what we call a race condition.
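The timeline above can be replayed step by step as plain sequential Python. This is not real threading, just a deterministic simulation of the same interleaving of the read, add, and write steps:

```python
counter = 0

# millisecond 1: thread A reads the counter into its register
a_register = counter          # a_register = 0
# millisecond 2: thread B reads before A has written anything back
b_register = counter          # b_register = 0
# milliseconds 3 and 4: each thread increments its own register
a_register += 1               # a_register = 1
b_register += 1               # b_register = 1
# millisecond 5: thread A writes its register back
counter = a_register          # counter = 1
# millisecond 6: thread B overwrites with the same value
counter = b_register          # counter = 1 -- A's increment is lost

print(counter)  # 1, not the expected 2
```

Two increments ran, but the counter only moved by one: that is the lost update.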

And at this point you might say: if I use a programming language like JavaScript, which has a single thread, and I use async await, then I should not have to worry about race conditions or any other problem that happens due to concurrency. Right? Unfortunately, even in an async await based setup you can hit them. Take this code, for example. What we are doing: we have a variable, balance, with an initial value of 100, and we have a function called withdraw which takes a parameter, amount. First we check whether balance, which is the global variable, is greater than or equal to amount, the parameter we were passed. If yes, then we call this function, processWithdrawal, which maybe executes a database query; that's why we have to write await. This is an async function.

And finally, after this is done, we take the global variable and subtract amount from balance, because we withdrew that amount from our global balance. Okay, this is how it works. So imagine we called the withdraw function twice somewhere in our program. When we do that, we can end up in a situation like this. So we have our balance variable; we are just trying to visualize the function that we just saw, in JavaScript, using async

await. Okay. So this is the execution of the first withdraw function call, and this is the execution of the second. We pass the amount 100 in both of these, and now we can track how they execute. In both of these instances, let's say this one started executing first, since this is a concurrent setup. And the first thing that we do is a comparison. What do we compare? We compare the amount with the balance, to check that we have the balance to perform

the withdrawal; only then do we allow it. So we checked whether 100 is greater than or equal to 100, which is true, because 100 equals 100, and we come inside. And here we have to perform an async operation, so we start executing this async function, and the moment we do that, we have to take control and give it back to our event loop, because that's how an event loop based architecture works. Okay. At this point we give up control, and the moment we give up control, the second function starts executing. And here we again compared the

value, whether 100 is greater than or equal to 100. True, so we came inside, again started this async function, and gave control back. And the moment we gave control back, we came back to the first function, because this is how concurrent setups work: every time you call await, you give up the CPU so that your program can go and execute something else while you are waiting for your async function or your database call to finish. So when we come back here, the call was

successful, and we update the value of balance: 100 minus 100, so the balance is now zero. After we are done with this, we come back to the second call; by this time, the database has responded. Now, by the time we start executing this line, the value of balance is already zero. If we had done the balance check after the balance became zero, the check would not have been true, and we would not have entered the function. But since the check happened in between, we are now in this situation: by

the time we start executing this line, the balance deduction line, the value of balance is already zero, because the first call ran. So instead of 100 minus 100, it becomes 0 minus 100, and the value of balance is now minus 100, which is not a valid value for this function. So even if you use a single threaded, async await based setup, you are still not free from race conditions, because this kind of situation can also happen.
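The interleaving just described can be reproduced in Python's asyncio. The names balance, withdraw, and processWithdrawal come from the example above; asyncio.sleep(0) is a stand-in for the real database call:

```python
import asyncio

balance = 100

async def process_withdrawal(amount):
    # stand-in for the database call: the await yields to the event
    # loop, letting the other withdraw() run its balance check first
    await asyncio.sleep(0)

async def withdraw(amount):
    global balance
    if balance >= amount:              # both calls pass this check
        await process_withdrawal(amount)
        balance -= amount              # ...and both then deduct

async def main():
    await asyncio.gather(withdraw(100), withdraw(100))

asyncio.run(main())
print(balance)  # -100: the account was overdrawn
```

Both checks run while the balance is still 100, before either deduction happens. This is a check-then-act race: no threads involved, just an await between the check and the update.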

So what is the solution? How do we prevent race conditions? This is again one of those topics that can go very deep, and you can read more about the different race condition prevention mechanisms in an operating systems book. But I will give a brief overview of some of the most famous ones. The first is locks, or mutexes. This is how it works. Imagine we are using Python as our programming language, and we are using the library called

threading for its locking mechanism. So we have a function, the increment function we were just talking about, and before we actually perform the write operation, we acquire a lock. That means that this particular statement, whatever is inside it, is protected; this is also called mutual exclusion in technical language. While we hold this lock, only one thread can enter this

area. The moment one thread acquires the lock and goes inside, all the other threads that are called to execute the same function have to wait to get the lock, and they will only get it after the first thread is done with it. So with this kind of setup, we can avoid race conditions to a certain extent.
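Here's a sketch of the kind of code being described, using Python's threading.Lock so that the read, add, write sequence becomes a critical section:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:          # critical section: one thread at a time
            counter += 1    # read, add one, write back -- now atomic

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 on every run -- no lost updates
```

The `with lock:` block acquires the lock on entry and releases it on exit, so the three-step increment can never be interleaved between the two threads.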

There are also other techniques, other mechanisms, using which we can avoid race conditions. In the Go programming language especially, there is a primitive called channels, where instead of depending on a particular global variable, a shared variable that different goroutines, different threads, different independent executions can all update, these different goroutines pass messages, and only one particular goroutine updates the value of that variable. So channels are another modern solution to the problem of race conditions.
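Python doesn't have goroutines or channels, but the same message passing idea can be sketched with queue.Queue: sender threads put messages on a queue, and a single owner thread is the only one that ever touches the counter:

```python
import queue
import threading

inbox = queue.Queue()  # plays the role of a Go channel
results = []

def counter_owner():
    # the only thread that ever touches the counter
    counter = 0
    while True:
        msg = inbox.get()
        if msg is None:    # sentinel: no more messages are coming
            break
        counter += msg
    results.append(counter)

owner = threading.Thread(target=counter_owner)
owner.start()

# sender threads never touch the counter; they only pass messages
senders = [threading.Thread(target=inbox.put, args=(1,)) for _ in range(100)]
for t in senders:
    t.start()
for t in senders:
    t.join()

inbox.put(None)  # tell the owner we're done
owner.join()

print(results[0])  # 100 -- no shared-state race
```

Because only one thread owns the state, there is nothing to lock: the queue itself serializes all the updates, which is exactly the "don't communicate by sharing memory; share memory by communicating" idea behind Go's channels.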

And over time, a lot of people have come up with a lot of solutions for all the different problems related to concurrency. But this video is not about the technicalities of dealing with concurrency and the problems that come with it. It is mostly about wrapping our heads around one single concept: what is IO bound and what is CPU bound. That's the only goal of this video. It does not matter whether you understand all these threads, locks, and concurrency details or not. As long as you understand the concept of IO bound and CPU bound, this video has been successful. Okay. Now let's summarize everything that we discussed in this

video. For IO bound workloads that require a high amount of concurrency, things like web servers, API gateways, services that make a lot of API calls to other services, all these things, these kinds of primitives, async await, or goroutines, the virtual threads of the Go runtime (and if you're using Java, modern versions of Java also have virtual threads), prove a lot more efficient compared to the threading model. In the same way, for CPU bound workloads, which means any kind

of number processing, string manipulation, image manipulation, video manipulation, crunching numbers, moving bytes into and out of memory, any kind of processing you can imagine that is performed by the CPU, that is called a CPU bound workload, and in those cases threads with parallelism are more efficient compared to an event loop based architecture. So, all in all, concurrency lets your program stay productive, not wasting CPU cycles while

it is waiting for any kind of IO. IO meaning any kind of work which is not handled by the CPU: interacting with your network card to send and receive packets, dealing with standard input and output, making external API calls, working with databases. All these things are called IO, input output. And parallelism lets you use multiple CPU cores to do multiple things at the same moment in time. And most of the backend work that

you will see will lean heavier on the concurrency side, not the parallelism side. But again, there are cases where you might need both, or you might need parallelism more than you need concurrency. But as long as you understand the difference between them, you should be able to make the right decision. You should be able to choose the right tools for your use