Fundamentals of Goroutines
Goroutine
When Gophers are asked about the advantages of Go, concurrency almost always comes up as a key topic. At the foundation of that concurrency are goroutines, which are lightweight and simple to manage. This article briefly covers the subject.
Concurrency vs. Parallelism
Before digging into goroutines, let me clarify two concepts that are often conflated.
- Concurrency: Concurrency pertains to managing numerous tasks at once. This does not necessarily imply simultaneous execution; rather, it is a structural and logical concept where multiple tasks are divided into smaller units and executed alternately, creating the appearance of simultaneous processing to the user. Concurrency is achievable even on a single-core processor.
- Parallelism: Parallelism signifies "the simultaneous processing of multiple tasks across multiple cores." It literally involves carrying out tasks in parallel, executing distinct operations concurrently.
Goroutines facilitate the straightforward implementation of concurrency via the Go runtime scheduler and naturally leverage parallelism through the GOMAXPROCS setting.
Java's widely used multi-threading, where each thread maps directly to an OS thread, is a classic example of a model built around parallelism.
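To make the distinction concrete, here is a minimal sketch (the interleave helper is my own illustrative name, not a standard API) that pins the runtime to a single logical processor. Both tasks still make progress by taking turns, which is concurrency without parallelism.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// interleave runs two goroutines on a single logical processor to show
// that concurrency does not require parallelism: with GOMAXPROCS(1) the
// scheduler alternates the tasks instead of running them simultaneously.
func interleave() []string {
	runtime.GOMAXPROCS(1) // one logical processor: concurrency, no parallelism

	var (
		mu    sync.Mutex
		steps []string
		wg    sync.WaitGroup
	)
	for _, name := range []string{"A", "B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for i := 0; i < 2; i++ {
				mu.Lock()
				steps = append(steps, name)
				mu.Unlock()
				runtime.Gosched() // yield so the other goroutine can run
			}
		}(name)
	}
	wg.Wait()
	return steps
}

func main() {
	// Both tasks complete on a single core; the exact interleaving may vary.
	fmt.Println(interleave())
}
```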
Why are Goroutines advantageous?
Lightweight
Goroutines are far cheaper to create than threads in most other languages. The reason they use so little memory is that they are created and managed entirely by the Go runtime: they are lightweight logical threads, smaller than OS threads, that start with a stack of roughly 2KB and grow or shrink that stack dynamically as the program requires.
Because creation and teardown happen in user space, they are exceptionally fast and cheap, making it possible to run millions of goroutines without undue burden. Thanks to the runtime scheduler, goroutines also minimize OS kernel intervention.
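As a small demonstration of how cheap goroutines are, the sketch below (spawnMany is a hypothetical helper name) launches 100,000 goroutines; because each starts with only a ~2KB stack, this completes quickly without exhausting memory the way 100,000 OS threads would.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spawnMany launches n goroutines that each increment a shared counter,
// then waits for all of them to finish.
func spawnMany(n int) int64 {
	var (
		wg    sync.WaitGroup
		count int64
	)
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1) // atomic increment avoids a data race
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(spawnMany(100000)) // 100000
}
```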
High Performance
First, as explained above, goroutines involve minimal OS kernel intervention: context switches happen at the user level and cost far less than OS-thread context switches, enabling rapid task transitions.
Furthermore, goroutines are multiplexed onto OS threads using an M:N model. The runtime maintains a pool of OS threads, so work can be handled with far fewer threads than goroutines. For instance, if a goroutine enters a waiting state, such as during a system call, the Go runtime schedules another goroutine onto the OS thread, keeping the thread, and therefore the CPU, continuously busy for faster processing.
This capability enables Go to achieve superior performance, particularly in I/O operations, compared to other programming languages.
Concise
Being able to run any function concurrently with a single go keyword is a significant advantage.
In traditional multi-threaded programming, locks such as mutexes and semaphores are often necessary, and using locks always forces you to reason about deadlocks, which adds complexity from the design phase of development onward.
Goroutines instead encourage passing data through channels, following the philosophy "Don't communicate by sharing memory; share memory by communicating." The select statement builds on this by letting you process whichever channel is ready first, and sync.WaitGroup makes it straightforward to wait until all goroutines have completed, simplifying workflow management. Together these tools help prevent data races and enable safer concurrent processing.
Furthermore, context can be used at the user level to control lifecycle, cancellation, timeouts, deadlines, and request scope, providing a degree of stability.
Goroutine Parallel Processing (GOMAXPROCS)
So far we have covered the benefits of goroutine concurrency, but you might wonder whether parallelism is also supported. Modern systems often have core counts in the double digits, and even home PCs ship with many cores.
Goroutines do support parallel processing, and it is controlled by GOMAXPROCS.
If GOMAXPROCS is not explicitly set, its value is determined differently across various versions:
- Prior to 1.5: The default value was 1. If more than 1 was required, it had to be set explicitly, for example with runtime.GOMAXPROCS(runtime.NumCPU()).
- 1.5 to 1.24: The default changed to use all available logical cores. From this version onward, developers generally did not need to configure it unless specific constraints were required.
- 1.25: Since Go is popular in containerized environments, the runtime now inspects cgroups on Linux to determine the CPU limit configured for the container. Thus, if there are 10 logical cores but the CPU limit is 5, GOMAXPROCS is set to the lower value, 5.
The modification in version 1.25 represents a substantial change, significantly enhancing the language's utility in containerized environments. This has enabled a reduction in unnecessary thread creation and context switching, thereby preventing CPU throttling.
```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func exe(name int, wg *sync.WaitGroup) {
	defer wg.Done()

	fmt.Printf("Goroutine %d: start\n", name)
	time.Sleep(10 * time.Millisecond) // delay to simulate work
	fmt.Printf("Goroutine %d: done\n", name)
}

func main() {
	runtime.GOMAXPROCS(2) // use only 2 CPU cores
	var wg sync.WaitGroup
	goroutineCount := 10
	wg.Add(goroutineCount)

	for i := 0; i < goroutineCount; i++ {
		go exe(i, &wg)
	}

	fmt.Println("Waiting for all goroutines to finish...")
	wg.Wait()
	fmt.Println("All tasks completed.")
}
```
33
Goroutine Scheduler (M:N Model)
Expanding on the M:N model mentioned earlier for allocating goroutines to OS threads, the Go scheduler is described more precisely by the GMP model.
- G (Goroutine): The smallest unit of work executed in Go.
- M (Machine): An OS thread (the actual location of work).
- P (Processor): A logical processor managed by the Go runtime.
Each P additionally maintains a Local Run Queue and acts as a scheduler, assigning its queued Gs to Ms. In simple terms, the GMP model operates as follows:
- When a G (Goroutine) is created, it is allocated to the Local Run Queue of a P (Processor).
- The P (Processor) assigns the G (Goroutine) from its Local Run Queue to an M (Machine).
- The M (Machine) returns the status of the G (Goroutine): blocked, complete, or preempted.
- Work-Stealing: If a P's Local Run Queue becomes empty, it first checks the Global Queue. If no G (Goroutine) is found there either, it "steals" work from another P's Local Run Queue to ensure all Ms remain active.
- System Call Handling (Blocking): If a G (Goroutine) becomes blocked during execution, the M (Machine) enters a waiting state. At this point, the P (Processor) detaches from the blocked M (Machine) and pairs with another M (Machine) to execute the next G (Goroutine). This prevents CPU waste even during I/O operation wait times.
- Preemption: If a single G (Goroutine) occupies the processor for too long, the runtime preempts it so that it yields execution to other Gs (Goroutines).
Go's Garbage Collector (GC) also runs on goroutines, allowing memory to be reclaimed in parallel with application execution with only brief stop-the-world (STW) pauses, thereby efficiently utilizing system resources.
Go has many more strengths beyond these, and I hope many developers come to enjoy using it.
Thank you.