Fundamentals of Goroutines
Goroutine
When Gophers discuss the advantages of Golang, the conversation frequently turns to concurrency. The foundation of that strength is the goroutine, which is lightweight and simple to work with. I have prepared a brief overview of the subject.
Concurrency vs Parallelism
Before getting into goroutines, let me first address two frequently confused concepts.
- Concurrency: Handling numerous tasks at once. It does not necessarily imply actual simultaneous execution; rather, it is a structural and logical concept in which multiple tasks are divided into small units and executed in alternation, so that to the user it appears as if multiple tasks are being processed simultaneously. Concurrency is possible even on a single core.
- Parallelism: Processing multiple tasks simultaneously across multiple cores. Work proceeds literally in parallel, with different tasks executing at the same instant.
Goroutines make concurrency easy to implement via the Go runtime scheduler, and through the GOMAXPROCS setting they naturally enable parallelism as well. The multi-threading heavily used in Java, where each thread maps onto an OS thread that can run on its own core, is a representative way of achieving parallelism.
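As a minimal sketch of the distinction (the task names and step counts are arbitrary), the program below pins the runtime to a single core, so the two goroutines can only interleave; raising GOMAXPROCS would let the same code run truly in parallel:

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// One core: goroutines interleave (concurrency) but never run
	// at the same instant. More cores would allow parallelism.
	runtime.GOMAXPROCS(1)

	var wg sync.WaitGroup
	for _, name := range []string{"A", "B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for i := 0; i < 3; i++ {
				fmt.Printf("task %s: step %d\n", name, i)
				runtime.Gosched() // yield so the interleaving is visible
			}
		}(name)
	}
	wg.Wait()
}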
Why Are Goroutines Advantageous?
Lightweight
The cost of creating a goroutine is significantly lower than that of threads in other languages, because allocation is managed internally by the Go runtime rather than by the OS. Goroutines are lightweight logical threads, smaller than OS threads: each starts with a stack of roughly 2KB, and the runtime grows (or shrinks) that stack dynamically as the code requires. Because stacks are allocated and reclaimed so cheaply, creation and termination are very fast and inexpensive, allowing millions of goroutines to run without significant overhead. And thanks to the runtime scheduler, goroutines need minimal intervention from the OS kernel.
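As a rough illustration of how cheap goroutines are, the sketch below (the count and the trivial doubling work are arbitrary) launches 100,000 of them, far more than most systems could afford as OS threads:

package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100_000 // each goroutine starts with only ~2KB of stack

	var wg sync.WaitGroup
	results := make(chan int, n)

	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(id int) {
			defer wg.Done()
			results <- id * 2 // trivial stand-in for real work
		}(i)
	}

	wg.Wait()
	close(results)

	sum := 0
	for v := range results {
		sum += v
	}
	fmt.Printf("spawned %d goroutines, sum of results: %d\n", n, sum)
}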
Performance
First, as explained above, goroutines involve little OS kernel intervention: context switching is performed at the user level, which costs far less than switching OS threads, so tasks can be swapped rapidly.
Furthermore, goroutines are multiplexed onto OS threads using an M:N model. The runtime maintains a pool of OS threads, so a small number of threads can serve many goroutines instead of requiring one thread per task. For instance, if a goroutine enters a waiting state, such as during a system call, the Go runtime executes another goroutine on that OS thread, keeping the thread continuously busy and using the CPU efficiently for fast processing.
This capability allows Golang to achieve superior performance compared to other languages, particularly in I/O-heavy workloads.
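To see the effect, here is a minimal sketch (fetch and its 100ms delay are made-up stand-ins for real network or disk calls) in which a thousand simulated I/O requests overlap instead of running back to back:

package main

import (
	"fmt"
	"sync"
	"time"
)

// fetch simulates a blocking I/O call such as a network request.
func fetch(id int, wg *sync.WaitGroup) {
	defer wg.Done()
	time.Sleep(100 * time.Millisecond) // while this goroutine waits, its OS thread runs others
}

func main() {
	const requests = 1000

	start := time.Now()
	var wg sync.WaitGroup
	wg.Add(requests)
	for i := 0; i < requests; i++ {
		go fetch(i, &wg)
	}
	wg.Wait()

	// All 1000 "requests" overlap, so the total stays near 100ms
	// rather than 1000 * 100ms.
	fmt.Printf("%d concurrent requests finished in %v\n", requests, time.Since(start))
}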
Concise
A significant advantage is that when concurrency is required, a function can be launched simply by prepending the go keyword.
In most other languages, by contrast, one must work with locks such as Mutex or Semaphore, and using locks invariably forces you to reason about potential deadlocks, demanding careful steps from the pre-development design phase onward.
Goroutines favor transferring data via channels, following the philosophy: "Do not communicate by sharing memory; instead, share memory by communicating." Furthermore, the select statement, combined with channels, allows processing to begin the moment data is ready on any channel. Additionally, sync.WaitGroup makes it simple to wait until all of a group of goroutines have completed, so the overall workflow is easy to manage. Thanks to these tools, data races between goroutines can be avoided, enabling concurrency to be managed far more safely.
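A minimal sketch of channels and select together (the producer delays are arbitrary): two goroutines send results over separate channels, and select handles whichever is ready first:

package main

import (
	"fmt"
	"time"
)

func main() {
	fast := make(chan string)
	slow := make(chan string)

	// Producers share memory by communicating over channels
	// instead of guarding shared variables with locks.
	go func() {
		time.Sleep(50 * time.Millisecond)
		fast <- "fast result"
	}()
	go func() {
		time.Sleep(200 * time.Millisecond)
		slow <- "slow result"
	}()

	// select fires as soon as data is ready on either channel.
	for i := 0; i < 2; i++ {
		select {
		case msg := <-fast:
			fmt.Println("received:", msg)
		case msg := <-slow:
			fmt.Println("received:", msg)
		}
	}
}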
Moreover, by utilizing context, one can control lifecycle, cancellation, timeouts, deadlines, and request scope at the user level, guaranteeing a degree of stability.
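A minimal sketch of context-based control (the worker and its timings are invented for illustration): the goroutine is cancelled as soon as its 100ms deadline expires:

package main

import (
	"context"
	"fmt"
	"time"
)

// worker stops as soon as its context is cancelled or times out.
func worker(ctx context.Context, result chan<- string) {
	select {
	case <-time.After(500 * time.Millisecond): // simulated long-running job
		result <- "finished"
	case <-ctx.Done():
		result <- fmt.Sprintf("aborted: %v", ctx.Err())
	}
}

func main() {
	// The job is allowed at most 100ms before cancellation propagates.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	result := make(chan string, 1)
	go worker(ctx, result)
	fmt.Println(<-result) // prints "aborted: context deadline exceeded"
}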
Parallel Operations of Goroutines (GOMAXPROCS)
Although we have discussed the merits of goroutine concurrency, a question may arise as to whether parallelism is supported as well. The question matters because, unlike in the past, modern CPU core counts run into the double digits, and even consumer PCs contain a substantial number of cores.
Goroutines do indeed run in parallel; the setting that governs this is GOMAXPROCS.
If GOMAXPROCS is not explicitly set, its default differs by Go version.
- Prior to 1.5: The default value was 1; if a value greater than 1 was needed, it had to be set explicitly, for example via runtime.GOMAXPROCS(runtime.NumCPU()).
- 1.5 to 1.24: The default became the total number of available logical cores. From this point on, developers rarely needed to configure it unless specific constraints were required.
- 1.25: Befitting a language renowned in container environments, the runtime now also inspects the CPU limit configured for the container by checking the cgroup on Linux. Consequently, if the machine has 10 logical cores but the container's CPU limit is 5, GOMAXPROCS is set to the lower figure, 5.
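To verify which value the runtime actually picked at startup, for instance inside a container, GOMAXPROCS can be queried without being changed; a minimal sketch:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Passing an argument < 1 returns the current setting without modifying it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("Logical CPUs:", runtime.NumCPU())
}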
The change in 1.25 is substantial because the language is used so heavily in containerized environments; respecting the container's CPU limit prevents CPU throttling by avoiding unnecessary thread creation and context switching. The example below restricts GOMAXPROCS to 2 and runs ten goroutines across those two cores.
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func exe(name int, wg *sync.WaitGroup) {
	defer wg.Done()

	fmt.Printf("Goroutine %d: Start\n", name)
	time.Sleep(10 * time.Millisecond) // Delay for job simulation
	fmt.Printf("Goroutine %d: Done\n", name)
}

func main() {
	runtime.GOMAXPROCS(2) // Use only 2 CPU cores

	var wg sync.WaitGroup
	goroutineCount := 10
	wg.Add(goroutineCount)

	for i := 0; i < goroutineCount; i++ {
		go exe(i, &wg)
	}

	fmt.Println("Waiting until all goroutines are finished...")
	wg.Wait()
	fmt.Println("All tasks are completed.")
}
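Because the scheduler decides the interleaving, the Start and Done lines appear in a different order on each run; only the final "All tasks are completed." line, printed after wg.Wait() returns, is guaranteed to come last.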
Goroutine Scheduler (M:N Model)
Elaborating on the preceding point about goroutines being allocated to and managed on OS threads with an M:N model brings us to the goroutine GMP model.
- G (Goroutine): The smallest unit of work executed in Go.
- M (Machine): The OS thread (the actual execution location).
- P (Processor): A logical processor managed by the Go runtime.
Each P additionally possesses a Local Run Queue and functions as the scheduler responsible for assigning its queued Gs to an M. Simply put, a goroutine is managed through the following GMP process (a way to observe it directly is sketched after the list):
- When a G (Goroutine) is created, it is assigned to the Local Run Queue of a P (Processor).
- The P (Processor) allocates the G (Goroutine) in the Local Run Queue to an M (Machine).
- The M (Machine) runs the G (Goroutine) until it blocks, completes, or is preempted, and reports that state back.
- Work-Stealing: If a P's Local Run Queue becomes empty, that P checks the Global Queue. If there are no G (Goroutines) there either, it steals work from another P's (Processor's) Local Run Queue, ensuring that all Ms operate continuously without rest.
- System Call Handling (Blocking): If a blocking call occurs while a G (Goroutine) is executing, the M (Machine) enters a waiting state. At this point, the P (Processor) detaches from the blocked M (Machine) and pairs with another M (Machine) to execute the next G (Goroutine). Consequently, no CPU time is wasted even while waiting on I/O operations.
- Preemption: If a single G (Goroutine) runs for an extended duration, it is preempted and yields the opportunity for execution to another G (Goroutine).
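The scheduler cannot be driven directly from user code, but it can be observed: running any Go program with the GODEBUG=schedtrace environment variable makes the runtime print the state of its Ps, Ms, and run queues at a fixed interval. A minimal sketch to watch while some CPU-bound goroutines compete for cores (the goroutine count and workload are arbitrary):

package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

func main() {
	// Run with: GODEBUG=schedtrace=1000 ./program
	// Each trace line reports gomaxprocs, idle Ps and Ms, and the
	// lengths of the global and per-P local run queues, which makes
	// work-stealing and thread reuse visible.
	var total atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU()*4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var s int64
			for n := int64(0); n < 2_000_000_000; n++ { // CPU-bound busy work
				s += n
			}
			total.Add(s)
		}()
	}
	wg.Wait()
	fmt.Println("done:", total.Load())
}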
In Golang, the GC (Garbage Collector) also runs on goroutines, allowing memory to be cleaned up in parallel with the application while keeping stop-the-world (STW) pauses to a minimum, leading to efficient utilization of system resources.
Finally, goroutines are one of Golang's strongest advantages, and since the language offers many more besides, I hope many developers come to enjoy using Go.
Thank you.