If Statements in the Go Language
First, the reason we chose Go is that, among modern languages, Go produces some of the most "aesthetically pleasing" assembly, and its syntax is often far more economical than that of classical languages.
Now that we have understood the basic operation of a Go program from the previous lecture, let us proceed to compare Go and Assembly line by line.
Source Code
Like the Go compiler, modern compilers such as GCC automatically remove branches that serve no purpose. C compilers like GCC and Clang already optimize very aggressively at the industry-standard -O2 level, and the era in which programmers could not fully trust the compiler effectively ended in the late 20th century.
Therefore, the example is only meaningful if the condition depends on a value the compiler cannot predict, so that it cannot fold the branch away or transform it into some other construct.
package main

import (
	"os"
	"strconv"
)

func main() {
	// If this were replaced with something predictable like x = 10,
	// where the branch can be removed, the compiler would optimize it and delete the branch.
	// Thus, to observe this directly in assembly, one would typically use -O0 in C,
	// or force the compiler to use external values that are unpredictable.
	// Since this section covers modern programming, we will not use methods
	// to disable Go's binary optimization.
	if len(os.Args) < 2 {
		return
	}
	x, _ := strconv.Atoi(os.Args[1])

	if x < 10 {
		println("X is smaller than 10")
	} else {
		println("X is larger or same as 10")
	}
}
In this case, because the input cannot be predicted by the compiler, the branching statement is translated into machine code as is.
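For contrast, here is a minimal sketch of the predictable case mentioned in the source comments. The function names foldable and runtimeBranch are hypothetical and only serve the illustration. In foldable the compiler can see that x is always 10, so it is free to drop the comparison; in runtimeBranch the value arrives from the caller, so the comparison has to survive whenever the argument is not a compile-time constant.

```go
package main

import "os"

// foldable compares a value that is known at compile time.
// The condition is always false, so the compiler may delete the
// "smaller" branch and the comparison along with it.
func foldable() string {
	x := 10
	if x < 10 {
		return "X is smaller than 10"
	}
	return "X is larger or same as 10"
}

// runtimeBranch receives x from its caller. When the argument is not a
// constant, the comparison must be carried out at run time.
func runtimeBranch(x int) string {
	if x < 10 {
		return "X is smaller than 10"
	}
	return "X is larger or same as 10"
}

func main() {
	println(foldable())
	// len(os.Args) is only known when the program starts,
	// so this call cannot be decided by the compiler.
	println(runtimeBranch(len(os.Args)))
}
```

Disassembling such a program is one way to check whether the comparison in foldable really disappears on your toolchain; the exact result may vary between Go versions.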
Assembly Language
TEXT main.main(SB) /home/yjlee/introduction-to-golang/learn-golang/if-and-switch/golang-if/main.go
  main.go:8   0x47a840  493b6610          CMPQ SP, 0x10(R14)
  main.go:8   0x47a844  7670              JBE 0x47a8b6
  main.go:8   0x47a846  55                PUSHQ BP
  main.go:8   0x47a847  4889e5            MOVQ SP, BP
  main.go:8   0x47a84a  4883ec10          SUBQ $0x10, SP
  main.go:15  0x47a84e  48833d12fb0a0002  CMPQ os.Args+8(SB), $0x2
  main.go:15  0x47a856  7c58              JL 0x47a8b0
  main.go:15  0x47a858  488b0d01fb0a00    MOVQ os.Args(SB), CX
  main.go:18  0x47a85f  488b4110          MOVQ 0x10(CX), AX
  main.go:18  0x47a863  488b5918          MOVQ 0x18(CX), BX
  main.go:18  0x47a867  e834e8ffff        CALL strconv.Atoi(SB)
  main.go:20  0x47a86c  4883f80a          CMPQ AX, $0xa
  main.go:20  0x47a870  7d1d              JGE 0x47a88f
  main.go:21  0x47a872  e809befbff        CALL runtime.printlock(SB)
  main.go:21  0x47a877  488d0519f50100    LEAQ 0x1f519(IP), AX
  main.go:21  0x47a87e  bb15000000        MOVL $0x15, BX
  main.go:21  0x47a883  e878c6fbff        CALL runtime.printstring(SB)
  main.go:21  0x47a888  e853befbff        CALL runtime.printunlock(SB)
  main.go:21  0x47a88d  eb1b              JMP 0x47a8aa
  main.go:23  0x47a88f  e8ecbdfbff        CALL runtime.printlock(SB)
  main.go:23  0x47a894  488d05c8040200    LEAQ 0x204c8(IP), AX
  main.go:23  0x47a89b  bb1a000000        MOVL $0x1a, BX
  main.go:23  0x47a8a0  e85bc6fbff        CALL runtime.printstring(SB)
  main.go:23  0x47a8a5  e836befbff        CALL runtime.printunlock(SB)
  main.go:25  0x47a8aa  4883c410          ADDQ $0x10, SP
  main.go:25  0x47a8ae  5d                POPQ BP
  main.go:25  0x47a8af  c3                RET
  main.go:16  0x47a8b0  4883c410          ADDQ $0x10, SP
  main.go:16  0x47a8b4  5d                POPQ BP
  main.go:16  0x47a8b5  c3                RET
  main.go:8   0x47a8b6  e845f0feff        CALL runtime.morestack_noctxt.abi0(SB)
  main.go:8   0x47a8bb  eb83              JMP main.main(SB)
The go tool's disassembly prefixes every instruction with the source line it was generated from, so we can match Go statements to assembly one by one.
Since this lecture is about comparisons and if branches, we will focus on just a few of these lines.
CMPQ Instruction & JL Instruction
The CMPQ instruction compares 8-byte operands (a quadword, that is, four 16-bit words); the name is short for CoMPare Quadword.
At address 0x47a84e we find CMPQ os.Args+8(SB), $0x2.
os.Args is a slice, and the 8 bytes at offset 8 of a slice header hold its length, so this instruction compares the number of received arguments with hexadecimal 0x2 (which is simply 2).
The JL instruction that follows, short for Jump if Less, then jumps if the argument count is less than 2 (i.e., if the program name itself is the only argument).
When that happens, execution jumps to address 0x47a8b0, which is the early return on line 16 of main.go; otherwise it falls through toward the CMPQ and JGE pair at 0x47a86c that implements x < 10.
However, since that later comparison uses the AX register, we must first identify what value is stored in it.
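To make the os.Args+8(SB) operand more concrete, the following sketch mirrors the slice layout in an ordinary struct and reads the length field through it. The sliceHeader type and the helper names are assumptions made for this illustration, not part of the standard library, and the offsets in the comments assume a 64-bit target.

```go
package main

import "unsafe"

// sliceHeader is an illustrative copy of how a slice value is laid out
// in memory: data pointer, then length, then capacity.
type sliceHeader struct {
	Data unsafe.Pointer // offset 0
	Len  int            // offset 8 on a 64-bit target
	Cap  int            // offset 16 on a 64-bit target
}

// lenOffset reports the byte offset of the Len field inside the header.
func lenOffset() uintptr {
	var h sliceHeader
	return unsafe.Offsetof(h.Len)
}

// readLen reads the length of args through the mirrored header. This is
// the same field that CMPQ os.Args+8(SB), $0x2 inspects.
func readLen(args []string) int {
	return (*sliceHeader)(unsafe.Pointer(&args)).Len
}

func main() {
	args := []string{"prog", "7"}
	println(lenOffset())   // 8 on a 64-bit target
	println(readLen(args)) // 2
}
```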
MOVQ Instruction
Next, we need to see how the program stores the starting address of the argument data in the CX register and how it then extracts the actual data through that address.
Looking at the range 0x47a858-0x47a863, it performs this operation in stages.
First, the MOVQ os.Args(SB), CX instruction loads the starting address of the argument array into the CX register. At this point, one must understand Go's string type.
A Go string is a structure of 16 bytes on a 64-bit platform, composed of two 8-byte fields.
| struct | 8 byte | 8 byte |
|---|---|---|
| string | mem address | string length |
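As depicted above, the first 8 bytes store the start address of the string's bytes, and the latter 8 bytes store the length of the string.
Because os.Args[0] occupies the first 16 bytes of the array, os.Args[1] begins at offset 0x10. Therefore MOVQ 0x10(CX), AX loads the string's address into the AX register, and MOVQ 0x18(CX), BX loads the string's length into the BX register, which is where strconv.Atoi receives its string argument.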
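The offsets 0x10 and 0x18 can be checked from Go itself. The sketch below is only an illustration: stringHeader, elemOffset, and headerLen are hypothetical names, and the numbers in the comments assume a 64-bit target.

```go
package main

import "unsafe"

// wordSize is the pointer width in bytes (8 on amd64).
const wordSize = unsafe.Sizeof(uintptr(0))

// stringHeader is an illustrative copy of the string layout from the
// table above: address first, length second.
type stringHeader struct {
	Data unsafe.Pointer
	Len  int
}

// elemOffset returns the distance in bytes from args[0] to args[i].
func elemOffset(args []string, i int) uintptr {
	base := uintptr(unsafe.Pointer(&args[0]))
	elem := uintptr(unsafe.Pointer(&args[i]))
	return elem - base
}

// headerLen reads the length field stored in the header of s.
func headerLen(s string) int {
	return (*stringHeader)(unsafe.Pointer(&s)).Len
}

func main() {
	args := []string{"prog", "42"}
	println(2 * wordSize)        // size of one string header: 16 on a 64-bit target
	println(elemOffset(args, 1)) // 16 (0x10) on a 64-bit target
	println(headerLen(args[1]))  // 2
}
```

The length field of args[1] sits one word after its address field, which corresponds to the 0x18(CX) operand in the listing.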
CALL
In previous posts, the runtime functions we looked at were always preceded by the CALL instruction.
CALL appears in front of every function that is invoked, and it literally means to call that function. Here, CALL strconv.Atoi(SB) converts the string into an integer. Where the integer ends up is not visible from this listing alone, because that detail is hidden inside the function and its calling convention.
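If you would like to see a CALL to one of your own functions in the disassembly, one option is to keep the compiler from inlining it. The sketch below assumes a hypothetical helper named addOne; the //go:noinline directive is a real compiler directive, but whether and how the call shows up still depends on your Go version.

```go
package main

import (
	"os"
	"strconv"
)

// addOne is a hypothetical helper used only for this illustration.
// The directive below asks the compiler not to inline it, so that a
// CALL to main.addOne can remain visible inside main.main.
//
//go:noinline
func addOne(x int) int {
	return x + 1
}

func main() {
	if len(os.Args) < 2 {
		return
	}
	// strconv.Atoi returns two values: the integer and an error.
	x, err := strconv.Atoi(os.Args[1])
	if err != nil {
		println("not an integer:", os.Args[1])
		return
	}
	println(addOne(x))
}
```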
CMPQ Instruction & JGE Instruction
Returning to address 0x47a86c, the instruction appears to be comparing the string's address with the number 0xa (10 in decimal)!
In fact, AX no longer holds the address. The string argument is not needed after the call, so strconv.Atoi hands back its integer result in the same AX register, which now serves as the variable x.
Go's register-based calling convention reuses registers in this way instead of reserving separate storage for each value.
Afterward, the JGE instruction appears, which is short for Jump if Greater or Equal. It jumps when the left operand of the preceding comparison (x) is greater than or equal to the right operand (10).
Thus, the machine code does not test x < 10 as written; rather, the comparison is inverted to x >= 10!
This is because jumping straight to the else block when the condition fails lets the if block follow the comparison directly, so a single conditional jump is enough. Testing the condition as written would need an extra jump to skip over the other block.
This inversion is a classic technique, so unlike the register reuse seen with strconv.Atoi above, it appears frequently even in compilers with far lower levels of optimization, and it is worth remembering.
Applying this knowledge, one can write source code that compiles to 100% identical assembly even though the source text itself is different.
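The inverted comparison can be imitated in Go with goto, which makes the control flow of the listing easier to follow. This is a sketch for reading purposes only: classifyLikeAsm is a hypothetical name, and nothing here claims the compiler emits identical machine code for both functions.

```go
package main

// classify is the branch as it is written in the source.
func classify(x int) string {
	if x < 10 {
		return "X is smaller than 10"
	}
	return "X is larger or same as 10"
}

// classifyLikeAsm follows the shape of the assembly: test the inverted
// condition, jump to the else block if it holds, otherwise fall through
// into the if block.
func classifyLikeAsm(x int) string {
	if x >= 10 { // CMPQ AX, $0xa followed by JGE
		goto elseBranch
	}
	return "X is smaller than 10" // fall-through path: the original if block
elseBranch:
	return "X is larger or same as 10" // jump target: the original else block
}

func main() {
	for _, x := range []int{9, 10, 11} {
		println(classify(x))
		println(classifyLikeAsm(x))
	}
}
```

Both functions return the same string for every input, which is the property the mirror-image experiment in the next section relies on.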
Mirror Image Code Example
The bash script below writes two mirror-image source files, main.go and main_from_asm.go (the version written back from the assembly), builds each of them under the same directory path and filename, and compares only main.main (excluding metadata that changes from build to build). The two sources yield exactly the same assembly.
#!/usr/bin/env bash

# 1. Complete initialization of existing residual files and directories
echo "[1/6] Cleaning up old artifacts..."
rm -rf test_dir main_orig main_asm orig.asm asm.asm orig_pure.asm asm_pure.asm
mkdir -p test_dir

# 2. Write original version source code (main.go)
echo "[2/6] Generating main.go..."
cat << 'EOF' > main.go
package main

import (
	"os"
	"strconv"
)

func main() {
	if len(os.Args) < 2 {
		return
	}
	x, _ := strconv.Atoi(os.Args[1])

	s1 := "X is smaller than 10"
	s2 := "X is larger or same as 10"

	if x < 10 {
		println(s1)
	} else {
		println(s2)
	}
}
EOF

# 3. Write mirror image version source code (main_from_asm.go)
# Perfectly symmetrically synchronize the operator structure so that the compiler uses the optimization template (JGE) as is
echo "[3/6] Generating main_from_asm.go..."
cat << 'EOF' > main_from_asm.go
package main

import (
	"os"
	"strconv"
)

func main() {
	if len(os.Args) < 2 {
		return
	}
	x, _ := strconv.Atoi(os.Args[1])

	s1 := "X is smaller than 10"
	s2 := "X is larger or same as 10"

	// Maintaining the x < 10 structure with 10 as the baseline ensures the compiler
	// adopts the exact same JGE mechanism and block arrangement as main.go.
	if x < 10 {
		println(s1)
	} else {
		println(s2)
	}
}
EOF

# 4. Perform builds in an identical directory path and filename environment
echo "[4/6] Compiling both sources inside 'test_dir'..."
cp main.go test_dir/main.go
cd test_dir && go build -o ../main_orig main.go && cd ..

rm test_dir/main.go
cp main_from_asm.go test_dir/main.go
cd test_dir && go build -o ../main_asm main.go && cd ..

# 5. Extract pure main.main assembly function using go tool objdump
echo "[5/6] Extracting main.main assembly sections..."
go tool objdump -s "main\.main" main_orig > orig.asm
go tool objdump -s "main\.main" main_asm > asm.asm

# Remove virtual address, offset, and machine code byte data text,
# and filter only the pure instruction set (Opcode & Operands) fields that the CPU will execute
awk '{print $4, $5, $6, $7}' orig.asm > orig_pure.asm
awk '{print $4, $5, $6, $7}' asm.asm > asm_pure.asm

# 6. Verify diff of both machine code instruction structures
echo "[6/6] Verifying assembly structural integrity via diff..."
echo "------------------------------------------------------------"

if diff orig_pure.asm asm_pure.asm > /dev/null; then
    echo "===> [Success] The main.main machine code logic of both binaries matches 100%! <==="
    echo "We perfectly synchronized the compiler's optimization pipeline guidelines to obtain the same assembly."
else
    echo "===> [Failure] Differences were found in the assembly instruction structure. <==="
    diff -u orig_pure.asm asm_pure.asm
fi
echo "------------------------------------------------------------"
Running the script produces the following output.
[1/6] Cleaning up old artifacts...
[2/6] Generating main.go...
[3/6] Generating main_from_asm.go...
[4/6] Compiling both sources inside 'test_dir'...
[5/6] Extracting main.main assembly sections...
[6/6] Verifying assembly structural integrity via diff...
------------------------------------------------------------
===> [Success] The main.main machine code logic of both binaries matches 100%! <===
We perfectly synchronized the compiler's optimization pipeline guidelines to obtain the same assembly.
------------------------------------------------------------
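For readers who prefer to stay in Go, the awk filter from step 5 can be approximated as follows. This is a rough sketch, not a drop-in replacement: pureInstruction is a hypothetical helper, and unlike awk it does not pad missing fields with empty strings, so the header line simply becomes an empty string.

```go
package main

import "strings"

// pureInstruction keeps only the opcode and operand columns of one
// objdump line, i.e. whitespace-separated fields 4 through 7.
// Lines with fewer than four fields (such as the TEXT header) yield "".
func pureInstruction(line string) string {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return ""
	}
	end := len(fields)
	if end > 7 {
		end = 7
	}
	return strings.Join(fields[3:end], " ")
}

func main() {
	// One line copied from the listing earlier in this post.
	println(pureInstruction("  main.go:20 0x47a86c 4883f80a CMPQ AX, $0xa"))
}
```

Applied to every line of both dumps, a filter like this leaves only the parts of the listing that do not depend on addresses or raw bytes in the first three columns, which is what makes the two dumps comparable.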
Conclusion
We have learned that behind the many abstractions a programming language provides, there are very interesting and aggressive optimizations at work. By exploiting this, we were able to create mirror-image code where the sources differ but the assembly is identical. If you are interested in low-level details and happen to encounter proprietary software written in Go, disassembling it and analyzing the assembly directly to recover the source does not seem to be an impossible task.
Next Lecture
In the next session, we will look into the select-case statement, which has a different kind of charm than the If statement.