GoSuda

If Statements in the Go Language

By Lee Yunjin


First, we chose Go because, among modern languages, it produces the most "aesthetically pleasing" assembly, and its syntax is often strikingly efficient even when compared with classical languages.

Now that we have understood the basic operation of a Go program from the previous lecture, let us proceed to compare Go and Assembly line by line.

Source Code

To begin with, modern compilers, Go's included, automatically remove branching statements that serve no purpose. C compilers such as GCC and Clang optimize very aggressively at the commonly used -O2 level, and the era in which a programmer could not fully trust the compiler effectively ended in the late 20th century.

A branch is therefore only worth studying if its condition is one the compiler cannot predict and rewrite into some other form.

package main

import (
    "os"
    "strconv"
)

func main() {
    // If this were replaced with something predictable like x = 10,
    // where the branch can be removed, the compiler would optimize it and delete the branch.
    // Thus, to observe this directly in assembly, one would typically use -O0 in C,
    // or force the compiler to use external values that are unpredictable.
    // Since this section covers modern programming, we will not use methods
    // to disable Go's binary optimization.
    if len(os.Args) < 2 {
        return
    }
    x, _ := strconv.Atoi(os.Args[1])

    if x < 10 {
        println("X is smaller than 10")
    } else {
        println("X is larger or same as 10")
    }
}

In this case, because the input cannot be predicted by the compiler, the branching statement is translated into machine code as is.
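For contrast, here is a minimal sketch of the predictable case described in the source comments. The function names classifyConst and classify are illustrative and not part of the original program. Because x is a constant in the first function, the outcome of the condition is known at build time, so the compiler is free to drop the comparison; disassembling a build of this file is the way to confirm what a given toolchain actually emits.

```go
package main

import (
	"os"
	"strconv"
)

// classifyConst compares a compile-time constant, so the outcome of the
// condition is already known when the program is built.
func classifyConst() string {
	const x = 10
	if x < 10 {
		return "X is smaller than 10"
	}
	return "X is larger or same as 10"
}

// classify receives x as a runtime value, so the comparison has to
// survive into the machine code.
func classify(x int) string {
	if x < 10 {
		return "X is smaller than 10"
	}
	return "X is larger or same as 10"
}

func main() {
	println(classifyConst())
	if len(os.Args) < 2 {
		return
	}
	x, _ := strconv.Atoi(os.Args[1])
	println(classify(x))
}
```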

Assembly Language

TEXT main.main(SB) /home/yjlee/introduction-to-golang/learn-golang/if-and-switch/golang-if/main.go
  main.go:8             0x47a840                493b6610                CMPQ SP, 0x10(R14)
  main.go:8             0x47a844                7670                    JBE 0x47a8b6
  main.go:8             0x47a846                55                      PUSHQ BP
  main.go:8             0x47a847                4889e5                  MOVQ SP, BP
  main.go:8             0x47a84a                4883ec10                SUBQ $0x10, SP
  main.go:15            0x47a84e                48833d12fb0a0002        CMPQ os.Args+8(SB), $0x2
  main.go:15            0x47a856                7c58                    JL 0x47a8b0
  main.go:15            0x47a858                488b0d01fb0a00          MOVQ os.Args(SB), CX
  main.go:18            0x47a85f                488b4110                MOVQ 0x10(CX), AX
  main.go:18            0x47a863                488b5918                MOVQ 0x18(CX), BX
  main.go:18            0x47a867                e834e8ffff              CALL strconv.Atoi(SB)
  main.go:20            0x47a86c                4883f80a                CMPQ AX, $0xa
  main.go:20            0x47a870                7d1d                    JGE 0x47a88f
  main.go:21            0x47a872                e809befbff              CALL runtime.printlock(SB)
  main.go:21            0x47a877                488d0519f50100          LEAQ 0x1f519(IP), AX
  main.go:21            0x47a87e                bb15000000              MOVL $0x15, BX
  main.go:21            0x47a883                e878c6fbff              CALL runtime.printstring(SB)
  main.go:21            0x47a888                e853befbff              CALL runtime.printunlock(SB)
  main.go:21            0x47a88d                eb1b                    JMP 0x47a8aa
  main.go:23            0x47a88f                e8ecbdfbff              CALL runtime.printlock(SB)
  main.go:23            0x47a894                488d05c8040200          LEAQ 0x204c8(IP), AX
  main.go:23            0x47a89b                bb1a000000              MOVL $0x1a, BX
  main.go:23            0x47a8a0                e85bc6fbff              CALL runtime.printstring(SB)
  main.go:23            0x47a8a5                e836befbff              CALL runtime.printunlock(SB)
  main.go:25            0x47a8aa                4883c410                ADDQ $0x10, SP
  main.go:25            0x47a8ae                5d                      POPQ BP
  main.go:25            0x47a8af                c3                      RET
  main.go:16            0x47a8b0                4883c410                ADDQ $0x10, SP
  main.go:16            0x47a8b4                5d                      POPQ BP
  main.go:16            0x47a8b5                c3                      RET
  main.go:8             0x47a8b6                e845f0feff              CALL runtime.morestack_noctxt.abi0(SB)
  main.go:8             0x47a8bb                eb83                    JMP main.main(SB)

The go tool annotates every instruction with the source file and line it came from (the main.go:N column), so it kindly tells us which statement matches which assembly, one by one.

Since this lecture is about comparisons and if branches, we will focus on only a few of these lines.

CMPQ Instruction & JL Instruction

The CMPQ instruction compares 8-byte values. Its name is derived from CoMPare Quadword: on x86 a word is 2 bytes, so a quadword (four words) is 8 bytes.

Looking at memory address 0x47a84e, the statement CMPQ os.Args+8(SB), $0x2 is present. os.Args is a slice, and the 8 bytes at offset +8 of a slice hold its length, so the program is comparing the number of received arguments with the hexadecimal 0x2 (which is simply 2).

Afterward, JL acts on the result of that comparison. JL is an abbreviation of Jump if Less than: if the number of arguments is less than 2 (i.e., if only the program itself is the argument), it jumps to address 0x47a8b0, the epilogue for the return on line 16 (ADDQ, POPQ, RET). Otherwise execution falls through toward the CMPQ AX, $0xa and JGE pair at 0x47a86c. However, since that comparison uses the AX register, we must first identify the nature of the value stored in the register.
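To see why the length sits at +8, it helps to picture a slice as three machine words: data pointer, length, capacity. The sketch below mirrors that layout with a hypothetical sliceHeader struct. This is an assumption about how the current gc toolchain lays out slices on 64-bit targets, not a documented API, and the helper names are illustrative.

```go
package main

import "unsafe"

// sliceHeader is a hypothetical mirror of the slice layout used by the
// gc toolchain on 64-bit targets (an assumption, not a public API).
type sliceHeader struct {
	data     unsafe.Pointer // offset 0: address of the first element
	length   int            // offset 8: what CMPQ os.Args+8(SB), $0x2 reads
	capacity int            // offset 16
}

// lengthOffset reports where the length word sits inside the header.
func lengthOffset() uintptr {
	var h sliceHeader
	return unsafe.Offsetof(h.length)
}

// lengthFromHeader reads the length straight out of the header words.
func lengthFromHeader(s []string) int {
	return (*sliceHeader)(unsafe.Pointer(&s)).length
}

func main() {
	args := []string{"prog", "7"}
	println(lengthOffset())             // offset of the length word
	println(lengthFromHeader(args))     // same value as len(args)
	println(lengthFromHeader(args) < 2) // the condition JL acts on
}
```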

MOVQ Instruction

Next, we need to see how the program keeps the starting address of the data in the 'CX' register and how it then extracts the actual data through that address.

Looking at the range 0x47a858-0x47a863, it performs this operation in stages.

First, the MOVQ os.Args(SB), CX instruction loads the starting address of the argument array into the CX register. To follow the next two instructions, one must understand Go's string type.

Go's string is a structure of 16 bytes, composed of two 8-byte fields.

struct    8 bytes        8 bytes
string    mem address    string length

Visually, it is as depicted above; the first 8 bytes store the start address of the string, and the latter 8 bytes store the length of the string.

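os.Args[1] is the second element of the array, so its 16-byte structure begins at offset 0x10 from CX. Therefore, MOVQ 0x10(CX), AX stores the string's address in the AX register, and MOVQ 0x18(CX), BX stores the string's length in the BX register.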
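The offsets 0x10 and 0x18 can be checked from Go itself. The following is a minimal sketch, assuming a 64-bit target and the two-word layout described above; stringHeader and the helper functions are hypothetical names introduced only for illustration.

```go
package main

import "unsafe"

// stringHeader is a hypothetical mirror of the two-word string layout.
type stringHeader struct {
	data   unsafe.Pointer // first 8 bytes: start address of the bytes
	length int            // next 8 bytes: length of the string
}

// stringHeaderSize reports how many bytes one string value occupies.
func stringHeaderSize() uintptr {
	return unsafe.Sizeof("")
}

// elemOffset reports how far element i lies from element 0 in bytes.
func elemOffset(args []string, i int) uintptr {
	return uintptr(unsafe.Pointer(&args[i])) - uintptr(unsafe.Pointer(&args[0]))
}

// headerLength reads the length word directly from the string header.
func headerLength(s string) int {
	return (*stringHeader)(unsafe.Pointer(&s)).length
}

func main() {
	args := []string{"prog", "42"}
	println(stringHeaderSize())    // size of one string structure
	println(elemOffset(args, 1))   // where args[1] starts: the 0x10 in 0x10(CX)
	println(headerLength(args[1])) // the value MOVQ 0x18(CX), BX would load
}
```

On a 64-bit build, the second element starts 16 bytes (0x10) after the first, and its length word follows 8 bytes later at 0x18, which matches the two MOVQ operands in the listing.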

CALL

In previous posts, the runtime functions we looked at were always preceded by the instruction CALL. It appears wherever Go code invokes a function, and it literally means to call that function. Here, the CALL instruction is used to convert the string into an integer via strconv.Atoi; how the conversion works is not visible in this listing because it is abstracted into the function.
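One detail of this call is easy to miss: the program discards the error that strconv.Atoi returns. The sketch below (parseArg and branchFor are hypothetical helper names) shows what that means for the branch that follows. When the argument is not a number, Atoi reports an error and yields 0, so the x < 10 side is taken.

```go
package main

import "strconv"

// parseArg wraps strconv.Atoi and reports whether the conversion succeeded.
func parseArg(s string) (int, bool) {
	x, err := strconv.Atoi(s)
	return x, err == nil
}

// branchFor repeats the logic of the original program, including the
// discarded error, and returns the message instead of printing it.
func branchFor(s string) string {
	x, _ := strconv.Atoi(s)
	if x < 10 {
		return "X is smaller than 10"
	}
	return "X is larger or same as 10"
}

func main() {
	for _, s := range []string{"3", "42", "abc"} {
		x, ok := parseArg(s)
		println(s, x, ok, branchFor(s))
	}
}
```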

CMPQ Instruction & JGE Instruction

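Returning to the previous address 0x47a86c, the instruction at first glance seems to be comparing the string's address and the number 0xa (10 in decimal)!

In fact, AX no longer holds that address. On amd64, Go's register-based calling convention passes the string to strconv.Atoi in AX and BX and hands the integer result back in AX. Because the corresponding argument is no longer used within the program, nothing preserves it, and the register that carried the string's address now carries the integer variable x.

This kind of register reuse is typical of the optimized code that compilers like Go's produce.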

Afterward, the JGE instruction appears, which is an abbreviation for Jump if Greater or Equal. The jump is taken when the first value is greater than or equal to the value it was compared with, that is, when x >= 10.

Thus, the x < 10 of the source is not emitted as is; rather, the comparison direction of the statement is reversed to x >= 10! In machine code, jumping ahead to the else block when the condition is not met, and otherwise falling straight through into the if block, is more intuitive and saves one instruction compared to testing the condition once for the if block and then testing again for the else block.
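The inverted control flow can be written back in Go using goto, which makes the shape of the listing easier to see. This is only an illustrative sketch: branchSource and branchLikeAsm are hypothetical names, the functions return the message instead of calling println, and the comments map each step to the instruction it imitates.

```go
package main

// branchSource is the branch as written in the Go source.
func branchSource(x int) string {
	if x < 10 {
		return "X is smaller than 10"
	}
	return "X is larger or same as 10"
}

// branchLikeAsm follows the control flow of the listing: test the
// inverted condition, jump to the else block, otherwise fall through.
func branchLikeAsm(x int) (msg string) {
	if x >= 10 { // CMPQ AX, $0xa followed by JGE 0x47a88f
		goto elseBlock
	}
	msg = "X is smaller than 10" // fall through into the if block
	goto done                    // JMP 0x47a8aa
elseBlock:
	msg = "X is larger or same as 10"
done:
	return msg // shared epilogue: ADDQ, POPQ, RET
}

func main() {
	for _, x := range []int{9, 10, 11} {
		println(x, branchSource(x) == branchLikeAsm(x))
	}
}
```

Both functions return the same message for every input; only the direction of the test and the placement of the jump differ.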

Since this optimization is quite classical, it is a pattern that appears frequently even in compilers with significantly lower levels of optimization, unlike what we saw around the strconv.Atoi call above, so it is worth noting.

Therefore, by applying this in reverse, one can write source code whose assembly is 100% identical even if the source code itself is different.

Mirror Image Code Example

The bash script below writes two mirror-image source files, builds them under the same directory path and file name, and compares only the instructions of main.main (excluding per-line metadata such as addresses and machine code bytes); the two turn out to produce exactly the same assembly.

#!/usr/bin/env bash

# 1. Complete initialization of existing residual files and directories
echo "[1/6] Cleaning up old artifacts..."
rm -rf test_dir main_orig main_asm orig.asm asm.asm orig_pure.asm asm_pure.asm
mkdir -p test_dir

# 2. Write original version source code (main.go)
echo "[2/6] Generating main.go..."
cat << 'EOF' > main.go
package main

import (
        "os"
        "strconv"
)

func main() {
        if len(os.Args) < 2 {
                return
        }
        x, _ := strconv.Atoi(os.Args[1])

        s1 := "X is smaller than 10"
        s2 := "X is larger or same as 10"

        if x < 10 {
                println(s1)
        } else {
                println(s2)
        }
}
EOF

# 3. Write mirror image version source code (main_from_asm.go)
# Perfectly symmetrically synchronize the operator structure so that the compiler uses the optimization template (JGE) as is
echo "[3/6] Generating main_from_asm.go..."
cat << 'EOF' > main_from_asm.go
package main

import (
        "os"
        "strconv"
)

func main() {
        if len(os.Args) < 2 {
                return
        }
        x, _ := strconv.Atoi(os.Args[1])

        s1 := "X is smaller than 10"
        s2 := "X is larger or same as 10"

        // Maintaining the x < 10 structure with 10 as the baseline ensures the compiler
        // adopts the exact same JGE mechanism and block arrangement as main.go.
        if x < 10 {
                println(s1)
        } else {
                println(s2)
        }
}
EOF

# 4. Perform builds in an identical directory path and filename environment
echo "[4/6] Compiling both sources inside 'test_dir'..."
cp main.go test_dir/main.go
cd test_dir && go build -o ../main_orig main.go && cd ..

rm test_dir/main.go
cp main_from_asm.go test_dir/main.go
cd test_dir && go build -o ../main_asm main.go && cd ..

# 5. Extract pure main.main assembly function using go tool objdump
echo "[5/6] Extracting main.main assembly sections..."
go tool objdump -s "main\.main" main_orig > orig.asm
go tool objdump -s "main\.main" main_asm > asm.asm

# Remove virtual address, offset, and machine code byte data text,
# and filter only the pure instruction set (Opcode & Operands) fields that the CPU will execute
awk '{print $4, $5, $6, $7}' orig.asm > orig_pure.asm
awk '{print $4, $5, $6, $7}' asm.asm > asm_pure.asm

# 6. Verify diff of both machine code instruction structures
echo "[6/6] Verifying assembly structural integrity via diff..."
echo "------------------------------------------------------------"

if diff orig_pure.asm asm_pure.asm > /dev/null; then
    echo "===> [Success] The main.main machine code logic of both binaries matches 100%! <==="
    echo "We perfectly synchronized the compiler's optimization pipeline guidelines to obtain the same assembly."
else
    echo "===> [Failure] Differences were found in the assembly instruction structure. <==="
    diff -u orig_pure.asm asm_pure.asm
fi
echo "------------------------------------------------------------"

Running the script produces the following output.

[1/6] Cleaning up old artifacts...
[2/6] Generating main.go...
[3/6] Generating main_from_asm.go...
[4/6] Compiling both sources inside 'test_dir'...
[5/6] Extracting main.main assembly sections...
[6/6] Verifying assembly structural integrity via diff...
------------------------------------------------------------
===> [Success] The main.main machine code logic of both binaries matches 100%! <===
We perfectly synchronized the compiler's optimization pipeline guidelines to obtain the same assembly.
------------------------------------------------------------

Conclusion

We have learned that programming languages provide many abstractions, and that very interesting and aggressive optimizations are hidden behind them. Furthermore, by exploiting this, we were able to create mirror-image code where the sources differ but the assembly is identical. If you are interested in low-level work and happen to encounter proprietary software written in Go, disassembling it and analyzing the assembly directly to recover the source does not seem to be an impossible task.

Next Lecture

In the next session, we will look into the select-case statement, which has a different kind of charm than the If statement.