RISC-V Bare Metal Programming Chapter 3: A Link to the Past

Submitted by MarcAdmin on Tue, 11/12/2019 - 12:27

Previous chapters of the RISC-V bare metal programming tutorial have focused primarily on the assembler. In chapter 2, assembler directives were discussed along with their role in positioning code in an executable. The sections in which code and data reside have well-defined semantics in the Executable and Linkable Format (ELF) specification. In this chapter, these semantics and the linking process will be examined in more detail.

The typical programming workflow involves processing a source file with either an assembler (for assembly code) or a compiler (for higher-level programming languages such as C) to produce an object file. The object file cannot be run by itself: its code and data are positioned relative to the start of each section rather than at absolute memory addresses, and references to symbols whose final addresses are not yet known are left as placeholders. These relative positions and placeholders must be translated into absolute addresses before they make sense to the processor. For example, consider the following code, which calls the sum function:

        .text
        .align 2
        .global _start
        .global _stack_end
_start:
        li      a0, 5
        li      a1, 4
        la      sp,_stack_end
        call    sum
stop:   j       stop

When assembled, it produces the following object code:

 1: $ riscv64-unknown-elf-objdump -d main.o
 2: 
 3: main.o:     file format elf64-littleriscv
 4: 
 5: 
 6: Disassembly of section .text:
 7: 
 8: 0000000000000000 <_start>:
 9:    0:	00500513          	li	a0,5
10:    4:	00400593          	li	a1,4
11:    8:	00000117          	auipc	sp,0x0
12:    c:	00010113          	mv	sp,sp
13:   10:	00000097          	auipc	ra,0x0
14:   14:	000080e7          	jalr	ra # 10 <_start+0x10>
15: 
16: 0000000000000018 <stop>:
17:   18:	0000006f          	j	18 <stop>
18: 

The auipc (Add Upper Immediate to PC) instruction on line 11 adds 0 to the program counter and stores the result in the sp register. It is followed by mv sp,sp, which is really addi sp,sp,0 and therefore leaves sp unchanged. Together, these two instructions are meant to set the top of the stack; the zero immediates in both of them are placeholders that must be replaced with values derived from the stack's address. The linker is responsible for filling in the correct values, based on the address of the _stack_end symbol. The final linked application is shown in the following listing:

$ riscv64-unknown-elf-objdump -d sum.elf

sum.elf:     file format elf64-littleriscv


Disassembly of section .text:

0000000080000000 <_start>:
    80000000:	00500513          	li	a0,5
    80000004:	00400593          	li	a1,4
    80000008:	00009117          	auipc	sp,0x9
    8000000c:	ff810113          	addi	sp,sp,-8 # 80009000 <_stack_end>
    80000010:	008000ef          	jal	ra,80000018 <sum>

0000000080000014 <stop>:
    80000014:	0000006f          	j	80000014 <stop>

0000000080000018 <sum>:
    80000018:	fe010113          	addi	sp,sp,-32
    8000001c:	00113c23          	sd	ra,24(sp)
    80000020:	00b50533          	add	a0,a0,a1
    80000024:	01813083          	ld	ra,24(sp)
    80000028:	02010113          	addi	sp,sp,32
    8000002c:	00008067          	ret

In this listing, the auipc instruction will set sp to address 0x80009008 because:

  • pc is 0x80000008 (the address of the auipc instruction)
  • The result of auipc is sp = pc + (0x9 << 12) = 0x80000008 + 0x9000 = 0x80009008

The next instruction adds -8 to sp, which sets the top of the stack at address 0x80009000.
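The split of an address into an auipc immediate and an addi immediate follows a fixed rule. The C sketch below illustrates it; pcrel_split is a hypothetical helper, not the linker's actual code. It computes the distance from the auipc instruction to the target, rounds the upper 20 bits by adding 0x800 before shifting, and leaves the remainder (always in the signed range -2048..2047) for the addi:

```c
#include <assert.h>
#include <stdint.h>

/* Split the distance from an auipc instruction at address pc to a target
 * address into the auipc upper immediate (hi20) and the signed 12-bit
 * immediate of the following addi (lo12), so that
 *     target == pc + (hi20 << 12) + lo12.
 * Adding 0x800 before shifting rounds hi20 so that lo12 always fits the
 * signed 12-bit range. Assumes >> on a negative int64_t is an arithmetic
 * shift, as it is with GCC and Clang. */
static void pcrel_split(uint64_t pc, uint64_t target,
                        int32_t *hi20, int32_t *lo12)
{
    int64_t offset = (int64_t)(target - pc);   /* distance from auipc      */
    int64_t hi = (offset + 0x800) >> 12;        /* rounded upper 20 bits    */
    int64_t lo = offset - hi * 4096;            /* remainder for the addi   */

    assert(lo >= -2048 && lo <= 2047);
    *hi20 = (int32_t)hi;
    *lo12 = (int32_t)lo;
}
```

For the listing above, pcrel_split(0x80000008, 0x80009000, &hi, &lo) yields hi = 9 and lo = -8, matching auipc sp,0x9 followed by addi sp,sp,-8.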

Similarly, the two instructions starting at line 13 of the object file listing (offsets 0x10 and 0x14) are placeholders for the call to the sum subroutine. The auipc instruction sets the ra register to the current value of the program counter (ra = pc + (0x0 << 12) = 0x10). The jalr instruction then jumps to the address in ra (offset 0x10) and sets ra to pc + 4 (0x18). If the program could execute as-is, it would keep jumping back to the auipc instruction in an infinite loop. The proper memory offsets need to be filled in by the linker.

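In the final linked program, these two instructions are replaced by a single jal instruction, which sets ra (the return address) to 0x80000014 (the address of the stop loop) and then jumps to 0x80000018 (the start of the sum subroutine). This substitution, known as linker relaxation, is possible because sum lies within the ±1 MiB range of jal's PC-relative offset. It also makes the code four bytes shorter, which is why stop is at 0x80000014 in the linked program rather than at the 0x18 offset it had in the object file.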
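The jal encodings in the listings can be checked by hand. In the J-type format used by jal, the immediate bits are scrambled across the instruction word. The C sketch below extracts the signed PC-relative offset and the destination register; jal_offset and jal_rd are hypothetical helper names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Decode the signed byte offset encoded in a RISC-V jal (J-type) word:
 *   bit 31     -> imm[20] (sign bit)
 *   bits 30:21 -> imm[10:1]
 *   bit 20     -> imm[11]
 *   bits 19:12 -> imm[19:12]
 * imm[0] is always zero, so the offset is a multiple of 2. */
static int32_t jal_offset(uint32_t insn)
{
    assert((insn & 0x7f) == 0x6f);                  /* opcode must be JAL */
    uint32_t imm = ((insn >> 31) & 0x1) << 20
                 | ((insn >> 21) & 0x3ff) << 1
                 | ((insn >> 20) & 0x1) << 11
                 | ((insn >> 12) & 0xff) << 12;

    if (imm & (1u << 20))                           /* sign-extend 21 bits */
        return (int32_t)imm - (1 << 21);
    return (int32_t)imm;
}

/* Register that receives the return address (rd field, bits 11:7). */
static unsigned jal_rd(uint32_t insn)
{
    return (insn >> 7) & 0x1f;
}
```

Applied to 0x008000ef at 0x80000010, jal_offset returns 8 and jal_rd returns 1 (ra), giving a target of 0x80000018, where sum begins. The j stop instruction (0x0000006f) decodes to an offset of 0 with rd = 0 (x0), which is why it jumps to itself without saving a return address.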

The Adventure of Link

Although the linking phase may seem like magic, it is largely under the developer's control through the linker script. Chapter 2 introduced an example linker script, but its purpose was only explained superficially. In this chapter, the process of linking the application will be studied more thoroughly. The linker script from chapter 2 is shown in the following listing:

OUTPUT_ARCH( "riscv" )
SECTIONS {
	. = 0x80000000;
	.text : {
		PROVIDE(_text_start = .);
		* (.text.init);
		* (.text .text.*);
		PROVIDE(_text_end = .);
	}
	PROVIDE(_global_pointer = .);
	.rodata : {
		PROVIDE(_rodata_start = .);
		*(.srodata .srodata.*) *(.rodata .rodata.*)
		PROVIDE(_rodata_end = .);
	}
	.data : {
		. = ALIGN(4096);
		PROVIDE(_data_start = .);
		*(.sdata .sdata.*) *(.data .data.*)
		PROVIDE(_data_end = .);
	}
	.bss : {
		PROVIDE(_bss_start = .);
		*(.sbss .sbss.*) *(.bss .bss.*)
		PROVIDE(_bss_end = .);
	}
	PROVIDE(_stack_start = _bss_end);
	PROVIDE(_stack_end = _stack_start + 0x8000);
}

The first statement in the linker script, OUTPUT_ARCH, sets the architecture of the target machine, in this case RISC-V.

The more important command is SECTIONS, which defines the sections of the output ELF file. As discussed in chapter 2, code and data have different memory requirements. Typically, code is stored in read-only memory, while data is stored in memory that is either read-only or writable, depending on whether the data can change. The SECTIONS declaration prescribes how the code and data will be organized in the final binary.

Within the SECTIONS block of the script, the period (.) is a special symbol that represents the location counter, which is essentially the current address in memory. By default, the location counter starts at address 0, but it can be set explicitly by assigning a value to the period. Since QEMU's virt board, which is used to run the examples, begins executing the loaded program at address 0x80000000 (the start of its RAM), this address is assigned to the location counter so that the linked binary is placed in that area of memory.
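To see how the location counter moves through the script, the following C sketch replays the example script's address arithmetic. It is a simplified model, not how ld works internally: align_up, struct layout and place_sections are hypothetical names, the output section sizes are supplied as inputs, and only the explicit . = ALIGN(4096) in the .data section is modeled (padding caused by the alignment of individual input sections is ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two), which is
 * what ALIGN(align) does to the location counter in a linker script. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical summary of the addresses assigned by the example script. */
struct layout {
    uint64_t text_start, rodata_start, data_start, bss_start, bss_end;
    uint64_t stack_start, stack_end;
};

/* Replay the example script for the given output section sizes (bytes). */
static struct layout place_sections(uint64_t text_size, uint64_t rodata_size,
                                    uint64_t data_size, uint64_t bss_size)
{
    struct layout l;
    uint64_t dot = 0x80000000;             /* . = 0x80000000;                   */

    l.text_start = dot;                    /* PROVIDE(_text_start = .);         */
    dot += text_size;
    l.rodata_start = dot;                  /* .rodata follows .text directly    */
    dot += rodata_size;
    dot = align_up(dot, 4096);             /* . = ALIGN(4096); inside .data     */
    l.data_start = dot;
    dot += data_size;
    l.bss_start = dot;                     /* .bss follows .data directly       */
    dot += bss_size;
    l.bss_end = dot;
    l.stack_start = l.bss_end;             /* PROVIDE(_stack_start = _bss_end); */
    l.stack_end = l.stack_start + 0x8000;  /* 32 KiB of stack above .bss        */
    return l;
}
```

With the first program's 0x30 bytes of code and no data, place_sections(0x30, 0, 0, 0) gives a data_start of 0x80001000 and a stack_end of 0x80009000, which matches the _stack_end address in the linked listing earlier in this chapter.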

Code

Once the location counter is initialized, the next statement in the linker script is the declaration of the .text section. This is where all the executable code is expected to be found. The section is declared by specifying its name (.text) followed by a colon (:) and a pair of braces that enclose the statements specific to the current section.

The first statement in the .text section block is a PROVIDE command, which defines a linker symbol. In this case, it defines a global symbol named _text_start whose value is the current location counter, so it can be used to refer to the start address of the .text section in memory. PROVIDE only defines the symbol if it is referenced but not defined by any of the object files being linked.

The next statement uses wildcards to collect the .text.init sections from all the object files given to the linker. The statement after it collects the .text sections, and any section whose name begins with .text., from all of those object files. Because the .text.init rule comes first, that code is placed before all other code, even though .text.init also matches the .text.* pattern.
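The effect of rule order can be modeled in a few lines of C. This is a simplified sketch rather than how ld is implemented: it covers only the two input rules of the example .text section, and struct input_section, text_rule and order_text are hypothetical names. Each input section is assigned to the first rule that matches it, and sections claimed by the same rule keep the order in which they were given to the linker:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An input section: the object file it came from and its name. */
struct input_section {
    const char *file;
    const char *name;
};

/* Index of the first rule in the example .text output section that claims
 * a section: 0 for *(.text.init), 1 for *(.text .text.*), -1 for neither.
 * Because the first match wins, .text.init is never claimed by rule 1. */
static int text_rule(const char *name)
{
    if (strcmp(name, ".text.init") == 0)
        return 0;
    if (strcmp(name, ".text") == 0 || strncmp(name, ".text.", 6) == 0)
        return 1;
    return -1;
}

/* Store pointers to the .text input sections in out[] in output order:
 * rule by rule, and within a rule in the order given to the linker.
 * Returns the number of sections stored. */
static size_t order_text(const struct input_section *in, size_t n,
                         const struct input_section **out)
{
    size_t count = 0;
    assert(in != NULL || n == 0);
    for (int rule = 0; rule <= 1; rule++)
        for (size_t i = 0; i < n; i++)
            if (text_rule(in[i].name) == rule)
                out[count++] = &in[i];
    return count;
}
```

For instance, if add.o's .text is listed before a .text.init section from main.o, order_text still places main.o's section first; if both files only contain .text, command-line order decides.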

Note that the order in which object files are provided to the linker matters. If add.o, which contains the sum function, is passed to the linker before main.o, the resulting binary will have the following layout:

$ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o
$ riscv64-unknown-elf-objdump -d sum.elf 

sum.elf:     file format elf64-littleriscv


Disassembly of section .text:

0000000080000000 <sum>:
    80000000:	fe010113          	addi	sp,sp,-32
    80000004:	00113c23          	sd	ra,24(sp)
    80000008:	00b50533          	add	a0,a0,a1
    8000000c:	01813083          	ld	ra,24(sp)
    80000010:	02010113          	addi	sp,sp,32
    80000014:	00008067          	ret

0000000080000018 <_start>:
    80000018:	00500513          	li	a0,5
    8000001c:	00400593          	li	a1,4
    80000020:	00009117          	auipc	sp,0x9
    80000024:	fe010113          	addi	sp,sp,-32 # 80009000 <_stack_end>
    80000028:	fd9ff0ef          	jal	ra,80000000 <sum>

000000008000002c <stop>:
    8000002c:	0000006f          	j	8000002c <stop>

This is clearly not the desired result, since execution would begin at sum: the first instruction creates a stack frame relative to a stack pointer that has not been initialized, the second saves an undefined return address, and the add then operates on un-initialized argument registers. To ensure that the initialization code comes before the function implementation, the main.s file must be changed to place its code in the .text.init section:

 1:         .section ".text.init"
 2:         .align 2
 3:         .global _start
 4:         .global _stack_end
 5: _start:
 6:         li      a0, 5
 7:         li      a1, 4
 8:         la      sp,_stack_end
 9:         call    sum
10: stop:   j       stop

The main.s file uses the .section assembler directive on line 1 to declare that all code that follows belongs in the .text.init section. Because the linker script collects .text.init before any other code, the initialization code will always come first. The linked program now has the proper layout regardless of the order in which the object files are provided to the linker:

$ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o
$ riscv64-unknown-elf-objdump -d sum.elf 

sum.elf:     file format elf64-littleriscv


Disassembly of section .text:

0000000080000000 <_start>:
    80000000:	00500513          	li	a0,5
    80000004:	00400593          	li	a1,4
    80000008:	00009117          	auipc	sp,0x9
    8000000c:	ff810113          	addi	sp,sp,-8 # 80009000 <_stack_end>
    80000010:	008000ef          	jal	ra,80000018 <sum>

0000000080000014 <stop>:
    80000014:	0000006f          	j	80000014 <stop>

0000000080000018 <sum>:
    80000018:	fe010113          	addi	sp,sp,-32
    8000001c:	00113c23          	sd	ra,24(sp)
    80000020:	00b50533          	add	a0,a0,a1
    80000024:	01813083          	ld	ra,24(sp)
    80000028:	02010113          	addi	sp,sp,32
    8000002c:	00008067          	ret

The _start code is located at address 0x80000000 as expected, even though the object file for the sum function was provided to the linker first. Now that the code is properly organized, let's look at the data.

Data

The linker script defines three data sections: .rodata, .data, and .bss. It may seem odd to have four sections (counting .text) when a program consists of only two types of information: code and data. The reason is that global data can be divided into three categories:

  • Global Read-only Data
  • Global Initialized Data
  • Global Un-initialized Data

Local data is not considered here because it is created at run time and stored either on the stack or in a buffer allocated from the heap. For now, the focus will be on global data. The differences between these types of data can be illustrated using a simple C program:

 1: const int operand1 = 4 ;
 2: int operand2 = 5 ;
 3: int result ;
 4: 
 5: int
 6: sum( int op1, int op2 )
 7: {
 8:         return op1 + op2 ;
 9: }
10: 
11: int
12: main( int argc, char** argv )
13: {
14:         result = sum( operand1, operand2 ) ;
15:         while ( 1 ) ;
16: }

Global Read-only Data

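The operand1 variable is declared on line 1. It is a read-only value initialized to the integer 4. The compiler stores this data in the .srodata section of the object file (the s prefix marks small data, which the RISC-V toolchain groups together so that it can be addressed relative to the global pointer register, gp):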

$ riscv64-unknown-elf-objdump -s -j.srodata addc.o 

addc.o:     file format elf64-littleriscv

Contents of section .srodata:
 0000 04000000                             ....

Assembler directives can also be used to populate the .rodata section in an assembly program. The following fragment can be added to the end of the main.s program to declare the operand1 constant in the .rodata section:

1:         .section ".rodata"
2: operand1:       .word   4

The constant operand1 is defined on line 2 using the .word assembler directive, which stores the given value as a 32-bit quantity at the current location. This directive accepts any number of values as a comma-separated list. For example, the following directive stores the 32-bit values 1, 2, and 3 in successive memory words:

.word 1, 2, 3

If this program is linked using the example linker script, this data will be included in the .rodata section. The script instructs the linker to aggregate all .srodata and .rodata input sections, as well as sections whose names begin with .srodata. or .rodata., into a single .rodata output section.
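The objdump output earlier in this section showed the constant 4 as the bytes 04000000. That is because RISC-V is little-endian: the least significant byte of each word is stored at the lowest address. The following C sketch (emit_words is a hypothetical helper, not part of the toolchain) models how .word lays out its operands, so that .word 1, 2, 3 occupies 12 bytes beginning with 01 00 00 00:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append each value as a 4-byte little-endian word, mirroring how .word
 * lays out its operands on a little-endian target such as RISC-V.
 * Returns the number of bytes written to out (4 per value). */
static size_t emit_words(const uint32_t *values, size_t count, uint8_t *out)
{
    size_t pos = 0;
    assert(count == 0 || (values != NULL && out != NULL));
    for (size_t i = 0; i < count; i++) {
        uint32_t v = values[i];
        out[pos++] = (uint8_t)(v & 0xff);          /* least significant byte first */
        out[pos++] = (uint8_t)((v >> 8) & 0xff);
        out[pos++] = (uint8_t)((v >> 16) & 0xff);
        out[pos++] = (uint8_t)((v >> 24) & 0xff);  /* most significant byte last */
    }
    return pos;
}
```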

Global Initialized Data

The operand2 variable is declared on line 2. This is a mutable variable initialized with the integer value 5. The compiler will store this data in the .sdata section of the object file.

$ riscv64-unknown-elf-objdump -s -j.sdata addc.o 

addc.o:     file format elf64-littleriscv

Contents of section .sdata:
 0000 05000000                             ....

The linker.lds script instructs the linker to aggregate all data in the .sdata and .data sections, as well as in all sections whose names begin with .sdata. or .data., into a single .data section.

The operand2 global value can similarly be defined using assembler directives. The relevant assembly code is illustrated in the following listing:

        .data
operand2:       .word   5

The .data assembler directive ensures that all following instructions or directives will affect the .data section.

Global Un-initialized Data

The result variable is declared on line 3 of the C program. This variable is said to be un-initialized because it is not assigned a value at build time. The C language guarantees that un-initialized global variables are initialized to zero, so the system is responsible for zeroing this data before handing control over to the C program. This is easier to do if all un-initialized data is aggregated into a single section, in this case the .bss section, which the loader can then simply zero out in its entirety. Because its contents are known to be zero, the .bss section also takes up no space in the ELF file itself; only its size is recorded:

$ riscv64-unknown-elf-objdump -D -j.bss sumc.o 

sumc.o:     file format elf64-littleriscv


Disassembly of section .bss:

0000000000000000 <result>:
   0:	0000                	unimp
	...

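Note that objdump displays the zero bytes as unimp only because -D disassembles every section as if it contained code; .bss holds data. The linker script instructs the linker to aggregate all data in the .sbss and .bss sections, as well as in sections whose names begin with .sbss. or .bss., into a single .bss section.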
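The input-section rules of the example linker script can be summarized as a table of wildcard patterns, tried in the order they appear in the script. The C sketch below approximates ld's wildcard matching with the POSIX fnmatch function; enum out_section, the rules table and output_section are hypothetical names for illustration. A section that matches no rule, such as .comment, is reported as unclaimed; ld would place such an orphan section using its own heuristics rather than discard it.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Output sections defined by the example linker script. */
enum out_section { OUT_NONE, OUT_TEXT, OUT_RODATA, OUT_DATA, OUT_BSS };

/* Input-section patterns from the example script, in script order. */
static const struct {
    const char *pattern;
    enum out_section output;
} rules[] = {
    { ".text.init", OUT_TEXT },
    { ".text", OUT_TEXT },      { ".text.*", OUT_TEXT },
    { ".srodata", OUT_RODATA }, { ".srodata.*", OUT_RODATA },
    { ".rodata", OUT_RODATA },  { ".rodata.*", OUT_RODATA },
    { ".sdata", OUT_DATA },     { ".sdata.*", OUT_DATA },
    { ".data", OUT_DATA },      { ".data.*", OUT_DATA },
    { ".sbss", OUT_BSS },       { ".sbss.*", OUT_BSS },
    { ".bss", OUT_BSS },        { ".bss.*", OUT_BSS },
};

/* Return the output section that collects the named input section, using
 * the first matching pattern, or OUT_NONE if no rule claims it. */
static enum out_section output_section(const char *input_name)
{
    assert(input_name != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (fnmatch(rules[i].pattern, input_name, 0) == 0)
            return rules[i].output;
    return OUT_NONE;
}
```

With this table, the compiler's .srodata, .sdata and .sbss sections from the C example map to .rodata, .data and .bss respectively, just as the assembler's .rodata, .data and .bss sections do.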

The result global variable can also be defined using assembler directives. The relevant assembly code is illustrated in the following listing:

        .bss
result:         .word   0

The .bss assembler directive ensures that the result variable is placed in the .bss section of the binary file. Note that result is given the value zero. Even though this is un-initialized data, a value must be specified, because a .word directive with no values reserves no memory. Zero is the natural choice: the .bss section is zero-filled, which satisfies the C language guarantee.
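The zero-filling of .bss that the loader or startup code must perform can be sketched in C. This is a minimal illustration, assuming a C environment in which the stack pointer has already been set up; zero_bss is a hypothetical name, and its bounds would correspond to the _bss_start and _bss_end symbols that the example linker script PROVIDEs:

```c
#include <assert.h>
#include <stddef.h>

/* Clear every byte from start up to (but not including) end. Startup code
 * typically does this for the range [_bss_start, _bss_end) before any C
 * code that relies on zero-initialized globals runs. */
static void zero_bss(unsigned char *start, unsigned char *end)
{
    assert(start != NULL && end != NULL && start <= end);
    for (unsigned char *p = start; p < end; p++)
        *p = 0;
}
```

On the target, the bounds could come from the linker script by declaring extern unsigned char _bss_start[], _bss_end[]; and calling zero_bss(_bss_start, _bss_end).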

The Final Boss

The assembly program that adds two integer values can be updated to read its operands from memory. The following listing shows the updated main.s assembly code:

 1:         .section ".text.init"
 2:         .align 2
 3:         .global _start
 4:         .global _stack_end
 5: _start:
 6:         lw      a0, operand2
 7:         lw      a1, operand1
 8:         la      sp,_stack_end
 9:         call    sum
10:         la      t1, result
11:         sw      a0, 0(t1)
12: stop:   j       stop
13:         .section ".rodata"
14: operand1:       .word   4
15:         .data
16: operand2:       .word   5
17:         .bss
18: result:         .word   0

The value of operand2 is loaded into the function argument register a0 on line 6, and the value of operand1 is loaded into the function argument register a1 on line 7. The sum function is called with those operands, and the result (returned in a0) is saved to the memory word allocated for result in the .bss section on line 11. Before the result can be saved, the address of the result variable must be loaded into a register; this is done with the la instruction on line 10.

This program can be assembled and linked with the add.s program as follows:

$ riscv64-unknown-elf-as -o main.o main.s
$ riscv64-unknown-elf-as -o add.o add.s
$ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o

The address of the result variable can be obtained by inspecting the resulting ELF file:

$ riscv64-unknown-elf-objdump -D -j.bss sum.elf 

sum.elf:     file format elf64-littleriscv


Disassembly of section .bss:

0000000080001004 <result>:
    80001004:	0000                	unimp
	...

The program can now be tested using the QEMU emulator. Since the result variable is located at address 0x80001004, we can inspect this memory location to ensure that it contains the expected result:

$ qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel sum.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) xp /1dw 0x80001004
0000000080001004:          9
(qemu) 

The memory location that corresponds with the result variable does in fact contain the value 9 (which is the sum of 4 and 5). Therefore the program is behaving as expected.

Conclusion

In this chapter, the linking process was studied in more detail. Describing how the linker turns the relative offsets and placeholders in object files into absolute memory addresses has hopefully demystified how object files are combined into a final binary. The primary examples were setting the top of the stack and setting up the call to the sum function.

The linker script illustrates how code and data can be organized more intelligently in a binary file. Its flexibility made it possible to enhance the original add.s program from chapter 1: the operands for the sum are now read from memory, each from a different data section. Moreover, additional assembler directives were introduced in this chapter to allow better control over how code and data are placed in the linked binary file. In particular:

.text
Specify that what follows goes into the .text section.
.data
Specify that what follows goes into the .data section.
.bss
Specify that what follows goes into the .bss section.
.section
Set the section name explicitly for the code and data that follows.
.word
Store the specified 32-bit quantities into successive memory words.

Other useful assembler directives that were not covered in this chapter include:

.byte
Store the specified 8-bit quantities into successive bytes of memory.
.half
Store the specified 16-bit quantities into successive memory half-words.
.dword
Store the specified 64-bit quantities into successive memory double-words.
.string
Store a string in memory and null-terminate it.

Following chapters will start to look at some of the standard extensions of the RISC-V ISA. So far only RV64I instructions have been used, whereas the ISA defines extensions for integer multiplication and division (RVM), atomic operations (RVA), single- and double-precision floating-point operations (RVF and RVD), and compressed instructions (RVC). These extensions will be useful for building more complex programs.
