The previous chapter of this tutorial went over the steps required to set up a RISC-V development environment and create a program that runs on a bare metal VirtIO board using QEMU. Even though the example program (which calculates the sum of two integers) was written in RISC-V assembly, no prior knowledge was required to follow along. This chapter dives into the details of the RISC-V assembly language and expands on what exactly is happening at each step of the development. The topics covered include an overview of the RISC-V architecture, its assembly instructions, pseudo-instructions, and directives.
The following listing illustrates the assembly code of the add.s program from the previous chapter:
    1: .text
    2: .global _start
    3: _start:
    4:     li a0, 5
    5:     li a1, 4
    6:     add a0, a0, a1
    7: stop: j stop
The code was changed a little to use a different set of registers. When assembled, this program results in an object file, which can then be linked to create the file that will be loaded onto the board. In this scenario only one object file is used; more complex programs may require more than one. The linker's job is to put all of these files together into a single executable program.
This add.s program is composed of one instruction (6), three pseudo-instructions (4, 5, and 7), and two directives (1, 2). Moreover, the instructions and pseudo-instructions have operands consisting of either registers or immediates. Understanding each of these entities will help when creating more complex programs in RISC-V assembly.
Registers
RISC-V systems will have a base set of 32 registers x0-x31. The x0 register is read-only with a value fixed to zero. The rest will have varying content. The application binary interface (ABI) prescribes conventions for the name and usage of the various registers. These are listed in the following table:
Register(s) | ABI Name(s) | Description | Saved by |
---|---|---|---|
x0 | zero | Hard-wired zero | N/A |
x1 | ra | Function return address | Caller |
x2 | sp | Stack pointer | Callee |
x3 | gp | Global pointer | N/A |
x4 | tp | Thread pointer | N/A |
x5 | t0 | Temporary/alternate link register | Caller |
x6-x7 | t1-t2 | Temporary values | Caller |
x8 | s0/fp | Saved register/Frame pointer | Callee |
x9 | s1 | Saved register | Callee |
x10-x11 | a0-a1 | Function arguments/Return values | Caller |
x12-x17 | a2-a7 | Function arguments | Caller |
x18-x27 | s2-s11 | Saved registers | Callee |
x28-x31 | t3-t6 | Temporary values | Caller |
Registers can be referred to by their ABI names or their actual names in assembly programs; the two are interchangeable.
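As a quick way to check the naming scheme, the table above can be turned into a small lookup. This is only an illustrative Python sketch; the names `ABI_NAMES` and `reg_number` are made up for this example and are not part of any toolchain.

```python
# ABI names in register order x0..x31, following the table above.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp"]      # x0-x4
    + ["t0", "t1", "t2"]                  # x5-x7
    + ["s0", "s1"]                        # x8-x9
    + [f"a{i}" for i in range(8)]         # x10-x17
    + [f"s{i}" for i in range(2, 12)]     # x18-x27
    + [f"t{i}" for i in range(3, 7)]      # x28-x31
)


def reg_number(name: str) -> int:
    """Return the register index (0-31) for an ABI name or an xN name."""
    if name == "fp":                      # fp is an alias for s0 (x8)
        return 8
    if name.startswith("x") and name[1:].isdigit():
        number = int(name[1:])
        if 0 <= number < 32:
            return number
        raise ValueError(f"no such register: {name}")
    return ABI_NAMES.index(name)          # raises ValueError if unknown


print(reg_number("a0"), reg_number("x10"))   # both refer to the same register
```

The register numbers returned here are the values that end up in the 5-bit register fields of a machine instruction.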
When a function is invoked, it may modify the values of some of these registers. For this reason it is advisable to save the contents of those registers in memory in order to be able to restore them when the function completes. The ABI convention prescribes which party in a function call (the caller or the callee) is responsible for saving these values. This convention is described in the "Saved by" column of the table.
If a register is to be saved by the caller, its value should be stored in a stack frame allocated for that purpose prior to calling the function. This ensures that the value can be restored when the function returns. In general, it is a good idea to save all of the caller-saved registers that are still needed if the caller does not know which of them may be modified by the callee.
Registers to be saved by the callee only need to be saved to memory if the function uses those registers. Functions must not leave a trace: the state of the machine must be the same as it was prior to the function being invoked (with the exception of the desired function result).
A function implementation defined in RISC-V assembly should use the following prologue before doing any of its work:
    function_label:
        addi sp, sp, -framesize    # The stack grows downward
        sd   ra, framesize-8(sp)   # Save the return address
        # Save registers owned by the callee to memory as needed.
This will ensure that the function can return to the point where it was called, and that any register state will be saved.
Before a function returns, the saved register values must be restored. This is achieved by the following epilogue, which should end the function:
    # Restore registers from the stack if needed
    ld   ra, framesize-8(sp)   # Restore the return address register
    addi sp, sp, framesize     # Pop the stack
    ret                        # Return to the caller
This will restore the saved registers, including the return address, and de-allocate the stack frame that was used to save this information.
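The value of framesize depends on how many registers need saving. The sketch below, in Python, assumes 8-byte registers (RV64) and the 16-byte stack alignment required by the standard RISC-V calling convention; `frame_size` and `save_offsets` are hypothetical helper names used only for illustration.

```python
def frame_size(saved_registers: int, reg_bytes: int = 8, align: int = 16) -> int:
    """Smallest frame, rounded up to the alignment, that holds the registers."""
    needed = saved_registers * reg_bytes
    return (needed + align - 1) // align * align


def save_offsets(names, reg_bytes: int = 8, align: int = 16):
    """Return (framesize, {register: offset from sp}), filling from the top down."""
    size = frame_size(len(names), reg_bytes, align)
    offsets = {name: size - reg_bytes * (i + 1) for i, name in enumerate(names)}
    return size, offsets


size, offsets = save_offsets(["ra", "s0", "s1"])
print(size, offsets)   # ra lands at framesize-8, as in the prologue above
```

A frame may be larger than this minimum as long as it stays a multiple of the alignment.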
The add.s program can be enhanced to use the function prologue and epilogue to define a function that calculates the sum of its arguments in registers a0 and a1. This new program is illustrated in the listing that follows:
    .text
    .align 2
    .global sum
    sum:
        addi sp, sp, -32    # Stack frames must be 16-byte aligned
        sd   ra, 24(sp)     # Save the return address
        add  a0, a0, a1     # Add the function operands
        ld   ra, 24(sp)     # Restore the return address
        addi sp, sp, 32     # De-allocate the stack frame
        ret
The sum function can be called by name from a different assembler program. Create a main.s program with the following content:
    .text
    .align 2
    .global _start
    _start:
        li a0, 5
        li a1, 4
        call sum
    stop: j stop
This will load the values 5 and 4 into the registers used for arguments to the sum function and call it. This can be assembled and linked as follows:
    $ riscv64-unknown-elf-as -o add.o add.s
    $ riscv64-unknown-elf-as -o main.o main.s
    $ riscv64-unknown-elf-ld -Ttext=0x80000000 -o sum.elf main.o add.o
If this program is assembled, linked, and run in QEMU, it will call the sum function to calculate the sum of the operands. This can be verified by inspecting the value of register a0 which should be 9.
NOTE: The order in which the object files are supplied to the linker is important. If the add.o file is supplied first, the program will not run as intended because the sum function, rather than _start, will be located at the reset address.
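The effect of link order can be modelled with a few lines of Python. This is a simplified sketch that places each object's text one after another and ignores any alignment padding a real linker might add; the function name `place_text` and the sizes passed to it are illustrative assumptions, not values reported by the toolchain.

```python
def place_text(objects, base=0x80000000):
    """Assign addresses to text sections in command-line order.

    objects is a list of (entry_symbol, size_in_bytes) tuples.
    """
    addresses = {}
    addr = base
    for symbol, size in objects:
        addresses[symbol] = addr
        addr += size
    return addresses


# Illustrative sizes only: 0x18 bytes for each object's text.
good = place_text([("_start", 0x18), ("sum", 0x18)])
bad = place_text([("sum", 0x18), ("_start", 0x18)])

print(hex(good["_start"]))  # _start sits at the base address
print(hex(bad["sum"]))      # with the order reversed, sum sits there instead
```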
Instructions
Instructions are mnemonics that map directly to machine codes. For example, the add instruction in the sum function corresponds to the op-code 0x33 (or b0110011).
When the add instruction is combined with its operands, the result is a single machine instruction. In RISC-V, all machine instructions are 32 bits long (unless you're using the compressed extension, but for now we'll just deal with 32-bit operations).
The add instruction is what's known as an R-type instruction because its operands are all registers. This type of instruction has the following form:
BITS | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
---|---|---|---|---|---|---|
R-Type | func7 | rs2 | rs1 | func3 | rd | opcode |
In this table, the operation's function is a combination of func7 and func3. For the add instruction, these are b0000000 and b000. The rs2 and rs1 fields are the source registers whose values will be added. The rd field is the destination register for the result. Notice that the register fields are 5 bits wide. This allows the instruction to reference any of the 32 base registers (i.e. 2^5 registers). The add instruction from the previous example is constructed as follows:
- func7: b0000000
- rs2: b01011 (for x11 which is a1)
- rs1: b01010 (for x10 which is a0)
- func3: b000
- rd: b01010 (for x10)
- opcode: b0110011
Putting it all together, we get b00000000101101010000010100110011, or 0x00B50533. We can confirm this by disassembling the object file that was created:
    $ riscv64-unknown-elf-objdump -d add.o

    add.o:     file format elf64-littleriscv

    Disassembly of section .text:

    0000000000000000 <sum>:
       0: fe010113  addi sp,sp,-32
       4: 00113c23  sd ra,24(sp)
       8: 00b50533  add a0,a0,a1
       c: 01813083  ld ra,24(sp)
      10: 02010113  addi sp,sp,32
      14: 00008067  ret
The add instruction in the sum function is at offset 8 of the object file. The machine instruction there is 00b50533, which is the value that we calculated for the instruction.
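The same calculation can be done mechanically. The following Python sketch packs the R-Type fields into a 32-bit word using the bit positions from the table; `encode_r` and `decode_r` are hypothetical helper names chosen for this example.

```python
def encode_r(func7, rs2, rs1, func3, rd, opcode):
    """Pack R-Type fields into a 32-bit instruction word."""
    return (
        (func7 << 25) | (rs2 << 20) | (rs1 << 15)
        | (func3 << 12) | (rd << 7) | opcode
    )


def decode_r(word):
    """Split a 32-bit word back into its R-Type fields."""
    return {
        "func7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "func3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


# add a0, a0, a1  ->  rd = x10, rs1 = x10, rs2 = x11
word = encode_r(0b0000000, 11, 10, 0b000, 10, 0b0110011)
print(f"{word:032b}")
print(f"0x{word:08x}")
print(decode_r(word))
```

Decoding the word recovers the register numbers 10 and 11, which is how a disassembler arrives back at a0 and a1.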
In addition to R-Type instructions, there are also I-Type instructions that operate on immediates (literal values) and include loads from memory, S-Type for storing to memory, U-Type for loading a 20-bit upper immediate into a register (e.g. lui and auipc), B-Type for branching, and J-Type for jumps (e.g. function calls). The following table describes the layout of each of these instruction types:
BITS | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
---|---|---|---|---|---|---|
R-Type | func7 | rs2 | rs1 | func3 | rd | opcode |
I-Type | imm[11:5] | imm[4:0] | rs1 | func3 | rd | opcode |
S-Type | imm[11:5] | rs2 | rs1 | func3 | imm[4:0] | opcode |
U-Type | imm[31:25] | imm[24:20] | imm[19:15] | imm[14:12] | rd | opcode |
B-Type | imm[12,10:5] | rs2 | rs1 | func3 | imm[4:1,11] | opcode |
J-Type | imm[20,10:5] | imm[4:1,11] | imm[19:15] | imm[14:12] | rd | opcode |
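To see how the immediate bits are scattered in practice, here is a Python sketch of encoders for the I-Type, S-Type, and J-Type layouts in the table. The function names are invented for this example; the expected words in the comments are the ones that appear in the disassembly listings in this chapter.

```python
def encode_i(imm, rs1, func3, rd, opcode):
    """I-Type: imm[11:0] | rs1 | func3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (func3 << 12) | (rd << 7) | opcode


def encode_s(imm, rs2, rs1, func3, opcode):
    """S-Type: imm[11:5] | rs2 | rs1 | func3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return (
        ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15)
        | (func3 << 12) | ((imm & 0x1F) << 7) | opcode
    )


def encode_j(imm, rd, opcode):
    """J-Type: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode."""
    imm &= 0x1FFFFF
    return (
        (((imm >> 20) & 0x1) << 31) | (((imm >> 1) & 0x3FF) << 21)
        | (((imm >> 11) & 0x1) << 20) | (((imm >> 12) & 0xFF) << 12)
        | (rd << 7) | opcode
    )


print(f"{encode_i(5, 0, 0b000, 10, 0x13):08x}")    # addi a0, x0, 5   (li a0,5)
print(f"{encode_i(-32, 2, 0b000, 2, 0x13):08x}")   # addi sp, sp, -32
print(f"{encode_s(24, 1, 2, 0b011, 0x23):08x}")    # sd ra, 24(sp)
print(f"{encode_j(8, 1, 0x6F):08x}")               # jal ra, +8
```

Negative immediates are masked to their field width, which yields the two's complement bit pattern stored in the instruction.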
Pseudo-Instructions
Unlike instructions, pseudo-instructions do not map directly to op-codes. Typically these represent idioms to make the programmer's life a little easier.
For example, the li in the main program is a pseudo-instruction. As explained in the previous chapter, for small immediates such as these it maps to an addi I-Type instruction, which adds the immediate value to the value of x0 and stores the result in the destination register. Pseudo-instructions provide convenient mnemonics for programming without adding additional op-codes.
Pseudo-instructions may also translate to more than one machine instruction. For example, the call pseudo-instruction nominally expands to a pair of instructions, auipc and jalr; when the target is close enough, the linker can relax this pair into a single jal.
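A single addi only works while the constant fits the 12-bit signed immediate of an I-Type instruction. The Python sketch below checks that range and, for larger 32-bit values, shows one common way of splitting a constant into an upper 20-bit part (as used by lui) and a signed lower 12-bit part (as used by addi). This is an illustration under those assumptions, with made-up helper names; the sequence an assembler actually emits may differ.

```python
def fits_imm12(value: int) -> bool:
    """True if value fits a 12-bit signed immediate (one addi is enough)."""
    return -2048 <= value <= 2047


def split_hi_lo(value: int):
    """Split a 32-bit value into (upper 20 bits, signed lower 12 bits).

    The upper part is rounded so that (hi << 12) + lo == value modulo 2**32,
    which compensates for the sign extension of the lower part.
    """
    hi = ((value + 0x800) >> 12) & 0xFFFFF
    lo = ((value & 0xFFF) ^ 0x800) - 0x800
    return hi, lo


for constant in (5, 4, 4096, 0x12345678, 0x12345FFF):
    if fits_imm12(constant):
        print(f"{constant:#x}: addi rd, x0, {constant}")
    else:
        hi, lo = split_hi_lo(constant)
        print(f"{constant:#x}: upper={hi:#x}, lower={lo}")
```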
Directives
Directives are commands for the assembler rather than instructions that it will translate into machine code. Directives can be used to tell the assembler where to place code and data in the resulting object file, or to set up the memory of the target system. The previous example used assembler directives to export global symbols, to set the alignment for instructions, and to ensure that the code is assembled into the ".text" section of the object file.
To understand the purpose of the assembler directives, it is important to understand how assembled code is linked together. The assembler produces object files that are combined to produce an Executable and Linkable Format (ELF) file. This file will be segmented into different sections:
- .text: CPU instructions (the executable code).
- .rodata: Read-only data.
- .data: Global, mutable, initialized data.
- .bss: Global, mutable, un-initialized data.
Up to now, only the text section has been used. The location where the code is loaded was specified using the "-Ttext" option when invoking the linker. If multiple object files are passed to the linker, their text sections are merged into a single contiguous section.
Code and data have different run-time requirements. Code is generally read-only, whereas data may require read-write permissions. It is therefore advantageous for code and data not to be interleaved. To ensure this, the locations of the text and data sections of the program should not overlap.
To avoid having to define the position of each section at the command line, the linker allows the memory layout to be defined using a linker script:
    OUTPUT_ARCH( "riscv" )
    SECTIONS {
      . = 0x80000000;
      .text : {
        PROVIDE(_text_start = .);
        *(.text.init)
        main.o (.text)
        *(.text .text.*)
        PROVIDE(_text_end = .);
      }
      PROVIDE(_global_pointer = .);
      .rodata : {
        PROVIDE(_rodata_start = .);
        *(.rodata .rodata.*)
        PROVIDE(_rodata_end = .);
      }
      .data : {
        . = ALIGN(4096);
        PROVIDE(_data_start = .);
        *(.sdata .sdata.*) *(.data .data.*)
        PROVIDE(_data_end = .);
      }
      .bss : {
        PROVIDE(_bss_start = .);
        *(.sbss .sbss.*) *(.bss .bss.*)
        PROVIDE(_bss_end = .);
      }
      PROVIDE(_stack_start = _bss_end);
      PROVIDE(_stack_end = _stack_start + 0x8000);
    }
The SECTIONS keyword is used to specify how the various sections are laid out in the file. In the linker script shown previously, the .text, .rodata, .data, and .bss sections are defined.
The .text section will include all code that follows a .section .text.init or .text assembler directive. The code in sum.s and main.s will therefore both be included in this section. On line 7 of the linker script, the text section of the main.o object file is included explicitly. This ensures that the main program appears in the linked program before the sum function does (which is included by the wildcard on the next line).
The PROVIDE keyword is used to define a symbol at the address of the definition. The start and end of each of the sections will be provided by the linker. Moreover, the start and end of the stack memory area can be declared in this way. In a later chapter, these symbols will be used to set up the stack pointer.
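To get a feel for the addresses these symbols end up with, the walk through the script can be imitated in Python. This is a simplified model that just advances a location counter by each section's size and ignores any extra alignment padding the real linker may insert; `layout` and `align_up` are hypothetical names, and the 0x30-byte text size is the size of the code in the disassembly shown later in this chapter.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment


def layout(text_size, rodata_size=0, data_size=0, bss_size=0, base=0x80000000):
    """Mimic the location counter ('.') as it moves through the script."""
    sym = {}
    dot = base
    sym["_text_start"] = dot
    dot += text_size
    sym["_text_end"] = dot
    sym["_global_pointer"] = dot
    sym["_rodata_start"] = dot
    dot += rodata_size
    sym["_rodata_end"] = dot
    dot = align_up(dot, 4096)             # . = ALIGN(4096) in .data
    sym["_data_start"] = dot
    dot += data_size
    sym["_data_end"] = dot
    sym["_bss_start"] = dot
    dot += bss_size
    sym["_bss_end"] = dot
    sym["_stack_start"] = sym["_bss_end"]
    sym["_stack_end"] = sym["_stack_start"] + 0x8000
    return sym


for name, addr in layout(text_size=0x30).items():
    print(f"{name:16s} {addr:#x}")
```

With only 0x30 bytes of code and no data, the model places _stack_end at 0x80009000, the same address that appears next to _stack_end in the disassembly of sum.elf.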
Putting it All Together
The program can now be assembled and linked using the following sequence of commands (the file containing the sum function is named sum.s here; it was called add.s earlier in the chapter):
    $ riscv64-unknown-elf-as -o sum.o sum.s
    $ riscv64-unknown-elf-as -o main.o main.s
    $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf main.o sum.o
This will produce an ELF file called sum.elf. By inspecting the sum.elf file, we can see that the _start symbol shows up before the sum function:
    $ riscv64-unknown-elf-objdump -d sum.elf

    sum.elf:     file format elf64-littleriscv

    Disassembly of section .text:

    0000000080000000 <_start>:
        80000000: 00500513  li a0,5
        80000004: 00400593  li a1,4
        80000008: 00009117  auipc sp,0x9
        8000000c: ff810113  addi sp,sp,-8 # 80009000 <_stack_end>
        80000010: 008000ef  jal ra,80000018 <sum>

    0000000080000014 <stop>:
        80000014: 0000006f  j 80000014 <stop>

    0000000080000018 <sum>:
        80000018: fe010113  addi sp,sp,-32
        8000001c: 00113c23  sd ra,24(sp)
        80000020: 00b50533  add a0,a0,a1
        80000024: 01813083  ld ra,24(sp)
        80000028: 02010113  addi sp,sp,32
        8000002c: 00008067  ret
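The disassembled sum.elf also shows how the call to sum was emitted. The call pseudo-instruction nominally expands to an auipc/jalr pair, but because sum is only a few bytes away the linker relaxed it to a single jal. The auipc and addi that precede it write to sp rather than ra: together they load the address of _stack_end into the stack pointer, so they are not part of the call expansion (the instruction that produces them does not appear in the main.s listing shown earlier):
    auipc sp,0x9
    addi sp,sp,-8
    jal ra,80000018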
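The addresses in this sequence can be checked with a little pc-relative arithmetic. The Python sketch below models only the address calculation of auipc, addi, and jal on a 64-bit machine; the function names are invented for this example, and the instruction addresses are taken from the disassembly above.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def auipc(pc: int, imm20: int) -> int:
    """auipc rd, imm20: rd = pc + (imm20 << 12)."""
    return (pc + (sign_extend(imm20, 20) << 12)) & MASK64


def addi(src: int, imm12: int) -> int:
    """addi rd, rs1, imm12: rd = rs1 + sign-extended imm12."""
    return (src + sign_extend(imm12, 12)) & MASK64


def jal_target(pc: int, offset: int) -> int:
    """jal rd, offset: jump to pc + offset."""
    return (pc + offset) & MASK64


sp = addi(auipc(0x80000008, 0x9), -8)    # auipc sp,0x9 ; addi sp,sp,-8
target = jal_target(0x80000010, 8)       # jal ra, +8 (word 0x008000ef)

print(hex(sp))       # address annotated as <_stack_end> by objdump
print(hex(target))   # address of <sum>
```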
This program can be run in QEMU just as before and the result should be the same as previous runs.
Conclusion
This chapter of the Bare Metal RISC-V tutorial covered the assembly language in a little more detail. Assembly programs are made up of directives, pseudo-instructions, and instructions. Directives provide guidance to the assembler on how to organize the assembled code. Pseudo-instructions provide useful mnemonics that are mapped to one or more primitive assembler instructions. Instructions are translated into binary machine instructions which direct the execution flow of the processor. In future chapters, this information will be utilised to make the RISC-V processors do more interesting things.