RISC-V Bare Metal Programming Chapter 2: OpCodes Assemble!
The previous chapter of this tutorial went over the steps required to set up a RISC-V development environment and create a program that runs on a bare metal VirtIO board using QEMU. Even though the example program (which calculates the sum of two integers) was written in RISC-V assembly, no prior knowledge was required to follow along. This chapter dives into the details of RISC-V assembly language and expands on what exactly is happening at each step of development. The topics covered include an overview of the RISC-V architecture, its assembly instructions, pseudo-instructions, and directives.
The following listing illustrates the assembly code of the add.s program from the previous chapter:
1: .text
2: .global _start
3: _start:
4: li a0, 5
5: li a1, 4
6: add a0, a0, a1
7: stop: j stop
The code was changed a little to use a different set of registers. When assembled, this program results in an object file, which can then be linked to create the file that will be loaded onto the board. In this scenario only one object file is used; more complex programs may require more than one. The linker’s job is to put all of these files together into a single executable program.
This add.s program is composed of one instruction (line 6), three pseudo-instructions (lines 4, 5, and 7), and two directives (lines 1 and 2). Moreover, the instructions and pseudo-instructions have operands that are either registers or immediates. Understanding each of these entities will help when creating more complex programs in RISC-V assembly.
RISC-V systems have a base set of 32 registers, x0-x31. The x0 register is read-only with a value fixed to zero. The rest will have varying content. The application binary interface (ABI) prescribes conventions for the name and usage of the various registers. These are listed in the following table:
Register(s) | ABI Name(s) | Description | Saved by |
x0 | zero | Hard-wired zero | N/A |
x1 | ra | Function return address | Caller |
x2 | sp | Stack pointer | Callee |
x3 | gp | Global pointer | N/A |
x4 | tp | Thread pointer | N/A |
x5 | t0 | Temporary/alternate link register | Caller |
x6-x7 | t1-t2 | Temporary values | Caller |
x8 | s0/fp | Saved register/Frame pointer | Callee |
x9 | s1 | Saved register | Callee |
x10-x11 | a0-a1 | Function arguments/Return values | Caller |
x12-x17 | a2-a7 | Function arguments | Caller |
x18-x27 | s2-s11 | Saved registers | Callee |
x28-x31 | t3-t6 | Temporary values | Caller |
Registers can be referred to by their ABI names or their actual names in assembly programs; the two are interchangeable.
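To make the interchangeability concrete, here is a minimal Python sketch that rebuilds the mapping in the table above and resolves either kind of name to a register number. The helper names `build_abi_map` and `reg_number` are made up for this illustration; they are not part of any RISC-V tool.

```python
def build_abi_map():
    """Rebuild the ABI-name -> register-number mapping from the table."""
    names = {"zero": 0, "ra": 1, "sp": 2, "gp": 3, "tp": 4, "fp": 8}
    names.update({f"t{i}": 5 + i for i in range(3)})       # t0-t2  -> x5-x7
    names.update({f"s{i}": 8 + i for i in range(2)})       # s0-s1  -> x8-x9
    names.update({f"a{i}": 10 + i for i in range(8)})      # a0-a7  -> x10-x17
    names.update({f"s{i}": 16 + i for i in range(2, 12)})  # s2-s11 -> x18-x27
    names.update({f"t{i}": 25 + i for i in range(3, 7)})   # t3-t6  -> x28-x31
    return names


ABI = build_abi_map()


def reg_number(name):
    """Accept either an ABI name ("a0") or an actual name ("x10")."""
    if name in ABI:
        return ABI[name]
    if name.startswith("x") and name[1:].isdigit() and 0 <= int(name[1:]) <= 31:
        return int(name[1:])
    raise ValueError(f"unknown register: {name}")


print(reg_number("a0"), reg_number("x10"))  # prints: 10 10
```

Both spellings resolve to the same 5-bit register number, which is all that ends up in the machine instruction.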
When a function is invoked, it may modify the values of some of these registers. For this reason it is advisable to save the contents of those registers in memory in order to be able to restore them when the function completes. The ABI convention prescribes which party in a function call (the caller or the callee) is responsible for saving these values. This convention is described in the “Saved by” column of the table.
If a register is to be saved by the caller, its value should be stored in a stack frame allocated for that purpose before calling the function. This ensures that the value can be restored when the function returns. If the caller does not know which registers the callee modifies, it should save every caller-saved register whose value it still needs.
Registers to be saved by the callee only need to be saved to memory if the function uses those registers. Functions must not leave a trace: apart from the desired function result, the state that the caller relies on must be the same as it was before the function was invoked.
A function implementation defined in RISC-V assembly should use the following prologue before doing any of its work:
addi sp, sp, -framesize # The stack grows downward
sd ra,framesize-8(sp) # Save the return address
# Save registers owned by the callee as needed to memory.
This will ensure that the function can return to the point where it was called, and that any register state will be saved.
Before a function returns, the saved register values must be restored. This is achieved by the following epilogue, which should end the function:
# restore registers from the stack if needed
ld ra, framesize-8(sp) # Restore the return address register
addi sp, sp, framesize # Pop the stack
ret # return to the caller
This will restore the saved registers, set the return address and de-allocate the stack frame that was used to save this information.
The add.s program can be enhanced to use the function prologue and epilogue to define a function that calculates the sum of its arguments in registers a0 and a1. This new program is illustrated in the listing that follows:
.align 2
.global sum
sum:
    addi sp, sp, -32    # Allocate the frame; sp must stay 16-byte aligned
    sd ra, 24(sp)       # Save the return address
    add a0, a0, a1      # Add the function operands
    ld ra, 24(sp)       # Restore the return address
    addi sp, sp, 32     # De-allocate the stack frame
    ret                 # Return to the caller
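The frame size in the prologue has to be a multiple of 16 bytes so that sp stays aligned. As a rough illustration, the following Python sketch computes the smallest frame that satisfies that rule for a given number of saved 8-byte registers. The function name `frame_size` and its parameters are assumptions made for this example only.

```python
def frame_size(saved_registers, reg_bytes=8, alignment=16):
    """Smallest frame, rounded up to the alignment, that holds the saved registers."""
    needed = saved_registers * reg_bytes
    return (needed + alignment - 1) // alignment * alignment


print(frame_size(1))  # only ra is saved -> 16
print(frame_size(3))  # ra plus two more registers -> 32
```

The sum function above saves only ra, so 16 bytes would be enough; the 32 used in the listing is larger than necessary but is still a multiple of 16, so it is equally valid.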
The sum function can be called by name from a different assembler program. Create a main.s program with the following content:
.align 2
.global _start
_start:
    li a0, 5
    li a1, 4
    call sum
stop: j stop
This loads the values 5 and 4 into the argument registers of the sum function and calls it. The program can be assembled and linked as follows:
$ riscv64-unknown-elf-as -o add.o add.s
$ riscv64-unknown-elf-as -o main.o main.s
$ riscv64-unknown-elf-ld -Ttext=0x80000000 -o sum.elf main.o add.o
If this program is assembled, linked, and run in QEMU, it will call the sum function to calculate the sum of the operands. This can be verified by inspecting the value of register a0, which should be 9.
NOTE: The order in which the object files are supplied to the linker is important. If the add.o file is supplied first, the program will not run correctly because the sum function, rather than _start, will be located at the reset address.
Instructions are mnemonics that map directly to machine codes. For example, the add instruction in the sum function corresponds to the op-code 0x33 (or b0110011). When the add instruction is combined with its operands, the result is a single machine instruction. In RISC-V, all machine instructions are 32 bits long (unless you’re using the compressed extension, but for now we’ll just deal with 32-bit operations).
The add instruction is what’s known as an R-type instruction because its operands are all registers. This type of instruction has the following form:
BITS | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
R-Type | func7 | rs2 | rs1 | func3 | rd | opcode |
In this table, the operation’s function is a combination of func7 and func3. For the add instruction, these are b0000000 and b000. The rs2 and rs1 fields are the source registers whose values will be added. The rd field is the destination register for the result. Notice that the register fields are 5 bits wide. This allows the instruction to reference any of the 32 base registers (i.e. 2^5 registers). The add instruction from the previous example is constructed as follows:
- func7: b0000000
- rs2: b01011 (x11, which is a1)
- rs1: b01010 (x10, which is a0)
- func3: b000
- rd: b01010 (x10, which is a0)
- opcode: b0110011
Putting it all together, we get b00000000101101010000010100110011, or 0x00B50533. We can confirm this by disassembling the object file that was created:
$ riscv64-unknown-elf-objdump -d add.o
add.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <sum>:
0: fe010113 addi sp,sp,-32
4: 00113c23 sd ra,24(sp)
8: 00b50533 add a0,a0,a1
c: 01813083 ld ra,24(sp)
10: 02010113 addi sp,sp,32
14: 00008067 ret
The add instruction in the sum function is at offset 8 of the .text section of the object file. The machine instruction is 00b50533, which is the value that we had calculated for the instruction.
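The hand encoding can also be double-checked with a few lines of Python. This is only a sketch: `encode_r_type` is a name invented for this example, and it simply shifts each field into the bit position given in the R-Type table.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-Type fields into one 32-bit instruction word."""
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )


# add a0, a0, a1  ->  rd = x10, rs1 = x10, rs2 = x11
word = encode_r_type(0b0000000, 11, 10, 0b000, 10, 0b0110011)
print(f"{word:032b}")   # 00000000101101010000010100110011
print(f"0x{word:08x}")  # 0x00b50533
```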
In addition to R-Type instructions, there are also I-Type instructions that operate on immediates (literal values) and are also used for loads from memory, S-Type for storing to memory, U-Type for instructions that take a 20-bit upper immediate (lui and auipc), B-Type for branching, and J-Type for jumps (e.g. function calls). The following table describes the layout of each of these instruction types:
BITS | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
R-Type | func7 | rs2 | rs1 | func3 | rd | opcode |
I-Type | imm[11:5] | imm[4:0] | rs1 | func3 | rd | opcode |
S-Type | imm[11:5] | rs2 | rs1 | func3 | imm[4:0] | opcode |
U-Type | imm[31:25] | imm[24:20] | imm[19:15] | imm[14:12] | rd | opcode |
B-Type | imm[12,10:5] | rs2 | rs1 | func3 | imm[4:1,11] | opcode |
J-Type | imm[20,10:5] | imm[4:1,11] | imm[19:15] | imm[14:12] | rd | opcode |
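As a worked example of reading this table, the Python sketch below decodes two words taken from the disassembly of the sum function shown earlier: fe010113 (addi sp,sp,-32, an I-Type) and 00113c23 (sd ra,24(sp), an S-Type). The helper names `bits`, `sign_extend`, `decode_i_type`, and `decode_s_type` are invented for this illustration.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode_i_type(word):
    return {
        "imm": sign_extend(bits(word, 31, 20), 12),
        "rs1": bits(word, 19, 15),
        "funct3": bits(word, 14, 12),
        "rd": bits(word, 11, 7),
        "opcode": bits(word, 6, 0),
    }


def decode_s_type(word):
    # The immediate is split: imm[11:5] in bits 31:25, imm[4:0] in bits 11:7.
    imm = (bits(word, 31, 25) << 5) | bits(word, 11, 7)
    return {
        "imm": sign_extend(imm, 12),
        "rs2": bits(word, 24, 20),
        "rs1": bits(word, 19, 15),
        "funct3": bits(word, 14, 12),
        "opcode": bits(word, 6, 0),
    }


print(decode_i_type(0xFE010113))  # addi sp, sp, -32
print(decode_s_type(0x00113C23))  # sd ra, 24(sp)
```

In both results rs1 is 2 (sp); the store additionally reports rs2 as 1 (ra) and reassembles its immediate, 24, from the two halves.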
Unlike instructions, pseudo-instructions do not map directly to op-codes. Typically these represent idioms to make the programmer’s life a little easier.
The li in the main.s program is an example of a pseudo-instruction. As explained in the previous chapter, for small immediates such as these it maps to an addi I-Type instruction, which adds the immediate value to the value of x0 (always zero) and stores the result in the destination register. Pseudo-instructions provide convenient mnemonics for programming without adding additional op-codes.
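The following Python sketch shows that mapping for immediates that fit in the 12-bit signed field of addi. The names `encode_i_type` and `li_small` are hypothetical helpers for this example; larger constants need more than one instruction and are deliberately not handled here.

```python
def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack an I-Type instruction; imm must fit in 12 signed bits."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def li_small(rd, imm):
    """li rd, imm  ->  addi rd, x0, imm (small immediates only)."""
    return encode_i_type(imm, 0, 0b000, rd, 0b0010011)


print(f"{li_small(10, 5):08x}")  # li a0, 5 -> 00500513
print(f"{li_small(11, 4):08x}")  # li a1, 4 -> 00400593
```

These are the same two words that appear next to li a0,5 and li a1,4 in the disassembly of the linked program later in this chapter.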
Pseudo-instructions may also translate to more than one assembler instruction. For example, the call pseudo-instruction expands to a pair of instructions, auipc and jalr, which the linker may relax into a single jal when the target is close enough.
Directives are commands for the assembler rather than instructions that it will translate into machine code. Directives can be used to tell the assembler where to place code and data in the resulting object file, or to set up the memory of the target system. The previous example used assembler directives to export global symbols, to set the alignment for instructions, and to ensure that the code is assembled into the “.text” section of the object file.
To understand the purpose of the assembler directives, it is important to understand how assembled code is linked together. The assembler produces object files that the linker combines to produce an Executable and Linkable Format (ELF) file. This file will be segmented into different sections:
- .text: CPU instructions (the executable code).
- .rodata: Read-only data.
- .data: Global, mutable, initialized data.
- .bss: Global, mutable, uninitialized data.
Up to now, only the text section has been used. The location where the code is loaded was specified using the “-Ttext” option when invoking the linker. If multiple object files are passed to the linker, their text sections are merged into a single contiguous section.
Code and data will have different run-time requirements. Code is generally read-only whereas data may require read-write permissions. Therefore it is advantageous that code and data are not interleaved. To ensure this, the locations of text and data sections of the program should not overlap.
To avoid having to define the position of each section at the command line, the linker allows the memory layout to be defined using a linker script:
OUTPUT_ARCH( "riscv" )
SECTIONS
{
  . = 0x80000000;
  .text : {
    PROVIDE(_text_start = .);
    main.o (.text)
    *(.text .text.*)
    PROVIDE(_text_end = .);
  }
  PROVIDE(_global_pointer = .);
  .rodata : {
    PROVIDE(_rodata_start = .);
    *(.rodata .rodata.*)
    PROVIDE(_rodata_end = .);
  }
  .data : {
    . = ALIGN(4096);
    PROVIDE(_data_start = .);
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
  }
  .bss : {
    PROVIDE(_bss_start = .);
    *(.sbss .sbss.*) *(.bss .bss.*)
    PROVIDE(_bss_end = .);
  }
  PROVIDE(_stack_start = _bss_end);
  PROVIDE(_stack_end = _stack_start + 0x8000);
}
The SECTIONS keyword is used to specify how the various sections are laid out in the file. In the linker script shown previously, the .text, .rodata, .data, and .bss sections are defined.
The .text section will include all code that follows a .section .text.init or .text assembler directive. The code from sum.s and main.s will therefore both be included in this section. On line 7 of the linker script, the text section of the main.o object file is included explicitly. This ensures that the main program appears in the linked program before the sum function does (which is included by the wildcard on the next line).
The PROVIDE keyword is used to define a symbol at the address of the definition. The start and end of each of the sections will be provided by the linker. Moreover, the start and end of the stack memory area can be declared in this way. In a later chapter, these symbols will be used to set up the stack pointer.
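To see what values these symbols end up with, the location-counter arithmetic can be followed by hand. The Python sketch below does so under stated assumptions: the text section is taken to be 0x30 bytes long (the size visible in the disassembly of sum.elf later in this chapter), and the .rodata, .data, and .bss sections are assumed to contribute no content of their own. The function `align_up` is a hypothetical stand-in for what ALIGN(4096) does to the location counter.

```python
def align_up(address, alignment):
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) // alignment * alignment


BASE = 0x80000000    # ". = 0x80000000;" in the linker script
TEXT_SIZE = 0x30     # assumption: size of .text in the final sum.elf
STACK_SIZE = 0x8000  # "_stack_start + 0x8000" in the linker script

text_start = BASE
text_end = text_start + TEXT_SIZE
data_start = align_up(text_end, 4096)  # ". = ALIGN(4096);", .rodata assumed empty
bss_end = data_start                   # assumption: no .data or .bss content
stack_start = bss_end
stack_end = stack_start + STACK_SIZE

print(f"_text_end   = 0x{text_end:08x}")
print(f"_data_start = 0x{data_start:08x}")
print(f"_stack_end  = 0x{stack_end:08x}")
```

Under these assumptions _stack_end comes out as 0x80009000, which agrees with the address annotated as <_stack_end> in the disassembly of sum.elf.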
Putting it All Together
The program can now be assembled and linked using the following sequence of commands (the file containing the sum function is named sum.s here):
$ riscv64-unknown-elf-as -o sum.o sum.s
$ riscv64-unknown-elf-as -o main.o main.s
$ riscv64-unknown-elf-ld -T linker.lds -o sum.elf main.o sum.o
This will produce an ELF file called sum.elf. By inspecting the sum.elf file, we can see that the _start symbol shows up before the sum symbol:
$ riscv64-unknown-elf-objdump -d sum.elf
sum.elf: file format elf64-littleriscv
Disassembly of section .text:
0000000080000000 <_start>:
80000000: 00500513 li a0,5
80000004: 00400593 li a1,4
80000008: 00009117 auipc sp,0x9
8000000c: ff810113 addi sp,sp,-8 # 80009000 <_stack_end>
80000010: 008000ef jal ra,80000018 <sum>
0000000080000014 <stop>:
80000014: 0000006f j 80000014 <stop>
0000000080000018 <sum>:
80000018: fe010113 addi sp,sp,-32
8000001c: 00113c23 sd ra,24(sp)
80000020: 00b50533 add a0,a0,a1
80000024: 01813083 ld ra,24(sp)
80000028: 02010113 addi sp,sp,32
8000002c: 00008067 ret
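The disassembled sum.elf also shows how the call pseudo-instruction was translated. The assembler expands call into an auipc/jalr pair, and because sum is within range the linker relaxed that pair into a single instruction:
jal ra,80000018
The two instructions before it are not part of the call. They write to sp rather than ra, and together they compute the address of _stack_end, which is what a la sp, _stack_end pseudo-instruction expands to (the main.s that produced this listing evidently set up the stack pointer this way before calling sum):
auipc sp,0x9
addi sp,sp,-8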
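The addresses in this listing can be checked by hand. The Python sketch below redoes the arithmetic for the auipc/addi pair and decodes the offset stored in the jal word 008000ef, using the J-Type layout from the table earlier. The helper names `auipc`, `sign_extend`, and `jal_offset` are invented for this example, and `auipc` is simplified: it assumes a small positive upper immediate, as is the case here.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, width):
    """Interpret a width-bit value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def auipc(pc, imm20):
    """pc + (imm20 << 12); simplified, assumes a non-negative imm20."""
    return (pc + (imm20 << 12)) & MASK64


def jal_offset(word):
    """Reassemble the scrambled J-Type immediate: imm[20|10:1|11|19:12]."""
    imm = (
        (((word >> 31) & 0x1) << 20)
        | (((word >> 12) & 0xFF) << 12)
        | (((word >> 20) & 0x1) << 11)
        | (((word >> 21) & 0x3FF) << 1)
    )
    return sign_extend(imm, 21)


sp = auipc(0x80000008, 0x9)                 # auipc sp,0x9 at 80000008
sp = (sp - 8) & MASK64                      # addi sp,sp,-8
target = 0x80000010 + jal_offset(0x008000EF)  # jal at 80000010

print(f"sp     = 0x{sp:08x}")      # sp     = 0x80009000
print(f"target = 0x{target:08x}")  # target = 0x80000018
```

The first result is the address the disassembler labels <_stack_end>, and the second is the address of <sum>.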
This program can be run in QEMU just as before and the result should be the same as previous runs.
This chapter of the Bare Metal RISC-V tutorial covered the assembly language in a little more detail. Assembly programs are made up of directives, pseudo-instructions, and instructions. Directives provide guidance to the assembler on how to organize the assembled code. Pseudo-instructions provide useful mnemonics that are mapped to one or more primitive assembler instructions. Instructions are translated into binary machine instructions which direct the execution flow of the processor. In future chapters, this information will be utilised to make the RISC-V processors do more interesting things.
Filed under: RISC V - @ 2019-11-06 08:03