Up to this point, the RISC-V tutorial has focused on single applications running on a single hardware thread. The application's environment was composed of the processor's state and memory map. The processor state was controlled via assembly instructions, and the memory map was defined at build time via a linker script. However, modern systems are almost always multiprogrammed and will execute many applications concurrently (by interleaving their instructions in a single stream). Moreover, multiple hardware threads are common, allowing application instructions to execute simultaneously. This workflow requires much more care to ensure correctness and proper separation of memory. This idea was touched upon briefly in chapter 4 during the discussion of the A extension, which provides the atomic memory operations that were used to define synchronization primitives. In addition to synchronizing memory, the application must control the execution environment of all of the active hardware threads. This chapter will explore the mechanisms available for this purpose.
The examples presented thus far can all be logically separated into three layers: the application execution environment (AEE), the application binary interface (ABI), and the application code. This organisation allows for a single application to execute in a single AEE.
The AEE is defined by the processor state and the static memory map which is defined by the linker script. The ABI describes the conventions that programs must follow and manages dynamic memory; in this case the stack. The upshot of defining an ABI layer is that application code in the third layer can interact with an abstract view of the machine implementation. This simplifies the development of applications by hiding some of the hardware details. For example, the preamble and postamble code for function calls, which are part of this ABI, were used by the application code to ensure that dynamic memory was managed correctly. These code templates can be provided by developer tools, such as high-level language compilers.
This becomes even more important as application environments become more complex. As complexity increases in the application environment, more can be done by the ABI to hide some of the tedious details of the layer beneath it. This is where the operating system (OS) comes into play. The OS is a supervisor process which is sandwiched between the ABI and the supervisor binary interface (SBI). In this configuration, the SBI hides more of the hardware details from the OS, which provides an even more abstract view of the system to the applications. The definition of an SBI also improves the portability of the OS layer.
Another advantage of this layered approach is that it is easier to enforce separation between applications. One application should not interfere with other applications, or with the supervisor itself. The RISC-V ISA defines three different privilege modes to enforce this. From most to least privileged, the three levels are: machine, supervisor, and user.
ISA implementations may provide between one and three of the defined privilege modes. All hardware implementations must provide machine mode (or M-mode). A secure embedded system may provide machine and user modes. A virtualized multiprogramming system should provide all three modes.
What's my Pay Grade?
The capabilities of the ISA implementation can be queried via the misa control and status register. The function illustrated in the following listing determines the register width of the integer ISA and the number of privilege modes that are supported by the current hardware thread:
 1: .text
 2: .align 2
 3: .global __system_check
 4: __system_check:
 5:     # Input
 6:     #   None
 7:     #
 8:     # Returns
 9:     #   - a0: The number of supported privilege modes (1, 2 or 3).
10:     #   - a1: The register width used by the ISA (in bytes).
11:     csrr t0, misa       # Load misa into t0
12:     li   a1, 4          # Load minimum register width
13:     li   a0, 1          # M-mode is always supported
14:     # Probe for user-mode
15:     lui  t1, 0x100      # Set the U-mode mask in t1
16:     and  t1, t0, t1
17:     beqz t1, 1f         # Determine if the U-mode bit is set in misa
18:     addi a0, a0, 1      # U-mode is available, increment a0
19: 1:  # Probe for supervisor-mode
20:     lui  t1, 0x40       # Set the S-mode mask in t1
21:     and  t1, t0, t1
22:     beqz t1, 2f         # Determine if the S-mode bit is set in misa
23:     addi a0, a0, 1      # S-mode is available, increment a0
24: 2:  # Determine register width
25:     bgez t0, 3f         # Determine if a1 holds the register width
26:     slli a1, a1, 1      # Multiply register width by 2
27:     slli t0, t0, 1
28:     j    2b
29: 3:
30:     ret
This function loads the misa Control and Status Register (CSR) into the temporary register t0 on line 11 using the csrr pseudo-instruction. The minimum register width is 4 bytes, therefore the immediate 4 is loaded into register a1 as an initial value (line 12). Moreover, machine mode is required by hardware implementations, thus the initial value of a0 is 1 (line 13).
The least-significant 26 bits of the misa CSR are flags which indicate supported extensions, one for each letter of the alphabet. The S and U extensions are for supervisor and user mode respectively. U is the 21st letter, so user mode will be available if bit 20 is set to 1. A bit mask is created on line 15.
Since the mask is an immediate value that is wider than what is allowed by RISC-V I-type instructions, the lui instruction is used to load a 20-bit immediate value and shift it left by 12 bits in a single instruction. Thus the immediate value 0x100 becomes 0x100000 once it has been loaded into t1.
A bit-wise AND is performed using the bit mask and the misa CSR value to determine if user mode is available. If the result of the and instruction is zero on line 17, user mode is not supported. Otherwise the value of a0 is incremented on line 18.
Similarly, supervisor mode is supported if bit 18 is set to 1. A bit mask is created on line 20, then a bit-wise AND is performed with the value of the misa CSR on the next line. If the result is zero, then supervisor mode is not supported. Otherwise the number of privilege modes is incremented by 1.
The final step of the __system_check function is to determine the width of the hart's registers. The most-significant 2 bits of the misa CSR encode the width of the registers used by the ISA: 1) 32 bits, 2) 64 bits, 3) 128 bits. Determining this is complicated by the fact that the CSR's width is not known prior to checking this field. To overcome this, a check is performed to determine if the register's value is negative, in which case the most-significant bit must be set to 1. If it is, the number of bytes in a1 is multiplied by 2 (by shifting it to the left by 1 bit on line 26), the value of t0 is shifted by 1 to the left, and the check is performed again. If the value of t0 is determined to be non-negative on line 25 (i.e. the msb is 0), then the __system_check function has completed its probe. The following listing illustrates the main program which performs the system check:
1: .section ".text.init"
2: .align 2
3: .global _start
4: .global _stack_end
5: _start:
6:     la   sp, _stack_end
7:     call __system_check
8: stop: j stop
The __system_check function is called on line 7. When this function returns, register a0 should hold the number of supported privilege modes, and register a1 should hold the number of bytes in a register. If this program is run in qemu, the info registers command can be used at the console to determine the hardware details:
make chapter5
qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel chapter5.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) info registers
 pc       000000008000000c
 mhartid  0000000000000000
 mstatus  0000000000000000
 mip      0000000000000000
 mie      0000000000000000
 mideleg  0000000000000000
 medeleg  0000000000000000
 mtvec    0000000000000000
 mepc     0000000000000000
 mcause   0000000000000000
 zero 0000000000000000 ra   000000008000000c sp   0000000080009000 gp   0000000000000000
 tp   0000000000000000 t0   000000000028225a t1   0000000000040000 t2   0000000000000000
 s0   0000000000000000 s1   0000000000000000 a0   0000000000000003 a1   0000000000000008
 ...
The value of a0 is 3, therefore the qemu virt platform (the machine selected with -M virt) supports all three privilege modes. The value of a1 is 8, which means that the register width is 64 bits (8 bytes). This value could be used as a stack offset, allowing for the definition of the function call preamble and postamble as part of a library. This library could then be used to define an operating system's ABI.
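The same probe generalizes to any single-letter extension. The following is a minimal sketch rather than part of the tutorial sources: the function name __has_extension and its calling convention are assumptions made for illustration. Instead of building a mask with lui, it shifts the requested bit of misa down to position 0.

```asm
    .text
    .align 2
    .global __has_extension
__has_extension:
    # Hypothetical helper (name and interface are illustrative only).
    #
    # Input
    #   - a0: Index of the extension letter (0 for 'A' ... 25 for 'Z').
    # Returns
    #   - a0: 1 if the corresponding misa bit is set, 0 otherwise.
    csrr t0, misa           # Load misa into t0 (requires M-mode)
    srl  t0, t0, a0         # Shift the requested bit down to bit 0
    andi a0, t0, 1          # Keep only that bit
    ret
```

For example, loading 18 into a0 before the call tests for supervisor mode, since S is the 19th letter of the alphabet. Note that the privileged specification allows misa to read as zero when the register is not implemented, in which case every probe of this kind reports the extension as absent.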
It's a Trap!
Typically, systems should run in the most restricted environment possible in order to minimize catastrophes in the event of a system fault. However, when an exceptional event occurs, the system may want to raise its privilege level in order to deal with it. These types of events are often associated with an interrupt.
The machine-mode status CSR, mstatus, allows some control over a hart's operating state, including enabling or disabling global interrupts. Three fields are defined in the register's least significant four bits for this purpose; one for each of the privilege modes.
Note: specific interrupt types for each privilege mode must also be enabled individually via the mie register, which will be described later.
The fields defined in the mstatus register are illustrated in the following table:
The UIE, SIE, and MIE fields enable interrupts globally for the user, supervisor, and machine modes respectively. When the hart is operating at privilege level \(x\), interrupts for that level are globally enabled if the \(x\)IE field is set to 1 and disabled if it is 0. Interrupts for privilege levels inferior to \(x\) are disabled regardless of the state of the associated interrupt bit in mstatus, while interrupts for levels superior to \(x\) are always enabled.
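As a small illustration of how the MIE field is typically manipulated, the sketch below masks machine-mode interrupts around a short critical section and then restores the previous setting. It assumes the code runs in M-mode and that MIE is bit 3 of mstatus (mask value 8); the label name is hypothetical.

```asm
    .text
    .align 2
critical_section_sketch:            # Hypothetical label
    csrrci t0, mstatus, 8           # t0 = old mstatus; clear MIE (bit 3) atomically
    # ... work that must not be interrupted goes here ...
    andi   t0, t0, 8                # Keep only the saved MIE bit
    csrs   mstatus, t0              # Set MIE again only if it was set before
    ret
```

Saving the old value and restoring just the MIE bit means the sequence is safe to use whether or not interrupts were enabled on entry.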
To be of any use, there must be a mechanism to handle interrupts. The BASE field of the mtvec CSR holds the base address of the trap vector. The MODE field, which occupies the least-significant two bits of mtvec, specifies the trap mode. When the MODE field is set to 0 (direct), every trap sets the pc to the base address, so a single handler is called directly. When it is set to 1 (vectored), the base address is that of a vector of trap handlers, one for each interrupt type, indexed by the interrupt code; synchronous exceptions still jump to the base address itself. For the time being, a single handler in direct mode will be used to dispatch to the appropriate code for interrupts or exceptions.
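Although the rest of this chapter uses direct mode, a vectored setup can be sketched as follows. All label names are hypothetical, the handlers are empty placeholders, and the 256-byte alignment is an assumption: the base address must be at least 4-byte aligned, and an implementation may require stricter alignment in vectored mode.

```asm
    .text
    .option push
    .option norvc                   # Keep every table entry exactly 4 bytes wide
    .balign 256                     # Assumed alignment (see the note above)
vector_table:
    j    on_exception               # Entry 0: synchronous exceptions land on BASE
    j    on_unexpected              # Entry 1: supervisor software interrupt
    j    on_unexpected              # Entry 2: not used in this sketch
    j    on_unexpected              # Entry 3: machine software interrupt
    j    on_unexpected              # Entry 4: not used in this sketch
    j    on_unexpected              # Entry 5: supervisor timer interrupt
    j    on_unexpected              # Entry 6: not used in this sketch
    j    on_timer                   # Entry 7: machine timer interrupt
    .option pop

on_exception:                       # Placeholder handlers: a real program
on_unexpected:                      # would do its work here before returning
on_timer:
    mret

    .align 2
    .global install_vectors
install_vectors:
    la   t0, vector_table
    ori  t0, t0, 1                  # MODE = 1 selects vectored mode
    csrw mtvec, t0
    ret
```

The norvc option matters because the hart computes the target as the base address plus four times the interrupt code, so a jump that the assembler compressed to two bytes would misalign every entry after it.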
The code that follows illustrates a skeleton for a trap handler implementation in direct mode:
 1: .text
 2: .align 2
 3: .global trap_handler
 4: trap_handler:
 5:     # Trap handler preamble (save registers).
 6:     csrrw a0, mscratch, a0
 7:     sd   a1, 0(a0)
 8:     sd   a2, 8(a0)
 9:     sd   a3, 16(a0)
10:     sd   a4, 24(a0)
11:     # Decode the cause of the interrupt.
12:     csrr a1, mcause
13:     bgez a1, exception
14: interrupt:
15:     andi a1, a1, 0x3f   # Isolate the cause field
16:     # TODO: Dispatch to specific interrupt handler
17:     j    trap_handler_restore_state
18: exception:
19:     andi a1, a1, 0x3f   # Isolate the cause field
20:     # TODO: Dispatch to specific exception handler
21: trap_handler_restore_state:
22:     ld   a4, 24(a0)
23:     ld   a3, 16(a0)
24:     ld   a2, 8(a0)
25:     ld   a1, 0(a0)
26:     csrrw a0, mscratch, a0
27:     mret
The first instruction, on line 6, atomically swaps the values of the mscratch CSR and a0. The mscratch register is defined to provide additional data to trap handlers. Typically, it is set to the memory address of a buffer where register data can be saved while the handler is active.
The next four lines save the contents of registers a1-a4 in the memory buffer whose address was held in mscratch (and is now in a0), then the value of the mcause CSR is copied into a1 on line 12. The mcause register indicates the cause of the trap, whether a synchronous exception or an asynchronous interrupt. If the most-significant bit of this register is zero, the trap was caused by a synchronous exception. This is determined on line 13 by testing whether the value of a1 is greater than or equal to zero; if the msb is 1, the value is negative and execution falls through to the interrupt handler code.
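One way the exception TODO in the skeleton might be filled in is with a table of handler addresses indexed by the cause code. The sketch below is written as a standalone routine so that it assembles on its own; every label in it is hypothetical, the table covers only the first four exception codes, and it assumes RV64 (8-byte table entries). It uses only a1 and a2, which the skeleton has already saved.

```asm
    .equ NUM_EXC_HANDLERS, 4        # Hypothetical table size for this sketch

    .text
    .align 2
dispatch_exception:                 # Entered with the cause code in a1
    li   a2, NUM_EXC_HANDLERS
    bgeu a1, a2, dispatch_done      # No handler registered for this cause
    slli a1, a1, 3                  # Scale the index by 8 bytes per entry
    la   a2, exception_table
    add  a2, a2, a1
    ld   a2, 0(a2)                  # Fetch the handler address
    jr   a2                         # Each handler ends with: j dispatch_done
dispatch_done:
    ret                             # ra is untouched, so this returns to the caller

on_fetch_misaligned:                # Placeholder handlers
on_fetch_fault:
on_illegal_instruction:
on_breakpoint:
    j    dispatch_done

    .section .rodata
    .balign 8
exception_table:
    .dword on_fetch_misaligned      # Cause 0: instruction address misaligned
    .dword on_fetch_fault           # Cause 1: instruction access fault
    .dword on_illegal_instruction   # Cause 2: illegal instruction
    .dword on_breakpoint            # Cause 3: breakpoint
```

If the body were pasted into the skeleton in place of the TODO, dispatch_done would correspond to trap_handler_restore_state and the final ret would be dropped. The bounds check keeps an unexpected cause code from indexing past the end of the table.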
To use the trap handler, its address must be set in the BASE field of the mtvec CSR. This is illustrated in the following program:
 1: .section ".text.init"
 2: .align 2
 3: .global _start
 4: .global _stack_end
 5: _start:
 6:     la    sp, _stack_end
 7:     la    a0, scratch
 8:     csrrw a0, mscratch, a0
 9:     la    a0, trap_handler
10:     csrrw a0, mtvec, a0
11:     call  __system_check
12: stop: j stop
13: .bss
14: scratch: .dword 0, 0, 0, 0
The scratch area where the register state can be saved is defined on line 14. This allocates 32 bytes of space to save the contents of up to 4 registers, which is enough for the current trap_handler implementation. The address of the scratch area is loaded into register a0 on line 7. This register's value is then swapped with the value of the mscratch CSR on line 8.
The address of the trap handler function is loaded into register a0 on line 9, then it is swapped with the contents of the mtvec CSR on line 10. Because the handler is aligned on a 4-byte boundary, the least-significant two bits of its address are zero, which leaves the MODE field at zero and sets the trap mode to direct. This causes all synchronous exceptions and asynchronous interrupts to branch to the base address.
What's the Time?
A platform's real-time counter is one of the possible sources of asynchronous interrupts. The timer is typically external to the processing core. The virt machine of the QEMU emulator includes the core-local interruptor (CLINT). This module defines the mtime register that exposes the current value of the real-time counter. This value expresses the number of clock cycles that have elapsed since the processor was reset. This does not represent the real time, but a count of real-time intervals (determined by the oscillator frequency).
The mtime register is mapped to a particular address in physical memory. The actual address is specified in the memory map of the virt machine in the QEMU system (see hw/riscv/virt.c). The listing that follows illustrates a function to retrieve the current real-time counter value:
 1: .equ CLINT_BASE, 0x2000000   # The base address of the CLINT module
 2: .equ CLINT_MTIME, 0xbff8     # The offset of the MTIME register
 3: .macro ld_mtime rd           # Macro to access the MTIME memory mapped register
 4:     li  t0, CLINT_BASE
 5:     li  t1, CLINT_MTIME
 6:     add t0, t0, t1           # Determine the absolute address of MTIME
 7:     ld  \rd, 0(t0)           # Read the counter value
 8: .endm
 9:
10: .text
11: .align 2
12: .global __clock_cycle
13: __clock_cycle:
14:     # Retrieve the current value of the real-time counter.
15:     #
16:     # Inputs: None
17:     #
18:     # Returns:
19:     #   - a0: The current value of the mtime register.
20:     ld_mtime a0
21:     ret
The CLINT_BASE symbol is defined on line 1; its value is the absolute base address of the memory-mapped registers of the CLINT module (see include/hw/riscv/sifive_clint.h of the QEMU source). The CLINT_MTIME symbol, defined on line 2, specifies the offset of the mtime register relative to the CLINT's base memory address. The offset is added to the base address and the result is stored in register t0 on line 6. Finally, the value of the real-time counter is retrieved on line 7. All of these operations are defined in a macro starting on line 3, which uses t0 and t1 as scratch registers.
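The listing above relies on a single 64-bit load, which is only available when the register width reported by __system_check is 8 bytes. On a 32-bit hart the counter has to be read in two halves, and the usual precaution is to re-read the high word to detect a carry between the two loads. The following is a sketch under that assumption; the function name is hypothetical, and it presumes the platform accepts 32-bit accesses to each half of mtime.

```asm
    .equ CLINT_BASE,  0x2000000     # Same values as the listing above
    .equ CLINT_MTIME, 0xbff8

    .text
    .align 2
    .global __clock_cycle32
__clock_cycle32:                    # Hypothetical RV32 variant of __clock_cycle
    # Returns
    #   - a0: The low 32 bits of mtime.
    #   - a1: The high 32 bits of mtime.
    li   t0, CLINT_BASE + CLINT_MTIME
1:  lw   a1, 4(t0)                  # Read the high word
    lw   a0, 0(t0)                  # Read the low word
    lw   t1, 4(t0)                  # Read the high word again
    bne  a1, t1, 1b                 # Retry if the low word overflowed in between
    ret
```

The loop almost always executes once; it only repeats when the low word wraps around between the first and third loads.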
The mtime register is useful for determining the time since the board was reset. However, it cannot generate interrupts by itself. The mtimecmp memory-mapped register causes a timer interrupt to be posted whenever the value in mtime is greater than or equal to the value in mtimecmp (provided that timer interrupts are enabled). Therefore, to receive an interrupt after some fixed interval of time, the current value of mtime must be retrieved, the timeout value added to it, and the result saved in the mtimecmp register. This process is illustrated in the following listing:
 1: .equ CLINT_MTIMECMP, 0x4000  # The offset of the MTIMECMP register
 2: .macro st_mtimecmp rs
 3:     li  t0, CLINT_BASE
 4:     li  t1, CLINT_MTIMECMP
 5:     add t1, t0, t1
 6:     sd  \rs, 0(t1)
 7: .endm
 8: .global __timer_create
 9: __timer_create:
10:     # Set a timer to trigger an interrupt when a given number of
11:     # clock cycles have elapsed.
12:     #
13:     # Inputs:
14:     #   - a0: The timeout value in clock cycles.
15:     ld_mtime t1              # t1 = current value of mtime
16:     add a0, a0, t1           # Add the timeout to the clock cycle
17:     st_mtimecmp a0
18:     ret
The __timer_create function sets a timer to trigger an interrupt after the given number of clock cycles have elapsed. This code re-uses the macro defined previously to read the current value of the real-time counter, then adds the desired number of cycles to it, and writes the result to the mtimecmp register. When mtime's value becomes greater than or equal to the value in mtimecmp, the MTIP field of the mip CSR is asserted to indicate that a timer interrupt is pending, and the trap handler is called. The following code will set up this process:
 1: .section ".text.init"
 2: .align 2
 3: .global _start
 4: .global _stack_end
 5: .global CLOCK_MONOTONIC
 6: _start:
 7:     la    sp, _stack_end
 8:     la    t0, __mtrap_handler   # Load trap vector address
 9:     csrrw zero, mtvec, t0
10:     li    t0, 0b1<<3
11:     csrrs t0, mstatus, t0       # Enable interrupts globally
12:     li    t0, 0b1<<7
13:     csrrs t0, mie, t0           # Enable timer interrupts
14: 1:  li    a0, 0x10000           # Set the timeout value
15:     call  __timer_create        # Set the timer
16:     wfi                         # Wait for interrupts
17:     j     1b
18:
19: .align 2
20: __mtrap_handler:                # Machine interrupt handler
21:     csrrc t0, mcause, zero      # Get the cause of the interrupt
22:     bgez  t0, 2f                # Exit on an exception
23:     slli  t0, t0, 1
24:     srli  t0, t0, 1
25:     li    t1, 7                 # The timer interrupt has code 7.
26:     bne   t0, t1, 2f            # Check for timer interrupts
27:     addi  s0, s0, 1             # Increment the interrupt count.
28: 2:  mret                        # Machine trap return
After setting up the stack, this program loads the address of the trap handler on line 8, and stores this address in the mtvec CSR. Machine-mode interrupts are then enabled globally by setting bit 3 of the mstatus CSR to 1 (on line 11), and M-mode timer interrupts are enabled (on line 13) by setting bit 7 of the mie CSR to 1. On line 14, an immediate value is loaded into register a0. This value is used to create the timer on line 15. Once the timer is armed, the wfi instruction is used on line 16 to wait for an interrupt to occur, at which point control jumps to the interrupt handler. When the handler returns, control jumps back to line 14, and the timer is armed again. This should cause periodic calls to the trap vector.
A machine-level trap handler is defined on line 20. This handler increments the value in register s0 every time a timer interrupt is taken. The cause of the trap is determined on line 21, which reads the mcause CSR into t0; because the source operand of csrrc is the zero register, the CSR itself is left unchanged. If the value is greater than or equal to zero (line 22), the most-significant bit is 0 and the trap was caused by a synchronous exception. In that case, the handler simply exits by executing the mret instruction, which returns control to the address held in mepc. Otherwise, the next two lines shift off the most-significant bit (i.e. set it to zero). The interrupt code is checked on line 26; if it corresponds to the timer interrupt, the value of s0 is incremented by 1.
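A pending machine timer interrupt is only cleared by writing a new value to mtimecmp, so a common variation is to re-arm the timer inside the handler itself. The following sketch is one way that might look, not the handler used in this chapter: the handler name and the TICK_INTERVAL constant are hypothetical, it assumes RV64 and hart 0, and like the handler above it overwrites temporary registers without saving them.

```asm
    .equ CLINT_BASE,     0x2000000  # Same values as the earlier listings
    .equ CLINT_MTIMECMP, 0x4000
    .equ TICK_INTERVAL,  0x10000    # Hypothetical period in clock cycles

    .text
    .align 2
__mtrap_handler_rearm:              # Hypothetical handler name
    csrr t0, mcause                 # Get the cause of the trap
    bgez t0, 1f                     # Synchronous exception: nothing to do here
    slli t0, t0, 1                  # Shift off the interrupt bit
    srli t0, t0, 1
    li   t1, 7                      # The timer interrupt has code 7
    bne  t0, t1, 1f
    li   t0, CLINT_BASE + CLINT_MTIMECMP
    ld   t1, 0(t0)                  # The deadline that just expired
    li   t2, TICK_INTERVAL
    add  t1, t1, t2                 # Next deadline = old deadline + period
    sd   t1, 0(t0)                  # Writing mtimecmp clears the pending bit
    addi s0, s0, 1                  # Increment the interrupt count
1:  mret
```

Advancing the deadline from the previous mtimecmp value, rather than from the current mtime, keeps the period from drifting by the time spent in the handler.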
The state of the hart can be inspected via the info registers command in qemu. If the timing of the snapshot is such that the trap handler is executing, the machine CSRs will have the following state:
(qemu) info registers
 pc       000000008000004c
 mhartid  0000000000000000
 mstatus  0000000000001880
 mip      0000000000000080
 mie      0000000000000080
 mideleg  0000000000000000
 medeleg  0000000000000000
 mtvec    0000000080000048
 mepc     0000000080000040
 mcause   8000000000000007
As usual, the pc register shows the current address of the active instruction; in this case, however, control has jumped into the trap handler. When a trap is taken, the following operations are executed atomically:
- The mstatus.MPIE field (which represents the previous global interrupt mode) is set to the value of mstatus.MIE.
- Interrupts are disabled globally by setting the mstatus.MIE field (bit 3 of mstatus) to zero.
- The mstatus.MPP field (bits 12:11) is set to the privilege level that was active before the trap was taken (0b11, machine mode, in this example).
- The address of the instruction that was interrupted, or that raised the exception, is saved in the mepc register.
- The cause of the exception is written in register mcause.
- The pc register is set to the address in mtvec.
When the interrupt handler completes its task, the mret instruction is executed, which reverses this process. The pc is set to the address in mepc, the value of mstatus.MPIE is written to mstatus.MIE, and mstatus.MPIE is then set to 1. Note that mret does not clear the mip register: the MTIP bit remains set until a new value is written to mtimecmp, which the main loop does when it re-arms the timer. Control flow continues from where it was interrupted, and the core is ready to handle new interrupts. More importantly, the hart is returned to the privilege mode specified in the mstatus.MPP field. This behaviour can be exploited to set the current privilege mode of the processor.
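Inside a handler, the saved state can be read back from mstatus with a shift and a mask. The sketch below is illustrative only (the function name is an assumption) and must run in M-mode. Applied to the mstatus value 0x1880 from the snapshot above, it would yield an MPP of 3 (machine mode) and an MPIE of 1.

```asm
    .text
    .align 2
    .global __previous_mode
__previous_mode:                    # Hypothetical helper
    # Returns
    #   - a0: mstatus.MPP, the privilege mode to which mret will return.
    #   - a1: mstatus.MPIE, the interrupt enable that mret will restore.
    csrr t0, mstatus
    srli a0, t0, 11                 # Move bits 12:11 down to bits 1:0
    andi a0, a0, 3                  # Isolate MPP
    srli a1, t0, 7                  # Move bit 7 down to bit 0
    andi a1, a1, 1                  # Isolate MPIE
    ret
```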
Enter the User
The default privilege level when the processor is reset is machine mode. Programs running at this level have full control over the processor via the control and status registers. However, this opens up the system to abuse. To limit the damage that a wayward program can cause, it is better to run in user mode. Getting to user mode is a simple matter of setting up the mstatus register and returning from M-mode via the mret instruction. The following macro will set the privilege mode to the specified level:
 1: # Set the privilege mode to that specified by the immediate
 2: # value.
 3: .macro setmode imm
 4:     li    t0, 0x1800; csrrc zero, mstatus, t0   # Clear mstatus.MPP
 5:     li    t0, \imm
 6:     slli  t0, t0, 11
 7:     csrrs t0, mstatus, t0                       # Set mstatus.MPP
 8:     la    t0, 1f
 9:     csrrw zero, mepc, t0
10:     mret
11: 1:
12: .endm
This macro will set the privilege mode to the value specified as an immediate by first clearing the mstatus.MPP field on line 4, then replacing it with the encoding of the desired privilege mode. The following table lists the privilege modes and their encodings:
The fourth column of this table lists the immediate value that should be supplied to the macro to set the associated privilege mode. The encoded privilege value is loaded into register t0 on line 5, then shifted left by 11 bits on the next line so that it lines up with the MPP field. This field is set via the csrrs instruction on line 7, which effectively sets mstatus to the bit-wise OR of its previous value with the value in t0. However, before returning from machine mode, the mepc register must be updated with the address of the instruction to which control will return.
The address immediately following the mret instruction is loaded into register t0 on line 8, then written to mepc on line 9. When the mret instruction is executed, pc will be set to this address. If the main program is updated to invoke this macro with the argument 0 just before the wfi, the program should be in U-mode by the time the timer expires:
26: 2:  li    a0, 0x10000           # Set the timeout value
27:     call  __timer_create        # Set the timer
28:     setmode 0
29:     wfi
30:     j     2b                    # setmode defines its own local label 1
Anything that runs following the call to the setmode macro will be executing in U-mode. However, the trap handler will execute in machine mode. Therefore the trap handler is useful for performing tasks that require machine mode privilege. Fortunately, traps are taken for synchronous exceptions as well as asynchronous interrupts. Therefore the trap handler can be used to implement system calls.
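The same mechanism can be packaged as a function that drops to U-mode at an arbitrary address instead of at the instruction after the macro. This is a sketch with a hypothetical name and interface, relying only on the MPP encoding of 0 for U-mode described above.

```asm
    .text
    .align 2
    .global __enter_user
__enter_user:                       # Hypothetical helper
    # Input
    #   - a0: The address at which execution continues in U-mode.
    li   t0, 0x1800                 # Mask covering mstatus.MPP (bits 12:11)
    csrc mstatus, t0                # MPP = 0b00, the U-mode encoding
    csrw mepc, a0                   # mret will jump to this address
    mret                            # Drop to U-mode; control does not return here
```

Because the function never returns to its caller, it would typically be reached with a jump rather than a call, for example after loading the address of the user program's entry point into a0.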
The ecall instruction is an environment call which raises a synchronous exception and sets mcause to a code indicating the privilege mode that was active when it was executed. The trap handler can be updated to jump to the appropriate function, which will execute in machine mode, and then return to the original privilege mode when it is complete. The trap handler must be updated to handle the system call:
 1: .align 2
 2: __mtrap_handler:
 3:     csrr  t0, mcause
 4:     bgez  t0, 1f
 5:     slli  t0, t0, 1
 6:     srli  t0, t0, 1
 7:     li    t1, 7
 8:     bne   t0, t1, 2f
 9:     addi  s0, s0, 1
10:     j     2f
11: 1:
12:     li    t1, 8
13:     bne   t0, t1, 2f
14:     push_stack
15:     call  __syscall
16:     pop_stack
17:     csrr  t0, mepc
18:     addi  t0, t0, 4
19:     csrrw zero, mepc, t0
20: 2:  csrrw t0, mcause, zero
21:     mret
This handler is updated by adding some code for synchronous exceptions (starting at line 12). The value of mcause is compared with the user-environment call exception code (8) to see if a system call was requested. If so, the handler will save the registers on the stack via the push_stack macro on line 14, call the __syscall function, then restore the registers to their values prior to the call via the pop_stack macro on line 16.
After the system call has finished processing, the handler loads the value of the mepc CSR, which should contain the address of the ecall instruction that caused the trap. This address is incremented by 4 on line 18 to skip to the instruction that follows ecall, then the updated address is stored in mepc before mret is executed. This will set the program counter to the instruction immediately following the one that triggered the system call, and restore the privilege mode to U-mode.
Typically system calls are identified by a number. If its arguments are stored in the registers a0 to a7, and its return value in a0, system calls will follow the same convention as regular function calls (albeit with greater privilege). The following code snippet will invoke the system call associated with the identifier 0x100:
1:     li a0, 0x100
2:     ecall
The timer can now be set from user mode via a system call. The following implementation of __syscall will invoke __timer_create with the timeout value specified in register a1 when system call 0x100 is requested:
1: __syscall:
2:     mv   s1, a0
3:     li   t0, 0x100
4:     bne  t0, a0, 1f
5:     mv   a0, a1
6:     push_stack
7:     call __timer_create
8:     pop_stack
9: 1:  ret
This function will check the register a0 to determine the system call id that is requested. If it is 0x100, then it will move the timeout value from register a1 to a0 on line 5, push the stack, then invoke the __timer_create function with M-mode privilege. The main program can be updated to set the privilege level to U-mode, then make a system call 0x100 to set the timer. The updated main program is illustrated in the following listing:
 1: .section ".text.init"
 2: .align 2
 3: .global _stack_end
 4: .global CLOCK_MONOTONIC
 5: .global _start
 6: _start:
 7:     la    sp, _stack_end
 8:     call  __system_check
 9:     la    t0, __mtrap_handler
10:     csrrw zero, mtvec, t0
11:     li    t0, 0b1<<3
12:     csrrs zero, mstatus, t0
13:     li    t0, 0b1<<7
14:     csrrs zero, mie, t0
15:     setmode 0
16: 1:  li    a0, 0x100
17:     li    a1, 0x10000
18:     ecall
19:     wfi
20:     j     1b
The major difference from the previous main program is that the timer is not set directly, but via a system call. The syscall id is loaded into register a0 on line 16, and the timeout value into register a1 on line 17. This set of instructions will be repeated each time a timeout is triggered.
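User code would normally hide the register shuffling behind a small wrapper that follows the ordinary function call convention. The following is a minimal sketch; the wrapper name and the SYS_TIMER_CREATE symbol are hypothetical, and it assumes the handler and the __syscall implementation shown above (system call id in a0, timeout in a1).

```asm
    .equ SYS_TIMER_CREATE, 0x100    # Hypothetical name for system call 0x100

    .text
    .align 2
    .global __sys_timer_create
__sys_timer_create:                 # Hypothetical user-mode wrapper
    # Input
    #   - a0: The timeout value in clock cycles.
    mv   a1, a0                     # The handler expects the timeout in a1
    li   a0, SYS_TIMER_CREATE       # ... and the system call id in a0
    ecall                           # Trap into the M-mode handler
    ret
```

With such a wrapper, the body of the main loop reduces to loading the timeout into a0, calling __sys_timer_create, and executing wfi.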
This chapter has delved into how RISC-V handles synchronous and asynchronous exceptions as well as the privilege mode instructions available in the ISA. This may be one of the key components in the development of an operating system, because it allows privileged functions to run separately from user code. System calls allow U-mode applications to request services which require M-mode (or S-mode) privilege to execute via the ecall instruction.
Moreover, the asynchronous exception handling provides a good introduction into how RISC-V processors deal with events originating from external peripherals. This will become important when creating programs intended to interact with the system user.
Although the examples in this chapter were restricted to the M-mode and U-mode privilege levels, supervisor mode was briefly discussed. Having three privilege levels is useful when creating hypervisors: guest operating systems can operate in S-mode while the virtualization environment uses M-mode.
The next chapter will leverage many of the capabilities discussed here to interface with more of the external components in a RISC-V system. In particular the UART module will allow users to interact with applications via a serial console.