RISC-V https://www.vociferousvoid.org/main/ en RISC-V Bare Metal Programming - Chapter 5: It's a Trap! https://www.vociferousvoid.org/main/riscv_bare_metal_chapter5 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming - Chapter 5: It&#039;s a Trap!</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>Up to this point, the RISC-V tutorial has focused on a single application running on a single hardware thread. The application's environment consisted of the processor's state and the memory map: the processor state was controlled via assembly instructions, and the memory map was defined at build time by a linker script. However, modern systems are almost always multiprogrammed and execute many applications concurrently by interleaving their instructions in a single instruction stream. Moreover, multiple hardware threads are common, allowing application instructions to execute simultaneously. Such systems require much more care to ensure correctness and proper separation of memory. This idea was touched upon briefly in <a href="https://www.vociferousvoid.org/main/riscv_bare_metal_chapter4">chapter 4</a> while discussing the <b>A</b> extension, whose atomic memory operations were used to define synchronization primitives. In addition to synchronizing memory accesses, software must control the execution environment of every active hardware thread; this chapter explores the mechanisms the ISA provides for this purpose.</p> <div class="outline-2" id="outline-container-orgbf395c9"> <h2 id="orgbf395c9">The ABI</h2> <div class="outline-text-2" id="text-orgbf395c9"> <p>The examples presented thus far can all be logically separated into three layers: the application execution environment (AEE), the application binary interface (ABI), and the application code.
This organisation allows a single application to execute in a single AEE.</p> <p>The AEE is defined by the processor state and by the static memory map described in the linker script. The ABI describes the conventions that programs must follow and manages dynamic memory, which in this case is the stack. The upshot of defining an ABI layer is that application code in the third layer can interact with an abstract view of the machine implementation. This simplifies the development of applications by hiding some of the hardware details. For example, the preamble and postamble code for function calls, which are part of this ABI, were used by the application code to ensure that dynamic memory was managed correctly. These code templates can be provided by developer tools, such as high-level language compilers.</p> <p>As the application environment grows more complex, the ABI can do more to hide the tedious details of the layer beneath it. This is where the operating system (OS) comes into play. The OS is a supervisor program sandwiched between the ABI and the supervisor binary interface (SBI). In this configuration, the SBI hides more of the hardware details from the OS, which in turn provides an even more abstract view of the system to the applications. Defining an SBI also improves the portability of the OS layer.</p> <p>Another advantage of this layered approach is that it makes it easier to enforce separation between applications: one application should not interfere with other applications, or with the supervisor itself. The RISC-V ISA defines three privilege modes to enforce this. From most to least privileged, they are machine, supervisor, and user.</p> <p>ISA implementations may provide one, two, or three of the defined privilege modes. All hardware implementations must provide machine mode (M-mode).
A simple embedded system may provide only machine mode, a secure embedded system may provide machine and user modes, and a system running a Unix-like multiprogramming operating system should provide all three modes.</p> </div> </div> <div class="outline-2" id="outline-container-org4951020"> <h2 id="org4951020">What's my Pay Grade?</h2> <div class="outline-text-2" id="text-org4951020"> <p>The capabilities of the ISA implementation can be queried via the <b>misa</b> control and status register. The function in the following listing determines the register width of the base integer ISA and the number of privilege modes supported by the current hardware thread (hart):</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __system_check <span class="linenr"> 4: </span><span style="color: #87cefa;">__system_check</span>: <span class="linenr"> 5: </span> # Input <span class="linenr"> 6: </span> # None <span class="linenr"> 7: </span> # <span class="linenr"> 8: </span> # Returns <span class="linenr"> 9: </span> # - a0: The number of supported privilege modes (1, 2 or 3). <span class="linenr">10: </span> # - a1: The register width used by the ISA (in bytes).
<span class="coderef-off" id="coderef-load-misa"><span class="linenr">11: </span> <span style="color: #00ffff;">csrr</span> t0, misa # Load misa into t0</span> <span class="coderef-off" id="coderef-load-xlen"><span class="linenr">12: </span> <span style="color: #00ffff;">li</span> a1, 4 # Load minimum register width</span> <span class="linenr">13: </span> <span style="color: #00ffff;">li</span> a0, 1 # M-mode is always supported <span class="linenr">14: </span> # Probe for user-mode <span class="coderef-off" id="coderef-umode-mask"><span class="linenr">15: </span> <span style="color: #00ffff;">lui</span> t1, 0x100 # Set the u-mode mask in t1</span> <span class="linenr">16: </span> <span style="color: #00ffff;">and</span> t1, t0, t1 <span class="coderef-off" id="coderef-check-umode"><span class="linenr">17: </span> <span style="color: #00ffff;">beqz</span> t1, 1f # Determine if the U-mode bit is set in misa</span> <span class="coderef-off" id="coderef-set-umode"><span class="linenr">18: </span> <span style="color: #00ffff;">addi</span> a0, a0, 1 # U-mode is available, increment a0</span> <span class="linenr">19: </span><span style="color: #87cefa;">1</span>: # Probe for supervisor-mode <span class="coderef-off" id="coderef-smode-mask"><span class="linenr">20: </span> <span style="color: #00ffff;">lui</span> t1, 0x40 # Set the S-mode mask in t1</span> <span class="linenr">21: </span> <span style="color: #00ffff;">and</span> t1, t0, t1 <span class="coderef-off" id="coderef-check-smode"><span class="linenr">22: </span> <span style="color: #00ffff;">beqz</span> t1, 2f # Determine if the S-mode bit is set in misa</span> <span class="coderef-off" id="coderef-set-smode"><span class="linenr">23: </span> <span style="color: #00ffff;">addi</span> a0, a0, 1 # S-mode is available, increment a0</span> <span class="linenr">24: </span><span style="color: #87cefa;">2</span>: # Determine register width <span class="coderef-off" id="coderef-have-xlen"><span class="linenr">25: </span> 
<span style="color: #00ffff;">bgez</span> t0, 3f # Determine if a1 holds the register width</span> <span class="coderef-off" id="coderef-scale-xlen"><span class="linenr">26: </span> <span style="color: #00ffff;">slli</span> a1, a1, 1 # Multiply register width by 2</span> <span class="linenr">27: </span> <span style="color: #00ffff;">slli</span> t0, t0, 1 <span class="linenr">28: </span> <span style="color: #00ffff;">j</span> 2b <span class="linenr">29: </span><span style="color: #87cefa;">3</span>: <span class="linenr">30: </span> <span style="color: #00ffff;">ret</span> <span class="linenr">31: </span> </pre></div> <p>This function loads the <b>misa</b> Control and Status Register (CSR) into the temporary register <b>t0</b> on line <a class="coderef" href="#coderef-load-misa" onmouseout="CodeHighlightOff(this, 'coderef-load-misa');" onmouseover="CodeHighlightOn(this, 'coderef-load-misa');">11</a> using the <code>csrr</code> pseudo-instruction. The minimum register width is 4 bytes, so the immediate 4 is loaded into register <b>a1</b> as an initial value (line <a class="coderef" href="#coderef-load-xlen" onmouseout="CodeHighlightOff(this, 'coderef-load-xlen');" onmouseover="CodeHighlightOn(this, 'coderef-load-xlen');">12</a>). Likewise, every hardware implementation must provide machine mode, so the privilege-mode count in <b>a0</b> starts at 1 (line 13). Note that the specification allows <b>misa</b> to read as zero when it is not implemented; this function assumes that it is implemented.</p> <p>The least-significant 26 bits of the <b>misa</b> CSR are flags that indicate the supported extensions, one for each letter of the alphabet: bit 0 for A, bit 1 for B, and so on up to bit 25 for Z. The <b>S</b> and <b>U</b> extensions denote supervisor and user mode respectively. Therefore, user mode is available if bit 20 is set to 1.
A bit mask is created on line <a class="coderef" href="#coderef-umode-mask" onmouseout="CodeHighlightOff(this, 'coderef-umode-mask');" onmouseover="CodeHighlightOn(this, 'coderef-umode-mask');">15</a>.</p> <blockquote><p>Since the mask is wider than the 12-bit signed immediate accepted by RISC-V I-type instructions, the <code>lui</code> instruction is used instead: it places its 20-bit immediate in bits 31 to 12 of the destination register, which amounts to loading the value and shifting it left by 12 bits in a single instruction. Thus the immediate 0x100 yields 0x100000, i.e. bit 20.</p> </blockquote> <p>A bit-wise <b>AND</b> of the bit mask and the <b>misa</b> value determines whether user mode is available. If the result of the <code>and</code> instruction is zero, the branch on line <a class="coderef" href="#coderef-check-umode" onmouseout="CodeHighlightOff(this, 'coderef-check-umode');" onmouseover="CodeHighlightOn(this, 'coderef-check-umode');">17</a> skips the increment because user mode is not supported. Otherwise the value of <b>a0</b> is incremented on line <a class="coderef" href="#coderef-set-umode" onmouseout="CodeHighlightOff(this, 'coderef-set-umode');" onmouseover="CodeHighlightOn(this, 'coderef-set-umode');">18</a>.</p> <p>Similarly, supervisor mode is supported if bit 18 is set to 1. A bit mask is created on line <a class="coderef" href="#coderef-smode-mask" onmouseout="CodeHighlightOff(this, 'coderef-smode-mask');" onmouseover="CodeHighlightOn(this, 'coderef-smode-mask');">20</a> (<code>lui</code> turns 0x40 into 0x40000), then a bit-wise <b>AND</b> is performed with the value of the <b>misa</b> CSR on the next line. If the result is zero, supervisor mode is not supported. Otherwise the number of privilege modes in <b>a0</b> is incremented by 1.</p> <p>The final step of the <code>__system_check</code> function is to determine the width of the hart's registers. The two most-significant bits of the <b>misa</b> CSR form the MXL field, which encodes the register width used by the ISA: 1) 32 bits, 2) 64 bits, 3) 128 bits.
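</p>
<p>The decoding performed by <code>__system_check</code> can be cross-checked on a host machine. The C sketch below is a hypothetical host-side model, not part of this tutorial's build, and its function names are invented for illustration. It assumes that <b>misa</b> is held in a 64-bit variable (a 32-bit value being zero-extended, so its msb reads as 0 just like the leading bit of MXL = 1), and <code>lui_value</code> ignores the RV64 sign extension performed by <code>lui</code>, which does not affect the small masks used here.</p>

```c
#include <stdint.h>

/* Value produced by `lui rd, imm20`: the 20-bit immediate placed in bits
 * 31..12 (the RV64 sign extension of bit 31 is not modelled). */
static uint64_t lui_value(uint32_t imm20) {
    return (uint64_t)(imm20 & 0xfffffu) << 12;
}

/* misa flag for an extension letter: bit 0 is 'A', bit 25 is 'Z'. */
static uint64_t misa_ext_bit(char letter) {
    return 1ull << (letter - 'A');
}

/* Privilege modes: M is always present, plus U and S when flagged. */
static int misa_privilege_modes(uint64_t misa) {
    int modes = 1;
    if (misa & misa_ext_bit('U')) modes++;
    if (misa & misa_ext_bit('S')) modes++;
    return modes;
}

/* Register width in bytes, mirroring the bgez/slli loop: start at 4 and
 * double once per leading 1 bit. MXL is two bits wide, so at most two
 * iterations are needed. */
static int misa_xlen_bytes(uint64_t misa) {
    int bytes = 4;
    for (int i = 0; i < 2 && (misa >> 63) != 0; i++) {
        bytes <<= 1;
        misa <<= 1;
    }
    return bytes;
}
```

<p>For example, 0x800000000014112d is the <b>misa</b> value consistent with the <code>qemu</code> session shown later in this section (<b>t0</b> there holds <b>misa</b> after one left shift, 0x28225a). The model reports 3 privilege modes and 8 bytes for it, matching <b>a0</b> and <b>a1</b>.</p>
<p>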
Determining the width on the hart itself is complicated by the fact that the width of <b>misa</b>, and therefore the position of this field, is not known before the field has been read. To overcome this, the sign of <b>t0</b> is tested: a negative value means that the most-significant bit is 1. The encodings 01, 10, and 11 start with zero, one, and two 1 bits respectively, so doubling the initial 4 bytes once for each leading 1 bit yields 4, 8, or 16 bytes. Each time the value is negative, the number of bytes in <b>a1</b> is multiplied by 2 (by shifting it left by 1 bit on line <a class="coderef" href="#coderef-scale-xlen" onmouseout="CodeHighlightOff(this, 'coderef-scale-xlen');" onmouseover="CodeHighlightOn(this, 'coderef-scale-xlen');">26</a>), the value of <b>t0</b> is shifted left by 1 bit, and the check is performed again. Once <b>t0</b> is found to be non-negative on line <a class="coderef" href="#coderef-have-xlen" onmouseout="CodeHighlightOff(this, 'coderef-have-xlen');" onmouseover="CodeHighlightOn(this, 'coderef-have-xlen');">25</a> (i.e. its msb is 0), the <code>__system_check</code> function has completed its probe. The following listing illustrates the main program which performs the system check:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr">1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr">2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr">3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr">4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr">5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr">6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-call-systemcheck"><span class="linenr">7: </span> <span style="color: #00ffff;">call</span> __system_check</span> <span class="linenr">8: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">9: </span> </pre></div> <p>The <code>__system_check</code> function is called on
line <a class="coderef" href="#coderef-call-systemcheck" onmouseout="CodeHighlightOff(this, 'coderef-call-systemcheck');" onmouseover="CodeHighlightOn(this, 'coderef-call-systemcheck');">7</a>. When this function returns, register <b>a0</b> should hold the number of supported privilege modes, and register <b>a1</b> should hold the number of bytes in a register. If this program is run in <code>qemu</code>, the <code>info registers</code> command can be entered at the QEMU monitor prompt to inspect the hart's state:</p> <div class="org-src-container"> <pre class="src src-sh">make chapter5
qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel chapter5.elf
QEMU 3.1.0 monitor - type <span style="color: #ffa07a;">'help'</span> for more information
(qemu) info registers
 pc       000000008000000c
 mhartid  0000000000000000
 mstatus  0000000000000000
 mip      0000000000000000
 mie      0000000000000000
 mideleg  0000000000000000
 medeleg  0000000000000000
 mtvec    0000000000000000
 mepc     0000000000000000
 mcause   0000000000000000
 zero 0000000000000000 ra   000000008000000c sp   0000000080009000 gp   0000000000000000
 tp   0000000000000000 t0   000000000028225a t1   0000000000040000 t2   0000000000000000
 s0   0000000000000000 s1   0000000000000000 a0   0000000000000003 a1   0000000000000008
...
</pre></div> <p>The value of <b>a0</b> is 3, therefore the <code>qemu</code> VirtIO platform supports all three privilege modes. The value of <b>a1</b> is 8, which means that the register width is 64 bits (8 bytes). This value could be used as a stack offset, allowing the function call preamble and postamble to be defined as part of a library. This library could then be used to define an operating system's ABI.</p> </div> </div> <div class="outline-2" id="outline-container-org59973e7"> <h2 id="org59973e7">It's a Trap!</h2> <div class="outline-text-2" id="text-org59973e7"> <p>Typically, systems should run in the most restricted environment possible in order to limit the damage caused by a system fault.
However, when an exceptional event occurs, the system may need to raise its privilege level in order to deal with it. The transfer of control to a handler in response to such an event is called a trap; the event is either a synchronous exception caused by the instruction being executed, or an asynchronous interrupt raised by a source such as a timer.</p> <p>The machine-mode status CSR, <b>mstatus</b>, allows some control over a hart's operating state, including enabling or disabling interrupts globally. Three fields in the register's least-significant four bits serve this purpose, one for each privilege mode.</p> <blockquote><p>Specific interrupt types for each privilege mode must also be enabled individually via the <b>mie</b> register, which will be described later.</p> </blockquote> <p>The fields defined in the <b>mstatus</b> register are illustrated in the following table:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><caption class="t-above"><span class="table-number">Table 1:</span> mstatus CSR fields (RV64; bit number = row offset + column)</caption> <colgroup><col class="org-right" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-right" scope="col">Offset</th> <th class="org-left" scope="col">0</th> <th class="org-left" scope="col">1</th> <th class="org-left" scope="col">2</th> <th class="org-left" scope="col">3</th> <th class="org-left" scope="col">4</th> <th class="org-left" scope="col">5</th> <th class="org-left" scope="col">6</th> <th class="org-left" scope="col">7</th> </tr></thead><tbody><tr><td class="org-right">0</td> <td class="org-left">UIE</td> <td class="org-left">SIE</td> <td class="org-left"> </td> <td class="org-left">MIE</td> <td class="org-left">UPIE</td> <td class="org-left">SPIE</td> <td class="org-left"> </td> <td class="org-left">MPIE</td> </tr><tr><td class="org-right">+8</td> <td class="org-left">SPP</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left">MPP[0]</td> <td
class="org-left">MPP[1]</td> <td class="org-left">FS[0]</td> <td class="org-left">FS[1]</td> <td class="org-left">XS[0]</td> </tr><tr><td class="org-right">+16</td> <td class="org-left">XS[1]</td> <td class="org-left">MPRV</td> <td class="org-left">SUM</td> <td class="org-left">MXR</td> <td class="org-left">TVM</td> <td class="org-left">TW</td> <td class="org-left">TSR</td> <td class="org-left"> </td> </tr><tr><td class="org-right">+24</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+32</td> <td class="org-left">UXL[0]</td> <td class="org-left">UXL[1]</td> <td class="org-left">SXL[0]</td> <td class="org-left">SXL[1]</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+40</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+48</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+56</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left">SD</td> </tr></tbody></table><p>The <b>UIE</b>, <b>SIE</b>, and <b>MIE</b> fields will enable interrupts globally for the user, supervisor, and machine modes respectively. 
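</p>
<p>Part of Table 1 can be transcribed into bit masks, and the global-enable rule for the <b>UIE</b>, <b>SIE</b>, and <b>MIE</b> fields can be written as a predicate. The C sketch below is a hypothetical host-side model, not code from this tutorial. It assumes the privilege-mode encodings of the RISC-V privileged specification (U = 0, S = 1, M = 3), which this chapter does not list; with them, the \(x\)IE bit of mode \(x\) sits at bit position \(x\).</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Privilege-mode encodings (assumed from the privileged specification). */
enum priv_mode { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* A few mstatus fields from Table 1 (RV64 layout). */
#define MSTATUS_UIE       (1ull << 0)
#define MSTATUS_SIE       (1ull << 1)
#define MSTATUS_MIE       (1ull << 3)
#define MSTATUS_MPIE      (1ull << 7)
#define MSTATUS_SPP       (1ull << 8)
#define MSTATUS_MPP_SHIFT 11
#define MSTATUS_MPP       (3ull << MSTATUS_MPP_SHIFT)

/* The xIE bit of mode x is bit x (UIE = 0, SIE = 1, MIE = 3). */
static bool xie_set(uint64_t mstatus, enum priv_mode x) {
    return ((mstatus >> x) & 1u) != 0;
}

/* Is an interrupt destined for mode `target` globally enabled while the
 * hart runs in mode `current`? More-privileged targets are always enabled,
 * less-privileged targets are always disabled, and same-mode targets follow
 * the corresponding xIE bit. */
static bool interrupt_globally_enabled(uint64_t mstatus,
                                       enum priv_mode current,
                                       enum priv_mode target) {
    if (target > current) return true;
    if (target < current) return false;
    return xie_set(mstatus, target);
}

/* Two-bit MPP field: the privilege mode held before the last M-mode trap. */
static unsigned mstatus_mpp(uint64_t mstatus) {
    return (unsigned)((mstatus & MSTATUS_MPP) >> MSTATUS_MPP_SHIFT);
}
```

<p><code>MSTATUS_MIE</code> evaluates to 0x8, i.e. <code>0b1&lt;&lt;3</code>, the mask that the timer start-up code later in this chapter sets in <b>mstatus</b> with <code>csrrs</code>.</p>
<p>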
When a hart is executing in privilege mode \(x\), interrupts for mode \(x\) are globally enabled if the \(x\)IE field is set to 1 and globally disabled if it is 0. Interrupts for any less-privileged mode \(w &lt; x\) are always globally disabled, regardless of the state of the associated \(w\)IE bit in <b>mstatus</b>, whereas interrupts for any more-privileged mode \(y &gt; x\) are always globally enabled, regardless of \(y\)IE.</p> <p>To be of any use, there must be a mechanism to handle interrupts. The <b>BASE</b> field of the <b>mtvec</b> CSR holds the base address of the trap vector, which must be aligned to at least 4 bytes. The <b>MODE</b> field, which occupies the two least-significant bits of <b>mtvec</b>, specifies the trap mode. When <b>MODE</b> is 0 (direct), every trap jumps to the <b>BASE</b> address. When <b>MODE</b> is 1 (vectored), synchronous exceptions still jump to <b>BASE</b>, while asynchronous interrupts jump to <b>BASE</b> + 4 × cause, so the vector holds one entry per interrupt type, indexed by the interrupt code. <b>MODE</b> values of 2 and 3 are reserved. For the time being, a single trap handler in direct mode will be used to dispatch to the appropriate handler for interrupts or exceptions.</p> <p>The code that follows illustrates a skeleton for a trap handler implementation in direct mode:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> trap_handler <span class="linenr"> 4: </span><span style="color: #87cefa;">trap_handler</span>: <span class="linenr"> 5: </span> # Trap handler preamble (save registers).
<span class="coderef-off" id="coderef-trap-load scratch"><span class="linenr"> 6: </span> <span style="color: #00ffff;">csrrw</span> a0, mscratch, a0 # Swap a0 and mscratch: a0 = save area</span> <span class="linenr"> 7: </span> <span style="color: #00ffff;">sd</span> a1, 0(a0) <span class="linenr"> 8: </span> <span style="color: #00ffff;">sd</span> a2, 8(a0) <span class="linenr"> 9: </span> <span style="color: #00ffff;">sd</span> a3, 16(a0) <span class="linenr">10: </span> <span style="color: #00ffff;">sd</span> a4, 24(a0) <span class="linenr">11: </span> # Decode the cause of the trap. <span class="coderef-off" id="coderef-trap-get cause"><span class="linenr">12: </span> <span style="color: #00ffff;">csrr</span> a1, mcause</span> <span class="coderef-off" id="coderef-trap-check exception"><span class="linenr">13: </span> <span style="color: #00ffff;">bgez</span> a1, exception # msb clear: synchronous exception</span> <span class="linenr">14: </span><span style="color: #87cefa;">interrupt</span>: <span class="linenr">15: </span> <span style="color: #00ffff;">andi</span> a1, a1, 0x3f # Isolate the cause code (clears the msb) <span class="linenr">16: </span> # TODO: Dispatch to specific interrupt handler <span class="linenr">17: </span> <span style="color: #00ffff;">j</span> trap_handler_restore_state <span class="linenr">18: </span><span style="color: #87cefa;">exception</span>: <span class="linenr">19: </span> <span style="color: #00ffff;">andi</span> a1, a1, 0x3f # Isolate the cause code <span class="linenr">20: </span> # TODO: Dispatch to specific exception handler <span class="linenr">21: </span><span style="color: #87cefa;">trap_handler_restore_state</span>: <span class="linenr">22: </span> <span style="color: #00ffff;">ld</span> a4, 24(a0) <span class="linenr">23: </span> <span style="color: #00ffff;">ld</span> a3, 16(a0) <span class="linenr">24: </span> <span style="color: #00ffff;">ld</span> a2, 8(a0) <span class="linenr">25: </span> <span style="color: #00ffff;">ld</span> a1, 0(a0) <span class="linenr">26: </span> <span style="color:
#00ffff;">csrrw</span> a0, mscratch, a0 <span class="linenr">27: </span> <span style="color: #00ffff;">mret</span> </pre></div> <p>The first instruction, on line <a class="coderef" href="#coderef-trap-load scratch" onmouseout="CodeHighlightOff(this, 'coderef-trap-load scratch');" onmouseover="CodeHighlightOn(this, 'coderef-trap-load scratch');">6</a>, atomically swaps the values of the <b>mscratch</b> CSR and <b>a0</b>. The <b>mscratch</b> register is reserved for use by machine-mode trap handlers; typically it holds the address of a buffer where register contents can be saved while the handler is active. After the swap, <b>a0</b> holds that address and <b>mscratch</b> holds the interrupted code's <b>a0</b>, which the matching swap on line 26 restores.</p> <p>The next four lines save the contents of registers <b>a1</b>-<b>a4</b> in the buffer now addressed by <b>a0</b>, then the value of the <b>mcause</b> CSR is copied into <b>a1</b> on line <a class="coderef" href="#coderef-trap-get cause" onmouseout="CodeHighlightOff(this, 'coderef-trap-get cause');" onmouseover="CodeHighlightOn(this, 'coderef-trap-get cause');">12</a>. The <b>mcause</b> register indicates the cause of both synchronous exceptions and asynchronous interrupts. If its most-significant bit is zero, the trap was caused by a synchronous exception. This is determined by testing whether <b>a1</b> is greater than or equal to zero (line <a class="coderef" href="#coderef-trap-check exception" onmouseout="CodeHighlightOff(this, 'coderef-trap-check exception');" onmouseover="CodeHighlightOn(this, 'coderef-trap-check exception');">13</a>); if the msb is 1, the register value is negative and execution falls through to the interrupt code, where the <code>andi</code> on line 15 clears the msb and keeps the low six bits of the cause code.</p> <p>To use the trap handler, its address must be set in the <b>BASE</b> field of the <b>mtvec</b> CSR.
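</p>
<p>How the hart uses the two <b>mtvec</b> fields, and how the handler above splits <b>mcause</b>, can be modelled on the host. The C sketch below is hypothetical and not part of this tutorial's build. It assumes XLEN = 64 and uses standard encodings that this chapter does not list: the interrupt flag in the msb of <b>mcause</b>, code 7 for a machine timer interrupt, and code 11 for an environment call from M-mode. Reserved <b>MODE</b> values (2 and 3) are simply treated as direct mode.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCAUSE_INTERRUPT (1ull << 63) /* msb: 1 = interrupt, 0 = exception */

/* The test performed by `bgez a1, exception`. */
static bool mcause_is_interrupt(uint64_t mcause) {
    return (mcause & MCAUSE_INTERRUPT) != 0;
}

/* The value kept by `andi a1, a1, 0x3f`: the low six bits of the code. */
static uint64_t mcause_code(uint64_t mcause) {
    return mcause & 0x3f;
}

/* Compose mtvec: BASE in the upper bits, MODE in bits 1..0. */
static uint64_t mtvec_value(uint64_t handler, unsigned mode) {
    assert((handler & 3u) == 0 && "BASE must be 4-byte aligned");
    return (handler & ~3ull) | (mode & 3u);
}

/* Address the hart jumps to on a trap. Direct mode (0) always uses BASE;
 * vectored mode (1) sends interrupts to BASE + 4 * cause and exceptions
 * to BASE. */
static uint64_t trap_target(uint64_t mtvec, uint64_t mcause) {
    uint64_t base = mtvec & ~3ull;
    unsigned mode = (unsigned)(mtvec & 3u);
    if (mode == 1 && mcause_is_interrupt(mcause))
        return base + 4 * (mcause & ~MCAUSE_INTERRUPT);
    return base;
}
```

<p>For instance, with a handler at a hypothetical address 0x80000100, a machine timer interrupt (msb set, code 7) lands at 0x80000100 in direct mode, but would land at 0x8000011c (BASE + 4 × 7) in vectored mode.</p>
<p>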
The following program installs the trap handler and its scratch area:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-load scratch"><span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> a0, scratch</span> <span class="coderef-off" id="coderef-set scratch"><span class="linenr"> 8: </span> <span style="color: #00ffff;">csrrw</span> a0, mscratch, a0</span> <span class="coderef-off" id="coderef-load trap handler"><span class="linenr"> 9: </span> <span style="color: #00ffff;">la</span> a0, trap_handler</span> <span class="coderef-off" id="coderef-set trap handler"><span class="linenr">10: </span> <span style="color: #00ffff;">csrrw</span> a0, mtvec, a0</span> <span class="linenr">11: </span> <span style="color: #00ffff;">call</span> __system_check <span class="linenr">12: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">13: </span> <span style="color: #00ffff;">.bss</span> <span class="coderef-off" id="coderef-define scratch"><span class="linenr">14: </span><span style="color: #87cefa;">scratch</span>: .dword 0, 0, 0, 0</span> </pre></div> <p>The scratch area where the register state can be saved is defined on line <a class="coderef" href="#coderef-define scratch" onmouseout="CodeHighlightOff(this, 'coderef-define scratch');" onmouseover="CodeHighlightOn(this, 'coderef-define scratch');">14</a>.
This allocates 32 bytes of space to save the contents of up to four registers, which is enough for the current <code>trap_handler</code> implementation. The address of the scratch area is loaded into register <b>a0</b> on line <a class="coderef" href="#coderef-load scratch" onmouseout="CodeHighlightOff(this, 'coderef-load scratch');" onmouseover="CodeHighlightOn(this, 'coderef-load scratch');">7</a>, and this value is then swapped into the <b>mscratch</b> CSR on line <a class="coderef" href="#coderef-set scratch" onmouseout="CodeHighlightOff(this, 'coderef-set scratch');" onmouseover="CodeHighlightOn(this, 'coderef-set scratch');">8</a>.</p> <p>The address of the trap handler function is loaded into register <b>a0</b> on line <a class="coderef" href="#coderef-load trap handler" onmouseout="CodeHighlightOff(this, 'coderef-load trap handler');" onmouseover="CodeHighlightOn(this, 'coderef-load trap handler');">9</a>, then it is swapped with the contents of the <b>mtvec</b> CSR on line <a class="coderef" href="#coderef-set trap handler" onmouseout="CodeHighlightOff(this, 'coderef-set trap handler');" onmouseover="CodeHighlightOn(this, 'coderef-set trap handler');">10</a>. Because <code>trap_handler</code> is aligned to 4 bytes by <code>.align 2</code>, the two least-significant bits of its address are zero, so the <b>MODE</b> field is 0 and the trap mode is direct: all traps, whether synchronous exceptions or asynchronous interrupts, branch to the base address.</p> </div> </div> <div class="outline-2" id="outline-container-orga01113b"> <h2 id="orga01113b">What's the Time?</h2> <div class="outline-text-2" id="text-orga01113b"> <p>A platform's real-time counter is one of the possible sources of asynchronous interrupts. The timer is typically external to the processing core. The VirtIO machine of the QEMU emulator includes a core-local interruptor (CLINT) module. This module defines the <b>mtime</b> register, which exposes the current value of the real-time counter. This value is the number of ticks of the real-time clock that have elapsed since the processor was reset.
This does not represent the real time, but a count of real-time intervals (determined by the oscillator frequency).</p> <p>The <b>mtime</b> register is mapped to a particular address in physical memory. The actual address is specified in the memory map of the VirtIO machine in the QEMU system (see <a href="https://git.qemu.org/?p=qemu.git;a=blob_plain;f=hw/riscv/virt.c;hb=refs/heads/stable-3.1">hw/riscv/virt.c</a>). The listing that follows illustrates a function to retrieve the current real-time counter value:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-clint base"><span class="linenr"> 1: </span> <span style="color: #00ffff;">.equ</span> CLINT_BASE, 0x2000000 # The base address of the CLINT module</span> <span class="coderef-off" id="coderef-mtime offset"><span class="linenr"> 2: </span> <span style="color: #00ffff;">.equ</span> CLINT_MTIME, 0xbff8 # The offset of the MTIME register</span> <span class="coderef-off" id="coderef-ld-mtime macro"><span class="linenr"> 3: </span> <span style="color: #00ffff;">.macro</span> ld_mtime rd # Macro to access the MTIME memory mapped register</span> <span class="linenr"> 4: </span> <span style="color: #00ffff;">li</span> t0, CLINT_BASE <span class="linenr"> 5: </span> <span style="color: #00ffff;">li</span> t1, CLINT_MTIME <span class="coderef-off" id="coderef-mtime addr"><span class="linenr"> 6: </span> <span style="color: #00ffff;">add</span> t0, t0, t1 # Determine the absolute address of MTIME</span> <span class="coderef-off" id="coderef-read counter"><span class="linenr"> 7: </span> <span style="color: #00ffff;">ld</span> \rd, 0(t0) # Read the counter value</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">.endm</span> <span class="linenr"> 9: </span> <span class="linenr">10: </span> <span style="color: #00ffff;">.text</span> <span class="linenr">11: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr">12: </span> <span 
style="color: #00ffff;">.global</span> __clock_cycle <span class="linenr">13: </span><span style="color: #87cefa;">__clock_cycle</span>: <span class="linenr">14: </span> # Retrieve the current value of the real-time counter. <span class="linenr">15: </span> # <span class="linenr">16: </span> # Inputs: None <span class="linenr">17: </span> # <span class="linenr">18: </span> # Returns: <span class="linenr">19: </span> # - a0: The current value of the mtime register. <span class="linenr">20: </span> <span style="color: #00ffff;">ld_mtime</span> a0 # Expand the macro (clobbers t0 and t1) <span class="linenr">21: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>The CLINT_BASE symbol is defined on line <a class="coderef" href="#coderef-clint base" onmouseout="CodeHighlightOff(this, 'coderef-clint base');" onmouseover="CodeHighlightOn(this, 'coderef-clint base');">1</a>; its value is the absolute base address of the CLINT module's memory-mapped registers in the VirtIO machine's memory map. The CLINT_MTIME symbol, defined on line <a class="coderef" href="#coderef-mtime offset" onmouseout="CodeHighlightOff(this, 'coderef-mtime offset');" onmouseover="CodeHighlightOn(this, 'coderef-mtime offset');">2</a>, specifies the offset of the <b>mtime</b> register relative to the CLINT's base address (see <a href="https://git.qemu.org/?p=qemu.git;a=blob_plain;f=include/hw/riscv/sifive_clint.h;hb=refs/heads/stable-3.1">include/hw/riscv/sifive_clint.h</a> of the QEMU source). The offset is added to the base address and the result is stored in register <b>t0</b> on line <a class="coderef" href="#coderef-mtime addr" onmouseout="CodeHighlightOff(this, 'coderef-mtime addr');" onmouseover="CodeHighlightOn(this, 'coderef-mtime addr');">6</a>. Finally, the value of the real-time counter is read on line <a class="coderef" href="#coderef-read counter" onmouseout="CodeHighlightOff(this, 'coderef-read counter');" onmouseover="CodeHighlightOn(this, 'coderef-read counter');">7</a>.
All of these operations are defined in a macro starting on line <a class="coderef" href="#coderef-ld-mtime macro" onmouseout="CodeHighlightOff(this, 'coderef-ld-mtime macro');" onmouseover="CodeHighlightOn(this, 'coderef-ld-mtime macro');">3</a>.</p> <p>The <b>mtime</b> register is useful to determine the time elapsed since the board was reset. However, it cannot generate interrupts by itself. Instead, the memory-mapped <b>mtimecmp</b> register causes a timer interrupt to be posted when its value is less than or equal to the value contained in <b>mtime</b>. In other words, once the steadily increasing value of <b>mtime</b> reaches or exceeds the value in <b>mtimecmp</b>, a timer interrupt is posted (provided that timer interrupts are enabled). Therefore, to receive an interrupt after a fixed interval, the current value of <b>mtime</b> must be read, the timeout added to it, and the result written to the <b>mtimecmp</b> register. This process is illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-mtimecmp_offset"><span class="linenr"> 1: </span> <span style="color: #00ffff;">.equ</span> CLINT_MTIMECMP, 0x4000 # The offset of hart 0's MTIMECMP register</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.macro</span> st_mtimecmp rs <span class="linenr"> 3: </span> <span style="color: #00ffff;">li</span> t0, CLINT_BASE <span class="linenr"> 4: </span> <span style="color: #00ffff;">li</span> t1, CLINT_MTIMECMP <span class="linenr"> 5: </span> <span style="color: #00ffff;">add</span> t1, t0, t1 <span class="linenr"> 6: </span> <span style="color: #00ffff;">sd</span> \rs, 0(t1) <span class="linenr"> 7: </span> <span style="color: #00ffff;">.endm</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">.global</span> __timer_create <span class="linenr"> 9: </span><span style="color: #87cefa;">__timer_create</span>: <span class="linenr">10: </span> #
Set a timer to trigger an interrupt when a given number of <span class="linenr">11: </span> # timer ticks have elapsed. <span class="linenr">12: </span> # <span class="linenr">13: </span> # Inputs: <span class="linenr">14: </span> # - a0: The timeout value in timer ticks. <span class="linenr">15: </span> <span style="color: #00ffff;">ld_mtime</span> t1 <span class="linenr">16: </span> <span style="color: #00ffff;">ld</span> t1, 0(t1) <span class="coderef-off" id="coderef-set_timeout"><span class="linenr">17: </span> <span style="color: #00ffff;">add</span> a0, a0, t1 # Add the timeout to the current mtime value</span> <span class="linenr">18: </span> <span style="color: #00ffff;">st_mtimecmp</span> a0 <span class="linenr">19: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>The <code>__timer_create</code> function sets a timer that triggers an interrupt once the given number of ticks of the real-time counter have elapsed. It uses the <code>ld_mtime</code> macro defined previously, together with the load on line 16, to read the current value of <b>mtime</b>, adds the requested number of ticks to it, and writes the result to the <b>mtimecmp</b> register. When <b>mtime</b>'s value becomes greater than or equal to the value in <b>mtimecmp</b>, the MTIP bit of the <b>mip</b> CSR is asserted to indicate that a timer interrupt is pending and, provided timer interrupts are enabled, the trap handler is called. 
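</p> <p>The arithmetic performed by <code>__timer_create</code>, and the condition under which the interrupt becomes pending, can be sketched as a small host-side C model. It does not run on the board and its function names are hypothetical; the register offsets assume a SiFive-compatible CLINT such as the one emulated by QEMU's <code>virt</code> machine, where the <b>mtimecmp</b> register of hart <i>n</i> sits at offset 0x4000 + 8<i>n</i> and <b>mtime</b> at offset 0xBFF8.</p> <div class="org-src-container"> <pre class="src src-c">
typedef unsigned long long u64;

/* Assumed SiFive-compatible CLINT layout (as emulated by QEMU virt). */
static const u64 CLINT_MTIMECMP_OFFSET = 0x4000; /* one 64-bit mtimecmp per hart */
static const u64 CLINT_MTIME_OFFSET    = 0xBFF8; /* single shared mtime */

/* Address of the mtimecmp register belonging to a given hart. */
u64 mtimecmp_addr(u64 clint_base, u64 hartid)
{
    return clint_base + CLINT_MTIMECMP_OFFSET + 8 * hartid;
}

/* Address of the mtime register. */
u64 mtime_addr(u64 clint_base)
{
    return clint_base + CLINT_MTIME_OFFSET;
}

/* Deadline that __timer_create writes to mtimecmp: current mtime + timeout. */
u64 timer_deadline(u64 mtime_now, u64 timeout)
{
    return mtime_now + timeout;
}

/* mip.MTIP is asserted while mtime >= mtimecmp; it stays asserted until
   mtimecmp is rewritten with a later deadline. */
int mtip_pending(u64 mtime, u64 mtimecmp)
{
    return mtime >= mtimecmp;
}
</pre></div> <p>For hart 0 the first helper returns CLINT_BASE + 0x4000, the same address that the <code>st_mtimecmp</code> macro computes.</p> <p>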
The following code will set up this process:</p> <div class="org-src-container"> <pre class="src src-asm" id="org0bbc010"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span> <span style="color: #00ffff;">.global</span> CLOCK_MONOTONIC <span class="linenr"> 6: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-load-mtrap"><span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> t0, __mtrap_handler # Load trap vector address</span> <span class="linenr"> 9: </span> <span style="color: #00ffff;">csrrw</span> zero, mtvec, t0 <span class="linenr">10: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;3 <span class="coderef-off" id="coderef-set-mstatus.MIE"><span class="linenr">11: </span> <span style="color: #00ffff;">csrrs</span> t0, mstatus, t0 # Enable interrupts globally</span> <span class="linenr">12: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;7 <span class="coderef-off" id="coderef-set-mie.MTIE"><span class="linenr">13: </span> <span style="color: #00ffff;">csrrs</span> t0, mie, t0 # Enable timer interrupts</span> <span class="coderef-off" id="coderef-load-timeout"><span class="linenr">14: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">li</span> a0, 0x10000 # Set the timeout value</span> <span class="coderef-off" id="coderef-trap-settimer"><span class="linenr">15: </span> <span style="color: #00ffff;">call</span> __timer_create # Set the timer</span> <span class="coderef-off" id="coderef-wfi"><span class="linenr">16: </span> <span style="color: #00ffff;">wfi</span> # Wait for 
interrupts</span> <span class="linenr">17: </span> <span style="color: #00ffff;">j</span> 1b <span class="linenr">18: </span> <span class="linenr">19: </span> <span style="color: #00ffff;">.align</span> 2 <span class="coderef-off" id="coderef-mtrap-handler"><span class="linenr">20: </span><span style="color: #87cefa;">__mtrap_handler</span>: # Machine interrupt handler</span> <span class="coderef-off" id="coderef-trap-get-cause"><span class="linenr">21: </span> <span style="color: #00ffff;">csrrc</span> t0, mcause, zero # Get the cause of the interrupt</span> <span class="coderef-off" id="coderef-trap-exception"><span class="linenr">22: </span> <span style="color: #00ffff;">bgez</span> t0, 2f # Exit on an exception</span> <span class="linenr">23: </span> <span style="color: #00ffff;">slli</span> t0, t0, 1 <span class="linenr">24: </span> <span style="color: #00ffff;">srli</span> t0, t0, 1 <span class="linenr">25: </span> <span style="color: #00ffff;">li</span> t1, 7 # The timer interrupt has code 7. <span class="coderef-off" id="coderef-trap-check-mtip"><span class="linenr">26: </span> <span style="color: #00ffff;">bne</span> t0, t1, 2f # Check for timer interrupts</span> <span class="linenr">27: </span> <span style="color: #00ffff;">addi</span> s0, s0, 1 # Increment the interrupt count. <span class="coderef-off" id="coderef-trap-mret"><span class="linenr">28: </span><span style="color: #87cefa;">2</span>: <span style="color: #00ffff;">mret</span> # Machine trap return</span> </pre></div> <p>After setting up the stack, this program loads the address for the trap handler on line <a class="coderef" href="#coderef-load-mtrap" onmouseout="CodeHighlightOff(this, 'coderef-load-mtrap');" onmouseover="CodeHighlightOn(this, 'coderef-load-mtrap');">8</a>, and stores this address in the <b>mtvec</b> CSR. 
Machine-mode interrupts are then enabled globally by setting bit 3 (<b>MIE</b>) of the <b>mstatus</b> CSR to 1 on line <a class="coderef" href="#coderef-set-mstatus.MIE" onmouseout="CodeHighlightOff(this, 'coderef-set-mstatus.MIE');" onmouseover="CodeHighlightOn(this, 'coderef-set-mstatus.MIE');">11</a>, and M-mode timer interrupts are enabled by setting bit 7 (<b>MTIE</b>) of the <b>mie</b> CSR to 1 on line <a class="coderef" href="#coderef-set-mie.MTIE" onmouseout="CodeHighlightOff(this, 'coderef-set-mie.MTIE');" onmouseover="CodeHighlightOn(this, 'coderef-set-mie.MTIE');">13</a>. On line <a class="coderef" href="#coderef-load-timeout" onmouseout="CodeHighlightOff(this, 'coderef-load-timeout');" onmouseover="CodeHighlightOn(this, 'coderef-load-timeout');">14</a>, an immediate value is loaded into register <b>a0</b>; this value is used as the timeout when the timer is created on line <a class="coderef" href="#coderef-trap-settimer" onmouseout="CodeHighlightOff(this, 'coderef-trap-settimer');" onmouseover="CodeHighlightOn(this, 'coderef-trap-settimer');">15</a>. Once the timer is armed, the <code>wfi</code> instruction on line <a class="coderef" href="#coderef-wfi" onmouseout="CodeHighlightOff(this, 'coderef-wfi');" onmouseover="CodeHighlightOn(this, 'coderef-wfi');">16</a> waits for an interrupt, at which point control jumps to the trap handler. When the handler returns, execution resumes after the <code>wfi</code>, and the jump on line 17 takes control back to line <a class="coderef" href="#coderef-load-timeout" onmouseout="CodeHighlightOff(this, 'coderef-load-timeout');" onmouseover="CodeHighlightOn(this, 'coderef-load-timeout');">14</a>, where the timer is armed again. This should cause periodic calls to the trap handler.</p> <p>A machine-level trap handler is defined on line <a class="coderef" href="#coderef-mtrap-handler" onmouseout="CodeHighlightOff(this, 'coderef-mtrap-handler');" onmouseover="CodeHighlightOn(this, 'coderef-mtrap-handler');">20</a>. 
This handler increments the value in register <b>s0</b> every time a timer interrupt is taken. The cause of the trap is read on line <a class="coderef" href="#coderef-trap-get-cause" onmouseout="CodeHighlightOff(this, 'coderef-trap-get-cause');" onmouseover="CodeHighlightOn(this, 'coderef-trap-get-cause');">21</a>: because its source register is <b>zero</b>, the <code>csrrc</code> instruction reads the <b>mcause</b> CSR without modifying it (it behaves exactly like <code>csrr</code>). If the value is greater than or equal to zero (line <a class="coderef" href="#coderef-trap-exception" onmouseout="CodeHighlightOff(this, 'coderef-trap-exception');" onmouseover="CodeHighlightOn(this, 'coderef-trap-exception');">22</a>), its most-significant bit is 0, which means the trap was caused by a synchronous exception rather than an interrupt (interrupts set that bit to 1). In that case, the handler simply exits by executing the <code>mret</code> instruction, which returns control to the instruction that raised the exception. Otherwise, the next two lines clear the most-significant bit by shifting it out to the left and shifting the remaining bits back into place. The remaining interrupt code is compared with 7 on line <a class="coderef" href="#coderef-trap-check-mtip" onmouseout="CodeHighlightOff(this, 'coderef-trap-check-mtip');" onmouseover="CodeHighlightOn(this, 'coderef-trap-check-mtip');">26</a>; if it is the machine timer interrupt, the value of <b>s0</b> is incremented by 1.</p> <p>The state of the hart can be inspected via the <code>info registers</code> command in the <code>qemu</code> monitor. 
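</p> <p>The decoding performed on lines 21 to 26 can be checked on a host machine with the C sketch below. The type and function names are hypothetical; only the layout of <b>mcause</b> on RV64 (interrupt flag in bit 63, code in the remaining bits) and the codes 7 (machine timer interrupt) and 8 (environment call from U-mode) are taken from the RISC-V privileged specification.</p> <div class="org-src-container"> <pre class="src src-c">
typedef unsigned long long u64;

/* Decoded view of an RV64 mcause value. */
struct trap_cause {
    int interrupt;  /* 1 = asynchronous interrupt, 0 = synchronous exception */
    u64 code;       /* exception or interrupt code with the MSB removed */
};

struct trap_cause decode_mcause(u64 mcause)
{
    struct trap_cause c;
    /* bgez tests this bit: 0 for exceptions, 1 for interrupts. */
    c.interrupt = (int)(mcause >> 63);
    /* slli by 1 followed by srli by 1 clears the most-significant bit. */
    c.code = (mcause << 1) >> 1;
    return c;
}

/* Codes used in this chapter (RISC-V privileged specification). */
static const u64 CAUSE_M_TIMER_INTERRUPT = 7;
static const u64 CAUSE_ECALL_FROM_U      = 8;

/* Mirrors the test on lines 22 to 26: is this a machine timer interrupt? */
int is_machine_timer(u64 mcause)
{
    struct trap_cause c = decode_mcause(mcause);
    if (!c.interrupt)
        return 0;
    return c.code == CAUSE_M_TIMER_INTERRUPT;
}

/* Is this an environment call made from U-mode? */
int is_user_ecall(u64 mcause)
{
    struct trap_cause c = decode_mcause(mcause);
    if (c.interrupt)
        return 0;
    return c.code == CAUSE_ECALL_FROM_U;
}
</pre></div> <p>For example, <code>decode_mcause(0x8000000000000007)</code> reports an interrupt with code 7, which matches the <b>mcause</b> value in the dump below.</p> <p>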
If the snapshot happens to be taken while the trap handler is executing, the machine CSRs will have the following state:</p> <div class="org-src-container"> <pre class="src src-sh"> (qemu) info registers pc 000000008000004c mhartid 0000000000000000 mstatus 0000000000001880 mip 0000000000000080 mie 0000000000000080 mideleg 0000000000000000 medeleg 0000000000000000 mtvec 0000000080000048 mepc 0000000080000040 mcause 8000000000000007 </pre></div> <p>As usual, the <b>pc</b> register shows the address of the current instruction; in this case, control has jumped into the trap handler. When a trap is taken into M-mode, the following operations are executed atomically:</p> <ol class="org-ol"><li>The <b>mstatus.MPIE</b> field (bit 7), which records the previous global interrupt-enable state, is set to the value of <b>mstatus.MIE</b>.</li> <li>Interrupts are disabled globally by setting the <b>mstatus.MIE</b> field (bit 3) to zero.</li> <li>The <b>mstatus.MPP</b> field (bits 12:11) is set to the privilege level that was active before the trap (0b11 here, since the trap was taken from M-mode).</li> <li>The <b>mepc</b> register is set to the address of the instruction that was interrupted or, for a synchronous exception, the address of the instruction that raised it.</li> <li>The cause of the trap is written to register <b>mcause</b>.</li> <li>The privilege mode is set to M-mode and the <b>pc</b> register is set to the address in <b>mtvec</b>.</li> </ol><p>These effects are visible in the dump above: <b>mstatus</b> is 0x1880 (MPP = 0b11, MPIE = 1, MIE = 0), <b>mip</b> is 0x80 (MTIP set), and <b>mcause</b> is 0x8000000000000007, i.e. an interrupt (bit 63 set) with code 7, the machine timer interrupt. When the handler completes its task, it executes the <code>mret</code> instruction, which reverses this process: <b>pc</b> is set to the address in <b>mepc</b>, <b>mstatus.MIE</b> is restored from <b>mstatus.MPIE</b>, and <b>mstatus.MPIE</b> is then set to 1. Note that <code>mret</code> does not clear the <b>mip</b> register: the MTIP bit reflects the timer comparison and remains set until <b>mtimecmp</b> is written with a value greater than <b>mtime</b>, so a handler that does not update <b>mtimecmp</b> will be re-entered as soon as interrupts are re-enabled. Otherwise, control flow continues from where it was interrupted, and the core is ready to handle new interrupts. 
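</p> <p>The trap entry sequence and its reversal by <code>mret</code> can be summarised as a C model of the relevant <b>mstatus</b> bits. This is an illustrative sketch under stated assumptions, not a description of any particular implementation: the structure and function names are hypothetical, <b>mtvec</b> is assumed to be in direct mode, <b>mip</b> and trap delegation are not modelled, and U-mode is assumed to be implemented (so <code>mret</code> leaves MPP set to U, the least-privileged supported mode).</p> <div class="org-src-container"> <pre class="src src-c">
typedef unsigned long long u64;

/* mstatus fields used by M-mode traps. */
static const u64 MSTATUS_MIE  = 1ULL << 3;
static const u64 MSTATUS_MPIE = 1ULL << 7;
static const u64 MSTATUS_MPP  = 3ULL << 11;

struct hart {
    u64 mstatus;
    u64 mepc;
    u64 mcause;
    u64 pc;
    unsigned priv;  /* 0 = U, 1 = S, 3 = M */
};

/* Take a trap into M-mode. epc is the interrupted instruction or, for an
   exception, the instruction that raised it. */
void take_trap(struct hart *h, u64 cause, u64 epc, u64 mtvec)
{
    u64 s = h->mstatus;
    s = (s & ~MSTATUS_MPIE) | ((s & MSTATUS_MIE) ? MSTATUS_MPIE : 0); /* MPIE = MIE */
    s &= ~MSTATUS_MIE;                                               /* MIE = 0 */
    s = (s & ~MSTATUS_MPP) | ((u64)h->priv << 11);                   /* MPP = old mode */
    h->mstatus = s;
    h->priv = 3;
    h->mepc = epc;
    h->mcause = cause;
    h->pc = mtvec & ~3ULL;  /* direct mode: jump to BASE */
}

/* mret: return to MPP, restore MIE from MPIE, set MPIE, jump to mepc. */
void mret(struct hart *h)
{
    u64 s = h->mstatus;
    h->priv = (unsigned)((s & MSTATUS_MPP) >> 11);
    s = (s & ~MSTATUS_MIE) | ((s & MSTATUS_MPIE) ? MSTATUS_MIE : 0); /* MIE = MPIE */
    s |= MSTATUS_MPIE;                                               /* MPIE = 1 */
    s &= ~MSTATUS_MPP;                                               /* MPP = U */
    h->mstatus = s;
    h->pc = h->mepc;
}
</pre></div> <p>Starting from <b>mstatus</b> = 0x8 (only MIE set) with the hart in M-mode, <code>take_trap</code> produces <b>mstatus</b> = 0x1880, the value shown in the dump above, and <code>mret</code> then restores MIE, giving 0x88.</p> <p>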
More importantly, the hart returns to the privilege mode specified in the <b>mstatus.MPP</b> field. This behaviour can be exploited to change the current privilege mode of the processor.</p> </div> </div> <div class="outline-2" id="outline-container-orgd8dddb4"> <h2 id="orgd8dddb4">Enter the User</h2> <div class="outline-text-2" id="text-orgd8dddb4"> <p>When the processor is reset, it starts in machine mode. Programs running at this level have full control over the processor via the control and status registers, which opens the system up to abuse. To limit the damage that a wayward program can cause, it is better to run it in user mode. Getting to user mode is a simple matter of setting up the <b>mstatus</b> register and returning from M-mode via the <code>mret</code> instruction. The following macro sets the privilege mode to the specified level:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> # Set the privilege mode to that specified by the immediate value. <span class="linenr"> 2: </span> <span style="color: #00ffff;">.macro</span> setmode imm <span class="linenr"> 3: </span> <span style="color: #00ffff;">li</span> t0, 0x1800 # Mask for mstatus.MPP (bits 12:11) <span class="coderef-off" id="coderef-clear mstatus.MPP"><span class="linenr"> 4: </span> <span style="color: #00ffff;">csrrc</span> zero, mstatus, t0 # Clear mstatus.MPP</span> <span class="coderef-off" id="coderef-load priv-mode"><span class="linenr"> 5: </span> <span style="color: #00ffff;">li</span> t0, \imm</span> <span class="coderef-off" id="coderef-create priv mask"><span class="linenr"> 6: </span> <span style="color: #00ffff;">slli</span> t0, t0, 11</span> <span class="coderef-off" id="coderef-set mstatus.MPP"><span class="linenr"> 7: </span> <span style="color: #00ffff;">csrrs</span> t0, mstatus, t0 # Set mstatus.MPP</span> <span class="coderef-off" id="coderef-load return address"><span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> t0, 1f</span> <span class="coderef-off" id="coderef-set mepc"><span class="linenr"> 9: </span> <span style="color: #00ffff;">csrrw</span> zero, mepc, t0</span> <span class="linenr">10: </span> <span style="color: #00ffff;">mret</span> <span class="coderef-off" id="coderef-return location"><span class="linenr">11: </span><span style="color: #87cefa;">1</span>:</span> <span class="linenr">12: </span> <span style="color: #00ffff;">.endm</span> </pre></div> <p>This macro sets the privilege mode to the value given as an immediate: it first clears the <b>mstatus.MPP</b> field on line <a class="coderef" href="#coderef-clear mstatus.MPP" onmouseout="CodeHighlightOff(this, 'coderef-clear mstatus.MPP');" onmouseover="CodeHighlightOn(this, 'coderef-clear mstatus.MPP');">4</a>, using the mask loaded on line 3, then replaces it with the encoding of the desired privilege mode. 
The following table lists the privilege modes and their encodings:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><caption class="t-above"><span class="table-number">Table 2:</span> Privilege level encodings</caption> <colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-right" /></colgroup><thead><tr><th class="org-left" scope="col">Level</th> <th class="org-left" scope="col">Name</th> <th class="org-left" scope="col">Encoding</th> <th class="org-right" scope="col">Immediate</th> </tr></thead><tbody><tr><td class="org-left">U</td> <td class="org-left">User/Application</td> <td class="org-left">0b00</td> <td class="org-right">0</td> </tr><tr><td class="org-left">S</td> <td class="org-left">Supervisor</td> <td class="org-left">0b01</td> <td class="org-right">1</td> </tr><tr><td class="org-left">M</td> <td class="org-left">Machine</td> <td class="org-left">0b11</td> <td class="org-right">3</td> </tr></tbody></table><p>The fourth column of this table lists the immediate value that should be passed to the macro to select the associated privilege mode. This value is loaded into register <b>t0</b> on line <a class="coderef" href="#coderef-load priv-mode" onmouseout="CodeHighlightOff(this, 'coderef-load priv-mode');" onmouseover="CodeHighlightOn(this, 'coderef-load priv-mode');">5</a>, then shifted left by 11 bits on the next line so that it lines up with the MPP field. The field is then set by the <code>csrrs</code> instruction on line <a class="coderef" href="#coderef-set mstatus.MPP" onmouseout="CodeHighlightOff(this, 'coderef-set mstatus.MPP');" onmouseover="CodeHighlightOn(this, 'coderef-set mstatus.MPP');">7</a>, which sets <b>mstatus</b> to the bitwise <b>OR</b> of its previous value and the value in <b>t0</b>. 
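</p> <p>The clear-then-set update of <b>mstatus.MPP</b> performed by the <code>setmode</code> macro can be modelled in a few lines of C. The helper names are hypothetical; the second helper shows what an OR-only update (a lone <code>csrrs</code>) would do, since <code>csrrs</code> can set bits but never clear them.</p> <div class="org-src-container"> <pre class="src src-c">
typedef unsigned long long u64;

/* Write a privilege mode (0 = U, 1 = S, 3 = M) into mstatus.MPP (bits 12:11). */
u64 set_mpp(u64 mstatus, unsigned mode)
{
    const u64 mpp_mask = 0x1800;       /* the mask loaded into t0 */
    mstatus &= ~mpp_mask;              /* clear the field (csrrc) */
    mstatus |= ((u64)mode & 3) << 11;  /* shift into place (slli), then set (csrrs) */
    return mstatus;
}

/* An OR-only update: bits already set in MPP cannot be cleared this way. */
u64 or_mpp_only(u64 mstatus, unsigned mode)
{
    return mstatus | (((u64)mode & 3) << 11);
}
</pre></div> <p>After a trap taken from M-mode, <b>mstatus</b> holds 0x1880; <code>set_mpp(0x1880, 0)</code> yields 0x80 (MPP = U), whereas <code>or_mpp_only(0x1880, 0)</code> leaves 0x1880, in which case <code>mret</code> would return to M-mode.</p> <p>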
However, before returning from machine mode, the <b>mepc</b> register must be updated with the address of the instruction to which control will return.</p> <p>The address immediately following the <code>mret</code> instruction (the local label on line <a class="coderef" href="#coderef-return location" onmouseout="CodeHighlightOff(this, 'coderef-return location');" onmouseover="CodeHighlightOn(this, 'coderef-return location');">11</a>) is loaded into register <b>t0</b> on line <a class="coderef" href="#coderef-load return address" onmouseout="CodeHighlightOff(this, 'coderef-load return address');" onmouseover="CodeHighlightOn(this, 'coderef-load return address');">8</a>, then written to <b>mepc</b> on line <a class="coderef" href="#coderef-set mepc" onmouseout="CodeHighlightOff(this, 'coderef-set mepc');" onmouseover="CodeHighlightOn(this, 'coderef-set mepc');">9</a>. When the <code>mret</code> instruction is executed, <b>pc</b> is set to this address. If the main program is updated to invoke this macro with the argument 0 just before the <code>wfi</code>, the program should be in U-mode by the time the timer expires:</p> <div class="org-src-container"> <pre class="src src-asm" id="org6f4fee5"> <span class="linenr">26: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">li</span> a0, 0x10000 # Set the timeout value <span class="linenr">27: </span> <span style="color: #00ffff;">call</span> __timer_create # Set the timer <span class="coderef-off" id="coderef-set U-mode"><span class="linenr">28: </span> <span style="color: #00ffff;">setmode</span> 0</span> <span class="linenr">29: </span> <span style="color: #00ffff;">wfi</span> <span class="linenr">30: </span> <span style="color: #00ffff;">j</span> 1b <span class="linenr">31: </span> </pre></div> <p>Anything that runs after the <code>setmode</code> macro executes in U-mode. The trap handler, however, always executes in machine mode, which makes it useful for performing tasks that require machine-mode privilege. Fortunately, traps are raised by synchronous exceptions as well as by asynchronous interrupts. 
Therefore, the trap handler can be used to implement system calls.</p> <p>The <code>ecall</code> instruction performs an environment call: it raises a synchronous exception whose code in <b>mcause</b> depends on the privilege mode that was active when it was executed (8 for U-mode, 9 for S-mode and 11 for M-mode). The trap handler can then jump to the appropriate function, which executes in machine mode, and return to the original privilege mode when it is complete. The trap handler is updated as follows to handle the system call:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 2: </span><span style="color: #87cefa;">__mtrap_handler</span>: <span class="linenr"> 3: </span> <span style="color: #00ffff;">csrr</span> t0, mcause <span class="coderef-off" id="coderef-Check for exception"><span class="linenr"> 4: </span> <span style="color: #00ffff;">bgez</span> t0, 1f</span> <span class="linenr"> 5: </span> <span style="color: #00ffff;">slli</span> t0, t0, 1 <span class="linenr"> 6: </span> <span style="color: #00ffff;">srli</span> t0, t0, 1 <span class="coderef-off" id="coderef-Timer interrupt code"><span class="linenr"> 7: </span> <span style="color: #00ffff;">li</span> t1, 7</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">bne</span> t0, t1, 2f <span class="coderef-off" id="coderef-Increment interrupt count"><span class="linenr"> 9: </span> <span style="color: #00ffff;">addi</span> s0, s0, 1</span> <span class="linenr">10: </span> <span style="color: #00ffff;">j</span> 2f <span class="linenr">11: </span><span style="color: #87cefa;">1</span>: <span class="coderef-off" id="coderef-U-mode ecall code"><span class="linenr">12: </span> <span style="color: #00ffff;">li</span> t1, 8</span> <span class="linenr">13: </span> <span style="color: #00ffff;">bne</span> t0, t1, 2f <span class="coderef-off" id="coderef-save registers"><span class="linenr">14: </span> <span style="
#00ffff;">push_stack</span></span> <span class="linenr">15: </span> <span style="color: #00ffff;">call</span> __syscall <span class="coderef-off" id="coderef-restore registers"><span class="linenr">16: </span> <span style="color: #00ffff;">pop_stack</span></span> <span class="coderef-off" id="coderef-Load mret address"><span class="linenr">17: </span> <span style="color: #00ffff;">csrr</span> t0, mepc</span> <span class="coderef-off" id="coderef-Set mret address"><span class="linenr">18: </span> <span style="color: #00ffff;">addi</span> t0, t0, 4</span> <span class="linenr">19: </span> <span style="color: #00ffff;">csrrw</span> zero, mepc, t0 <span class="linenr">20: </span><span style="color: #87cefa;">2</span>: <span style="color: #00ffff;">csrrw</span> t0, mcause, zero <span class="coderef-off" id="coderef-Machine trap return"><span class="linenr">21: </span> <span style="color: #00ffff;">mret</span></span> <span class="linenr">22: </span> </pre></div> <p>The updated handler adds a branch for synchronous exceptions, starting at line <a class="coderef" href="#coderef-U-mode ecall code" onmouseout="CodeHighlightOff(this, 'coderef-U-mode ecall code');" onmouseover="CodeHighlightOn(this, 'coderef-U-mode ecall code');">12</a>, where the value of <b>mcause</b> is compared with the exception code for an environment call from U-mode (8) to determine whether a system call was requested. 
If so, the handler saves the registers on the stack via the <code>push_stack</code> macro on line <a class="coderef" href="#coderef-save registers" onmouseout="CodeHighlightOff(this, 'coderef-save registers');" onmouseover="CodeHighlightOn(this, 'coderef-save registers');">14</a>, calls the <code>__syscall</code> function, then restores the registers to their values prior to the call via the <code>pop_stack</code> macro on line <a class="coderef" href="#coderef-restore registers" onmouseout="CodeHighlightOff(this, 'coderef-restore registers');" onmouseover="CodeHighlightOn(this, 'coderef-restore registers');">16</a>.</p> <p>After the system call has been processed, the handler reads the <b>mepc</b> CSR on line <a class="coderef" href="#coderef-Load mret address" onmouseout="CodeHighlightOff(this, 'coderef-Load mret address');" onmouseover="CodeHighlightOn(this, 'coderef-Load mret address');">17</a>; for a synchronous exception it contains the address of the <code>ecall</code> instruction that caused the trap. On line <a class="coderef" href="#coderef-Set mret address" onmouseout="CodeHighlightOff(this, 'coderef-Set mret address');" onmouseover="CodeHighlightOn(this, 'coderef-Set mret address');">18</a>, this address is incremented by 4 (the size of the <code>ecall</code> instruction) so that it points to the instruction that follows, and the updated address is stored back in <b>mepc</b> before <code>mret</code> is executed. Without this adjustment, <code>mret</code> would return to the <code>ecall</code> itself and the system call would be repeated endlessly. The <code>mret</code> instruction therefore sets the program counter to the instruction immediately following the system call and restores the privilege mode to U-mode.</p> <p>System calls are typically identified by a number. If the number and the arguments are passed in registers <b>a0</b> to <b>a7</b>, and the return value in <b>a0</b>, system calls follow the same register convention as regular function calls (albeit with greater privilege). 
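</p> <p>The register convention just described can be sketched in C as a dispatcher that receives the values of <b>a0</b> (the system call id) and <b>a1</b> (its argument) and returns the value to be placed back in <b>a0</b>. The names <code>SYS_TIMER_CREATE</code>, <code>timer_create_stub</code>, <code>syscall_dispatch</code> and <code>ecall_return_pc</code> are hypothetical, and returning -1 for an unknown id is an assumption of this sketch. The last helper mirrors the <b>mepc</b> adjustment made by the trap handler; adding 4 is always correct for <code>ecall</code>, which has no compressed encoding.</p> <div class="org-src-container"> <pre class="src src-c">
typedef unsigned long long u64;

enum { SYS_TIMER_CREATE = 0x100 };  /* the id used in this chapter */

static u64 last_timeout;  /* records what the stub was asked to do */

/* Stand-in for __timer_create: on hardware this would write mtimecmp. */
static void timer_create_stub(u64 timeout)
{
    last_timeout = timeout;
}

/* a0 carries the system call id, a1 its argument; the result goes back in a0. */
u64 syscall_dispatch(u64 a0, u64 a1)
{
    switch (a0) {
    case SYS_TIMER_CREATE:
        timer_create_stub(a1);
        return 0;
    default:
        return (u64)-1;  /* unknown id (assumed error convention) */
    }
}

/* After an ecall, mepc points at the ecall itself; resume after it. */
u64 ecall_return_pc(u64 mepc)
{
    return mepc + 4;
}
</pre></div> <p>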
The following code snippet will invoke the system call associated with the identifier 0x100:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-load syscall id"><span class="linenr">1: </span> <span style="color: #00ffff;">li</span> a0, 0x100</span> <span class="coderef-off" id="coderef-execute the syscall"><span class="linenr">2: </span> <span style="color: #00ffff;">ecall</span></span> </pre></div> <p>The timer can now be set from user mode via a system call. The following implementation of <code>__syscall</code> invokes <code>__timer_create</code> with the timeout value given in register <b>a1</b> when system call 0x100 is requested:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span><span style="color: #87cefa;">__syscall</span>: <span class="linenr"> 2: </span> <span style="color: #00ffff;">mv</span> s1, a0 <span class="coderef-off" id="coderef-timer syscall"><span class="linenr"> 3: </span> <span style="color: #00ffff;">li</span> t0, 0x100</span> <span class="linenr"> 4: </span> <span style="color: #00ffff;">bne</span> t0, a0, 1f <span class="coderef-off" id="coderef-set timeout arg"><span class="linenr"> 5: </span> <span style="color: #00ffff;">mv</span> a0, a1</span> <span class="linenr"> 6: </span> <span style="color: #00ffff;">push_stack</span> <span class="linenr"> 7: </span> <span style="color: #00ffff;">call</span> __timer_create <span class="linenr"> 8: </span> <span style="color: #00ffff;">pop_stack</span> <span class="linenr"> 9: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">ret</span> <span class="linenr">10: </span> </pre></div> <p>This function checks register <b>a0</b> to determine which system call was requested. 
If it is 0x100, the function moves the timeout value from register <b>a1</b> to <b>a0</b> on line <a class="coderef" href="#coderef-set timeout arg" onmouseout="CodeHighlightOff(this, 'coderef-set timeout arg');" onmouseover="CodeHighlightOn(this, 'coderef-set timeout arg');">5</a>, saves the registers with the <code>push_stack</code> macro, then invokes the <code>__timer_create</code> function with M-mode privilege. The main program can now switch to U-mode and then use system call 0x100 to set the timer, as illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> CLOCK_MONOTONIC <span class="linenr"> 5: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 6: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="linenr"> 8: </span> <span style="color: #00ffff;">call</span> __system_check <span class="linenr"> 9: </span> <span style="color: #00ffff;">la</span> t0, __mtrap_handler <span class="linenr">10: </span> <span style="color: #00ffff;">csrrw</span> zero, mtvec, t0 <span class="linenr">11: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;3 <span class="linenr">12: </span> <span style="color: #00ffff;">csrrs</span> zero, mstatus, t0 <span class="linenr">13: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;7 <span class="linenr">14: </span> <span style="color: #00ffff;">csrrs</span> zero, mie, t0 <span class="linenr">15: </span> <span style="color: #00ffff;">setmode</span> 0 <span
class="coderef-off" id="coderef-set timer syscall"><span class="linenr">16: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">li</span> a0, 0x100</span> <span class="coderef-off" id="coderef-set system timeout"><span class="linenr">17: </span> <span style="color: #00ffff;">li</span> a1, 0x10000</span> <span class="linenr">18: </span> <span style="color: #00ffff;">ecall</span> <span class="linenr">19: </span> <span style="color: #00ffff;">wfi</span> <span class="linenr">20: </span> <span style="color: #00ffff;">j</span> 1b </pre></div> <p>The major difference from the previous main program is that the timer is not set directly, but via a system call. The syscall id is loaded into register <b>a0</b> on line <a class="coderef" href="#coderef-set timer syscall" onmouseout="CodeHighlightOff(this, 'coderef-set timer syscall');" onmouseover="CodeHighlightOn(this, 'coderef-set timer syscall');">16</a>, and the timeout value into register <b>a1</b> on line <a class="coderef" href="#coderef-set system timeout" onmouseout="CodeHighlightOff(this, 'coderef-set system timeout');" onmouseover="CodeHighlightOn(this, 'coderef-set system timeout');">17</a>. This sequence of instructions is repeated each time the timer expires.</p> </div> </div> <div class="outline-2" id="outline-container-orgdb17f69"> <h2 id="orgdb17f69">Conclusion</h2> <div class="outline-text-2" id="text-orgdb17f69"> <p>This chapter has explored how RISC-V handles traps, both synchronous exceptions and asynchronous interrupts, as well as the instructions and CSRs that control the privilege mode of the processor. These mechanisms are key components in the development of an operating system, because they allow privileged functions to run separately from user code. 
Through the <code>ecall</code> instruction, system calls allow U-mode applications to request services that require M-mode (or S-mode) privilege.</p> <p>Moreover, interrupt handling provides a good introduction to how RISC-V processors deal with events originating from external peripherals. This will become important when creating programs that interact with the user.</p> <p>Although the examples in this chapter were restricted to the M-mode and U-mode privilege levels, the supervisor mode was briefly discussed. Having three privilege levels is useful when creating hypervisors: guest operating systems can operate in S-mode while the virtualization environment uses M-mode.</p> <p>The next chapter will leverage many of the capabilities discussed here to interface with more of the external components in a RISC-V system. In particular, the UART module will allow users to interact with applications via a serial console.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-12-11T02:36:21+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Tue, 12/10/2019 - 21:36</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> <li><a href="/main/taxonomy/term/17" rel="dc:subject" hreflang="en">Computer Architecture</a></li> </ul> </div> Wed, 11 Dec 2019 02:36:21 +0000 MarcAdmin 35 at https://www.vociferousvoid.org/main https://www.vociferousvoid.org/main/riscv_bare_metal_chapter5#comments RISC-V Bare Metal Programming - Chapter 4: Another Brick in the Wall https://www.vociferousvoid.org/main/riscv_bare_metal_chapter4 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming - Chapter 4: Another Brick in the Wall</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p><a href="https://www.vociferousvoid.org/main/risc_bare_metal_chapter3">Chapter 3</a> of this RISC-V bare metal tutorial studied the linking process and how a developer can control where code and data are placed in memory. Constants, initialized variables and uninitialized variables were defined and explicitly positioned in RAM as prescribed by a linker script. The running example program was updated to read operands from RAM to perform its task, and subsequently store the result in a different location in RAM. However, up to this point only the base RV64I instruction set has been used. 
This chapter will explore some of the standard extensions available in the RISC-V ISA.</p> <p>One of the objectives in the design of the RISC-V ISA is to support many different deployment environments, which may have varying constraints on efficiency, performance, and cost. For this reason, the base instruction set was restricted to the minimum required to build a useful program. This reduces processor complexity, potentially yielding performance and efficiency gains. However, these gains may be lost when performing more complex computations. To address potential limitations in the base instruction set, optional standard extensions have been defined to expand the available set of instructions. The standard extensions available for the 32- and 64-bit instruction sets include:</p> <dl class="org-dl"><dt>M</dt> <dd>Support for multiply and divide (RV32M and RV64M).</dd> <dt>A</dt> <dd>Atomic operations (RV32A and RV64A).</dd> <dt>F</dt> <dd>Floating point support (RV32F and RV64F).</dd> <dt>D</dt> <dd>Double precision floating point support (RV32D and RV64D).</dd> </dl><p>These standard extensions are included in most implementations of RISC-V cores. The base set plus these extensions is often referred to as the <b>G</b> instruction set (RV32G or RV64G). Each of these standard extensions will be explored in this chapter.</p> <div class="outline-2" id="outline-container-org657fc9a"> <h2 id="org657fc9a">Multiply</h2> <div class="outline-text-2" id="text-org657fc9a"> <p>The <b>M</b> extension provides instructions for multiplying and dividing integers using both word and double-word operands. When the operands are words, the product never requires more than 64 bits, so it fits in a single RV64I register. 
The following listing of the <code>product.s</code> source file shows the assembly code of a function that multiplies word-sized integer operands:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __imul32 <span class="linenr"> 4: </span><span style="color: #87cefa;">__imul32</span>: <span class="linenr"> 5: </span> # Input: <span class="linenr"> 6: </span> # a0: 32-bit multiplicand <span class="linenr"> 7: </span> # a1: 32-bit multiplier <span class="linenr"> 8: </span> # Result: <span class="linenr"> 9: </span> # a0: 64-bit product <span class="linenr">10: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr">11: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="coderef-off" id="coderef-multiplication"><span class="linenr">12: </span> <span style="color: #00ffff;">mul</span> a0, a0, a1</span> <span class="linenr">13: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">14: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">15: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>Because the arguments of this function are expected to be word-length values, which <b>lw</b> sign-extends to 64 bits when they are loaded, the full product can be calculated with a single instruction (<b>mul</b> on line <a class="coderef" href="#coderef-multiplication" onmouseout="CodeHighlightOff(this, 'coderef-multiplication');" onmouseover="CodeHighlightOn(this, 'coderef-multiplication');">12</a>). 
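</p> <p>The choice between <b>mul</b> and <b>mulw</b> matters here. On RV64, <b>mulw</b> keeps only the low 32 bits of the product and sign-extends them to 64 bits, whereas <b>mul</b> keeps the low 64 bits. The following sketch is not part of the tutorial's program (the registers and values are arbitrary choices for illustration); it multiplies \(2^{16}\) by itself with both instructions:</p> <div class="org-src-container"> <pre class="src src-asm">
    # Sketch: compare MULW and MUL on RV64 (registers chosen arbitrarily)
    li    t0, 0x10000       # t0 = 2^16
    li    t1, 0x10000       # t1 = 2^16
    mulw  t2, t0, t1        # low 32 bits of 2^32 are all zero, so t2 = 0
    mul   t3, t0, t1        # low 64 bits of the product: t3 = 0x100000000 (2^32)
    li    t4, -3            # negative values are held sign-extended in 64-bit registers
    li    t5, 5
    mul   t6, t4, t5        # signed operands work too: t6 = -15
</pre></div> <p>Because <b>lw</b> sign-extends a word into a 64-bit register, <b>mul</b> returns the exact product of two signed 32-bit operands. For unsigned 32-bit operands, loading them with <b>lwu</b> (which zero-extends) gives the same guarantee.</p> <p>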
The main program can be updated as follows to invoke the <b>__imul32</b> function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="coderef-off" id="coderef-call_imul32"><span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul32</span> <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, result2 <span class="coderef-off" id="coderef-save_imul32"><span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1)</span> <span class="linenr">15: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">16: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">17: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">18: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">19: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">20: 
</span> <span style="color: #00ffff;">.bss</span> <span class="coderef-off" id="coderef-sum_result"><span class="linenr">21: </span><span style="color: #87cefa;">result1</span>: .word 0</span> <span class="coderef-off" id="coderef-__imul32_result"><span class="linenr">22: </span><span style="color: #87cefa;">result2</span>: .dword 0</span> </pre></div> <p>The <b>.bss</b> section of the <b>ELF</b> file was updated to declare two result variables: <b>result1</b> on line <a class="coderef" href="#coderef-sum_result" onmouseout="CodeHighlightOff(this, 'coderef-sum_result');" onmouseover="CodeHighlightOn(this, 'coderef-sum_result');">21</a> which will hold the sum of the operands in a word, and <b>result2</b> on line <a class="coderef" href="#coderef-__imul32_result" onmouseout="CodeHighlightOff(this, 'coderef-__imul32_result');" onmouseover="CodeHighlightOn(this, 'coderef-__imul32_result');">22</a> which will hold their product in a double-word.</p> <p>After the sum of the operands is calculated, and the result is saved in memory, it is kept in register <b>a0</b> to be used as the multiplicand. The value of <b>operand2</b> will be used as the multiplier; its value should still be in the <b>a1</b> register since its content is not modified by the <b>sum</b> function. 
The <b>__imul32</b> function is then called on line <a class="coderef" href="#coderef-call_imul32" onmouseout="CodeHighlightOff(this, 'coderef-call_imul32');" onmouseover="CodeHighlightOn(this, 'coderef-call_imul32');">12</a> and the result is saved in memory at line <a class="coderef" href="#coderef-save_imul32" onmouseout="CodeHighlightOff(this, 'coderef-save_imul32');" onmouseover="CodeHighlightOn(this, 'coderef-save_imul32');">14</a>.</p> <p>The program can be assembled, linked, and executed in <b>qemu</b> using the following sequence of commands:</p> <pre class="example"> riscv64-unknown-elf-as -o add.o add.s riscv64-unknown-elf-as -o main.o main.s riscv64-unknown-elf-as -o product.o product.s riscv64-unknown-elf-ld -T chapter3.lds -o main.elf add.o main.o product.o qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel main.elf QEMU 3.1.0 monitor - type 'help' for more information (qemu) </pre><p>The <code>chapter3.lds</code> linker script is the same one that was used in chapter 3. The result values can be inspected from the <b>qemu</b> console using the <b>xp</b> command:</p> <pre class="example"> (qemu) xp /1wd 0x80001004 0000000080001004: 9 (qemu) xp /1gd 0x80001008 0000000080001008: 45 (qemu) </pre><p>The location of <b>result1</b> in memory is the same as that of <b>result</b> in the previous chapter. <b>result2</b> is located 4 bytes beyond <b>result1</b>, since <b>result1</b> is 32 bits wide. Therefore, the product can be found at address 0x80001008. This can easily be verified using the <b>objdump</b> utility:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -D -j.bss main.elf main.elf: file format elf64-littleriscv Disassembly of section .bss: 0000000080001004 &lt;result1&gt;: 80001004: 0000 unimp ... 0000000080001008 &lt;result2&gt;: ... </pre><p>As expected, the product of 9 and 5 is 45.</p> <p>Multiplication is a little more complicated when the operands are 64-bit values. 
This is because the product can be wider (in bits) than either the multiplier or the multiplicand. The <b>__imul32</b> function assumes that the operands are word-length values; therefore, the result will fit in a single double-word register. However, the calculated product will be truncated if double-word length operands are provided. The product of two 64-bit values may have as many as 128 bits, which is wider than any register available in the RV64I instruction set. To address this, RV64M calculates the full 128-bit product with two instructions: one calculates the most significant double-word (<b>mulh</b>, which treats both operands as signed; <b>mulhu</b> is the unsigned variant), and the other calculates the least significant double-word (<b>mul</b>). The following listing illustrates the <b>__imul64</b> function, which can handle 64-bit operands:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.global</span> __imul64 <span class="linenr"> 2: </span><span style="color: #87cefa;">__imul64</span>: <span class="linenr"> 3: </span> # Input: <span class="linenr"> 4: </span> # a0: 64-bit multiplicand <span class="linenr"> 5: </span> # a1: 64-bit multiplier <span class="linenr"> 6: </span> # Result: <span class="linenr"> 7: </span> # a0: low 64-bits of the product <span class="linenr"> 8: </span> # a1: high 64-bits of the product <span class="linenr"> 9: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr">10: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="coderef-off" id="coderef-save_t1"><span class="linenr">11: </span> <span style="color: #00ffff;">sd</span> t1, 16(sp)</span> <span class="coderef-off" id="coderef-save_t0"><span class="linenr">12: </span> <span style="color: #00ffff;">sd</span> t0, 8(sp)</span> <span class="coderef-off" id="coderef-mv_arg0"><span class="linenr">13: </span> <span style="color: #00ffff;">mv</span> t0, a0</span> <span class="coderef-off" 
id="coderef-mv_arg1"><span class="linenr">14: </span> <span style="color: #00ffff;">mv</span> t1, a1</span> <span class="coderef-off" id="coderef-__imul64_low"><span class="linenr">15: </span> <span style="color: #00ffff;">mul</span> a0, t0, t1</span> <span class="coderef-off" id="coderef-__imul64_high"><span class="linenr">16: </span> <span style="color: #00ffff;">mulh</span> a1, t0, t1</span> <span class="linenr">17: </span> <span style="color: #00ffff;">ld</span> t0, 8(sp) <span class="coderef-off" id="coderef-restore_t1"><span class="linenr">18: </span> <span style="color: #00ffff;">ld</span> t1, 16(sp)</span> <span class="coderef-off" id="coderef-restore_t0"><span class="linenr">19: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp)</span> <span class="linenr">20: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">21: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This code can be added to the <code>product.s</code> source file to provide a multiplication operation for 64-bit integers. After saving the return address, the function saves the contents of registers <b>t1</b> (line <a class="coderef" href="#coderef-save_t1" onmouseout="CodeHighlightOff(this, 'coderef-save_t1');" onmouseover="CodeHighlightOn(this, 'coderef-save_t1');">11</a>) and <b>t0</b> (line <a class="coderef" href="#coderef-save_t0" onmouseout="CodeHighlightOff(this, 'coderef-save_t0');" onmouseover="CodeHighlightOn(this, 'coderef-save_t0');">12</a>), which it uses as scratch registers.</p> <blockquote><p><b>Note:</b> <b>t0</b> and <b>t1</b> are caller-saved registers under the standard calling convention, so a caller that needs their values should already have saved them. 
Saving them here is therefore not strictly necessary, but it is harmless.</p> </blockquote> <p>The values of the function arguments are then moved into the temporary registers (lines <a class="coderef" href="#coderef-mv_arg0" onmouseout="CodeHighlightOff(this, 'coderef-mv_arg0');" onmouseover="CodeHighlightOn(this, 'coderef-mv_arg0');">13</a> and <a class="coderef" href="#coderef-mv_arg1" onmouseout="CodeHighlightOff(this, 'coderef-mv_arg1');" onmouseover="CodeHighlightOn(this, 'coderef-mv_arg1');">14</a>). This is required because, unlike the first version of this function, the arguments need to be reused, and the value of <b>a0</b> will be overwritten by the first multiplication on line <a class="coderef" href="#coderef-__imul64_low" onmouseout="CodeHighlightOff(this, 'coderef-__imul64_low');" onmouseover="CodeHighlightOn(this, 'coderef-__imul64_low');">15</a>, which calculates the low 64 bits of the 128-bit product. The second multiplication (line <a class="coderef" href="#coderef-__imul64_high" onmouseout="CodeHighlightOff(this, 'coderef-__imul64_high');" onmouseover="CodeHighlightOn(this, 'coderef-__imul64_high');">16</a>) calculates the high 64 bits of the product and stores them in <b>a1</b>.</p> <p>The main program must be updated to handle a potential 128-bit result from the <b>__imul64</b> function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="coderef-off" id="coderef-call__imul64"><span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64</span> <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, result2 <span class="coderef-off" id="coderef-save_product_low"><span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1)</span> <span class="coderef-off" id="coderef-save_product_high"><span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 8(t1)</span> <span class="linenr">16: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">17: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">18: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">19: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">20: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">21: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">22: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="coderef-off" id="coderef-product128_result"><span class="linenr">23: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0</span> </pre></div> <p>The most significant change is that the result must be stored to memory using two instructions: one to store the low double-word of the product (line <a class="coderef" href="#coderef-save_product_low" onmouseout="CodeHighlightOff(this, 'coderef-save_product_low');" onmouseover="CodeHighlightOn(this, 'coderef-save_product_low');">14</a>), and one to store the high double-word (line <a class="coderef" href="#coderef-save_product_high" onmouseout="CodeHighlightOff(this, 'coderef-save_product_high');" onmouseover="CodeHighlightOn(this, 'coderef-save_product_high');">15</a>). The low double-word is stored at the lower address, matching the little-endian layout of RISC-V. The <b>result2</b> variable on line <a class="coderef" href="#coderef-product128_result" onmouseout="CodeHighlightOff(this, 'coderef-product128_result');" onmouseover="CodeHighlightOn(this, 'coderef-product128_result');">23</a> must also be updated to reserve 128 bits for the product. The arguments of the <b>__imul64</b> function are the same as those of the <b>__imul32</b> function; therefore, the new function can be invoked by simply changing the call label on line <a class="coderef" href="#coderef-call__imul64" onmouseout="CodeHighlightOff(this, 'coderef-call__imul64');" onmouseover="CodeHighlightOn(this, 'coderef-call__imul64');">12</a>.</p> <p>After recompiling and linking the modified source files, the result can be inspected in the <b>qemu</b> console by printing two double-word values starting at address 0x80001008:</p> <pre class="example"> $ qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel main.elf QEMU 3.1.0 monitor - type 'help' for more information (qemu) xp /2gd 0x80001008 0000000080001008: 45 0 (qemu) quit </pre><p>Note that the <b>__imul64</b> function can also be used with 32-bit operands. For non-negative operands such as these, the high double-word will be zero because the product fits entirely in the low double-word.</p> </div> </div> <div class="outline-2" id="outline-container-org9ca4bd4"> <h2 id="org9ca4bd4">Divide</h2> <div class="outline-text-2" id="text-org9ca4bd4"> <p>The <b>M</b> extension also provides instructions to calculate the quotient and remainder of a division of one integer by another. This is slightly less complicated than multiplication because the result cannot be wider than the operands. 
However, this also limits divisions to dividends and divisors of at most 64 bits. Therefore, division is not a true inverse of multiplication, which can produce a 128-bit result.</p> <p>The following listing illustrates the contents of the <code>divide.s</code> source file, which defines a function that divides an unsigned 64-bit dividend by an unsigned 64-bit divisor.</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __idiv64u <span class="linenr"> 4: </span><span style="color: #87cefa;">__idiv64u</span>: <span class="linenr"> 5: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr"> 6: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="coderef-off" id="coderef-__idiv64u_check_divzero"><span class="linenr"> 7: </span> <span style="color: #00ffff;">beqz</span> a1, __idiv64u_exit</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">divu</span> a0, a0, a1 <span class="linenr"> 9: </span><span style="color: #87cefa;">__idiv64u_exit</span>: <span class="linenr">10: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">11: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">12: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This function is fairly straightforward: after ensuring that the divisor is not zero, it simply executes the <b>divu</b> (unsigned divide) instruction to calculate the quotient. 
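</p> <p>The <b>M</b> extension provides both signed (<b>div</b>, <b>rem</b>) and unsigned (<b>divu</b>, <b>remu</b>) division instructions, and they give different results whenever an operand has its most significant bit set. Division by zero also produces a fixed result instead of a trap. The following sketch illustrates both behaviors; it is not part of the tutorial's program, the registers and values are arbitrary, and the expected values in the comments follow the RISC-V specification's definition of these instructions:</p> <div class="org-src-container"> <pre class="src src-asm">
    # Sketch: signed vs. unsigned division on RV64 (registers chosen arbitrarily)
    li    t0, -7            # t0 = 0xFFFFFFFFFFFFFFF9
    li    t1, 2
    div   t2, t0, t1        # signed quotient, rounded toward zero: t2 = -3
    rem   t3, t0, t1        # signed remainder takes the dividend's sign: t3 = -1
    divu  t4, t0, t1        # unsigned quotient: t4 = 0x7FFFFFFFFFFFFFFC
    remu  t5, t0, t1        # unsigned remainder: t5 = 1

    # Division by zero does not trap; each instruction returns a fixed value.
    divu  t2, t0, zero      # all bits set: t2 = 0xFFFFFFFFFFFFFFFF
    remu  t3, t0, zero      # the dividend: t3 = t0
    div   t4, t0, zero      # t4 = -1
    rem   t5, t0, zero      # the dividend: t5 = t0
</pre></div> <p>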
The check on line <a class="coderef" href="#coderef-__idiv64u_check_divzero" onmouseout="CodeHighlightOff(this, 'coderef-__idiv64u_check_divzero');" onmouseover="CodeHighlightOn(this, 'coderef-__idiv64u_check_divzero');">7</a>, which ensures that the divisor is not zero, is necessary because RV64M does not trap on division by zero, so the error would otherwise go unnoticed. If the divisor is zero, the division instruction is skipped and <b>a0</b> is returned unchanged.</p> <p>Since the result of the <b>__imul64</b> function fits in 64 bits with these small operands, the <b>__idiv64u</b> function can be invoked on the result to verify its accuracy. The <code>main.s</code> program can be updated as follows to divide the result of <b>__imul64</b> by <b>operand2</b> and save the quotient in a variable named <b>result3</b> in the <b>.bss</b> section.</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64 <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, result2 <span 
class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1) <span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 8(t1) <span class="coderef-off" id="coderef-check_overflow"><span class="linenr">16: </span> <span style="color: #00ffff;">bnez</span> a1, stop</span> <span class="coderef-off" id="coderef-load_dividend"><span class="linenr">17: </span> <span style="color: #00ffff;">lw</span> a1, operand2</span> <span class="coderef-off" id="coderef-call_divide"><span class="linenr">18: </span> <span style="color: #00ffff;">call</span> __idiv64u</span> <span class="coderef-off" id="coderef-load_result3_addr"><span class="linenr">19: </span> <span style="color: #00ffff;">la</span> t0, result3</span> <span class="coderef-off" id="coderef-save_quotient"><span class="linenr">20: </span> <span style="color: #00ffff;">sd</span> a0, 0(t0)</span> <span class="linenr">21: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">22: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">23: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">24: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">25: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">26: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">27: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="linenr">28: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0 <span class="coderef-off" id="coderef-division_result"><span class="linenr">29: </span><span style="color: #87cefa;">result3</span>: .dword 0</span> </pre></div> <p>After <b>__imul64</b> returns, the value is checked for overflow (line <a class="coderef" href="#coderef-check_overflow" onmouseout="CodeHighlightOff(this, 'coderef-check_overflow');" 
onmouseover="CodeHighlightOn(this, 'coderef-check_overflow');">16</a>) by checking that the high double-word returned in <b>a1</b> is zero. This ensures that the result of the multiplication fits in a single 64-bit register. If the result is wider than 64 bits, the division is skipped. Otherwise, <b>operand2</b> is loaded into register <b>a1</b> as the divisor. With the current operands this check is not strictly necessary, but it guards against operand values whose product would overflow.</p> <p>The <b>__idiv64u</b> function divides the <b>__imul64</b> result by the value of <b>operand2</b>, and the quotient is stored in the <b>result3</b> variable. This should be the same as the result of the <b>sum</b> function (in <b>result1</b>). This can be verified by assembling and linking the program, running the binary in <b>qemu</b>, and inspecting the value of <b>result3</b>:</p> <pre class="example"> riscv64-unknown-elf-as -o add.o add.s riscv64-unknown-elf-as -o divide.o divide.s riscv64-unknown-elf-as -o main.o main.s riscv64-unknown-elf-as -o product.o product.s riscv64-unknown-elf-ld -T chapter3.lds -o main.elf add.o divide.o main.o product.o qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel main.elf QEMU 3.1.0 monitor - type 'help' for more information (qemu) xp /1wd 0x80001004 0000000080001004: 9 (qemu) xp /1gd 0x80001018 0000000080001018: 9 (qemu) </pre><p>The address of the <b>result3</b> variable is 0x80001018: it is 16 bytes (0x10) beyond the <b>result2</b> variable, which is located at 0x80001008. This can be verified using <b>objdump</b> as in the previous example.</p> <p>As expected, <b>result3</b> contains the integer 9, which is the result of the <b>sum</b> function stored in <b>result1</b> at address 0x80001004.</p> <p>This value is convenient because 5 divides 45 exactly. If the result of <b>__imul64</b> were divided by <b>operand1</b> instead, the quotient would be 11 with a remainder of 1. In the current implementation, the remainder is lost. 
However, <b>__idiv64u</b> can be updated to calculate both the quotient and the remainder of a division, as illustrated in the following listing.</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span><span style="color: #87cefa;">__idiv64u</span>: <span class="linenr"> 2: </span> # Input: <span class="linenr"> 3: </span> # a0: 64-bit dividend <span class="linenr"> 4: </span> # a1: 64-bit divisor <span class="linenr"> 5: </span> # Returns: <span class="linenr"> 6: </span> # a0 =&gt; 64-bit quotient <span class="linenr"> 7: </span> # a1 =&gt; 64-bit remainder <span class="linenr"> 8: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr"> 9: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="linenr">10: </span> <span style="color: #00ffff;">sd</span> t1, 16(sp) <span class="linenr">11: </span> <span style="color: #00ffff;">sd</span> t0, 8(sp) <span class="linenr">12: </span> <span style="color: #00ffff;">beqz</span> a1, __idiv64u_exit <span class="linenr">13: </span> <span style="color: #00ffff;">mv</span> t0, a0 <span class="linenr">14: </span> <span style="color: #00ffff;">mv</span> t1, a1 <span class="coderef-off" id="coderef-calculate_quotient"><span class="linenr">15: </span> <span style="color: #00ffff;">divu</span> a0, t0, t1</span> <span class="coderef-off" id="coderef-calculate_remainder"><span class="linenr">16: </span> <span style="color: #00ffff;">remu</span> a1, t0, t1</span> <span class="linenr">17: </span><span style="color: #87cefa;">__idiv64u_exit</span>: <span class="linenr">18: </span> <span style="color: #00ffff;">ld</span> t0, 8(sp) <span class="linenr">19: </span> <span style="color: #00ffff;">ld</span> t1, 16(sp) <span class="linenr">20: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">21: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">22: </span> <span 
style="color: #00ffff;">ret</span> </pre></div> <p>This new implementation copies the arguments into temporary registers because the calculation takes two steps, and the first step overwrites the dividend in <b>a0</b>. The function then calculates the quotient on line <a class="coderef" href="#coderef-calculate_quotient" onmouseout="CodeHighlightOff(this, 'coderef-calculate_quotient');" onmouseover="CodeHighlightOn(this, 'coderef-calculate_quotient');">15</a> and the remainder on line <a class="coderef" href="#coderef-calculate_remainder" onmouseout="CodeHighlightOff(this, 'coderef-calculate_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-calculate_remainder');">16</a>. The <code>main.s</code> program must also be updated to save the results of the new <b>__idiv64u</b> function in two double-words.</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64 <span class="linenr">13: </span> <span 
style="color: #00ffff;">la</span> t1, result2 <span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1) <span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 8(t1) <span class="linenr">16: </span> <span style="color: #00ffff;">bnez</span> a1, stop <span class="coderef-off" id="coderef-load_operand1_dividend"><span class="linenr">17: </span> <span style="color: #00ffff;">lw</span> a1, operand1</span> <span class="linenr">18: </span> <span style="color: #00ffff;">beqz</span> a1, stop <span class="linenr">19: </span> <span style="color: #00ffff;">call</span> __idiv64u <span class="linenr">20: </span> <span style="color: #00ffff;">la</span> t0, result3 <span class="linenr">21: </span> <span style="color: #00ffff;">sd</span> a0, 0(t0) <span class="coderef-off" id="coderef-save_remainder"><span class="linenr">22: </span> <span style="color: #00ffff;">sd</span> a1, 8(t0)</span> <span class="linenr">23: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">24: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">25: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">26: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">27: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">28: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">29: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="linenr">30: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0 <span class="coderef-off" id="coderef-result3_2dwords"><span class="linenr">31: </span><span style="color: #87cefa;">result3</span>: .dword 0, 0</span> </pre></div> <p>The main changes are that <b>operand1</b> is used as the divisor on line <a class="coderef" href="#coderef-load_operand1_dividend" 
onmouseout="CodeHighlightOff(this, 'coderef-load_operand1_dividend');" onmouseover="CodeHighlightOn(this, 'coderef-load_operand1_dividend');">17</a> and an instruction was added on line <a class="coderef" href="#coderef-save_remainder" onmouseout="CodeHighlightOff(this, 'coderef-save_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-save_remainder');">22</a> to store the remainder in RAM. The <b>result3</b> variable was also updated to allocate two double-words of memory on line <a class="coderef" href="#coderef-result3_2dwords" onmouseout="CodeHighlightOff(this, 'coderef-result3_2dwords');" onmouseover="CodeHighlightOn(this, 'coderef-result3_2dwords');">31</a>. If this program is assembled, linked, and executed in <b>qemu</b> (as in the previous example), the contents of <b>result3</b> can be inspected to see that both the quotient and the remainder have been calculated:</p> <pre class="example"> (qemu) xp /2gd 0x80001018 0000000080001018: 11 1 </pre><p>This provides a more flexible implementation of <b>__idiv64u</b>, but if a true inverse of the <b>__imul64</b> function is desired, the function must accept a 128-bit dividend. The RV64M extension does not define an instruction for this, so the calculation must be performed in parts.</p> <p>If the 128-bit dividend is broken up into four 32-bit words, the division can be carried out one word at a time, from the most significant word to the least significant, carrying the remainder of each step into the next and combining the partial quotients. 
This is possible because any 64-bit value \(x\) can be written in terms of its high and low 32-bit words, \(w_h\) and \(w_l\):</p> <p>\(x = 2^{32}w_h + w_l\)</p> <p>The integer quotient and remainder of \(x\) divided by some integer \(d\) can then be calculated as:</p> <p>\(\lfloor x/d \rfloor = 2^{32}\lfloor w_h/d \rfloor + \left\lfloor \frac{2^{32}(w_h \bmod d) + w_l}{d} \right\rfloor\)</p> <p>\(x \bmod d = \left(2^{32}(w_h \bmod d) + w_l\right) \bmod d\)</p> <p>Applying the same step to each word in turn, from the most significant to the least significant, with the remainder of one step playing the role of \(w_h \bmod d\) in the next, divides a value of any length. This calculation can be implemented with the following RISC-V assembly code:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.global</span> __idiv128u <span class="linenr"> 2: </span><span style="color: #87cefa;">__idiv128u</span>: <span class="linenr"> 3: </span> # Input: <span class="linenr"> 4: </span> # a0: Address where the 128-bit quotient will be stored (high <span class="linenr"> 5: </span> # dword, low dword). <span class="linenr"> 6: </span> # a1: 64-bit divisor <span class="linenr"> 7: </span> # a2: Address of the 128-bit dividend (high dword, low dword) <span class="linenr"> 8: </span> # Returns: <span class="linenr"> 9: </span> # a0: 64-bit remainder <span class="linenr">10: </span> # (the 128-bit quotient is written to memory) <span class="linenr">11: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr">12: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="linenr">13: </span> # Check for divide by zero <span class="linenr">14: </span> <span style="color: #00ffff;">beqz</span> a1, __idiv128u_exit <span class="linenr">15: </span> <span style="color: #00ffff;">addi</span> t2, a2, 16 <span class="linenr">16: </span> <span style="color: #00ffff;">li</span> t3, 0 # t3 = remainder <span class="linenr">17: </span><span style="color: #87cefa;">__idiv128u_next_dword</span>: <span class="linenr">18: </span> <span style="color: #00ffff;">lwu</span> t1, (a2) # t1 = low word <span class="linenr">19: </span> <span style="color: #00ffff;">ld</span> t0, (a2) <span class="linenr">20: </span> <span style="color: #00ffff;">srli</span> t0, t0, 32 # t0 = high word <span class="linenr">21: </span><span style="color: 
#87cefa;">__idiv128u_high_word</span>: <span class="linenr">22: </span> <span style="color: #00ffff;">slli</span> t3, t3, 32 <span class="linenr">23: </span> <span style="color: #00ffff;">add</span> t0, t0, t3 <span class="linenr">24: </span> <span style="color: #00ffff;">divu</span> t4, t0, a1 # t4 = t0/a1 <span class="linenr">25: </span> <span style="color: #00ffff;">slli</span> t5, t4, 32 # t5 = t4 * 2^32 <span class="linenr">26: </span> <span style="color: #00ffff;">remu</span> t3, t0, a1 # t3 = t0 mod a1 <span class="linenr">27: </span><span style="color: #87cefa;">__idiv128u_low_word</span>: <span class="coderef-off" id="coderef-__idiv128u_scale_remainder"><span class="linenr">28: </span> <span style="color: #00ffff;">slli</span> t3, t3, 32 # t3 = t3 * 2^32</span> <span class="coderef-off" id="coderef-__idiv128u_add_remainder"><span class="linenr">29: </span> <span style="color: #00ffff;">add</span> t0, t1, t3</span> <span class="linenr">30: </span> <span style="color: #00ffff;">divu</span> t4, t0, a1 <span class="linenr">31: </span> <span style="color: #00ffff;">add</span> t5, t5, t4 <span class="linenr">32: </span> <span style="color: #00ffff;">remu</span> t3, t0, a1 <span class="linenr">33: </span> <span style="color: #00ffff;">sd</span> t5, (a0) <span class="linenr">34: </span> <span style="color: #00ffff;">addi</span> a2, a2, 8 <span class="linenr">35: </span> <span style="color: #00ffff;">addi</span> a0, a0, 8 <span class="linenr">36: </span> <span style="color: #00ffff;">bne</span> t2, a2, __idiv128u_next_dword <span class="linenr">37: </span> <span style="color: #00ffff;">mv</span> a0, t3 <span class="linenr">38: </span><span style="color: #87cefa;">__idiv128u_exit</span>: <span class="linenr">39: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">40: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">41: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This function 
iteratively performs 64-bit divisions on the 32-bit words of the dividend, starting with the most significant word. The remainder of each step is scaled by \(2^{32}\) (line <a class="coderef" href="#coderef-__idiv128u_scale_remainder" onmouseout="CodeHighlightOff(this, 'coderef-__idiv128u_scale_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-__idiv128u_scale_remainder');">28</a>), then added to the next word of the dividend (line <a class="coderef" href="#coderef-__idiv128u_add_remainder" onmouseout="CodeHighlightOff(this, 'coderef-__idiv128u_add_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-__idiv128u_add_remainder');">29</a>), and the process is repeated for the next 64-bit double word. Because each scaled remainder must still fit in a 64-bit register, this approach is only correct when the divisor is at most \(2^{32}\); with a larger divisor the remainder can exceed 32 bits, and its upper bits are lost when it is shifted.</p> <p>The following listing illustrates an updated <code>main.s</code> file:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64 <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, dividend <span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 8(t1) 
<span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 0(t1) <span class="linenr">16: </span> <span style="color: #00ffff;">la</span> a0, quotient <span class="linenr">17: </span> <span style="color: #00ffff;">lw</span> a1, operand1 <span class="linenr">18: </span> <span style="color: #00ffff;">la</span> a2, dividend <span class="linenr">19: </span> <span style="color: #00ffff;">call</span> __idiv128u <span class="coderef-off" id="coderef-save__idiv128u_remainder"><span class="linenr">20: </span> <span style="color: #00ffff;">la</span> t0, remainder</span> <span class="linenr">21: </span> <span style="color: #00ffff;">sd</span> a0, (t0) <span class="linenr">22: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">23: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">24: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">25: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">26: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">27: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">28: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="linenr">29: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0 <span class="linenr">30: </span><span style="color: #87cefa;">result3</span>: .dword 0, 0 <span class="linenr">31: </span><span style="color: #87cefa;">dividend</span>: .dword 0, 0 <span class="linenr">32: </span><span style="color: #87cefa;">quotient</span>: .dword 0, 0 <span class="linenr">33: </span><span style="color: #87cefa;">remainder</span>: .dword 0 </pre></div> <p>This updated main program does not perform an overflow check since the <b>__idiv128u</b> function accepts the full 128-bit product as its dividend. 
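</p> <p>To make the arithmetic behind <b>__idiv128u</b> easier to check, the following C function is a minimal reference model of the same word-at-a-time long division. It is a sketch, not part of the original program: the name <code>idiv128u_model</code> is hypothetical, the 128-bit dividend is passed as two 64-bit halves in the same (high dword, low dword) order used in memory, and the divisor is assumed to be non-zero and at most \(2^{32}\). With a larger divisor the running remainder can exceed 32 bits, and shifting it left by 32 bits (as both this model and the assembly do) would discard its upper bits.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of __idiv128u.
 * n[0] = high dword of the dividend, n[1] = low dword.
 * The quotient is written to q in the same order; the remainder is returned.
 * Assumes 0 < d <= 2^32, so that 2^32 * r + w always fits in 64 bits. */
static uint64_t idiv128u_model(uint64_t q[2], const uint64_t n[2], uint64_t d)
{
    assert(d != 0 && d <= (UINT64_C(1) << 32));

    uint64_t r = 0; /* running remainder, always < d */
    for (int i = 0; i < 2; i++) {          /* high dword first */
        uint64_t w_h = n[i] >> 32;         /* high 32-bit word */
        uint64_t w_l = n[i] & 0xFFFFFFFFu; /* low 32-bit word */

        uint64_t t = (r << 32) + w_h;      /* 2^32 * r + w_h */
        uint64_t q_h = t / d;              /* < 2^32 because r < d */
        r = t % d;

        t = (r << 32) + w_l;               /* 2^32 * r + w_l */
        uint64_t q_l = t / d;
        r = t % d;

        q[i] = (q_h << 32) + q_l;          /* combine the partial quotients */
    }
    return r;
}
```

<p>For example, dividing 45 (high dword 0, low dword 45) by 4 yields the quotient 11 and the remainder 1, the same values shown by the <b>qemu</b> monitor earlier. For the all-ones dividend \(2^{128}-1\) and the divisor \(2^{32}-1\) the division is exact: both halves of the quotient are <code>0x100000001</code> and the remainder is 0.</p> <p>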
The <b>__idiv128u</b> function reads its dividend from memory and writes its quotient to memory, rather than passing them in registers, because a 128-bit value does not fit in a single 64-bit register. The memory at label <b>quotient</b> will be updated with the result of the division. The remainder is returned in <b>a0</b>, and is then saved to the memory at label <b>remainder</b> starting on line <a class="coderef" href="#coderef-save__idiv128u_remainder" onmouseout="CodeHighlightOff(this, 'coderef-save__idiv128u_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-save__idiv128u_remainder');">20</a>.</p> </div> </div> <div class="outline-2" id="outline-container-org2adc904"> <h2 id="org2adc904">Atomic Instructions</h2> <div class="outline-text-2" id="text-org2adc904"> <p>Synchronization is an important feature in multiprocessing systems. Thus far, the examples have used a single hardware thread, or hart, so there has been no need to synchronize memory access. RISC-V defines the <b>A</b> extension, which provides instructions to atomically read-modify-write data in memory. These instructions can be used to synchronize multiple hardware threads that share the same memory space.</p> <p>A fundamental synchronization primitive is the atomic compare-and-swap operation. It compares a value in a register with a value in memory; if the two values are equal, the value in another register is swapped with the value in memory. The pseudo code for this is as follows:</p> <ol class="org-ol"><li>Load the expected value into register <b>R1</b></li> <li>Load the address of the value in memory into <b>R2</b></li> <li>Load the value at address <b>R2</b> into a temporary register <b>T1</b></li> <li>Load the swap value into register <b>R3</b></li> <li>If <b>R1</b> == <b>T1</b>:<br /><ol class="org-ol"><li>Store <b>R3</b> at memory location <b>R2</b></li> <li><b>R3</b> := <b>T1</b></li> </ol></li> </ol><p>This entire sequence is expected to be performed atomically (i.e. 
no other hardware thread can modify the memory location between the time the value <b>T1</b> is read from memory and the end of the procedure). This can be implemented using the load-reserved/store-conditional (LR/SC) instructions provided by the RVA extension. The following listing illustrates the implementation of a compare-and-swap function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> compare_and_swap <span class="linenr"> 4: </span> # a0: Address of value operand (8-byte aligned) <span class="linenr"> 5: </span> # a1: Value to compare <span class="linenr"> 6: </span> # a2: Value to swap if (a0) == a1 <span class="linenr"> 7: </span> # return: a0 == 0 =&gt; CAS successful <span class="linenr"> 8: </span> # return: a0 == 1 =&gt; CAS failed <span class="linenr"> 9: </span><span style="color: #87cefa;">compare_and_swap</span>: <span class="coderef-off" id="coderef-load-reserved"><span class="linenr">10: </span> <span style="color: #00ffff;">lr.d</span> t0, (a0)</span> <span class="linenr">11: </span> <span style="color: #00ffff;">bne</span> t0, a1, nomatch <span class="coderef-off" id="coderef-match-store-conditional"><span class="linenr">12: </span> <span style="color: #00ffff;">sc.d</span> t1, a2, (a0) # t1 = 0 on success</span> <span class="coderef-off" id="coderef-cas-failed"><span class="linenr">13: </span> <span style="color: #00ffff;">bnez</span> t1, compare_and_swap # reservation lost: retry</span> <span class="linenr">14: </span> <span style="color: #00ffff;">li</span> a0, 0 # success <span class="linenr">15: </span> <span style="color: #00ffff;">ret</span> <span class="linenr">16: </span><span style="color: #87cefa;">nomatch</span>: <span class="linenr">17: </span> <span style="color: #00ffff;">li</span> a0, 1 <span class="linenr">18: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This function will 
atomically compare the value in memory located at the address in <b>a0</b> with the value in register <b>a1</b>, and store the value of <b>a2</b> at the location in <b>a0</b> if they match.</p> <p>The load-reserved instruction on line <a class="coderef" href="#coderef-load-reserved" onmouseout="CodeHighlightOff(this, 'coderef-load-reserved');" onmouseover="CodeHighlightOn(this, 'coderef-load-reserved');">10</a> loads the value at memory location <b>a0</b> into register <b>t0</b>, and registers a reservation on the address in memory. The nature of the memory reservation is specific to the implementation of the RISC-V core and is transparent to the program. The reserved memory range (the reservation set) can be arbitrarily large, but it must at least include every byte of the value that was loaded.</p> <p>The value of <b>t0</b> is then compared with <b>a1</b>. If the values match, the store-conditional instruction on line <a class="coderef" href="#coderef-match-store-conditional" onmouseout="CodeHighlightOff(this, 'coderef-match-store-conditional');" onmouseover="CodeHighlightOn(this, 'coderef-match-store-conditional');">12</a> will save the value in <b>a2</b> to the memory location of <b>a0</b>. Executing the store-conditional also invalidates the reservation, whether or not the store succeeds. If the values do not match, the store-conditional is skipped and the memory is not updated.</p> <p>If another hardware thread writes to the reserved memory (or the reservation is lost for some other reason), the store-conditional instruction will fail and write a non-zero value to its destination register (line <a class="coderef" href="#coderef-match-store-conditional" onmouseout="CodeHighlightOff(this, 'coderef-match-store-conditional');" onmouseover="CodeHighlightOn(this, 'coderef-match-store-conditional');">12</a>). 
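</p> <p>For comparison, C11 exposes the same operation through <code>atomic_compare_exchange_strong</code> in <code>&lt;stdatomic.h&gt;</code>, which performs the comparison and the conditional store as one atomic step and never fails spuriously. On an RV64A target a compiler may lower it to an <b>lr.d</b>/<b>sc.d</b> retry loop similar to the listing above, although the exact instructions are not guaranteed. The following minimal sketch (the name <code>cas64</code> is hypothetical) keeps the return convention of <b>compare_and_swap</b>: 0 on success and 1 on failure.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C11 counterpart of compare_and_swap:
 * if *addr == expected, atomically store swap into *addr.
 * Returns 0 on success and 1 on failure. */
static int cas64(_Atomic uint64_t *addr, uint64_t expected, uint64_t swap)
{
    assert(addr != 0);
    /* On failure, the value actually observed in *addr is written back into
     * 'expected'; this sketch ignores it to keep the assembly's interface. */
    return atomic_compare_exchange_strong(addr, &expected, swap) ? 0 : 1;
}
```

<p>Applied to a variable holding 5, <code>cas64(&amp;n, 5, 6)</code> succeeds and stores 6, after which <code>cas64(&amp;n, 5, 7)</code> fails and leaves the value at 6, mirroring the two calls made by the main program that follows.</p> <p>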
When the store-conditional fails, the compare-and-swap operation is restarted (line <a class="coderef" href="#coderef-cas-failed" onmouseout="CodeHighlightOff(this, 'coderef-cas-failed');" onmouseover="CodeHighlightOn(this, 'coderef-cas-failed');">13</a>).</p> <p>The main program shown in the following listing will invoke the compare-and-swap function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-setup-cas-arguments"><span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> a0, n</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">li</span> a1, 5 <span class="linenr"> 9: </span> <span style="color: #00ffff;">li</span> a2, 6 <span class="coderef-off" id="coderef-successful-cas"><span class="linenr">10: </span> <span style="color: #00ffff;">call</span> compare_and_swap</span> <span class="linenr">11: </span> <span style="color: #00ffff;">la</span> a0, n <span class="linenr">12: </span> <span style="color: #00ffff;">li</span> a1, 5 <span class="linenr">13: </span> <span style="color: #00ffff;">li</span> a2, 7 <span class="coderef-off" id="coderef-unsuccessful-cas"><span class="linenr">14: </span> <span style="color: #00ffff;">call</span> compare_and_swap</span> <span class="linenr">15: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="coderef-off" id="coderef-load-alignment"><span class="linenr">16: </span> 
<span style="color: #00ffff;">.balign</span> 8</span> <span class="linenr">17: </span><span style="color: #87cefa;">n</span>: .dword 5 </pre></div> <p>Starting at line <a class="coderef" href="#coderef-setup-cas-arguments" onmouseout="CodeHighlightOff(this, 'coderef-setup-cas-arguments');" onmouseover="CodeHighlightOn(this, 'coderef-setup-cas-arguments');">7</a>, the function arguments are set up by first loading the address of the variable <b>n</b> into <b>a0</b>. Note that the data accessed by the <b>lr.d</b> instruction must be aligned on an 8-byte boundary (similarly, <b>lr.w</b> requires a 4-byte boundary); a misaligned address raises an exception. The <b>.balign</b> (byte align) assembler directive on line <a class="coderef" href="#coderef-load-alignment" onmouseout="CodeHighlightOff(this, 'coderef-load-alignment');" onmouseover="CodeHighlightOn(this, 'coderef-load-alignment');">16</a> ensures that <b>n</b> is suitably aligned.</p> <p>The first invocation of the function on line <a class="coderef" href="#coderef-successful-cas" onmouseout="CodeHighlightOff(this, 'coderef-successful-cas');" onmouseover="CodeHighlightOn(this, 'coderef-successful-cas');">10</a> will succeed, thus the value of <b>n</b> will be updated to 6. The second invocation (line <a class="coderef" href="#coderef-unsuccessful-cas" onmouseout="CodeHighlightOff(this, 'coderef-unsuccessful-cas');" onmouseover="CodeHighlightOn(this, 'coderef-unsuccessful-cas');">14</a>) will fail because <b>n</b> no longer contains the expected value 5, thus the value of <b>n</b> will not be changed. This can be verified by assembling the program and inspecting the memory from the <b>qemu</b> monitor:</p> <pre class="example">
riscv64-unknown-elf-as -o chapter4_cas_main.o chapter4_cas_main.s
riscv64-unknown-elf-as -o cas.o cas.s
riscv64-unknown-elf-ld -T chapter3.lds -o chapter4-cas.elf chapter4_cas_main.o cas.o
qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel chapter4-cas.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) xp /1gd 0x80001008
0000000080001008: 6
(qemu)
</pre><p>In addition to the load-reserved/store-conditional instructions, the RVA extension also provides atomic memory operations (AMOs). 
These atomically perform an operation on a value in memory and write the original value of the memory location into the destination register. The supported operations include: <b>add</b>, <b>and</b>, <b>or</b>, <b>xor</b>, <b>max</b>, <b>min</b>, and <b>swap</b>. Moreover, the <b>min</b> and <b>max</b> operations have signed and unsigned (<b>minu</b>, <b>maxu</b>) variants. These instructions are convenient for defining another useful synchronization primitive: the test-and-set spinlock.</p> <p>A spinlock is acquired by writing a sentinel value to a specific memory location, but only if that location does not already contain the sentinel. If it does, the acquiring thread loops (spins) until the lock is released. The lock is released by clearing the memory location (i.e. setting it to zero). The implementation of a spinlock acquire/release pair is illustrated in the listing that follows:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> spinlock_acquire <span class="linenr"> 4: </span><span style="color: #87cefa;">spinlock_acquire</span>: <span class="linenr"> 5: </span> # a0 = memory address of the spinlock <span class="coderef-off" id="coderef-load_sentinel"><span class="linenr"> 6: </span> <span style="color: #00ffff;">li</span> t1, 1 # t1 = sentinel</span> <span class="coderef-off" id="coderef-set_sentinel"><span class="linenr"> 7: </span> <span style="color: #00ffff;">amoswap.d.aq</span> t0, t1, (a0) # t0 = previous lock value</span> <span class="coderef-off" id="coderef-test_if_locked"><span class="linenr"> 8: </span> <span style="color: #00ffff;">bnez</span> t0, spinlock_acquire # already held: retry</span> <span class="linenr"> 9: </span> <span style="color: #00ffff;">ret</span> <span class="linenr">10: </span> <span class="linenr">11: </span> <span style="color: 
#00ffff;">.global</span> spinlock_release <span class="linenr">12: </span><span style="color: #87cefa;">spinlock_release</span>: <span class="linenr">13: </span> # a0 = memory address of the spinlock <span class="coderef-off" id="coderef-unset_sentinel"><span class="linenr">14: </span> <span style="color: #00ffff;">amoswap.d.rl</span> zero, zero, (a0) # (a0) = 0</span> <span class="linenr">15: </span> <span style="color: #00ffff;">ret</span> <span class="linenr">16: </span> </pre></div> <p>This listing defines two subroutines: one to acquire a spinlock, and one to release it. The <b>spinlock_acquire</b> function loads the value 1 to use as the sentinel on line <a class="coderef" href="#coderef-load_sentinel" onmouseout="CodeHighlightOff(this, 'coderef-load_sentinel');" onmouseover="CodeHighlightOn(this, 'coderef-load_sentinel');">6</a>. Then the atomic memory operation <b>amoswap</b> is used on line <a class="coderef" href="#coderef-set_sentinel" onmouseout="CodeHighlightOff(this, 'coderef-set_sentinel');" onmouseover="CodeHighlightOn(this, 'coderef-set_sentinel');">7</a> to swap the value of the sentinel with the contents of the memory location specified in <b>a0</b>. The previous value of the lock location is written to register <b>t0</b>. If this value is not zero, the lock was already held by another thread, so the function tries again (line <a class="coderef" href="#coderef-test_if_locked" onmouseout="CodeHighlightOff(this, 'coderef-test_if_locked');" onmouseover="CodeHighlightOn(this, 'coderef-test_if_locked');">8</a>); otherwise the function returns with the lock held.</p> <p>The <b>spinlock_release</b> function simply writes zero into the memory location specified in <b>a0</b>. This allows another thread that is spinning on the lock to acquire it.</p> <p>The <b>amoswap</b> instruction has two variants: one for double-words (<b>amoswap.d</b>) and one for words (<b>amoswap.w</b>). 
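</p> <p>The same acquire/release pair can be sketched in C11 for comparison. In this minimal sketch (the names <code>spin_trylock</code>, <code>spin_lock</code> and <code>spin_unlock</code> are hypothetical), <code>atomic_exchange_explicit</code> with <code>memory_order_acquire</code> plays the role of <b>amoswap.d.aq</b>, and <code>atomic_store_explicit</code> with <code>memory_order_release</code> plays the role of <b>amoswap.d.rl</b>; a compiler is not guaranteed to emit exactly these instructions. Note that the loop tests the <em>previous</em> value returned by the swap (the value the assembly version receives in <b>t0</b>), not the sentinel that was written.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C11 sketch of spinlock_acquire/spinlock_release.
 * The lock word holds 0 when free and the sentinel 1 when held. */

/* One test-and-set attempt: swap in the sentinel with acquire ordering.
 * Returns true if the previous value was 0, i.e. the lock is now ours. */
static bool spin_trylock(_Atomic uint64_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/* Spin until the swap returns 0 (cf. the retry loop in spinlock_acquire). */
static void spin_lock(_Atomic uint64_t *lock)
{
    while (!spin_trylock(lock)) {
        /* lock held by another thread: try again */
    }
}

/* Clear the lock word with release ordering (cf. amoswap.d.rl zero, zero). */
static void spin_unlock(_Atomic uint64_t *lock)
{
    /* Releasing a lock that is not held indicates a bug in the caller. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

<p>A critical section then has the same shape as the critical-section program that follows: call <code>spin_lock</code>, update the shared data, and call <code>spin_unlock</code>.</p> <p>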
Each AMO, including <b>amoswap</b>, also accepts suffixes that define the release consistency semantics of the memory operation (<b>.aq</b> and <b>.rl</b>). When the <b>.aq</b> (acquire) suffix is set, no memory operation that follows the AMO in the current hardware thread can be observed by other threads to take place before the AMO itself. Conversely, when the <b>.rl</b> (release) suffix is set, other threads cannot observe the AMO before any memory operation that precedes it in the current hardware thread. This is why the acquire function uses <b>amoswap.d.aq</b> and the release function uses <b>amoswap.d.rl</b>: memory accesses made while the lock is held cannot appear to move outside the locked region.</p> <p>The following program illustrates the use of the spinlock functions to define a critical section:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-load_lockaddr"><span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> a0, lock # a0 = address of the lock</span> <span class="coderef-off" id="coderef-spinlock_acquire"><span class="linenr"> 8: </span> <span style="color: #00ffff;">call</span> spinlock_acquire # spin until acquired</span> <span class="coderef-off" id="coderef-critical-section-start"><span class="linenr"> 9: </span> <span style="color: #00ffff;">la</span> t0, n # critical section starts</span> <span class="linenr">10: </span> <span style="color: #00ffff;">ld</span> a0, (t0) <span class="linenr">11: </span> <span style="color: #00ffff;">li</span> a1, 1 <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> sum <span 
class="linenr">13: </span> <span style="color: #00ffff;">la</span> t0, n <span class="coderef-off" id="coderef-critical-section-end"><span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, (t0) # critical section ends</span> <span class="linenr">15: </span> <span style="color: #00ffff;">la</span> a0, lock <span class="coderef-off" id="coderef-spinlock_release"><span class="linenr">16: </span> <span style="color: #00ffff;">call</span> spinlock_release # release the lock</span> <span class="linenr">17: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">18: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">19: </span> <span style="color: #00ffff;">.balign</span> 8 <span class="linenr">20: </span><span style="color: #87cefa;">lock</span>: .dword 0 <span class="linenr">21: </span><span style="color: #87cefa;">n</span>: .dword 0 <span class="linenr">22: </span> </pre></div> <p>This program will attempt to acquire the spinlock on line <a class="coderef" href="#coderef-spinlock_acquire" onmouseout="CodeHighlightOff(this, 'coderef-spinlock_acquire');" onmouseover="CodeHighlightOn(this, 'coderef-spinlock_acquire');">8</a> (the address of the lock variable is loaded on line <a class="coderef" href="#coderef-load_lockaddr" onmouseout="CodeHighlightOff(this, 'coderef-load_lockaddr');" onmouseover="CodeHighlightOn(this, 'coderef-load_lockaddr');">7</a>). This call spins until the lock is acquired; since there is only a single hardware thread, the lock is free and is acquired immediately. The critical section starts on line <a class="coderef" href="#coderef-critical-section-start" onmouseout="CodeHighlightOff(this, 'coderef-critical-section-start');" onmouseover="CodeHighlightOn(this, 'coderef-critical-section-start');">9</a>. The variable <b>n</b> is loaded and incremented by calling the <b>sum</b> function (defined in a previous chapter). 
The critical section ends on line <a class="coderef" href="#coderef-critical-section-end" onmouseout="CodeHighlightOff(this, 'coderef-critical-section-end');" onmouseover="CodeHighlightOn(this, 'coderef-critical-section-end');">14</a>, at which point the program releases the spinlock (line <a class="coderef" href="#coderef-spinlock_release" onmouseout="CodeHighlightOff(this, 'coderef-spinlock_release');" onmouseover="CodeHighlightOn(this, 'coderef-spinlock_release');">16</a>). Following the execution of this program, the contents of the variable <b>n</b> should be 1:</p> <pre class="example">
riscv64-unknown-elf-as -o chapter4_spinlock_main.o chapter4_spinlock_main.s
riscv64-unknown-elf-as -o spinlock.o spinlock.s
riscv64-unknown-elf-as -o add.o add.s
riscv64-unknown-elf-ld -T chapter3.lds -o chapter4-spinlock.elf chapter4_spinlock_main.o spinlock.o add.o
qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel chapter4-spinlock.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) xp /1gd 0x80001008
0000000080001008: 1
</pre></div> </div> <div class="outline-2" id="outline-container-org86a22e8"> <h2 id="org86a22e8">Floating Point</h2> <div class="outline-text-2" id="text-org86a22e8"> <p>In <a href="https://www.vociferousvoid.org/main/risv_bare_metal_chapter2">chapter 2</a>, the registers of the base <b>I</b> (integer) instruction set were enumerated. However, when inspecting the <b>virt</b> machine in <b>qemu</b> using the <code>info registers</code> command, certain registers were listed that are not described in that table. These registers exist to support the <b>F</b> and <b>D</b> extensions, which provide floating-point arithmetic instructions that operate on values conforming to the IEEE 754-2008 standard. 
The <b>F</b> extension provides support for single-precision values and operands, and the <b>D</b> extension provides the same instructions for double-precision values.</p> <p>The 32 additional registers, <b>f0</b>-<b>f31</b>, are used exclusively by the instructions provided by the RVF and RVD extensions. This doubles the number of registers available to the processor without increasing the size of the register specifiers in the instruction encoding: the instruction itself determines whether each 5-bit specifier refers to an <b>x</b> or an <b>f</b> register, so 5 bits are still enough to select one of 32 registers.</p> <p>If only the RVF extension is supported, the <b>f</b> registers are 32 bits wide. If the RVD extension is also supported (RVD depends on RVF), the <b>f</b> registers are 64 bits wide, and the RVF instructions use only the lower 32 bits of each register.</p> <p>The <b>f</b> registers are enumerated in the following table with their ABI names and a description:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-left" scope="col">Register(s)</th> <th class="org-left" scope="col">ABI Name(s)</th> <th class="org-left" scope="col">Description</th> </tr></thead><tbody><tr><td class="org-left">f0-f7</td> <td class="org-left">ft0-ft7</td> <td class="org-left">Temporary</td> </tr><tr><td class="org-left">f8-f9</td> <td class="org-left">fs0-fs1</td> <td class="org-left">Saved register</td> </tr><tr><td class="org-left">f10-f11</td> <td class="org-left">fa0-fa1</td> <td class="org-left">Function argument/Return value</td> </tr><tr><td class="org-left">f12-f17</td> <td class="org-left">fa2-fa7</td> <td class="org-left">Function argument</td> </tr><tr><td class="org-left">f18-f27</td> <td class="org-left">fs2-fs11</td> <td class="org-left">Saved register</td> </tr><tr><td class="org-left">f28-f31</td> <td class="org-left">ft8-ft11</td> <td class="org-left">Temporary</td> 
</tr></tbody></table><p>These registers roughly mirror the base integer registers, with two notable exceptions: unlike <b>x0</b>, <b>f0</b> is not hardwired to 0 and can be used just like every other register. Moreover, there are no registers to manage return addresses, stacks, globals, or threads; the <b>f</b> registers in the corresponding positions are used as temporaries.</p> <p>The convention for who is responsible for saving the contents of the registers is the same as for the equivalent base integer registers: the saved registers (<b>fs0</b>-<b>fs11</b>) must be preserved by the callee, while the temporary and argument registers (<b>ft0</b>-<b>ft11</b>, <b>fa0</b>-<b>fa7</b>) are not preserved across calls and must be saved by the caller if their values are needed afterwards.</p> <p>In addition to the 32 <b>f</b> registers, the RVF and RVD extensions define a status and control register: <b>fcsr</b>. The <b>frcsr</b> pseudo-instruction reads this register, storing its value into the destination integer register. Similarly, the <b>fscsr</b> pseudo-instruction copies the original value of <b>fcsr</b> into the destination integer register, and then writes the value of the source integer register to <b>fcsr</b>.</p> <p>The <b>fcsr</b> register holds the rounding mode used by floating-point operations. The rounding mode field (<b>frm</b>) occupies bits 5-7 of the register, and the <b>frrm</b> pseudo-instruction retrieves it.</p> <p>The <b>fcsr</b> register also contains flags to indicate exception conditions that have occurred during floating-point arithmetic since the flags were last cleared. 
These flags are:</p> <dl class="org-dl"><dt>NV</dt> <dd>Invalid operation (<b>fcsr</b>[4])</dd> <dt>DZ</dt> <dd>Divide by zero (<b>fcsr</b>[3])</dd> <dt>OF</dt> <dd>Overflow (<b>fcsr</b>[2])</dd> <dt>UF</dt> <dd>Underflow (<b>fcsr</b>[1])</dd> <dt>NX</dt> <dd>Inexact (<b>fcsr</b>[0])</dd> </dl><p>The floating-point exception flags can also be retrieved using the <b>frflags</b> pseudo-instruction, which copies their state into the specified integer register.</p> <p>The RVF and RVD extensions define two load instructions (<b>flw</b> and <b>fld</b>) and two store instructions (<b>fsw</b> and <b>fsd</b>). These mirror the base load and store instructions but use the <b>f</b> registers rather than the <b>x</b> integer registers; their addressing mode and format are the same as those of the <b>lw</b>, <b>ld</b>, <b>sw</b> and <b>sd</b> instructions.</p> <p>The RVF and RVD extensions also provide a set of arithmetic instructions including:</p> <ul class="org-ul"><li>fadd</li> <li>fsub</li> <li>fmul</li> <li>fdiv</li> <li>fsqrt</li> </ul><p>Each instruction has a single- and a double-precision variant, selected by adding a <b>.s</b> or <b>.d</b> suffix to the instruction, respectively.</p> <p>The floating-point arithmetic instructions operate only on the <b>f</b> registers, therefore the extensions also provide instructions to move data between the integer and floating-point registers (e.g. <b>fmv.d.x</b> and <b>fmv.x.d</b>) and to convert between integer and floating-point values (<b>fcvt</b>).</p> <p>The following function implementation will demonstrate some of these instructions. 
The function in <code>fvector.s</code> will multiply each element from an array of floating-point values by a floating-point scalar:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __vec_scalef <span class="linenr"> 4: </span> # a0: number of elements, 'n', in the array <span class="linenr"> 5: </span> # fa0: A double-precision floating-point scalar 'a' <span class="linenr"> 6: </span> # a1: Address of array of x[n] double-precision floating-point values. <span class="linenr"> 7: </span><span style="color: #87cefa;">__vec_scalef</span>: <span class="linenr"> 8: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr"> 9: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="linenr">10: </span> <span style="color: #00ffff;">beqz</span> a0, __vec_scalef_exit <span class="linenr">11: </span><span style="color: #87cefa;">__vec_scalef_loop</span>: <span class="coderef-off" id="coderef-load_double"><span class="linenr">12: </span> <span style="color: #00ffff;">fld</span> fa5,0(a1)</span> <span class="coderef-off" id="coderef-mul_double"><span class="linenr">13: </span> <span style="color: #00ffff;">fmul.d</span> fa5, fa5, fa0</span> <span class="coderef-off" id="coderef-store_double"><span class="linenr">14: </span> <span style="color: #00ffff;">fsd</span> fa5,0(a1)</span> <span class="linenr">15: </span> <span style="color: #00ffff;">addi</span> a1, a1,8 <span class="linenr">16: </span> <span style="color: #00ffff;">addi</span> a0, a0,-1 <span class="linenr">17: </span> <span style="color: #00ffff;">bnez</span> a0, __vec_scalef_loop <span class="linenr">18: </span><span style="color: #87cefa;">__vec_scalef_exit</span>: <span class="linenr">19: </span> <span style="color: 
#00ffff;">ld</span> ra, 24(sp) <span class="linenr">20: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">21: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>On each iteration, the next element of the array is loaded into <b>fa5</b> on line <a class="coderef" href="#coderef-load_double" onmouseout="CodeHighlightOff(this, 'coderef-load_double');" onmouseover="CodeHighlightOn(this, 'coderef-load_double');">12</a>. The loaded value is multiplied by the double-precision scalar in <b>fa0</b> on line <a class="coderef" href="#coderef-mul_double" onmouseout="CodeHighlightOff(this, 'coderef-mul_double');" onmouseover="CodeHighlightOn(this, 'coderef-mul_double');">13</a>, then stored back to the same memory location on line <a class="coderef" href="#coderef-store_double" onmouseout="CodeHighlightOff(this, 'coderef-store_double');" onmouseover="CodeHighlightOn(this, 'coderef-store_double');">14</a>. The pointer in <b>a1</b> is then advanced by 8 bytes (the size of a double) and the element count in <b>a0</b> is decremented. The loop therefore runs once per element until the count reaches zero.</p> <p>The source data for the function can be defined using the <b>.double</b> assembler directive, which stores double-precision floating-point values in successive memory double-words. The <b>.float</b> directive does the same for single-precision floating-point values, storing them in successive 32-bit words.</p> <p>There are many more instructions defined in the RVF and RVD extensions, enough to dedicate an entire chapter to the topic. Moreover, <b>qemu</b> support for the RVF and RVD extensions does not seem to be fully implemented in the version available in the Debian 10 packages. A more thorough investigation of these extensions will be reserved for a future chapter.</p> </div> </div> <div class="outline-2" id="outline-container-orgfdc44a9"> <h2 id="orgfdc44a9">Conclusion</h2> <div class="outline-text-2" id="text-orgfdc44a9"> <p>The RISC-V architecture is designed to be as simple as possible, but no simpler. 
Therefore, a building-block philosophy is followed, allowing chip designers to include as many or as few instructions as needed. This gives system designers the flexibility to satisfy cost, efficiency, and performance constraints specific to the application domain.</p> <p>Breaking out instructions into optional extensions is like having Lego bricks representing subsets of the total RISC-V ISA. In this chapter, the <b>M</b>, <b>A</b>, <b>F</b>, and <b>D</b> extensions were used to create a small library of functions. These functions can be reused in the future to perform more complex calculations and to synchronize memory access across hardware threads.</p> <p>In addition to these, there are two other optional standard extensions that were not covered in this chapter:</p> <dl class="org-dl"><dt>C</dt> <dd>Compressed instructions.</dd> <dt>V</dt> <dd>Vector instructions for SIMD processing.</dd> </dl><p>Discussion of these extensions will be reserved for a future chapter.</p> <p>In the next chapter, the privileged architecture will be described. It defines the privilege levels at which code can execute (machine, supervisor, and user modes), along with the mechanisms for handling traps and interrupts. That chapter will also use the utility functions defined so far to create more complex programs. The synchronization utilities will be particularly useful when dealing with interrupts.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-30T06:41:45+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Sat, 11/30/2019 - 01:41</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> Sat, 30 Nov 2019 06:41:45 +0000 MarcAdmin 30 at https://www.vociferousvoid.org/main https://www.vociferousvoid.org/main/riscv_bare_metal_chapter4#comments RISC-V Bare Metal Programming Chapter 3: A Link to the Past https://www.vociferousvoid.org/main/riscv_bare_metal_chapter3 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming Chapter 3: A Link to the Past</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>Previous chapters of the RISC-V bare metal programming tutorial have focused primarily on the assembler. In <a href="https://www.vociferousvoid.org/main/riscv_bare_metal_chapter2">chapter 2</a>, assembler directives were discussed along with their relationship to the positioning of code in an executable. The various sections in which code and data reside have well-defined semantics in the Executable and Linkable Format (ELF) specification. In this chapter, these semantics and the linking process will be examined in more detail.</p> <p>The typical programming workflow involves processing the source file with either an assembler (for assembly code) or a compiler (for higher-level programming languages such as C) to produce an object file. 
An object file cannot be run on its own because its references to memory addresses are relative to the code's position rather than absolute memory addresses. These relative addresses must be translated into absolute addresses before they make sense to the processor. For example, consider the following code, which calls the <b>sum</b> function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.text</span> <span style="color: #00ffff;">.align</span> 2 <span style="color: #00ffff;">.global</span> _start <span style="color: #00ffff;">.global</span> _stack_end <span style="color: #87cefa;">_start</span>: <span style="color: #00ffff;">li</span> a0, 5 <span style="color: #00ffff;">li</span> a1, 4 <span style="color: #00ffff;">la</span> sp,_stack_end <span style="color: #00ffff;">call</span> sum <span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop </pre></div> <p>It is assembled into the following object code:</p> <pre class="example"> <span class="linenr"> 1: </span>$ riscv64-unknown-elf-objdump -d main.o <span class="linenr"> 2: </span> <span class="linenr"> 3: </span>main.o: file format elf64-littleriscv <span class="linenr"> 4: </span> <span class="linenr"> 5: </span> <span class="linenr"> 6: </span>Disassembly of section .text: <span class="linenr"> 7: </span> <span class="linenr"> 8: </span>0000000000000000 &lt;_start&gt;: <span class="linenr"> 9: </span> 0: 00500513 li a0,5 <span class="linenr">10: </span> 4: 00400593 li a1,4 <span class="coderef-off" id="coderef-set_stack"><span class="linenr">11: </span> 8: 00000117 auipc sp,0x0</span> <span class="linenr">12: </span> c: 00010113 mv sp,sp <span class="coderef-off" id="coderef-call_sum"><span class="linenr">13: </span> 10: 00000097 auipc ra,0x0</span> <span class="linenr">14: </span> 14: 000080e7 jalr ra # 10 &lt;_start+0x10&gt; <span class="linenr">15: </span> <span class="linenr">16: </span>0000000000000018 &lt;stop&gt;: <span 
class="linenr">17: </span> 18: 0000006f j 18 &lt;stop&gt; <span class="linenr">18: </span> </pre><p>The <b>auipc</b> (Add Upper Immediate to PC) instruction on line <a class="coderef" href="#coderef-set_stack" onmouseout="CodeHighlightOff(this, 'coderef-set_stack');" onmouseover="CodeHighlightOn(this, 'coderef-set_stack');">11</a> adds 0 to the program counter and stores the result in the <b>sp</b> register. As a result, <b>sp</b> receives the address of the <b>auipc</b> instruction itself. It is followed by <b>mv sp,sp</b>, which is really <b>addi sp,sp,0</b> and, as assembled, has no effect. The purpose of this pair of instructions is to set the top of the stack, so the zero immediates in both instructions must be replaced with values that produce the stack's address. The linker is responsible for filling in these values, based on the address defined by the <strong>_stack_end</strong> symbol. The final linked application is shown in the following listing:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -d sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .text: 0000000080000000 &lt;_start&gt;: 80000000: 00500513 li a0,5 80000004: 00400593 li a1,4 80000008: 00009117 auipc sp,0x9 8000000c: ff810113 addi sp,sp,-8 # 80009000 &lt;_stack_end&gt; 80000010: 008000ef jal ra,80000018 &lt;sum&gt; 0000000080000014 &lt;stop&gt;: 80000014: 0000006f j 80000014 &lt;stop&gt; 0000000080000018 &lt;sum&gt;: 80000018: fe010113 addi sp,sp,-32 8000001c: 00113c23 sd ra,24(sp) 80000020: 00b50533 add a0,a0,a1 80000024: 01813083 ld ra,24(sp) 80000028: 02010113 addi sp,sp,32 8000002c: 00008067 ret </pre><p>In this listing, the <b>auipc</b> instruction will set <b>sp</b> to address 0x80009008 because:</p> <ul class="org-ul"><li><b>pc</b> is 0x80000008 (the address of the current instruction)</li> <li>The result of <b>auipc</b> is <b>sp</b> = <b>pc</b> + (0x9 &lt;&lt; 12) = 0x80000008 + 0x9000</li> </ul><p>The next instruction (an <b>addi</b> with an immediate of -8) subtracts 8 from the value of <b>sp</b>, setting the top of the stack at memory address 0x80009000.</p> <p>Similarly, the next two 
instructions starting at line <a class="coderef" href="#coderef-call_sum" onmouseout="CodeHighlightOff(this, 'coderef-call_sum');" onmouseover="CodeHighlightOn(this, 'coderef-call_sum');">13</a> (offsets 0x10 and 0x14) of the object file are placeholders for the call to the <b>sum</b> subroutine. The <b>auipc</b> instruction sets the <b>ra</b> register to the current value of the program counter (<b>ra</b> = <b>pc</b> + (0x0 &lt;&lt; 12)). The next instruction then jumps to the address in <b>ra</b> (offset 0x10) while setting <b>ra</b> to <b>pc</b> + 4 (0x18). If the program could execute as-is, this would result in an infinite loop. The proper memory offsets need to be filled in by the linker.</p> <p>In the final linked program, these two instructions are replaced by a single <b>jal</b>. It sets <b>ra</b> (the return address) to the instruction at 0x80000014 (the infinite loop), then jumps to address 0x80000018 (the start of the <b>sum</b> subroutine). The linker can shrink the pair to one instruction because <b>sum</b> lies within the ±1 MiB range of <b>jal</b>. This optimization is known as linker relaxation, and it is why <b>stop</b> now begins at 0x80000014 rather than at offset 0x18.</p> <div class="outline-2" id="outline-container-orgaab4bce"> <h1 id="orgaab4bce">The Adventure of Link</h1> <div class="outline-text-2" id="text-1"> <p>Although the linking phase may seem like magic, it is largely under the control of the developer via the linker script. Chapter 2 introduced an example linker script, but the explanation of its purpose was very superficial. In this chapter, the process of linking the application will be studied more thoroughly. The linker script from chapter 2 is illustrated in the following listing:</p> <pre class="example"> <span class="linenr"> 1: </span>OUTPUT_ARCH( "riscv" ) <span class="linenr"> 2: </span>SECTIONS { <span class="linenr"> 3: </span> . 
= 0x80000000; <span class="linenr"> 4: </span> .text : { <span class="linenr"> 5: </span> PROVIDE(_text_start = .); <span class="linenr"> 6: </span> * (.text.init); <span class="linenr"> 7: </span> * (.text .text.*); <span class="linenr"> 8: </span> PROVIDE(_text_end = .); <span class="linenr"> 9: </span> } <span class="linenr">10: </span> PROVIDE(_global_pointer = .); <span class="linenr">11: </span> .rodata : { <span class="linenr">12: </span> PROVIDE(_rodata_start = .); <span class="linenr">13: </span> *(.srodata .srodata.*) *(.rodata .rodata.*) <span class="linenr">14: </span> PROVIDE(_rodata_end = .); <span class="linenr">15: </span> } <span class="linenr">16: </span> .data : { <span class="linenr">17: </span> . = ALIGN(4096); <span class="linenr">18: </span> PROVIDE(_data_start = .); <span class="linenr">19: </span> *(.sdata .sdata.*) *(.data .data.*) <span class="linenr">20: </span> PROVIDE(_data_end = .); <span class="linenr">21: </span> } <span class="linenr">22: </span> .bss : { <span class="linenr">23: </span> PROVIDE(_bss_start = .); <span class="linenr">24: </span> *(.sbss .sbss.*) *(.bss .bss.*) <span class="linenr">25: </span> PROVIDE(_bss_end = .); <span class="linenr">26: </span> } <span class="linenr">27: </span> PROVIDE(_stack_start = _bss_end); <span class="linenr">28: </span> PROVIDE(_stack_end = _stack_start + 0x8000); <span class="linenr">29: </span>} </pre><p>The first statement in the linker script sets the architecture of the target machine; in this case RISC-V.</p> <p>The more important command is the <b>SECTIONS</b> statement which defines the different sections of the ELF file. As discussed in chapter 2, code and data have different memory requirements. Typically code will be stored in read-only memory and data will be stored in memory that can be read-only or writable depending on the constraints on the data. 
The <b>SECTIONS</b> declaration is used to prescribe how the code and data will be organized in the final binary.</p> <p>Within the <b>SECTIONS</b> block of the script, the period (.) is a special token that represents the location counter. This is essentially the current offset in memory. By default the location counter starts at offset 0, but the current position can be set explicitly by assigning to the period token. The VirtIO board begins executing the loaded program at memory address 0x80000000 (the start of its RAM). This address is therefore assigned to the location counter to ensure that this area of memory is populated by the linked binary.</p> </div> <div class="outline-3" id="outline-container-org4516f2c"> <h2 id="org4516f2c">Code</h2> <div class="outline-text-3" id="text-1-1"> <p>Once the location counter is initialized, the next statement in the linker script is the declaration of the .text section. This is where all the executable code is expected to be found. The section is declared by specifying its name (.text), followed by a colon (:) and a pair of braces that enclose the statements specific to the current section.</p> <p>The first statement in the .text section block is a <b>PROVIDE</b> command. This defines a linker symbol, but only if the symbol is referenced and not defined by any of the input object files. In this case, it defines a global symbol named <b>_text_start</b> whose value is the current value of the location counter. This symbol can be used to refer to the starting address of the .text section in memory.</p> <p>The next statement uses wildcards to collect the code in the <b>.text.init</b> sections of all object files provided to the linker. The statement after it collects the code from the <b>.text</b> and <b>.text.*</b> sections of all the provided object files. The latter pattern matches any section name prefixed with <b>.text.</b>.</p> <p>Note that the order in which object files are provided to the linker matters. 
If the <b>add.o</b> and <b>main.o</b> files are linked with the add function provided first, the resulting binary will have the following layout:</p> <pre class="example"> $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o $ riscv64-unknown-elf-objdump -d sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .text: 0000000080000000 &lt;sum&gt;: 80000000: fe010113 addi sp,sp,-32 80000004: 00113c23 sd ra,24(sp) 80000008: 00b50533 add a0,a0,a1 8000000c: 01813083 ld ra,24(sp) 80000010: 02010113 addi sp,sp,32 80000014: 00008067 ret 0000000080000018 &lt;_start&gt;: 80000018: 00500513 li a0,5 8000001c: 00400593 li a1,4 80000020: 00009117 auipc sp,0x9 80000024: fe010113 addi sp,sp,-32 # 80009000 &lt;_stack_end&gt; 80000028: fd9ff0ef jal ra,80000000 &lt;sum&gt; 000000008000002c &lt;stop&gt;: 8000002c: 0000006f j 8000002c &lt;stop&gt; </pre><p>This is clearly not the desired result since the first instruction will create a stack frame, save the return address (which is undefined), then perform the add with un-initialized argument registers. 
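The immediates the linker filled into these listings can be checked by hand. The following C sketch uses hypothetical helpers (sign_extend, u_imm, i_imm and j_imm, which are not part of the tutorial's code). They extract the immediate fields of the U-type (<b>auipc</b>), I-type (<b>addi</b>) and J-type (<b>jal</b>) encodings from 32-bit instruction words copied out of the objdump output.

```c
/* Hypothetical helpers for checking the immediates patched in by the linker.
 * Instruction words are passed as unsigned long long so that no headers are
 * needed; only the low 32 bits are meaningful. */

/* Interpret the low `bits` bits of `value` as a two's-complement number. */
long long sign_extend(unsigned long long value, int bits)
{
    unsigned long long sign_bit = 1ULL << (bits - 1);
    value &= (1ULL << bits) - 1;
    return (value & sign_bit) ? (long long)value - (long long)(sign_bit << 1)
                              : (long long)value;
}

/* U-type (auipc): imm[31:12] sits in bits 31:12, the low 12 bits are zero. */
long long u_imm(unsigned long long insn)
{
    return sign_extend(insn & 0xFFFFF000ULL, 32);
}

/* I-type (addi): imm[11:0] sits in bits 31:20. */
long long i_imm(unsigned long long insn)
{
    return sign_extend(insn >> 20, 12);
}

/* J-type (jal): bits 31:12 hold imm[20|10:1|11|19:12]; imm[0] is always 0. */
long long j_imm(unsigned long long insn)
{
    unsigned long long imm = ((insn >> 31) & 0x1) << 20    /* imm[20]    */
                           | ((insn >> 21) & 0x3FF) << 1   /* imm[10:1]  */
                           | ((insn >> 20) & 0x1) << 11    /* imm[11]    */
                           | ((insn >> 12) & 0xFF) << 12;  /* imm[19:12] */
    return sign_extend(imm, 21);
}
```

For instance, j_imm(0xfd9ff0ef) yields -40, so the <b>jal</b> at 0x80000028 lands on <b>sum</b> at 0x80000000. Likewise, 0x80000020 + u_imm(0x00009117) + i_imm(0xfe010113) gives 0x80009000, the address of _stack_end.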
To ensure that the initialization comes before the function implementation, the <code>main.s</code> file must be changed to add its code to the <b>.text.init</b> section:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-init_section"><span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span></span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">li</span> a0, 5 <span class="linenr"> 7: </span> <span style="color: #00ffff;">li</span> a1, 4 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop </pre></div> <p>The <code>main.s</code> file uses the <b>.section</b> assembler directive on line <a class="coderef" href="#coderef-init_section" onmouseout="CodeHighlightOff(this, 'coderef-init_section');" onmouseover="CodeHighlightOn(this, 'coderef-init_section');">1</a> to declare that all code that follows should be placed in the <b>.text.init</b> section. Because the linker script collects the <b>.text.init</b> sections before any other code, this ensures that the initialization code always comes first. 
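The effect of the two wildcard rules can be pictured with a simplified model of the location counter. The C sketch below is an illustration under stated assumptions, not how GNU ld is implemented. The place_text function and input_section structure are hypothetical, and only the .text.init and .text rules are modelled. Alignment is ignored, and the 0x18-byte sizes of <b>sum</b> and <b>_start</b> are read off the linked listings in this chapter.

```c
/* Hypothetical, simplified model of the two .text rules in linker.lds:
 *     * (.text.init);
 *     * (.text .text.*);
 * Every .text.init input section is placed first, then every .text input
 * section, each group in command-line order. Alignment is ignored. */
struct input_section {
    const char *symbol;          /* label at the start of the section */
    int in_text_init;            /* 1 if assembled into .text.init, else 0 */
    unsigned long long size;     /* size in bytes */
    unsigned long long address;  /* assigned by place_text() */
};

void place_text(struct input_section *sections, int count,
                unsigned long long origin)
{
    unsigned long long location = origin;   /* plays the role of "." */

    /* The first pass handles the .text.init rule, the second the .text rule. */
    for (int pass = 0; pass < 2; pass++) {
        int want_init = (pass == 0);
        for (int i = 0; i < count; i++) {
            if (sections[i].in_text_init == want_init) {
                sections[i].address = location;
                location += sections[i].size;
            }
        }
    }
}
```

The model deliberately groups by rule before file order. That grouping is what lets a single `.section ".text.init"` line override the command-line order, while code within each group keeps the order in which the files were given.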
The linked program will now have the proper layout regardless of the order in which the object files are provided to the linker.</p> <pre class="example"> $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o $ riscv64-unknown-elf-objdump -d sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .text: 0000000080000000 &lt;_start&gt;: 80000000: 00500513 li a0,5 80000004: 00400593 li a1,4 80000008: 00009117 auipc sp,0x9 8000000c: ff810113 addi sp,sp,-8 # 80009000 &lt;_stack_end&gt; 80000010: 008000ef jal ra,80000018 &lt;sum&gt; 0000000080000014 &lt;stop&gt;: 80000014: 0000006f j 80000014 &lt;stop&gt; 0000000080000018 &lt;sum&gt;: 80000018: fe010113 addi sp,sp,-32 8000001c: 00113c23 sd ra,24(sp) 80000020: 00b50533 add a0,a0,a1 80000024: 01813083 ld ra,24(sp) 80000028: 02010113 addi sp,sp,32 8000002c: 00008067 ret </pre><p>The <b>_start</b> code is located at address 0x80000000 as expected, even though the object file for the <b>sum</b> function was provided to the linker first. Now that the code is properly organized, let's look at the data.</p> </div> </div> <div class="outline-3" id="outline-container-org1578180"> <h2 id="org1578180">Data</h2> <div class="outline-text-3" id="text-1-2"> <p>The linker script defines three data sections: .rodata, .data, and .bss. It may seem odd to have four sections (counting the .text section) when a program consists of only two types of information: code and data. The reason is that data can be divided into three categories:</p> <ul class="org-ul"><li>Global Read-only Data</li> <li>Global Initialized Data</li> <li>Global Un-initialized Data</li> </ul><p>Local data is not considered here because it is created at run time and stored either in stack memory or in an allocated heap buffer. For now the focus will be on global data. 
The differences in each type of data can be illustrated using a simple C program:</p> <div class="org-src-container"> <pre class="src src-c"> <span class="coderef-off" id="coderef-rodata"><span class="linenr"> 1: </span><span style="color: #00ffff;">const</span> <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">operand1</span> = 4 ;</span> <span class="coderef-off" id="coderef-data"><span class="linenr"> 2: </span><span style="color: #98fb98;">int</span> <span style="color: #eedd82;">operand2</span> = 5 ;</span> <span class="coderef-off" id="coderef-bss"><span class="linenr"> 3: </span><span style="color: #98fb98;">int</span> <span style="color: #eedd82;">result</span> ;</span> <span class="linenr"> 4: </span> <span class="linenr"> 5: </span><span style="color: #98fb98;">int</span> <span class="linenr"> 6: </span><span style="color: #87cefa;">sum</span>( <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">op1</span>, <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">op2</span> ) <span class="linenr"> 7: </span>{ <span class="linenr"> 8: </span> <span style="color: #00ffff;">return</span> op1 + op2 ; <span class="linenr"> 9: </span>} <span class="linenr">10: </span> <span class="linenr">11: </span><span style="color: #98fb98;">int</span> <span class="linenr">12: </span><span style="color: #87cefa;">main</span>( <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">argc</span>, <span style="color: #98fb98;">char</span>** <span style="color: #eedd82;">argv</span> ) <span class="linenr">13: </span>{ <span class="linenr">14: </span> result = sum( operand1, operand2 ) ; <span class="linenr">15: </span> <span style="color: #00ffff;">while</span> ( 1 ) ; <span class="linenr">16: </span>} </pre></div> </div> <div class="outline-4" id="outline-container-org600643c"> <h3 id="org600643c">Global Read-only Data</h3> <div class="outline-text-4" id="text-1-2-1"> <p>The <b>operand1</b> variable is 
declared on line <a class="coderef" href="#coderef-rodata" onmouseout="CodeHighlightOff(this, 'coderef-rodata');" onmouseover="CodeHighlightOn(this, 'coderef-rodata');">1</a>. This is a read-only value initialized to the integer 4. The compiler will store this data in the <b>.srodata</b> section of the object file:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -s -j.srodata addc.o addc.o: file format elf64-littleriscv Contents of section .srodata: 0000 04000000 .... </pre><p>Assembler directives can also be used to populate the .rodata section in an assembly program. The following fragment can be added to the end of the main.s program to declare the <b>operand1</b> constant in the .rodata section:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr">1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="coderef-off" id="coderef-operand1_def"><span class="linenr">2: </span><span style="color: #87cefa;">operand1</span>: .word 4</span> </pre></div> <p>The constant <b>operand1</b> is defined on line <a class="coderef" href="#coderef-operand1_def" onmouseout="CodeHighlightOff(this, 'coderef-operand1_def');" onmouseover="CodeHighlightOn(this, 'coderef-operand1_def');">2</a> using the <b>.word</b> assembler directive, which stores the given value as a 32-bit quantity in the current memory word. Any number of words can be stored by specifying the values as a comma-separated list. For example, the following directive stores the 32-bit values 1, 2, and 3 in successive memory words:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.word</span> 1, 2, 3 </pre></div> <p>If this program is linked using the example linker script, this data will be included in the <b>.rodata</b> section. 
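The objdump output for .srodata shows operand1 as 04000000. This is because objdump lists the bytes in address order, and RISC-V stores multi-byte values in little-endian order (least significant byte first). The following C sketch reproduces the bytes that a .word directive places in successive memory words. It is built around a hypothetical emit_words helper and models only the resulting byte layout, not the assembler itself.

```c
/* Hypothetical helper: write each value as a 32-bit little-endian word,
 * the byte layout that .word produces on RISC-V. `out` must have room
 * for 4 * count bytes. */
void emit_words(const unsigned long *values, int count, unsigned char *out)
{
    for (int i = 0; i < count; i++) {
        unsigned long word = values[i] & 0xFFFFFFFFUL;   /* keep 32 bits */
        for (int b = 0; b < 4; b++) {
            /* Byte b holds bits 8*b .. 8*b+7: least significant byte first. */
            out[4 * i + b] = (unsigned char)(word >> (8 * b));
        }
    }
}
```

Calling it with the values 1, 2, and 3 produces the byte sequence 01 00 00 00 02 00 00 00 03 00 00 00, one 4-byte word per value.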
The script instructs the linker to aggregate all <b>.srodata</b> and <b>.rodata</b> sections, as well as sections whose names are prefixed with <b>.rodata.</b> or <b>.srodata.</b>, into a single <b>.rodata</b> section.</p> </div> </div> <div class="outline-4" id="outline-container-org624b212"> <h3 id="org624b212">Global Initialized Data</h3> <div class="outline-text-4" id="text-1-2-2"> <p>The <b>operand2</b> variable is declared on line <a class="coderef" href="#coderef-data" onmouseout="CodeHighlightOff(this, 'coderef-data');" onmouseover="CodeHighlightOn(this, 'coderef-data');">2</a>. This is a mutable variable initialized with the integer value 5. The compiler will store this data in the <b>.sdata</b> section of the object file.</p> <pre class="example"> $ riscv64-unknown-elf-objdump -s -j.sdata addc.o addc.o: file format elf64-littleriscv Contents of section .sdata: 0000 05000000 .... </pre><p>The <code>linker.lds</code> script instructs the linker to aggregate all data declared in the <b>.sdata</b> and <b>.data</b> sections, as well as all sections prefixed by <b>.data.</b> or <b>.sdata.</b>, into a single <b>.data</b> section.</p> <p>The <b>operand2</b> global value can similarly be defined using assembler directives. 
The relevant assembly code is illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.data</span> <span style="color: #87cefa;">operand2</span>: .word 5 </pre></div> <p>The <b>.data</b> assembler directive ensures that all following instructions or directives will affect the <b>.data</b> section.</p> </div> </div> <div class="outline-4" id="outline-container-orga0973e2"> <h3 id="orga0973e2">Global Un-initialized Data</h3> <div class="outline-text-4" id="text-1-2-3"> <p>The <b>result</b> variable is declared on line <a class="coderef" href="#coderef-bss" onmouseout="CodeHighlightOff(this, 'coderef-bss');" onmouseover="CodeHighlightOn(this, 'coderef-bss');">3</a> of the C program. This variable is said to be un-initialized because it is not assigned a value at build time. The C language guarantees that un-initialized global variables will be initialized to zero. The system is responsible for initializing this data to zero before handing control over to the C program. This is easier to do if un-initialized data is aggregated into a common section. The loader can then simply zero out the entire section, in this case the <b>.bss</b> section:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -D -j.bss sumc.o sumc.o: file format elf64-littleriscv Disassembly of section .bss: 0000000000000000 &lt;result&gt;: 0: 0000 unimp ... </pre><p>The <b>unimp</b> shown here is only <b>objdump</b> decoding the zero bytes of <b>result</b> as if they were an instruction. The linker script will instruct the linker to aggregate all data definitions in the <b>.sbss</b> or <b>.bss</b> sections, as well as those in section names prefixed by <b>.sbss.</b> or <b>.bss.</b>, into a single <b>.bss</b> section.</p> <p>The <b>result</b> global variable can also be defined using assembler directives. 
The relevant assembly code is illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.bss</span> <span style="color: #87cefa;">result</span>: .word 0 </pre></div> <p>The <b>.bss</b> assembler directive ensures that the <b>result</b> memory location is placed in the <b>.bss</b> section of the binary file. Note that the <b>.word 0</b> directive is still required even though this is un-initialized data. A label on its own reserves no memory, so without the directive no storage would be allocated to the <b>result</b> variable. Only a zero value makes sense here, because the <b>.bss</b> section occupies no space in the file and the whole section is expected to be zero-filled before the program runs. This satisfies the C language guarantee.</p> </div> </div> </div> <div class="outline-3" id="outline-container-org33398eb"> <h2 id="org33398eb">The Final Boss</h2> <div class="outline-text-3" id="text-1-3"> <p>The assembly program to add two integer values can be updated to read its operands from memory. The following listing shows the updated <code>main.s</code> assembly code:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="coderef-off" id="coderef-load_operand2"><span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand2</span> <span class="coderef-off" id="coderef-load_operand1"><span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand1</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum 
<span class="coderef-off" id="coderef-load_result"><span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result</span> <span class="coderef-off" id="coderef-save_result"><span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1)</span> <span class="linenr">12: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">13: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">14: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">15: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">16: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">17: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">18: </span><span style="color: #87cefa;">result</span>: .word 0 </pre></div> <p>The value of <b>operand2</b> is loaded into the function argument register <b>a0</b> on line <a class="coderef" href="#coderef-load_operand2" onmouseout="CodeHighlightOff(this, 'coderef-load_operand2');" onmouseover="CodeHighlightOn(this, 'coderef-load_operand2');">6</a>. The value of <b>operand1</b> is loaded into the function argument register <b>a1</b> on line <a class="coderef" href="#coderef-load_operand1" onmouseout="CodeHighlightOff(this, 'coderef-load_operand1');" onmouseover="CodeHighlightOn(this, 'coderef-load_operand1');">7</a>. The <b>sum</b> function is called with those operands. The result, returned in <b>a0</b>, is saved on line <a class="coderef" href="#coderef-save_result" onmouseout="CodeHighlightOff(this, 'coderef-save_result');" onmouseover="CodeHighlightOn(this, 'coderef-save_result');">11</a> in the memory word allocated for <b>result</b> in the <b>.bss</b> section. 
Before saving the result, the memory location of the <b>result</b> variable must first be loaded into a register; this is accomplished using the <b>la</b> instruction on line <a class="coderef" href="#coderef-load_result" onmouseout="CodeHighlightOff(this, 'coderef-load_result');" onmouseover="CodeHighlightOn(this, 'coderef-load_result');">10</a>.</p> <p>This program can be assembled and linked with the <code>add.s</code> program as follows:</p> <pre class="example"> $ riscv64-unknown-elf-as -o main.o main.s $ riscv64-unknown-elf-as -o add.o add.s $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o </pre><p>The memory address of the <b>result</b> variable can be obtained by inspecting the resulting ELF file:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -D -j.bss sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .bss: 0000000080001004 &lt;result&gt;: 80001004: 0000 unimp ... </pre><p>This address follows from the linker script. The <b>.data</b> section is aligned to a 4096-byte boundary, so <b>operand2</b> is placed at 0x80001000. The <b>.bss</b> section, which holds <b>result</b>, follows it at 0x80001004.</p> <p>The program can now be tested using the qemu emulator. Given that the memory location of the <b>result</b> variable is 0x80001004, we can inspect this memory location from the QEMU monitor to ensure that it contains the expected result. The <b>xp</b> command examines physical memory, and the <b>/1dw</b> format prints one 32-bit word as a decimal number:</p> <pre class="example"> $ qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel sum.elf QEMU 3.1.0 monitor - type 'help' for more information (qemu) xp /1dw 0x80001004 0000000080001004: 9 (qemu) </pre><p>The memory location corresponding to the <b>result</b> variable does in fact contain the value 9 (the sum of 4 and 5), so the program is behaving as expected.</p> </div> </div> </div> <div class="outline-2" id="outline-container-org944c620"> <h1 id="org944c620">Conclusion</h1> <div class="outline-text-2" id="text-2"> <p>In this chapter, the linking process was studied in more detail. The process of linking the object files into a final binary was hopefully demystified by describing how relative offsets are transformed into absolute memory addresses. 
The primary example was setting the location of the top of the stack and setting up the <b>sum</b> function call.</p> <p>The use of a linker script illustrates how code and data can be organized more deliberately in a binary file. The greater flexibility of the linker script allowed the original <code>add.s</code> program from chapter 1 to be enhanced: the operands for the sum are now read from memory, each from a different data section. Moreover, additional assembler directives were introduced in this chapter to allow better control over how code and data are placed in the linked binary file. In particular:</p> <dl class="org-dl"><dt>.text</dt> <dd>Specify that what follows goes into the .text section.</dd> <dt>.data</dt> <dd>Specify that what follows goes into the .data section.</dd> <dt>.bss</dt> <dd>Specify that what follows goes into the .bss section.</dd> <dt>.section</dt> <dd>Set the section name explicitly for the code and data that follows.</dd> <dt>.word</dt> <dd>Store the specified 32-bit quantities into successive memory words.</dd> </dl><p>Other useful assembler directives that were not covered in this chapter include:</p> <dl class="org-dl"><dt>.byte</dt> <dd>Store the specified 8-bit quantities into successive bytes of memory.</dd> <dt>.half</dt> <dd>Store the specified 16-bit quantities into successive memory half-words.</dd> <dt>.dword</dt> <dd>Store the specified 64-bit quantities into successive memory double-words.</dd> <dt>.string</dt> <dd>Store a string in memory and null-terminate it.</dd> </dl><p>The following chapters will start to look at some of the standard extensions of the RISC-V ISA. So far only RV64I instructions have been used, whereas the ISA defines extensions for multiplication (RVM), atomic operations (RVA), single- and double-precision floating point operations (RVF and RVD), and compressed instructions (RVC). 
These extensions will be useful for building more complex programs.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-12T17:27:29+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Tue, 11/12/2019 - 12:27</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> RISC-V Bare Metal Programming Chapter 2: OpCodes Assemble! https://www.vociferousvoid.org/main/riscv_bare_metal_chapter2 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming Chapter 2: OpCodes Assemble!</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>The <a href="/mmlab/riscv_bare_metal_chapter1">previous chapter</a> of this tutorial went over the steps required to set up a RISC-V development environment and to create a program that runs on a bare metal VirtIO board using QEMU. Even though the example program – which calculates the sum of two integers – was written in RISC-V assembly, no prior knowledge was required to follow along. This chapter will dive into the details of RISC-V assembly language and expand on what exactly happens at each step of the development. 
The topics covered in this chapter will include an overview of the RISC-V architecture, its assembly instructions, pseudo-instructions, and directives.</p> <p>The following listing illustrates the assembly code of the <b>add.s</b> program from the previous chapter:</p> <div class="org-src-container"> <pre class="src src-asm"><span class="coderef-off" id="coderef-section"><span class="linenr">1: </span>        <span style="color: #00ffff;">.text</span></span>
<span class="coderef-off" id="coderef-start_sym"><span class="linenr">2: </span>        <span style="color: #00ffff;">.global</span> _start</span>
<span class="linenr">3: </span><span style="color: #87cefa;">_start</span>:
<span class="coderef-off" id="coderef-load5"><span class="linenr">4: </span>        <span style="color: #00ffff;">li</span>   a0, 5</span>
<span class="coderef-off" id="coderef-load4"><span class="linenr">5: </span>        <span style="color: #00ffff;">li</span>   a1, 4</span>
<span class="coderef-off" id="coderef-add"><span class="linenr">6: </span>        <span style="color: #00ffff;">add</span>  a0, a0, a1</span>
<span class="coderef-off" id="coderef-loop"><span class="linenr">7: </span><span style="color: #87cefa;">stop</span>:   <span style="color: #00ffff;">j</span>    stop</span>
</pre></div> <p>The code was changed slightly to use a different set of registers. When assembled, this program results in an object file which can then be linked to create the file that will be loaded onto the board. In this scenario only one object file is used; however, more complex programs may require more than one object file. 
The linker's job is to put all of these files together into a single executable program.</p> <p>This <b>add.s</b> program is composed of one instruction (<a class="coderef" href="#coderef-add" onmouseout="CodeHighlightOff(this, 'coderef-add');" onmouseover="CodeHighlightOn(this, 'coderef-add');">6</a>), three pseudo-instructions (<a class="coderef" href="#coderef-load5" onmouseout="CodeHighlightOff(this, 'coderef-load5');" onmouseover="CodeHighlightOn(this, 'coderef-load5');">4</a>, <a class="coderef" href="#coderef-load4" onmouseout="CodeHighlightOff(this, 'coderef-load4');" onmouseover="CodeHighlightOn(this, 'coderef-load4');">5</a>, and <a class="coderef" href="#coderef-loop" onmouseout="CodeHighlightOff(this, 'coderef-loop');" onmouseover="CodeHighlightOn(this, 'coderef-loop');">7</a>), and two directives (<a class="coderef" href="#coderef-section" onmouseout="CodeHighlightOff(this, 'coderef-section');" onmouseover="CodeHighlightOn(this, 'coderef-section');">1</a>, <a class="coderef" href="#coderef-start_sym" onmouseout="CodeHighlightOff(this, 'coderef-start_sym');" onmouseover="CodeHighlightOn(this, 'coderef-start_sym');">2</a>). Moreover, the instructions and pseudo-instructions have operands consisting of either registers or immediates. Understanding each of these entities will help when creating more complex programs in RISC-V assembly.</p> <div class="outline-2" id="outline-container-org71d0bd6"> <h1 id="org71d0bd6">Registers</h1> <div class="outline-text-2" id="text-1"> <p>RISC-V systems have a base set of 32 integer registers, <b>x0</b>-<b>x31</b>. The <b>x0</b> register is hard-wired to zero: reads always return zero and writes are ignored. The remaining registers are general purpose. The application binary interface (ABI) prescribes conventions for the names and usage of the various registers. 
These are listed in the following table:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-left" scope="col">Register(s)</th> <th class="org-left" scope="col">ABI Name(s)</th> <th class="org-left" scope="col">Description</th> <th class="org-left" scope="col">Saved by</th> </tr></thead><tbody><tr><td class="org-left">x0</td> <td class="org-left">zero</td> <td class="org-left">Hard-wired zero</td> <td class="org-left">N/A</td> </tr><tr><td class="org-left">x1</td> <td class="org-left">ra</td> <td class="org-left">Function return address</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x2</td> <td class="org-left">sp</td> <td class="org-left">Stack pointer</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x3</td> <td class="org-left">gp</td> <td class="org-left">Global pointer</td> <td class="org-left">N/A</td> </tr><tr><td class="org-left">x4</td> <td class="org-left">tp</td> <td class="org-left">Thread pointer</td> <td class="org-left">N/A</td> </tr><tr><td class="org-left">x5</td> <td class="org-left">t0</td> <td class="org-left">Temporary/alternate link register</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x6-x7</td> <td class="org-left">t1-t2</td> <td class="org-left">Temporary values</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x8</td> <td class="org-left">s0/fp</td> <td class="org-left">Saved register/Frame pointer</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x9</td> <td class="org-left">s1</td> <td class="org-left">Saved register</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x10-x11</td> <td class="org-left">a0-a1</td> <td class="org-left">Function arguments/Return values</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x12-x17</td> <td 
class="org-left">a2-a7</td> <td class="org-left">Function arguments</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x18-x27</td> <td class="org-left">s2-s11</td> <td class="org-left">Saved registers</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x28-x31</td> <td class="org-left">t3-t6</td> <td class="org-left">Temporary values</td> <td class="org-left">Caller</td> </tr></tbody></table><p>Registers can be referred to by either their ABI names or their architectural names (<b>x0</b>-<b>x31</b>) in assembly programs; the two are interchangeable.</p> <p>When a function is invoked, it may modify the values of some of these registers. For this reason, their contents may need to be saved to memory so that they can be restored when the function completes. The ABI convention prescribes which party in a function call (the caller or the callee) is responsible for saving each register, as shown in the "Saved by" column of the table.</p> <p>If a register is to be saved by the caller, its value should be stored in a stack frame allocated for that purpose before calling the function, so that it can be restored when the function returns. Because the caller generally cannot know which of these registers the callee will modify, it should save every caller-saved register whose value it still needs after the call.</p> <p>Registers to be saved by the callee only need to be saved to memory if the function modifies those registers. 
Functions must not leave a trace: when a function returns, every callee-saved register (including the stack pointer) must hold the value it had before the function was invoked. Only the result registers and the caller-saved registers may differ.</p> <p>A function implemented in RISC-V assembly should begin with the following prologue before doing any of its work:</p> <div class="org-src-container"> <pre class="src src-asm"><span style="color: #87cefa;">function_label</span>:
        <span style="color: #00ffff;">addi</span> sp, sp, -framesize   # The stack grows downward; framesize is a multiple of 16
        <span style="color: #00ffff;">sd</span>   ra, framesize-8(sp)   # Save the return address
        # Save the callee-saved registers that this function uses.
</pre></div> <p>This ensures that the function can return to the point where it was called and that the callee-saved register state is preserved.</p> <p>Before a function returns, the saved register values must be restored. This is achieved by the following epilogue, which should end every function:</p> <div class="org-src-container"> <pre class="src src-asm">        # Restore the callee-saved registers from the stack if needed.
        <span style="color: #00ffff;">ld</span>   ra, framesize-8(sp)   # Restore the return address register
        <span style="color: #00ffff;">addi</span> sp, sp, framesize    # Pop the stack frame
        <span style="color: #00ffff;">ret</span>                        # Return to the caller
</pre></div> <p>This restores the saved registers and the return address, and de-allocates the stack frame that was used to save this information.</p> <p>The <b>add.s</b> program can be enhanced to use the function prologue and epilogue to define a function that calculates the sum of its arguments in registers <b>a0</b> and <b>a1</b>. 
This new program is illustrated in the listing that follows:</p> <div class="org-src-container"> <pre class="src src-asm">        <span style="color: #00ffff;">.text</span>
        <span style="color: #00ffff;">.align</span> 2
        <span style="color: #00ffff;">.global</span> sum
<span style="color: #87cefa;">sum</span>:
        <span style="color: #00ffff;">addi</span> sp, sp, -32     # Stack frames must be 16-byte aligned
        <span style="color: #00ffff;">sd</span>   ra, 24(sp)      # Save the return address
        <span style="color: #00ffff;">add</span>  a0, a0, a1      # Add the function operands
        <span style="color: #00ffff;">ld</span>   ra, 24(sp)      # Restore the return address
        <span style="color: #00ffff;">addi</span> sp, sp, 32      # De-allocate the stack frame
        <span style="color: #00ffff;">ret</span>
</pre></div> <p>The <b>sum</b> function can be called by name from a different assembly source file. Create a <b>main.s</b> program with the following content:</p> <div class="org-src-container"> <pre class="src src-asm">        <span style="color: #00ffff;">.text</span>
        <span style="color: #00ffff;">.align</span> 2
        <span style="color: #00ffff;">.global</span> _start
<span style="color: #87cefa;">_start</span>:
        <span style="color: #00ffff;">li</span>   a0, 5
        <span style="color: #00ffff;">li</span>   a1, 4
        <span style="color: #00ffff;">call</span> sum
<span style="color: #87cefa;">stop</span>:   <span style="color: #00ffff;">j</span>    stop
</pre></div> <p>This loads the values 5 and 4 into the argument registers of the <b>sum</b> function and calls it. This can be assembled and linked as follows:</p> <pre class="example">
$ riscv64-unknown-elf-as -o add.o add.s
$ riscv64-unknown-elf-as -o main.o main.s
$ riscv64-unknown-elf-ld -Ttext=0x80000000 -o sum.elf main.o add.o
</pre><p>If this program is assembled, linked, and run in QEMU, it will call the <b>sum</b> function to calculate the sum of the operands. 
This can be verified by inspecting the value of register <b>a0</b> (for example, with the <code>info registers</code> command in the QEMU monitor), which should be 9.</p> <p><b>NOTE</b>: The order in which the object files are supplied to the linker is important. If the <b>add.o</b> file is supplied first, the program will not run correctly, because the <b>sum</b> function rather than <b>_start</b> would then be placed at the reset address (0x80000000), where execution begins.</p> </div> </div> <div class="outline-2" id="outline-container-org7a6a704"> <h1 id="org7a6a704">Instructions</h1> <div class="outline-text-2" id="text-2"> <p>Instructions are mnemonics that map directly to machine code. For example, the <b>add</b> instruction in the <b>sum</b> function corresponds to the opcode 0x33 (b0110011).</p> <p>When the <b>add</b> instruction is combined with its operands, the result is a single machine instruction. In RISC-V, all machine instructions are 32 bits long (unless the compressed extension is used; for now only 32-bit instructions are considered).</p> <p>The <b>add</b> instruction is what is known as an R-type instruction, because its operands are all registers. 
This type of instruction has the following form:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-right" /><col class="org-right" /><col class="org-right" /><col class="org-right" /><col class="org-right" /><col class="org-right" /></colgroup><thead><tr><th class="org-left" scope="col">BITS</th> <th class="org-right" scope="col">31:25</th> <th class="org-right" scope="col">24:20</th> <th class="org-right" scope="col">19:15</th> <th class="org-right" scope="col">14:12</th> <th class="org-right" scope="col">11:7</th> <th class="org-right" scope="col">6:0</th> </tr></thead><tbody><tr><td class="org-left">R-Type</td> <td class="org-right">func7</td> <td class="org-right">rs2</td> <td class="org-right">rs1</td> <td class="org-right">func3</td> <td class="org-right">rd</td> <td class="org-right">opcode</td> </tr></tbody></table><p>In this table, the specific operation is selected by the combination of <b>func7</b> and <b>func3</b>; for the <b>add</b> instruction, these are b0000000 and b000. The <b>rs2</b> and <b>rs1</b> fields are the source registers whose values will be added, and the <b>rd</b> field is the destination register for the result. Notice that the register fields are 5 bits wide, which allows the instruction to reference any of the 32 base registers (2<sup>5</sup> = 32). The <b>add</b> instruction from the previous example is constructed as follows:</p> <dl class="org-dl"><dt>func7</dt> <dd>b0000000</dd> <dt>rs2</dt> <dd>b01011 (for x11, which is a1)</dd> <dt>rs1</dt> <dd>b01010 (for x10, which is a0)</dd> <dt>func3</dt> <dd>b000</dd> <dt>rd</dt> <dd>b01010 (for x10)</dd> <dt>opcode</dt> <dd>b0110011</dd> </dl><p>Putting it all together, we get b00000000101101010000010100110011, or 0x00B50533. 
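</p> <p>The field packing can also be checked mechanically. The following Python sketch is only an illustration: the helper <code>encode_r_type</code> and the partial register table <code>ABI_REG</code> are hypothetical names, not part of the RISC-V toolchain. Each field is masked to its width, shifted to the bit range given in the R-Type table, and the shifted fields are combined with a bitwise OR.</p>

```python
# Hypothetical helper that packs the six R-Type fields into a 32-bit word.
# Only the ABI register names needed for this example are listed.
ABI_REG = {"zero": 0, "ra": 1, "sp": 2, "a0": 10, "a1": 11}


def encode_r_type(func7, rs2, rs1, func3, rd, opcode):
    """Shift each field into the bit range shown in the R-Type table."""
    return ((func7 & 0x7F) << 25      # bits 31:25
            | (rs2 & 0x1F) << 20      # bits 24:20
            | (rs1 & 0x1F) << 15      # bits 19:15
            | (func3 & 0x07) << 12    # bits 14:12
            | (rd & 0x1F) << 7        # bits 11:7
            | (opcode & 0x7F))        # bits 6:0


# add a0, a0, a1  ->  rd = a0, rs1 = a0, rs2 = a1
word = encode_r_type(func7=0b0000000, rs2=ABI_REG["a1"], rs1=ABI_REG["a0"],
                     func3=0b000, rd=ABI_REG["a0"], opcode=0b0110011)
print(f"{word:#010x}")
```

<p>Running the sketch prints <code>0x00b50533</code>, matching the hand calculation. The other instruction formats can be packed the same way; they differ only in which bit ranges hold immediate bits instead of register numbers.</p> <p>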
We can confirm this by disassembling the object file that was created:</p> <div class="org-src-container"> <pre class="src src-asm"><span style="color: #87cefa;">$</span> <span style="color: #00ffff;">riscv64</span>-unknown-elf-objdump -d add.o

<span style="color: #87cefa;">sum</span>.o:     file format elf64-littleriscv


<span style="color: #87cefa;">Disassembly</span> <span style="color: #00ffff;">of</span> section .text:

<span style="color: #87cefa;">0000000000000000</span> &lt;sum&gt;:
   <span style="color: #00ffff;">0</span>:   fe010113    addi sp,sp,-32
   <span style="color: #00ffff;">4</span>:   00113c23    sd ra,24(sp)
   <span style="color: #00ffff;">8</span>:   00b50533    add a0,a0,a1
   <span style="color: #00ffff;">c</span>:   01813083    ld ra,24(sp)
  <span style="color: #00ffff;">10</span>:   02010113    addi sp,sp,32
  <span style="color: #00ffff;">14</span>:   00008067    ret
</pre></div> <p>The <b>add</b> instruction in the <b>sum</b> function is at offset 8 of the text section; its machine encoding is 00b50533, which is the value calculated above.</p> <p>In addition to R-Type instructions, there are I-Type instructions that operate on a register and an immediate (literal value), which include loads from memory; S-Type instructions for storing to memory; U-Type instructions that carry a 20-bit upper immediate (such as <b>lui</b> and <b>auipc</b>); B-Type instructions for conditional branches; and J-Type instructions for unconditional jumps (e.g. <b>jal</b>, which is used for function calls). 
The following table describes the layout of each of these instruction types:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-left" scope="col">BITS</th> <th class="org-left" scope="col">31:25</th> <th class="org-left" scope="col">24:20</th> <th class="org-left" scope="col">19:15</th> <th class="org-left" scope="col">14:12</th> <th class="org-left" scope="col">11:7</th> <th class="org-left" scope="col">6:0</th> </tr></thead><tbody><tr><td class="org-left">R-Type</td> <td class="org-left">func7</td> <td class="org-left">rs2</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">I-Type</td> <td class="org-left">imm[11:5]</td> <td class="org-left">imm[4:0]</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">S-Type</td> <td class="org-left">imm[11:5]</td> <td class="org-left">rs2</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">imm[4:0]</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">U-Type</td> <td class="org-left">imm[31:25]</td> <td class="org-left">imm[24:20]</td> <td class="org-left">imm[19:15]</td> <td class="org-left">imm[14:12]</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">B-Type</td> <td class="org-left">imm[12,10:5]</td> <td class="org-left">rs2</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">imm[4:1,11]</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">J-Type</td> <td class="org-left">imm[20,10:5]</td> <td class="org-left">imm[4:1,11]</td> 
<td class="org-left">imm[19:15]</td> <td class="org-left">imm[14:12]</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr></tbody></table></div> <div class="outline-3" id="outline-container-orga251f7e"> <h2 id="orga251f7e">Pseudo-Instructions</h2> <div class="outline-text-3" id="text-2-1"> <p>Unlike instructions, pseudo-instructions do not map directly to opcodes. Typically they represent idioms that make the programmer's life a little easier.</p> <p>For example, the <b>li</b> used in the <b>main.s</b> program is a pseudo-instruction. As explained in the previous chapter, for small immediates such as 5 and 4, this pseudo-instruction maps to an <b>addi</b> I-Type instruction, which adds the immediate value to <b>x0</b> and stores the result in the destination register. Pseudo-instructions provide convenient mnemonics for programming without adding additional opcodes.</p> <p>Pseudo-instructions may also translate to more than one instruction. For example, the assembler expands the <strong>call</strong> pseudo-instruction into a pair of instructions, <strong>auipc</strong> followed by <strong>jalr</strong>, which together can reach a target up to about ±2 GiB away; the linker may later relax this pair into a single <strong>jal</strong> when the target is close enough.</p> <h1>Directives</h1> </div> </div> </div> <div class="outline-2" id="outline-container-org203f53f"> <div class="outline-text-2" id="text-3"> <p>Directives are commands for the assembler rather than instructions that it translates into machine code. Directives can be used to tell the assembler where to place code and data in the resulting object file, or to set up the memory of the target system. The previous example used assembler directives to export global symbols, to set the alignment of instructions, and to ensure that the code is assembled into the ".text" section of the object file.</p> <p>To understand the purpose of the assembler directives, it is important to understand how assembled code is linked together. 
The assembler produces object files that the linker combines to produce an Executable and Linkable Format (ELF) file. This file is segmented into different sections:</p> <dl class="org-dl"><dt>.text</dt> <dd>CPU instructions (the executable code).</dd> <dt>.rodata</dt> <dd>Read-only data.</dd> <dt>.data</dt> <dd>Global, mutable, initialized data.</dd> <dt>.bss</dt> <dd>Global, mutable, un-initialized data.</dd> </dl><p>Up to now, only the text section has been used. The location where the code is loaded was specified using the "-Ttext" option when invoking the linker. If multiple object files are passed to the linker, their text sections are merged into a single contiguous section.</p> <p>Code and data have different run-time requirements. Code is generally read-only, whereas data may require read-write permissions. Therefore it is advantageous not to interleave code and data. To ensure this, the text and data sections of the program should be placed in non-overlapping memory regions.</p> <p>To avoid having to define the position of each section on the command line, the linker allows the memory layout to be defined using a linker script:</p> <pre class="example">
OUTPUT_ARCH( "riscv" )

SECTIONS
{
  . = 0x80000000;
  .text : {
    PROVIDE(_text_start = .);
    *(.text.init)
    main.o (.text)
    *(.text .text.*)
    PROVIDE(_text_end = .);
  }
  PROVIDE(_global_pointer = .);
  .rodata : {
    PROVIDE(_rodata_start = .);
    *(.rodata .rodata.*)
    PROVIDE(_rodata_end = .);
  }
  .data : {
    . = ALIGN(4096);
    PROVIDE(_data_start = .);
    *(.sdata .sdata.*) *(.data .data.*)
    PROVIDE(_data_end = .);
  }
  .bss : {
    PROVIDE(_bss_start = .);
    *(.sbss .sbss.*) *(.bss .bss.*)
    PROVIDE(_bss_end = .);
  }
  PROVIDE(_stack_start = _bss_end);
  PROVIDE(_stack_end = _stack_start + 0x8000);
}
</pre><p>The <code>SECTIONS</code> keyword is used to specify how the various sections are laid out in memory. 
In the linker script shown previously, the <code>.text</code>, <code>.rodata</code>, <code>.data</code>, and <code>.bss</code> sections are defined.</p> <p>The .text section will include all code that follows a <code>.section .text.init</code> or <code>.text</code> assembler directive. The code in <b>sum.s</b> and <b>main.s</b> will therefore both be included in this section. Within the <code>.text</code> section of the linker script, the line <code>main.o (.text)</code> explicitly includes the text section of the <b>main.o</b> object file. This ensures that the main program appears in the linked program before the <b>sum</b> function, which is picked up by the wildcard pattern on the following line.</p> <p>The <code>PROVIDE</code> keyword defines a symbol with the given value (here usually the current location counter, written as <code>.</code>), provided the program does not define a symbol of that name itself. In this way the linker provides the start and end address of each section, as well as the start and end of the stack memory area. In a later chapter, these symbols will be used to set up the stack pointer.</p> <h1>Putting it All Together</h1> </div> </div> <div class="outline-2" id="outline-container-orgd504155"> <div class="outline-text-2" id="text-4"> <p>The program can now be assembled and linked using the following sequence of commands:</p> <pre class="example">
$ riscv64-unknown-elf-as -o sum.o sum.s
$ riscv64-unknown-elf-as -o main.o main.s
$ riscv64-unknown-elf-ld -T linker.lds -o sum.elf main.o sum.o
</pre><p>This will produce an ELF file called <b>sum.elf</b>. 
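</p> <p>Before inspecting the result, it may help to see the address arithmetic that the linker script performs. The Python sketch below is a simplified model, not the linker's actual algorithm: it walks the location counter through the sections in script order, applies <code>ALIGN(4096)</code> at the start of <code>.data</code>, and ignores any per-section alignment padding the real linker may add. The section sizes are assumptions: 0x30 bytes of code (the size of the <code>.text</code> section in the disassembly of <b>sum.elf</b>) and empty <code>.rodata</code>, <code>.data</code>, and <code>.bss</code> sections, since the program defines no data. The helper names <code>align_up</code> and <code>layout</code> are hypothetical.</p>

```python
# Simplified model of the location-counter arithmetic in linker.lds.
# Assumed input sizes: 0x30 bytes of .text and empty .rodata/.data/.bss.
# Per-section alignment padding inserted by the real linker is ignored.


def align_up(addr, alignment):
    """Round addr up to a multiple of alignment (a power of two), like ALIGN()."""
    return (addr + alignment - 1) & ~(alignment - 1)


def layout(text_size, rodata_size=0, data_size=0, bss_size=0,
           origin=0x80000000, stack_size=0x8000):
    sym = {}
    dot = origin                          # . = 0x80000000;
    sym["_text_start"] = dot
    dot += text_size
    sym["_text_end"] = dot
    sym["_rodata_start"] = dot
    dot += rodata_size
    sym["_rodata_end"] = dot
    dot = align_up(dot, 4096)             # . = ALIGN(4096); inside .data
    sym["_data_start"] = dot
    dot += data_size
    sym["_data_end"] = dot
    sym["_bss_start"] = dot
    dot += bss_size
    sym["_bss_end"] = dot
    sym["_stack_start"] = sym["_bss_end"]                 # PROVIDE(_stack_start = _bss_end);
    sym["_stack_end"] = sym["_stack_start"] + stack_size  # _stack_start + 0x8000
    return sym


symbols = layout(text_size=0x30)
for name, addr in symbols.items():
    print(f"{name:<14} {addr:#x}")
```

<p>Under these assumptions the model places <code>_data_start</code> at 0x80001000 and <code>_stack_end</code> at 0x80009000, the same stack-top address that is annotated in the disassembly of <b>sum.elf</b>. Adding code or data changes the section sizes, so the model would have to be re-run with the new values.</p> <p>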
By inspecting the <b>sum.elf</b> file, we can see that the <b>_start</b> symbol shows up before the <b>sum</b> function:</p> <pre class="example">
$ riscv64-unknown-elf-objdump -d sum.elf

sum.elf:     file format elf64-littleriscv


Disassembly of section .text:

0000000080000000 &lt;_start&gt;:
    80000000:   00500513    li      a0,5
    80000004:   00400593    li      a1,4
    80000008:   00009117    auipc   sp,0x9
    8000000c:   ff810113    addi    sp,sp,-8 # 80009000 &lt;_stack_end&gt;
    80000010:   008000ef    jal     ra,80000018 &lt;sum&gt;

0000000080000014 &lt;stop&gt;:
    80000014:   0000006f    j       80000014 &lt;stop&gt;

0000000080000018 &lt;sum&gt;:
    80000018:   fe010113    addi    sp,sp,-32
    8000001c:   00113c23    sd      ra,24(sp)
    80000020:   00b50533    add     a0,a0,a1
    80000024:   01813083    ld      ra,24(sp)
    80000028:   02010113    addi    sp,sp,32
    8000002c:   00008067    ret
</pre><p>The disassembled <b>sum.elf</b> also shows the instructions that lead up to the jump into <b>sum</b>:</p> <div class="org-src-container"> <pre class="src src-asm"><span style="color: #87cefa;">auipc</span> <span style="color: #00ffff;">sp</span>,0x9
<span style="color: #87cefa;">addi</span> <span style="color: #00ffff;">sp</span>,sp,-8
<span style="color: #87cefa;">jal</span> <span style="color: #00ffff;">ra</span>,80000018
</pre></div> <p>The <b>auipc</b>/<b>addi</b> pair computes the address of <b>_stack_end</b> (0x80009000, according to the objdump annotation) and loads it into the stack pointer. This stack setup is not part of the <b>main.s</b> listing shown earlier, so this binary was presumably built from a version of <b>main.s</b> that initializes <b>sp</b> before making the call. The <b>call</b> pseudo-instruction itself is emitted by the assembler as an <b>auipc</b>/<b>jalr</b> pair; because <b>sum</b> lies within the reach of a single <b>jal</b>, the linker relaxed that pair into the one <b>jal</b> instruction at 0x80000010.</p> <p>This program can be run in QEMU just as before, and the result should be the same as in previous runs.</p> </div> </div> <div class="outline-2" id="outline-container-orgd0dd85d"> <h1 id="orgd0dd85d">Conclusion</h1> <div class="outline-text-2" id="text-5"> <p>This chapter of the Bare Metal RISC-V tutorial covered RISC-V assembly language in more detail. Assembly programs are made up of directives, pseudo-instructions, and instructions. Directives provide guidance to the assembler on how to organize the assembled code. Pseudo-instructions provide useful mnemonics that are mapped to one or more primitive assembler instructions. 
Instructions are translated into binary machine instructions which direct the execution flow of the processor. In future chapters, this information will be utilised to make the RISC-V processors do more interesting things.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-06T13:03:58+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Wed, 11/06/2019 - 08:03</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> RISC-V Bare Metal Programming Chapter 1: The Setup https://www.vociferousvoid.org/main/riscv_bare_metal_chapter1 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming Chapter 1: The Setup</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>This tutorial will walk through the process of building and running a RISC-V program on bare metal hardware. The reader is assumed to be familiar with the GNU toolchain and basic C programming. 
Assembly experience is useful but is not required to follow along.</p> <p>There are many tutorials available for getting started with bare metal programming<sup><a class="footref" href="#fn.1" id="fnr.1">1</a></sup><sup>, </sup><sup><a class="footref" href="#fn.2" id="fnr.2">2</a></sup>. However, most of them target more common architectures such as x86 and ARM-32. Moreover, many of the existing tutorials are aimed at aspiring OS developers and often assume that the reader is already familiar with embedded programming. The objective of this tutorial is to learn how to walk before being asked to run.</p> <p>This tutorial is loosely modelled after a similar one for the ARM-32 architecture: <a href="http://www.bravegnu.org/gnu-eprog/index.html">Embedded Programming with the GNU Toolchain</a>. However, the RISC-V ISA will be used rather than ARMv5TE. QEMU will be used to emulate the hardware platform, allowing the learner to proceed without having to obtain an actual board.</p> <h2>Why RISC-V?</h2> <p>RISC-V is a modern and open instruction set architecture (ISA). Unlike x86 and ARM, which have existed for decades and have been updated incrementally, RISC-V was designed from scratch as a clean-slate, minimalist, open ISA informed by the mistakes of the past. As an open ISA, RISC-V is ideal for educational purposes because it is not subject to the whims or fate of a single corporation. More importantly, I have not found many resources for bare metal programming on this architecture<sup><a class="footref" href="#fn.3" id="fnr.3">3</a></sup>, so this tutorial aims to fill that knowledge void.</p> <h2>Setting Up the Host</h2> <p>Most programmers start off in a self-hosted environment. This means that they write programs for an environment using that same environment (e.g. writing GNU/Linux programs on a GNU/Linux machine). 
However, in embedded development, the machine on which a program runs (the target) generally has a different architecture from the one on which it was created (the host). This section outlines the steps for setting up a GNU/Linux workstation (in my case running Debian Buster) as a host development environment for building programs for the 64-bit RISC-V architecture. The steps outlined here are specific to Debian-based GNU/Linux distributions (e.g. Ubuntu, PureOS, Mint)<sup><a class="footref" href="#fn.4" id="fnr.4">4</a></sup>.</p> <p>The first step of any kind of programming is setting up the toolchain. Since we're building for the RISC-V architecture, we need a suitable toolchain, and the <a href="https://github.com/riscv/riscv-gnu-toolchain">RISC-V GNU Compiler Toolchain</a> fits the bill.</p> <p>Some dependencies are required in order to build the toolchain. These can be installed using the following command:</p> <pre class="example">
$ sudo apt-get install autoconf automake autotools-dev curl \
    libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison \
    flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev
</pre><p>Next, retrieve the source for the RISC-V GNU toolchain and all the required submodules:</p> <pre class="example">
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
</pre><p>Since we are dealing with a bare metal system, the toolchain must be built for the <a href="https://www.sourceware.org/newlib/">Newlib library</a>, an implementation of the standard C library for embedded systems.</p> <p>Choose a destination folder on your host system where the toolchain will be installed (this tutorial assumes <code>/opt/riscv/</code>), and make sure that its <code>bin</code> subdirectory (<code>/opt/riscv/bin</code>) is in the PATH. 
Then build the toolchain with the following commands:</p> <pre class="example">
$ cd riscv-gnu-toolchain
$ ./configure --prefix=/opt/riscv
$ make
</pre><h2>Do As RISC Does</h2> <p>Once the compiler toolchain is set up, the next requirement is an environment in which to run the programs it generates. QEMU will be used for this purpose. Use <code>apt-get</code> on Debian systems to install QEMU:</p> <pre class="example">
$ sudo apt-get install qemu-system-misc
</pre><p>This will install several QEMU binaries supporting various architectures, including:</p> <ul class="org-ul"><li>qemu-system-riscv64</li> <li>qemu-system-riscv32</li> </ul><p>We're interested in the 64-bit RISC-V version.</p> <h1>RISC-V is Alive</h1> <p>This section describes the process of writing a simple RISC-V program in assembly and running it on a bare metal virtual board emulated by QEMU. The programming examples are modelled after those in the <a href="http://www.bravegnu.org/gnu-eprog/index.html">Embedded Programming with the GNU Toolchain</a> tutorial, but adapted from ARM-32 to the 64-bit RISC-V architecture.</p> <p>Each line of an assembly program is composed of three optional elements: a label, an instruction, and a comment:</p> <dl class="org-dl"><dt>label</dt> <dd>A label is a convenience that allows the program to refer to a particular memory location using a symbolic name. A label is composed of a sequence of alphanumeric characters, underscores (_) and dollar signs ($), and is always terminated by a colon (:).</dd> <dt>instruction</dt> <dd>Instructions are either RISC-V assembly instructions or assembler directives. Assembler directives are prefixed with a period (.).</dd> <dt>comment</dt> <dd>Comments begin with a "#" character. 
Everything from that character up to the end of the line is ignored.</dd> </dl><p>The first example simply calculates the sum of two literal numerical values:</p> <div class="org-src-container"> <pre class="src src-asm">
    <span style="color: #00ffff;">.text</span>
    <span style="color: #00ffff;">.global</span> _start
<span style="color: #87cefa;">_start</span>:
    <span style="color: #00ffff;">li</span> a2, 5          # a2 = 5
    <span style="color: #00ffff;">li</span> a3, 4          # a3 = 4
    <span style="color: #00ffff;">add</span> a0, a2, a3    # a0 = a2 + a3
<span style="color: #87cefa;">stop</span>:
    <span style="color: #00ffff;">j</span> stop
</pre></div> <p>The first line is an assembler directive indicating that the code that follows belongs in the "text" section of the binary file. The next line uses the <b>.global</b> directive to make the <b>_start</b> symbol visible outside this file, so that the linker can find it. Next, the <b>_start</b> label is defined to mark the starting point of the program.</p> <p>The first two instructions load the integer values 5 and 4 into registers <b>a2</b> and <b>a3</b> respectively. The instruction mnemonic stands for "Load Immediate". 
This is a pseudo-instruction: it does not correspond to a single RISC-V opcode. For small constants like these, the assembler expands it into an <b>addi</b> instruction that adds the immediate value to the hard-wired <b>zero</b> register:</p> <div class="org-src-container"> <pre class="src src-asm">
<span style="color: #87cefa;">addi</span> <span style="color: #00ffff;">a2</span>,zero,5
<span style="color: #87cefa;">addi</span> <span style="color: #00ffff;">a3</span>,zero,4
</pre></div> <p>The last line defines the label "stop" followed by a jump instruction that returns the program counter to the address of that label, so the program loops indefinitely.</p> <p>Save the program to a file called <code>add.s</code>, then assemble it using the following command:</p> <pre class="example">
$ riscv64-unknown-elf-as -o add.o add.s
</pre><p>This creates the <b>add.o</b> object file containing the assembled program. Before it can be run, the program must be linked into a suitable executable file, which is the job of the linker (<code>ld</code>). Moreover, since we are targeting a bare metal machine, we have to ensure that our program is placed at an address where the processor will look for it.</p> <p>We will be working with the <b>virt</b> machine emulated by QEMU. When this machine is reset, a small piece of boot code in its ROM jumps to address 0x80000000, the start of RAM. This means that the first instruction of our program to be executed will be the one at memory location 0x80000000. Therefore we must ensure that the memory offset labelled <b>_start</b> in our program is loaded at that address:</p> <pre class="example">
$ riscv64-unknown-elf-ld -Ttext=0x80000000 -o add.elf add.o
</pre><p>The <b>-Ttext=</b> option forces the text section (defined using the <b>.text</b> assembler directive) to be placed at the given memory address. We can verify that the instruction with the label <b>_start</b> is at the correct location using the <b>nm</b> command. 
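</p>
<p>Since every instruction in <code>add.s</code> is four bytes long, the label addresses can also be predicted by hand: <b>_start</b> sits at 0x80000000 and <b>stop</b> three instructions later at 0x8000000c. The Python sketch below makes this arithmetic explicit. It also builds the 32-bit machine word for each instruction from its fields, showing how <b>li</b> becomes <b>addi</b> and how <b>j stop</b> becomes <b>jal zero, 0</b>, a jump to itself. It assumes uncompressed RV64I encodings. A toolchain that enables the compressed (C) extension may emit 16-bit forms such as <b>c.li</b> instead, which would shift the addresses. The helper names (<code>encode_addi</code>, <code>layout</code>, and so on) are hypothetical and not part of any toolchain.</p>

```python
# A minimal sketch of how add.s is laid out in memory, ASSUMING uncompressed
# RV64I encodings (every instruction is exactly 4 bytes). The helper names
# are illustrative only; they are not part of the GNU toolchain.

ZERO, A0, A2, A3 = 0, 10, 12, 13   # ABI names -> register numbers x0, x10, x12, x13
TEXT_BASE = 0x80000000             # the address given to ld via -Ttext=

def encode_addi(rd, rs1, imm):
    """I-type: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0b0010011

def encode_add(rd, rs1, rs2):
    """R-type: funct7=0000000 | rs2 | rs1 | funct3=000 | rd | opcode=0110011."""
    return (rs2 << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0b0110011

def encode_jal(rd, offset):
    """J-type: the 21-bit byte offset is stored as imm[20|10:1|11|19:12]."""
    imm = offset & 0x1FFFFF
    return ((((imm >> 20) & 0x1) << 31) | (((imm >> 1) & 0x3FF) << 21)
            | (((imm >> 11) & 0x1) << 20) | (((imm >> 12) & 0xFF) << 12)
            | (rd << 7) | 0b1101111)

# (label, source line, machine word). "j stop" is "jal zero, 0": the jump
# target is the jump itself, and the return address is discarded into x0.
PROGRAM = [
    ("_start", "li a2, 5    -> addi a2, zero, 5", encode_addi(A2, ZERO, 5)),
    ("",       "li a3, 4    -> addi a3, zero, 4", encode_addi(A3, ZERO, 4)),
    ("",       "add a0, a2, a3",                  encode_add(A0, A2, A3)),
    ("stop",   "j stop      -> jal zero, 0",      encode_jal(ZERO, 0)),
]

def layout(base=TEXT_BASE):
    """Return (address, word, label, source) for each instruction in order."""
    return [(base + 4 * i, word, label, src)
            for i, (label, src, word) in enumerate(PROGRAM)]

if __name__ == "__main__":
    for addr, word, label, src in layout():
        print(f"{addr:016x}  {word:08x}  {label.ljust(7)} {src}")
```

<p>The printed words can be compared with the disassembly produced by <code>riscv64-unknown-elf-objdump -d add.elf</code>, and running <b>nm</b> should report the same label addresses. 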
The output should be similar to the following:</p> <pre class="example">
$ riscv64-unknown-elf-nm add.elf
0000000080001010 T __BSS_END__
0000000080001010 T __bss_start
0000000080001010 T __DATA_BEGIN__
0000000080001010 T _edata
0000000080001010 T _end
0000000080001810 A __global_pointer$
0000000080001010 T __SDATA_BEGIN__
0000000080000000 T _start
000000008000000c t stop
$
</pre><p>Notice that the <b>_start</b> label is located at address 0000000080000000, which is where the virt machine begins executing the loaded program. We can now run the program in QEMU using the following command:</p> <pre class="example">
$ qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel add.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu)
</pre><p>This runs the 64-bit RISC-V QEMU emulator with the following options:</p> <dl class="org-dl"><dt>-M virt</dt> <dd>Sets the machine type to 'virt', which models a RISC-V VirtIO board implementing version 1.10 of the RISC-V privileged ISA.</dd> <dt>-serial /dev/null</dt> <dd>Since our program performs no I/O, the serial output is redirected to <code>/dev/null</code>.</dd> <dt>-nographic</dt> <dd>Again, since there is no I/O, we don't need a graphical UI.</dd> <dt>-kernel add.elf</dt> <dd>Load the program in add.elf as the kernel image.</dd> </dl><p>Although the program does not seem to have done very much, remember that it is very basic and does not perform any kind of I/O. To confirm that it did what was expected, we have to inspect the state of the machine, which the QEMU console allows us to do. 
We can inspect the state of the registers using the <code>info registers</code> command:</p> <pre class="example">
(qemu) info registers
pc       000000008000000c
mhartid  0000000000000000
mstatus  0000000000000000
mip      0000000000000000
mie      0000000000000000
mideleg  0000000000000000
medeleg  0000000000000000
mtvec    0000000000000000
mepc     0000000000000000
mcause   0000000000000000
zero 0000000000000000 ra   0000000000000000 sp   0000000000000000 gp   0000000000000000
tp   0000000000000000 t0   0000000080000000 t1   0000000000000000 t2   0000000000000000
s0   0000000000000000 s1   0000000000000000 a0   0000000000000009 a1   0000000000001020
a2   0000000000000005 a3   0000000000000004 a4   0000000000000000 a5   0000000000000000
a6   0000000000000000 a7   0000000000000000 s2   0000000000000000 s3   0000000000000000
s4   0000000000000000 s5   0000000000000000 s6   0000000000000000 s7   0000000000000000
s8   0000000000000000 s9   0000000000000000 s10  0000000000000000 s11  0000000000000000
t3   0000000000000000 t4   0000000000000000 t5   0000000000000000 t6   0000000000000000
ft0  0000000000000000 ft1  0000000000000000 ft2  0000000000000000 ft3  0000000000000000
ft4  0000000000000000 ft5  0000000000000000 ft6  0000000000000000 ft7  0000000000000000
fs0  0000000000000000 fs1  0000000000000000 fa0  0000000000000000 fa1  0000000000000000
fa2  0000000000000000 fa3  0000000000000000 fa4  0000000000000000 fa5  0000000000000000
fa6  0000000000000000 fa7  0000000000000000 fs2  0000000000000000 fs3  0000000000000000
fs4  0000000000000000 fs5  0000000000000000 fs6  0000000000000000 fs7  0000000000000000
fs8  0000000000000000 fs9  0000000000000000 fs10 0000000000000000 fs11 0000000000000000
ft8  0000000000000000 ft9  0000000000000000 ft10 0000000000000000 ft11 0000000000000000
</pre><p>Remember that we loaded the immediate value 5 into register <b>a2</b> and the immediate value 4 into register <b>a3</b>; the values in the register dump reflect this. The program then sums the values in <b>a2</b> and <b>a3</b> and stores the result in register <b>a0</b>. 
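</p>
<p>To see why the dump should hold exactly these values, the following Python sketch models the architectural effect of the three instructions on a 64-bit register file, with <b>x0</b> (<b>zero</b>) hard-wired to zero. It is an illustration, not how QEMU is implemented: the <code>Hart</code> class and its methods are hypothetical names, and only the registers touched by <code>add.s</code> are modelled. Nonzero values elsewhere in the dump, such as <b>t0</b> and <b>a1</b>, appear to be left there by QEMU's own boot code, which runs before <b>_start</b>.</p>

```python
# A minimal sketch (NOT QEMU's implementation) of what add.s does to a
# RISC-V register file, assuming 64-bit registers, 4-byte instructions and
# x0 hard-wired to zero. Register numbers: zero=x0, a0=x10, a2=x12, a3=x13.

MASK64 = (1 << 64) - 1

class Hart:
    """Hypothetical model of one hardware thread's integer state."""

    def __init__(self, pc):
        self.x = [0] * 32   # general-purpose registers x0..x31
        self.pc = pc

    def write(self, rd, value):
        if rd != 0:                      # writes to x0 are discarded
            self.x[rd] = value & MASK64  # results wrap around to 64 bits

    def addi(self, rd, rs1, imm):
        self.write(rd, self.x[rs1] + imm)
        self.pc += 4

    def add(self, rd, rs1, rs2):
        self.write(rd, self.x[rs1] + self.x[rs2])
        self.pc += 4

hart = Hart(pc=0x80000000)   # _start, as placed by -Ttext=0x80000000
hart.addi(12, 0, 5)          # li  a2, 5      (addi a2, zero, 5)
hart.addi(13, 0, 4)          # li  a3, 4      (addi a3, zero, 4)
hart.add(10, 12, 13)         # add a0, a2, a3
# "j stop" jumps to its own address, so pc now stays at stop forever.

for name, reg in (("a0", 10), ("a2", 12), ("a3", 13)):
    print(f"{name} {hart.x[reg]:016x}")
print(f"pc {hart.pc:016x}")
```

<p>Running the sketch prints 9, 5 and 4 for <b>a0</b>, <b>a2</b> and <b>a3</b>, and 0x8000000c for <b>pc</b>, matching the corresponding entries of the QEMU register dump.</p>
<p>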
The register dump shows that the value in register <b>a0</b> is in fact 9, which is the sum of 4 and 5. We also see that the program counter (<b>pc</b>) holds the address 0x8000000c, which corresponds to the label "stop", where the program loops indefinitely.</p> <h1>Conclusion</h1> <p>This tutorial went through the process of setting up a host environment for developing bare metal RISC-V programs, writing a simple RISC-V assembly program that adds two numbers, and running this program using QEMU. In future tutorials, the coding examples will get progressively more complex; however, the host environment will remain the same.</p> <h1>Footnotes</h1> <div id="text-footnotes"> <div class="footdef"><sup><a class="footnum" href="#fnr.1" id="fn.1">1</a></sup><div class="footpara"> <p class="footpara"><a href="http://www.bravegnu.org/gnu-eprog/index.html">http://www.bravegnu.org/gnu-eprog/index.html</a></p> </div> </div> <div class="footdef"><sup><a class="footnum" href="#fnr.2" id="fn.2">2</a></sup><div class="footpara"> <p class="footpara"><a href="https://wiki.osdev.org/Bare_Bones">https://wiki.osdev.org/Bare_Bones</a></p> </div> </div> <div class="footdef"><sup><a class="footnum" href="#fnr.3" id="fn.3">3</a></sup><div class="footpara"> <p class="footpara">One good reference is Stephen Marz's tutorial on building an OS in Rust: <a href="http://osblog.stephenmarz.com/index.html">http://osblog.stephenmarz.com/index.html</a></p> </div> </div> <div class="footdef"><sup><a class="footnum" href="#fnr.4" id="fn.4">4</a></sup><div class="footpara"> <p class="footpara">Future tutorials may focus on setting up other host platforms.</p> </div> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-02T01:27:24+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Fri, 11/01/2019 - 21:27</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> Sat, 02 Nov 2019 01:27:24 +0000 MarcAdmin 27 at https://www.vociferousvoid.org/main https://www.vociferousvoid.org/main/riscv_bare_metal_chapter1#comments