Think Outloud About Nothing - Cogitationes ex mentis et machina https://www.vociferousvoid.org/main/ en RISC-V Bare Metal Programming - Chapter 5: It's a Trap! https://www.vociferousvoid.org/main/riscv_bare_metal_chapter5 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming - Chapter 5: It&#039;s a Trap!</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>Up to this point, the RISC-V tutorial has focused on single applications running on a single hardware thread. The application's environment was composed of the processor's state and memory map. The processor state was controlled via assembly instructions, and the memory map was defined at build time via a linker script. However, modern systems are almost always multiprogrammed and will execute many applications concurrently (by interleaving their instructions in a single stream). Moreover, multiple hardware threads are common, allowing application instructions to execute simultaneously. This workflow requires a lot more care to ensure correctness and proper separation of memory. This idea was touched upon briefly in <a href="https://www.vociferousvoid.org/main/riscv_bare_metal_chapter4">chapter 4</a> while discussing the <b>A</b> extension which provides atomic memory operations that were used to define synchronization primitives. 
In addition to memory synchronization, the application must control the execution environment of all of the active hardware threads. This chapter will explore the mechanisms available for this purpose.</p> <div class="outline-2" id="outline-container-orgbf395c9"> <h2 id="orgbf395c9">The ABI</h2> <div class="outline-text-2" id="text-orgbf395c9"> <p>The examples presented thus far can all be logically separated into three layers: the application execution environment (AEE), the application binary interface (ABI), and the application code. This organisation allows for a single application to execute in a single AEE.</p> <p>The AEE is defined by the processor state and the static memory map defined by the linker script. The ABI describes the conventions that programs must follow and manages dynamic memory; in this case the stack. The upshot of defining an ABI layer is that application code in the third layer can interact with an abstract view of the machine implementation. This simplifies the development of applications by hiding some of the hardware details. For example, the preamble and postamble code for function calls, which are part of this ABI, were used by the application code to ensure that dynamic memory was managed correctly. These code templates can be provided by developer tools, such as high-level language compilers.</p> <p>This becomes even more important as application environments become more complex. As complexity increases in the application environment, more can be done by the ABI to hide some of the tedious details of the layer beneath it. This is where the operating system (OS) comes into play. The OS is a supervisor process which is sandwiched between the ABI and the supervisor binary interface (SBI). In this configuration, the SBI hides more of the hardware details from the OS, which provides an even more abstract view of the system to the applications. 
The definition of an SBI also improves the portability of the OS layer.</p> <p>Another advantage of this layered approach is that it is easier to enforce separation between applications. One application should not interfere with other applications, or with the supervisor itself. The RISC-V ISA defines three different privilege modes to enforce this. From most to least privileged, the three levels are: machine, supervisor, and user.</p> <p>ISA implementations may provide 1-3 of the defined privilege modes. All hardware implementations must provide machine mode (or M-mode). A secure embedded system may provide machine and user modes. A virtualized multiprogramming system should provide all three modes.</p> </div> </div> <div class="outline-2" id="outline-container-org4951020"> <h2 id="org4951020">What's my Pay Grade?</h2> <div class="outline-text-2" id="text-org4951020"> <p>The capabilities of the ISA implementation can be queried via the <b>misa</b> control and status register. The function illustrated in the following listing will determine which integer ISA and the number of privilege modes that are supported by the current hardware thread:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __system_check <span class="linenr"> 4: </span><span style="color: #87cefa;">__system_check</span>: <span class="linenr"> 5: </span> # Input <span class="linenr"> 6: </span> # None <span class="linenr"> 7: </span> # <span class="linenr"> 8: </span> # Returns <span class="linenr"> 9: </span> # - a0: The number of supported privilege modes (1, 2 or 3). <span class="linenr">10: </span> # - a1: The register width used by the ISA (in bytes). 
<span class="coderef-off" id="coderef-load-misa"><span class="linenr">11: </span> <span style="color: #00ffff;">csrr</span> t0, misa # Load misa into t0</span> <span class="coderef-off" id="coderef-load-xlen"><span class="linenr">12: </span> <span style="color: #00ffff;">li</span> a1, 4 # Load minimum register width</span> <span class="linenr">13: </span> <span style="color: #00ffff;">li</span> a0, 1 # M-mode is always supported <span class="linenr">14: </span> # Probe for user-mode <span class="coderef-off" id="coderef-umode-mask"><span class="linenr">15: </span> <span style="color: #00ffff;">lui</span> t1, 0x100 # Set the u-mode mask in t1</span> <span class="linenr">16: </span> <span style="color: #00ffff;">and</span> t1, t0, t1 <span class="coderef-off" id="coderef-check-umode"><span class="linenr">17: </span> <span style="color: #00ffff;">beqz</span> t1, 1f # Determine if the U-mode bit is set in misa</span> <span class="coderef-off" id="coderef-set-umode"><span class="linenr">18: </span> <span style="color: #00ffff;">addi</span> a0, a0, 1 # U-mode is available, increment a0</span> <span class="linenr">19: </span><span style="color: #87cefa;">1</span>: # Probe for supervisor-mode <span class="coderef-off" id="coderef-smode-mask"><span class="linenr">20: </span> <span style="color: #00ffff;">lui</span> t1, 0x40 # Set the S-mode mask in t1</span> <span class="linenr">21: </span> <span style="color: #00ffff;">and</span> t1, t0, t1 <span class="coderef-off" id="coderef-check-smode"><span class="linenr">22: </span> <span style="color: #00ffff;">beqz</span> t1, 2f # Determine if the S-mode bit is set in misa</span> <span class="coderef-off" id="coderef-set-smode"><span class="linenr">23: </span> <span style="color: #00ffff;">addi</span> a0, a0, 1 # S-mode is available, increment a0</span> <span class="linenr">24: </span><span style="color: #87cefa;">2</span>: # Determine register width <span class="coderef-off" id="coderef-have-xlen"><span class="linenr">25: </span> 
<span style="color: #00ffff;">bgez</span> t0, 3f # Determine if a1 holds the register width</span> <span class="coderef-off" id="coderef-scale-xlen"><span class="linenr">26: </span> <span style="color: #00ffff;">slli</span> a1, a1, 1 # Multiply register width by 2</span> <span class="linenr">27: </span> <span style="color: #00ffff;">slli</span> t0, t0, 1 <span class="linenr">28: </span> <span style="color: #00ffff;">j</span> 2b <span class="linenr">29: </span><span style="color: #87cefa;">3</span>: <span class="linenr">30: </span> <span style="color: #00ffff;">ret</span> <span class="linenr">31: </span> </pre></div> <p>This function loads the <b>misa</b> Control and Status Register (CSR) into the temporary register <b>t0</b> on line <a class="coderef" href="#coderef-load-misa" onmouseout="CodeHighlightOff(this, 'coderef-load-misa');" onmouseover="CodeHighlightOn(this, 'coderef-load-misa');">11</a> using the <code>csrr</code> pseudo instruction. The minimum register width is 4 bytes, therefore the immediate 4 is loaded into register <b>a1</b> as an initial value (line <a class="coderef" href="#coderef-load-xlen" onmouseout="CodeHighlightOff(this, 'coderef-load-xlen');" onmouseover="CodeHighlightOn(this, 'coderef-load-xlen');">12</a>). Moreover, machine-mode is required by hardware implementations, thus the initial value of <b>a0</b> is 1.</p> <p>The least-significant 26 bits of the <b>misa</b> CSR are flags which indicate supported extensions; one for each letter of the alphabet, starting with <b>A</b> at bit 0. The <b>S</b> and <b>U</b> extensions are for supervisor and user mode respectively. Therefore, user mode will be available if bit 20 is set to 1. 
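</p> <p>As a rough illustration of that rule, the mask for an extension letter is a 1 shifted left by the letter's position in the alphabet, counting from zero. The fragment below is only a sketch of the user-mode probe written with named masks; the symbols <code>MISA_EXT_S</code> and <code>MISA_EXT_U</code> and the label <code>no_umode</code> are hypothetical names introduced for this example and do not appear in the chapter's sources:</p> <div class="org-src-container"> <pre class="src src-asm">
        .equ MISA_EXT_S, 0x40000        # Bit 18: S is the 19th letter, counted from bit 0
        .equ MISA_EXT_U, 0x100000       # Bit 20: U is the 21st letter, counted from bit 0

        li      a0, 1                   # M-mode is always present
        csrr    t0, misa                # Read misa (only accessible from M-mode)
        li      t1, MISA_EXT_U          # Load the U-mode mask
        and     t1, t0, t1              # Non-zero only if the U bit is set
        beqz    t1, no_umode            # Skip the increment when U-mode is absent
        addi    a0, a0, 1               # Count U-mode
no_umode:
</pre></div> <p>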
A bit mask is created on line <a class="coderef" href="#coderef-umode-mask" onmouseout="CodeHighlightOff(this, 'coderef-umode-mask');" onmouseover="CodeHighlightOn(this, 'coderef-umode-mask');">15</a> of the <code>__system_check</code> listing.</p> <blockquote><p>Since the mask is wider than the 12-bit immediate allowed by RISC-V I-type instructions, the <code>lui</code> instruction is used instead: it places its 20-bit immediate in bits 31 to 12 of the destination register and clears the lower 12 bits, all in a single instruction. Thus the immediate value 0x100 becomes 0x100000 once it has been loaded.</p> </blockquote> <p>A bit-wise <b>AND</b> is performed using the bit-mask and the <b>misa</b> CSR value to determine if user-mode is available. If the result of the <code>and</code> instruction is zero on line <a class="coderef" href="#coderef-check-umode" onmouseout="CodeHighlightOff(this, 'coderef-check-umode');" onmouseover="CodeHighlightOn(this, 'coderef-check-umode');">17</a>, user-mode is not supported. Otherwise the value of <b>a0</b> is incremented on line <a class="coderef" href="#coderef-set-umode" onmouseout="CodeHighlightOff(this, 'coderef-set-umode');" onmouseover="CodeHighlightOn(this, 'coderef-set-umode');">18</a>.</p> <p>Similarly, supervisor mode is supported if bit 18 is set to 1. A bit mask is created on line <a class="coderef" href="#coderef-smode-mask" onmouseout="CodeHighlightOff(this, 'coderef-smode-mask');" onmouseover="CodeHighlightOn(this, 'coderef-smode-mask');">20</a>, then a bit-wise <b>AND</b> is performed with the value of the <b>misa</b> CSR on the next line. If the result is zero, then supervisor mode is not supported. Otherwise the number of privilege modes is incremented by 1.</p> <p>The final step of the <code>__system_check</code> function is to determine the width of the hart's registers. The two most-significant bits of the <b>misa</b> CSR encode the width of the registers used by the ISA: 1 for 32 bits, 2 for 64 bits, and 3 for 128 bits. 
Determining the width is complicated by the fact that the CSR's own width is not known prior to checking this field. To overcome this, a check is performed to determine if the register's value is negative, which is the case exactly when the most-significant bit is set to 1. If so, the number of bytes in <b>a1</b> is multiplied by 2 (by shifting it to the left by 1 bit on line <a class="coderef" href="#coderef-scale-xlen" onmouseout="CodeHighlightOff(this, 'coderef-scale-xlen');" onmouseover="CodeHighlightOn(this, 'coderef-scale-xlen');">26</a>), the value of <b>t0</b> is shifted by 1 to the left, and the check is performed again. If the value of <b>t0</b> is determined to be non-negative on line <a class="coderef" href="#coderef-have-xlen" onmouseout="CodeHighlightOff(this, 'coderef-have-xlen');" onmouseover="CodeHighlightOn(this, 'coderef-have-xlen');">25</a> (i.e. the msb is 0), then the <code>__system_check</code> function has completed its probe. The following listing illustrates the main program which performs the system check:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr">1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr">2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr">3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr">4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr">5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr">6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-call-systemcheck"><span class="linenr">7: </span> <span style="color: #00ffff;">call</span> __system_check</span> <span class="linenr">8: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">9: </span> </pre></div> <p>The <code>__system_check</code> function is called on 
line <a class="coderef" href="#coderef-call-systemcheck" onmouseout="CodeHighlightOff(this, 'coderef-call-systemcheck');" onmouseover="CodeHighlightOn(this, 'coderef-call-systemcheck');">7</a>. When this function returns the register <b>a0</b> should hold the number of supported privilege modes, and register <b>a1</b> should hold the number of bytes in a register. If this program is run in <code>qemu</code>, the <code>info registers</code> command can be used at the console to determine the hardware details:</p> <div class="org-src-container"> <pre class="src src-sh"> make chapter5 qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel chapter5.elf QEMU 3.1.0 monitor - type <span style="color: #ffa07a;">'help'</span> for more information (qemu) info registers pc 000000008000000c mhartid 0000000000000000 mstatus 0000000000000000 mip 0000000000000000 mie 0000000000000000 mideleg 0000000000000000 medeleg 0000000000000000 mtvec 0000000000000000 mepc 0000000000000000 mcause 0000000000000000 zero 0000000000000000 ra 000000008000000c sp 0000000080009000 gp 0000000000000000 tp 0000000000000000 t0 000000000028225a t1 0000000000040000 t2 0000000000000000 s0 0000000000000000 s1 0000000000000000 a0 0000000000000003 a1 0000000000000008 ... </pre></div> <p>The value of <b>a0</b> is 3, therefore the <code>qemu</code> VirtIO platform supports all three privilege modes. The value of <b>a1</b> is 8, which means that the register width is 64-bits (8 bytes). This value could be used as a stack offset allowing for the definition of the function call preamble and postamble as part of a library. This library could then be used to define an operating system's ABI.</p> </div> </div> <div class="outline-2" id="outline-container-org59973e7"> <h2 id="org59973e7">It's a Trap!</h2> <div class="outline-text-2" id="text-org59973e7"> <p>Typically, systems should run in the most restricted environment possible in order to minimize catastrophes in the event of a system fault. 
However, when an exceptional event occurs, the system may want to raise its privilege level in order to deal with it. These types of events are often associated with an interrupt.</p> <p>The machine-mode status CSR, <b>mstatus</b>, allows some control over a hart's operating state, including enabling or disabling global interrupts. Three fields are defined in the register's least significant four bits for this purpose; one for each of the privilege modes.</p> <blockquote><p>Specific interrupt types for each privilege mode must be enabled individually via the <b>mie</b> register, which will be described later.</p> </blockquote> <p>The fields defined in the <b>mstatus</b> register are illustrated in the following table:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><caption class="t-above"><span class="table-number">Table 1:</span> mstatus CSR register</caption> <colgroup><col class="org-right" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-right" scope="col">Bit</th> <th class="org-left" scope="col">0</th> <th class="org-left" scope="col">1</th> <th class="org-left" scope="col">2</th> <th class="org-left" scope="col">3</th> <th class="org-left" scope="col">4</th> <th class="org-left" scope="col">5</th> <th class="org-left" scope="col">6</th> <th class="org-left" scope="col">7</th> </tr></thead><tbody><tr><td class="org-right">0</td> <td class="org-left">UIE</td> <td class="org-left">SIE</td> <td class="org-left"> </td> <td class="org-left">MIE</td> <td class="org-left">UPIE</td> <td class="org-left">SPIE</td> <td class="org-left"> </td> <td class="org-left">MPIE</td> </tr><tr><td class="org-right">+8</td> <td class="org-left">SPP</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left">MPP[0]</td> <td 
class="org-left">MPP[1]</td> <td class="org-left">FS[0]</td> <td class="org-left">FS[1]</td> <td class="org-left">XS[0]</td> </tr><tr><td class="org-right">+16</td> <td class="org-left">XS[1]</td> <td class="org-left">MPRV</td> <td class="org-left">SUM</td> <td class="org-left">MXR</td> <td class="org-left">TVM</td> <td class="org-left">TW</td> <td class="org-left">TSR</td> <td class="org-left"> </td> </tr><tr><td class="org-right">+24</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+32</td> <td class="org-left">UXL[0]</td> <td class="org-left">UXL[1]</td> <td class="org-left">SXL[0]</td> <td class="org-left">SXL[1]</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+40</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+48</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> </tr><tr><td class="org-right">+56</td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left"> </td> <td class="org-left">SD</td> </tr></tbody></table><p>The <b>UIE</b>, <b>SIE</b>, and <b>MIE</b> fields will enable interrupts globally for the user, supervisor, and machine modes respectively. 
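</p> <p>For instance, machine-mode interrupts can be switched on and off globally by setting and clearing the <b>MIE</b> field (bit 3) with the CSR set and clear immediate instructions. The fragment below is a minimal sketch only; the symbol <code>MSTATUS_MIE</code> is a hypothetical name introduced for this example, and an individual interrupt source still has to be enabled separately before it can be taken:</p> <div class="org-src-container"> <pre class="src src-asm">
        .equ MSTATUS_MIE, 0x8           # Bit 3 of mstatus (fits the 5-bit CSR immediate)

        csrsi   mstatus, MSTATUS_MIE    # Set MIE: enable machine-mode interrupts globally
        # ... code that may be interrupted ...
        csrci   mstatus, MSTATUS_MIE    # Clear MIE: disable machine-mode interrupts globally
</pre></div> <p>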
If an \(x\)IE field is set to 1, interrupts are enabled globally while the hart is operating in privilege mode \(x\). Interrupts for any less privileged mode \(y &lt; x\) are disabled regardless of the state of \(y\)IE in <b>mstatus</b>, whereas interrupts for more privileged modes remain enabled regardless of their enable bits.</p> <p>To be of any use, there must be a mechanism to handle interrupts. The <b>BASE</b> field of the <b>mtvec</b> CSR can be set to the base address of a trap-vector to handle interrupts. The <b>MODE</b> field, which occupies the least-significant two bits of <b>mtvec</b>, specifies the trap mode. When the <b>MODE</b> field is set to 0 (direct), every trap jumps straight to the base address. When it is set to 1 (vectored), the base address is the start of a vector of trap handlers, one for each interrupt type indexed by the interrupt code, while synchronous exceptions still jump to the base address. For the time being, a single trap handler will be used to dispatch to the appropriate handler for interrupts or exceptions.</p> <p>The code that follows illustrates a skeleton for a trap handler implementation in direct mode:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> trap_handler <span class="linenr"> 4: </span><span style="color: #87cefa;">trap_handler</span>: <span class="linenr"> 5: </span> # Trap handler preamble (save registers). 
<span class="coderef-off" id="coderef-trap-load scratch"><span class="linenr"> 6: </span> <span style="color: #00ffff;">csrrw</span> a0, mscratch, a0</span> <span class="linenr"> 7: </span> <span style="color: #00ffff;">sd</span> a1, 0(a0) <span class="linenr"> 8: </span> <span style="color: #00ffff;">sd</span> a2, 8(a0) <span class="linenr"> 9: </span> <span style="color: #00ffff;">sd</span> a3, 16(a0) <span class="linenr">10: </span> <span style="color: #00ffff;">sd</span> a4, 24(a0) <span class="linenr">11: </span> # Decode the cause of the trap. <span class="coderef-off" id="coderef-trap-get cause"><span class="linenr">12: </span> <span style="color: #00ffff;">csrr</span> a1, mcause</span> <span class="coderef-off" id="coderef-trap-check exception"><span class="linenr">13: </span> <span style="color: #00ffff;">bgez</span> a1, exception</span> <span class="linenr">14: </span><span style="color: #87cefa;">interrupt</span>: <span class="linenr">15: </span> <span style="color: #00ffff;">andi</span> a1, a1, 0x3f # Isolate the cause field <span class="linenr">16: </span> # TODO: Dispatch to specific interrupt handler <span class="linenr">17: </span> <span style="color: #00ffff;">j</span> trap_handler_restore_state <span class="linenr">18: </span><span style="color: #87cefa;">exception</span>: <span class="linenr">19: </span> <span style="color: #00ffff;">andi</span> a1, a1, 0x3f # Isolate the cause field <span class="linenr">20: </span> # TODO: Dispatch to specific exception handler <span class="linenr">21: </span><span style="color: #87cefa;">trap_handler_restore_state</span>: <span class="linenr">22: </span> <span style="color: #00ffff;">ld</span> a4, 24(a0) <span class="linenr">23: </span> <span style="color: #00ffff;">ld</span> a3, 16(a0) <span class="linenr">24: </span> <span style="color: #00ffff;">ld</span> a2, 8(a0) <span class="linenr">25: </span> <span style="color: #00ffff;">ld</span> a1, 0(a0) <span class="linenr">26: </span> <span style="color: 
#00ffff;">csrrw</span> a0, mscratch, a0 <span class="linenr">27: </span> <span style="color: #00ffff;">mret</span> </pre></div> <p>The first instruction on line <a class="coderef" href="#coderef-trap-load scratch" onmouseout="CodeHighlightOff(this, 'coderef-trap-load scratch');" onmouseover="CodeHighlightOn(this, 'coderef-trap-load scratch');">6</a> will atomically swap the values of the <b>mscratch</b> CSR and <b>a0</b>. The <b>mscratch</b> register is defined to provide additional data to trap handlers. Typically, it should be set to the memory address of a buffer where register data can be saved while the handler is active.</p> <p>The next four lines save the contents of registers <b>a1</b>-<b>a4</b> in the memory buffer located at the address that was held in <b>mscratch</b> (and is now in <b>a0</b>), then the value of the <b>mcause</b> CSR is copied into <b>a1</b> on line <a class="coderef" href="#coderef-trap-get cause" onmouseout="CodeHighlightOff(this, 'coderef-trap-get cause');" onmouseover="CodeHighlightOn(this, 'coderef-trap-get cause');">12</a>. The <b>mcause</b> register is used to indicate the cause of synchronous and asynchronous exceptions. If the most significant bit in this register is zero, then the trap was caused by a synchronous exception. This can be determined by testing the value of <b>a1</b> to see if it is greater than or equal to zero (on line <a class="coderef" href="#coderef-trap-check exception" onmouseout="CodeHighlightOff(this, 'coderef-trap-check exception');" onmouseover="CodeHighlightOn(this, 'coderef-trap-check exception');">13</a>); if the msb is 1, the register will be negative and the execution will fall through to the interrupt handler code.</p> <p>To use the trap handler, its address must be set in the base field of the <b>mtvec</b> CSR. 
This is illustrated in the following program:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-load scratch"><span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> a0, scratch</span> <span class="coderef-off" id="coderef-set scratch"><span class="linenr"> 8: </span> <span style="color: #00ffff;">csrrw</span> a0, mscratch, a0</span> <span class="coderef-off" id="coderef-load trap handler"><span class="linenr"> 9: </span> <span style="color: #00ffff;">la</span> a0, trap_handler</span> <span class="coderef-off" id="coderef-set trap handler"><span class="linenr">10: </span> <span style="color: #00ffff;">csrrw</span> a0, mtvec, a0</span> <span class="linenr">11: </span> <span style="color: #00ffff;">call</span> __system_check <span class="linenr">12: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">13: </span> <span style="color: #00ffff;">.bss</span> <span class="coderef-off" id="coderef-define scratch"><span class="linenr">14: </span><span style="color: #87cefa;">scratch</span>: .dword 0, 0, 0, 0</span> </pre></div> <p>The scratch area where the register state can be saved is defined on line <a class="coderef" href="#coderef-define scratch" onmouseout="CodeHighlightOff(this, 'coderef-define scratch');" onmouseover="CodeHighlightOn(this, 'coderef-define scratch');">14</a>. 
This allocates 32 bytes of space to save the contents of up to 4 registers, which is enough for the current <code>trap_handler</code> implementation. The address of the scratch area is loaded into register <b>a0</b> on line <a class="coderef" href="#coderef-load scratch" onmouseout="CodeHighlightOff(this, 'coderef-load scratch');" onmouseover="CodeHighlightOn(this, 'coderef-load scratch');">7</a>. This register's value is then swapped with the value of the <b>mscratch</b> CSR on line <a class="coderef" href="#coderef-set scratch" onmouseout="CodeHighlightOff(this, 'coderef-set scratch');" onmouseover="CodeHighlightOn(this, 'coderef-set scratch');">8</a>.</p> <p>The address of the trap handler function is loaded into register <b>a0</b> on line <a class="coderef" href="#coderef-load trap handler" onmouseout="CodeHighlightOff(this, 'coderef-load trap handler');" onmouseover="CodeHighlightOn(this, 'coderef-load trap handler');">9</a>, then it is swapped with the contents of the <b>mtvec</b> CSR on line <a class="coderef" href="#coderef-set trap handler" onmouseout="CodeHighlightOff(this, 'coderef-set trap handler');" onmouseover="CodeHighlightOn(this, 'coderef-set trap handler');">10</a>. Because <code>trap_handler</code> is aligned on a 4-byte boundary, the two least-significant bits of its address are zero. The <b>MODE</b> field is therefore left as zero, which sets the trap mode to direct and causes all exceptions and interrupts to branch to the base address.</p> </div> </div> <div class="outline-2" id="outline-container-orga01113b"> <h2 id="orga01113b">What's the Time?</h2> <div class="outline-text-2" id="text-orga01113b"> <p>A platform's real-time counter is one of the possible sources of asynchronous exceptions that can cause an interrupt. The timer is typically external to the processing core. The VirtIO machine of the QEMU emulator includes the core-local interruptor (CLINT). This module defines the <b>mtime</b> register that exposes the current value of the real-time counter. This value expresses the number of clock cycles that have elapsed since the processor was reset. 
This does not represent the real time, but a count of real-time intervals (determined by the oscillator frequency).</p> <p>The <b>mtime</b> register is mapped to a particular address in physical memory. The actual address is specified in the memory map of the VirtIO machine in the QEMU system (see <a href="https://git.qemu.org/?p=qemu.git;a=blob_plain;f=hw/riscv/virt.c;hb=refs/heads/stable-3.1">hw/riscv/virt.c</a>). The listing that follows illustrates a function to retrieve the current real-time counter value:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-clint base"><span class="linenr"> 1: </span> <span style="color: #00ffff;">.equ</span> CLINT_BASE, 0x2000000 # The base address of the CLINT module</span> <span class="coderef-off" id="coderef-mtime offset"><span class="linenr"> 2: </span> <span style="color: #00ffff;">.equ</span> CLINT_MTIME, 0xbff8 # The offset of the MTIME register</span> <span class="coderef-off" id="coderef-ld-mtime macro"><span class="linenr"> 3: </span> <span style="color: #00ffff;">.macro</span> ld_mtime rd # Macro to access the MTIME memory mapped register</span> <span class="linenr"> 4: </span> <span style="color: #00ffff;">li</span> t0, CLINT_BASE <span class="linenr"> 5: </span> <span style="color: #00ffff;">li</span> t1, CLINT_MTIME <span class="coderef-off" id="coderef-mtime addr"><span class="linenr"> 6: </span> <span style="color: #00ffff;">add</span> t0, t0, t1 # Determine the absolute address of MTIME</span> <span class="coderef-off" id="coderef-read counter"><span class="linenr"> 7: </span> <span style="color: #00ffff;">ld</span> \rd, 0(t0) # Read the counter value</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">.endm</span> <span class="linenr"> 9: </span> <span class="linenr">10: </span> <span style="color: #00ffff;">.text</span> <span class="linenr">11: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr">12: </span> <span 
style="color: #00ffff;">.global</span> __clock_cycle <span class="linenr">13: </span><span style="color: #87cefa;">__clock_cycle</span>: <span class="linenr">14: </span> # Retrieve the current value of the real-time counter. <span class="linenr">15: </span> # <span class="linenr">16: </span> # Inputs: None <span class="linenr">17: </span> # <span class="linenr">18: </span> # Returns: <span class="linenr">19: </span> # - a0: The current value of the mtime register. <span class="linenr">20: </span> <span style="color: #00ffff;">ld_mtime</span> a0 <span class="linenr">21: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>The CLINT_BASE symbol is defined on line <a class="coderef" href="#coderef-clint base" onmouseout="CodeHighlightOff(this, 'coderef-clint base');" onmouseover="CodeHighlightOn(this, 'coderef-clint base');">1</a>; its value is the absolute base address of the memory mapped registers of the CLINT module (see <a href="https://git.qemu.org/?p=qemu.git;a=blob_plain;f=include/hw/riscv/sifive_clint.h;hb=refs/heads/stable-3.1">include/hw/riscv/sifive_clint.h</a> of the QEMU source). The CLINT_MTIME symbol, defined on line <a class="coderef" href="#coderef-mtime offset" onmouseout="CodeHighlightOff(this, 'coderef-mtime offset');" onmouseover="CodeHighlightOn(this, 'coderef-mtime offset');">2</a>, specifies the offset of the <b>mtime</b> register relative to the CLINT's base memory address. The offset is added to the base address and the result is stored in register <b>t0</b> on line <a class="coderef" href="#coderef-mtime addr" onmouseout="CodeHighlightOff(this, 'coderef-mtime addr');" onmouseover="CodeHighlightOn(this, 'coderef-mtime addr');">6</a>. Finally, the value of the real-time counter is retrieved on line <a class="coderef" href="#coderef-read counter" onmouseout="CodeHighlightOff(this, 'coderef-read counter');" onmouseover="CodeHighlightOn(this, 'coderef-read counter');">7</a>. 
All of these operations are defined in a macro starting on line <a class="coderef" href="#coderef-ld-mtime macro" onmouseout="CodeHighlightOff(this, 'coderef-ld-mtime macro');" onmouseover="CodeHighlightOn(this, 'coderef-ld-mtime macro');">3</a>.</p> <p>The <b>mtime</b> register is useful to determine the time since the board was reset. However, it cannot generate interrupts by itself. The <b>mtimecmp</b> memory-mapped register will cause a timer interrupt to be posted when its value is less than or equal to the value contained in <b>mtime</b>. In other words, when the periodically increasing value of <b>mtime</b> reaches or exceeds that contained in <b>mtimecmp</b>, a timer interrupt will be posted (provided that timer interrupts are enabled). Therefore, to receive an interrupt after some fixed interval of time, the current value of <b>mtime</b> must be retrieved, the timeout value added to it, and the result saved in the <b>mtimecmp</b> register. This process is illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-mtimecmp_offset"><span class="linenr"> 1: </span> <span style="color: #00ffff;">.equ</span> CLINT_MTIMECMP, 0x4000 # The offset of the MTIMECMP register</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.macro</span> st_mtimecmp rs <span class="linenr"> 3: </span> <span style="color: #00ffff;">li</span> t0, CLINT_BASE <span class="linenr"> 4: </span> <span style="color: #00ffff;">li</span> t1, CLINT_MTIMECMP <span class="linenr"> 5: </span> <span style="color: #00ffff;">add</span> t1, t0, t1 <span class="linenr"> 6: </span> <span style="color: #00ffff;">sd</span> \rs, 0(t1) <span class="linenr"> 7: </span> <span style="color: #00ffff;">.endm</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">.global</span> __timer_create <span class="linenr"> 9: </span><span style="color: #87cefa;">__timer_create</span>: <span class="linenr">10: </span> # 
Set a timer to trigger an interrupt when a given number of <span class="linenr">11: </span> # clock cycles have elapsed. <span class="linenr">12: </span> # <span class="linenr">13: </span> # Inputs: <span class="linenr">14: </span> # - a0: The timeout value in clock cycles. <span class="linenr">15: </span> <span style="color: #00ffff;">ld_mtime</span> t1 # Read the counter into t1 (clobbers t0) <span class="linenr">16: </span> # t1 already holds the counter value; no further load is needed <span class="coderef-off" id="coderef-set_timeout"><span class="linenr">17: </span> <span style="color: #00ffff;">add</span> a0, a0, t1 # Add the timeout to the clock cycle</span> <span class="linenr">18: </span> <span style="color: #00ffff;">st_mtimecmp</span> a0 <span class="linenr">19: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>The <code>__timer_create</code> function will set a timer to trigger an interrupt after the given number of clock cycles have elapsed. This code re-uses the macro defined previously to read the current value of the real-time counter, then adds the desired number of cycles to it, and writes the result to the <b>mtimecmp</b> register. When <b>mtime</b>'s value is greater than or equal to the value in <b>mtimecmp</b>, the MTIP field of the <b>mip</b> CSR will be asserted to indicate that a timer interrupt is pending, and the trap handler will be called. 
The following code will set up this process:</p> <div class="org-src-container"> <pre class="src src-asm" id="org0bbc010"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span> <span style="color: #00ffff;">.global</span> CLOCK_MONOTONIC <span class="linenr"> 6: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-load-mtrap"><span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> t0, __mtrap_handler # Load trap vector address</span> <span class="linenr"> 9: </span> <span style="color: #00ffff;">csrrw</span> zero, mtvec, t0 <span class="linenr">10: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;3 <span class="coderef-off" id="coderef-set-mstatus.MIE"><span class="linenr">11: </span> <span style="color: #00ffff;">csrrs</span> t0, mstatus, t0 # Enable interrupts globally</span> <span class="linenr">12: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;7 <span class="coderef-off" id="coderef-set-mie.MTIE"><span class="linenr">13: </span> <span style="color: #00ffff;">csrrs</span> t0, mie, t0 # Enable timer interrupts</span> <span class="coderef-off" id="coderef-load-timeout"><span class="linenr">14: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">li</span> a0, 0x10000 # Set the timeout value</span> <span class="coderef-off" id="coderef-trap-settimer"><span class="linenr">15: </span> <span style="color: #00ffff;">call</span> __timer_create # Set the timer</span> <span class="coderef-off" id="coderef-wfi"><span class="linenr">16: </span> <span style="color: #00ffff;">wfi</span> # Wait for 
interrupts</span> <span class="linenr">17: </span> <span style="color: #00ffff;">j</span> 1b <span class="linenr">18: </span> <span class="linenr">19: </span> <span style="color: #00ffff;">.align</span> 2 <span class="coderef-off" id="coderef-mtrap-handler"><span class="linenr">20: </span><span style="color: #87cefa;">__mtrap_handler</span>: # Machine interrupt handler</span> <span class="coderef-off" id="coderef-trap-get-cause"><span class="linenr">21: </span> <span style="color: #00ffff;">csrrc</span> t0, mcause, zero # Get the cause of the interrupt</span> <span class="coderef-off" id="coderef-trap-exception"><span class="linenr">22: </span> <span style="color: #00ffff;">bgez</span> t0, 2f # Exit on an exception</span> <span class="linenr">23: </span> <span style="color: #00ffff;">slli</span> t0, t0, 1 <span class="linenr">24: </span> <span style="color: #00ffff;">srli</span> t0, t0, 1 <span class="linenr">25: </span> <span style="color: #00ffff;">li</span> t1, 7 # The timer interrupt has code 7. <span class="coderef-off" id="coderef-trap-check-mtip"><span class="linenr">26: </span> <span style="color: #00ffff;">bne</span> t0, t1, 2f # Check for timer interrupts</span> <span class="linenr">27: </span> <span style="color: #00ffff;">addi</span> s0, s0, 1 # Increment the interrupt count. <span class="coderef-off" id="coderef-trap-mret"><span class="linenr">28: </span><span style="color: #87cefa;">2</span>: <span style="color: #00ffff;">mret</span> # Machine trap return</span> </pre></div> <p>After setting up the stack, this program loads the address for the trap handler on line <a class="coderef" href="#coderef-load-mtrap" onmouseout="CodeHighlightOff(this, 'coderef-load-mtrap');" onmouseover="CodeHighlightOn(this, 'coderef-load-mtrap');">8</a>, and stores this address in the <b>mtvec</b> CSR. 
Machine-mode interrupts are then enabled globally by setting bit 3 of the <b>mstatus</b> CSR to 1 (on line <a class="coderef" href="#coderef-set-mstatus.MIE" onmouseout="CodeHighlightOff(this, 'coderef-set-mstatus.MIE');" onmouseover="CodeHighlightOn(this, 'coderef-set-mstatus.MIE');">11</a>), and M-mode timer interrupts are enabled (on line <a class="coderef" href="#coderef-set-mie.MTIE" onmouseout="CodeHighlightOff(this, 'coderef-set-mie.MTIE');" onmouseover="CodeHighlightOn(this, 'coderef-set-mie.MTIE');">13</a>) by setting bit 7 of the <b>mie</b> CSR to 1. On line <a class="coderef" href="#coderef-load-timeout" onmouseout="CodeHighlightOff(this, 'coderef-load-timeout');" onmouseover="CodeHighlightOn(this, 'coderef-load-timeout');">14</a>, an immediate value is loaded into register <b>a0</b>. This value will be used to create the timer on line <a class="coderef" href="#coderef-trap-settimer" onmouseout="CodeHighlightOff(this, 'coderef-trap-settimer');" onmouseover="CodeHighlightOn(this, 'coderef-trap-settimer');">15</a>. Once the timer is armed, the <code>wfi</code> instruction is used on line <a class="coderef" href="#coderef-wfi" onmouseout="CodeHighlightOff(this, 'coderef-wfi');" onmouseover="CodeHighlightOn(this, 'coderef-wfi');">16</a> to wait for an interrupt to occur, at which point control jumps to the trap handler. When the handler returns, control jumps back to line <a class="coderef" href="#coderef-load-timeout" onmouseout="CodeHighlightOff(this, 'coderef-load-timeout');" onmouseover="CodeHighlightOn(this, 'coderef-load-timeout');">14</a>, and the timer is armed again. This should cause periodic calls to the trap handler.</p> <p>A machine-level trap handler is defined on line <a class="coderef" href="#coderef-mtrap-handler" onmouseout="CodeHighlightOff(this, 'coderef-mtrap-handler');" onmouseover="CodeHighlightOn(this, 'coderef-mtrap-handler');">20</a>. 
This handler will increment the value in register <b>s0</b> every time a timer interrupt is triggered. The cause of the trap is determined on line <a class="coderef" href="#coderef-trap-get-cause" onmouseout="CodeHighlightOff(this, 'coderef-trap-get-cause');" onmouseover="CodeHighlightOn(this, 'coderef-trap-get-cause');">21</a>, which reads the value of the <b>mcause</b> CSR (because the source operand is the <b>zero</b> register, <code>csrrc</code> leaves the CSR unchanged). If the value of this register is greater than or equal to zero (line <a class="coderef" href="#coderef-trap-exception" onmouseout="CodeHighlightOff(this, 'coderef-trap-exception');" onmouseover="CodeHighlightOn(this, 'coderef-trap-exception');">22</a>), its most-significant bit is 0 and the trap was caused by a synchronous exception; the most-significant bit is set to 1 only for interrupts. In the case of a synchronous exception, the handler simply exits by executing the <code>mret</code> instruction, which returns control to the instruction that was executing when the exception occurred. Otherwise the next two lines will shift off the most significant bit (i.e. set it to zero). The interrupt code is checked on line <a class="coderef" href="#coderef-trap-check-mtip" onmouseout="CodeHighlightOff(this, 'coderef-trap-check-mtip');" onmouseover="CodeHighlightOn(this, 'coderef-trap-check-mtip');">26</a>; if it corresponds with the timer interrupt, the value of <b>s0</b> is incremented by 1.</p> <p>The state of the hart can be inspected via the <code>info registers</code> command in <code>qemu</code>. 
If the timing of the snapshot is such that the trap handler is executing, the machine CSRs will have the following state:</p> <div class="org-src-container"> <pre class="src src-sh"> (qemu) info registers pc 000000008000004c mhartid 0000000000000000 mstatus 0000000000001880 mip 0000000000000080 mie 0000000000000080 mideleg 0000000000000000 medeleg 0000000000000000 mtvec 0000000080000048 mepc 0000000080000040 mcause 8000000000000007 </pre></div> <p>As usual, the <b>pc</b> register shows the current address of the active instruction; in this case, however, control has jumped into the trap handler. When a trap is taken, the following operations are executed atomically:</p> <ol class="org-ol"><li>The <b>mstatus.MPIE</b> field (which represents the previous global interrupt mode) is set to the value of <b>mstatus.MIE</b>.</li> <li>Interrupts are disabled globally by setting the <b>mstatus.MIE</b> field (bit 3 of <b>mstatus</b>) to zero.</li> <li>The <b>mstatus.MPP</b> field (bits 12:11) is set to the privilege level that was active before the trap; here it is 0b11 because the hart was running in machine mode.</li> <li>For an interrupt, the address of the instruction at which execution will resume is saved in the <b>mepc</b> register; for a synchronous exception, <b>mepc</b> holds the address of the instruction that raised it.</li> <li>The cause of the trap is written in register <b>mcause</b>.</li> <li>The <b>pc</b> register is set to the address in <b>mtvec</b>.</li> </ol><p>When the interrupt handler completes its task, the <code>mret</code> instruction is executed, which will reverse this process. Control will be set to the address in <b>mepc</b>, and the value of <b>mstatus.MPIE</b> will be written to <b>mstatus.MIE</b>, after which <b>mstatus.MPIE</b> is set to 1. The <code>mret</code> instruction does not clear the <b>mip</b> register: the MTIP bit remains pending until <b>mtimecmp</b> is written again. Control flow will continue from where it was interrupted, and the core will be ready to handle new interrupts. 
More importantly, the hart will be returned to the privilege mode specified in the <b>mstatus.MPP</b> field. This behaviour can be exploited to set the current privilege mode of the processor.</p> </div> </div> <div class="outline-2" id="outline-container-orgd8dddb4"> <h2 id="orgd8dddb4">Enter the User</h2> <div class="outline-text-2" id="text-orgd8dddb4"> <p>The default privilege level when the processor is reset is machine mode. Programs running at this level have full control over the processor via the control and status registers. However, this opens up the system to abuse. To limit the damage that a wayward program can do, it is better to run in user mode. Getting to user mode is a simple matter of setting up the <b>mstatus</b> register and returning from M-mode via the <code>mret</code> instruction. The following macro will set the privilege mode to the specified level:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> # Set the privilege mode to that specified by the immediate <span class="linenr"> 2: </span> # value. 
<span class="linenr"> 3: </span> <span style="color: #00ffff;">.macro</span> setmode imm <span class="coderef-off" id="coderef-clear mstatus.MPP"><span class="linenr"> 4: </span> <span style="color: #00ffff;">li</span> t0, 0x1800; <span style="color: #00ffff;">csrrc</span> zero, mstatus, t0 # Clear mstatus.MPP (bits 12:11)</span> <span class="coderef-off" id="coderef-load priv-mode"><span class="linenr"> 5: </span> <span style="color: #00ffff;">li</span> t0, \imm</span> <span class="coderef-off" id="coderef-create priv mask"><span class="linenr"> 6: </span> <span style="color: #00ffff;">slli</span> t0, t0, 11</span> <span class="coderef-off" id="coderef-set mstatus.MPP"><span class="linenr"> 7: </span> <span style="color: #00ffff;">csrrs</span> t0, mstatus, t0 # Set mstatus.MPP</span> <span class="coderef-off" id="coderef-load return address"><span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> t0, 1f</span> <span class="coderef-off" id="coderef-set mepc"><span class="linenr"> 9: </span> <span style="color: #00ffff;">csrrw</span> zero, mepc, t0</span> <span class="linenr">10: </span> <span style="color: #00ffff;">mret</span> <span class="coderef-off" id="coderef-return location"><span class="linenr">11: </span><span style="color: #87cefa;">1</span>:</span> <span class="linenr">12: </span> <span style="color: #00ffff;">.endm</span> </pre></div> <p>This macro will set the privilege mode to the value specified as an immediate by first clearing the <b>mstatus.MPP</b> field on line <a class="coderef" href="#coderef-clear mstatus.MPP" onmouseout="CodeHighlightOff(this, 'coderef-clear mstatus.MPP');" onmouseover="CodeHighlightOn(this, 'coderef-clear mstatus.MPP');">4</a>, then replacing it with the encoding of the desired privilege mode. 
The following table lists the privilege modes and their encodings:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><caption class="t-above"><span class="table-number">Table 2:</span> Privilege level encodings</caption> <colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-right" /></colgroup><thead><tr><th class="org-left" scope="col">Level</th> <th class="org-left" scope="col">Name</th> <th class="org-left" scope="col">Encoding</th> <th class="org-right" scope="col">Immediate</th> </tr></thead><tbody><tr><td class="org-left">U</td> <td class="org-left">User/Application</td> <td class="org-left">0b00</td> <td class="org-right">0</td> </tr><tr><td class="org-left">S</td> <td class="org-left">Supervisor</td> <td class="org-left">0b01</td> <td class="org-right">1</td> </tr><tr><td class="org-left">M</td> <td class="org-left">Machine</td> <td class="org-left">0b11</td> <td class="org-right">3</td> </tr></tbody></table><p>The fourth column of this table lists the immediate value that should be supplied to the macro to set the associated privilege mode. The encoded privilege value is loaded into register <b>t0</b> on line <a class="coderef" href="#coderef-load priv-mode" onmouseout="CodeHighlightOff(this, 'coderef-load priv-mode');" onmouseover="CodeHighlightOn(this, 'coderef-load priv-mode');">5</a>, then shifted left by 11 bits on the next line to align it with the <b>mstatus.MPP</b> field (bits 12:11). This field is set via the <code>csrrs</code> instruction on line <a class="coderef" href="#coderef-set mstatus.MPP" onmouseout="CodeHighlightOff(this, 'coderef-set mstatus.MPP');" onmouseover="CodeHighlightOn(this, 'coderef-set mstatus.MPP');">7</a>, which effectively sets <b>mstatus</b> to the bit-wise <b>OR</b> of its previous value with the value in <b>t0</b>. 
However, before returning from machine-mode, the <b>mepc</b> register must be updated with the address of the instruction to which control will return.</p> <p>The address immediately following the <code>mret</code> instruction is loaded into register <b>t0</b> on line <a class="coderef" href="#coderef-load return address" onmouseout="CodeHighlightOff(this, 'coderef-load return address');" onmouseover="CodeHighlightOn(this, 'coderef-load return address');">8</a>, then written to <b>mepc</b> on line <a class="coderef" href="#coderef-set mepc" onmouseout="CodeHighlightOff(this, 'coderef-set mepc');" onmouseover="CodeHighlightOn(this, 'coderef-set mepc');">9</a>. When the <code>mret</code> instruction is executed, <b>pc</b> will be set to this address. If the main program is updated to invoke this macro with the argument 0 just before the <code>wfi</code>, the program should be in U-mode by the time the timer expires:</p> <div class="org-src-container"> <pre class="src src-asm" id="org6f4fee5"> <span class="linenr">26: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">li</span> a0, 0x10000 # Set the timeout value <span class="linenr">27: </span> <span style="color: #00ffff;">call</span> __timer_create # Set the timer <span class="coderef-off" id="coderef-set U-mode"><span class="linenr">28: </span> <span style="color: #00ffff;">setmode</span> 0</span> <span class="linenr">29: </span> <span style="color: #00ffff;">wfi</span> <span class="linenr">30: </span> <span style="color: #00ffff;">j</span> 1b <span class="linenr">31: </span> </pre></div> <p>Anything that runs following the call to the <code>setmode</code> macro will be executing in U-mode. However, the trap handler will execute in machine mode. Therefore the trap handler is useful for performing tasks that require machine mode privilege. Fortunately, traps can occur for asynchronous interrupts as well as synchronous exceptions. 
Therefore the trap handler can be used to implement system calls.</p> <p>The <code>ecall</code> instruction is an environment call which raises a synchronous exception, and sets <b>mcause</b> to the code indicating the active privilege mode when it was executed. The trap handler can be updated to jump to the appropriate function, which will execute in machine mode, then return to the original privilege mode when it is complete. The trap handler must be updated to handle the system call:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 2: </span><span style="color: #87cefa;">__mtrap_handler</span>: <span class="linenr"> 3: </span> <span style="color: #00ffff;">csrr</span> t0, mcause <span class="coderef-off" id="coderef-Check for exception"><span class="linenr"> 4: </span> <span style="color: #00ffff;">bgez</span> t0, 1f</span> <span class="linenr"> 5: </span> <span style="color: #00ffff;">slli</span> t0, t0, 1 <span class="linenr"> 6: </span> <span style="color: #00ffff;">srli</span> t0, t0, 1 <span class="coderef-off" id="coderef-Timer interrupt code"><span class="linenr"> 7: </span> <span style="color: #00ffff;">li</span> t1, 7</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">bne</span> t0, t1, 2f <span class="coderef-off" id="coderef-Increment interrupt count"><span class="linenr"> 9: </span> <span style="color: #00ffff;">addi</span> s0, s0, 1</span> <span class="linenr">10: </span> <span style="color: #00ffff;">j</span> 2f <span class="linenr">11: </span><span style="color: #87cefa;">1</span>: <span class="coderef-off" id="coderef-U-mode ecall code"><span class="linenr">12: </span> <span style="color: #00ffff;">li</span> t1, 8</span> <span class="linenr">13: </span> <span style="color: #00ffff;">bne</span> t0, t1, 2f <span class="coderef-off" id="coderef-save registers"><span class="linenr">14: </span> <span style="color: 
#00ffff;">push_stack</span></span> <span class="linenr">15: </span> <span style="color: #00ffff;">call</span> __syscall <span class="coderef-off" id="coderef-restore registers"><span class="linenr">16: </span> <span style="color: #00ffff;">pop_stack</span></span> <span class="coderef-off" id="coderef-Load mret address"><span class="linenr">17: </span> <span style="color: #00ffff;">csrr</span> t0, mepc</span> <span class="coderef-off" id="coderef-Set mret address"><span class="linenr">18: </span> <span style="color: #00ffff;">addi</span> t0, t0, 4</span> <span class="linenr">19: </span> <span style="color: #00ffff;">csrrw</span> zero, mepc, t0 <span class="linenr">20: </span><span style="color: #87cefa;">2</span>: <span style="color: #00ffff;">csrrw</span> t0, mcause, zero <span class="coderef-off" id="coderef-Machine trap return"><span class="linenr">21: </span> <span style="color: #00ffff;">mret</span></span> <span class="linenr">22: </span> </pre></div> <p>This handler is updated by adding some code for synchronous exceptions (starting at line <a class="coderef" href="#coderef-U-mode ecall code" onmouseout="CodeHighlightOff(this, 'coderef-U-mode ecall code');" onmouseover="CodeHighlightOn(this, 'coderef-U-mode ecall code');">12</a>). The value of <b>mcause</b> is compared with the user-environment call exception code (8) to see if a system call was requested. 
If so, the handler will save the registers on the stack via the <code>push_stack</code> macro on line <a class="coderef" href="#coderef-save registers" onmouseout="CodeHighlightOff(this, 'coderef-save registers');" onmouseover="CodeHighlightOn(this, 'coderef-save registers');">14</a>, call the <code>__syscall</code> function, then restore the registers to their values prior to the call via the <code>pop_stack</code> macro on line <a class="coderef" href="#coderef-restore registers" onmouseout="CodeHighlightOff(this, 'coderef-restore registers');" onmouseover="CodeHighlightOn(this, 'coderef-restore registers');">16</a>.</p> <p>After the system call has finished processing, the handler loads the value of the <b>mepc</b> CSR, which should contain the address of the <code>ecall</code> instruction that caused the trap. On line <a class="coderef" href="#coderef-Set mret address" onmouseout="CodeHighlightOff(this, 'coderef-Set mret address');" onmouseover="CodeHighlightOn(this, 'coderef-Set mret address');">18</a>, this address is incremented by 4 to skip to the instruction that follows <code>ecall</code>; the updated address is then stored in <b>mepc</b> before executing <code>mret</code>. This will set the program counter to the instruction immediately following the one that triggered the system call, and restore the privilege mode to U-mode.</p> <p>Typically system calls are identified by a number. If a system call's arguments are stored in the registers <b>a0</b> to <b>a7</b>, and its return value in <b>a0</b>, system calls will follow the same convention as regular function calls (albeit with greater privilege). 
The following code snippet will invoke the system call associated with the identifier 0x100:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-load syscall id"><span class="linenr">1: </span><span style="color: #87cefa;">li</span> <span style="color: #00ffff;">a0</span>, 0x100</span> <span class="coderef-off" id="coderef-execute the syscall"><span class="linenr">2: </span><span style="color: #87cefa;">ecall</span></span> </pre></div> <p>The timer can now be set from user mode via a system call. The following implementation of <code>__syscall</code> will invoke <code>__timer_create</code> with the timeout value specified in register <b>a1</b> when system call 0x100 is requested:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span><span style="color: #87cefa;">__syscall</span>: <span class="linenr"> 2: </span> <span style="color: #00ffff;">mv</span> s1, a0 <span class="coderef-off" id="coderef-timer syscall"><span class="linenr"> 3: </span> <span style="color: #00ffff;">li</span> t0, 0x100</span> <span class="linenr"> 4: </span> <span style="color: #00ffff;">bne</span> t0, a0, 1f <span class="coderef-off" id="coderef-set timeout arg"><span class="linenr"> 5: </span> <span style="color: #00ffff;">mv</span> a0, a1</span> <span class="linenr"> 6: </span> <span style="color: #00ffff;">push_stack</span> <span class="linenr"> 7: </span> <span style="color: #00ffff;">call</span> __timer_create <span class="linenr"> 8: </span> <span style="color: #00ffff;">pop_stack</span> <span class="linenr"> 9: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">ret</span> <span class="linenr">10: </span> </pre></div> <p>This function will check the register <b>a0</b> to determine the system call id that is requested. 
If it is 0x100, then it will move the timeout value from register <b>a1</b> to <b>a0</b> on line <a class="coderef" href="#coderef-set timeout arg" onmouseout="CodeHighlightOff(this, 'coderef-set timeout arg');" onmouseover="CodeHighlightOn(this, 'coderef-set timeout arg');">5</a>, push the stack, then invoke the <code>__timer_create</code> function with M-mode privilege. The main program can be updated to set the privilege level to U-mode, then make a system call 0x100 to set the timer. The updated main program is illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> CLOCK_MONOTONIC <span class="linenr"> 5: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 6: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="linenr"> 8: </span> <span style="color: #00ffff;">call</span> __system_check <span class="linenr"> 9: </span> <span style="color: #00ffff;">la</span> t0, __mtrap_handler <span class="linenr">10: </span> <span style="color: #00ffff;">csrrw</span> zero, mtvec, t0 <span class="linenr">11: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;3 <span class="linenr">12: </span> <span style="color: #00ffff;">csrrs</span> zero, mstatus, t0 <span class="linenr">13: </span> <span style="color: #00ffff;">li</span> t0, 0b1&lt;&lt;7 <span class="linenr">14: </span> <span style="color: #00ffff;">csrrs</span> zero, mie, t0 <span class="linenr">15: </span> <span style="color: #00ffff;">setmode</span> 0 <span 
class="coderef-off" id="coderef-set timer syscall"><span class="linenr">16: </span><span style="color: #87cefa;">1</span>: <span style="color: #00ffff;">li</span> a0, 0x100</span> <span class="coderef-off" id="coderef-set system timeout"><span class="linenr">17: </span> <span style="color: #00ffff;">li</span> a1, 0x10000</span> <span class="linenr">18: </span> <span style="color: #00ffff;">ecall</span> <span class="linenr">19: </span> <span style="color: #00ffff;">wfi</span> <span class="linenr">20: </span> <span style="color: #00ffff;">j</span> 1b </pre></div> <p>The major difference from the previous main program is that the timer is not set directly, but via a system call. The syscall id is loaded into register <b>a0</b> on line <a class="coderef" href="#coderef-set timer syscall" onmouseout="CodeHighlightOff(this, 'coderef-set timer syscall');" onmouseover="CodeHighlightOn(this, 'coderef-set timer syscall');">16</a>, and the timeout value into register <b>a1</b> on line <a class="coderef" href="#coderef-set system timeout" onmouseout="CodeHighlightOff(this, 'coderef-set system timeout');" onmouseover="CodeHighlightOn(this, 'coderef-set system timeout');">17</a>. This set of instructions will be repeated each time a timeout is triggered.</p> </div> </div> <div class="outline-2" id="outline-container-orgdb17f69"> <h2 id="orgdb17f69">Conclusion</h2> <div class="outline-text-2" id="text-orgdb17f69"> <p>This chapter has delved into how RISC-V handles synchronous and asynchronous exceptions as well as the privilege mode instructions available in the ISA. This may be one of the key components in the development of an operating system by allowing privileged functions to run separately from user code. 
System calls allow U-mode applications to use the <code>ecall</code> instruction to request services which require M-mode (or S-mode) privilege.</p> <p>Moreover, the asynchronous exception handling provides a good introduction to how RISC-V processors deal with events originating from external peripherals. This will become important when creating programs intended to interact with the system user.</p> <p>Although the examples in this chapter were restricted to the M-mode and U-mode privilege levels, the supervisor mode was briefly discussed. Having three privilege levels is useful when creating hypervisors: guest operating systems can operate in S-mode while the virtualization environment uses M-mode.</p> <p>The next chapter will leverage many of the capabilities discussed here to interface with more of the external components in a RISC-V system. In particular the UART module will allow users to interact with applications via a serial console.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-12-11T02:36:21+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Tue, 12/10/2019 - 21:36</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> <li><a href="/main/taxonomy/term/17" rel="dc:subject" hreflang="en">Computer Architecture</a></li> </ul> </div> Wed, 11 Dec 2019 02:36:21 +0000 MarcAdmin 35 at https://www.vociferousvoid.org/main RISC-V Bare Metal Programming - Chapter 4: Another Brick in the Wall https://www.vociferousvoid.org/main/riscv_bare_metal_chapter4 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming - Chapter 4: Another Brick in the Wall</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p><a href="https://www.vociferousvoid.org/main/risc_bare_metal_chapter3">Chapter 3</a> of this RISC-V bare metal tutorial studied the linking process and how a developer can control where code and data are placed in memory. Constants, initialized variables and uninitialized variables were defined and explicitly positioned in RAM as prescribed by a linker script. The running example program was updated to read operands from RAM to perform its task, and subsequently store the result in a different location in RAM. However, up to this point only the base RV64I instruction set has been used. 
This chapter will explore some of the standard extensions available in the RISC-V ISA.</p> <p>One of the objectives in the design of the RISC-V ISA is to support many different deployment environments which may have varying constraints for efficiency, performance, and cost. For this reason, the base instruction set was restricted to the minimum required to build a useful program. This reduces the processor complexity, potentially yielding performance and efficiency gains. However, these gains may be lost when performing more complex computations. To address potential limitations in the base instruction set, optional standard extensions have been defined to expand the available set of instructions. The standard extensions available for the 32-bit and 64-bit instruction sets include:</p> <dl class="org-dl"><dt>M</dt> <dd>Support for multiply and divide (RV32M and RV64M).</dd> <dt>A</dt> <dd>Atomic operations (RV32A and RV64A).</dd> <dt>F</dt> <dd>Single precision floating point support (RV32F and RV64F).</dd> <dt>D</dt> <dd>Double precision floating point support (RV32D and RV64D).</dd> </dl><p>This set of standard extensions is included in most implementations of RISC-V cores. The base set plus these extensions is often referred to as the <b>G</b> instruction set (RV32G or RV64G). Each of these standard extensions will be explored in this chapter.</p> <div class="outline-2" id="outline-container-org657fc9a"> <h2 id="org657fc9a">Multiply</h2> <div class="outline-text-2" id="text-org657fc9a"> <p>The <b>M</b> extension provides instructions for multiplying and dividing integers using both word and double-word length operands. When using word length operands, the product never requires more than 64 bits, so it fits in a single RV64I register. 
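</p> <p>The bound can be checked with a short worked inequality. The symbols \(a\) and \(b\) are introduced here only for illustration and stand for two signed 32-bit operands:</p> <p>\(|a \cdot b| \le 2^{31} \cdot 2^{31} = 2^{62} &lt; 2^{63}\)</p> <p>The magnitude of the product therefore always fits in a signed 64-bit register. For unsigned 32-bit operands the same argument gives \(a \cdot b \le (2^{32}-1)^2 &lt; 2^{64}\).</p> <p>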
The following listing of the <code>product.s</code> source file shows the assembly code of a function to multiply word sized integer operands:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __imul32 <span class="linenr"> 4: </span><span style="color: #87cefa;">__imul32</span>: <span class="linenr"> 5: </span> # Input: <span class="linenr"> 6: </span> # a0: 32-bit multiplicand <span class="linenr"> 7: </span> # a1: 32-bit multiplier <span class="linenr"> 8: </span> # Result: <span class="linenr"> 9: </span> # a0: 64-bit product <span class="linenr">10: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr">11: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="coderef-off" id="coderef-multiplication"><span class="linenr">12: </span> <span style="color: #00ffff;">mul</span> a0, a0, a1</span> <span class="linenr">13: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">14: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">15: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>Because the arguments of this function are expected to be sign-extended word-length data, the calculation of the product can be performed using a single instruction (<b>mul</b> on line <a class="coderef" href="#coderef-multiplication" onmouseout="CodeHighlightOff(this, 'coderef-multiplication');" onmouseover="CodeHighlightOn(this, 'coderef-multiplication');">12</a>). The <b>mulw</b> variant is not suitable here because it keeps only the low 32 bits of the product. 
The main program can be updated as follows to invoke the <b>__imul32</b> function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="coderef-off" id="coderef-call_imul32"><span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul32</span> <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, result2 <span class="coderef-off" id="coderef-save_imul32"><span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1)</span> <span class="linenr">15: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">16: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">17: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">18: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">19: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">20: 
</span> <span style="color: #00ffff;">.bss</span> <span class="coderef-off" id="coderef-sum_result"><span class="linenr">21: </span><span style="color: #87cefa;">result1</span>: .word 0</span> <span class="coderef-off" id="coderef-__imul32_result"><span class="linenr">22: </span><span style="color: #87cefa;">result2</span>: .dword 0</span> </pre></div> <p>The <b>.bss</b> section of the <b>ELF</b> file was updated to declare two result variables: <b>result1</b> on line <a class="coderef" href="#coderef-sum_result" onmouseout="CodeHighlightOff(this, 'coderef-sum_result');" onmouseover="CodeHighlightOn(this, 'coderef-sum_result');">21</a> which will hold the sum of the operands in a word, and <b>result2</b> on line <a class="coderef" href="#coderef-__imul32_result" onmouseout="CodeHighlightOff(this, 'coderef-__imul32_result');" onmouseover="CodeHighlightOn(this, 'coderef-__imul32_result');">22</a> which will hold their product in a double-word.</p> <p>After the sum of the operands is calculated, and the result is saved in memory, it is kept in register <b>a0</b> to be used as the multiplicand. The value of <b>operand2</b> will be used as the multiplier; its value should still be in the <b>a1</b> register since its content is not modified by the <b>sum</b> function. 
The <b>__imul32</b> function is then called on line <a class="coderef" href="#coderef-call_imul32" onmouseout="CodeHighlightOff(this, 'coderef-call_imul32');" onmouseover="CodeHighlightOn(this, 'coderef-call_imul32');">12</a> and the result is saved in memory at line <a class="coderef" href="#coderef-save_imul32" onmouseout="CodeHighlightOff(this, 'coderef-save_imul32');" onmouseover="CodeHighlightOn(this, 'coderef-save_imul32');">14</a>.</p> <p>The program can be assembled, linked, and executed in <b>qemu</b> using the following sequence of commands:</p> <pre class="example">
riscv64-unknown-elf-as -o add.o add.s
riscv64-unknown-elf-as -o main.o main.s
riscv64-unknown-elf-as -o product.o product.s
riscv64-unknown-elf-ld -T chapter3.lds -o main.elf add.o main.o product.o
qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel main.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu)
</pre><p>The <code>chapter3.lds</code> linker script is the same one that was used in chapter 3. The result values can be inspected from the <b>qemu</b> console using the <b>xp</b> command:</p> <pre class="example">
(qemu) xp /1wd 0x80001004
0000000080001004: 9
(qemu) xp /1gd 0x80001008
0000000080001008: 45
(qemu)
</pre><p>The location of <b>result1</b> in memory is the same as <b>result</b> from the previous chapter. The memory location of <b>result2</b> will be 4 bytes beyond <b>result1</b> since <b>result1</b> is 32 bits wide. Therefore the product can be found at address 0x80001008. This can easily be verified using the <b>objdump</b> utility:</p> <pre class="example">
$ riscv64-unknown-elf-objdump -D -j.bss main.elf

sum.elf: file format elf64-littleriscv

Disassembly of section .bss:

0000000080001004 &lt;result1&gt;:
80001004: 0000 unimp
...

0000000080001008 &lt;result2&gt;:
...
</pre><p>As expected, the multiplication of 9 and 5 is 45.</p> <p>Multiplication using registers is a little more complicated when dealing with 64-bit values. 
This is due to the fact that the product will be wider (in bits) than either the multiplier or multiplicand. The <b>__imul32</b> function assumes that the operands are word-length values, therefore the result will fit in a single double-word register. However, the calculated product will be truncated if double-word length operands are provided. The product of two 64-bit values may have as many as 128 bits which is wider than any available register in the RV64I instruction set. To mitigate this problem, the RISC-V ISA requires two instructions to perform a multiplication: one to calculate the most significant double-word (<b>mulh</b>), and a second to calculate the least significant double-word (<b>mul</b>). The following listing illustrates the <b>__imul64</b> function that can handle 64-bit operands:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.global</span> __imul64 <span class="linenr"> 2: </span><span style="color: #87cefa;">__imul64</span>: <span class="linenr"> 3: </span> # Input: <span class="linenr"> 4: </span> # a0: 64-bit multiplicand <span class="linenr"> 5: </span> # a1: 64-bit multiplier <span class="linenr"> 6: </span> # Result: <span class="linenr"> 7: </span> # a0: low 64-bits of the product <span class="linenr"> 8: </span> # a1: high 64-bits of the product <span class="linenr"> 9: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr">10: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="coderef-off" id="coderef-save_t1"><span class="linenr">11: </span> <span style="color: #00ffff;">sd</span> t1, 16(sp)</span> <span class="coderef-off" id="coderef-save_t0"><span class="linenr">12: </span> <span style="color: #00ffff;">sd</span> t0, 8(sp)</span> <span class="coderef-off" id="coderef-mv_arg0"><span class="linenr">13: </span> <span style="color: #00ffff;">mv</span> t0, a0</span> <span class="coderef-off" 
id="coderef-mv_arg1"><span class="linenr">14: </span> <span style="color: #00ffff;">mv</span> t1, a1</span> <span class="coderef-off" id="coderef-__imul64_low"><span class="linenr">15: </span> <span style="color: #00ffff;">mul</span> a0, t0, t1</span> <span class="coderef-off" id="coderef-__imul64_high"><span class="linenr">16: </span> <span style="color: #00ffff;">mulh</span> a1, t0, t1</span> <span class="linenr">17: </span> <span style="color: #00ffff;">ld</span> t0, 8(sp) <span class="coderef-off" id="coderef-restore_t1"><span class="linenr">18: </span> <span style="color: #00ffff;">ld</span> t1, 16(sp)</span> <span class="coderef-off" id="coderef-restore_t0"><span class="linenr">19: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp)</span> <span class="linenr">20: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">21: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This code can be added to the <code>product.s</code> source file to provide a multiplication operation that uses 64-bit integers. The first thing this function does is save the contents of registers <b>t1</b> (line <a class="coderef" href="#coderef-save_t1" onmouseout="CodeHighlightOff(this, 'coderef-save_t1');" onmouseover="CodeHighlightOn(this, 'coderef-save_t1');">11</a>) and <b>t0</b> (line <a class="coderef" href="#coderef-save_t0" onmouseout="CodeHighlightOff(this, 'coderef-save_t0');" onmouseover="CodeHighlightOn(this, 'coderef-save_t0');">12</a>) which will be used by this function.</p> <blockquote><p><b>note</b> these are supposed to be caller saved registers, presumably the caller of the product function would have saved them. 
However, we are saving them here anyway.</p> </blockquote> <p>The values of the function arguments are then moved into the temporary registers (lines <a class="coderef" href="#coderef-mv_arg0" onmouseout="CodeHighlightOff(this, 'coderef-mv_arg0');" onmouseover="CodeHighlightOn(this, 'coderef-mv_arg0');">13</a> and <a class="coderef" href="#coderef-mv_arg1" onmouseout="CodeHighlightOff(this, 'coderef-mv_arg1');" onmouseover="CodeHighlightOn(this, 'coderef-mv_arg1');">14</a>). This is required because, unlike the first version of this function, the arguments need to be reused and the value of <b>a0</b> will be overwritten by the first multiplication on line <a class="coderef" href="#coderef-__imul64_low" onmouseout="CodeHighlightOff(this, 'coderef-__imul64_low');" onmouseover="CodeHighlightOn(this, 'coderef-__imul64_low');">15</a> which calculates the low 64 bits of the product. The second multiplication (line <a class="coderef" href="#coderef-__imul64_high" onmouseout="CodeHighlightOff(this, 'coderef-__imul64_high');" onmouseover="CodeHighlightOn(this, 'coderef-__imul64_high');">16</a>) will calculate the high 64 bits of the product and store the result in <b>a1</b>.</p> <p>The main program must be updated to handle a potential 128-bit result from the <b>__imul64</b> function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="coderef-off" id="coderef-call__imul64"><span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64</span> <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, result2 <span class="coderef-off" id="coderef-save_product_low"><span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1)</span> <span class="coderef-off" id="coderef-save_product_high"><span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 8(t1)</span> <span class="linenr">16: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">17: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">18: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">19: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">20: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">21: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">22: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="coderef-off" id="coderef-product128_result"><span class="linenr">23: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0</span> </pre></div> <p>The most significant change is that the result must be stored to memory using two instructions: one to store the low 64 bits of the product (line <a class="coderef" href="#coderef-save_product_low" onmouseout="CodeHighlightOff(this, 'coderef-save_product_low');" 
onmouseover="CodeHighlightOn(this, 'coderef-save_product_low');">14</a>), and one to store the high 64 bits of the product (line <a class="coderef" href="#coderef-save_product_high" onmouseout="CodeHighlightOff(this, 'coderef-save_product_high');" onmouseover="CodeHighlightOn(this, 'coderef-save_product_high');">15</a>). The <b>result2</b> variable on line <a class="coderef" href="#coderef-product128_result" onmouseout="CodeHighlightOff(this, 'coderef-product128_result');" onmouseover="CodeHighlightOn(this, 'coderef-product128_result');">23</a> must also be updated to reserve 128 bits for the product. The arguments of the <b>__imul64</b> function are the same as those of the <b>__imul32</b> function. Therefore the new function can be invoked by simply changing the call label on line <a class="coderef" href="#coderef-call__imul64" onmouseout="CodeHighlightOff(this, 'coderef-call__imul64');" onmouseover="CodeHighlightOn(this, 'coderef-call__imul64');">12</a>.</p> <p>After reassembling and linking the modified source files, the result can be inspected in the <b>qemu</b> console by printing out 2 double-word values at address 0x80001008:</p> <pre class="example">
$ qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel main.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) xp /2gd 0x80001008
0000000080001008: 45 0
(qemu) quit
</pre><p>Note that the <b>__imul64</b> function can also be used with 32-bit operands. The value of the high double-word will be zero in this case since no overflow occurred.</p> </div> </div> <div class="outline-2" id="outline-container-org9ca4bd4"> <h2 id="org9ca4bd4">Divide</h2> <div class="outline-text-2" id="text-org9ca4bd4"> <p>The <b>M</b> extension also provides instructions to calculate the quotient and remainder of the division of one integer by another. This is slightly less complicated than multiplication because the result cannot be wider than the operands. 
However, this also limits divisions to dividends and divisors with a maximum of 64 bits. Therefore this is not a true inverse of the multiplication, which can have a 128-bit result.</p> <p>The following listing illustrates the contents of the <code>divide.s</code> source file which defines the function to divide an unsigned 64-bit integer dividend (in <b>a0</b>) by an unsigned 64-bit integer divisor (in <b>a1</b>).</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __idiv64u <span class="linenr"> 4: </span><span style="color: #87cefa;">__idiv64u</span>: <span class="linenr"> 5: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr"> 6: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="coderef-off" id="coderef-__idiv64u_check_divzero"><span class="linenr"> 7: </span> <span style="color: #00ffff;">beqz</span> a1, __idiv64u_exit</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">divu</span> a0, a0, a1 <span class="linenr"> 9: </span><span style="color: #87cefa;">__idiv64u_exit</span>: <span class="linenr">10: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">11: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">12: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This function is fairly straightforward: after ensuring that the divisor is not zero, it simply executes the <b>divu</b> instruction to calculate the unsigned quotient. 
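</p> <p>As a rough model of what this listing does, the following C sketch mirrors its control flow. It is only an illustration and not part of the tutorial's sources: the name <code>idiv64u_model</code> is hypothetical, and the operands are assumed to be unsigned, as the <b>u</b> suffix of the assembly label suggests.</p>

```c
/* Hypothetical C model of the __idiv64u listing (illustration only). */
unsigned long long idiv64u_model(unsigned long long dividend,
                                 unsigned long long divisor)
{
    /* Mirrors the beqz on line 7: with a zero divisor the division is
       skipped and the first argument is returned unchanged. */
    if (divisor == 0) {
        return dividend;
    }
    /* Mirrors the division instruction on line 8. */
    return dividend / divisor;
}
```

<p>With this model, <code>idiv64u_model(45, 5)</code> yields 9, while <code>idiv64u_model(45, 0)</code> hands back 45 untouched, so a caller that needs to detect a division by zero has to test the divisor itself.</p> <p>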
The check on line <a class="coderef" href="#coderef-__idiv64u_check_divzero" onmouseout="CodeHighlightOff(this, 'coderef-__idiv64u_check_divzero');" onmouseover="CodeHighlightOn(this, 'coderef-__idiv64u_check_divzero');">7</a> that the divisor is not zero is necessary because RV64M does not trap on a divide by zero error. If the divisor is zero, the division instruction is skipped and <b>a0</b> is returned unchanged.</p> <p>Because the result of the <b>__imul64</b> function fits in 64 bits for these small operands, the <b>__idiv64u</b> function can be invoked on the result to verify its accuracy. The <code>main.s</code> program can be updated as follows to divide the result of <b>__imul64</b> by <b>operand2</b>, and save the result in a variable in the <b>.bss</b> section named <b>result3</b>.</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64 <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, result2 <span 
class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1) <span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 8(t1) <span class="coderef-off" id="coderef-check_overflow"><span class="linenr">16: </span> <span style="color: #00ffff;">bnez</span> a1, stop</span> <span class="coderef-off" id="coderef-load_dividend"><span class="linenr">17: </span> <span style="color: #00ffff;">lw</span> a1, operand2</span> <span class="coderef-off" id="coderef-call_divide"><span class="linenr">18: </span> <span style="color: #00ffff;">call</span> __idiv64u</span> <span class="coderef-off" id="coderef-load_result3_addr"><span class="linenr">19: </span> <span style="color: #00ffff;">la</span> t0, result3</span> <span class="coderef-off" id="coderef-save_quotient"><span class="linenr">20: </span> <span style="color: #00ffff;">sd</span> a0, 0(t0)</span> <span class="linenr">21: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">22: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">23: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">24: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">25: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">26: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">27: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="linenr">28: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0 <span class="coderef-off" id="coderef-division_result"><span class="linenr">29: </span><span style="color: #87cefa;">result3</span>: .dword 0</span> </pre></div> <p>After <b>__imul64</b> returns, the value is checked for overflow (line <a class="coderef" href="#coderef-check_overflow" onmouseout="CodeHighlightOff(this, 'coderef-check_overflow');" 
onmouseover="CodeHighlightOn(this, 'coderef-check_overflow');">16</a>) by checking that the value returned in <b>a1</b> is zero. This will ensure that the result of the multiplication fits in a single 64-bit register. If the result is more than 64 bits wide, the division will be skipped. Otherwise <b>operand2</b> is loaded into register <b>a1</b>. With the operand values used here the check is not strictly necessary; it only matters when larger operands could overflow 64 bits.</p> <p>The divide function will determine the quotient of the <b>__imul64</b> result by the value of <b>operand2</b>. The quotient will be stored in the <b>result3</b> variable. This should be the same as the result of the <b>sum</b> function (in <b>result1</b>). This can be verified by assembling and linking this program and running the binary in <b>qemu</b>. The value of <b>result3</b> can then be compared with <b>result1</b> from the console:</p> <pre class="example">
riscv64-unknown-elf-as -o add.o add.s
riscv64-unknown-elf-as -o divide.o divide.s
riscv64-unknown-elf-as -o main.o main.s
riscv64-unknown-elf-as -o product.o product.s
riscv64-unknown-elf-ld -T chapter3.lds -o main.elf add.o divide.o main.o product.o
qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel main.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu) xp /1wd 0x80001004
0000000080001004: 9
(qemu) xp /1gd 0x80001018
0000000080001018: 9
(qemu)
</pre><p>The address of the <b>result3</b> variable will be 0x80001018; it is 16 bytes beyond the <b>result2</b> variable which is located at 0x80001008 (therefore +0x10). This can be verified using <b>objdump</b> as in the previous example.</p> <p>As expected, <b>result3</b> contains the integer 9 which is the result of the <b>sum</b> function in variable <b>result1</b> at address 0x80001004.</p> <p>This value is convenient because 5 divides 45 exactly. If we divided the result of <b>__imul64</b> by <b>operand1</b> instead, the result would be 11 and there would be a remainder of 1. In the current implementation, the remainder is lost. 
However, the divide function can be updated to calculate both the quotient and the remainder of a division. The updated <b>__idiv64u</b> function is shown in the following listing.</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span><span style="color: #87cefa;">__idiv64u</span>: <span class="linenr"> 2: </span> # Input: <span class="linenr"> 3: </span> # a0: 64-bit dividend <span class="linenr"> 4: </span> # a1: 64-bit divisor <span class="linenr"> 5: </span> # Returns: <span class="linenr"> 6: </span> # a0 =&gt; 64-bit quotient <span class="linenr"> 7: </span> # a1 =&gt; 64-bit remainder <span class="linenr"> 8: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr"> 9: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="linenr">10: </span> <span style="color: #00ffff;">sd</span> t1, 16(sp) <span class="linenr">11: </span> <span style="color: #00ffff;">sd</span> t0, 8(sp) <span class="linenr">12: </span> <span style="color: #00ffff;">beqz</span> a1, __idiv64u_exit <span class="linenr">13: </span> <span style="color: #00ffff;">mv</span> t0, a0 <span class="linenr">14: </span> <span style="color: #00ffff;">mv</span> t1, a1 <span class="coderef-off" id="coderef-calculate_quotient"><span class="linenr">15: </span> <span style="color: #00ffff;">divu</span> a0, t0, t1</span> <span class="coderef-off" id="coderef-calculate_remainder"><span class="linenr">16: </span> <span style="color: #00ffff;">remu</span> a1, t0, t1</span> <span class="linenr">17: </span><span style="color: #87cefa;">__idiv64u_exit</span>: <span class="linenr">18: </span> <span style="color: #00ffff;">ld</span> t0, 8(sp) <span class="linenr">19: </span> <span style="color: #00ffff;">ld</span> t1, 16(sp) <span class="linenr">20: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">21: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">22: </span> <span 
style="color: #00ffff;">ret</span> </pre></div> <p>This new implementation copies the argument values into temporary registers because the calculation takes two steps and the first argument would be overwritten by the first step. The <b>__idiv64u</b> function then calculates the quotient on line <a class="coderef" href="#coderef-calculate_quotient" onmouseout="CodeHighlightOff(this, 'coderef-calculate_quotient');" onmouseover="CodeHighlightOn(this, 'coderef-calculate_quotient');">15</a>, and the remainder on line <a class="coderef" href="#coderef-calculate_remainder" onmouseout="CodeHighlightOff(this, 'coderef-calculate_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-calculate_remainder');">16</a>. The <code>main.s</code> program must also be updated to save the result of the new <b>__idiv64u</b> function in two double words.</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64 <span class="linenr">13: </span> <span 
style="color: #00ffff;">la</span> t1, result2 <span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 0(t1) <span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 8(t1) <span class="linenr">16: </span> <span style="color: #00ffff;">bnez</span> a1, stop <span class="coderef-off" id="coderef-load_operand1_dividend"><span class="linenr">17: </span> <span style="color: #00ffff;">lw</span> a1, operand1</span> <span class="linenr">18: </span> <span style="color: #00ffff;">beqz</span> a1, stop <span class="linenr">19: </span> <span style="color: #00ffff;">call</span> __idiv64u <span class="linenr">20: </span> <span style="color: #00ffff;">la</span> t0, result3 <span class="linenr">21: </span> <span style="color: #00ffff;">sd</span> a0, 0(t0) <span class="coderef-off" id="coderef-save_remainder"><span class="linenr">22: </span> <span style="color: #00ffff;">sd</span> a1, 8(t0)</span> <span class="linenr">23: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">24: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">25: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">26: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">27: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">28: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">29: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="linenr">30: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0 <span class="coderef-off" id="coderef-result3_2dwords"><span class="linenr">31: </span><span style="color: #87cefa;">result3</span>: .dword 0, 0</span> </pre></div> <p>The only changes are that <b>operand1</b> is used as the divisor on line <a class="coderef" href="#coderef-load_operand1_dividend" 
onmouseout="CodeHighlightOff(this, 'coderef-load_operand1_dividend');" onmouseover="CodeHighlightOn(this, 'coderef-load_operand1_dividend');">17</a> and an instruction was added on line <a class="coderef" href="#coderef-save_remainder" onmouseout="CodeHighlightOff(this, 'coderef-save_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-save_remainder');">22</a> to store the remainder in RAM. The <b>result3</b> variable was also updated to allocate two double-words of memory on line <a class="coderef" href="#coderef-result3_2dwords" onmouseout="CodeHighlightOff(this, 'coderef-result3_2dwords');" onmouseover="CodeHighlightOn(this, 'coderef-result3_2dwords');">31</a>. If this program is assembled and linked, then executed in <b>qemu</b> (as in the previous example), the contents of <b>result3</b> can be inspected to see that both the quotient and remainder have been calculated:</p> <pre class="example">
(qemu) xp /2gd 0x80001018
0000000080001018: 11 1
</pre><p>This provides a more flexible implementation of <b>__idiv64u</b>, but if a true inverse of the <b>__imul64</b> function is desired, the function must allow for a 128-bit dividend argument. The RV64M extension does not define an instruction to calculate this, therefore the calculation must be performed in parts.</p> <p>If the 128-bit dividend is broken up into four words, the division can be carried out on each word in turn, from the most significant to the least significant, carrying the remainder forward and combining the partial results. 
This is possible because a value \(x\) can be written in terms of its high word \(w_h\) and its low word \(w_l\):</p> <p>\(x = 2^{32}w_h + w_l\)</p> <p>The integer quotient of \(x\) by some integer \(d\) can be calculated as:</p> <p>\(\lfloor x/d \rfloor = 2^{32}\lfloor w_h/d \rfloor + \lfloor (2^{32}(w_h \bmod d) + w_l)/d \rfloor\)</p> <p>This calculation can be implemented with the following RISC-V assembly code:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.global</span> __idiv128u <span class="linenr"> 2: </span><span style="color: #87cefa;">__idiv128u</span>: <span class="linenr"> 3: </span> # Input: <span class="linenr"> 4: </span> # a0: Address where the 128-bit quotient will be stored (high <span class="linenr"> 5: </span> # dword, low dword). <span class="linenr"> 6: </span> # a1: Divisor <span class="linenr"> 7: </span> # a2: Address of the 128-bit dividend (high dword, low dword) <span class="linenr"> 8: </span> # Returns: <span class="linenr"> 9: </span> # a0: 64-bit remainder <span class="linenr">10: </span> # (the quotient is written to the address passed in a0) <span class="linenr">11: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr">12: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="linenr">13: </span> # Check for divide by zero <span class="linenr">14: </span> <span style="color: #00ffff;">beqz</span> a1, __idiv128u_exit <span class="linenr">15: </span> <span style="color: #00ffff;">addi</span> t2, a2, 16 <span class="linenr">16: </span> <span style="color: #00ffff;">li</span> t3, 0 # t3 = remainder <span class="linenr">17: </span><span style="color: #87cefa;">__idiv128u_next_dword</span>: <span class="linenr">18: </span> <span style="color: #00ffff;">lwu</span> t1, (a2) # t1 = low word <span class="linenr">19: </span> <span style="color: #00ffff;">ld</span> t0, (a2) <span class="linenr">20: </span> <span style="color: #00ffff;">srli</span> t0, t0, 32 # t0 = high word <span class="linenr">21: </span><span style="color: 
#87cefa;">__idiv128u_high_word</span>: <span class="linenr">22: </span> <span style="color: #00ffff;">slli</span> t3, t3, 32 <span class="linenr">23: </span> <span style="color: #00ffff;">add</span> t0, t0, t3 <span class="linenr">24: </span> <span style="color: #00ffff;">divu</span> t4, t0, a1 # t4 = t0/a1 <span class="linenr">25: </span> <span style="color: #00ffff;">slli</span> t5, t4, 32 # t5 = t4 * 2^32 <span class="linenr">26: </span> <span style="color: #00ffff;">remu</span> t3, t0, a1 # t3 = t0 mod a1 <span class="linenr">27: </span><span style="color: #87cefa;">__idiv128u_low_word</span>: <span class="coderef-off" id="coderef-__idiv128u_scale_remainder"><span class="linenr">28: </span> <span style="color: #00ffff;">slli</span> t3, t3, 32 # t3 = t3 * 2^32</span> <span class="coderef-off" id="coderef-__idiv128u_add_remainder"><span class="linenr">29: </span> <span style="color: #00ffff;">add</span> t0, t1, t3</span> <span class="linenr">30: </span> <span style="color: #00ffff;">divu</span> t4, t0, a1 <span class="linenr">31: </span> <span style="color: #00ffff;">add</span> t5, t5, t4 <span class="linenr">32: </span> <span style="color: #00ffff;">remu</span> t3, t0, a1 <span class="linenr">33: </span> <span style="color: #00ffff;">sd</span> t5, (a0) <span class="linenr">34: </span> <span style="color: #00ffff;">addi</span> a2, a2, 8 <span class="linenr">35: </span> <span style="color: #00ffff;">addi</span> a0, a0, 8 <span class="linenr">36: </span> <span style="color: #00ffff;">bne</span> t2, a2, __idiv128u_next_dword <span class="linenr">37: </span> <span style="color: #00ffff;">mv</span> a0, t3 <span class="linenr">38: </span><span style="color: #87cefa;">__idiv128u_exit</span>: <span class="linenr">39: </span> <span style="color: #00ffff;">ld</span> ra, 24(sp) <span class="linenr">40: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">41: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This function 
iteratively performs a 64-bit division on each 32-bit word of the dividend, starting with the most significant word. The remainder is scaled by \(2^{32}\) (line <a class="coderef" href="#coderef-__idiv128u_scale_remainder" onmouseout="CodeHighlightOff(this, 'coderef-__idiv128u_scale_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-__idiv128u_scale_remainder');">28</a>), then added to the next word of the dividend (line <a class="coderef" href="#coderef-__idiv128u_add_remainder" onmouseout="CodeHighlightOff(this, 'coderef-__idiv128u_add_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-__idiv128u_add_remainder');">29</a>), and the process is repeated for the next 64-bit double word. Because the scaled remainder must fit in a 64-bit register, this approach is only guaranteed to be correct when the value in <b>a1</b> fits in 32 bits.</p> <p>The following listing illustrates an updated <code>main.s</code> file:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand1 <span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand2 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result1 <span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1) <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> __imul64 <span class="linenr">13: </span> <span style="color: #00ffff;">la</span> t1, dividend <span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, 8(t1) 
<span class="linenr">15: </span> <span style="color: #00ffff;">sd</span> a1, 0(t1) <span class="linenr">16: </span> <span style="color: #00ffff;">la</span> a0, quotient <span class="linenr">17: </span> <span style="color: #00ffff;">lw</span> a1, operand1 <span class="linenr">18: </span> <span style="color: #00ffff;">la</span> a2, dividend <span class="linenr">19: </span> <span style="color: #00ffff;">call</span> __idiv128u <span class="coderef-off" id="coderef-save__idiv128u_remainder"><span class="linenr">20: </span> <span style="color: #00ffff;">la</span> t0, remainder</span> <span class="linenr">21: </span> <span style="color: #00ffff;">sd</span> a0, (t0) <span class="linenr">22: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">23: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">24: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">25: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">26: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">27: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">28: </span><span style="color: #87cefa;">result1</span>: .word 0 <span class="linenr">29: </span><span style="color: #87cefa;">result2</span>: .dword 0, 0 <span class="linenr">30: </span><span style="color: #87cefa;">result3</span>: .dword 0, 0 <span class="linenr">31: </span><span style="color: #87cefa;">dividend</span>: .dword 0, 0 <span class="linenr">32: </span><span style="color: #87cefa;">quotient</span>: .dword 0, 0 <span class="linenr">33: </span><span style="color: #87cefa;">remainder</span>: .dword 0 </pre></div> <p>This updated main program does not perform an overflow check since the <b>__idiv128u</b> function can handle a 128-bit dividend. 
This function reads the dividend from memory (at label <b>dividend</b>) rather than from registers because it may not fit in a single register. The memory at label <b>quotient</b> will be updated with the result of the division. The remainder is returned by the function in <b>a0</b> and is then saved to the memory at label <b>remainder</b> by the instructions starting on line <a class="coderef" href="#coderef-save__idiv128u_remainder" onmouseout="CodeHighlightOff(this, 'coderef-save__idiv128u_remainder');" onmouseover="CodeHighlightOn(this, 'coderef-save__idiv128u_remainder');">20</a>.</p> </div> </div> <div class="outline-2" id="outline-container-org2adc904"> <h2 id="org2adc904">Atomic Instructions</h2> <div class="outline-text-2" id="text-org2adc904"> <p>Synchronization is an important feature in multiprocessing systems. Thus far, the examples have used a single hardware thread, or hart, so there has been no need to synchronize memory access. RISC-V defines the <b>A</b> extension which provides instructions to atomically read-modify-write data in memory. These instructions can be used to support synchronization between multiple hardware threads running in the same memory space.</p> <p>The most basic synchronization primitive is the atomic compare-and-swap operation. This will compare a value in a register with a value in memory. If the two values are equal, the value in another register will be swapped with the value in memory. The pseudo code for this is as follows:</p> <ol class="org-ol"><li>Load the comparison value into register <b>R1</b></li> <li>Load the address of the second value into <b>R2</b></li> <li>Load the value at address <b>R2</b> into a temporary register <b>T1</b></li> <li>Load the swap value into register <b>R3</b></li> <li>If <b>R1</b> == <b>T1</b>:<br /><ol class="org-ol"><li>Store <b>R3</b> at memory location <b>R2</b></li> <li><b>R3</b> := <b>T1</b></li> </ol></li> </ol><p>This entire sequence is expected to be performed atomically (i.e. 
there can be no interruption between the time the value <b>T1</b> is read from memory and the end of the procedure). This can be implemented using the Load Reserved/Store Conditional instructions provided by the RVA extension. The following listing illustrates the implementation of a compare-and-swap function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> compare_and_swap <span class="linenr"> 4: </span> # a0: Address of value operand <span class="linenr"> 5: </span> # a1: Value to compare <span class="linenr"> 6: </span> # a2: Value to swap if (a0) == a1 <span class="linenr"> 7: </span> # return: a0 == 0 =&gt; CAS successful <span class="linenr"> 8: </span> # return: a0 == 1 =&gt; CAS failed <span class="linenr"> 9: </span><span style="color: #87cefa;">compare_and_swap</span>: <span class="coderef-off" id="coderef-load-reserved"><span class="linenr">10: </span> <span style="color: #00ffff;">lr.d</span> t0, (a0)</span> <span class="linenr">11: </span> <span style="color: #00ffff;">bne</span> t0, a1, nomatch <span class="coderef-off" id="coderef-match-store-conditional"><span class="linenr">12: </span> <span style="color: #00ffff;">sc.d</span> t1, a2, (a0) # t1 = 0 on success</span> <span class="coderef-off" id="coderef-cas-failed"><span class="linenr">13: </span> <span style="color: #00ffff;">bnez</span> t1, compare_and_swap # retry; a0 still holds the address</span> <span class="linenr">14: </span> <span style="color: #00ffff;">li</span> a0, 0 <span class="linenr">15: </span> <span style="color: #00ffff;">j</span> exit <span class="linenr">16: </span><span style="color: #87cefa;">nomatch</span>: <span style="color: #00ffff;">li</span> a0, 1 <span class="linenr">17: </span><span style="color: #87cefa;">exit</span>: <span class="linenr">18: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>This function will 
atomically compare the value in memory located at the address in <b>a0</b> with the value in register <b>a1</b>, and store the value of <b>a2</b> at the location in <b>a0</b> if they match.</p> <p>The load-reserved instruction on line <a class="coderef" href="#coderef-load-reserved" onmouseout="CodeHighlightOff(this, 'coderef-load-reserved');" onmouseover="CodeHighlightOn(this, 'coderef-load-reserved');">10</a> loads the value at memory location <b>a0</b> into register <b>t0</b>, and registers a reservation on the address in memory. The nature of the memory reservation is specific to the implementation of the RISC-V core and is transparent to the program. The memory range that is reserved can be arbitrarily sized; however, it must be at least large enough to enclose the value that was loaded.</p> <p>The value of <b>t0</b> is then compared with <b>a1</b>. If the values match, the store-conditional instruction on line <a class="coderef" href="#coderef-match-store-conditional" onmouseout="CodeHighlightOff(this, 'coderef-match-store-conditional');" onmouseover="CodeHighlightOn(this, 'coderef-match-store-conditional');">12</a> will save the value in <b>a2</b> to the memory location of <b>a0</b>. This will also release the reservation on the memory address. If the values do not match, the memory is not updated (this instruction is skipped).</p> <p>If another hardware thread writes data to the memory for which there is a reservation, then the store-conditional instruction will fail and a non-zero error code will be written to the destination register, which is <b>t1</b> in this function (line <a class="coderef" href="#coderef-match-store-conditional" onmouseout="CodeHighlightOff(this, 'coderef-match-store-conditional');" onmouseover="CodeHighlightOn(this, 'coderef-match-store-conditional');">12</a>). The status is written to a temporary register rather than to <b>a0</b> so that the address is still available if the operation has to be retried. 
In this case, the compare-and-swap operation is restarted (line <a class="coderef" href="#coderef-cas-failed" onmouseout="CodeHighlightOff(this, 'coderef-cas-failed');" onmouseover="CodeHighlightOn(this, 'coderef-cas-failed');">13</a>).</p> <p>The main program shown in the following listing will invoke the compare-and-swap function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-setup-cas-arguments"><span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> a0, n</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">li</span> a1, 5 <span class="linenr"> 9: </span> <span style="color: #00ffff;">li</span> a2, 6 <span class="coderef-off" id="coderef-successful-cas"><span class="linenr">10: </span> <span style="color: #00ffff;">call</span> compare_and_swap</span> <span class="linenr">11: </span> <span style="color: #00ffff;">la</span> a0, n <span class="linenr">12: </span> <span style="color: #00ffff;">li</span> a1, 5 <span class="linenr">13: </span> <span style="color: #00ffff;">li</span> a2, 7 <span class="coderef-off" id="coderef-unsuccessful-cas"><span class="linenr">14: </span> <span style="color: #00ffff;">call</span> compare_and_swap</span> <span class="linenr">15: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="coderef-off" id="coderef-load-alignment"><span class="linenr">16: </span> 
<span style="color: #00ffff;">.balign</span> 8</span> <span class="linenr">17: </span><span style="color: #87cefa;">n</span>: .dword 5 </pre></div> <p>Starting at line <a class="coderef" href="#coderef-setup-cas-arguments" onmouseout="CodeHighlightOff(this, 'coderef-setup-cas-arguments');" onmouseover="CodeHighlightOn(this, 'coderef-setup-cas-arguments');">7</a>, the function arguments are set up by first loading the address of the variable <b>n</b> into <b>a0</b>. Note that the data loaded by the <b>lr.d</b> instruction must be aligned on an 8-byte boundary (similarly, the <b>lr.w</b> instruction expects the data to be aligned on a 4-byte boundary). The <b>.balign</b> (byte align) assembler directive on line <a class="coderef" href="#coderef-load-alignment" onmouseout="CodeHighlightOff(this, 'coderef-load-alignment');" onmouseover="CodeHighlightOn(this, 'coderef-load-alignment');">16</a> ensures that this is the case.</p> <p>The first invocation of the function on line <a class="coderef" href="#coderef-successful-cas" onmouseout="CodeHighlightOff(this, 'coderef-successful-cas');" onmouseover="CodeHighlightOn(this, 'coderef-successful-cas');">10</a> will succeed, thus the value of <b>n</b> will be updated to 6. The second invocation will fail because <b>n</b> no longer contains 5, thus the value of <b>n</b> will not be changed. This can be verified by assembling the program and inspecting the memory from the <b>qemu</b> monitor:</p> <pre class="example"> riscv64-unknown-elf-as -o chapter4_cas_main.o chapter4_cas_main.s riscv64-unknown-elf-as -o cas.o cas.s riscv64-unknown-elf-ld -T chapter3.lds -o chapter4-cas.elf chapter4_cas_main.o cas.o qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel chapter4-cas.elf QEMU 3.1.0 monitor - type 'help' for more information (qemu) xp /1gd 0x80001008 0000000080001008: 6 (qemu) </pre><p>In addition to the load-reserved/store-conditional instructions, the RVA extension also provides atomic memory operations. 
These atomically perform an operation on a value in memory, and return the previous content of the memory location in the destination register. The supported operations include: <b>add</b>, <b>and</b>, <b>or</b>, <b>xor</b>, <b>max</b>, <b>min</b>, and <b>swap</b>. Moreover, the <b>min</b> and <b>max</b> instructions have signed and unsigned variants. These instructions are convenient for defining another useful synchronization primitive: the test-and-set spinlock.</p> <p>Spinlocks can be acquired by setting a sentinel value in a specific memory location, but only if that value is not already set therein. If the target memory location already contains the sentinel value, the acquiring thread will loop until the lock is released. The lock is released by clearing the memory location (i.e. setting it to zero). The implementation of a spinlock acquire/release pair is illustrated in the listing that follows:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> spinlock_acquire <span class="linenr"> 4: </span><span style="color: #87cefa;">spinlock_acquire</span>: <span class="linenr"> 5: </span> # a0 = memory address of the spinlock <span class="coderef-off" id="coderef-load_sentinel"><span class="linenr"> 6: </span> <span style="color: #00ffff;">li</span> t1, 1 # t1 = sentinel value</span> <span class="coderef-off" id="coderef-set_sentinel"><span class="linenr"> 7: </span> <span style="color: #00ffff;">amoswap.d.aq</span> t0, t1, (a0) # t0 = previous lock value</span> <span class="coderef-off" id="coderef-test_if_locked"><span class="linenr"> 8: </span> <span style="color: #00ffff;">bnez</span> t0, spinlock_acquire # retry if the lock was already held</span> <span class="linenr"> 9: </span> <span style="color: #00ffff;">ret</span> <span class="linenr">10: </span> <span class="linenr">11: </span> <span style="color: 
#00ffff;">.global</span> spinlock_release <span class="linenr">12: </span><span style="color: #87cefa;">spinlock_release</span>: <span class="linenr">13: </span> # a0 = memory address of the spinlock <span class="coderef-off" id="coderef-unset_sentinel"><span class="linenr">14: </span> <span style="color: #00ffff;">amoswap.d.rl</span> zero, zero, (a0) #</span> <span class="linenr">15: </span> <span style="color: #00ffff;">ret</span> <span class="linenr">16: </span> </pre></div> <p>This listing defines two sub-routines: one to acquire a spinlock, and one to release it. The <b>spinlock_acquire</b> function loads the value 1 to use as the sentinel on line <a class="coderef" href="#coderef-load_sentinel" onmouseout="CodeHighlightOff(this, 'coderef-load_sentinel');" onmouseover="CodeHighlightOn(this, 'coderef-load_sentinel');">6</a>. Then the atomic memory operation <b>amoswap</b> is used on line <a class="coderef" href="#coderef-set_sentinel" onmouseout="CodeHighlightOff(this, 'coderef-set_sentinel');" onmouseover="CodeHighlightOn(this, 'coderef-set_sentinel');">7</a> to swap the value of the sentinel with the contents of the memory location specified in <b>a0</b>. The value previously contained in the lock location will be saved in register <b>t0</b>. If this value is not zero, the lock was already acquired by another thread, therefore the function will try again (line <a class="coderef" href="#coderef-test_if_locked" onmouseout="CodeHighlightOff(this, 'coderef-test_if_locked');" onmouseover="CodeHighlightOn(this, 'coderef-test_if_locked');">8</a>); otherwise the function returns.</p> <p>The <b>spinlock_release</b> function will simply write zero into the memory location specified in <b>a0</b>. This will allow another thread that is spinning on the lock to acquire it.</p> <p>The <b>amoswap</b> instruction has two variants: one for double-words (<b>amoswap.d</b>) and one for word values (<b>amoswap.w</b>). 
Moreover, the <b>.aq</b> (acquire) and <b>.rl</b> (release) suffixes set flags which define the release consistency semantics of the memory operation. When the <b>.aq</b> suffix is set on the operation, the effects of memory operations that occur after this one in the current hardware thread will not be observed by another thread before the effect of the current instruction. Conversely, when the <b>.rl</b> suffix is specified, the effects of memory operations preceding the current instruction will not be observed by other threads after its own effect. This is why the lock is acquired with <b>.aq</b> and released with <b>.rl</b>: memory accesses made inside the critical section cannot appear to move outside of it.</p> <p>The following program illustrates the use of the spinlock functions to define a critical section:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">la</span> sp, _stack_end <span class="coderef-off" id="coderef-load_lockaddr"><span class="linenr"> 7: </span> <span style="color: #00ffff;">la</span> a0, lock #</span> <span class="coderef-off" id="coderef-spinlock_acquire"><span class="linenr"> 8: </span> <span style="color: #00ffff;">call</span> spinlock_acquire #</span> <span class="coderef-off" id="coderef-critical-section-start"><span class="linenr"> 9: </span> <span style="color: #00ffff;">la</span> t0, n #</span> <span class="linenr">10: </span> <span style="color: #00ffff;">ld</span> a0, (t0) <span class="linenr">11: </span> <span style="color: #00ffff;">li</span> a1, 1 <span class="linenr">12: </span> <span style="color: #00ffff;">call</span> sum <span 
class="linenr">13: </span> <span style="color: #00ffff;">la</span> t0, n <span class="coderef-off" id="coderef-critical-section-end"><span class="linenr">14: </span> <span style="color: #00ffff;">sd</span> a0, (t0) #</span> <span class="linenr">15: </span> <span style="color: #00ffff;">la</span> a0, lock <span class="coderef-off" id="coderef-spinlock_release"><span class="linenr">16: </span> <span style="color: #00ffff;">call</span> spinlock_release #</span> <span class="linenr">17: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">18: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">19: </span> <span style="color: #00ffff;">.balign</span> 8 <span class="linenr">20: </span><span style="color: #87cefa;">lock</span>: .dword 0 <span class="linenr">21: </span><span style="color: #87cefa;">n</span>: .dword 0 <span class="linenr">22: </span> </pre></div> <p>This program will attempt to acquire the spinlock on line <a class="coderef" href="#coderef-spinlock_acquire" onmouseout="CodeHighlightOff(this, 'coderef-spinlock_acquire');" onmouseover="CodeHighlightOn(this, 'coderef-spinlock_acquire');">8</a> (the address of the lock variable is loaded on line <a class="coderef" href="#coderef-load_lockaddr" onmouseout="CodeHighlightOff(this, 'coderef-load_lockaddr');" onmouseover="CodeHighlightOn(this, 'coderef-load_lockaddr');">7</a>). This function call will block until the lock is acquired. Since there is only a single hardware thread, the lock should be acquired immediately. The critical section starts on line <a class="coderef" href="#coderef-critical-section-start" onmouseout="CodeHighlightOff(this, 'coderef-critical-section-start');" onmouseover="CodeHighlightOn(this, 'coderef-critical-section-start');">9</a>. The variable <b>n</b> is loaded and incremented by calling the <b>sum</b> function (defined in a previous chapter). 
The critical section ends on line <a class="coderef" href="#coderef-critical-section-end" onmouseout="CodeHighlightOff(this, 'coderef-critical-section-end');" onmouseover="CodeHighlightOn(this, 'coderef-critical-section-end');">14</a>, at which point the program releases the spinlock (line <a class="coderef" href="#coderef-spinlock_release" onmouseout="CodeHighlightOff(this, 'coderef-spinlock_release');" onmouseover="CodeHighlightOn(this, 'coderef-spinlock_release');">16</a>). Following the execution of this program, the contents of the variable <b>n</b> should be 1:</p> <pre class="example"> riscv64-unknown-elf-as -o chapter4_spinlock_main.o chapter4_spinlock_main.s riscv64-unknown-elf-as -o spinlock.o spinlock.s riscv64-unknown-elf-as -o add.o add.s riscv64-unknown-elf-ld -T chapter3.lds -o chapter4-spinlock.elf chapter4_spinlock_main.o spinlock.o add.o qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel chapter4-spinlock.elf QEMU 3.1.0 monitor - type 'help' for more information (qemu) xp /1gd 0x80001008 0000000080001008: 1 </pre></div> </div> <div class="outline-2" id="outline-container-org86a22e8"> <h2 id="org86a22e8">Floating Point</h2> <div class="outline-text-2" id="text-org86a22e8"> <p>In <a href="https://www.vociferousvoid.org/main/risv_bare_metal_chapter2">chapter 2</a>, the registers of the base <b>I</b> (integer) instruction set were enumerated. However, when inspecting the VirtIO machine in <b>qemu</b>, using the <code>info registers</code> command, certain registers were listed that are not described in the table. These registers exist to support the <b>F</b> or <b>D</b> extensions which provide floating point arithmetic instructions that work with operands which conform to the IEEE 754-2008 standard. 
The <b>F</b> extension provides support for single-precision values and operands, and the <b>D</b> extension provides the same instructions for double-precision values.</p> <p>The 32 additional registers, <b>f0</b>-<b>f31</b>, are used exclusively by the instructions provided by the RVF and RVD extensions. This doubles the number of registers available to the processor without increasing the space required for the register specifier in the instruction op-code since only enough bits to enumerate 32 registers are required (5 bits).</p> <p>If only the RVF extension is supported, the <b>f</b> registers will be 32 bits wide. If the RVD extension is supported, the <b>f</b> registers will be 64 bits wide. If both RVF and RVD are supported, the RVF instructions will use only the lower 32 bits of the 64-bit registers.</p> <p>The <b>f</b> registers are enumerated in the following table with their ABI name and a description:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-left" scope="col">Register(s)</th> <th class="org-left" scope="col">ABI Name(s)</th> <th class="org-left" scope="col">Description</th> </tr></thead><tbody><tr><td class="org-left">f0-f7</td> <td class="org-left">ft0-ft7</td> <td class="org-left">Temporary</td> </tr><tr><td class="org-left">f8-f9</td> <td class="org-left">fs0-fs1</td> <td class="org-left">Saved register</td> </tr><tr><td class="org-left">f10-f11</td> <td class="org-left">fa0-fa1</td> <td class="org-left">Function argument/Return value</td> </tr><tr><td class="org-left">f12-f17</td> <td class="org-left">fa2-fa7</td> <td class="org-left">Function argument</td> </tr><tr><td class="org-left">f18-f27</td> <td class="org-left">fs2-fs11</td> <td class="org-left">Saved register</td> </tr><tr><td class="org-left">f28-f31</td> <td class="org-left">ft8-ft11</td> <td class="org-left">Temporary</td> 
</tr></tbody></table><p>These registers roughly mirror the base integer registers with two notable exceptions. First, unlike <b>x0</b>, <b>f0</b> is not hardwired to 0; it can be used just like every other register. Second, there are no registers to manage return addresses, stacks, globals, or threads. The equivalent <b>f</b> registers are used as temporaries.</p> <p>The convention for who is responsible for saving the contents of the registers is essentially the same as for the equivalent base integer registers: saved registers must be preserved by the callee, while temporary and argument registers must be saved by the caller if their contents are needed after the call.</p> <p>In addition to the 32 <b>f</b> registers, the RVF and RVD extensions define a status and control register: <b>fcsr</b>. The RVF and RVD extensions provide the <b>frcsr</b> instruction to read this register, storing its value into the targeted integer register. Similarly, the <b>fscsr</b> instruction will copy the original value of <b>fcsr</b> into the destination integer register, and then write the value in the source integer register thereto.</p> <p>The <b>fcsr</b> prescribes the rounding mode used by floating point operations. The rounding mode field occupies bits 5-7 of the register. The RVF and RVD extensions also define the <b>frrm</b> instruction to retrieve the rounding mode.</p> <p>The <b>fcsr</b> register also contains flags to indicate exception conditions that may have occurred while executing floating-point arithmetic since it was last reset. 
These flags include:</p> <dl class="org-dl"><dt>NV</dt> <dd>Invalid operation (<b>fcsr</b>[4])</dd> <dt>DZ</dt> <dd>Divide by zero (<b>fcsr</b>[3])</dd> <dt>OF</dt> <dd>Overflow (<b>fcsr</b>[2])</dd> <dt>UF</dt> <dd>Underflow (<b>fcsr</b>[1])</dd> <dt>NX</dt> <dd>Inexact (<b>fcsr</b>[0])</dd> </dl><p>The floating-point exception flags can also be retrieved using the <b>frflags</b> instruction which saves their state in the specified integer register.</p> <p>The RVF and RVD extensions define two load instructions (<b>flw</b> and <b>fld</b>) and two store instructions (<b>fsw</b> and <b>fsd</b>). These are essentially mirrors of the base load and store instructions that use the <b>f</b> registers rather than the <b>x</b> integer registers. Therefore their addressing mode and format are the same as the <b>lw</b>, <b>ld</b>, <b>sw</b> and <b>sd</b> instructions.</p> <p>The RVF and RVD extensions also provide a set of arithmetic instructions including:</p> <ul class="org-ul"><li>fadd</li> <li>fsub</li> <li>fmul</li> <li>fdiv</li> <li>fsqrt</li> </ul><p>Each instruction has a single- and double-precision variant which can be specified by adding a <b>.s</b> or <b>.d</b> suffix to the instruction respectively.</p> <p>The floating-point arithmetic instructions will operate using only the <b>f</b> registers, therefore the extensions provide instructions to move data from integer to floating-point registers.</p> <p>The following function implementation will demonstrate some of these instructions. 
The function in <code>fvector.s</code> will multiply each element from an array of floating-point values by a floating-point scalar:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.text</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> __vec_scalef <span class="linenr"> 4: </span> # a0: number of elements, 'n', in the array <span class="linenr"> 5: </span> # fa0: A double-precision floating-point scalar 'a' <span class="linenr"> 6: </span> # a1: Address of array of x[n] double-precision floating-point values. <span class="linenr"> 7: </span><span style="color: #87cefa;">__vec_scalef</span>: <span class="linenr"> 8: </span> <span style="color: #00ffff;">addi</span> sp, sp, -32 <span class="linenr"> 9: </span> <span style="color: #00ffff;">sd</span> ra, 24(sp) <span class="linenr">10: </span> <span style="color: #00ffff;">beqz</span> a0, __vec_scalef_exit <span class="linenr">11: </span><span style="color: #87cefa;">__vec_scalef_loop</span>: <span class="coderef-off" id="coderef-load_double"><span class="linenr">12: </span> <span style="color: #00ffff;">fld</span> fa5,0(a1)</span> <span class="coderef-off" id="coderef-mul_double"><span class="linenr">13: </span> <span style="color: #00ffff;">fmul.d</span> fa5, fa5, fa0</span> <span class="coderef-off" id="coderef-store_double"><span class="linenr">14: </span> <span style="color: #00ffff;">fsd</span> fa5,0(a1)</span> <span class="linenr">15: </span> <span style="color: #00ffff;">addi</span> a1, a1,8 <span class="linenr">16: </span> <span style="color: #00ffff;">addi</span> a0, a0,-1 <span class="linenr">17: </span> <span style="color: #00ffff;">bnez</span> a0, __vec_scalef_loop <span class="linenr">18: </span><span style="color: #87cefa;">__vec_scalef_exit</span>: <span class="linenr">19: </span> <span style="color: 
#00ffff;">ld</span> ra, 24(sp) <span class="linenr">20: </span> <span style="color: #00ffff;">addi</span> sp, sp, 32 <span class="linenr">21: </span> <span style="color: #00ffff;">ret</span> </pre></div> <p>In this function each value of the double array is loaded on line <a class="coderef" href="#coderef-load_double" onmouseout="CodeHighlightOff(this, 'coderef-load_double');" onmouseover="CodeHighlightOn(this, 'coderef-load_double');">12</a> at each iteration (up to a maximum set by the integer value in <b>a0</b>). The loaded value is multiplied by the double-precision floating-point value in <b>fa0</b> on line <a class="coderef" href="#coderef-mul_double" onmouseout="CodeHighlightOff(this, 'coderef-mul_double');" onmouseover="CodeHighlightOn(this, 'coderef-mul_double');">13</a>, then stored to the same memory location on line <a class="coderef" href="#coderef-store_double" onmouseout="CodeHighlightOff(this, 'coderef-store_double');" onmouseover="CodeHighlightOn(this, 'coderef-store_double');">14</a>.</p> <p>The source data for the function can be defined using the <b>.double</b> assembler directive. This directive will store double-precision floating-point values in successive memory double-words. The <b>.float</b> directive will store single-precision floating-point values in successive memory words.</p> <p>There are many more instructions defined in the RVF and RVD extensions, enough to dedicate an entire chapter to this topic. Moreover, the <b>qemu</b> support for the RVF and RVD extensions does not seem to be fully implemented in the version available in the Debian 10 packages. A more thorough investigation of these extensions will be reserved for a future chapter.</p> </div> </div> <div class="outline-2" id="outline-container-orgfdc44a9"> <h2 id="orgfdc44a9">Conclusion</h2> <div class="outline-text-2" id="text-orgfdc44a9"> <p>The RISC-V architecture is designed to be as simple as possible but no simpler. 
Therefore a building block philosophy is followed to allow chip designers to include as many or as few instructions as needed. This provides some flexibility to system designers to satisfy cost, efficiency, and performance constraints specific to the application domain.</p> <p>Breaking out instructions into optional extensions is like having Lego bricks representing sub-sets of the total RISC-V ISA. In this chapter the <b>M</b>, <b>A</b>, <b>F</b>, and <b>D</b> extensions were used to create a small library of functions that can be re-used in the future to perform more complex calculations, and to synchronize memory access across hardware threads.</p> <p>In addition to these there are two other optional standard extensions that were not covered in this chapter:</p> <dl class="org-dl"><dt>C</dt> <dd>Compressed instructions.</dd> <dt>V</dt> <dd>Vector instructions for SIMD processing.</dd> </dl><p>Discussion of these extensions will be reserved for a future chapter.</p> <p>In the next chapter the privileged instruction set will be described. This allows for varying levels of support for the base instructions. In that chapter, the utility functions defined so far will be used to create more complex programs. The synchronization utilities will be particularly useful when dealing with interrupts.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-30T06:41:45+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Sat, 11/30/2019 - 01:41</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> Sat, 30 Nov 2019 06:41:45 +0000 MarcAdmin 30 at https://www.vociferousvoid.org/main RISC-V Bare Metal Programming Chapter 3: A Link to the Past https://www.vociferousvoid.org/main/riscv_bare_metal_chapter3 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming Chapter 3: A Link to the Past</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>Previous chapters of the RISC-V bare metal programming tutorial have focused primarily on the assembler. In <a href="https://www.vociferousvoid.org/main/riscv_bare_metal_chapter2">chapter 2</a>, assembler directives were discussed along with their relationship to the positioning of code in an executable. The various sections where code and data reside have well-defined semantics in the Executable and Linkable Format (ELF) specification. In this chapter, these semantics and the linking process will be examined in more detail.</p> <p>The typical programming workflow involves processing the source file using either an assembler (for assembly code) or a compiler (for higher-level programming languages such as C) to produce an object file. 
The object file by itself cannot be run since it will have references to memory addresses which are relative to the code's position rather than absolute memory offsets. The relative addresses need to be translated into absolute addresses in order for them to make sense to the processor. For example, the following code to call the <b>sum</b> function:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.text</span> <span style="color: #00ffff;">.align</span> 2 <span style="color: #00ffff;">.global</span> _start <span style="color: #00ffff;">.global</span> _stack_end <span style="color: #87cefa;">_start</span>: <span style="color: #00ffff;">li</span> a0, 5 <span style="color: #00ffff;">li</span> a1, 4 <span style="color: #00ffff;">la</span> sp,_stack_end <span style="color: #00ffff;">call</span> sum <span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop </pre></div> <p>Will be assembled to the following object code:</p> <pre class="example"> <span class="linenr"> 1: </span>$ riscv64-unknown-elf-objdump -d main.o <span class="linenr"> 2: </span> <span class="linenr"> 3: </span>main.o: file format elf64-littleriscv <span class="linenr"> 4: </span> <span class="linenr"> 5: </span> <span class="linenr"> 6: </span>Disassembly of section .text: <span class="linenr"> 7: </span> <span class="linenr"> 8: </span>0000000000000000 &lt;_start&gt;: <span class="linenr"> 9: </span> 0: 00500513 li a0,5 <span class="linenr">10: </span> 4: 00400593 li a1,4 <span class="coderef-off" id="coderef-set_stack"><span class="linenr">11: </span> 8: 00000117 auipc sp,0x0</span> <span class="linenr">12: </span> c: 00010113 mv sp,sp <span class="coderef-off" id="coderef-call_sum"><span class="linenr">13: </span> 10: 00000097 auipc ra,0x0</span> <span class="linenr">14: </span> 14: 000080e7 jalr ra # 10 &lt;_start+0x10&gt; <span class="linenr">15: </span> <span class="linenr">16: </span>0000000000000018 &lt;stop&gt;: <span 
class="linenr">17: </span> 18: 0000006f j 18 &lt;stop&gt; <span class="linenr">18: </span> </pre><p>The <b>auipc</b> (Add Upper Immediate to PC) instruction on line <a class="coderef" href="#coderef-set_stack" onmouseout="CodeHighlightOff(this, 'coderef-set_stack');" onmouseover="CodeHighlightOn(this, 'coderef-set_stack');">11</a> adds 0 to the program counter and stores the result, which is simply the current value of the program counter, in the <b>sp</b> register. This is followed by a no-op (a move of the value of <b>sp</b> into <b>sp</b> which simply increments the program counter). The purpose of this sequence of instructions is to set the top of the stack; the zero in this instruction must be replaced with the stack's address. The linker is responsible for filling in the correct address, which is at the offset defined by the <strong>_stack_end</strong> symbol. The final linked application is shown in the following listing:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -d sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .text: 0000000080000000 &lt;_start&gt;: 80000000: 00500513 li a0,5 80000004: 00400593 li a1,4 80000008: 00009117 auipc sp,0x9 8000000c: ff810113 addi sp,sp,-8 # 80009000 &lt;_stack_end&gt; 80000010: 008000ef jal ra,80000018 &lt;sum&gt; 0000000080000014 &lt;stop&gt;: 80000014: 0000006f j 80000014 &lt;stop&gt; 0000000080000018 &lt;sum&gt;: 80000018: fe010113 addi sp,sp,-32 8000001c: 00113c23 sd ra,24(sp) 80000020: 00b50533 add a0,a0,a1 80000024: 01813083 ld ra,24(sp) 80000028: 02010113 addi sp,sp,32 8000002c: 00008067 ret </pre><p>In this listing, the <b>auipc</b> instruction will set <b>sp</b> to address 0x80009008 because:</p> <ul class="org-ul"><li><b>pc</b> is 0x80000008 (the offset of the current instruction)</li> <li>The result of <b>auipc</b> is <b>sp</b> = <b>pc</b> + (0x9 &lt;&lt; 12) = 0x80000008 + 0x9000</li> </ul><p>The next instruction will subtract 8 from the value of <b>sp</b> to set the top of the stack at memory offset 0x80009000.</p> <p>Similarly, the next two 
instructions starting at line <a class="coderef" href="#coderef-call_sum" onmouseout="CodeHighlightOff(this, 'coderef-call_sum');" onmouseover="CodeHighlightOn(this, 'coderef-call_sum');">13</a> (offsets 0x10 and 0x14) of the object file are place-holders for the call to the <b>sum</b> subroutine. The <b>auipc</b> instruction sets the <b>ra</b> register to the current value of the program counter (<b>ra</b> = <b>pc</b> + (0x0 &lt;&lt; 12)); the next instruction jumps to the address in <b>ra</b> (offset 0x10) and sets <b>ra</b> to <b>pc</b> + 4 (0x18). If the program could execute as-is, this would result in an infinite loop. The proper memory offsets need to be filled in by the linker.</p> <p>In the final linked program, these two instructions are replaced by a <b>jal</b> which sets <b>ra</b> (the return address) to the address of the instruction at 0x80000014 (the infinite loop), then jumps to the offset at 0x80000018 (the start of the <b>sum</b> subroutine).</p> <div class="outline-2" id="outline-container-orgaab4bce"> <h1 id="orgaab4bce">The Adventure of Link</h1> <div class="outline-text-2" id="text-1"> <p>Although the linking phase may seem like magic, it is largely under the control of the developer via the linker script. Chapter 2 introduced an example linker script. However, the explanation of its purpose was very superficial. In this chapter, the process of linking the application will be studied more thoroughly. The linker script from chapter 2 is reproduced in the following listing:</p> <pre class="example"> <span class="linenr"> 1: </span>OUTPUT_ARCH( "riscv" ) <span class="linenr"> 2: </span>SECTIONS { <span class="linenr"> 3: </span> . 
= 0x80000000; <span class="linenr"> 4: </span> .text : { <span class="linenr"> 5: </span> PROVIDE(_text_start = .); <span class="linenr"> 6: </span> * (.text.init); <span class="linenr"> 7: </span> * (.text .text.*); <span class="linenr"> 8: </span> PROVIDE(_text_end = .); <span class="linenr"> 9: </span> } <span class="linenr">10: </span> PROVIDE(_global_pointer = .); <span class="linenr">11: </span> .rodata : { <span class="linenr">12: </span> PROVIDE(_rodata_start = .); <span class="linenr">13: </span> *(.srodata .srodata.*) *(.rodata .rodata.*) <span class="linenr">14: </span> PROVIDE(_rodata_end = .); <span class="linenr">15: </span> } <span class="linenr">16: </span> .data : { <span class="linenr">17: </span> . = ALIGN(4096); <span class="linenr">18: </span> PROVIDE(_data_start = .); <span class="linenr">19: </span> *(.sdata .sdata.*) *(.data .data.*) <span class="linenr">20: </span> PROVIDE(_data_end = .); <span class="linenr">21: </span> } <span class="linenr">22: </span> .bss : { <span class="linenr">23: </span> PROVIDE(_bss_start = .); <span class="linenr">24: </span> *(.sbss .sbss.*) *(.bss .bss.*) <span class="linenr">25: </span> PROVIDE(_bss_end = .); <span class="linenr">26: </span> } <span class="linenr">27: </span> PROVIDE(_stack_start = _bss_end); <span class="linenr">28: </span> PROVIDE(_stack_end = _stack_start + 0x8000); <span class="linenr">29: </span>} </pre><p>The first statement in the linker script sets the architecture of the target machine; in this case RISC-V.</p> <p>The more important command is the <b>SECTIONS</b> statement which defines the different sections of the ELF file. As discussed in chapter 2, code and data have different memory requirements. Typically code will be stored in read-only memory and data will be stored in memory that can be read-only or writable depending on the constraints on the data. 
The <b>SECTIONS</b> declaration is used to prescribe how the code and data will be organized in the final binary.</p> <p>Within the <b>SECTIONS</b> block of the script, the period (.) is a special token that represents the location counter. This is essentially the current offset in memory. By default the location counter starts at offset 0. However, the current position can be set explicitly by assigning to the period token. Since the reset vector of the VirtIO board is at memory address 0x80000000, this address is assigned to the location counter to ensure that this area of memory is populated by the linked binary.</p> </div> <div class="outline-3" id="outline-container-org4516f2c"> <h2 id="org4516f2c">Code</h2> <div class="outline-text-3" id="text-1-1"> <p>Once the location counter is initialized, the next statement in the linker script is the declaration of the .text section. This is where all the executable code is expected to be found. The section is declared by specifying its name (.text) followed by a colon (:) and a pair of braces that enclose the statements specific to the current section.</p> <p>The first statement in the .text section block is a <b>PROVIDE</b> command. This defines a linker symbol that can be used when linking. In this case it defines a global symbol named <b>_text_start</b> which has the current value of the location counter. This symbol can be used to refer to the starting offset of the .text section in memory.</p> <p>The next statement uses wildcards to aggregate the assembly code in the <b>.text.init</b> section of all object files provided to the linker. The statement after it aggregates the code from the <b>.text</b> and <b>.text.*</b> sections in all of the provided object files. The latter pattern will match any section name prefixed with the <b>.text.</b> substring.</p> <p>Note that the order in which object files are provided to the linker matters. 
If the <b>add.o</b> and <b>main.o</b> files are linked with <b>add.o</b> provided first, the resulting binary will have the following layout:</p> <pre class="example"> $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o $ riscv64-unknown-elf-objdump -d sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .text: 0000000080000000 &lt;sum&gt;: 80000000: fe010113 addi sp,sp,-32 80000004: 00113c23 sd ra,24(sp) 80000008: 00b50533 add a0,a0,a1 8000000c: 01813083 ld ra,24(sp) 80000010: 02010113 addi sp,sp,32 80000014: 00008067 ret 0000000080000018 &lt;_start&gt;: 80000018: 00500513 li a0,5 8000001c: 00400593 li a1,4 80000020: 00009117 auipc sp,0x9 80000024: fe010113 addi sp,sp,-32 # 80009000 &lt;_stack_end&gt; 80000028: fd9ff0ef jal ra,80000000 &lt;sum&gt; 000000008000002c &lt;stop&gt;: 8000002c: 0000006f j 8000002c &lt;stop&gt; </pre><p>This is clearly not the desired result, since execution will begin in the <b>sum</b> function, which will create a stack frame, save the return address (which is undefined), then perform the add with un-initialized argument registers. 
To ensure that the initialization comes before the function implementation, the <code>main.s</code> file must be changed to add its code to the <b>.text.init</b> section:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-init_section"><span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span></span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="linenr"> 6: </span> <span style="color: #00ffff;">li</span> a0, 5 <span class="linenr"> 7: </span> <span style="color: #00ffff;">li</span> a1, 4 <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum <span class="linenr">10: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop </pre></div> <p>The <code>main.s</code> file uses the <b>.section</b> assembler directive on line <a class="coderef" href="#coderef-init_section" onmouseout="CodeHighlightOff(this, 'coderef-init_section');" onmouseover="CodeHighlightOn(this, 'coderef-init_section');">1</a> to declare that all code that follows should be copied into the <b>.text.init</b> section. This will ensure that the initialization code will always be first due to the command to aggregate code from this section preceding any other in the linker script. 
The linked program will now have the proper layout regardless of the order in which the object files are provided to the linker.</p> <pre class="example"> $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o $ riscv64-unknown-elf-objdump -d sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .text: 0000000080000000 &lt;_start&gt;: 80000000: 00500513 li a0,5 80000004: 00400593 li a1,4 80000008: 00009117 auipc sp,0x9 8000000c: ff810113 addi sp,sp,-8 # 80009000 &lt;_stack_end&gt; 80000010: 008000ef jal ra,80000018 &lt;sum&gt; 0000000080000014 &lt;stop&gt;: 80000014: 0000006f j 80000014 &lt;stop&gt; 0000000080000018 &lt;sum&gt;: 80000018: fe010113 addi sp,sp,-32 8000001c: 00113c23 sd ra,24(sp) 80000020: 00b50533 add a0,a0,a1 80000024: 01813083 ld ra,24(sp) 80000028: 02010113 addi sp,sp,32 8000002c: 00008067 ret </pre><p>The <b>_start</b> code is located at offset 0x80000000 as expected even if the object file for the <b>sum</b> function was provided to the linker first. Now that the code is properly organized, let's look at the data.</p> </div> </div> <div class="outline-3" id="outline-container-org1578180"> <h2 id="org1578180">Data</h2> <div class="outline-text-3" id="text-1-2"> <p>The linker script defines three data sections: .rodata, .data, and .bss. It may seem odd to have four sections (if the .text section is counted) when a program is composed of only two types of information: code and data. The reason is that data can be divided up into three categories:</p> <ul class="org-ul"><li>Global Read-only Data</li> <li>Global Initialized Data</li> <li>Global Un-initialized Data</li> </ul><p>Local data is not considered because this type of data is generated at run time and will be stored either in stack memory or in some allocated heap buffer. For now the focus will be on global data. 
The differences in each type of data can be illustrated using a simple C program:</p> <div class="org-src-container"> <pre class="src src-c"> <span class="coderef-off" id="coderef-rodata"><span class="linenr"> 1: </span><span style="color: #00ffff;">const</span> <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">operand1</span> = 4 ;</span> <span class="coderef-off" id="coderef-data"><span class="linenr"> 2: </span><span style="color: #98fb98;">int</span> <span style="color: #eedd82;">operand2</span> = 5 ;</span> <span class="coderef-off" id="coderef-bss"><span class="linenr"> 3: </span><span style="color: #98fb98;">int</span> <span style="color: #eedd82;">result</span> ;</span> <span class="linenr"> 4: </span> <span class="linenr"> 5: </span><span style="color: #98fb98;">int</span> <span class="linenr"> 6: </span><span style="color: #87cefa;">sum</span>( <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">op1</span>, <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">op2</span> ) <span class="linenr"> 7: </span>{ <span class="linenr"> 8: </span> <span style="color: #00ffff;">return</span> op1 + op2 ; <span class="linenr"> 9: </span>} <span class="linenr">10: </span> <span class="linenr">11: </span><span style="color: #98fb98;">int</span> <span class="linenr">12: </span><span style="color: #87cefa;">main</span>( <span style="color: #98fb98;">int</span> <span style="color: #eedd82;">argc</span>, <span style="color: #98fb98;">char</span>** <span style="color: #eedd82;">argv</span> ) <span class="linenr">13: </span>{ <span class="linenr">14: </span> result = sum( operand1, operand2 ) ; <span class="linenr">15: </span> <span style="color: #00ffff;">while</span> ( 1 ) ; <span class="linenr">16: </span>} </pre></div> </div> <div class="outline-4" id="outline-container-org600643c"> <h3 id="org600643c">Global Read-only Data</h3> <div class="outline-text-4" id="text-1-2-1"> <p>The <b>operand1</b> variable is 
declared on line <a class="coderef" href="#coderef-rodata" onmouseout="CodeHighlightOff(this, 'coderef-rodata');" onmouseover="CodeHighlightOn(this, 'coderef-rodata');">1</a>. This is a read-only value initialized to the integer 4. The compiler will store this data in the <b>.srodata</b> section of the object file:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -s -j.srodata addc.o addc.o: file format elf64-littleriscv Contents of section .srodata: 0000 04000000 .... </pre><p>Assembler directives can also be used to populate the .rodata section in an assembly program. The following fragment can be added to the end of the <code>main.s</code> program to declare the <b>operand1</b> constant in the .rodata section:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr">1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="coderef-off" id="coderef-operand1_def"><span class="linenr">2: </span><span style="color: #87cefa;">operand1</span>: .word 4</span> </pre></div> <p>The constant <b>operand1</b> is defined on line <a class="coderef" href="#coderef-operand1_def" onmouseout="CodeHighlightOff(this, 'coderef-operand1_def');" onmouseover="CodeHighlightOn(this, 'coderef-operand1_def');">2</a> using the <b>.word</b> assembler directive. This will store the given value as a 32-bit quantity in the current memory word. This directive allows any number of words to be stored by specifying the values as a comma-separated list. For example, the following directive will store the 32-bit values 0x00000001, 0x00000002, and 0x00000003 in successive memory words:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.word</span> 1, 2, 3 </pre></div> <p>If this program is linked using the example linker script, this data will be included in the <b>.rodata</b> section. 
The script instructs the linker to aggregate the contents of the <b>.srodata</b> and <b>.rodata</b> sections, as well as sections whose names are prefixed with <b>.rodata.</b> or <b>.srodata.</b>, into a single <b>.rodata</b> section.</p> </div> </div> <div class="outline-4" id="outline-container-org624b212"> <h3 id="org624b212">Global Initialized Data</h3> <div class="outline-text-4" id="text-1-2-2"> <p>The <b>operand2</b> variable is declared on line <a class="coderef" href="#coderef-data" onmouseout="CodeHighlightOff(this, 'coderef-data');" onmouseover="CodeHighlightOn(this, 'coderef-data');">2</a>. This is a mutable variable initialized with the integer value 5. The compiler will store this data in the <b>.sdata</b> section of the object file.</p> <pre class="example"> $ riscv64-unknown-elf-objdump -s -j.sdata addc.o addc.o: file format elf64-littleriscv Contents of section .sdata: 0000 05000000 .... </pre><p>The <code>linker.lds</code> script instructs the linker to aggregate all data declared in the <b>.sdata</b> and <b>.data</b> sections, as well as all sections prefixed by <b>.data.</b> or <b>.sdata.</b>, into a single <b>.data</b> section.</p> <p>The <b>operand2</b> global value can similarly be defined using assembler directives. 
The relevant assembly code is illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.data</span> <span style="color: #87cefa;">operand2</span>: .word 5 </pre></div> <p>The <b>.data</b> assembler directive ensures that all following instructions or directives will affect the <b>.data</b> section.</p> </div> </div> <div class="outline-4" id="outline-container-orga0973e2"> <h3 id="orga0973e2">Global Un-initialized Data</h3> <div class="outline-text-4" id="text-1-2-3"> <p>The <b>result</b> variable is declared on line <a class="coderef" href="#coderef-bss" onmouseout="CodeHighlightOff(this, 'coderef-bss');" onmouseover="CodeHighlightOn(this, 'coderef-bss');">3</a> of the C program. This variable is said to be un-initialized because it is not assigned a value at build time. The C language guarantees that un-initialized global variables will be initialized to zero. The system is responsible for initializing this data to zero before handing control over to the C program. This is easier to do if un-initialized data is aggregated into a common section. The loader can then simply zero out the entire section; in this case the <b>.bss</b> section:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -D -j.bss sumc.o sumc.o: file format elf64-littleriscv Disassembly of section .bss: 0000000000000000 &lt;result&gt;: 0: 0000 unimp ... </pre><p>The linker script will instruct the linker to aggregate all data definitions in the <b>.sbss</b> or <b>.bss</b> sections, as well as those in section names prefixed by <b>.sbss.</b> or <b>.bss.</b>, into a single <b>.bss</b> section.</p> <p>The <b>result</b> global variable can also be defined using assembler directives. 
The relevant assembly code is illustrated in the following listing:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.bss</span> <span style="color: #87cefa;">result</span>: .word 0 </pre></div> <p>The <b>.bss</b> assembler directive will ensure that the <b>result</b> memory location is placed in the <b>.bss</b> section of the binary file. Note that the value of the <b>result</b> variable is initialized to zero. Even though this is un-initialized data, if no value is specified, no memory will be allocated to the <b>result</b> variable. Since this is the <b>.bss</b> section, initializing this memory to zero satisfies the C language guarantees.</p> </div> </div> </div> <div class="outline-3" id="outline-container-org33398eb"> <h2 id="org33398eb">The Final Boss</h2> <div class="outline-text-3" id="text-1-3"> <p>The assembly program to add two integer values can be updated to read its operands from memory. The following listing shows the updated <code>main.s</code> assembly code:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="linenr"> 1: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".text.init"</span> <span class="linenr"> 2: </span> <span style="color: #00ffff;">.align</span> 2 <span class="linenr"> 3: </span> <span style="color: #00ffff;">.global</span> _start <span class="linenr"> 4: </span> <span style="color: #00ffff;">.global</span> _stack_end <span class="linenr"> 5: </span><span style="color: #87cefa;">_start</span>: <span class="coderef-off" id="coderef-load_operand2"><span class="linenr"> 6: </span> <span style="color: #00ffff;">lw</span> a0, operand2</span> <span class="coderef-off" id="coderef-load_operand1"><span class="linenr"> 7: </span> <span style="color: #00ffff;">lw</span> a1, operand1</span> <span class="linenr"> 8: </span> <span style="color: #00ffff;">la</span> sp,_stack_end <span class="linenr"> 9: </span> <span style="color: #00ffff;">call</span> sum 
<span class="coderef-off" id="coderef-load_result"><span class="linenr">10: </span> <span style="color: #00ffff;">la</span> t1, result</span> <span class="coderef-off" id="coderef-save_result"><span class="linenr">11: </span> <span style="color: #00ffff;">sw</span> a0, 0(t1)</span> <span class="linenr">12: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop <span class="linenr">13: </span> <span style="color: #00ffff;">.section</span> <span style="color: #ffa07a;">".rodata"</span> <span class="linenr">14: </span><span style="color: #87cefa;">operand1</span>: .word 4 <span class="linenr">15: </span> <span style="color: #00ffff;">.data</span> <span class="linenr">16: </span><span style="color: #87cefa;">operand2</span>: .word 5 <span class="linenr">17: </span> <span style="color: #00ffff;">.bss</span> <span class="linenr">18: </span><span style="color: #87cefa;">result</span>: .word 0 </pre></div> <p>The value of <b>operand2</b> is loaded into the function argument register <b>a0</b> on line <a class="coderef" href="#coderef-load_operand2" onmouseout="CodeHighlightOff(this, 'coderef-load_operand2');" onmouseover="CodeHighlightOn(this, 'coderef-load_operand2');">6</a>. The value of <b>operand1</b> is loaded into the function argument register <b>a1</b> on line <a class="coderef" href="#coderef-load_operand1" onmouseout="CodeHighlightOff(this, 'coderef-load_operand1');" onmouseover="CodeHighlightOn(this, 'coderef-load_operand1');">7</a>. The <b>sum</b> function is called with those operands and the result (returned in <b>a0</b>) is saved in the memory word allocated for <b>result</b> in the <b>.bss</b> section on line <a class="coderef" href="#coderef-save_result" onmouseout="CodeHighlightOff(this, 'coderef-save_result');" onmouseover="CodeHighlightOn(this, 'coderef-save_result');">11</a>. 
Before saving the result, the memory location of the <b>result</b> variable must first be loaded into a register; this is accomplished using the <b>la</b> instruction on line <a class="coderef" href="#coderef-load_result" onmouseout="CodeHighlightOff(this, 'coderef-load_result');" onmouseover="CodeHighlightOn(this, 'coderef-load_result');">10</a>.</p> <p>This program can be assembled and linked with the <code>add.s</code> program as follows:</p> <pre class="example"> $ riscv64-unknown-elf-as -o main.o main.s $ riscv64-unknown-elf-as -o add.o add.s $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf add.o main.o </pre><p>The memory offset of the <b>result</b> can be obtained by inspecting the resulting ELF file:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -D -j.bss sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .bss: 0000000080001004 &lt;result&gt;: 80001004: 0000 unimp ... </pre><p>The program can now be tested using the qemu emulator. Given that the memory location of the <b>result</b> variable is 0x80001004, we can inspect this memory location to ensure that it contains the expected result:</p> <pre class="example"> $ qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel sum.elf QEMU 3.1.0 monitor - type 'help' for more information (qemu) xp /1dw 0x80001004 0000000080001004: 9 (qemu) </pre><p>The memory location that corresponds with the <b>result</b> variable does in fact contain the value 9 (which is the sum of 4 and 5). Therefore the program is behaving as expected.</p> </div> </div> </div> <div class="outline-2" id="outline-container-org944c620"> <h1 id="org944c620">Conclusion</h1> <div class="outline-text-2" id="text-2"> <p>In this chapter, the linking process was studied in more detail. The process of linking the object files into a final binary was hopefully demystified by describing the process of transforming relative offsets to absolute memory offsets. 
The primary example was setting the location of the top of the stack and setting up the <b>sum</b> function call.</p> <p>The use of a linker script illustrates how code and data can be organized more intelligently in a binary file. The greater flexibility offered by the linker script has allowed for enhancements to the original <code>add.s</code> program from chapter 1. The operands for the sum are now read from offsets in memory, each in a different data section. Moreover, additional assembler directives were introduced in this chapter to allow better control over how code and data are placed in the linked binary file. In particular:</p> <dl class="org-dl"><dt>.text</dt> <dd>Specify that what follows goes into the .text section.</dd> <dt>.data</dt> <dd>Specify that what follows goes into the .data section.</dd> <dt>.bss</dt> <dd>Specify that what follows goes into the .bss section.</dd> <dt>.section</dt> <dd>Set the section name explicitly for the code and data that follows.</dd> <dt>.word</dt> <dd>Store the specified 32-bit quantities into successive memory words.</dd> </dl><p>Other useful assembler directives that were not covered in this chapter include:</p> <dl class="org-dl"><dt>.byte</dt> <dd>Store the specified 8-bit quantities into successive bytes of memory.</dd> <dt>.half</dt> <dd>Store the specified 16-bit quantities into successive memory half-words.</dd> <dt>.dword</dt> <dd>Store the specified 64-bit quantities into successive memory double-words.</dd> <dt>.string</dt> <dd>Store a string in memory and null-terminate it.</dd> </dl><p>Following chapters will start to look at some of the standard extensions of the RISC-V ISA. So far only the RV64I instructions have been used, whereas the ISA defines extensions for multiplication (RVM), atomic operations (RVA), single and double precision floating point operations (RVF and RVD), and compressed instructions (RVC). 
These extensions will be useful for building more complex programs.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-12T17:27:29+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Tue, 11/12/2019 - 12:27</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> Tue, 12 Nov 2019 17:27:29 +0000 MarcAdmin 29 at https://www.vociferousvoid.org/main RISC-V Bare Metal Programming Chapter 2: OpCodes Assemble! https://www.vociferousvoid.org/main/riscv_bare_metal_chapter2 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming Chapter 2: OpCodes Assemble!</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>The <a href="/mmlab/riscv_bare_metal_chapter1">previous chapter</a> of this tutorial went over the steps required to set up a RISC-V development environment and create a program that runs on a bare metal VirtIO board using QEMU. Even though the example program, which calculates the sum of two integers, was written in RISC-V assembly, no prior knowledge was required to follow along. This chapter will dive into the details of RISC-V assembly language and expand on what exactly is happening at each step of the development. 
The topics covered in this chapter will include an overview of the RISC-V architecture, its assembly instructions, pseudo-instructions, and directives.</p> <p>The following listing illustrates the assembly code of the <b>add.s</b> program from the previous chapter:</p> <div class="org-src-container"> <pre class="src src-asm"> <span class="coderef-off" id="coderef-section"><span class="linenr">1: </span> <span style="color: #00ffff;">.text</span></span> <span class="coderef-off" id="coderef-start_sym"><span class="linenr">2: </span> <span style="color: #00ffff;">.global</span> _start</span> <span class="linenr">3: </span><span style="color: #87cefa;">_start</span>: <span class="coderef-off" id="coderef-load5"><span class="linenr">4: </span> <span style="color: #00ffff;">li</span> a0, 5</span> <span class="coderef-off" id="coderef-load4"><span class="linenr">5: </span> <span style="color: #00ffff;">li</span> a1, 4</span> <span class="coderef-off" id="coderef-add"><span class="linenr">6: </span> <span style="color: #00ffff;">add</span> a0, a0, a1</span> <span class="coderef-off" id="coderef-loop"><span class="linenr">7: </span><span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop</span> </pre></div> <p>The code was changed a little to use a different set of registers. When assembled, this program results in an object file, which can then be linked to create the file that will be loaded onto the board. In this scenario, only one object file is used; however, more complex programs may require more than one object file. 
The linker's job is to put all of these files together into a single executable program.</p> <p>This <b>add.s</b> program is composed of one instruction (<a class="coderef" href="#coderef-add" onmouseout="CodeHighlightOff(this, 'coderef-add');" onmouseover="CodeHighlightOn(this, 'coderef-add');">6</a>), three pseudo-instructions (<a class="coderef" href="#coderef-load5" onmouseout="CodeHighlightOff(this, 'coderef-load5');" onmouseover="CodeHighlightOn(this, 'coderef-load5');">4</a>, <a class="coderef" href="#coderef-load4" onmouseout="CodeHighlightOff(this, 'coderef-load4');" onmouseover="CodeHighlightOn(this, 'coderef-load4');">5</a>, and <a class="coderef" href="#coderef-loop" onmouseout="CodeHighlightOff(this, 'coderef-loop');" onmouseover="CodeHighlightOn(this, 'coderef-loop');">7</a>), and two directives (<a class="coderef" href="#coderef-section" onmouseout="CodeHighlightOff(this, 'coderef-section');" onmouseover="CodeHighlightOn(this, 'coderef-section');">1</a>, <a class="coderef" href="#coderef-start_sym" onmouseout="CodeHighlightOff(this, 'coderef-start_sym');" onmouseover="CodeHighlightOn(this, 'coderef-start_sym');">2</a>). Moreover, the instructions and pseudo-instructions have operands comprised of either registers or immediates. Understanding each of these entities will help when creating more complex programs in RISC-V assembly.</p> <div class="outline-2" id="outline-container-org71d0bd6"> <h1 id="org71d0bd6">Registers</h1> <div class="outline-text-2" id="text-1"> <p>RISC-V systems have a base set of 32 registers, <b>x0</b>-<b>x31</b>. The <b>x0</b> register is read-only with a value fixed to zero. The rest will have varying content. The application binary interface (ABI) prescribes conventions for the name and usage of the various registers. 
These are listed in the following table:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-left" scope="col">Register(s)</th> <th class="org-left" scope="col">ABI Name(s)</th> <th class="org-left" scope="col">Description</th> <th class="org-left" scope="col">Saved by</th> </tr></thead><tbody><tr><td class="org-left">x0</td> <td class="org-left">zero</td> <td class="org-left">Hard-wired zero</td> <td class="org-left">N/A</td> </tr><tr><td class="org-left">x1</td> <td class="org-left">ra</td> <td class="org-left">Function return address</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x2</td> <td class="org-left">sp</td> <td class="org-left">Stack pointer</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x3</td> <td class="org-left">gp</td> <td class="org-left">Global pointer</td> <td class="org-left">N/A</td> </tr><tr><td class="org-left">x4</td> <td class="org-left">tp</td> <td class="org-left">Thread pointer</td> <td class="org-left">N/A</td> </tr><tr><td class="org-left">x5</td> <td class="org-left">t0</td> <td class="org-left">Temporary/alternate link register</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x6-x7</td> <td class="org-left">t1-t2</td> <td class="org-left">Temporary values</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x8</td> <td class="org-left">s0/fp</td> <td class="org-left">Saved register/Frame pointer</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x9</td> <td class="org-left">s1</td> <td class="org-left">Saved register</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x10-x11</td> <td class="org-left">a0-a1</td> <td class="org-left">Function arguments/Return values</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x12-x17</td> <td 
class="org-left">a2-a7</td> <td class="org-left">Function arguments</td> <td class="org-left">Caller</td> </tr><tr><td class="org-left">x18-x27</td> <td class="org-left">s2-s11</td> <td class="org-left">Saved registers</td> <td class="org-left">Callee</td> </tr><tr><td class="org-left">x28-x31</td> <td class="org-left">t3-t6</td> <td class="org-left">Temporary values</td> <td class="org-left">Caller</td> </tr></tbody></table><p>Registers can be referred to by their ABI names or their actual names in assembly programs; the two are interchangeable.</p> <p>When a function is invoked, it may modify the values of some of these registers. For this reason it is advisable to save the contents of those registers in memory in order to be able to restore them when the function completes. The ABI convention prescribes which party in a function call (the caller or the callee) is responsible for saving these values. This convention is described in the "Saved by" column of the table.</p> <p>If a register is to be saved by the caller, its value should be stored in a stack frame allocated for that purpose prior to calling the function. This ensures that the values can be restored when the function returns. In general, it is a good idea to save all of the caller-saved registers if the caller does not know which registers may be modified by the callee.</p> <p>Registers to be saved by the callee only need to be saved to memory if the function uses those registers. 
Functions must not leave a trace: the state of the machine must be the same as it was prior to the function being invoked (with the exception of the desired function result).</p> <p>A function implementation defined in RISC-V assembly should use the following prologue before doing any of its work:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #87cefa;">function_label</span>: <span style="color: #00ffff;">addi</span> sp, sp, -framesize # The stack grows downward <span style="color: #00ffff;">sd</span> ra,framesize-8(sp) # Save the return address # Save registers owned by the callee as needed to memory. </pre></div> <p>This will ensure that the function can return to the point where it was called, and that any register state will be saved.</p> <p>Before a function returns, the saved register values must be restored. This is achieved by the following epilogue, which should end every function:</p> <div class="org-src-container"> <pre class="src src-asm"> # restore registers from the stack if needed <span style="color: #00ffff;">ld</span> ra, framesize-8(sp) # Restore the return address register <span style="color: #00ffff;">addi</span> sp, sp, framesize # Pop the stack <span style="color: #00ffff;">ret</span> # return to the caller </pre></div> <p>This will restore the saved registers, set the return address, and de-allocate the stack frame that was used to save this information.</p> <p>The <b>add.s</b> program can be enhanced to use the function prologue and epilogue to define a function that calculates the sum of its arguments in registers <b>a0</b> and <b>a1</b>. 
This new program is illustrated in the listing that follows:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.text</span> <span style="color: #00ffff;">.align</span> 2 <span style="color: #00ffff;">.global</span> sum <span style="color: #87cefa;">sum</span>: <span style="color: #00ffff;">addi</span> sp, sp, -32 # Stack frames must be 16-byte aligned <span style="color: #00ffff;">sd</span> ra, 24(sp) # Save the return address <span style="color: #00ffff;">add</span> a0, a0, a1 # Add the function operands <span style="color: #00ffff;">ld</span> ra, 24(sp) # restore return address <span style="color: #00ffff;">addi</span> sp, sp, 32 # De-allocate the stack frame <span style="color: #00ffff;">ret</span> </pre></div> <p>The <b>sum</b> function can be called by name from a different assembler program. Create a <b>main.s</b> program with the following content:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #00ffff;">.text</span> <span style="color: #00ffff;">.align</span> 2 <span style="color: #00ffff;">.global</span> _start <span style="color: #87cefa;">_start</span>: <span style="color: #00ffff;">li</span> a0, 5 <span style="color: #00ffff;">li</span> a1, 4 <span style="color: #00ffff;">call</span> sum <span style="color: #87cefa;">stop</span>: <span style="color: #00ffff;">j</span> stop </pre></div> <p>This will load the values 5 and 4 into the registers used for arguments to the sum function and call it. This can be assembled and linked as follows:</p> <pre class="example"> $ riscv64-unknown-elf-as -o add.o add.s $ riscv64-unknown-elf-as -o main.o main.s $ riscv64-unknown-elf-ld -Ttext=0x80000000 -o sum.elf main.o add.o </pre><p>If this program is assembled, linked, and run in QEMU, it will call the <b>sum</b> function to calculate the sum of the operands. 
This can be verified by inspecting the value of register <b>a0</b> which should be 9.</p> <p><b>NOTE</b>: The order in which the object files are supplied to the linker is important. If the <b>add.o</b> file is supplied first, the program will not run because its content will be located at the reset address.</p> </div> </div> <div class="outline-2" id="outline-container-org7a6a704"> <h1 id="org7a6a704">Instructions</h1> <div class="outline-text-2" id="text-2"> <p>Instructions are mnemonics that map directly to machine codes. For example the <b>add</b> instruction in the <b>sum</b> function corresponds with the op-code 0x33 (or b0110011).</p> <p>When the <b>add</b> instruction is combined with its operands, the result is a single machine instruction. In RISC-V, all machine instructions are 32-bits long (unless you're using the compressed extension, but for now we'll just deal with 32-bit operations).</p> <p>The <b>add</b> instruction is what's known as an R-type instruction because its operands are all registers. 
This type of instruction has the following form:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-right" /><col class="org-right" /><col class="org-right" /><col class="org-right" /><col class="org-right" /><col class="org-right" /></colgroup><thead><tr><th class="org-left" scope="col">BITS</th> <th class="org-right" scope="col">31:25</th> <th class="org-right" scope="col">24:20</th> <th class="org-right" scope="col">19:15</th> <th class="org-right" scope="col">14:12</th> <th class="org-right" scope="col">11:7</th> <th class="org-right" scope="col">6:0</th> </tr></thead><tbody><tr><td class="org-left">R-Type</td> <td class="org-right">func7</td> <td class="org-right">rs2</td> <td class="org-right">rs1</td> <td class="org-right">func3</td> <td class="org-right">rd</td> <td class="org-right">opcode</td> </tr></tbody></table><p>In this table, the operation's function is a combination of <b>func7</b> and <b>func3</b>. For the <b>add</b> instruction, these are b0000000 and b000. The <b>rs2</b> and <b>rs1</b> fields are the source registers whose values will be added. The <b>rd</b> field is the destination register for the result. Notice that the register fields are 5 bits wide. This allows the instruction to reference any of the 32 base registers (i.e. 2<sup>5</sup> registers). The <b>add</b> instruction from the previous example will be constructed as follows:</p> <dl class="org-dl"><dt>func7</dt> <dd>b0000000</dd> <dt>rs2</dt> <dd>b01011 (for x11 which is a1)</dd> <dt>rs1</dt> <dd>b01010 (for x10 which is a0)</dd> <dt>func3</dt> <dd>b000</dd> <dt>rd</dt> <dd>b01010 (for x10)</dd> <dt>opcode</dt> <dd>b0110011</dd> </dl><p>Putting it all together, we get b00000000101101010000010100110011, or 0x00B50533. 
We can confirm this by disassembling the object file that was created:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #87cefa;">$</span> <span style="color: #00ffff;">riscv64</span>-unknown-elf-objdump -d add.o <span style="color: #87cefa;">sum</span>.o: file format elf64-littleriscv <span style="color: #87cefa;">Disassembly</span> <span style="color: #00ffff;">of</span> section .text: <span style="color: #87cefa;">0000000000000000</span> &lt;sum&gt;: <span style="color: #00ffff;">0</span>: fe010113 addi sp,sp,-32 <span style="color: #00ffff;">4</span>: 00113c23 sd ra,24(sp) <span style="color: #00ffff;">8</span>: 00b50533 add a0,a0,a1 <span style="color: #00ffff;">c</span>: 01813083 ld ra,24(sp) <span style="color: #00ffff;">10</span>: 02010113 addi sp,sp,32 <span style="color: #00ffff;">14</span>: 00008067 ret </pre></div> <p>The <b>add</b> instruction in the <b>sum</b> function is at offset 8 of the object file. The machine instruction is 00b50533, which is the value that we had calculated for the instruction.</p> <p>In addition to R-Type instructions, there are also I-Type instructions that operate on immediates (literal values) and are also used for loads from memory, S-Type for storing to memory, U-Type for loading upper immediates (e.g. <b>lui</b> and <b>auipc</b>), B-Type for branching, and J-Type for jumps (e.g. function calls). 
The following table describes the layout of each of these instruction types:</p> <table border="2" cellpadding="6" cellspacing="0" frame="hsides" rules="groups"><colgroup><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /><col class="org-left" /></colgroup><thead><tr><th class="org-left" scope="col">BITS</th> <th class="org-left" scope="col">31:25</th> <th class="org-left" scope="col">24:20</th> <th class="org-left" scope="col">19:15</th> <th class="org-left" scope="col">14:12</th> <th class="org-left" scope="col">11:7</th> <th class="org-left" scope="col">6:0</th> </tr></thead><tbody><tr><td class="org-left">R-Type</td> <td class="org-left">func7</td> <td class="org-left">rs2</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">I-Type</td> <td class="org-left">imm[11:5]</td> <td class="org-left">imm[4:0]</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">S-Type</td> <td class="org-left">imm[11:5]</td> <td class="org-left">rs2</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">imm[4:0]</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">U-Type</td> <td class="org-left">imm[31:25]</td> <td class="org-left">imm[24:20]</td> <td class="org-left">imm[19:15]</td> <td class="org-left">imm[14:12]</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">B-Type</td> <td class="org-left">imm[12,10:5]</td> <td class="org-left">rs2</td> <td class="org-left">rs1</td> <td class="org-left">func3</td> <td class="org-left">imm[4:1,11]</td> <td class="org-left">opcode</td> </tr><tr><td class="org-left">J-Type</td> <td class="org-left">imm[20,10:5]</td> <td class="org-left">imm[4:1,11]</td> 
<td class="org-left">imm[19:15]</td> <td class="org-left">imm[14:12]</td> <td class="org-left">rd</td> <td class="org-left">opcode</td> </tr></tbody></table></div> <div class="outline-3" id="outline-container-orga251f7e"> <h2 id="orga251f7e">Pseudo-Instructions</h2> <div class="outline-text-3" id="text-2-1"> <p>Unlike instructions, pseudo-instructions do not map directly to op-codes. Typically these represent idioms to make the programmer's life a little easier.</p> <p>For example, the <b>li</b> in the <b>sum</b> program is an example of a pseudo-instruction. As explained in the previous chapter, this pseudo-instruction maps to an <b>addi</b> I-Type instruction which adds the value of <b>x0</b> to the immediate value and stores the result in the destination register. Pseudo-instructions provide convenient mnemonics for programming without adding additional op-codes.</p> <p>Pseudo-instructions may also translate to more than one assembler instruction. For example, the <strong>call</strong> pseudo-instruction expands to a pair of instructions, <strong>auipc</strong> and <strong>jalr</strong>, which the linker may relax into a single <strong>jal</strong> when the target is within range.</p> <h1>Directives</h1> </div> </div> </div> <div class="outline-2" id="outline-container-org203f53f"> <div class="outline-text-2" id="text-3"> <p>Directives are commands for the assembler rather than instructions that it will translate into machine code. Directives can be used to tell the assembler where to place code and data in the resulting object file, or to set up the memory of the target system. The previous example used assembler directives to export global symbols, to set the alignment for instructions, and to ensure that the code is assembled into the ".text" section of the object file.</p> <p>To understand the purpose of the assembler directives, it is important to understand how assembled code is linked together. 
The assembler produces object files that are combined to produce an Executable and Linkable Format (ELF) file. This file will be segmented into different sections:</p> <dl class="org-dl"><dt>.text</dt> <dd>CPU instructions (the executable code).</dd> <dt>.rodata</dt> <dd>Read-only data.</dd> <dt>.data</dt> <dd>Global, mutable, initialized data.</dd> <dt>.bss</dt> <dd>Global, mutable, un-initialized data.</dd> </dl><p>Up to now, only the text section has been used. The location where the code is loaded was specified using the "-Ttext" option when invoking the linker. If multiple object files are passed to the linker, their text sections are merged into a single contiguous section.</p> <p>Code and data will have different run-time requirements. Code is generally read-only, whereas data may require read-write permissions. Therefore it is advantageous that code and data are not interleaved. To ensure this, the locations of the text and data sections of the program should not overlap.</p> <p>To avoid having to define the position of each section at the command line, the linker allows the memory layout to be defined using a linker script:</p> <pre class="example"> OUTPUT_ARCH( "riscv" ) SECTIONS { . = 0x80000000; .text : { PROVIDE(_text_start = .); *(.text.init) main.o (.text) *(.text .text.*) PROVIDE(_text_end = .); } PROVIDE(_global_pointer = .); .rodata : { PROVIDE(_rodata_start = .); *(.rodata .rodata.*) PROVIDE(_rodata_end = .); } .data : { . = ALIGN(4096); PROVIDE(_data_start = .); *(.sdata .sdata.*) *(.data .data.*) PROVIDE(_data_end = .); } .bss : { PROVIDE(_bss_start = .); *(.sbss .sbss.*) *(.bss .bss.*) PROVIDE(_bss_end = .); } PROVIDE(_stack_start = _bss_end); PROVIDE(_stack_end = _stack_start + 0x8000); } </pre><p>The <code>SECTIONS</code> keyword is used to specify how the various sections are laid out in the file. 
In the linker script shown previously, the <code>.text</code>, <code>.rodata</code>, <code>.data</code>, and <code>.bss</code> sections are defined.</p> <p>The .text section will include all code that follows a <code>.section .text.init</code> or <code>.text</code> assembler directive. The code in <b>sum.s</b> and <b>main.s</b> will therefore be included in this section. On line <a class="coderef" href="#coderef-main" onmouseout="CodeHighlightOff(this, 'coderef-main');" onmouseover="CodeHighlightOn(this, 'coderef-main');">7</a> of the linker script, the text section of the <b>main.o</b> object file is included explicitly. This will ensure that the main program appears in the linked program before the <b>sum</b> function does (which is included by the wildcard on the next line).</p> <p>The <code>PROVIDE</code> keyword is used to define a symbol at the address of the definition. The start and end of each of the sections will be provided by the linker. Moreover, the start and end of the stack memory area can be declared in this way. In a later chapter, these symbols will be used to set up the stack pointer.</p> <h1>Putting it All Together</h1> </div> </div> <div class="outline-2" id="outline-container-orgd504155"> <div class="outline-text-2" id="text-4"> <p>The program can now be assembled and linked using the following sequence of commands:</p> <pre class="example"> $ riscv64-unknown-elf-as -o sum.o sum.s $ riscv64-unknown-elf-as -o main.o main.s $ riscv64-unknown-elf-ld -T linker.lds -o sum.elf main.o sum.o </pre><p>This will produce an ELF file called <b>sum.elf</b>. 
By inspecting the <b>sum.elf</b> file, we can see that the <b>_start</b> symbol shows up before the <b>sum</b> function:</p> <pre class="example"> $ riscv64-unknown-elf-objdump -d sum.elf sum.elf: file format elf64-littleriscv Disassembly of section .text: 0000000080000000 &lt;_start&gt;: 80000000: 00500513 li a0,5 80000004: 00400593 li a1,4 80000008: 00009117 auipc sp,0x9 8000000c: ff810113 addi sp,sp,-8 # 80009000 &lt;_stack_end&gt; 80000010: 008000ef jal ra,80000018 &lt;sum&gt; 0000000080000014 &lt;stop&gt;: 80000014: 0000006f j 80000014 &lt;stop&gt; 0000000080000018 &lt;sum&gt;: 80000018: fe010113 addi sp,sp,-32 8000001c: 00113c23 sd ra,24(sp) 80000020: 00b50533 add a0,a0,a1 80000024: 01813083 ld ra,24(sp) 80000028: 02010113 addi sp,sp,32 8000002c: 00008067 ret </pre><p>The disassembled <b>sum.elf</b> also shows the instructions that set up and perform the function call:</p> <div class="org-src-container"> <pre class="src src-asm"> <span style="color: #87cefa;">auipc</span> <span style="color: #00ffff;">sp</span>,0x9 <span style="color: #87cefa;">addi</span> <span style="color: #00ffff;">sp</span>,sp,-8 <span style="color: #87cefa;">jal</span> <span style="color: #00ffff;">ra</span>,80000018 </pre></div> <p>The <b>auipc</b>/<b>addi</b> pair loads the address of <b>_stack_end</b> into <b>sp</b>, which is how an <b>la</b> pseudo-instruction is expanded. The <b>call</b> pseudo-instruction, which the assembler expands to <b>auipc</b> and <b>jalr</b>, has been relaxed by the linker into the single <b>jal</b>.</p> <p>This program can be run in QEMU just as before and the result should be the same as previous runs.</p> </div> </div> <div class="outline-2" id="outline-container-orgd0dd85d"> <h1 id="orgd0dd85d">Conclusion</h1> <div class="outline-text-2" id="text-5"> <p>This chapter of the Bare Metal RISC-V tutorial covered the assembly language in a little more detail. Assembly programs are made up of directives, pseudo-instructions, and instructions. Directives provide guidance to the assembler on how to organize the assembled code. Pseudo-instructions provide useful mnemonics that are mapped to one or more primitive assembler instructions. 
Instructions are translated into binary machine instructions which direct the execution flow of the processor. In future chapters, this information will be utilised to make the RISC-V processors do more interesting things.</p> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-06T13:03:58+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Wed, 11/06/2019 - 08:03</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> Wed, 06 Nov 2019 13:03:58 +0000 MarcAdmin 28 at https://www.vociferousvoid.org/main RISC-V Bare Metal Programming Chapter 1: The Setup https://www.vociferousvoid.org/main/riscv_bare_metal_chapter1 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">RISC-V Bare Metal Programming Chapter 1: The Setup</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><div class="tex2jax_process"> <p>This tutorial will walk through the process of building and running a RISC-V program on bare metal hardware. The reader is assumed to be familiar with the GNU toolchain and basic C programming. 
Assembly experience is useful but should not be required to follow along.</p> <p>There are many tutorials available to get started with bare metal programming<sup><a class="footref" href="#fn.1" id="fnr.1">1</a></sup><sup>, </sup><sup><a class="footref" href="#fn.2" id="fnr.2">2</a></sup>. However, most of them are for more common architectures such as x86 and ARM-32. Moreover, many of the existing tutorials are targeted at aspiring OS developers and often assume that the reader is already familiar with embedded programming. The objective of this tutorial is to learn how to walk before being asked to run.</p> <p>This tutorial is loosely modelled after a similar one for the ARM-32 architecture: <a href="http://www.bravegnu.org/gnu-eprog/index.html">Embedded Programming with the GNU Toolchain</a>. However, the RISC-V ISA will be used rather than ARM-V5TE. QEMU will be used to emulate the hardware platform, thus allowing the learner to proceed without having to obtain an actual board.</p> <h2>Why RISC-V?</h2> <p>RISC-V is a modern and open instruction set architecture (ISA). As opposed to X86 and ARM, which have existed for decades and have been updated incrementally, RISC-V was developed from scratch as a clean-slate, minimalist and open ISA informed by the mistakes of the past. As an open ISA, RISC-V is ideal for educational purposes as it is not subject to the whims or fates of a single corporation. More importantly, I have not found many resources for bare metal programming using this architecture<sup><a class="footref" href="#fn.3" id="fnr.3">3</a></sup>; this tutorial therefore aims to fill that knowledge void.</p> <h2>Setting Up the Host</h2> <p>Most programmers start off in a self-hosted environment. This means that they write programs for an environment using the same environment (e.g. writing GNU/Linux programs using a GNU/Linux machine). 
However, in embedded development, the machine on which a program is run (the target) will generally have a different architecture from the one on which it was created (the host). This section will outline the steps for setting up a GNU/Linux workstation as a host development environment (in my case Debian Buster) to build programs for a RISC-V 64-bit architecture. The steps outlined herein will be specific to Debian-based GNU/Linux distributions (e.g. Ubuntu, PureOS, Mint)<sup><a class="footref" href="#fn.4" id="fnr.4">4</a></sup>.</p> <p>The first step of any kind of programming is setting up the toolchain. Since we're building for the RISC-V architecture, we will need a suitable toolchain, and the <a href="https://github.com/riscv/riscv-gnu-toolchain">RISC-V GNU Compiler Toolchain</a> fits the bill.</p> <p>Some dependencies are required in order to build the toolchain. These can be installed using the following command:</p> <pre class="example">
$ sudo apt-get install autoconf automake autotools-dev curl \
    libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison \
    flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev
</pre><p>Next, retrieve the source for the RISC-V GNU toolchain and all the required sub-modules:</p> <pre class="example">
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
</pre><p>Since we are dealing with a bare metal system, the toolchain must be built for the <a href="https://www.sourceware.org/newlib/">Newlib library</a>. The Newlib library is an implementation of the standard C library for embedded systems.</p> <p>Choose a destination folder on your host system where the toolchain will be installed (this tutorial will assume <code>/opt/riscv/</code>), ensuring that its <code>bin</code> subdirectory is in the PATH. 
Then build the toolchain with the following commands:</p> <pre class="example">
$ ./configure --prefix=/opt/riscv
$ make
</pre><h2>Do As RISC Does</h2> <p>Once the compiler toolchain is set up, the next requirement is an environment in which to run the programs thereby generated. QEMU will be used for this purpose. Use <code>apt-get</code> on Debian systems to install QEMU:</p> <pre class="example">
$ sudo apt-get install qemu-system-misc
</pre><p>This will install several QEMU binaries supporting various architectures including:</p> <ul class="org-ul"><li>qemu-system-riscv64</li> <li>qemu-system-riscv32</li> </ul><p>We're interested in the 64-bit RISC-V version.</p> <h1>RISC-V is Alive</h1> <p>This section describes the process of writing a simple RISC-V program in assembly, and running it on a bare metal virtual board emulated by QEMU. The programming examples are modelled after those in the <a href="http://www.bravegnu.org/gnu-eprog/index.html">Embedded Programming with the GNU Toolchain</a> tutorial, but adapted from ARM-32 to the RISC-V 64-bit architecture.</p> <p>Each line of the assembly program is composed of three optional elements: a label, an instruction, and a comment:</p> <dl class="org-dl"><dt>label</dt> <dd>A label is a convenience to allow the program to refer to a particular memory location using a symbolic name. A label is composed of a sequence of alphanumeric characters, in addition to underscores (_) or dollar signs ($). A label will always be terminated by a colon (:).</dd> <dt>instruction</dt> <dd>Instructions consist of either RISC-V assembly instructions or assembler directives. Assembler directives are prefixed with a period (.).</dd> <dt>comment</dt> <dd>Comments are preceded by a "#" character. 
Anything that follows that character will be ignored until the first newline character.</dd> </dl><p>The first example will simply calculate the sum of two literal numerical values:</p> <div class="org-src-container"> <pre class="src src-asm">
        <span style="color: #00ffff;">.text</span>
        <span style="color: #00ffff;">.global</span> _start
<span style="color: #87cefa;">_start</span>:
        <span style="color: #00ffff;">li</span> a2, 5           # a2 = 5
        <span style="color: #00ffff;">li</span> a3, 4           # a3 = 4
        <span style="color: #00ffff;">add</span> a0, a2, a3     # a0 = a2 + a3
<span style="color: #87cefa;">stop</span>:
        <span style="color: #00ffff;">j</span> stop
</pre></div> <p>The first line is an assembler directive which indicates that the program is meant to go in the "text" section of the binary file. The next line uses the <b>.global</b> directive to export the <b>_start</b> symbol. This will ensure that this symbol is visible to the linker. Next the <b>_start</b> label is defined to indicate the start offset of the program.</p> <p>The first two instructions will load integer values 5 and 4 into registers <b>a2</b> and <b>a3</b> respectively. The <b>li</b> mnemonic stands for "Load Immediate". 
This is a pseudo-instruction since it does not actually correspond with a RISC-V opcode; for small immediates such as these, the assembler expands it to a RISC-V <b>addi</b> instruction:</p> <div class="org-src-container"> <pre class="src src-asm">
        <span style="color: #87cefa;">addi</span> <span style="color: #00ffff;">a2</span>,zero,5
        <span style="color: #87cefa;">addi</span> <span style="color: #00ffff;">a3</span>,zero,4
</pre></div> <p>The last line defines the label "stop" and a jump instruction that will return the program counter to the memory address at that label; thus the program will loop indefinitely.</p> <p>Save the program to a file called <code>add.s</code> then assemble it using the following command:</p> <pre class="example">
$ riscv64-unknown-elf-as -o add.o add.s
</pre><p>This will create the <b>add.o</b> object file representing the assembled program. Before it can be used, the program must be linked into a suitable executable file. We use the linker for this purpose. Moreover, since we are targeting a bare metal machine, we have to ensure that our program is loaded at an address where it can be found by the processor.</p> <p>We will be working with the <b>virt</b> machine emulated by QEMU. The reset vector for this machine is located at address 0x80000000. This means that the first instruction that will be executed when the processor is reset will be the one at memory location 0x80000000. Therefore we must ensure that the memory offset labelled "_start" in our program is loaded at the reset vector location:</p> <pre class="example">
$ riscv64-unknown-elf-ld -Ttext=0x80000000 -o add.elf add.o
</pre><p>The <b>-Ttext=</b> option forces the text section (defined using the <b>.text</b> assembler directive) to be loaded at the given memory address. We can verify that the instruction with the label _start is at the correct location using the <b>nm</b> command. 
The output should be similar to the following:</p> <pre class="example">
$ riscv64-unknown-elf-nm add.elf
0000000080001010 T __BSS_END__
0000000080001010 T __bss_start
0000000080001010 T __DATA_BEGIN__
0000000080001010 T _edata
0000000080001010 T _end
0000000080001810 A __global_pointer$
0000000080001010 T __SDATA_BEGIN__
0000000080000000 T _start
000000008000000c t stop
$
</pre><p>Notice that the <b>_start</b> label in the previous example is located at address 0000000080000000, which corresponds with the reset vector of the virt machine. We can now run the program in QEMU using the following command:</p> <pre class="example">
$ qemu-system-riscv64 -M virt -serial /dev/null -nographic -kernel add.elf
QEMU 3.1.0 monitor - type 'help' for more information
(qemu)
</pre><p>This executes the RISC-V 64-bit QEMU emulator with the following options:</p> <dl class="org-dl"><dt>-M virt</dt> <dd>This sets the machine type to 'virt', which models a RISC-V VirtIO board using version 1.10 of the RISC-V privileged ISA specification.</dd> <dt>-serial /dev/null</dt> <dd>Since there is no I/O in our program, the serial output is redirected to <code>/dev/null</code>.</dd> <dt>-nographic</dt> <dd>Again, since there is no I/O, we don't need a graphical UI.</dd> <dt>-kernel add.elf</dt> <dd>Load the kernel in add.elf.</dd> </dl><p>Although it doesn't seem that the program has done very much, remember that it is very basic and does not involve any kind of I/O. To ensure that it did what was expected, we have to inspect the state of the machine. The QEMU console allows us to do this. 
We can inspect the state of the registers using the <code>info registers</code> command:</p> <pre class="example">
(qemu) info registers
 pc       000000008000000c
 mhartid  0000000000000000
 mstatus  0000000000000000
 mip      0000000000000000
 mie      0000000000000000
 mideleg  0000000000000000
 medeleg  0000000000000000
 mtvec    0000000000000000
 mepc     0000000000000000
 mcause   0000000000000000
 zero 0000000000000000 ra   0000000000000000 sp   0000000000000000 gp   0000000000000000
 tp   0000000000000000 t0   0000000080000000 t1   0000000000000000 t2   0000000000000000
 s0   0000000000000000 s1   0000000000000000 a0   0000000000000009 a1   0000000000001020
 a2   0000000000000005 a3   0000000000000004 a4   0000000000000000 a5   0000000000000000
 a6   0000000000000000 a7   0000000000000000 s2   0000000000000000 s3   0000000000000000
 s4   0000000000000000 s5   0000000000000000 s6   0000000000000000 s7   0000000000000000
 s8   0000000000000000 s9   0000000000000000 s10  0000000000000000 s11  0000000000000000
 t3   0000000000000000 t4   0000000000000000 t5   0000000000000000 t6   0000000000000000
 ft0  0000000000000000 ft1  0000000000000000 ft2  0000000000000000 ft3  0000000000000000
 ft4  0000000000000000 ft5  0000000000000000 ft6  0000000000000000 ft7  0000000000000000
 fs0  0000000000000000 fs1  0000000000000000 fa0  0000000000000000 fa1  0000000000000000
 fa2  0000000000000000 fa3  0000000000000000 fa4  0000000000000000 fa5  0000000000000000
 fa6  0000000000000000 fa7  0000000000000000 fs2  0000000000000000 fs3  0000000000000000
 fs4  0000000000000000 fs5  0000000000000000 fs6  0000000000000000 fs7  0000000000000000
 fs8  0000000000000000 fs9  0000000000000000 fs10 0000000000000000 fs11 0000000000000000
 ft8  0000000000000000 ft9  0000000000000000 ft10 0000000000000000 ft11 0000000000000000
</pre><p>Remember that we had loaded the immediate value 5 in register <b>a2</b>, and the immediate value 4 in register <b>a3</b>. The values in the register dump reflect this. Moreover the program will sum the values in <b>a2</b> and <b>a3</b> and store the result in register <b>a0</b>. 
The register dump shows that the value in register <b>a0</b> is in fact 9, which is the sum of 4 and 5. Moreover, we see that the program counter in register <b>pc</b> is at memory address 0x8000000c, which corresponds with the label "stop" where the program loops indefinitely.</p> <h1>Conclusion</h1> <p>This tutorial went through the process of setting up a host environment for developing bare metal RISC-V programs, creating a simple program to add two numbers in RISC-V assembly, then running this program using QEMU. In future tutorials, the coding examples will get progressively more complex. However, the host environment will remain the same.</p> <h1>Footnotes</h1> <div id="text-footnotes"> <div class="footdef"><sup><a class="footnum" href="#fnr.1" id="fn.1">1</a></sup><div class="footpara"> <p class="footpara"><a href="http://www.bravegnu.org/gnu-eprog/index.html">http://www.bravegnu.org/gnu-eprog/index.html</a></p> </div> </div> <div class="footdef"><sup><a class="footnum" href="#fnr.2" id="fn.2">2</a></sup><div class="footpara"> <p class="footpara"><a href="https://wiki.osdev.org/Bare_Bones">https://wiki.osdev.org/Bare_Bones</a></p> </div> </div> <div class="footdef"><sup><a class="footnum" href="#fnr.3" id="fn.3">3</a></sup><div class="footpara"> <p class="footpara">One good reference is Stephen Marz's tutorial for building an OS in Rust: <a href="http://osblog.stephenmarz.com/index.html">http://osblog.stephenmarz.com/index.html</a></p> </div> </div> <div class="footdef"><sup><a class="footnum" href="#fnr.4" id="fn.4">4</a></sup><div class="footpara"> <p class="footpara">Future tutorials may focus on setting up other host platforms.</p> </div> </div> </div> </div> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." 
href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2019-11-02T01:27:24+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Fri, 11/01/2019 - 21:27</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/riscv" rel="dc:subject" hreflang="en">RISC-V</a></li> </ul> </div> Sat, 02 Nov 2019 01:27:24 +0000 MarcAdmin 27 at https://www.vociferousvoid.org/main Game, Set, Match https://www.vociferousvoid.org/main/node/23 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">Game, Set, Match</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p> One of the motivations in creating the Didactronic Framework was to learn new technology. Many ports of the framework have been started, using Python, Java, Clojure, and most recently Rust. Rust was an interesting option because of its promise of speed, safety and expressiveness. It seemed a good middle ground between imperative and functional programming. Since this is a completely new language and development paradigm for me (being primarily a C and Lisp hacker), the Rust framework will need to be refined over time to make use of the various constructs that are unique to that language. One such construct is the match form. </p> <div id="outline-container-sec-1" class="outline-2"> <h2 id="sec-1">Match</h2> <div class="outline-text-2" id="text-1"> <p> The match form in Rust is similar to a switch statement in C or the cond form in Lisp. Essentially it will evaluate the given expression, and find a matching clause defined in the body of the match statement. 
This is illustrated in the listing that follows: </p> <div class="org-src-container"> <pre class="src src-rust">match rand::random::&lt;u8&gt;() {
    1 =&gt; println!( "Strike one" ),
    2 =&gt; println!( "Strike two!" ),
    3 =&gt; println!( "You're out!!" ),
    _ =&gt; println!( "Wait! What?!" )
}
</pre> </div> <p> In this example an 8-bit unsigned value is randomly selected. The result of this function is then passed to the match expression. The match expression will match the result with one of the values on the left side. When a match is found, the statement that follows the '=&gt;' will be executed. The underscore is a placeholder which will match any value. Note that matches are attempted in the order in which the branches appear in the statement, therefore a random value of 3 will match the branch whose head is the value 3 before matching the underscore. </p> <p> This same code snippet can be implemented in C as follows: </p> <div class="org-src-container"> <pre class="src src-c">#include &lt;time.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int main( int argc, char* argv[] )
{
  srand( time(NULL) ) ;
  /* Reduce the value to the range 0-255 to mirror the u8 used in the Rust example. */
  switch ( rand() % 256 ) {
  case 1: printf( "Strike one" ) ; break ;
  case 2: printf( "Strike two" ) ; break ;
  case 3: printf( "Strike three" ) ; break ;
  default: printf( "Wait! What?!" ) ; break ;
  }
  return 0 ;
}
</pre> </div> <p> Or as a Lisp cond form: </p> <div class="org-src-container"> <pre class="src src-lisp">(let ((count (random 256)))
  (cond
    ((= count 1) (print "Strike one!"))
    ((= count 2) (print "Strike two!"))
    ((= count 3) (print "Strike three!"))
    (t (print "Wait! What?!"))))
</pre> </div> <p> The structure of each of these examples is fairly similar. However, where the Rust match statement really shines is in binding with sub-expressions. 
</p> </div> </div> <div id="outline-container-sec-2" class="outline-2"> <h2 id="sec-2">Sub-Expression Matching</h2> <div class="outline-text-2" id="text-2"> <p> In experimenting with the Didactronic framework, I have created an example tic-tac-toe program to serve as a reference, as well as to test out the framework's design. Each player in the game is associated with a marker which is defined as an enumeration: </p> <div class="org-src-container"> <pre class="src src-rust">pub enum Marker {
    X,
    O
}
</pre> </div> <p> A configuration of the game board will represent the state of the game. The Configuration structure, which incidentally implements the State trait from the framework, is defined as follows: </p> <div class="org-src-container"> <pre class="src src-rust">use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

pub struct Grid {
    states: RefCell&lt;HashMap&lt;u32,Rc&lt;Configuration&gt;&gt;&gt;,
}

pub struct Configuration {
    id: u32,
    last: Option&lt;Marker&gt;,
    value: f32,
    grid: Rc&lt;Grid&gt;,
}
</pre> </div> <p> Each configuration has an ID, which uniquely identifies it within the Grid environment, and a value. The last field indicates the marker associated with the player who made the move that led to the current Configuration. This will be one of: Some(Marker::X), Some(Marker::O), or None. The match expression can be used to determine the marker of the next player to play as follows: </p> <div class="org-src-container"> <pre class="src src-rust">match configuration.last {
    Some(Marker::X) =&gt; Marker::O,
    _ =&gt; Marker::X,
}
</pre> </div> <p> In this expression, Rust will attempt to match the configuration's last field against the pattern Some(Marker::X) and return Marker::O, otherwise it will match the underscore and return Marker::X. </p> <p> This is very similar to the use of match from the previous section. A more interesting use could be in retrieving Configurations from the Grid's state set. 
When retrieving a Configuration via the HashMap::get() function, either some state will be obtained, or None if no such state exists in the set. When a state is found, we want to clone its counted reference in order for the state set to retain ownership of the original state: </p> <div class="org-src-container"> <pre class="src src-rust">let state = match self.grid.states.borrow().get( &amp;id ) {
    Some(s) =&gt; Rc::clone(s),
    None =&gt; Rc::new(Configuration{ id, last, value: 0.0, grid: Rc::clone(&amp;self.grid) })
};
</pre> </div> <p> In this example, Rust will attempt to bind the result of the get() operation with Some(s) where s is the unwrapped content of the Option container. By definition of the Grid structure, this will be a reference to a standard Rc&lt;Configuration&gt; which can be cloned. However, if the state is not found in the Grid, a new state will be created. This illustrates the sub-expression binding that is possible using Rust. </p> <div id="outline-container-sec-3" class="outline-2"> <h2 id="sec-3">Conclusion</h2> <div class="outline-text-2" id="text-3"> <p> The match expression in Rust is a very powerful and expressive form. In many ways, it reminds me of my undergraduate days writing in Prolog; the bind logic seems to be very similar. Its ability to do sub-expression matching makes it more powerful than the equivalent C (switch) or Lisp (cond) forms. That being said, I have not done any kind of performance analysis to determine how efficient this expression is. For the time being, I am satisfied with getting everything to work. There will be time to make it work faster once that particular summit has been reached. If and when such an analysis is performed, the results will assuredly appear in this blog. Until then, the hacking continues: Game (Tic-Tac-Toe), set (Grid.states), and match (in case the inspiration for the title was not clear). 
</p> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2018-08-21T12:58:39+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Tue, 08/21/2018 - 08:58</span> Tue, 21 Aug 2018 12:58:39 +0000 MarcAdmin 23 at https://www.vociferousvoid.org/main https://www.vociferousvoid.org/main/node/23#comments Design of the Didactronic Toolkit. https://www.vociferousvoid.org/main/node/22 <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">Design of the Didactronic Toolkit.</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item">The Didactronic Toolkit came about as a way to investigate ideas about formulating reinforcement learning problems using group theory. Its name, didactronic, is a contraction of the "didact" part of the word didactic, appended with the suffix "-tronic". <dl class="org-dl"> <dt> <a href="https://www.merriam-webster.com/dictionary/didactic">didactic</a> </dt><dd>a) designed or intended to teach, b) intended to convey instruction and information as well as pleasure and entertainment (e.g. didactic poetry). </dd> <dt> <a href="http://wordinfo.info/unit/2190/page:13">-tronic</a> </dt><dd>Greek: a suffix referring to a device, tool, or instrument; more generally, used in the names of any kind of chamber or apparatus used in experiments. </dd> </dl> Therefore the term Didactronic signifies an instrument or apparatus intended to teach or convey instruction to an agent through experiments; this is the essence of reinforcement learning. 
The Didactronic Toolkit is meant to provide the basic tools to build such an instrument for an arbitrary task. However, since the toolkit is meant to be independent of domain, it must be both useful enough to simplify the task while being generic enough not to constrain it. The goal of this article is to distill reinforcement learning into its most basic elements to provide insight into the design philosophy behind the toolkit. The secondary objective of this work is to provide a vehicle for learning the Rust language. To that end, the Didactronic Toolkit will be re-implemented as a crate in Rust. The main elements of reinforcement learning can be grouped into three modules: <dl class="org-dl"> <dt> Environment </dt><dd>The environment lays out the rules of the universe in which an agent exists. The environment can be composed of a set of <b>States</b> which describe the domain in which agents are allowed to operate. The environment will also define the rules which prescribe how any <b>State</b> can be reached from any other <b>State</b>.</dd> <dt> Agent </dt><dd>An agent exists in a given environment and is endowed with certain capabilities which enable it to operate therein. An agent's capabilities can be expressed as a set of <b>Actions</b>. An agent may follow some <b>Policy</b> in selecting an <b>Action</b> to take given its current state, making sure to follow the rules of its environment.</dd> <dt> Task </dt><dd>A task describes a goal that an agent is trying to achieve. This can be described by a set of goal <b>States</b> that an agent may want to reach. A <b>Task</b> may also define some anti-goal <b>States</b> which describe situations that an agent may wish to avoid; otherwise the task will be considered failed and incomplete. 
An agent will attempt to learn a <b>Strategy</b> to reach one or more of the goal <b>States</b> defined for a task while avoiding any anti-goal <b>States</b>.</dd> </dl> The terminology described in this article largely matches that used by <a href="https://books.google.ca/books?id=CAFR6IBF4xYC">Sutton and Barto</a> in describing the reinforcement learning problem with one notable addition: Strategy. This concept allows the decoupling of an action selection policy from a strategy employed to accomplish a particular task. In the sections that follow, the components of the Didactronic Toolkit will be described in more detail and their interfaces defined using the Rust language. The framework is composed mostly of traits which must be implemented for each entity of a specific domain. This allows the actual reinforcement learning algorithms to be implemented without consideration for the environments in which they are applied. <h2 id="sec-1">Strategy</h2> In the Didactronic Framework, the <b>Strategy</b> represents the thing that an agent is trying to learn. The reinforcement learning formalism proposed by Sutton and Barto prescribes updating a policy followed by an agent for selecting actions in a given state. This can lead to confusion because a <b>Policy</b> should be independent of the task at hand. For example, a greedy policy will always select the action which leads to the state with the greatest reward, regardless of the task. If a policy is updated for a given task, then the greedy policy is only greedy for that particular objective. To address this potential confusion, I propose the concept of the strategy. The advantage of this is two-fold: <ol class="org-ol"> <li>The policy remains agnostic to the task for which it is applied. In other words, a greedy policy will always be a greedy policy irrespective of the task. The next action will depend on both the strategy and the policy being followed. 
</li> <li>A learned strategy can be followed using a variety of different policies. This provides opportunities to verify if a learned strategy is effective for various different policy types. </li> </ol> A <b>Strategy</b> is defined as a trait which exposes a function to determine the best action(s) to take given a state. The following listing illustrates the definition of the <b>Strategy</b> trait. <div class="org-src-container"> <pre class="src src-rust">pub trait Strategy&lt;S: State, A: Action&gt; {
    fn next( &amp;self, state: S ) -&gt; Vec&lt;A&gt; ;
    fn get_value( &amp;self, state: S ) -&gt; S::Value ;
}
</pre> </div> The <i>next()</i> function will return a vector of one or more actions ordered by preference for a given state; the most preferred action being first. An agent will select one action from this vector according to some policy. For example, given an epsilon-greedy policy, the preferred action will be selected with a probability $1-\varepsilon$, otherwise an action will be randomly selected from the vector using a uniform distribution. The <b>Strategy</b> trait also defines the <i>get_value()</i> function which will determine the value of a given state. This allows the Task to be decoupled from the Environment; the same state may have different values depending on the task. For example, in a game of tic-tac-toe, two agents will have competing goals, therefore a winning state for one player will be a losing state for its opponent. Naturally this state's value will be different for each agent. <h2 id="sec-2">Environment</h2> The Environment trait defines the functions to describe all of its valid states and the possible transitions between them. The following listing illustrates the Rust definition of the Environment trait. 
<div class="org-src-container"> <pre class="src src-rust">pub trait Environment&lt;S: State, A: Action&gt; {
    fn contains( &amp;self, state: S ) -&gt; bool ;
    fn initial( &amp;self ) -&gt; &amp;S ;
    fn current( &amp;self ) -&gt; &amp;S ;
    fn is_valid( &amp;self, state: &amp;S, action: &amp;A ) -&gt; bool ;
    fn get_probability( &amp;self, current: &amp;S, action: &amp;A, next: &amp;S ) -&gt; f32 { 1.0 }
}
</pre> </div> The <i>contains()</i> function is used to assert whether or not the given state is part of the current environment. This design pattern is more general than mandating that an environment expose a set of all known states. For very large domains, it may not be possible to fully express this set. It is therefore preferable to simply assert whether or not a state belongs to the environment. This should be a sufficient condition to define the environment's bounds. Additionally, the environment trait defines functions to retrieve the current state, the initial state, and to determine whether or not an action is valid for a given state. These capabilities can be used by agents to help them determine which actions to take. The trait also allows for stochastic state transitions by exposing a function, <i>get_probability()</i>, which will evaluate the probability that taking an action in a given state will lead to the specified <b>next</b> state. In combination with the <i>apply()</i> function (not shown in the listing above), which returns all possible outcomes of taking a given action in the current state, it is possible to estimate the value of an action. Finally, the <i>execute()</i> function (also not shown) will apply the given action to update the current environment's state. <h2 id="sec-3">Task</h2> The Task trait describes an agent's objective by specifying its goals and the rewards received for its actions. 
<div class="org-src-container"> <pre class="src src-rust">pub trait Task&lt;S: State, A: Action&gt; {
    fn get_environment( &amp;self ) -&gt; &amp;dyn Environment&lt;S,A&gt; ;
    fn is_terminal( &amp;self, state: &amp;S ) -&gt; bool ;
    fn is_goal( &amp;self, state: &amp;S ) -&gt; bool ;
    fn get_reward( &amp;self, initial: &amp;S, action: &amp;A, next: &amp;S ) -&gt; f32 ;
}
</pre> </div> During the learning process, a task will be tied to a particular environment. The task's environment can be retrieved using the <i>get_environment()</i> function. Additionally, the task provides functions to assert whether or not a given state is terminal (<i>is_terminal()</i>), and whether it represents a goal (<i>is_goal()</i>). It is possible for a state to be terminal without being a goal. For example, a losing state in a game would be terminal without being a goal. <h2 id="sec-4">Agent</h2> The Agent trait represents the entity learning a <b>Strategy</b> to solve a given <b>Task</b>. The Rust definition of this trait is illustrated in the following listing: <div class="org-src-container"> <pre class="src src-rust">pub trait Agent&lt;S: State, A: Action&gt; {
    fn get_capabilities( &amp;self ) -&gt; Vec&lt;&amp;A&gt; ;
    fn next_action( &amp;self, state: S, strategy: &amp;dyn Strategy&lt;S,A&gt;, policy: &amp;dyn Policy&lt;A&gt; ) -&gt; A ;
}
</pre> </div> The Agent trait is very simple. It defines a function to retrieve the agent's capabilities as well as the action it will take in some state while following a given strategy and policy. Note that the Agent may also express an agency. This is necessary when trying to define a multi-agent learning system. The agent in this case will represent two or more cooperating agents. The strategy will have to be implemented accordingly. The reinforcement learning elements described in this article will serve as the basis of what will hopefully become a useful tool for researching reinforcement learning. 
However, what is described herein is by no means final or complete. This will be an on-going project which will produce a useful crate to allow fellow Rustaceans to create applications therewith. </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2018-07-06T02:38:09+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Thu, 07/05/2018 - 22:38</span> Fri, 06 Jul 2018 02:38:09 +0000 MarcAdmin 22 at https://www.vociferousvoid.org/main https://www.vociferousvoid.org/main/node/22#comments Play Tic-tac-toe with Arthur Cayley! Part Two: Expansion https://www.vociferousvoid.org/main/play-tic-tac-toe-with-arthur-cayley-part2-expansion <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">Play Tic-tac-toe with Arthur Cayley! Part Two: Expansion</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p>In <a href="/mmlab/play-tic-tac-toe-with-arthur-cayley" rel="nofollow">part 1</a> of this series, the Tic-tac-toe reinforcement learning task was expressed as a Combinatorial Group with the hypothesis that the expansion of the group into a Cayley Graph could be used to learn its associated game tree. In this instalment, the expansion of the group into a Cayley Graph will be examined in a bit more detail. Initially, the Tic-tac-toe group will be set aside in favour of a simpler domain which will offer a more compact and pedagogical representation. 
However, the expansion of the Tic-tac-toe group should follow the same process; this article will circle back to the Tic-tac-toe domain to highlight the equivalences which should ensure that this is so.</p> <p>Although Tic-tac-toe is a relatively simple problem, its state space makes it intractable for a "back of the napkin" illustration. Therefore, the random walk task proposed by Sutton and Barto (<a href="bibliography#Sutton-Barto:1998" rel="nofollow">Sutton and Barto, 1998</a>) will be used to discuss the formal expansion into a Cayley Graph. The random walk example consists of a small Markov process with five non-terminal states: $A$, $B$, $C$, $D$, and $E$. In each of the five non-terminal states, two actions with equal probability are possible: move left ($l$), and move right ($r$). An automaton describing the random walk domain is illustrated in <a href="#Figure1:RandomWalk" rel="nofollow">Figure 1</a>.</p> <p><a rel="nofollow"></a></p> <p><strong>Figure 1</strong>: Diagram of a Markov process for generating random walks on five states plus two terminal states.</p> <p>Let $\langle R|\cdot\rangle$ represent the random walk group; it can be expressed as a combinatorial group with a generator set $R_G = \{l, r\}$ and associated constraint relations $R_D$. The $l$ and $r$ generators are inverses, therefore the group will have the following constraint: $R_D = \{ l \cdot r = e \}$, where $e$ is the identity element. In light of this constraint, the group expression can be simplified; let $a=r$, and thus $a^{-1} = l$, $R$ can now be expressed as the free group $\langle a | \rangle$. This expresses the composition of all the terms that comprise the group $R$ (e.g.: $aa^{-1}aa^{-1}a = a$, $a^{-1}a^{-1}a^{-1} = a^{-3}$, $aaa = a^3$...). Given $C$ is the initial state of the random walk, then the following equivalences hold for this group: $C=e$, $D = C \cdot a$, and $A = C \cdot a^{-2}$.</p> <p>Because the random walk problem has a terminal state (i.e. 
the task is episodic), two additional constraints are required for a proper group representation to ensure that the random walk does not continue indefinitely:<br /> $$a^{3(-1)^n}\cdot i = a^{3(-1)^n}, \forall i \in R_G \land \forall n \in \mathbb{Z}^+$$<br /> and<br /> $$a^{3} = a^{-3} = F$$<br /> It should be pointed out that although there are an infinite number of random walks that can be taken starting from $C$ to reach the terminal states, the group $R$ is nonetheless a finite group when terms are reduced to their simplest form (i.e. occurrences of an element of the generator set followed by its inverse are elided from the term). The complete set of terms in the random walk group is:</p> <p>$$<br /> R = \{ e, a, a^{-1}, a^2, a^{-2}, a^{3}, a^{-3} \} = \{ C, D, B, E, A, F \}<br /> $$</p> <p>The Cayley Graph $\Gamma(R,R_G)$ of the group $R$, illustrated in <a href="#Figure2:RandomWalk-CayleyGraph" rel="nofollow">Figure 2</a>, is constructed as follows:</p> <ol><li>Construct the vertex set: $V(\Gamma) = \{ s ~|~ s \in R \}$</li> <li>Construct the edge set and partition it into two subsets with colour labels:<br /> $E(\Gamma) = E_\text{red}(\Gamma) \cup E_\text{blue}(\Gamma) = \{ (s_i, s_j) ~|~ s_i \cdot a = s_j \} \cup \{ (s_i, s_j) ~|~ s_i \cdot a^{-1} = s_j \}$</li> </ol><p><a rel="nofollow"></a></p> <p><strong>Figure 2</strong>: Cayley Graph of the Random Walk group $\langle R | \cdot \rangle$</p> <p>Note that the set $R$ is the set of all states in the task including the terminal state. In the environment-agent model of reinforcement learning, this is expressed as $S^+$. Additionally, the edge set of the Cayley Graph $E(\Gamma)$ is equivalent to the set of actions $\mathscr{A}(\pi)$ available to a given policy. This graph can therefore serve as the basis of a model for estimating a state-value function which can be improved using a Dynamic Programming implementation of Generalized Policy Iteration. 
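</p> <p>The construction of $\Gamma(R,R_G)$ can be sketched in Rust. This is only an illustration under stated assumptions: each reduced term $a^k$ is encoded by its integer exponent $k$, both $a^{3}$ and $a^{-3}$ collapse into a single vertex $F$, and $F$ absorbs every generator as the termination constraint requires. The names <i>apply</i>, <i>label</i> and <i>cayley_edges</i> are hypothetical and do not come from any existing crate.</p>

```rust
/// Terminal vertex F; the relation a^3 = a^-3 = F collapses both ends into it.
const F: i32 = 3;

/// Apply a generator to a reduced term, encoded as the exponent of `a`.
/// `dir` is +1 for `a` (move right) and -1 for its inverse (move left).
fn apply(exp: i32, dir: i32) -> i32 {
    if exp == F {
        return F; // the terminal vertex absorbs every generator
    }
    let next = exp + dir;
    if next.abs() >= 3 { F } else { next }
}

/// Map an exponent back to the state names used in the article.
fn label(exp: i32) -> char {
    match exp {
        -2 => 'A',
        -1 => 'B',
        0 => 'C',
        1 => 'D',
        2 => 'E',
        _ => 'F',
    }
}

/// One coloured edge subset: every vertex paired with its image under `dir`.
fn cayley_edges(dir: i32) -> Vec<(i32, i32)> {
    let vertices = [-2, -1, 0, 1, 2, F];
    vertices.iter().map(|&s| (s, apply(s, dir))).collect()
}

fn main() {
    // Red edges follow `a`, blue edges follow its inverse.
    for (colour, dir) in [("red", 1), ("blue", -1)] {
        for (s, t) in cayley_edges(dir) {
            println!("{}: {} -> {}", colour, label(s), label(t));
        }
    }
}
```

<p>Each colour contributes one outgoing edge per vertex, so the sketch lists six red and six blue edges, including the self-loops on $F$.</p> <p>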
However, some additional information must first be attached to the graph. Let $\mathscr{R}(s,s',a)$ be the function which defines the expected reward for taking action $a$ in state $s$ leading to state $s'$:<br /> $$<br /> \mathscr{R}(s, s', a) = \left\{<br /> \begin{array}{lr}<br /> 0 &amp; : s' \neq F \lor a \in E_\text{blue}(\Gamma) \\<br /> 1 &amp; : s' = F \land a \in E_\text{red}(\Gamma)<br /> \end{array}<br /> \right .<br /> $$<br /> This will associate a zero weight to all the edges in $\Gamma(R,R_G)$ with the exception of the red edge connecting $E$ to $F$. Additionally, initial value estimations must be assigned to each of the vertices in the graph. All values will initially be set to zero. Given an $\epsilon$-greedy policy, $\pi$, the policy evaluation algorithm described in <a href="#alg:PolicyEvaluation" rel="nofollow">Figure 3</a> will be used to get an initial approximation of the value function $V^{\pi}(R)$. The value $\mathscr{P}_{ss'}^{a}$ represents the probability that taking action $a$ in state $s$ will yield state $s'$. For the random walk problem, this is a certainty (the probability is $1.0$). Therefore the actual value estimation update is calculated as follows:<br /> $$<br /> V^{\pi}(s) \leftarrow \sum_{s'} \left[ \mathscr{R}(s, s', \pi(s)) + \gamma V^{\pi}(s') \right]<br /> $$<br /> where $\pi(s)$ will choose either $a$ or $a^{-1}$ with equal probability. 
Initially, the value estimation will remain zero with the possible exception of $V(E)$, which will have a value of 1 if the policy chooses action $a$ in this pass (a probability of 0.5).</p> <p><a rel="nofollow"></a></p> <ul><li>Repeat <ul><li>$\Delta \leftarrow 0$</li> <li>For each $s \in R$: <ul><li>$t \leftarrow V^{\pi}(s)$</li> <li>$V^{\pi}(s) \leftarrow \sum_{s'}{\mathscr{P}_{ss'}^{\pi(s)}[ \mathscr{R}(s,s',\pi(s)) + \gamma V^{\pi}(s') ]}$</li> <li>$\Delta \leftarrow \text{max}(\Delta, |t - V^{\pi}(s)|)$</li> </ul></li> </ul><p> until $\Delta$ &lt; $\theta$ (a small positive number) </p></li> </ul><p><strong>Figure 3</strong>: The Policy Evaluation algorithm</p> <p>With the updated value estimation, the policy improvement algorithm, described in <a href="#alg:PolicyImprovement" rel="nofollow">Figure 4</a>, will update the policy in relation to the new value estimation. As in the previous step, $\mathscr{P}_{ss'}^{a}$ will always be 1.0, therefore the policy update step will be:<br /> $$<br /> \pi(s) \leftarrow \text{arg}~\text{max}_a \sum_{s'}{\left[ \mathscr{R}(s, s', a) + \gamma V^{\pi}(s') \right]}<br /> $$<br /> Following the first policy improvement, the policy will randomly choose either $a$ or $a^{-1}$ in all states with a probability of 0.5. The exception is in state $E$ where the policy will choose $a$ with a probability of $1-\epsilon$ (since an $\epsilon$-greedy policy will select an action randomly with a probability of $\epsilon$). From here, it should be fairly easy to verify, by hand calculating the value-estimation and policy, that this converges toward an optimal policy following a large number of iterations of policy evaluation and improvement. The final value-estimation will assign the values $\frac{1}{6}, \frac{2}{6}, \frac{3}{6}, \frac{4}{6}$ and $\frac{5}{6}$ to states $A, B, C, D$, and $E$ respectively. 
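</p> <p>These values can be checked numerically. The following sketch runs the policy evaluation loop of Figure 3 on the random walk under the policy that picks $a$ or $a^{-1}$ with equal probability. It assumes $\gamma = 1$ and a threshold $\theta = 10^{-12}$, neither of which is stated in the text, and it represents the terminal vertex $F$ by two array slots (one per end of the walk) whose value stays at zero. The function name <i>evaluate</i> is hypothetical.</p>

```rust
/// Iterative policy evaluation for the five-state random walk under the
/// equiprobable policy. Slots 0 and 6 stand for the terminal vertex F
/// (reached from the left and from the right); slots 1..=5 are A..E.
fn evaluate(gamma: f64, theta: f64) -> [f64; 7] {
    let mut v = [0.0_f64; 7];
    loop {
        let mut delta: f64 = 0.0;
        for s in 1..=5 {
            let old = v[s];
            // The reward is 1 only on the red edge from E into F.
            let r_right = if s == 5 { 1.0 } else { 0.0 };
            let left = 0.0 + gamma * v[s - 1];
            let right = r_right + gamma * v[s + 1];
            // Both actions are taken with probability 0.5.
            v[s] = 0.5 * left + 0.5 * right;
            delta = delta.max((old - v[s]).abs());
        }
        if delta < theta {
            return v;
        }
    }
}

fn main() {
    let v = evaluate(1.0, 1e-12);
    for (name, value) in ["A", "B", "C", "D", "E"].iter().zip(&v[1..6]) {
        println!("V({}) = {:.4}", name, value);
    }
}
```

<p>Under these assumptions the estimates for $A$ through $E$ settle on $\frac{1}{6}$ through $\frac{5}{6}$, matching the fractions quoted above.</p> <p>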
Therefore an $\epsilon$-greedy policy will almost always elect to walk toward $E$ to reach the final destination, which yields a higher reward.</p> <p><a href="PolicyImprovement" rel="nofollow"></a></p> <ul><li>$\mathit{stable} \leftarrow \text{true}$</li> <li>For each $s \in R$: <ul><li>$b \leftarrow \pi(s)$</li> <li>$\pi(s) \leftarrow \text{arg}~\text{max}_a \sum_{s'}{\mathscr{P}_{ss'}^{a}[ \mathscr{R}(s,s',a) + \gamma V^{\pi}(s')]}, a \in R_G$</li> <li>If $b \neq \pi(s)$, then $\mathit{stable} \leftarrow \text{false}$</li> </ul></li> <li>If $\mathit{stable}$, then stop; else do PolicyEvaluation</li> </ul><p><strong>Figure 4</strong>: The Policy Improvement algorithm</p> <p>This example illustrates how defining a reinforcement learning task as a combinatorial group yields a suitable model for learning an optimal policy using Dynamic Programming and Generalized Policy Iteration. The same procedure should yield similar results for the Tic-tac-toe domain, although with much greater complexity (it won't be feasible to calculate this by hand). There are a few caveats: 1) there will be multiple possible initial states (depending on whether or not the agent plays first) as opposed to the single initial state in the random walk task described in this article, and 2) the probability value $\mathscr{P}_{ss'}^{a}$ will no longer always be 1 because the resulting game tree must account for the various possible moves by the opponent. Aside from this, the procedure to define the task should remain the same. Additionally, it should be possible to extend this to even more complex domains if the requirement of constructing the Cayley Graph is relaxed. A more abstract group representation could be used with Monte Carlo methods or Temporal Difference learning which do not require a well-defined model of the environment. 
These ideas will be explored in future articles.</p> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2016-02-18T04:39:14+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Wed, 02/17/2016 - 23:39</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/taxonomy/term/3" rel="dc:subject" hreflang="en">Cayley Graph</a></li> <li><a href="/main/taxonomy/term/11" rel="dc:subject" hreflang="en">Value Estimation</a></li> <li><a href="/main/taxonomy/term/4" rel="dc:subject" hreflang="en">Combinatorial Group</a></li> <li><a href="/main/taxonomy/term/10" rel="dc:subject" hreflang="en">Policy Iteration</a></li> </ul> </div> Thu, 18 Feb 2016 04:39:14 +0000 MarcAdmin 5 at https://www.vociferousvoid.org/main https://www.vociferousvoid.org/main/play-tic-tac-toe-with-arthur-cayley-part2-expansion#comments Play Tic-tac-toe with Arthur Cayley! https://www.vociferousvoid.org/main/play-tic-tac-toe-with-arthur-cayley <span property="dc:title" class="field field--name-title field--type-string field--label-hidden">Play Tic-tac-toe with Arthur Cayley!</span> <div property="content:encoded" class="clearfix text-formatted field field--name-body field--type-text-with-summary field--label-hidden field__item"><p><a href="https://en.wikipedia.org/wiki/Tic-tac-toe" rel="nofollow">Tic-tac-toe</a>, (or <em>noughts and crosses</em> or <em>Xs and Os</em>), is a turn-based game for two players who alternately tag the spaces of a $3 \times 3$ grid with their respective marker: an X or an O. 
The object of the game is to place three markers in a row, either horizontally, vertically, or diagonally. Given only the mechanics of Tic-tac-toe, the game can be expressed as a <a href="https://en.wikipedia.org/wiki/Combinatorial_group_theory" rel="nofollow">Combinatorial Group</a> by defining a set $A$ of generators $\{a_i\}$ which describe the actions that can be taken by either player. The <a href="https://en.wikipedia.org/wiki/Cayley_graph" rel="nofollow">Cayley Graph</a> of this group can be constructed which will express all the possible ways the game can be played. Using the Cayley Graph as a model, it should be possible to learn the Tic-tac-toe game tree using dynamic programming techniques (hint: the game tree is a sub-graph of the Cayley Graph).</p> <p>Before going any further, it is important to understand the structure of the Tic-tac-toe group. Tic-tac-toe is expressed as a finite combinatorial group on the set, $S$, of $4^9$ possible board positions: each of the 9 grid locations can be empty or contain an X, an O, or the superposition of X and O, $\ast$. The generator set, $A$, is a proper subset of $S$ with a cardinality of 10: the tagging of each of the 9 grid locations with a marker, and the empty grid (not playing is also a valid play). The identity element of the group is the empty grid, $\varnothing$, which is also the initial configuration in the game. The group law is the bijective group operation which combines an initial state with an action to produce the final state, and is expressed as follows:</p> <p>$$ p: S \times S \mapsto S $$</p> <p>with</p> <p>$$ p(S,S) = \{ s, s' \in S : s \cdot s' = s_{ij} \cdot s'_{ij} \} $$</p> <p>In other words, the application of the group law will evaluate the dot-product of each grid cell location. 
The dot-product of grid cells is defined as follows:<br /> $$ s_{ij} \cdot s'_{ij} = \left\{<br /> \begin{array}{lr}<br /> s_{ij} &amp; \quad s_{ij} \neq \varnothing \land s'_{ij} = \varnothing \\<br /> s'_{ij} &amp; \quad s_{ij} = \varnothing \land s'_{ij} \neq \varnothing \\<br /> \ast &amp; \quad s_{ij} = \overline{s'_{ij}} \\<br /> \varnothing &amp; \quad s_{ij} = s'_{ij} \\<br /> \overline{s_{ij}} &amp; \quad s'_{ij} = \ast \land s_{ij} \neq \varnothing<br /> \end{array}<br /> \right .<br /> $$<br /> The product of a marker with an empty cell tags the cell with the marker, while the product of two different markers tags the cell with the superposition of both ($\ast$). The product of two identical markers will tag the cell as empty; every position is therefore its own inverse, which means that applying the law to a position with itself will result in the identity element.</p> <p>The group $E$ is expressed as $\langle A|p \rangle$, and its full state space is specified by repeated applications of the generators. The fact that $E$ is a group can be asserted by verifying that it satisfies the group axioms:</p> <ul><li>Totality: The set is closed under the operation $p$; combining any two positions in $S$ yields another position in $S$.</li> <li>Associativity: For any positions $s, s', s'' \in S$, $(s \cdot s') \cdot s'' = s \cdot (s' \cdot s'')$.</li> <li>Identity: There exists an identity element.</li> <li>Divisibility: For each element in the group, there exists an inverse which yields the identity element when the group law is applied thereto.</li> </ul><p>The proof that the group satisfies these axioms should be pretty evident. A formal proof of this fact is left as a future exercise.</p> <dl><dt>NOTE:</dt> <dd>The state space can be further constrained by defining a more intelligent group law. The state set $S$ could be partitioned into two sub-sets: $S = X \cup O$; where $X$ is the set of positions which allow X to play, and $O$ is the set of positions which allow O to play (note that the intersection of $X$ and $O$ is not empty). 
This would simplify the Cayley Graph and thus reduce the time required to learn the game tree. However, this would greatly increase the complexity of the group law, making it more prone to error.</dd> </dl><p>The abstract structure of the Tic-tac-toe group can be encoded with a Cayley graph, $\Gamma$, where each of the vertices represents a position, and the edges represent the possible transitions resulting from an agent making a move.</p> <p>The Cayley graph of the Tic-tac-toe group is isomorphic to the backup diagram of the approximate value function, $V^\pi(s)$. By extending the graph -- associating values for each of the vertices (states), and weights for the edges -- it can be used as an initial approximation of the value function. Dynamic programming algorithms will iteratively update the values and weights to obtain a better approximation of the optimal value function. By removing the edges that tend toward a zero probability of being followed, the resulting graph should be isomorphic to the game tree.</p> <p>Initially, the value of each state will be set to zero with the exception of winning states which will have high values, and losing states which have low values. Given the sets $W$ and $L$ which contain all the winning and losing positions respectively (note: $W \cap L = \emptyset$), the initial values could be assigned as follows:</p> <p>$$\forall s \in S \quad : \quad V^\pi(s) = \left\{<br /> \begin{array}{lr}<br /> \gg 0 &amp; \quad s \in W \\<br /> \ll 0 &amp; \quad s \in L \\<br /> 0 &amp; \quad s \notin W \cup L<br /> \end{array}<br /> \right .<br /> $$</p> <p>The Tic-tac-toe group allows for positions that are not valid in a regular game (i.e. the states with superpositions). These moves should be suppressed in the process of iteratively improving the approximation of the value function. 
To do this, the transitions leading to invalid positions could be assigned a very small weight, ensuring that the probability of following the edge tends toward zero. The same could be done to prevent actions which place a marker in a previously occupied grid cell:</p> <p>$$<br /> P( s \cdot a = s') = \left\{<br /> \begin{array}{lr}<br /> 0 &amp; \quad \exists i,j \in \mathbb{Z}/3 : \quad s_{ij} \neq \varnothing \land a_{ij} \neq \varnothing \\<br /> &gt;0 &amp; \quad \forall i,j \in \mathbb{Z}/3 : \quad s_{ij} = \varnothing \lor a_{ij} = \varnothing<br /> \end{array}<br /> \right .<br /> $$</p> <p>This will ensure that an agent using the Cayley graph as a value function approximation will generally not take actions leading to invalid states (which would be seen as a newbie error or an attempt at cheating by an opponent).</p> <p>The simplicity of the Tic-tac-toe problem makes it a good pedagogical tool to learn about reinforcement learning. It is straightforward to write a computer program to play Tic-tac-toe perfectly, to enumerate the 765 essentially different positions (the state space complexity), or the 26,830 possible games up to rotations and reflections (the game tree complexity) on this space.<a href="https://en.wikipedia.org/wiki/Tic-tac-toe" rel="nofollow">[1]</a> However, by designing a program which learns how to play rather than manually building the game tree, the relatively small state space makes it easier to validate the techniques and algorithms used. Additionally, the theoretical foundations should also be applicable to more complex problems with state spaces that are too large to hand build the associated game tree.</p> <p>In this article, the Tic-tac-toe problem was expressed in group theoretic terms. There is an entire body of work on group theory which may provide valuable tools for reasoning about dynamic programming algorithms used to learn approximations of the solutions to modelled problems. 
In future articles, the ideas developed herein will be tested by implementing them using the Didactronic toolkit. The goals of this endeavour are two-fold: 1) to validate the hypothesis that group theory provides a useful formalism for expressing reinforcement learning systems, and 2) to drive the development of the Didactronic Toolkit to make it more useful as a generalized machine learning framework.</p> </div> <span rel="sioc:has_creator" class="field field--name-uid field--type-entity-reference field--label-hidden"><a title="View user profile." href="/main/user/1" lang="" about="/main/user/1" typeof="schema:Person" property="schema:name" datatype="" class="username">MarcAdmin</a></span> <span property="dc:date dc:created" content="2016-02-06T03:51:07+00:00" datatype="xsd:dateTime" class="field field--name-created field--type-created field--label-hidden">Fri, 02/05/2016 - 22:51</span> <div class="field field--name-field-tags field--type-entity-reference field--label-above clearfix"> <h3 class="field__label">Tags</h3> <ul class="links field__items"> <li><a href="/main/taxonomy/term/3" rel="dc:subject" hreflang="en">Cayley Graph</a></li> <li><a href="/main/taxonomy/term/4" rel="dc:subject" hreflang="en">Combinatorial Group</a></li> <li><a href="/main/taxonomy/term/5" rel="dc:subject" hreflang="en">Tic-Tac-Toe</a></li> <li><a href="/main/taxonomy/term/9" rel="dc:subject" hreflang="en">Dynamic Programming</a></li> </ul> </div> Sat, 06 Feb 2016 03:51:07 +0000 MarcAdmin 2 at https://www.vociferousvoid.org/main https://www.vociferousvoid.org/main/play-tic-tac-toe-with-arthur-cayley#comments