# DSA Implementation vs Documentation Discrepancies ## Critical Discrepancies ### 1. **Stack Growth Direction** ❌ CRITICAL **Documentation states:** Stack grows upward (toward higher addresses) **Implementation shows (expand.rs:44-51):** ```rust fn expand_push(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { // ... nodes.extend(vec![ node!(label, Opcode::SubI, spr, 4, spr), // spr = spr - 4 node!(None, Opcode::Stw, reg, spr, 0), ]); ``` **Implementation shows (expand.rs:130-137):** ```rust fn expand_pop(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { // ... nodes.extend(vec![ node!(label, Opcode::Ldw, spr, reg, 0), node!(None, Opcode::AddI, spr, 4, spr), // spr = spr + 4 ]); ``` **Reality:** Stack grows **DOWNWARD** (toward lower addresses) - PUSH: Decrements SPR by 4, then stores - POP: Loads, then increments SPR by 4 **Impact:** All documentation examples and calling convention diagrams are backwards! --- ### 2. **CALL Pseudo-instruction Expansion** ❌ CRITICAL **Documentation states (DSA_Assembly_Reference.md):** ```asm ; call print::print expands to: lwi print::print, ret ; Load function address into ret jmp 0, ret ; Jump to function (saves return in pcx) ``` **Implementation shows (expand.rs:109-123):** ```rust fn expand_call(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { nodes.extend(vec![ node!(label, Opcode::SubI, spr, 4, spr), // Decrement stack pointer node!(None, Opcode::Stw, pcx, spr, 0), // Store PCX (return addr) on stack node!(None, Opcode::Jmp, addr, zero), // Jump to function ]); ``` **Reality:** CALL expansion is: 1. Decrement SPR by 4 2. Store PCX (return address) to stack 3. Jump to function address **Impact:** Return address is stored on the STACK, not in RET register! --- ### 3. **RETURN Pseudo-instruction Expansion** ❌ CRITICAL **Documentation states:** ```asm ; return expands to: jmp 0, ret ; Jump to address in ret register ``` **Implementation shows (expand.rs:125-135):** ```rust fn expand_return(current: &Node, nodes: &mut Vec) { nodes.extend(vec![ node!(label, Opcode::Ldw, spr, ret, 0), // Load return addr from stack node!(None, Opcode::AddI, spr, 4, spr), // Increment stack pointer node!(None, Opcode::Jmp, 4, ret), // Jump to (ret + 4) ]); } ``` **Reality:** RETURN expansion is: 1. Load return address from stack into RET register 2. Increment SPR by 4 3. Jump to (RET + 4) **Why +4?** The stored PCX points to the instruction AFTER the call's jump, so we need to add 4 to skip past the stored PCX instruction itself... or this might be a bug in the implementation. **Impact:** Return mechanism is completely different from documentation! --- ### 4. **Calling Convention - Stack Frame Layout** ❌ CRITICAL **Documentation states:** ``` Higher Addresses ├─────────────┤ │ Arg N │ ← spr + (8 + 4*(N-1)) │ ... │ │ Arg 2 │ ← spr + 16 │ Arg 1 │ ← spr + 12 │ Arg 0 │ ← spr + 8 ├─────────────┤ │ Ret Addr │ ← spr + 4 ├─────────────┤ │ Old BPR │ ← spr + 0 ├─────────────┤ ← bpr, spr │ Locals │ Lower Addresses ``` **Reality based on implementation:** Since stack grows DOWN: ``` Lower Addresses ├─────────────┤ ← Current SPR/BPR │ Old BPR │ ← spr + 0 (immediately above SPR) ├─────────────┤ │ Ret Addr │ ← spr + 4 (pushed by CALL) ├─────────────┤ │ Arg 0 │ ← spr + 8 │ Arg 1 │ ← spr + 12 │ Arg 2 │ ← spr + 16 │ ... │ │ Arg N │ ← spr + (8 + 4*(N-1)) ├─────────────┤ Higher Addresses ``` **The diagram needs to be flipped!** The offsets are correct, but the direction is wrong. --- ### 5. **Label-Based Load/Store Scratch Register** ⚠️ IMPORTANT **Documentation states:** Uses `rgf` as scratch register **Implementation confirms (expand.rs:138-153):** ```rust fn expand_ldx(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { // For ldb label, reg: nodes.extend(vec![ node!(current.label(), Opcode::Lli, name, reg), node!(None, Opcode::Lui, name, reg), node!(None, opcode, reg, reg, offset), ]); ``` **Wait! This is WRONG in the implementation!** The load expansion uses the DESTINATION register as scratch: ```asm ldb buffer, rg2 expands to: lli buffer, rg2 ; Uses rg2 as destination lui buffer, rg2 ; Uses rg2 as destination ldb rg2, rg2, 0 ; Uses rg2 as base ``` **Documentation says it should use rgf:** ```asm ldb buffer, rg2 expands to: lli buffer, rgf ; Uses rgf as scratch lui buffer, rgf ; Uses rgf as scratch ldb rgf, rg2, 0 ; Load from rgf into rg2 ``` **For stores (expand.rs:155-176):** ```rust fn expand_stx(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { // For stb reg, label: let temp = Token::Register(Register::Acc); // Uses ACC, not RGF! nodes.extend(vec![ node!(current.label(), Opcode::Lli, dest, temp), node!(None, Opcode::Lui, dest, temp), node!(None, opcode, base, temp, offset), ]); ``` **Reality:** - Load pseudo-instructions use the DESTINATION register as scratch - Store pseudo-instructions use the ACC register as scratch, NOT rgf **Impact:** Documentation is incorrect about which registers are used! --- ### 6. **LWI Pseudo-instruction** ✅ CORRECT **Documentation and implementation agree:** ```rust fn expand_lwi(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { nodes.extend(vec![ node!(current.label(), Opcode::Lli, val, reg), node!(None, Opcode::Lui, val, reg), ]); ``` This matches the documented expansion. --- ### 7. **PUSHA/POPA Pseudo-instructions** 📝 UNDOCUMENTED **These exist in implementation but are NOT in documentation!** **expand.rs:53-76:** ```rust fn expand_pusha(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { let count = expect_token!(arg0, Immediate)?; let spr = Token::Register(Register::Spr); let registers: Vec = Register::general(); nodes.push(node!(label, Opcode::SubI, spr, Token::Immediate(count * 4), spr)); nodes.extend((0..count).rev().map(|i| { node!(None, Opcode::Stw, Token::Register(registers[i as usize]), spr, Token::Immediate(i * 4) ) })); ``` **expand.rs:78-101:** ```rust fn expand_popa(current: &Node, nodes: &mut Vec) -> Result<(), AssembleError> { let count = expect_token!(arg0, Immediate)?; nodes.extend((0..count).rev().map(|i| { node!( { if i == 0 { label.clone() } else { None } }, Opcode::Ldw, spr, Token::Register(registers[i as usize]), Token::Immediate(i * 4) ) })); nodes.push(node!(None, Opcode::AddI, spr, Token::Immediate(count * 4), spr)); ``` **What they do:** - `pusha N` - Push first N general-purpose registers (rg0-rgN) to stack - `popa N` - Pop first N general-purpose registers from stack **Missing from documentation entirely!** --- ### 8. **Register Index Encoding** ⚠️ IMPORTANT **Documentation states:** System registers like MAR, MDR, STS, CIR, PCX are "internal" and not accessible **Implementation shows (instructions.rs:148-153):** ```rust 0x18 => Self::Mar, 0x19 => Self::Mdr, 0x1A => Self::Sts, 0x1B => Self::Cir, 0x1C => Self::Pcx, ``` **Reality:** These registers ARE encoded in the instruction format at indices 0x18-0x1C! **However, instructions.rs:186 shows:** ```rust "null" => Ok(Self::NoReg), // Can parse "null" as NoReg ``` **Documentation never mentions "null" as an alternative name for noreg!** --- ### 9. **LUI Immediate Value Handling** ⚠️ IMPORTANT **Documentation states:** ``` lui immediate, dest_reg ; Load immediate into upper 16 bits ``` **Implementation shows (codegen.rs:248-254):** ```rust fn build_load_immediate_instruction(...) -> Result { // ... match opcode { Opcode::Lli => { let instruction_args = args!(I, immediate: value as u16, r1: dest); Ok(Instruction::LoadLowerImmediate(instruction_args)) } Opcode::Lui => { let upper_value = value >> 16; // Shifts right by 16! let instruction_args = args!(I, immediate: upper_value as u16, r1: dest); Ok(Instruction::LoadUpperImmediate(instruction_args)) } ``` **Reality:** When assembling `lui immediate, reg`, the assembler: 1. Takes the immediate value 2. Shifts it RIGHT by 16 bits 3. Stores the result in the instruction **This means:** ```asm lli 0x1234, rg0 ; Stores 0x1234 in lower 16 bits lui 0xABCD0000, rg0 ; Right-shifts to 0xABCD, stores in upper 16 bits ``` **Or more likely, the assembler expects:** ```asm lli 0x1234, rg0 ; Stores 0x1234 in lower 16 bits lui 0xABCD, rg0 ; Stores 0xABCD in upper 16 bits (no shift needed) ``` **Documentation needs clarification on what immediate value format LUI expects!** --- ### 10. **Data Definition Encoding** ⚠️ IMPORTANT **Implementation (expand.rs:217-267):** ```rust fn process_dx_data(args: Vec, size: usize) -> Result, AssembleError> { for token in args { match token { Token::StringLit(mut s) => { s.push('\0'); // Automatically adds null terminator! for ch in s.chars() { let mut char_buf = [0u8; 4]; let char_bytes = ch.encode_utf8(&mut char_buf); buffer.extend_from_slice(char_bytes.as_bytes()); } } Token::Immediate(value) => { buffer.extend_from_slice(&value.to_be_bytes()); // BIG ENDIAN! } ``` **Key findings:** 1. String literals automatically get null terminator appended 2. Numeric values are stored in **BIG ENDIAN** format (to_be_bytes) 3. Documentation says "little-endian byte order" globally **Contradiction:** Data definition uses BIG ENDIAN, but doc says LITTLE ENDIAN! --- ### 11. **Segment Instruction** 📝 UNDOCUMENTED **Implementation has a SEGMENT instruction (0x27/0x3F):** ```rust Segment(u32) = 0x3F, ``` **This is completely undocumented!** From model.rs: ```rust Self::Segment => write!(f, "[SEGMENT]"), ``` From codegen.rs: ```rust Opcode::Segment => build_segment_instruction(&args), ``` **Purpose unclear, needs documentation!** --- ### 12. **Data Instruction** 📝 UNDOCUMENTED **Implementation has a DATA instruction (0x3E):** ```rust Data(u32) = 0x3E, ``` **This appears to be a meta-instruction for embedding raw data, but it's undocumented in the assembly reference!** --- ### 13. **INC/DEC Instruction Encoding** ⚠️ MINOR **Implementation (codegen.rs:293-299):** ```rust fn build_inc_dec_instruction(opcode: Opcode, args: &[Token]) -> Result { let reg = expect_token!(reg_token, Register)?; match opcode { Opcode::Inc => Ok(Instruction::Increment(args!(R, sr1: reg))), Opcode::Dec => Ok(Instruction::Decrement(args!(R, sr1: reg))), ``` **Reality:** INC/DEC only set SR1 field, not DR field. **But args.rs shows:** ```rust impl RTypeArgs { pub fn new(...) -> Self { let sr1 = sr1.unwrap_or_default(); // Defaults to NoReg let dr = dr.unwrap_or_default(); // Defaults to NoReg ``` **So the DR field gets set to NoReg, which is correct per documentation.** **However, the Display impl (instructions.rs:449) shows:** ```rust Self::Increment(a) | Self::Decrement(a) => write!(f, " {}", a.sr1), ``` **This is correct - only shows SR1 in disassembly.** --- ### 14. **Shift Instruction Operand Order** ⚠️ MINOR **Implementation (codegen.rs:301-312):** ```rust fn build_shift_instruction(opcode: Opcode, args: &[Token]) -> Result { let reg = expect_token!(reg_token, Register)?; let amount = expect_token!(amount_token, Immediate)? as u8; match opcode { Opcode::Shl => Ok(Instruction::ShiftLeft(args!(R, sr1: reg, shamt: amount))), ``` **This only handles LITERAL shift amounts, not REGISTER shift amounts!** **Documentation states both are supported:** ```asm shl rg0, 2 ; Literal shift shl rg0, rg1 ; Register shift ``` **The current codegen only handles the literal case!** **This is a BUG in the implementation - register shifts aren't properly assembled!** --- ### 15. **Jump Instruction Operand Order** ⚠️ CONFUSION **Documentation shows assembly syntax:** ```asm jmp addr [, offset_reg] ``` **But implementation (codegen.rs:256-270):** ```rust fn build_jump_instruction(opcode: Opcode, args: &[Token]) -> Result { let address = expect_token!(address_token, Immediate)?; let offset = expect_token!(offset_token, Register)?; let instruction_args = args!(I, immediate: address as u16, r1: offset); ``` **This expects:** 1. First arg: immediate (address) 2. Second arg: register (offset) **So assembly syntax should be:** ```asm jmp immediate, offset_register ``` **Example:** ```asm jmp 0x1000, zero ; Jump to 0x1000 jmp 4, ret ; Jump to (ret + 4) ``` **Documentation syntax is correct, but parameter names are confusing!** The "address" is actually an OFFSET, and the register is the BASE! **Better naming:** ```asm jmp offset, base_register ; Target = base_register + offset ``` --- ### 16. **NOT Instruction Operand Count** ✅ MINOR ISSUE **Documentation shows:** ```asm not src, dest ; Two operands ``` **Implementation (instructions.rs:428-429):** ```rust Self::Compare(args) | Self::Not(args) => { write!(f, " {}, {}", args.sr1, args.sr2) } ``` **This displays BOTH sr1 and sr2 for NOT!** **But codegen.rs:354-362:** ```rust fn build_not_instruction(args: &[Token]) -> Result { let reg = expect_token!(reg_token, Register)?; let dest = expect_token!(dest_token, Register)?; Ok(Instruction::Not(args!(R, sr1: reg, dr: dest))) ``` **Sets sr1 and dr, NOT sr1 and sr2!** **The Display impl is WRONG - should show sr1 and dr:** ```rust Self::Not(args) => write!(f, " {}, {}", args.sr1, args.dr) ``` **This is a display bug in the implementation!** --- ### 17. **Register File Indexing** ✅ CORRECT **Documentation and implementation both agree:** - 0x00-0x0F: rg0-rgf (general purpose) - 0x10: acc - 0x11: spr - 0x12: bpr - 0x13: ret - 0x14: idr - 0x15: mmr - 0x16: zero - 0x17: noreg **This matches perfectly.** --- ### 18. **Immediate Arithmetic Destination** ⚠️ MINOR **Implementation (codegen.rs:314-330):** ```rust fn build_arithmetic_immediate_instruction(...) -> Result { let reg = expect_token!(reg_token, Register)?; let immediate = expect_token!(immediate_token, Immediate)? as u16; let dest = expect_token!(dest_token, Register)?; let instruction_args = args!(I, immediate: immediate, r1: reg, r2: dest); ``` **This REQUIRES three arguments:** 1. Source register 2. Immediate value 3. Destination register **But documentation says destination is optional:** ``` iadd src_reg, imm [, dest_reg] ; dest optional ``` **Reality:** The assembler REQUIRES the destination register! **If you want in-place operation:** ```asm iadd rg0, 10, rg0 ; Required to specify rg0 twice ``` **Not:** ```asm iadd rg0, 10 ; This won't work! ``` **Documentation is misleading - destination is NOT optional!** --- ### 19. **Memory Instruction Offsets** ✅ CORRECT **Implementation correctly handles signed 16-bit offsets:** ```rust let offset = expect_token!(offset_token, Immediate)? as u16; ``` **These are stored as u16 but interpreted as signed i16 at runtime.** **Documentation is correct about this.** --- ### 20. **Instruction Opcode Values** ✅ VERIFIED Comparing model.rs opcodes with instructions.rs: | Instruction | model.rs | instructions.rs | Match | |-------------|----------|-----------------|-------| | Nop | 0x00 | 0x0 | ✅ | | Mov | 0x01 | 0x1 | ✅ | | MovSigned | 0x02 | 0x2 | ✅ | | LoadByte | 0x03 | 0x3 | ✅ | | ... | ... | ... | ✅ | | AddImmediate | 0x25 | 0x25 | ✅ | | SubImmediate | 0x26 | 0x26 | ✅ | | Segment | 0x27 | 0x3F | ❌ MISMATCH! | **CRITICAL:** Segment instruction has opcode **0x27** in model.rs but **0x3F** in instructions.rs! --- ## Summary of Critical Issues ### Must Fix in Documentation: 1. ✅ **Stack grows DOWNWARD** - flip all diagrams 2. ✅ **CALL expansion** - uses stack, not ret register directly 3. ✅ **RETURN expansion** - loads from stack, jumps to ret+4 4. ✅ **Stack frame layout** - flip diagram vertically 5. ✅ **Load pseudo scratch register** - uses DEST reg, not rgf 6. ✅ **Store pseudo scratch register** - uses ACC, not rgf 7. ✅ **Add PUSHA/POPA documentation** 8. ✅ **Add SEGMENT instruction documentation** 9. ✅ **Add DATA instruction documentation** 10. ✅ **Clarify LUI immediate value handling** 11. ✅ **Fix endianness** - data definition uses BIG endian 12. ✅ **IADD/ISUB destination NOT optional** 13. ✅ **Add "null" as alias for noreg** 14. ✅ **Fix Segment opcode** - 0x27 or 0x3F? ### Potential Implementation Bugs: 1. ⚠️ **Shift instruction** - doesn't handle register shifts 2. ⚠️ **NOT display** - shows sr2 instead of dr 3. ⚠️ **RETURN +4 offset** - why is this needed? 4. ⚠️ **Segment opcode mismatch** - 0x27 vs 0x3F ### Minor Documentation Improvements: 1. Add explicit examples of stack growth direction 2. Show complete memory layout diagrams 3. Document which registers are volatile/preserved 4. Add troubleshooting section for common mistakes 5. Clarify jump instruction parameter semantics