18 KiB
DSA Implementation vs Documentation Discrepancies
Critical Discrepancies
1. Stack Growth Direction ❌ CRITICAL
Documentation states: Stack grows upward (toward higher addresses)
Implementation shows (expand.rs:44-51):
fn expand_push(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
// ...
nodes.extend(vec![
node!(label, Opcode::SubI, spr, 4, spr), // spr = spr - 4
node!(None, Opcode::Stw, reg, spr, 0),
]);
Implementation shows (expand.rs:130-137):
fn expand_pop(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
// ...
nodes.extend(vec![
node!(label, Opcode::Ldw, spr, reg, 0),
node!(None, Opcode::AddI, spr, 4, spr), // spr = spr + 4
]);
Reality: Stack grows DOWNWARD (toward lower addresses)
- PUSH: Decrements SPR by 4, then stores
- POP: Loads, then increments SPR by 4
Impact: All documentation examples and calling convention diagrams are backwards!
2. CALL Pseudo-instruction Expansion ❌ CRITICAL
Documentation states (DSA_Assembly_Reference.md):
; call print::print expands to:
lwi print::print, ret ; Load function address into ret
jmp 0, ret ; Jump to function (saves return in pcx)
Implementation shows (expand.rs:109-123):
fn expand_call(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
nodes.extend(vec![
node!(label, Opcode::SubI, spr, 4, spr), // Decrement stack pointer
node!(None, Opcode::Stw, pcx, spr, 0), // Store PCX (return addr) on stack
node!(None, Opcode::Jmp, addr, zero), // Jump to function
]);
Reality: CALL expansion is:
- Decrement SPR by 4
- Store PCX (return address) to stack
- Jump to function address
Impact: Return address is stored on the STACK, not in RET register!
3. RETURN Pseudo-instruction Expansion ❌ CRITICAL
Documentation states:
; return expands to:
jmp 0, ret ; Jump to address in ret register
Implementation shows (expand.rs:125-135):
fn expand_return(current: &Node, nodes: &mut Vec<Node>) {
nodes.extend(vec![
node!(label, Opcode::Ldw, spr, ret, 0), // Load return addr from stack
node!(None, Opcode::AddI, spr, 4, spr), // Increment stack pointer
node!(None, Opcode::Jmp, 4, ret), // Jump to (ret + 4)
]);
}
Reality: RETURN expansion is:
- Load return address from stack into RET register
- Increment SPR by 4
- Jump to (RET + 4)
Why +4? The stored PCX points to the instruction AFTER the call's jump, so we need to add 4 to skip past the stored PCX instruction itself... or this might be a bug in the implementation.
Impact: Return mechanism is completely different from documentation!
4. Calling Convention - Stack Frame Layout ❌ CRITICAL
Documentation states:
Higher Addresses
├─────────────┤
│ Arg N │ ← spr + (8 + 4*(N-1))
│ ... │
│ Arg 2 │ ← spr + 16
│ Arg 1 │ ← spr + 12
│ Arg 0 │ ← spr + 8
├─────────────┤
│ Ret Addr │ ← spr + 4
├─────────────┤
│ Old BPR │ ← spr + 0
├─────────────┤ ← bpr, spr
│ Locals │
Lower Addresses
Reality based on implementation: Since stack grows DOWN:
Lower Addresses
├─────────────┤ ← Current SPR/BPR
│ Old BPR │ ← spr + 0 (immediately above SPR)
├─────────────┤
│ Ret Addr │ ← spr + 4 (pushed by CALL)
├─────────────┤
│ Arg 0 │ ← spr + 8
│ Arg 1 │ ← spr + 12
│ Arg 2 │ ← spr + 16
│ ... │
│ Arg N │ ← spr + (8 + 4*(N-1))
├─────────────┤
Higher Addresses
The diagram needs to be flipped! The offsets are correct, but the direction is wrong.
5. Label-Based Load/Store Scratch Register ⚠️ IMPORTANT
Documentation states: Uses rgf as scratch register
Implementation confirms (expand.rs:138-153):
fn expand_ldx(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
// For ldb label, reg:
nodes.extend(vec![
node!(current.label(), Opcode::Lli, name, reg),
node!(None, Opcode::Lui, name, reg),
node!(None, opcode, reg, reg, offset),
]);
Wait! This is WRONG in the implementation!
The load expansion uses the DESTINATION register as scratch:
ldb buffer, rg2 expands to:
lli buffer, rg2 ; Uses rg2 as destination
lui buffer, rg2 ; Uses rg2 as destination
ldb rg2, rg2, 0 ; Uses rg2 as base
Documentation says it should use rgf:
ldb buffer, rg2 expands to:
lli buffer, rgf ; Uses rgf as scratch
lui buffer, rgf ; Uses rgf as scratch
ldb rgf, rg2, 0 ; Load from rgf into rg2
For stores (expand.rs:155-176):
fn expand_stx(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
// For stb reg, label:
let temp = Token::Register(Register::Acc); // Uses ACC, not RGF!
nodes.extend(vec![
node!(current.label(), Opcode::Lli, dest, temp),
node!(None, Opcode::Lui, dest, temp),
node!(None, opcode, base, temp, offset),
]);
Reality:
- Load pseudo-instructions use the DESTINATION register as scratch
- Store pseudo-instructions use the ACC register as scratch, NOT rgf
Impact: Documentation is incorrect about which registers are used!
6. LWI Pseudo-instruction ✅ CORRECT
Documentation and implementation agree:
fn expand_lwi(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
nodes.extend(vec![
node!(current.label(), Opcode::Lli, val, reg),
node!(None, Opcode::Lui, val, reg),
]);
This matches the documented expansion.
7. PUSHA/POPA Pseudo-instructions 📝 UNDOCUMENTED
These exist in implementation but are NOT in documentation!
expand.rs:53-76:
fn expand_pusha(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
let count = expect_token!(arg0, Immediate)?;
let spr = Token::Register(Register::Spr);
let registers: Vec<Register> = Register::general();
nodes.push(node!(label, Opcode::SubI, spr, Token::Immediate(count * 4), spr));
nodes.extend((0..count).rev().map(|i| {
node!(None, Opcode::Stw,
Token::Register(registers[i as usize]),
spr,
Token::Immediate(i * 4)
)
}));
expand.rs:78-101:
fn expand_popa(current: &Node, nodes: &mut Vec<Node>) -> Result<(), AssembleError> {
let count = expect_token!(arg0, Immediate)?;
nodes.extend((0..count).rev().map(|i| {
node!(
{ if i == 0 { label.clone() } else { None } },
Opcode::Ldw,
spr,
Token::Register(registers[i as usize]),
Token::Immediate(i * 4)
)
}));
nodes.push(node!(None, Opcode::AddI, spr, Token::Immediate(count * 4), spr));
What they do:
pusha N- Push first N general-purpose registers (rg0-rgN) to stackpopa N- Pop first N general-purpose registers from stack
Missing from documentation entirely!
8. Register Index Encoding ⚠️ IMPORTANT
Documentation states: System registers like MAR, MDR, STS, CIR, PCX are "internal" and not accessible
Implementation shows (instructions.rs:148-153):
0x18 => Self::Mar,
0x19 => Self::Mdr,
0x1A => Self::Sts,
0x1B => Self::Cir,
0x1C => Self::Pcx,
Reality: These registers ARE encoded in the instruction format at indices 0x18-0x1C!
However, instructions.rs:186 shows:
"null" => Ok(Self::NoReg), // Can parse "null" as NoReg
Documentation never mentions "null" as an alternative name for noreg!
9. LUI Immediate Value Handling ⚠️ IMPORTANT
Documentation states:
lui immediate, dest_reg ; Load immediate into upper 16 bits
Implementation shows (codegen.rs:248-254):
fn build_load_immediate_instruction(...) -> Result<Instruction, AssembleError> {
// ...
match opcode {
Opcode::Lli => {
let instruction_args = args!(I, immediate: value as u16, r1: dest);
Ok(Instruction::LoadLowerImmediate(instruction_args))
}
Opcode::Lui => {
let upper_value = value >> 16; // Shifts right by 16!
let instruction_args = args!(I, immediate: upper_value as u16, r1: dest);
Ok(Instruction::LoadUpperImmediate(instruction_args))
}
Reality: When assembling lui immediate, reg, the assembler:
- Takes the immediate value
- Shifts it RIGHT by 16 bits
- Stores the result in the instruction
This means:
lli 0x1234, rg0 ; Stores 0x1234 in lower 16 bits
lui 0xABCD0000, rg0 ; Right-shifts to 0xABCD, stores in upper 16 bits
Or more likely, the assembler expects:
lli 0x1234, rg0 ; Stores 0x1234 in lower 16 bits
lui 0xABCD, rg0 ; Stores 0xABCD in upper 16 bits (no shift needed)
Documentation needs clarification on what immediate value format LUI expects!
10. Data Definition Encoding ⚠️ IMPORTANT
Implementation (expand.rs:217-267):
fn process_dx_data(args: Vec<Token>, size: usize) -> Result<Vec<u32>, AssembleError> {
for token in args {
match token {
Token::StringLit(mut s) => {
s.push('\0'); // Automatically adds null terminator!
for ch in s.chars() {
let mut char_buf = [0u8; 4];
let char_bytes = ch.encode_utf8(&mut char_buf);
buffer.extend_from_slice(char_bytes.as_bytes());
}
}
Token::Immediate(value) => {
buffer.extend_from_slice(&value.to_be_bytes()); // BIG ENDIAN!
}
Key findings:
- String literals automatically get null terminator appended
- Numeric values are stored in BIG ENDIAN format (to_be_bytes)
- Documentation says "little-endian byte order" globally
Contradiction: Data definition uses BIG ENDIAN, but doc says LITTLE ENDIAN!
11. Segment Instruction 📝 UNDOCUMENTED
Implementation has a SEGMENT instruction (0x27/0x3F):
Segment(u32) = 0x3F,
This is completely undocumented!
From model.rs:
Self::Segment => write!(f, "[SEGMENT]"),
From codegen.rs:
Opcode::Segment => build_segment_instruction(&args),
Purpose unclear, needs documentation!
12. Data Instruction 📝 UNDOCUMENTED
Implementation has a DATA instruction (0x3E):
Data(u32) = 0x3E,
This appears to be a meta-instruction for embedding raw data, but it's undocumented in the assembly reference!
13. INC/DEC Instruction Encoding ⚠️ MINOR
Implementation (codegen.rs:293-299):
fn build_inc_dec_instruction(opcode: Opcode, args: &[Token]) -> Result<Instruction, AssembleError> {
let reg = expect_token!(reg_token, Register)?;
match opcode {
Opcode::Inc => Ok(Instruction::Increment(args!(R, sr1: reg))),
Opcode::Dec => Ok(Instruction::Decrement(args!(R, sr1: reg))),
Reality: INC/DEC only set SR1 field, not DR field.
But args.rs shows:
impl RTypeArgs {
pub fn new(...) -> Self {
let sr1 = sr1.unwrap_or_default(); // Defaults to NoReg
let dr = dr.unwrap_or_default(); // Defaults to NoReg
So the DR field gets set to NoReg, which is correct per documentation.
However, the Display impl (instructions.rs:449) shows:
Self::Increment(a) | Self::Decrement(a) => write!(f, " {}", a.sr1),
This is correct - only shows SR1 in disassembly.
14. Shift Instruction Operand Order ⚠️ MINOR
Implementation (codegen.rs:301-312):
fn build_shift_instruction(opcode: Opcode, args: &[Token]) -> Result<Instruction, AssembleError> {
let reg = expect_token!(reg_token, Register)?;
let amount = expect_token!(amount_token, Immediate)? as u8;
match opcode {
Opcode::Shl => Ok(Instruction::ShiftLeft(args!(R, sr1: reg, shamt: amount))),
This only handles LITERAL shift amounts, not REGISTER shift amounts!
Documentation states both are supported:
shl rg0, 2 ; Literal shift
shl rg0, rg1 ; Register shift
The current codegen only handles the literal case!
This is a BUG in the implementation - register shifts aren't properly assembled!
15. Jump Instruction Operand Order ⚠️ CONFUSION
Documentation shows assembly syntax:
jmp addr [, offset_reg]
But implementation (codegen.rs:256-270):
fn build_jump_instruction(opcode: Opcode, args: &[Token]) -> Result<Instruction, AssembleError> {
let address = expect_token!(address_token, Immediate)?;
let offset = expect_token!(offset_token, Register)?;
let instruction_args = args!(I, immediate: address as u16, r1: offset);
This expects:
- First arg: immediate (address)
- Second arg: register (offset)
So assembly syntax should be:
jmp immediate, offset_register
Example:
jmp 0x1000, zero ; Jump to 0x1000
jmp 4, ret ; Jump to (ret + 4)
Documentation syntax is correct, but parameter names are confusing!
The "address" is actually an OFFSET, and the register is the BASE!
Better naming:
jmp offset, base_register
; Target = base_register + offset
16. NOT Instruction Operand Count ✅ MINOR ISSUE
Documentation shows:
not src, dest ; Two operands
Implementation (instructions.rs:428-429):
Self::Compare(args) | Self::Not(args) => {
write!(f, " {}, {}", args.sr1, args.sr2)
}
This displays BOTH sr1 and sr2 for NOT!
But codegen.rs:354-362:
fn build_not_instruction(args: &[Token]) -> Result<Instruction, AssembleError> {
let reg = expect_token!(reg_token, Register)?;
let dest = expect_token!(dest_token, Register)?;
Ok(Instruction::Not(args!(R, sr1: reg, dr: dest)))
Sets sr1 and dr, NOT sr1 and sr2!
The Display impl is WRONG - should show sr1 and dr:
Self::Not(args) => write!(f, " {}, {}", args.sr1, args.dr)
This is a display bug in the implementation!
17. Register File Indexing ✅ CORRECT
Documentation and implementation both agree:
- 0x00-0x0F: rg0-rgf (general purpose)
- 0x10: acc
- 0x11: spr
- 0x12: bpr
- 0x13: ret
- 0x14: idr
- 0x15: mmr
- 0x16: zero
- 0x17: noreg
This matches perfectly.
18. Immediate Arithmetic Destination ⚠️ MINOR
Implementation (codegen.rs:314-330):
fn build_arithmetic_immediate_instruction(...) -> Result<Instruction, AssembleError> {
let reg = expect_token!(reg_token, Register)?;
let immediate = expect_token!(immediate_token, Immediate)? as u16;
let dest = expect_token!(dest_token, Register)?;
let instruction_args = args!(I, immediate: immediate, r1: reg, r2: dest);
This REQUIRES three arguments:
- Source register
- Immediate value
- Destination register
But documentation says destination is optional:
iadd src_reg, imm [, dest_reg] ; dest optional
Reality: The assembler REQUIRES the destination register!
If you want in-place operation:
iadd rg0, 10, rg0 ; Required to specify rg0 twice
Not:
iadd rg0, 10 ; This won't work!
Documentation is misleading - destination is NOT optional!
19. Memory Instruction Offsets ✅ CORRECT
Implementation correctly handles signed 16-bit offsets:
let offset = expect_token!(offset_token, Immediate)? as u16;
These are stored as u16 but interpreted as signed i16 at runtime.
Documentation is correct about this.
20. Instruction Opcode Values ✅ VERIFIED
Comparing model.rs opcodes with instructions.rs:
| Instruction | model.rs | instructions.rs | Match |
|---|---|---|---|
| Nop | 0x00 | 0x0 | ✅ |
| Mov | 0x01 | 0x1 | ✅ |
| MovSigned | 0x02 | 0x2 | ✅ |
| LoadByte | 0x03 | 0x3 | ✅ |
| ... | ... | ... | ✅ |
| AddImmediate | 0x25 | 0x25 | ✅ |
| SubImmediate | 0x26 | 0x26 | ✅ |
| Segment | 0x27 | 0x3F | ❌ MISMATCH! |
CRITICAL: Segment instruction has opcode 0x27 in model.rs but 0x3F in instructions.rs!
Summary of Critical Issues
Must Fix in Documentation:
- ✅ Stack grows DOWNWARD - flip all diagrams
- ✅ CALL expansion - uses stack, not ret register directly
- ✅ RETURN expansion - loads from stack, jumps to ret+4
- ✅ Stack frame layout - flip diagram vertically
- ✅ Load pseudo scratch register - uses DEST reg, not rgf
- ✅ Store pseudo scratch register - uses ACC, not rgf
- ✅ Add PUSHA/POPA documentation
- ✅ Add SEGMENT instruction documentation
- ✅ Add DATA instruction documentation
- ✅ Clarify LUI immediate value handling
- ✅ Fix endianness - data definition uses BIG endian
- ✅ IADD/ISUB destination NOT optional
- ✅ Add "null" as alias for noreg
- ✅ Fix Segment opcode - 0x27 or 0x3F?
Potential Implementation Bugs:
- ⚠️ Shift instruction - doesn't handle register shifts
- ⚠️ NOT display - shows sr2 instead of dr
- ⚠️ RETURN +4 offset - why is this needed?
- ⚠️ Segment opcode mismatch - 0x27 vs 0x3F
Minor Documentation Improvements:
- Add explicit examples of stack growth direction
- Show complete memory layout diagrams
- Document which registers are volatile/preserved
- Add troubleshooting section for common mistakes
- Clarify jump instruction parameter semantics