Meisaka Wave 2 Vector CPU
Every user gets their own "thread"
Every memory address holds a single 16-bit word
e.g. 0x0000 => 0xABCD, 0x0001 => 0xEF39
All CPU registers are SIMD, holding a vector of 4x 16-bit words
A few special instructions treat registers as 8x 8-bit words
CPU register words are loaded and stored in little endian order (word X at the lowest address):
e.g. the instruction pointer (ri.x) is at address 0x003c (decimal 60)
Bytes are also packed into words in little endian order (the first byte is the low byte of the word).
Each "thread" has:
8 constant registers (read-only to the CPU, but writable as inputs)
7 general registers (readable and writable by the CPU from any address space)
The instruction register (special, but used like a general register)
64 words of private RAM
The instruction pointer is word 0 (X) of the instruction register
All instructions are 16 bits wide and occupy one memory address
System memory map:
0(0x0000) ..= 31(0x001F) => System constant registers, accessible via load
32(0x0020) ..= 63(0x003F) => System mutable registers, accessible via load/store
64(0x0040) ..= 127(0x007F) => Thread private RAM
128(0x0080) ..= 2047(0x07FF) => For future use (read as zero for now)
2048(0x0800) ..= 4095(0x0FFF) => Shared RAM (all user threads can read/write)
            +0   +1   +2   +3        Address
           +----+----+----+----+
c0  0x0000 | X  | Y  | Z  | W  | 0x0003 (beginning of !vm write)
:   :      |    |    |    |    | :
c7  0x001c | X  | Y  | Z  | W  | 0x001f
           +----+----+----+----+
r0  0x0020 | X  | Y  | Z  | W  | 0x0023
:   :      |    |    |    |    | :
r6  0x0038 | X  | Y  | Z  | W  | 0x003b
           +----+----+----+----+
ri  0x003c | pc | Y  | Z  | W  | 0x003f
           +----+----+----+----+
            +0   +1   +2   +3        Address
           +----+----+----+----+ <- start of address wrap
    0x0040 |  P    P    P    P | 0x0043 (beginning of !vm code)
    :      |   Private Memory  | :
    0x007c |  P    P    P    P | 0x007f (end of user writable memory)
           +----+----+----+----+ <- end of address wrap
    0x0080 |0000 0000 0000 0000| 0x0083 (reserved area)
    :      |                   | :
    0x07fc |0000 0000 0000 0000| 0x07ff
           +----+----+----+----+ <- start of address wrap
    0x0800 |  S    S    S    S | 0x0803 (shared memory aperture)
    :      |   Shared Memory   | :
    0x0ffc |  S    S    S    S | 0x0fff
           +----+----+----+----+ <- end of address wrap
Memory in shared RAM is displayed as blocky pixels
The display matrix is 128 pixels wide by 16 pixels tall
Ascending pixel addresses are rastered left to right, top to bottom.
Pixel address from position is: 0x800 + (0x80 * Y) + X
The pixel format is rrrr rggg gggb bbbb, e.g. 0xf800 is pure red
this mode is called RGB565
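The address formula and pixel format above can be sketched in Python as follows. The helper names `pixel_address` and `rgb565` are hypothetical and not part of the VM; the channel ranges and shifts match the defaults of constant registers c5 ([31, 63, 31, 0]) and c6 ([11, 5, 0, 0]).

```python
def pixel_address(x, y):
    """Return the shared-RAM address of the pixel at column x, row y."""
    if not (0 <= x < 128 and 0 <= y < 16):
        raise ValueError("pixel position outside the 128x16 display")
    return 0x800 + 0x80 * y + x


def rgb565(r, g, b):
    """Pack 5-bit red, 6-bit green and 5-bit blue into one 16-bit word."""
    if not (0 <= r <= 31 and 0 <= g <= 63 and 0 <= b <= 31):
        raise ValueError("channel value out of range")
    return (r << 11) | (g << 5) | b


print(hex(pixel_address(127, 15)))  # last pixel of the display: 0xfff
print(hex(rgb565(31, 0, 0)))        # pure red: 0xf800
```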
Memory accesses will behave differently depending on
where the CPU is currently executing instructions:
- when PC is within private memory:
+ any action that increments an address (load, store, PC fetch)
past 0x007f will wrap it to 0x0040
e.g. an incrementing 4-word load from address 0x007e
will read from 0x007e, 0x007f, 0x0040, 0x0041
and the register will point at 0x0042 afterwards.
+ loads and stores to private memory are atomic
+ loads and stores to shared memory are non-atomic
shared memory will be accessed a single word at a time
and incur significant delay between words
- when PC is within shared memory:
+ any action that increments an address (load, store, PC fetch)
past 0x0fff will wrap it to 0x0800
+ loads and stores to private memory are non-atomic
private memory will be accessed a single word at a time
and incur significant delay between words
+ loads and stores to shared memory are atomic
and will be accessed without delay
+ registers should be used for fast transfer between memory regions;
loads and stores to the register area of private memory still incur delays
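A minimal sketch of the wrap rule, assuming wrapping only applies at the end of the region the PC is currently executing in; any other increment is treated here as a plain 16-bit increment, which the text does not specify. The function names are hypothetical.

```python
PRIVATE_START, PRIVATE_END = 0x0040, 0x007F
SHARED_START, SHARED_END = 0x0800, 0x0FFF


def next_address(addr, pc):
    """Increment addr, wrapping at the end of the region the PC executes in."""
    if PRIVATE_START <= pc <= PRIVATE_END and addr == PRIVATE_END:
        return PRIVATE_START
    if SHARED_START <= pc <= SHARED_END and addr == SHARED_END:
        return SHARED_START
    return (addr + 1) & 0xFFFF  # assumption: no wrap outside the PC's region


def incrementing_addresses(start, count, pc):
    """Addresses touched by an incrementing access, plus the final pointer."""
    addresses = []
    addr = start
    for _ in range(count):
        addresses.append(addr)
        addr = next_address(addr, pc)
    return addresses, addr


# the example above: a 4-word incrementing load from 0x007e, PC in private RAM
addrs, final = incrementing_addresses(0x007E, 4, pc=0x0040)
print([hex(a) for a in addrs], hex(final))
```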
Register Index values:
0..= 7 => constant registers 0 through 7 ("c0"..="c7") respectively
8..=14 => general registers 0 through 6 ("r0"..="r6") respectively
15 => instruction pointer register ("ri") word 0 ("x") is the instruction pointer
words 1 to 3 ("y", "z", "w") are loosely general purpose (and useful as a mini-stack)
Instruction set:
0..= 3 => System, WSelect, Extra2, Extra3,
4..= 7 => Move, Swizzle, Load, Store,
8..=11 => Math8, Math16, Shift8, Shift16,
12..=15 => BitOp, SpecOp, Extra14, Extra15
<- msb lsb ->
dddd ssss xxxx oooo
dest source extra opcode
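Source and destination specify register indexes, numbered as described above
(some instructions, such as Swizzle, use the source field as an immediate value instead).
The extra field selects variants of an opcode and is reserved for extended and future instructions;
if an instruction does not use some bits within a field, they must be zero.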
Example of instructions and their encodings:
Move r0, c0 => 8004 // move c0 into r0
Store *[r4+], c5 => 5c37 // scatter store values in c5 at the 4 addresses in r4, incrementing each address
Swizzle.xxyz ri => f905 // push pc onto a "stack" (ri.w = ri.z, ri.z = ri.y, ri.y = ri.x, ri.x = ri.x)
Xor r1, r1 => 996c // clear r1 to all zero
CompareEq16 r0, r1 => 8939 // test each field of r0 against r1, store 0xffff or 0 into r0 fields
Move.yzw r0, r1 => 8914 // move only y,z,w of r1 into r0, leaving r0.x unchanged
SubRev16 ri, r0 => f829 // compute ri - r0, put result into ri
// if r0 is set to 0 or 0xffff by a compare,
// this will conditionally skip the next instruction
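A small decoder makes the dddd ssss xxxx oooo layout concrete. This is an illustrative sketch; `decode` and `reg_name` are hypothetical names, and the raw source index is returned unchanged because some opcodes treat it as an immediate rather than a register.

```python
OPCODES = [
    "System", "WSelect", "Extra2", "Extra3",
    "Move", "Swizzle", "Load", "Store",
    "Math8", "Math16", "Shift8", "Shift16",
    "BitOp", "SpecOp", "Extra14", "Extra15",
]


def reg_name(index):
    """Map a 4-bit register index to its name (c0..c7, r0..r6, ri)."""
    if index <= 7:
        return f"c{index}"
    if index <= 14:
        return f"r{index - 8}"
    return "ri"


def decode(word):
    """Split a 16-bit instruction into (opcode name, dest, source, extra)."""
    dest = (word >> 12) & 0xF
    source = (word >> 8) & 0xF
    extra = (word >> 4) & 0xF
    return OPCODES[word & 0xF], dest, source, extra


for word in (0x8004, 0x5C37, 0xF905, 0x996C):
    op, d, s, x = decode(word)
    print(f"{word:04x}: {op} dest={reg_name(d)} source={s} extra={x:04b}")
```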
System executes special instructions based on extra bits:
0b0000=> Halt the thread
0b0001=> Sleep: suspend the thread for a number of ticks
a sleep duration of zero acts like a no-op.
the lower 2 bits of the destination field select where the duration comes from:
0b00 => source field used as a number of ticks (0 to 15)
0b01 => lowest byte of source register as number of ticks
0b10 => high byte of the lowest word of source register as number of ticks
0b11 => lowest word of source register as number of ticks
The WSelect opcodes perform an operation between a single word of
the source register and the X word of the destination register;
the extra bits select the source word and the operation:
-> word select: 0b00__=> X, 0b01__=> Y, 0b10__=> Z, 0b11__=> W
operators:
0b__00 => (WMove) copy source word to destination X
0b__01 => (WSwap) swap source word with destination X
0b__10 => (WAdd ) add source word to destination X // dst.x = dst.x + src.*
put result in destination X
0b__11 => (WSub ) subtract source word from destination X // dst.x = dst.x - src.*
put result in destination X
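The WSelect encoding (extra = 0bSSOO, source word then operation) can be sketched as below. This assumes WAdd and WSub wrap at 16 bits and that source and destination are different registers; an instruction whose source and destination are the same register (for example ri) is not modelled. `wselect` is a hypothetical name.

```python
def wselect(dest, src, extra):
    """Apply a WSelect operation; returns the new (dest, src) registers.

    extra = 0bSSOO: SS picks the source word (X, Y, Z, W), OO the operation.
    """
    word = (extra >> 2) & 0b11
    op = extra & 0b11
    dest, src = list(dest), list(src)
    if op == 0b00:    # WMove
        dest[0] = src[word]
    elif op == 0b01:  # WSwap
        dest[0], src[word] = src[word], dest[0]
    elif op == 0b10:  # WAdd, assumed to wrap at 16 bits
        dest[0] = (dest[0] + src[word]) & 0xFFFF
    else:             # WSub, assumed to wrap at 16 bits
        dest[0] = (dest[0] - src[word]) & 0xFFFF
    return dest, src


print(wselect([5, 0, 0, 0], [1, 2, 3, 4], 0b1010))  # WAdd with src.z
```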
Move copies source register words to destination register
-> extra bits set to one will not copy the respective word (performs a "Mix" operation)
Swizzle re-arranges or copies the destination register words according to bits in "extra" and "source" (the source field is an immediate here, not a register)
-> given register [X,Y,Z,W]: source = 0bWWZZ, extra = 0bYYXX
-> each 2-bit index selects which original word (0 = X, 1 = Y, 2 = Z, 3 = W) goes into that position
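As a sketch of the selector layout (the function name `swizzle` is hypothetical), each output word picks an input word by its 2-bit index. With source = 0x9 and extra = 0x0 this reproduces the f905 "push pc" example, and source = extra = 0x5 (f555) copies Y into every word.

```python
def swizzle(reg, source, extra):
    """Rearrange reg's words; source = 0bWWZZ, extra = 0bYYXX (2-bit indices)."""
    selectors = [
        extra & 0b11,          # new X
        (extra >> 2) & 0b11,   # new Y
        source & 0b11,         # new Z
        (source >> 2) & 0b11,  # new W
    ]
    return [reg[i] for i in selectors]


ri = [0x0042, 0, 0, 0]
print(swizzle(ri, 0x9, 0x0))  # f905: Swizzle.xxyz pushes pc onto the mini-stack
```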
Load and Store use the word(s) in the source register as addresses
-> the destination register is loaded from memory (Load) or written to memory (Store)
-> "extra" bits used as modifiers:
-> upper bits specify the number of words to load or store
(words are accessed starting at X):
0b00__=> 4, 0b01__=> 3, 0b10__=> 2, 0b11__=> 1
-> 0b__0_ => sequential (low word from source as address)
-> 0b__1_ => scatter/gather (each word in source used as address),
-> 0b___0 => normal load/store
-> 0b___1 => increment source address(es) after each load/store (updates source register)
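The modifier bits can be sketched as an address calculator. This rests on a few assumptions: sequential mode walks consecutive addresses even without the increment bit, scatter/gather with increment bumps each used address by one (matching the 5c37 example), and the region wrap rules are ignored. `transfer_addresses` is a hypothetical name.

```python
def transfer_addresses(source_reg, extra):
    """Return (addresses accessed, updated source register) for Load/Store.

    extra = 0bCCMI: CC = word count code, M = scatter/gather, I = increment.
    """
    count = 4 - ((extra >> 2) & 0b11)  # 0b00 -> 4 words ... 0b11 -> 1 word
    scatter = bool(extra & 0b10)
    increment = bool(extra & 0b01)
    if scatter:
        # each of the first `count` source words is its own address
        addresses = [source_reg[i] for i in range(count)]
    else:
        # consecutive addresses starting at the low word (X)
        addresses = [(source_reg[0] + i) & 0xFFFF for i in range(count)]
    new_source = list(source_reg)
    if increment:
        if scatter:
            for i in range(count):
                new_source[i] = (new_source[i] + 1) & 0xFFFF
        else:
            new_source[0] = (new_source[0] + count) & 0xFFFF
    return addresses, new_source


# extra = 0b0001: 4 words, sequential, increment (as in the 0817 store)
print(transfer_addresses([0x0800, 0, 0, 0], 0b0001))
```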
Math# opcodes (# = 8 or 16 bit lanes) use the source register as the left-hand operand and the destination register as the right-hand operand
-> extra bits used to select math operation:
0x0 => Add (dest = src + dest)
0x1 => Sub (dest = src - dest)
0x2 => RSub (dest = dest - src)
0x3 => Eq (equal: -1, not equal: 0)
0x4 => Carry Carry( src + dest) (No carry: 0, carry: -1)
0x5 => LessU Carry( src - dest) (src >= dest: 0, src < dest: -1) unsigned
0x6 => GreaterU Carry(dest - src) (src <= dest: 0, src > dest: -1) unsigned
0x7 => NotEq (equal: 0, not equal: -1)
0x8 => AddSat (dest = src + dest) (Signed Saturate)
0x9 => SubSat (dest = src - dest) (Signed Saturate)
0xA => RSubSat (dest = dest - src) (Signed Saturate)
0xB => GreaterEqU Carry( src - dest) (src < dest: 0, src >= dest: -1) unsigned
0xC => AddOver Over( src + dest) (overflow flips sign: -1, no overflow: 0)
0xD => SubOver Over( src - dest) (overflow flips sign: -1, no overflow: 0)
0xE => RSubOver Over(dest - src) (overflow flips sign: -1, no overflow: 0)
0xF => LessEqU Carry(dest - src) (src > dest: 0, src <= dest: -1) unsigned
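The table above can be sketched per lane as follows. Assumptions: lanes are unsigned values, -1 is stored as all ones (0xFFFF, or 0xFF for Math8), saturation and overflow use a two's complement signed reading of each lane, and Math8 splits each word into two little-endian bytes. All function names are hypothetical.

```python
def math_lane(op, src, dest, bits=16):
    """Evaluate one Math# operation on a single unsigned lane value."""
    mask = (1 << bits) - 1
    smin, smax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1

    def signed(v):
        return v - (1 << bits) if v & (1 << (bits - 1)) else v

    def flag(cond):
        return mask if cond else 0  # -1 in two's complement, or 0

    def sat(v):
        return max(smin, min(smax, v)) & mask

    def overflow(v):
        return flag(not smin <= v <= smax)

    s, d = signed(src), signed(dest)
    results = {
        0x0: (src + dest) & mask,      # Add
        0x1: (src - dest) & mask,      # Sub
        0x2: (dest - src) & mask,      # RSub
        0x3: flag(src == dest),        # Eq
        0x4: flag(src + dest > mask),  # Carry
        0x5: flag(src < dest),         # LessU
        0x6: flag(src > dest),         # GreaterU
        0x7: flag(src != dest),        # NotEq
        0x8: sat(s + d),               # AddSat
        0x9: sat(s - d),               # SubSat
        0xA: sat(d - s),               # RSubSat
        0xB: flag(src >= dest),        # GreaterEqU
        0xC: overflow(s + d),          # AddOver
        0xD: overflow(s - d),          # SubOver
        0xE: overflow(d - s),          # RSubOver
        0xF: flag(src <= dest),        # LessEqU
    }
    return results[op]


def to_lanes(vec, bits):
    """Split a 4x16-bit register into lanes; 8-bit lanes are little endian."""
    if bits == 16:
        return list(vec)
    return [byte for word in vec for byte in (word & 0xFF, word >> 8)]


def from_lanes(lanes, bits):
    """Join lanes back into a 4x16-bit register."""
    if bits == 16:
        return list(lanes)
    return [lanes[i] | (lanes[i + 1] << 8) for i in range(0, 8, 2)]


def math_op(op, src, dest, bits=16):
    """Apply a Math8 (bits=8) or Math16 (bits=16) operation to whole registers."""
    pairs = zip(to_lanes(src, bits), to_lanes(dest, bits))
    return from_lanes([math_lane(op, s, d, bits) for s, d in pairs], bits)
```

With 8-bit lanes, 0x00FF + 0x0001 wraps inside the low byte and never carries into the high byte, whereas the 16-bit version produces 0x0100.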
Shift# opcodes perform bitshifts on the destination register
-> extra bits control shift:
0b_000 => Left Shift,
0b_001 => Right Shift (logical),
0b_010 => Right Shift (sign extend)
0b_011 => Left Rotate
0b_100 => Left Shift (#-shift),
0b_101 => Right Shift (#-shift) (logical),
0b_110 => Right Shift (#-shift) (sign extend)
0b_111 => Right Rotate
0b0___ => the low bits (3 for Shift8, 4 for Shift16) of the matching source register lane as the shift amount
0b1___ => the source field itself as the shift amount
BitOp performs a binary operation on source and destination
-> extra bits specify the operation:
0x0 => None (all bits 0), 0xF => All (all bits 1)
0x8 => And, 0xE => Or, 0x6 => Xor
0x7 => Nand, 0x1 => Nor, 0x9 => Xnor
0xA => Swap
0xC => (reserved, currently no-op)
0x5 => NotSrc, 0x3 => NotDest
0x2 => Src And NotDest (Dest Clears Src)
0x4 => NotSrc And Dest (Src Clears Dest)
0xB => Src Or NotDest
0xD => NotSrc Or Dest
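The BitOp codes read like a 4-bit truth table. A sketch, assuming truth-table bit index (dest_bit << 1) | src_bit, an ordering inferred from the table (it reproduces And = 0x8, Xor = 0x6, NotSrc = 0x5, NotDest = 0x3 and the rest). Code 0x0 clears every bit and 0xF sets every bit; Swap (0xA) and the reserved 0xC are special-cased as listed. The names are hypothetical.

```python
def bitop_word(table, src, dest):
    """Evaluate a 4-bit truth table bitwise over two 16-bit words.

    Truth-table bit (dest_bit << 1) | src_bit holds the result for that input pair.
    """
    result = 0
    for index in range(4):
        if (table >> index) & 1:
            d_term = dest if index & 0b10 else ~dest
            s_term = src if index & 0b01 else ~src
            result |= d_term & s_term
    return result & 0xFFFF


def bitop(op, src, dest):
    """Return the new (src, dest) words for a BitOp with extra bits `op`."""
    if op == 0xA:  # Swap is special-cased
        return dest, src
    if op == 0xC:  # reserved, currently a no-op
        return src, dest
    return src, bitop_word(op, src, dest)


print(hex(bitop(0x6, 0xF0F0, 0xFF00)[1]))  # Xor
```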
SpecOp extra math operations
-> extra bits used to select operation:
0b0000 => Horizontal Add: dest.xyzw = src.x + src.y + src.z + src.w
0b0001 => Dot product: Horizontal Add, but with each src word multiplied by the matching dest word (src * dest)
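A minimal sketch of both SpecOp operations, assuming results wrap at 16 bits (for sums and products the low 16 bits are the same whether the words are read as signed or unsigned). The function names are hypothetical.

```python
def horizontal_add(src):
    """SpecOp 0b0000: sum the four source words into every destination word."""
    total = sum(src) & 0xFFFF
    return [total] * 4


def dot_product(src, dest):
    """SpecOp 0b0001: sum of src[i] * dest[i], broadcast to all four words."""
    total = sum(s * d for s, d in zip(src, dest)) & 0xFFFF
    return [total] * 4


print(dot_product([1, 2, 3, 4], [5, 6, 7, 8]))  # [70, 70, 70, 70]
```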
default constant registers:
0=> [0, 0, 0, 0]
1=> [0, 0, 0, 0]
2=> [0, 0, 0, 0]
3=> [0, 0, 0, 0]
// all 1
4=> [1, 1, 1, 1],
// max pixel field values for R, G, B, A (todo)
5=> [31, 63, 31, 0],
// R shift, G shift, B shift, A (todo)
6=> [11, 5, 0, 0]
// start of shared RAM, start of program, future use: -1 for now
7=> [2048, 64, -1, -1]
General registers default to all zeros
The instruction pointer defaults to the start of private RAM (0x0040)
the !vm commands to control your thread:
!vm help (link to this info page)
!vm asm <program> (tiny in-chat assembler: TODO)
!vm halt (stop your thread)
!vm run (reset the instruction pointer to its default and start your thread)
!vm restart (reset the instruction register to its default, without directly affecting any other registers)
!vm clear (resets all thread memory and registers to their defaults)
!vm dump (prints the current state of the VM to the terminal, not very useful remotely)
!vm write <data>
!vm code <data>
<data> for !vm "write" and "code" consists of hex digits without any prefix (no "0x")
each word is written from the most significant nibble first to the least significant nibble last
e.g. the values 0x123f 0x2014 would be written "123f2014"
all spaces and non-hex characters in the input are ignored, except '!'
a '!' in the input ends parsing of hex characters and switches back to parsing sub-commands;
a space is required after the '!'
e.g. "!vm halt write 1234! code 1234! restart run"
"!vm write" starts at address 0x0000, and thus writes to the constant registers first
"!vm code" starts at address 0x0040, and thus writes to the private RAM
The write commands cannot write to shared memory; only running threads may do so.
Runic encoding of values is supported:
ᚺ => 0x0, ᚾ => 0x1, ᛁ => 0x2, ᛃ => 0x3,
ᛈ => 0x4, ᛇ => 0x5, ᛉ => 0x6, ᛊ => 0x7,
ᛏ => 0x8, ᛒ => 0x9, ᛖ => 0xA, ᛗ => 0xB,
ᛚ => 0xC, ᛜ => 0xD, ᛞ => 0xE, ᛟ => 0xF,
0123 4567 89AB CDEF
ᚺᚾᛁᛃ ᛈᛇᛉᛊ ᛏᛒᛖᛗ ᛚᛜᛞᛟ
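A sketch of how <data> could be turned into words, accepting both hex digits and the hex runes above. Assumptions: a trailing partial word is simply dropped, and the rune commands listed next (skip, align, fill, repeat) are not handled, so this sketch ignores them like any other non-hex character. `parse_words` is a hypothetical name.

```python
RUNE_DIGITS = "ᚺᚾᛁᛃᛈᛇᛉᛊᛏᛒᛖᛗᛚᛜᛞᛟ"  # position in this string = nibble value
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_words(data):
    """Parse hex (or runic hex) data into 16-bit words.

    Returns (words, rest) where rest is the sub-command text after a '!'.
    """
    nibbles = []
    rest = ""
    for i, ch in enumerate(data):
        if ch == "!":
            rest = data[i + 1:].lstrip()
            break
        if ch in HEX_DIGITS:
            nibbles.append(int(ch, 16))
        elif ch in RUNE_DIGITS:
            nibbles.append(RUNE_DIGITS.index(ch))
        # anything else (spaces, other characters) is ignored
    words = []
    for start in range(0, len(nibbles) - len(nibbles) % 4, 4):
        word = 0
        for nibble in nibbles[start:start + 4]:
            word = (word << 4) | nibble  # most significant nibble first
        words.append(word)
    return words, rest


print(parse_words("0000 0000 0000 0000 0800 ! code 8104"))
```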
Additional commands are available as runes:
most of these take a prefix of 0 to 3 hex digits (or hex runes)
as a parameter.
A parameter of 0 still runs the rune action at least once.
[N]ᚢ => skip forward in the address space by N
ᚨ => skip forward a word without writing or affecting other input
[N]× => right align and write current value N
[N]ᚲ => left align and write current value N
the parameter to the align runes is treated as an incomplete word,
which is mainly useful for short values
[N]ᚠ => write N words consisting of 0000
[N]ᚱ => repeat the "last written" value 1 or N times
the ᚠ rune does not affect "last written"
simple-ish program to clear/set shared memory
write 0000 0000 0000 0000 0800 ! code 8104 f005 0817 f555