Automated parameter parsing in PROMAL

Or, decrypting the unencrypted cryptic code

I decided to spend some more time looking at how PROMAL works internally. The next routine I decided to examine is BLKMOV, as its function is similar to the MOVSTR function examined earlier.

Let’s have a look at the jump table to find BLKMOV:

EXT ASM PROC BLKMOV AT $F30
0f30  4C A9 21    JMP $21A9

Easy enough! Let’s have a look to see how BLKMOV initializes itself, and to see if it accepts a 16-bit length.

21a9  20 15 18    JSR $1815
21ac  33 E0       RLA ($E0),Y
21ae  38          SEC
21af  34 2C       NOOP $2C,X
21b1  A5 34       LDA $34
21b3  38          SEC
21b4  E5 2C       SBC $2C
21b6  A8          TAY
21b7  A5 35       LDA $35
21b9  E5 2D       SBC $2D
21bb  AA          TAX
21bc  98          TYA
21bd  C5 38       CMP $38
21bf  8A          TXA
21c0  E5 39       SBC $39

Wait. WHAT? (Almost) no professional programmer would use illegal opcodes for a final product. The RLA and NOOP $2C,X are invalid opcodes. Also, I’ve tested PROMAL and found that it works well on the CMD SuperCPU accelerator, which will behave differently with illegal opcodes and cause PROMAL to crash.

Caller relative addressing
Let’s see what exactly is going on with 1815. I’ve studied the routine ahead of time and added some helpful comments and names to the disassembly. Because this routine is used by many library routines, I will refer to the caller as “the parent”. After analyzing the routine I’ll provide BLKMOVs results to help clarify the routine’s work.

                                        ; we are always a grandchild!
                                        ; parameter = value from p-code interp
                                        ; argument  = value from caller's table
1815: 68          getprm  PLA           ; get parents address
1816: 85 3A               STA Parent    ;
1818: 68                  PLA           ;
1819: 85 3B               STA Parent+1  ;   and save for our use
181B: 68                  PLA           ; get grandparent's address
181C: 85 6D               STA GParent   ;
181E: 68                  PLA           ;
181F: 85 6E               STA GParent+1 ;   and save for later restoration
1821: 84 6A               STY numparms  ; save number of parameters
1823: A0 01               LDY #$01      ; initialize index
1825: B1 3A               LDA (Parent),Y; get min/max parameters

I knew something was odd. This is an uncommon but handy trick: The data after the JSR in the parent is never executed. If you examine 1816, you see that it stores the parent’s address from the stack as MOVSTR did. It then stores the parent’s parent’s return address (‘GParent’) so that it can get to the parameters on the stack later. After that’s all set up, the number of parameters from PROMAL is saved to numparms and the Y index is initialized to 1.

Why all this work? The method is used when resources are tight: We have parameters on the stack that get processed by many routines in the library. It’s best for code space efficiency if a single routine handles these parameters. However, not all library routines use the same number or even type of parameters. That’s where this routine comes in. The arguments for the ‘getprm’ routine are stored after the JSR from the library routine calling it. This way each library routine will be able to specify what type of information it expects to find on the stack.

On arguments and parameters
In this post I need to distinguish between two things: The data used by the getprm routine, and the data the parent needs from the stack. In this case, ‘argument’ refers to data used by getprm, and ‘parameter’ refers to any data passed on the stack by PROMAL. This is done in consistency with the MOVSTR post.

Let’s have a good look at this routine to understand what it does.

Setting things up
We already have the calling routines’ addresses safely shuffled away, and we have our first argument retrieved from the parent.

1827: 29 0F               AND #$0F      ;  Mask max #parms off
1829: 85 69               STA maxparms  ;   and save
182B: C5 6A               CMP numparms  ;  compare with paremeter count
182D: 90 10               BCC getperr   ;   too many, runtime error

In the segment above, the argument loaded from the parent is masked off and saved. Studying the routine ahead of time helped me understand that the low half of the first argument is the ‘minimum’ number of arguments the parent requires. If the number of arguments provided by PROMAL (‘CMP numparms’) is larger than the maximum, the routine branches off to a fatal runtime error (‘BCC getperr’).

182F: B1 3A               LDA (Parent),Y; get min/max parameters
1831: F0 40               BEQ getpfin   ;  0/0 parms? exit.

The argument is reloaded since it was mangled when setting up maxparms. While it’s loaded and un-mangled, the routine checks to see if there are no parameters to be loaded. If this is the case, the routine exits. It would seem to make no sense to call this routine if you don’t want any parameters. I’d agree, but there must be a good reason to call it in this fashion as a few library routines do just that.

1833: 4A                  LSR           ;
1834: 4A                  LSR           ;
1835: 4A                  LSR           ;
1836: 4A                  LSR           ;
1837: 85 6B               STA minparms  ;  save min #parms
1839: C5 6A               CMP numparms  ;  compare with parameter count
183B: F0 05               BEQ getpok    ;  same? ok.
183D: 90 03               BCC getpok    ;  more than min parms? ok.
183F: 4C 80 10    getperr JMP syserr    ; fail out via system error

Now, the upper half of the first argument is shifted down and store in ‘minparms’. It’s again compared to the numparms value, this time to determine if there are at least the correct number of parameters (beq: bcc).  If not, the routine fails through to a jump to PROMAL’s fatal runtime error routine.

1842: C8          getpok  INY           ; Increment index
1843: B1 3A               LDA (Parent),Y; Get mask bits

There will be a lot of INY : LDA (Parent),y as the routine works its way through the argument table.

1845: 85 6C               STA getpmsk   ; Save in mask byte

The second argument is stored to getpmsk, short for ‘getprm mask.’ This byte is actually eight flags, each indicating the type of parameter to get from the stack. There are two types of data and one way to work with each. As a quick reminder, PROMAL always pushes parameters as words, even when they’re bytes.

Bit = 0    Parameter is a byte
           The next argument byte is a zero-page address and a default value.
           * Store this byte where specified at the address
           * Load and discard high byte from stack if applicable
           * Load and store low byte from stack at the address if applicable
Bit = 1    Parameter is a word
           This argument is one zero-page address.
           * Load and store the high byte from stack at the address+1
           * Load and store the low byte from the stack at the address

The routine appears to not have any facilities for handling a default 16-bit value.  It’ll be up to the parent to detect a missing 16-bit parameter and set up a default value in its place.

Processing arguments and setting up parameters
At this point, the routine is initialized and ready to load parameters as specified by the parent until it’s out of arguments.

1847: C6 69       getpl   DEC maxparms  ; Decrement parameter count
1849: 30 28               BMI getpfin   ;  Out of parms? exit.

Maxparms is now used as a count-down value to determine when the routine’s out of arguments. The name of the location is a bit of a misnomer, I apologize.

184B: C8                  INY           ;
184C: B1 3A               LDA (Parent),Y; get zp address
184E: AA                  TAX           ;  and save

The arguments now always start with a zero page address. This is read from the argument table and saved in the X register to be used as an index. This allows the code to run without modifying itself and is a good example of advanced indexing when used in this situation.

184F: A5 69               LDA maxparms  ; check max parms
1851: C5 6B               CMP minparms  ;  are we out of required parms?
1853: 90 0F               BCC gpfprm    ;  No, go pull it off the stack.

In this section of the loop, maxparms is compared with minparms to determine whether or not we’re out of required parameters.

1855: 24 6C               BIT getpmsk   ; is current parm a word?
1857: 30 05               BMI gpisw     ;  yep, skip

Remember the paramter type mask? This is one of the two checks against the flags in the loop. The BIT instruction does a handful of things, but of interest to the routine is the way it copies bit 7 of getpmsk to the negative flag without modifying any other registers. In this case, if a parameter is a word the negative flag gets set and the BMI (branch if minus) routes the cpu to gpisw (short for getparm is word), below.

1859: C8                  INY           ;
185A: B1 3A               LDA (Parent),Y; get default value or low byte
185C: 95 00               STA 0,X    ;  store at zp address
185E: A5 69       gpisw   LDA maxparms  ; is current argument
1860: C5 6A               CMP numparms  ; greater than parameter count?
1862: B0 0A               BCS gpdefl    ;  Yes, process default value

The next argument byte is loaded if it’s a ‘byte’ type. It’s stored at the zero page location pointed to by X, which was read in earlier. Then it follows through to ‘gpisw’, which checks the current argument against the number of parameters provided to the parent by PROMAL. If we’re out of parameters, we skip off to gpdefl, which is short for ‘getparm default’.

1864: 68          gpfprm  PLA           ; Get parameter from stack
1865: 24 6C               BIT getpmsk   ; current parm = word?
1867: 10 02               BPL gpis8     ;  no, skip high byte store
1869: 95 01               STA 1,X   ;  * store high byte if word
186B: 68          gpis8   PLA           ; get low byte or default value
186C: 95 00               STA 0,X    ; store where requested

In a moment whose reason eludes me, I called this branch point gpfprm. What this section does is first pull the high byte of the next parameter from PROMAL off the stack and then check the parameter type mask to see if it’s a byte type. If so (BPL, as bit 7 would be a zero its plus or positive), it skips to gpis8, discarding the byte. If it’s a word, it gets stored to X+1 by using a base of 1 instead of 0.
Gpis8 always pulls and stores the byte to the location indicated by X.

Defaults!
This method is clever: The routine first loads the default value from the argument block into memory, and then only loads a value if there’s one available on the stack. It’s a good way of ensuring a default is in place if it’s not specified by PROMAL.

186E: 06 6C       gpdefl  ASL getpmsk   ; shift parameter bit mask
1870: 4C 47 18            JMP getpl     ; loop

The argument mask is shifted one to the left to ensure it stays in sync with the argument index in Y. Then, the loop is restarted.

Cleaning up

1873: 98          getpfin TYA           ; transfer index to A
1874: 18                  CLC           ; pre for math
1875: 65 3A               ADC Parent    ; Add our parent's return address
1877: AA                  TAX           ;
1878: A5 3B               LDA Parent+1  ;
187A: 69 00               ADC #$00      ;
187C: 48                  PHA           ;  and put on stack
187D: 8A                  TXA           ;
187E: 48                  PHA           ;  for rts.

As we don’t want to return into a data block, we’ll add our current value for Y to the parent’s calling address and put it on the stack. This ensures we safely RTS into the byte following the argument table.

187F: A5 6D               LDA GParent   ; get grandparent's address
1881: 85 3A               STA Parent    ; place where p-code expects parent's
1883: A5 6E               LDA GParent+1 ;
1885: 85 3B               STA Parent+1  ;
1887: 60                  RTS           ; and return to arg table+1

And as a last bit of cleanup, the grandparent’s address is placed where our parent would expect it to be, leaving the runtime in a good state and keeping the stack clear.

How BLKMOV used this routine
BLKMOV used this routine to set up all of its zero page vectors. Once getprm is done, the routine looks largely like MOVSTR, so I’ll (probably) cover it later.

Here’s what getprm did for BLKMOV. I’ll include the first part of BLKMOV again, with a bit better formatting since we know what the data following the JSR is for.

21a9  20 15 18    JSR $1815     ; jsr to getprm
                  .byte $33     ; min/max number of parameters
                  .byte $e0     ; %1110 0000 - all three parameters are words
                  .byte $38     ; first word stores at $38 and $39
                  .byte $34     ; second word stores at $34 and $35
                  .byte $2c     ; third word stores at $2c and $2d

When getprm runs for BLKMOV, it performs these actions:
* Writes the last parameter (Count) to $38 and $39
* Writes the second parameter (From) to $34 and $35
* Writes the first parameter (To) to $2c and $2d
* Cleans house and returns to the byte following the argument table at 21b1

This might look a little familiar. MOVSTR uses the same vectors.

In summary
The routine is very handy in that you can easily specify what you need loaded, as well as quickly specifying how many parameters you require and how many you can take. The limit for the number of parameters is sensibly eight, given the mask argument is a byte providing 8 flags for parameter types.

The getprm routine is used by many routines in the PROMAL system, including (but not limited to) GETC, GETL, BLKMOV, OPEN, CLOSE, CHKSUM, and EDLINE.

Advertisements

Moving data in PROMAL

or, Losing your mind with PROMAL

Learning how things work

In a recent experiment in learning to work with PROMAL, I needed a method for moving pieces of data around in memory to split strings. The MOVSTR library procedure seemed ideal, but consistently missed the mark and corrupted memory.

As it turns out, I had an addressing issue as well as a misunderstanding of what the procedure will do for me.

For our reference, I’ll quote the MOVSTR proc’s documentation from the Library Manual.

PROC MOVSTR    COPY OR JOIN STRINGS OR EXACT SUBSTRING

USAGE: MOVSTR FromString, ToString [,Limit]

MOVSTR is a procedure which is used to copy strings, to concatenate strings, or extract substrings (i.e., replaces the LEFT$, MID$, and RIGHT$ functions found in BASIC).  FromString is the address of the string to copy.  ToString is the address of the destination.  Limit is an optional argument specifying the maximum number of characters to copy.

This brings up some useful syntax in PROMAL:  Specifying the address of a string.  In my project, I needed to extract the middle of a string and deposit into another variable.  My first attempt used this method:

movstr buf[3], name, 16

A person would think then, movstr would copy from buf[3] to buf[19] into name, but this was not the case.  After some deep debugging of the PROMAL library routine itself, I learned that I was in fact telling PROMAL to use the address at buf[3] and buf[4] as the source for the string to put into name.  This is an inconsistency in addressing that was learned:  When I specify ‘movstr buf,name,16’ it will use the location of buf[], but if I use ‘movstr buf[3],name,16’ it instead uses a vector placed at buf[3].  To fix this issue, use the # operator to specify ‘the address of…’:

movstr #buf[3], name, 16

The alternate format tells the compiler to use the address of buf[3] instead of a vector at the same location.

Learning inner workings through an assembly debugger

Disclaimer: Most 8-bit fans will balk at using an emulator to develop programs for their beloved 8-bit systems.  I do heavily prefer to develop on the machine itself, but there’s little that beats a debugging system that will stop the system cold: video refresh, hardware timers, everything gets paused.  As I was having trouble doing local development, I moved my data to the Vice emulator and got to work.

The MOVSTR function

In the library, the definition for MOVSTR is ‘EXT ASM PROC MOVSTR AT $F33’.  This tells the compiler that MOVSTR is a procedure that can be called directly in memory at location $0f33.  In my particular installation of Vice, Alt-H opens the debugger, pausing the emulation.  A quick look at f33 will show that it’s part of a jump table:

(C:$f33e) d f33
.C:0f33  4C 38 22    JMP $2238
.C:0f36  4C 18 22    JMP $2218
.C:0f39  4C DF 26    JMP $26DF

Of course, the only real interest is the first instruction: jmp $2238.  Let’s have a look there.

Exploratory surgery (or, finding out how PROMAL thinks)

At $2238 is a fairly straightforward routine.  For reference, the code below is being called by my test program after things are in working order as that’s the only debug log I saved.  There’s still a lot to learn!

Here’s the processor registers when the call is made: a=3 x=3 y=3 sp=f7

First, promal saves the calling address in a scratch space so it can return to the caller:

2238  68          PLA
2239  85 3A       STA $3A
223b  68          PLA
223c  85 3B       STA $3B

Now, we can examine the stack and prepare something ahead of time:

223e  A9 FF       LDA #$FF
2240  C0 03       CPY #$03
2242  D0 03       BNE $2247

At this point, I recognize the #03:  Movstr can have two or three parameters, so apparently the Y register holds the number of parameters for the function.  I specified three in my application, so this falls through to the next instruction:

2244  68          PLA
2245  68          PLA
2246  88          DEY
2247  85 38       STA $38

At first, this confused me greatly.  Why would you pull two bytes from the stack without saving the first?  As it turns out, the third parameter is only supposed to be a byte, rather than a word.  However, the compiler apparently always pushes words to the stack.  The first PLA simply pulls the unused high byte of the word and discards it.  DEY is a setup for the next compare below:

2249  C0 02       CPY #$02
224b  F0 03       BEQ $2250
224d  4C 80 10    JMP $1080

Here’s the second check.  Remember, movstr can have two or three parameters.  Here, Y is checked to see if it’s 2.  If it is, the jmp is skipped.  For reference, $1080 is a runtime error routine.  I checked by entering ‘go 1080’ in the promal executive.  PROMAL replied with this:

*** RUNTIME ERROR: ILLEGAL # ARGS, LIB. CALL
AT $C3F3
*** PROGRAM ABORTED.

Continuing to $2250, the routine then gathers more information:

2250  68          PLA
2251  85 35       STA $35
2253  68          PLA
2254  85 34       STA $34
2256  68          PLA
2257  85 2D       STA $2D
2259  68          PLA
225a  85 2C       STA $2C

At this point, the MOVSTR routine has everything set up for the routine below.  The [limit] was processed early on, and now the [tostring] and [fromstring] parameters are stored in zero-page as well.  Tearing apart the actual copy routine is beyond the scope of this post, but I’ll include it for reference.

225c  A5 34       LDA $34
225e  38          SEC
225f  E5 2C       SBC $2C
2261  AA          TAX
2262  A5 35       LDA $35
2264  E5 2D       SBC $2D
2266  D0 1F       BNE $2287
2268  8A          TXA
2269  C5 38       CMP $38
226b  B0 1A       BCS $2287
226d  A0 00       LDY #$00
226f  B1 2C       LDA ($2C),Y
2271  F0 0A       BEQ $227D
2273  C8          INY
2274  C4 38       CPY $38
2276  90 F7       BCC $226F
2278  A9 00       LDA #$00
227a  F0 03       BEQ $227F
227c  88          DEY
227d  B1 2C       LDA ($2C),Y
227f  91 34       STA ($34),Y
2281  C0 00       CPY #$00
2283  D0 F7       BNE $227C
2285  F0 15       BEQ $229C
2287  A0 00       LDY #$00
2289  A5 38       LDA $38
228b  F0 0D       BEQ $229A
228d  B1 2C       LDA ($2C),Y
228f  91 34       STA ($34),Y
2291  F0 09       BEQ $229C
2293  C8          INY
2294  C4 38       CPY $38
2296  90 F5       BCC $228D
2298  A9 00       LDA #$00
229a  91 34       STA ($34),Y

Remember how we started?  We stored the return address at $3a so we could examine the parameters on the stack.  To return, an internal routine is then run, which does the work of putting the calling routine back on the stack and returning:

229c  4C 69 20    JMP $2069
[at $2069]
2069  A5 3B       LDA $3B
206b  48          PHA
206c  A5 3A       LDA $3A
206e  48          PHA
206f  60          RTS

In Summary…

Lesson learned?  The [limit] parameter for MOVSTR has a maximum value of 255, and one has to be very careful about how the parameters are specified.  We don’t have the luxury of a memory protection unit that modern systems have, so an incorrectly specified parameter can cause the whole environment to be overwritten at random.

Also, if you looked carefully at the entire routine, you’ll notice that the copy will stop on the first null ($00) byte it finds.  As it’s a ‘string’ move rather than a block move, it makes sense considering PROMAL uses ‘ascii-z’ strings.

I also got a good chance to see how exactly the PROMAL compiler passes its data to procedures via the stack.  What I learned is confirmed in the promal language indexes, specifically the section on calling external assembly functions and procedures.