Bringing the Espressobin up

Or, why did I buy this again?  Oh, right- it looked cool.

The Espressobin I purchased via the Globalscale Kickstarter arrived Saturday, and I finally took the time to work with it.  It’s a rather neat machine that solves an issue I had with small system boards: I wanted a small-format, low-power, router-capable system, and SBCs either had a single Ethernet port or were too costly to consider.  The three independent Ethernet ports and 1GB of RAM mean I won’t have any concern about resources.  Additional hardware details for the Espressobin are well covered on Globalscale’s website.

First, a bit of stumbling.

The bringup instructions for their board cover using Ubuntu 14.04 in good detail, but I wanted to use 16.04 as it’s more recent.  Here, I’ll cover the issues I had and how I solved them.  Credit goes to Globalscale for their Ubuntu 14.04 instructions, on which these are based.

  • First, there’s apparently no binary kernel on their website, so I used their instructions for installing the toolchain and kernel source to build a working kernel for the board.  To keep things simple, I just used the defconfig as instructed.
  • Next, I needed some storage for the board.  The version shipped to me doesn’t have an eMMC chip mounted, so I got an SD card and inserted it into my computer.  Ubuntu automatically mounts the SD card, so I had to unmount it before continuing, since I’ll be changing the filesystem on it.
  • Fdisk was used to change the partition type to 83 (Linux).  Using fdisk on Linux should be well understood, and I’ll excuse myself from covering it in detail as I’m not generally a good instructor.  Note that I only edited the partition type tag rather than zeroing out the MBR and repartitioning the card as Globalscale directs.  This assumes the SD card manufacturer aligned the first block of the partition on one of the card’s flash erase-block boundaries.  Re-creating the partition may start it in the middle of an erase block and cause filesystem allocation blocks to span multiple erase blocks, reducing performance significantly.  This theory is untested, so your results may vary.  I believe the manufacturer knows how the partition should be aligned and ships the card appropriately configured.
  • Next I used mkfs.ext4 on the SD card’s partition (mmcblk0p1 in my case, as I have an onboard SD card reader), since the u-boot config on the board can read ext4 filesystems directly.  I ejected the SD card, waited a few seconds for Ubuntu to notice, and re-inserted it to get it to remount.
  • Time to drop Ubuntu into place.  Since there’s no installer for this board, I got the 16.04 aarch64 (arm64) tarball from Ubuntu and unpacked it onto the SD card.  Be VERY careful, as it contains a full root filesystem.  If it’s errantly unpacked in the root directory of another Unix-like system it WILL destroy that system unrecoverably.  Danger-danger, flailing arms, all that.  Seriously.
  • Now that we have a somewhat working root filesystem on the card, we need to copy in the kernel we built using Globalscale’s instructions.  Copying the kernel is also covered on their website, so I won’t repeat it here.
  • The Ubuntu 14.04 instructions for setting up ttyMV0 will apply to this install, and will offer us a login prompt.  Add the content below to /etc/init/ttyMV0.conf.

start on stopped rc or RUNLEVEL=[12345]
stop on runlevel [!12345]
exec /sbin/getty -L 115200 ttyMV0 vt100 -a root

  • All that was left was to resolve some dependencies for smoothly running apt and debconf.  The stock 16.04 tarball is very minimally installed and configured, so some additional packages and configuration were necessary to get a quietly running system.  These steps are below.
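The erase-block alignment theory from the fdisk step comes down to simple arithmetic: a partition is aligned if its byte offset is a whole multiple of the erase-block size.  Here is a minimal Python sketch of that check.  It assumes 512-byte sectors, which is the unit Linux uses for the partition start it reports in /sys/block/mmcblk0/mmcblk0p1/start.  The 4 MiB erase-block size is a hypothetical value for illustration, because real cards vary.

```python
SECTOR_BYTES = 512  # Linux reports partition start offsets in 512-byte sectors


def partition_offset_bytes(start_sector: int) -> int:
    """Byte offset of a partition from its starting sector number."""
    return start_sector * SECTOR_BYTES


def is_erase_aligned(start_sector: int, erase_block_bytes: int) -> bool:
    """True if the partition begins exactly on an erase-block boundary."""
    return partition_offset_bytes(start_sector) % erase_block_bytes == 0


# Hypothetical erase-block size; check your card's documentation.
ERASE_BLOCK = 4 * 1024 * 1024

for start in (8192, 2048, 63):
    print(start, is_erase_aligned(start, ERASE_BLOCK))
```

With a 4 MiB erase block, a partition starting at sector 8192 is aligned.  The common 1 MiB start (sector 2048) and the old DOS-style sector 63 are not.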

At this point the startup went swimmingly, indicating that Ubuntu 16.04 runs happily with the kernel source and toolchain Globalscale provided.  However, ttyMV0.conf was timing out.  Research turned up some things that were missing, most notably udev.  Also, Ubuntu 16.04’s stock getty doesn’t appear to auto-login, so we’ll have to set a root password and permit root logins on the serial console.

[add prose to seem clever] [possibly leave this here for a chuckle] [okay, chuckle it is.]

The steps:

In the end, the steps I took to build a brand new image are below.  Hopefully I didn’t forget to document any of them.

  • Insert the SD card and discover its device name.  Mine’s fixed at /dev/mmcblk0 since it’s not a USB-type reader.
  • Unmount the partition that gets automatically mounted.
  • Use fdisk to tag partition 1 on the card as type 83 (Linux), and mkfs.ext4 to reformat the partition.
  • Remount the SD card and cd to its mountpoint.
  • host$ curl /ubuntu-base/releases/16.04.2/release/ubuntu-base-16.04.2-base-arm64.tar.gz | tar zxvf -    DANGER! Make sure you’re in the SD card’s directory before doing this.  It streams the tarball from the web rather than downloading it first, so it unpacks immediately into wherever you type the command.
  • host$ apt install qemu-user-static on the host system so we can run arm64 binaries
  • host$ cp /usr/bin/qemu-aarch64-static usr/bin/ on the SD card so the system will find it during the chroot setup.
  • host$ cp /etc/resolv.conf etc/resolv.conf so that DNS will work within the card’s environment
  • host$ chroot . (include the dot at the end; it tells chroot to use the current directory as the new root).  The prompt won’t change much, so use uname -m to make sure you see aarch64 instead of your host’s native architecture.
  • chroot$ apt update
  • chroot$ apt install net-tools udev ifupdown vim iputils-ping less whiptail apt-utils    Note that many warnings and errors may come up during this step.  They have a lot to do with being in a chroot without /proc mounted, and a little to do with just being in a chroot jail.
  • chroot$ passwd root
  • chroot$ echo ttyMV0 >> /etc/securetty
  • chroot$ exit
  • Install the kernel and dtb as instructed on Globalscale’s page
  • Set up u-boot as instructed on Globalscale’s page
  • Sync, unmount, insert the card into the Espressobin, and hopefully smile as you see Ubuntu boot up and request a login on the USB serial console.
  • Log in as root, using the password you entered earlier, and then fix language warnings by typing: locale-gen en_US.UTF-8
  • Disable unusable VTYs with: for i in 1 2 3 4 5 6; do systemctl mask getty@tty$i.service; done
  • Edit or create /etc/network/interfaces (you may need to install your favorite editor first) and add:

auto eth0
iface eth0 inet manual

auto lo
iface lo inet loopback

auto lan1
iface lan1 inet dhcp
pre-up /sbin/ifconfig lan1 up

My biggest stumbling block was networking failing to start.  I discovered that lan0, lan1, and wan won’t work unless eth0 is up.  The network tools bring the interfaces up in parallel, so lan1 can come up before eth0 is ready, and then it stays down.  The key here is the pre-up statement on lan1, which brings the interface up manually before dhcp runs.

  • set /etc/hostname to something meaningful (in my case, I used espressobin)

In closing

So what am I going to do with this machine?  It’ll likely serve as a router in the future.  Embedded routers are great for most people, but I use a DHCP configuration that I haven’t figured out how to do with the options offered by factory firmware, IPFire, or *WRT distributions.  Also, I have a mild preference for managing most things via ssh rather than a web interface.  It’s a matter of personal preference, as it’s how I initially learned to do many things in Linux, including setting up NAT under ipfwadm and then ipchains.

Before I use the system as a router I’ll explore its other capabilities, such as the apparently fully functional mini PCI Express slot and the SATA interface.  I’d like to someday be brave enough to try mounting an eMMC chip on the pads Globalscale left in place.  Maybe.

My only real issue with the board is the little power LED.  I could probably hang it above an average solar panel and power a house with the light it puts out.  Okay, probably not.  However, it is blindingly bright.  Hopefully Globalscale will revise the BOM with a larger current-limiting resistor so it doesn’t go to solar mode while powered up.

Thanks for reading. 🙂


Programming an EEPROM on a 6502 Microprocessor

Or, updating our firmware internally like a proper computer

The Kickstarter edition of the VIC-1112 IEEE-488 adapter includes an EEPROM to enable software updates, or even general firmware hacking, without removing the IC.  These devices are simple to program, but not quite as simple as POKEing RAM.  The ICs need some time to complete a write before they can be accessed again, and they indicate that the write has finished by returning the data that was written to them.

Let’s explore the process of writing to an EEPROM using a generic 6502 system as a model, and expand on using a VIC-20 to do the work later.

I tested some simple write code using byte mode to simplify the code’s logic.  Since the device is only 8,192 bytes it should finish in a reasonable time in spite of not using page mode.

Here’s my first test, a simple program to write 256 bytes in a row to the EEPROM.

zpsrc   =   $42         ; zp source pointer
zpdst   =   zpsrc+2     ; zp pointer to EEPROM
eeprom  =   $A000       ; just to have it defined
code    =   $2100       ; etc.

        *=  $2000
prog    ldy #<code
        sty zpsrc
        ldy #>code
        sty zpsrc+1
        ldy #<eeprom
        sty zpdst
        ldy #>eeprom
        sty zpdst+1

        ldy #$00
wloop   lda (zpsrc),y   ; get source
        sta (zpdst),y   ; tell EEPROM to write
cloop   cmp (zpdst),y   ; check to see if it's done
        bne cloop       ; nope, check again

        iny             ; next byte
        bne wloop       ; until .Y wraps back to zero
        rts             ; 256 bytes programmed.

This works, but what if the code erroneously tries to program I/O, open space, or ROM?  What if the EEPROM is faulty and never responds with success?  The routine fails badly by never moving forward.  This is bad practice, as the user waiting on the EEPROM is left completely in the dark.  Let’s add a timeout to the loop:

wloop   lda (zpsrc),y   ; get source
        sta (zpdst),y   ; tell EEPROM to write
        ldx #$00        ; prepare timeout and give EEPROM a little more time
cloop   cmp (zpdst),y   ; check to see if it's done
        beq wdone       ; yes, done with this byte
        inx             ; count loops
        beq timeout     ; did we overflow? We failed!
        bne cloop       ; keep it as relocatable as we can

wdone   iny

The change is pretty straightforward: the .X register is used as a timer, since we can’t assume a generic system has a hardware timer available.  Datasheets indicate a write takes 10ms at most.  Let’s allow twice that.  At 1MHz, that means the timeout must not give up until at least 20,000 cycles have passed.
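To see the poll-with-timeout logic outside of assembly, here’s a rough Python model.  The FakeEEPROM class is hypothetical.  It assumes a write finishes after a fixed number of polls, and that until then a read of that address returns the written byte with bit 7 inverted (DATA-polling behaviour in the style of the 28C64; check your part’s datasheet).  A "stuck" device never finishes, standing in for ROM, open space, or a faulty chip.

```python
class FakeEEPROM:
    """Hypothetical EEPROM model: a write finishes after `busy_polls` reads.
    While busy, reading the address being written returns the byte with
    bit 7 inverted, in the style of 28C64 DATA polling."""

    def __init__(self, size=8192, busy_polls=3, stuck=False):
        self.mem = bytearray(size)
        self.busy_polls = busy_polls
        self.stuck = stuck            # True: never finishes (ROM, faulty chip)
        self.pending = None           # [address, value, polls remaining]

    def write(self, addr, value):
        self.pending = [addr, value, self.busy_polls]

    def read(self, addr):
        if self.pending and self.pending[0] == addr:
            _, value, left = self.pending
            if self.stuck or left > 0:
                if not self.stuck:
                    self.pending[2] = left - 1
                return value ^ 0x80   # still busy: bit 7 inverted
            self.mem[addr] = value    # write has completed
            self.pending = None
        return self.mem[addr]


def write_byte(dev, addr, value, max_polls=256):
    """Store one byte, then poll it like cloop; False means we timed out."""
    dev.write(addr, value)
    for _ in range(max_polls):        # .X counts up until it wraps to zero
        if dev.read(addr) == value:   # cmp (zpdst),y / beq wdone
            return True
    return False                      # beq timeout


print(write_byte(FakeEEPROM(), 0x10, 0x5A))            # True
print(write_byte(FakeEEPROM(stuck=True), 0x10, 0x5A))  # False
```

The important property is the same as in the assembly: the routine always returns, either with the byte verified or with a failure the caller can report.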

Let’s evaluate the loop’s minimal timing.

        sta (zpdst),y   ; doesn't count, this is where we start
        ldx #0          ; 2
cloop   cmp (zpdst),y   ; 5 (6 if page boundaries are crossed)
        beq wdone       ; 2 (exiting the loop doesn't count)
        inx             ; 2
        beq timeout     ; 2 (again, exiting doesn't count)
        bne cloop       ; 3

wdone   iny             ; we're out of the loop here.

The total count inside the loop is fourteen cycles.  Since it will loop a maximum of 256 times before giving up, that’s 3,584 cycles, or about 3.6ms at 1MHz.  We need to add another 16,416 cycles (20,000 minus 3,584) to the overall loop, which is 65 cycles per loop, minimum (16,416 / 256 = 64.125, rounded up).  Since the loop exits as soon as the write’s complete, waiting longer is harmless.

There are a few ways to waste some time.  NOP’ing it out is inefficient: at two cycles each, it would take 33 NOPs.  How about a nested loop?  We’re out of registers: .A holds our value to compare, .X is the failure counter, and .Y is our index.  We’ll have to save and restore a register each time the loop runs, which is easiest to do with .A on NMOS 6502s (PHX and PHY only arrived with the CMOS 65C02).  We’ll use it for our nested loop counter.
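The timing budget is easy to fumble by hand, so here is the arithmetic in Python.  It assumes a 1MHz clock, a 20ms timeout (twice the 10ms datasheet figure), and the 14-cycle poll loop counted with no page-crossing penalties.

```python
import math

CLOCK_HZ = 1_000_000          # assumed 1 MHz 6502
TIMEOUT_S = 2 * 0.010         # twice the 10 ms datasheet maximum
POLL_LOOP_CYCLES = 14         # cmp + beq + inx + beq + bne, no page crossings
ITERATIONS = 256              # .X counts from $00 until it wraps back to $00

budget = round(CLOCK_HZ * TIMEOUT_S)                 # cycles we must wait at least
base = POLL_LOOP_CYCLES * ITERATIONS                 # what the bare loop provides
shortfall = budget - base                            # cycles still missing
extra_per_loop = math.ceil(shortfall / ITERATIONS)   # padding needed per iteration
nops_needed = math.ceil(extra_per_loop / 2)          # a NOP costs 2 cycles

print(budget, base, shortfall, extra_per_loop, nops_needed)  # 20000 3584 16416 65 33
```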

Here’s our proposed nested loop:

        pha             ; 3     save .A for compare instruction
        lda #$00        ; 2     preset counter
dloop   clc             ; 2
        adc #$01        ; 2     count up
        bne dloop       ; 3     until overflow
        pla             ; 4     restore .A for compare instruction

This requires 5 cycles on entry and 3 on exit.  The exit may look off; it’s offset for the ‘bne dloop’ only taking two cycles instead of three when falling through.  The center is 7 cycles, occurring 256 times.  That’s 1,792 in all, or about 1.8ms at 1MHz.  Adding the head and tail brings us to 1,800 cycles in all.

With this, our compare loop becomes quite efficient at wasting time.  Its interior grows from 14 cycles to 1,814 cycles, and repeating 256 times gives a delay of 464,384 cycles.  That offers nearly half a second at 1MHz for the EEPROM to finish its write before the routine gives up on it.  Perfect!
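Cycle counts like these are easy to get wrong, so here’s a small Python sketch that steps through the delay loop and the full timeout and totals them.  It uses the standard NMOS 6502 timings: PHA 3, PLA 4, CLC, ADC #imm and LDA #imm 2 each, a taken branch 3, and a branch that falls through 2.  It assumes no page-crossing penalties.

```python
# Standard NMOS 6502 cycle counts (no page-crossing penalties assumed).
PHA, PLA, LDA_IMM, CLC, ADC_IMM = 3, 4, 2, 2, 2
BRANCH_TAKEN, BRANCH_NOT_TAKEN = 3, 2


def delay_cycles(start=0x00):
    """Total cycles for: pha / lda #start / dloop: clc, adc #1, bne dloop / pla."""
    cycles = PHA + LDA_IMM
    a = start
    while True:
        cycles += CLC + ADC_IMM
        a = (a + 1) & 0xFF                  # 8-bit accumulator wraps at $FF
        if a:
            cycles += BRANCH_TAKEN          # bne dloop taken
        else:
            cycles += BRANCH_NOT_TAKEN      # fall through on overflow
            break
    return cycles + PLA


def timeout_cycles():
    """Worst case for the full cloop: 256 polls, each followed by one delay."""
    ldx = 2
    per_poll = 5 + 2 + delay_cycles() + 2   # cmp, beq (not taken), delay, inx
    # On the first 255 polls, beq timeout falls through (2) and bne cloop is
    # taken (3).  On the 256th, beq timeout is taken (3) and bne never runs.
    return ldx + 255 * (per_poll + 2 + 3) + (per_poll + 3)


print(delay_cycles(), timeout_cycles())          # 1800 464384
print(f"{timeout_cycles() / 1_000_000:.3f} s at 1 MHz")
```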

Let’s see the entire routine:

zpsrc   =   $42         ; zp source pointer
zpdst   =   zpsrc+2     ; zp pointer to EEPROM
eeprom  =   $A000       ; just to have it defined
code    =   $2100       ; etc.

        *=  $2000
prog    ldy #<code
        sty zpsrc
        ldy #>code
        sty zpsrc+1
        ldy #<eeprom
        sty zpdst
        ldy #>eeprom
        sty zpdst+1

        ldy #$00        ; preset our index
wloop   lda (zpsrc),y   ; get source
        sta (zpdst),y   ; tell EEPROM to write
        ldx #$00        ;  prepare timeout and give EEPROM a little more time
cloop   cmp (zpdst),y   ;  check to see if it's done
        beq wdone       ;  yes, move on to the next byte
        pha             ;   save .A for compare instruction
        lda #$00        ;   preset delay counter
dloop   clc             ;
        adc #$01        ;   count up
        bne dloop       ;   until overflow
        pla             ;  restore .A for compare instruction
        inx             ;  increment our timer
        beq timeout     ;   did we run out of time? Error out
        bne cloop       ;  not yet, poll the EEPROM again

wdone   iny             ; next byte
        bne wloop       ; until all 256 bytes are written
        clc             ; indicate things are fine.
        bcc exit        ; and exit
timeout sec             ; indicate write error
exit    rts             ; carry clear: 256 bytes programmed; carry set: timeout
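Here is a Python sketch of the complete routine’s control flow that can be used to check that the carry-flag result behaves as intended.  PolledDevice is a hypothetical stand-in for the target chip.  A normal write finishes after a few polls, while a non-writable device models ROM or open space that ignores writes.  This is a behavioural model, not cycle-accurate.

```python
class PolledDevice:
    """Hypothetical target chip. A write finishes after `busy` polls; until
    then, reading that address returns the byte with bit 7 inverted.
    writable=False stands in for ROM or open bus that ignores writes."""

    def __init__(self, size=0x2000, busy=3, writable=True):
        self.mem = bytearray(size)
        self.busy = busy
        self.writable = writable
        self.pending = {}                      # addr -> (value, polls left)

    def store(self, addr, value):              # sta (zpdst),y
        if self.writable:
            self.pending[addr] = (value, self.busy)

    def load(self, addr):                      # cmp (zpdst),y reads here
        if addr in self.pending:
            value, left = self.pending[addr]
            if left:
                self.pending[addr] = (value, left - 1)
                return value ^ 0x80            # still busy
            self.mem[addr] = value             # write completed
            del self.pending[addr]
        return self.mem[addr]


def program_page(src, dev, base=0):
    """Mirror of the routine above. Returns the carry flag:
    False (clc) if all 256 bytes verified, True (sec) on timeout."""
    for y in range(256):                       # wloop ... iny / bne wloop
        a = src[y]                             # lda (zpsrc),y
        dev.store(base + y, a)                 # sta (zpdst),y
        for _ in range(256):                   # ldx #$00 ... inx / beq timeout
            if dev.load(base + y) == a:        # cmp (zpdst),y / beq wdone
                break
        else:
            return True                        # timeout: sec
    return False                               # clc


data = bytes((i * 7) & 0xFF for i in range(256))
print(program_page(data, PolledDevice()))                               # False
print(program_page(bytes([0x55] * 256), PolledDevice(writable=False)))  # True
```

The ROM case carries a caveat that the assembly shares: if the source byte happens to equal what is already at that address, the compare succeeds even though nothing was written.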

There are some considerations to keep in mind when programming the EEPROM.  First, you obviously can’t be running code from the device while programming it, since data read back isn’t valid until the write’s complete.  Second, if you’re streaming a file to the EEPROM, you’ll have to ensure the EEPROM isn’t supplying the routines used to load that data from mass storage.  There are many other considerations as well, such as this program’s inability to write more than 256 bytes at a time.

Some of these, as well as enabling write mode on the Kickstarter 1112, will be covered later on, as this was intended to be more of an intro post on programming EEPROMs on your system.


A simple network card

or, let’s start yet another thing!

I did make a note about increasing the content, but technical difficulties combined with attention span errors caused me to back-burner re-reviewing a certain PDF file. 😉

Enter the Imagewriter II localtalk option card, and what I’ve learned about it so far. Let’s begin.

The device

The card is a network adapter for Apple’s Imagewriter II series printers, allowing a direct connection to an AppleTalk or LocalTalk network.  This lets a lab share the printer with every system on the network without a switchbox or other trickery, while also allowing faster data transfers into the printer: normally the printer communicates at 9600bps, while AppleTalk runs at 230.4kbps.

I’d been curious about the card for quite a few years and spotted one on our favorite second-hand website for an inexpensive sum.  To this, I thought to myself, “why not? It looks like it shouldn’t be too bad to figure out.”

Pre-exam before arrival

When examining the board in any images I could find online, I got hints of a few key ICs that indicate the network adapter is an intelligent device.  Clearly visible in online images are the Z8530 one would expect, since AppleTalk is generally driven by the Zilog SCC, as well as 8K of SRAM, a 65C02, and a ROM-sized IC with an Apple copyright date.

This gives me the impression that the card is an independent SBC, and that the copyrighted IC is likely a mask or OTP ROM that provides the network services for the printer.  In fact, I’d seen images of these cards with a ceramic DIP and a paper sticker over a circular window, indicating that the socket is compatible with an EPROM.

Then I considered some requirements of the 6500 series microprocessor family: RAM at pages 0 and 1 for the zero page and stack (unless you like a gelding system), and ROM at the end of memory, where the reset and interrupt vectors live (unless you like a comatose computer).  Given the SRAM was easily identifiable as a 6264 in the auction page I viewed, and the EPROM socket had the same number of pins, I guessed that the firmware was 8K in size.

The other chips, aside from the oscillator, weren’t well photographed in the auction page, so more guessing had to be done to distract myself while the card made its journey to my hands.  For example, what sort of memory map might this processor see?

Remember again the need for RAM at the bottom of memory and ROM at the top, and that they’re 8K in size?  A likely way to decode this easily is a 3-to-8 decoder/demultiplexer, probably our friend the 74LS138.  Bets are that it’s involved, so let’s keep our scope to eight devices.

So far we have two devices.  I could see a few more chips in the original pictures: a trio of 74LS374 8-bit latches and a 74LS245 bus transceiver.  That gives us four more, bringing us to six.  Add the SCC and we’re at seven, and some possible control logic for data flow gets us to eight.  No rule says all areas have to be used, however.

The arrival

The board arrived in great condition, and I proceeded to take some good pictures.  Here’s a top shot for reference:

[Image: ImageWriter II LocalTalk option card, top view]

When the board arrived I wasted no time giving it a good examination.  The 65C02 is rated at 2MHz, and a 3.6864MHz crystal clocks the system.  A small non-volatile memory IC was also found, adding another device.  The remainder are all standard 7400-series logic ICs, one of which is a 74LS138, as predicted.

Since I can’t remove the CPU as I did for the IIEasy Print board, I’ll have to use a less automated method of finding the memory map. Time to get the continuity checker out!

Mapping the network (card)

If the 74LS138 is indeed used as a memory decoder, a few assumptions can be made.  The first is that it’s enabled continuously, and the second is that it’s gated by the CPU’s clock to ensure correct write timing on the SRAM and latches.  A quick check on /G1 and /G2 shows them to be grounded.  G3, the active-high input, however, was neither grounded nor tied high.  An educated guess led me to the phase 2 line on the CPU, and my beeper agreed.  Next is the block size decoded by the LS138.  The I3 input on the ‘138 was quickly traced to A15 on the 65C02, I2 to A14, and I1 to A13: eight 8K blocks of memory.  A single 74LS138 decodes the whole memory map.

Now let’s follow some outputs.  Given the address bit connections, O0 should lead to the RAM chip, and as expected it did.  O7 to the ROM?  Correct.  Now, where’s that SCC?  I parked one of my meter’s probes on the SCC’s /CS pin and walked the other across the remaining outputs on the ‘138.  It’s connected to O5, so the SCC occupies $A000 through $BFFF.
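As a rough illustration of the decoding just traced, here is a Python model of a 74LS138.  It uses the common datasheet names A, B and C for the select inputs, so the I1, I2 and I3 above map to A, B and C.  The active-low enables are /G2A and /G2B (both grounded here), and the active-high enable is G1 (wired to phase 2).  The function names are hypothetical.

```python
def ls138(c, b, a, g1, g2a_n, g2b_n):
    """Model of a 74LS138 3-to-8 decoder. Returns the eight active-low outputs
    Y0..Y7 as a list, where 1 means inactive."""
    outputs = [1] * 8
    if g1 == 1 and g2a_n == 0 and g2b_n == 0:   # all enables asserted
        outputs[(c << 2) | (b << 1) | a] = 0    # pull one output low
    return outputs


def selected_block(addr, phi2):
    """Wire it as traced: A15 -> C, A14 -> B, A13 -> A, active-low enables
    grounded, active-high enable on phase 2. Returns the active output or None."""
    outs = ls138((addr >> 15) & 1, (addr >> 14) & 1, (addr >> 13) & 1,
                 g1=phi2, g2a_n=0, g2b_n=0)
    return outs.index(0) if 0 in outs else None


print(selected_block(0xA000, phi2=1))   # 5: the SCC's output
print(selected_block(0xA000, phi2=0))   # None: nothing selected during phase 1
```

Gating on phase 2 is what keeps the chip selects from asserting while the address lines are still settling during phase 1.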

All but two of the other outputs are traced so far. The remainder are a bit more complex and will be covered in a future update. A detailed memory map is below.

U9: 74LS138
pin  name  range          destination
 7   /O7   $E000-$FFFF    U3:20, U3:22 (8K EPROM /CE, /OE)
 9   /O6   $C000-$DFFF    nc?
10   /O5   $A000-$BFFF    U2:33 (SCC /CS)
11   /O4   $8000-$9FFF    U7:1 (LS374 /OE)
12   /O3   $6000-$7FFF    U5:1 (LS374 /OE)
13   /O2   $4000-$5FFF    U6:11 (LS374 /load)
14   /O1   $2000-$3FFF    nc?
15   /O0   $0000-$1FFF    U4:20, U4:22 (8K SRAM /CE, /OE)
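The traced map can be encoded as a lookup table and checked against the 6502’s hard requirements from earlier: zero page and stack in RAM, and the NMI/reset/IRQ vectors at $FFFA-$FFFF in ROM.  The following is a small Python sketch; the table is transcribed from the U9 trace, and the unconnected outputs are marked None.

```python
# Memory map as traced on U9: each output selects one 8K block (A15-A13).
MEMORY_MAP = {
    0: "8K SRAM (U4)",
    1: None,                 # nc?
    2: "LS374 (U6) /load",
    3: "LS374 (U5) /OE",
    4: "LS374 (U7) /OE",
    5: "SCC (U2)",
    6: None,                 # nc?
    7: "8K EPROM (U3)",
}


def device_at(addr):
    """Device decoded for a 16-bit address, or None if the output is unconnected."""
    return MEMORY_MAP[(addr & 0xFFFF) >> 13]


def block_range(block):
    start = block << 13
    return f"${start:04X}-${start + 0x1FFF:04X}"


# Zero page and stack must be RAM; the vectors at $FFFA-$FFFF must be ROM.
assert device_at(0x0000) == device_at(0x01FF) == "8K SRAM (U4)"
assert all(device_at(v) == "8K EPROM (U3)" for v in range(0xFFFA, 0x10000))

for block in range(8):
    print(block_range(block), device_at(block << 13) or "unconnected?")
```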

But what about the printer?

The Imagewriter II isn’t incredibly well documented, but a SAMS manual does have something resembling a schematic, in which a port for an ‘optional device’ is pinned out. It has 26 pins like the localtalk card, and has vcc and gnd in the same locations.

The pins are pretty straightforward: eight ‘AD’ pins for a multiplexed address/data bus, an ALE line to latch the address during an access, a DE line to enable the data transfers, and other various control lines.

AD0-7 head straight for the bus transceiver, which keeps the network card off the Imagewriter’s AD bus unless there’s a valid transaction, and then into the three ‘374 8-bit latches.  Likely, the latches are used for address, data in, and data out, respectively.  There’s some twisty logic still to be traced before this can be confirmed.
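For readers new to multiplexed address/data buses, here’s a generic Python sketch of how such a bus is usually demultiplexed.  An address byte appears on AD0-7 and is captured when ALE strobes, and the data byte moves afterward while DE is active.  This is a textbook model, not a trace of the Imagewriter’s actual timing, and which ‘374 does what on the card is still the open question above.  The class and method names are hypothetical, and returning $FF for an unset location is an assumption standing in for an idle bus.

```python
class MuxBusCard:
    """Generic model: one latch captures the address on ALE; data moves on DE.
    Hypothetical stand-in for the card's '374 latches."""

    def __init__(self):
        self.addr_latch = 0      # captured from AD0-7 when ALE strobes
        self.data_in = {}        # bytes written by the printer, keyed by address
        self.data_out = {}       # bytes the card offers back to the printer

    def ale(self, ad_bus):
        """Address phase: latch whatever is on AD0-7."""
        self.addr_latch = ad_bus & 0xFF

    def write_cycle(self, ad_bus):
        """DE active, printer driving AD0-7: capture the data byte."""
        self.data_in[self.addr_latch] = ad_bus & 0xFF

    def read_cycle(self):
        """DE active, card driving AD0-7 (through the '245 on the real board)."""
        return self.data_out.get(self.addr_latch, 0xFF)


card = MuxBusCard()
card.ale(0x03)           # address on AD0-7, ALE pulses
card.write_cycle(0x41)   # data on AD0-7 with DE asserted
print(card.data_in)      # {3: 65}
```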

And the software?

The firmware on the card is still hidden, as I’ve not taken the time to dump its contents and examine it.  This may or may not be covered later as well, but I am curious how the card communicates with the printer.  Perhaps other communications cards could be created: a parallel interface, or another networking option?

And then…

To be honest, I didn’t really intend on adding this to my Imagewriter II, since printers aren’t really my thing these days. I just find vintage hardware interesting, and enjoy the process of mapping simple systems to feed my imagination of what could have been done with them had we known what they really were made of.

(Probably) more later. 😉

In resurrecting abandoned projects

Or, where the heck have you been?

Once I began analyzing some of the routines internal to PROMAL ages ago, I found myself fascinated, and then obsessed, with the simple elegance of the system’s code.  I’ve been working on commenting a full disassembly of the program but haven’t yet decided to officially release it.

Therein lies a challenge for myself.  I’d like to document here the steps I used to reverse the language’s concepts, internal variables, and machine code.  However, it will take some effort to refresh my memory, as the initial disassembly was performed nearly two years ago, and I’m a bit short on time.

Let’s just provide a screenie of the title page for entertainment purposes, as well as a shout-out to the venerable COMPUTE! Books series that covered many CBM systems including drives and computers.


Hopefully life will be a bit more generous with spare time and inspiration so I can complete the work as well as get some serious documentation about the process posted here.

By the way, that font was HARD to find.  Still not an exact match for the one COMPUTE! used, but it’s close enough to pass. 🙂

[Edit: Found an attempted escape : q and removed it.  Meh.]

Automated parameter parsing in PROMAL

Or, decrypting the unencrypted cryptic code

I decided to spend some more time looking at how PROMAL works internally. The next routine I decided to examine is BLKMOV, as its function is similar to the MOVSTR function examined earlier.

Let’s have a look at the jump table to find BLKMOV:

0f30  4C A9 21    JMP $21A9

Easy enough! Let’s have a look to see how BLKMOV initializes itself, and to see if it accepts a 16-bit length.

21a9  20 15 18    JSR $1815
21ac  33 E0       RLA ($E0),Y
21ae  38          SEC
21af  34 2C       NOOP $2C,X
21b1  A5 34       LDA $34
21b3  38          SEC
21b4  E5 2C       SBC $2C
21b6  A8          TAY
21b7  A5 35       LDA $35
21b9  E5 2D       SBC $2D
21bb  AA          TAX
21bc  98          TYA
21bd  C5 38       CMP $38
21bf  8A          TXA
21c0  E5 39       SBC $39

Wait. WHAT? (Almost) no professional programmer would use illegal opcodes in a final product.  RLA and NOOP $2C,X are invalid opcodes.  Also, I’ve tested PROMAL and found that it works well on the CMD SuperCPU accelerator, whose 65816 treats those opcodes as entirely different instructions.  If PROMAL actually used them, it would crash there.

Caller relative addressing
Let’s see exactly what’s going on at $1815.  I’ve studied the routine ahead of time and added some helpful comments and names to the disassembly.  Because this routine is used by many library routines, I will refer to the caller as “the parent”.  After analyzing the routine, I’ll provide BLKMOV’s results to help clarify the routine’s work.

                                        ; we are always a grandchild!
                                        ; parameter = value from p-code interp
                                        ; argument  = value from caller's table
1815: 68          getprm  PLA           ; get parent's address
1816: 85 3A               STA Parent    ;
1818: 68                  PLA           ;
1819: 85 3B               STA Parent+1  ;   and save for our use
181B: 68                  PLA           ; get grandparent's address
181C: 85 6D               STA GParent   ;
181E: 68                  PLA           ;
181F: 85 6E               STA GParent+1 ;   and save for later restoration
1821: 84 6A               STY numparms  ; save number of parameters
1823: A0 01               LDY #$01      ; initialize index
1825: B1 3A               LDA (Parent),Y; get min/max parameters

I knew something was odd.  This is an uncommon but handy trick: the data after the JSR in the parent is never executed.  If you examine 1816, you see that it stores the parent’s return address from the stack, as MOVSTR did.  It then stores the parent’s parent’s return address (‘GParent’) so that it can get to the parameters on the stack later.  After that’s all set up, the number of parameters from PROMAL is saved to numparms and the Y index is initialized to 1.  Why 1?  JSR pushes the address of its own last byte, so (Parent),Y with Y = 1 lands on the first byte after the JSR.
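The off-by-one in the return address is the crux of this trick, so here’s a small Python sketch of it, using BLKMOV’s own bytes from the dump above.  The stack is modelled as a Python list whose last element is the top of the 6502 stack.  push_jsr and pull_word are hypothetical helpers that mimic what JSR and two PLAs do.

```python
def push_jsr(stack, jsr_addr):
    """JSR pushes the address of its own last byte (jsr_addr + 2), high byte first."""
    ret = (jsr_addr + 2) & 0xFFFF
    stack.append(ret >> 8)
    stack.append(ret & 0xFF)


def pull_word(stack):
    """Two PLAs, as getprm does at $1815: low byte first, then high byte."""
    lo = stack.pop()
    hi = stack.pop()
    return (hi << 8) | lo


# BLKMOV at $21A9 begins with JSR $1815; its argument bytes follow the JSR.
memory = {0x21AC: 0x33, 0x21AD: 0xE0, 0x21AE: 0x38, 0x21AF: 0x34, 0x21B0: 0x2C}
stack = []
push_jsr(stack, 0x21A9)
parent = pull_word(stack)            # Parent = $21AB, the JSR's last byte
first_arg = memory[parent + 1]       # LDA (Parent),Y with Y = 1
print(hex(parent), hex(first_arg))   # 0x21ab 0x33
```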

Why all this work? The method is used when resources are tight: We have parameters on the stack that get processed by many routines in the library. It’s best for code space efficiency if a single routine handles these parameters. However, not all library routines use the same number or even type of parameters. That’s where this routine comes in. The arguments for the ‘getprm’ routine are stored after the JSR from the library routine calling it. This way each library routine will be able to specify what type of information it expects to find on the stack.

On arguments and parameters
In this post I need to distinguish between two things: the data used by the getprm routine, and the data the parent needs from the stack.  Here, ‘argument’ refers to data used by getprm, and ‘parameter’ refers to any data passed on the stack by PROMAL.  This matches the terminology of the MOVSTR post.

Let’s have a good look at this routine to understand what it does.

Setting things up
We already have the calling routines’ addresses safely shuffled away, and we have our first argument retrieved from the parent.

1827: 29 0F               AND #$0F      ;  Mask max #parms off
1829: 85 69               STA maxparms  ;   and save
182B: C5 6A               CMP numparms  ;  compare with parameter count
182D: 90 10               BCC getperr   ;   too many, runtime error

In the segment above, the argument loaded from the parent is masked off and saved.  Studying the routine ahead of time helped me understand that the low half of the first argument is the maximum number of parameters the parent accepts.  If the number of parameters provided by PROMAL (‘CMP numparms’) is larger than the maximum, the routine branches off to a fatal runtime error (‘BCC getperr’).

182F: B1 3A               LDA (Parent),Y; get min/max parameters
1831: F0 40               BEQ getpfin   ;  0/0 parms? exit.

The argument is reloaded since it was mangled when setting up maxparms. While it’s loaded and un-mangled, the routine checks to see if there are no parameters to be loaded. If this is the case, the routine exits. It would seem to make no sense to call this routine if you don’t want any parameters. I’d agree, but there must be a good reason to call it in this fashion as a few library routines do just that.

1833: 4A                  LSR           ;
1834: 4A                  LSR           ;
1835: 4A                  LSR           ;
1836: 4A                  LSR           ;
1837: 85 6B               STA minparms  ;  save min #parms
1839: C5 6A               CMP numparms  ;  compare with parameter count
183B: F0 05               BEQ getpok    ;  same? ok.
183D: 90 03               BCC getpok    ;  more than min parms? ok.
183F: 4C 80 10    getperr JMP syserr    ; fail out via system error

Now, the upper half of the first argument is shifted down and stored in ‘minparms’.  It’s again compared to the numparms value, this time to determine whether there are at least the minimum number of parameters (BEQ for exactly the minimum, BCC for more).  If not, the routine falls through to a jump to PROMAL’s fatal runtime error routine.
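The two range checks can be summarized in a few lines of Python.  The following sketch mirrors $1823-$183F as I read the disassembly: the low nibble is the maximum, the high nibble is the minimum, and a packed value of zero means "no parameters".  SysErr is a hypothetical stand-in for the jump to syserr.

```python
class SysErr(Exception):
    """Hypothetical stand-in for PROMAL's fatal runtime error (JMP syserr)."""


def check_counts(minmax, numparms):
    """Return (minparms, maxparms), or raise SysErr if numparms is out of range."""
    maxparms = minmax & 0x0F                # AND #$0F
    if maxparms < numparms:                 # CMP numparms / BCC getperr
        raise SysErr("too many parameters")
    if minmax == 0:                         # BEQ getpfin: 0/0, nothing to fetch
        return 0, 0
    minparms = minmax >> 4                  # four LSRs
    if minparms > numparms:                 # neither BEQ nor BCC taken
        raise SysErr("too few parameters")
    return minparms, maxparms


print(check_counts(0x33, 3))   # (3, 3): BLKMOV wants exactly three
print(check_counts(0x13, 2))   # (1, 3): one required, up to three accepted
```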

1842: C8          getpok  INY           ; Increment index
1843: B1 3A               LDA (Parent),Y; Get mask bits

There will be a lot of INY : LDA (Parent),y as the routine works its way through the argument table.

1845: 85 6C               STA getpmsk   ; Save in mask byte

The second argument is stored to getpmsk, short for ‘getprm mask.’ This byte is actually eight flags, each indicating the type of parameter to get from the stack. There are two types of data and one way to work with each. As a quick reminder, PROMAL always pushes parameters as words, even when they’re bytes.

Bit = 0    Parameter is a byte
           The next argument byte is a zero-page address, followed by a default value if the parameter is optional.
           * Store the default value at the address (optional parameters only)
           * Load and discard the high byte from the stack, if a parameter was supplied
           * Load and store the low byte from the stack at the address, if a parameter was supplied
Bit = 1    Parameter is a word
           This argument is a single zero-page address.
           * Load and store the high byte from the stack at address+1
           * Load and store the low byte from the stack at the address

The routine appears to not have any facilities for handling a default 16-bit value.  It’ll be up to the parent to detect a missing 16-bit parameter and set up a default value in its place.
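With the argument format known, a table can be decoded without touching the stack.  Here’s a Python sketch based on my reading of the loop at $1847.  A default byte is only present for optional byte parameters, because required ones branch straight to gpfprm before the default would be read.  decode_args is a hypothetical helper, and the returned byte count is simply how many table bytes the loop examines.

```python
def decode_args(table):
    """Walk a getprm argument table (the bytes after the JSR).
    Returns (minparms, maxparms, slots, bytes_read); each slot is
    (zp_address, 'word' or 'byte', default or None, required)."""
    minmax = table[0]
    maxparms, minparms = minmax & 0x0F, minmax >> 4
    if minmax == 0:
        return 0, 0, [], 1                      # BEQ getpfin after one byte
    mask, idx = table[1], 1                     # mask byte sits at Y = 2
    slots, count = [], maxparms
    while True:
        count -= 1                              # DEC maxparms
        if count < 0:                           # BMI getpfin
            break
        idx += 1
        zp = table[idx]                         # zero-page destination
        required = count < minparms             # CMP minparms / BCC gpfprm
        is_word = bool(mask & 0x80)             # BIT getpmsk / BMI gpisw
        default = None
        if not required and not is_word:
            idx += 1
            default = table[idx]                # default value byte
        slots.append((zp, "word" if is_word else "byte", default, required))
        mask = (mask << 1) & 0xFF               # ASL getpmsk
    return minparms, maxparms, slots, idx + 1


mn, mx, slots, n = decode_args(bytes([0x33, 0xE0, 0x38, 0x34, 0x2C]))
print(mn, mx, [hex(s[0]) for s in slots], hex(0x21AC + n))  # 3 3 ['0x38', '0x34', '0x2c'] 0x21b1
```

For BLKMOV this gives three required words at $38, $34 and $2C, and the five-byte table ends right where the disassembly resumes sensible code at $21B1.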

Processing arguments and setting up parameters
At this point, the routine is initialized and ready to load parameters as specified by the parent until it’s out of arguments.

1847: C6 69       getpl   DEC maxparms  ; Decrement parameter count
1849: 30 28               BMI getpfin   ;  Out of parms? exit.

Maxparms is now used as a count-down value to determine when the routine’s out of arguments. The name of the location is a bit of a misnomer, I apologize.

184B: C8                  INY           ;
184C: B1 3A               LDA (Parent),Y; get zp address
184E: AA                  TAX           ;  and save

The arguments now always start with a zero page address. This is read from the argument table and saved in the X register to be used as an index. This allows the code to run without modifying itself and is a good example of advanced indexing when used in this situation.

184F: A5 69               LDA maxparms  ; check max parms
1851: C5 6B               CMP minparms  ;  are we out of required parms?
1853: 90 0F               BCC gpfprm    ;  No, go pull it off the stack.

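In this section of the loop, maxparms is compared with minparms to determine whether the current parameter is still a required one.  If it is, the routine goes straight to pulling it off the stack (gpfprm), skipping the default-value handling.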

1855: 24 6C               BIT getpmsk   ; is current parm a word?
1857: 30 05               BMI gpisw     ;  yep, skip

Remember the parameter type mask?  This is one of the two checks against the flags in the loop.  The BIT instruction does a handful of things, but of interest to the routine is the way it copies bit 7 of getpmsk to the negative flag without modifying the accumulator or index registers.  In this case, if a parameter is a word, the negative flag gets set and the BMI (branch if minus) routes the CPU to gpisw (short for getparm is word), below.

1859: C8                  INY           ;
185A: B1 3A               LDA (Parent),Y; get default value or low byte
185C: 95 00               STA 0,X    ;  store at zp address
185E: A5 69       gpisw   LDA maxparms  ; is current argument
1860: C5 6A               CMP numparms  ; greater than parameter count?
1862: B0 0A               BCS gpdefl    ;  Yes, process default value

If the parameter is a ‘byte’ type, the next argument byte (its default value) is loaded. It’s stored at the zero page location pointed to by X, which was read in earlier. Execution then falls through to ‘gpisw’, which compares the current argument’s position against the number of parameters PROMAL actually passed to the parent. If that parameter wasn’t passed, we skip off to gpdefl, which is short for ‘getparm default’.

1864: 68          gpfprm  PLA           ; Get parameter from stack
1865: 24 6C               BIT getpmsk   ; current parm = word?
1867: 10 02               BPL gpis8     ;  no, skip high byte store
1869: 95 01               STA 1,X   ;  * store high byte if word
186B: 68          gpis8   PLA           ; get low byte or default value
186C: 95 00               STA 0,X    ; store where requested

In a moment whose reason eludes me, I called this branch point gpfprm. This section first pulls the high byte of the next parameter off the stack, then checks the parameter type mask to see if it’s a byte type. If so (BPL branches because bit 7 is zero, i.e. the value reads as plus or positive), it skips to gpis8, discarding the high byte. If it’s a word, the high byte is stored to X+1 by using a base of 1 instead of 0.
Gpis8 always pulls the next byte and stores it at the location indicated by X.

This method is clever: for optional byte parameters, the routine first copies the default value from the argument block into memory, and then only overwrites it if a value is available on the stack. It’s a simple way of ensuring a default is in place when the caller leaves the parameter out.

186E: 06 6C       gpdefl  ASL getpmsk   ; shift parameter bit mask
1870: 4C 47 18            JMP getpl     ; loop

The type mask is shifted left one bit so the next parameter’s flag moves into bit 7, keeping the mask in sync with the argument index in Y. Then the loop restarts.

Cleaning up

1873: 98          getpfin TYA           ; transfer index to A
1874: 18                  CLC           ; prep for math
1875: 65 3A               ADC Parent    ; Add our parent's return address
1877: AA                  TAX           ;
1878: A5 3B               LDA Parent+1  ;
187A: 69 00               ADC #$00      ;
187C: 48                  PHA           ;  and put on stack
187D: 8A                  TXA           ;
187E: 48                  PHA           ;  for rts.

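As we don’t want to return into a data block, we add our current value for Y to the parent’s calling address and push the result on the stack, high byte first. JSR pushes the address of its own last byte and RTS adds one before resuming, so pushing the address of the last table byte read (Parent+Y) makes RTS land safely on the byte following the argument table.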
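To make the arithmetic concrete, here is a small Python model of that cleanup, using the BLKMOV table shown further down (JSR at $21a9, with Y ending at 5). It assumes Parent holds the address JSR pushed, as the (Parent),Y indexing implies; the stack is a plain list and the function names are mine, not PROMAL’s.

```python
def getpfin_push(parent, y):
    """Model $1873-$187E: push Parent+Y, high byte first, for RTS."""
    target = (parent + y) & 0xFFFF
    return [target >> 8, target & 0xFF]  # PHA high byte, then PHA low byte


def rts(stack):
    """Model RTS: pull low, then high, and resume at that address plus one."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF


jsr_addr = 0x21A9
parent = jsr_addr + 2          # JSR pushes the address of its own last byte
resume = rts(getpfin_push(parent, 5))
print(hex(resume))             # the byte just after BLKMOV's argument table
```

The ADC #$00 into the high byte matters too: a table that straddles a page boundary still returns to the right place.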

187F: A5 6D               LDA GParent   ; get grandparent's address
1881: 85 3A               STA Parent    ; place where p-code expects parent's
1883: A5 6E               LDA GParent+1 ;
1885: 85 3B               STA Parent+1  ;
1887: 60                  RTS           ; and return to arg table+1

And as a last bit of cleanup, the grandparent’s address is placed where our parent would expect it to be, leaving the runtime in a good state and keeping the stack clear.

How BLKMOV used this routine
BLKMOV used this routine to set up all of its zero page vectors. Once getprm is done, the routine looks largely like MOVSTR, so I’ll (probably) cover it later.

Here’s what getprm did for BLKMOV. I’ll include the first part of BLKMOV again, formatted a bit better now that we know what the data following the JSR is for.

21a9  20 15 18    JSR $1815     ; jsr to getprm
                  .byte $33     ; min/max number of parameters
                  .byte $e0     ; %1110 0000 - all three parameters are words
                  .byte $38     ; first word stores at $38 and $39
                  .byte $34     ; second word stores at $34 and $35
                  .byte $2c     ; third word stores at $2c and $2d

When getprm runs for BLKMOV, it performs these actions:
* Writes the last parameter (Count) to $38 and $39
* Writes the second parameter (From) to $34 and $35
* Writes the first parameter (To) to $2c and $2d
* Cleans house and returns to the byte following the argument table at 21b1

This might look a little familiar. MOVSTR uses the same vectors.
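The whole parameter loop can be modelled in a few lines of Python, which makes it easier to see which stack byte lands where. This is a sketch built on a few assumptions. The minimum count sits in the high nibble of the first table byte and the maximum in the low nibble, which BLKMOV’s symmetric $33 can’t confirm. Y indexes the mask byte when the loop starts. The caller pushes each word low byte first, which matches the PLA order seen in MOVSTR. The names mirror the labels in the listing, but the code is only a model.

```python
def push_words(*params):
    """Caller side: push each parameter as a word, low byte then high byte."""
    stack = []
    for p in params:
        stack += [p & 0xFF, (p >> 8) & 0xFF]
    return stack


def getprm(table, stack, numparms):
    """Sketch of getprm's loop ($1847-$1870).

    table[y - 1] stands in for LDA (Parent),Y and stack.pop() for PLA.
    Returns the zero page writes and the final value of Y.
    """
    zp = {}
    minparms = table[0] >> 4      # assumed: high nibble = minimum
    maxparms = table[0] & 0x0F    # assumed: low nibble = maximum
    getpmsk = table[1]
    y = 2                         # assumed: Y indexes the mask byte here
    while True:
        maxparms -= 1             # getpl: DEC maxparms
        if maxparms < 0:          # BMI getpfin
            return zp, y
        y += 1
        x = table[y - 1]          # zero page destination, kept in X
        is_word = (getpmsk & 0x80) != 0   # BIT getpmsk: bit 7 -> N flag
        pull = True
        if maxparms >= minparms:  # optional parameter
            if not is_word:
                y += 1
                zp[x] = table[y - 1]      # byte default
            pull = maxparms < numparms    # gpisw: was it passed?
        if pull:                  # gpfprm
            high = stack.pop()
            if is_word:
                zp[x + 1] = high  # STA 1,X
            zp[x] = stack.pop()   # gpis8: STA 0,X
        getpmsk = (getpmsk << 1) & 0xFF   # gpdefl: ASL getpmsk


# BLKMOV's table, called with three word parameters.
zp, y = getprm([0x33, 0xE0, 0x38, 0x34, 0x2C],
               push_words(0x1000, 0x2000, 0x0100), 3)
print({hex(k): hex(v) for k, v in sorted(zp.items())})
```

The third parameter ($0100) lands at $38/$39 and the first ($1000) at $2C/$2D, matching the list above. The final Y of 5 is what the cleanup adds to the parent’s address.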

In summary
The routine is very handy: you can easily specify where each parameter should be loaded, as well as how many parameters you require and how many you can take. The limit of eight parameters is sensible, given the type mask is a single byte providing 8 flags for parameter types.
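Working backwards, the two header bytes for a new getprm table could be built like this. This is a hypothetical helper, not part of PROMAL. It assumes the minimum count sits in the high nibble and the maximum in the low nibble, which BLKMOV’s $33 can’t confirm. It also shows where the eight-parameter ceiling comes from: each table entry needs its own bit in the mask.

```python
def getprm_header(kinds, minparms):
    """Build the min/max byte and the type mask for a getprm table.

    kinds lists the table entries in order, each "word" or "byte". The
    first entry is tested against bit 7 of the mask, the next against
    bit 6, and so on, because getprm shifts the mask left after each one.
    """
    if len(kinds) > 8:
        raise ValueError("the type mask only has eight flags")
    if not 0 <= minparms <= len(kinds):
        raise ValueError("minimum must be between 0 and the maximum")
    mask = 0
    for i, kind in enumerate(kinds):
        if kind == "word":
            mask |= 0x80 >> i
    # Assumed layout: minimum in the high nibble, maximum in the low nibble.
    return (minparms << 4) | len(kinds), mask


print([hex(b) for b in getprm_header(["word", "word", "word"], 3)])
```

For BLKMOV this gives $33 and $E0, the same two bytes that follow its JSR.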

The getprm routine is used by many routines in the PROMAL system, including (but not limited to) GETC, GETL, BLKMOV, OPEN, CLOSE, CHKSUM, and EDLINE.

Moving data in PROMAL

or, Losing your mind with PROMAL

Learning how things work

In a recent experiment in learning to work with PROMAL, I needed a method for moving pieces of data around in memory to split strings. The MOVSTR library procedure seemed ideal, but consistently missed the mark and corrupted memory.

As it turns out, I had an addressing issue as well as a misunderstanding of what the procedure will do for me.

For reference, I’ll quote the MOVSTR proc’s documentation from the Library Manual.


USAGE: MOVSTR FromString, ToString [,Limit]

MOVSTR is a procedure which is used to copy strings, to concatenate strings, or extract substrings (i.e., replaces the LEFT$, MID$, and RIGHT$ functions found in BASIC).  FromString is the address of the string to copy.  ToString is the address of the destination.  Limit is an optional argument specifying the maximum number of characters to copy.

This brings up some useful syntax in PROMAL:  specifying the address of a string.  In my project, I needed to extract the middle of a string and deposit it into another variable.  My first attempt used this method:

movstr buf[3], name, 16

A person would think, then, that movstr would copy buf[3] through buf[18] into name, but this was not the case.  After some deep debugging of the PROMAL library routine itself, I learned that I was in fact telling PROMAL to use the address stored at buf[3] and buf[4] as the source for the string to put into name.  This exposed an inconsistency in addressing:  When I specify ‘movstr buf,name,16’ it will use the location of buf[], but if I use ‘movstr buf[3],name,16’ it instead uses a vector placed at buf[3].  To fix this issue, use the # operator to specify ‘the address of…’:

movstr #buf[3], name, 16

The alternate format tells the compiler to use the address of buf[3] instead of a vector at the same location.
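A small memory model shows why the two forms differ. Following what the debugging session revealed, plain buf[3] hands MOVSTR the two bytes stored at buf[3] and buf[4] as a little-endian vector, while #buf[3] hands it the address of buf[3]. The buffer address and contents below are made up for illustration.

```python
def word_at(mem, addr):
    """Read a little-endian 16-bit value, the way the 6502 stores vectors."""
    return mem[addr] | (mem[addr + 1] << 8)


mem = bytearray(0x10000)
BUF = 0x4000                       # hypothetical address of buf[]
text = b"ID:JOHN SMITH\x00"
mem[BUF:BUF + len(text)] = text

plain = word_at(mem, BUF + 3)      # movstr buf[3],...: the bytes "JO" as a vector
addressed = BUF + 3                # movstr #buf[3],...: where "JOHN" starts
whole = BUF                        # movstr buf,...: the array name is its address

print(hex(plain), hex(addressed))
print(mem[addressed:addressed + 4])
```

With the first form, MOVSTR would read from $4F4A, an address that has nothing to do with buf.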

Learning inner workings through an assembly debugger

Disclaimer: Most 8-bit fans will balk at using an emulator to develop programs for their beloved 8-bit systems.  I do heavily prefer to develop on the machine itself, but there’s little that beats a debugger that will stop the whole machine cold: video refresh, hardware timers, everything gets paused.  As I was having trouble with local development, I moved my data to the Vice emulator and got to work.

The MOVSTR function

In the library, the definition for MOVSTR is ‘EXT ASM PROC MOVSTR AT $F33’.  This tells the compiler that MOVSTR is a procedure that can be called directly at memory location $0f33.  In my particular installation of Vice, Alt-H opens the debugger, pausing the emulation.  A quick look at $f33 shows that it’s part of a jump table:

(C:$f33e) d f33
.C:0f33  4C 38 22    JMP $2238
.C:0f36  4C 18 22    JMP $2218
.C:0f39  4C DF 26    JMP $26DF

Of course, the only real interest is the first instruction: jmp $2238.  Let’s have a look there.

Exploratory surgery (or, finding out how PROMAL thinks)

At $2238 is a fairly straightforward routine.  For reference, the trace below comes from my test program after I had things in working order, as that’s the only debug log I saved.  There’s still a lot to learn!

Here are the processor registers when the call is made: a=3 x=3 y=3 sp=f7

First, PROMAL saves the return address in a scratch space so it can return to the caller:

2238  68          PLA
2239  85 3A       STA $3A
223b  68          PLA
223c  85 3B       STA $3B

With the return address out of the way, the parameters are exposed on the stack.  The routine starts by loading $FF into A, the default limit if none was passed:

223e  A9 FF       LDA #$FF
2240  C0 03       CPY #$03
2242  D0 03       BNE $2247

At this point, I recognize the #$03:  Movstr can have two or three parameters, so apparently the Y register holds the number of parameters passed to the procedure.  I specified three in my application, so this falls through to the next instruction:

2244  68          PLA
2245  68          PLA
2246  88          DEY
2247  85 38       STA $38

At first, this confused me greatly.  Why would you pull two bytes from the stack without saving the first?  As it turns out, the third parameter is only supposed to be a byte, rather than a word.  However, the compiler apparently always pushes words to the stack.  The first PLA simply pulls the unused high byte of the word and discards it.  DEY is a setup for the next compare below:

2249  C0 02       CPY #$02
224b  F0 03       BEQ $2250
224d  4C 80 10    JMP $1080

Here’s the second check.  Remember, movstr can have two or three parameters.  Here, Y is checked to see if it’s 2.  If it is, the jmp is skipped.  For reference, $1080 is a runtime error routine.  I checked by entering ‘go 1080’ in the PROMAL executive.  PROMAL replied with this:

AT $C3F3

Continuing to $2250, the routine then gathers more information:

2250  68          PLA
2251  85 35       STA $35
2253  68          PLA
2254  85 34       STA $34
2256  68          PLA
2257  85 2D       STA $2D
2259  68          PLA
225a  85 2C       STA $2C
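The entry sequence from $2238 to $225A can be replayed in Python to check the stack layout it implies. This is a sketch: the list stands in for the 6502 stack, and the caller model (each word pushed low byte first, then JSR pushing the return address high byte first) is inferred from the order of the PLA instructions above rather than taken from PROMAL’s documentation. The addresses in the usage line are made up.

```python
def caller_pushes(params, ret):
    """Model the caller: words pushed low byte first, then JSR's return address."""
    stack = []
    for p in params:
        stack += [p & 0xFF, (p >> 8) & 0xFF]
    stack += [(ret >> 8) & 0xFF, ret & 0xFF]  # JSR pushes high, then low
    return stack


def movstr_entry(stack, y):
    """Replay $2238-$225A; stack.pop() is PLA, y is the parameter count."""
    zp = {}
    zp[0x3A] = stack.pop()        # return address, low byte
    zp[0x3B] = stack.pop()        # return address, high byte
    a = 0xFF                      # LDA #$FF: default limit
    if y == 3:                    # CPY #$03 / BNE $2247
        stack.pop()               # discard the limit's high byte
        a = stack.pop()           # keep its low byte
        y -= 1                    # DEY
    zp[0x38] = a                  # STA $38
    if y != 2:                    # CPY #$02 / BEQ $2250
        raise RuntimeError("wrong parameter count: JMP $1080")
    for loc in (0x35, 0x34, 0x2D, 0x2C):
        zp[loc] = stack.pop()     # ToString, then FromString, high byte first
    return zp


zp = movstr_entry(caller_pushes([0xC003, 0xC100, 16], 0x1234), 3)
print({hex(k): hex(v) for k, v in sorted(zp.items())})
```

FromString ($C003) ends up at $2C/$2D, ToString ($C100) at $34/$35 and the limit at $38. Leaving the limit out leaves $FF in $38.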

At this point, MOVSTR has everything set up for the copy routine below.  The [limit] was processed early on and lives at $38, and now the [fromstring] and [tostring] parameters are stored in zero page as well, at $2C/$2D and $34/$35 respectively.  Tearing apart the actual copy routine is beyond the scope of this post, but I’ll include it for reference.

225c  A5 34       LDA $34
225e  38          SEC
225f  E5 2C       SBC $2C
2261  AA          TAX
2262  A5 35       LDA $35
2264  E5 2D       SBC $2D
2266  D0 1F       BNE $2287
2268  8A          TXA
2269  C5 38       CMP $38
226b  B0 1A       BCS $2287
226d  A0 00       LDY #$00
226f  B1 2C       LDA ($2C),Y
2271  F0 0A       BEQ $227D
2273  C8          INY
2274  C4 38       CPY $38
2276  90 F7       BCC $226F
2278  A9 00       LDA #$00
227a  F0 03       BEQ $227F
227c  88          DEY
227d  B1 2C       LDA ($2C),Y
227f  91 34       STA ($34),Y
2281  C0 00       CPY #$00
2283  D0 F7       BNE $227C
2285  F0 15       BEQ $229C
2287  A0 00       LDY #$00
2289  A5 38       LDA $38
228b  F0 0D       BEQ $229A
228d  B1 2C       LDA ($2C),Y
228f  91 34       STA ($34),Y
2291  F0 09       BEQ $229C
2293  C8          INY
2294  C4 38       CPY $38
2296  90 F5       BCC $228D
2298  A9 00       LDA #$00
229a  91 34       STA ($34),Y

Remember how we started?  We stored the return address at $3a so we could examine the parameters on the stack.  To return, MOVSTR jumps to an internal routine that pushes the saved return address back on the stack and executes RTS:

229c  4C 69 20    JMP $2069
[at $2069]
2069  A5 3B       LDA $3B
206b  48          PHA
206c  A5 3A       LDA $3A
206e  48          PHA
206f  60          RTS

In Summary…

Lesson learned?  The [limit] parameter for MOVSTR is held in a single byte, so it has a maximum value of 255, and one has to be very careful about how the parameters are specified.  We don’t have the memory protection that modern systems enjoy, so an incorrectly specified parameter can cause the whole environment to be overwritten at random.

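Also, if you look carefully at the entire routine, you’ll notice that the copy stops on the first null ($00) byte it finds, and that when the limit is reached a null is written after the last character copied, so the destination needs room for [limit]+1 bytes.  As it’s a ‘string’ move rather than a block move, this makes sense considering PROMAL uses ‘ascii-z’ strings.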
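For anyone who wants to poke at those rules without an emulator, here’s a Python rendering of the copy loop at $225C-$229A. It’s my reading of the listing, not PROMAL source. Memory is a bytearray, and plain indexes stand in for the ($2C),Y and ($34),Y vectors. It also brings out one more detail. When the destination starts fewer than [limit] bytes after the source, the routine measures the string first and copies it from the end backwards, so an overlapping move doesn’t trample itself.

```python
def movstr_copy(mem, src, dst, limit):
    """Sketch of MOVSTR's copy loop; limit is a byte (0-255)."""
    gap = (dst - src) & 0xFFFF
    if gap < limit:
        # Destination overlaps just ahead of the source ($226D-$2285):
        # find the length first, then copy from the end back to the start.
        n = 0
        while n < limit and mem[src + n] != 0:
            n += 1
        mem[dst + n] = 0              # terminator (at dst[limit] if no null found)
        for y in range(n - 1, -1, -1):
            mem[dst + y] = mem[src + y]
    else:
        # Ordinary forward copy ($2287-$229A).
        y = 0
        while y < limit:
            a = mem[src + y]
            mem[dst + y] = a
            if a == 0:                # stop once the null has been copied
                return
            y += 1
        mem[dst + y] = 0              # limit reached: dst[limit] = 0


mem = bytearray(64)
mem[0:12] = b"HELLO WORLD\x00"
movstr_copy(mem, 0, 32, 5)
print(mem[32:38])
```

Copying five characters writes six bytes: HELLO plus the terminator.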

I also got a good chance to see exactly how the PROMAL compiler passes its data to procedures via the stack.  What I learned is confirmed in the PROMAL language indexes, specifically the section on calling external assembly functions and procedures.

Making things a bit faster

Or, reducing frustration

The project has seen a bit of silence recently.  I’d apologize but I’m not really all that sorry about it.  I’ll offer an explanation:

The process for programming the NVRAM takes 6 minutes for a full update, and the tools I made in BASIC are at best a kludge.  If I did anything imperfectly I’d most often have to reload the entire 8k since I don’t have a debugger yet.

Enter PROMAL, discovered by an online colleague we lovingly refer to as ShadowM.  He recently acquired the long forgotten and abandoned software and graciously offers it to the Commodore community.

PROMAL is a high level compiled application language originally written for the C-64, Apple2, and IBM PC running DOS.  To date, I’ve only managed to find the C-64 version ShadowM has on his site.

PROMAL is a native development environment including a functional editor and a commandline environment that supports passing arguments to the programs it runs.  This is a great advantage, as a user can just specify an action as well as a target, rather than having the program prompt for the target.

As PROMAL is a compiled language, it does tend to run a bit more efficiently.  There are also other things that can be done in the language that enable one to increase efficiency.  For example, the most common data type is a 16bit word rather than a 40bit float.  All those loops in the BASIC program?  The address counters, the data byte… all of those are stored and processed as floats.  Not so in the programs below, as I only use ‘byte’ and ‘word’ variables, which are optimally sized as 8 bit and 16 bit unsigned numbers, respectively.

How much more efficient than BASIC is this implementation?  In this specific application, the PROMAL version runs at 383% of the speed of the BASIC one: nearly four times the operational speed.  There’s light at the end of the reprogramming tunnel the moment enter is pressed!

Another advantage of PROMAL is the ability to define where a variable rests in memory.  Take the nvdefs.s file below as an example:

ext byte via1[] at $de20

There are some keywords here:  ‘ext’ refers to an external reference.  ‘byte’ defines the variable’s values to be stored as bytes.  The name ‘via1’ is assigned and the brackets indicate the variable is an array.  Finally, the specified address is $de20.  Readers might remember that the first VIA on the IO card I’m using for this project sits at that address.

What does this do for me?  It’s simple.  I use the constant ‘pa’, also defined in nvdefs.s, to set up the output pins as desired without a poke or a lot of math: via1[pa]=$ff.  This causes the value of pa to be offset into the via1 array, and a $ff is stored there, turning all bits on.  No run around looking up a variable, getting its value, converting the float to an unsigned integer, setting up a vector, loading the offset into an index, and then finally storing.  It just skips the second through fifth steps.  You might also notice ‘addr’ in nvdefs, which is a WORD type set at $DE20.  This means any 16bit address stored in the ‘addr’ variable is automatically written to both port a and port b on the first VIA.  No address splitting or additional processing.  This all adds to speed, which really matters when you only have approximately 300,000 operations/second available.
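Here is a tiny model of why a single store to ‘addr’ drives both ports. It assumes PROMAL lays out a word low byte first. That is the usual 6502 convention, and getprm follows it when it puts a word’s high byte at X+1. With the register offsets from nvdefs.s, the low byte of the address goes to port B ($DE20) and the high byte to port A ($DE21). The dictionary is only a stand-in for I/O space.

```python
VIA1 = 0xDE20
PB, PA = 0, 1   # register offsets, as in nvdefs.s (pb=0, pa=1)


def store_word(io, address, value):
    """Store a 16-bit value little-endian, as a PROMAL word assignment would."""
    io[address] = value & 0xFF             # low byte
    io[address + 1] = (value >> 8) & 0xFF  # high byte


io = {}
store_word(io, VIA1, 0x1A2B)   # as if nvwr did addr=$1A2B
print(hex(io[VIA1 + PB]), hex(io[VIA1 + PA]))
```

By this reading, the low address lines of the NVRAM hang off port B and the high ones off port A. That wiring is inferred from the register layout.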

Below is the nvdefs.s file I created for the project.  Among other things, PROMAL can include secondary files much the way C compilers use header files.  These files are considered part of the compiler’s input stream at the position they’re included, and can provide hardware and operating system abstraction.  If the Apple2 version of PROMAL weren’t made of unobtanium and a “John Bell” 32bit IO card were in use, nvdefs.s could be changed to reflect the IO port addresses for the Apple2 card and the application below could be used without modifications after a simple recompile.

File: nvdefs.s

;DEFs for nvram read/write via Schnedler ultimate interface
;IF VIAS CHANGE also change named items below.

; assume VIA1 is at DE20, VIA2 at DE30
ext byte via1[] at $de20
ext byte via2[] at $de30
con ddra=3
con ddrb=2
con pa=1
con pb=0

; Named items for nvram config:
; Bits on control port: WE CS OE nc nc nc nc nc
con off=$e0 ; chip offline
con wr =$20 ; chip write
con rd =$80 ; chip read

; Named items for via control
con in=0   ; for DDR
con out=255; for DDR

; Named items for nvram address/data bus
ext word addr at $de20 ; access via1 ports a and b as 16bit unsigned!
ext byte dat  at $de31 ; via2 port a for data bus
ext byte dc   at $de33 ; via2 port a ddr for data in/out
ext byte ctl  at $de30 ; via2 port b for chip control
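The three control constants are easier to read once you decode them. This Python snippet lists which signals each value asserts. It assumes WE, CS and OE are active low, which is usual for SRAM-style parts and is implied by ‘off=$e0’ having all three bits high.

```python
# Bit positions from the nvdefs.s comment: WE CS OE nc nc nc nc nc
SIGNALS = (("WE", 0x80), ("CS", 0x40), ("OE", 0x20))


def asserted(ctl):
    """Return the active-low signals that a control byte pulls low."""
    return [name for name, bit in SIGNALS if not ctl & bit]


for label, value in (("off", 0xE0), ("wr", 0x20), ("rd", 0x80)):
    print(label, asserted(value))
```

Write pulls WE and CS low with OE high, and read pulls CS and OE low with WE high. That agrees with the ‘write byte (cs,we)’ comment in nvwr.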

And now the tool written to reprogram the NVRAM, rewritten in PROMAL.  It accepts its parameters from the commandline, making it easy to specify what needs to be done.

File: nvwr.s

program nvwr

include library
include nvdefs

word a       ; address presented to nvram
word length  ; number of bytes to prog
word file    ; file handle
word total   ; total bytes written
byte d       ; data byte to program
byte t       ; scratch space

a=0          ; default start address
length=$ffff ; default: run until end of file
total=0

if ncarg < 1
  put "Usage:  nvwr <file> [addr] [len]",nl
  put "  All values are specified",nl
  put "  in hexadecimal.",nl,nl
else
  file=open(carg[1]) ; open the image to program
  if file=0
    put "Unable to open ",carg[1],nl
  else
    if ncarg>1
      t=strval(carg[2],#a,16,4) ; address
    if ncarg>2
      t=strval(carg[3],#length,16,4) ; bytes to program
    output "Loading $#4H bytes from ",length
    put carg[1]," into NVRAM.",nl
    while getcf(file,#d) and length>0
      ctl =off ; ensure chip is offline
      addr=a   ; set up address bus
      dat =d   ; set up data bus
      dc  =out ; drive data bus
      ctl =wr  ; write byte (cs,we)
      ctl =off ; back offline
      dc  =in  ; prep for verify
      ctl =rd  ; re-read byte
      if dat <> d ; live compare data bits on ram with d
        put "ERROR: failed to verify byte",nl
        output "#4H >#2H <#2H#C",a,d,dat
      a=a+1           ; advance to the next nvram address
      total=total+1   ; count bytes written
      length=length-1 ; count bytes left to program
    ctl=off    ; leave the chip offline
    close file
    output "Wrote #4H bytes.#C",total
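To check the loop logic away from the hardware, the write/verify sequence can be simulated against a pretend chip. Everything here is hypothetical. FakeNVRAM stands in for the part behind the VIAs, and stuck_bits simulates data lines that won’t go high. The loop follows the same order as nvwr: write, read back, compare, then advance the address and both counters.

```python
class FakeNVRAM:
    """Hypothetical 8K part; stuck_bits are data bits that always read 0."""

    def __init__(self, size=0x2000, stuck_bits=0):
        self.cells = bytearray(size)
        self.stuck_bits = stuck_bits

    def write(self, address, value):
        self.cells[address % len(self.cells)] = value & ~self.stuck_bits & 0xFF

    def read(self, address):
        return self.cells[address % len(self.cells)]


def program(chip, data, start=0, length=0xFFFF):
    """Write and verify data from start; return (total, errors)."""
    a, total, errors = start, 0, []
    for d in data:
        if length == 0:
            break
        chip.write(a, d)
        got = chip.read(a)
        if got != d:
            errors.append((a, d, got))   # like nvwr's "#4H >#2H <#2H" line
        a += 1
        total += 1
        length -= 1
    return total, errors


chip = FakeNVRAM()
print(program(chip, b"\x4c\x00\xc0", start=0x100))
print(program(FakeNVRAM(stuck_bits=0x80), b"\x01\x81"))
```

The second call reports the byte at address 1, where $81 read back as $01 because bit 7 is stuck low.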


Sadly, I don’t offer anything in the way of pictures in this post, but I hope the information was interesting regardless.  Real Soon Now(TM) I’ll be able to get a more featured toolset for getting my minimal debugger going on the IIEasy Print card.  Once the machine language monitor is fully functional, everything will suddenly become very simple.

To learn more about PROMAL, you can download the documentation easily from ShadowM’s webpage.