Updating SSHD on the Cyclades ACS series

(or, changing things without really changing things)

As a user of legacy hardware that often has no ability to handle an ethernet connection, I found myself in need of a way to get it online in order to ease the transfer of data to and from it. To solve this issue, I chose a “terminal server” device, and settled on an Avocent/Cyclades “ACS” series system.

There were two primary reasons for the choice. First and most important, they’re well documented. Second and also most important at least for me, they run Linux and offer a root shell for anyone in need of one. This is handy as software often comes with security issues, and hardware is often abandoned when newer or better hardware is put on the market. This is the case, sadly, with our beloved Advanced Console Server.

I often become interested in the Open Source releases for hardware simply “because I can,” but in this instance I’d needed something to update the SSH daemon on the system. The version included with the last firmware release for the system is ‘OpenSSH_4.4p1, OpenSSL 0.9.8l 5 Nov 2009’ as reported by ‘sshd -V’. Anyone looking up security issues with that version of SSH would experience shivers and curled toes.

This post is intended to provide an entertaining look at the process I experienced while learning not only how the system works, but also how it handles configuration updates, and locating and utilizing a build environment to aid these updates. The process isn’t yet perfect as the current solution is to use Dropbear SSHd, and I would like to eventually use OpenSSHd to do the job.

The information in this article comes from a handful of sources online, but due to the litigious nature of the internet and the habit of internet entities frankly concentrating on ass-covering over defending users (cough I use a -free- wordpress account cough) I’ll provide search keys for that great semi-trustworthy search engine empire that should direct us all to the keys we need. These will be included at the end of the article, but if I forget one feel free to remind me and I’ll do an update. /finally inhales.

 

Understanding what the ACS is

(or, grokking the gateway)

Console Servers are often also referred to as Terminal Servers. They’re literally made to convert a serial connection (or two, or maybe forty-eight) to a network connection of some sort. In the case of the ACS series, the options are pretty impressive considering the age of the hardware.

Besides simply putting a serial terminal on a network using TCP sockets, these also have the ability to provide services to handle modems both for dialing in or out, putting serial devices on a network such as plotters or printers, or even custom applications if the ‘acslinux’ open source package was requested.

I’ve noted above that these ACS systems run Linux. They also provide a serial console like many of the systems of their era, and connecting to it allows logging in as root and even resetting the configuration of the system by using the ‘single’ command on the kernel commandline before booting. Beyond that step, a full root shell in a ramdisk is offered to manage and utilize the system’s resources.

The challenges are getting beyond the volatile nature of a ramdisk and understanding how to do so in a way that works with the firmware implementation. The first article in this series covers the step of understanding the system.

Further articles will cover changing the system’s configuration and thereby its behavior, and ensuring that configuration change is included when the system is powered down or rebooted.

Understanding the ACS’ configuration files

(or, a segment of an unwritten manual for the Cyclades ACS)

Like many embedded systems I’ve come across, the Cyclades ACS runs entirely within a ramdisk that’s loaded during startup. That ensures a sane configuration is put in place to operate the hardware as intended by the manufacturer. After the ramdisk is loaded into memory, another location is used to read configuration data and update this ramdisk.

In the ACS series, a compact flash card stores the bootloader, kernel, ramdisk, and the configuration data. How exactly it’s laid out isn’t apropos to these articles and is somewhat beyond my understanding. I’m not interested (yet /grin) in changing this, so it’s not a concern. As much as I dislike ‘magic’ or ‘black boxes’ I’m going to have to let this go, for now. Grr.

So, how do we determine how this is done? It’s a linux system. Look at the startup scripts! We’re dealing with a simple script system here, not anything convoluted or over-engineered like SystemD. The startup process is summarized here.

A large portion of the system is based on busybox, which is modified for this system somewhat- it runs one script of interest, /bin/init_ram.

Init_ram takes care of setting up system mounts and pcmcia before loading configuration data using a script at /bin/restoreconf.

Reviewing /bin/restoreconf sheds light on how the system saves its configuration between runs: A list of files is chosen based on the contents of /etc/config_files and saved to a custom filesystem mounted on /mnt/flash, in config.tgz.

The contents of config.tgz are unpacked over the root filesystem, overwriting files that have changed from the factory configuration and adding new ones.

/etc/config_files will be key to the SSHd upgrade later on.
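A rough sketch of that save-and-restore idea, written in Python rather than the ACS’ own shell scripts. It is not a copy of /bin/restoreconf or its counterpart; the function names, the one-path-per-line reading of /etc/config_files, and the temporary directories standing in for the ramdisk are all assumptions made for illustration.

```python
import os
import tarfile
import tempfile


def save_config(root, archive_path, list_name="etc/config_files"):
    """Pack every file named in the list (assumed one path per line) into a .tgz."""
    with open(os.path.join(root, list_name)) as fh:
        names = [line.strip().lstrip("/") for line in fh if line.strip()]
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            full = os.path.join(root, name)
            if os.path.exists(full):          # skip entries that are missing
                tar.add(full, arcname=name)
    return names


def restore_config(root, archive_path):
    """Unpack the archive over root, replacing the factory copies of the files."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(root)


def write(root, name, text):
    """Helper for the demo: create a small text file below root."""
    path = os.path.join(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(text)


with tempfile.TemporaryDirectory() as work:
    running = os.path.join(work, "running")    # stands in for the live ramdisk
    factory = os.path.join(work, "factory")    # stands in for a freshly booted ramdisk
    archive = os.path.join(work, "config.tgz")

    write(running, "etc/config_files", "/etc/hostname\n/etc/daemon_list\n")
    write(running, "etc/hostname", "my-acs\n")
    write(running, "etc/daemon_list", "# customized list\n")
    write(factory, "etc/hostname", "factory-default\n")

    save_config(running, archive)
    restore_config(factory, archive)
    with open(os.path.join(factory, "etc/hostname")) as fh:
        print(fh.read().strip())               # the saved hostname replaces the factory one
```

The point to take away is that only files named in the list survive a reboot, so anything added to the system has to be named there as well.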

After busybox’s built-in preconfiguration, /etc/rc.sysinit is run. A lot of the usual Linux init happens here, such as tuning the system, setting hostname, and starting up networking.

After rc.sysinit, several scripts are run at once, the most important of which is /bin/daemon.sh. This script reads the file /etc/daemon_list, which contains a list of daemons to start after sysinit. The format is pretty simple. Hashed lines are ignored via an inverse grep, and lines are in the format ‘nickname dfile’ where nickname is a name for the daemon, and dfile is the file responsible for controlling the daemon. These files are called via /bin/handle_daemon.sh.

/etc/daemon_list is another key to the SSHd upgrade.
The files in /etc/daemon.d/, which are referred to by /etc/daemon_list, are the last key to getting the SSHd upgrade in place.
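To make the ‘nickname dfile’ format concrete, here is a small Python sketch of how such a list could be read. It mirrors the description above (hashed lines dropped, two whitespace-separated fields per line) and is not taken from /bin/daemon.sh; the sample entries are hypothetical and were not copied from a real ACS.

```python
def parse_daemon_list(text):
    """Return (nickname, dfile) pairs, skipping blank and hashed lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # same effect as an inverse grep on '#'
            continue
        fields = line.split()
        if len(fields) < 2:                    # not a 'nickname dfile' line
            continue
        entries.append((fields[0], fields[1]))
    return entries


# Hypothetical contents, for illustration only.
SAMPLE = """\
# nickname  dfile
SSHD    /etc/daemon.d/sshd.sh
#NTP    /etc/daemon.d/ntpclient.conf
DBEAR   /etc/daemon.d/dropbear.sh
"""

for nickname, dfile in parse_daemon_list(SAMPLE):
    print(nickname, "->", dfile)
```

Seen this way, swapping the SSH daemon comes down to one line in the list plus the control file it points at.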

Building programs for a legacy system

(or, digging for tools waaaay back in the shed where the dust bunnies live)

Building programs for a given system requires an understanding of a little of its hardware. Linux will abstract just about everything, but it’s necessary to know what processor’s at heart in the system. This is often easily obtained via a single command:

# cat /proc/cpuinfo
processor : 0
cpu : 8xx
clock : 48MHz
bus clock : 48MHz
revision : 0.0 (pvr 0050 0000)
bogomips : 47.36

Well then. Isn’t that incredibly descriptive. It’s an 8-what now? And 48MHz? The system’s fitted with PC133 SDRAM. I guess the memory’s not going to be a bottleneck here.

Let’s check a binary on a system with ‘file’ installed.

bin$ file busybox
busybox: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), dynamically linked, interpreter /lib/ld.so.1, for GNU/Linux 2.4.17, stripped

Well look at that. It’s apparently a PowerPC 8xx. Startup messages also mention Montavista and Hardhat Linux. The only thing I’ve been able to learn about hardhat is that it’s an apparently not-for-free distribution of linux that’s carefully kept behind password-protected gateways.

With that option out, a secondary option appears to be viable: DENX’s ELDK, aka Embedded Linux Development Kit. Thankfully it’s still present on their site and fairly well documented! Searching for ppc_8xx on the internet finds them pretty quickly, linking to a ‘working with’ page. Using ELDK on a newer system as I’m doing does require a bit of trickery, but that’s covered later and mostly impacts cross-compiling.

Cross-compiling is often the ideal way to take things, but it’s not without its challenges. Did I mention earlier that I’ve not yet succeeded getting OpenSSH’s configure script to finish? Meh. 😛

I’m taking the often-used (today anyway) easy method of crossbuilding used by owners of small ARM systems such as Raspberry Pis and their friends. Essentially it involves installing a system emulator in a chroot and just building programs “natively” under emulation.

The ELDK installation does provide a CHROOT’able tree that can be used to build smaller software sets, which led to the decision to use Dropbear SSH.

On Ubuntu, the qemu-user-static package provides CPU emulation while still using system calls in native space. The files in /usr/bin/qemu-ppc* need to be copied to the chroot as described below so the system can use qemu to handle launching cross-platform binaries.

I followed the standard installation location of ELDK to /opt/eldk. On choosing the ppc_8xx target, a tree at /opt/eldk/ppc_8xx is created. A backup was made just in case I overwrote some files during a build and needed to recover.

Starting the environment enough to run a build was fairly simple. First, get the dropbear sources and copy them to /opt/eldk/ppc_8xx/usr/src/ and unpack them there. Next, copy /usr/bin/qemu-ppc* to /opt/eldk/ppc_8xx/usr/bin/. Once that’s done, you can use ‘sudo chroot /opt/eldk/ppc_8xx’ and will be greeted by a root prompt.

~$ uname -a
Linux epicfail 4.15.0-29-generic #31~16.04.1-Ubuntu SMP Wed Jul 18 08:54:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
~$ sudo chroot /opt/eldk/ppc_8xx/
bash-3.00# uname -a
Linux epicfail 4.15.0-29-generic #31~16.04.1-Ubuntu SMP Wed Jul 18 08:54:04 UTC 2018 ppc ppc ppc GNU/Linux
bash-3.00#

WAT. A chroot turns an x86 into a ppc? Kind of. QEMU is using a powerpc cpu emulator to run programs within the chroot and translating system calls to the native host kernel. At this point, the build should be easy, right?

Right??

I chose to use ELDK 4.0 as it’s an early version of similar age to the Montavista 3.1 installation on the ACS. That means some newer software will rely on things that don’t exist inside this chroot. Did I mention OpenSSH?

Dropbear had only one issue. Configure ran without a hitch, but make failed after an error with sed. Apparently, ‘sed -E’ didn’t exist then, but ‘sed -r’ does. An edit to ifndef_wrapper.sh and another make leads to success.

Make install is run to get a list of files that are created, and a tarball is made of the files to transfer them to the ACS for testing.

make -j$(grep -c cpu /proc/stat)
make install
tar cfz dropbear.tar.gz /usr/local/sbin/dropbear /usr/local/bin/dbclient /usr/local/bin/dropbear* /etc/dropbear/

Now we can exit the chroot and copy the tarball to the ACS:

scp /opt/eldk/ppc_8xx/usr/src/dropbear-2018.76/dropbear.tar.gz acs:/tmp/

Note that on my system I have a 1gig CF mounted via a pcmcia-CF adapter. This is incredibly handy at times, and was used to store the tarball before unpacking it. The system will even automatically mount an ext2 (not ext3+) filesystem on the first partition to /mnt/ide on startup. I have a copy of the ppc_8xx chroot there for short tests.

Next, ssh into the ACS as root, and unpack the tarball. There’s no need for concern with the filesystem– All files are in the /usr/local tree or /etc/dropbear, and a reset will restore the system to where it was before the tarball was unpacked.

cd /
tar zxvf /tmp/dropbear.tar.gz
touch /var/log/lastlog
/usr/local/sbin/dropbear -FERp8022

Dropbear needs lastlog to log in without complaints, so we’ll create that ahead of time.

Next, launch dropbear in test mode. F means ‘foreground’, E means ‘log to stderr’, R means ‘create host keys on demand’ and p8022 sets the sshd port to 8022.

From your system, ssh to root on port 8022 to see if it worked:

ssh -p8022 root@acs

The test run of dropbear should note an incoming connection. At this moment a healthy reminder of the 48MHz system speed is handed over. The first time I made a connection it did take about two minutes, and timed out before I could log in. A second connection proceeded quickly enough and successfully logged into the root account.

 In all, it was frustrating but still educational and entertaining.

(or, we’ll just edit these expletives out.)

For me, it’s always great to find ways to breathe new life into old hardware. It’s also a good reminder that a bit of trickery can get things done an easier way, which keeps the mind active. Cross-compiling generally isn’t something people do even in a system administration line of work, so the exercise refreshed my memory on how to find resources and understand different methods. In all, a handful of weekend hours well spent.

The replacement SSH daemon is now up and running, but will disappear the moment the ACS is rebooted or power cycled. In a following article, I’ll cover some permanency for the SSH daemon, and even replacing the old and insecure daemon with something a little more secure.

See you then!

Designing a 6809 SBC

Or, what did I get myself into?

Recently I’ve gained interest in a lesser-known operating system called OS-9.  This isn’t to be confused with System 9 or MacOS 9, but instead is an OS originally created by Microware for the 8-bit 6809 CPU.

The simplest method for running OS-9 these days is to emulate, but for me that removes some of the fun.  As for real hardware a Color Computer from Tandy/Radio Shack is the easiest route, however getting my hands on such a machine was cost prohibitive when I started researching solutions.  The solution I chose?  Build my own.  In this article I describe in short the hardware design and the decisions that led to the final design for 6809v2, an actually-6309 single board system.

But first, the result:

6809v2-top

The final product (almost: This revision had a few design errors, but it did work)

The story begins.

6809, the first

In order to learn about the 6809 CPU and how to build for it, I did some research online and found a minimal system designed by a fine gentleman by the name of Grant Searle.  His minimal system seemed ideal as it was a tested proven model to work on, and even had a subset of Coco BASIC in its firmware.  Upon selecting the design I’d like to build upon it became time to get some hardware.

To E, or not to E.  That is the question

When selecting my 6809 I chose the Hitachi variant: the 63c09e.  I wasn’t aware of any different requirements between the -E and the non-E versions of the processor at that moment.  When initially testing the CPU, I found otherwise.

My initial sanity test was to wire the bus up for $12 (a NOP opcode in 6809-land) and lash a crystal oscillator onto the crystal input pin on the 6309.  Note that I wired it up for the non-E version of the chip.  I spent a few hours trying to understand whether the CPU was bad or if I was insufficiently educated.

I do enjoy learning, so I was mightily pleased when the discovery was a lack of education on my part.  As it turns out, the non-E version has an internal clock generator, and the -E version requires an external generator.  Oh that’s what the E means!  External!

Tick, tock

    Tick, tock

The standard 6809 CPU internally generates what’s called a Quadrature Clock.  This is two clocks running 90 degrees out of phase, named Q and E.  The Q clock is used internally by the processor and the E clock is used to synchronize external devices with the system bus.  Below is an illustration of what a quadrature clock looks like relative to the passage of time.  I knew then what must be done.

Quadrature_Diagram

By Sagsaw at English Wikipedia (Transferred from en.wikipedia to Commons.) [Public domain], via Wikimedia Commons

Generating a quadrature clock

There are a handful of solutions to generating a quadrature clock.  I chose the route that uses two D flipflops with reset inputs.  With the correct feedback it’s simple to make an appropriate generator with the only caveat being that the source clock gets divided by four in the process of generating the Q and E signals needed by the 6809E and similar processors.

quadclk

The quadrature clock generator chosen for 6809v2
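For anyone who’d rather see the divide-by-four in numbers, below is a small Python simulation of one common two-flipflop arrangement, a 2-bit Johnson (twisted ring) counter.  I’m assuming that feedback here purely for illustration; it isn’t a transcription of the schematic, and the function name is made up.

```python
def quadrature(source_ticks):
    """Clock a 2-bit Johnson counter once per source tick; return (Q, E) samples."""
    ff1 = ff2 = 0                      # both flip-flops start cleared (the reset inputs)
    samples = []
    for _ in range(source_ticks):
        samples.append((ff1, ff2))     # Q is taken from ff1, E from ff2
        ff1, ff2 = 1 - ff2, ff1        # D1 = NOT ff2, D2 = ff1
    return samples


samples = quadrature(8)
print("Q:", "".join(str(q) for q, _ in samples))
print("E:", "".join(str(e) for _, e in samples))
```

Each output repeats every four source ticks, which is the divide-by-four, and E follows Q by one tick, a quarter of the output period or 90 degrees.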

Progress, at last

Now that the clocking issue is resolved, and NOP’ing the processor on the breadboard works, it’s time to move to something a bit more advanced.  I drafted Grant’s schematic with my additional clock generator in KiCad and took a crash course in PCB layout.  While the layout wasn’t the most efficient in the world it did fit within my chosen board house’s “prototype pcb” limits nicely.

6809sbc

The “Grant Searle” design adapted for the 6809E CPU

The system above became the target of tinkering for quite some time and did allow some self-education on the system architecture, but it became a bit tedious as the only way to save any progress was to dump it to the tty and log the text output.  Worse was the system’s inability to tokenize BASIC programs fast enough to keep up with the console’s speed, making it impossible to simply paste data into the tty to ‘reload’ previous works.  Terminal programs do have delays and limits that can be applied to outgoing text dumps, but a simple copy and paste is far easier than dealing with sending text files via that method.  Grant’s notes did include a reasonable method of applying a rudimentary hardware handshake, but it was left out due to a lack of understanding the system’s speed limitations.

On to fixing a few issues:  First, the system couldn’t effectively run my target OS due to a few missing capabilities.  A recurring interrupt is generally required to provide a multitasking manager with its ability to reassign the processor to tasks that are to be handled.  Second, there was no storage on the system to load the OS and its programs from.  Less important was a lack of memory.  While OS-9 can run with 32K of ram it’s quite sub-optimal to use a system with that little memory unless it’s running in an embedded situation to handle a single task.

6809, the second

So, now we have our requirements, and one recommendation in addition.  A few other constraints as well, to fit manufacturing and cost limitations.

  • 64K of RAM is required to effectively run OS-9
  • Bulk non-volatile storage is needed to store the OS and its programs
  • A console to communicate with the system
  • The design must fit within 10x10cm to stay in the ‘inexpensive’ board category
  • The design must remain minimal to ease the engineering and assembly effort

As long as I’m designing the system with some additional features in mind, I added two more requests:

  • Two (or more) console ports
  • Faster CPU

That considered, I moved on with research for an I/O solution.

Let’s talk IC’s

I had mentally pre-selected my console solution based on some knowledge of an IC often used with the IDE-64 interface card, called the ‘DUART’.  The IC turns out to be the MC68681 DUART, which quite conveniently had two serial ports as well as some hardware-assisted handshake capability.

My next trick would be providing a GPIO chip for the system to drive a storage solution with.  I did have a desire to stay within the Motorola line for the sake of it, and looked at first at the 6821.  It would serve nicely, but I also needed a timer interrupt of some sort and my chip list was growing as quickly as PCB space was disappearing.

For those shouting at me, wait for it.  Just wait for it. 😉

At this point the design was at three DIP40’s, an unknown RAM array, a ROM, and two RS-232 transceivers.  The core logic wasn’t even factored in and the system was starting to look like a dual-PCB solution.  Not Good(TM).

In an attempt to avoid slapping a 6522 VIA on the board I took a break from a week’s spare-time research, and did some prototyping with the 68681 I acquired during this research.

Oh, that’s what it’s like to have an epiphany!

Knowing the Grant Searle design had a 16k area of memory that was decoded but not assigned to any IC, I decided to strap up a 68681 to the board to determine the difficulty in programming the device.

duart-proto-overhead

The research prototype: 68681 on a breadboard to the right

Ah, a system only an engineer could love.  The ratsnest of wires soldered directly to IC’s leads to a 68681 DUART on the prototyping board, and a smaller ratsnest leads to a MAX232 RS-232 transceiver connected to port one on the DUART.  I also used my oscilloscope to detect a toggle on one of the GPIO pins as a sanity test and wrote a short program in BASIC to toggle the GPIO.

blinkyblinky

The sanity test:  Note the toggling signal on the floor-mounted oscilloscope

Success, on the first try even!  In short order I had a simple test program written in BASIC that would log incoming and outgoing data to the onboard console.  Further testing even confirmed that the DUART is well capable of responding fast enough to not worry about the DTACK output and holding the processor long enough for a bus transaction to complete.

So, what’s this epiphany, you ask?  Did you see the mention of toggling the GPIO above?  Hark!  The 68681 has more lines than necessary to operate a directly-connected console.  Even better, they can be set and reset in software.  There’s even a few -input- lines available, after configuring two console ports with RTS/CTS handshaking.

But wait! There’s more!  The real epiphany came when reading the register descriptions.  This handy device even has a programmable counter that can be put into free-run mode.  My timer!  My GPIO!  They’re already there!

On with it, then!

The system’s I/O now handled, the RAM array became the next and possibly the most expedient decision in the design process.  It’s non-trivial to find a 64Kx8 RAM that doesn’t require a lot of support so a 128Kx8 static RAM was chosen to resolve the 64K-RAM requirement.  There were even a couple of GPIOs left, so I sent them both to the RAM socket.  The design would support a maximum of 256K, banking 64K at a time into view.
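The banking arithmetic is simple enough to sketch in a few lines of Python.  This is only a model under assumptions: the two spare GPIO lines are treated as a plain 2-bit bank number feeding the RAM’s upper address pins, and the I/O page, ROM overlay and any signal inversion are ignored.  The names are made up for the example.

```python
BANK_SIZE = 0x10000      # the CPU can only see 64K at a time
BANK_BITS = 2            # two spare GPIO lines select the bank


def physical_address(bank, cpu_addr):
    """Combine a bank number with a 16-bit CPU address into a RAM address."""
    if not 0 <= cpu_addr < BANK_SIZE:
        raise ValueError("CPU address must fit in 16 bits")
    if not 0 <= bank < (1 << BANK_BITS):
        raise ValueError("bank must fit in the two GPIO bits")
    return (bank << 16) | cpu_addr


print(hex(physical_address(0, 0x1234)))
print(hex(physical_address(1, 0x1234)))
print((1 << BANK_BITS) * BANK_SIZE // 1024, "K addressable")
```

With a 128Kx8 part in the socket only banks 0 and 1 hold distinct memory; the second GPIO line matters once a larger RAM is fitted.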

The firmware storage was already chosen:  a 27c64 8Kx8 EPROM, as the device shares a pinout with the somewhat less-familiar 28c64 EEPROM.  An EEPROM is more expensive in most cases, but provides a much shorter development cycle:  An EPROM requires 20-30 minutes of UV exposure to erase and prepare it for another programming cycle, while the EEPROM can be erased in about two hundred milliseconds.  A vast improvement!

Where do I put it all?

On the PCB, obviously.  However, this is more of a discussion on where in the processor’s view things should go.  The initial map was pretty straightforward, and only marginally different from the final memory map.  The mapping of the GPIO pins didn’t change at all between the pre-production revisions of the design.

The initial map was as follows:

Base   End    Description
0000   DFFF   RAM, bankswitched via GPIO
E000   E0FF   I/O page, 68681 present in the first $10 bytes
E100   FFFF   ROM or RAM, controlled via GPIO

This seemed reasonable until I considered the memory layout with the ROM disabled and RAM in its place.  Having the I/O page stuck in the middle like that seemed like a poor design decision.  Upon researching the Color Computer’s memory map I learned that its IO area is closer to the top.  In the end $FF00-FFEF was chosen for the I/O area, with the last 16 bytes at $FFF0-FFFF accessing memory to provide the CPU with its vectors.

The design’s final map is like this:

Base   End    Description
0000   DFFF   RAM, bankswitched via GPIO
E000   FEFF   ROM or RAM, controlled via GPIO
FF00   FFEF   I/O page, 68681 present in the first $10 bytes
FFF0   FFFF   ROM or RAM, controlled via GPIO

A much better layout, allowing contiguous RAM memory to be present from 0 to $FEFF and just enough room for the CPU’s hardware vectors and a bit of scratch memory to possibly handle bank-switching during hardware interrupts.  The vectors would need to be copied to all banks of RAM of course, but that’s beyond the scope here.
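The final map reads nicely as a lookup function.  The Python below is just the table above restated as code; the function name and the rom_on flag (standing in for the GPIO that switches ROM and RAM) are mine.

```python
def region(addr, rom_on=True):
    """Name the device that answers a given CPU address in the final map."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("the 6809 has a 16-bit address bus")
    if addr <= 0xDFFF:
        return "RAM"                       # $0000-$DFFF, bankswitched
    if 0xFF00 <= addr <= 0xFFEF:
        return "IO"                        # I/O page, DUART in the first $10 bytes
    return "ROM" if rom_on else "RAM"      # $E000-$FEFF and $FFF0-$FFFF


for addr in (0x0000, 0xE000, 0xFF00, 0xFFEF, 0xFFF0, 0xFFFE):
    print(f"${addr:04X}", region(addr), region(addr, rom_on=False))
```

With the ROM switched out, everything from 0 to $FEFF is RAM in one contiguous run, which is the whole reason the I/O page moved to the top.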

Mapping the map-ables

The next design consideration is how to control the mapping of memory.  To aid in design it’s best to think in binary.  Since that’s a lot of ones and zeros (16 of them per word to be exact), a good compromise is to think in a number system that’s closely related.  Hexadecimal will provide an address in four digits so that or binary will be used in this section.

TOP8, the designator of Magic

First we should consider the division of memory in the system.  All of the magic in the system happens in the top area of memory, from $E000 to the top at $FFFF.

As this involves a lot of ones in the binary address for the range, an AND gate of some form is generally the best device to consider.  The devices in the system generally require an inverted signal to enable them, so a NAND is best.  In the 7400-series IC family one of my favorites got chosen for the job:  The 74LS10, a triple 3-input NAND gate.  In simplest terminology, all three inputs of a gate must be high at once for its output to drop low.

Consider the base address in binary format:  $E000 becomes %1110 0000 0000 0000.  As the three highest address bits are set, the ‘LS10’s triple-input becomes best to detect when the top area of memory is being accessed.  The output of this gate will be referred to as ‘TOP8’, to refer to the (decimal) top 8192 bytes of system memory.  Since all the magic in the system occurs here, it’ll become the most important signal in the system.
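A quick way to convince yourself of the TOP8 decode is to model the gate in Python and sweep the whole address space.  The helper names are made up for this sketch, and the signal is modelled as active-low (0 means ‘selected’), matching the NAND’s output.

```python
def nand(*inputs):
    """A NAND gate: output is 0 only when every input is 1."""
    return 0 if all(inputs) else 1


def top8_n(addr):
    """Active-low TOP8: one 74LS10 gate fed with A15, A14 and A13."""
    a15 = (addr >> 15) & 1
    a14 = (addr >> 14) & 1
    a13 = (addr >> 13) & 1
    return nand(a15, a14, a13)


selected = [a for a in range(0x10000) if top8_n(a) == 0]
print(f"${selected[0]:04X}-${selected[-1]:04X}, {len(selected)} bytes")
```

The sweep covers $E000 through $FFFF, the 8192 bytes the signal is named for.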

IOPAGE: When we need to access the world

The next consideration is to select memory or I/O.  Let’s look at the IO page’s base address, $FF00, in binary: %1111 1111 0000 0000.  That’s a lot of ones, and as with the TOP8 area, a NAND is best for determining the access of this page.  One of my favorites is a near-perfect fit:  The 74LS133, a 13-input NAND gate.  As with the ‘LS10, all inputs must be high for the output to drop low.

This comes with a caveat however.  Assigning A8-15 to the ‘LS133 decodes the I/O page for the entire range of $FF00-$FFFF, denying access to ROM (or RAM) during hardware vector reads.  An exception must be made, and a new signal is designed for it.

A second element in the 74LS10 is used to detect any address equal to $xxEx or $xxFx by attaching A5, A6, and A7 to its inputs.  This is quite a few ranges of memory, but combined with the ‘LS133 above it can be limited to disabling the IO page only for $FFE0-FFFF.  A slight difference from the intended memory map, but a worthy sacrifice to minimize the logic design in the system.  A bonus is the gain of 16 more bytes above the I/O space to enlarge the minuscule dispatch code area above the I/O range.  The output is named ‘EF’ for the two digits it will match.

How do we connect this?  It’s sent into the 74LS133 to force it to de-assert whenever any of these ranges are addressed on the bus.  The final memory zone this section of the circuit will address is $FF00-FFDF, named ‘IOPAGE’.

As the only device intended for this range is the 68681 DUART, its select input is directly connected to the ‘IOPAGE’ signal.
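The EF and IOPAGE gating can be checked the same way.  The Python sketch below models both gates as active-low and sweeps all 65536 addresses.  One assumption to flag: only nine of the ‘LS133’s thirteen inputs are described here (A8-A15 plus EF), so the model treats the remaining inputs as tied high.  Function names are illustrative.

```python
def nand(*inputs):
    """A NAND gate: output is 0 only when every input is 1."""
    return 0 if all(inputs) else 1


def bit(addr, n):
    """Address line An as 0 or 1."""
    return (addr >> n) & 1


def ef_n(addr):
    """Active-low EF: 74LS10 gate on A5, A6, A7 (low byte $E0-$FF)."""
    return nand(bit(addr, 5), bit(addr, 6), bit(addr, 7))


def iopage_n(addr):
    """Active-low IOPAGE: 74LS133 on A8-A15 and EF (spare inputs assumed high)."""
    inputs = [bit(addr, n) for n in range(8, 16)] + [ef_n(addr)]
    return nand(*inputs)


selected = [a for a in range(0x10000) if iopage_n(a) == 0]
print(f"${selected[0]:04X}-${selected[-1]:04X}, {len(selected)} bytes")
```

On its own EF matches the top 32 bytes of every 256-byte page, but the ‘LS133 only cares about it when A8-A15 are all high, which is what confines the exception to $FFE0-FFFF.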

Reading the bootstrap firmware

In order to start the system, the ROM must be readable when it comes out of reset during power-up.  As the ROM occupies the memory range from $E000-FFFF, it will conflict with the IO page described above.  This will be covered shortly.  First, let’s work out how to turn the ROM on when it’s accessed.

There are two general signals needed, one of which is newly named: ‘ROMON’.  ROMON is an active-high signal that comes from the DUART’s GPIO port.  It’s intended to allow software to disable the system firmware and place its own code in the RAM area it would otherwise occupy.  The other signal, ‘TOP8’ described above, will be combined to provide the first of the two ROM enable inputs.

A challenge here is the optimization requirement for the system- to keep the IC count to a minimum.  The TOP8 signal’s polarity isn’t suitable for the combination with ROMON, so it gets inverted by a 2-input NAND gate, a 74LS00.  It’s then combined by a second 2-input NAND gate with the ROMON signal.  Its output is named ‘ROMCS’.  The EPROM’s CS line is chosen here to put the device in standby mode and reduce power usage while the ROM is turned off in software.

The next challenge is to ensure the ROM is disabled when the IO page is being accessed.  In a method similar to the combination of ROMON and TOP8 above, the inverted TOP8 signal is used again with a third element of the 74LS00 to enable access to the ROM whenever the IO page is -not- being accessed.  Its output is named ‘ROMOE’, and connected to the EPROM’s OE signal.  With that, the ROM is properly decoded.

So, we’re done, right?  Right??

We’re not quite done.  It’s easy to say ‘and the rest is RAM,’ but that’s not all that easy, especially when we’ll be optionally overlaying an area with the EPROM’s data.  The IO page’s mapping is fixed and always present so it should also be considered.

Conveniently we already have two signals to disable the RAM device:  IOPAGE and ROMCS.  Unfortunately the polarity of both is incorrect for our needs.  They’ll both need inversion to suit the need of RAM selection.  This will be handled by the two remaining NAND gates:  IOPAGE is inverted by the remaining 2-input NAND gate, and ROMCS is inverted by the remaining 3-input NAND gate.  These are combined by the most frustrating decision in the design:  Adding a single OR gate, a 74LS32 to combine them.  The resulting signal is named ‘RAMOE’, and connected to the RAM’s OE signal.  With this, the RAM’s output is enabled except when ROM is enabled for $E000-FFFF, and when the IO page is accessed.

Sadly, the other three gates in the 74LS32 are unused.  (Cue sad music and slow fade)
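Putting the whole select chain into one Python model makes it easy to check that exactly one device drives the bus on any read.  This is my reading of the gate-by-gate description in this article, not a netlist export: signals ending in _n are active-low, spare ‘LS133 inputs are assumed tied high, NAND gates with their inputs joined stand in for inverters, and the RAM’s clock-driven CS is left out.

```python
def nand(*inputs):
    """A NAND gate: output is 0 only when every input is 1."""
    return 0 if all(inputs) else 1


def bit(addr, n):
    """Address line An as 0 or 1."""
    return (addr >> n) & 1


def decode(addr, romon=1):
    """Return the active-low select signals for one read cycle."""
    top8_n = nand(bit(addr, 15), bit(addr, 14), bit(addr, 13))      # 74LS10 gate 1
    ef_n = nand(bit(addr, 5), bit(addr, 6), bit(addr, 7))           # 74LS10 gate 2
    iopage_n = nand(*[bit(addr, n) for n in range(8, 16)], ef_n)    # 74LS133
    top8 = nand(top8_n, top8_n)                  # 74LS00 gate 1, used as an inverter
    romcs_n = nand(top8, romon)                  # 74LS00 gate 2
    romoe_n = nand(top8, iopage_n)               # 74LS00 gate 3
    iopage = nand(iopage_n, iopage_n)            # 74LS00 gate 4, used as an inverter
    romcs = nand(romcs_n, romcs_n, romcs_n)      # 74LS10 gate 3, used as an inverter
    ramoe_n = iopage | romcs                     # the lone 74LS32 OR gate
    return {"IOPAGE": iopage_n, "ROMCS": romcs_n, "ROMOE": romoe_n, "RAMOE": ramoe_n}


def reader(addr, romon=1):
    """Name the single device that drives the data bus on a read."""
    s = decode(addr, romon)
    drivers = []
    if s["IOPAGE"] == 0:
        drivers.append("DUART")
    if s["ROMCS"] == 0 and s["ROMOE"] == 0:
        drivers.append("ROM")
    if s["RAMOE"] == 0:
        drivers.append("RAM")
    if len(drivers) != 1:
        raise RuntimeError(f"bus conflict or floating bus at ${addr:04X}: {drivers}")
    return drivers[0]


for addr in (0x1000, 0xE000, 0xFF00, 0xFFE0, 0xFFFE):
    print(f"${addr:04X}", reader(addr, romon=1), reader(addr, romon=0))
```

The gate tally in the model also matches the parts list: all four ‘LS00 gates and all three ‘LS10 gates are spoken for, leaving the lone OR as the only thing the 74LS32 contributes.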

A feature designed into the system is the ability to write to the RAM, even for the memory area of $E000-FFFF while the EPROM is enabled.  This permits the preloading of RAM data before the system firmware is turned off to help avoid crashes in the event of an unexpected interrupt when configuring the system to run an OS.  To do this, the RAM’s WE signal is tied directly to the R/W signal from the processor.  This will cause the RAM to also receive writes sent to the IO page, but as that area cannot be read by the processor, the data can be considered voided and non-impactful during normal operation.  As the input is directly connected to R/W, it doesn’t have a special name.

The RAM’s requirement to be always written to makes it necessary to select it on every CPU cycle.  Therefore, the CS input will always be enabled by the system clock.  As the E clock’s polarity isn’t suitable for this input, the inverted signal is borrowed from the intermediate stage of the clock generator.

But what about the rest?

As for the software considerations, they’ll be covered in another article.  The last few hardware details are below.

The system’s speed was ultimately set to 3.6864MHz.  This was done to reduce the device count on the board, driving the system with a 14.7456MHz crystal.  The CPU’s clock is the same as that required by the DUART, so those familiar with RS-232 designs may recognize this frequency.
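The arithmetic behind that frequency is worth a quick look.  The Python below assumes a 14.7456MHz crystal (four times 3.6864MHz, to account for the divide-by-four in the clock generator) and the common UART convention of a 16x sampling clock; it does not model the 68681’s actual baud rate generator, only why the number is popular.

```python
CRYSTAL_HZ = 14_745_600            # assumed crystal: 4 x 3.6864MHz
CPU_HZ = CRYSTAL_HZ // 4           # the quadrature generator divides by four

print(f"CPU/DUART clock: {CPU_HZ / 1e6:.4f} MHz")

# With a 16x sampling clock, every common baud rate divides out evenly.
for baud in (9600, 19200, 38400, 57600, 115200):
    divisor, remainder = divmod(CPU_HZ, 16 * baud)
    print(f"{baud:>6} baud: divisor {divisor}, remainder {remainder}")
```

Every remainder comes out to zero, which is why this odd-looking frequency turns up on so many serial designs.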

The GPIO’s are mapped to a few different devices as shown below:

Pin    Connection
IP0    CTS for port 1
IP1    CTS for port 2
IP2    DCD for port 1 (not routed)
IP3    DCD for port 2 (not routed)
IP4    Unassigned
IP5    SD-Card Data Input

OP0    RTS for port 1
OP1    RTS for port 2
OP2    RAM A16
OP3    RAM A17
OP4    ROMON
OP5    SD-Card Clock Output
OP6    SD-Card Data Output
OP7    SD-Card Select Output

The outputs on the 68681 are all inverted, so when reset by the system reset line, they all are set to high output.  This is a handy feature, as the ROMON signal needs to be high to enable the EPROM, and the SD-Card is de-selected upon reset.
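The inverted outputs are easy to trip over in software, so here’s a tiny Python model of the behaviour described above.  It is a toy, not a register-accurate 68681: the class and method names are invented, and only the ‘pin is the complement of the register bit’ and ‘reset clears the register’ behaviour is represented.  The bit positions come from the table above.

```python
ROMON = 4        # OP4 in the pin table
SD_SELECT = 7    # OP7 in the pin table


class OutputPort:
    """Toy model of an output port whose pins are the inverse of its register bits."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.register = 0x00             # reset clears every bit, so every pin goes high

    def set_bits(self, mask):
        self.register |= mask & 0xFF     # setting a register bit drives its pin low

    def clear_bits(self, mask):
        self.register &= ~mask & 0xFF    # clearing a register bit drives its pin high

    def pin(self, n):
        return 1 - ((self.register >> n) & 1)


port = OutputPort()
print("after reset: ROMON =", port.pin(ROMON), " SD select =", port.pin(SD_SELECT))

port.set_bits(1 << ROMON)                # software switches the EPROM out
print("ROM disabled: ROMON =", port.pin(ROMON))
```

The practical upshot is that code turning the ROM off has to set a bit rather than clear one, which reads backwards until the inversion is kept in mind.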

As noted above, system software will be covered in an additional article.

Finding more information [amendment]

On initially publishing this article, I made the ingenious decision to leave out an important link.  The design files and current source code state are available on my github repository: https://github.com/jbevren/6809v2

I do invite others to use the information there, and even have their own boards made if they wish.  All I want is credit for the original design.

Ascend the soap box

The design is open and free (with credit given, that is) because I feel the 8-bit community is not large enough to hold secrets and the animosity that comes from attempting to keep them so.  At this stage in the community and its associated hobbies, an open and sharing attitude is best for welcoming newcomers, many of whom barely know the names and machines we grew up with and still have incredible enthusiasm for our aging technologies and the challenges and fun they offer.

If anyone- young or not- is curious about these hobbies, help them learn and encourage them to continue learning on their own.  It’s a part of history that shouldn’t be lost, as automated computation and data processing is taken for granted by many people and its origin should never be forgotten.

Descend the soap box

Thanks for reading everyone! 🙂

 

 

Bringing the Espressobin up

Or, why did I buy this again?  Oh, right- it looked cool.

The Espressobin I purchased via the Globalscale kickstarter arrived Saturday and I finally took the time to work with it.  It’s a rather neat machine that solves an issue I had with small system boards: I wanted a small format, low-power, router-capable system, and SBCs either had a single ethernet port or were too costly to consider.  The three independent ethernet ports and 1G of ram will ensure I won’t have any concern for resources.  Additional hardware details for the espressobin are well covered on Globalscale’s website.

First, a bit of stumbling.

The bringup instructions for their board cover using Ubuntu 14.04 in good detail, but I wanted to use 16.04 as it’s more recent.  Here, I’ll cover the issues I had and how I solved them.  Credits go to Globalscale for their Ubuntu 14.04 instructions, on which these are based.

  • First, there’s apparently no binary kernel on their website, so I used their instructions for installing the toolchain and kernel source to build a working kernel for the board.  To keep things simple, I just used the defconfig as instructed.
  • Next, I needed some storage for the board.  The version shipped to me doesn’t have an EMMC chip mounted, so I got an SD card and inserted it into my computer.  Ubuntu automatically mounts the sdcard, so I had to umount it to continue as I’ll be changing the filesystem on it.
  • Fdisk was used to change the partition type to 83 (linux).  The use of fdisk in linux should be well understood and I’ll excuse myself from covering the topic in detail as I’m not generally a good instructor.  Note that I only edited the partition type tag rather than zeroing out the MBR and repartitioning the card as Globalscale directs us to.  This is done under the assumption that the SDcard manufacturer aligned the first block of the partition on one of the SDcard’s flash erase boundaries, and re-creating the partition may start the partition in the middle of an erase block and cause filesystem allocation blocks to span multiple erase blocks, reducing performance significantly.  This theory is untested, so your results may vary.  I have the belief the manufacturer would know how the partition should be aligned and would ship the card appropriately configured.
  • Next I used mkfs.ext4 on the sdcard’s partition (mmcblk0p1 in my case as I have an onboard SDcard reader) since the u-boot config on the board can directly read ext4 filesystems.  I ejected the SDcard, waited a few seconds for Ubuntu to realize it’s ejected, and re-inserted it to get it to remount.
  • Time to drop Ubuntu into place.  Since there’s no installer for this board,  I got the 16.04 aarch64 tarball from ubuntu at http://cdimage.ubuntu.com/ubuntu-base/releases/16.04.2/release/ubuntu-base-16.04.2-base-arm64.tar.gz and unpacked it onto the sdcard.  Be VERY careful as it contains a full root filesystem.  If it’s errantly unpacked in the root directory on another unix-like system it WILL destroy it unrecoverably.  Danger-danger, flailing arms, all that.  Seriously.
  • Now that we have a somewhat working root filesystem on the card, we need to copy in the kernel that we built using globalscale’s instructions.  Copying the kernel in is also covered on their website, so I won’t repeat the steps here.
  • The Ubuntu 14.04 instructions for setting up ttyMV0 will apply to this install, and will offer us a login prompt.  Add the content below to /etc/init/ttyMV0.conf.

start on stopped rc or RUNLEVEL=[12345]
stop on runlevel [!12345]
respawn
exec /sbin/getty -L 115200 ttyMV0 vt100 -a root

  • All that was left was to resolve some dependencies for smoothly running apt and debconf.  The stock tarball for 16.04 is very minimally installed and configured, so some additional packages and configuration were necessary to get a quietly running system.  These steps are below.

At this point the startup went swimmingly, indicating that ubuntu 16.04 runs happily with the kernel source and toolchain globalscale provided.  However, there was an issue with ttyMV0.conf timing out.  Research turned up some things that were missing, most specifically udev.  Also, Ubuntu 16.04’s stock getty doesn’t appear to auto-log in so we’ll have to set a root password and permit logins.
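Going back to the fdisk step for a moment: the erase block alignment theory is untested, but the arithmetic behind it can at least be written down.  This Python sketch checks whether a partition's first sector lands on an erase block boundary.  The 4 MiB erase block size is purely a guess for illustration, since cards rarely advertise theirs.

```python
SECTOR_BYTES = 512

def is_aligned(start_sector, erase_block_bytes=4 * 1024 * 1024):
    """True if a partition starting at start_sector begins on an
    erase-block boundary. The 4 MiB default is only a guess."""
    return (start_sector * SECTOR_BYTES) % erase_block_bytes == 0

# 63 is the old DOS-style start, 2048 a common 1 MiB start, 8192 a 4 MiB start.
examples = {start: is_aligned(start) for start in (63, 2048, 8192)}
```

The start sector to feed it can be read from the partition listing that fdisk -l prints for the card.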

[add prose to seem clever] [possibly leave this here for a chuckle] [okay, chuckle it is.]

The steps:

In the end, the steps I took to build a brand new image are below.  Hopefully I didn’t forget to document any of them.

  • Insert sdcard and discover its device ID.  Mine’s fixed at /dev/mmcblk0 since it’s not a USB-type reader.
  • Umount the partition that gets automatically mounted
  • Use fdisk to tag partition 1 on the card as type 83 (linux) and mkfs.ext4 to reformat the partition
  • remount the sdcard and cd to its mountpoint
  • host$ curl /ubuntu-base/releases/16.04.2/release/ubuntu-base-16.04.2-base-arm64.tar.gz | tar zxvf - ; DANGER! Make sure you’re in the sdcard’s directory before doing this.  It streams the tarball from the web rather than downloading it, so it’ll unpack immediately to wherever you type the command.
  • host$ apt install qemu-user-static on the host system so we can run arm64 binaries
  • host$ cp /usr/bin/qemu-aarch64-static usr/bin/ on the sdcard so the system will find it during the chroot setup.
  • host$ cp /etc/resolv.conf etc/resolv.conf so that DNS will work within the card’s environment
  • host$ chroot . (include the dot at the end, it indicates to chroot into the current location).  The prompt won’t change much, so use uname -m to ensure you see aarch64 instead of your host’s native architecture.
  • chroot$ apt update
  • chroot$ apt install net-tools udev ifupdown vim iputils-ping less whiptail apt-utils ; It’s important to note that many warnings and errors may come up during this step.  It has a lot to do with being in a chroot without /proc mounted, and a little to do with just being in a chroot jail.
  • chroot$ passwd root
  • chroot$ echo ttyMV0 >> /etc/securetty
  • chroot$ exit
  • Install the kernel and dtb as instructed on Globalscale’s page
  • Set up u-boot as instructed on Globalscale’s page
  • sync, umount, insert into espressobin, and hopefully smile as you see Ubuntu boot up and request a login on the USB serial console.
  • Log in as root, using the password you entered earlier, and then fix language warnings by typing: locale-gen en_US.UTF-8
  • disable unusable VTYs with: for i in 1 2 3 4 5 6; do systemctl mask getty@tty$i.service; done
  • Edit or create /etc/network/interfaces (you may need to install your favorite editor first) and add:

auto eth0
iface eth0 inet manual

auto lo
iface lo inet loopback

auto lan1
iface lan1 inet dhcp
pre-up /sbin/ifconfig lan1 up

My biggest stumbling block was the networking failing to start up.  I discovered that lan0, lan1, and wan won’t work unless eth0 is up.  The network tools will bring the interfaces up in parallel so lan1 comes up before eth0 is ready, and will stay down.  The key here is the pre-up statement on lan1, bringing it up manually before running dhcp.

  • set /etc/hostname to something meaningful (in my case, I used espressobin)
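Since the missing pre-up line cost me the most time, here's a small Python sketch that checks an interfaces file for it.  The function names are made up, the parsing is deliberately simple-minded, and it only tests for the presence of a pre-up line on the switch ports, not whether the command in it is sensible.

```python
def stanzas(text):
    """Split ifupdown-style text into {interface: [option lines]}."""
    result, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface":
            current = words[1]
            result[current] = []
        elif words[0] in ("auto", "allow-hotplug"):
            current = None
        elif current is not None:
            result[current].append(line)
    return result

def missing_pre_up(text, ports=("lan0", "lan1", "wan")):
    """Return configured switch ports that lack a pre-up line."""
    found = stanzas(text)
    return [p for p in ports
            if p in found and not any(o.startswith("pre-up") for o in found[p])]

CONFIG = """\
auto eth0
iface eth0 inet manual

auto lo
iface lo inet loopback

auto lan1
iface lan1 inet dhcp
pre-up /sbin/ifconfig lan1 up
"""

problems = missing_pre_up(CONFIG)   # empty list when every port has pre-up
```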

In closing

So what am I going to do with this machine?  It’ll likely serve as a router in the future.  Embedded routers are great for most people, but I use a dhcp configuration that I haven’t figured out how to do with the options offered by factory, ipfire, or *wrt distributions.  Also, I do have a mild preference for managing most things via ssh rather than a WWW interface.  It’s a matter of personal preferences as it’s how I initially learned to do many things in linux, including setting up NAT under ipfwadm and then ipchains.

Before I use the system as a router I’ll explore its other capabilities, such as the apparently fully functional mini-pci express slot as well as the sata interface.  I’d like to some day be brave enough to try mounting an EMMC chip on the pads Globalscale left in place.  Maybe.

My only real issue with the board is the little power LED.  I could probably hang it above an average solar panel and power a house with the light it puts out.  Okay, probably not.  However it is blindingly bright.  Hopefully Globalscale will revise the BOM by changing to a larger dropout resistor so it doesn’t go to solar mode while powered up.

Thanks for reading. 🙂

Programming an EEPROM on a 6502 Microprocessor

Or, updating our firmware internally like a proper computer

The Kickstarter edition of the VIC-1112 IEEE-488 adapter has an EEPROM included on it to enable software updates or even general firmware hacking without removing the IC.  These devices are simple to program, but not quite as simple as just poke’ing ram.  The IC’s need some time to complete a write before they can be accessed again, and indicate their finished state by returning the data that was written to them.

Let’s explore the process of writing to an EEPROM using a generic 6502 system as a model, and expand on using a VIC-20 to do the work later.

I tested some simple write code using byte mode to simplify the code’s logic.  Since the device is only 8,192 bytes it should finish in a reasonable time in spite of not using page mode.

Here’s my first test, a simple program to write 256 bytes in a row to the EEPROM.

zpsrc   =   $42         ; zp source pointer
zpdst   =   zpsrc+2     ; zp pointer to EEPROM
eeprom  =   $A000       ; just to have it defined
code    =   $2100       ; etc.

        *=  $2000
prog    ldy #<code
        sty zpsrc
        ldy #>code
        sty zpsrc+1
        ldy #<eeprom
        sty zpdst
        ldy #>eeprom
        sty zpdst+1

        ldy #$00
wloop   lda (zpsrc),y   ; get source
        sta (zpdst),y   ; tell EEPROM to write
cloop   cmp (zpdst),y   ; check to see if it's done
        bne cloop       ; nope, check again

        iny
        bne wloop
        rts             ; 256 bytes programmed.

This works well, but what if the code erroneously tries to program IO, open space, or ROM?  What if the EEPROM is faulty and never responds with a success?  The routine fails badly by never moving forward.  This is bad practice, as the user waiting for the EEPROM is left completely in the dark when it fails.  Let’s add a timeout in the loop:

wloop   lda (zpsrc),y   ; get source
        sta (zpdst),y   ; tell EEPROM to write
        ldx #$00        ; prepare timeout and give EEPROM a little more time
cloop   cmp (zpdst),y   ; check to see if it's done
        beq wdone       ; yep, move on to the next byte
        inx             ; count loops
        beq timeout     ; did we overflow? We failed!
        bne cloop       ; keep it as relocatable as we can

wdone   iny

The change is pretty straightforward:  the .X register is used as a timer since we can’t assume a generic system has a hardware timer available.  Datasheets indicate 10ms maximum is required for writing the EEPROM. Let’s allow twice that.  At 1MHz, that requires the timeout to wait about 20,000 cycles before giving up.

Let’s evaluate the loop’s minimal timing.

        sta (zpdst),y   ; doesn't count, this is where we start
        ldx #0          ; 2
cloop   cmp (zpdst),y   ; 5 (6 if page boundaries are crossed)
        beq wdone       ; 2 (exiting the loop doesn't count)
        inx             ; 2
        beq timeout     ; 2 (again, exiting doesn't count)
        bne cloop       ; 3

wdone   iny             ; we're out of the loop here.

The total count inside the loop is fourteen cycles. Considering it will loop a maximum of 256 times before giving up, that’s 3,584 cycles, or about 3.6ms at 1MHz. We need to add another 16,416 cycles to the overall loop, which is 65 cycles per loop, minimum.  Since the loop exits when the write’s complete, waiting longer is harmless.
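The same budget can be tallied in a few lines of Python, which makes it easy to redo for a different clock speed.  This is just arithmetic on the cycle counts listed above, assuming 1MHz and no page crossings; the variable names are made up.

```python
CLOCK_HZ = 1_000_000          # 1MHz, as assumed in the text
TARGET_MS = 20                # twice the 10ms datasheet maximum

# Cycles spent on each pass through the polling loop
# (branch not taken = 2, branch taken = 3, no page crossings).
LOOP_CYCLES = {
    "cmp (zpdst),y": 5,
    "beq wdone": 2,
    "inx": 2,
    "beq timeout": 2,
    "bne cloop": 3,
}

per_pass = sum(LOOP_CYCLES.values())
passes = 256                                  # .X wraps after 256 increments
total = per_pass * passes                     # cycles before the timeout fires
target_cycles = CLOCK_HZ * TARGET_MS // 1000
shortfall = target_cycles - total             # cycles still to be wasted
extra_per_pass = -(-shortfall // passes)      # ceiling division
```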

There are a few ways to waste some time. NOP’ing it out is inefficient, requiring 33 NOPs. How about a nested loop?  We’re out of registers: .A is our value to compare, .X is the failure counter, and .Y is our index.  We’ll have to save and restore a register each time the loop runs, which is easiest to do with .A on NMOS 6502’s. We’ll use it for our nested loop counter.

Here’s our proposed nested loop:

        pha             ; 3     save .A for compare instruction
        lda #$00        ; 2     preset counter
dloop   clc             ; 2
        adc #$01        ; 2     count up
        bne dloop       ; 3     until overflow
        pla             ; 4     restore .A for compare instruction

This requires 5 cycles on entry, and 3 on exit. The exit may look off; it’s offset for the ‘bne dloop’ only taking two cycles instead of three when falling through.  The center is 7 cycles, occurring 256 times. That’s 1,792 in all, or 1.792ms at 1MHz.  Adding the head and tail brings us to 1,800 cycles in all.

With this, our compare loop interior becomes quite efficient at wasting time.  The interior moves up from 14 cycles to 1,814 cycles, and repeating 256 times gives a delay of 464,384 cycles, offering nearly half a second for the EEPROM to finish its write before the routine gives up on the EEPROM.  Perfect!

Let’s see the entire routine:

zpsrc   =   $42         ; zp source pointer
zpdst   =   zpsrc+2     ; zp pointer to EEPROM
eeprom  =   $A000       ; just to have it defined
code    =   $2100       ; etc.

        *=  $2000
prog    ldy #<code
        sty zpsrc
        ldy #>code
        sty zpsrc+1
        ldy #<eeprom
        sty zpdst
        ldy #>eeprom
        sty zpdst+1

        ldy #$00        ; preset our index
wloop   lda (zpsrc),y   ; get source
        sta (zpdst),y   ; tell EEPROM to write
        ldx #$00        ;  prepare timeout counter
cloop   cmp (zpdst),y   ;  check to see if it's done
        beq wdone       ;  yep, move on to the next byte
        pha             ;   save .A for compare instruction
        lda #$00        ;   preset delay counter
dloop   clc             ;
        adc #$01        ;   count up
        bne dloop       ;   until overflow
        pla             ;  restore .A for compare instruction
        inx             ;  increment our timer
        beq timeout     ;   did we run out of time? Error out
        bne cloop       ;  nope, check again

wdone   iny
        bne wloop       ; continue to next byte
        clc             ; indicate things are fine.
        bcc exit        ; and exit
timeout sec             ; indicate write error
exit    rts             ; carry clear: 256 bytes programmed.
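For anyone who'd like to poke at the write-then-poll logic without hardware, here's a rough Python simulation of it.  FakeEeprom is entirely made up for this sketch: it returns the written byte with bit 7 flipped while "busy", and that busy behaviour is an assumption of the model rather than something taken from a particular datasheet.  The return value of program_page stands in for the carry flag.

```python
class FakeEeprom:
    """Toy byte-write EEPROM: after a write, reads return something other
    than the written byte until `busy_reads` polls have passed."""

    def __init__(self, size=0x2000, busy_reads=3, dead=False):
        self.cells = bytearray(size)
        self.busy_reads = busy_reads
        self.dead = dead              # a dead part never completes a write
        self.pending = None           # (address, value, polls remaining)

    def write(self, addr, value):
        self.pending = (addr, value, self.busy_reads)

    def read(self, addr):
        if self.pending and self.pending[0] == addr:
            pend_addr, value, left = self.pending
            if self.dead or left > 0:
                self.pending = (pend_addr, value, left - 1)
                return value ^ 0x80   # not the written byte: still busy
            self.cells[pend_addr] = value
            self.pending = None
        return self.cells[addr]


def program_page(eeprom, base, data, max_polls=256):
    """Write up to 256 bytes; return True on success, False on timeout
    (mirroring carry clear / carry set in the assembly)."""
    for offset, value in enumerate(data[:256]):
        eeprom.write(base + offset, value)
        for _ in range(max_polls):
            if eeprom.read(base + offset) == value:
                break
        else:
            return False              # ran out of polls: give up
    return True


good = FakeEeprom()
ok = program_page(good, 0x0000, bytes(range(256)))
bad = FakeEeprom(dead=True)
failed = not program_page(bad, 0x0000, b"\x55")
```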

There are some considerations to be kept in mind when programming the EEPROM.  First, you obviously can’t be running code from the device while programming it.  Data read back isn’t valid until the write’s complete.  Second, if you’re streaming from a file to the EEPROM, you’ll have to ensure it’s not being used to load the data from mass storage.  There are also many other considerations, such as the program’s inability to program more than 256 bytes at a time.

Some of these, as well as enabling write mode on the Kickstarter 1112, will be covered later on, as this was intended to be more of an intro post for programming EEPROMs on your system.

-David

A simple network card

or, let’s start yet another thing!

I did make a note about increasing the content, and technical difficulties combined with attention span errors caused me to back-burner re-reviewing a certain PDF file. 😉

Enter the Imagewriter II localtalk option card, and what I’ve learned about it so far. Let’s begin.

The device

The card is a network adapter for Apple’s Imagewriter II series printers, allowing a direct connection to an appletalk or localtalk network. This gives a lab the ability to share the printer with all systems on the network without a switchbox or other trickery, while also allowing faster data transfers into the printer: Normally the printer communicates at 9600bps while appletalk runs at 230.4kbps.

I’d had curiosity about the card for quite a few years and spotted one on our favorite second hand website for an inexpensive sum. To this, I thought to myself “why not? It looks like it shouldn’t be too bad to figure out.”

Pre-exam before arrival

When examining the board in any images I could find online I got hints of a few key IC’s that indicate that the network adapter is in fact an intelligent device. Clearly visible in online images are the z8530 one would expect since Appletalk is generally driven by the Zilog SCC, as well as 8k of sram, a 65c02, and a rom-sized IC with an Apple copyright date.

This gives me the impression that the card in general is in fact an independent SBC, and the copyrighted IC is likely a mask or OTP ROM that provides the network services for the printer. In fact, I’d seen images of these cards with a ceramic DIP and paper sticker over a circular indentation, indicating that it is in fact compatible with an EPROM.

Then I considered some requirements of the 6500 series microprocessor family: Ram at pages 0 and 1 (unless you like a gelding system), and ROM at the end of memory (unless you like a comatose computer). Given the SRAM was easily identifiable as a 6264 in the auction page I viewed and the EPROM socket had the same number of pins, I guessed that the firmware was 8k in size.

The other chips aside from the oscillator weren’t well photographed in the auction page so other guessing must be done to distract myself from the waiting process while the card made its journey to my hands. For example, what sort of memory map might this processor see?

Remember again the need for ram at the bottom of memory and rom at the top, and that they’re 8k in size? A likely solution for decoding this easily is by using a 3-8 demultiplexer, likely our friend the 74LS138. Bets are that it’s involved, so let’s keep our scope to eight devices.

So far we have two devices. I could see a few chips in the original pictures: a trio of 74ls374 8-bit latches and a 74ls245 bus transceiver. That gives us four more, leaving us at six. Add the SCC and we’re at seven, and some possible control logic for data flow to get to eight. No rule says all areas have to be used, however.

The arrival

The board arrived in great condition, and I proceeded to take some good pictures.  Here’s a top shot for reference:

[Image: ImageWriter II Localtalk option card, top view]

When the board arrived I wasted no time giving it a good examination. The 65c02 is rated at 2MHz, and a 3.6864MHz crystal clocks the system. A small non-volatile memory IC was also found, adding another device. The remainder are all standard 7400-series logic IC’s, one of which is a 74ls138 as predicted.

Since I can’t remove the CPU as I did for the IIEasy Print board, I’ll have to use a less automated method of finding the memory map. Time to get the continuity checker out!

Mapping the network (card)

If the 74ls138 is indeed used as a memory mapper, a few assumptions could be made. The first is that it’s enabled continuously, and the second that it’s controlled by the CPU’s clock to ensure correct write timing on the SRAM and latches. A quick check on /g1 and /g2 shows them to be grounded. G3, the active high input, however, was not grounded or tied high. An intelligent guess led me to the phase2 line on the CPU, and my beeper agreed. Next is the blocksize decoded by the LS138. The I3 input on the 138 was quickly traced to A15 on the 65c02, I2 to A14, and I1 to A13. Eight 8k blocks of memory. With a single CPU and a 74ls138 the memory map is decoded.

Now let’s follow some outputs. Given the address bit connections, O0 should lead to the RAM chip, and as expected it did. O7 to the rom? Correct. Now where’s that SCC? I parked one probe on my meter on the /cs pin on the SCC and walked my way across the remaining outputs on the ‘138. Connected to O5 on the ‘138, the SCC starts at $a000 and ends at $BFFF.

All but two of the other outputs are traced so far. The remainder are a bit more complex and will be covered in a future update. A detailed memory map is below.

U9: 74ls138
pin  name  page       dest
7    /o7   [ e0-ff ]  u3:20, u3:22 (8k eprom /oe, /ce) [$e000-$ffff]
9    /o6   [ c0-df ]  nc?
10   /o5   [ a0-bf ]  u2:33 (scc /cs)
11   /o4   [ 80-9f ]  u7:1 (ls374 /oe)
12   /o3   [ 60-7f ]  u5:1 (ls374 /oe)
13   /o2   [ 40-5f ]  u6:11 (ls374 /load)
14   /o1   [ 20-3f ]  nc?
15   /o0   [ 00-1f ]  u4:20, u4:22 (8k sram /oe, /ce)
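The map above boils down to "the top three address bits pick the device."  Here's that decode as a short Python sketch.  The device strings are just labels lifted from the table, the function names are made up, and the phase2 gating on the '138 is ignored.

```python
# Device on each 74LS138 output, taken from the map above.
# "nc?" entries are outputs that haven't been traced to anything yet.
DEVICES = [
    "8k sram",          # /o0  $0000-$1FFF
    "nc?",              # /o1  $2000-$3FFF
    "ls374 u6 /load",   # /o2  $4000-$5FFF
    "ls374 u5 /oe",     # /o3  $6000-$7FFF
    "ls374 u7 /oe",     # /o4  $8000-$9FFF
    "scc /cs",          # /o5  $A000-$BFFF
    "nc?",              # /o6  $C000-$DFFF
    "8k eprom",         # /o7  $E000-$FFFF
]

def select(addr):
    """Return (output number, device) for a 16-bit address: A15..A13 pick
    one of eight 8k blocks."""
    output = (addr >> 13) & 0x07
    return output, DEVICES[output]

def block_range(output):
    """Return the first and last address covered by a '138 output."""
    start = output << 13
    return start, start + 0x1FFF
```

For example, select(0xA000) lands on output 5, and block_range(5) gives the $A000-$BFFF window where the SCC was found.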

But what about the printer?

The Imagewriter II isn’t incredibly well documented, but a SAMS manual does have something resembling a schematic, in which a port for an ‘optional device’ is pinned out. It has 26 pins like the localtalk card, and has vcc and gnd in the same locations.

The pins are pretty straightforward: eight ‘AD’ pins for a multiplexed address/data bus, an ALE line to latch the address during an access, a DE line to enable the data transfers, and other various control lines.

AD0-7 head straight for the bus transceiver to keep the network card off the imagewriter’s ADbus unless there’s a valid transaction, and then head into the three ‘374 8-bit latches. Likely, the latches are used for Address, Data in, and Data out (respectively). There’s some twisty logic to be traced so that this can be worked out.

And the software?

The firmware on the card is still hidden as I’ve not taken time to dump its contents and examine it. This may or may not be covered later as well, but I do find it interesting to find out how the card communicates with the printer. Perhaps other communications cards could be created- perhaps a parallel interface or other networking option?

And then…

To be honest, I didn’t really intend on adding this to my Imagewriter II, since printers aren’t really my thing these days. I just find vintage hardware interesting, and enjoy the process of mapping simple systems to feed my imagination of what could have been done with them had we known what they really were made of.

(Probably) more later. 😉

In resurrecting abandoned projects

Or, where the heck have you been?

Once I began analyzing some of the routines internal to PROMAL ages ago, I found myself fascinated and then obsessed with the simple elegance that the system is written with.  I’ve been working on commenting a full disassembly of the program but haven’t yet decided to officially release it.

Therein comes a challenge for myself.  I’d like to document the steps I used to reverse the language’s concepts, internal variables, and machine code here.  However it will take some effort to refresh my memory as the initial disassembly was performed nearly two years ago, and I’m a bit stuck for time.

Let’s just provide a screenie of the title page for entertainment purposes, as well as a shout-out to the venerable COMPUTE! Books series that covered many CBM systems including drives and computers.

mappnig-promal

Hopefully life will be a bit more generous with spare time and inspiration so I can complete the work as well as get some serious documentation about the process posted here.

By the way, that font was HARD to find.  Still not an exact match for the one COMPUTE! used, but it’s close enough to pass. 🙂

[Edit: Found an attempted escape : q and removed it.  Meh.]

Automated parameter parsing in PROMAL

Or, decrypting the unencrypted cryptic code

I decided to spend some more time looking at how PROMAL works internally. The next routine I decided to examine is BLKMOV, as its function is similar to the MOVSTR function examined earlier.

Let’s have a look at the jump table to find BLKMOV:

EXT ASM PROC BLKMOV AT $F30
0f30  4C A9 21    JMP $21A9

Easy enough! Let’s have a look to see how BLKMOV initializes itself, and to see if it accepts a 16-bit length.

21a9  20 15 18    JSR $1815
21ac  33 E0       RLA ($E0),Y
21ae  38          SEC
21af  34 2C       NOOP $2C,X
21b1  A5 34       LDA $34
21b3  38          SEC
21b4  E5 2C       SBC $2C
21b6  A8          TAY
21b7  A5 35       LDA $35
21b9  E5 2D       SBC $2D
21bb  AA          TAX
21bc  98          TYA
21bd  C5 38       CMP $38
21bf  8A          TXA
21c0  E5 39       SBC $39

Wait. WHAT? (Almost) no professional programmer would use illegal opcodes for a final product. The RLA and NOOP $2C,X are invalid opcodes. Also, I’ve tested PROMAL and found that it works well on the CMD SuperCPU accelerator, which will behave differently with illegal opcodes and cause PROMAL to crash.

Caller relative addressing
Let’s see what exactly is going on with 1815. I’ve studied the routine ahead of time and added some helpful comments and names to the disassembly. Because this routine is used by many library routines, I will refer to the caller as “the parent”. After analyzing the routine I’ll provide BLKMOV’s results to help clarify the routine’s work.

                                        ; we are always a grandchild!
                                        ; parameter = value from p-code interp
                                        ; argument  = value from caller's table
1815: 68          getprm  PLA           ; get parents address
1816: 85 3A               STA Parent    ;
1818: 68                  PLA           ;
1819: 85 3B               STA Parent+1  ;   and save for our use
181B: 68                  PLA           ; get grandparent's address
181C: 85 6D               STA GParent   ;
181E: 68                  PLA           ;
181F: 85 6E               STA GParent+1 ;   and save for later restoration
1821: 84 6A               STY numparms  ; save number of parameters
1823: A0 01               LDY #$01      ; initialize index
1825: B1 3A               LDA (Parent),Y; get min/max parameters

I knew something was odd. This is an uncommon but handy trick: The data after the JSR in the parent is never executed. If you examine 1816, you see that it stores the parent’s address from the stack as MOVSTR did. It then stores the parent’s parent’s return address (‘GParent’) so that it can get to the parameters on the stack later. After that’s all set up, the number of parameters from PROMAL is saved to numparms and the Y index is initialized to 1.
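It helps to see why the index starts at 1.  A 6502 JSR pushes the address of its own last byte, not the address of the next instruction, so the first inline byte sits one past what getprm pulls off the stack.  Here's a small Python sketch of that, using the BLKMOV bytes from the disassembly above.  The helper names are made up and the stack is just a Python list.

```python
# Bytes of BLKMOV starting at $21A9, copied from the disassembly above.
MEMORY = {0x21A9: 0x20, 0x21AA: 0x15, 0x21AB: 0x18,   # JSR $1815
          0x21AC: 0x33, 0x21AD: 0xE0, 0x21AE: 0x38,   # inline data, not code
          0x21AF: 0x34, 0x21B0: 0x2C}

def jsr(stack, jsr_address):
    """Push what a 6502 JSR pushes: the address of the JSR's last byte,
    high byte first."""
    ret = jsr_address + 2
    stack.append(ret >> 8)
    stack.append(ret & 0xFF)

def pull_word(stack):
    """Two PLAs as in getprm: low byte first, then high byte."""
    low = stack.pop()
    high = stack.pop()
    return (high << 8) | low

stack = []
jsr(stack, 0x21A9)               # BLKMOV calls getprm
parent = pull_word(stack)        # what getprm stores at Parent/Parent+1
first_arg = MEMORY[parent + 1]   # LDA (Parent),Y with Y = 1
```

So the "RLA ($E0),Y" that looked so alarming is really the byte pair $33, $E0 being read as data.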

Why all this work? The method is used when resources are tight: We have parameters on the stack that get processed by many routines in the library. It’s best for code space efficiency if a single routine handles these parameters. However, not all library routines use the same number or even type of parameters. That’s where this routine comes in. The arguments for the ‘getprm’ routine are stored after the JSR from the library routine calling it. This way each library routine will be able to specify what type of information it expects to find on the stack.

On arguments and parameters
In this post I need to distinguish between two things: The data used by the getprm routine, and the data the parent needs from the stack. In this case, ‘argument’ refers to data used by getprm, and ‘parameter’ refers to any data passed on the stack by PROMAL. This is done in consistency with the MOVSTR post.

Let’s have a good look at this routine to understand what it does.

Setting things up
We already have the calling routines’ addresses safely shuffled away, and we have our first argument retrieved from the parent.

1827: 29 0F               AND #$0F      ;  Mask max #parms off
1829: 85 69               STA maxparms  ;   and save
182B: C5 6A               CMP numparms  ;  compare with parameter count
182D: 90 10               BCC getperr   ;   too many, runtime error

In the segment above, the argument loaded from the parent is masked off and saved. Studying the routine ahead of time helped me understand that the low half of the first argument is the ‘maximum’ number of parameters the parent accepts. If the number of parameters provided by PROMAL (‘CMP numparms’) is larger than the maximum, the routine branches off to a fatal runtime error (‘BCC getperr’).

182F: B1 3A               LDA (Parent),Y; get min/max parameters
1831: F0 40               BEQ getpfin   ;  0/0 parms? exit.

The argument is reloaded since it was mangled when setting up maxparms. While it’s loaded and un-mangled, the routine checks to see if there are no parameters to be loaded. If this is the case, the routine exits. It would seem to make no sense to call this routine if you don’t want any parameters. I’d agree, but there must be a good reason to call it in this fashion as a few library routines do just that.

1833: 4A                  LSR           ;
1834: 4A                  LSR           ;
1835: 4A                  LSR           ;
1836: 4A                  LSR           ;
1837: 85 6B               STA minparms  ;  save min #parms
1839: C5 6A               CMP numparms  ;  compare with parameter count
183B: F0 05               BEQ getpok    ;  same? ok.
183D: 90 03               BCC getpok    ;  more than min parms? ok.
183F: 4C 80 10    getperr JMP syserr    ; fail out via system error

Now, the upper half of the first argument is shifted down and stored in ‘minparms’. It’s again compared to the numparms value, this time to determine if there are at least the minimum number of parameters (beq: bcc).  If not, the routine falls through to a jump to PROMAL’s fatal runtime error routine.
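Taken together, the two comparisons amount to a range check on the parameter count.  Here's how I'd express it in Python, as a sketch of the logic only.  The function name is made up, and the comments point at the instructions each line stands in for.

```python
def check_count(header, numparms):
    """Return True if numparms is acceptable for a header byte whose high
    nibble is the minimum and low nibble the maximum parameter count."""
    maxparms = header & 0x0F     # AND #$0F
    minparms = header >> 4       # four LSRs
    if maxparms < numparms:      # CMP numparms / BCC getperr
        return False
    return minparms <= numparms  # BEQ/BCC getpok, else fall into getperr
```

With BLKMOV's header byte of $33, check_count(0x33, 3) passes while two or four parameters would both end in the runtime error.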

1842: C8          getpok  INY           ; Increment index
1843: B1 3A               LDA (Parent),Y; Get mask bits

There will be a lot of INY : LDA (Parent),y as the routine works its way through the argument table.

1845: 85 6C               STA getpmsk   ; Save in mask byte

The second argument is stored to getpmsk, short for ‘getprm mask.’ This byte is actually eight flags, each indicating the type of parameter to get from the stack. There are two types of data and one way to work with each. As a quick reminder, PROMAL always pushes parameters as words, even when they’re bytes.

Bit = 0    Parameter is a byte
           The next argument byte is a zero-page address and a default value.
           * Store this byte where specified at the address
           * Load and discard high byte from stack if applicable
           * Load and store low byte from stack at the address if applicable
Bit = 1    Parameter is a word
           This argument is one zero-page address.
           * Load and store the high byte from stack at the address+1
           * Load and store the low byte from the stack at the address

The routine appears to not have any facilities for handling a default 16-bit value.  It’ll be up to the parent to detect a missing 16-bit parameter and set up a default value in its place.
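To make the table layout a little more concrete, here's a Python sketch that splits the five bytes following BLKMOV's JSR into the header fields and type flags described so far.  One loud assumption: it takes the first parameter's flag to be bit 7 of the mask, with later parameters following in descending bit order.  The function name and dictionary keys are made up.

```python
def describe_table(table):
    """Split a getprm argument table into its header fields and type flags.

    Assumes the flag for the first parameter sits in bit 7 of the mask and
    later parameters follow in descending bit order."""
    header, mask, rest = table[0], table[1], table[2:]
    maxparms = header & 0x0F
    types = ["word" if mask & (0x80 >> i) else "byte" for i in range(maxparms)]
    return {"min": header >> 4, "max": maxparms, "types": types,
            "rest": bytes(rest)}

BLKMOV_TABLE = bytes([0x33, 0xE0, 0x38, 0x34, 0x2C])   # bytes after the JSR
info = describe_table(BLKMOV_TABLE)
```

For BLKMOV that works out to three required parameters, all words.  Since a word argument is a single zero-page address, the three leftover bytes ($38, $34, $2C) would be the destinations, which lines up with the LDA $34, SBC $2C and CMP $38 in BLKMOV's code.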

Processing arguments and setting up parameters
At this point, the routine is initialized and ready to load parameters as specified by the parent until it’s out of arguments.

1847: C6 69       getpl   DEC maxparms  ; Decrement parameter count
1849: 30 28               BMI getpfin   ;  Out of parms? exit.

Maxparms is now used as a count-down value to determine when the routine’s out of arguments. The name of the location is a bit of a misnomer, I apologize.

184B: C8                  INY           ;
184C: B1 3A               LDA (Parent),Y; get zp address
184E: AA                  TAX           ;  and save

The arguments now always start with a zero page address. This is read from the argument table and saved in the X register to be used as an index. This allows the code to run without modifying itself and is a good example of advanced indexing when used in this situation.

184F: A5 69               LDA maxparms  ; check max parms
1851: C5 6B               CMP minparms  ;  are we out of required parms?
1853: 90 0F               BCC gpfprm    ;  No, go pull it off the stack.

In this section of the loop, maxparms is compared with minparms to determine whether or not we’re out of required parameters.

1855: 24 6C               BIT getpmsk   ; is current parm a word?
1857: 30 05               BMI gpisw     ;  yep, skip

Remember the parameter type mask? This is one of the two checks against the flags in the loop. The BIT instruction does a handful of things, but of interest to the routine is the way it copies bit 7 of getpmsk to the negative flag without modifying any registers. In this case, if a parameter is a word the negative flag gets set and the BMI (branch if minus) routes the cpu to gpisw (short for getparm is word), below.
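
For anyone who hasn't used BIT much, here is a small Python model of the flags it produces. This is standard 6502 behaviour rather than anything specific to PROMAL; the function name is just for this sketch.

```python
def bit_flags(a, m):
    """Flags a 6502 BIT instruction produces for accumulator a and memory m."""
    return {
        "N": bool(m & 0x80),   # bit 7 of memory goes to the negative flag
        "V": bool(m & 0x40),   # bit 6 of memory goes to the overflow flag
        "Z": (a & m) == 0,     # A AND M is tested but never stored
    }


flags = bit_flags(0x02, 0xE0)
print(flags["N"])   # True, so BMI would branch
```

Only the N flag matters to getprm, since the current parameter's type always sits in bit 7 of getpmsk.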

1859: C8                  INY           ;
185A: B1 3A               LDA (Parent),Y; get default value or low byte
185C: 95 00               STA 0,X    ;  store at zp address
185E: A5 69       gpisw   LDA maxparms  ; is current argument
1860: C5 6A               CMP numparms  ; greater than parameter count?
1862: B0 0A               BCS gpdefl    ;  Yes, process default value

The next argument byte is loaded if it’s a ‘byte’ type. It’s stored at the zero page location pointed to by X, which was read in earlier. Then it follows through to ‘gpisw’, which checks the current argument against the number of parameters provided to the parent by PROMAL. If we’re out of parameters, we skip off to gpdefl, which is short for ‘getparm default’.

1864: 68          gpfprm  PLA           ; Get parameter from stack
1865: 24 6C               BIT getpmsk   ; current parm = word?
1867: 10 02               BPL gpis8     ;  no, skip high byte store
1869: 95 01               STA 1,X   ;  * store high byte if word
186B: 68          gpis8   PLA           ; get low byte or default value
186C: 95 00               STA 0,X    ; store where requested

In a moment whose reason eludes me, I called this branch point gpfprm. What this section does is first pull the high byte of the next parameter from PROMAL off the stack and then check the parameter type mask to see if it’s a byte type. If so (BPL, as bit 7 would be a zero, it’s plus or positive), it skips to gpis8, discarding the byte. If it’s a word, it gets stored to X+1 by using a base of 1 instead of 0.
Gpis8 always pulls and stores the byte to the location indicated by X.

Defaults!
This method is clever: The routine first loads the default value from the argument block into memory, and then only loads a value if there’s one available on the stack. It’s a good way of ensuring a default is in place if it’s not specified by PROMAL.

186E: 06 6C       gpdefl  ASL getpmsk   ; shift parameter bit mask
1870: 4C 47 18            JMP getpl     ; loop

The argument mask is shifted one to the left to ensure it stays in sync with the argument index in Y. Then, the loop is restarted.
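
Putting the whole loop together, here is a Python model of getpl, offered as a sketch rather than a faithful emulation. The list-as-stack, the dict standing in for zero page, and the example routine (one required word plus one optional byte defaulting to 7, at made-up addresses $40 and $42) are all assumptions for illustration.

```python
def getprm_loop(table, stack, numparms, minparms, maxparms, mask, zp):
    """Model the getpl loop; returns how many table bytes were consumed.

    table: argument bytes that follow the mask byte
    stack: list of bytes, where pop() stands in for PLA
    zp:    dict standing in for zero page
    """
    i = 0
    while True:
        maxparms -= 1                        # DEC maxparms
        if maxparms < 0:                     # BMI getpfin
            return i
        x = table[i]                         # zero-page address (TAX)
        i += 1
        is_word = bool(mask & 0x80)          # BIT getpmsk
        pull = True
        if maxparms >= minparms:             # optional parameter
            if not is_word:
                zp[x] = table[i]             # store the default value
                i += 1
            if maxparms >= numparms:         # BCS gpdefl: nothing on the stack
                pull = False
        if pull:                             # gpfprm
            high = stack.pop()
            if is_word:
                zp[x + 1] = high
            zp[x] = stack.pop()              # gpis8
        mask = (mask << 1) & 0xFF            # gpdefl: ASL getpmsk


# Hypothetical table: optional byte at $40 (default 7), required word at $42.
table = [0x40, 0x07, 0x42]
zp = {}
# The caller passed only the word $1234 (low byte pushed first, high byte on top).
getprm_loop(table, [0x34, 0x12], numparms=1, minparms=1, maxparms=2, mask=0x40, zp=zp)
print(zp[0x40], hex(zp[0x42] | zp[0x43] << 8))   # 7 0x1234
```

Note that the table lists parameters last-first, which is why the optional byte comes before the required word.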

Cleaning up

1873: 98          getpfin TYA           ; transfer index to A
1874: 18                  CLC           ; prep for math
1875: 65 3A               ADC Parent    ; Add our parent's return address
1877: AA                  TAX           ;
1878: A5 3B               LDA Parent+1  ;
187A: 69 00               ADC #$00      ;
187C: 48                  PHA           ;  and put on stack
187D: 8A                  TXA           ;
187E: 48                  PHA           ;  for rts.

As we don’t want to return into a data block, we’ll add our current value for Y to the parent’s calling address and put it on the stack. This ensures we safely RTS into the byte following the argument table.
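
The arithmetic is easy to check with a quick Python sketch. It assumes Parent holds the raw address JSR pushed (the last byte of the JSR instruction) and that Y has indexed the final table byte by the time getpfin runs; the function name is made up.

```python
def rts_target(parent, y):
    """Address execution resumes at after getpfin pushes Parent+Y and does RTS."""
    pushed = (parent + y) & 0xFFFF   # TYA / CLC / ADC Parent, carry into the high byte
    return (pushed + 1) & 0xFFFF     # RTS pulls the address and adds one


# BLKMOV: the JSR at $21A9 leaves $21AB as Parent; its table is five bytes long.
print(hex(rts_target(0x21AB, 5)))   # 0x21b1
# The ADC #$00 handles a table that crosses a page boundary.
print(hex(rts_target(0x18FE, 3)))   # 0x1902
```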

187F: A5 6D               LDA GParent   ; get grandparent's address
1881: 85 3A               STA Parent    ; place where p-code expects parent's
1883: A5 6E               LDA GParent+1 ;
1885: 85 3B               STA Parent+1  ;
1887: 60                  RTS           ; and return to arg table+1

And as a last bit of cleanup, the grandparent’s address is placed where our parent would expect it to be, leaving the runtime in a good state and keeping the stack clear.

How BLKMOV used this routine
BLKMOV used this routine to set up all of its zero page vectors. Once getprm is done, the routine looks largely like MOVSTR, so I’ll (probably) cover it later.

Here’s what getprm did for BLKMOV. I’ll include the first part of BLKMOV again, with a bit better formatting since we know what the data following the JSR is for.

21a9  20 15 18    JSR $1815     ; jsr to getprm
                  .byte $33     ; min/max number of parameters
                  .byte $e0     ; %1110 0000 - all three parameters are words
                  .byte $38     ; first word stores at $38 and $39
                  .byte $34     ; second word stores at $34 and $35
                  .byte $2c     ; third word stores at $2c and $2d

When getprm runs for BLKMOV, it performs these actions:
* Writes the last parameter (Count) to $38 and $39
* Writes the second parameter (From) to $34 and $35
* Writes the first parameter (To) to $2c and $2d
* Cleans house and returns to the byte following the argument table at 21b1
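
The same steps can be sketched in Python for BLKMOV's table specifically. Since all three parameters are required words, the loop reduces to three two-byte pulls. The parameter values are invented for the example, and the push order (low byte first, so the high byte is pulled first) is inferred from the order getprm pulls them.

```python
def push_word(stack, value):
    """Push a 16-bit value so that the high byte is pulled first."""
    stack.append(value & 0xFF)
    stack.append(value >> 8)


def blkmov_getprm(stack):
    """Pull BLKMOV's three word parameters as its argument table asks."""
    zp = {}
    for addr in (0x38, 0x34, 0x2C):   # table order: last parameter first
        zp[addr + 1] = stack.pop()    # high byte comes off the stack first
        zp[addr] = stack.pop()        # then the low byte
    return zp


def word_at(zp, addr):
    """Read a little-endian word back out of the zero-page model."""
    return zp[addr] | (zp[addr + 1] << 8)


stack = []
for value in (0x4000, 0x2000, 0x0100):   # To, From, Count (made-up values)
    push_word(stack, value)
zp = blkmov_getprm(stack)
print(hex(word_at(zp, 0x38)), hex(word_at(zp, 0x34)), hex(word_at(zp, 0x2C)))
# 0x100 0x2000 0x4000
```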

This might look a little familiar. MOVSTR uses the same vectors.

In summary
The routine is very handy in that you can easily specify what you need loaded, as well as quickly specifying how many parameters you require and how many you can take. The limit for the number of parameters is sensibly eight, given the mask argument is a byte providing 8 flags for parameter types.

The getprm routine is used by many routines in the PROMAL system, including (but not limited to) GETC, GETL, BLKMOV, OPEN, CLOSE, CHKSUM, and EDLINE.