Making things a bit faster

Or, reducing frustration

The project has seen a bit of silence recently.  I’d apologize but I’m not really all that sorry about it.  I’ll offer an explanation:

The process for programming the NVRAM takes 6 minutes for a full update, and the tools I made in BASIC are at best a kludge.  If I did anything imperfectly I’d most often have to reload the entire 8k since I don’t have a debugger yet.

Enter PROMAL, discovered by an online colleague we lovingly refer to as ShadowM.  He recently acquired the long-forgotten and abandoned software and graciously offers it to the Commodore community.

PROMAL is a high level compiled application language originally written for the C-64, Apple2, and IBM PC running DOS.  To date, I’ve only managed to find the C-64 version ShadowM has on his site.

PROMAL is a native development environment including a functional editor and commandline environment that supports passing arguments to programs being called.  This is a great advantage, as a user can just specify an action as well as a target, rather than having the action app request what the target is intended to be.

As PROMAL is a compiled language, it does tend to run a bit more efficiently.  There are also other things that can be done in the language that enable one to increase efficiency.  For example, the most common data type is a 16bit word rather than a 40bit float.  All those loops in the BASIC program?  The address counters, the data byte… all of those are stored and processed as floats.  Not so in the programs below, as I only use ‘byte’ and ‘word’ variables, which are optimally sized as 8 bit and 16 bit unsigned numbers, respectively.

How much more efficient than BASIC is this implementation?  In this specific application we’re looking at a speed increase of 383%.  Nearly four times the operational speed.  There’s light at the end of the reprogramming tunnel the moment enter is pressed!

Another advantage of PROMAL is the ability to define where a variable rests in memory.  Take the nvdefs.s file below as an example:

ext byte via1[] at $de20

There are some key words here:  ‘ext’ refers to an external reference.  ‘byte’ defines the variable’s values to be stored as bytes.  The name ‘via1’ is assigned and the brackets indicate the variable is an array.  Finally, the specified address is $de20.  Readers might remember the address of the first VIA on the IO card I’m using for this project being at that address.

What does this do for me?  It’s simple.  I use the constant also defined in nvdefs.s called ‘porta’ (shortened to ‘pa’ in the listing below) to set up the output pins as desired, without a poke or a lot of math: via1[porta]=$ff.  This causes the value of porta to be used as an offset into the via1 array, and a $ff is stored there, turning all bits on.  No run around looking up a variable, getting its value, then converting the float to an unsigned integer, setting up a vector, loading porta into an index, and then finally storing.  It just skips the second through fifth steps.  You might also notice ‘addr’ in nvdefs, which is a WORD type set at $DE20.  This means any 16bit address stored in the ‘addr’ variable is automatically stored to both port a and port b on the first via at that address.  No address splitting or additional processing.  This all adds to speed, which really matters when you only have approximately 300,000 operations/second available.
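
To illustrate why one word store at $DE20 ends up on both ports, here is a small Python sketch.  It only models memory as a byte array; the names `memory` and `store_word` are made up for illustration, and it assumes the usual 6502 low-byte-first word layout along with the register offsets from nvdefs.s (port B at offset 0, port A at offset 1).

```python
# Model of a 16-bit store landing on two adjacent 8-bit VIA ports.
# Illustrative only: `memory` and `store_word` are hypothetical names.

VIA1 = 0xDE20   # base address of the first VIA (from the article)
PB = 0          # port B register offset (nvdefs.s: pb=0)
PA = 1          # port A register offset (nvdefs.s: pa=1)

memory = bytearray(0x10000)  # stand-in for the C-64 address space


def store_word(address, value):
    """Store a 16-bit value low byte first (assumed 6502 word layout)."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF


store_word(VIA1, 0x1F5C)
print(f"port B = ${memory[VIA1 + PB]:02X}")  # low half of the address
print(f"port A = ${memory[VIA1 + PA]:02X}")  # high half of the address
```

In this model the single assignment fills both port registers, which is the effect the ‘addr’ declaration is after.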

Below is the nvdefs.s file I created for the project.  PROMAL includes among other things the ability to include secondary files in the same way the modern C compiler has headers.  These files are considered part of the compiler’s input stream at the position they’re included, and can provide hardware and operating system abstraction.  If the apple2 version of PROMAL weren’t made of unobtanium and a “john bell” 32bit IO card were in use, nvdefs.s could be changed to reflect the IO port addresses for the apple2 card and the application below could be used without modifications after a simple recompile.

File: nvdefs.s

;DEFs for nvram read/write via Schnedler ultimate interface
;IF VIAS CHANGE also change named items below.

; assume VIA1 is at DE20, VIA2 at DE30
ext byte via1[] at $de20
ext byte via2[] at $de30
con ddra=3
con ddrb=2
con pa=1
con pb=0

; Named items for nvram config:
; Bits on control port: WE CS OE nc nc nc nc nc
con off=$e0 ; chip offline
con wr =$20 ; chip write
con rd =$80 ; chip read

; Named items for via control
con in=0   ; for DDR
con out=255; for DDR

; Named items for nvram address/data bus
ext word addr at $de20 ; access via1 ports a and b as 16bit unsigned!
ext byte dat  at $de31 ; via2 port a for data bus
ext byte dc   at $de33 ; via2 port a ddr for data in/out
ext byte ctl  at $de30 ; via2 port b for chip control
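
The three control constants are easier to read once they’re decoded bit by bit.  The Python sketch below does that, assuming bit 7 is /WE, bit 6 is /CS and bit 5 is /OE, all active low, which is how the "WE CS OE nc nc nc nc nc" comment reads.  The helper name `active_signals` is made up for this example.

```python
# Decode the nvram control values from nvdefs.s.
# Assumes bit 7 = /WE, bit 6 = /CS, bit 5 = /OE, all active low,
# as suggested by the "WE CS OE nc nc nc nc nc" comment.

SIGNALS = (("WE", 7), ("CS", 6), ("OE", 5))


def active_signals(value):
    """Return the names of control lines driven low (asserted)."""
    return [name for name, bit in SIGNALS if not value & (1 << bit)]


for label, value in (("off", 0xE0), ("wr", 0x20), ("rd", 0x80)):
    print(f"{label:3} ${value:02X} -> {active_signals(value) or 'none'}")
```

Under that assumption ‘off’ asserts nothing, ‘wr’ asserts WE and CS, and ‘rd’ asserts CS and OE.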

And now the tool written to reprogram the NVRAM, rewritten in PROMAL.  It accepts its parameters from the commandline, making it easy to specify what needs to be done.

File: nvwr.s

program nvwr

include library
include nvdefs

word a       ; address presented to nvram
word length  ; number of bytes to prog
word file    ; file handle
word total   ; total bytes written
byte d       ; data byte to program
byte t       ; scratch space

if ncarg < 1
  put "Usage:  nvprog <file> [addr] [len]",nl
  put "  All values are specified",nl
  put "  in hexadecimal.",nl,nl
if file=0
  put "Unable to open ",carg[1],nl
if ncarg>1
  t=strval(carg[2],#a,16,4) ; address
if ncarg>2
  t=strval(carg[3],#length,16,4) ; bytes to program

output "Loading $#4H bytes from ",length
put carg[1]," into NVRAM.",nl

while getcf(file,#d) and length>0
  ctl =off ; ensure chip is offline
  addr=a   ; set up address bus
  dat =d   ; set up data bus
  dc  =out ; drive data bus
  ctl =wr  ; write byte (cs,we)
  ctl =off ; back offline
  dc  =in  ; prep for verify
  ctl =rd  ; re-read byte
  if dat <> d ; live compare data bits on ram with d
    put "ERROR: failed to verify byte",nl
    output "#4H >#2H <#2H#C",a,d,dat
  length=length-1 ; count bytes left to program
  a=a+1           ; advance to the next nvram address
  total=total+1   ; count bytes written

output "Wrote #4H bytes.#C",total
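
For readers without a PROMAL setup, the write-then-verify idea can be sketched in Python against a pretend chip.  This is not a translation of the program above: `FakeNvram`, its `stuck_at` option and `program` are hypothetical stand-ins, and the bus handshaking is reduced to plain method calls.

```python
# Simulated write-then-verify loop against a fake nvram.
# FakeNvram and program() are hypothetical stand-ins, not PROMAL code.

class FakeNvram:
    def __init__(self, size=0x2000, stuck_at=None):
        self.cells = bytearray(size)
        self.stuck_at = stuck_at  # optional address that ignores writes

    def write(self, address, value):
        if address != self.stuck_at:
            self.cells[address] = value

    def read(self, address):
        return self.cells[address]


def program(chip, data, start=0):
    """Write each byte, read it back, and collect verify failures."""
    failures = []
    for offset, value in enumerate(data):
        address = start + offset
        chip.write(address, value)
        readback = chip.read(address)
        if readback != value:
            failures.append((address, value, readback))
    return failures


image = bytes([0x20, 0x27, 0xFF])
print(program(FakeNvram(), image, start=0x1F5C))
print(program(FakeNvram(stuck_at=0x1F5D), image, start=0x1F5C))
```

Each failure is reported as address, expected byte and byte read back, the same three values the PROMAL error line prints.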


Sadly, I don’t offer anything in the way of pictures in this post, but I hope the information was interesting regardless.  Real Soon Now(TM) I’ll be able to get a more featured toolset for getting my minimal debugger going on the IIEasy Print card.  Once the machine language monitor is fully functional, everything will suddenly become very simple.

To learn more about PROMAL, you can download the documentation easily from ShadowM’s webpage.



WTH is this thing doing?

Or, examining the reset cycle.

In traditional fashion for myself, I bit off a bit more than I can easily chew.  I added some nice routines to a vintage computer’s system debugger in an attempt to quickly port it to the board and hopefully just get rolling.  These software modifications will be covered later.  Today however we’ll be covering a second mental exercise.  Off to the oscilloscope.

For a bit of early reference, I’ll be doing this with a Tektronix 2246 scope, using four channels as described.  They’re listed in the order they appear in the ‘scope pictures, from top to bottom.

  • Channel 1: /RESET line on the cpu
  • Channel 3: Phase2 system clock, to which all CPU transactions are synchronized
  • Channel 2: Floating probe to examine data lines during the reset loop
  • Channel 4: Floating probe with hook to attach to a visual reference signal

The challenge in figuring out the reset sequence is that there’s no really good way to trigger it besides repeatedly resetting the CPU.  Even then, you’d have to trigger on the rising edge of the /RESET line and then change your trigger to the CPU’s PHASE2 output to get good sync with your data transfers or you’ll just see intermittent pulses of unaligned digital noise.


A challenge I faced as well is that I run everything on a two by four foot desk that has a PC keyboard as well as my GPIO enhanced Commodore.  The scope fortunately doesn’t mind running on the floor in a pinch, so it sits to my left.  This excludes any space for a signal generator to provide the repeating reset pulse.  What to do?

Finding a slow clock

As I needed to repeatedly reset the CPU at a low frequency and didn’t have a clock generator handy, I thought of options:

  • Program the 6522 VIA on my Commodore to generate a repeating pulse
  • Use a serial TTY to generate a repeating pulse by sending nulls out its port continuously

The second option was chosen since the serial port was already connected.  I modified the reset pin on the CPU so that it hangs out of its socket instead of being attached, and tied it to the RXD pin on the 6551 ACIA.  Note carefully that I didn’t connect it directly to the serial line, which would have permanently destroyed the CPU.  RS232 lines can run as much as +/-12v.
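
A quick back-of-the-envelope calculation shows what sort of reset waveform a stream of nulls gives.  The Python sketch below assumes 9600 baud with 8 data bits, no parity and 1 stop bit, bytes sent back to back, and a logic-level line that sits low for the start bit and for zero data bits.  The framing is an assumption, not something measured here.

```python
# Rough timing of the reset waveform produced by a stream of NUL bytes.
# Assumes 9600 baud, 8 data bits, no parity, 1 stop bit, sent back to back.

BAUD = 9600
DATA_BITS = 8
STOP_BITS = 1


def null_frame_timing(baud=BAUD, data_bits=DATA_BITS, stop_bits=STOP_BITS):
    """Return (low_time, high_time) in seconds for one NUL frame."""
    bit_time = 1.0 / baud
    low_time = (1 + data_bits) * bit_time   # start bit plus all-zero data
    high_time = stop_bits * bit_time        # stop bit(s) at the idle level
    return low_time, high_time


low, high = null_frame_timing()
print(f"low  (reset asserted): {low * 1e6:.1f} us")
print(f"high (CPU running):    {high * 1e6:.1f} us")
print(f"resets per second:     {1 / (low + high):.0f}")
```

With those assumptions the CPU only gets a short window out of reset on each frame, which is fine when only the first few bus cycles are of interest.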

That fixed, we’re back to the trigger challenge.

Advanced triggering on the Tektronix 2246

The Tektronix 2246 oscilloscope has a pretty remarkable trigger section on it.  I had not considered the use of an A/B trigger setup since college, so it was a bit of self re-education.  The needs are simple: First, wait for /RESET to go high (trigger A, rising slope, channel 1).  Next, trigger on phase2 (Trigger B, rising slope, channel 3).

After a bit of trial and error as well as some remembering how to read the double ghosted image on the Tek’s display, I remembered what I was seeing and settled on a configuration.  Here are the steps I take.

  1. Attach lines in the configuration listed earlier in this article
  2. Set up A trigger (the most commonly used scope mode)
  3. Get my PC transmitting nulls:
    jbevren@epicfail:~/projects.local/iieasy$ sudo stty ispeed 9600 </dev/ttyS4
    jbevren@epicfail:~/projects.local/iieasy$ cat /dev/zero >/dev/ttyS4
  4. Set up a stable signal for channel 1:
    Trigger source: channel 1, DC coupling, auto level
  5. Place horizontal section into ‘alt’ mode
  6. Reduce the ‘A’ intensity and increase the ‘B’ intensity
  7. Set up a stable trigger for channel 3:
    Trigger source: channel 3, DC coupling, auto level

At this point, you should see something similar to the picture here.


The bright section at the left illustrates the section visible in trigger B.  It may help to understand the configuration if you watch the width of the highlighted section as you change the horizontal sweep time in the A/B alt setup:  The horizontal setting no longer affects the initial A trigger setup, allowing me to magnify as needed.

At this point, I don’t need to see the ‘A’ section, as I’ll only be checking in on the first few transactions to ensure the NVRAM’s getting read in correctly.  In this case, I set the horizontal section’s mode to ‘B’ only.  The ‘scope still processes the A trigger but no longer displays it.  This saves me some brain time as I won’t have to mentally separate the two overlaid images.

Now, I can see what’s going on through a slight bit of visual jitter on the B trigger.  I’ll attach channel 4 to the CPU’s SYNC output, which asserts at a logic high level each time a new instruction is fetched.


As you can see there’s a bit of ghosting going on in the display.  I can see through it, so it’s not a huge issue for me.  However, to try and clean it up a bit, I’ll try triggering on channel 4 (now on SYNC) to see if it remains stable after the A trigger’s processed.


This setup is perfect.  The initial SYNC pulse caused by the reset isn’t visible any longer, and a single SYNC pulse is visible on the right third of the display’s bottom line.

Finally, examining the reset sequence

Our first target is the address on the first SYNC pulse after RESET completes.  The address will let us know if the reset vector gets read correctly from the firmware NVRAM and will also tell us where the CPU’s actually starting up.  The data’s not yet important, so we’ll check address bits one by one.


I chose to provide more entertaining drivel over image processing, so the montage above only contains a few of the address lines I checked, recklessly copy/pasted into a modified image.  They’re address bits 3-0 from top to bottom.  If you look at them aligned with the clock at its low state you’ll see the first fetch is %1100, or $C, the last digit in $FF5C.

I prefer to work from the high address bits down, as it makes it easier to think in a numeric fashion:  We write our digits in such a way that the highest valued digit is entered first.  In “$1,000,002” the 1 certainly has a higher value than the 2.  Examining the bits in this order allows me to simply enter each binary digit as I walk my way down the chip’s address lines.

To help find the appropriate pins, I’ve marked my CPU out with a bit of pencil to separate the address bits into groups of four.  The lead shines in the overhead light enabling me to quickly see where the labels are.


For anyone following along on a datasheet, remember this isn’t a standard CMOS 65c02.  It’s a g65sc102 CPU, which is software compatible but has a slightly modified pin layout.

In the end, the 16 address bits at the SYNC mark show me this first opcode fetch at address %1111 1111 0101 1100.  Note that the SYNC pulse is two cycles long, and the second address is incremented by one.  This comes to $FF5C, which is correct.  For reasons I’ll share later, a few lightbulbs may be appearing in some readers’ heads.  Don’t feel alarmed if the address looks familiar.
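
For anyone who’d rather not convert the bits by hand, the same high-bit-first reading can be checked with a few lines of Python.  The function name `bits_to_hex` is just an illustrative choice.

```python
# Turn address bits read off the scope (high bit first) into hex.

def bits_to_hex(bits):
    """Convert a string such as '1111 1111 0101 1100' to '$FF5C'."""
    digits = bits.replace(" ", "")
    width = (len(digits) + 3) // 4  # hex digits needed
    return f"${int(digits, 2):0{width}X}"


print(bits_to_hex("1111 1111 0101 1100"))
```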

As the reset vector’s getting read correctly but the serial chip’s not getting polled as I expected, I’ll also read the opcodes it should be reading.  In a working system I’d only have one SYNC cycle, as the initial opcode is a JSR to a subroutine.

Here’s what the oscilloscope told me:

  • [sync] 0011 0011 = $33, invalid opcode [nop]
  • [sync] 0010 0000 = $20, JSR
  • [norm] 0010 0111 = $27
  • [norm] 1111 1111 = $FF

If the scope tells the truth, we start with an invalid $33 opcode, followed by jsr $FF27.  This isn’t what should be happening.  Considering I’m prone to errors as a part of being human, it’s entirely possible that my NVRAM programmer wiring or code may be at fault, so it’s time to re-verify that part of the project.  Perhaps it’s even a good opportunity to wire up a board that would plug into the GPIO card rather than running a haystack of wires to a dip socket. 😉
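
The four bytes above can be run through a toy decoder to double-check the reading.  The Python sketch below only knows the two opcodes that showed up on the scope, treats $33 as a one-byte no-op as noted in the list, and assumes the usual low-byte-first operand order for JSR.  The table and function names are made up for this example.

```python
# Decode the four bytes read from the scope, in fetch order.
# Only the two opcodes seen here are handled; anything else prints "???".

OPCODES = {
    0x20: ("JSR", 2),  # JSR absolute: two operand bytes, low byte first
    0x33: ("NOP", 0),  # treated as a one-byte no-op, per the scope notes
}


def decode(data):
    """Return a list of text lines for a byte sequence."""
    lines = []
    i = 0
    while i < len(data):
        name, operand_len = OPCODES.get(data[i], ("???", 0))
        operand = data[i + 1:i + 1 + operand_len]
        if operand_len == 2 and len(operand) == 2:
            target = operand[0] | (operand[1] << 8)
            lines.append(f"{name} ${target:04X}")
        else:
            lines.append(name)
        i += 1 + operand_len
    return lines


print(decode(bytes([0x33, 0x20, 0x27, 0xFF])))
```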


Today I learned a lot from things not working as expected.  A few people who do projects have told their audience that the best chance to learn is when there’s a failure.  I can agree wholeheartedly after today’s experience, as I re-learned a skill we got in our second year of college:  debugging a looping program using a simple 4-channel oscilloscope.  I also learned the new skill of advanced triggering on my own ‘scope, and will be able to use it in future projects without a doubt!

More will come later, as I take some time to re-verify the code for the programmer as well as the GPIO wiring to the socket used to set the NVRAM up with its code.

IIeasy Print recycling

Or, Reverse engineering a 6502 SBC

The victim

A few weeks ago I acquired a “IIeasy Print” automatic printer buffer/switch designed for Apple systems.  It was a curiosity because it had a few nice features and appeared to be a self-contained system.  The items below were discovered during research before the card arrived.

  • CPU (believed to be a g65sc102 based on usenet posts, a cmos 6502 variant)
  • 256k ram (two chips, 128kx8 PSRAM)
  • 6 serial ports (6551 acia)
  • system rom (unknown, believed to be 8k 2364 or 2764)
  • system ram (6116 2kx8 sram)

The type of cpu and rom were unknown as they had stickers covering their information in any photos I could locate.  Scouring the image of the card gave me the 6116 and 6551 chip ID’s, as well as a couple of others, such as 74ls138’s to assist address decoding and a 74ls374 near a 14 pin header.

Beyond this, the next step was to imagine possible memory maps and wait for the card to arrive.

Upon arrival, I eagerly opened the box and stripped labels off the unknown IC’s to identify the chips further.  The CPU was confirmed to be a g65sc102 CPU, and the master crystal was found to be 7.37mhz.

Regarding system speed, most 6551’s are set up with a 1.84mhz clock to get perfect data rates.  The crystal is exactly four times the ideal clock.  The 6551’s are gs65sc51p-2, indicating a 2mhz ceiling on system speed.  As the CPU runs in sync with its IO chips, it’s also going to run at a max of 2mhz.

In an effort to save costs, only one crystal is on the card, and a system clock of less than 1mhz doesn’t sound reasonable for handling six serial ports, so a 1.8mhz system clock is assumed.
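
The clock reasoning can be checked with a little arithmetic.  The Python sketch below assumes the common 7.3728 MHz crystal value (four times 1.8432 MHz) and a divide-by-4 stage somewhere on the card, neither of which has been confirmed by measurement.  It also assumes the 6551’s baud generator works from a clock that is a multiple of 16 times the baud rate.

```python
# Clock arithmetic for a 6551 fed from a divided-down crystal.
# Assumes a 7.3728 MHz crystal and a divide-by-4 stage (not confirmed).

CRYSTAL_HZ = 7_372_800
DIVIDER = 4


def acia_clock(crystal_hz=CRYSTAL_HZ, divider=DIVIDER):
    """Return the clock reaching the ACIA after the assumed divider."""
    return crystal_hz // divider


def divisor_for(baud, clock_hz):
    """Return clock/(16*baud); a whole number means an exact rate."""
    return clock_hz / (16 * baud)


clock = acia_clock()
print(f"ACIA clock: {clock / 1e6:.4f} MHz")
for baud in (300, 1200, 9600, 19200):
    print(f"{baud:5} baud -> divisor {divisor_for(baud, clock):g}")
```

Every divisor coming out as a whole number is what "perfect data rates" means in practice.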

The next step is to verify the system’s memory map.  As there’s a PLA on card, it’s not going to be easy to work out the memory map based on the PCB traces and logic on the card.  Instead, I chose to brute-force the memory map via the CPU socket.

Before this could be configured I needed to decide on how exactly to scan the card for its memory map.  The most straightforward method seemed to be driving the address bus through the CPU socket and monitoring the chip select signals on the individual IC’s.  A relatively simple program could be built to do the work and create a log of what addresses involve each chip select line.

In the image above, you can see my trusty keyboard-enhanced Arduino work-alike.  Most people would call this a Commodore 64, but in this case I’m using it in the way most people these days would use an Arduino.

The vertical card is in a bus extender plugged directly into the Commodore’s memory bus.  It’s loaded with a pair of R6522P VIA’s, each of which provides sixteen GPIO lines.  The ribbon cables lead to an Atari VBXE adapter.  The adapter simply mirrors the 40 pins in the dip socket to a header on the side, making it easier to connect to the socket reliably.  Due to physical constraints, I had to insert the adapter upside down, but it can still serve its purpose in this configuration.

The upper VIA has all sixteen GPIO lines routed directly to the CPU socket’s address pins.  You can see this configuration with the two 8-line ‘rainbow’ cables leading from the vertical card to the IIeasy Print card under test.

The lower VIA serves two purposes:  First, port B’s higher bits control the CPU socket’s control bus, managing /reset, r/w, and phi2.  This is the smaller ribbon cable that has blue, white, grey, and purple connecting to the left edge of the header on the CPU socket.

The lower bits on port B as well as all bits on port A are configured as inputs, monitoring chip select signals on each IC I believed to be a memory bus target:  The 2k SRAM, each of the 128K sram’s, the system rom, the six 6551 ICs, and the 74ls374.  For socketed ICs, I simply lifted them from their sockets and attached a male pin lead to the /cs line on the target.  /OE was ignored as most systems based on the 6502 will ground /OE and use /CS and /WE to control the IC.  Chips without sockets had a test pin tack-soldered to their respective /CS (or CLK line in the 74ls374’s case) and a male-female pin lead was routed to the lower VIA’s port A lines.

The next step is to write a program to scan the card.  I wrote this in BASIC.  BASIC is slow, but it’s built into the system ROM on the Commodore and does its job well.

10 V0=56864
11 V1=56880
12 PB(0)=V0
13 PA(0)=V0+15
14 PB(1)=V1
15 PA(1)=V1+15
16 POKE V0+2,255:POKEV0+3,255:REM OUT
17 POKE V1+2,240:POKEV1+3,0
18 REM VIA 0A:     ADR LOW
21 REM VIA 1B7:    PHI2 OUT
22 REM VIA 1B6:    R/W OUT
60 POKE PB(0),0:POKE PA(0),0
70 POKE PB(1),0:FORT=1TO300:NEXT
80 POKE PB(1),32
100 FOR AH=0TO255
110 POKE PB(0),AH
120 FOR AL=0TO255
130 POKE PA(0),AL
140 PRINTAH*256+AL"{up}"
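150 Y=PEEK(PA(1))
160 X=PEEK(PB(1))
170 IF X=OX AND Y=OY THEN 260:REM NO CHANGE
180 OX=X:OY=Y
210 PRINT ,"NEW: ";
220 B=(X AND 15)
230 GOSUB 300
240 B=Y
250 GOSUB 300:PRINT
260 NEXT AL:NEXT AH
270 END
300 REM PRINT B AS 8 BINARY DIGITS
310 FOR T=7 TO 0 STEP -1
320 PRINTMID$(STR$(SGN(BAND(2^T))),2,1);
330 IF T=4 THEN PRINT" ";
340 NEXT
350 PRINT"  ";
360 RETURN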

Initial testing confirmed my bus scanner was working correctly by manually setting GPIO lines and checking the 2k SRAM’s /cs pin.  As expected, the 2k SRAM turned out to be at address 0, so it made a good test target.

From there, the program was run and left to scan the bus, while it displayed any changes in the chip select lines’ outputs.
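
Turning a change log like that into address ranges is mostly bookkeeping.  The Python sketch below shows one way to do it; `fake_select` is a made-up decoder used only to exercise the logic and does not describe the real card.

```python
# Collapse per-address chip-select readings into address ranges.
# fake_select() is a made-up decoder used only to exercise the logic.

def fake_select(address):
    """Pretend decoder: return the name of the selected chip, or None."""
    if address < 0x2000:
        return "sram"
    if address >= 0xE000:
        return "rom"
    return None


def scan(select, start=0x0000, end=0xFFFF):
    """Walk the address space and close a range whenever the reading changes."""
    ranges = []
    range_start, current = start, select(start)
    for address in range(start + 1, end + 1):
        reading = select(address)
        if reading != current:
            ranges.append((range_start, address - 1, current))
            range_start, current = address, reading
    ranges.append((range_start, end, current))
    return ranges


for first, last, chip in scan(fake_select):
    print(f"{first:04x}-{last:04x}  {chip or 'empty'}")
```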

Would you look at all that dust! :P

After the program was run, I evaluated the chip select log and came to this memory map as a conclusion:
Address Range   IC      Desc
0000-07ff       U503    6116 soldered-in ram
0800-1fff       U503    mirrors of U503
2000-9fff       u401/2  Buffer ram (256k is bankswitched here)
a000-bfff       empty
c000-c01f       U101    6551 acia
c020-c03f       U102    6551 acia
c040-c05f       U103    6551 acia
c060-c07f       U104    6551 acia
c080-c09f       U105    6551 acia
c0a0-c0bf       U106    6551 acia
c0c0-c0df       U701    74ls374
c0e0-c0ff       unknown
c100-c7ff       Mirrors
c800-cbff       U504    Read jumpers
cc00-cfff       Buffer RAM mapping
e000-ffff       U502    Firmware ROM (2764)

Of interest to me was that only 7 of 8 devices are present in the C000 range, and that only 32k of the 128k memory was mapped.  It’s time to read the ROM and get some more educated guesses about the card.

Since most of the hardware to fully control the card was already set up, I placed the system ROM back in its socket and changed the lower VIA’s port A to attach to the data lines.  I slowly but surely extracted the system code from the card and saved it to a file.

After loading the data into RAM on the Commodore and viewing it in a debugger, there appeared to be an initial store that wasn’t obvious:  The CMOS ‘STZ’ instruction was used to store a 0 into $CC00, and the debugger didn’t understand CMOS opcodes.  Further testing and experimentation discovered that the $CC00 register controls memory banking on the two 128k SRAMs as detailed below:

Bit     Effect
0       A15 on U401 and U402
1       0=select U401, 1=Select U402
2       A16 on U401 and U402

There are also many writes to the $C0E0 location in memory, but I wasn’t able to determine what IC is connected to it.  I might discover that later.

As I’m now aware of the system’s memory map, I could consider other uses for the card.  Currently, I’m working on a port of the original Apple2 System Monitor, as it’s almost as simple as a system firmware can get, requires only 2k of space, and is something I’m quite familiar with.  In ideal conditions the only real work for getting minimal functionality from the system monitor firmware would be patching code that writes to the screen and reads from the keyboard.  From there, additional changes could be made.  That will be covered in a later post.

Edited May 14 2015 to correct the ACIAs’ addresses.