Underneath the Hood of the SuperCPU

by Jim Brain (brain@mail.msen.com)

Does your mind go blank when you hear about the SuperCPU? With all the mention of it in magazines and newsletters, are you left wondering how much of the discussion is hype and how much is true? Are you worried that this latest attempt is just another design destined for failure like the others before it? Well, if so, then you're not alone. With the reputation accelerator cartridges and their manufacturers have acquired over the years, you are wise to be concerned. Judge for yourself, as we peer under the hood of the Creative Micro Designs SuperCPU accelerator cartridges.

Note: The information contained in this article has been gleaned from talks with CMD, Mr. Charlie Christianson's post to comp.sys.cbm, responses to USENET posts by Mr. Doug Cotton, and information from Commodore World Issue #12. While general information is not likely to change, some details discussed in this article may differ slightly from those incorporated in the final product.

What's An Accelerator?

Did you know that a Commodore 64 CPU executes at 1 MHz? A tiny clock inside the 64 ticks off 1 million "cycles" per second and steps the CPU forward one cycle at a time. During each cycle, the CPU either performs an internal operation, reads from memory, or writes to memory. These operations are strung together to form instructions, the smallest pieces of work a programmer can ask the CPU to perform. An instruction takes an average of 3 cycles, so the typical C64 CPU executes about 333,333 instructions a second. The C128 fares a bit better, as it can run twice as fast in "fast" mode. In either case, there is an upper bound on the amount of useful work each CPU can do in a given amount of time.

An accelerator increases the amount of work done by substituting a faster CPU and clock for the 1 MHz CPU in the 64. Ideally, the speedup would be as easy to determine as dividing the new clock frequency by 1 MHz, so an accelerator running at 4 MHz would execute programs 4 times as fast as a stock 64. Sadly, this is not true, since not all parts of the system can be sped up to the higher frequency. The accelerator runs at full speed while it uses ICs designed for the faster clock, and slows down when it must "talk" with ICs like the SID and VIC-II in the 64, which run only at the slow 1 MHz clock speed.
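To put a number on that slowdown, here is a minimal Python sketch of a simplified two-speed timing model. The model is an assumption of this example, not CMD's analysis: a fraction `slow_fraction` of a program's work (counted in stock 1 MHz cycles) must still run at 1 MHz, and the rest runs at the accelerator's clock. Under it, even 5% slow work pulls a 4 MHz unit down to about 3.5 MHz.

```python
def effective_mhz(fast_mhz: float, slow_fraction: float, slow_mhz: float = 1.0) -> float:
    """Effective clock rate under a simple two-speed model (an assumption).

    slow_fraction is the share of the program's work, counted in stock
    1 MHz cycles, that must still run at slow_mhz; the rest runs at fast_mhz.
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    # Time, in microseconds, to do one million stock cycles' worth of work.
    time_us = (1.0 - slow_fraction) * 1e6 / fast_mhz + slow_fraction * 1e6 / slow_mhz
    return 1e6 / time_us


for f in (0.0, 0.01, 0.05, 0.25):
    print(f"{f:>5.0%} slow -> {effective_mhz(4.0, f):.2f} MHz")
```

The model ignores how many cycles each individual instruction needs; it only captures the idea that time spent at 1 MHz dominates quickly.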

Most accelerators are produced as large cartridges that plug into the expansion port of the computer system. Some require that special wires be attached to internal components, while others do not.

The New Kid on the Block

In mid-1995, Creative Micro Designs, after evaluating the FLASH 8 accelerator from Europe with only mild success, concluded that there might be a market in the USA for a fast accelerator that would run GEOS and other useful applications. After surveying the readership of Commodore World, the Internet, and FIDONet, CMD decided that interest in such a unit was strong enough. Shortly thereafter, the SuperCPU was announced.

As development proceeded, CMD released progress reports and preliminary information about the product. The first items concerned the processor choice, which was originally the 65C02S but is now its bigger brother, the 16-bit 65C816S. Another concerned the case, an enclosure measuring 6" by 2" by 3". A circuit board protrudes from the front of the enclosure and plugs into the Commodore 64 or 128 expansion port. In back, a complementary card edge connector passes signals through the cartridge, allowing users to attach other expansion port cartridges to the system. On top sit three switches, described below.

The first switch enables or disables the SuperCPU unit. The second switch enables or disables JiffyDOS, which is built into the unit. The third switch determines the speed of the unit. This third switch has three positions. The first position forces the accelerator to operate at 1 MHz speed (the same speed as the stock C64). The second position allows the programmer to change the speed via a register in the SuperCPU memory map. The third position locks the SuperCPU into 20 MHz mode, regardless of register settings.
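The interaction of the enable switch, the speed switch, and the speed register can be summarized as a small decision function. This Python sketch follows the description above; `fast_bit` is a hypothetical stand-in for the speed register, whose address and bit layout are not given here, and the "disabled means stock 1 MHz" case is an assumption.

```python
from enum import Enum


class SpeedSwitch(Enum):
    """The three positions of the SuperCPU speed switch."""
    FORCE_1MHZ = 1   # always run at stock C64 speed
    SOFTWARE = 2     # speed chosen by a register in the SuperCPU memory map
    FORCE_20MHZ = 3  # locked at 20 MHz regardless of register settings


def clock_mhz(unit_enabled: bool, switch: SpeedSwitch, fast_bit: bool) -> int:
    """Return the CPU clock in MHz.

    fast_bit is a hypothetical stand-in for the speed register setting.
    """
    if not unit_enabled:
        return 1  # assumed: unit switched off, the computer's own CPU runs at 1 MHz
    if switch is SpeedSwitch.FORCE_1MHZ:
        return 1
    if switch is SpeedSwitch.FORCE_20MHZ:
        return 20  # the register is ignored in this position
    return 20 if fast_bit else 1  # SOFTWARE: the program decides


print(clock_mhz(True, SpeedSwitch.SOFTWARE, fast_bit=False))
```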

The use of the CMD SuperCPU will be straightforward. Simply plug the unit into the expansion port, set the appropriate switches on the top of the unit, and power on the unit.

Technical Details

The basic system uses a WDC W65C816S 16-bit microprocessor running at 20 MHz. Not only can this CPU fully emulate a CMOS 6502, it can also be switched into "native" mode, which allows access to 16-bit registers and 16 megabytes of RAM without bank switching, DMA, or paging.
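The 16-megabyte figure comes from the 65C816's 24-bit address: an 8-bit bank number placed on top of the familiar 16-bit 6502 address. A small Python sketch of that arithmetic (the function names are illustrative only):

```python
from typing import Tuple

ADDRESS_BITS = 24  # 65C816 address: 8-bit bank byte + 16-bit offset


def split_address(addr: int) -> Tuple[int, int]:
    """Split a 24-bit 65C816 address into (bank, 16-bit offset)."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address out of 24-bit range")
    return addr >> 16, addr & 0xFFFF


def join_address(bank: int, offset: int) -> int:
    """Combine a bank number (0-255) and a 16-bit offset into a 24-bit address."""
    return ((bank & 0xFF) << 16) | (offset & 0xFFFF)


print(1 << ADDRESS_BITS)        # 16777216 bytes = 16 megabytes
print(split_address(0x01A000))  # (1, 40960): bank 1, offset $A000
```

Bank 0 corresponds to the 64 kB a stock 6502 can see, which is why 6502 code still runs in emulation mode.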

Attached to the CPU is a bank of 64 kilobytes of Read Only Memory (ROM) and 128 kilobytes of high speed static RAM (SRAM). The extra RAM above 64 kB is used to "mirror" the contents of the slower ROM. See below for details.

CMD is designing a number of features into the unit to maximize its performance. Since the late 1980s, ROM speeds have not kept pace with CPU clock frequencies, and with the CMD accelerator moving into the frequency range of newer PC systems, this becomes a problem for the SuperCPU as well. The Commodore stores its KERNAL and BASIC code in ROMs, and the SuperCPU will need to read that code. The easiest solution is to read the stock ROMs in the computer, but those ICs can only be accessed at 1 MHz (they belong to the set of older ICs that cannot be used at 20 MHz). The next option is to copy that code into faster ROMs installed in the cartridge, but ROMs fast enough for 20 MHz operation are very expensive and not widely available. So, the third option, which CMD will use, is to copy the KERNAL and BASIC into RAM at startup and write-protect that RAM area, making it look like ROM. Fast static RAM (SRAM) that meets the 20 MHz clock requirements is readily available and not terribly expensive, as most new PC systems use the same memory for similar purposes. This technique is called ROM shadowing and has been used for a few years in the IBM PC community.
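ROM shadowing can be illustrated with a toy memory model. This Python sketch is a simplification, not CMD's design: it assumes the extra 64 kB of SRAM holds the ROM copy, that BASIC and KERNAL are always banked in, and it ignores the $01 banking port entirely.

```python
class ShadowedMemory:
    """Toy model of ROM shadowing (simplified; not CMD's actual logic)."""

    BASIC = range(0xA000, 0xC000)
    KERNAL = range(0xE000, 0x10000)

    def __init__(self, basic_rom: bytes, kernal_rom: bytes) -> None:
        if len(basic_rom) != len(self.BASIC) or len(kernal_rom) != len(self.KERNAL):
            raise ValueError("expected 8 KB BASIC and 8 KB KERNAL images")
        self.ram = bytearray(0x10000)     # first 64 KB of fast SRAM: ordinary RAM
        self.shadow = bytearray(0x10000)  # second 64 KB of SRAM: copy of the ROMs
        # Startup: copy the slow ROM contents into fast SRAM once.
        self.shadow[0xA000:0xC000] = basic_rom
        self.shadow[0xE000:0x10000] = kernal_rom

    def _is_rom(self, addr: int) -> bool:
        return addr in self.BASIC or addr in self.KERNAL

    def read(self, addr: int) -> int:
        # ROM areas are served from the fast shadow copy, never the 1 MHz ROM chips.
        return self.shadow[addr] if self._is_rom(addr) else self.ram[addr]

    def write(self, addr: int, value: int) -> None:
        # The shadow copy is write-protected, so it behaves like ROM. As on a
        # stock C64, a write to a ROM address lands in the RAM underneath.
        self.ram[addr] = value & 0xFF


mem = ShadowedMemory(bytes([0x11]) * 0x2000, bytes([0x22]) * 0x2000)
mem.write(0xA000, 0x99)
print(hex(mem.read(0xA000)), hex(mem.ram[0xA000]))
```

The write protection is what makes the copy "look like ROM": programs can read it at full speed but cannot corrupt it.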

The heart of the unit is the Altera Complex Programmable Logic Device (CPLD). Analogous to electronic "glue", this single chip can replace tens or hundreds of discrete ICs in a circuit. It is responsible for decoding the complex series of signals presented at the expansion port, handling DMA requests to an REU, emulating the specialized I/O port found at locations $00 and $01 on the 6510 CPU, and synchronizing the SuperCPU memory with the C64 memory.
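To see why the $00/$01 port must be emulated, consider what its low three bits control on a stock C64: whether the CPU sees ROM, I/O, or plain RAM in the upper address ranges. This Python sketch decodes those bits the way a standard C64 does; it assumes no cartridge is asserting the EXROM/GAME lines and that the data direction register at $00 has bits 0-2 set as outputs. How the CPLD implements this is not documented here.

```python
def memory_config(port01: int) -> dict:
    """Decode bits 0-2 of the 6510 port at $01 into what the CPU sees.

    Assumes no cartridge asserts EXROM/GAME and that $00 sets bits 0-2 as outputs.
    """
    loram = bool(port01 & 0x01)   # bit 0: LORAM
    hiram = bool(port01 & 0x02)   # bit 1: HIRAM
    charen = bool(port01 & 0x04)  # bit 2: CHAREN

    if not (loram or hiram):
        d000 = "RAM"              # all-RAM configurations
    elif charen:
        d000 = "I/O"              # VIC-II, SID, CIAs, color RAM
    else:
        d000 = "CHAR ROM"

    return {
        "$A000": "BASIC ROM" if (loram and hiram) else "RAM",
        "$D000": d000,
        "$E000": "KERNAL ROM" if hiram else "RAM",
    }


print(memory_config(0x37))  # power-on default value of $01
print(memory_config(0x35))  # BASIC and KERNAL banked out, I/O still visible
```

The $D000 entry shows the difficult case: the same addresses can be RAM, character ROM, or I/O depending on a value a program can change at any time.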

One problem that has plagued accelerator designers for years, and limited the widespread acceptance of accelerators, is the RAM synchronization that the Altera CPLD handles. In areas of the stock C64 memory map where only RAM is present, like $0002-$9fff, memory synchronization can be handled very easily. However, in areas like $d000, where either RAM or I/O can be present, the situation becomes more complex. According to CMD, the SuperCPU overcomes this problem as well, which is important since many video applications use the RAM under I/O at $d000 for graphics or text.

Because the VIC-II IC in the C64 and C128 requires that screen information be present in on-board memory, memory "mirroring" is necessary. However, CMD has introduced two new technologies, called WriteSmart(tm) and CacheWrite(tm), to reduce the slowdown associated with mirroring between the SuperCPU SRAM and the slower on-board DRAM. According to CMD's documentation, WriteSmart allows the programmer to decide which portions of memory need mirroring. The four selections are "BASIC", where only text and color memory are mirrored; "GEOS", where the GEOS foreground bitmap and color memory are mirrored; "ALL", where all 64 kB of RAM is mirrored; and "NONE", where the SuperCPU does not attempt to synchronize memory contents between the two RAM areas.
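The WriteSmart selections amount to a lookup: for each store, does this address fall in a range the VIC-II might display? This Python sketch models that decision. The four mode names come from the text, but the address ranges are illustrative assumptions (the default text screen and color memory for "BASIC", and commonly cited GEOS foreground bitmap and color matrix locations for "GEOS"), not CMD's published values.

```python
# Hypothetical address ranges for each WriteSmart setting. The mode names are
# CMD's; the exact ranges are assumptions made for this illustration.
MIRROR_RANGES = {
    "BASIC": [range(0x0400, 0x07E8),   # default 40x25 text screen
              range(0xD800, 0xDBE8)],  # color memory
    "GEOS":  [range(0xA000, 0xBF40),   # assumed GEOS foreground bitmap
              range(0x8C00, 0x8FE8)],  # assumed GEOS color matrix
    "ALL":   [range(0x0000, 0x10000)],
    "NONE":  [],
}


def needs_mirror(mode: str, addr: int) -> bool:
    """True if a store to addr must also be copied to the C64's own RAM."""
    return any(addr in r for r in MIRROR_RANGES[mode])


def mirrored_stores(mode: str, addrs) -> int:
    """Count how many stores in a sequence would pay the mirroring cost."""
    return sum(needs_mirror(mode, a) for a in addrs)


# Clearing a 1000-byte buffer at $2000: only "ALL" forces slow mirrored writes.
buffer = range(0x2000, 0x2000 + 1000)
for mode in MIRROR_RANGES:
    print(mode, mirrored_stores(mode, buffer))
```

The narrower the selected ranges, the more stores run at full speed, which is the point of letting the programmer choose.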

The other technology, CacheWrite(tm), minimizes the effect of this mirroring. When a value is stored into a range of SuperCPU RAM that requires mirroring, it is written not only to SuperCPU RAM but also to a special cache location. The SuperCPU is allowed to continue processing while the system waits for the on-board DRAM to become ready to store the value. When successive stores to mirrored ranges occur, the system must slow down, but it can still operate at about 4 MHz. This speed is achieved because the SuperCPU need not wait for a value to be stored before it fetches the next opcode and operand. Since opcodes that write a value to memory average 4 cycles to complete, the SuperCPU can effectively do 4 cycles' worth of processing in 1 period of the 1 MHz clock. Note that this slowdown does not occur if the cache is not full when a store instruction is executed.
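The 4 MHz figure can be reproduced with a small simulation. This Python sketch assumes a one-entry posted-write cache and a mirrored write that takes one full 1 MHz bus cycle to drain; both are simplifying assumptions, and CMD's actual cache may differ.

```python
FAST_CYCLE_US = 1 / 20   # one 20 MHz cycle, in microseconds
SLOW_WRITE_US = 1.0      # assumed: one 1 MHz bus cycle to complete a mirrored write


def run(program):
    """Simulate a one-entry posted-write cache (a simplified assumption).

    program is a list of (cycles, mirrored_store) pairs. A mirrored store is
    handed to the cache and the CPU keeps going; it only stalls if the previous
    mirrored write is still waiting to reach the C64's DRAM.
    Returns the effective clock rate in MHz.
    """
    now = 0.0            # CPU time in microseconds
    cache_free_at = 0.0  # when the pending write finishes draining
    total_cycles = 0
    for cycles, mirrored_store in program:
        now += cycles * FAST_CYCLE_US
        total_cycles += cycles
        if mirrored_store:
            now = max(now, cache_free_at)        # stall only if the cache is full
            cache_free_at = now + SLOW_WRITE_US  # post the write, keep running
    return total_cycles / now


# Back-to-back 4-cycle stores: one store per 1 MHz period, so about 4 MHz.
print(round(run([(4, True)] * 1000), 2))
# A store followed by 16 cycles of other work never waits: about 20 MHz.
print(round(run([(4, True), (16, False)] * 100), 2))
```

The second case shows the closing remark of the paragraph above: if the cache has drained by the time the next store arrives, mirroring costs nothing.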

Features

Being a CMD product, the CMD SuperCPU comes with JiffyDOS, CMD's flagship speed enhancement routines, installed. However, JiffyDOS can be switched out for those applications that fail to run with this serial bus enhancement functionality.

The unit also features compatibility with RAMLink, CMD's RAM drive unit. Since the RAMLink works by sharing the CPU with the computer system and running a special set of routines called RL-DOS, the SuperCPU contains its own version of RL-DOS optimized to take advantage of the speed and extra features of the 65C816S. Preliminary information suggests that RAMLink data retrieval, typically much slower than REU data retrieval, will now operate at speeds approaching that of the REU. In addition, the on-board RL-DOS will support the special parallel CMD HD drive cable available with the RAMLink.

For those with expansion in mind, CMD has incorporated a special expansion port internal to the unit. The port, called the "Rocket Socket", will allow access to the complete signal set from the W65C816S CPU and possibly other support ICs. This will allow developers to produce peripheral cards for the unit containing hardware that will run at 20 MHz. (The cartridge port will still be limited to slow speed.)

Myths About the Unit

In the early phases of development, CMD hinted that extra RAM installed in the unit could possibly be used as a fast RAM disk, a la RAMLink. However, the inability to battery back up that RAM area, coupled with the small speed increase gained from doing so and the lengthy development time needed to realize this feature, has prompted CMD to abandon this idea for the time being. The idea might resurface later in the development cycle, but the feature will most likely never be implemented.

Also, early information about the units noted that two speed options would be available, but low interest in the slower 10 MHz model prompted CMD to discontinue development on that version. As of now, there is only one speed option available: 20 MHz.

When CMD first announced the unit to the public, it was to include the Western Design Center W65C02S microprocessor. However, in late 1995/early 1996, CMD opted to switch from that CPU to its bigger brother, the 16-bit W65C816, owing to the small increase in per-unit cost, greater flexibility, and more expansion options.

Although the CPU in the SuperCPU runs at 20 MHz, that does not mean all operations will occur twenty times faster. Some operations, like reads from I/O ICs, serial bus operations, and mirroring of video memory, require the CPU to slow down temporarily. These slowdowns will reduce the effective speed to about 17-18 MHz.
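Working backward shows how little slow work it takes to produce that drop. This Python sketch assumes a simple two-speed model (an assumption of this example): a fraction f of the work, counted in stock 1 MHz cycles, runs at 1 MHz and the rest at 20 MHz. It solves for the f that yields a given effective speed. Under that model, 17-18 MHz corresponds to less than 1% of the work happening at 1 MHz.

```python
def slow_fraction_for(effective_mhz: float, fast_mhz: float = 20.0, slow_mhz: float = 1.0) -> float:
    """Share of the work (in stock 1 MHz cycles) that must run at slow_mhz
    for a fast_mhz CPU to average effective_mhz, under a two-speed model."""
    # 1/effective = (1 - f)/fast + f/slow, solved for f.
    return (1 / effective_mhz - 1 / fast_mhz) / (1 / slow_mhz - 1 / fast_mhz)


for target in (18.0, 17.0):
    print(f"{target:.0f} MHz effective -> {slow_fraction_for(target):.2%} of the work at 1 MHz")
```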

Compatibility Issues

All legal 6502/6510/8502 opcodes are supported in the accelerator. Undocumented or "illegal" opcodes are not supported and will fail.
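The reason undocumented opcodes fail is that the 65C816 assigns a meaning to all 256 opcode byte values. A byte that was an undocumented instruction on the NMOS 6502/6510 (for example $1A, an unofficial NOP there but INC A on the 65C02 and 65C816) runs a different instruction on the SuperCPU. As a rough aid, this Python sketch lists the 151 documented NMOS 6502 opcodes; checking a program's instruction bytes against it (after disassembly, so operand bytes are skipped) would flag code that relies on illegal opcodes. The names `DOCUMENTED` and `is_documented` are illustrative.

```python
# The 151 documented NMOS 6502/6510 opcodes, one row of the opcode map per line.
DOCUMENTED = frozenset(bytes.fromhex(
    "00 01 05 06 08 09 0A 0D 0E "
    "10 11 15 16 18 19 1D 1E "
    "20 21 24 25 26 28 29 2A 2C 2D 2E "
    "30 31 35 36 38 39 3D 3E "
    "40 41 45 46 48 49 4A 4C 4D 4E "
    "50 51 55 56 58 59 5D 5E "
    "60 61 65 66 68 69 6A 6C 6D 6E "
    "70 71 75 76 78 79 7D 7E "
    "81 84 85 86 88 8A 8C 8D 8E "
    "90 91 94 95 96 98 99 9A 9D "
    "A0 A1 A2 A4 A5 A6 A8 A9 AA AC AD AE "
    "B0 B1 B4 B5 B6 B8 B9 BA BC BD BE "
    "C0 C1 C4 C5 C6 C8 C9 CA CC CD CE "
    "D0 D1 D5 D6 D8 D9 DD DE "
    "E0 E1 E4 E5 E6 E8 E9 EA EC ED EE "
    "F0 F1 F5 F6 F8 F9 FD FE"
))


def is_documented(opcode: int) -> bool:
    """True if opcode is an official 6502 instruction (supported by the SuperCPU)."""
    return opcode in DOCUMENTED


# $A9 is LDA #imm; $A7 (LAX) and $1A (NOP) are undocumented on the 6510.
for op in (0xA9, 0xA7, 0x1A):
    print(f"${op:02X}: {'documented' if is_documented(op) else 'UNDOCUMENTED'}")
```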

Although not strictly a compatibility issue, some applications that rely on the CPU running at a certain speed to time events will most likely fail or operate too quickly to be useful. Event- or interrupt-driven code should operate correctly.
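A cycle-counted delay loop shows the problem. This Python sketch assumes a typical 6502 busy-wait loop body of DEX (2 cycles) plus a taken BNE (3 cycles), and ignores page-crossing penalties, the final untaken branch, and VIC-II cycle stealing, so the figures are approximate.

```python
LOOP_CYCLES = 5  # assumed loop body: DEX (2 cycles) + taken BNE (3 cycles)


def delay_ms(iterations: int, clock_mhz: float) -> float:
    """Approximate duration of a cycle-counted busy-wait loop, in milliseconds."""
    return iterations * LOOP_CYCLES / (clock_mhz * 1000.0)


# A loop tuned to wait about 1 ms on a stock 64 finishes 20x sooner at 20 MHz.
print(delay_ms(200, 1.0), delay_ms(200, 20.0))
```

A delay based on a CIA timer or the raster interrupt, by contrast, keeps its real-time length no matter how fast the CPU runs.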

The SuperCPU 64 model will operate correctly with any C64 or C64C, as well as with any C128 or C128D in 64 mode. In addition, CMD has recently announced a native 128 version of the cartridge.

Super128CPU

In early 1996, CMD announced that interest was compelling and that it would begin development on a 128 version of the SuperCPU. As a result of this announcement, the ship date was moved from February to April while CMD validated the SuperCPU design so that it could be used to manufacture both the SuperCPU 64 and SuperCPU 128. Both units will operate at a maximum of 20 MHz and will most likely be packaged in the same enclosure. The SuperCPU 128 will operate in both 64 mode and native 128 mode, but it will not enhance CP/M mode on the C128. CMD announced that the unit would be available in August or September of 1996. As far as cost is concerned, the current estimate is $300.00, and advance orders are being taken, with a security deposit of US$50.00 needed to place an advance order.

With this announcement, some confusion has arisen over the naming scheme. Previously called the SuperCPU or SuperCPU 64/20 (64 model at 20 MHz), the new models are alternately referred to as:

128 Model          64 Model
Super128CPU        Super64CPU
SuperCPU 128/20    SuperCPU 64/20

Prototype Testing and Benchmarks

As no developer units have shipped as of this writing, CMD has the only unit available for testing and benchmarks. CMD's prototype is hand-wired on perfboard. At first, CMD doubted that the prototype would actually run at 20 MHz, since such designs are not "clean" and can suffer from signal degradation, signal skew, and crosstalk, all of which inhibit operation at higher frequencies. So, early tests were done at 4 MHz. CMD reported in late February 1996 that the prototype had been ramped up to 20 MHz and was operating correctly. In fact, the unit appears to run faster than it should be able to, as the following example illustrates:

CMD tested the following program at 1 MHz on a Commodore 64.

 
10 TI$="000000"
20 FORI=1TO10000:NEXT
30 PRINTTI

The result from this test was 660. After enabling the unit, the test was rerun and the result printed out again: 31.

Quick calculations by CMD personnel showed that the unit was executing this program 21.29 times faster than normal. That seems impossible, as the CPU is clocked only 20 times faster than normal.

The apparent impossibility is explained if you delve deeper into the timing of the 64. As you may know, the VIC-II "steals" cycles from the CPU in order to refresh the video screen, and extra cycles are "stolen" for sprites. With the SuperCPU disabled, the above code runs at 1 MHz minus the time the VIC-II "steals" from the CPU. With the SuperCPU enabled, the VIC-II does not steal cycles from the unit, as the accelerator uses its own private memory for operation. The VIC-II, meanwhile, uses the on-board C64 memory.
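The numbers hang together. BASIC's TI variable counts jiffies (1/60 second), so 660 jiffies is 11 seconds and 31 jiffies is about half a second. This Python sketch reproduces the 21.29 ratio and then makes a rough estimate of cycle stealing on an NTSC 64. The NTSC figures (65 cycles per raster line, 263 lines per frame, 25 "bad lines" of about 40 stolen cycles each, no sprites) are assumptions of the estimate, which ignores other overheads. It predicts a speedup near 21.2, close to CMD's measurement.

```python
JIFFIES_PER_SECOND = 60  # TI counts in 1/60-second "jiffies"

stock, accelerated = 660, 31
print(stock / JIFFIES_PER_SECOND, round(accelerated / JIFFIES_PER_SECOND, 2))  # seconds
print(round(stock / accelerated, 2))  # measured speedup

# Rough estimate of VIC-II cycle stealing on an NTSC 64 (assumptions: 65 cycles
# per raster line, 263 lines per frame, 25 "bad lines" of 40 stolen cycles each,
# no sprites). The stock CPU only gets the remaining share of its clock.
cycles_per_frame = 65 * 263
stolen_per_frame = 25 * 40
cpu_share = 1 - stolen_per_frame / cycles_per_frame
print(round(cpu_share, 3), round(20 / cpu_share, 2))  # share, predicted speedup
```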

CMD notes that games that use timers or are event-driven function correctly, but those that count processor cycles or use spin-wait loops run so quickly as to be virtually unusable.

Of particular note to Commodore Hacking readers is the test done with the object code for the Polygonamy article elsewhere in this issue. On a stock 64, the program renders approximately 12-13 frames per second. With the SuperCPU enabled, the frame rate jumped to 128 fps. CMD notes that further gains might be realized if the code were modified to cooperate more fully with the SuperCPU memory scheme.
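Putting the Polygonamy result next to the clock ratio gives a sense of scale. This quick Python calculation uses only the figures quoted above: the gain is roughly tenfold, about half of the 20x clock ratio. Plausibly, much of the gap is the headroom CMD's remark about further gains refers to, though the split of causes is not measured here.

```python
CLOCK_RATIO = 20.0  # 20 MHz SuperCPU vs. 1 MHz stock 64


def speedup(stock_fps: float, accel_fps: float = 128.0) -> float:
    """Frame-rate speedup from the reported Polygonamy figures."""
    return accel_fps / stock_fps


for fps in (12, 13):
    s = speedup(fps)
    print(f"{fps} fps -> 128 fps: {s:.1f}x ({s / CLOCK_RATIO:.0%} of the clock ratio)")
```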

As for RAM Expansion Unit (REU) compatibility, CMD reports that the issues have been tackled and that DMA operation is available on the SuperCPU. In addition, CMD notes that the CPU need not be running at 1 MHz to initiate a DMA transfer.

As stated at the beginning, the 64 model of the SuperCPU accelerator will work on the Commodore 128 in 64 mode, and tests have confirmed that the prototype 64 model does indeed function correctly on the C128 and C128D.

Conclusion

While it is too early to determine the success of the CMD SuperCPU, the company has a reputation for delivering stable products packed with features. While no accelerator can guarantee 100% compatibility with all Commodore software, the CMD offering should provide the best compatibility thus far, thanks to its solutions to the RAM synchronization problems that have plagued accelerator designers for years. The fact that CMD also owns the marketing rights to the GEOS family of software products and manufactures a wide variety of successful mass storage devices bodes well for compatibility with those applications and peripherals.

For More Information

To find out more about the CMD SuperCPU family of accelerators, contact CMD at the following address or via email:

Creative Micro Designs, Inc.
P.O. Box 646
East Longmeadow, MA 01028-0646
(413) 525-0023 (Information)
(800) 638-3263 (Ordering only)
cmd.sales@the-spa.com (Internet Contact for Sales)

Advance orders are being taken for all units, and the cost to place an advance order is $50.00.

For programmers, CMD plans to make available a Developer's Package to help those who want to exploit the potential of the new unit. It will include a W65C816S assembler supporting all the new opcodes and addressing modes, as well as documentation covering the unit, the CPU, and its capabilities.


Document Revision B