Does your mind go blank when you hear about the SuperCPU? With all the mention of it in magazines and newsletters, are you left wondering how much of the discussion is hype and how much is true? Are you worried that this latest attempt is just another design destined for failure like the others before it? Well, if so, then you're not alone. With the reputation accelerator cartridges and their manufacturers have acquired over the years, you are wise to be concerned. Judge for yourself, as we peer under the hood of the Creative Micro Designs SuperCPU accelerator cartridges.
Note: The information contained in this article has been gleaned from talks with CMD, Mr. Charlie Christianson's post to comp.sys.cbm, responses to USENET posts by Mr. Doug Cotton, and information from CommodoreWorld Issue #12. While general information is not likely to change, some details discussed in this article may differ slightly from those incorporated in the final product.
An accelerator increases the amount of work done by substituting a faster CPU and clock speed for the 1 MHz 64 CPU. Ideally, the speedup would be as easy to determine as dividing the new clock frequency by the 64's 1 MHz. If this were true, an accelerator that runs at 4 MHz would execute things at 4 times the speed of a stock 64. Sadly, this is not true, since not all parts of the system can be sped up to the higher frequency. So, the accelerator runs at full speed while it utilizes ICs designed for the faster clock speed, and slows down when it must "talk" with ICs like the SID and VIC-II in the 64, which run only at the slow 1 MHz clock speed.
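The cost of those slow accesses can be estimated with a weighted average of cycle times. The Python sketch below is only an illustration: the function name and the example fraction are assumptions made for this article, not CMD figures.

```python
def effective_mhz(fast_mhz, slow_mhz, slow_fraction):
    """Average clock rate when some cycles must run at the slow speed.

    slow_fraction is the share of cycles (0.0 to 1.0) that are stretched
    to the slow clock, for example accesses to the VIC-II or SID.
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    # Average time per cycle in microseconds, weighted by cycle type.
    avg_cycle_us = (1.0 - slow_fraction) / fast_mhz + slow_fraction / slow_mhz
    return 1.0 / avg_cycle_us


# Hypothetical example: a 4 MHz accelerator with 10% slow cycles.
print(round(effective_mhz(4, 1, 0.10), 2))
```

Under these assumptions, a 4 MHz accelerator that spends one cycle in ten talking to 1 MHz parts averages only about 3.1 MHz, which shows why the simple clock ratio overstates the gain.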
Most accelerators are produced as large cartridges that plug into the expansion port of the computer system. Some require special wires be attached to internal components, while others do not.
As development work progressed, progress reports and preliminary information about the product surfaced from CMD. The first items involved the processor choice, which was originally the 65C02S, but is now its bigger brother, the 16 bit 65C816S. Another piece of information involved the case, which is an enclosure 6" wide by 2" deep by 3" high. This enclosure contains a circuit board protruding from the front of the unit that will plug into the Commodore 64 or 128 expansion port. In back, a complementary card edge connector is provided to pass signals through the cartridge. This will allow users to attach other expansion port cartridges to the system. On top sit three switches, described below.
The first switch enables or disables the SuperCPU unit. The second switch enables or disables JiffyDOS, which is built into the unit. The third switch determines the speed of the unit. This third switch has three positions. The first position forces the accelerator to operate at 1 MHz speed (the same speed as the stock C64). The second position allows the programmer to change the speed via a register in the SuperCPU memory map. The third position locks the SuperCPU into 20 MHz mode, regardless of register settings.
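The behaviour of the three-position speed switch can be summarised in a few lines. This Python sketch is hypothetical: the names `SpeedSwitch` and `cpu_speed_mhz` are invented here, and the speed register is modelled as a simple true/false request because its actual format has not been published.

```python
from enum import Enum


class SpeedSwitch(Enum):
    NORMAL = 1    # first position: forced to 1 MHz
    SOFTWARE = 2  # second position: speed follows the register
    TURBO = 3     # third position: locked at 20 MHz


def cpu_speed_mhz(switch, register_requests_turbo):
    """Return the clock speed selected by the switch and the register.

    register_requests_turbo is an assumed boolean stand-in for the
    speed register in the SuperCPU memory map.
    """
    if switch is SpeedSwitch.NORMAL:
        return 1
    if switch is SpeedSwitch.TURBO:
        return 20
    # Only the middle position lets software choose the speed.
    return 20 if register_requests_turbo else 1


print(cpu_speed_mhz(SpeedSwitch.SOFTWARE, False))
```

The point of the sketch is that the register matters only in the middle position; the outer positions override whatever a program requests.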
The use of the CMD SuperCPU will be straightforward. Simply plug the unit into the expansion port, set the appropriate switches on the top of the unit, and power on the unit.
Attached to the CPU is a bank of 64 kilobytes of Read Only Memory (ROM) and 128 kilobytes of high speed static RAM (SRAM). The extra RAM above 64 kB is used to "mirror" the contents of the slower ROM. See below for details.
A number of features designed to maximize the performance of the SuperCPU are being developed into the unit. Since the late 1980s, ROM speeds have not been able to keep pace with CPU clock frequencies. With the CMD accelerator moving into the frequency range of newer PC systems, this becomes a problem for the SuperCPU as well. The Commodore typically stores its KERNAL and BASIC code in ROMs, and the SuperCPU will need to read that code. The easiest solution is to read the stock ROMs in the computer, but those ICs can only be accessed at 1 MHz (they are part of that set of older ICs that cannot be utilized at 20 MHz). So, the next option is to copy that code into faster ROMs and install those ROMs into the cartridge. However, ROMs of sufficient speed are very expensive and not widely available. So, the third option, which is the one CMD will use, is to copy the KERNAL and BASIC at startup to RAM and write protect the RAM area, making it look like ROM. Fast static RAM (SRAM) is available to meet the 20 MHz clock requirements, and is not terribly expensive, as most new PC systems use the same memory for similar uses. This technique is called ROM shadowing and has been utilized for a few years in the IBM PC community.
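ROM shadowing as described above comes down to a one-time copy followed by write protection. The following Python sketch models the idea only; the class name `ShadowedRom` is invented, the 8 kB image at $E000 stands in for the KERNAL, and dropping protected writes is an assumption about behaviour rather than a description of the CMD hardware.

```python
class ShadowedRom:
    """Fast RAM that holds a copy of a slow ROM and rejects writes."""

    def __init__(self, rom_image, base):
        self.base = base
        # One-time copy from the slow ROM into fast RAM at startup.
        self.ram = bytearray(rom_image)
        self.write_protected = True

    def read(self, addr):
        # Reads are served from the fast copy, never from the slow ROM.
        return self.ram[addr - self.base]

    def write(self, addr, value):
        if self.write_protected:
            return  # assumed behaviour: the write is silently dropped
        self.ram[addr - self.base] = value & 0xFF


# Dummy 8 kB image filled with $EA bytes, placed where the KERNAL lives.
kernal = ShadowedRom(bytes([0xEA]) * 8192, base=0xE000)
kernal.write(0xE000, 0x00)   # attempt to overwrite the "ROM"
print(hex(kernal.read(0xE000)))
```

After the copy, the program sees memory that reads at SRAM speed but cannot be altered, which is all that "looking like ROM" requires.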
The heart of the unit is the Altera Complex Programmable Logic Device (CPLD). Analogous to electronic "glue", this single chip can replace tens or hundreds of discrete ICs in circuits. This unit is responsible for decoding the complex series of signals presented in the expansion port, handling DMA requests to an REU, emulating the specialized I/O port found at locations $00 and $01 on the 6510 CPU, and handling the synchronization of the SuperCPU memory and C64 memory.
One item that has plagued accelerator designers for years and minimized the widespread acceptance of accelerators involves this RAM sync operation the Altera CPLD handles. In areas of the stock C64 memory map where only RAM is present, like $0002 - $9FFF, the synchronization of memory can be handled very easily. However, when dealing with areas like $D000, where RAM and I/O can be present, the situation becomes more complex. The SuperCPU overcomes this problem as well, which is important since many video applications use the RAM under I/O at $D000 for graphics or text.
As the VIC-II IC in the C64 and C128 requires that screen information be present in on-board memory, memory "mirroring" is necessary. However, CMD has introduced two new technologies, called WriteSmart(tm) and CacheWrite(tm), to reduce the slowdown associated with mirroring the SuperCPU SRAM to the slower on-board DRAM. According to documentation, WriteSmart allows the programmer to decide which portions of memory need mirroring. The four selections include "BASIC", where only text and color memory are mirrored, "GEOS", where GEOS foreground bitmap and color memory are mirrored, "ALL", where all 64 kB of RAM is mirrored, and "NONE", where the SuperCPU does not attempt to synchronize memory contents between the two RAM areas.
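One way to picture WriteSmart is as a table of address ranges consulted on every store. The Python sketch below is a guess at the idea, not CMD's implementation: the ranges use the default C64 text screen and color memory and the usual GEOS screen layout, and the ranges actually decoded by the SuperCPU may differ.

```python
# Assumed address ranges (inclusive) for each WriteSmart selection.
MIRROR_RANGES = {
    "BASIC": [(0x0400, 0x07E7), (0xD800, 0xDBE7)],  # text screen, color memory
    "GEOS": [(0x8C00, 0x8FE7), (0xA000, 0xBF3F)],   # color matrix, foreground bitmap
    "ALL": [(0x0000, 0xFFFF)],                      # every store is mirrored
    "NONE": [],                                     # nothing is mirrored
}


def needs_mirroring(mode, addr):
    """Return True if a store to addr must also reach the on-board DRAM."""
    return any(low <= addr <= high for low, high in MIRROR_RANGES[mode])


print(needs_mirroring("BASIC", 0x0400))  # store to the text screen
print(needs_mirroring("BASIC", 0xC000))  # store to ordinary RAM
```

Only stores that hit a listed range pay the mirroring penalty, so a narrower selection leaves more of the program running at full speed.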
The other technology, called CacheWrite(tm), minimizes the effect of this mirroring. When storing a value into SuperCPU RAM in a range of RAM that requires mirroring, the value is stored not only in SuperCPU RAM, but also into a special cache memory location. The SuperCPU is allowed to continue processing, while the system waits for the on-board DRAM to acknowledge readiness to store a value. When successive stores to mirror ranges are done, the system must slow down, but can still operate at about 4 MHz. This speed is achieved because the SuperCPU need not wait for the value to be successfully stored before it attempts to fetch the next opcode and operand. Since opcodes that write values to memory average 4 cycles to complete, the SuperCPU can effectively do 4 cycles worth of processing in 1 period of the 1 MHz clock. Note that this slowdown does not occur if the cache is not full when a store instruction is executed.
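The 4 MHz figure can be reproduced with a small timing model. This Python sketch makes several assumptions that are not confirmed by CMD: the cache holds a single pending write, draining it takes exactly one 1 MHz period (20 fast cycles), and every instruction takes 4 cycles. The function names are invented for the illustration.

```python
FAST_MHZ = 20      # SuperCPU clock, from the article
SLOW_PERIOD = 20   # fast cycles in one period of the 1 MHz clock


def elapsed_cycles(instructions):
    """Return the fast cycles needed to run a list of instructions.

    Each instruction is a (cycles, mirrored_store) pair.
    """
    now = 0            # current time, in fast cycles
    cache_free_at = 0  # time at which the one-entry cache is empty again
    for cycles, mirrored_store in instructions:
        now += cycles  # fetch and execute at full speed
        if mirrored_store:
            # The write needs the cache; stall only if it is still full.
            now = max(now, cache_free_at)
            cache_free_at = now + SLOW_PERIOD
    return now


def effective_speed_mhz(instructions):
    """Useful cycles executed per microsecond of elapsed time."""
    work = sum(cycles for cycles, _ in instructions)
    return FAST_MHZ * work / elapsed_cycles(instructions)


# 100 mirrored stores in a row, each taking 4 cycles.
back_to_back = [(4, True)] * 100
# One mirrored store followed by five ordinary 4-cycle instructions.
spaced = ([(4, True)] + [(4, False)] * 5) * 100

print(round(effective_speed_mhz(back_to_back), 2))
print(round(effective_speed_mhz(spaced), 2))
```

In this model, back-to-back mirrored stores settle at roughly 4 MHz, while stores separated by enough other work never find the cache full and the CPU stays at 20 MHz.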
The unit also features compatibility with RAMLink, CMD's RAM drive unit. As the RAMLink functions by sharing the CPU with the computer system and runs a special set of instructions called RL-DOS, the SuperCPU contains its own version of RL-DOS optimized to take advantage of the speed and extra features available in the 65C816S. Preliminary information suggests that RAMLink data retrieval, typically much slower than REU data retrieval, will now operate at speeds approaching that of the REU. In addition, the on-board RL-DOS will handle usage of the special parallel CMD HD drive cable available with the RAMLink.
For those with expansion in mind, CMD has incorporated a special expansion port internal to the unit. The port, called the "Rocket Socket", will allow access to the complete signal set from the W65C816S CPU and possibly other support ICs. This will allow developers to produce peripheral cards for the unit containing hardware that will run at 20 MHz. (The cartridge port will still be limited to slow speed.)
Also, early information about the units noted that two speed options would be available, but low support for the slower 10 MHz model prompted CMD to discontinue development on that version. As of now, there is only one speed option available: 20 MHz.
When CMD first announced the unit to the public, it was to include the Western Design Center W65C02S microprocessor. However, in late 1995/early 1996, CMD opted to switch from that CPU to its bigger brother, the W65C816S 16 bit CPU, owing to a small increase in per item cost, more flexibility, and more expansion options.
Although the CPU in the SuperCPU unit runs at 20 MHz, that does not imply all operations will occur twenty times faster. Some operations, like reads from I/O ICs, serial bus operation, and mirroring of video memory, require the CPU to slow down temporarily. This will reduce the effective speed to about 17-18 MHz.
Although not a compatibility issue, some applications that rely on the CPU running at a certain speed to correctly time events will most likely fail or operate too quickly to be useful. Event or interrupt driven code should operate correctly.
The SuperCPU 64 model will operate correctly with any C64 or C64C model of computer system, as well as with any C128 or C128D in 64 mode. However, CMD has recently announced a 128 native version of the cartridge.
Since this announcement was made, some confusion has arisen over the naming scheme. Previously called the SuperCPU or SuperCPU 64/20 (64 model at 20 MHz), the models are now alternately referred to as:
128 Model | 64 Model |
---|---|
Super128CPU | Super64CPU |
SuperCPU 128/20 | SuperCPU 64/20 |
CMD tested the following program at 1 MHz on a Commodore 64.
    10 TI$="000000"
    20 FORI=1TO10000:NEXT
    30 PRINTTI
The result from this test was 660. After enabling the unit, the test was rerun and the result printed out again: 31.
Quick calculations by the CMD personnel verified that the unit was executing this program at 21.29 times the normal speed. However, that seems impossible, as the CPU is only clocked at 20 times the normal speed.
The supposed impossibility is explained if you delve deeper into the timing of the 64. As you may know, the VIC-II "steals" cycles from the CPU in order to refresh the video screen. Extra cycles are "stolen" for sprites. With the SuperCPU disabled, the above code runs at 1 MHz minus the amount of time the VIC-II "steals" from the CPU. With the SuperCPU enabled, the VIC-II does not "steal" cycles from the unit, as the accelerator uses its own private memory area for operation. The VIC, meanwhile, uses the on-board C64 memory.
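The arithmetic behind the benchmark is easy to check. The Python sketch below uses only the two TI readings reported above; the "implied" share of cycles left to the stock CPU is a back-of-the-envelope estimate that assumes the accelerated run proceeded at a full 20 MHz, and it is not a figure supplied by CMD.

```python
JIFFIES_PER_SECOND = 60   # TI advances once every 1/60 second

stock_jiffies = 660       # reading with the SuperCPU disabled
accelerated_jiffies = 31  # reading with the SuperCPU enabled

# Run times in seconds.
stock_seconds = stock_jiffies / JIFFIES_PER_SECOND
accelerated_seconds = accelerated_jiffies / JIFFIES_PER_SECOND

# Observed speedup, as quoted in the article.
speedup = stock_jiffies / accelerated_jiffies

# If the fast run really executed 20 cycles per microsecond, the stock
# machine must have been getting less than 1 MHz worth of CPU cycles.
implied_stock_share = 20 / speedup

print(f"stock run: {stock_seconds:.2f} s")
print(f"accelerated run: {accelerated_seconds:.2f} s")
print(f"speedup: {speedup:.2f}x")
print(f"implied share of cycles left to the stock CPU: {implied_stock_share:.1%}")
```

Because TI has a resolution of only 1/60 second, a reading of 31 could be off by a jiffy in either direction, so the estimate of the stolen cycles should be taken as rough.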
CMD notes that games that use timers or are event driven function correctly, but those that count processor cycles or utilize spin-wait loops run so quickly as to be virtually unusable.
Of particular note to Commodore Hacking readers is the test done with the object code for the Polygonamy article elsewhere in this issue. On a stock 64, the program renders approximately 12-13 frames per second. With the SuperCPU enabled, the frame rate jumped to 128 fps. CMD notes that further gains might be realized if the code was modified to cooperate more fully with the SuperCPU memory scheme.
As for RAM Expansion Unit compatibility, CMD responds that the issues have been tackled and that DMA operation is available on the SuperCPU unit. In addition, CMD notes that the CPU need not be running at 1 MHz to initiate a DMA transfer.
As stated from the beginning, the 64 model of the SuperCPU accelerator will work on the Commodore 128 in 64 mode, and tests have confirmed that the prototype 64 model does indeed function correctly on the C128 and C128D.
Creative Micro Designs, Inc.
Advance orders are being taken for all units, and the cost to place an advance order is $50.00.
For programmers, CMD is planning to make available a Developer's Package, which will help those wanting to exploit the potential of the new unit to achieve success. A W65C816S assembler supporting all the new opcodes and addressing modes will be provided, as will documentation pertaining to the unit, the CPU, and its capabilities.