DESIGN AND IMPLEMENTATION OF A 'REAL' OPERATING SYSTEM FOR THE 128: PART II

by Craig S. Bruce < csbruce@ccnga.uwaterloo.ca>

0. PREFACE

There has been a slight change in plans. I originally intended this article to give the design of a theoretical distributed multitasking microkernel operating system for the C128. I have decided to go a different route: to take out the distributed component for now and implement a real multitasking microkernel OS for a single machine and extend the system to be distributed later. The implementation so far is, of course, only in the prototype stage and the application for it is only a demo. Part III of this series will extend this demo system into, perhaps, a usable distributed operating system.

1. INTRODUCTION

The previous article talked about the general approach to building a multitasking microkernel OS for the C128. It is assumed here that you have read and understood the previous article. This article goes into the grungy details of implementing such a beast. The prototype kernel implementation provides system calls to create and "exit" user processes, obtain status information, delay execution of a process for a specified period of time, and to perform message-passing interprocess communication.

Currently, there is no real memory management, no real device drivers, and no process-resource reclamation. More "infrastructure" features need to be added before a command-shell environment or any such thing could be supported, though not toooo many more; the Commodore-Kernal Server in the demo system makes the $FFD2 (CHROUT) routine of the Commodore Kernal available to all other processes in the demo system. It could easily be modified to provide all of the Commodore-Kernal features to the other processes, thereby giving us a basic I/O sub-system.

There is also no way to dynamically load external programs, so the test programs have to be assembled with the kernel code. Loading external programs in this type of environment has the requirement that the program will have to be relocated upon being loaded to an address that would only be known at load time. There are two ways to go on this: load all programs to a fixed address or load them to dynamic addresses. If you load them to a fixed address, then you can only have one processes loaded and concurrently running on each bank of internal memory of the C128 and then demand-swap all of the other processes into and out of these (two) slots, presumably from REU memory (any other type would be much too slow). IHMO, even with REU memory, this would be too slow, especially for a microkernel environment. So, programs will need to be loaded to dynamic addresses. Fortunately, I have a program-relocation mechanism in the works.

The entire kernel and the demo program fits in C128 memory in the slot between $1300 and $1BFF of RAM0. The kernel uses storage from $C00-$CFF and $2000-$BFFF. The latter section of memory is used up in 768-byte chunks by each new process, so there can be a total of 53 concurrently executing user processes in the system.

2. TEST PROGRAM

The test program includes no provisions for user interaction, so it may not be something that you can impress your friends with, but I can assure you that all kinds multitasking stuff is going on behind the scenes to make everything happen.

The demo test program creates ten processes. There are five "delay" processes, two "blabbering" processes, a Commodore-Kernal-Server process, one SID-banging process, and the Null process. (Note that when I use the word "kernel" I am referring to the OS that I have written, and when I use the word "Kernal", I am referring to the Commodore Kernal in ROM).

The purpose of the Commodore-Kernal Server is to receive requests from the worker processes to call the CHROUT routine to print a given line of text out to the screen. The Kernal server is the only processes that is allowed to call the Commodore-Kernal routines. Since it can only process one request from a client process at a time, calls to the Commodore Kernal are effectively "serialized" (made to happen one after another in time), which is good, since things would blow up pretty badly if two accesses the Commodore Kernal were to happen concurrently.

The "delay" processes, numbered from 1 to 5, each delay for a period of N seconds and then request the Kernal Server to print a "I'm alive" message to the screen. The number N of seconds to delay is the number of the process. You should be able to observe from watching the execution that each delay process prints a message to the screen with approximately the correct period between messages. Note that while these processes are delaying, they don't use any CPU time, so the CPU time is allocated to other processes. If you try holding down the C= key to slow the scrolling or run the system in Slow mode, you will still notice that the delay processes generate their output at approximately the right time.

The two "blabbering" processes, named "blabber" and "spinner", continually send print messages to the Commodore-Server process. You will observe that the messages from each comes pretty much prefectly interleaved with each other, and with the output of the delay process, because of the interprocess communication scheduling policy: FIFO (first-in, first-out).

The SID-banging process runs continuously in the background. It increments the 16-bit frequency of voice #1 from $0000 to $ffff and then suspends its execution for two seconds and then repeats. A square wave with even pulse widths is used. When the SID process delays for two seconds, you will notice that the printing operation of the other processes speeds up a bit. This is because the SID process is a heavy CPU user while it is active. The amount of slow-down of the rest of the system is limited a bit because the SID process runs with a slightly lower priority than the other processes in the system (which makes the sound increment slightly slower than it otherwise would -- there's only so much CPU to go around).

The Null process is not actually needed in this exercise, but it would normally be used to insure that the system always had some process to schedule. All that it does is increment the binary value in locations $0400-$0403 (on the 40-column screen) whenever it is active. It has the lowest priority in the system and never gets to execute unless all of the other processes are blocked (i.e., are suspended for some reason).

When you get tired of watching the demo, you can just hit the RESTORE key. This will cause an NMI which will make the system exit back to BASIC. The system does not have the ability to handle external events (like key strokes) at this time. A couple of locations on the 40-column screen are used for status information, so you will want to run the demo on the 80-column screen.

3. PROCESS CONTROL

A process is a user program that is in an active state of execution. A process is periodically given a certain amount of CPU time to execute its code, and then CPU attention is taken away from it to execute other processes. This may sound like you're simply making N processes run N times slower, and this is true in the worst case, but the normal case is that many processes in the system will be blocked (for whatever reason) and will not require any more CPU time until they wake up again (for whatever reason). Therefore, multitasking is a "winnable" proposition.

In our system, the process that the CPU is currently executing is changed every 1/60 of a second. This is a convenient "quantum" period for a number of reasons, including the fact that, thanks to the MMU of the C128, "context switching" can be efficiently performed this quickly.

3.1. PROCESS-CONTROL CALLS

There are six kernel calls that deal with process control:

CALL NAME     INPUT ARGUMENTS                  RETURN VALUES
-----------   ------------------------------   -------------------------------
Create        ( .AY=address, .X=priority )     .AY=newPid, .CS=err(.A=errcode)
Exit          ( .A=code, .X=$00 )              <none>
MyPid         ( )                              .AY=pid
MyParentPid   ( )                              .AY=parentPid
Suspend       ( )                              <none>
Delay         ( .AY=jiffies )                  <none>

3.1.1. CREATE

The Create() kernel call is used to create a new process. The first input argument is the code address, and it is passed in the .AY register (.A is loaded with the low byte of the address, and the .Y register is loaded with the high byte of the address--.AY for short). The code must be present in memory and be ready to be executed, since there is no facility for loading external programs. Also, if the code is not re-entrant, then there must be no other process already executing it or things will likely blow up. Re-entrant code is code that can be executed by multiple processes simultaneously without conflicts, essentially because there are no global variables that could be banged on by more than one process at a time.

The priority argument is the priority to execute the new process at. Valid values for this argument are on the range 0 to 127. The system keeps a list at all times of all the processes that are ready to execute, called the "ready list". The way that the scheduling works is that a pointer to the "active" process is kept (the one that is currently executing) and this pointer cycles through the ready list, trying to activate each process in turn, every 1/60 of a second. This is roundrobin scheduling. The priority of a process determines the number of cycles that the active-process pointer has to take through the list before the process is activated. So, if a process has priority 1, then it will be activated on every round; if it has a priority of 2, then it will be activated on every second round; and if it has a priority of 86, then it will be activated only on every 86th round through the ready list. The higher the priority value, the slower the process executes. This policy gives a fair allocation of the CPU to the various processes in the system.

Normally, foreground processes (ones that perform actions right in front of the user's face) should have a priority of 1, and background processes should have a relatively lower priority. A priority value of 0 for a process means that when the process is activated, it will not be deactivated again until it blocks for some reason. This priority level should be reserved for urgent computations that block often, since it has the potential to starve out the rest of the system. The Null process executes at a special priority level (255) that makes it so that it will only be activated if there are no other processes in the ready list.

The Create() call returns with the carry flag clear and the process id of the newly created process in the .AY register upon success, or returns with the carry flag set and an error code in the .A register upon failure. I do not have the complete list of error conditions figured out a this time, but errors will usually happen on a call like this because of a lack of resources (memory) for the kernel's internal data structures. Upon successful return, the child process is created, made ready, and may be activated by the system at any time. The first instruction to be executed by the child will be at the value given for the code address.

The newly created process will have a clear individual stack, except for a couple of bytes on the very bottom of it (high addresses), and a clear individual zeropage. Processes are allowed to make full use of every location in their zeropage, except for the I/O registers at locations 0 and 1, and full use of their stack, except that they must make sure that about a dozen bytes are available on the stack at all times in case an interrupt happens.

3.1.2. EXIT

The Exit() kernel call is used to remove the current process from the system. There are two input arguments: the .A register contains the return code that will be made available to the parent process if it is interested (though not in the current implementation), and a value in the .X register, which must currently be $00. I haven't figured out exactly how the exit mechanism should work yet and it currently only has a minimal implementation. The call does make the kernel reclaim the resources that were allocated to the process yet, although this functionality will be needed in any real operating system.

There is no return value from the Exit() call, because the call never returns. The semantics of the call is the the process calling Exit() will never be made active again. All processes should call Exit() when they are finished executing, or they can achieve the same result by executing an RTS instruction at the end of their main routine. The kernel pushes the address of an Exit() stub routine onto the top of the stack of a user process when it is created, and the user process will exit with a return code of $00 in this case.

3.1.3. MY_PID

The MyPid() kernel call is used to return the process identifier of the current process. This call is very simple, takes no arguments, and executes very quickly. The return value is the process id of the calling process and is returned in the .AY register. This call cannot fail, because the current process must exist in order to make the call in the first place, so there is no error-return condition.

3.1.4. MY_PARENT_PID

The MyParentPid() kernel call is used to return the process identifier of the parent process of the current process (i.e., the process that created the current process). This call is simple, takes no arguments, and executes quickly, very much like the MyPid() call. No error returns are possible, and the process id of the parent to the current process is returned in the .AY register. But note: it is not guaranteed that the parent process will still exist either before or after the current process makes this call; it may have Exit()ed. I may re-think this semantic.

This call is useful for setting up interprocess communication between a child process and its parent.

3.1.5. SUSPEND

The Suspend() kernel call is used to suspend the execution of the currently executing process for an indefinite period of time. Currently, this period of time is forever, since there is no corresponding "Resume" system call that another process can call in order to wake up the process that suspended itself. The reason that this call is made available is because the guts of what it does is required by other kernel operations, and the cost of making this call user-accessible was three 6502 instructions. This call may be retracted in the future, since it may cause programmers to do bad things.

The call takes no arguments, returns no values, and currently, will never return at all, much like Exit().

3.1.6. DELAY

The Delay() kernel call is used to suspend the execution of the current process for a user-specified period of time. The delay period is given in units of jiffies (1/60ths of a second). The unsigned 16-bit delay period is passed in in the .AY registers, giving a maximum possible delay period of about 18 minutes. If a user process requests to delay for a period of zero jiffies, its execution will not be suspended at all and the Delay() primitive will return immediately.

Since there may be other processes in the system doing things when the current processes wakes up after doing a delay, you can think of the process delaying for "at least" the period of time that you specify. Actually, to muddy things even more, your process will always go to sleep at a moment in time that is inbetween two ticks of the jiffy clock, so the first "jiffy" that your process waits may actually be any period between a couple of microseconds to almost a full jiffy, with a statistical average of half a jiffy. This is an artifact of any coarse-tick-based mechanism.

To muddy things again, the jiffy ticks, which are currently based on VIC raster interrupts (one per screen update), may not be processed immediately when they occur, since the IRQ may be delayed by a small period of time if interrupts are disabled in the processor status register when the jiffy tick happens. And finally, you should note that you will have a difficult time using this call for true "real time" periodic operations, like performing some specific task precisely every tenth of a second, since the call specifies a period to delay for, rather than a time to wake up at. The actual period of your process' activations will be determined by the waiting time plus the time skew caused by the processing that your process does. A DelayUntil() call easily could be implemented, if I figure that it will be needed for anything.

Currently, the scheduling policy is to make processes active immediately after they are awakened, so this makes the activities of other processes less of a worry to accurate timing. Unix does a similar thing by giving a freshly awakened process a temporarily high priority, since it is probably likely that the process will do some small think and then block again. This policy statistically improves concurrency.

3.2. PROCESS CONTROL BLOCKS

A Process Control Block (PCB) is the data structure that the kernel keeps the information that it needs to know about a process in. A Process identifier (pid) is actually the RAM0 address of the process control block of a process, for convenience, though this will have to change later. The fields of the process control block are shown here, organized into classes:

OFF   SIZ   CLASS   LABEL
---   ---   -----   ------
  0     2   queue   pcbNext
  2     2   queue   pcbPrev
  4     1   queue   pcbIsHead
  5     1   queue   pcbQCount
  6     1   ctxt    pcbSP
  7     1   ctxt    pcbStackPage
  8     1   ctxt    pcbZeroPage
  9     1   ctxt    pcbD506
 10     1   sched   pcbPriority
 11     1   sched   pcbCountdown
 12     2   sched   pcbWakeupTime
 12     2   ipc     pcbSendMsgPtr  (overlap)
 12     2   ipc     pcbRecvMsgPtr  (overlap)
 14     2   ipc/q   pcbSendQHead
 16     2   ipc/q   pcbSendQTail
 18     1   ipc/q   pcbSendQFlag
 19     1   ipc/q   pcbSendQCount
 20     2   ipc     pcbBlockedOn
 22     2   ipc     pcbReceiveFrom
 24     2   proc    pcbParent
 26     1   proc    pcbState
 27     -   -       SIZE

3.2.1. QUEUE-CLASS FIELDS

The first four fields, of the class "queue" are used for maintaining a process control block in queues with other PCBs. Some general-purpose queue-handling routines have been written to make queue management easier: QueueInit(), QueueInsert(), and QueueUnlink(). Each queue has a head node, and the nodes in a doubly linked circular order. This means that each node in the queue has a forward ("pcbNext") and a backward ("pcbPrev") pointer and that the first node points back to the head and the last node in a list points forward to the head. This organization removes all of the quirks of handling null pointers from the code. Using a doubly linked organization makes it easy to remove an arbitrary node from the middle of a queue.

Each node also has a "pcbIsHead" field which is always False (zero) and a "pcbQCount" field which is always zero. The head is the same as an entry in the queue, except that its "pcbIsHead" field is set to True ($ff) and its "pcbQCount" field records the number of nodes that are in the queue at any time. The "pcbIsHead" field is checked when scanning a list to tell if you've bumped back into the head node again, indicating the end of the list. The "pcbQCount" field is very convenient to check to see whether the queue is empty or not.

All of the processes that are ready to execute in the system are kept in the ready queue. The PCB of the Null process acts as the head for this queue, and is also an active node in the queue (a small but harmless kluge). The pointer to the active process is kept in a kernel-zero-page variable, and sweeps through the circularly linked ready-process list to activate new processes. The active PCB is not removed from the ready list while it is active.

3.2.2. CONTEXT-CLASS FIELDS

The next four fields, of the class "ctxt", store the "context" of a process that is not stored on the process' stack when it is not executing. These fields include space for the stack pointer, the stack page, the zeropage, and the contents of the MMU register at location $d506. The stack pointer is what was in the SP register of the CPU when the process last paused. The stack page and the zeropage values are the values in MMU registers $d505 and $d507, respectively; these are the page numbers of the pages in RAM0 memory that are allocated to a process. These pages can only be in RAM0 unless common memory is disabled, for hardware reasons. I may allow these pages to be in either RAM0 or RAM1 if there is a need later. The $d506 register of the MMU stores the most-significant bits of the RAM bank that is selected if you have expanded internal memory on your 128 (a la TwinCities-128) and the bank selection for REU (DMA) operations.

The rest of a process' context is stored on its stack. Here is what a process' stack looks like just after it has been created:

ADDR   SP-REL   DESCRIPTION
----   ------   ------------
$ff    sp+09    exitaddr-1.h
$fe    sp+08    exitaddr-1.l
$fd    sp+07    pc.h
$fc    sp+06    pc.l
$fb    sp+05    status register
$fa    sp+04    .A
$f9    sp+03    .X
$f8    sp+02    .Y
$f7    sp+01    $ff00 save
$f6    sp+00    -empty-

The "exitaddr" is the address of the routine in the kernel that will terminate a process if it executes an RTS from its main routine. The address is in the regular low-high order (although it is pushed on high-low since the stack grows downward in memory) but the value pushed is actually one less than the address of the routine, because this is what JSR pushes onto the stack and this is what RTS expects to find. The "pc" low and high fields give the address of the next instruction to be executed by the process when it is activated. When the process is first created, this will be the address of the first instruction. The "pc" value is the actual address, not one before, because this is what a hardware interrupt pushes onto the stack, and this is what the RTI instruction expects to find.

The "status register", ".A", ".X", and ".Y" fields contain the values to be loaded into the corresponding registers inside of the CPU when the process is activated. For a new process, these values are all zero.

The "$ff00 save" is the value to be loaded into the $ff00 "shadow" register of the MMU when the process is activated. This gives the memory context that the process is to execute in. As the kernel currently only works with one bank configuration (RAM0, NO BASIC ROM, I/O enabled, KERNAL ROM enabled), this is the value put here when a process is created.

The final field is "-empty-" because the stack pointer in the 6502 points to the next location in stack memory that will be used. All other values in the stack are relative to this. Upon startup, the stack-pointer field of the process control block will be set to $f6, which is what the table above shows.

The stack contents look exactly the same after a process has been interrupted by a hardware interrupt, except that the stack pointer will likely be lower in the stack memory, so the absolute addresses in stack memory in the above table no longer apply and the "exitaddr" bytes are not part of the interrupt context. That things look the same is no coincidence; on startup, we set up the stack to make things look as if an interrupt had just occurred, and to start a process executing, we execute the code that returns from an interrupt, which loads the "context" that is on the stack into the process registers, and we are ready to rock.

The above stack organization is exactly the same as it is for processing interrupts normally using the Commodore-Kernal environment on the 128, and this too is no coincidence because the code that sets up the stack like this upon an interrupt is burned into the Kernal ROM and there is very little that I can do about it. Fortunately, the organization is just fine for our purposes.

3.2.3. SCHEDULE-CLASS FIELDS

The next three fields, of class "sched", are used to schedule the process. The "pcbPriority" field gives the relative priority of the process according to the scheme already discussed. The "pcbCountdown" field is used to keep count of the number of remaining times that the process will have to be bypassed in cycling through the ready queue before the process will be activated again. When a process gives up the CPU upon the expiration of its time quantum, the "pcbCountdown" field is loaded with the pcbPriority of the process. When the "pcbCountdown" value reaches zero, the process is selected for activation.

The "pcbWakeupTime" field is used with the Delay() kernel call to indicate the absolute system time when the process should be activated again. The current time in the system is kept in a 16-bit kernel variable, and wraps around every 18.2 minutes. If the process is not currently time-delayed, then the WakeupTime field is not used (in fact, the memory may be used to record other status information).

3.2.4. IPC-CLASS FIELDS

The eight eight fields, of classes "ipc" and "ipc/q" are used for interprocess communication (message passing). The first two fields, "pcbSendMsgPtr" and "pcbRecvMsgPtr", are used for storing temporary values for handling message requests; the four "ipc/q"-class fields are used to implement the head of a queue of processes that are waiting to communicate with the current process, and the "pcbBlockedOn" field indicates which process this process is waiting to communicate with, if the current process is waiting. The first two fields actually overlap with each other and with the "pcbWakeupTime" field discussed earlier. This is okay since none of the fields will store active status information at the same time. The last field, "pcbReceiveFrom" is not used at this time, but will be used in the future for an primitive to receive a message only from a specific process. The interprocess communication is discussed in much greater detail later.

3.2.5. PROC-CLASS FIELDS

The final two fields, of class "proc", store information about the status of the process. I guess the same can be said of all the other fields. Anyway, the "pcbParent" field indicates which process is the process that created the current process, and the return value for the MyParentPid() kernel call is taken from this field.

The "pcbState" field gives the current state of the process. Here are the different possible process states:

STATE NAME        CODE
---------------   ----
STATE_READY       $c0
STATE_SEND        $c1
STATE_RECEIVE     $c2
STATE_REPLY       $c3
STATE_DELAY       $c4
STATE_SUSPENDED   $c5

The STATE_READY state means that the process is in the ready queue. The STATE_SEND, STATE_RECEIVE, and STATE_REPLY states mean that a process is waiting for some interprocess communication primitive to be called by the process that it is communicating with. The STATE_DELAY state means that the process has called the Delay() primitive and is waiting for some period of real time to pass before it can be activated again. The STATE_SUSPENDED state means that a process has called the Suspend() or Exit() primitive and will never be made active again (currently, Suspend() and Exit() mean the same thing).

The state information is needed for some operations, and will definitely be needed by a Kill()-process operation, to find out what state a process is in so that it can be removed from any queue or whatever it is in, in order to obliterate all information about the process from the system. Currently, there is no Kill() call.

3.3. TASK CREATION

When the user calls for the creation of a new process in the system, lots of stuff has to happen. First, memory for the process control block, zero page, and stack page must be allocated. Currently, this allocation is performed very simply, by keeping a page pointer and incrementing it every time a page is allocated. The process control block ends up getting allocated 256 bytes even though it actually requires much less than that. Thus, each new process chews up 768 bytes of memory space (plus code). Also, there is currently no mechanism for recovering the memory allocated to a process, which is okay since the mechanism for Exit()ing a process is incomplete too.

The address that a PCB gets allocated at is used for the process' PID (process identifier). This is particularly useful since the real purpose of a PID is to conveniently locate the process control block. This will have to change in the future, however, since the PIDs will also have to locate the machine that a process is on.

Then the process control block must be initialized. It is initialized as follows:

FIELD            SIZ   CLASS   INITIAL VALUE
--------------   ---   -----   --------------
pcbNext            2   queue   0
pcbPrev            2   queue   0
pcbIsHead          1   queue   0
pcbQCount          1   queue   0
pcbSP              1   ctxt    $f6
pcbStackPage       1   ctxt    set to newly allocated space
pcbZeroPage        1   ctxt    set to newly allocated space
pcbD506            1   ctxt    $04
pcbPriority        1   sched   set to the given argument
pcbCountdown       1   sched   set to the same value as "pcbPriority"
pcbWakeupTime      2   sched   0 (overloaded field)
pcbSendQHead       2   ipc/q   set by the QueueInit() function for empty queue
pcbSendQTail       2   ipc/q   set by the QueueInit() function for empty queue
pcbSendQFlag       1   ipc/q   set by the QueueInit() function for empty queue
pcbSendQCount      1   ipc/q   set by the QueueInit() function for empty queue
pcbBlockedOn       2   ipc     0
pcbReceiveFrom     2   ipc     0
pcbParent          2   proc    set to the id of the creator process
pcbState           1   proc    STATE_SUSPENDED

The values on the top of the stack page are also initialized for the new process, as follows:

ADDR   SP-REL   DESCRIPTION       INITIAL VALUE
----   ------   ---------------   --------------
$ff    sp+09    exitaddr-1.h      high byte of ExitAddr-1: the exit routine
$fe    sp+08    exitaddr-1.l      low byte of ExitAddr-1: the exit routine
$fd    sp+07    pc.h              high byte of the code-execution address
$fc    sp+06    pc.l              high byte of the code-execution address
$fb    sp+05    status register   $00
$fa    sp+04    .A                $00
$f9    sp+03    .X                $00
$f8    sp+02    .Y                $00
$f7    sp+01    $ff00 save        $0e  (Kernal ROM, I/O, rest RAM0)
$f6    sp+00    -empty-           --

And now, the new process is ready for action. We insert it into the ready queue in the next position after the current process, set its state to STATE_READY (actually, both of these operations are performed by the MakeReady() function, which is generally useful and is called from a number of places) and then we exit back to the calling process, returning the process id of the calling process. I should change this a little bit in the future, to make it exit to the newly created child process if the priority of the child process is greater than the priority of the parent.

3.4. CONTEXT SWITCHING

Context switching describes the procedure of switching control of the processor from a user process to the kernel and then switching control back to a user process. Normally, there is only one "style" of context switching in a system, but for a couple of design reasons, BOS actually has three "styles" of context switching: IRQ switching, JSR switching, and quick JSR switching. IRQ-style switching is the one type normally implemented in operating systems for other architectures, so it will be the one that we cover first.

IRQ-style context switching involves saving the full context of a process onto its stack and into its process control block, switching into the kernel, doing work, switching back out of the kernel, and reloading the full context of a user process and activating to it. All of the work of saving and restoring the the stack portion of a process' context is handled by the ROM routines for IRQ (and NMI and BRK) handling. All we have to do is locate the current process control block, save the zero-page, stack-page, stack-pointer, and $d506 registers into the PCB, and load a $00 into the zero-page MMU register to switch to the kernel's zeropage (where some of the kernel's variables are stored). Note that the interrupt will be executed using the user process' stack; therefore, enough space should always be available on user stacks to handle this system overhead.

When we are done processing the interrupt, we execute the priority-management algorithm that was described earlier to select the next process to activate, and then restore the zero-page, stack-page, stack-pointer, and $d506 registers and execute the ROM stack-handling code for exiting from an interrupt. Note that there's a chance that we might well be exiting to a different user process from the one that was active when the interrupt occurred. There aren't many registers to save and restore, so context switching has a fairly low overhead, so there is no problem in doing it (at least) sixty times a second.

JSR-style context switching is pretty much the same as IRQ-style context switching, except that the stack will not have most of the processor registers already saved on it; it will only have the return address that performed the JSR. Immediately upon entering the kernel, interrupts are disabled to prevent all sorts of bad things from happening. Then a function is called, EnterKernel(), which will pull the return address of the process that called the JSR off the stack and increment it by one (since we will be exiting by using an RTI instruction rather than an RTS) and saves the other processor registers onto the stack in the same way that the interrupt-handling code in ROM would. Then we save the four additional registers into the PCB as before, activate the kernel zeropage, and we are switched in.

This style of context switching is used for kernel calls that will cause the calling process to block (like a non-zero Delay()). It would have been possible to organize the kernel calls to be entered by executing a BRK instruction, which would have caused the stack to be already set up in the same way as with IRQ interrupts, but I decided against this for two reasons: efficiency, it would have been slower to do this, and debugging (security?), since I only want the BRK condition to signal a bug in the code. The exit from this type of context switch is the same as for the IRQ style of context switch, since things are rigged to end up looking the same on the stack. This is a good thing, since the action that will cause a Delay()ed process to be re-activated will, in fact, be an IRQ interrupt.

Quick JSR-style context switching is used for kernel calls that will not block or cause a new process to be activated when they finish, such as MyPid() or (currently) Create(). No context has to be saved since the function will get in and out very quickly; all we have to do is switch to the kernel's zeropage and then switch back to the user's zeropage before exit.

There's one more note to make about return values. For the quick JSR-style context switch, there is no problem with return values, since we just have to load them into the processor registers and exit. With the full JSR-style context switch, the return values have to be put onto the user stack into the positions in the stack memory the hold the processor register contents, since these values will be what are restored into the processor immediately upon the return to the user process. There are no return values associated with the IRQ style of context switching (and there'd better not be), since an interrupt can happen at any point in the execution of a user process.

3.5. DELAY PRIMITIVE

There are two complementary halves to the implemention of the Delay() primitive: the half that is called by the user and causes a process to go to sleep, and the half that wakes up a sleeping process at the correct time. This latter half is executed by the 60-Hz system interrupt.

3.5.1. USER HALF OF THE DELAY PRIMITIVE

The first thing that the user half of the Delay() primitive does is check to see if the delay period is zero jiffies. If it is, then the primitive returns immediately to the calling process without rescheduling (without skipping to the next ready process in line). I may change this semantic, because it is often useful to have a primitive that yeilds process execution to the next ready process without actually blocking the current process.

If the delay period is longer than zero jiffies, then the current process is suspended and removed from the ready queue, and the absolute time that the process is to be reawakened is calculated and put into the "pcbWakeupTime" field of the PCB for the current process. The absolute wakeup time is calculated, of course, by adding the number of jiffies to delay to the current absolute time, which is maintained by the system and incremented on every (60 Hz) system interrupt.

Then the current process control block is inserted into the delay queue at the correct position. The delay queue is a queue (implemented in the standard way) of process control blocks for processes which are asleep, ordered by the absolute wakeup time of each process such that the process that will be awakened at the nearest time in the future is at the head of the list and that the process which will be awakened at the farthest point in the future is at the tail. The following diagram gives an example:

CurrentTime = 2016 +---------+ +---------+ +---------+ +---------+ --->| Proc A |----->| Proc B |----->| Proc C |----->| Proc D | | wakeup: | | wakeup: | | wakeup: | | wakeup: | <---| @ 2345 |<-----| @ 2765 |<-----| @ 54999 |<-----| @ 441 | +---------+ +---------+ +---------+ +---------+ (ct+5.5sec) (ct+12.5sec) (ct+14.7min) (ct+17.8min) There is a rub here: only 16 bits are used for storing times, which equals about 18.2 minutes, so we have to worry about time quantities overflowing and wrapping around. For example, if the current time is 48232 and a process wants to sleep for 18000 jiffies (5 minutes), then its wakeup time would be at 696 jiffies, accounting for the 16-bit wraparound, which is a lower numerical value than the current time, or than the wakeup time of any other process that will wake up before the current-time wraparound. In fact, all timers have this wraparound problem (although with 64-bit times, wraparound periods would be expressed in millions of millennia rather than in minutes). Sixteen bits is a good number of bits to use, however, because that is the maximum delay period (2^16-1).

When we insert a new process into the delay queue, we scan the delay queue from the head and continue until we find a record that has a time that is higher than or equal to the wakeup time of the new process (or we hit the end of the queue). Then, we insert the new process immediately before this point. To handle the wraparound problem, all comparisons of wakeup times are done using 17 bits (well, really 24 bits). For each value in the comparison, we add 65536 to it (set its 17th bit) if the value is less than the current time. We don't have to worry about the current time changing while we are doing this, because interrupts will be disabled for the entire time that we are executing the system call, as per usual. Things could go horribly wrong anyway if interrupts were not disabled.

Okay, so now our delaying process is removed from the ready queue, its complete context is saved, and it is put into the delay queue at the right spot. So, set the active process pointer to the next ready process in the system and finish by activating the next ready process.

3.5.2. SYSTEM HALF OF THE DELAY PRIMITIVE

During each 60-Hz system interrupt, the current time (jiffy counter) is incremented by one. Note that since this timer is only 16 bits wide, it is not suitable for keeping track of the current time of day; for this purpose, the TOD clocks in the CIA chips should (and will) be used. The jiffy counter may also be inaccurate if interrupts are disabled for a long period of time, such as they are during some Commodore-Kernal I/O operations.

After incrementing the time, the kernel checks to see if any Delay()ed processes need to be woken up. If there are no processes in the delay queue, then this is a quick check. If there are any processes in the queue, then if the wakeup time of the head process is equal to the current time, then that process is woken up and this check is performed repeatedly until the condition fails, since there may be multiple processes that want to be woken up at the same jiffy of absolute time. Note that because of the scheduling for a freshly unblocked process, the process that Delay()ed first will be the first one activated after it is woken up, if there are multiple processes woken up at the start of the same jiffy.

3.6. SYSTEM BOOTSTRAPPING

Operating systems always have a bootstrapping problem, because you always need to use the services of the operating system in order to start it up, but, of course, it's not started up yet, chicken and egg, catch-22. So, what usually ends up happening is that you just "fake it", start from somewhere, get the ball rolling, and snowball up to a fully running system.

The first thing that the kernel does is change all of the interrupt vectors (IRQ, NMI, and BRK) to my custom routines. I need to cover all of the interrupts, since I chave the zero page during the execution of the system, and if a BRK or NMI were to happen and be serviced by the Commodore-Kernal ROM routines, all hell would break loose. Currently, the NMI and BRK routines just clean things up and return to BASIC.

Then we initialize the kernel variables, including the delay queue and the jiffy counter.

And then we fake the creation of the Null process. For the purposes of bootstrapping, the Null process doubles as the "Boot" process. Its process control block is not allocated in the normal way, either; it is at a fixed location, and its PCB doubles as the head of the process list. A kluge here and a hack there and the Null process is initialized and "joined in progress". Then, the Null process creates the Init process, using a standard call to the Create() primitive, and then the Null process goes into an endless loop of incrementing the 32-bit value at addresses $400-$403, the first four locations of the 40-column screen memory. It doesn't matter whether you run BOS with the clock in Fast or Slow mode, except in terms of performance.

It is the responsibility of the Init process to start up all of the user processes in the user application after Init starts running. In the current implementation, Init starts up all of the other processes in the test application and then becomes the Commodore-Kernal Server, which is a convenient organization, since all of the other processes can find out the pid of the Kernal Server merely by calling MyParentPid().

4. INTERPROCESS COMMUNICATION

In this system, processes are not strictly independent and competitive; many must cooperate and comunicate to get work done. To facilitiate this interprocess communication (IPC), a particular paradigm was chosen: the Remote Procedure Call (RPC) paradigm. RPC is a message-passing scheme that is used with the much-hyped Client/Server system-architecture model. Its operation parallels the implicit operations that take place when you call a local procedure (a subroutine).

The RPC message-passing paradigm is also coupled with a shared-memory paradigm to offer greater performance for passing around massive amounts of data. All processes in the system (and in the entire distributed system when this OS is extended) have global access to all of the memory in the system. The coupling of the two paradigms is such that you get the best of both worlds: the convenence and natural interprocess *coordination* (synchronization) semantics of RPC and the convenience and raw performance of shared storage.

4.1. MESSAGE-PASSING CALLS

The kernel provides three primitives for message passing:

CALL NAME     INPUT ARGUMENTS                  RETURN VALUES
-----------   ------------------------------   -------------------------------
Send          ( .AY=msgHead )                  .CS=err(.A=errcode)
Receive       ( .AY=msgHead )                  .AY=senderPid
Reply         ( .AY=msgHead[msgRet,msgData] )  .CS=err(.A=errcode)

These calls will send a message from one process (the client) to another process (the server) and wait for a reply, receive a message from another process (a client), and reply to a message sent from another process (a client) that has been received, respectively.

4.1.1. MESSAGE-HEADER DATA STRUCTURE

Each of the message-passing primitives requires a pointer to a message-header data structure that is stored in the user program's data space. The message header must be initialized with appropriate values before a message can be sent. Note that this scheme of passing a pointer to a message header allows you to have multiple message headers lying around, initialized and ready for action, and you can easily pick between them. Here is what a message header looks like:

OFF   SIZ   CLASS   LABEL
---   ---   -----   ------
  0     2   pid     msgTo
  2     2   pid     msgFrom
  4     4   buf     msgBuf
  8     2   buf     msgLen
 10     4   buf     msgRepBuf
 14     2   buf     msgRepLen
 16     1   data    msgOp
 17     1   data    msgRet
 18     2   data    msgObj
 20     4   data    msgData
 24     -   -       SIZE

You should not put too much faith in the offsets of the fields in the data structure remaining static; you should always use the label to access the fields of the structure, as in:

sta myMessageHeader+msgTo+0
sty myMessageHeader+msgTo+1

4.1.1.1. PID-CLASS FIELDS

The first two fields, of class "pid", are used to identify the processes involved in an RPC interaction. The "msgTo" field is the pid of the process that a message is to be/has been sent to, and the "pcbFrom" field is the id of the process which a message has been received from. For security reasons, the sender does not fill in the "pcbFrom" field; the kernel does after the message has been sent and the sender is blocked. (Or else the sender could fake being someone else). The "pcbTo" field is used as the destination for when a message is being sent and must be filled in with a legitimate value on a send operation, and the "pcbFrom" field is used as the destination when a message is being replied to, and must be filled in with a legitimate value on a reply operation. The "pcbTo" field is the only field of the message header that actually needs to have a legitimate value before a message can be sent.

4.1.1.2. BUF-CLASS FIELDS

The next four fields, of class "buf", point out the send and reply buffers in memory and the sizes of each. The send buffer ("msgBuf"/"msgLen") is expected to point to a region of near/far memory that contains valid data for a send operation, and the reply buffer ("msgRepBuf"/"msgRepLen") is expected to point to a valid area of memory for the server to fill in with any bulky result data from an RPC request. Each of the message-buffer pointers is four bytes in size to allow for future expansion when the kernel will support "far" memory that will be accessed through 32-bit pointers. User processes are expected to access these "far" buffers directly themselves, through the global shared memory. This eliminates the system overhead of uselessly copying bulky data from place to place.

There are two special notes to make about there "buf" fields. First, they don't actually have to be used how they're intended to be used. as long as both the client and the server agree on what the contents of these fields are supposed to mean. In this respect, the fields can be used to quickly pass twelve bytes of completely arbitrary information. This is useful because many RPCs only require that a small amount of information be transferred from one process to another, or at least that bulky data be passed in only one direction (like read or write), so that one of the buffer pointers is free to be used quick, tiny data.

Second, on the sending side, the "buffer" that is pointed to does not have to be a "buffer" at all; it can be an arbitrary data structure that has an arbitrary number of pieces, scattered throughout the global memory of the system. The only responsibility of the sender is to insure that no one else will be attempting to modify the shared data simultaneously while the server is accessing it. This scheme is quite ingenious, I think (thank you, thank you). (The scheme may appear to have a security leak in the design, but our system has no real hardware security anyway).

The expected usage of buffers will be for the sender to use near memory for the request and reply buffers and access them as regular near memory to construct and interpret request and reply messages. The receiver will (in the future) access the buffers as far memory (which they may very well be since processes will be allowed to execute on different banks of internal memory and even on different machines), and may wish to fetch parts of messages into near memory for processing. The use of far pointers makes it so that data is copied only when necessary, and copied only once.

4.1.1.3. DATA-CLASS FIELDS

The final four fields, of class "data", are intended to be used to conveniently pass small amounts of arbitrary data. This data can be arbitrary, but the fields do have a convention that should usually be followed, unless both parties agree to an alternative usage.

The "msgOp" field is intended to be the "operation code" that a client process wishes a server to execute. The "msgRet" field is intended to be the return/ error code that is returned from the server to the client upon completion of an operation. The "msgObj" field is intended to be used by the client to indicate which of the server's "objects" the client wishes to perform the operation on. And the "msgData" field is intended to contain four bytes of arbitrary user data that is passed in with an operation and is passed back from the server to give return values. In the spirit of these semantics, the data in all of the fields is send with a request, but only the data in the "msgRet" and "msgData" fields is passed back in a reply operation. None of the other fields are passed back in a reply operation (the field values will remain how they were before the send, for the sender). Take special note that the "msgRepLen" field will not be passed back; if there is less data returned than was asked for by an operation, you will have to encode the "actual" reply-buffer length into the "msgData" field.

4.1.2. SEND

Send() is used to transmit a message to a remote process and get back a reply message. The .AY register contains the near-memory address of the message header, which must have its "msgTo" field filled in to be the pid of the process that the message is being sent to. The sending process will suspend its execution while it is waiting for remote process to process its request. If there is to be bulky reply data for the request (such as there would be for a "read" request to a file server), then space for the reply buffer must be allocated and indicated in the message header. The reply-buffer space should normally be owned by the sender.

If there is an error in passing the message, the the error return will be indicated by the carry flag being set and the error code will be returned in the .A register. Some possible errors will be, in the future: destination process is not valid, and that destination process died before receiving/ replying to your message. (These conditions are not currently checked). Also in the future, this call will work completely transparently for passing messages between machines in a network.

4.1.3. RECEIVE

Receive() is used to receive a message transmitted by a remote process to the current process. The receiver will block until another process does a corresponding Send() operation, and then the message header sent by the sender will be retrieved into the message-header buffer pointed to by the .AY register, for this call. No error returns are possible. The pid of the sending process will be returned in the .AY register as well as in the "msgFrom" field of the receive-message-header buffer. The receiver is then expected to eventually call the Reply() primitive to re-awaken the sender. The receiver is free to do anything it wants to after receiving a message from a process, including receiving messages from other processes. Messages are received from other processes in FIFO order.

A similar ReceiveSpecific() primitive may be provided in the future. It would only accept a message from a specifically named process and would enqueue all other messages that are received before the specific message, to be received later.

4.1.4. REPLY

Reply() is used to re-awaken a process that sent a message that was Receive()d by the current process. The current process is expected to have set up the return information in the reply-message-header buffer and the reply buffer area according to the client's wishes before calling the Reply() primitive. The near address of the reply-message-header buffer is loaded into the .AY register as an argument to the call. Only the "msgFrom", "msgRet", and "msgData" fields need to have values. The "msgFrom" field identifies the process to send the reply message to, and that process must be in the state of waiting for a reply from the Reply()ing process, or an error will be returned. An error is indicated by the carry flag being set on return and the error code is loaded in the .A register. In the case of an error, no action will have been performed by the system.

4.2. IMPLEMENTATION

The fields of the process control block that are used for message passing are restated here:

OFF   SIZ   CLASS   LABEL
---   ---   -----   ------
 12     2   ipc     pcbSendMsgPtr  (overlap)
 12     2   ipc     pcbRecvMsgPtr  (overlap)
 14     2   ipc/q   pcbSendQHead
 16     2   ipc/q   pcbSendQTail
 18     1   ipc/q   pcbSendQFlag
 19     1   ipc/q   pcbSendQCount
 20     2   ipc     pcbBlockedOn
 22     2   ipc     pcbReceiveFrom

The "pcbBlockedOn" field is used to allow Reply() to verify that the pid it is instructed to send a reply message to is indeed waiting for a reply from the task calling Reply(). The "pcbSendQ*" fields constitute a queue head for a list of process control blocks that are waiting to send a message to the current process. The "pcbSendMsgPtr" and "pcbRecvMsgPtr" fields are used to save the message data parameters of a Send() or Receive() call, respectively, when it has to be suspended without a transfer of the message header. When the other process involved performs the corresponding operation, the first process' header buffer pointer is recovered from its process control block. The "pcbReceiveFrom" field is unused at this time.

The process states of STATE_SEND, STATE_REPLY, and STATE_RECEIVE are used with message passing. The STATE_SEND state means that the current process has sent a message to a server process and is waiting for it to do a Receive(). The STATE_REPLY state means that the current process has sent a message to a server process, the message has been Receive()d, and that the current process is waiting for the server process to perform a Reply(). The STATE_RECEIVE state means that the current process has performed a Receive() and is waiting for some other process to perform a corresponding Send(). These state names/meanings may be a bit inconsistent; deal with it.

The implementation of the actual Send(), Receive(), and Reply() operations is actually quite straight-forward. Both Send() and Receive() have to handle two possibile situations: either the other process involved has already performed its corresponding operation and is waiting, or it has not. Reply() is simplified in that it knows that the sender is already waiting for its reply so it can proceed to copy the reply-message-header contents directly.

The Send() primitive (will) checks the given destination pid for validity and then checks the state of the recipient process. If the recipient process is in STATE_RECEIVE, the Send() function copies the message-header contents directly to the receive-header buffer of the recipient. The address of the receive-header buffer is taken from the "pcbRecvMsgPtr" field of the receiver's process control block in this case. The receiver's return value (the sending process' pid) is set up (on the receiving process' stack) and the receiver is awakened while the sender is put to sleep, in STATE_REPLY state (since the receive has already happened, it is waiting for the corresponding Reply()).

If the recipient process is not in the STATE_RECEIVE state, then the sending process will have to wait for the recipient to perform a Receive(). The sender's message-header buffer address is stored into its process control block, the sender's process control block is linked into the recipient process' "pcbSendQ*", and the sender is put to sleep, in the STATE_SEND state.

The Send() function does not set up the return value for the user's system call since that will not be known until another process performs the corresponding Reply(). A return value is set up immediately only in the case of an error. The possible error returns from Send() are: invalid pid and reply too long (in which case the reply is truncated).

The Receive() primitive first checks its "pcbSendQ*" to see if any processes have already tried to send a message to the receiver. If there is a process there, the sender's process control block is removed from the head of the send queue then the sender process' state is changed to STATE_REPLY and the sent message-header contents (dereferenced by the sender's "pcbSendMsgPtr" pointer) are copied into the receiver's message-header buffer. The Receive() primitive then exits returning the pid of the sender. No error returns are possible.

If there is no process enqueued in the recipient process' "pcbSendQ*", then the receiving process is put to sleep in the STATE_RECEIVE state and its message-header buffer pointer is copied into its process control block.

The Reply() primitive verifies that the destination process is valid (but not in the current implementation) and is actually awaiting a reply from the replying process. If not, it craps out. Otherwise, it copies the two message-header fields and awakens the sender. The return value of the sender is (already) set up to be carry-clear (no error) and the Reply() primitive returns error-free too.

The Exit() kernel call does not currently recover from a process performing a Receive() and then Exit()ing before performing the corresponding Reply(). Some care will have to be taken to insure that all process involved in IPC can consistently recover if one of the processes gets blown away, for whatever reason (including Exit()). Such consistent recovery has to be carefully thought out for any kind of operating system; however, since there are only a small number of kernel concepts in this one, consistent recovery is that much easier to insure.

5. CONCLUSION

So there ya have it; the start of a real operating system for the Commodore 128. What the operating system needs in terms of features is to be extended to execute processes on any bank of internal memory, to access far memory, and to be distributed so that it will work across multiple hosts. What it needs in terms of software is: device drivers, a command shell, utility programs, and an assembler that can produce relocatable code. Oh where, oh where shall I ever find such software??? ;-)

APPENDIX A. SOURCE-CODE LISTING

The source code follows. Extract everything between the "-----=-----" lines and save into a file named "bos.s" (or whatever) and then run it through the ACE assembler to generate the executable program (which is also included below for your convenience). The ACE assembler is available for free with the ACE-128/64 system.

I have not gone through and fully documented the source code, since I have been sitting on this program for quite a while and am in a rush to get it out the door. Besides, the functionality of each important component has already been discussed.

-----=----- ;simple multitasking kernel by Craig Bruce, started 25-Oct-1994. ;This program is written in the ACE-Assembler format. org $1300 jmp main ;======== declarations ======== pcbNext = 00 ;(2) mgmt pcbPrev = 02 ;(2) mgmt pcbIsHead = 04 ;(1) mgmt pcbQCount = 05 ;(1) mgmt pcbSP = 06 ;(1) ctxt pcbStackPage = 07 ;(1) ctxt pcbZeroPage = 08 ;(1) ctxt pcbD506 = 09 ;(1) ctxt pcbPriority = 10 ;(1) sche pcbCountdown = 11 ;(1) sche pcbWakeupTime = 12 ;(2) sche (overlap) pcbWaitEvent = 12 ;(1) sche (overlap) pcbSendMsgPtr = 12 ;(2) sche (overlap) pcbRecvMsgPtr = 12 ;(2) sche (overlap) pcbSendQHead = 14 ;(2) ipc pcbSendQTail = 16 ;(2) ipc pcbSendQFlag = 18 ;(1) ipc pcbSendQCount = 19 ;(1) ipc pcbBlockedOn = 20 ;(2) ipc pcbReceiveFrom = 22 ;(2) ipc pcbParent = 24 ;(2) proc pcbState = 26 ;(1) proc pcbSize = 27 STATE_READY = $c0 STATE_SEND = $c1 STATE_RECEIVE = $c2 STATE_REPLY = $c3 STATE_DELAY = $c4 STATE_SUSPENDED = $c5 STATE_EVENT = $c6 KERN_ERR_OK = $e0 KERN_ERR_PID_NOT_REPLY = $e1 msgTo = 0 ;(2) msgFrom = 2 ;(2) msgBuf = 4 ;(4) msgLen = 8 ;(2) msgRepBuf = 10 ;(4) msgRepLen = 14 ;(2) msgOp = 16 ;(1) msgRet = 17 ;(1) msgObj = 18 ;(2) msgData = 20 ;(4) msgSize = 24 queueHeadSize = 6 nullPcb : buf pcbSize delayQueue : buf queueHeadSize jiffyTime : buf 2 activePid = 02 ;(2) p = 04 ;(2) q = 06 ;(2) pcbPtr = 08 ;(2) msgPtr = 10 ;(2) pageAlloc = 12 ;(1) ;Stack: ($ff) : exitaddr-1.h ; ($fe) : exitaddr-1.l ; ($fd) sp+07: pc.h ; ($fc) sp+06: pc.l ; ($fb) sp+05: status register ; ($fa) sp+04: .A ; ($f9) sp+03: .X ; ($f8) sp+02: .Y ; ($f7) sp+01: $ff00 save ; ($f6) sp+00: -empty- bkBOS = $0e bkUser = $0e bkSelect = $ff00 vic = $d000 sid = $d400 mmuZeroPage = $d507 mmuStackPage = $d509 IrqExit = $ff33 ; Create ( .AY=address, .X=priority ) : .AY=pid ; Exit ( .A=code, .X=$00 ) ; MyPid ( ) : .AY=pid ; MyParentPid ( ) : .AY=parentPid ; Suspend ( ) ; Delay ( .AY=jiffies ) : .CS=err ; Send ( .AY=msgBuf ) : .CS:.A=err ; Receive ( .AY=msgBuf ) : .AY=senderPid ; Reply ( .AY=msgBuf[msgRet,msgData] ) : .CS:.A=err ;======== kernel code ======== main = * sei ;** entry lda #bkBOS sta bkSelect ;** set interrupt vectors lda #<IrqHandler ldy #>IrqHandler sta $0314 sty $0315 lda #<BrkHandler ldy #>BrkHandler sta $0316 sty $0317 lda #<NmiHandler ldy #>NmiHandler sta $0318 sty $0319 ;** initialize delay queue lda #0 sta jiffyTime+0 sta jiffyTime+1 lda #<delayQueue ldy #>delayQueue sta q+0 sty q+1 jsr QueueInit ;** initialize null/boot process lda #<nullPcb ldy #>nullPcb sta nullPcb+pcbNext+0 sty nullPcb+pcbNext+1 sta nullPcb+pcbPrev+0 sty nullPcb+pcbPrev+1 sta activePid+0 sty activePid+1 lda #$ff sta nullPcb+pcbIsHead lda #0 sta nullPcb+pcbQCount lda #>$2000 sta pageAlloc lda #STATE_READY sta nullPcb+pDESIGN lda #2 sta nullPcb+pcbPriority cli jmp Null Null = * ;** create init process lda #<Init ldy #>Init ldx #1 jsr Create ;** go into endless loop - inc $0400 bne + inc $0401 bne + inc $0402 bne + inc $0403 + jmp - NmiHandler = * BrkHandler = * Shutdown = * ;** restore interrupt vectors sei lda #<$fa65 ldy #>$fa65 sta $0314 sty $0315 lda #<$fa40 ldy #>$fa40 sta $0318 sty $0319 ldx #250 txs lda #$00 sta mmuZeroPage sta mmuZeroPage+1 ldx #$01 stx mmuStackPage sta mmuStackPage+1 lda #%00000100 sta $d506 cli jmp $4db7 zpSave : buf 1 createAddr : buf 2 createPriority : buf 1 createZeropage : buf 1 createStack : buf 1 createPcb : buf pcbSize Create = * ;( .AY=address, .X=priority ) : .AY=pid sei ;** switch in sta createAddr+0 sty createAddr+1 stx createPriority lda mmuZeroPage sta zpSave lda #$00 sta mmuZeroPage ;** allocate resources lda #$00 ldy pageAlloc sta pcbPtr+0 sty pcbPtr+1 iny sty createZeropage iny sty createStack iny sty pageAlloc cpy #>$c000 bcc + brk ; recover gracefully from the condition of running out of memory + ;** initialize pcb ;** pcbNext ;(2) mgmt := 0 ;** pcbPrev ;(2) mgmt := 0 ;** pcbIsHead ;(1) mgmt := 0 ;** pcbQCount ;(1) mgmt := 0 ;** pcbSP ;(1) ctxt := $f6 ;** pcbStackPage ;(1) ctxt := new ;** pcbZeroPage ;(1) ctxt := new ;** pcbD506 ;(1) ctxt := $04 ;** pcbPriority ;(1) sche := given ;** pcbCountdown ;(1) sche := priority ;** pcbWakeupTime ;(2) sche := 0 ;** pcbSendQHead ;(2) ipc := QueueInit ;** pcbSendQTail ;(2) ipc := QueueInit ;** pcbSendQFlag ;(1) ipc := QueueInit ;** pcbSendQCount ;(1) ipc := QueueInit ;** pcbBlockedOn ;(2) ipc := 0 ;** pcbReceiveFrom ;(2) ipc := 0 ;** pcbParent ;(2) proc := creator ;** pcbState ;(1) proc := STATE_SUSPENDED ldx #pcbSize-1 lda #$00 - sta createPcb,x dex bpl - lda #$f6 sta createPcb+pcbSP lda createStack sta createPcb+pcbStackPage lda createZeropage sta createPcb+pcbZeroPage lda #$04 sta createPcb+pcbD506 lda createPriority sta createPcb+pcbPriority sta createPcb+pcbCountdown lda activePid+0 ldy activePid+1 sta createPcb+pcbParent+0 sty createPcb+pcbParent+1 lda #STATE_SUSPENDED sta createPcb+pcbState ldy #pcbSize-1 - lda createPcb,y sta (pcbPtr),y dey bpl - lda pcbPtr+0 clc adc #pcbSendQHead sta q+0 lda pcbPtr+1 adc #0 sta q+1 jsr QueueInit ;** initialize new stack ;** Stack: ($ff) : exitaddr-1.h := >ExitAddr ;** ($fe) : exitaddr-1.l := <ExitAddr ;** ($fd) sp+07: pc.h := >Addr ;** ($fc) sp+06: pc.l := <Addr ;** ($fb) sp+05: status register := $00 ;** ($fa) sp+04: .A := $00 ;** ($f9) sp+03: .X := $00 ;** ($f8) sp+02: .Y := $00 ;** ($f7) sp+01: $ff00 save := $0e ;** ($f6) sp+00: -empty- lda #$00 ldy createStack sta p+0 sty p+1 ldy #$f6+1 lda #bkUser sta (p),y ;$ff00 iny ldx #4 lda #$00 - sta (p),y iny dex bne - lda createAddr+0 sta (p),y iny lda createAddr+1 sta (p),y iny lda #<DefaultExit-1 sta (p),y iny lda #>DefaultExit-1 sta (p),y ;** make new process ready jsr MakeReady ;** switch out lda pcbPtr+0 ldy pcbPtr+1 ldx zpSave stx mmuZeroPage clc cli rts MakeReady = * ;( (pcbPtr)=pcb ) ;after activePid ldy #pcbState lda #STATE_READY sta (pcbPtr),y lda #<nullPcb ldy #>nullPcb sta q+0 sty q+1 lda activePid+0 ldy activePid+1 sta p+0 sty p+1 jsr QueueInsert rts QueueInit = * ;( (q)=queueHead ) lda q+0 ldy q+1 sta queueInitVals+pcbNext+0 sty queueInitVals+pcbNext+1 sta queueInitVals+pcbPrev+0 sty queueInitVals+pcbPrev+1 lda #$ff sta queueInitVals+pcbIsHead lda #0 sta queueInitVals+pcbQCount ldy #queueHeadSize-1 - lda queueInitVals,y sta (q),y dey bpl - rts queueInitVals : buf queueHeadSize QueueInsert = * ;( (q)=queueHead, (p)=nodeToInsertAfter, (pcbPtr)=newItem ) ;** q->count +:= 1 clc ldy #pcbQCount lda (q),y adc #1 sta (q),y ;** pcbPtr->next := p->next ldy #pcbNext lda (p),y sta (pcbPtr),y iny lda (p),y sta (pcbPtr),y ;** pcbPtr->prev := p iny lda p+0 sta (pcbPtr),y iny lda p+1 sta (pcbPtr),y ;** p->next->prev := pcbPtr ldy #pcbNext lda (p),y sta q+0 iny lda (p),y sta q+1 ldy #pcbPrev lda pcbPtr+0 sta (q),y iny lda pcbPtr+1 sta (q),y ;** p->next := pcbPtr ldy #pcbNext lda pcbPtr+0 sta (p),y iny lda pcbPtr+1 sta (p),y rts QueueUnlink = * ;( (q)=queueHead, (pcbPtr)=node ) ;uses p ;** pcbPtr->next->prev := pcbPtr->prev ldy #pcbNext lda (pcbPtr),y sta p+0 iny lda (pcbPtr),y sta p+1 ldy #pcbPrev lda (pcbPtr),y sta (p),y iny lda (pcbPtr),y sta (p),y ;** pcbPtr->prev->next := pcbPtr->next ldy #pcbPrev lda (pcbPtr),y sta p+0 iny lda (pcbPtr),y sta p+1 ldy #pcbNext lda (pcbPtr),y sta (p),y iny lda (pcbPtr),y sta (p),y ;** q->count -:= 1 ldy #pcbQCount lda (q),y sec sbc #1 sta (q),y rts IrqHandler = * cld lda #bkBOS sta bkSelect lda vic+$19 bpl + and #1 bne Sixty + lda $dc0d Sixty = * sta vic+$19 ;** save full context lda mmuZeroPage ldx #$00 stx mmuZeroPage ldy #pcbZeroPage sta (activePid),y ldy #pcbSP tsx txa sta (activePid),y ldy #pcbStackPage lda mmuStackPage sta (activePid),y ldy #pcbD506 lda $d506 sta (activePid),y ;** process interrupt inc jiffyTime+0 bne + inc jiffyTime+1 + lda delayQueue+pcbQCount beq + jsr DelayIrqAwake + nop ;** select new process - ldy #pcbPriority ;give cur full count lda (activePid),y iny sta (activePid),y beq ++ - ldy #pcbNext ;find next proc lda (activePid),y tax iny lda (activePid),y stx activePid+0 sta activePid+1 ExitKernel = * ldy #pcbCountdown lda (activePid),y beq ++ sec sbc #1 sta (activePid),y beq + jmp - + ;check if null process ldy #pcbIsHead lda (activePid),y bpl + iny lda (activePid),y ;only run null if only proc bne -- + ;we've got a winner ;** restore full context and exit ldy #pcbD506 lda (activePid),y sta $d506 ldy #pcbStackPage lda (activePid),y sta mmuStackPage ldy #pcbSP lda (activePid),y tax txs ldy #pcbZeroPage lda (activePid),y sta mmuZeroPage jmp IrqExit DefaultExit = * lda #$00 ldx #$00 Exit = * ;( .A=code, .X=$00 ) jmp Suspend brk MyPid = * ;( ) : .AY=pid lda #$00 ldx mmuZeroPage sta mmuZeroPage lda activePid+0 ldy activePid+1 stx mmuZeroPage clc rts MyParentPid = * ;( ) : .AY=parentPid lda #$00 ldx mmuZeroPage sta mmuZeroPage ldy #pcbParent lda (activePid),y pha iny lda (activePid),y tay pla stx mmuZeroPage clc rts enterKernSave : buf 4 EnterKernel = * ;** set up process stack as if it had performed an interrupt ;** necessary if process will block ;** called as a one-level-deep subroutine of the system call sta enterKernSave+2 ;** save system-call return address pla sta enterKernSave+0 pla sta enterKernSave+1 ;** increment user-process return address (rts -> rti) pla clc adc #1 sta enterKernSave+3 pla adc #0 pha lda enterKernSave+3 pha ;** set up processor registers as-is, status $00 lda #$00 pha lda enterKernSave+2 pha txa pha tya pha lda $ff00 ;xxx change for multi-banks pha ;** save info into pcb lda mmuZeroPage ldx #$00 stx mmuZeroPage ldy #pcbZeroPage sta (activePid),y dey lda mmuStackPage sta (activePid),y dey tsx txa sta (activePid),y ldy #pcbD506 lda $d506 sta (activePid),y ;** restore system-call return address ;** (continue to use user-process stack) lda enterKernSave+1 pha lda enterKernSave+0 pha rts Suspend = * ;( ) ;suspend self sei jsr EnterKernel jsr SuspendSub jmp ExitKernel SuspendSub = * ;( activePid ) : activePid, pcbPtr, q=nullPcb ;** Remove the active pid from the ready queue and set another pid to ;** active; set pcbPtr to point to the suspended process; and set the ;** process state to "suspended". lda activePid+0 sta pcbPtr+0 lda activePid+1 sta pcbPtr+1 lda #<nullPcb ldy #>nullPcb sta q+0 sty q+1 jsr QueueUnlink ldy #pcbNext lda (pcbPtr),y sta activePid+0 iny lda (pcbPtr),y sta activePid+1 ldy #pcbState lda #STATE_SUSPENDED sta (pcbPtr),y rts Delay = * ;( .AY=jiffies ) : .CS=err cmp #0 bne + cpy #0 bne + clc rts + sei sta delayTime+0 sty delayTime+1 jsr EnterKernel jsr SuspendSub ldy #pcbState lda #STATE_DELAY ldy #pcbWakeupTime clc lda delayTime+0 adc jiffyTime+0 sta delayTime+0 sta (pcbPtr),y iny lda delayTime+1 adc jiffyTime+1 sta delayTime+1 sta (pcbPtr),y lda #0 rol sta delayTime+2 lda #<delayQueue ldy #>delayQueue sta q+0 sty q+1 sta p+0 sty p+1 jsr DelayFindSpot jsr QueueInsert jmp ExitKernel delayTime : buf 3 pTimeHi : buf 1 DelayFindSpot = * ;( (q)=queue, (p)=queueHead, (pcbPtr) ) : p=prevNode jsr IncPtrP ldy #pcbIsHead lda (p),y bne DelayFindSpotExit ldy #pcbWakeupTime lda (p),y cmp jiffyTime+0 iny lda (p),y sbc jiffyTime+1 ldx #0 bcs + inx + stx pTimeHi dey lda delayTime+0 cmp (p),y iny lda delayTime+1 sbc (p),y lda delayTime+2 sbc pTimeHi bcs DelayFindSpot DelayFindSpotExit = * ;xx fall through DecPtrP = * ;( (p) ) : (p):=(p)->prev ldy #pcbPrev lda (p),y tax iny lda (p),y stx p+0 sta p+1 rts IncPtrP = * ;( (p) ) : (p):=(p)->next ldy #pcbNext lda (p),y tax iny lda (p),y stx p+0 sta p+1 rts DelayIrqAwake = * lda delayQueue+pcbNext+0 ldy delayQueue+pcbNext+1 sta pcbPtr+0 sty pcbPtr+1 ldy #pcbWakeupTime lda (pcbPtr),y cmp jiffyTime+0 beq + rts + iny lda (pcbPtr),y cmp jiffyTime+1 beq + rts + lda #<delayQueue ldy #>delayQueue sta q+0 sty q+1 jsr QueueUnlink jsr MakeReady jmp DelayIrqAwake msgPtrSave : buf 2 Send = * ;( .AY=msgBuf ) : .CS:.A=err sei sta msgPtrSave+0 sty msgPtrSave+1 jsr EnterKernel jsr SuspendSub lda msgPtrSave+0 ldy msgPtrSave+1 sta msgPtr+0 sty msgPtr+1 ldy #msgTo lda (msgPtr),y sta q+0 iny lda (msgPtr),y sta q+1 ;xx should verify that receiver is a process ldy #pcbSendMsgPtr lda msgPtr+0 sta (pcbPtr),y iny lda msgPtr+1 sta (pcbPtr),y ldy #pcbBlockedOn lda q+0 sta (pcbPtr),y iny lda q+1 sta (pcbPtr),y ldy #pcbState lda (q),y cmp #STATE_RECEIVE beq SendToReceiverBlocked lda #STATE_SEND sta (pcbPtr),y clc lda q+0 adc #pcbSendQHead sta q+0 bcc + inc q+1 + ldy #pcbPrev lda (q),y sta p+0 iny lda (q),y sta p+1 jsr QueueInsert jmp ExitKernel SendToReceiverBlocked = * lda #STATE_REPLY sta (pcbPtr),y ldy #pcbRecvMsgPtr lda (q),y sta p+0 iny lda (q),y sta p+1 jsr CopyMessage lda pcbPtr+0 ldy pcbPtr+1 ldx q+0 stx pcbPtr+0 ldx q+1 stx pcbPtr+1 ldx #$00 clc jsr SetReturn jsr MakeReady jmp ExitKernel setretSave : buf 4 SetReturn = * ;( (pcbPtr)=proc, .AXY=regvals, .C=cval ) : (p)=junk sta setretSave+2 stx setretSave+1 sty setretSave+0 php pla and #$01 sta setretSave+3 ldy #pcbStackPage lda (pcbPtr),y sta p+1 ldy #pcbSP lda (pcbPtr),y clc adc #2 sta p+0 ldy #3 - lda setretSave,y sta (p),y dey bpl - rts CopyMessage = * ;( (pcbPtr)=sender, (msgPtr)=sendmsg, (p)=recvmsg ) ldy #msgFrom lda pcbPtr+0 sta (msgPtr),y iny lda pcbPtr+1 sta (msgPtr),y ldy #msgSize-1 - lda (msgPtr),y sta (p),y dey bpl - rts Receive = * ;( .AY=msgBuf ) : .AY=senderPid sei sta msgPtrSave+0 sty msgPtrSave+1 lda mmuZeroPage pha lda #$00 sta mmuZeroPage ldy #pcbSendQCount lda (activePid),y bne ReceiveFromSender pla sta mmuZeroPage jsr EnterKernel jsr SuspendSub lda #STATE_RECEIVE sta (pcbPtr),y ldy #pcbRecvMsgPtr lda msgPtrSave+0 sta (pcbPtr),y iny lda msgPtrSave+1 sta (pcbPtr),y jmp ExitKernel ReceiveFromSender = * ;( (activePid), (msgPtrSave) ) lda activePid+0 ldy activePid+1 clc adc #pcbSendQHead bcc + iny + sta q+0 sty q+1 ldy #pcbSendQHead lda (activePid),y sta pcbPtr+0 iny lda (activePid),y sta pcbPtr+1 jsr QueueUnlink ;( (q)=queueHead, (pcbPtr)=node ) ;uses p ldy #pcbSendMsgPtr lda (pcbPtr),y sta msgPtr+0 iny lda (pcbPtr),y sta msgPtr+1 lda msgPtrSave+0 ldy msgPtrSave+1 sta p+0 sty p+1 jsr CopyMessage ;( (pcbPtr)=sender, (msgPtr)=sendmsg, (p)=recvmsg ) ldy #pcbState lda #STATE_REPLY sta (pcbPtr),y ldx pcbPtr+0 ldy pcbPtr+1 pla sta mmuZeroPage txa cli clc rts zpPtrSave : buf 1 Reply = * ;( .AY=msgBuf[msgRet,msgData] ) : .CS:.A=err sei ;** switch to kernel ldx mmuZeroPage stx zpPtrSave ldx #$00 stx mmuZeroPage sta msgPtr+0 sty msgPtr+1 ;** find and check the sender ldy #msgFrom lda (msgPtr),y sta pcbPtr+0 iny lda (msgPtr),y sta pcbPtr+1 ;xx verify that receiver is a pcb here ldy #pcbState lda (pcbPtr),y cmp #STATE_REPLY beq + - lda #KERN_ERR_PID_NOT_REPLY ldx zpPtrSave stx mmuZeroPage sec cli rts + ldy #pcbBlockedOn lda (pcbPtr),y cmp activePid+0 bne - iny lda (pcbPtr),y cmp activePid+1 bne - ;** copy the reply contents ldy #pcbSendMsgPtr lda (pcbPtr),y sta p+0 iny lda (pcbPtr),y sta p+1 ldy #msgRet lda (msgPtr),y sta (p),y ldy #msgData - lda (msgPtr),y sta (p),y iny cpy #msgData+4 bcc - ;** wake up the sender and exit jsr MakeReady ldx zpPtrSave stx mmuZeroPage clc cli rts ;======== test application ======== testNumber : buf 1 Init = * lda #1 sta testNumber lda #<TestSid1 ldy #>TestSid1 ldx #2 jsr Create lda #<TestDelay1 ldy #>TestDelay1 ldx #1 jsr Create lda #<TestDelay2 ldy #>TestDelay2 ldx #1 jsr Create lda #<TestDelay3 ldy #>TestDelay3 ldx #1 jsr Create lda #<TestDelay4 ldy #>TestDelay4 ldx #1 jsr Create lda #<TestDelay5 ldy #>TestDelay5 ldx #1 jsr Create lda #<Blabber1 ldy #>Blabber1 ldx #1 jsr Create lda #<Spinner1 ldy #>Spinner1 ldx #1 jsr Create jmp KernelServer TestSid1 = * ldx #$1c-1 lda #$00 - sta $d400,x dex bpl - lda #$50 sta 2 sta 3 lda #$08 sta $d418 lda #$00 ldy #$08 sta $d402 sty $d403 lda #$41 sta $d404 lda #$00 sta $d405 lda #$f0 sta $d406 - lda 2 ldy 3 sta $d400 sty $d401 lda 2 ora 3 bne + lda #120 ldy #0 jsr Delay + inc 2 bne + inc 3 + inc $d020 tsx jmp - TestDelay1 = * jsr MyParentPid sta testDelay1Msg+msgTo+0 sty testDelay1Msg+msgTo+1 lda #<testDelay1Txt ldy #>testDelay1Txt sta testDelay1Msg+msgBuf+0 sty testDelay1Msg+msgBuf+1 - lda #<60 ldy #>60 jsr Delay inc $581 lda #<testDelay1Msg ldy #>testDelay1Msg jsr Send jmp - testDelay1Txt : db "Hi, this is delay process 1 *\n",0 TestDelay2 = * jsr MyParentPid sta testDelay2Msg+msgTo+0 sty testDelay2Msg+msgTo+1 lda #<testDelay2Txt ldy #>testDelay2Txt sta testDelay2Msg+msgBuf+0 sty testDelay2Msg+msgBuf+1 - lda #<120 ldy #>120 jsr Delay inc $582 lda #<testDelay2Msg ldy #>testDelay2Msg jsr Send jmp - testDelay2Txt : db "Hi, this is delay process 2\n",0 TestDelay3 = * jsr MyParentPid sta testDelay3Msg+msgTo+0 sty testDelay3Msg+msgTo+1 lda #<testDelay3Txt ldy #>testDelay3Txt sta testDelay3Msg+msgBuf+0 sty testDelay3Msg+msgBuf+1 - lda #<180 ldy #>180 jsr Delay inc $583 lda #<testDelay3Msg ldy #>testDelay3Msg jsr Send jmp - testDelay3Txt : db "Hi, this is delay process 3\n",0 TestDelay4 = * jsr MyParentPid sta testDelay4Msg+msgTo+0 sty testDelay4Msg+msgTo+1 lda #<testDelay4Txt ldy #>testDelay4Txt sta testDelay4Msg+msgBuf+0 sty testDelay4Msg+msgBuf+1 - lda #<240 ldy #>240 jsr Delay inc $584 lda #<testDelay4Msg ldy #>testDelay4Msg jsr Send jmp - testDelay4Txt : db "Hi, this is delay process 4\n",0 TestDelay5 = * jsr MyParentPid sta testDelay5Msg+msgTo+0 sty testDelay5Msg+msgTo+1 lda #<testDelay5Txt ldy #>testDelay5Txt sta testDelay5Msg+msgBuf+0 sty testDelay5Msg+msgBuf+1 - lda #<300 ldy #>300 jsr Delay inc $585 lda #<testDelay5Msg ldy #>testDelay5Msg jsr Send jmp - testDelay5Txt : db "Hi, this is delay process 5\n",0 Blabber1 = * jsr MyParentPid sta blabber1Msg+msgTo+0 sty blabber1Msg+msgTo+1 lda #<blabber1Txt ldy #>blabber1Txt sta blabber1Msg+msgBuf+0 sty blabber1Msg+msgBuf+1 - inc $580 lda #<blabber1Msg ldy #>blabber1Msg jsr Send jmp - blabber1Txt : db "Hi, this is blabber\n",0 Spinner1 = * jsr MyParentPid sta spinner1Msg+msgTo+0 sty spinner1Msg+msgTo+1 lda #<spinner1Txt ldy #>spinner1Txt sta spinner1Msg+msgBuf+0 sty spinner1Msg+msgBuf+1 - inc $580 lda #<spinner1Msg ldy #>spinner1Msg jsr Send jmp - spinner1Txt : db "Hi, this is spinner +\n",0 KernelServer = * lda #$00 sta mmuZeroPage lda #14 jsr $ffd2 - lda #<ksMsg ldy #>ksMsg jsr Receive lda ksMsg+msgBuf+0 ldy ksMsg+msgBuf+1 sta $80 sty $81 ldy #0 - lda ($80),y beq + jsr $ffd2 iny bne - + lda #<ksMsg ldy #>ksMsg jsr Reply jmp -- bss = * testDelay1Msg = $c00 ;** put these here to save pgm memory testDelay2Msg = testDelay1Msg+msgSize testDelay3Msg = testDelay2Msg+msgSize testDelay4Msg = testDelay3Msg+msgSize testDelay5Msg = testDelay4Msg+msgSize blabber1Msg = testDelay5Msg+msgSize spinner1Msg = blabber1Msg+msgSize ksMsg = spinner1Msg+msgSize -----=----- Here is the executable binary for the demo system.