Design

In the design for our RTOS we incorporated ideas from multiple different sources: online examples, other students reports and other real-time operating systems. These are the decisions we made during the development of the RTOS.

Active Kernel vs Shared Library Kernel

We were given code using two different approaches to examine. The "active" kernel version uses a separate stack for the kernel, namely the original stack at the end of SRAM. Each task has its own stack allocated in an array in the data section near the top of SRAM. In the "shared library" kernel version, the kernel uses the stack of the current task.

The advantage of the "shared" approach is the response time of the kernel to a system call. Rather than having to spend time restoring a kernel "context," the kernel operates outside any context. The disadvantage of this approach is that the stacks of the individual tasks must be large enough to accommodate the kernel. Also, the scheduling becomes more complicated; each system call must select the next task to run.

We didn't attempt the shared library approach. Instead, we chose the active kernel approach.

Task and Kernel Context

In abstract terms, a task in a multi-tasking system consists of two parts: code and context.

The code is the sequence of machine instructions that are executed by the processor when the task is "running." The context is the description of the state of the processor (not including memory) between the execution of instructions.

From the point of view of a task, it alone controls the entire processor. So when an interrupt occurs to switch out the task, the processor's state must be recorded somewhere so that it can be restored when the task is resumed.

There are only a few registers that describe the processor's state:

PC - the program counter register,
SP - the stack pointer register,
SREG - the status register, whose bits indicate the results of the last instruction executed and whether interrupts are enabled
r0 to r31 the general purpose I/O registers

These are the items to be saved during a context switch.

We chose to follow the sample code and save:

the program counter, PC, on the stack with a call instruction
the 32 I/O registers, r0 to r31, on the stack
the status register, SREG, on the stack
the stack pointer, SP, in a variable

See the context switching page. We could have chosen some other location, such as individual variables for each item for each task for example. But the assembly instructions for pushing and popping the stack are quite fast. Also, historically, many CISC processors have an instruction that saves the context on the stack, although RISC processors such the AT90 do not. We followed tradition and used the stack.

The stack for each task lies within an array defined inside a struct, called a task descriptor, which collects all the data pertaining to the task. All of the task descriptors lie in an array of these structs. Since we know the maximum number of tasks at compile-time, all this data is statically defined, and fixed before the application is run.

Another option, which we rejected, is to dynamically allocate storage for tasks as they are created with calls to malloc(). This is never a good idea in an embedded system. With its limited resources, an embedded system couldn't practically implement virtual memory or swap space nor any other mechanism used by larger operating systems to abstract the physical memory. More importantly, for a real-time system, we value the predictability of the response time when creating a task.

Queues as Linked Lists

Because the design of the OS in the os.h file provided to us, the natural data structure to employ for scheduling tasks is a set of singly-linked lists. For each of the queues we need:

RR task ready queue, scheduled round-robin
SYSTEM task ready queue, scheduled first-come, first-served
queues of blocked tasks waiting for a signal
the dead pool of empty task descriptors

Tasks always enter at one end of the queue, and leave from the other. Minor modifications to the OS design would change this. For example, if one task could cause some other task to terminate, then the killed task might have to be removed from the middle of a queue.

For the elements of the lists, we chose to use the task descriptors themselves. We merely added a pointer variable inside the struct to point to the next task in whichever queue the task is in at the moment. The lists are accessed with pointers to the head and tail via queue() and dequeue() functions.

System Calls

We wanted the kernel to respond to one system call at a time. So each of the public system calls first disables interrupts before setting a variable to indicate the nature of the call, then causes the processor to switch to the kernel's context. Because interrupts are disabled, no interrupt service routine, or ISR, can overwrite that variable before the kernel reads it. So there only needs to be one copy of a variable to hold the current request to the kernel. By the same reasoning, there need be only one copy of the arguments and return values needed for the various calls.

OS Idling

When there is no task to run, the processor still must do something as the timers tick. But in what context should the processor run?

The two choices we considered were the kernel context, and a separate "idle task" context. The advantage of using the kernel context is that when an interrupt does occur that has an interrupt service routine that causes some task to become ready to run, the kernel can respond much more quickly. If the processor is idling in a useless task, then the processor must spend time switching out of that context and into the kernel's. As well, the idle task will need its own stack which consumes memory. In the first choice, the processor idles using the kernel's stack.

But ultimately, we chose the second option. The reason is that the OS specification does not stipulate if or how ISRs can make system calls, and there is no way to tell if a call came from inside one. In order to use the option of idling within the kernel, we need to specify certain rules for system calls within ISRs. For example, only allow signals or broadcasts on events, and then only once as the last statement of the ISR.

ISRs and System Calls

Because there are no restrictions on interrupt service routines calling system calls, we implicitly allow them. But then we have to decide the semantics of the calls. Specifically, in which context, and on whose behalf, is the call being made?

One answer is that the ISR is operating outside of every task. This is the usual notion of ISR such as used in device drivers on other systems, for example. The system is responding to stimuli that are independent of the task. But these ISRs do not normally make system calls.

Since we implicitly allow all system calls, it makes sense to consider that an ISR makes a system call on behalf of the currently running task. This makes no difference for Task_Create() or Event_Signal(), but Event_Wait() or Task_Terminate() have a profound effect on the current task. And we could envision application designers using ISRs in this way. For example, system level task scheduling could be changed to be round-robin by creating an ISR that periodically calls Task_Next(). Or an ISR could kill random tasks by calling Task_Terminate(), or make them wait with Event_Wait().

As well, an ISR might make two system calls, such as signaling on an event exactly twice. In this case we need someplace for the first system call to return to. If some task is running, then the solution is apparent: use the current task's context to save the state.

But if no task is running and the OS is idling, we can't use the same solution if the idling is occurring in the kernel's own context. With the restrictions suggested above, we could just forget the rest of the ISR and never return to it without any problem. But since we do allow unconventional ISRs, the only solution we could find is to use an idle task and give it its own context and stack space.

Event Handles

In the os.h file, Event_Init() conveniently returns a pointer to a struct. The intention seems to make it easy to access the event's waiting queues in the other Event system calls. But it is generally a dangerous practice to allow tasks to access internal kernel variables. What would happen if the user task signaled with a pointer that didn't actually point to such a structure? How could the kernel code check?

Since we are forced to used pointers to structs, but don't want to, we employ a simple subterfuge. The "pointers" referred to in the Event system calls are merely integers cast as pointers, with values from 0 to num_events_created - 1. Then it becomes easy to check if the pointer is valid; simply cast it to an integer and check if it is in the right range.

This technique is generally known as using handles to objects, where the handle must be converted into a pointer inside the private code. It takes a few extra steps to do this, but the code eliminates many hard-to-find errors.