It is always useful to start with a simple example that exhibits the properties we want to focus on. In this airbag video:
it is possible to see two airbags: one that inflates at the proper time and another that deploys late. This illustrates the essential character of a Cyber-Physical System: it needs to execute the steps correctly (inflate the airbag) and execute them at the proper time, in sync with the physical process (the driver traveling toward the steering wheel in a crash).
Below are some of the projects we work on.
Zero-Slack Scheduling is a scheduling framework for real-time mixed-criticality systems. Specifically, it targets systems where the utilization-based scheduling priorities are not aligned with the criticality of the tasks. With this framework we implemented a family of schedulers, resource allocation protocols, and synchronization protocols to support the scheduling of mixed-criticality systems.
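To make the idea concrete, below is a minimal sketch of the zero-slack mode switch under a simplified two-task system; the task names, the numbers, and the on_zero_slack handler are illustrative assumptions, not the framework's actual API. The intuition is that each critical task is assigned, offline, a zero-slack instant after its release; if the task is still running when that instant arrives, the scheduler switches it to critical mode and suspends lower-criticality tasks so the critical task gets the remaining time alone.

```c
#include <stdio.h>
#include <stdbool.h>

struct task {
    const char *name;
    int criticality;    /* higher value = more critical                 */
    double zero_slack;  /* instant (ms after release) of the mode switch */
    bool finished;
};

/* Invoked when task t reaches its zero-slack instant while still
 * running: switch it to critical mode and suspend every task of
 * lower criticality until t completes. */
static void on_zero_slack(struct task *t, struct task *all, int n)
{
    if (t->finished)
        return;  /* no overload: keep running in normal mode */
    printf("%s enters critical mode at t = %.1f ms\n", t->name, t->zero_slack);
    for (int i = 0; i < n; i++)
        if (all[i].criticality < t->criticality && !all[i].finished)
            printf("  suspending %s\n", all[i].name);
}

int main(void)
{
    struct task tasks[] = {
        { "airbag",  2, 4.0, false },  /* safety-critical task   */
        { "logging", 1, 9.0, false },  /* lower-criticality task */
    };
    /* Simulate an overload: at 4.0 ms the airbag task is unfinished. */
    on_zero_slack(&tasks[0], tasks, 2);
    return 0;
}
```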
Safety standards like DO-178B for avionics and ISO 26262 for automotive include safety levels that map to criticality levels. For a discussion of the use of mixed-criticality scheduling in automotive systems with AUTOSAR and ISO 26262, please see the paper: Applying the AUTOSAR timing protection to build safe and efficient ISO 26262 mixed-criticality systems
Zero-Slack Q-RAM combines ZS-RM and Q-RAM to enable overbooking (cycles allocated to more than one task) not only between tasks of different criticality but also between tasks with different utility to the mission of the system. We implemented three versions of the ZS-QRAM scheduler: a modification to the Linux/RK kernel (and kernel module), an independent kernel module implementation, and a daemon-based implementation.
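As a rough illustration of the Q-RAM side, the following sketch performs a greedy marginal-utility allocation over QoS levels, in the spirit of Q-RAM; the task set, budgets, and utility values are hypothetical, and the real ZS-QRAM implementation additionally accounts for criticality and overbooking.

```c
#include <stdio.h>

#define NTASKS 2
#define LEVELS 3

struct task {
    const char *name;
    double budget[LEVELS];   /* CPU share needed at each QoS level */
    double utility[LEVELS];  /* mission utility at each QoS level  */
    int    level;            /* currently allocated level (-1 = none) */
};

int main(void)
{
    /* Hypothetical surveillance-mission tasks (see the demo above). */
    struct task t[NTASKS] = {
        { "video",  {0.10, 0.25, 0.50}, {3.0, 5.0, 6.0}, -1 },
        { "object", {0.20, 0.30, 0.60}, {4.0, 7.0, 8.0}, -1 },
    };
    double cpu = 1.0;  /* remaining CPU share */

    /* Each round, grant the next QoS upgrade to the task with the
     * highest marginal utility per cycle, until nothing fits. */
    for (;;) {
        int best = -1;
        double best_ratio = 0.0;
        for (int i = 0; i < NTASKS; i++) {
            if (t[i].level + 1 >= LEVELS)
                continue;
            double du = t[i].utility[t[i].level + 1] -
                        (t[i].level >= 0 ? t[i].utility[t[i].level] : 0.0);
            double dc = t[i].budget[t[i].level + 1] -
                        (t[i].level >= 0 ? t[i].budget[t[i].level] : 0.0);
            if (dc <= cpu && du / dc > best_ratio) {
                best_ratio = du / dc;
                best = i;
            }
        }
        if (best < 0)
            break;
        cpu -= t[best].budget[t[best].level + 1] -
               (t[best].level >= 0 ? t[best].budget[t[best].level] : 0.0);
        t[best].level++;
        printf("%s -> level %d (cpu left %.2f)\n",
               t[best].name, t[best].level, cpu);
    }
    return 0;
}
```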
We developed a number of experiments to showcase the effects of the scheduler on a drone mission. First, we demonstrate how incorrect scheduling can actually crash a drone. This is shown in this video.
Second, we showcase how, in a full mission, ZS-QRAM not only preserves the safety of the flight but also maximizes the utility of the mission. In particular, the demo shows a surveillance mission where a video-streaming task and an object-recognition task are dynamically adjusted according to their utility to the mission. This video shows this case.
Multicore processors are quite different from multiprocessors because cores within a processor share resources. One of the most critical shared resources is the memory system, which includes both the shared cache and shared RAM. The memory interference that a task running on one core inflicts on a task running on a different core can be really significant. We have seen extreme cases of 12X increases in execution time due to memory interference (as can be seen in the figure below), and some practitioners have observed 3X. A 3X slowdown basically means that, on a dual-core processor, I am better off shutting down a core to avoid a DECREASE in execution speed.
Once the interference problem due to shared resources is solved, we need to support new tasks with parallelized jobs that need more than one core to complete before their deadline. New scheduling algorithms are necessary to schedule these tasks, and they need to be combined with memory partitions to maximize utilization and guarantee time predictability.
At the SEI we have been working with the CMU RTML on the memory problem, creating partitioning mechanisms to eliminate or reduce this interference, along with analysis algorithms that take residual effects into account.
A key mechanism we use is page coloring. Page coloring takes advantage of the virtual memory system, which translates virtual addresses to physical addresses, to assign physical addresses that do not interfere with each other to tasks running on different cores. This mechanism is combined with characteristics of the memory hardware that divide the memory into areas that do not interfere with each other. Different mechanisms exist for the cache and for main memory.
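As a concrete (and hypothetical) illustration of how an OS can steer physical addresses, the sketch below computes a page's color from the bits of its physical frame number; the bit positions are made up for the example and differ across processors.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch only: page coloring works by choosing, for each
 * task, only physical frames whose "color bits" match the color
 * assigned to that task. The constants below are assumptions. */

#define PAGE_SHIFT 12   /* 4 KiB pages          */
#define COLOR_BITS 5    /* e.g., 32 page colors */

/* Color of a physical address: the low bits of the physical frame
 * number that also select a cache set (or a DRAM bank). */
static unsigned page_color(uint64_t phys_addr)
{
    return (phys_addr >> PAGE_SHIFT) & ((1u << COLOR_BITS) - 1);
}

int main(void)
{
    /* Frames with different colors map to disjoint cache regions, so
     * tasks using them cannot evict each other's cache blocks. */
    uint64_t frame_a = 0x0001F000;  /* color 31 */
    uint64_t frame_b = 0x00020000;  /* color 0  */
    printf("color(a)=%u color(b)=%u\n",
           page_color(frame_a), page_color(frame_b));
    return 0;
}
```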
A cache is fast memory used by the memory system to hold memory blocks that are frequently used. This means that the first time a variable is read by the CPU, the variable is loaded into the cache, and any following access to it is served from the cache at a much faster speed. However, caches are much smaller than main memory, and as a program executes it stops using some variables. The cache system tracks which variables have not been accessed recently and selects their cache blocks to be replaced by newly accessed variables when there is no empty room in the cache. When a variable from a task on one core evicts the cache block of a variable from a task on another core, it delays the execution of the latter task, which needs to go all the way back to main memory to access its variable again.
Most cache hardware divides the cache into sets of cache blocks in what is known as set associativity. Each set is restricted to serve a certain region of physical memory and no other. We take advantage of this to ensure that the physical memory used by one task on one core belongs to one of these regions while the memory of another task running on a different core belongs to a different one. This is what is known as cache coloring, and it effectively creates cache partitions.
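The following sketch shows how a cache color falls out of the set-index computation for a hypothetical cache geometry (2 MiB, 16-way, 64-byte lines, hence 2048 sets); the numbers are illustrative, not those of any particular processor.

```c
#include <stdint.h>
#include <stdio.h>

/* With this hypothetical geometry the set index occupies address bits
 * [6..16]; bits [12..16] lie above the 4 KiB page offset, so the OS
 * controls them through frame selection. Those five bits form the
 * cache color. */

#define LINE_SHIFT 6      /* 64-byte cache lines          */
#define NUM_SETS   2048   /* 2 MiB / 64 B / 16 ways       */
#define PAGE_SHIFT 12     /* 4 KiB pages                  */

static unsigned cache_set(uint64_t addr)
{
    return (addr >> LINE_SHIFT) % NUM_SETS;
}

/* The color is the part of the set index the OS can steer: the
 * set-index bits at or above the page boundary. */
static unsigned cache_color(uint64_t addr)
{
    unsigned sets_per_page = 1u << (PAGE_SHIFT - LINE_SHIFT); /* 64 */
    return cache_set(addr) / sets_per_page;  /* 2048/64 = 32 colors */
}

int main(void)
{
    uint64_t a = 0x00003040;  /* same page offset...               */
    uint64_t b = 0x00013040;  /* ...but a frame with another color */
    printf("color(a)=%u color(b)=%u\n", cache_color(a), cache_color(b));
    return 0;
}
```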
While cache coloring provides a great benefit in reducing interference across cores, it is not enough to solve the problem. Main memory is another source of interference that can be significant. In fact, the slowdown shown in the figure above is due to main-memory interference, not cache interference. Main memory is divided into regions called banks. These banks in turn are organized into rows and columns. Whenever a task running on a core accesses an address in main memory, the address is first decoded to extract three pieces of information (from specific bits of the address): (i) the bank number, (ii) the row number, and (iii) the column number. The bank number is used to select the bank where the memory block is located. The memory controller then loads the corresponding row from that bank into a row buffer within the bank for faster access. Finally, the memory block is accessed at the column indicated by the column number from the row buffer. This can be seen in the figure below.
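The sketch below decodes an address under a hypothetical bank/row/column bit assignment (8 banks, 8 KiB rows); real memory controllers each define their own map, so the bit positions here are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed address map for this example:
 *   bits [ 0..12] -> column (byte within an 8 KiB row)
 *   bits [13..15] -> bank number (8 banks)
 *   bits [16..  ] -> row number                          */

struct dram_loc { unsigned bank, row, col; };

static struct dram_loc decode(uint64_t phys_addr)
{
    struct dram_loc l;
    l.col  =  phys_addr        & 0x1FFF;  /* low 13 bits */
    l.bank = (phys_addr >> 13) & 0x7;     /* next 3 bits */
    l.row  =  phys_addr >> 16;            /* remaining   */
    return l;
}

int main(void)
{
    /* Two addresses in the same bank but different rows: accessing
     * them alternately forces the row buffer to be reloaded each
     * time, which is exactly the cross-core interference pattern
     * described in the text. */
    struct dram_loc a = decode(0x0002A100);
    struct dram_loc b = decode(0x0004A100);
    printf("a: bank=%u row=%u col=%u\n", a.bank, a.row, a.col);
    printf("b: bank=%u row=%u col=%u\n", b.bank, b.row, b.col);
    return 0;
}
```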
Because the memory controller is optimized to maximize memory accesses per second, it takes advantage of the row buffer and favors accesses that go to the currently loaded row. Unfortunately, this means that when task 1 on one core is accessing a row (already loaded in the row buffer) while task 2 running on another core is trying to access a different row in the same bank, the access from task 2 can be pushed back in the memory access queue multiple times by more recent accesses from task 1 to the already loaded row, creating an important delay for task 2.
Memory bank partitions are created by mapping the memory of different tasks to different memory banks. This way, each task can have its own bank and row buffer, and no other task will modify that buffer or the queue of memory accesses to that bank.
Because cache and memory-banking technologies were not developed together, more often than not their partitions intersect. In other words, it is not possible to select a bank color independently of a cache color, because the selection of a cache color may limit the number of bank colors available. This is because in some processor architectures the address bits used to select a bank and the bits used to select a cache set share some elements. To illustrate this issue, consider a memory system with four banks and four cache sets. In this case, we need two address bits to select a bank and two bits to select a cache set. If they were independent, we would be able to select four cache colors for each bank color, for a total of 16 combinations. This can be seen as a matrix (a color matrix) where rows are cache colors and columns are bank colors. However, if they share one bit, then in reality we only have 2^3 = 8 colors. This means that some of the cells in the color matrix will not be real. This is shown in the figure below
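The following small program enumerates the color matrix for this example, assuming (hypothetically) that the high bank bit and the low cache-set bit are the same physical address bit; only the combinations that agree on the shared bit are realizable, leaving 8 real cells out of 16.

```c
#include <stdio.h>

/* Four cache colors (rows) x four bank colors (columns), each
 * selected by two address bits, with one bit shared between them:
 * a cell is "real" only if both selections agree on the shared bit. */
int main(void)
{
    for (int cache = 0; cache < 4; cache++) {
        for (int bank = 0; bank < 4; bank++) {
            int shared_from_cache = cache & 1;        /* low cache bit */
            int shared_from_bank  = (bank >> 1) & 1;  /* high bank bit */
            printf("%s ", shared_from_cache == shared_from_bank
                              ? "real" : "----");
        }
        printf("\n");
    }
    return 0;
}
```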
We developed a coordinated allocation approach that allocates cache colors, bank colors, and processors to tasks in order to avoid cache/bank color conflicts and to maximize the effectiveness of the memory partitions, taking into account the difference between inter-core and intra-core interference.
We developed memory reservations with cache and memory partitions in the Linux/RK OS.
Unfortunately, the number of partitions that can be obtained with page coloring is limited. For instance, on an Intel i7-2600 processor it is possible to obtain 32 cache colors and 16 bank colors. Given that in practice we may have a larger number of tasks (say 100), this number of partitions may prove insufficient for a real system. As a result, it is important to also enable the sharing of partitions whenever the memory bandwidth requirements of the tasks allow it. However, this sharing must be done in a predictable way, ensuring that we can guarantee meeting the tasks' deadlines. At the same time, it is important to avoid pessimistic over-approximations in order not to waste the processor cycles we were trying to save in the first place. For this case we developed an analysis algorithm that allows us to verify the timing interference of private and shared memory partitions.
Beyond solving the resource-sharing problem, we also need to enable the execution of parallelized tasks. For this we developed a global EDF scheduling algorithm for parallelized tasks with staged execution. These tasks generate jobs composed of a sequence of stages that in turn are composed of sets of parallel segments. These segments are allowed to run in parallel with each other provided that all the segments from the previous stage have completed (or, for the segments of the first stage, the job has just arrived). Our algorithm allows us to verify the schedulability of these tasks under a global EDF scheduler. Beyond EDF, it is possible to use this algorithm with a global fixed-priority scheduler with synchronous start, harmonic periods, and implicit deadlines, a common configuration used by practitioners.
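To make the task model concrete, here is a minimal sketch of a staged task together with the two quantities that schedulability tests for parallel tasks typically start from, total work and critical-path length; the structure and the analyze helper are illustrative assumptions, not the paper's algorithm.

```c
#include <stdio.h>

#define MAX_STAGES 8

/* A job is a sequence of stages; each stage is a set of segments
 * that may run in parallel once the previous stage has completed. */
struct staged_task {
    int    num_stages;
    int    segments[MAX_STAGES];  /* parallel segments per stage */
    double seg_wcet[MAX_STAGES];  /* WCET of one segment (ms)    */
    double period;                /* implicit deadline == period */
};

/* Total work sums every segment; the critical path takes one segment
 * per stage, since stages must execute sequentially. */
static void analyze(const struct staged_task *t)
{
    double work = 0.0, span = 0.0;
    for (int s = 0; s < t->num_stages; s++) {
        work += t->segments[s] * t->seg_wcet[s];
        span += t->seg_wcet[s];
    }
    printf("utilization=%.2f span/deadline=%.2f\n",
           work / t->period, span / t->period);
}

int main(void)
{
    /* 3 stages: 1 segment, then 4 parallel segments, then 2. */
    struct staged_task t = { 3, {1, 4, 2}, {2.0, 3.0, 1.0}, 20.0 };
    analyze(&t);  /* work = 1*2 + 4*3 + 2*1 = 16; span = 6 */
    return 0;
}
```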