In this recipe 3.1, we describe
architecture models where computing units (i.e. cores) have no
shared hardware (i.e. memory units). Each computing unit has
its own memory, which means that threads running on different
computing units cannot share software resources such as data components.
By computing unit, we mean a hardware unit able to run a program sequentially.
It may be a unicore processor or a core in a multicore/manycore architecture.
In this recipe, we do not address the interconnection between cores and memory units: this aspect is
presented in recipe 4.3.
We introduce three typical solutions for this recipe; in each of them, a processor is assumed
to have a unique computing unit:
- A partitioned multiprocessor solution with independent processors.
Each processor has its own memory unit.
Each processor has its own scheduler.
Then,
each thread is located on a given processor, cannot migrate, and can only share
resources with the other threads of the same processor. In this solution, each thread competes
only with the threads located on the same processor for access to that processor.
This solution can be used to model, for example,
a set of boards connected by a bus where communication latency is
negligible and where each board hosts a unicore processor.
- A partitioned multiprocessor solution with processors sharing the same memory unit.
Threads are still not allowed to migrate.
Each processor has its own scheduler.
Again, each thread competes
only with the threads located on the same processor for access to that processor.
However, as the processors share the same memory unit, threads located on different processors
can share software resources.
Processor and memory units can be connected either with a bus, a
crossbar switch or by a network-on-chip.
This architecture solution can be used to
model, for example, a multicore architecture based on the concept of task affinity (XXX 2010).
- A global multiprocessor solution. This solution has the highest level of flexibility.
Processors share the same memory unit.
There is only one scheduler, which allocates all the threads on all processors.
Then, threads can migrate and may share various software resources.
Processor and memory units can be connected either with a bus, a crossbar switch
or by a network-on-chip.
This architecture solution can be used to
model, for example, a multicore architecture such as an SMP with global scheduling.
Cheddar properties
To model these various architectures, Cheddar proposes the set of properties given below:
------------------------------------------------------------
-- This property is used to tag a system component
-- to specify whether it implements a SoC and what kind of SoC
------------------------------------------------------------
Supported_Soc_Type : type enumeration (
SoC_Processing_Unit, SoC_Memory_Unit, Soc_Interconnection_Unit);
System_Soc_Type : Supported_SoC_Type applies to (system);
---------------------------------------------------
-- Property for Processing units modeling
---------------------------------------------------
Supported_Instruction_Set_Architecture_Type : type enumeration (
i386, powerpc, risc5, sparc1, sparc2, sparc3,
sparc4, sparc5, sparc6, sparc7, sparc8);
Instruction_Set_Architecture_Type : Cheddar_Multicore_Properties::Supported_Instruction_Set_Architecture_Type applies to (memory, system);
Supported_Processors_type : type enumeration (
Unicore_type, Identical_Multicores_Type, Uniform_Multicores_Type,
Unrelated_Multicores_Type );
Processors_type : Cheddar_Multicore_Properties::Supported_Processors_type applies to (memory, system);
Supported_Migrations_Type : type enumeration (
No_Migration_Type, Job_Level_Migration_Type, Time_Unit_Migration_Type);
Migrations_Type : Cheddar_Multicore_Properties::Supported_Migrations_Type applies to (memory, system);
Supported_Multiprocessors_Type : type enumeration (
Identical, Homogeneous, Heterogeneous);
Multiprocessors_Type : Cheddar_Multicore_Properties::Supported_Multiprocessors_Type applies to (memory, system);
-- Memory space isolation
-- Defines whether a processor may have its own private address space
-- Related to the capability of the hardware to isolate threads' memory access flows (virtual memory contexts, address space numbers, etc.)
--
Implement_Runtime_Protection: aadlboolean applies to (processor,system);
-- Maximum number of instructions run per second,
-- i.e. Peak_MIPS is an upper bound on the processor speed
-- This is a crude way to express the speed of a processor, but the limits of this metric are well known
--
Peak_MIPS : aadlinteger applies to (processor,system);
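As an illustration, a minimal sketch of how some of these properties can be attached to a system component is given below. The package and component names used here (soc_example, a_soc) are only illustrative and are not part of the Cheddar distribution:
package soc_example
public
with Cheddar_Multicore_Properties;
-- A system component standing for a SoC made of processing units (illustrative)
system a_soc
end a_soc;
system implementation a_soc.impl
properties
-- Tag the component as a SoC processing unit
Cheddar_Multicore_Properties::System_Soc_Type => SoC_Processing_Unit;
-- The processing units are identical and threads are not allowed to migrate
Cheddar_Multicore_Properties::Multiprocessors_Type => Identical;
Cheddar_Multicore_Properties::Migrations_Type => No_Migration_Type;
end a_soc.impl;
end soc_example;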
1. Partitioned multiprocessor solution with independent processors
An example of the solution with
independent processors and private memory
(can be downloaded here) is:
package partitionned
public
processor core
properties
Scheduling_Protocol => (RM);
end core;
thread get_line
properties
Dispatch_Protocol => Periodic;
Period => 125 ms;
Deadline => 125 ms;
Compute_Execution_Time => 1 ms .. 25 ms;
Priority => 10 ;
end get_line;
thread implementation get_line.Impl
end get_line.Impl;
thread edge
properties
Dispatch_Protocol => Periodic;
Period => 125 ms;
Deadline => 125 ms;
Compute_Execution_Time => 25 ms .. 25 ms;
Priority => 9 ;
end edge;
thread implementation edge.Impl
end edge.Impl;
thread sharp
properties
Dispatch_Protocol => Periodic;
Period => 250 ms;
Deadline => 250 ms;
Compute_Execution_Time => 25 ms .. 25 ms;
Priority => 8 ;
end sharp;
thread implementation sharp.Impl
end sharp.Impl;
process Soft
end Soft;
process implementation Soft.Impl
subcomponents
edge : thread edge.Impl;
get_line : thread get_line.Impl;
sharp : thread sharp.Impl;
end Soft.Impl;
system partitionned
end partitionned;
system implementation partitionned.Impl
subcomponents
soft1 : process soft.Impl;
soft2 : process soft.Impl;
cpu1 : processor core;
cpu2 : processor core;
properties
Actual_Processor_Binding
=> (reference(cpu1))
applies to soft1;
Actual_Processor_Binding
=> (reference(cpu2))
applies to soft2;
end partitionned.Impl;
end partitionned;
This example is fully compliant with AADL V2 and does not need any of the specific Cheddar properties.
The only noticeable part is the component partitionned.Impl, in which the
partitioning of the threads is defined by two processes (called soft1 and soft2)
and two processors (called cpu1 and cpu2).
The assignment of each process to its processor is given by the two following property associations:
Actual_Processor_Binding
=> (reference(cpu1)) applies to soft1;
Actual_Processor_Binding
=> (reference(cpu2)) applies to soft2;
2. Partitioned multiprocessor solution with processors sharing the same memory unit
We now define solution 2 by extending solution 1 with a memory unit shared by the processors.
A typical model for this solution
(can be downloaded here) can be:
package core_affinity
public
with multicore_crossbar_units;
thread ordo_bus
properties
Dispatch_Protocol => Periodic;
Period => 125 ms;
Deadline => 125 ms;
Compute_Execution_Time => 1 ms .. 25 ms;
Priority => 10 ;
end ordo_bus;
thread implementation ordo_bus.Impl
end ordo_bus.Impl;
thread pilotage
properties
Dispatch_Protocol => Periodic;
Period => 250 ms;
Deadline => 250 ms;
Compute_Execution_Time => 25 ms .. 25 ms;
Priority => 8 ;
end pilotage;
thread implementation pilotage.Impl
end pilotage.Impl;
system implementation core_affinity.Impl
subcomponents
a_process : process application.Impl;
cpu : system multicore_crossbar_units::dual_core.impl;
properties
Actual_Processor_Binding
=> (reference(cpu.core1))
applies to a_process.t1;
Actual_Processor_Binding
=> (reference(cpu.core2))
applies to a_process.t2;
end core_affinity.Impl;
system core_affinity
end core_affinity;
process application
end application;
process implementation Application.Impl
subcomponents
t1 : thread pilotage.Impl;
t2 : thread ordo_bus.Impl;
end Application.Impl;
end core_affinity;
This second solution makes use of the following components
(which can be downloaded here):
package multicore_crossbar_units
public
with Cheddar_Multicore_Properties;
processor uni_core
properties
Scheduling_Protocol=>(POSIX_1003_HIGHEST_PRIORITY_FIRST_PROTOCOL);
end uni_core;
system dual_core
end dual_core;
system implementation dual_core.impl
subcomponents
core1 : processor uni_core;
core2 : processor uni_core;
properties
Cheddar_Multicore_Properties::System_Soc_Type => SoC_Processing_Unit;
Cheddar_Multicore_Properties::SoC_Interconnection_Type => Crossbar;
end dual_core.impl;
system quad_core
end quad_core;
system implementation quad_core.impl
subcomponents
cores : processor uni_core [4];
properties
Cheddar_Multicore_Properties::System_Soc_Type => SoC_Processing_Unit;
Cheddar_Multicore_Properties::SoC_Interconnection_Type => Crossbar;
end quad_core.impl;
end multicore_crossbar_units;
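The quad_core system above declares its cores as a subcomponent array. As a sketch only (assuming the analysis tool accepts array indices in reference paths, which is not shown in the downloadable models), a binding to individual cores of a quad_core instance could look like:
-- Assuming a subcomponent declared as: cpu : system multicore_crossbar_units::quad_core.impl;
Actual_Processor_Binding
=> (reference(cpu.cores[1]))
applies to a_process.t1;
Actual_Processor_Binding
=> (reference(cpu.cores[2]))
applies to a_process.t2;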
3. Global multiprocessor
Finally, the third example is a global multiprocessor solution, where
threads can migrate from one computing unit to another and may share
various software resources
(which can be downloaded here):
package global_scheduling
public
with multicore_crossbar_units;
thread ordo_bus
properties
Dispatch_Protocol => Periodic;
Period => 125 ms;
Deadline => 125 ms;
Compute_Execution_Time => 1 ms .. 25 ms;
Priority => 10 ;
end ordo_bus;
thread implementation ordo_bus.Impl
end ordo_bus.Impl;
thread donnees
properties
Dispatch_Protocol => Periodic;
Period => 125 ms;
Deadline => 125 ms;
Compute_Execution_Time => 25 ms .. 25 ms;
Priority => 9 ;
end donnees;
thread implementation donnees.Impl
end donnees.Impl;
thread pilotage
properties
Dispatch_Protocol => Periodic;
Period => 250 ms;
Deadline => 250 ms;
Compute_Execution_Time => 25 ms .. 25 ms;
Priority => 8 ;
end pilotage;
thread implementation pilotage.Impl
end pilotage.Impl;
process Application
end Application;
process implementation Application.Impl
subcomponents
ordo_bus : thread ordo_bus.Impl;
donnees : thread donnees.Impl;
pilotage : thread pilotage.Impl;
end Application.Impl;
system product
end product;
system implementation product.impl
subcomponents
hard : system multicore_crossbar_units::dual_core.impl;
soft : process Application.impl;
properties
actual_processor_binding => (reference(hard)) applies to soft;
Scheduling_Protocol => (RMS) applies to hard;
end product.impl;
end global_scheduling;
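To make the migration policy of this global solution explicit, the Cheddar migration property introduced above could be added to the properties section of product.impl. The following lines are only a sketch, under the assumption that the analysis tool reads this property on the hardware system to which the threads are bound:
-- Sketch: let the threads bound to "hard" migrate at any time unit
Cheddar_Multicore_Properties::Migrations_Type
=> Time_Unit_Migration_Type applies to hard;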
Global scheduling versus partitioned scheduling
We want to analyze scheduling of an architecture composed of 5 threads defined by
the following parameters:
Threads | Execution times | Periods |
T1 | 3 ms | 6 ms |
T2 | 2 ms | 24 ms |
T3 | 2 ms | 10 ms |
T4 | 1 ms | 6 ms |
T5 | 2 ms | 5 ms |
Deadlines are equal to periods. All threads are first released at the same time.
We want to apply preemptive fixed-priority scheduling with a Rate Monotonic
priority assignment.
We assume an architecture with 2 processors and we want to compare global and partitioned scheduling.
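As a quick sanity check of this assumption, the total processor utilization of the thread set is U = 3/6 + 2/24 + 2/10 + 1/6 + 2/5 = 0.5 + 0.083 + 0.2 + 0.167 + 0.4, i.e. about 1.35. Since U exceeds 1, a single processor cannot schedule this thread set, so at least two processors are indeed required.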
-
First, we apply a partitioning approach. T3 and T5 are located on the first processor
while the other threads run on the second processor.
Design an AADL model with AADLInspector or OSATE for this architecture and compute with
these tools the scheduling during the first 24 ms. What is the worst-case response time of
each thread?
-
Second, we apply a global approach.
Design an AADL model with AADLInspector or OSATE for this second architecture and compute with
these tools the scheduling during the first 24 ms. What is the worst-case response time of
each thread?
-
For this thread set, which is the better approach: partitioned or global scheduling?
In general, which is the better approach: global or partitioned scheduling?
Possible solution/AADL model for this
case study here.
Migrations and global scheduling
We want to analyze scheduling of an architecture composed of 4 threads defined by
the following parameters:
Threads | Execution times | Periods |
T1 | 2 ms | 5 ms |
T2 | 8 ms | 20 ms |
T3 | 2 ms | 25 ms |
T4 | 3 ms | 6 ms |
Deadlines are equal to periods. All threads are first released at the same time.
We want to apply preemptive fixed-priority scheduling with a Rate Monotonic
priority assignment.
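As a hint for the following questions, recall that the processor utilization factor of such a thread set is U = C1/T1 + C2/T2 + C3/T3 + C4/T4, where Ci is the execution time and Ti the period of thread i; on a single processor, the thread set cannot be schedulable if U exceeds 1.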
-
Design an AADL model for this architecture.
-
Without computing the scheduling of the thread set, which analysis feature of OSATE or AADLInspector
can we use to prove that the threads cannot meet their deadlines on a single processor?
-
With AADLInspector or OSATE, compute the scheduling during the first 20 ms and verify that
your answer to question 2 is correct.
-
In order to meet the deadlines, we now modify the architecture by adding a second processor.
Scheduling on the two processors is performed with a global approach.
Each thread is allowed to migrate only when it starts the execution of a job/activation, i.e.
when the thread executes its first instruction at each periodic release.
With AADLInspector or OSATE, modify your previous AADL model and compute the
scheduling during the first 20 ms.
-
Modify your AADL model again to allow the threads to migrate at any time, and then
compute the scheduling of the threads during the first 20 ms.
-
Compare the results of questions 4 and 5. In which case do the threads have the shortest
worst-case response times? Which migration policy best reduces the worst-case
response times for this thread set?
More generally, which migration policy best reduces the worst-case
response times?
Possible solution/AADL model for this
case study here.