RT-Thread SMP Introduction and Porting
Assoc. Prof. Wiroon Sriborrirux, Founder of Advance Innovation Center (AIC) and Bangsaen Design House (BDH), Electrical Engineering Department, Faculty of Engineering, Burapha University
SMP: Symmetric Multi-Processing (SMP) refers to a computer with a group of processors (multiple CPUs) that all share one memory subsystem and bus structure.
RT-Thread has supported SMP since v4.0.0. On symmetric multi-core platforms it is enabled by defining RT_USING_SMP. This document introduces SMP and explains how to port RT-Thread SMP.
After power-on, each CPU runs independently under the control of the boot code in ROM. Only the main processor (hereafter CPU0) jumps to the RT-Thread initialization entry; the other processors (hereafter secondary CPUs) are held in a suspended state until CPU0 wakes them up.
CPU0 performs the global initialization of RT-Thread: peripheral initialization, initialization of the distributor part of the interrupt controller, global variable initialization, creation of global kernel objects, and so on. It also initializes its own hardware, including the MMU, the CPU interface part of the interrupt controller, and the interrupt vector table.
Finally, from its main thread and before the user main() function runs, CPU0 wakes up the secondary CPUs and directs them to the secondary-CPU initialization code. This code lets each secondary CPU complete its own hardware initialization and start task scheduling.
After that, the system enters the normal operation stage. The actions of each CPU during the system startup stage are shown in the following figure:
Note that CPU0 cannot initialize the per-CPU hardware of a secondary CPU, because that hardware cannot be accessed by other CPUs.
On an SMP platform, CPU0 boots exactly as a single-core CPU would. The main flow is shown in the following figure:
The code executed after power-on differs between hardware platforms and compilers, but in every case the system eventually calls the entry function rtthread_startup() to start RT-Thread. This function sets up the hardware platform, initializes the operating system components, creates the user main thread, and finally starts the task scheduler of the current CPU.
Once the scheduler of CPU0 has started, CPU0 usually has two threads: the main thread and the idle thread. Users can create further threads through configuration options or the interfaces provided by RT-Thread. The scheduler chooses which ready thread to run based on thread priority and state.
If the configuration option RT_USING_SMP is defined, the main thread of CPU0 calls rt_hw_secondary_cpu_up() to start the other CPU cores. This function is supplied by the porting developer and performs two operations:
Set the boot entry address of the secondary CPU;
Power on the secondary CPU.
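To make the two operations concrete, here is a hosted model in which POSIX threads stand in for the three secondary cores of a quad-core chip. The boot mailbox and power-on flag are hypothetical placeholders for whatever boot-address register and power controller a real chip exposes; on hardware both would be register writes, and the core would jump to secondary_cpu_start rather than to a C callback.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4 /* assumption: CPU0 plus three secondary cores */

typedef void (*boot_entry_t)(int cpu);

/* Hypothetical per-core boot mailbox and power-on flag. */
static boot_entry_t boot_mailbox[NR_CPUS];
static atomic_int power_on[NR_CPUS];
static atomic_int started_count;

/* Stand-in for secondary_cpu_start: records that the core arrived. */
static void model_secondary_entry(int cpu)
{
    (void)cpu;
    atomic_fetch_add(&started_count, 1);
}

/* A suspended secondary core: waits until CPU0 powers it on. */
static void *secondary_core(void *arg)
{
    int cpu = (int)(intptr_t)arg;
    while (!atomic_load_explicit(&power_on[cpu], memory_order_acquire))
        ; /* held until released by CPU0 */
    boot_mailbox[cpu](cpu); /* jump to the entry CPU0 wrote */
    return NULL;
}

/* Model of rt_hw_secondary_cpu_up(): for each secondary core,
 * 1) set the boot entry address, 2) power the core on. */
static void model_secondary_cpu_up(boot_entry_t entry)
{
    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        boot_mailbox[cpu] = entry;                     /* step 1 */
        atomic_store_explicit(&power_on[cpu], 1,
                              memory_order_release);   /* step 2 */
    }
}

/* Runs the handshake once; returns how many secondaries started. */
int run_boot_model(void)
{
    pthread_t t[NR_CPUS];
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        pthread_create(&t[cpu], NULL, secondary_core, (void *)(intptr_t)cpu);
    model_secondary_cpu_up(model_secondary_entry);
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        pthread_join(t[cpu], NULL);
    return atomic_load(&started_count);
}
```

Calling run_boot_model() once returns 3, one per secondary core. The release store after the mailbox write is the point of the sketch: a core must never see "powered on" before it can see its entry address.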
On ARMv7-A, the boot entry of the secondary CPUs is fixed at secondary_cpu_start, which is defined in libcpu/arm/cortex-a/start_gcc.S. This code sets up the kernel stack of the current CPU, establishes the MMU mapping table, and then jumps to secondary_cpu_c_start().
secondary_cpu_c_start() is the initialization function shared by all secondary CPUs. It is closely tied to the hardware platform and is supplied by the porting developer. It must complete the following steps:
Initialize the interrupt controller of the current CPU and set the interrupt vector table;
Set the timer to generate a tick interrupt for the current CPU;
Acquire the kernel spin lock _cpus_lock, which protects the global task table, and call rt_system_scheduler_start() to start the task scheduler of the current CPU.
The Allwinner T3 uses a GIC interrupt controller, and the tick interrupt is generated by the Generic Timer. The code of secondary_cpu_c_start() is as follows:
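Whatever the chip-specific details, the ordering of the steps above is what matters. As a hedged illustration, the model below replaces each hardware step with a logging stub (all stub names are hypothetical, none are RT-Thread APIs) and checks that the kernel spin lock is already held when the scheduler is started.

```c
#include <stdbool.h>

/* Hypothetical step identifiers used only to record the order. */
enum init_step { STEP_GIC_CPU_IF, STEP_VECTORS, STEP_TICK_TIMER,
                 STEP_LOCK, STEP_SCHED_START, STEP_MAX };

static enum init_step step_log[STEP_MAX];
static int step_count;
static bool cpus_lock_held;            /* stands in for _cpus_lock */
static bool sched_started_with_lock;

static void log_step(enum init_step s)
{
    if (step_count < STEP_MAX)
        step_log[step_count++] = s;
}

/* Stubs for platform code (GIC CPU interface, vectors, Generic Timer). */
static void gic_cpu_interface_init_stub(void) { log_step(STEP_GIC_CPU_IF); }
static void vector_table_init_stub(void)      { log_step(STEP_VECTORS); }
static void tick_timer_init_stub(void)        { log_step(STEP_TICK_TIMER); }

static void cpus_lock_acquire_stub(void)
{
    cpus_lock_held = true;             /* a real spin lock would spin here */
    log_step(STEP_LOCK);
}

static void scheduler_start_stub(void)
{
    sched_started_with_lock = cpus_lock_held;
    log_step(STEP_SCHED_START);        /* the real call never returns */
}

/* Model of secondary_cpu_c_start(), following the listed steps in order. */
void model_secondary_cpu_c_start(void)
{
    gic_cpu_interface_init_stub();     /* this CPU's interrupt controller part */
    vector_table_init_stub();          /* interrupt vector table */
    tick_timer_init_stub();            /* this CPU's tick interrupt */
    cpus_lock_acquire_stub();          /* protect the global task table */
    scheduler_start_stub();            /* start scheduling on this CPU */
}
```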
Once a secondary CPU has started, it runs the highest-priority task found in either the global task table or its own local task table. When priorities are equal, the task in the local task table of the current CPU is preferred.
When no other task is ready, each CPU runs its own idle task. The idle task of CPU0 calls rt_thread_idle_execute() in a loop, while the idle task of each secondary CPU calls rt_hw_secondary_cpu_idle_exec() in a loop. The latter must also be supplied by the porting developer. Example code is as follows:
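As a sketch, assuming a GCC-compatible compiler: on bare-metal ARMv7-A the secondary idle hook can execute WFE so the core sleeps until an event or unmasked interrupt arrives. The preprocessor guard and the counting idle loop are assumptions added so the sketch also compiles and runs on a hosted machine; CPU0's call to rt_thread_idle_execute() is represented only by a counter.

```c
/* Only a bare-metal ARMv7-A build executes WFE; hosted builds
 * (including any host test) use a plain compiler barrier instead. */
static inline void cpu_wait_for_event(void)
{
#if defined(__ARM_ARCH_7A__) && !defined(__linux__)
    __asm__ volatile ("wfe" ::: "memory", "cc");
#else
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Sketch of rt_hw_secondary_cpu_idle_exec(): let the core sleep
 * until the next event or interrupt to save power. */
void rt_hw_secondary_cpu_idle_exec(void)
{
    cpu_wait_for_event();
}

/* Hosted model of an idle thread running a bounded number of passes. */
static unsigned idle_calls_cpu0, idle_calls_secondary;

void model_idle_loop(int cpu, unsigned iterations)
{
    for (unsigned i = 0; i < iterations; i++) {
        if (cpu == 0) {
            idle_calls_cpu0++;            /* rt_thread_idle_execute() on CPU0 */
        } else {
            rt_hw_secondary_cpu_idle_exec();
            idle_calls_secondary++;
        }
    }
}
```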
A task in RT-Thread is in one of the following states:
Running state: the task is being executed on a CPU;
Ready state: the task can run at any time but has not yet been assigned a CPU, so it waits in a ready task table;
Suspended state: the task cannot run because a condition is not yet met (it is waiting for a timeout, for data to arrive, etc.);
Closed state: the task has been deleted and is waiting to be reclaimed.
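A simplified transition table for these four states might look as follows. The enum names are illustrative rather than RT-Thread's own constants, and the table deliberately omits corner cases such as suspending a task that is still ready.

```c
#include <stdbool.h>

/* Illustrative names for the four states described above. */
enum task_state { TASK_READY, TASK_RUNNING, TASK_SUSPENDED, TASK_CLOSED };

/* Returns true if this simplified model allows moving from 'from' to 'to'. */
bool task_transition_allowed(enum task_state from, enum task_state to)
{
    switch (from) {
    case TASK_READY:
        /* picked by a CPU, or deleted while waiting */
        return to == TASK_RUNNING || to == TASK_CLOSED;
    case TASK_RUNNING:
        /* preempted, blocked on a timeout or data, or deleted */
        return to == TASK_READY || to == TASK_SUSPENDED || to == TASK_CLOSED;
    case TASK_SUSPENDED:
        /* condition met (timeout expired or data arrived), or deleted */
        return to == TASK_READY || to == TASK_CLOSED;
    case TASK_CLOSED:
        return false; /* waiting to be reclaimed; no way back */
    }
    return false;
}
```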
In the normal operation phase, each CPU runs interrupt handling, scheduler, and task code independently. RT-Thread has the following characteristics on a multi-core system:
At any given moment, a task (thread) runs on at most one CPU;
Each CPU accesses the global scheduler data under mutual exclusion to decide which task to run next;
Tasks can be bound to run on a specific CPU.
To achieve this, the RT-Thread scheduler maintains two kinds of ready task tables:
The global ready task table rt_thread_ready_table[] contains ready tasks that are not bound to a CPU;
The per-CPU local ready task table ready_table[], one for each CPU, contains the ready tasks bound to that CPU. A typical CPU-bound task is each CPU's own idle task.
When a CPU needs to switch tasks, the scheduler looks for the highest-priority ready task available to it, that is, the highest-priority task in either the global ready task table or the current CPU's local ready task table. If the priorities are equal, the task in the local table is chosen.
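The selection rule can be sketched with one ready bitmap per table, assuming 32 priority levels where a smaller number means higher priority (RT-Thread's convention) and bit n is set when some task of priority n is ready. The bitmap layout is a modeling assumption; only the comparison, including the tie-break in favor of the local table, mirrors the text.

```c
#include <stdint.h>

#define PRIO_NONE 32 /* sentinel: no ready task */

/* Lowest set bit = numerically smallest = highest priority. */
static int highest_prio(uint32_t ready_group)
{
    if (ready_group == 0)
        return PRIO_NONE;
    int prio = 0;
    while (!(ready_group & 1u)) {
        ready_group >>= 1;
        prio++;
    }
    return prio;
}

enum pick_source { PICK_NONE, PICK_GLOBAL, PICK_LOCAL };

/* Decide whether this CPU's next task comes from the global ready
 * table or its local ready table; on a tie the local table wins. */
enum pick_source pick_next(uint32_t global_group, uint32_t local_group,
                           int *prio_out)
{
    int g = highest_prio(global_group);
    int l = highest_prio(local_group);

    if (g == PRIO_NONE && l == PRIO_NONE) {
        *prio_out = PRIO_NONE;
        return PICK_NONE;
    }
    if (l <= g) {            /* equal priority: prefer the local table */
        *prio_out = l;
        return PICK_LOCAL;
    }
    *prio_out = g;
    return PICK_GLOBAL;
}
```

In practice the local table is never empty, since each CPU's idle task lives there; the PICK_NONE branch only keeps the model total.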
Conversely, when a task enters the ready state from another state, it is handled as follows:
If the task is not bound to a CPU, it is placed in the global ready table and an IPI (Inter-Processor Interrupt) is sent to all other CPUs, telling them to check whether they need to switch tasks: the task currently running on another CPU may have a lower priority than the newly ready task, in which case it is preempted;
If the task is bound to a CPU, its priority is compared with that of the task currently running on that CPU. If it is higher, priority preemption occurs; otherwise the task is placed in the local ready task table of that CPU. No other CPU is notified in this case.
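The two cases can be written as one decision function. This is a model of the rules as described above, not kernel code: cur_prio[] (the priority of the task currently running on each CPU), the CPU_UNBOUND marker, and the result struct are hypothetical, and a smaller number again means higher priority.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4
#define CPU_UNBOUND (-1)   /* hypothetical marker for "not bound" */

struct ready_result {
    bool to_global;        /* queued in the global ready table */
    bool preempt;          /* bound CPU should switch to this task */
    uint32_t ipi_mask;     /* CPUs that receive a scheduling IPI */
};

/* Model of the make-ready rules for a task of priority 'prio',
 * raised on CPU 'this_cpu'. */
struct ready_result make_ready(int bind_cpu, int prio, int this_cpu,
                               const int cur_prio[NR_CPUS])
{
    struct ready_result r = { false, false, 0 };
    const uint32_t all_cpus = (1u << NR_CPUS) - 1;

    if (bind_cpu == CPU_UNBOUND) {
        r.to_global = true;
        r.ipi_mask = all_cpus & ~(1u << this_cpu); /* every other CPU */
    } else {
        /* bound: compare only with the bound CPU's running task */
        r.preempt = prio < cur_prio[bind_cpu];
        /* otherwise it waits in that CPU's local table; no IPI */
    }
    return r;
}
```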
To support SMP platforms, RT-Thread provides the following kernel interfaces for using multi-core features.
When a task running on one CPU changes the system state or triggers an event, it notifies the other CPUs with an inter-processor interrupt (IPI). On receiving it, each target CPU runs the handler registered for that IPI.
RT-Thread provides the following IPI interfaces:
This function sends the IPI with the given number to every CPU whose bit is set in the CPU bitmap.
This function installs, on the current CPU, the handler for the IPI with the given number.
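The two IPI interfaces can be modeled with a per-CPU handler table indexed by IPI number. In this hosted sketch, model_ipi_install() and model_ipi_send() are hypothetical names, the current CPU is passed explicitly, and the handler runs synchronously; on a GIC-based chip the send would instead raise a software-generated interrupt (SGI) on each target core.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4
#define NR_IPI  2 /* assumption: two IPI numbers, e.g. 0 = reschedule */

typedef void (*ipi_handler_t)(int cpu, int ipi);

/* Per-CPU handler table; each CPU fills in its own row. */
static ipi_handler_t ipi_table[NR_CPUS][NR_IPI];

/* Model of handler installation for the given (current) CPU. */
void model_ipi_install(int this_cpu, int ipi, ipi_handler_t handler)
{
    ipi_table[this_cpu][ipi] = handler;
}

/* Model of sending: bit n of cpu_mask selects CPU n. */
void model_ipi_send(int ipi, uint32_t cpu_mask)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if ((cpu_mask & (1u << cpu)) && ipi_table[cpu][ipi] != NULL)
            ipi_table[cpu][ipi](cpu, ipi);
    }
}

/* Example handler: counts reschedule requests per CPU. */
static int resched_count[NR_CPUS];

static void resched_handler(int cpu, int ipi)
{
    (void)ipi;
    resched_count[cpu]++;
}
```

Installing resched_handler on every CPU and then calling model_ipi_send(0, 0x6) runs the handler once on CPU1 and once on CPU2 only, since 0x6 has bits 1 and 2 set.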
In an SMP system, each CPU maintains its own tick count, used to time task execution and to account time slices. In addition, CPU0 uses its tick count to update the system time and to provide the system timer functions; secondary CPUs do not provide these.
During secondary-CPU initialization, each CPU must enable its own tick timer and register the corresponding tick interrupt handler. Enabling the tick timer is hardware-specific and must be implemented by the porting developer; the tick interrupt handler performs two actions:
Reset the state of the tick timer (on the Allwinner T3, this means reloading its interval register);
Increment the tick count of the current CPU.
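A minimal model of the tick handler, assuming four CPUs: every CPU reloads its timer and counts its own ticks, and only CPU0 advances the system time. tick_timer_reload_stub() is a hypothetical placeholder for the hardware-specific reload.

```c
#include <stdint.h>

#define NR_CPUS 4

static uint32_t cpu_tick[NR_CPUS];     /* per-CPU tick counts */
static uint32_t system_time_ticks;     /* advanced only by CPU0 */
static uint32_t timer_reloads[NR_CPUS];

/* Hypothetical stand-in for reprogramming this CPU's tick timer. */
static void tick_timer_reload_stub(int cpu)
{
    timer_reloads[cpu]++;
}

/* Model of the per-CPU tick interrupt handler. */
void model_tick_isr(int cpu)
{
    tick_timer_reload_stub(cpu);   /* 1. reset the tick timer */
    cpu_tick[cpu]++;               /* 2. advance this CPU's tick count */
    if (cpu == 0)
        system_time_ticks++;       /* system time and timers: CPU0 only */
}
```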
In an SMP system, disabling interrupts on one CPU does not stop other CPUs from accessing shared resources concurrently, so spin locks are needed to protect them. Like a mutex, a spin lock has at most one holder at any time, so at most one execution unit can hold the lock. The difference is that when a mutex is already held, the requester is put to sleep, whereas a spin lock never makes the caller sleep: the caller keeps polling in a loop until the holder releases the lock.
RT-Thread provides the following spin lock interfaces:
1. The following function initializes an allocated spinlock variable:
2. The following function acquires the spinlock, busy-waiting until it succeeds:
3. The following function releases the spinlock:
4. The following function disables interrupts on the current CPU, acquires the spinlock (busy-waiting until it succeeds), and returns the previous interrupt state:
5. The following function releases the spinlock and restores the previous interrupt state:
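The behavior of these interfaces can be sketched with a test-and-set lock in C11 atomics, with POSIX threads playing the CPUs. This illustrates the technique, not RT-Thread's implementation: the model_ names are hypothetical, and a thread-local flag stands in for the CPU's interrupt mask, which hosted code cannot touch.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag flag; } model_spinlock_t;

void model_spin_lock_init(model_spinlock_t *lock)
{
    atomic_flag_clear(&lock->flag);
}

void model_spin_lock(model_spinlock_t *lock)
{
    /* busy-wait (never sleep) until the previous value was clear */
    while (atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire))
        ;
}

void model_spin_unlock(model_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
}

/* Stand-in for this CPU's interrupt-enable state. */
static _Thread_local bool irq_enabled = true;

bool model_spin_lock_irqsave(model_spinlock_t *lock)
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;          /* disable local interrupts first */
    model_spin_lock(lock);
    return was_enabled;           /* previous interrupt state */
}

void model_spin_unlock_irqrestore(model_spinlock_t *lock, bool was_enabled)
{
    model_spin_unlock(lock);
    irq_enabled = was_enabled;    /* restore the previous state */
}

/* Demo: several threads increment one shared counter. */
static model_spinlock_t demo_lock;
static long demo_counter;
static int demo_iters;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        bool s = model_spin_lock_irqsave(&demo_lock);
        demo_counter++;
        model_spin_unlock_irqrestore(&demo_lock, s);
    }
    return NULL;
}

long run_spinlock_demo(int nthreads, int iters)
{
    pthread_t t[8];
    if (nthreads > 8)
        nthreads = 8;
    model_spin_lock_init(&demo_lock);
    demo_counter = 0;
    demo_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, demo_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

run_spinlock_demo(4, 20000) returns exactly 80000; without the lock, concurrent increments could be lost. The irqsave variant disables interrupts before spinning so that an interrupt handler on the same CPU cannot try to take a lock its own CPU already holds.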
Normally, ready tasks sit in the global ready task table, and which CPU ends up running a given task is not fixed in advance. By placing a ready task in a CPU's local ready task table instead, RT-Thread lets tasks be bound to a CPU, so that the task only ever runs on that CPU.
The task binding interface provided by RT-Thread is as follows:
When the parameter cmd is RT_THREAD_CTRL_BIND_CPU, the function binds the thread given by the parameter thread to the CPU specified by the parameter arg.
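The effect of the bind command can be modeled on a hypothetical, heavily reduced thread structure: binding records the target CPU, and from then on the thread is queued in that CPU's local ready table. The structure, the CPU_UNBOUND marker, and the error codes below are assumptions for illustration; as in the real interface, the CPU number arrives through arg.

```c
#include <stdint.h>

#define NR_CPUS 4
#define CPU_UNBOUND (-1)          /* hypothetical "not bound" marker */

enum { MODEL_OK = 0, MODEL_EINVAL = -1 };

/* Hypothetical, reduced thread control block. */
struct model_thread {
    const char *name;
    int bind_cpu;                 /* CPU_UNBOUND or 0..NR_CPUS-1 */
};

/* Model of the bind command: arg carries the CPU number, as a
 * void * argument would after casting through uintptr_t. */
int model_bind_cpu(struct model_thread *t, uintptr_t arg)
{
    if (arg >= NR_CPUS)
        return MODEL_EINVAL;      /* no such CPU: binding unchanged */
    t->bind_cpu = (int)arg;
    return MODEL_OK;
}

/* Which ready table the thread is queued in: -1 = global, else CPU id. */
int ready_table_for(const struct model_thread *t)
{
    return t->bind_cpu == CPU_UNBOUND ? -1 : t->bind_cpu;
}
```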
To port RT-Thread to another SMP platform, you need to create a corresponding BSP and write the low-level code. Here the Allwinner T3 (quad-core Cortex-A7) is used as a template to explain the porting steps on a multi-core Cortex-A platform; kernel developers can follow the same steps to port RT-Thread to other multi-core Cortex-A chips.
RT-Thread uses the scons tool for building and configuration. Taking Ubuntu 18.04 as an example build host, the following packages need to be installed:
lib32ncurses5
lib32z1
python
scons
python-pip
make
zlib1g-dev
binutils-arm-none-eabi
When porting to another SMP platform, you can use the Allwinner T3 BSP as a template: copy the folder bsp/allwinner_t3/, rename it after the target platform (for example bsp/some_cortexa_smp), and keep the following files:
Kconfig: BSP configuration description file, used to organize BSP configuration options
link.lds: linker script used to generate the BSP executable
.config: BSP configuration result file, providing the default configuration options
SConscript: scons configuration file that, together with the SConscript files in the subdirectories, lists the source files to compile
SConstruct: scons configuration file, used to set up the BSP compilation environment
applications/main.c: provides the user code entry main function
drivers/:
board.c and board.h: contain the board-level initialization code
drv_clock.c and drv_clock.h: contain the clock tree setup code
platsmp.c and platsmp.h: contain the SMP startup code
Enter the BSP directory bsp/allwinner_t3 and run scons --menuconfig to configure the number of cores: set the configuration option RT_CPUS_NR to the actual number of SMP cores, as shown in the following figure:
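For reference, the saved configuration ends up in the BSP's generated configuration header (typically rtconfig.h); for a quad-core target the relevant lines would plausibly look like this. The guard at the end is a hypothetical sanity check a BSP could add, not something RT-Thread generates.

```c
/* Excerpt of a generated configuration header; values are
 * illustrative for a quad-core target such as the Allwinner T3. */
#define RT_USING_SMP
#define RT_CPUS_NR 4

/* Hypothetical sanity check, not generated by RT-Thread. */
#if defined(RT_USING_SMP) && (RT_CPUS_NR < 2)
#error "RT_USING_SMP expects RT_CPUS_NR >= 2"
#endif
```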
RT-Thread uses direct mapping: the kernel and all tasks share one address space, physical memory and device register regions are mapped with different attributes, and every virtual address equals its physical address.
During initialization, the system builds a single MMU mapping table from the array platform_mem_desc[]. Each entry describes one mapped address range, with fields for the start physical address, the end physical address, the start virtual address, and the mapping attributes.
The MMU mapping array is defined in drivers/board.c. On the Allwinner T3, DDR memory spans 0x40000000 to 0xC0000000 and the device address range spans 0x01000000 to 0x40000000, so the platform_mem_desc[] array is as follows:
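As a hedged model of how such a table is consumed, the sketch below encodes the two T3 ranges given in the text with identity mapping and looks up an address. The struct, attribute enum, and lookup function are hypothetical; real BSPs use the MMU attribute macros from libcpu, and whether end addresses are inclusive depends on the real structure (they are treated as exclusive here).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute values standing in for libcpu MMU macros. */
enum map_attr { ATTR_NONE, ATTR_DEVICE, ATTR_NORMAL_CACHED };

/* Field order follows the text: physical start/end, virtual start, attr. */
struct mem_desc_model {
    uint32_t paddr_start;
    uint32_t paddr_end;   /* exclusive in this model (assumption) */
    uint32_t vaddr_start;
    enum map_attr attr;
};

/* T3 ranges from the text; direct mapping, so vaddr == paddr. */
static const struct mem_desc_model t3_mem_map[] = {
    { 0x01000000u, 0x40000000u, 0x01000000u, ATTR_DEVICE },
    { 0x40000000u, 0xC0000000u, 0x40000000u, ATTR_NORMAL_CACHED },
};

/* Looks up a virtual address; returns its attribute and physical address. */
enum map_attr model_translate(uint32_t vaddr, uint32_t *paddr)
{
    for (size_t i = 0; i < sizeof t3_mem_map / sizeof t3_mem_map[0]; i++) {
        const struct mem_desc_model *d = &t3_mem_map[i];
        uint32_t size = d->paddr_end - d->paddr_start;
        if (vaddr >= d->vaddr_start && vaddr - d->vaddr_start < size) {
            *paddr = d->paddr_start + (vaddr - d->vaddr_start);
            return d->attr;
        }
    }
    return ATTR_NONE;     /* unmapped: would fault on real hardware */
}
```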
The compiled kernel image is usually loaded at the start address of memory and executed from there. The address the kernel is linked at, set in the linker script link.lds, must match this load address. On the Allwinner T3, DDR memory starts at 0x40000000, so the kernel image load address is also 0x40000000, as shown below:
When porting, modify this value to the actual loading address of the kernel image, usually the starting address of the physical memory.
Following the secondary-CPU startup process described above, kernel developers porting RT-Thread to another ARMv7-A SMP chip need to provide three functions:
rt_hw_secondary_cpu_up(): sets the boot entry address of the secondary CPUs to secondary_cpu_start, then powers on and starts the other CPU cores;
secondary_cpu_c_start(): initializes a single secondary CPU, mainly the CPU interface of the interrupt controller, the interrupt vector table, and the tick interrupt of the current CPU; finally it acquires the kernel spin lock _cpus_lock and calls rt_system_scheduler_start() to start the task scheduler of the current CPU;
rt_hw_secondary_cpu_idle_exec(): called in a loop by the idle thread of each secondary CPU; it can be used for power-saving processing.
All three functions are defined in drivers/platsmp.c. Of these, only rt_hw_secondary_cpu_up() is closely tied to the chip and must be written by the porter for the target chip. If the chip does not use a GIC interrupt controller and the Generic Timer, secondary_cpu_c_start() must also be reimplemented.
Note that on the Allwinner T3 the Generic Timer runs at 24 MHz and the tick rate is 1000 Hz, so the interrupt interval register CNTP_TVAL is set to 24,000,000 / 1000 = 24000. The tick interrupt handler must reload the interval register each time; the code is as follows:
This value is reloaded every time a tick interrupt fires; for the rest of the interrupt handling code, see the OS tick description above. If the timer frequency of the target platform differs from the value above, change the argument passed to gt_set_interval() accordingly.
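The interval arithmetic and the reload can be sketched as follows, assuming a GCC-compatible compiler. The CP15 encoding used for CNTP_TVAL (c14, c2, opc2 0) is the ARMv7-A Generic Timer one; the guard keeps the register write out of hosted builds, and model_gt_reload() is a hypothetical wrapper rather than the BSP's gt_set_interval().

```c
#include <stdint.h>

/* Timer counts per tick: 24 MHz / 1000 Hz = 24000 on the T3. */
uint32_t tick_interval_counts(uint32_t timer_hz, uint32_t ticks_per_second)
{
    return timer_hz / ticks_per_second;   /* integer division truncates */
}

static uint32_t last_tval;                /* last value programmed */

/* Write the interval into CNTP_TVAL (MCR p15, 0, Rt, c14, c2, 0).
 * Only a bare-metal ARMv7-A build touches the register; hosted
 * builds just record the value. */
static void write_cntp_tval(uint32_t counts)
{
#if defined(__ARM_ARCH_7A__) && !defined(__linux__)
    __asm__ volatile ("mcr p15, 0, %0, c14, c2, 0" : : "r"(counts) : "memory");
    __asm__ volatile ("isb" ::: "memory");
#endif
    last_tval = counts;
}

/* Hypothetical reload, called from the tick interrupt on every tick. */
void model_gt_reload(uint32_t timer_hz, uint32_t ticks_per_second)
{
    write_cntp_tval(tick_interval_counts(timer_hz, ticks_per_second));
}
```

With integer division, a timer frequency that is not a multiple of the tick rate makes every tick slightly short; 24 MHz and 1000 Hz divide exactly.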