Kernel porting

The kernel chapters give a good understanding of RT-Thread itself, but many readers may not be familiar with how to port the RT-Thread kernel to different hardware platforms. Kernel porting means getting the RT-Thread kernel to run on different chip architectures and different boards, with functions such as thread management and scheduling, memory management, inter-thread synchronization and communication, and timer management all working. Porting can be divided into two parts: CPU architecture porting and BSP (board support package) porting.

This chapter introduces CPU architecture porting and BSP porting. CPU architecture porting is explained using the Cortex-M architecture, so it is worth reviewing "Cortex-M CPU Architecture Basics" in the previous chapter, "Interrupt Management". At the end of the chapter, the complete RT-Thread kernel porting process is demonstrated with an actual port to a development board. After reading this chapter, you will understand how to port the RT-Thread kernel.

There are many different CPU architectures in the embedded field, such as Cortex-M, ARM920T, MIPS32, and RISC-V. So that RT-Thread can run on chips with different CPU architectures, it provides a libcpu abstraction layer that adapts to each of them. The libcpu layer presents a unified interface to the kernel, covering global interrupt enable and disable, thread stack initialization, context switching, and so on.

RT-Thread's libcpu abstraction layer defines a unified set of CPU architecture porting interfaces: the global interrupt enable and disable functions, the thread context switch functions, clock tick configuration and its interrupt handler, cache operations, and so on. The following table lists the interfaces and variables that a CPU architecture port needs to implement.

libcpu porting related API

| Functions and variables | Description |
| --- | --- |
| rt_base_t rt_hw_interrupt_disable(void); | Disable global interrupts |
| void rt_hw_interrupt_enable(rt_base_t level); | Enable global interrupts |
| rt_uint8_t *rt_hw_stack_init(void *tentry, void *parameter, rt_uint8_t *stack_addr, void *texit); | Initialize the thread stack. The kernel calls this function during thread creation and thread initialization. |
| void rt_hw_context_switch_to(rt_uint32_t to); | Context switch with no source thread. Called when the scheduler starts the first thread, and in signal handling. |
| void rt_hw_context_switch(rt_uint32_t from, rt_uint32_t to); | Switch from the from thread to the to thread; used for switching between threads |
| void rt_hw_context_switch_interrupt(rt_uint32_t from, rt_uint32_t to); | Switch from the from thread to the to thread; used for switching inside an interrupt |
| rt_uint32_t rt_thread_switch_interrupt_flag; | Flag indicating that a switch is required in an interrupt |
| rt_uint32_t rt_interrupt_from_thread, rt_interrupt_to_thread; | Hold the from and to threads during a thread context switch |

In both kernel code and user code, some variables need to be accessed from multiple threads or interrupts. Without a protection mechanism, this can lead to critical section problems. RT-Thread provides a series of inter-thread synchronization and communication mechanisms to solve this, but all of them rely on the global interrupt enable and disable functions provided by libcpu. They are:

/* Disable global interrupts */
rt_base_t rt_hw_interrupt_disable(void);

/* Enable global interrupts */
void rt_hw_interrupt_enable(rt_base_t level);

The following describes how to implement these two functions on the Cortex-M architecture. As mentioned earlier, Cortex-M provides the CPS instruction for quickly enabling and disabling interrupts, which can be used here.

CPSID I ;PRIMASK=1, disable interrupts
CPSIE I ;PRIMASK=0, enable interrupts

Disable global interrupts

The rt_hw_interrupt_disable() function must do two things:

1) Save the current global interrupt state and return it as the function's return value.

2) Disable global interrupts.

With MDK, disabling global interrupts on a Cortex-M core is implemented as shown in the following code:

Disable global interrupts

;/*
; * rt_base_t rt_hw_interrupt_disable(void);
; */
rt_hw_interrupt_disable    PROC      ; the PROC directive defines a function
    EXPORT  rt_hw_interrupt_disable  ; EXPORT makes the function visible externally, similar to extern in C
    MRS     r0, PRIMASK              ; read the PRIMASK register into r0
    CPSID   I                        ; disable global interrupts
    BX      LR                       ; return from the function
    ENDP                             ; ENDP marks the end of the function

The code above first uses the MRS instruction to copy the value of the PRIMASK register into r0, then uses the "CPSID I" instruction to disable global interrupts, and finally returns with the BX instruction. The value held in r0 is the return value of the function. An interrupt can still occur between "MRS r0, PRIMASK" and "CPSID I", but this does not corrupt the global interrupt state: provided the handler leaves PRIMASK as it found it, the value already read into r0 is still correct when execution resumes.

Different CPU architectures have different conventions on how registers are managed during function calls and in interrupt handlers. A more detailed introduction to register usage conventions for Cortex-M can be found in the ARM official manual "Procedure Call Standard for the ARM® Architecture".

Enable global interrupts

In rt_hw_interrupt_enable(rt_base_t level), the parameter level is the state to be restored; it overwrites the chip's global interrupt state.

With MDK, enabling global interrupts on a Cortex-M core is implemented as shown in the following code:

Enable global interrupts

;/*
; * void rt_hw_interrupt_enable(rt_base_t level);
; */
rt_hw_interrupt_enable    PROC      ; the PROC directive defines a function
    EXPORT  rt_hw_interrupt_enable  ; EXPORT makes the function visible externally, similar to extern in C
    MSR     PRIMASK, r0             ; write the value of r0 to the PRIMASK register
    BX      LR                      ; return from the function
    ENDP                            ; ENDP marks the end of the function

The code above uses the MSR instruction to write the value of register r0 to the PRIMASK register, thereby restoring the previous interrupt state.
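Because the enable function restores a saved state rather than unconditionally unmasking interrupts, critical sections can be nested safely. The following is a minimal host-side sketch of that behaviour, not the port itself: sim_primask, sim_interrupt_disable(), sim_interrupt_enable(), counter_add(), and nested_demo() are hypothetical names introduced for illustration, and the PRIMASK register is modelled by an ordinary variable.

```c
#include <assert.h>
#include <stdint.h>

typedef long rt_base_t;            /* stand-in for RT-Thread's rt_base_t */

static uint32_t sim_primask;       /* simulated PRIMASK: 1 = interrupts masked */
static int      shared_counter;    /* data shared with (imaginary) interrupts */

/* Stand-in for rt_hw_interrupt_disable(): return the old state, then mask. */
rt_base_t sim_interrupt_disable(void)
{
    rt_base_t level = (rt_base_t)sim_primask;   /* MRS r0, PRIMASK */
    sim_primask = 1;                            /* CPSID I */
    return level;
}

/* Stand-in for rt_hw_interrupt_enable(): restore the saved state. */
void sim_interrupt_enable(rt_base_t level)
{
    sim_primask = (uint32_t)level;              /* MSR PRIMASK, r0 */
}

/* A helper that protects its own critical section. */
void counter_add(int n)
{
    rt_base_t level = sim_interrupt_disable();
    shared_counter += n;
    sim_interrupt_enable(level);   /* restores the caller's state, whatever it was */
}

/* An outer critical section that calls the helper.
 * Returns 1 if interrupts stayed masked after the inner section ended
 * and were unmasked again only when the outer section ended. */
int nested_demo(void)
{
    int       masked_inside;
    rt_base_t level;

    sim_primask = 0;                       /* start with interrupts enabled */
    level = sim_interrupt_disable();
    counter_add(1);
    masked_inside = (sim_primask == 1);    /* inner enable did not unmask */
    sim_interrupt_enable(level);
    assert(shared_counter >= 1);
    return masked_inside && sim_primask == 0;
}
```

If the enable function used "CPSIE I" instead of restoring level, the inner call would unmask interrupts in the middle of the outer critical section.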

Both dynamic thread creation and thread initialization use the internal thread initialization function _rt_thread_init(), which calls the stack initialization function rt_hw_stack_init(). The stack initialization function manually builds a context on the stack. This context supplies the initial register values the first time the thread runs. The layout of the context in the stack is shown in the following figure:

The following code implements stack initialization:

Building context in the stack

rt_uint8_t *rt_hw_stack_init(void       *tentry,
                             void       *parameter,
                             rt_uint8_t *stack_addr,
                             void       *texit)
{
    struct stack_frame *stack_frame;
    rt_uint8_t         *stk;
    unsigned long       i;

    /* Align the incoming stack pointer down to 8 bytes */
    stk  = stack_addr + sizeof(rt_uint32_t);
    stk  = (rt_uint8_t *)RT_ALIGN_DOWN((rt_uint32_t)stk, 8);
    stk -= sizeof(struct stack_frame);

    /* Get a pointer to the context stack frame */
    stack_frame = (struct stack_frame *)stk;

    /* Set the default value of every register to 0xdeadbeef */
    for (i = 0; i < sizeof(struct stack_frame) / sizeof(rt_uint32_t); i ++)
    {
        ((rt_uint32_t *)stack_frame)[i] = 0xdeadbeef;
    }

    /* Per the ARM APCS calling convention, pass the first argument in r0 */
    stack_frame->exception_stack_frame.r0  = (unsigned long)parameter;
    /* Set the remaining argument registers to 0 */
    stack_frame->exception_stack_frame.r1  = 0;                 /* r1 register */
    stack_frame->exception_stack_frame.r2  = 0;                 /* r2 register */
    stack_frame->exception_stack_frame.r3  = 0;                 /* r3 register */
    /* Set IP (intra-procedure-call scratch register) to 0 */
    stack_frame->exception_stack_frame.r12 = 0;                 /* r12 register */
    /* Store the address of the thread exit function in lr */
    stack_frame->exception_stack_frame.lr  = (unsigned long)texit;
    /* Store the address of the thread entry function in pc */
    stack_frame->exception_stack_frame.pc  = (unsigned long)tentry;
    /* Set psr to 0x01000000L so that the thread starts in Thumb state */
    stack_frame->exception_stack_frame.psr = 0x01000000L;

    /* Return the thread's current stack address */
    return stk;
}
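The listing relies on struct stack_frame, which is not shown in this section. The sketch below gives one layout consistent with the register order described in this chapter (software-saved R4-R11 at the lower addresses, followed by the eight registers the hardware stacks) and repeats the pointer arithmetic so it can be checked on a host. It is an illustration under those assumptions: demo_frame_address() and demo_frame_offset() are hypothetical helpers, and uintptr_t replaces the 32-bit cast so the code also runs on a 64-bit machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers stacked automatically by the hardware on exception entry. */
struct exception_stack_frame
{
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
};

/* Full saved context: r4-r11 saved by software, below the hardware frame. */
struct stack_frame
{
    uint32_t r4, r5, r6, r7, r8, r9, r10, r11;
    struct exception_stack_frame exception_stack_frame;
};

/* Same arithmetic as the listing. stack_addr points at the last word
 * of the thread's stack buffer. */
uint8_t *demo_frame_address(uint8_t *stack_addr)
{
    uintptr_t top = (uintptr_t)stack_addr + sizeof(uint32_t); /* one past the end */
    top &= ~(uintptr_t)7;                                     /* align down to 8 */
    return (uint8_t *)(top - sizeof(struct stack_frame));
}

/* Offset of the initial frame inside a 256-byte, 8-byte-aligned stack. */
size_t demo_frame_offset(void)
{
    static _Alignas(8) uint8_t stack[256];
    uint8_t *sp = demo_frame_address(stack + sizeof(stack) - sizeof(uint32_t));

    assert(((uintptr_t)sp & 7u) == 0);   /* the initial SP is 8-byte aligned */
    return (size_t)(sp - stack);
}
```

With this layout the frame occupies 16 words (64 bytes), so the initial stack pointer of a 256-byte stack sits at offset 192.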

Depending on the CPU architecture, a context switch between threads and a context switch from an interrupt to a thread may save the same set of context registers or different ones. On Cortex-M, all context switching is done through the PendSV exception, so the switching code is the same in both cases. However, to accommodate different CPU architectures, RT-Thread's libcpu abstraction layer still requires three thread-switching functions:

1) rt_hw_context_switch_to(): there is no source thread; switches to the target thread. Called when the scheduler starts the first thread.

2) rt_hw_context_switch(): in a thread environment, switches from the current thread to the target thread.

3) rt_hw_context_switch_interrupt(): in an interrupt environment, switches from the current thread to the target thread.

There is a difference between switching in a thread environment and switching in an interrupt environment. In a thread environment, if the rt_hw_context_switch() function is called, the context switch can be performed immediately; in an interrupt environment, the context switch can only be performed after the interrupt handling function is completed.

Because of this difference, rt_hw_context_switch() and rt_hw_context_switch_interrupt() are implemented differently on platforms such as ARM9. If thread scheduling is triggered inside an interrupt handler, the scheduler calls rt_hw_context_switch_interrupt() to request the context switch. Once the interrupt handler has finished its work and before the interrupt exits, the rt_thread_switch_interrupt_flag variable is checked. If its value is 1, the thread context switch is performed according to the rt_interrupt_from_thread and rt_interrupt_to_thread variables.

On the Cortex-M architecture, context switching can be implemented more concisely by relying on two hardware features: automatic stacking of part of the register set on exception entry, and the PendSV exception.

The context switching between threads is shown in the following figure:

Before entering the PendSV exception, the hardware automatically saves the PSR, PC, LR, R12, and R3-R0 registers of the from thread. The PendSV handler then saves the R4-R11 registers of the from thread and restores the R4-R11 registers of the to thread. Finally, on exit from PendSV, the hardware automatically restores the R0-R3, R12, LR, PC, and PSR registers of the to thread.

The context switch from interrupt to thread can be represented by the following figure:

The hardware automatically saves the PSR, PC, LR, R12, and R3-R0 registers of the from thread before entering the interrupt, and the interrupt then triggers the PendSV exception. The PendSV handler saves the R4-R11 registers of the from thread and restores the R4-R11 registers of the to thread. Finally, on exit from PendSV, the hardware automatically restores the R0-R3, R12, LR, PC, and PSR registers of the to thread.

On the Cortex-M core, then, rt_hw_context_switch() and rt_hw_context_switch_interrupt() do the same job: both leave the saving and restoring of the remaining context to PendSV. A single implementation can therefore serve both, which simplifies the port.

Implementing rt_hw_context_switch_to()

rt_hw_context_switch_to() has only a target thread and no source thread. It switches to the specified thread. The flow chart is shown in the following figure:

The implementation of rt_hw_context_switch_to() on the Cortex-M3 core (based on MDK) is shown in the following code:

MDK version of rt_hw_context_switch_to() implementation

;/*
; * void rt_hw_context_switch_to(rt_uint32_t to);
; * r0 --> to
; * this function is used to perform the first thread switch
; */
rt_hw_context_switch_to    PROC
    EXPORT rt_hw_context_switch_to
    ; r0 is a pointer to the SP member of the to thread's control block
    ; store the value of r0 in the rt_interrupt_to_thread variable
    LDR     r1, =rt_interrupt_to_thread
    STR     r0, [r1]

    ; set the from thread to 0: there is no from context to save
    LDR     r1, =rt_interrupt_from_thread
    MOV     r0, #0x0
    STR     r0, [r1]

    ; set the flag to 1 to request a switch; the PendSV handler clears it when it switches
    LDR     r1, =rt_thread_switch_interrupt_flag
    MOV     r0, #1
    STR     r0, [r1]

    ; set the PendSV exception to the lowest priority
    LDR     r0, =NVIC_SYSPRI2
    LDR     r1, =NVIC_PENDSV_PRI
    LDR.W   r2, [r0,#0x00]       ; read
    ORR     r1,r1,r2             ; modify
    STR     r1, [r0]             ; write-back

    ; trigger the PendSV exception (the PendSV handler will run)
    LDR     r0, =NVIC_INT_CTRL
    LDR     r1, =NVIC_PENDSVSET
    STR     r1, [r0]

    ; discard the stack contents used between chip startup and the first
    ; context switch by resetting MSP to its startup value
    LDR     r0, =SCB_VTOR
    LDR     r0, [r0]
    LDR     r0, [r0]
    MSR     msp, r0

    ; enable global interrupts and exceptions; PendSV is taken once they are enabled
    CPSIE   F
    CPSIE   I

    ; never reached
    ENDP
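The assembly refers to several symbols that are not defined in this section. The sketch below lists them as C macros using the Cortex-M3 System Control Block addresses and bit positions; the symbol names come from the listing, while the NVIC_PENDSV_PRI value (PendSV field only) is an assumption about how the port defines it. The two helper functions, demo_pendsv_lowest() and demo_initial_msp(), are hypothetical and operate on ordinary values so the logic can be checked on a host.

```c
#include <assert.h>
#include <stdint.h>

#define SCB_VTOR        0xE000ED08u  /* Vector Table Offset Register */
#define NVIC_INT_CTRL   0xE000ED04u  /* Interrupt Control and State Register */
#define NVIC_SYSPRI2    0xE000ED20u  /* System Handler Priority Register 3 */
#define NVIC_PENDSV_PRI 0x00FF0000u  /* PendSV priority field, bits 23:16 (assumed value) */
#define NVIC_PENDSVSET  0x10000000u  /* bit 28: set the PendSV pending bit */

/* The read-modify-write from the listing: OR the PendSV field to all ones
 * (numerically highest value = lowest priority) and keep the other fields. */
uint32_t demo_pendsv_lowest(uint32_t syspri)
{
    return syspri | NVIC_PENDSV_PRI;
}

/* The three loads before "MSR msp, r0": VTOR holds the vector table address,
 * and word 0 of the vector table is the initial main stack pointer. */
uint32_t demo_initial_msp(const uint32_t *vector_table)
{
    assert(vector_table != 0);
    return vector_table[0];
}
```

Resetting MSP this way reclaims the stack used by the startup code, because execution never returns to it after the first thread starts.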

Implementing rt_hw_context_switch()/rt_hw_context_switch_interrupt()

rt_hw_context_switch() and rt_hw_context_switch_interrupt() both take two parameters, the from thread and the to thread, and switch from the from thread to the to thread. The detailed flow chart is shown in the following figure:

The implementation of rt_hw_context_switch() and rt_hw_context_switch_interrupt() on the Cortex-M3 core (based on MDK) is shown in the following code:

rt_hw_context_switch()/rt_hw_context_switch_interrupt() implementation

;/*
; * void rt_hw_context_switch(rt_uint32_t from, rt_uint32_t to);
; * r0 --> from
; * r1 --> to
; */
rt_hw_context_switch_interrupt
    EXPORT rt_hw_context_switch_interrupt
rt_hw_context_switch    PROC
    EXPORT rt_hw_context_switch

    ; check whether rt_thread_switch_interrupt_flag is 1
    ; if it is, skip updating the from thread
    LDR     r2, =rt_thread_switch_interrupt_flag
    LDR     r3, [r2]
    CMP     r3, #1
    BEQ     _reswitch
    ; set rt_thread_switch_interrupt_flag to 1
    MOV     r3, #1
    STR     r3, [r2]

    ; update rt_interrupt_from_thread from parameter r0
    LDR     r2, =rt_interrupt_from_thread
    STR     r0, [r2]

_reswitch
    ; update rt_interrupt_to_thread from parameter r1
    LDR     r2, =rt_interrupt_to_thread
    STR     r1, [r2]

    ; trigger the PendSV exception; the PendSV handler completes the context switch
    LDR     r0, =NVIC_INT_CTRL
    LDR     r1, =NVIC_PENDSVSET
    STR     r1, [r0]
    BX      LR
    ENDP
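The flag check at the top of this routine matters when more than one switch is requested before PendSV actually runs, for example from nested interrupts. The following host-side sketch models just that bookkeeping in C; the sim_ names are hypothetical stand-ins for the three libcpu variables and for the PendSV pending bit, and thread identities are plain numbers.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sim_switch_flag;     /* models rt_thread_switch_interrupt_flag */
static uint32_t sim_from_thread;     /* models rt_interrupt_from_thread */
static uint32_t sim_to_thread;       /* models rt_interrupt_to_thread */
static int      sim_pendsv_pending;  /* models the PENDSVSET bit */

/* C rendering of the request logic in the assembly listing. */
void sim_context_switch(uint32_t from, uint32_t to)
{
    if (sim_switch_flag != 1)
    {
        sim_switch_flag = 1;
        sim_from_thread = from;      /* recorded only by the first request */
    }
    sim_to_thread = to;              /* _reswitch: always take the newest target */
    sim_pendsv_pending = 1;          /* write NVIC_PENDSVSET to NVIC_INT_CTRL */
}

/* Two requests arrive before PendSV runs: A -> B, then B -> C.
 * Returns 1 if the pending switch has collapsed to A -> C. */
int sim_coalesce_demo(void)
{
    sim_switch_flag = 0;
    sim_pendsv_pending = 0;

    sim_context_switch(0xA0u, 0xB0u);
    sim_context_switch(0xB0u, 0xC0u);

    assert(sim_pendsv_pending == 1);
    return sim_from_thread == 0xA0u && sim_to_thread == 0xC0u;
}
```

Keeping the first from value is the correct choice here: thread A is the one whose registers are still on the CPU, while thread B never ran and has nothing to save.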

Implementing the PendSV interrupt

In Cortex-M3, the PendSV interrupt handler is PendSV_Handler(). The actual work of thread switching is completed in PendSV_Handler(). The following figure is a specific flow chart:

The following code is the implementation of PendSV_Handler:

; r0 --> switch from thread stack
; r1 --> switch to thread stack
; psr, pc, lr, r12, r3, r2, r1, r0 are pushed into [from] stack
PendSV_Handler   PROC
    EXPORT PendSV_Handler

    ; disable global interrupts
    MRS     r2, PRIMASK
    CPSID   I

    ; check whether rt_thread_switch_interrupt_flag is 0
    ; if it is, jump to pendsv_exit
    LDR     r0, =rt_thread_switch_interrupt_flag
    LDR     r1, [r0]
    CBZ     r1, pendsv_exit         ; pendsv already handled

    ; clear rt_thread_switch_interrupt_flag
    MOV     r1, #0x00
    STR     r1, [r0]

    ; check whether rt_interrupt_from_thread is 0
    ; if it is, skip saving the from thread's context
    LDR     r0, =rt_interrupt_from_thread
    LDR     r1, [r0]
    CBZ     r1, switch_to_thread

    ; save the from thread's context
    MRS     r1, psp                 ; get the from thread's stack pointer
    STMFD   r1!, {r4 - r11}       ; push r4-r11 onto the thread's stack
    LDR     r0, [r0]
    STR     r1, [r0]                ; update the SP in the thread's control block

switch_to_thread
    LDR     r1, =rt_interrupt_to_thread
    LDR     r1, [r1]
    LDR     r1, [r1]                ; get the to thread's stack pointer

    LDMFD   r1!, {r4 - r11}       ; restore the to thread's r4-r11 from its stack
    MSR     psp, r1                 ; write the updated r1 to psp

pendsv_exit
    ; restore the global interrupt state
    MSR     PRIMASK, r2

    ; set bit 2 of lr so that the return uses the PSP stack pointer
    ORR     lr, lr, #0x04
    ; return from the exception
    BX      lr
    ENDP
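The save and restore steps of the handler can be followed more easily in C. The sketch below is a simplified host-side model, not the handler itself: it leaves out the flag check and the interrupt masking, represents R4-R11 and PSP as ordinary variables, and uses a hypothetical struct sim_thread whose only member is the saved stack pointer. All sim_ names are introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sim_thread { uint32_t *sp; };   /* saved stack pointer of a thread */

static uint32_t  sim_r4_r11[8];        /* simulated registers r4..r11 */
static uint32_t *sim_psp;              /* simulated process stack pointer */

/* Save/restore part of PendSV; from is NULL for the very first switch. */
void sim_pendsv(struct sim_thread *from, struct sim_thread *to)
{
    if (from != NULL)
    {
        sim_psp -= 8;                                    /* STMFD r1!, {r4 - r11} */
        memcpy(sim_psp, sim_r4_r11, sizeof(sim_r4_r11));
        from->sp = sim_psp;                              /* update the control block */
    }
    memcpy(sim_r4_r11, to->sp, sizeof(sim_r4_r11));      /* LDMFD r1!, {r4 - r11} */
    sim_psp = to->sp + 8;                                /* MSR psp, r1 */
}

/* Switch A -> B and back; returns 1 if A's registers survive the round trip. */
int sim_round_trip(void)
{
    static uint32_t stack_a[32], stack_b[32];
    struct sim_thread a, b;
    int i;

    /* B has never run: its stack holds an initial 16-word frame, r4..r11 first. */
    b.sp = &stack_b[32 - 16];
    for (i = 0; i < 8; i++) b.sp[i] = 104u + (uint32_t)i;

    /* A is running with r4..r11 = 4..11; the hardware has already stacked 8 words. */
    a.sp = NULL;
    sim_psp = &stack_a[32 - 8];
    for (i = 0; i < 8; i++) sim_r4_r11[i] = 4u + (uint32_t)i;

    sim_pendsv(&a, &b);
    if (sim_r4_r11[0] != 104u || sim_r4_r11[7] != 111u) return 0;
    assert(a.sp == &stack_a[32 - 16]);   /* A's frame is now 16 words deep */

    sim_pendsv(&b, &a);
    return sim_r4_r11[0] == 4u && sim_r4_r11[7] == 11u;
}
```

After the first switch, thread A's stack holds the same 16-word frame that rt_hw_stack_init() builds for a new thread, which is why a never-run thread and a preempted thread can be resumed by the same code path.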

With global interrupt control and context switching in place, the RTOS can create, run, and schedule threads. With clock tick support, RT-Thread can also schedule threads of the same priority by round-robin time slicing, implement timers, implement the rt_thread_delay() delay function, and so on.

To port libcpu, ensure that rt_tick_increase() is called periodically from the clock tick interrupt. The calling period is determined by the value of the RT_TICK_PER_SECOND macro in rtconfig.h.

On Cortex-M, the clock tick can be provided by implementing the SysTick interrupt handler.

void SysTick_Handler(void)
{
    /* enter interrupt */
    rt_interrupt_enter();

    rt_tick_increase();

    /* leave interrupt */
    rt_interrupt_leave();
}
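For the handler to fire at the right rate, the SysTick timer has to be programmed from the core clock and RT_TICK_PER_SECOND. The arithmetic can be sketched as follows; this is an illustration under assumptions, with RT_TICK_PER_SECOND set to 100 as an example value, SysTick assumed to be clocked from the core clock, and demo_systick_reload() and demo_ms_to_ticks() being hypothetical helpers rather than RT-Thread functions.

```c
#include <assert.h>
#include <stdint.h>

#define RT_TICK_PER_SECOND 100u   /* example value; normally set in rtconfig.h */

/* SysTick counts down from the reload value to 0, so one period
 * lasts reload + 1 clock cycles. */
uint32_t demo_systick_reload(uint32_t core_clock_hz)
{
    uint32_t reload = core_clock_hz / RT_TICK_PER_SECOND - 1u;

    assert(reload <= 0x00FFFFFFu);   /* the SysTick reload field is 24 bits wide */
    return reload;
}

/* Convert a delay in milliseconds to ticks, rounding up so that the
 * delay is never shorter than requested (a design choice of this sketch). */
uint32_t demo_ms_to_ticks(uint32_t ms)
{
    return (uint32_t)(((uint64_t)ms * RT_TICK_PER_SECOND + 999u) / 1000u);
}
```

At 72 MHz and 100 ticks per second the reload value is 719999, and one tick lasts 10 ms, so a 15 ms delay rounds up to 2 ticks.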

In real projects, different boards may use the same CPU architecture but carry different peripherals to build different products, so the boards themselves also need to be adapted. RT-Thread provides a BSP abstraction layer for adapting to common boards. To use the RT-Thread kernel on a board, you need the corresponding chip architecture port and, in addition, a port to the board itself, that is, a basic BSP. Its job is to establish a basic environment in which the operating system can run. The main tasks are:

1) Initialize the CPU's internal registers and configure the RAM timing.

2) Implement the clock driver and the interrupt controller driver, and complete interrupt management.

3) Implement the serial port and GPIO drivers.

4) Initialize the dynamic memory heap to enable dynamic heap memory management.
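The four tasks above can be pictured as a single board initialization routine. The sketch below only shows the shape of such a routine on a host: every demo_ function is a hypothetical stand-in that records that it ran instead of touching hardware, and the call order simply follows the list rather than any order RT-Thread mandates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static char   demo_log[8];     /* records the order in which the steps ran */
static size_t demo_log_len;

static void demo_note(char step)
{
    assert(demo_log_len < sizeof(demo_log) - 1);
    demo_log[demo_log_len++] = step;
}

/* Hypothetical stand-ins for the four BSP tasks. */
static void demo_cpu_and_ram_init(void)   { demo_note('R'); }  /* task 1 */
static void demo_clock_and_irq_init(void) { demo_note('C'); }  /* task 2 */
static void demo_uart_and_gpio_init(void) { demo_note('U'); }  /* task 3 */
static void demo_heap_init(uint8_t *begin, uint8_t *end)       /* task 4 */
{
    assert(begin < end);       /* the heap must be a non-empty RAM range */
    demo_note('H');
}

/* Runs the four steps and returns the recorded order as a string. */
const char *demo_board_init(void)
{
    static uint8_t heap[1024]; /* stand-in for the RAM left over for the heap */

    demo_log_len = 0;
    demo_cpu_and_ram_init();
    demo_clock_and_irq_init();
    demo_uart_and_gpio_init();
    demo_heap_init(heap, heap + sizeof(heap));
    demo_log[demo_log_len] = '\0';
    return demo_log;
}
```

On a real board the heap range would normally be the RAM that remains after the program image, and each stand-in would be replaced by the chip vendor's or the BSP's own initialization code.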


Assoc. Prof. Wiroon Sriborrirux, Founder of Advance Innovation Center (AIC) and Bangsaen Design House (BDH), Electrical Engineering Department, Faculty of Engineering, Burapha University