Many embedded systems, especially Internet of Things (IoT) applications, are deployed in locations that are difficult or impossible for operators to reach. These applications are typically deployed in large numbers and have limited battery life; examples include embedded systems that monitor the health of people or machines. These challenges, combined with rapidly iterating software life cycles, mean that many systems need to support over-the-air (OTA) updates, which replace the software on an embedded system’s microcontroller or microprocessor with new software. Many people are familiar with OTA updates on mobile devices, but designing and implementing them on resource-constrained systems presents a different set of challenges. This article introduces several software designs for OTA updates, discusses their advantages and disadvantages, and shows how OTA update software can take advantage of the hardware features of two ultra-low-power microcontrollers.
Building Blocks
Server and Client
OTA updates replace the current software on the device with new software, which is downloaded over the air. In an embedded system, the device running this software is usually a microcontroller. A microcontroller is a small computing device with limited memory, speed, and power consumption. A microcontroller typically contains a microprocessor (core) and digital hardware modules (peripherals) that perform specific operations. Ultra-low-power microcontrollers with typical power consumption of 30µA/MHz to 40µA/MHz in active mode are ideal for this type of application. Using specific hardware peripherals on these microcontrollers and putting them into low-power modes is an important part of software design for OTA updates. Figure 1 shows an example of an embedded system that may require an OTA update. As you can see, a microcontroller is connected to an RF transceiver and sensors, which can be used in IoT applications, where sensors are used to collect data about the environment, and wireless transceivers are used to periodically report the data. This part of the system, called edge nodes or clients, is the target of OTA updates. The other part of the system, called the cloud or server, is the provider of new software. The server and client communicate over a wireless connection using a wireless transceiver.
Figure 1. Server/Client Architecture in Example Embedded System
What is a software application?
Much of the OTA update process is the transfer of new software from the server to the client. After the software has been converted from its source format to a binary format, it is transmitted as a sequence of bytes. This conversion involves compiling the source code files (e.g., c, cpp), linking them into an executable (e.g., exe, elf), and then converting the executable to a portable binary format (e.g., bin, hex). In a nutshell, these file formats contain a sequence of bytes that needs to be stored at specific addresses in the microcontroller’s memory. Often, we think of information sent over a wireless link as data, such as commands that change the state of a system or sensor data collected by the system. In the case of OTA updates, the data is the new software in binary format. In many cases, the binary file is too large to be sent from the server to the client in a single transfer, which means that the binary file needs to be split into several separate packets, a process called “packetization”. To illustrate this process, Figure 2 shows how different versions of the software produce different binaries, and therefore different packets to send during an OTA update. In this simple example, each packet contains 8 bytes, with the first 4 bytes representing the address in client memory where the last 4 bytes are to be stored.
Figure 2. Binary conversion and packetization of a software application
Main Challenges
Based on this high-level description of the OTA update process, an OTA update solution must address three major challenges. The first challenge is related to memory. The software solution must organize the new software application into the client device’s volatile or non-volatile memory so that it can be executed when the update process is complete. The solution must ensure that the previous version of the software is kept as a fallback application in case there are problems with the new software. In addition, the state of the client device, such as the currently running software version and its location in memory, must be retained across resets and power cycles. The second major challenge is communication. The new software must be sent from the server to the client in discrete packets, each of which is placed at a specific address in the client’s memory. The packetization scheme, the packet structure, and the data transfer protocol must all be considered in the software design. The last major challenge is security. When new software is sent wirelessly from the server to the client, we must ensure that the server is a trusted party. This security challenge is called authentication. We must also hide the new software from any observers, as it may contain sensitive information. This security challenge is called confidentiality. The final element of security is integrity, which means ensuring that the new software is not corrupted when it is sent over the air.
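The packetization step described above can be sketched in a few lines of Python. This is only an illustration of the 8-byte example packet (4 address bytes followed by 4 data bytes); the function names, the little-endian address encoding, and the 0xFF padding of a short final chunk are assumptions for the sketch and are not taken from the reference design.

```python
import struct

ADDRESS_LEN = 4  # bytes used for the destination address, as in the example packet


def packetize(binary: bytes, base_address: int, chunk: int = 4) -> list:
    """Server side: split a binary image into [address][data] packets."""
    packets = []
    for offset in range(0, len(binary), chunk):
        # Assumption: a short final chunk is padded with 0xFF.
        data = binary[offset:offset + chunk].ljust(chunk, b"\xff")
        # Assumption: the address is encoded as a little-endian 32-bit value.
        packets.append(struct.pack("<I", base_address + offset) + data)
    return packets


def apply_packet(memory: bytearray, packet: bytes) -> None:
    """Client side: store the data bytes at the address carried by the packet."""
    (address,) = struct.unpack_from("<I", packet)
    data = packet[ADDRESS_LEN:]
    memory[address:address + len(data)] = data


image = bytes(range(10))                      # stand-in for a tiny binary file
packets = packetize(image, base_address=0x100)

client_memory = bytearray(b"\xff" * 0x200)    # stand-in for client memory
for packet in packets:
    apply_packet(client_memory, packet)
```

Because every packet carries its own destination address, the client can place the data correctly even if packets are received out of order.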
Second Stage Bootloader (SSBL)
Understanding Boot Sequences
The primary bootloader is a software application that resides permanently in the read-only memory of the microcontroller. The region of memory where the primary bootloader is located is called the information space and is sometimes inaccessible to the user. The primary bootloader runs on every reset, generally performs some essential hardware initialization, and may load user software into memory. However, if the microcontroller contains on-chip non-volatile memory such as flash memory, the bootloader does not need to do any loading; it simply transfers control to the program in flash memory. If the primary bootloader does not support OTA updates, a second-stage bootloader (SSBL) is needed. Like the primary bootloader, the SSBL runs on every reset, but it also implements part of the OTA update process. This boot sequence is shown in Figure 3. This section explains why a second-stage bootloader is needed and why defining the role of this application is an important design trade-off.
Figure 3. Example of memory map and boot flow using SSBL
Lessons Learned: Be Sure to Have an SSBL
Conceptually, it seems simpler to omit the SSBL and put all OTA update functionality into the user application, as the OTA process can then seamlessly leverage existing software frameworks, operating systems, and device drivers. Figure 4 shows the memory map and boot sequence for a system that chooses this method.
Figure 4. Example of memory map and boot flow without SSBL
Application A is the original application deployed on the microcontroller in the field. This application contains the OTA update software, which is used to download application B when requested by the server. After the download is complete and application B has been verified, application A executes a branch instruction to application B’s reset handler to transfer control to application B. A reset handler is a small piece of code that serves as the entry point of a software application and runs on reset. In this case, the reset is simulated by executing a branch, which is equivalent to a function call. There are two major problems with this approach:
1. Many embedded software applications employ a real-time operating system (RTOS), which allows software to be split into multiple concurrent tasks, each with different responsibilities in the system. For example, the application shown in Figure 1 might have an RTOS task to read sensors, an RTOS task to run some algorithm on the sensor data, and an RTOS task to interface with the radio. The RTOS itself is always active and is responsible for switching these tasks based on asynchronous events or specific time-based delays. Therefore, it is not safe to branch from an RTOS task to a new program, as other tasks continue to run in the background. For real-time operating systems, the only safe way to terminate a program is through a reset.
2. Based on Figure 4, a solution to the above problem would be to have the primary bootloader branch to application B instead of application A. However, on some microcontrollers the primary bootloader always runs the program whose interrupt vector table (IVT), a critical part of the application that describes the interrupt handlers, is located at address 0. This means that the IVT must be relocated in some form so that its reset entry maps to application B. If a power cycle occurs during this IVT relocation, the system may be left in a permanently broken state.
Fixing the SSBL at address 0, as shown in Figure 3, solves both problems. The SSBL is not an RTOS program, so it can safely branch to a new application. The IVT of the SSBL at address 0 is never relocated, so there is no risk of a power cycle leaving the system in a catastrophic state.
Design Tradeoffs: The Role of SSBLs
We have spent a lot of time discussing the SSBL and how it relates to the application software, but what does the SSBL program actually do? At a minimum, the program must determine which application is the current one (where it starts) and then branch to that address. The locations of the various applications in the microcontroller’s memory are typically kept in a table of contents (ToC), as shown in Figure 3. This is a shared region of persistent memory that the SSBL and the application software use to communicate with each other. When the OTA update process is complete, the ToC is updated with the new application’s information. Some parts of the OTA update functionality can also be pushed into the SSBL. Deciding which parts to push is an important design decision when developing OTA update software. The minimal SSBL described above is very simple, easy to verify, and will most likely not require modification during the lifetime of the product. However, it means that each application is responsible for downloading and validating the next application. This can lead to code duplication in the communication stack, device firmware, and OTA update software. On the other hand, we can choose to push the entire OTA update process into the SSBL. In this case, the application simply sets a flag in the ToC to request an update and then performs a reset. The SSBL then performs the download sequence and the verification process. This minimizes code duplication and simplifies the application-specific software. However, it introduces a new challenge: the SSBL itself may need to be updated (i.e., updating the update code). Ultimately, deciding which functions to place in the SSBL depends on the memory constraints of the client device, the similarity between the downloaded applications, and the portability of the OTA update software.
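As a rough illustration of the minimal SSBL decision described above, the following Python sketch models a ToC and selects the application to branch to. The entry fields (`start_address`, `version`, `valid`), the rule of booting the newest valid entry, and the example addresses are assumptions made for this sketch; the layout of a real ToC is a design choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TocEntry:
    start_address: int  # where the application image starts in flash (hypothetical field)
    version: int        # assumed to increase with every release
    valid: bool         # assumed to be set only after download and verification succeed


def select_boot_entry(toc: List[TocEntry]) -> Optional[TocEntry]:
    """Return the newest valid application, or None if nothing is bootable."""
    candidates = [entry for entry in toc if entry.valid]
    return max(candidates, key=lambda entry: entry.version, default=None)


toc = [
    TocEntry(start_address=0x0000_4000, version=3, valid=True),
    TocEntry(start_address=0x0002_0000, version=4, valid=False),  # update was interrupted
]
entry = select_boot_entry(toc)  # falls back to version 3 because version 4 is not valid
```

On a real device the SSBL would branch to the reset handler of the selected application. Marking an entry valid only as the final step of an update is one way to keep the previous application available as a fallback if power is lost mid-download.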
Design Tradeoff: Caching and Compression
Another key design decision in OTA update software is how to organize the incoming application in memory during the OTA update process. Microcontrollers usually have two types of memory: non-volatile memory (such as flash memory) and volatile memory (such as SRAM). Flash memory is used to store the application’s program code and read-only data, as well as other system-level data such as the ToC and event logs. SRAM is used to store the modifiable parts of a software application, such as non-constant global variables and the stack. The software application binary shown in Figure 2 contains only the parts of the program that reside in non-volatile memory. The application initializes the parts that belong in volatile memory during its startup routine.
During an OTA update, each time the client device receives a packet from the server that contains part of this binary, it stores it in SRAM. The packet can be compressed or uncompressed. The benefit of compressing the application binary is that the files are smaller, so fewer packets are sent, and the SRAM space required to store the packets during the download process is reduced accordingly. The disadvantage of this approach is that compression and decompression increase the processing time of the update process, and compression-related code must be bundled in the OTA update software.
The new application software must ultimately be stored in flash memory but arrives in SRAM during the update process, so the OTA update software needs to write to flash memory at some point during the update. Temporarily storing the new application in SRAM is called caching. Broadly, there are three caching approaches that OTA update software can take.
• No caching: Every time a packet containing part of the new application arrives, it is written to its destination in flash. This scheme is very simple and minimizes the amount of logic in the OTA update software, but it requires the flash region for the new application to be fully erased beforehand. This method wears out the flash and adds overhead.
• Partial caching: Reserve a region of SRAM for buffering and store new packets in this region as they arrive. When the region fills up, write the data to flash to empty it. This scheme can become complicated if packets arrive out of order or if there are gaps in the new application binary, because a way of mapping SRAM addresses to flash addresses is required. One strategy is to have the cache act as a mirror of part of the flash. Flash memory is divided into small regions called pages, which are the smallest units that can be erased. Thanks to this natural division, a good approach is to cache one page of flash in SRAM and, when it fills up or the next packet belongs to a different page, write that page to flash to flush the cache.
• Full caching: Store the entire new application in SRAM during the OTA update process, and write it to flash only after the new application has been completely downloaded from the server. This method overcomes the shortcomings of the previous methods: the number of writes to flash is minimal, and the OTA update software does not need complex caching logic. However, it limits the size of the new application that can be downloaded, because the amount of SRAM available in a system is usually much smaller than the amount of flash.
Figure 5. A page of flash memory using SRAM cache
Figure 5 illustrates the second option, partial caching. It shows an enlarged view of the portion of flash belonging to application A in Figures 3 and 4, together with a functional memory map of the SRAM used by the SSBL. The example flash page size is 2 kB. Ultimately, this design decision depends on the size of the new application and on how much complexity is acceptable in the OTA update software.
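The partial caching strategy can be modeled with a short Python sketch. It mirrors one 2 kB flash page in a buffer and writes the page back when a packet targets a different page, plus once at the end of the download. The class and attribute names are hypothetical, flash is simulated with a `bytearray`, and the sketch assumes that no packet straddles a page boundary. For brevity it does not track how full the page is, so the last page is written back by an explicit final flush.

```python
PAGE_SIZE = 2048  # bytes, the example flash page size from the text


class PageCache:
    """Mirror one flash page in SRAM and write it back only when necessary."""

    def __init__(self, flash: bytearray):
        self.flash = flash          # simulated flash memory
        self.page = None            # index of the page currently mirrored, if any
        self.buffer = bytearray()   # simulated SRAM copy of that page
        self.flash_writes = 0       # counts page writes, for comparison with no caching

    def write(self, address: int, data: bytes) -> None:
        page, offset = divmod(address, PAGE_SIZE)
        if offset + len(data) > PAGE_SIZE:
            raise ValueError("packet straddles a page boundary")  # sketch assumption
        if page != self.page:
            self.flush()  # the packet belongs to another page: write back the old one
            start = page * PAGE_SIZE
            self.buffer = bytearray(self.flash[start:start + PAGE_SIZE])
            self.page = page
        self.buffer[offset:offset + len(data)] = data

    def flush(self) -> None:
        if self.page is None:
            return
        start = self.page * PAGE_SIZE
        self.flash[start:start + PAGE_SIZE] = self.buffer  # one erase + program on hardware
        self.flash_writes += 1
        self.page = None


flash = bytearray(b"\xff" * (4 * PAGE_SIZE))
cache = PageCache(flash)
image = bytes(i % 251 for i in range(2 * PAGE_SIZE))  # stand-in application binary
for offset in range(0, len(image), 64):               # 64-byte payloads
    cache.write(offset, image[offset:offset + 64])
cache.flush()                                         # write back the last page
```

In this run the 64 packets result in only two page writes, compared with one flash write per packet in the uncached scheme.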
Security and Communications
Design Tradeoffs: Software vs. Protocols
OTA update solutions must also address security and communication issues. As shown in Figure 1, many systems implement communication protocols in hardware and software to support normal (non-OTA update-related) operation of the system, such as exchanging sensor data. This means that a (possibly secure) method of wireless communication has been established between the server and the client. Communication protocols that can be used by an embedded system like the one shown in Figure 1 are Bluetooth® Low Energy (BLE) or 6LoWPAN. Sometimes these protocols support the security and data exchange that OTA update software can take advantage of during the OTA update process.
The amount of communication functionality that must be built into the OTA update software ultimately depends on the level of abstraction provided by the existing communication protocol. If the existing protocol has facilities for sending and receiving files between the server and the client, the OTA update software can simply use them for the download process. However, if the communication protocol is more primitive and only provides a means of sending raw data, the OTA update software may need to perform the packetization itself and supply metadata along with the new application binary. The same applies to the security challenges. If the communication protocol does not support confidentiality, the OTA update software may be responsible for decrypting the bytes sent over the air.
In summary, which features to implement in the OTA update software, such as custom packet structures, server/client synchronization, encryption, and key exchange, depends on what the system’s communication protocol provides and on the requirements for security and robustness. The next section presents a complete security solution that addresses all the challenges presented previously and shows how the microcontroller’s cryptographic hardware peripherals can be leveraged in this solution.
Solving the Security Challenges
The security solution needs to allow the new application to be sent confidentially over the air, detect any corruption in the new application, and verify that the new application was sent by a trusted server and not by a malicious party. These challenges can be solved with cryptographic operations. Specifically, the security solution can use two cryptographic operations: encryption and hashing. Encryption uses a secret key shared by the client and the server to obfuscate the data sent over the air. A specific type of encryption that the microcontroller’s cryptographic hardware accelerator may support is AES-128 or AES-256, depending on the key size. In addition to encrypting the data, the server can also send a digest to ensure that there is no corruption. The digest is generated by hashing the packet; a hash is an irreversible mathematical function that generates a unique code. If any part of the message or digest is modified after the server produces it, such as a bit flipping during wireless communication, the client will notice the modification when it performs the same hash function on the packet and compares the digests. A specific type of hashing that a microcontroller’s cryptographic hardware accelerator may support is SHA-256. Figure 6 shows a block diagram of the cryptographic hardware peripheral in the microcontroller, with the OTA update software residing in the Cortex-M4 application layer. The figure also shows the peripheral’s support for protected key storage, which an OTA update solution can use to store the client’s keys securely.
Figure 6. Hardware block diagram of the cryptographic accelerator on the ADuCM4050
A common technique to address the ultimate challenge of authentication is to use asymmetric encryption. For this operation, the server generates a public-private key pair. Only the server knows the private key, and the client knows the public key. Using the private key, the server can generate a signature for a given block of data, such as a digest of a packet to be sent over the air. The signature is sent to the client, which can verify the signature using the public key. This way, the client can confirm that the message was sent from the server and not from a malicious third party. This sequence is shown in Figure 7, with solid arrows representing function input/output and dashed arrows representing information sent wirelessly.
Figure 7. Authenticating a message using asymmetric encryption
Most microcontrollers do not have hardware accelerators for these asymmetric cryptographic operations, but the operations can be implemented in software using libraries such as Micro-ECC, which target resource-constrained devices. The library requires a user-defined random number generation function, which can be implemented using the true random number generator hardware peripheral on the microcontroller. While these asymmetric cryptographic operations solve the trust challenge during OTA updates, they consume significant processing time and require a signature to be sent with the data, which increases the packet size. We could perform this check once at the end of the download, using a digest of the last packet or a digest of the entire new software application, but then a third party would be able to download untrusted software to the client, which is not ideal. Ideally, we want to verify that every packet received comes from a server we trust, without the overhead of a signature on every packet. This can be achieved with a hash chain.
A hash chain combines the cryptographic concepts discussed in this section across a series of packets in order to tie them together mathematically. As shown in Figure 8, the first packet (number 0) contains a digest of the next packet, H0. The payload of the first packet is not actual software application data but a signature. The payload of the second packet (number 1) contains part of the binary file along with a digest of the third packet (number 2). The client verifies the signature in the first packet and caches the digest H0 for later use. When the second packet arrives, the client hashes it and compares the result to H0. If they match, the client can be sure that this packet came from the trusted server, without the need for a costly signature check. The high-overhead task of generating this chain is left to the server, and the client simply caches a digest and hashes each packet as it arrives, ensuring that arriving packets are intact and authenticated.
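The hash chain can be sketched in Python using `hashlib` for SHA-256. Several details here are assumptions rather than the reference design: the digest is computed over the whole packet body (payload plus the embedded digest of the following packet), the last packet carries an all-zero digest, and the signature is passed in as a pair of callables. Because asymmetric signatures are not part of the Python standard library, the demonstration uses an HMAC as a stand-in; it is not an asymmetric signature, and a real design would use a public-key scheme so that the client never holds a signing secret.

```python
import hashlib
import hmac

DIGEST_LEN = 32  # SHA-256 digest size in bytes


def build_chain(chunks, sign):
    """Server side: build packets back to front so each carries the next one's digest."""
    packets = []
    next_digest = bytes(DIGEST_LEN)  # assumption: the last packet carries an all-zero digest
    for chunk in reversed(chunks):
        packet = chunk + next_digest
        packets.append(packet)
        next_digest = hashlib.sha256(packet).digest()
    # Packet 0: a signature over H0, followed by H0 itself.
    packets.append(sign(next_digest) + next_digest)
    packets.reverse()
    return packets


def verify_chain(packets, verify):
    """Client side: one signature check, then one hash per packet."""
    first = packets[0]
    signature, expected = first[:-DIGEST_LEN], first[-DIGEST_LEN:]
    if not verify(signature, expected):
        raise ValueError("first packet is not from a trusted server")
    chunks = []
    for packet in packets[1:]:
        if not hmac.compare_digest(hashlib.sha256(packet).digest(), expected):
            raise ValueError("packet failed the hash chain check")
        chunks.append(packet[:-DIGEST_LEN])
        expected = packet[-DIGEST_LEN:]  # cache the digest of the next packet
    return b"".join(chunks)


DEMO_KEY = b"demo-key"  # stand-in only; see the note on HMAC above


def demo_sign(digest):
    return hmac.new(DEMO_KEY, digest, hashlib.sha256).digest()


def demo_verify(signature, digest):
    return hmac.compare_digest(signature, demo_sign(digest))


image = bytes(range(200))  # stand-in application binary
chunks = [image[i:i + 64] for i in range(0, len(image), 64)]
packets = build_chain(chunks, demo_sign)
received = verify_chain(packets, demo_verify)
```

Since each packet embeds the digest of its successor, modifying any packet breaks the comparison at that point in the chain, and the client can reject it immediately rather than at the end of the download.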
Figure 8. Applying a hash chain to a sequence of packets
Experimental Setup
The ADuCM3029 and ADuCM4050 are ultra-low-power microcontrollers that address the memory, communication, and security design challenges described in this article. These microcontrollers contain the hardware peripherals discussed in this article for OTA updates, such as flash memory, SRAM, cryptographic accelerators, and a true random number generator. The Device Family Packs (DFPs) for these microcontrollers provide software support for building OTA update solutions on these devices. The DFPs include peripheral drivers that provide a simple and flexible interface to the hardware.
Hardware Configuration
To demonstrate the concepts discussed in this article, we created an OTA update software reference design using the ADuCM4050. For the client, an ADuCM4050EZ-KIT® connects to the ADF7242 using the transceiver daughterboard horseshoe connector. The client device is shown on the left side of Figure 9. For the server, we developed a Python application that runs on a Windows PC. The Python application communicates over the serial port with another ADuCM4050EZ-KIT, which also has an ADF7242 connected in the same configuration as the client. However, the EZ-KIT on the right in Figure 9 does not perform the OTA update logic and just relays the packets received from the ADF7242 to the Python application.
Figure 9. Experimental hardware setup
Software Components
The software reference design partitions the flash memory of the client device as shown in Figure 3. The main client application is designed to be very portable and configurable so that it can also be used in other scenarios or on other hardware platforms. Figure 10 shows the software architecture of the client device. Note that while we sometimes refer to the entire application as the SSBL, in Figure 10, and from now on, we logically separate the real SSBL part (blue) from the OTA update part (red), because the latter does not necessarily need to be fully implemented in the same application. The hardware abstraction layer shown in Figure 10 keeps the OTA client software portable and independent of any underlying libraries (shown in orange).
(Analog Dialogue 52-11, November 2018)
Figure 10. Client software architecture
The software application implements the boot sequence in Figure 3, a simple communication protocol for downloading new applications from the server, and the hash chain. Each packet in the communication protocol has a 12-byte metadata header, a 64-byte payload, and a 32-byte digest. In addition, it has the following features:
• Caching: Supports either no caching or caching of one page of flash memory, depending on user configuration.
• Table of contents: The ToC is designed to hold only two applications, and a new application is always downloaded to the oldest location so that one application is kept as a fallback. This is called an A/B update scheme.
• Messaging: Supports either the ADF7242 or the UART for messaging, depending on user configuration. Using the UART for messaging eliminates the need for the EZ-KIT on the right of Figure 9, leaving only the EZ-KIT on the left as the client. This wired update scheme is useful for initial system bring-up and debugging.
Results
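Two of the items above lend themselves to a short Python sketch: splitting a received packet into its three fields and choosing the download location in the A/B scheme. The field sizes come from the text, but the field order (header, payload, digest), the opaque treatment of the header, and all function and slot names are assumptions for illustration; the text does not specify the header contents.

```python
HEADER_LEN = 12    # metadata header size from the text (contents not specified)
PAYLOAD_LEN = 64   # payload size from the text
DIGEST_LEN = 32    # SHA-256 digest size
PACKET_LEN = HEADER_LEN + PAYLOAD_LEN + DIGEST_LEN  # 108 bytes on the wire


def split_packet(packet: bytes):
    """Split a packet into (header, payload, digest); the field order is assumed."""
    if len(packet) != PACKET_LEN:
        raise ValueError("unexpected packet length")
    header = packet[:HEADER_LEN]
    payload = packet[HEADER_LEN:HEADER_LEN + PAYLOAD_LEN]
    digest = packet[HEADER_LEN + PAYLOAD_LEN:]
    return header, payload, digest


def pick_download_slot(slot_versions: dict) -> str:
    """A/B scheme: overwrite the slot that holds the oldest application version."""
    return min(slot_versions, key=slot_versions.get)


header, payload, digest = split_packet(bytes(range(PACKET_LEN)))
slot = pick_download_slot({"A": 3, "B": 4})  # slot A holds the older version
```

Always overwriting the older slot means the application that is currently running is never erased during a download, which is what makes it usable as the fallback.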
In addition to meeting functional requirements and passing various tests, the performance of the software is also important in judging the success of a project. Two metrics are commonly used to measure the performance of embedded software: footprint and cycles. Footprint refers to the amount of space a software application occupies in volatile (SRAM) and non-volatile (flash) memory. Cycles refers to the number of microprocessor clock cycles the software uses to perform a particular task. It is similar to software run time, except that during an OTA update the software may enter a low-power mode in which the microprocessor is inactive and consumes no cycles. While the software reference design is not optimized for either metric, both metrics are useful for benchmarking the program and comparing design trade-offs.
Figure 11 and Figure 12 show the footprint of the OTA update software reference design implemented on the ADuCM4050 (without caching). The charts are broken down according to the components shown in Figure 10. As shown in Figure 11, the entire application uses about 15 kB of flash memory. Given that the ADuCM4050 contains 512 kB of flash memory, this footprint is very small. The actual application software (the software developed for the OTA update process) needs only about 1 kB; the rest is used by libraries such as the DFP, Micro-ECC, and the ADF7242 stack. These results help illustrate the design trade-off of what role the SSBL should play in the system. Most of the 1 kB footprint is for the update process. The SSBL itself takes up only about 500 bytes, plus 1 kB to 2 kB of DFP code for accessing peripherals such as the flash memory.
Figure 11. Flash Footprint (Bytes)
Figure 12. SRAM Footprint (Bytes)
To evaluate the software overhead, we count cycles each time a packet is received and then calculate the average number of cycles consumed per packet. Each packet requires AES-128 decryption, SHA-256 hashing, a flash write, and some packet metadata validation. With a packet payload of 64 bytes and no caching, the overhead for processing a single packet is 7409 cycles. With a 26 MHz core clock, this corresponds to about 285 microseconds of processing time. This value is calculated using the cycle count driver in the ADuCM4050 DFP (unadjusted number of cycles) and is an average over a 100 kB binary download (~1500 packets). To minimize the per-packet overhead, the drivers in the DFP use the direct memory access (DMA) hardware peripheral on the ADuCM4050 to perform bus transactions, and they put the processor into a low-power sleep state during each transaction. If we disable low-power sleep in the DFP and change the bus transactions to not use DMA, the overhead per packet increases to 17,297 cycles. This illustrates the impact that efficient use of device drivers has on embedded software applications. Although reducing the number of data bytes per packet also reduces the overhead per packet, doubling the number of data bytes per packet to 128 increases the number of cycles only slightly, to 8362 cycles for the same experiment.
The cycle counts and footprint also illustrate the trade-off discussed earlier: caching packet data instead of writing to flash every time. With caching of one page of flash enabled, the overhead per packet drops from 7409 to 5904 cycles. This 20% reduction comes from the fact that the update process skips the flash write for most packets and only writes to flash when the cache is full. The price is an increase in the SRAM footprint. When no cache is used, the HAL requires only 336 bytes of SRAM, as shown in Figure 12. However, when the cache is used, space equal to a full page of flash must be reserved, so the SRAM usage increases to 2388 bytes. The flash used by the HAL also increases by a small amount because of the extra code needed to determine when the cache must be flushed.
These results demonstrate that design decisions have a tangible impact on software performance. There is no one-size-fits-all solution: each system has different requirements and constraints, and OTA update software needs to be designed on a case-by-case basis. Hopefully this article has clarified the common issues and trade-offs encountered when designing, implementing, and validating OTA update software solutions.
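The arithmetic behind these figures can be checked with a few lines of Python. The cycle counts and the 26 MHz clock are taken from the text; the variable names and the derived per-byte comparison are added here only as an illustration.

```python
CORE_CLOCK_HZ = 26_000_000  # 26 MHz core clock, from the text

# Average cycles per packet reported in the text.
CYCLES_NO_CACHE_64B = 7409     # 64-byte payload, no caching
CYCLES_NO_CACHE_128B = 8362    # 128-byte payload, no caching
CYCLES_PAGE_CACHE_64B = 5904   # 64-byte payload, one page of flash cached


def cycles_to_microseconds(cycles: int) -> float:
    """Convert a cycle count to processing time at the core clock frequency."""
    return cycles / CORE_CLOCK_HZ * 1e6


time_per_packet_us = cycles_to_microseconds(CYCLES_NO_CACHE_64B)

# Cost per payload byte: larger packets amortize the fixed per-packet work.
cycles_per_byte_64 = CYCLES_NO_CACHE_64B / 64
cycles_per_byte_128 = CYCLES_NO_CACHE_128B / 128

# Relative saving from caching one page of flash.
cache_saving = 1 - CYCLES_PAGE_CACHE_64B / CYCLES_NO_CACHE_64B

print(f"{time_per_packet_us:.0f} us per packet")
print(f"{cycles_per_byte_64:.0f} vs {cycles_per_byte_128:.0f} cycles per payload byte")
print(f"{cache_saving:.0%} fewer cycles with the page cache")
```

The per-byte figures show why doubling the payload is attractive: the cost per packet rises by about 13%, while the cost per payload byte falls from roughly 116 to roughly 65 cycles.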