ACM Transactions on Embedded Computing Systems (TECS)

Latest Articles

DLSpace: Optimizing SSD Lifetime via An Efficient Distributed Log Space Allocation Strategy

Due to limited numbers of program/erase cycles (i.e., P/Es) of NAND Flash, excessive out-of-place update and erase-before-write operations wear out these P/Es during garbage collections, which adversely shortens solid state disk (i.e., SSD) lifetime. The log space in NAND Flash space of an SSD performs as an updated page's buffer, which...

Energy-Efficient Multicore Scheduling for Hard Real-Time Systems: A Survey

As real-time embedded systems are evolving in scale and complexity, the demand for a higher performance at a minimum energy consumption has become a...

Exact WCRT Analysis for Message-Processing Tasks on Gateway-Integrated In-Vehicle CAN Clusters

A typical automotive integrated architecture is a controller area network (CAN) cluster integrated...

An Efficient UAV Hijacking Detection Method Using Onboard Inertial Measurement Unit

With the fast growth of civil drones, their security faces significant challenges. A commercial drone may be hijacked by a GPS-spoofing attack...


About TECS 

The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems. 


TECS Editor-in-Chief featured in "People of ACM"

Sandeep K. Shukla was recently reappointed as Editor-in-Chief of ACM Transactions on Embedded Computing Systems (TECS), and he was featured in the periodic series "People of ACM".

Forthcoming Articles

Editorial: Human Factors in Embedded Computing

BlueIO: A Scalable Real-Time Hardware I/O Virtualization System for Many-core Embedded Systems

In safety-critical systems, time predictability is vital. This extends to I/O operations, which require predictability, timing accuracy, parallel access, scalability and isolation. Existing approaches cannot achieve all these requirements at the same time. In this paper, we propose a framework for a hardware-implemented real-time I/O virtualization system, termed BlueIO, to meet all these requirements simultaneously. BlueIO integrates the important functionalities of I/O virtualization, low-layer I/O drivers and a clock-cycle-level timing-accurate I/O controller, GPIOCP [49]. BlueIO provides this functionality in the hardware layer, supporting abstract virtualised access to I/O from the software domain. The hardware implementation includes I/O virtualization and I/O drivers, provides isolation and parallel (concurrent) access to I/O operations and improves I/O performance. Furthermore, the approach includes the previously proposed GPIOCP to guarantee that I/O operations will occur at a specific clock cycle (i.e. be timing-accurate and predictable). In this paper, we propose the design and implementation of BlueIO, a real-time I/O virtualization system. We demonstrate how a BlueIO-based system can be exploited to meet real-time requirements with significant improvements in I/O performance and a low running cost on different OSs. We also present a hardware consumption analysis of BlueIO, in order to show that it scales linearly with the number of CPUs and I/O devices, evidenced by our implementation, which targets both FPGA and VLSI.

Ensuring Secure Application Execution and Platform Specific Execution in Embedded Devices

The Internet of Things (IoT) is expanding rapidly, with devices found in commercial and domestic settings from industrial sensors to home appliances. However, as the IoT market grows, so does the number of attacks made against it, with some reports claiming an increase of 600% in 2017. This work seeks to prevent code replacement, injection and exploitation attacks by ensuring correct and platform-specific application execution. This combines two previously studied problems: secure application execution and binding hardware and software. We present descriptions of both problems and requirements for ensuring both simultaneously. We then propose a scheme extending previous work that meets these requirements, and describe our implementation of the soft-core Secure Execution Processor, developed and tested on a Xilinx Spartan-6 FPGA. Finally, we analyse the scheme and our implementation according to performance and the requirements listed.

Stigmergy based Security for SoC Operations from Runtime Performance Degradation of SoC Components

The present semiconductor design industry has embraced the globalization strategy for System on Chip (SoC) design. However, attacks due to hardware vulnerabilities such as Hardware Trojans and counterfeiting have raised significant concerns. Both roots of untrust may be undetectable during testing but may be exhibited via sudden performance degradation at runtime. Threat analysis is performed for real-time SoC operations under sudden performance degradation of any of the SoC components procured from untrusted third-party vendors. Refuge is sought in the stigmergic behavior exhibited in insect colonies to propose a decentralized, self-aware security mechanism. Experimental validation and low overhead demonstrate the prospects of our proposed approach.

Compact and Flexible FPGA Implementation of Ed25519 and X25519

This paper describes FPGA cryptographic hardware that combines the elliptic-curve-based Ed25519 digital signature algorithm and the X25519 key establishment scheme in a single module. Cryptographically, these are high-security elliptic curve cryptography algorithms with short key sizes and impressive execution times. Our goal is to provide a lightweight FPGA module that enables them on resource-constrained devices, specifically for IoT applications. In addition, we aim at extensibility with customisable countermeasures against side-channel and fault-injection attacks. For the former, we offer a choice between time-optimised and constant-time execution, with or without base point blinding; for the latter, we offer enabling or disabling default-case statements in the FSM descriptions. To obtain compactness and at the same time fast execution times, we make maximum use of the DSP slices on the FPGA to design a single arithmetic unit that is flexible enough to support operations with two moduli and non-modulus arithmetic. In addition, our design benefits from in-place memory management and local storage of inputs in the DSP slices' pipeline registers, and takes advantage of distributed RAMs. These eliminate communication bottlenecks. The flexibility is offered by a micro-coded approach. While our design combines Ed25519 and X25519 in a single module, it can be optimized for X25519 only, which gives more compact hardware than previously published X25519 implementations. Our design targets 7-Series Xilinx FPGAs and is realized on a Zynq platform. Its basic Ed25519 implementation requires only around 10.3 K LUTs, 2.6 K registers and 16 DSP slices, resulting in 1.6 ms for a signature generation and 3.6 ms for a signature verification with an 82 MHz clock. While optimizing it only for X25519 gives better results, enabling the optional security features introduces resource and performance overheads.

Design-Level and Code-Level Security Analysis of IoT Devices

The Internet of Things (IoT) is playing an important role in different aspects of our lives. Smart grids, smart cars, and medical devices all incorporate IoT devices as key components. The ubiquity and criticality of these devices make them an attractive target for attackers. Therefore, we need techniques to analyze their security, so that we can address their potential vulnerabilities. Security analysis techniques may operate at the design level, to avoid state-space explosion, or at the code level, to ensure accuracy. In this paper we introduce one technique for each category, and compare their effectiveness on a real IoT testbed.

Combining PUF with RLUTs: A Two Party Pay Per Device IP Licensing Scheme On FPGAs

With the popularity of modern FPGAs, the business of FPGA-specific intellectual property (IP) is expanding rapidly. This also raises the concern of IP protection. FPGA vendors are making serious efforts toward IP protection, leading to standardization schemes like IEEE P1735. However, efficient techniques to prevent unauthorized overuse of IP still remain an open question. In this paper, we propose a two-party IP protection scheme combining the re-configurable LUT (RLUT) primitive of modern FPGAs with physically unclonable functions (PUF). The proposed scheme is considerably more lightweight than existing schemes, prevents overuse and does not involve FPGA vendors or trusted third parties for IP licensing. The proposed scheme is validated on the MCNC'91 benchmark and third-party IPs such as AES and a lightweight MIPS processor.

Cache Reconfiguration using Machine Learning for Vulnerability-aware Energy Optimization

Dynamic cache reconfiguration has been widely explored for energy optimization and performance improvement for single-core systems. Cache partitioning techniques are introduced for the shared cache in multicore systems to alleviate inter-core interference. While these techniques focus only on performance and energy, they ignore vulnerability due to soft errors. In this paper, we present a static-profiling-based algorithm to enable vulnerability-aware energy optimization for real-time multicore systems. Our approach can efficiently search the space of cache configurations and partitioning schemes for energy optimization while task deadlines and vulnerability constraints are satisfied. A machine learning technique has been employed to minimize the static profiling time without sacrificing the accuracy of results. Our experimental results demonstrate that our approach can achieve 19.2% average energy savings compared with the base configuration, while drastically reducing the vulnerability (49.3% on average) compared to state-of-the-art techniques. Furthermore, the machine learning technique enabled more than 10x speedup in static profiling time with a negligible prediction error of 3%.

XOR-Based Low-Cost Reconfigurable PUFs for IoT Security

With the rapid development of the Internet of Things (IoT), security has attracted considerable interest. Conventional security solutions that have been proposed for the Internet based on classical cryptography cannot be applied to IoT nodes due to the resource-constrained platform. A physical unclonable function (PUF) can be used to generate a key online or uniquely identify an integrated circuit (IC) by extracting its internal random differences using the so-called challenge-response pairs (CRPs). The PUF is a new type of hardware-based security primitive; it is regarded as a promising low-cost solution for IoT security. A logic reconfigurable PUF (RPUF) is highly efficient in terms of hardware cost. This paper first presents a new classification of RPUFs into circuit-based RPUF (C-RPUF) and algorithm-based RPUF (A-RPUF); two XOR-based RPUF circuits (namely the XOR-based reconfigurable bistable ring PUF (XRBR PUF) and the XOR-based reconfigurable ring oscillator PUF (XRRO PUF)) are proposed. Both the XRBR and XRRO PUFs are implemented using Xilinx Spartan-6 FPGAs. The implementation results are compared with previous PUF designs, showing good uniqueness and reliability. Compared to conventional PUF designs, the most significant advantage of the proposed designs is that they are highly efficient in terms of hardware cost. Moreover, the XRRO PUF is the most efficient design when compared with previous RPUFs. Also, the proposed XRRO and XRBR PUFs require only 12.5% of the hardware resources of previous bistable ring PUFs and reconfigurable RO PUFs, respectively, to generate a 1-bit response; this confirms that the proposed XRBR and XRRO PUFs are very efficient designs with good uniqueness and reliability.

FPGA Implementation of ECC over GF(2^m) for Small Embedded Applications

In this paper, we propose a compact ECC core over GF(2^m). The proposed architecture is based on the Lopez-Dahab projective point arithmetic operations. Efficiency is achieved using a ROM-based state machine for ECC point doubling and addition operations. The compact core is implemented using Virtex FPGA devices. The required number of slices is 2123 at 198 MHz and 8068 at 335 MHz for different GF(2^m). Extensive experiments were conducted to compare our solution with existing methods. Our proposed compact core outperforms previously proposed methods in terms of speed and area usage, which makes it well suited to cryptosystems in limited-resource devices.

A Lightweight Cryptographic Protocol with Certificateless Signature Scheme for the Internet of Things

Recently, the popularity of smart devices (e.g., IoT devices or smartphones) has led to rapid development and significant advancement of ubiquitous applications for mobile commerce around the world. Novel transaction schemes, such as Apple Pay, Android Pay and Samsung Pay, are becoming a more popular way to make new types of payments regardless of the type of smart IoT device used. Due to the rapidly growing importance of security, a great deal of attention has been paid to the question of how to construct a robust transaction protocol for online payments. In this study, we demonstrate a lightweight cryptographic protocol based on a sturdy certificateless signature scheme with robust bilinear pairing crypto-primitives. We refine the proposed cryptographic protocol to account for computation-limited smart devices during transactions. The practicability of the proposed protocol is then supported by a rigorous security analysis and a thorough performance evaluation, in which an IoT-based test-bed, i.e. the Raspberry Pi platform, serves as the underlying architecture for the implementation of our proposed cryptographic protocol.

Chimp: a Learning-based Power Aware Communication Protocol for Wireless Body Area Networks

Radio links in wireless body area networks (WBANs) commonly experience highly time-varying attenuation due to the dynamic network topology and frequent occlusions caused by body movements, making it challenging to design a reliable, energy-efficient and real-time communication protocol for WBANs. In this paper, we present Chimp, a learning-based power-aware communication protocol in which each sending node can self-learn the channel quality and choose the best transmission power level to reduce energy consumption and interference range while still guaranteeing high communication reliability. Chimp is designed based on learning automata that use only the acknowledgment packets and motion data from a local gyroscope sensor to infer the real-time channel status. We design a new cost function that takes into account the energy consumption, communication reliability and interference, and develop a new learning function that is guaranteed to select the optimal transmission power level to minimize the cost function for any given channel quality. For highly dynamic postures such as walking and running, we exploit the correlation between channel quality and motion data generated by a gyroscope sensor to quickly estimate channel quality, eliminating the need to use expensive channel sampling procedures. We evaluate the performance of Chimp through experiments using TelosB motes equipped with the MPU-9250 motion sensor chip and compare it with the state-of-the-art protocols in different body postures. Experimental results demonstrate that Chimp outperforms existing schemes and works efficiently in most common body postures. In high data rate scenarios, it achieves almost the same performance as the optimal power assignment scheme in which the optimal power level for each transmission is calculated based on the collected channel measurements in an off-line manner.

Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC

Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and IoT devices has necessitated real-time, energy-efficient deep neural network inference on embedded-class, resource-constrained platforms. In this context, we present Synergy, an automated, hardware-software co-designed, pipelined, high-throughput CNN inference framework on embedded heterogeneous system-on-chip (SoC) architectures (Xilinx Zynq). Synergy leverages, through multi-threading, all the available on-chip resources, which include the dual-core ARM processor along with the FPGA and the NEON SIMD engines as accelerators. Moreover, Synergy provides a unified abstraction of the heterogeneous accelerators (FPGA and NEON) and can adapt to different network configurations at runtime without changing the underlying hardware accelerator architecture by balancing workload across accelerators through work-stealing. Synergy achieves 7.3x speedup, averaged across seven CNN models, over a well-optimized software-only solution. Synergy demonstrates substantially better throughput and energy efficiency compared to contemporary CNN implementations on the same SoC architecture.

Single- and multi-FPGA Acceleration of Dense Stereo Vision for Planetary Rovers

Increased mobile autonomy is a vital requisite for future planetary exploration rovers. Stereo vision is a key enabling technology in this regard, as it can passively reconstruct in 3D the surroundings of a rover and facilitate the selection of science targets and the planning of safe routes. Nonetheless, accurate dense stereo algorithms are computationally demanding. When executed on the low-performance, radiation-hardened CPUs typically installed on rovers, stereo processing severely limits the driving speed and hence the science that can be conducted in situ. Aiming to decrease execution time while increasing the accuracy of stereo vision embedded in future rovers, this paper proposes HW/SW co-design and acceleration on resource-constrained, space-grade FPGAs. In a top-down approach, we develop a stereo algorithm based on the space sweep paradigm, design its parallel HW architecture, implement it in VHDL and demonstrate feasible solutions even on small-sized devices with our multi-FPGA partitioning methodology. To meet all cost, accuracy and speed requirements set by the European Space Agency for this embedded system, we customize our HW/SW co-processor by design space exploration and testing on a Mars-like dataset. Implemented on Xilinx Virtex technology, or European NG-MEDIUM devices, the FPGA kernel processes a 1120x1120 stereo pair in 1.7-3.1 sec, utilizing only 5.4-9.3 LUT6 and 200-312 RAMB18. The proposed system exhibits up to 32x speedup over desktop CPUs, or 2810x over space-grade LEON3, and achieves a mean reconstruction error of less than 2 cm up to 4 m depth. Excluding errors exceeding 2 cm (less than 4% of the total), the mean error is under 8 mm.

A Lightweight & Secure Data Collection Serverless Protocol Demonstrated in an Active RFIDs scenario

In the growing Internet of Things context, thousands of computing devices with various functionalities are producing data (from environmental sensors or other sources). However, they are also collecting, storing, processing and transmitting data to eventually communicate them securely to third parties (e.g. owners of devices or cloud data storage). The deployed devices are often battery-powered mobile or static nodes equipped with sensors and/or actuators and they communicate using wireless technologies. Examples include unmanned aerial vehicles, wireless sensor nodes, smart beacons, and wearable health objects. Such resource-constrained devices include Active RFID (Radio Frequency IDentification) nodes and these are used to illustrate our proposal. In most scenarios, these nodes are unattended in an adverse environment, so data confidentiality must be ensured from the sensing phase through to delivery to authorized entities: in other words, data must be securely stored and transmitted to prevent attack by active adversaries even if the nodes are captured. However, due to the scarce resources available to nodes in terms of energy, storage and/or computation, the proposed security solution has to be lightweight. In this paper, we propose a serverless protocol to enable MDCs (Mobile Data Collectors), such as drones, to securely collect data from mobile and static Active RFID nodes and then deliver them later to an authorized third party. The whole solution ensures data confidentiality at each step (from the sensing phase, before data collection by the MDC, once data have been collected by MDC, and during final delivery) while fulfilling the lightweight requirements for the resource-limited entities involved. To assess the suitability of the protocol against the performance requirements, it was implemented on the most resource-constrained devices to get the worst possible results. In addition, to prove the protocol fulfills the security requirements, it was analyzed with regard to security games and also formally verified using the AVISPA tool.
