ACM Transactions on Embedded Computing Systems (TECS)

Latest Articles

Optimal Power Management with Guaranteed Minimum Energy Utilization for Solar Energy Harvesting Systems

In this work, we present a formal study on optimizing the energy consumption of energy harvesting...

Contention-Detectable Mechanism for Receiver-Initiated MAC

The energy efficiency and delivery robustness are two critical issues for low duty-cycled wireless sensor networks. The asynchronous...

NQA: A Nested Anti-collision Algorithm for RFID Systems

Radio frequency identification (RFID) systems, as one of the key components in the Internet of Things (IoT), have attracted much attention in both industry and academia. In practice, the performance of RFID systems relies largely on the effectiveness and efficiency of anti-collision algorithms. A large body of studies have recently focused...

A Task Failure Rate Aware Dual-Channel Solar Power System for Nonvolatile Sensor Nodes

In line with the rapid development of the Internet of Things (IoT), the maintenance of on-board batteries for a trillion sensor nodes has become...

Enabling On-the-Fly Hardware Tracing of Data Reads in Multicores

Software debugging is one of the most challenging aspects of embedded system development due to growing hardware and software complexity, limited...

Partitioning and Selection of Data Consistency Mechanisms for Multicore Real-Time Systems

Multicore platforms are becoming increasingly popular in real-time systems. One of the major challenges in designing multicore real-time systems is...

Thermal-aware Real-time Scheduling Using Timed Continuous Petri Nets

We present a thermal-aware, hard real-time (HRT) global scheduler for a multiprocessor system...

Self-Adaptive QoS Management of Computation and Communication Resources in Many-Core SoCs

Providing quality of service (QoS) for many-core systems with dynamic application admission is challenging due to the high amount of resources to...

Cooperative Cache Transfer-based On-demand Network Coded Broadcast in Vehicular Networks

Real-time traffic updates, safety and comfort driving, infotainment, and so on, are some envisioned applications in vehicular networks. Unlike...


Editor-in-Chief Call for Nominations

The term of the current Editor-in-Chief (EiC) of the ACM Transactions on Embedded Computing Systems (TECS) is coming to an end, and the ACM Publications Board has set up a nominating committee to assist the Board in selecting the next EiC. The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM TECS aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.



Call for Nominations for ACM Transactions on Embedded Computing Systems Best Paper Award 2019

ACM TECS is seeking nominations to recognize the best paper published in ACM TECS. The best paper award will be based on the overall quality, the originality, the level of contribution, the subject matter, and the timeliness and potential impact of the research. 


About TECS

The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems. 



TECS Editor-in-Chief featured in "People of ACM"

Sandeep K. Shukla was recently reappointed as Editor-in-Chief of ACM Transactions on Embedded Computing Systems (TECS), and he was featured in the periodic series "People of ACM".

Forthcoming Articles
Sample Essentiality and Its Application in the Modeling Attacks of PUFs

A Physically Unclonable Function (PUF) is an alternative hardware-based security method. It is well known that samples are significant in the modeling attacks of PUFs. To improve the modeling attacks, some efforts have been made to expand the sample sets therein. A closer examination, however, reveals that not all samples contribute to the modeling attacks equally. Therefore, in this paper we introduce the concept of sample essentiality for describing the contribution of a sample in the modeling attacks and point out that any sample without sample essentiality cannot enhance some modeling attacks of PUFs. As a by-product, we find theoretically and empirically that the samples expanded by the procedures due to Chatterjee et al. do not satisfy our sample essentiality. Furthermore, we propose the notion of an essential sample set for datasets and discuss its basic properties. Finally, we demonstrate that our results about sample essentiality can be used to reduce samples efficiently and benefit sample selection in the modeling attacks of PUFs.

Towards Customized Hybrid Fuel-Cell and Battery Powered Mobile Device for Individual User

The rapidly evolving technologies in mobile devices inevitably increase their power demands on the battery. However, battery development can hardly keep pace with the fast-growing demands, leading to short battery life, which has become a top complaint from customers. In this paper, we investigate a novel energy supply technology, the fuel cell (FC), and leverage its advantage of providing long-term energy storage to build a hybrid FC-battery power system. As a result, the mobile device's operation time is dramatically extended so that users are no longer bothered by battery recharging. We examine real-world smartphone usage data and find that a naive hybrid power system cannot meet many users' highly diversified power demands. We thus propose the α% peak throttling technique, which reduces the device power consumption by α% for each power peak to resolve the mismatch between power supply and demand. This technique trades quality-of-service (QoS) for a larger FC ratio in the system, and thus a much longer device operation time. We further observe that the user's personality largely determines his/her satisfaction with the QoS degradation and the operation time extension. Applying a fixed α% peak throttling fails to satisfy every user. We thus propose personality-aware peak throttling, which identifies the user personality online and then adopts the best α% value during peak throttling to achieve the optimal satisfaction score for each user. The experimental results show that our personality-aware hybrid FC-battery solution can achieve 3.4X longer operation time and a 25% higher satisfaction score compared to the baseline (the common battery-powered device) under the same size and weight limitations.
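The peak throttling idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a "power peak" is detected, so the fixed threshold `peak_threshold_watts`, the function name `throttle_peaks`, and the sample trace are all assumptions made for illustration.

```python
def throttle_peaks(trace_watts, peak_threshold_watts, alpha_percent):
    """Reduce every sample above the threshold by alpha_percent.

    Assumption: a "peak" is any sample strictly above a fixed threshold.
    The abstract does not define peak detection.
    """
    scale = 1.0 - alpha_percent / 100.0
    return [p * scale if p > peak_threshold_watts else p for p in trace_watts]


# Hypothetical power trace in watts; values are made up for illustration.
trace = [1.0, 4.0, 2.0, 5.0]
throttled = throttle_peaks(trace, peak_threshold_watts=3.0, alpha_percent=25.0)
```

A personality-aware variant, as the abstract describes it, would choose `alpha_percent` per user rather than fixing it globally.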

ICNN: The Iterative Convolutional Neural Network

Recent architectures of vision-based Convolutional Neural Networks (CNN) have improved detection and prediction accuracy significantly. However, these algorithms are extremely computationally intensive. To break the power and performance wall of CNN computation, we reformulate the CNN computation into an iterative process, where each iteration processes a sub-sample of input features with a smaller network and ingests additional features to improve the prediction accuracy. Each smaller network could either classify based on its input set or feed computed and extracted features to the next network to enhance the accuracy. The proposed approach allows early termination upon reaching acceptable confidence. Moreover, each iteration provides a contextual awareness that allows intelligent resource allocation and optimization for the subsequent iterations. In this paper we propose various policies to reduce the computational complexity of CNN through the proposed iterative approach. We illustrate how the proposed policies construct a dynamic architecture suitable for a wide range of applications with varied accuracy requirements, resources, and time budgets, without further need for network re-training. Furthermore, we carry out a visualization of the detected features in each iteration through a deconvolution network to gain more insight into the successive traversal of the ICNN.
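The early-termination control flow described in this abstract can be sketched as a simple loop. This is an illustrative sketch only, not the ICNN architecture: each "stage" is a hypothetical callable that takes the input features plus the state passed on by the previous stage and returns a label, a confidence, and an updated state. The names `iterative_classify` and `confidence_threshold` are assumptions.

```python
def iterative_classify(stages, features, confidence_threshold):
    """Run stages in order and stop once confidence is acceptable.

    Each stage is a hypothetical callable:
        stage(features, state) -> (label, confidence, new_state)
    Returns (label, confidence, number_of_stages_executed).
    """
    state = None
    label, confidence, used = None, 0.0, 0
    for stage in stages:
        label, confidence, state = stage(features, state)
        used += 1
        if confidence >= confidence_threshold:
            break  # early termination: later stages are skipped
    return label, confidence, used
```

If no stage reaches the threshold, the sketch returns the result of the last stage, which corresponds to running the full pipeline.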

REAL: REquest Arbitration in Last Level Caches

Shared last level caches (LLC) of multicore systems-on-chip are subject to a significant amount of contention over a limited bandwidth, resulting in major performance bottlenecks that make the issue a first-order concern in modern multiprocessor systems-on-chip. Even though shared cache space partitioning has been extensively studied in the past, the problem of cache bandwidth partitioning has not received sufficient attention. We demonstrate the occurrence of such contention and the resulting impact on the overall system performance. To address the issue, we perform detailed simulations to study the impact of different parameters, and propose a novel cache bandwidth partitioning technique, called REAL, that arbitrates among cache access requests originating from different processor cores. It monitors the LLC access patterns to dynamically assign a priority value to each core. Experimental results on different mixes of benchmarks show up to 2.13x overall system speedup over baseline policies, with minimal impact on energy.

Weakly-Hard Real-Time Guarantees for Earliest Deadline First Scheduling of Independent Tasks

The current trend in modeling and analyzing real-time systems is toward tighter yet safe timing constraints. Many practical real-time systems can de facto sustain a bounded number of deadline misses, i.e., they have weakly-hard real-time constraints rather than hard real-time constraints. We therefore strive to provide tight deadline miss models in complement to tight response time bounds for such systems. In this work, we bound the distribution of deadline misses for task sets running on uniprocessors using the Earliest Deadline First (EDF) scheduling policy. We assume tasks miss their deadlines due to transient overload resulting from sporadic activations, e.g., interrupt service routines, and we use Typical Worst-Case Analysis (TWCA) to tackle the problem in this context. TWCA relies on existing worst-case response time analyses as a foundation, so we revisit and revise in this paper the state-of-the-art worst-case response time analysis for EDF scheduling. This work is motivated by and validated on a realistic case study inspired by industrial practice (satellite on-board software) and on a set of synthetic test cases. The results show the usefulness of this approach for temporarily overloaded systems when EDF scheduling is considered. Scalability has also been addressed in our experiments.
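A weakly-hard constraint is commonly written as "at most m deadline misses in any k consecutive jobs". The abstract does not state which formulation the paper uses, so the sketch below assumes that common form and only checks a recorded trace against it. It does not perform TWCA or any schedulability analysis, and the name `satisfies_weakly_hard` is hypothetical.

```python
def satisfies_weakly_hard(misses, m, k):
    """Check that no window of k consecutive jobs has more than m misses.

    misses: list of booleans, True meaning the job missed its deadline.
    Assumes the common (m, k) formulation of weakly-hard constraints.
    """
    window = sum(misses[:k])  # misses in the first window
    if window > m:
        return False
    for i in range(k, len(misses)):
        # Slide the window by one job: add the new job, drop the oldest.
        window += misses[i] - misses[i - k]
        if window > m:
            return False
    return True


# Hypothetical trace of six jobs; the last window of three has two misses.
trace = [False, True, False, False, True, True]
```

With this trace, `satisfies_weakly_hard(trace, 1, 3)` is false because the final three jobs contain two misses, while allowing two misses per window makes the trace acceptable.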

Robust design and validation of cyber-physical systems

Co-simulation based validation of hardware controllers adjoined with plant models, with continuous dynamics, is an important step in model-based design of controllers for Cyber-physical Systems (CPS). Co-simulation suffers from many problems such as timing delays, skew, race conditions, etc., making it unsuitable for checking timing properties of CPS. In our approach to verification of controllers synthesised from their models, the synthesised controller is adjoined with a synthesised hardware plant unit. The synthesised plant and controller are then executed synchronously and Metric Interval Temporal Logic properties are validated on the closed-loop system. The clock period is chosen, using the robustness estimates, such that all timing properties that hold on the controller guiding the discretized plant model also hold on the original case of the continuous time plant model guided by the controller.

CXDNN: Hardware-Software Compensation methods for Deep Neural Networks on Resistive Crossbar Systems

Resistive crossbars have shown strong potential as the building blocks of future neural fabrics, due to their ability to natively execute vector-matrix multiplication (the dominant computational kernel in DNNs). However, a key challenge that arises in resistive crossbars is that non-idealities in the synaptic devices, interconnects, and peripheral circuits of resistive crossbars lead to errors in the computations performed. When large-scale DNNs are executed on resistive crossbar systems, these errors compound and result in unacceptable degradation in application-level accuracy. We propose CXDNN, a hardware-software methodology that enables the realization of large-scale DNNs on crossbar systems with minimal degradation in accuracy by compensating for errors due to non-idealities. CXDNN comprises (i) an optimized mapping technique to convert floating-point weights and activations to crossbar conductances and input voltages, (ii) a fast re-training method to recover accuracy loss due to this conversion, and (iii) low-overhead compensation hardware to mitigate dynamic and hardware-instance-specific errors. Unlike previous efforts that are limited to small networks and require the training and deployment of hardware-instance-specific models, CXDNN presents a scalable compensation methodology that can address large DNNs (e.g., ResNet-50 on ImageNet), and enables a common model to be trained and deployed on many devices. We evaluated CXDNN on 6 top DNNs from the ILSVRC challenge with 0.5-13.8 million neurons and 0.5-15.5 billion connections. CXDNN achieves 16.9%-49% improvement in the top-1 classification accuracy, effectively mitigating a key challenge to the use of resistive crossbar based neural fabrics.

Overcoming Security Vulnerabilities in Deep Learning Based Indoor Localization on Mobile Devices

Indoor localization is an emerging application domain for the navigation and tracking of people and assets. Ubiquitously available Wi-Fi signals have enabled low-cost fingerprinting-based localization solutions. Further, the rapid growth in mobile hardware capability now allows high-accuracy deep learning-based frameworks to be executed locally on mobile devices in an energy-efficient manner. However, existing deep learning-based indoor localization solutions are vulnerable to access point (AP) attacks. This paper presents an analysis of the vulnerability of a convolutional neural network (CNN) based indoor localization solution to AP security compromises. Based on this analysis, we propose a novel methodology to maintain indoor localization accuracy, even in the presence of AP attacks. The proposed secured framework (called S-CNNLOC) is validated across a benchmark suite of indoor paths and is found to achieve up to 10x average localization improvement on a given path with a large number of malicious AP attacks, compared to its unsecured counterpart.

Response Time Analysis for Tasks with Fixed Preemption Points on Multiprocessors

As an effective method for determining the schedulability of real-time tasks on multiprocessor platforms, response time analysis (RTA) has been extensively studied in recent decades. Most of the existing RTA methods are designed for tasks which can be preempted at any time. However, in some real-time systems, a task may have some fixed preemption points (FPPs) which divide its execution into a series of non-preemptive regions (NPRs). In such environments, the task can only be preempted at its FPPs, which makes existing RTA methods for arbitrary preemption tasks not applicable. In this paper, we study the schedulability analysis of tasks with FPPs under both global fixed-priority (G-FP) scheduling and global earliest deadline first (G-EDF) scheduling. First, based on the idea of limiting the time interval between two consecutive executions of an NPR, a more precise RTA method for tasks with FPPs under G-FP scheduling is proposed. Next, we propose, to our knowledge, the first RTA method for tasks with FPPs under G-EDF scheduling. Finally, extensive simulations are conducted and the results validate the effectiveness of the proposed methods.

A Closed-Loop Controller to Ensure Performance and Temperature Constraints for Dynamic Applications

To secure correct system operation, a plethora of Reliability, Availability and Serviceability (RAS) techniques have been deployed by circuit designers. RAS mechanisms, however, come at the cost of extra clock cycles. In addition, a wide variety of dynamic workloads and different input conditions often make preemptive dependability techniques hard to implement. To this end, we focus on a realistic case study of a closed-loop controller that mitigates performance variation with a reactive response. This concept has been discussed but was only illustrated on small benchmarks. In particular, the extension of the approach to manage performance of dynamic workloads on a target platform has not been shown earlier. We compare our scheme against the version of a Linux CPU frequency governor in terms of timing response and energy consumption. Finally, we move forward and suggest a new flavor of our controller to efficiently manage processor temperature. Again, the concept is illustrated with a realistic case study and compared to a modern temperature manager.

Power-mode-aware memory subsystem optimization for low-power System-on-Chip design

The memory subsystem is increasingly subject to intensive energy minimization effort in embedded and System-on-Chip development. While the main focus is typically put on energy consumption reduction, there are other optimization aspects that become more and more relevant as well, e.g., peak power constraints or time budgets. In this regard, the present article makes the following contributions. Taking industrial-grade information into account, different SRAM power modes and their characteristics are presented first. Using this information, a comprehensive optimization model with the main intention of energy minimization is defined. It is based on memory access statistics that represent the embedded software of interest, which allows for application-tailored improvements. Further, it considers different power states of the memory subsystem and enables the definition of peak power and time corridor constraints. The presented two-stage implementation of this optimization model allows the handling of large design spaces. Clearly defined interfaces facilitate the exchange of individual workflow parts in a plug-and-play fashion and further enable a neat integration of our optimization method with existing HW/SW codesign synthesis flows. A general evaluation for different technology nodes shows that the optimization potential of memory low-power modes increases with advancing miniaturization but also depends on the data footprint of the embedded software. Experimental results for a set of benchmark applications confirm these findings and show energy savings of up to 90% and over 60% on average compared to a monolithic memory layout without low-power modes.

BTMonitor: Bit-Time based Intrusion Detection and Attacker Identification in Controller Area Network

With the rapid growth of connectivity and autonomy for today's automobiles, their security vulnerabilities are becoming one of the most urgent concerns in the automotive industry. The lack of message authentication in Controller Area Network (CAN), which is the most popular in-vehicle communication protocol, makes it susceptible to cyber attack. It has been demonstrated that remote attackers can take over the maneuvering of vehicles after getting access to CAN, which poses serious safety threats to the public. To mitigate this issue, we propose a novel intrusion detection system (IDS), called BTMonitor (Bit-time based CAN Bus Monitor). It utilizes the small but measurable discrepancy of bit time in CAN frames to fingerprint their sender Electronic Control Units (ECUs). To reduce the requirement for high sampling rate, we calculate the bit time of recessive bits and dominant bits respectively and extract their statistical features as fingerprint. The generated fingerprint is then used to detect intrusion and pinpoint the attacker. BTMonitor can detect new types of masquerade attack that the state-of-the-art clock-skew based IDS is unable to identify. We implement a prototype system for BTMonitor using Xilinx Spartan 6 FPGA for data collection. We evaluate our method on both a CAN bus prototype and a real vehicle. The results show that BTMonitor can correctly identify the sender with an average probability of 99.76% on the real vehicle.

Tÿcho: A Framework for Compiling Stream Programs

Stream programs are graph structured parallel programs, where the nodes are computational kernels that communicate by sending tokens over the edges. In this paper we present a framework for compiling stream programs that we call Tÿcho. It handles kernels of different styles and with a high degree of expressiveness using a common intermediate representation. It also provides efficient implementation, especially for but not limited to the restricted forms of stream programs, such as synchronous dataflow.

Distill-Net: Application-Specific Distillation of Deep CNNs for Resource-Constrained IoT Platforms

Many Internet-of-Things (IoT) applications demand fast and accurate understanding of a few key events in their surrounding environment. Deep Convolutional Neural Networks (CNNs) have emerged as an effective approach to understand speech, images, and similar high dimensional data types. Algorithmic performance of modern CNNs, however, fundamentally relies on learning class-agnostic hierarchical features that only exist in comprehensive training datasets with many classes. As a result, fast inference using CNNs trained on such datasets is prohibitive for most resource-constrained IoT platforms. To bridge this gap, we present a principled and practical methodology for distilling a complex modern CNN that is trained to effectively recognize many different classes of input data into an application-dependent essential core that not only recognizes the few classes of interest to the application accurately, but also runs efficiently on platforms with limited resources. Experimental results confirm that our approach strikes a favorable balance between classification accuracy (application constraint), inference efficiency (platform constraint), and productive development of new applications (business constraint).

Hardware-Software Collaborative Thermal Sensing in Optical Network-on-Chip Based Manycore Systems

Continuous technology scaling in manycore systems leads to severe overheating issues. To guarantee system reliability, it is critical to accurately yet efficiently monitor run-time temperature distribution for effective chip thermal management. As an emerging communication architecture for new-generation manycore systems, optical network-on-chip (ONoC) satisfies the communication bandwidth and latency requirements with low power dissipation. What's more, observation shows that it can be leveraged for run-time thermal sensing. In this paper, we propose a brand-new on-chip thermal sensing approach for ONoC-based manycore systems by utilizing the intrinsic thermal sensitivity of optical devices and the interprocessor communications in ONoCs. It requires no extra hardware but utilizes existing optical devices in ONoCs, and combines them with lightweight software computation in a hardware-software collaborative manner. The effectiveness of our approach is validated both at the device level and the system level through professional photonic simulations. Evaluation results based on synthetic communication traces and realistic benchmarks show that our approach achieves an average temperature inaccuracy of only 0.6648 K compared to ground truth values, and is scalable to be applied for large-size ONoCs.

ALEXIA: A Processor with Light Weight Extensions for Memory Safety

Illegal use of memory pointers is a serious security vulnerability. A large amount of malware exploits the spatial and temporal nature of these vulnerabilities to subvert execution or glean sensitive data from an application. Recent countermeasures attach metadata to memory pointers, which define the pointer's capabilities. The metadata is used by the hardware to validate pointer-based memory accesses. However, recent works have considerable overheads. Further, the pointer validation is decoupled from the actual memory access. We show that this could open up vulnerabilities in multi-threaded applications and introduce new vulnerabilities due to speculation in out-of-order processors. In this paper, we demonstrate that the overheads can be reduced considerably by efficient metadata management. We show that the hardware can be designed in a manner that would remain safe in multi-threaded applications and immune to speculative vulnerabilities. We achieve these by ensuring that the pointer validations and the corresponding memory accesses are always done atomically and in-order. To evaluate our scheme, which we call ALEXIA, we enhance an OpenRISC processor to perform the memory validation at run time and also add compiler support. ALEXIA is the first hardware countermeasure scheme for memory protection that provides such an end-to-end solution. We evaluate the processor on an Altera FPGA and show that the run-time overhead is, on average, 14%, with negligible impact on the processor's size and clock frequency. There is also a negligible impact on the program's code and data sizes.

GRec: automatic computation of reconfiguration graphs for multi-core platforms

The DREAMS (Distributed REal-Time Architecture for Mixed Criticality Systems, 2013-2017) European project addressed the design of a cross-domain architecture for executing applications of different criticality levels in networked multi-core embedded systems. One of the outcomes of the project was the development of a fault-tolerant distributed middleware that supports some failures. In the case of a permanent core failure, the resource management reconfigures the platform by selecting an adequate configuration from a set of pre-defined configurations. We have presented the DREAMS fault-tolerant resource management in former papers, all of them assuming that the set of pre-defined configurations was available. In this paper, we formally define the notion of reconfiguration graphs and we provide a constraint programming-based approach to automatically compute the reconfiguration graphs.
