Latest Articles

## Optimal Power Management with Guaranteed Minimum Energy Utilization for Solar Energy Harvesting Systems

In this work, we present a formal study on optimizing the energy consumption of energy harvesting... (more)

## Contention-Detectable Mechanism for Receiver-Initiated MAC

The energy efficiency and delivery robustness are two critical issues for low duty-cycled wireless sensor networks. The asynchronous... (more)

## NQA: A Nested Anti-collision Algorithm for RFID Systems

Radio frequency identification (RFID) systems, as one of the key components in the Internet of Things (IoT), have attracted much attention in the domains of industry and academia. In practice, the performance of RFID systems rather relies on the effectiveness and efficiency of anti-collision algorithms. A large body of studies have recently focused... (more)

## A Task Failure Rate Aware Dual-Channel Solar Power System for Nonvolatile Sensor Nodes

In line with the rapid development of the Internet of Things (IoT), the maintenance of on-board batteries for a trillion sensor nodes has become... (more)

## Enabling On-the-Fly Hardware Tracing of Data Reads in Multicores

Software debugging is one of the most challenging aspects of embedded system development due to growing hardware and software complexity, limited... (more)

## Partitioning and Selection of Data Consistency Mechanisms for Multicore Real-Time Systems

Multicore platforms are becoming increasingly popular in real-time systems. One of the major challenges in designing multicore real-time systems is... (more)

## Thermal-aware Real-time Scheduling Using Timed Continuous Petri Nets

We present a thermal-aware, hard real-time (HRT) global scheduler for a multiprocessor system... (more)

## Self-Adaptive QoS Management of Computation and Communication Resources in Many-Core SoCs

Providing quality of service (QoS) for many-core systems with dynamic application admission is challenging due to the high amount of resources to... (more)

## Cooperative Cache Transfer-based On-demand Network Coded Broadcast in Vehicular Networks

Real-time traffic updates, safety and comfort driving, infotainment, and so on, are some envisioned applications in vehicular networks. Unlike... (more)

##### NEWS

Editor-in-Chief Call for Nominations

The term of the current Editor-in-Chief (EiC) of the ACM Transactions on Embedded Computing Systems (TECS) is coming to an end, and the ACM Publications Board has set up a nominating committee to assist the Board in selecting the next EiC. The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM TECS aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.

Call for Nominations for ACM Transactions on Embedded Computing Systems Best Paper Award 2019

ACM TECS is seeking nominations to recognize the best paper published in ACM TECS. The best paper award will be based on the overall quality, the originality, the level of contribution, the subject matter, and the timeliness and potential impact of the research.

MORE DETAILS

#### Editorial: Embedded Computing and Society

Towards Customized Hybrid Fuel-Cell and Battery Powered Mobile Device for Individual User

The rapidly evolving technologies in the mobile devices inevitably increase their power demands for the battery. However, the development of the battery can hardly keep pace with the fast growing demands, leading to short battery life which becomes the top complaints from the customers. In this paper, we investigate a novel energy supply technology, fuel cell (FC), and leverage its advantages of providing the long-term energy storage to build a hybrid FC-battery power system. Therefore, the mobile devices operation time is dramatically extended so that users are not bothered by the battery recharging anymore. We examine the real-world smart phone usage data and find that a naive hybrid power system cannot meet many users' highly diversified power demands. We thus propose the $\alpha$\% peak throttling technique that reduces the device power consumption by a% for each power peak to solve the mismatch between the power supply and demands. This technique trades the quality-of-service (QoS) for a larger FC ratio in the system, thus much longer device operation time. We further observe that the user's personality largely determines his/her satisfaction with the QoS degradation and the operation time extension. Applying a fixed a% peak throttling fails to satisfy every user. We thus propose the personality-aware peak throttling that identifies the user personality online and then adopts the best a% value during the peak throttling to achieve the optimal satisfaction score for each user. The experimental results show that our personality-aware hybrid FC-battery solution can achieve 3.4X longer operation time and 25\% higher satisfaction score comparing to the baseline (the common battery powered device) under the same size and weight limitation.

ICNN: The Iterative Convolutional Neural Network

Modern and recent architectures of vision based Convolutional Neural Networks (CNN) have improved detection and prediction accuracy significantly. However, these algorithms are extremely computational intensive. To break the power and performance wall of CNN computation, we reformulate the CNN computation into an iterative process, where each iteration processes a sub-sample of input features with smaller network and ingests additional features to improve the prediction accuracy. Each smaller network could either classify based on its input set or feed computed and extracted features to the next network to enhance the accuracy. The proposed approach allows early-termination upon reaching acceptable confidence. Moreover, each iteration provides a contextual awareness that allows an intelligent resource allocation and optimization for the proceeding iterations. In this paper we propose various policies to reduce the computational complexity of CNN through the proposed iterative approach. We illustrate how the proposed policies construct a dynamic architecture suitable for a wide range of applications with varied accuracy requirements, resources, and time-budget, without further need for network re-training. Furthermore, we carry out a visualization of the detected features in each iteration through deconvolution network to gain more insight into the successive traversal of the ICNN.

REAL: REquest Arbitration in Last Level Caches

Shared last level caches (LLC) of multicore systems-on-chip are subject to a significant amount of contention over a limited bandwidth, resulting in major performance bottlenecks that make the issue a first-order concern in modern multiprocessor systems-on-chip. Even though shared cache space partitioning has been extensively studied in the past, the problem of cache bandwidth partitioning has not received sufficient attention. We demonstrate the occurrence of such contention and the resulting impact on the overall system performance. To address the issue, we perform detailed simulations to study the impact of different parameters, and propose a novel cache bandwidth partitioning technique, called REAL, that arbitrates among cache access requests originating from different processor cores. It monitors the LLC access patterns to dynamically assign a priority value to each core. Experimental results on different mixes of benchmarks show up to 2.13x overall system speedup over baseline policies, with minimal impact on energy.

Robust design and validation of cyber-physical systems

Co-simulation based validation of hardware controllers adjoined with plant models, with continuous dynamics, is an important step in model based design of controllers for Cyber-physical Systems (CPS). Co-simulation suffers from many problems such as timing delays, skew, race conditions, etc, making it unsuitable for checking timing properties of CPS. In our approach to verification of controllers synthesised from their models, the synthesised controller is adjoined with a synthesised hardware plant unit. The synthesised plant and controller are then executed synchronously and Metric Interval Temporal Logic properties are validated on the closed-loop system. The clock period is chosen, using the robustness estimates, such that all timing properties that hold on the controller guiding the discretized plant model also hold on the original case of the continuous time plant model guided by the controller.

CXDNN: Hardware-Software Compensation methods for Deep Neural Networks on Resistive Crossbar Systems

Resistive crossbars have shown strong potential as the building blocks of future neural fabrics, due to their ability to natively execute vector-matrix multiplication (the dominant computational kernel in DNNs). However, a key challenge that arises in resistive crossbars is that non-idealities in the synaptic devices, interconnects, and peripheral circuits of resistive crossbars lead to errors in the computations performed. When large-scale DNNs are executed on resistive crossbar systems, these errors compound and result in unacceptable degradation in application-level accuracy. We propose CXDNN, a hardware-software methodology that enables the realization of large-scale DNNs on crossbar systems with minimal degradation in accuracy by compensating for errors due to non-idealities. CXDNN comprises of (i) an optimized mapping technique to convert floating-point weights and activations to crossbar conductances and input voltages, (ii) a fast re-training method to recover accuracy loss due to this conversion, and (iii) low-overhead compensation hardware to mitigate dynamic and hardware-instance-specific errors. Unlike previous efforts that are limited to small networks and require the training and deployment of hardware-instance-specific models, CXDNN presents a scalable compensation methodology that can address large DNNs (e.g., ResNet-50 on ImageNet), and enables a common model to be trained and deployed on many devices. We evaluated CXDNN on 6 top DNNs from the ILSVRC challenge with 0.5-13.8 million neurons and 0.5-15.5 billion connections. CXDNN achieves 16.9%-49% improvement in the top-1 classification accuracy, effectively mitigating a key challenge to the use of resistive crossbar based neural fabrics.

Overcoming Security Vulnerabilities in Deep Learning Based Indoor Localization on Mobile Devices

Indoor localization is an emerging application domain for the navigation and tracking of people and assets. Ubiquitously available Wi-Fi signals have enabled low-cost fingerprinting-based localization solutions. Further, the rapid growth in mobile hardware capability now allows high-accuracy deep learning-based frameworks to be executed locally on mobile devices in an energy-efficient manner. However, existing deep learning-based indoor localization solutions are vulnerable to access point (AP) attacks. This paper presents an analysis into the vulnerability of a convolutional neural network (CNN) based indoor localization solution to AP security compromises. Based on this analysis, we propose a novel methodology to maintain indoor localization accuracy, even in the presence of AP attacks. The proposed secured framework (called S-CNNLOC) is validated across a benchmark suite of indoor paths and is found to observe up to 10x average localization improvement on a given path with large number of malicious AP attacks, compared to its unsecured counterpart.

Design & Optimization of Energy-Accuracy Trade-off for Mobile platforms via Pretrained Deep Models

Many real-world edge applications including object detection, robotics, and smart health are enabled by deploying deep neural networks (DNNs) on energy-constrained mobile platforms. In this paper, we propose a novel approach to trade-off energy and accuracy of inference at runtime using a design space called Learning Energy Accuracy Tradeoff Networks (LEANets). The key idea behind LEANets is to design classifiers of increasing complexity using pre-trained DNNs to perform input-specific adaptive inference. The accuracy and energy-consumption of the adaptive inference scheme depends on a set of thresholds, one for each classifier. To determine the set of threshold vectors to achieve different energy and accuracy trade-offs, we propose a novel multi-objective optimization approach. We can select the appropriate threshold vector at runtime based on the desired trade-off. We perform experiments on multiple pre-trained DNNs including ConvNet, VGG-16, and MobileNet using diverse image classification datasets. Our results show that we get up to 50% gain in energy for negligible loss in accuracy, and optimized LEANets achieve significantly better energy and accuracy trade-off when compared to a state-of-the-art method referred as Slimmable neural networks.

BTMonitor: Bit-Time based Intrusion Detection and Attacker Identification in Controller Area Network

With the rapid growth of connectivity and autonomy for today?s automobiles, their security vulnerabilities are becoming one of the most urgent concerns in the automotive industry. The lack of message authentication in Controller Area Network (CAN), which is the most popular in-vehicle communication protocol, makes it susceptible to cyber attack. It has been demonstrated that the remote attackers can take over the maneuver of vehicles after getting access to CAN, which poses serious safety threats to the public. To mitigate this issue, we propose a novel intrusion detection system (IDS), called BTMonitor (Bit-time based CAN Bus Monitor). It utilizes the small but measurable discrepancy of bit time in CAN frames to fingerprint their sender Electronic Control Units (ECUs). To reduce the requirement for high sampling rate, we calculate the bit time of recessive bits and dominant bits respectively and extract their statistical features as fingerprint. The generated fingerprint is then used to detect intrusion and pinpoint the attacker. BTMonitor can detect new types of masquerade attack that the state-of-the-art clock-skew based IDS is unable to identify. We implement a prototype system for BTMonitor using Xilinx Spartan 6 FPGA for data collection. We evaluate our method on both a CAN bus prototype and a real vehicle. The results show that BTMonitor can correctly identify the sender with an average probability of 99.76% on the real vehicle.

Tÿcho: A Framework for Compiling Stream Programs

Stream programs are graph structured parallel programs, where the nodes are computational kernels that communicate by sending tokens over the edges. In this paper we present a framework for compiling stream programs that we call Tÿcho. It handles kernels of different styles and with a high degree of expressiveness using a common intermediate representation. It also provides efficient implementation, especially for but not limited to the restricted forms of stream programs, such as synchronous dataflow.

Hardware-Software Collaborative Thermal Sensing in Optical Network-on-Chip Based Manycore Systems

Continuous technology scaling in manycore systems leads to severe overheating issues. To guarantee system reliability, it is critical to accurately yet efficiently monitor run-time temperature distribution for effective chip thermal management. As an emerging communication architecture for new-generation manycore systems, optical network-on-chip (ONoC) satisfies the communication bandwidth and latency requirements with low power dissipation. What?s more, observation shows that it can be leveraged for run-time thermal sensing. In this paper, we propose a brand-new on-chip thermal sensing approach for ONoC-based manycore systems by utilizing the intrinsic thermal sensitivity of optical devices and the interprocessor communications in ONoCs. It requires no extra hardware but utilizes existing optical devices in ONoCs, and combines them with lightweight software computation in a hardware-software collaborative manner. The effectiveness of our approach is validated both at the device level and the system level through professional photonic simulations. Evaluation results based on synthetic communication traces and realistic benchmarks show that our approach achieves an average temperature inaccuracy of only 0.6648 K compared to ground truth values, and is scalable to be applied for large-size ONoCs.

ALEXIA: A Processor with Light Weight Extensions for Memory Safety

Illegal use of memory pointers is a serious security vulnerability. A large number of malwares exploit the spatial and temporal nature of these vulnerabilities to subvert execution or glean sensitive data from an application. Recent countermeasures attach metadata to memory pointers, which define the pointer's capabilities. The metadata is used by the hardware to validate pointer-based memory accesses. However, recent works have considerable overheads. Further, the pointer validation is decoupled from the actual memory access. We show that this could open up vulnerabilities in multi-threaded applications and introduce new vulnerabilities due to speculation in out-of-order processors. In this paper, we demonstrate that the overheads can be reduced considerably by efficient metadata management. We show that the hardware can be designed in a manner that would remain safe in multi-threaded applications and immune to speculative vulnerabilities. We achieve these by ensuring that the pointer validations and the corresponding memory access is always done atomically and in-order. To evaluate our scheme, which we call ALEXIA, we enhance an OpenRISC processor to perform the memory validation at run time and also add compiler support. ALEXIA is the first hardware countermeasure scheme for memory protection that provides such an end-to-end solution. We evaluate the processor on an Altera FPGA and show that the run time overheads, on average, is 14%, with negligible impact on the processor's size and clock frequency. There is also a negligible impact on the program's code and data sizes.

ELSA: A Throughput-Optimized Design of an LSTM Accelerator for Energy-Constrained Devices

The next significant step in the evolution and proliferation of artificial intelligence technology will be the integration of neural network (NN) models within embedded and mobile systems. This calls for the design of compact, energy efficient NN models in silicon. In this paper, we present a scalable ASIC design of an LSTM accelerator named ELSA, that is suitable for energy-constrained devices. It includes several architectural innovations to achieve small area and high energy efficiency. To reduce the area and power consumption of the overall design, the compute-intensive units of ELSA employ approximate multiplications and still achieve high performance and accuracy. The performance is further improved through efficient synchronization of the elastic pipeline stages to maximize the utilization. The paper also includes a performance model of ELSA, as a function of the hidden nodes and time steps, permitting its use for the evaluation of any LSTM application. ELSA was implemented in RTL and was synthesized and placed and routed in 65nm technology. Its functionality is demonstrated for language modeling ? a common application of LSTM. ELSA is compared against a baseline implementation of an LSTM accelerator with standard functional units and without any of the architectural innovations of ELSA. The paper demonstrates that ELSA can achieve significant improvements in power, area and energy-efficiency when compared to the baseline design and several ASIC implementations reported in the literature, making it suitable for use in embedded systems and real-time applications.

Blocking-Aware Partitioned Real-Time Scheduling for Uniform Heterogeneous Multicore Platforms

Heterogeneous multicore processor has recently become de facto computing platform for state-of-the-art embedded applications. Nonetheless, very little research focuses on the scheduling for real-time periodic tasks upon heterogeneous multicores under the requirements of task synchronization, which is stemmed from resource access conflicts and can greatly affect the schedulability of tasks. In view of the partitioned-EDF algorithm and Multiprocessor Stack Resource Policy (MSRP), we first discuss the blocking-aware utilization bound for uniform heterogeneous multicores and then illustrate its non-monotonicity, where the bound may reduce with more cores being exploited. Following the insights obtained from the analysis of the bound, taking the heterogeneity of computing systems into consideration, we propose an algorithm SA-TPA-HM (synchronization-aware task partitioning algorithm for heterogeneous multicores). Several blocking-guided and heterogeneity-aware mapping heuristics are incorporated to reduce the negative impacts of blocking conflicts in task system for better schedulability performance of tasks and balanced workload distributed across cores. The extensive simulation results demonstrate that the SA-TPA-HM algorithm can obtain the schedulability result approximate to an Integer Non-Linear Programming (INLP) based method and can have much better schedulability result (such as 60% more) in comparison with the current mapping heuristics targeted at homogeneous multicores. The measurements in Linux kernel further reveal the practical viability of SA-TPA-HM that can experience lower online overhead (e.g., 15% less) in contrast to other partitioning schemes.

###### All ACM Journals | See Full Journal Index

Search TECS
enter search term and/or author name