Fixed-Priority Scheduling for Two-Phase Mixed-Criticality Systems
Editorial: Security of Mobile Devices
Convolutional neural networks (CNNs) are widely employed in many image recognition applications. The ever-increasing computational capability of mobile processors provides opportunities to run such applications in an energy-efficient and low-latency manner, and it is therefore critical that they are properly optimized on these platforms. Matrix multiplication is an important operation used in CNNs. In this paper, we propose customized versions of matrix multiplication algorithms that can help to speed up CNNs. Specifically, we propose a BCSC (Block Compressed Sparse Column) algorithm and a bit-representation-based algorithm (BitsGEMM) that exploit sparsity to accelerate CNNs on the NVIDIA Jetson TK1.
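As a minimal sketch of how a column-compressed sparse format lets a multiplication skip zero entries, the following computes a matrix-vector product over plain (non-blocked) compressed sparse column storage. It is not the BCSC or BitsGEMM algorithm of the paper; the function name `csc_matvec` and the array names `col_ptr`, `row_idx`, and `values` are illustrative assumptions.

```python
def csc_matvec(n_rows, col_ptr, row_idx, values, x):
    """Compute y = A @ x for a matrix A held in compressed sparse column form.

    col_ptr[j]..col_ptr[j + 1] delimits the non-zeros of column j;
    row_idx[k] and values[k] give the row and value of the k-th non-zero.
    (Hypothetical helper, plain CSC rather than the blocked variant.)
    """
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):
        xj = x[j]
        if xj == 0:
            continue  # a zero input element contributes nothing
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * xj
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 0, 0]]
col_ptr = [0, 2, 2, 4]
row_idx = [0, 2, 0, 1]
values = [1.0, 4.0, 2.0, 3.0]
y = csc_matvec(3, col_ptr, row_idx, values, [1.0, 1.0, 1.0])
```

Only the four stored non-zeros are touched, so the work scales with the number of non-zeros rather than with the full matrix size.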
Network-connected embedded systems are vulnerable to an increasing volume of malware. While anomaly-based malware detection is effective, existing approaches incur significant overheads and are susceptible to mimicry attacks. We present a formal security model that defines normal system behavior, including execution sequence and timing. On-chip hardware non-intrusively monitors system execution and detects anomalies at runtime. The timing distribution of control flow events is further analyzed to select a subset of monitoring targets that meets hardware constraints. The approach is evaluated using a network-connected pacemaker prototype and mimicry malware.
In an object-based NAND flash device (ONFD), two causes of write amplification are onode partial update and cascading update. Updating one onode, a small piece of object metadata, invokes a partial page update (i.e., onode partial update), incurring unnecessary migration of the un-updated data. Cascading update denotes updating object metadata in a cascading manner due to object data update or migration. In this work, we propose a system design to alleviate the write amplification issue in the object-based NAND flash device. Experimental results show that our proposed design can achieve up to 20% write reduction compared to the best state-of-the-art design.
A key step in simulation-driven algorithms is to compute reach set over-approximations from a set of initial states through numerical simulations and sensitivity analysis. This paper addresses this problem by providing algorithms for computing discrepancy functions as upper bounds on the sensitivity. The proposed algorithms rely on computing local bounds on matrix measures under different norms, so that the over-approximation can be computed quickly or its conservativeness can be locally minimized. The proposed algorithms enable automatic reach set computations for general nonlinear systems and have been successfully used on several challenging benchmark models.
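For orientation, the standard definitions behind a matrix-measure bound can be written as follows. This is a sketch of the textbook result, not the paper's algorithm; the symbols $\xi(x, t)$ (trajectory of $\dot{x} = f(x)$ starting at $x$), $J_f$ (Jacobian of $f$), $S$ (a convex set containing the trajectories over $[0, t]$), and $c$ (the local bound) are assumed notation.

```latex
% Matrix measure induced by the p-norm, and its closed form for p = 2
\mu_p(A) = \lim_{h \to 0^+} \frac{\lVert I + hA \rVert_p - 1}{h},
\qquad
\mu_2(A) = \lambda_{\max}\!\left(\frac{A + A^{\top}}{2}\right)

% Exponential discrepancy bound obtained from a bound c on the measure
\mu\bigl(J_f(x)\bigr) \le c \ \ \text{for all } x \in S
\quad \Longrightarrow \quad
\lVert \xi(x_1, t) - \xi(x_2, t) \rVert \le \lVert x_1 - x_2 \rVert \, e^{c t}
```

A negative $c$ means nearby trajectories contract, which is why a tight local bound on the matrix measure yields a less conservative reach set.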
In safety-critical systems, there are typically different applications providing functionalities with varying degrees of criticality. Consequently, a high level of assurance is required for a highly critical functionality, whereas a relatively low level of assurance is required for a less critical functionality. A theory of real-time scheduling for such multi-criticality systems has been under development in the recent past. In particular, an algorithm called Earliest Deadline First with Virtual Deadlines (EDF-VD) has shown a lot of promise in terms of practical performance. In this paper we design a new schedulability test based on demand bound functions for EDF-VD.
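To illustrate what a demand-bound-function test looks like, here is a minimal sketch of the classical single-criticality version for sporadic tasks under EDF. It is not the EDF-VD test designed in the paper. The tuple layout `(wcet, deadline, period)`, the integer time units, and the caller-supplied `horizon` are assumptions; a complete test needs a horizon that is provably large enough.

```python
def dbf(task, t):
    """Classical demand bound function of one sporadic task at interval length t."""
    wcet, deadline, period = task
    if t < deadline:
        return 0
    # Number of jobs with both release and deadline inside a window of length t.
    return ((t - deadline) // period + 1) * wcet


def edf_schedulable(tasks, horizon):
    """Check sum of dbf(t) <= t at every absolute deadline up to horizon.

    horizon is an assumption supplied by the caller.
    """
    points = sorted({
        deadline + k * period
        for (_, deadline, period) in tasks
        for k in range((horizon - deadline) // period + 1)
    })
    return all(sum(dbf(task, t) for task in tasks) <= t for t in points)


feasible = edf_schedulable([(1, 2, 4), (2, 5, 6)], horizon=12)
overloaded = edf_schedulable([(2, 2, 4), (2, 3, 4)], horizon=4)
```

The second task set fails at interval length 3, where two jobs demand 4 units of execution.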
In this paper, we propose four Approximate Full Adders (AFAs) with the design objective that Cout remains independent of Cin with minimal error probability. We exploit the proposed AFAs to construct an N-bit approximate adder, hereinafter referred to as ApproxADD. To improve the Error Distance (ED) and Error Rate (ER) of ApproxADD, we exploit the concept of carry-lifetime and Error Detection and Correction (EDC) logic, respectively. We evaluate the efficiency of the proposed approach by comparing it with existing approximate adders. We inspect the effectiveness of the proposed approach in real-life applications by demonstrating image compression and decompression using ApproxADD.
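The error metrics named above can be made concrete with a small exhaustive check. The sketch below uses a hypothetical one-bit approximation in which the carry-out is `a & b` and therefore ignores the carry-in; it is not one of the four proposed AFAs, and it assumes uniformly distributed inputs.

```python
from itertools import product


def exact_full_adder(a, b, cin):
    """Return (sum, cout) of an exact one-bit full adder."""
    total = a + b + cin
    return total & 1, total >> 1


def approx_full_adder(a, b, cin):
    """Hypothetical approximation: the sum is exact, the carry-out ignores cin."""
    return a ^ b ^ cin, a & b


def error_metrics(approx, exact):
    """Return (error rate, mean error distance) over all eight input patterns."""
    distances = []
    for a, b, cin in product((0, 1), repeat=3):
        s_e, c_e = exact(a, b, cin)
        s_a, c_a = approx(a, b, cin)
        # Error distance compares the two-bit results as unsigned integers.
        distances.append(abs((2 * c_a + s_a) - (2 * c_e + s_e)))
    error_rate = sum(d != 0 for d in distances) / len(distances)
    mean_error_distance = sum(distances) / len(distances)
    return error_rate, mean_error_distance


error_rate, mean_ed = error_metrics(approx_full_adder, exact_full_adder)
```

For this particular approximation only the inputs (0, 1, 1) and (1, 0, 1) produce a wrong carry, each with an error distance of 2.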
Transiently powered systems avoid large energy storage by computing while harvested energy is available. Using an efficient Energy Management Unit with Dynamic Energy Burst Scaling, energy can be accumulated in an optimally sized capacitance and used to supply energy bursts to the load at their optimal power point. The maximum burst size and the use of a Non-Volatile Memory Hierarchy can have a significant impact on the work done per unit of energy. Experiments with a long-term autonomous image acquisition application show that the energy per image can be reduced by 77.8%, at the price of a 65% larger energy buffer.
Application requirements, such as real-time response, are pushing wearable devices to leverage more powerful processors inside the SoC (system-on-chip). However, existing wearable devices are not well suited for such challenging applications due to poor performance. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on a message-passing architecture to deliver very competitive performance/watt: an average 3.1x improvement compared to the quad-core ARM processors used in state-of-the-art wearable devices, and an average 1.52x performance/watt improvement over a conventional 16-core shared-memory many-core architecture.
This paper investigates the use of many-core systems to execute the disparity estimation algorithm, used in stereo vision applications, as these systems can provide flexibility between performance scaling and power consumption. We present a learning-based runtime management approach which achieves a required performance threshold whilst minimizing power consumption through dynamic control of frequency and core allocation. Experimental results are obtained on a 61-core Intel Xeon-Phi platform. The same performance can be achieved with an average reduction in power consumption of 27.8% and an increase in energy efficiency of 30.04% when compared to DVFS control alone without runtime management.
In this paper, we propose D-PUF, an intrinsically reconfigurable DRAM PUF based on refresh pausing. A key feature of the proposed DRAM PUF is reconfigurability, i.e., by varying the refresh-pause interval, the challenge-response behavior of the PUF can be altered, making it robust against various attacks. The paper is broadly divided into two parts. In the first part, we demonstrate the use of D-PUF in performing device authentication through a secure, low-overhead methodology. In the second part, we show the generation of true random numbers using D-PUF. The design is implemented and validated using several off-the-shelf DDR3 DRAM modules.
Evaluation of industrial embedded control system designs is a time-consuming and imperfect process. While an ideal process would apply a formal verification technique such as model checking, for industrial-scale control systems these techniques are often difficult to use to verify performance aspects such as convergence. For industrial designs, engineers rely on testing processes to identify unexpected behaviors. We propose a novel framework called Underminer to improve the testing process; it is an automated technique to identify non-converging behaviors in embedded control system designs. It supports various convergence-like notions, such as those based on Lyapunov analysis and temporal logic formulae.
In this paper we present a new algorithm that combines contextual unfoldings and dynamic symbolic execution to systematically test multithreaded programs. The approach uses symbolic execution to limit the number of input values and unfoldings to limit the number of thread interleavings that are needed to cover reachable local states of threads in the program under test. We show that the use of contextual unfoldings allows interleavings of threads to be succinctly represented. This can in some cases lead to a substantial reduction in the number of needed test executions when compared to previous approaches.
Guest Editorial for ACM TECS Special Issue on Autonomous Battery-Free Sensing and Communication
A self-sustained water quality sensing system is designed in which sensors are powered by renewable bio-energy generated from microbial fuel cells (MFCs). MFCs collect the energy released from native manganese oxidizing microorganisms (MOM) that are abundant in natural waters. A power management module and a radio-frequency (RF) activation technique are proposed to ensure that the precious energy harvested by the MFCs is used efficiently. The proposed system is implemented and evaluated in a stream. Results from three-month field experiments indicate that the proposed system is able to collect reliable water quality data and is robust to environmental changes.
Auto-tuning and parametric implementation of deep learning kernels allow off-the-shelf accelerator-based embedded platforms to deliver high-performance and energy-efficient mappings for lightweight neural networks. Low-complexity classifiers are characterized by operations on small image maps with 2-3 deep layers and few class labels. For these use cases, we consider a range of embedded systems with a 20 W power budget, such as the Xilinx ZC706 (FPGA), NVIDIA Jetson TX1 (GPU), TI Keystone II (DSP), and the Adapteva Parallella (RISC+NoC). In CaffePresso, we combine auto-tuning of the implementation parameters with platform-specific constraints to deliver an optimized solution for each input ConvNet specification.
In this paper we present Distributed Computing for Constrained Devices (DC4CD), a novel software architecture that supports symbolic distributed computing on Wireless Sensor Networks. DC4CD integrates the functionalities of a high-level symbolic interpreter, a compiler, and an operating system, and includes networking abstractions to exchange high-level symbolic code among peer devices. In contrast to other architectures proposed in the literature, DC4CD allows changes at runtime, even on deployed nodes, to both application and system code. Experimental results show that DC4CD is more efficient in terms of memory usage than existing architectures, and that it compares well with them in terms of execution efficiency.
The lack of schedulability evaluation in previous charging schemes for wireless rechargeable sensor networks (WRSNs) degrades charging efficiency, leading to energy exhaustion. We propose an optimal path planning charging scheme, namely OPPC, under the on-demand charging architecture. OPPC evaluates the schedulability of a charging mission, which makes charging scheduling predictable. OPPC provides an optimal charging path, which enhances charging performance. To handle a non-schedulable charging mission, a couple of mission-adjusting algorithms are developed to make the mission schedulable. Simulation results demonstrate that OPPC can achieve better performance in terms of both successful charging ratio and charging efficiency.
This paper presents a new reliability-aware task mapping approach for many-core platforms, applied at design time to applications with DAG-based task graphs. The main goal is to devise a task mapping which meets a predefined reliability threshold while minimizing performance degradation. The proposed approach uses a majority-voting replication technique to provide error-masking capability. A quantitative reliability model is also proposed for the platform. Our iterative approach is applicable to an unlimited number of system fault types. All parts of the platform, including cores, links, and routers, are assumed to be prone to failure.
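As a rough illustration of how majority-voting replication relates to a reliability threshold, the sketch below computes the probability that a majority of replicas is correct. It is not the paper's reliability model: it assumes independent replica failures, a perfect voter, and an odd replica count, and the function names are hypothetical.

```python
from math import comb


def majority_reliability(r, n):
    """Probability that more than half of n independent replicas are correct.

    r is the reliability of a single replica; the voter is assumed perfect.
    """
    majority = n // 2 + 1
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(majority, n + 1))


def min_replicas(r, threshold, max_replicas=15):
    """Smallest odd replica count meeting the threshold, or None if none does."""
    for n in range(1, max_replicas + 1, 2):
        if majority_reliability(r, n) >= threshold:
            return n
    return None


tmr = majority_reliability(0.9, 3)   # triple modular redundancy
needed = min_replicas(0.9, 0.99)
```

With a replica reliability of 0.9, three replicas fall short of a 0.99 threshold, so the search moves on to five.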
The drift phenomenon can degrade the quality of result (QoR) of applications in the presence of approximate MLC-PCM. Architecture-level techniques that alleviate the effects of drift incur considerable power overhead in embedded systems (about 100%). In this paper, we utilize the DVFS technique to speed up the execution of the application when profiling shows that drift-based errors are more probable, in order to decrease the probability of drift-based soft errors. However, we speed up the application to save energy when the drift is not threatening. Power overhead is improved by 84% (on average) in our approach, while QoR remains acceptable.