Fixed-Priority Scheduling for Two-Phase Mixed-Criticality Systems
Radio frequency (RF) energy harvesting is emerging as a potential method to power battery-free wireless networks. In RF energy harvesting communications, energy cooperation enables shaping and optimization of the energy arrivals at the energy-receiving node to improve the overall system performance. In this paper, we propose a scheme that enables energy cooperation in battery-free wireless networks with RF harvesting. We first study the battery-free wireless network with RF energy harvesting and then formulate the problem of optimizing system performance through a new energy cooperation protocol. We find that our protocol performs better than the original battery-free wireless network solution.
Convolutional neural networks (CNNs) are widely employed in many image recognition applications. The ever-increasing computational capability of mobile processors provides opportunities to run such applications in an energy-efficient and low-latency manner, and it is therefore critical that they are properly optimized on these platforms. Matrix multiplication is an important operation used in CNNs. In this paper, we propose customized versions of the matrix multiplication algorithms that can help to speed up CNNs. Specifically, we propose a BCSC (Block Compressed Sparse Column) algorithm and a bit-representation-based algorithm (BitsGEMM) that exploit sparsity to accelerate CNNs on the NVIDIA Jetson TK1.
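To make the idea of exploiting sparsity concrete, the following is a minimal sketch of a block compressed sparse column layout and a matrix product that skips all-zero blocks. It is a generic illustration and not the authors' implementation: the function names `to_bcsc` and `bcsc_matmul`, the block size, and the storage layout are assumptions, and the code runs in plain Python rather than on a GPU.

```python
def to_bcsc(dense, bs):
    """Convert a dense matrix (list of rows) to a block compressed sparse column form.

    Assumes both dimensions are multiples of the block size bs (an assumption
    of this sketch). Returns (col_ptr, row_idx, blocks): the nonzero blocks of
    block column j are blocks[col_ptr[j]:col_ptr[j + 1]], and row_idx holds
    the block-row index of each stored block.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    col_ptr, row_idx, blocks = [0], [], []
    for bj in range(n_cols // bs):
        for bi in range(n_rows // bs):
            block = [row[bj * bs:(bj + 1) * bs]
                     for row in dense[bi * bs:(bi + 1) * bs]]
            # Store the block only if it contains at least one nonzero.
            if any(v != 0 for r in block for v in r):
                row_idx.append(bi)
                blocks.append(block)
        col_ptr.append(len(blocks))
    return col_ptr, row_idx, blocks


def bcsc_matmul(bcsc, n_rows, bs, x):
    """Compute A @ X, where A is in BCSC form and X is a dense list of rows."""
    col_ptr, row_idx, blocks = bcsc
    n_out = len(x[0])
    y = [[0] * n_out for _ in range(n_rows)]
    for bj in range(len(col_ptr) - 1):
        # Only the stored (nonzero) blocks of this block column are visited.
        for k in range(col_ptr[bj], col_ptr[bj + 1]):
            bi, block = row_idx[k], blocks[k]
            for r in range(bs):
                for c in range(bs):
                    a = block[r][c]
                    if a == 0:
                        continue
                    x_row = x[bj * bs + c]
                    y_row = y[bi * bs + r]
                    for j in range(n_out):
                        y_row[j] += a * x_row[j]
    return y
```

In this sketch the work is proportional to the number of stored blocks, so a weight matrix with many all-zero blocks needs correspondingly fewer multiply-accumulate operations.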
Network-connected embedded systems are increasingly vulnerable to malware. While anomaly-based malware detection is effective, existing approaches incur significant overheads and are susceptible to mimicry attacks. We present a formal security model that defines normal system behavior including execution sequence and timing. On-chip hardware non-intrusively monitors system execution and detects anomalies at runtime. The timing distribution of control flow events is further analyzed to select a subset of monitoring targets that meets hardware constraints. The approach is evaluated using a network-connected pacemaker prototype and mimicry malware.
In an object-based NAND flash device (ONFD), two causes of write amplification are onode partial update and cascading update. Updating one onode, a small piece of object metadata, invokes a partial page update (i.e., onode partial update), incurring unnecessary migration of the un-updated data. Cascading update denotes updating object metadata in a cascading manner due to object data update or migration. In this work, we propose a system design to alleviate the write amplification issue in the object-based NAND flash device. Experimental results show that our proposed design can achieve up to 20% write reduction compared to the best state-of-the-art approach.
A key step in simulation-driven algorithms is to compute reach set over-approximations from a set of initial states through numerical simulations and sensitivity analysis. This paper addresses this problem by providing algorithms for computing discrepancy functions as upper bounds on the sensitivity. The proposed algorithms rely on computing local bounds on matrix measures under different norms so that the over-approximations can be computed quickly or their conservativeness is locally minimized. The proposed algorithms enable automatic reach set computations of general nonlinear systems and have been successfully used on several challenging benchmark models.
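As a rough illustration of how a matrix measure bounds sensitivity, the sketch below computes the matrix measures induced by the 1-norm and the infinity-norm for a constant matrix A, and uses the standard bound that two trajectories of x' = Ax starting a distance d apart stay within d * exp(mu(A) * t) of each other in the corresponding norm. This covers only the constant linear case; the abstract concerns local bounds for nonlinear systems, which this sketch does not attempt. All function names are hypothetical.

```python
import math


def matrix_measure_inf(a):
    """Matrix measure induced by the infinity-norm: max over rows."""
    n = len(a)
    return max(a[i][i] + sum(abs(a[i][j]) for j in range(n) if j != i)
               for i in range(n))


def matrix_measure_1(a):
    """Matrix measure induced by the 1-norm: max over columns."""
    n = len(a)
    return max(a[j][j] + sum(abs(a[i][j]) for i in range(n) if i != j)
               for j in range(n))


def discrepancy_bound(initial_distance, mu, t):
    """Upper bound on the distance between two trajectories at time t."""
    return initial_distance * math.exp(mu * t)


# Example system matrix (an arbitrary choice for illustration).
A = [[-3.0, 1.0],
     [0.5, -2.0]]
mu_inf = matrix_measure_inf(A)   # max(-3 + 1, -2 + 0.5)
mu_one = matrix_measure_1(A)     # max(-3 + 0.5, -2 + 1)
bound_at_1s = discrepancy_bound(0.1, mu_inf, 1.0)
```

Because the two norms give different measures for the same matrix, choosing the norm changes how conservative the resulting over-approximation is, which is the trade-off the abstract alludes to.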
In safety-critical systems, there are typically different applications providing functionalities with varying degrees of criticality. Consequently, high levels of assurance are required for a highly critical functionality, whereas relatively low levels of assurance are required for a less critical functionality. A theory of real-time scheduling for such multi-criticality systems has been under development in the recent past. In particular, an algorithm called Earliest Deadline First with Virtual Deadlines (EDF-VD) has shown a lot of promise for such systems in terms of practical performance. In this paper we design a new schedulability test based on demand bound functions for EDF-VD.
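For readers unfamiliar with EDF-VD, the sketch below shows the classic utilization-based sufficient test from the earlier EDF-VD literature for dual-criticality, implicit-deadline sporadic tasks. It is not the demand-bound-function test proposed in this paper. The `Task` class and the function name are hypothetical; the returned factor x is the value by which the deadlines of HI-criticality tasks are scaled to obtain their virtual deadlines in LO mode.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    period: float    # also the relative deadline (implicit deadlines assumed)
    wcet_lo: float   # execution budget at the LO assurance level
    wcet_hi: float   # execution budget at the HI level (ignored for LO tasks)
    is_hi: bool      # True for a HI-criticality task


def edf_vd_scaling_factor(tasks: List[Task]) -> Optional[float]:
    """Return the virtual-deadline scaling factor x, or None if the test fails."""
    u_lo_lo = sum(t.wcet_lo / t.period for t in tasks if not t.is_hi)
    u_hi_lo = sum(t.wcet_lo / t.period for t in tasks if t.is_hi)
    u_hi_hi = sum(t.wcet_hi / t.period for t in tasks if t.is_hi)

    # Plain EDF already suffices if worst-case budgets fit on the processor.
    if u_lo_lo + u_hi_hi <= 1.0:
        return 1.0
    if u_lo_lo >= 1.0:
        return None
    # Shrink HI-task deadlines so HI tasks finish their LO budgets early.
    x = u_hi_lo / (1.0 - u_lo_lo)
    if x * u_lo_lo + u_hi_hi <= 1.0:
        return x
    return None


tasks = [
    Task(period=10, wcet_lo=4, wcet_hi=4, is_hi=False),
    Task(period=10, wcet_lo=2, wcet_hi=7, is_hi=True),
]
x = edf_vd_scaling_factor(tasks)
virtual_deadline = None if x is None else x * tasks[1].period
```

In this example the task set is not schedulable by plain EDF with worst-case budgets, but the test accepts it with a virtual deadline of about one third of the period for the HI task.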
In this paper, we propose four Approximate Full Adders (AFAs) with the design objective that Cout remains independent of Cin with minimal error probability. We use the proposed AFAs to construct an N-bit approximate adder, hereinafter referred to as ApproxADD. To improve the Error Distance (ED) and Error Rate (ER) of ApproxADD, we exploit the concept of carry-lifetime and Error Detection and Correction (EDC) logic, respectively. We evaluate the efficiency of the proposed approach by comparing it with existing approximate adders. We examine the effectiveness of the proposed approach in real-life applications by demonstrating image compression and decompression using ApproxADD.
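The design objective can be illustrated with a small sketch. The approximate cell below is a hypothetical example that sets Cout = A AND B, so the carry-out ignores Cin; it is not one of the four AFAs proposed in the paper, and the carry-lifetime and EDC logic are not modeled. The code enumerates the per-cell error rate against an exact full adder and measures the error distance of a ripple adder built from the approximate cell.

```python
from itertools import product


def exact_full_adder(a, b, cin):
    """Reference full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def approx_full_adder(a, b, cin):
    """Hypothetical approximate cell: carry_out does not depend on cin."""
    return a ^ b ^ cin, a & b


def cell_error_rate():
    """Fraction of the 8 input combinations where the cells disagree."""
    cases = list(product((0, 1), repeat=3))
    wrong = sum(exact_full_adder(*c) != approx_full_adder(*c) for c in cases)
    return wrong / len(cases)


def approx_add(x, y, n_bits):
    """N-bit adder built by chaining the approximate cell."""
    result, carry = 0, 0
    for i in range(n_bits):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = approx_full_adder(a, b, carry)
        result |= s << i
    return result | (carry << n_bits)


def error_distance(x, y, n_bits):
    """Absolute difference between the exact and the approximate sum."""
    return abs((x + y) - approx_add(x, y, n_bits))
```

Because each carry depends only on the operand bits of the previous position, a carry can never propagate across more than one bit in this sketch; inputs that need a longer carry chain, such as 3 + 1, therefore produce a nonzero error distance.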
Wireless sensor networks for rarely occurring critical events must maintain sensing coverage and low-latency network connectivity to ensure event detection and subsequent rapid propagation of notification messages. Few algorithms have been proposed that address both coverage and forwarding, and those that do are either unconcerned with rapid propagation or are not optimised to handle the constant changes in topology observed in duty-cycled networks. This paper proposes an algorithm for Coverage Preservation with Rapid Forwarding (CPRF). The algorithm is shown to deliver perfect coverage maintenance and low-latency message propagation whilst allowing stored-charge conservation via collaborative duty cycling in...
Application requirements, such as real-time response, are pushing wearable devices to leverage more powerful processors inside the SoC (System-on-Chip). However, existing wearable devices are not well suited for such challenging applications due to poor performance. We propose LOCUS, a low-power, customizable, many-core processor for next-generation wearable devices. LOCUS combines customizable processor cores with a customizable network on a message-passing architecture to deliver very competitive performance/watt: an average 3.1x improvement compared to quad-core ARM processors used in state-of-the-art wearable devices, and an average 1.52x performance/watt improvement over a conventional 16-core shared-memory many-core architecture.
This paper investigates the use of many-core systems to execute the disparity estimation algorithm, used in stereo vision applications, as these systems can provide flexibility between performance scaling and power consumption. We present a learning-based runtime management approach which achieves a required performance threshold whilst minimizing power consumption through dynamic control of frequency and core allocation. Experimental results are obtained from a 61-core Intel Xeon-Phi platform. The same performance can be achieved with an average reduction in power consumption of 27.8% and an increase in energy efficiency of 30.04% when compared to DVFS control alone without runtime management.
In this paper, we propose D-PUF, an intrinsically reconfigurable DRAM PUF based on refresh pausing. A key feature of the proposed DRAM PUF is reconfigurability, i.e., by varying the refresh-pause interval, the challenge-response behavior of the PUF can be altered, making it robust against various attacks. The paper is broadly divided into two parts. In the first part, we demonstrate the use of D-PUF in performing device authentication through a secure, low-overhead methodology. In the second part, we show the generation of true random numbers using D-PUF. The design is implemented and validated using several off-the-shelf DDR3 DRAM modules.
Evaluation of industrial embedded control system designs is a time-consuming and imperfect process. While an ideal process would apply a formal verification technique such as model checking, for industrial scale control systems, these techniques are often difficult to use to verify performance aspects such as convergence. For industrial designs, engineers rely on testing processes to identify unexpected behaviors. We propose a novel framework called Underminer to improve the testing process; this is an automated technique to identify non-converging behaviors in embedded control system designs. It supports various convergence-like notions, such as those based on Lyapunov analysis and temporal logic formulae.
In this paper we present a new algorithm that combines contextual unfoldings and dynamic symbolic execution to systematically test multithreaded programs. The approach uses symbolic execution to limit the number of input values and unfoldings to limit the number of thread interleavings that are needed to cover reachable local states of threads in the program under test. We show that the use of contextual unfoldings allows interleavings of threads to be succinctly represented. This can in some cases lead to a substantial reduction in the number of needed test executions when compared to previous approaches.
Auto-tuning and parametric implementation of deep learning kernels allow off-the-shelf accelerator-based embedded platforms to deliver high-performance and energy-efficient mappings for lightweight neural networks. Low-complexity classifiers are characterized by operations on small image maps with 2-3 deep layers and few class labels. For these use cases, we consider a range of embedded systems with a 20 W power budget, such as the Xilinx ZC706 (FPGA), NVIDIA Jetson TX1 (GPU), TI Keystone II (DSP), and the Adapteva Parallella (RISC+NoC). In CaffePresso, we combine auto-tuning of the implementation parameters and platform-specific constraints to deliver an optimized solution for each input ConvNet specification.
Coarse-grained reconfigurable architectures (CGRAs) have many processing elements, which makes them suitable for implementing spatial redundancy, as used in the design of fault-tolerant systems. This paper introduces a recovery time model for transient faults in CGRAs. The proposed fault-tolerance scheme is based on triple modular redundancy and coding techniques for error detection and correction. To evaluate the model, several kernels from space computing are mapped onto the suggested architecture. We demonstrate the tradeoff between recovery time, performance, and area. In addition, the average execution time of an application including recovery time is evaluated using area-based error rate estimates in harsh radiation environments.
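The voting step of triple modular redundancy can be sketched as follows. This is a generic bitwise majority voter with a simple check for which replicas disagree with the voted result; it does not model the coding techniques, the recovery time model, or the CGRA mapping described in the abstract, and the function names are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant integer results."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_replicas(a, b, c):
    """Indices (0, 1, 2) of the replicas whose output differs from the vote."""
    voted = tmr_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


# A single corrupted replica is masked by the voter.
golden = 0b1011
corrupted = golden ^ 0b0100          # transient bit flip in replica 2
voted = tmr_vote(golden, golden, corrupted)
suspects = disagreeing_replicas(golden, golden, corrupted)
```

Because the vote is taken per bit, flips in different bit positions of different replicas are also masked, whereas two replicas corrupted in the same bit position would defeat the voter.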
In this paper we present Distributed Computing for Constrained Devices (DC4CD), a novel software architecture that supports symbolic distributed computing on Wireless Sensor Networks. DC4CD integrates the functionalities of a high-level symbolic interpreter, a compiler, and an operating system, and includes networking abstractions to exchange high-level symbolic code among peer devices. In contrast to other architectures proposed in the literature, DC4CD allows both application and system code to be changed at runtime, even on deployed nodes. Experimental results show that DC4CD is more efficient in terms of memory usage than existing architectures, and that it also compares well with them in terms of execution efficiency.
The lack of schedulability evaluation in previous charging schemes for wireless rechargeable sensor networks (WRSNs) degrades the charging efficiency, leading to energy exhaustion. We propose an optimal path planning charging scheme, namely OPPC, under the on-demand charging architecture. OPPC evaluates the schedulability of a charging mission, which makes charging scheduling predictable. OPPC provides an optimal charging path which enhances the charging performance. To handle non-schedulable charging missions, a couple of mission-adjusting algorithms are developed to restore schedulability. Simulation results demonstrate that OPPC achieves better performance in terms of successful charging ratio as well as charging efficiency.
This paper presents a new design-time, reliability-aware task mapping approach for many-core platforms, targeting applications with DAG-based task graphs. The main goal is to devise a task mapping which meets a predefined reliability threshold while minimizing performance degradation. The proposed approach uses a majority-voting replication technique to provide error-masking capability. A quantitative reliability model is also proposed for the platform. Our iterative approach is applicable to an unlimited number of system fault types. All parts of the platform, including cores, links, and routers, are assumed to be prone to failure.
The drift phenomenon can degrade the quality of result (QoR) of applications in the presence of approximate MLC-PCM. The architecture-level techniques that alleviate the effects of drift incur considerable power overhead in embedded systems (about 100%). In this paper, we utilize the DVFS technique to speed up the execution of the application when profiling shows that drift-based errors are more probable, in order to decrease the probability of drift-based soft errors. Conversely, we slow down the application to save energy when the drift is not threatening. Power overhead is improved by 84% (on average) in our approach, while QoR remains acceptable.