ASPLOS

All

Cited by Paper title Year
826 PowerNap: eliminating server idle power. 2009
814 A comparison of software and hardware techniques for x86 virtualization. 2006
680 Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. 2008
627 DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings. 2009
610 No "power" struggles: coordinated multi-level power management for the data center. 2008
549 Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. 2006
540 Addressing shared resource contention in multicore processors via scheduling. 2010
498 Hybrid transactional memory. 2006
485 Clearing the clouds: a study of emerging scale-out workloads on modern hardware. 2012
405 Conservation cores: reducing the energy of mature computations. 2010
392 Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems. 2008
377 AVIO: detecting atomicity violations via access interleaving invariants. 2006
377 Accelerator: using data parallelism to program GPUs for general-purpose uses. 2006
370 Accurate and efficient regression modeling for microarchitectural performance and power prediction. 2006
369 Kendo: efficient deterministic multithreading in software. 2009
326 S2E: a platform for in-vivo multi-path analysis of software systems. 2011
318 Merge: a programming model for heterogeneous multi-core systems. 2008
314 Mnemosyne: lightweight persistent memory. 2011
310 Combinatorial sketching for finite programs. 2006
307 NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories. 2011
300 DMP: deterministic shared memory multiprocessing. 2009
275 CoreDet: a compiler and runtime system for deterministic multithreaded execution. 2010
272 Flikker: saving DRAM refresh-power through critical data partitioning. 2011
269 Accelerating critical section execution with asymmetric multi-core architectures. 2009
268 Early experience with a commercial hardware transactional memory implementation. 2009
267 CTrigger: exposing atomicity violation bugs from their hiding places. 2009
266 Efficiently exploring architectural design spaces via predictive modeling. 2006
258 Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications. 2009
256 Architecture support for disciplined approximate programming. 2012
248 Paragon: QoS-aware scheduling for heterogeneous datacenters. 2013
239 Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems. 2010
232 Producing wrong data without doing anything obviously wrong! 2009
227 Supporting nested transactional memory in LogTM. 2006
225 Mercury and freon: temperature emulation and management for server systems. 2006
220 DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. 2014
217 Understanding the propagation of hard errors to software and implications for resilient system design. 2008
213 Quasar: resource-efficient and QoS-aware cluster management. 2014
212 PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor. 2006
199 Geiger: monitoring the buffer cache in a virtual machine environment. 2006
199 MemScale: active low-power modes for main memory. 2011
197 Dynamic knobs for responsive power-aware computing. 2011
194 Tarazu: optimizing MapReduce on heterogeneous clusters. 2012
180 Joint optimization of idle and cooling power in data centers while maintaining response time. 2010
177 Green-Marl: a DSL for easy and efficient graph analysis. 2012
176 RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations. 2009
175 An asymmetric distributed shared memory model for heterogeneous parallel systems. 2010
172 Faults in Linux: ten years later. 2011
171 SherLog: error diagnosis by connecting clues from run-time logs. 2010
170 Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs. 2008
169 ELI: bare-metal performance for I/O virtualization. 2012
168 Accelerating two-dimensional page walks for virtualized systems. 2008
164 Cosmic rays don’t strike twice: understanding the nature of DRAM errors and the implications for system design. 2012
163 A randomized scheduler with probabilistic guarantees of finding bugs. 2010
161 Micro-pages: increasing DRAM efficiency with locality-aware data placement. 2010
161 On-the-fly elimination of dynamic irregularities for GPU computing. 2011
157 Unikernels: library operating systems for the cloud. 2013
156 Shoestring: probabilistic soft error reliability on the cheap. 2010
155 ASSURE: automatic software self-healing using rescue points. 2009
153 DoublePlay: parallelizing sequential logging and replay. 2011
151 Parasol and GreenSwitch: managing datacenters powered by renewable energy. 2013
150 Dynamically replicated memory: building reliable systems from nanoscale resistive memories. 2010
147 OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance. 2013
144 Recording shared memory dependencies using strata. 2006
143 Ultra low-cost defect protection for microprocessor pipelines. 2006
142 A performance counter architecture for computing accurate CPI components. 2006
140 DejaVu: accelerating resource allocation in virtualized environments. 2012
137 Rethinking the library OS from the top down. 2011
136 Capo: a software-hardware interface for practical deterministic multiprocessor replay. 2009
133 Whole-system persistence. 2012
132 Respec: efficient online multiprocessor replay via speculation and external determinism. 2010
132 Blink: managing server clusters on intermittent power. 2011
131 Traffic management: a holistic approach to memory placement on NUMA systems. 2013
130 Adaptive set pinning: managing shared caches in chip multiprocessors. 2008
128 Improving software diagnosability via log enhancement. 2011
128 InkTag: secure applications on an untrusted operating system. 2013
127 Parallelizing security checks on commodity hardware. 2008
127 Complete information flow tracking from the gates up. 2009
124 Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory. 2011
123 A regulated transitive reduction (RTR) for longer memory race recording. 2006
120 Computation spreading: employing hardware migration to specialize CMP cores on-the-fly. 2006
113 Hardbound: architectural support for spatial safety of the C programming language. 2008
111 Power routing: dynamic power provisioning in the data center. 2010
110 Unbounded page-based transactional memory. 2006
109 Providing safe, user space access to fast, solid state disks. 2012
106 Ensuring operating system kernel integrity with OSck. 2011
104 Flexible architectural support for fine-grain scheduling. 2010
102 Virtualized and flexible ECC for main memory. 2010
101 Bell: bit-encoding online memory leak detection. 2006
101 The design and implementation of microdrivers. 2008
101 Speculative parallelization using software multi-threaded transactions. 2010
101 Architectural support for hypervisor-secure virtualization. 2012
99 Better bug reporting with better privacy. 2008
97 Bottleneck identification and scheduling in multithreaded applications. 2012
95 Automatic generation of peephole superoptimizers. 2006
93 Tradeoffs in transactional memory virtualization. 2006
92 Streamware: programming general-purpose multicore processors using streams. 2008
92 Optimistic parallelism benefits from data partitioning. 2008
91 A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing. 2010
91 ConMem: detecting severe concurrency bugs through an effect-oriented approach. 2010
90 ConSeq: detecting concurrency bugs through sequential errors. 2011
89 Leveraging stored energy for handling power emergencies in aggressively provisioned datacenters. 2012
89 Improving GPGPU concurrency with elastic kernels. 2013
88 Stochastic superoptimization. 2013
88 KVM/ARM: the design and implementation of the Linux ARM hypervisor. 2014
87 RCDC: a relaxed consistency deterministic computer. 2011
87 SDF: software-defined flash for web-scale internet storage systems. 2014
85 Mementos: system support for long-running computation on RFID-scale devices. 2011
84 Sponge: portable stream programming on graphics engines. 2011
80 How low can you go?: recommendations for hardware-supported minimal TCB code execution. 2008
80 Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults. 2012
80 Iago attacks: why the system call API is a bad untrusted RPC interface. 2013
79 Understanding and visualizing full systems with data flow tomography. 2008
79 Paraprox: pattern-based approximation for data parallel applications. 2014
76 Q100: the architecture and design of a database processing unit. 2014
74 Pocket cloudlets. 2011
74 Data races vs. data race bugs: telling the difference with portend. 2012
72 Looking back on the language and hardware revolutions: measured power, performance, and scaling. 2011
72 Scalable address spaces using RCU balanced trees. 2012
71 Tartan: evaluating spatial computation for whole program execution. 2006
71 Characterizing processor thermal behavior. 2010
71 Inter-core cooperative TLB for chip multiprocessors. 2010
70 Temporal search: detecting hidden malware timebombs with virtual machines. 2006
70 GPUfs: integrating a file system with GPUs. 2013
68 Introspective 3D chips. 2006
68 Uncertain: a first-order type for uncertain data. 2014
68 Using ARM trustzone to build a trusted language runtime for mobile applications. 2014
67 Probabilistic job symbiosis modeling for SMT processor scheduling. 2010
66 Adapting to intermittent faults in multicore systems. 2008
66 Inter-core prefetching for multicore processors using migrating helper threads. 2011
65 Virtual ghost: protecting applications from hostile operating systems. 2014
64 An evaluation of the TRIPS computer system. 2009
64 CRUISE: cache replacement and utility-aware scheduling. 2012
61 Per-thread cycle accounting in SMT processors. 2009
59 Scale-out NUMA. 2014
58 Integrated network interfaces for high-bandwidth TCP/IP. 2006
58 Archipelago: trading address space for reliability and security. 2008
58 Mixed-mode multicore reliability. 2009
58 Understanding modern device drivers. 2012
58 Portable performance on heterogeneous architectures. 2013
58 Memory Errors in Modern Systems: The Good, The Bad, and The Ugly. 2015
57 Analyzing multicore dumps to facilitate concurrency bug reproduction. 2010
56 Software-based instruction caching for embedded processors. 2006
56 Execution migration in a heterogeneous-ISA chip multiprocessor. 2012
56 STABILIZER: statistically sound performance evaluation. 2013
55 A spatial path scheduling algorithm for EDGE architectures. 2006
55 Power containers: an OS facility for fine-grained power and energy management on multicore servers. 2013
54 ISOLATOR: dynamically ensuring isolation in concurrent programs. 2009
54 Decoupling contention management from scheduling. 2010
54 NVM duet: unified working memory and persistent store architecture. 2014
53 Recovery domains: an organizing principle for recoverable operating systems. 2009
53 DreamWeaver: architectural support for deep sleep. 2012
53 PuDianNao: A Polyvalent Machine Learning Accelerator. 2015
52 PICSEL: measuring user-perceived performance to control dynamic frequency scaling. 2008
52 PocketWeb: instant web browsing for mobile devices. 2012
51 ApproxHadoop: Bringing Approximations to MapReduce Frameworks. 2015
50 Reflex: using low-power processors in smartphones without knowing them. 2012
50 Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers. 2015
49 SlicK: slice-based locality exploitation for efficient redundant multithreading. 2006
48 HOTL: a higher order theory of locality. 2013
47 A probabilistic pointer analysis for speculative optimizations. 2006
46 Efficiency trends and limits from comprehensive microarchitectural adaptivity. 2008
45 Leak pruning. 2009
44 2ndStrike: toward manifesting hidden concurrency typestate bugs. 2011
44 Using likely invariants for automated software fault localization. 2013
44 Fine-grained fault tolerance using device checkpoints. 2013
44 Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces. 2014
43 Understanding prediction-based partial redundant threading for low-overhead, high-coverage fault tolerance. 2006
43 Ubik: efficient cache sharing with strict QoS for latency-critical workloads. 2014
43 Price theory based power management for heterogeneous multi-cores. 2014
42 Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors. 2008
42 Efficient online validation with delta execution. 2009
42 Commutativity analysis for software parallelization: letting program transformations see the big picture. 2009
42 ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications. 2010
42 Heterogeneous-race-free memory models. 2014
42 REF: resource elasticity fairness with sharing incentives for multiprocessors. 2014
41 Comprehensively and efficiently protecting the heap. 2006
41 DeNovoND: efficient hardware support for disciplined non-determinism. 2013
40 Safe and automatic live update for operating systems. 2013
40 EnCore: exploiting system environment and correlation information for misconfiguration detection. 2014
39 SoftSig: software-exposed hardware signatures for code analysis and optimization. 2008
39 Path-exploration lifting: hi-fi tests for lo-fi emulators. 2012
39 Post-compiler software optimization for reducing energy. 2014
38 Predictor virtualization. 2008
38 TwinDrivers: semi-automatic derivation of fast and safe hypervisor network drivers from guest OS drivers. 2009
38 Efficient sequential consistency via conflict ordering. 2012
38 Data-parallel finite-state machines. 2014
38 GPU Concurrency: Weak Behaviours and Programming Assumptions. 2015
37 Hardware counter driven on-the-fly request signatures. 2008
36 Orchestration by approximation: mapping stream programs onto multicore architectures. 2011
35 HeapMD: identifying heap-based bugs using anomaly detection. 2006
35 Stealth prefetching. 2006
35 A defect tolerant self-organizing nanoscale SIMD architecture. 2006
34 Instruction scheduling for a tiled dataflow architecture. 2006
34 Demand-based coordinated scheduling for SMP VMs. 2013
34 Protecting Data on Smartphones and Tablets from Memory Attacks. 2015
34 Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine. 2015
33 A new idiom recognition framework for exploiting hardware-assist instructions. 2006
33 Optimal task assignment in multithreaded processors: a statistical approach. 2012
33 Verifying systems rules using rule-directed symbolic execution. 2013
33 Verifying security invariants in ExpressOS. 2013
33 A study of the scalability of stop-the-world garbage collectors on multicores. 2013
33 K2: a mobile operating system for heterogeneous coherence domains. 2014
33 Transactionalizing legacy code: an experience report using GCC and Memcached. 2014
33 Disengaged scheduling for fair, protected access to fast computational accelerators. 2014
33 Page Placement Strategies for GPUs within Heterogeneous Memory Systems. 2015
33 GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation. 2015
33 Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures. 2015
32 Xoc, an extension-oriented compiler for systems programming. 2008
32 Computational sprinting on a hardware/software testbed. 2013
32 PolyMage: Automatic Optimization for Image Processing Pipelines. 2015
31 Mapping esterel onto a multi-threaded embedded processor. 2006
31 Tapping into the fountain of CPUs: on operating system support for programmable devices. 2008
30 MacroSS: macro-SIMDization of streaming applications. 2010
30 A case for neuromorphic ISAs. 2011
30 Hardware acceleration of transactional memory on commodity systems. 2011
30 Region scheduling: efficiently using the cache architectures via page-level affinity. 2012
30 Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems. 2013
30 ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers. 2013
30 Deterministic galois: on-demand, portable and parameterless. 2014
30 Mojim: A Reliable and Highly-Available Non-Volatile Memory System. 2015
29 Exploring circuit timing-aware language and compilation. 2011
29 Efficient processor support for DRFx, a memory model with exceptions. 2011
29 Comprehensive kernel instrumentation via dynamic binary translation. 2012
29 VSwapper: a memory swapper for virtualized environments. 2014
29 FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. 2015
28 The mapping collector: virtual memory support for generational, parallel, and concurrent compaction. 2008
28 Discerning the dominant out-of-order performance advantage: is it speculation or dynamism? 2013
28 Production-run software failure diagnosis via hardware performance counters. 2013
28 Parallelizing data race detection. 2013
28 Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory. 2015
28 Chimera: Collaborative Preemption for Multitasking on a Shared GPU. 2015
27 Improving the performance of object-oriented languages with dynamic predication of indirect jumps. 2008
27 Maximum benefit from a minimal HTM. 2009
27 COMPASS: a programmable data prefetcher using idle GPU shaders. 2010
27 Chameleon: operating system support for dynamic processors. 2012
27 ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution. 2013
27 DDOS: taming nondeterminism in distributed systems. 2013
27 Underprovisioning backup power infrastructure for datacenters. 2014
26 Orthrus: efficient software integrity protection on multi-cores. 2010
26 Synthesizing concurrent schedulers for irregular algorithms. 2011
26 Improving the performance of trace-based systems by false loop filtering. 2011
26 Transparent mutable replay for multicore debugging and patch validation. 2013
25 Cooperative empirical failure avoidance for multithreaded programs. 2013
25 Integrated 3D-stacked server designs for increasing physical density of key-value stores. 2014
25 Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services. 2015
24 Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring. 2010
24 Applying transactional memory to concurrency bugs. 2012
24 A Probabilistic Graphical Model-based Approach for Minimizing Energy Under Performance Constraints. 2015
24 Architectural Support for Software-Defined Metadata Processing. 2015
23 Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging. 2009
23 A real system evaluation of hardware atomicity for software speculation. 2010
23 Automated repair of binary and assembly programs for cooperating embedded devices. 2013
23 Energy-efficient work-stealing language runtimes. 2014
23 A Hardware Design Language for Timing-Sensitive Information-Flow Security. 2015
23 A DNA-Based Archival Storage System. 2016
22 HICAMP: architectural support for efficient concurrency-safe shared structured data access. 2012
22 Monitoring and Debugging the Quality of Results in Approximate Programs. 2015
21 Accurate branch prediction for short threads. 2008
21 Specifying and checking semantic atomicity for multithreaded programs. 2011
21 SIMD defragmenter: efficient ILP realization on data-parallel architectures. 2012
21 DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations. 2015
20 Phantom-BTB: a virtualized branch target buffer design. 2009
20 A case for unlimited watchpoints. 2012
20 GPUDet: a deterministic GPU architecture. 2013
20 Volition: scalable and precise sequential consistency violation detection. 2013
20 Cyrus: unintrusive application-level record-replay for replay parallelism. 2013
20 Sapper: a language for hardware-level security policy enforcement. 2014
20 SI-TM: reducing transactional memory abort rates through snapshot isolation. 2014
20 NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines. 2015
20 Freecursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-based Oblivious RAM. 2015
19 Dispersing proprietary applications as benchmarks through code mutation. 2008
19 The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism. 2014
19 Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability. 2015
18 StreamRay: a stream filtering architecture for coherent ray tracing. 2009
18 Specifying and dynamically verifying address translation-aware memory consistency. 2010
18 Aikido: accelerating shared data dynamic analyses. 2012
18 Comprehending performance from real-world execution traces: a device-driver case. 2014
18 Prototyping symbolic execution engines for interpreted languages. 2014
17 Architectural implications of nanoscale integrated sensing and computing. 2009
17 Request behavior variations. 2010
17 Nested Kernel: An Operating System Architecture for Intra-Kernel Privilege Separation. 2015
16 Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle. 2009
16 A declarative language approach to device configuration. 2011
16 Why you should care about quantile regression. 2013
16 Rhythm: harnessing data parallel hardware for server workloads. 2014
16 I/O paravirtualization at the device file boundary. 2014
16 Targeted Automatic Integer Overflow Discovery Using Goal-Directed Conditional Branch Enforcement. 2015
16 DEUCE: Write-Efficient Encryption for Non-Volatile Memories. 2015
16 Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques. 2016
15 Dynamic prediction of collection yield for managed runtimes. 2009
15 Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies. 2013
15 rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers. 2015
15 CoGENT: Verifying High-Assurance File System Implementations. 2016
14 Automatic generation of hardware/software interfaces. 2012
14 To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach. 2013
14 Hardware support for fine-grained event-driven computation in Anton 2. 2013
14 RelaxReplay: record and replay for relaxed-consistency multiprocessors. 2014
14 OpenPiton: An Open Source Manycore Research Framework. 2016
13 Communication optimizations for global multi-threaded instruction scheduling. 2008
13 iThreads: A Threading Library for Parallel Incremental Computation. 2015
12 Totally green: evaluating and designing servers for lifecycle environmental impact. 2012
12 Iterative optimization for the data center. 2012
12 Low-level detection of language-level data races with LARD. 2014
12 The sharing architecture: sub-core configurability for IaaS clouds. 2014
12 Synchronization Using Remote-Scope Promotion. 2015
12 High-Performance Transactions for Persistent Memories. 2016
11 Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems. 2014
11 CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution. 2015
11 HCloud: Resource-Efficient Provisioning in Shared Cloud Systems. 2016
10 Dynamic filtering: multi-purpose architecture support for language runtime systems. 2010
10 Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation. 2014
10 Speculative hardware/software co-designed floating-point multiply-add fusion. 2014
10 Leveraging the short-term memory of hardware to diagnose production-run software failures. 2014
10 VARAN the Unbelievable: An Efficient N-version Execution Framework. 2015
10 SPECS: A Lightweight Runtime Mechanism for Protecting Software from Security-Critical Processor Bugs. 2015
10 Automated OS-level Device Runtime Power Management. 2015
10 TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems. 2016
10 Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications. 2016
10 How to Build Static Checking Systems Using Orders of Magnitude Less Code. 2016
9 Practical automatic loop specialization. 2013
9 Fence-free work stealing on bounded TSO processors. 2014
9 Neuromorphic processing: a new frontier in scaling computer architecture. 2014
9 Ziria: A DSL for Wireless Systems Programming. 2015
9 CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters. 2015
9 Improving Agility and Elasticity in Bare-metal Clouds. 2015
9 SD-PCM: Constructing Reliable Super Dense Phase Change Memory under Write Disturbance. 2015
9 ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks. 2016
8 An update-aware storage system for low-locality update-intensive workloads. 2012
8 Cider: native execution of iOS apps on Android. 2014
8 Specifying and Checking File System Crash-Consistency Models. 2016
8 Failure-Atomic Persistent Memory Updates via JUSTDO Logging. 2016
8 The Computational Sprinting Game. 2016
8 Scaling up Superoptimization. 2016
7 Continuous object access profiling and optimizations to overcome the memory wall and bloat. 2012
7 The rise of the expert amateur: DIY culture and the evolution of computer science. 2013
7 ASC: automatically scalable computation. 2014
7 Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems. 2014
7 Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD). 2015
7 Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems. 2016
7 High Performance Packet Processing with FlexNIC. 2016
7 Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers. 2016
7 Scalable Kernel TCP Design and Implementation for Short-Lived Connections. 2016
7 COATCheck: Verifying Memory Ordering at the Hardware-OS Interface. 2016
6 A program transformation and architecture support for quantum uncomputation. 2006
6 Toward molecular programming with DNA. 2008
6 Improved device driver reliability through hardware verification reuse. 2011
6 High-performance fractal coherence. 2014
6 Temporally Bounding TSO for Fence-Free Asymmetric Synchronization. 2015
6 ProteusTM: Abstraction Meets Performance in Transactional Memory. 2016
6 Generating Configurable Hardware from Parallel Patterns. 2016
6 Proactive Control of Approximate Programs. 2016
5 The cloud will change everything. 2011
5 Compiler Management of Communication and Parallelism for Quantum Computation. 2015
5 PIFT: Predictive Information-Flow Tracking. 2016
5 An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems. 2016
5 Paravirtual Remote I/O. 2016
5 High-Density Image Storage Using Approximate Memory Cells. 2016
5 Analyzing Behavior Specialized Acceleration. 2016
4 Impact of virtualization on computer architecture and operating systems. 2006
4 DeAliaser: alias speculation using atomic region support. 2013
4 Efficient virtualization on embedded power architecture® platforms. 2013
4 Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers. 2014
4 Finding trojan message vulnerabilities in distributed systems. 2014
4 Kinetic Dependence Graphs. 2015
4 On-the-Fly Principled Speculation for FSM Parallelization. 2015
4 Watson and the Era of Cognitive Computing. 2015
4 NVWAL: Exploiting NVRAM in Write-Ahead Logging. 2016
4 Sego: Pervasive Trusted Metadata for Efficiently Verified Untrusted System Services. 2016
4 memif: Towards Programming Heterogeneous Memory Asynchronously. 2016
4 Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers. 2016
3 Architectural Support for Cyber-Physical Systems. 2015
3 M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores. 2016
3 Whirlpool: Improving Dynamic Cache Management with Static Data Classification. 2016
3 RAPID Programming of Pattern-Recognition Processors. 2016
3 TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory. 2016
3 SpaceJMP: Programming with Multiple Virtual Address Spaces. 2016
3 ReBudget: Trading Off Efficiency vs. Fairness in Market-Based Multicore Resource Allocation via Runtime Budget Reassignment. 2016
3 Efficient Address Translation for Architectures with Multiple Page Sizes. 2017
3 SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing. 2017
2 Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs. 2014
2 Dual Execution for On the Fly Fine Grained Execution Comparison. 2015
2 HIPStR: Heterogeneous-ISA Program State Relocation. 2016
2 Interference Management for Distributed Parallel Applications in Consolidated Clusters. 2016
2 WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication. 2016
2 Architecture-Adaptive Code Variant Tuning. 2016
2 True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy. 2016
2 CSR: Core Surprise Removal in Commodity Operating Systems. 2016
2 DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model. 2016
2 AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing. 2016
2 Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers. 2016
2 Automated Synthesis of Comprehensive Memory Model Litmus Test Suites. 2017
2 Breaking the Boundaries in Heterogeneous-ISA Datacenters. 2017
2 KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations. 2017
2 Translation-Triggered Prefetching. 2017
1 Research directions for 21st century computer systems: ASPLOS 2013 panel. 2013
1 Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade. 2014
1 Inside Windows Azure: the challenges and opportunities of a cloud operating system. 2014
1 More is Less, Less is More: Molecular-Scale Photonic NoC Power Topologies. 2015
1 Asymmetric Memory Fences: Optimizing Both Performance and Implementability. 2015
1 Architectural Support for Dynamic Linking. 2015
1 DIABLO: A Warehouse-Scale Computer Network Simulator using FPGAs. 2015
1 Prudent Memory Reclamation in Procrastination-Based Synchronization. 2016
1 Programming Uncertain<T>hings. 2016
1 TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services. 2016
1 LDX: Causality Inference by Lightweight Dual Execution. 2016
1 CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs. 2016
1 CASPAR: Breaking Serialization in Lock-Free Multicore Synchronization. 2016
1 Brain Inspired Computing. 2016
1 ReFlex: Remote Flash ≈ Local Flash. 2017
1 SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs. 2017
1 Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling. 2017
1 Exploiting Intra-Request Slack to Improve SSD Performance. 2017
1 Towards Practical Default-On Multi-Core Record/Replay. 2017
1 Enabling Lightweight Transactions with Precision Time. 2017
1 An Analysis of Persistent Memory Use with WHISPER. 2017
1 GRIFFIN: Guarding Control Flows Using Intel Processor Trace. 2017
1 TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. 2017
1 Thermostat: Application-transparent Page Management for Two-tiered Main Memory. 2017
1 Page Fault Support for Network Controllers. 2017
0 Technology for developing regions: Moore’s law is not enough. 2010
0 TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations. 2013
0 Programmer Productivity in a World of Mushy Interfaces: Challenges of the Post-ISA Reality. 2016
0 Synopsis of the ASPLOS ’16 Wild and Crazy Ideas (WACI) Invited-Speakers Session. 2016
0 RID: Finding Reference Count Bugs with Inconsistent Path Pair Checking. 2016
0 Sidewinder: An Energy Efficient and Developer Friendly Heterogeneous Architecture for Continuous Mobile Sensing. 2016
0 Pallas: Semantic-Aware Checking for Finding Deep Bugs in Fast Path. 2017
0 Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy. 2017
0 FLEP: Enabling Flexible and Efficient Preemption on GPUs. 2017
0 CHERI JNI: Sinking the Java Security Model into the C. 2017
0 3DGates: An Instruction-Level Energy Analysis and Optimization of 3D Printers. 2017
0 Moonwalk: NRE Optimization in ASIC Clouds. 2017
0 AMNESIAC: Amnesic Automatic Computer. 2017
0 AsyncClock: Scalable Inference of Asynchronous Event Causality. 2017
0 Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers. 2017
0 Typed Architectures: Architectural Support for Lightweight Scripting. 2017
0 REDSPY: Exploring Value Locality in Software. 2017
0 Locality Transformations for Nested Recursive Iteration Spaces. 2017
0 Verification of a Practical Hardware Security Architecture Through Static Information Flow Analysis. 2017
0 Optimizing CNNs on Multicores for Scalability, Performance and Goodput. 2017
0 Locality-Aware CTA Clustering for Modern GPUs. 2017
0 Browsix: Bridging the Gap Between Unix and the Browser. 2017
0 DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems. 2017
0 Identifying Security Critical Properties for the Dynamic Verification of a Processor. 2017
0 An Architecture Supporting Formal and Compositional Binary Analysis. 2017
0 Sound Loop Superoptimization for Google Native Client. 2017
0 Dynamic Resource Management for Efficient Utilization of Multitasking GPUs. 2017
0 Black-box Concurrent Data Structures for NUMA Architectures. 2017
0 Determining Application-specific Peak Power and Energy Requirements for Ultra-low Power Processors. 2017
0 Approximate Storage of Compressed and Encrypted Videos. 2017
0 History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers. 2017
0 Crossing Guard: Mediating Host-Accelerator Coherence Interactions. 2017
0 Bolt: I Know What You Did Last Summer... In The Cloud. 2017
0 DudeTM: Building Durable Transactions with Decoupling for Persistent Memory. 2017
0 Voltage Regulator Efficiency Aware Power Management. 2017
0 Failure-Atomic Slotted Paging for Persistent Memory. 2017
0 TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA. 2017
0 Mallacc: Accelerating Memory Allocation. 2017
0 Towards “Full Containerization” in Containerized Network Function Virtualization. 2017
0 IncBricks: Toward In-Network Computation with an In-Network Cache. 2017
0 Big Data Analytics and Intelligence at Alibaba Cloud. 2017
0 CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing. 2017
0 Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge. 2017
0 ProRace: Practical Data Race Detection for Production Use. 2017
0 What Scalable Programs Need from Transactional Memory. 2017
0 Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code. 2017

2017

Cited by Paper title
3 Efficient Address Translation for Architectures with Multiple Page Sizes.
3 SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing.
2 Automated Synthesis of Comprehensive Memory Model Litmus Test Suites.
2 Breaking the Boundaries in Heterogeneous-ISA Datacenters.
2 KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations.
2 Translation-Triggered Prefetching.
1 ReFlex: Remote Flash ≈ Local Flash.
1 SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs.
1 Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling.
1 Exploiting Intra-Request Slack to Improve SSD Performance.
1 Towards Practical Default-On Multi-Core Record/Replay.
1 Enabling Lightweight Transactions with Precision Time.
1 An Analysis of Persistent Memory Use with WHISPER.
1 GRIFFIN: Guarding Control Flows Using Intel Processor Trace.
1 TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory.
1 Thermostat: Application-transparent Page Management for Two-tiered Main Memory.
1 Page Fault Support for Network Controllers.
0 Pallas: Semantic-Aware Checking for Finding Deep Bugs in Fast Path.
0 Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy.
0 FLEP: Enabling Flexible and Efficient Preemption on GPUs.
0 CHERI JNI: Sinking the Java Security Model into the C.
0 3DGates: An Instruction-Level Energy Analysis and Optimization of 3D Printers.
0 Moonwalk: NRE Optimization in ASIC Clouds.
0 AMNESIAC: Amnesic Automatic Computer.
0 AsyncClock: Scalable Inference of Asynchronous Event Causality.
0 Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers.
0 Typed Architectures: Architectural Support for Lightweight Scripting.
0 REDSPY: Exploring Value Locality in Software.
0 Locality Transformations for Nested Recursive Iteration Spaces.
0 Verification of a Practical Hardware Security Architecture Through Static Information Flow Analysis.
0 Optimizing CNNs on Multicores for Scalability, Performance and Goodput.
0 Locality-Aware CTA Clustering for Modern GPUs.
0 Browsix: Bridging the Gap Between Unix and the Browser.
0 DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems.
0 Identifying Security Critical Properties for the Dynamic Verification of a Processor.
0 An Architecture Supporting Formal and Compositional Binary Analysis.
0 Sound Loop Superoptimization for Google Native Client.
0 Dynamic Resource Management for Efficient Utilization of Multitasking GPUs.
0 Black-box Concurrent Data Structures for NUMA Architectures.
0 Determining Application-specific Peak Power and Energy Requirements for Ultra-low Power Processors.
0 Approximate Storage of Compressed and Encrypted Videos.
0 History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers.
0 Crossing Guard: Mediating Host-Accelerator Coherence Interactions.
0 Bolt: I Know What You Did Last Summer... In The Cloud.
0 DudeTM: Building Durable Transactions with Decoupling for Persistent Memory.
0 Voltage Regulator Efficiency Aware Power Management.
0 Failure-Atomic Slotted Paging for Persistent Memory.
0 TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA.
0 Mallacc: Accelerating Memory Allocation.
0 Towards “Full Containerization” in Containerized Network Function Virtualization.
0 IncBricks: Toward In-Network Computation with an In-Network Cache.
0 Big Data Analytics and Intelligence at Alibaba Cloud.
0 CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing.
0 Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge.
0 ProRace: Practical Data Race Detection for Production Use.
0 What Scalable Programs Need from Transactional Memory.
0 Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code.

2016

Cited by Paper title
23 A DNA-Based Archival Storage System.
16 Maximizing Performance Under a Power Cap: A Comparison of Hardware, Software, and Hybrid Techniques.
15 CoGENT: Verifying High-Assurance File System Implementations.
14 OpenPiton: An Open Source Manycore Research Framework.
12 High-Performance Transactions for Persistent Memories.
11 HCloud: Resource-Efficient Provisioning in Shared Cloud Systems.
10 TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems.
10 Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications.
10 How to Build Static Checking Systems Using Orders of Magnitude Less Code.
9 ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks.
8 Specifying and Checking File System Crash-Consistency Models.
8 Failure-Atomic Persistent Memory Updates via JUSTDO Logging.
8 The Computational Sprinting Game.
8 Scaling up Superoptimization.
7 Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems.
7 High Performance Packet Processing with FlexNIC.
7 Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers.
7 Scalable Kernel TCP Design and Implementation for Short-Lived Connections.
7 COATCheck: Verifying Memory Ordering at the Hardware-OS Interface.
6 ProteusTM: Abstraction Meets Performance in Transactional Memory.
6 Generating Configurable Hardware from Parallel Patterns.
6 Proactive Control of Approximate Programs.
5 PIFT: Predictive Information-Flow Tracking.
5 An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems.
5 Paravirtual Remote I/O.
5 High-Density Image Storage Using Approximate Memory Cells.
5 Analyzing Behavior Specialized Acceleration.
4 NVWAL: Exploiting NVRAM in Write-Ahead Logging.
4 Sego: Pervasive Trusted Metadata for Efficiently Verified Untrusted System Services.
4 memif: Towards Programming Heterogeneous Memory Asynchronously.
4 Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers.
3 M3: A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores.
3 Whirlpool: Improving Dynamic Cache Management with Static Data Classification.
3 RAPID Programming of Pattern-Recognition Processors.
3 TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory.
3 SpaceJMP: Programming with Multiple Virtual Address Spaces.
3 ReBudget: Trading Off Efficiency vs. Fairness in Market-Based Multicore Resource Allocation via Runtime Budget Reassignment.
2 HIPStR: Heterogeneous-ISA Program State Relocation.
2 Interference Management for Distributed Parallel Applications in Consolidated Clusters.
2 WiSync: An Architecture for Fast Synchronization through On-Chip Wireless Communication.
2 Architecture-Adaptive Code Variant Tuning.
2 True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy.
2 CSR: Core Surprise Removal in Commodity Operating Systems.
2 DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model.
2 AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing.
2 Lifting Assembly to Intermediate Representation: A Novel Approach Leveraging Compilers.
1 Prudent Memory Reclamation in Procrastination-Based Synchronization.
1 Programming Uncertain<T>hings.
1 TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services.
1 LDX: Causality Inference by Lightweight Dual Execution.
1 CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs.
1 CASPAR: Breaking Serialization in Lock-Free Multicore Synchronization.
1 Brain Inspired Computing.
0 Programmer Productivity in a World of Mushy Interfaces: Challenges of the Post-ISA Reality.
0 Synopsis of the ASPLOS ’16 Wild and Crazy Ideas (WACI) Invited-Speakers Session.
0 RID: Finding Reference Count Bugs with Inconsistent Path Pair Checking.
0 Sidewinder: An Energy Efficient and Developer Friendly Heterogeneous Architecture for Continuous Mobile Sensing.

2015

Cited by Paper title
58 Memory Errors in Modern Systems: The Good, The Bad, and The Ugly.
53 PuDianNao: A Polyvalent Machine Learning Accelerator.
51 ApproxHadoop: Bringing Approximations to MapReduce Frameworks.
50 Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers.
38 GPU Concurrency: Weak Behaviours and Programming Assumptions.
34 Protecting Data on Smartphones and Tablets from Memory Attacks.
34 Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine.
33 Page Placement Strategies for GPUs within Heterogeneous Memory Systems.
33 GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation.
33 Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures.
32 PolyMage: Automatic Optimization for Image Processing Pipelines.
30 Mojim: A Reliable and Highly-Available Non-Volatile Memory System.
29 FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications.
28 Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory.
28 Chimera: Collaborative Preemption for Multitasking on a Shared GPU.
25 Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services.
24 A Probabilistic Graphical Model-based Approach for Minimizing Energy Under Performance Constraints.
24 Architectural Support for Software-Defined Metadata Processing.
23 A Hardware Design Language for Timing-Sensitive Information-Flow Security.
22 Monitoring and Debugging the Quality of Results in Approximate Programs.
21 DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations.
20 NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines.
20 Freecursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-based Oblivious RAM.
19 Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability.
17 Nested Kernel: An Operating System Architecture for Intra-Kernel Privilege Separation.
16 Targeted Automatic Integer Overflow Discovery Using Goal-Directed Conditional Branch Enforcement.
16 DEUCE: Write-Efficient Encryption for Non-Volatile Memories.
15 rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers.
13 iThreads: A Threading Library for Parallel Incremental Computation.
12 Synchronization Using Remote-Scope Promotion.
11 CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution.
10 VARAN the Unbelievable: An Efficient N-version Execution Framework.
10 SPECS: A Lightweight Runtime Mechanism for Protecting Software from Security-Critical Processor Bugs.
10 Automated OS-level Device Runtime Power Management.
9 Ziria: A DSL for Wireless Systems Programming.
9 CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters.
9 Improving Agility and Elasticity in Bare-metal Clouds.
9 SD-PCM: Constructing Reliable Super Dense Phase Change Memory under Write Disturbance.
7 Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD).
6 Temporally Bounding TSO for Fence-Free Asymmetric Synchronization.
5 Compiler Management of Communication and Parallelism for Quantum Computation.
4 Kinetic Dependence Graphs.
4 On-the-Fly Principled Speculation for FSM Parallelization.
4 Watson and the Era of Cognitive Computing.
3 Architectural Support for Cyber-Physical Systems.
2 Dual Execution for On the Fly Fine Grained Execution Comparison.
1 More is Less, Less is More: Molecular-Scale Photonic NoC Power Topologies.
1 Asymmetric Memory Fences: Optimizing Both Performance and Implementability.
1 Architectural Support for Dynamic Linking.
1 DIABLO: A Warehouse-Scale Computer Network Simulator using FPGAs.

2014

Cited by Paper title
220 DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.
213 Quasar: resource-efficient and QoS-aware cluster management.
88 KVM/ARM: the design and implementation of the Linux ARM hypervisor.
87 SDF: software-defined flash for web-scale internet storage systems.
79 Paraprox: pattern-based approximation for data parallel applications.
76 Q100: the architecture and design of a database processing unit.
68 Uncertain<T>: a first-order type for uncertain data.
68 Using ARM TrustZone to build a trusted language runtime for mobile applications.
65 Virtual ghost: protecting applications from hostile operating systems.
59 Scale-out NUMA.
54 NVM duet: unified working memory and persistent store architecture.
44 Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces.
43 Ubik: efficient cache sharing with strict QoS for latency-critical workloads.
43 Price theory based power management for heterogeneous multi-cores.
42 Heterogeneous-race-free memory models.
42 REF: resource elasticity fairness with sharing incentives for multiprocessors.
40 EnCore: exploiting system environment and correlation information for misconfiguration detection.
39 Post-compiler software optimization for reducing energy.
38 Data-parallel finite-state machines.
33 K2: a mobile operating system for heterogeneous coherence domains.
33 Transactionalizing legacy code: an experience report using GCC and Memcached.
33 Disengaged scheduling for fair, protected access to fast computational accelerators.
30 Deterministic Galois: on-demand, portable and parameterless.
29 VSwapper: a memory swapper for virtualized environments.
27 Underprovisioning backup power infrastructure for datacenters.
25 Integrated 3D-stacked server designs for increasing physical density of key-value stores.
23 Energy-efficient work-stealing language runtimes.
20 Sapper: a language for hardware-level security policy enforcement.
20 SI-TM: reducing transactional memory abort rates through snapshot isolation.
19 The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism.
18 Comprehending performance from real-world execution traces: a device-driver case.
18 Prototyping symbolic execution engines for interpreted languages.
16 Rhythm: harnessing data parallel hardware for server workloads.
16 I/O paravirtualization at the device file boundary.
14 RelaxReplay: record and replay for relaxed-consistency multiprocessors.
12 Low-level detection of language-level data races with LARD.
12 The sharing architecture: sub-core configurability for IaaS clouds.
11 Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems.
10 Challenging the “embarrassingly sequential”: parallelizing finite state machine-based computations through principled speculation.
10 Speculative hardware/software co-designed floating-point multiply-add fusion.
10 Leveraging the short-term memory of hardware to diagnose production-run software failures.
9 Fence-free work stealing on bounded TSO processors.
9 Neuromorphic processing: a new frontier in scaling computer architecture.
8 Cider: native execution of iOS apps on Android.
7 ASC: automatically scalable computation.
7 Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems.
6 High-performance fractal coherence.
4 Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers.
4 Finding trojan message vulnerabilities in distributed systems.
2 Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs.
1 Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade.
1 Inside Windows Azure: the challenges and opportunities of a cloud operating system.

2013

Cited by Paper title
248 Paragon: QoS-aware scheduling for heterogeneous datacenters.
157 Unikernels: library operating systems for the cloud.
151 Parasol and GreenSwitch: managing datacenters powered by renewable energy.
147 OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
131 Traffic management: a holistic approach to memory placement on NUMA systems.
128 InkTag: secure applications on an untrusted operating system.
89 Improving GPGPU concurrency with elastic kernels.
88 Stochastic superoptimization.
80 Iago attacks: why the system call API is a bad untrusted RPC interface.
70 GPUfs: integrating a file system with GPUs.
58 Portable performance on heterogeneous architectures.
56 STABILIZER: statistically sound performance evaluation.
55 Power containers: an OS facility for fine-grained power and energy management on multicore servers.
48 HOTL: a higher order theory of locality.
44 Using likely invariants for automated software fault localization.
44 Fine-grained fault tolerance using device checkpoints.
41 DeNovoND: efficient hardware support for disciplined non-determinism.
40 Safe and automatic live update for operating systems.
34 Demand-based coordinated scheduling for SMP VMs.
33 Verifying systems rules using rule-directed symbolic execution.
33 Verifying security invariants in ExpressOS.
33 A study of the scalability of stop-the-world garbage collectors on multicores.
32 Computational sprinting on a hardware/software testbed.
30 Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems.
30 ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers.
28 Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?
28 Production-run software failure diagnosis via hardware performance counters.
28 Parallelizing data race detection.
27 ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution.
27 DDOS: taming nondeterminism in distributed systems.
26 Transparent mutable replay for multicore debugging and patch validation.
25 Cooperative empirical failure avoidance for multithreaded programs.
23 Automated repair of binary and assembly programs for cooperating embedded devices.
20 GPUDet: a deterministic GPU architecture.
20 Volition: scalable and precise sequential consistency violation detection.
20 Cyrus: unintrusive application-level record-replay for replay parallelism.
16 Why you should care about quantile regression.
15 Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies.
14 To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach.
14 Hardware support for fine-grained event-driven computation in Anton 2.
9 Practical automatic loop specialization.
7 The rise of the expert amateur: DIY culture and the evolution of computer science.
4 DeAliaser: alias speculation using atomic region support.
4 Efficient virtualization on embedded power architecture® platforms.
1 Research directions for 21st century computer systems: ASPLOS 2013 panel.
0 TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations.

2012

Cited by Paper title
485 Clearing the clouds: a study of emerging scale-out workloads on modern hardware.
256 Architecture support for disciplined approximate programming.
194 Tarazu: optimizing MapReduce on heterogeneous clusters.
177 Green-Marl: a DSL for easy and efficient graph analysis.
169 ELI: bare-metal performance for I/O virtualization.
164 Cosmic rays don’t strike twice: understanding the nature of DRAM errors and the implications for system design.
140 DejaVu: accelerating resource allocation in virtualized environments.
133 Whole-system persistence.
109 Providing safe, user space access to fast, solid state disks.
101 Architectural support for hypervisor-secure virtualization.
97 Bottleneck identification and scheduling in multithreaded applications.
89 Leveraging stored energy for handling power emergencies in aggressively provisioned datacenters.
80 Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults.
74 Data races vs. data race bugs: telling the difference with portend.
72 Scalable address spaces using RCU balanced trees.
64 CRUISE: cache replacement and utility-aware scheduling.
58 Understanding modern device drivers.
56 Execution migration in a heterogeneous-ISA chip multiprocessor.
53 DreamWeaver: architectural support for deep sleep.
52 PocketWeb: instant web browsing for mobile devices.
50 Reflex: using low-power processors in smartphones without knowing them.
39 Path-exploration lifting: hi-fi tests for lo-fi emulators.
38 Efficient sequential consistency via conflict ordering.
33 Optimal task assignment in multithreaded processors: a statistical approach.
30 Region scheduling: efficiently using the cache architectures via page-level affinity.
29 Comprehensive kernel instrumentation via dynamic binary translation.
27 Chameleon: operating system support for dynamic processors.
24 Applying transactional memory to concurrency bugs.
22 HICAMP: architectural support for efficient concurrency-safe shared structured data access.
21 SIMD defragmenter: efficient ILP realization on data-parallel architectures.
20 A case for unlimited watchpoints.
18 Aikido: accelerating shared data dynamic analyses.
14 Automatic generation of hardware/software interfaces.
12 Totally green: evaluating and designing servers for lifecycle environmental impact.
12 Iterative optimization for the data center.
8 An update-aware storage system for low-locality update-intensive workloads.
7 Continuous object access profiling and optimizations to overcome the memory wall and bloat.

2011

Cited by Paper title
326 S2E: a platform for in-vivo multi-path analysis of software systems.
314 Mnemosyne: lightweight persistent memory.
307 NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories.
272 Flikker: saving DRAM refresh-power through critical data partitioning.
199 MemScale: active low-power modes for main memory.
197 Dynamic knobs for responsive power-aware computing.
172 Faults in Linux: ten years later.
161 On-the-fly elimination of dynamic irregularities for GPU computing.
153 DoublePlay: parallelizing sequential logging and replay.
137 Rethinking the library OS from the top down.
132 Blink: managing server clusters on intermittent power.
128 Improving software diagnosability via log enhancement.
124 Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory.
106 Ensuring operating system kernel integrity with OSck.
90 ConSeq: detecting concurrency bugs through sequential errors.
87 RCDC: a relaxed consistency deterministic computer.
85 Mementos: system support for long-running computation on RFID-scale devices.
84 Sponge: portable stream programming on graphics engines.
74 Pocket cloudlets.
72 Looking back on the language and hardware revolutions: measured power, performance, and scaling.
66 Inter-core prefetching for multicore processors using migrating helper threads.
44 2ndStrike: toward manifesting hidden concurrency typestate bugs.
36 Orchestration by approximation: mapping stream programs onto multicore architectures.
30 A case for neuromorphic ISAs.
30 Hardware acceleration of transactional memory on commodity systems.
29 Exploring circuit timing-aware language and compilation.
29 Efficient processor support for DRFx, a memory model with exceptions.
26 Synthesizing concurrent schedulers for irregular algorithms.
26 Improving the performance of trace-based systems by false loop filtering.
21 Specifying and checking semantic atomicity for multithreaded programs.
16 A declarative language approach to device configuration.
6 Improved device driver reliability through hardware verification reuse.
5 The cloud will change everything.

2010

Cited by Paper title
540 Addressing shared resource contention in multicore processors via scheduling.
405 Conservation cores: reducing the energy of mature computations.
275 CoreDet: a compiler and runtime system for deterministic multithreaded execution.
239 Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems.
180 Joint optimization of idle and cooling power in data centers while maintaining response time.
175 An asymmetric distributed shared memory model for heterogeneous parallel systems.
171 SherLog: error diagnosis by connecting clues from run-time logs.
163 A randomized scheduler with probabilistic guarantees of finding bugs.
161 Micro-pages: increasing DRAM efficiency with locality-aware data placement.
156 Shoestring: probabilistic soft error reliability on the cheap.
150 Dynamically replicated memory: building reliable systems from nanoscale resistive memories.
132 Respec: efficient online multiprocessor replay via speculation and external determinism.
111 Power routing: dynamic power provisioning in the data center.
104 Flexible architectural support for fine-grain scheduling.
102 Virtualized and flexible ECC for main memory.
101 Speculative parallelization using software multi-threaded transactions.
91 A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing.
91 ConMem: detecting severe concurrency bugs through an effect-oriented approach.
71 Characterizing processor thermal behavior.
71 Inter-core cooperative TLB for chip multiprocessors.
67 Probabilistic job symbiosis modeling for SMT processor scheduling.
57 Analyzing multicore dumps to facilitate concurrency bug reproduction.
54 Decoupling contention management from scheduling.
42 ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications.
30 MacroSS: macro-SIMDization of streaming applications.
27 COMPASS: a programmable data prefetcher using idle GPU shaders.
26 Orthrus: efficient software integrity protection on multi-cores.
24 Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring.
23 A real system evaluation of hardware atomicity for software speculation.
18 Specifying and dynamically verifying address translation-aware memory consistency.
17 Request behavior variations.
10 Dynamic filtering: multi-purpose architecture support for language runtime systems.
0 Technology for developing regions: Moore’s law is not enough.

2009

Cited by Paper title
826 PowerNap: eliminating server idle power.
627 DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings.
369 Kendo: efficient deterministic multithreading in software.
300 DMP: deterministic shared memory multiprocessing.
269 Accelerating critical section execution with asymmetric multi-core architectures.
268 Early experience with a commercial hardware transactional memory implementation.
267 CTrigger: exposing atomicity violation bugs from their hiding places.
258 Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications.
232 Producing wrong data without doing anything obviously wrong!
176 RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations.
155 ASSURE: automatic software self-healing using rescue points.
136 Capo: a software-hardware interface for practical deterministic multiprocessor replay.
127 Complete information flow tracking from the gates up.
64 An evaluation of the TRIPS computer system.
61 Per-thread cycle accounting in SMT processors.
58 Mixed-mode multicore reliability.
54 ISOLATOR: dynamically ensuring isolation in concurrent programs.
53 Recovery domains: an organizing principle for recoverable operating systems.
45 Leak pruning.
42 Efficient online validation with delta execution.
42 Commutativity analysis for software parallelization: letting program transformations see the big picture.
38 TwinDrivers: semi-automatic derivation of fast and safe hypervisor network drivers from guest OS drivers.
27 Maximum benefit from a minimal HTM.
23 Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging.
20 Phantom-BTB: a virtualized branch target buffer design.
18 StreamRay: a stream filtering architecture for coherent ray tracing.
17 Architectural implications of nanoscale integrated sensing and computing.
16 Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle.
15 Dynamic prediction of collection yield for managed runtimes.

2008

Cited by Paper title
680 Learning from mistakes: a comprehensive study on real world concurrency bug characteristics.
610 No "power" struggles: coordinated multi-level power management for the data center.
392 Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems.
318 Merge: a programming model for heterogeneous multi-core systems.
217 Understanding the propagation of hard errors to software and implications for resilient system design.
170 Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs.
168 Accelerating two-dimensional page walks for virtualized systems.
130 Adaptive set pinning: managing shared caches in chip multiprocessors.
127 Parallelizing security checks on commodity hardware.
113 Hardbound: architectural support for spatial safety of the C programming language.
101 The design and implementation of microdrivers.
99 Better bug reporting with better privacy.
92 Streamware: programming general-purpose multicore processors using streams.
92 Optimistic parallelism benefits from data partitioning.
80 How low can you go?: recommendations for hardware-supported minimal TCB code execution.
79 Understanding and visualizing full systems with data flow tomography.
66 Adapting to intermittent faults in multicore systems.
58 Archipelago: trading address space for reliability and security.
52 PICSEL: measuring user-perceived performance to control dynamic frequency scaling.
46 Efficiency trends and limits from comprehensive microarchitectural adaptivity.
42 Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors.
39 SoftSig: software-exposed hardware signatures for code analysis and optimization.
38 Predictor virtualization.
37 Hardware counter driven on-the-fly request signatures.
32 Xoc, an extension-oriented compiler for systems programming.
31 Tapping into the fountain of CPUs: on operating system support for programmable devices.
28 The mapping collector: virtual memory support for generational, parallel, and concurrent compaction.
27 Improving the performance of object-oriented languages with dynamic predication of indirect jumps.
21 Accurate branch prediction for short threads.
19 Dispersing proprietary applications as benchmarks through code mutation.
13 Communication optimizations for global multi-threaded instruction scheduling.
6 Toward molecular programming with DNA.

2006

Cited by Paper title
814 A comparison of software and hardware techniques for x86 virtualization.
549 Exploiting coarse-grained task, data, and pipeline parallelism in stream programs.
498 Hybrid transactional memory.
377 AVIO: detecting atomicity violations via access interleaving invariants.
377 Accelerator: using data parallelism to program GPUs for general-purpose uses.
370 Accurate and efficient regression modeling for microarchitectural performance and power prediction.
310 Combinatorial sketching for finite programs.
266 Efficiently exploring architectural design spaces via predictive modeling.
227 Supporting nested transactional memory in logTM.
225 Mercury and freon: temperature emulation and management for server systems.
212 PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor.
199 Geiger: monitoring the buffer cache in a virtual machine environment.
144 Recording shared memory dependencies using strata.
143 Ultra low-cost defect protection for microprocessor pipelines.
142 A performance counter architecture for computing accurate CPI components.
123 A regulated transitive reduction (RTR) for longer memory race recording.
120 Computation spreading: employing hardware migration to specialize CMP cores on-the-fly.
110 Unbounded page-based transactional memory.
101 Bell: bit-encoding online memory leak detection.
95 Automatic generation of peephole superoptimizers.
93 Tradeoffs in transactional memory virtualization.
71 Tartan: evaluating spatial computation for whole program execution.
70 Temporal search: detecting hidden malware timebombs with virtual machines.
68 Introspective 3D chips.
58 Integrated network interfaces for high-bandwidth TCP/IP.
56 Software-based instruction caching for embedded processors.
55 A spatial path scheduling algorithm for EDGE architectures.
49 SlicK: slice-based locality exploitation for efficient redundant multithreading.
47 A probabilistic pointer analysis for speculative optimizations.
43 Understanding prediction-based partial redundant threading for low-overhead, high-coverage fault tolerance.
41 Comprehensively and efficiently protecting the heap.
35 HeapMD: identifying heap-based bugs using anomaly detection.
35 Stealth prefetching.
35 A defect tolerant self-organizing nanoscale SIMD architecture.
34 Instruction scheduling for a tiled dataflow architecture.
33 A new idiom recognition framework for exploiting hardware-assist instructions.
31 Mapping esterel onto a multi-threaded embedded processor.
6 A program transformation and architecture support for quantum uncomputation.
4 Impact of virtualization on computer architecture and operating systems.
Last updated: 2017-08-07