The era of multi-core chips - a fresh look on software challenges | p. 1 |
Streaming networks for coordinating data-parallel programs (position statement) | p. 2 |
Implementations of square-root and exponential functions for large FPGAs | p. 6 |
Using branch prediction information for near-optimal I-cache leakage | p. 24 |
Scientific computing applications on the Imagine stream processor | p. 38 |
Enhancing last-level cache performance by block bypassing and early miss determination | p. 52 |
A study of the performance potential for dynamic instruction hints selection | p. 67 |
Reorganizing UNIX for reliability | p. 81 |
Critical-task anticipation scheduling algorithm for heterogeneous and grid computing | p. 95 |
Processor directed dynamic page policy | p. 109 |
Static WCET analysis based compiler-directed DVS energy optimization in real-time applications | p. 123 |
A study on transformation of self-similar processes with arbitrary marginal distributions | p. 137 |
μTC - an intermediate language for programming chip multiprocessors | p. 147 |
Functional unit chaining : a runtime adaptive architecture for reducing bypass delays | p. 161 |
Trace-based data cache leakage reduction at link time | p. 175 |
Parallelizing user-defined and implicit reductions globally on multiprocessors | p. 189 |
Overload protection for commodity network appliances | p. 203 |
An integrated temporal partitioning and mapping framework for handling custom instructions on a reconfigurable functional unit | p. 219 |
A high performance simulator system for a multiprocessor system based on a multi-way cluster | p. 231 |
Hardware budget and runtime system for data-driven multithreaded chip multiprocessor | p. 244 |
Combining wireless sensor network with grid for intelligent city traffic | p. 260 |
A novel processor architecture for real-time control | p. 270 |
A 0-1 integer linear programming based approach for global locality optimizations | p. 281 |
Design and analysis of low power image filters toward defect-resilient embedded memories for multimedia SoCs | p. 295 |
Entropy throttling : a physical approach for maximizing packet mobility in interconnection networks | p. 309 |
Design of an efficient flexible architecture for color image enhancement | p. 323 |
Hypercube communications on optical chordal ring networks with chord length of three | p. 337 |
PMPS(3) : a performance model of parallel systems | p. 344 |
Issues and support for dynamic register allocation | p. 351 |
A heterogeneous multi-core processor architecture for high performance computing | p. 359 |
Reducing the branch power cost in embedded processors through static scheduling, profiling and SuperBlock formation | p. 366 |
Fault-free pairwise independent Hamiltonian paths on faulty hypercubes | p. 373 |
Constructing node-disjoint paths in enhanced pyramid networks | p. 380 |
Striping cache : a global cache for striped network file system | p. 387 |
DTuplesHPC : distributed tuple space for desktop high performance computing | p. 394 |
The algorithm and circuit design of a 400MHz 16-bit hybrid multiplier | p. 401 |
Live range aware cache architecture | p. 409 |
The challenges of efficient code-generation for massively parallel architectures | p. 416 |
Reliable systolic computing through redundancy | p. 423 |
A diversity-controllable genetic algorithm for optimal fused traffic planning on sensor networks | p. 430 |
A context-switch reduction heuristic for power-aware off-line scheduling | p. 437 |
On the reliability of drowsy instruction caches | p. 445 |
Design of a reconfigurable cryptographic engine | p. 452 |
Enhancing ICOUNT2.8 fetch policy with better fairness for SMT processors | p. 459 |
The new BCD subtractor and its reversible logic implementation | p. 466 |
Power-efficient microkernel of embedded operating system on chip | p. 473 |
Understanding prediction limits through unbiased branches | p. 480 |
Bandwidth optimization of the EMCI for a high performance 32-bit DSP | p. 488 |
Research on Petersen graphs and hyper-cubes connected interconnection networks | p. 495 |
Cycle period analysis and optimization of timed circuits | p. 502 |
Acceleration techniques for chip-multiprocessor simulator debug | p. 509 |
A DDL-based software architecture model | p. 516 |
Branch behavior characterization for multimedia applications | p. 523 |
Optimization and evaluating of StreamYGX2 on MASA stream processor | p. 531 |
SecureTorrent : a security framework for file swarming | p. 538 |
Register allocation on stream processor with local register file | p. 545 |
A self-reconfigurable system-on-chip architecture for satellite on-board computer maintenance | p. 552 |
Compile-time thread distinguishment algorithm on VIM-based architecture | p. 559 |
Designing a coarse-grained reconfigurable architecture using loop self-pipelining | p. 567 |
Low-power data cache architecture by address range reconfiguration for multimedia applications | p. 574 |
Automatic synthesis of interface circuits from simplified IP interface protocols | p. 581 |
An architectural leakage power reduction method for instruction cache in ultra deep submicron microprocessors | p. 588 |
An efficient approach to energy saving in microcontrollers | p. 595 |