| List of Figures | p. xi |
| List of Tables | p. xv |
| Foreword | p. xvii |
| Acknowledgments | p. xix |
| Preface | p. xxi |
| Introduction | p. 1 |
| An Example | p. 1 |
| The Design Process: Constraints and Alternatives | p. 3 |
| Organization of the Book | p. 7 |
| For the Reader | p. 9 |
| Programmable DSP Based Implementation | p. 11 |
| Power Dissipation - Sources and Measures | p. 13 |
| Components Contributing to Power Dissipation | p. 13 |
| Measures of Power Dissipation in Busses | p. 13 |
| Measures of Power Dissipation in the Multiplier | p. 13 |
| Low Power Realization of DSP Algorithms | p. 16 |
| Allocation of Program, Coefficient and Data Memory | p. 16 |
| Bus Coding | p. 17 |
| Gray Coded Addressing | p. 17 |
| T0 Coding | p. 18 |
| Bus Invert Coding | p. 20 |
| Instruction Buffering | p. 21 |
| Memory Architectures for Low Power | p. 22 |
| Bus Bit Reordering | p. 24 |
| Generic Techniques for Power Reduction | p. 26 |
| Low Power Realization of Weighted-sum Computation | p. 26 |
| Selective Coefficient Negation | p. 27 |
| Coefficient Ordering | p. 28 |
| Coefficient Ordering Problem Formulation | p. 29 |
| Coefficient Ordering Algorithm | p. 30 |
| Adder Input Bit Swapping | p. 31 |
| Swapping Multiplier Inputs | p. 33 |
| Exploiting Coefficient Symmetry | p. 34 |
| Techniques for Low Power Realization of FIR Filters | p. 35 |
| Circular Buffer | p. 36 |
| Multirate Architectures | p. 37 |
| Computational Complexity of Multirate Architectures | p. 37 |
| Multirate Architecture on a Programmable DSP | p. 38 |
| Architecture to Support Transposed FIR Structure | p. 41 |
| Coefficient Scaling | p. 42 |
| Coefficient Optimization | p. 43 |
| Coefficient Optimization - Problem Definition | p. 43 |
| Coefficient Optimization - Problem Formulation | p. 43 |
| Coefficient Optimization Algorithm - Components | p. 44 |
| Coefficient Optimization Algorithm | p. 45 |
| Coefficient Optimization Using 0-1 Programming | p. 50 |
| Framework for Low Power Realization of FIR Filters on a Programmable DSP | p. 51 |
| Implementation Using Hardware Multiplier(s) and Adder(s) | p. 55 |
| Architectural Transformations | p. 55 |
| Evaluating the Effectiveness of DFG Transformations | p. 56 |
| Low Energy vs Low Peak Power Tradeoff | p. 61 |
| Multirate Architectures | p. 63 |
| Computational Complexity of Multirate Architectures | p. 64 |
| Non-linear Phase FIR Filters | p. 64 |
| Linear Phase FIR Filters | p. 65 |
| Power Analysis of Multirate Architectures | p. 68 |
| Power Analysis for One Level Decimated Multirate Architectures | p. 68 |
| Power Analysis - an Example | p. 70 |
| Power Reduction Using Multirate Architectures | p. 71 |
| Distributed Arithmetic Based Implementation | p. 75 |
| DA Structures for Area-Delay Tradeoff | p. 76 |
| DA Based Implementation of Linear Phase FIR Filters | p. 77 |
| 1-Bit-At-A-Time vs 2-Bits-At-A-Time Access | p. 78 |
| Multiple Coefficient Memory Banks | p. 79 |
| Multiple Memory Bank Implementation with 2BAAT Access | p. 80 |
| DA Based Implementation of Multirate Architectures | p. 81 |
| Multirate Architecture with a Decimation Factor of Three | p. 82 |
| Multirate Architectures with Two Level Decimation | p. 84 |
| Coefficient Memory vs Number of Additions Tradeoff | p. 84 |
| Improving Area Efficiency of Two LUT Based DA Structures | p. 85 |
| Minimum Area Partitions for Two ROM Implementation | p. 87 |
| Minimum Area Partitions for Hardwired Logic | p. 88 |
| CF2: Estimating Area from the Actual Truth-Table | p. 89 |
| CF1: Estimating Area from the Coefficients in Each Partition | p. 91 |
| Evaluating the Effectiveness of the Coefficient Partitioning Technique | p. 92 |
| Techniques for Low Power Implementation of DA Based FIR Filters | p. 94 |
| Toggle Reduction Using Data Coding | p. 95 |
| Nega-binary Coding | p. 95 |
| 2's Complement vs Nega-binary Representation | p. 96 |
| Deriving an Optimum Nega-binary Scheme for a Given Data Distribution | p. 99 |
| Incorporating a Nega-binary Scheme into the DA Based FIR Filter Implementation | p. 101 |
| A Few Observations | p. 103 |
| Additional Power Saving with Nega-binary Architecture | p. 104 |
| Toggle Reduction in Memory Based Implementations by Gray Sequencing and Sequence Reordering | p. 107 |
| Multiplier-Less Implementation | p. 113 |
| Minimizing Additions in the Weighted-sum Computation | p. 114 |
| Minimizing Additions - an Example | p. 114 |
| 2 Bit Common Subexpressions | p. 116 |
| Problem Formulation | p. 116 |
| Common Subexpression Elimination | p. 118 |
| The Algorithm | p. 119 |
| Minimizing Additions in MCM Computation | p. 120 |
| Minimizing Additions - an Example | p. 120 |
| 2 Bit Common Subexpressions | p. 122 |
| Problem Formulation | p. 123 |
| Common Subexpression Elimination | p. 124 |
| The Algorithm | p. 124 |
| An Upper Bound on the Number of Additions for MCM Computation | p. 126 |
| Transformations for Minimizing Number of Additions | p. 128 |
| Number Theoretic Transforms | p. 128 |
| 2's Complement Representation | p. 128 |
| Uni-sign Representation | p. 129 |
| Canonical Signed Digit (CSD) Representation | p. 129 |
| Signal Flow Graph Transformations | p. 130 |
| Evaluating Effectiveness of the Transformations | p. 133 |
| Transformations for Optimal Initial Solution | p. 137 |
| Coefficient Optimization | p. 137 |
| Efficient Pre-Filter Structures | p. 138 |
| High Level Synthesis of Multiprecision DFGs | p. 138 |
| Precision Sensitive Register Allocation | p. 138 |
| Precision Sensitive Functional Unit Binding | p. 139 |
| Precision Sensitive Scheduling | p. 140 |
| Implementation of Multiplication-Free Linear Transforms | p. 141 |
| Optimum Code Generation for Register-rich Architectures | p. 142 |
| Generic Register-rich Architecture Model | p. 142 |
| Sources and Measures of Power Dissipation | p. 143 |
| Optimum Code Generation for 1-D Transforms | p. 144 |
| Minimizing Number of Operations in Two Dimensional Transforms | p. 146 |
| Low Power Code Generation | p. 148 |
| Optimum Code Generation for Single Register, Accumulator Based Architectures | p. 153 |
| Single Register, Accumulator Based Architecture Model | p. 153 |
| Code Generation Rules | p. 154 |
| Computation Scheduling Algorithm | p. 156 |
| Impact of DAG Structure on the Optimality of Generated Code | p. 158 |
| DAG Optimizing Transformations | p. 159 |
| Transformation I - Tree to Chain Conversion | p. 159 |
| Transformation II - Serializing a Butterfly | p. 159 |
| Transformation III - Fanout Reduction | p. 160 |
| Transformation IV - Merging | p. 161 |
| Synthesis of Spill-free DAGs | p. 162 |
| Sources and Measures of Power Dissipation | p. 168 |
| Low Power Code Generation | p. 168 |
| Residue Number System Based Implementation | p. 171 |
| Optimizing RNS based Implementation of the Weighted-sum Computation | p. 172 |
| Parallel Processing | p. 174 |
| Residue Encoding for Low Power | p. 174 |
| Coefficient Ordering | p. 175 |
| Exploiting Redundancy | p. 176 |
| Residue Encoding for Minimizing LUT Area | p. 177 |
| Optimizing RNS based Implementation of FIR Filters | p. 179 |
| Coefficient Scaling | p. 179 |
| Coefficient Optimization for Low Power | p. 180 |
| RNS based Implementation of Transposed FIR Filter Structure | p. 180 |
| Coefficient Optimization for Area Reduction | p. 180 |
| RNS as an Optimizing Transformation for High Precision Signal Processing | p. 183 |
| A Framework for Algorithmic and Architectural Transformations | p. 187 |
| Classification of Algorithmic and Architectural Transformations | p. 187 |
| A Snapshot of the Framework | p. 191 |
| Summary | p. 195 |
| References | p. 199 |
| Topic Index | p. 207 |
| About the Authors | p. 209 |