| Preface | p. VII |
| Introduction | p. 1 |
| An Overview of Learning and Optimization | p. 1 |
| Problem Description | p. 1 |
| Optimal Policies | p. 5 |
| Fundamental Limitations of Learning and Optimization | p. 12 |
| A Sensitivity-Based View of Learning and Optimization | p. 17 |
| Problem Formulations in Different Disciplines | p. 19 |
| Perturbation Analysis (PA) | p. 21 |
| Markov Decision Processes (MDPs) | p. 26 |
| Reinforcement Learning (RL) | p. 31 |
| Identification and Adaptive Control (I&AC) | p. 34 |
| Event-Based Optimization and Potential Aggregation | p. 37 |
| A Map of the Learning and Optimization World | p. 41 |
| Terminology and Notation | p. 42 |
| Problems | p. 44 |
| Four Disciplines in Learning and Optimization | |
| Perturbation Analysis | p. 51 |
| Perturbation Analysis of Markov Chains | p. 52 |
| Constructing a Perturbed Sample Path | p. 53 |
| Perturbation Realization Factors and Performance Potentials | p. 57 |
| Performance Derivative Formulas | p. 64 |
| Gradients with Discounted Reward Criteria | p. 68 |
| Higher-Order Derivatives and the MacLaurin Series | p. 74 |
| Performance Sensitivities of Markov Processes | p. 83 |
| Performance Sensitivities of Semi-Markov Processes | p. 90 |
| Fundamentals for Semi-Markov Processes | p. 90 |
| Performance Sensitivity Formulas | p. 95 |
| Perturbation Analysis of Queueing Systems | p. 102 |
| Constructing a Perturbed Sample Path | p. 105 |
| Perturbation Realization | p. 115 |
| Performance Derivatives | p. 121 |
| Remarks on Theoretical Issues | p. 125 |
| Other Methods | p. 132 |
| Problems | p. 137 |
| Learning and Optimization with Perturbation Analysis | p. 147 |
| The Potentials | p. 148 |
| Numerical Methods | p. 148 |
| Learning Potentials from Sample Paths | p. 151 |
| Coupling | p. 156 |
| Performance Derivatives | p. 161 |
| Estimating through Potentials | p. 161 |
| Learning Directly | p. 162 |
| Optimization with PA | p. 172 |
| Gradient Methods and Stochastic Approximation | p. 172 |
| Optimization with Long Sample Paths | p. 174 |
| Applications | p. 177 |
| Problems | p. 177 |
| Markov Decision Processes | p. 183 |
| Ergodic Chains | p. 185 |
| Policy Iteration | p. 186 |
| Bias Optimality | p. 192 |
| MDPs with Discounted Rewards | p. 201 |
| Multi-Chains | p. 203 |
| Policy Iteration | p. 205 |
| Bias Optimality | p. 216 |
| MDPs with Discounted Rewards | p. 226 |
| The nth-Bias Optimization | p. 228 |
| nth-Bias Difference Formulas | p. 229 |
| Optimality Equations | p. 232 |
| Policy Iteration | p. 240 |
| nth-Bias Optimal Policy Spaces | p. 244 |
| Problems | p. 246 |
| Sample-Path-Based Policy Iteration | p. 253 |
| Motivation | p. 254 |
| Convergence Properties | p. 258 |
| Convergence of Potential Estimates | p. 259 |
| Sample Paths with a Fixed Number of Regenerative Periods | p. 260 |
| Sample Paths with Increasing Lengths | p. 267 |
| "Fast" Algorithms | p. 277 |
| The Algorithm That Stops in a Finite Number of Periods | p. 278 |
| With Stochastic Approximation | p. 282 |
| Problems | p. 284 |
| Reinforcement Learning | p. 289 |
| Stochastic Approximation | p. 290 |
| Finding the Zeros of a Function Recursively | p. 291 |
| Estimating Mean Values | p. 297 |
| Temporal Difference Methods | p. 298 |
| TD Methods for Potentials | p. 298 |
| Q-Factors and Other Extensions | p. 308 |
| TD Methods for Performance Derivatives | p. 313 |
| TD Methods and Performance Optimization | p. 318 |
| PA-Based Optimization | p. 318 |
| Q-Learning | p. 321 |
| Optimistic On-Line Policy Iteration | p. 325 |
| Value Iteration | p. 327 |
| Summary of the Learning and Optimization Methods | p. 330 |
| Problems | p. 333 |
| Adaptive Control Problems as MDPs | p. 341 |
| Control Problems and MDPs | p. 342 |
| Control Systems Modelled as MDPs | p. 342 |
| A Comparison of the Two Approaches | p. 345 |
| MDPs with Continuous State Spaces | p. 353 |
| Operators on Continuous Spaces | p. 354 |
| Potentials and Policy Iteration | p. 359 |
| Linear Control Systems and the Riccati Equation | p. 363 |
| The LQ Problem | p. 363 |
| The JLQ Problem | p. 368 |
| On-Line Optimization and Adaptive Control | p. 373 |
| Discretization and Estimation | p. 374 |
| Discussion | p. 379 |
| Problems | p. 381 |
| The Event-Based Optimization - A New Approach | |
| Event-Based Optimization of Markov Systems | p. 387 |
| An Overview | p. 388 |
| Summary of Previous Chapters | p. 388 |
| An Overview of the Event-Based Approach | p. 390 |
| Events Associated with Markov Chains | p. 398 |
| The Event and Event Space | p. 400 |
| The Probabilities of Events | p. 403 |
| The Basic Ideas Illustrated by Examples | p. 407 |
| Classification of Three Types of Events | p. 410 |
| Event-Based Optimization | p. 414 |
| The Problem Formulation | p. 414 |
| Performance Difference Formulas | p. 417 |
| Performance Derivative Formulas | p. 420 |
| Optimization | p. 425 |
| Learning: Estimating Aggregated Potentials | p. 429 |
| Aggregated Potentials | p. 429 |
| Aggregated Potentials in the Event-Based Optimization | p. 432 |
| Applications and Examples | p. 434 |
| Manufacturing | p. 434 |
| Service Rate Control | p. 438 |
| General Applications | p. 444 |
| Problems | p. 446 |
| Constructing Sensitivity Formulas | p. 455 |
| Motivation | p. 455 |
| Markov Chains on the Same State Space | p. 456 |
| Event-Based Systems | p. 464 |
| Sample-Path Construction | p. 464 |
| Parameterized Systems: An Example | p. 467 |
| Markov Chains with Different State Spaces | p. 470 |
| One Is a Subspace of the Other | p. 470 |
| A More General Case | p. 478 |
| Summary | p. 482 |
| Problems | p. 483 |
| Appendices: Mathematical Background | |
| Probability and Markov Processes | p. 491 |
| Probability | p. 491 |
| Markov Processes | p. 498 |
| Problems | p. 504 |
| Stochastic Matrices | p. 507 |
| Canonical Form | p. 507 |
| Eigenvalues | p. 508 |
| The Limiting Matrix | p. 511 |
| Problems | p. 516 |
| Queueing Theory | p. 519 |
| Single-Server Queues | p. 519 |
| Queueing Networks | p. 524 |
| Some Useful Techniques | p. 536 |
| Problems | p. 538 |
| Notation and Abbreviations | p. 543 |
| References | p. 547 |
| Index | p. 563 |