Preface | p. VII |
Introduction | p. 1 |
An Overview of Learning and Optimization | p. 1 |
Problem Description | p. 1 |
Optimal Policies | p. 5 |
Fundamental Limitations of Learning and Optimization | p. 12 |
A Sensitivity-Based View of Learning and Optimization | p. 17 |
Problem Formulations in Different Disciplines | p. 19 |
Perturbation Analysis (PA) | p. 21 |
Markov Decision Processes (MDPs) | p. 26 |
Reinforcement Learning (RL) | p. 31 |
Identification and Adaptive Control (I&AC) | p. 34 |
Event-Based Optimization and Potential Aggregation | p. 37 |
A Map of the Learning and Optimization World | p. 41 |
Terminology and Notation | p. 42 |
Problems | p. 44 |
Four Disciplines in Learning and Optimization | |
Perturbation Analysis | p. 51 |
Perturbation Analysis of Markov Chains | p. 52 |
Constructing a Perturbed Sample Path | p. 53 |
Perturbation Realization Factors and Performance Potentials | p. 57 |
Performance Derivative Formulas | p. 64 |
Gradients with Discounted Reward Criteria | p. 68 |
Higher-Order Derivatives and the MacLaurin Series | p. 74 |
Performance Sensitivities of Markov Processes | p. 83 |
Performance Sensitivities of Semi-Markov Processes | p. 90 |
Fundamentals for Semi-Markov Processes | p. 90 |
Performance Sensitivity Formulas | p. 95 |
Perturbation Analysis of Queueing Systems | p. 102 |
Constructing a Perturbed Sample Path | p. 105 |
Perturbation Realization | p. 115 |
Performance Derivatives | p. 121 |
Remarks on Theoretical Issues | p. 125 |
Other Methods | p. 132 |
Problems | p. 137 |
Learning and Optimization with Perturbation Analysis | p. 147 |
The Potentials | p. 148 |
Numerical Methods | p. 148 |
Learning Potentials from Sample Paths | p. 151 |
Coupling | p. 156 |
Performance Derivatives | p. 161 |
Estimating through Potentials | p. 161 |
Learning Directly | p. 162 |
Optimization with PA | p. 172 |
Gradient Methods and Stochastic Approximation | p. 172 |
Optimization with Long Sample Paths | p. 174 |
Applications | p. 177 |
Problems | p. 177 |
Markov Decision Processes | p. 183 |
Ergodic Chains | p. 185 |
Policy Iteration | p. 186 |
Bias Optimality | p. 192 |
MDPs with Discounted Rewards | p. 201 |
Multi-Chains | p. 203 |
Policy Iteration | p. 205 |
Bias Optimality | p. 216 |
MDPs with Discounted Rewards | p. 226 |
The nth-Bias Optimization | p. 228 |
nth-Bias Difference Formulas | p. 229 |
Optimality Equations | p. 232 |
Policy Iteration | p. 240 |
nth-Bias Optimal Policy Spaces | p. 244 |
Problems | p. 246 |
Sample-Path-Based Policy Iteration | p. 253 |
Motivation | p. 254 |
Convergence Properties | p. 258 |
Convergence of Potential Estimates | p. 259 |
Sample Paths with a Fixed Number of Regenerative Periods | p. 260 |
Sample Paths with Increasing Lengths | p. 267 |
"Fast" Algorithms | p. 277 |
The Algorithm That Stops in a Finite Number of Periods | p. 278 |
With Stochastic Approximation | p. 282 |
Problems | p. 284 |
Reinforcement Learning | p. 289 |
Stochastic Approximation | p. 290 |
Finding the Zeros of a Function Recursively | p. 291 |
Estimating Mean Values | p. 297 |
Temporal Difference Methods | p. 298 |
TD Methods for Potentials | p. 298 |
Q-Factors and Other Extensions | p. 308 |
TD Methods for Performance Derivatives | p. 313 |
TD Methods and Performance Optimization | p. 318 |
PA-Based Optimization | p. 318 |
Q-Learning | p. 321 |
Optimistic On-Line Policy Iteration | p. 325 |
Value Iteration | p. 327 |
Summary of the Learning and Optimization Methods | p. 330 |
Problems | p. 333 |
Adaptive Control Problems as MDPs | p. 341 |
Control Problems and MDPs | p. 342 |
Control Systems Modelled as MDPs | p. 342 |
A Comparison of the Two Approaches | p. 345 |
MDPs with Continuous State Spaces | p. 353 |
Operators on Continuous Spaces | p. 354 |
Potentials and Policy Iteration | p. 359 |
Linear Control Systems and the Riccati Equation | p. 363 |
The LQ Problem | p. 363 |
The JLQ Problem | p. 368 |
On-Line Optimization and Adaptive Control | p. 373 |
Discretization and Estimation | p. 374 |
Discussion | p. 379 |
Problems | p. 381 |
The Event-Based Optimization - A New Approach | |
Event-Based Optimization of Markov Systems | p. 387 |
An Overview | p. 388 |
Summary of Previous Chapters | p. 388 |
An Overview of the Event-Based Approach | p. 390 |
Events Associated with Markov Chains | p. 398 |
The Event and Event Space | p. 400 |
The Probabilities of Events | p. 403 |
The Basic Ideas Illustrated by Examples | p. 407 |
Classification of Three Types of Events | p. 410 |
Event-Based Optimization | p. 414 |
The Problem Formulation | p. 414 |
Performance Difference Formulas | p. 417 |
Performance Derivative Formulas | p. 420 |
Optimization | p. 425 |
Learning: Estimating Aggregated Potentials | p. 429 |
Aggregated Potentials | p. 429 |
Aggregated Potentials in the Event-Based Optimization | p. 432 |
Applications and Examples | p. 434 |
Manufacturing | p. 434 |
Service Rate Control | p. 438 |
General Applications | p. 444 |
Problems | p. 446 |
Constructing Sensitivity Formulas | p. 455 |
Motivation | p. 455 |
Markov Chains on the Same State Space | p. 456 |
Event-Based Systems | p. 464 |
Sample-Path Construction | p. 464 |
Parameterized Systems: An Example | p. 467 |
Markov Chains with Different State Spaces | p. 470 |
One Is a Subspace of the Other | p. 470 |
A More General Case | p. 478 |
Summary | p. 482 |
Problems | p. 483 |
Appendices: Mathematical Background | |
Probability and Markov Processes | p. 491 |
Probability | p. 491 |
Markov Processes | p. 498 |
Problems | p. 504 |
Stochastic Matrices | p. 507 |
Canonical Form | p. 507 |
Eigenvalues | p. 508 |
The Limiting Matrix | p. 511 |
Problems | p. 516 |
Queueing Theory | p. 519 |
Single-Server Queues | p. 519 |
Queueing Networks | p. 524 |
Some Useful Techniques | p. 536 |
Problems | p. 538 |
Notation and Abbreviations | p. 543 |
References | p. 547 |
Index | p. 563 |