Abhijit Gosavi. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis. Machine Learning, 2004, v:55, n:1, pp:5-29 [Journal]
Abhijit Gosavi. Reinforcement learning for long-run average cost. European Journal of Operational Research, 2004, v:155, n:3, pp:654-674 [Journal]
On step sizes, stochastic shortest paths, and survival probabilities in Reinforcement Learning.
Reinforcement Learning for Model Building and Variance-penalized Control.