Posts

Showing posts from April, 2007

Constrained MDPs and the reward hypothesis

It's been a long time since I last posted on this blog. But this does not mean the blog is dead. Slow and steady wins the race, right? Anyhow, I am back, and today I want to write about constrained Markov decision processes (CMDPs). The post is prompted by a recent visit to our department by Eugene Feinberg, a pioneer of CMDPs, and also by a growing interest in CMDPs in the RL community (see this, this, or this paper). For impatient readers, a CMDP is like an MDP except that there are multiple reward functions, one of which is used to set the optimization objective, while the others are used to restrict what policies can do. Now, it seems to me that more often than not the problems we want to solve are easiest to specify using multiple objectives (in fact, this is a borderline tautology!). An example, which given our current sad situation is hard to escape, is deciding what interventions a government should apply to limit the spread of a virus while maintaining economic

LaTeX support

I added the following two lines to the header section of the template of this page: <script type="text/javascript" src="http://www.maths.nottingham.ac.uk/personal/drw/LaTeXMathML.js"> </script> Suddenly, I can type (almost) anything in LaTeX, e.g. \$a_n\$ becomes $a_n$. Fancy! (If you do not see anything fancy, then either JavaScript is disabled in your browser or you are using Internet Explorer without MathML support. In the latter case you may want to download MathPlayer by Design Science.) Many thanks to Peter Jipsen and the folks who developed ASCIIMathML, which serves as the basis of LaTeXMathML by Douglas R. Woodall. Examples showing what is possible with LaTeXMathML can be found here. This is an indispensable tool! The nice thing is that MathML is scalable: $E=m c^2$

The Fastest Mixing Markov Chain on a Graph

The paper can be found here. The authors are Stephen Boyd, Persi Diaconis and Lin Xiao. I found the paper while browsing the papers of Persi Diaconis, a notable mathematician and magician. The paper is about exactly what the title suggests: You are given a finite graph and you can set up a random walk on this graph by choosing the transition probabilities between vertices that are connected by an edge. The transition probabilities must be symmetric (moving from one vertex to a neighbor is as likely as moving back), so that the uniform distribution is a stationary distribution of the walk. Assuming that the associated Markov chain is irreducible and aperiodic, the state distribution converges to the uniform distribution. The task is to choose the transition probabilities so as to maximize the rate of this convergence. The solution is the Fastest Mixing Markov Chain on the graph (FMMC). The authors show that this is a convex optimization problem and give a polynomial-time algorithm (based on semidefinite programming) to find the solution. A subgradient method is given that can be more effective for larger graphs
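To make the setup concrete, here is a minimal sketch in plain Python. It is not the semidefinite program or the subgradient method from the paper; it only builds symmetric random walks on a three-vertex path and measures how far the state distribution is from uniform after a few steps. The helper names (build_symmetric_chain, tv_to_uniform), the example graph, and the two hand-picked edge probabilities are all assumptions made for illustration.

```python
def build_symmetric_chain(n, edge_probs):
    """Return an n x n transition matrix from symmetric edge probabilities.

    edge_probs maps an undirected edge (i, j) to the probability of moving
    across it in either direction; the leftover mass stays on the vertex.
    """
    P = [[0.0] * n for _ in range(n)]
    for (i, j), p in edge_probs.items():
        P[i][j] = p
        P[j][i] = p
    for i in range(n):
        stay = 1.0 - sum(P[i])  # diagonal is still zero at this point
        if stay < -1e-12:
            raise ValueError(f"row {i} has total probability above 1")
        P[i][i] = max(stay, 0.0)
    return P


def step(dist, P):
    """One step of the chain: row vector dist times matrix P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_to_uniform(P, start, steps):
    """Total variation distance to uniform after `steps` steps from `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = step(dist, P)
    return 0.5 * sum(abs(q - 1.0 / n) for q in dist)


# Hypothetical example: the path 0 - 1 - 2 with two choices of edge probability.
path_edges = [(0, 1), (1, 2)]
slow = build_symmetric_chain(3, {e: 1.0 / 3.0 for e in path_edges})
fast = build_symmetric_chain(3, {e: 0.5 for e in path_edges})

tv_slow = tv_to_uniform(slow, start=0, steps=10)
tv_fast = tv_to_uniform(fast, start=0, steps=10)
print(tv_slow, tv_fast)
```

Both chains have the uniform distribution as a stationary distribution because their transition matrices are symmetric, yet on this toy graph the second choice gets closer to uniform in the same number of steps. Finding the best such choice for a general graph is the optimization problem the paper studies.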

The Loss Rank Principle by Marcus Hutter

I found the paper posted by Marcus Hutter on arXiv quite interesting. The paper is about model (or rather predictor) selection. The idea is a familiar one, but the details appear to be novel: You want to find a model which yields a small loss on the dataset available, while yielding a larger loss on most other datasets. Classification: The simplest case is when we consider supervised learning and the target set is finite. Then you can count the number of target label variations such that the predictor's loss is smaller than its loss when the true targets are used. This idea sounds very similar to the way Rademacher complexity works, see e.g. the paper of Lugosi and Wegkamp, where a localized version of Rademacher complexity is investigated. Regression: For continuous targets you can use a grid with increasing resolution (assuming that the range of targets is bounded) and count the number of grid points such that the predictor's loss is less than its loss on the true dataset.
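The counting idea for classification can be sketched in a few lines of Python. This is my toy reading of the description above, not necessarily the exact definition in the paper: the predictor is a k-nearest-neighbor majority vote that is refit on each alternative labeling, the loss is the number of training errors, and ties are counted (loss "no larger than" rather than "smaller than") so that a zero loss still gives an informative count. The dataset and the function names are made up for illustration.

```python
from itertools import product


def knn_predict(xs, ys, k, query):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(xs)), key=lambda i: (abs(xs[i] - query), i))
    votes = sum(ys[i] for i in order[:k])
    return 1 if 2 * votes > k else 0


def training_loss(xs, ys, k):
    """Number of training points the k-NN rule misclassifies."""
    return sum(knn_predict(xs, ys, k, x) != y for x, y in zip(xs, ys))


def loss_rank(xs, ys, k):
    """Count labelings that the k-NN rule fits at least as well as `ys`."""
    actual = training_loss(xs, ys, k)
    return sum(
        training_loss(xs, list(alt), k) <= actual
        for alt in product([0, 1], repeat=len(xs))
    )


# Hypothetical one-dimensional dataset with binary targets.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]

ranks = {k: loss_rank(xs, ys, k) for k in (1, 3)}
print(ranks)
```

With k = 1 every training point is its own nearest neighbor, so the rule fits every one of the 64 labelings perfectly and the count is as large as it can be. With k = 3 the rule still fits the true labels but fails on many of the alternatives, so its count is smaller, and under this criterion it would be the preferred predictor.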

Why this Blog??

I am struggling with organizing the notes I make after lectures or after reading a paper. Hence I will experiment with this fancy way of keeping track of my thoughts. Of course, I will be happy to receive feedback from occasional readers. We will see how well it goes!