How can the value iteration pseudocode algorithm be translated to C++?

 I am having a difficult time translating this pseudocode into functional C++ code. At line 10: The value function is represented as V[s], which has bracket notation-like arrays. Is this a separate method or just a function of the value with a given state? Why is S inside the brackets? Is this supposed to be an array with as many elements as S?

At line 12: Vk would be the element in index k inside of array V?

At line 16: I'm interpreting this as the start of a do-while loop that ends at line 20. Line 19: I'm finding the action that maximises the sum, for all states, of the equation following the sigma? Line 20: I'm interpreting this as the the-end of the do-while. But what is this condition? Am I checking if there is an s such that this condition applies? So Would I have to loop between all states and stop if any state satisfies the condition? (Basically a loop with a break, instead of a while)

Answered by Andrea Bailey

At line 10: The value function is represented as V[s], which has bracket notation-like arrays. Is this a separate method or just a function of the value with a given state? Why is S inside the brackets? Is this supposed to be an array with as many elements as S? This is just a notation that the value function is a mapping between S and the real numbers. When implementing, you would want to store V(s) and π(s) as either arrays or some kind of hashmap like unordered_map (in which case your states must be hashable). Here we also have to assume that this container will fit in the memory, otherwise, the value function has to be approximated with function approximation which is not covered explicitly by this pseudo-code.

At line 12: Vk would be the element in index k inside of array V?

No. This line says that you would need to store multiple value functions during the algorithm, basically, a list of functions (or think of it as an array of functions or a two-dimensional array, where k goes along one dimension and s goes along the other dimension) and if you look at the loop following it, it shows you how you need to keep appending new value functions to the end of this list. However, notice (in line 19) that you only need the value function from the previous iteration to calculate your new value function, which means that you will never need to store more than two value functions (the new one and the previous one).

At line 16: I'm interpreting this as the start of a do-while loop that ends at line 20.

Yes. This loop calculates your value function.

Line 19: I'm finding the action that maximizes the sum, for all states, of the equation following the sigma? Here you have to iterate through all the actions a in state s and calculate the sum according to the formula in the max function. Then you take the highest value and set the new V(s) to be this value. Line 20: I'm interpreting this at the end of the do-while. But what is this condition? Am I checking if there is an s such that this condition applies? So would I have to loop between all states and stop if any state satisfies the condition? (Basically a loop with a break, instead of a while) You have to check for all s if the condition satisfies and you may only break the loop if all these values are lower than the threshold θ.



Your Answer

Interviews

Parent Categories