RLRP routing  v.0.1.0
rl_logic Namespace Reference

Classes

class  ActionSelector
 Class for selecting the action from the list of actions and their corresponding values.
 
class  ValueEstimator
 Class for assigning the current estimated value to a given action; it also provides a method for returning this value.
 

Detailed Description

@package rl_logic
Created on Aug 1, 2016

@author: Dmitrii Dugaev


This module provides methods for selecting actions and calculating their estimated values, following the
Reinforcement Learning (RL) methodology from the field of Artificial Intelligence (AI).
The abstract task of such algorithms is, given the current "situation" (a set of currently possible actions and their
"values"), to select the action that is expected to return the maximum reward.
Two inputs are therefore required: the set of current actions, and the "feedback" from
each action in the form of a "reward value". The mechanism of action selection depends on the implemented
selection method, which can be a simple "greedy" algorithm or a more sophisticated "soft-max" solution.
The other important quantities are the "estimation" (or "estimated reward") values, which represent the predicted
"outcome" of a given action if that action were taken. How those values are estimated is the other
important task, since it affects which action is chosen. They can be based, for example, on a simple "sample average"
calculation.
More information about the selection and estimation methods in RL can be found in the book by R. Sutton and A. Barto,
"Reinforcement Learning: An Introduction".
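
The two selection methods named above can be illustrated with a minimal sketch. It is not taken from the rl_logic
source: the function names, the dictionary layout (action mapped to its estimated value) and the "temperature"
parameter are assumptions made only for this example. Greedy selection always takes the highest-valued action,
while soft-max selection draws an action at random with a probability that grows with its value.

```python
import math
import random


def greedy_select(values):
    """Return the action with the highest estimated value (hypothetical helper)."""
    return max(values, key=values.get)


def softmax_probabilities(values, temperature=1.0):
    """Map estimated values to selection probabilities (Boltzmann distribution)."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(values.values())
    weights = {a: math.exp((v - top) / temperature) for a, v in values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def softmax_select(values, temperature=1.0, rng=random):
    """Draw one action at random, weighted by its soft-max probability."""
    probs = softmax_probabilities(values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

A lower temperature makes soft-max selection behave more like the greedy method; a higher one makes the choice closer
to uniform, so that lower-valued actions are still tried occasionally.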

The module has two main classes - ValueEstimator and ActionSelector.
The ValueEstimator class provides methods for estimating the current action values based on the most recent reward
received for selecting an action.
The ActionSelector class provides methods for selecting an action based on the given list of actions and their current
estimated values.
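
The division of work between an estimator and a selector could look roughly like the sketch below. The classes here
are hypothetical stand-ins written for this example; they do not reproduce the actual interfaces of ValueEstimator
or ActionSelector, whose method names and signatures are documented on their own pages. The estimator is assumed to
use the "sample average" approach, updated incrementally, and the selector is assumed to be greedy.

```python
class SampleAverageEstimator:
    """Hypothetical estimator: keeps a running mean of rewards per action."""

    def __init__(self):
        self.values = {}  # action -> current estimated value
        self.counts = {}  # action -> number of rewards seen so far

    def update(self, action, reward):
        n = self.counts.get(action, 0) + 1
        old = self.values.get(action, 0.0)
        # Incremental mean: Q_new = Q_old + (reward - Q_old) / n
        self.values[action] = old + (reward - old) / n
        self.counts[action] = n
        return self.values[action]


class GreedySelector:
    """Hypothetical selector: picks the action with the highest estimate."""

    def select(self, values):
        return max(values, key=values.get)


estimator = SampleAverageEstimator()
selector = GreedySelector()

# Feed a few (action, reward) observations, then pick the best action.
# The action names "n1" and "n2" are placeholders, e.g. for next-hop neighbors.
for action, reward in [("n1", 1.0), ("n2", 4.0), ("n1", 3.0)]:
    estimator.update(action, reward)

best = selector.select(estimator.values)
```

Keeping estimation and selection separate means that either part can be swapped, for instance replacing the greedy
selector with a soft-max one, without changing how rewards are accumulated.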