Builders
- class EnvironmentBuilder(env_name, env_params)[source]
Bases:
object
Class to spawn instances of a MushroomRL environment
- __init__(env_name, env_params)[source]
Constructor
- Parameters:
env_name – name of the environment to build;
env_params – required parameters to build the specified environment.
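The idea of building environments from a name plus a parameter dictionary can be sketched without the library. Everything below (`SketchGridWorld`, `ENVIRONMENTS`, `SketchEnvironmentBuilder` and its `build()` method) is a hypothetical stand-in chosen for illustration; it is not the library's implementation.

```python
class SketchGridWorld:
    """Hypothetical environment used only for this sketch."""

    def __init__(self, height, width):
        self.height = height
        self.width = width


# Hypothetical registry mapping names to environment classes.
ENVIRONMENTS = {"SketchGridWorld": SketchGridWorld}


class SketchEnvironmentBuilder:
    """Stores a name and parameters, and creates a fresh environment on demand."""

    def __init__(self, env_name, env_params):
        self.env_name = env_name
        self.env_params = env_params

    def build(self):
        # Hypothetical method: look the class up by name, then instantiate it.
        env_class = ENVIRONMENTS[self.env_name]
        return env_class(**self.env_params)


builder = SketchEnvironmentBuilder("SketchGridWorld", {"height": 3, "width": 4})
env_a, env_b = builder.build(), builder.build()
```

Each call to `build()` yields an independent instance, which is what makes a builder convenient for repeated or parallel runs.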
- class AgentBuilder(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]
Bases:
object
Base class to spawn instances of a MushroomRL agent
- __init__(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]
Initialize AgentBuilder
- set_preprocessors(preprocessors)[source]
Set the preprocessors for the specific AgentBuilder
- Parameters:
preprocessors – list of preprocessor classes.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
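The builder interface above can be illustrated with a minimal, library-free sketch. `SketchMDPInfo`, `DummyAgent` and `SketchAgentBuilder` are hypothetical stand-ins, and the check that exactly one fit schedule is given is an assumption of this sketch rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class SketchMDPInfo:
    """Hypothetical stand-in for MDPInfo: only what this sketch needs."""
    n_states: int
    n_actions: int
    gamma: float


class DummyAgent:
    """Hypothetical agent that simply records how it was configured."""

    def __init__(self, mdp_info, learning_rate):
        self.mdp_info = mdp_info
        self.learning_rate = learning_rate


class SketchAgentBuilder:
    """Stores hyperparameters at construction time and creates agents later."""

    def __init__(self, learning_rate, n_steps_per_fit=None, n_episodes_per_fit=None):
        # Assumption of this sketch: exactly one of the two fit schedules is given.
        assert (n_steps_per_fit is None) != (n_episodes_per_fit is None)
        self.learning_rate = learning_rate
        self.n_steps_per_fit = n_steps_per_fit
        self.n_episodes_per_fit = n_episodes_per_fit

    def build(self, mdp_info):
        # Every call returns a fresh agent bound to the given environment info.
        return DummyAgent(mdp_info, self.learning_rate)


builder = SketchAgentBuilder(learning_rate=0.1, n_steps_per_fit=1)
info = SketchMDPInfo(n_states=5, n_actions=2, gamma=0.99)
agent_a = builder.build(info)
agent_b = builder.build(info)
```

Deferring construction to `build(mdp_info)` lets one configured builder produce many independent agents.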
Policy Search Builders
Policy Gradient
- class PolicyGradientBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]
Bases:
AgentBuilder
AgentBuilder for policy gradient methods. The current builder uses a state-dependent Gaussian policy with diagonal standard deviation and linear mean.
- __init__(n_episodes_per_fit, optimizer, **kwargs)[source]
Constructor.
- Parameters:
optimizer (Optimizer) – optimizer to be used by the policy gradient algorithm;
**kwargs – other algorithm parameters.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- class REINFORCEBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]
Bases:
PolicyGradientBuilder
- alg_class
alias of
REINFORCE
- class GPOMDPBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]
Bases:
PolicyGradientBuilder
- alg_class
alias of
GPOMDP
- class eNACBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]
Bases:
PolicyGradientBuilder
- alg_class
alias of
eNAC
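The `alg_class` entries above suggest that the concrete builders differ from their parent only in the algorithm class they instantiate. A minimal sketch of that pattern follows; all `Sketch*` names are hypothetical, and the constructor arguments passed to the algorithm are assumptions made for illustration.

```python
class SketchREINFORCE:
    """Hypothetical algorithm class; not the MushroomRL REINFORCE."""

    def __init__(self, mdp_info, optimizer, **params):
        self.mdp_info = mdp_info
        self.optimizer = optimizer
        self.params = params


class SketchGPOMDP(SketchREINFORCE):
    """Second hypothetical algorithm sharing the same constructor."""


class SketchPolicyGradientBuilder:
    alg_class = None  # subclasses bind this to a concrete algorithm class

    def __init__(self, n_episodes_per_fit, optimizer, **kwargs):
        self.n_episodes_per_fit = n_episodes_per_fit
        self.optimizer = optimizer
        self.algorithm_params = kwargs

    def build(self, mdp_info):
        # Shared construction logic; only the instantiated class differs.
        return self.alg_class(mdp_info, self.optimizer, **self.algorithm_params)


class SketchREINFORCEBuilder(SketchPolicyGradientBuilder):
    alg_class = SketchREINFORCE


class SketchGPOMDPBuilder(SketchPolicyGradientBuilder):
    alg_class = SketchGPOMDP


agent = SketchGPOMDPBuilder(n_episodes_per_fit=25, optimizer="adam").build(mdp_info=None)
```

With this layout, supporting a new algorithm that shares the same constructor only requires a subclass with a different `alg_class`.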
Black-Box Optimization
- class BBOBuilder(n_episodes_per_fit, **kwargs)[source]
Bases:
AgentBuilder
AgentBuilder for black-box optimization methods. The current builder uses a simple deterministic linear policy and a diagonal Gaussian distribution.
- __init__(n_episodes_per_fit, **kwargs)[source]
Constructor.
- Parameters:
optimizer (Optimizer) – optimizer to be used by the policy gradient algorithm;
**kwargs – other algorithm parameters.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- class PGPEBuilder(n_episodes_per_fit, optimizer)[source]
Bases:
BBOBuilder
- alg_class
alias of
PGPE
- class RWRBuilder(n_episodes_per_fit, beta)[source]
Bases:
BBOBuilder
- alg_class
alias of
RWR
- class REPSBuilder(n_episodes_per_fit, eps)[source]
Bases:
BBOBuilder
- alg_class
alias of
REPS
- class ConstrainedREPSBuilder(n_episodes_per_fit, eps, kappa)[source]
Bases:
BBOBuilder
- alg_class
alias of
ConstrainedREPS
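As an illustration of what a temperature-like parameter such as `beta` in `RWRBuilder` typically controls, here is a sketch of the textbook reward-weighted regression update of a distribution mean. It assumes exponential weights on episode returns; the function names are hypothetical and the library's implementation may differ.

```python
import math


def rwr_weights(returns, beta):
    """Exponential weights on returns, shifted by the maximum for numerical stability."""
    j_max = max(returns)
    return [math.exp(beta * (j - j_max)) for j in returns]


def weighted_mean(thetas, weights):
    """Weighted average of parameter vectors: the new distribution mean."""
    total = sum(weights)
    dim = len(thetas[0])
    return [sum(w * th[i] for w, th in zip(weights, thetas)) / total
            for i in range(dim)]


returns = [1.0, 3.0, 2.0]                       # one return per sampled episode
thetas = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]   # sampled policy parameters
weights = rwr_weights(returns, beta=1.0)
new_mean = weighted_mean(thetas, weights)
```

A larger `beta` concentrates the weight on the best-performing samples, while `beta` close to zero approaches a plain average.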
Value Based Builders
Temporal Difference
- class TDFiniteBuilder(learning_rate, epsilon, epsilon_test, **alg_params)[source]
Bases:
AgentBuilder
AgentBuilder for a generic TD algorithm (for finite states).
- __init__(learning_rate, epsilon, epsilon_test, **alg_params)[source]
Constructor.
- Parameters:
epsilon (Parameter) – exploration coefficient for learning;
epsilon_test (Parameter) – exploration coefficient for test.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
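For a finite-state agent, one plausible summary of the Q function over a set of states is the mean of the greedy (maximum) action values. The sketch below assumes that reading and uses a plain nested list as a hypothetical Q table; the value actually returned by `compute_Q` is not specified here.

```python
def mean_max_q(q_table, states):
    """Average, over the given states, of the best action value in each state."""
    max_values = [max(q_table[s]) for s in states]
    return sum(max_values) / len(max_values)


# Hypothetical Q table: one row per state, one column per action.
q_table = [
    [0.0, 1.0],
    [2.0, -1.0],
    [0.5, 0.5],
]
value = mean_max_q(q_table, states=[0, 1])
```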
- class QLearningBuilder(learning_rate, epsilon, epsilon_test)[source]
Bases:
TDFiniteBuilder
- alg_class
alias of
QLearning
- class SARSABuilder(learning_rate, epsilon, epsilon_test)[source]
Bases:
TDFiniteBuilder
- alg_class
alias of
SARSA
- class SpeedyQLearningBuilder(learning_rate, epsilon, epsilon_test)[source]
Bases:
TDFiniteBuilder
- alg_class
alias of
SpeedyQLearning
- class DoubleQLearningBuilder(learning_rate, epsilon, epsilon_test)[source]
Bases:
TDFiniteBuilder
- alg_class
alias of
DoubleQLearning
- class WeightedQLearningBuilder(learning_rate, epsilon, epsilon_test, sampling, precision)[source]
Bases:
TDFiniteBuilder
- alg_class
alias of
WeightedQLearning
- class TDTraceBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]
Bases:
TDFiniteBuilder
Builder for TD algorithms with eligibility traces and finite states.
- class SARSALambdaBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]
Bases:
TDTraceBuilder
- alg_class
alias of
SARSALambda
- class QLambdaBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]
Bases:
TDTraceBuilder
- alg_class
alias of
QLambda
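The `lambda_coeff` and `trace` arguments refer to eligibility traces. The sketch below shows the textbook difference between accumulating and replacing traces for a single state-action pair; the string values `"accumulating"` and `"replacing"`, the decay-then-bump ordering, and the helper name are assumptions of this sketch.

```python
from collections import defaultdict


def visit(traces, state_action, gamma, lambda_coeff, trace):
    """Decay every trace, then bump the trace of the visited pair."""
    decay = gamma * lambda_coeff
    for key in list(traces):
        traces[key] *= decay
    if trace == "replacing":
        traces[state_action] = 1.0    # reset to one on every visit
    elif trace == "accumulating":
        traces[state_action] += 1.0   # add one on top of the decayed value
    else:
        raise ValueError(f"unknown trace type: {trace}")
    return traces


accumulating = defaultdict(float)
replacing = defaultdict(float)
for _ in range(2):  # visit the same pair twice in a row
    visit(accumulating, (0, 1), gamma=1.0, lambda_coeff=0.5, trace="accumulating")
    visit(replacing, (0, 1), gamma=1.0, lambda_coeff=0.5, trace="replacing")
```

After two consecutive visits the accumulating trace exceeds one, while the replacing trace stays capped at one.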
- class SarsaLambdaContinuousBuilder(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]
Bases:
TDContinuousBuilder
AgentBuilder for continuous SARSA(λ), using tiles as the function approximator.
- __init__(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]
Constructor.
- Parameters:
approximator (class) – Q-function approximator.
- class TrueOnlineSarsaLambdaBuilder(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]
Bases:
TDContinuousBuilder
AgentBuilder for continuous True Online SARSA(λ), using tiles as the function approximator.
- __init__(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]
Constructor.
DQN
- class DQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
AgentBuilder
AgentBuilder for Deep Q-Network (DQN).
- __init__(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Constructor.
- Parameters:
policy (Policy) – policy class;
approximator (dict) – Q-function approximator;
approximator_params (dict) – parameters of the Q-function approximator;
alg_params (dict) – parameters for the algorithm;
n_steps_per_fit (int, 1) – number of steps per fit.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
- set_eval_mode(agent, eval)[source]
Set the eval mode for the agent. This function can be overridden by any agent builder to set up a specific evaluation mode for the agent.
- Parameters:
agent (Agent) – the considered agent;
eval (bool) – whether to set eval mode (True) or learn mode (False).
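One plausible evaluation mode for an epsilon-greedy agent is to switch between a training and a test exploration rate. The sketch below assumes exactly that; the `Sketch*` classes and the `set_epsilon` call are hypothetical stand-ins, not the library's classes.

```python
class SketchEpsGreedy:
    """Hypothetical policy holding only an exploration rate."""

    def __init__(self, epsilon):
        self.epsilon = epsilon

    def set_epsilon(self, epsilon):
        self.epsilon = epsilon


class SketchAgent:
    def __init__(self, policy):
        self.policy = policy


class SketchBuilder:
    def __init__(self, epsilon, epsilon_test):
        self.epsilon = epsilon
        self.epsilon_test = epsilon_test

    def build(self, mdp_info=None):
        return SketchAgent(SketchEpsGreedy(self.epsilon))

    def set_eval_mode(self, agent, eval):
        # `eval` shadows the built-in only to mirror the documented signature.
        agent.policy.set_epsilon(self.epsilon_test if eval else self.epsilon)


builder = SketchBuilder(epsilon=1.0, epsilon_test=0.05)
agent = builder.build()
builder.set_eval_mode(agent, True)
eps_in_eval = agent.policy.epsilon
builder.set_eval_mode(agent, False)
eps_in_learn = agent.policy.epsilon
```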
- classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
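A `default` classmethod of this kind usually turns a flat list of keyword arguments into the grouped dictionaries that the constructor expects. The sketch below shows that pattern with a hypothetical `SketchDQNBuilder`. The effect of `get_default_dict` is not described here, so returning the defaults alongside the builder is purely an assumption of this sketch.

```python
class SketchDQNBuilder:
    """Hypothetical builder taking grouped parameter dictionaries."""

    def __init__(self, approximator_params, alg_params, n_steps_per_fit=1):
        self.approximator_params = approximator_params
        self.alg_params = alg_params
        self.n_steps_per_fit = n_steps_per_fit

    @classmethod
    def default(cls, lr=1e-4, batch_size=32, target_update_frequency=2500,
                n_steps_per_fit=1, get_default_dict=False):
        # Record the flat hyperparameters exactly as they were passed in.
        defaults = dict(lr=lr, batch_size=batch_size,
                        target_update_frequency=target_update_frequency,
                        n_steps_per_fit=n_steps_per_fit)
        # Group them the way the constructor expects.
        approximator_params = {"optimizer": {"lr": lr}}
        alg_params = {"batch_size": batch_size,
                      "target_update_frequency": target_update_frequency}
        builder = cls(approximator_params, alg_params, n_steps_per_fit)
        # Assumption: the flag asks for the flat defaults in addition to the builder.
        return (builder, defaults) if get_default_dict else builder


builder = SketchDQNBuilder.default(lr=3e-4)
builder_2, defaults = SketchDQNBuilder.default(get_default_dict=True)
```

Callers can then override a single hyperparameter, such as `lr`, without rebuilding the nested dictionaries by hand.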
- class DoubleDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
DQNBuilder
- class AveragedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
DQNBuilder
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=10, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class PrioritizedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
DQNBuilder
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class DuelingDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
DQNBuilder
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class MaxminDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
DQNBuilder
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=3, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class NoisyDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
DQNBuilder
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class CategoricalDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]
Bases:
DQNBuilder
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, v_min=-10, v_max=10, n_atoms=51, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
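The `v_min`, `v_max` and `n_atoms` defaults of `CategoricalDQNBuilder` describe the fixed support of a categorical value distribution. A short sketch of how such a support is usually laid out, and how a Q-value is read off a probability vector, is given below; the helper names are hypothetical.

```python
def categorical_support(v_min, v_max, n_atoms):
    """Evenly spaced atoms between v_min and v_max, both included."""
    delta = (v_max - v_min) / (n_atoms - 1)
    atoms = [v_min + i * delta for i in range(n_atoms)]
    return atoms, delta


def expected_value(probabilities, atoms):
    """Q-value as the mean of the categorical distribution over the atoms."""
    return sum(p * z for p, z in zip(probabilities, atoms))


atoms, delta = categorical_support(v_min=-10, v_max=10, n_atoms=51)
uniform = [1.0 / len(atoms)] * len(atoms)
q_uniform = expected_value(uniform, atoms)
```

With the defaults above, neighbouring atoms are 0.4 apart, and a uniform distribution over the symmetric support has a mean of approximately zero.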
Actor Critic Builders
Classic AC
- class StochasticACBuilder(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]
Bases:
AgentBuilder
Builder for the stochastic actor-critic algorithm. It uses linear approximators with tiles for the mean, the standard deviation, and the value function; the value function approximator also uses a bias term.
- __init__(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]
Constructor.
- Parameters:
std_0 (float) – initial standard deviation;
alpha_theta (Parameter) – Learning rate for the policy;
alpha_v (Parameter) – Learning rate for the value function;
n_tilings (int) – number of tilings to be used as approximator;
n_tiles (int) – number of tiles for each state space dimension.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
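The roles of `n_tilings` and `n_tiles` can be illustrated with a simplified tile coder for a scalar state. This sketch assumes uniformly shifted tilings and clamps indices at the borders instead of adding an extra tile per tiling; it is a didactic simplification with hypothetical names, not the tiling code used by the library.

```python
def active_tiles(x, low, high, n_tilings, n_tiles):
    """Return one active feature index per tiling for a scalar state x."""
    width = (high - low) / n_tiles
    indices = []
    for t in range(n_tilings):
        # Each tiling is shifted by a fraction of one tile width.
        offset = t * width / n_tilings
        i = int((x - low + offset) / width)
        i = min(max(i, 0), n_tiles - 1)   # clamp to the valid range
        indices.append(t * n_tiles + i)   # flatten (tiling, tile) to one index
    return indices


n_tilings, n_tiles = 2, 4
n_features = n_tilings * n_tiles          # size of the binary feature vector in 1D
at_origin = active_tiles(0.0, 0.0, 1.0, n_tilings, n_tiles)
nearby = active_tiles(0.2, 0.0, 1.0, n_tilings, n_tiles)
```

Exactly `n_tilings` features are active for any state, and nearby states share some of them, which is what gives a linear approximator on tiles its local generalization.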
- class COPDAC_QBuilder(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]
Bases:
AgentBuilder
Builder for the COPDAC_Q actor-critic algorithm. It uses linear approximators with tiles for the mean and the value function.
- __init__(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]
Constructor.
- Parameters:
std_exp (float) – exploration standard deviation;
std_eval (float) – evaluation standard deviation;
alpha_theta (Parameter) – Learning rate for the policy;
alpha_omega (Parameter) – Learning rate for the advantage function;
alpha_v (Parameter) – Learning rate for the value function;
n_tilings (int) – number of tilings to be used as approximator;
n_tiles (int) – number of tiles for each state space dimension.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- set_eval_mode(agent, eval)[source]
Set the eval mode for the agent. This function can be overridden by any agent builder to set up a specific evaluation mode for the agent.
- Parameters:
agent (Agent) – the considered agent;
eval (bool) – whether to set eval mode (True) or learn mode (False).
Deep AC
- class A2CBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]
Bases:
AgentBuilder
AgentBuilder for Advantage Actor Critic algorithm (A2C)
- __init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]
Constructor.
- Parameters:
policy_params (dict) – parameters for the policy;
actor_optimizer (dict) – parameters for the actor optimizer;
critic_params (dict) – parameters for the critic;
alg_params (dict) – parameters for the algorithm;
n_steps_per_fit (int, 5) – number of steps per fit;
preprocessors (list, None) – list of preprocessors.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
- classmethod default(actor_lr=0.0007, critic_lr=0.0007, eps_actor=0.003, eps_critic=1e-05, batch_size=64, max_grad_norm=0.5, ent_coeff=0.01, critic_network=<class 'mushroom_rl_benchmark.builders.network.a2c_network.A2CNetwork'>, n_features=64, preprocessors=None, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class DDPGBuilder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, preprocessors=None, n_steps_per_fit=1)[source]
Bases:
AgentBuilder
AgentBuilder for Deep Deterministic Policy Gradient algorithm (DDPG)
- __init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, preprocessors=None, n_steps_per_fit=1)[source]
Constructor.
- Parameters:
policy_class (Policy) – policy class;
policy_params (dict) – parameters for the policy;
actor_params (dict) – parameters for the actor;
actor_optimizer (dict) – parameters for the actor optimizer;
critic_params (dict) – parameters for the critic;
alg_params (dict) – parameters for the algorithm;
n_steps_per_fit (int, 1) – number of steps per fit.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
- classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGCriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, preprocessors=None, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
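In DDPG-style algorithms, a parameter named `tau` conventionally sets the rate of the soft (Polyak) update of the target networks. The sketch below shows that update on plain lists of weights; the function name is hypothetical, and the sketch assumes `tau` plays this conventional role here.

```python
def soft_update(target_weights, online_weights, tau):
    """Move each target weight a fraction tau towards the online weight."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]


target = [0.0, 0.0]
online = [1.0, 2.0]
target = soft_update(target, online, tau=0.001)
```

With `tau=0.001` the target moves only 0.1% of the way towards the online weights at each update, which keeps the bootstrapping targets slowly varying.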
- class PPOBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]
Bases:
AgentBuilder
AgentBuilder for Proximal Policy Optimization algorithm (PPO)
- __init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]
Constructor.
- Parameters:
policy_params (dict) – parameters for the policy;
actor_optimizer (dict) – parameters for the actor optimizer;
critic_params (dict) – parameters for the critic;
alg_params (dict) – parameters for the algorithm;
n_steps_per_fit (int, 3000) – number of steps per fit;
preprocessors (list, None) – list of preprocessors.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
- classmethod default(eps=0.2, ent_coeff=0.0, n_epochs_policy=4, actor_lr=0.0003, critic_lr=0.0003, critic_fit_params=None, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, lam=0.95, batch_size=64, n_features=32, n_steps_per_fit=3000, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
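In PPO, `eps` is conventionally the clipping range of the surrogate objective. A per-sample sketch of the standard clipped surrogate is shown below; the function name is hypothetical, and the sketch assumes `eps` has this conventional meaning.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO objective: the pessimistic choice between raw and clipped terms."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


gain_capped = clipped_surrogate(ratio=1.5, advantage=1.0)      # capped at 1 + eps
loss_capped = clipped_surrogate(ratio=0.5, advantage=-1.0)     # capped at 1 - eps
loss_uncapped = clipped_surrogate(ratio=1.5, advantage=-1.0)   # penalty left unclipped
```

The clip removes any incentive to push the probability ratio beyond `1 ± eps` in the direction that improves the objective, while moves that worsen the objective are still fully penalized.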
- class SACBuilder(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]
Bases:
AgentBuilder
AgentBuilder for Soft Actor-Critic algorithm (SAC)
- __init__(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]
Constructor.
- Parameters:
actor_mu_params (dict) – parameters for actor mu;
actor_sigma_params (dict) – parameters for actor sigma;
actor_optimizer (dict) – parameters for the actor optimizer;
critic_params (dict) – parameters for the critic;
alg_params (dict) – parameters for the algorithm;
n_q_samples (int, 100) – number of samples to compute value function;
n_steps_per_fit (int, 1) – number of steps per fit;
preprocessors (list, None) – list of preprocessors.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
- classmethod default(actor_lr=0.0003, actor_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACActorNetwork'>, critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACCriticNetwork'>, initial_replay_size=64, max_replay_size=50000, n_features=64, warmup_transitions=100, batch_size=64, tau=0.005, lr_alpha=0.003, preprocessors=None, target_entropy=None, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class TD3Builder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1.0, preprocessors=None)[source]
Bases:
AgentBuilder
AgentBuilder for Twin Delayed DDPG algorithm (TD3)
- __init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1.0, preprocessors=None)[source]
Constructor.
- Parameters:
policy_class (Policy) – policy class;
policy_params (dict) – parameters for the policy;
actor_params (dict) – parameters for the actor;
actor_optimizer (dict) – parameters for the actor optimizer;
critic_params (dict) – parameters for the critic;
alg_params (dict) – parameters for the algorithm;
n_steps_per_fit (int, 1) – number of steps per fit.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
- classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3ActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3CriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, preprocessors=None, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
- class TRPOBuilder(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]
Bases:
AgentBuilder
AgentBuilder for Trust Region Policy Optimization algorithm (TRPO)
- __init__(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]
Constructor.
- Parameters:
policy_params (dict) – parameters for the policy;
critic_params (dict) – parameters for the critic;
alg_params (dict) – parameters for the algorithm;
n_steps_per_fit (int, 3000) – number of steps per fit;
preprocessors (list, None) – list of preprocessors.
- build(mdp_info)[source]
Build and return the agent
- Parameters:
mdp_info (MDPInfo) – information about the environment.
- compute_Q(agent, states)[source]
Compute the Q-value of the given agent over a set of states
- Parameters:
agent (Agent) – the considered agent;
states (np.ndarray) – the set of states over which we need to compute the Q function.
- classmethod default(critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, max_kl=0.01, ent_coeff=0.0, lam=0.95, batch_size=64, n_features=32, critic_fit_params=None, n_steps_per_fit=3000, n_epochs_line_search=10, n_epochs_cg=100, cg_damping=0.01, cg_residual_tol=1e-10, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]
Create a default initialization for the specific AgentBuilder and return it
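The `lam` default shared by the TRPO and PPO builders is conventionally the lambda of generalized advantage estimation (GAE). The sketch below computes GAE for a single trajectory, assuming `lam` has that meaning; the function name and the single-trajectory setting without intermediate terminal states are assumptions of this sketch.

```python
def gae(rewards, values, last_value, gamma, lam):
    """Generalized advantage estimates for one trajectory, computed backwards."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value  # value of the state after the final step (0 if terminal)
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]   # one-step TD error
        running = delta + gamma * lam * running               # discounted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    return advantages


advantages = gae(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0,
                 gamma=1.0, lam=0.5)
```

Setting `lam=0` reduces the estimate to the one-step TD error, while `lam=1` yields the Monte Carlo return minus the value baseline; values in between trade bias against variance.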