Builders

class EnvironmentBuilder(env_name, env_params)[source]

Bases: object

Class to spawn instances of a MushroomRL environment

__init__(env_name, env_params)[source]

Constructor

Parameters:
  • env_name – name of the environment to build;

  • env_params – required parameters to build the specified environment.

build()[source]

Build and return an environment

static set_eval_mode(env, eval)[source]

Make changes to the environment for evaluation mode.

Parameters:
  • env (Environment) – the environment to change;

  • eval (bool) – flag for activating evaluation mode.

copy()[source]

Create and return a deep copy of the EnvironmentBuilder
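
A minimal usage sketch follows; the environment name and parameter values are illustrative placeholders for any environment registered in MushroomRL, and the import path is assumed to be mushroom_rl_benchmark.builders:

    from mushroom_rl_benchmark.builders import EnvironmentBuilder

    # Spawn an environment from its MushroomRL name and constructor parameters.
    env_builder = EnvironmentBuilder('Gym', dict(name='CartPole-v1', horizon=500, gamma=0.99))
    env = env_builder.build()

    # Switch the environment to evaluation mode before collecting test trajectories.
    EnvironmentBuilder.set_eval_mode(env, eval=True)

    # Each parallel run can work on its own deep copy of the builder.
    env_builder_copy = env_builder.copy()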

class AgentBuilder(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]

Bases: object

Base class to spawn instances of a MushroomRL agent

__init__(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]

Initialize AgentBuilder

get_fit_params()[source]

Get n_steps_per_fit and n_episodes_per_fit for the specific AgentBuilder

set_preprocessors(preprocessors)[source]

Set the preprocessors for the specific AgentBuilder

Parameters:

preprocessors – list of preprocessor classes.

get_preprocessors()[source]

Get preprocessors for the specific AgentBuilder

copy()[source]

Create and return a deep copy of the AgentBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This method can be overridden by any agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;

  • eval (bool) – whether to set eval mode (true) or learn mode.

classmethod default(get_default_dict=False, **kwargs)[source]

Create a default initialization for the specific AgentBuilder and return it
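
In practice an AgentBuilder is paired with an EnvironmentBuilder: the environment provides the MDPInfo that build() needs. The sketch below uses the QLearningBuilder documented later on this page; the environment name and its parameters are illustrative assumptions:

    from mushroom_rl.core import Core
    from mushroom_rl_benchmark.builders import EnvironmentBuilder, QLearningBuilder

    env_builder = EnvironmentBuilder('GridWorld', dict(height=5, width=5, goal=(4, 4)))
    env = env_builder.build()

    # Every AgentBuilder exposes a default() classmethod with reference hyperparameters.
    agent_builder = QLearningBuilder.default(learning_rate=0.9, epsilon=0.1)
    agent = agent_builder.build(env.info)   # env.info is the MDPInfo of the environment

    # The fit cadence (n_steps_per_fit / n_episodes_per_fit) is what the benchmark runner uses.
    fit_params = agent_builder.get_fit_params()

    core = Core(agent, env)
    core.learn(n_episodes=100, n_steps_per_fit=1)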

Policy Search Builders

Policy Gradient

class PolicyGradientBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]

Bases: AgentBuilder

AgentBuilder for policy gradient methods. The current builder uses a state-dependent Gaussian policy with diagonal standard deviation and linear mean.

__init__(n_episodes_per_fit, optimizer, **kwargs)[source]

Constructor.

Parameters:
  • optimizer (Optimizer) – optimizer to be used by the policy gradient algorithm;

  • **kwargs – other algorithm parameters.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(n_episodes_per_fit=25, alpha=0.01, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

class REINFORCEBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]

Bases: PolicyGradientBuilder

alg_class

alias of REINFORCE

class GPOMDPBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]

Bases: PolicyGradientBuilder

alg_class

alias of GPOMDP

class eNACBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]

Bases: PolicyGradientBuilder

alg_class

alias of eNAC
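
The three policy gradient builders differ only in the algorithm class they instantiate, so they are interchangeable at the builder level. A sketch using the default() classmethod documented above (alpha is the learning rate of the optimizer; the import path is an assumption):

    from mushroom_rl_benchmark.builders import REINFORCEBuilder, GPOMDPBuilder, eNACBuilder

    builders = [
        REINFORCEBuilder.default(n_episodes_per_fit=25, alpha=0.01),
        GPOMDPBuilder.default(n_episodes_per_fit=25, alpha=0.01),
        eNACBuilder.default(n_episodes_per_fit=25, alpha=0.01),
    ]

    # Each builder produces a fresh agent for the same task:
    # agents = [b.build(mdp_info) for b in builders]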

Black-Box optimization

class BBOBuilder(n_episodes_per_fit, **kwargs)[source]

Bases: AgentBuilder

AgentBuilder for black-box optimization (BBO) methods. The current builder uses a simple deterministic linear policy and a diagonal Gaussian distribution.

__init__(n_episodes_per_fit, **kwargs)[source]

Constructor.

Parameters:
  • n_episodes_per_fit (int) – number of episodes per fit;

  • **kwargs – other algorithm parameters.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(n_episodes_per_fit=25, alpha=0.01, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

class PGPEBuilder(n_episodes_per_fit, optimizer)[source]

Bases: BBOBuilder

alg_class

alias of PGPE

__init__(n_episodes_per_fit, optimizer)[source]

Constructor.

Parameters:
  • optimizer (Optimizer) – optimizer to be used by the algorithm.

classmethod default(n_episodes_per_fit=25, alpha=0.3, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class RWRBuilder(n_episodes_per_fit, beta)[source]

Bases: BBOBuilder

alg_class

alias of RWR

__init__(n_episodes_per_fit, beta)[source]

Constructor.

Parameters:
  • beta ([float, Parameter]) – temperature of the exponential reward transformation.

classmethod default(n_episodes_per_fit=25, beta=0.01, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class REPSBuilder(n_episodes_per_fit, eps)[source]

Bases: BBOBuilder

alg_class

alias of REPS

__init__(n_episodes_per_fit, eps)[source]

Constructor.

Parameters:
  • eps ([float, Parameter]) – maximum admissible KL divergence between the new distribution and the previous one at each update step.

classmethod default(n_episodes_per_fit=25, eps=0.05, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class ConstrainedREPSBuilder(n_episodes_per_fit, eps, kappa)[source]

Bases: BBOBuilder

alg_class

alias of ConstrainedREPS

__init__(n_episodes_per_fit, eps, kappa)[source]

Constructor.

Parameters:
  • eps ([float, Parameter]) – maximum admissible KL divergence between the new distribution and the previous one at each update step;

  • kappa ([float, Parameter]) – maximum admissible entropy decrease between the new distribution and the previous one at each update step.

classmethod default(n_episodes_per_fit=25, eps=0.05, kappa=0.01, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
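
Each black-box optimization builder exposes its algorithm-specific hyperparameter through default(). A sketch with the documented default values (the import path is an assumption):

    from mushroom_rl_benchmark.builders import (
        PGPEBuilder, RWRBuilder, REPSBuilder, ConstrainedREPSBuilder)

    pgpe = PGPEBuilder.default(n_episodes_per_fit=25, alpha=0.3)    # optimizer step size
    rwr = RWRBuilder.default(n_episodes_per_fit=25, beta=0.01)      # reward transformation temperature
    reps = REPSBuilder.default(n_episodes_per_fit=25, eps=0.05)     # KL bound on the distribution update
    c_reps = ConstrainedREPSBuilder.default(n_episodes_per_fit=25, eps=0.05, kappa=0.01)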

Value Based Builders

Temporal Difference

class TDFiniteBuilder(learning_rate, epsilon, epsilon_test, **alg_params)[source]

Bases: AgentBuilder

AgentBuilder for a generic TD algorithm (for finite states).

__init__(learning_rate, epsilon, epsilon_test, **alg_params)[source]

Constructor.

Parameters:
  • epsilon (Parameter) – exploration coefficient for learning;

  • epsilon_test (Parameter) – exploration coefficient for test.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This method can be overridden by any agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;

  • eval (bool) – whether to set eval mode (true) or learn mode.

classmethod default(learning_rate=0.9, epsilon=0.1, decay_lr=0.0, decay_eps=0.0, epsilon_test=0.0, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class QLearningBuilder(learning_rate, epsilon, epsilon_test)[source]

Bases: TDFiniteBuilder

alg_class

alias of QLearning

__init__(learning_rate, epsilon, epsilon_test)[source]

Constructor.

Parameters:
  • epsilon (Parameter) – exploration coefficient for learning;

  • epsilon_test (Parameter) – exploration coefficient for test.
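
The epsilon / epsilon_test pair is what set_eval_mode switches between: learning uses epsilon for exploration, while evaluation uses epsilon_test. A sketch with the documented defaults (mdp_info is assumed to come from a finite-state environment):

    from mushroom_rl_benchmark.builders import QLearningBuilder

    builder = QLearningBuilder.default(learning_rate=0.9, epsilon=0.1, epsilon_test=0.0)
    # agent = builder.build(mdp_info)

    # During evaluation the exploration coefficient is replaced by epsilon_test,
    # so the (nearly) greedy policy is assessed; switching back restores epsilon.
    # builder.set_eval_mode(agent, eval=True)
    # builder.set_eval_mode(agent, eval=False)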

class SARSABuilder(learning_rate, epsilon, epsilon_test)[source]

Bases: TDFiniteBuilder

alg_class

alias of SARSA

__init__(learning_rate, epsilon, epsilon_test)[source]

Constructor.

Parameters:
  • epsilon (Parameter) – exploration coefficient for learning;

  • epsilon_test (Parameter) – exploration coefficient for test.

class SpeedyQLearningBuilder(learning_rate, epsilon, epsilon_test)[source]

Bases: TDFiniteBuilder

alg_class

alias of SpeedyQLearning

__init__(learning_rate, epsilon, epsilon_test)[source]

Constructor.

Parameters:
  • epsilon (Parameter) – exploration coefficient for learning;

  • epsilon_test (Parameter) – exploration coefficient for test.

class DoubleQLearningBuilder(learning_rate, epsilon, epsilon_test)[source]

Bases: TDFiniteBuilder

alg_class

alias of DoubleQLearning

__init__(learning_rate, epsilon, epsilon_test)[source]

Constructor.

Parameters:
  • epsilon (Parameter) – exploration coefficient for learning;

  • epsilon_test (Parameter) – exploration coefficient for test.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

class WeightedQLearningBuilder(learning_rate, epsilon, epsilon_test, sampling, precision)[source]

Bases: TDFiniteBuilder

alg_class

alias of WeightedQLearning

__init__(learning_rate, epsilon, epsilon_test, sampling, precision)[source]

Constructor.

Parameters:
  • sampling (bool, True) – use the approximated version to speed up the computation;

  • precision (int, 1000) – number of samples to use in the approximated version.

classmethod default(learning_rate=0.9, epsilon=0.1, decay_lr=0.0, decay_eps=0.0, epsilon_test=0.0, sampling=True, precision=1000, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class TDTraceBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]

Bases: TDFiniteBuilder

Builder for TD algorithms with eligibility traces and finite states.

__init__(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]

Constructor.

Parameters:
  • lambda_coeff ([float, Parameter]) – eligibility trace coefficient;

  • trace (str) – type of eligibility trace to use.

classmethod default(learning_rate=0.9, epsilon=0.1, decay_lr=0.0, decay_eps=0.0, epsilon_test=0.0, lambda_coeff=0.9, trace='replacing', get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class SARSALambdaBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]

Bases: TDTraceBuilder

alg_class

alias of SARSALambda

class QLambdaBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]

Bases: TDTraceBuilder

alg_class

alias of QLambda
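
SARSA(λ) and Q(λ) only differ in the algorithm class they instantiate; both accept the trace configuration of TDTraceBuilder. A sketch with the documented defaults (other trace types depend on what MushroomRL's eligibility traces support):

    from mushroom_rl_benchmark.builders import SARSALambdaBuilder, QLambdaBuilder

    sarsa_lambda = SARSALambdaBuilder.default(lambda_coeff=0.9, trace='replacing')
    q_lambda = QLambdaBuilder.default(lambda_coeff=0.9, trace='replacing')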

class SarsaLambdaContinuousBuilder(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]

Bases: TDContinuousBuilder

AgentBuilder for SARSA(λ) with continuous state spaces, using tile coding as the function approximator.

__init__(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]

Constructor.

Parameters:

approximator (class) – Q-function approximator.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(alpha=0.1, lambda_coeff=0.9, epsilon=0.0, decay_eps=0.0, epsilon_test=0.0, n_tilings=10, n_tiles=10, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class TrueOnlineSarsaLambdaBuilder(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]

Bases: TDContinuousBuilder

AgentBuilder for True Online SARSA(λ) with continuous state spaces, using tile coding as the function approximator.

__init__(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]

Constructor.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(alpha=0.1, lambda_coeff=0.9, epsilon=0.0, decay_eps=0.0, epsilon_test=0.0, n_tilings=10, n_tiles=10, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
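
For the continuous-state TD builders, n_tilings and n_tiles define the tile-coding features of the linear Q-function approximator. A sketch with the documented defaults (the environment is assumed to have a bounded continuous observation space):

    from mushroom_rl_benchmark.builders import TrueOnlineSarsaLambdaBuilder

    builder = TrueOnlineSarsaLambdaBuilder.default(
        alpha=0.1,          # learning rate
        lambda_coeff=0.9,   # eligibility trace decay
        n_tilings=10,       # number of overlapping tilings
        n_tiles=10)         # tiles per state dimension in each tiling

    # agent = builder.build(mdp_info)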

DQN

class DQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: AgentBuilder

AgentBuilder for Deep Q-Network (DQN).

__init__(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy (Policy) – policy class;

  • approximator (dict) – Q-function approximator;

  • approximator_params (dict) – parameters of the Q-function approximator;

  • alg_params (dict) – parameters for the algorithm;

  • n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This method can be overridden by any agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;

  • eval (bool) – whether to set eval mode (true) or learn mode.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
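
A sketch of obtaining a DQN builder through default(); the values shown override the documented defaults with a smaller replay buffer for quick experiments (the import path is an assumption):

    from mushroom_rl_benchmark.builders import DQNBuilder

    builder = DQNBuilder.default(lr=1e-4,
                                 initial_replay_size=5000,
                                 max_replay_size=100000,
                                 batch_size=32,
                                 target_update_frequency=2500,
                                 use_cuda=False)

    # agent = builder.build(mdp_info)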

class DoubleDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

class AveragedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=10, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class PrioritizedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class DuelingDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class MaxminDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=3, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class NoisyDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class CategoricalDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, v_min=-10, v_max=10, n_atoms=51, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
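
The DQN variants reuse the same default() interface and add only their algorithm-specific arguments, e.g. the distribution support for Categorical DQN or the ensemble size for Maxmin DQN. A sketch with the documented default values:

    from mushroom_rl_benchmark.builders import CategoricalDQNBuilder, MaxminDQNBuilder

    c51 = CategoricalDQNBuilder.default(v_min=-10, v_max=10, n_atoms=51)
    maxmin = MaxminDQNBuilder.default(n_approximators=3)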

Actor Critic Builders

Classic AC

class StochasticACBuilder(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]

Bases: AgentBuilder

Builder for the stochastic actor-critic algorithm. It uses a linear approximator with tile coding for the mean, the standard deviation and the value function; the value function approximator also uses a bias term.

__init__(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]

Constructor.

Parameters:
  • std_0 (float) – initial standard deviation;

  • alpha_theta (Parameter) – Learning rate for the policy;

  • alpha_v (Parameter) – Learning rate for the value function;

  • n_tilings (int) – number of tilings to be used as approximator;

  • n_tiles (int) – number of tiles for each state space dimension.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

classmethod default(std_0=1.0, alpha_theta=0.001, alpha_v=0.1, lambda_par=0.9, n_tilings=10, n_tiles=11, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

class COPDAC_QBuilder(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]

Bases: AgentBuilder

Builder for the COPDAC-Q actor-critic algorithm. It uses a linear approximator with tile coding for the policy mean and the value function.

__init__(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]

Constructor.

Parameters:
  • std_exp (float) – exploration standard deviation;

  • std_eval (float) – evaluation standard deviation;

  • alpha_theta (Parameter) – Learning rate for the policy;

  • alpha_omega (Parameter) – Learning rate for the advantage function;

  • alpha_v (Parameter) – Learning rate for the value function;

  • n_tilings (int) – number of tilings to be used as approximator;

  • n_tiles (int) – number of tiles for each state space dimension.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This method can be overridden by any agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;

  • eval (bool) – whether to set eval mode (true) or learn mode.

classmethod default(std_exp=0.1, std_eval=0.001, alpha_theta=0.005, alpha_omega=0.5, alpha_v=0.5, n_tilings=10, n_tiles=11, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.
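
Both classic actor-critic builders rely on tile coding, so they expect bounded continuous state spaces. A sketch with the documented default hyperparameters (the import path is an assumption):

    from mushroom_rl_benchmark.builders import StochasticACBuilder, COPDAC_QBuilder

    stochastic_ac = StochasticACBuilder.default(std_0=1.0, alpha_theta=0.001, alpha_v=0.1)
    copdac_q = COPDAC_QBuilder.default(std_exp=0.1, std_eval=0.001, alpha_theta=0.005)

    # agent = stochastic_ac.build(mdp_info)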

Deep AC

class A2CBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]

Bases: AgentBuilder

AgentBuilder for the Advantage Actor-Critic algorithm (A2C)

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;

  • actor_optimizer (dict) – parameters for the actor optimizer;

  • critic_params (dict) – parameters for the critic;

  • alg_params (dict) – parameters for the algorithm;

  • n_steps_per_fit (int, 5) – number of steps per fit;

  • preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0007, critic_lr=0.0007, eps_actor=0.003, eps_critic=1e-05, batch_size=64, max_grad_norm=0.5, ent_coeff=0.01, critic_network=<class 'mushroom_rl_benchmark.builders.network.a2c_network.A2CNetwork'>, n_features=64, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
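
A sketch of a deep actor-critic builder obtained via default(); the learning rates and entropy coefficient shown are the documented defaults (the import path is an assumption):

    from mushroom_rl_benchmark.builders import A2CBuilder

    builder = A2CBuilder.default(actor_lr=7e-4, critic_lr=7e-4, ent_coeff=0.01,
                                 n_features=64, use_cuda=False)

    # agent = builder.build(mdp_info)   # mdp_info from a continuous-control environment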

class DDPGBuilder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, preprocessors=None, n_steps_per_fit=1)[source]

Bases: AgentBuilder

AgentBuilder for the Deep Deterministic Policy Gradient algorithm (DDPG)

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, preprocessors=None, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy_class (Policy) – policy class;

  • policy_params (dict) – parameters for the policy;

  • actor_params (dict) – parameters for the actor;

  • actor_optimizer (dict) – parameters for the actor optimizer;

  • critic_params (dict) – parameters for the critic;

  • alg_params (dict) – parameters for the algorithm;

  • n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGCriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, preprocessors=None, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class PPOBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Bases: AgentBuilder

AgentBuilder for the Proximal Policy Optimization algorithm (PPO)

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;

  • actor_optimizer (dict) – parameters for the actor optimizer;

  • critic_params (dict) – parameters for the critic;

  • alg_params (dict) – parameters for the algorithm;

  • n_steps_per_fit (int, 3000) – number of steps per fit;

  • preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(eps=0.2, ent_coeff=0.0, n_epochs_policy=4, actor_lr=0.0003, critic_lr=0.0003, critic_fit_params=None, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, lam=0.95, batch_size=64, n_features=32, n_steps_per_fit=3000, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class SACBuilder(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]

Bases: AgentBuilder

AgentBuilder for the Soft Actor-Critic algorithm (SAC)

__init__(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]

Constructor.

Parameters:
  • actor_mu_params (dict) – parameters for actor mu;

  • actor_sigma_params (dict) – parameters for actor sigma;

  • actor_optimizer (dict) – parameters for the actor optimizer;

  • critic_params (dict) – parameters for the critic;

  • alg_params (dict) – parameters for the algorithm;

  • n_q_samples (int, 100) – number of samples to compute value function;

  • n_steps_per_fit (int, 1) – number of steps per fit;

  • preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0003, actor_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACActorNetwork'>, critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACCriticNetwork'>, initial_replay_size=64, max_replay_size=50000, n_features=64, warmup_transitions=100, batch_size=64, tau=0.005, lr_alpha=0.003, preprocessors=None, target_entropy=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class TD3Builder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1, preprocessors=None)[source]

Bases: AgentBuilder

AgentBuilder for the Twin Delayed DDPG algorithm (TD3)

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_class (Policy) – policy class;

  • policy_params (dict) – parameters for the policy;

  • actor_params (dict) – parameters for the actor;

  • actor_optimizer (dict) – parameters for the actor optimizer;

  • critic_params (dict) – parameters for the critic;

  • alg_params (dict) – parameters for the algorithm;

  • n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3ActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3CriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class TRPOBuilder(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Bases: AgentBuilder

AgentBuilder for the Trust Region Policy Optimization algorithm (TRPO)

__init__(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;

  • critic_params (dict) – parameters for the critic;

  • alg_params (dict) – parameters for the algorithm;

  • n_steps_per_fit (int, 3000) – number of steps per fit;

  • preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]

Build and return the agent

Parameters:

mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]

Compute the Q-values of the given states for the considered agent

Parameters:
  • agent (Agent) – the considered agent;

  • states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, max_kl=0.01, ent_coeff=0.0, lam=0.95, batch_size=64, n_features=32, critic_fit_params=None, n_steps_per_fit=3000, n_epochs_line_search=10, n_epochs_cg=100, cg_damping=0.01, cg_residual_tol=1e-10, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
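
Putting the pieces together, a benchmark pairs an EnvironmentBuilder with one of the agent builders above. The sketch below assumes that BenchmarkLogger and BenchmarkExperiment are exported by mushroom_rl_benchmark; the environment parameters and the run() arguments shown are illustrative and may differ in the actual API:

    from mushroom_rl_benchmark import BenchmarkLogger, BenchmarkExperiment
    from mushroom_rl_benchmark.builders import EnvironmentBuilder, SACBuilder

    logger = BenchmarkLogger(log_dir='./logs', log_id='pendulum_sac')

    env_builder = EnvironmentBuilder('Gym', dict(name='Pendulum-v1', horizon=200, gamma=0.99))
    agent_builder = SACBuilder.default(actor_lr=3e-4, critic_lr=3e-4, n_features=64)

    exp = BenchmarkExperiment(agent_builder, env_builder, logger)
    exp.run(n_runs=5, n_epochs=10, n_steps=10000, n_episodes_test=5)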