Builders

class mushroom_rl_benchmark.builders.environment_builder.EnvironmentBuilder(env_name, env_params)[source]

Bases: object

Class to spawn instances of a MushroomRL environment

__init__(env_name, env_params)[source]

Constructor.

Parameters:
  • env_name – name of the environment to build;
  • env_params – required parameters to build the specified environment.
build()[source]

Build and return an environment

static set_eval_mode(env, eval)[source]

Make changes to the environment for evaluation mode.

Parameters:
  • env (Environment) – the environment to change;
  • eval (bool) – flag for activating evaluation mode.
copy()[source]

Create a deep copy of the EnvironmentBuilder and return it
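
A minimal usage sketch; the 'Gym' environment name and the parameter keys below are illustrative assumptions, not guaranteed by this API:

    from mushroom_rl_benchmark.builders.environment_builder import EnvironmentBuilder

    # Illustrative arguments: any MushroomRL environment name and its
    # constructor parameters can be used here (keys are assumptions).
    env_builder = EnvironmentBuilder('Gym', dict(name='CartPole-v1', horizon=500, gamma=0.99))

    env = env_builder.build()                     # instantiate the environment
    EnvironmentBuilder.set_eval_mode(env, True)   # switch it into evaluation mode

    worker_builder = env_builder.copy()           # independent deep copy, e.g. one per worker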

class mushroom_rl_benchmark.builders.agent_builder.AgentBuilder(n_steps_per_fit, compute_policy_entropy=True, compute_entropy_with_states=False, preprocessors=None)[source]

Bases: object

Base class to spawn instances of a MushroomRL agent

__init__(n_steps_per_fit, compute_policy_entropy=True, compute_entropy_with_states=False, preprocessors=None)[source]

Initialize the AgentBuilder.

Parameters:
  • n_steps_per_fit (int) – number of steps per fit;
  • compute_policy_entropy (bool, True) – whether to compute the entropy of the policy;
  • compute_entropy_with_states (bool, False) – whether to compute the entropy from the states instead of the sampled actions;
  • preprocessors (list, None) – list of preprocessor classes.

set_n_steps_per_fit(n_steps_per_fit)[source]

Set n_steps_per_fit for the specific AgentBuilder

Parameters: n_steps_per_fit – number of steps per fit.
get_n_steps_per_fit()[source]

Get n_steps_per_fit for the specific AgentBuilder

set_preprocessors(preprocessors)[source]

Set the preprocessors for the specific AgentBuilder

Parameters: preprocessors – list of preprocessor classes.
get_preprocessors()[source]

Get preprocessors for the specific AgentBuilder

copy()[source]

Create a deep copy of the AgentBuilder and return it

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This function can be overridden by an agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;
  • eval (bool) – whether to set evaluation mode (True) or learning mode.
classmethod default(get_default_dict=False, **kwargs)[source]

Create a default initialization for the specific AgentBuilder and return it
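
To illustrate the lifecycle, the sketch below uses the DQNBuilder subclass documented later; the hyperparameter values are arbitrary, and build() is assumed to receive the MDPInfo of an already-built environment (e.g. env.info):

    from mushroom_rl_benchmark.builders.value.dqn import DQNBuilder

    base_builder = DQNBuilder.default(lr=1e-4)  # default() is implemented per subclass

    # Sweep n_steps_per_fit across deep copies, leaving the original untouched.
    variants = []
    for n in (1, 4, 8):
        b = base_builder.copy()
        b.set_n_steps_per_fit(n)
        variants.append(b)

    # Each variant builds a fresh agent from the environment's MDPInfo:
    # agent = variants[0].build(env.info)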

Value-Based Builders

class mushroom_rl_benchmark.builders.value.dqn.DQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for Deep Q-Network (DQN).

__init__(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy (Policy) – policy class;
  • approximator (class) – the Q-function approximator class;
  • approximator_params (dict) – parameters of the Q-function approximator;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 1) – number of steps per fit.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This function can be overridden by an agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;
  • eval (bool) – whether to set evaluation mode (True) or learning mode.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
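
Besides default(), the builder can also be constructed explicitly. A sketch under stated assumptions: the alg_params keys are assumed to mirror MushroomRL's DQN constructor and to be forwarded to it, and the input/output shapes of the approximator are assumed to be filled in by build() from the MDPInfo:

    import torch.optim as optim
    import torch.nn.functional as F
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.utils.parameters import Parameter
    from mushroom_rl.approximators.parametric import TorchApproximator
    from mushroom_rl_benchmark.builders.value.dqn import DQNBuilder
    from mushroom_rl_benchmark.builders.network.dqn_network import DQNNetwork

    policy = EpsGreedy(epsilon=Parameter(0.1))

    approximator_params = dict(
        network=DQNNetwork,
        optimizer={'class': optim.Adam, 'params': dict(lr=1e-4)},
        loss=F.smooth_l1_loss)

    alg_params = dict(                        # assumed to be forwarded to DQN
        initial_replay_size=50000,
        max_replay_size=1000000,
        batch_size=32,
        target_update_frequency=2500)

    builder = DQNBuilder(policy, TorchApproximator, approximator_params, alg_params)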

class mushroom_rl_benchmark.builders.value.double_dqn.DoubleDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
class mushroom_rl_benchmark.builders.value.averaged_dqn.AveragedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=10, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.prioritized_dqn.PrioritizedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.dueling_dqn.DuelingDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.maxmin_dqn.MaxminDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=3, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.noisy_dqn.NoisyDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.categorical_dqn.CategoricalDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, v_min=-10, v_max=10, n_atoms=51, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
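
All the value-based builders above expose the same default() entry point. When get_default_dict=True, default() is presumed to also return the dictionary of hyperparameters it used, which is handy for experiment logging; the return shape below is an assumption, not confirmed by this reference:

    from mushroom_rl_benchmark.builders.value.dqn import DQNBuilder
    from mushroom_rl_benchmark.builders.value.dueling_dqn import DuelingDQNBuilder

    for cls in (DQNBuilder, DuelingDQNBuilder):
        # Assumed return shape: (builder, dict of defaults) when get_default_dict=True.
        builder, params = cls.default(lr=5e-5, get_default_dict=True)
        print(cls.__name__, params)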

Actor-Critic Builders

class mushroom_rl_benchmark.builders.actor_critic.a2c.A2CBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Advantage Actor-Critic algorithm (A2C).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 5) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0007, critic_lr=0.0007, eps_actor=0.003, eps_critic=1e-05, batch_size=64, max_grad_norm=0.5, ent_coeff=0.01, critic_network=<class 'mushroom_rl_benchmark.builders.network.a2c_network.A2CNetwork'>, n_features=64, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
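
The actor-critic builders also accept a list of preprocessor classes. A sketch, assuming MushroomRL's StandardizationPreprocessor is available at the path below:

    from mushroom_rl.utils.preprocessors import StandardizationPreprocessor  # assumed path
    from mushroom_rl_benchmark.builders.actor_critic.a2c import A2CBuilder

    builder = A2CBuilder.default(
        actor_lr=7e-4,
        critic_lr=7e-4,
        ent_coeff=0.01,
        preprocessors=[StandardizationPreprocessor])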

class mushroom_rl_benchmark.builders.actor_critic.ddpg.DDPGBuilder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Deep Deterministic Policy Gradient algorithm (DDPG).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy_class (Policy) – policy class;
  • policy_params (dict) – parameters for the policy;
  • actor_params (dict) – parameters for the actor;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 1) – number of steps per fit.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGCriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.actor_critic.ppo.PPOBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Proximal Policy Optimization algorithm (PPO).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 3000) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(eps=0.2, n_epochs_policy=4, actor_lr=0.0003, critic_lr=0.0003, critic_fit_params=None, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, lam=0.95, batch_size=64, n_features=32, n_steps_per_fit=3000, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.actor_critic.sac.SACBuilder(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Soft Actor-Critic algorithm (SAC).

__init__(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]

Constructor.

Parameters:
  • actor_mu_params (dict) – parameters for actor mu;
  • actor_sigma_params (dict) – parameters for actor sigma;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_q_samples (int, 100) – number of samples used to compute the value function;
  • n_steps_per_fit (int, 1) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0003, actor_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACActorNetwork'>, critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACCriticNetwork'>, initial_replay_size=64, max_replay_size=50000, n_features=64, warmup_transitions=100, batch_size=64, tau=0.005, lr_alpha=0.003, preprocessors=None, target_entropy=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
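
A sketch pairing a SAC builder with the evaluation-mode hook; the environment name and parameter keys are illustrative assumptions:

    from mushroom_rl_benchmark.builders.environment_builder import EnvironmentBuilder
    from mushroom_rl_benchmark.builders.actor_critic.sac import SACBuilder

    env = EnvironmentBuilder('Gym', dict(name='Pendulum-v1')).build()  # illustrative keys

    builder = SACBuilder.default(actor_lr=3e-4, critic_lr=3e-4, lr_alpha=3e-3)
    agent = builder.build(env.info)

    builder.set_eval_mode(agent, True)    # e.g. act deterministically while evaluating
    # ... run evaluation episodes ...
    builder.set_eval_mode(agent, False)   # back to learning mode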

class mushroom_rl_benchmark.builders.actor_critic.td3.TD3Builder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Twin Delayed DDPG algorithm (TD3).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy_class (Policy) – policy class;
  • policy_params (dict) – parameters for the policy;
  • actor_params (dict) – parameters for the actor;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 1) – number of steps per fit.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3ActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3CriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
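
As the signatures above show, DDPGBuilder and TD3Builder share the same constructor and default() parameters, so one can be swapped for the other without changing the surrounding experiment code:

    from mushroom_rl_benchmark.builders.actor_critic.ddpg import DDPGBuilder
    from mushroom_rl_benchmark.builders.actor_critic.td3 import TD3Builder

    for cls in (DDPGBuilder, TD3Builder):
        builder = cls.default(actor_lr=1e-4, critic_lr=1e-3, n_features=[80, 80])
        # agent = builder.build(env.info)  # identical call site for both algorithms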

class mushroom_rl_benchmark.builders.actor_critic.trpo.TRPOBuilder(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Trust Region Policy Optimization algorithm (TRPO).

__init__(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 3000) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, max_kl=0.01, ent_coeff=0.0, lam=0.95, batch_size=64, n_features=32, critic_fit_params=None, n_steps_per_fit=3000, n_epochs_line_search=10, n_epochs_cg=100, cg_damping=0.01, cg_residual_tol=1e-10, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
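
Finally, agents produced by any of these builders plug directly into MushroomRL's Core loop. A minimal end-to-end sketch, with the environment arguments again illustrative:

    from mushroom_rl.core import Core
    from mushroom_rl_benchmark.builders.environment_builder import EnvironmentBuilder
    from mushroom_rl_benchmark.builders.actor_critic.trpo import TRPOBuilder

    env = EnvironmentBuilder('Gym', dict(name='HalfCheetah-v4')).build()  # illustrative keys

    builder = TRPOBuilder.default(critic_lr=3e-4, max_kl=0.01)
    agent = builder.build(env.info)

    core = Core(agent, env)
    core.learn(n_steps=100000, n_steps_per_fit=builder.get_n_steps_per_fit())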