Builders

class mushroom_rl_benchmark.builders.environment_builder.EnvironmentBuilder(env_name, env_params)[source]

Bases: object

Class to spawn instances of a MushroomRL environment

__init__(env_name, env_params)[source]

Constructor.

Parameters:
  • env_name – name of the environment to build;
  • env_params – required parameters to build the specified environment.
build()[source]

Build and return an environment

static set_eval_mode(env, eval)[source]

Make changes to the environment for evaluation mode.

Parameters:
  • env (Environment) – the environment to change;
  • eval (bool) – flag for activating evaluation mode.
copy()[source]

Create a deep copy of the EnvironmentBuilder and return it
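
A minimal usage sketch; the 'Gym' environment name and the parameter keys below are illustrative assumptions, not guaranteed by this API:

    from mushroom_rl_benchmark.builders.environment_builder import EnvironmentBuilder

    # Illustrative arguments: any MushroomRL environment name and its
    # constructor parameters can be used here (keys are assumptions).
    env_builder = EnvironmentBuilder('Gym', dict(name='CartPole-v1', horizon=500, gamma=0.99))

    env = env_builder.build()                     # instantiate the environment
    EnvironmentBuilder.set_eval_mode(env, True)   # switch it into evaluation mode

    worker_builder = env_builder.copy()           # independent deep copy, e.g. one per worker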

class mushroom_rl_benchmark.builders.agent_builder.AgentBuilder(n_steps_per_fit, compute_policy_entropy=True, compute_entropy_with_states=False, preprocessors=None)[source]

Bases: object

Base class to spawn instances of a MushroomRL agent

__init__(n_steps_per_fit, compute_policy_entropy=True, compute_entropy_with_states=False, preprocessors=None)[source]

Initialize the AgentBuilder.

Parameters:
  • n_steps_per_fit (int) – number of steps per fit;
  • compute_policy_entropy (bool, True) – whether to compute the entropy of the policy;
  • compute_entropy_with_states (bool, False) – whether to compute the entropy from the states instead of the sampled actions;
  • preprocessors (list, None) – list of preprocessor classes.

set_n_steps_per_fit(n_steps_per_fit)[source]

Set n_steps_per_fit for the specific AgentBuilder

Parameters: n_steps_per_fit – number of steps per fit.
get_n_steps_per_fit()[source]

Get n_steps_per_fit for the specific AgentBuilder

set_preprocessors(preprocessors)[source]

Set the preprocessors for the specific AgentBuilder

Parameters: preprocessors – list of preprocessor classes.
get_preprocessors()[source]

Get preprocessors for the specific AgentBuilder

copy()[source]

Create a deep copy of the AgentBuilder and return it

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This function can be overridden by an agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;
  • eval (bool) – whether to set evaluation mode (True) or learning mode.
classmethod default(get_default_dict=False, **kwargs)[source]

Create a default initialization for the specific AgentBuilder and return it
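
To illustrate the lifecycle, the sketch below uses the DQNBuilder subclass documented later; the hyperparameter values are arbitrary, and build() is assumed to receive the MDPInfo of an already-built environment (e.g. env.info):

    from mushroom_rl_benchmark.builders.value.dqn import DQNBuilder

    base_builder = DQNBuilder.default(lr=1e-4)  # default() is implemented per subclass

    # Sweep n_steps_per_fit across deep copies, leaving the original untouched.
    variants = []
    for n in (1, 4, 8):
        b = base_builder.copy()
        b.set_n_steps_per_fit(n)
        variants.append(b)

    # Each variant builds a fresh agent from the environment's MDPInfo:
    # agent = variants[0].build(env.info)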

Value-Based Builders

class mushroom_rl_benchmark.builders.value.dqn.DQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for Deep Q-Network (DQN).

__init__(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy (Policy) – policy class;
  • approximator (class) – the Q-function approximator class;
  • approximator_params (dict) – parameters of the Q-function approximator;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 1) – number of steps per fit.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
set_eval_mode(agent, eval)[source]

Set the evaluation mode for the agent. This function can be overridden by an agent builder to set up a specific evaluation mode for the agent.

Parameters:
  • agent (Agent) – the considered agent;
  • eval (bool) – whether to set evaluation mode (True) or learning mode.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
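
Besides default(), the builder can also be constructed explicitly. A sketch under stated assumptions: the alg_params keys are assumed to mirror MushroomRL's DQN constructor and to be forwarded to it, and the input/output shapes of the approximator are assumed to be filled in by build() from the MDPInfo:

    import torch.optim as optim
    import torch.nn.functional as F
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.utils.parameters import Parameter
    from mushroom_rl.approximators.parametric import TorchApproximator
    from mushroom_rl_benchmark.builders.value.dqn import DQNBuilder
    from mushroom_rl_benchmark.builders.network.dqn_network import DQNNetwork

    policy = EpsGreedy(epsilon=Parameter(0.1))

    approximator_params = dict(
        network=DQNNetwork,
        optimizer={'class': optim.Adam, 'params': dict(lr=1e-4)},
        loss=F.smooth_l1_loss)

    alg_params = dict(                        # assumed to be forwarded to DQN
        initial_replay_size=50000,
        max_replay_size=1000000,
        batch_size=32,
        target_update_frequency=2500)

    builder = DQNBuilder(policy, TorchApproximator, approximator_params, alg_params)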

class mushroom_rl_benchmark.builders.value.double_dqn.DoubleDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
class mushroom_rl_benchmark.builders.value.averaged_dqn.AveragedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=10, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.prioritized_dqn.PrioritizedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.dueling_dqn.DuelingDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.maxmin_dqn.MaxminDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=3, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.noisy_dqn.NoisyDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.value.categorical_dqn.CategoricalDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, v_min=-10, v_max=10, n_atoms=51, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
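
All the value-based builders above expose the same default() entry point. When get_default_dict=True, default() is presumed to also return the dictionary of hyperparameters it used, which is handy for experiment logging; the return shape below is an assumption, not confirmed by this reference:

    from mushroom_rl_benchmark.builders.value.dqn import DQNBuilder
    from mushroom_rl_benchmark.builders.value.dueling_dqn import DuelingDQNBuilder

    for cls in (DQNBuilder, DuelingDQNBuilder):
        # Assumed return shape: (builder, dict of defaults) when get_default_dict=True.
        builder, params = cls.default(lr=5e-5, get_default_dict=True)
        print(cls.__name__, params)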

Actor-Critic Builders

class mushroom_rl_benchmark.builders.actor_critic.a2c.A2CBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Advantage Actor-Critic algorithm (A2C).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 5) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0007, critic_lr=0.0007, eps_actor=0.003, eps_critic=1e-05, batch_size=64, max_grad_norm=0.5, ent_coeff=0.01, critic_network=<class 'mushroom_rl_benchmark.builders.network.a2c_network.A2CNetwork'>, n_features=64, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
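
The actor-critic builders also accept a list of preprocessor classes. A sketch, assuming MushroomRL's StandardizationPreprocessor is available at the path below:

    from mushroom_rl.utils.preprocessors import StandardizationPreprocessor  # assumed path
    from mushroom_rl_benchmark.builders.actor_critic.a2c import A2CBuilder

    builder = A2CBuilder.default(
        actor_lr=7e-4,
        critic_lr=7e-4,
        ent_coeff=0.01,
        preprocessors=[StandardizationPreprocessor])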

class mushroom_rl_benchmark.builders.actor_critic.ddpg.DDPGBuilder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Deep Deterministic Policy Gradient algorithm (DDPG).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy_class (Policy) – policy class;
  • policy_params (dict) – parameters for the policy;
  • actor_params (dict) – parameters for the actor;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 1) – number of steps per fit.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGCriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.actor_critic.ppo.PPOBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Proximal Policy Optimization algorithm (PPO).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 3000) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(eps=0.2, n_epochs_policy=4, actor_lr=0.0003, critic_lr=0.0003, critic_fit_params=None, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, lam=0.95, batch_size=64, n_features=32, n_steps_per_fit=3000, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it

class mushroom_rl_benchmark.builders.actor_critic.sac.SACBuilder(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Soft Actor-Critic algorithm (SAC).

__init__(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]

Constructor.

Parameters:
  • actor_mu_params (dict) – parameters for actor mu;
  • actor_sigma_params (dict) – parameters for actor sigma;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_q_samples (int, 100) – number of samples used to compute the value function;
  • n_steps_per_fit (int, 1) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0003, actor_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACActorNetwork'>, critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACCriticNetwork'>, initial_replay_size=64, max_replay_size=50000, n_features=64, warmup_transitions=100, batch_size=64, tau=0.005, lr_alpha=0.003, preprocessors=None, target_entropy=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
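
A sketch pairing a SAC builder with the evaluation-mode hook; the environment name and parameter keys are illustrative assumptions:

    from mushroom_rl_benchmark.builders.environment_builder import EnvironmentBuilder
    from mushroom_rl_benchmark.builders.actor_critic.sac import SACBuilder

    env = EnvironmentBuilder('Gym', dict(name='Pendulum-v1')).build()  # illustrative keys

    builder = SACBuilder.default(actor_lr=3e-4, critic_lr=3e-4, lr_alpha=3e-3)
    agent = builder.build(env.info)

    builder.set_eval_mode(agent, True)    # e.g. act deterministically while evaluating
    # ... run evaluation episodes ...
    builder.set_eval_mode(agent, False)   # back to learning mode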

class mushroom_rl_benchmark.builders.actor_critic.td3.TD3Builder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Twin Delayed DDPG algorithm (TD3).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]

Constructor.

Parameters:
  • policy_class (Policy) – policy class;
  • policy_params (dict) – parameters for the policy;
  • actor_params (dict) – parameters for the actor;
  • actor_optimizer (dict) – parameters for the actor optimizer;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 1) – number of steps per fit.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3ActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3CriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
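
As the signatures above show, DDPGBuilder and TD3Builder share the same constructor and default() parameters, so one can be swapped for the other without changing the surrounding experiment code:

    from mushroom_rl_benchmark.builders.actor_critic.ddpg import DDPGBuilder
    from mushroom_rl_benchmark.builders.actor_critic.td3 import TD3Builder

    for cls in (DDPGBuilder, TD3Builder):
        builder = cls.default(actor_lr=1e-4, critic_lr=1e-3, n_features=[80, 80])
        # agent = builder.build(env.info)  # identical call site for both algorithms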

class mushroom_rl_benchmark.builders.actor_critic.trpo.TRPOBuilder(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder

AgentBuilder for the Trust Region Policy Optimization algorithm (TRPO).

__init__(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]

Constructor.

Parameters:
  • policy_params (dict) – parameters for the policy;
  • critic_params (dict) – parameters for the critic;
  • alg_params (dict) – parameters for the algorithm;
  • n_steps_per_fit (int, 3000) – number of steps per fit;
  • preprocessors (list, None) – list of preprocessors.
build(mdp_info)[source]

Build and return the agent

Parameters: mdp_info (MDPInfo) – information about the environment.
compute_Q(agent, states)[source]

Compute the Q-values of the given agent

Parameters:
  • agent (Agent) – the considered agent;
  • states (np.ndarray) – the set of states over which we need to compute the Q function.
classmethod default(critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, max_kl=0.01, ent_coeff=0.0, lam=0.95, batch_size=64, n_features=32, critic_fit_params=None, n_steps_per_fit=3000, n_epochs_line_search=10, n_epochs_cg=100, cg_damping=0.01, cg_residual_tol=1e-10, preprocessors=None, use_cuda=False, get_default_dict=False)[source]

Create a default initialization for the specific AgentBuilder and return it
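
Finally, agents produced by any of these builders plug directly into MushroomRL's Core loop. A minimal end-to-end sketch, with the environment arguments again illustrative:

    from mushroom_rl.core import Core
    from mushroom_rl_benchmark.builders.environment_builder import EnvironmentBuilder
    from mushroom_rl_benchmark.builders.actor_critic.trpo import TRPOBuilder

    env = EnvironmentBuilder('Gym', dict(name='HalfCheetah-v4')).build()  # illustrative keys

    builder = TRPOBuilder.default(critic_lr=3e-4, max_kl=0.01)
    agent = builder.build(env.info)

    core = Core(agent, env)
    core.learn(n_steps=100000, n_steps_per_fit=builder.get_n_steps_per_fit())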