Builders
class mushroom_rl_benchmark.builders.environment_builder.EnvironmentBuilder(env_name, env_params)
Bases: object
Class to spawn instances of a MushroomRL environment.

__init__(env_name, env_params)
Constructor.
Parameters:
- env_name – name of the environment to build;
- env_params – required parameters to build the specified environment.

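The idea of spawning environments from a stored name and parameter set can be sketched as follows. This is a minimal, dependency-free illustration, not the library's implementation: `ToyEnvironment`, the `ENVIRONMENTS` registry, `EnvironmentBuilderSketch`, and the `build` method name are all assumptions, since the entry above documents only the constructor.

```python
import copy


class ToyEnvironment:
    """Hypothetical environment; stands in for a MushroomRL environment."""

    def __init__(self, horizon=100, gamma=0.99):
        self.horizon = horizon
        self.gamma = gamma


# Hypothetical registry mapping environment names to classes.
ENVIRONMENTS = {"Toy": ToyEnvironment}


class EnvironmentBuilderSketch:
    """Stores an environment name and its parameters, then spawns instances."""

    def __init__(self, env_name, env_params):
        self.env_name = env_name
        self.env_params = env_params

    def build(self):
        # Assumed method name. The parameters are deep-copied so that
        # each spawned instance is independent of the others.
        params = copy.deepcopy(self.env_params)
        return ENVIRONMENTS[self.env_name](**params)


builder = EnvironmentBuilderSketch("Toy", {"horizon": 50, "gamma": 0.9})
env_a = builder.build()
env_b = builder.build()
```

Each call to `build` returns a fresh instance, which is what makes a builder convenient when several independent runs need their own environment.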
class mushroom_rl_benchmark.builders.agent_builder.AgentBuilder(n_steps_per_fit, compute_policy_entropy=True, compute_entropy_with_states=False, preprocessors=None)
Bases: object
Base class to spawn instances of a MushroomRL agent.

__init__(n_steps_per_fit, compute_policy_entropy=True, compute_entropy_with_states=False, preprocessors=None)
Initialize the AgentBuilder.

set_n_steps_per_fit(n_steps_per_fit)
Set n_steps_per_fit for the specific AgentBuilder.
Parameters: n_steps_per_fit – number of steps per fit.

set_preprocessors(preprocessors)
Set the preprocessors for the specific AgentBuilder.
Parameters: preprocessors – list of preprocessor classes.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

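The base-class contract above (shared settings on the builder, with `build` and `compute_Q` left to subclasses) can be sketched like this. It is an illustration only: `AgentBuilderSketch`, `ToyAgent`, and `ToyAgentBuilder` are hypothetical names, and the choice to raise `NotImplementedError` in the base class is an assumption about how such a base class is usually written.

```python
class ToyAgent:
    """Hypothetical stand-in for an agent; not a MushroomRL class."""

    def __init__(self, mdp_info, n_steps_per_fit, preprocessors):
        self.mdp_info = mdp_info
        self.n_steps_per_fit = n_steps_per_fit
        self.preprocessors = preprocessors


class AgentBuilderSketch:
    """Holds shared settings; subclasses decide which agent to build."""

    def __init__(self, n_steps_per_fit, compute_policy_entropy=True,
                 compute_entropy_with_states=False, preprocessors=None):
        self.n_steps_per_fit = n_steps_per_fit
        self.compute_policy_entropy = compute_policy_entropy
        self.compute_entropy_with_states = compute_entropy_with_states
        self.preprocessors = preprocessors

    def set_n_steps_per_fit(self, n_steps_per_fit):
        self.n_steps_per_fit = n_steps_per_fit

    def set_preprocessors(self, preprocessors):
        self.preprocessors = preprocessors

    def build(self, mdp_info):
        # Assumption: the base class is abstract and defers to subclasses.
        raise NotImplementedError("subclasses must implement build")

    def compute_Q(self, agent, states):
        raise NotImplementedError("subclasses must implement compute_Q")


class ToyAgentBuilder(AgentBuilderSketch):
    def build(self, mdp_info):
        # The builder passes its stored settings on to the new agent.
        return ToyAgent(mdp_info, self.n_steps_per_fit, self.preprocessors)


builder = ToyAgentBuilder(n_steps_per_fit=1)
builder.set_n_steps_per_fit(5)
agent = builder.build(mdp_info={"gamma": 0.99})
```

Because the settings live on the builder, they can be changed between calls to `build` without touching agents that already exist.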
Value Based Builders
class mushroom_rl_benchmark.builders.value.dqn.DQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Deep Q-Network (DQN).

__init__(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Constructor.
Parameters:
- policy (Policy) – policy class;
- approximator (dict) – Q-function approximator;
- approximator_params (dict) – parameters of the Q-function approximator;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

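The entry above says only that `compute_Q` evaluates the Q function over a set of states. One plausible way to reduce that to a single number, assumed here and not confirmed by the text, is the mean over states of the maximum Q-value. In this sketch plain Python lists stand in for the `np.ndarray` of states, and `toy_q` is a hypothetical Q-function.

```python
def mean_max_q(q_function, states):
    """Average, over the given states, of the best action value in each state."""
    q_max = [max(q_function(state)) for state in states]
    return sum(q_max) / len(q_max)


def toy_q(state):
    # Hypothetical Q-function with two actions for a scalar state.
    return [state, 1.0 - state]


value = mean_max_q(toy_q, [0.0, 0.5, 1.0])
```

A summary of this kind is useful as a learning curve: it can be tracked across epochs on a fixed set of states.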
set_eval_mode(agent, eval)
Set the eval mode for the agent. This function can be overridden by any agent builder to set up a specific evaluation mode for the agent.
Parameters:
- agent (Agent) – the considered agent;
- eval (bool) – whether to set eval mode (True) or learn mode (False).

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

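The `default` classmethod pattern, where keyword arguments with sensible defaults are turned into the dictionaries the constructor expects, might look roughly like the sketch below. Several things are assumptions: the class name `DQNBuilderSketch`, the placeholder strings used for the policy and approximator, the layout of the parameter dictionaries, and in particular the behavior of `get_default_dict`, which the reference does not describe (the sketch assumes it makes `default` also return the dictionary of defaults).

```python
class DQNBuilderSketch:
    """Illustrative stand-in; not the library's DQNBuilder."""

    def __init__(self, policy, approximator, approximator_params,
                 alg_params, n_steps_per_fit=1):
        self.policy = policy
        self.approximator = approximator
        self.approximator_params = approximator_params
        self.alg_params = alg_params
        self.n_steps_per_fit = n_steps_per_fit

    @classmethod
    def default(cls, lr=0.0001, initial_replay_size=50000,
                max_replay_size=1000000, batch_size=32,
                target_update_frequency=2500, n_steps_per_fit=1,
                get_default_dict=False):
        # Snapshot the keyword values before any other local is created.
        defaults = dict(locals())
        defaults.pop("cls")
        defaults.pop("get_default_dict")

        # Hypothetical layout of the parameter dictionaries.
        approximator_params = {"optimizer": {"lr": lr}}
        alg_params = {
            "initial_replay_size": initial_replay_size,
            "max_replay_size": max_replay_size,
            "batch_size": batch_size,
            "target_update_frequency": target_update_frequency,
        }
        # Placeholder strings stand in for the policy and network classes.
        builder = cls("policy-placeholder", "approximator-placeholder",
                      approximator_params, alg_params, n_steps_per_fit)

        # Assumption: the flag additionally returns the defaults used.
        if get_default_dict:
            return builder, defaults
        return builder


builder, defaults = DQNBuilderSketch.default(lr=0.001, get_default_dict=True)
```

Keeping the defaults in one classmethod means a benchmark script can override a single hyperparameter, as `lr` is overridden here, and inherit the rest.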
class mushroom_rl_benchmark.builders.value.double_dqn.DoubleDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)

class mushroom_rl_benchmark.builders.value.averaged_dqn.AveragedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=10, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

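The `n_approximators` argument reflects the core idea of Averaged DQN: the target is computed from the average of several Q-estimates rather than a single one. The sketch below shows only that averaging step on plain lists. `averaged_q` is a hypothetical helper, not part of the library.

```python
def averaged_q(q_estimates):
    """Average K Q-estimates action by action.

    q_estimates is a list of K lists, each holding one Q-value per action.
    """
    n_estimates = len(q_estimates)
    n_actions = len(q_estimates[0])
    return [sum(q[a] for q in q_estimates) / n_estimates
            for a in range(n_actions)]


# Two estimates over two actions.
q_avg = averaged_q([[1.0, 2.0], [3.0, 0.0]])
```

Averaging reduces the variance of the target, which is the motivation for keeping several approximators around.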
class mushroom_rl_benchmark.builders.value.prioritized_dqn.PrioritizedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

class mushroom_rl_benchmark.builders.value.dueling_dqn.DuelingDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

class mushroom_rl_benchmark.builders.value.maxmin_dqn.MaxminDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=3, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

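Here `n_approximators` has a different role than in Averaged DQN. Maxmin DQN takes, for each action, the minimum over the Q-estimates and then maximizes over actions. A minimal sketch of that operator on plain lists follows. `maxmin_value` is a hypothetical helper, not a library function.

```python
def maxmin_value(q_estimates):
    """Max over actions of the per-action minimum across Q-estimates.

    q_estimates is a list of N lists, each holding one Q-value per action.
    """
    n_actions = len(q_estimates[0])
    # Pessimistic estimate for each action: the smallest of the N values.
    q_min = [min(q[a] for q in q_estimates) for a in range(n_actions)]
    return max(q_min)


# Three estimates (matching the default n_approximators=3), two actions.
value = maxmin_value([[1.0, 5.0], [2.0, 0.5], [3.0, 4.0]])
```

In this example the second action looks best to two of the three estimates, but its minimum is 0.5, so the first action (minimum 1.0) determines the result. Taking the minimum first is what counters overestimation.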
class mushroom_rl_benchmark.builders.value.noisy_dqn.NoisyDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

class mushroom_rl_benchmark.builders.value.categorical_dqn.CategoricalDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.value.dqn.DQNBuilder

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, v_min=-10, v_max=10, n_atoms=51, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

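The `v_min`, `v_max`, and `n_atoms` defaults define the fixed support over which Categorical DQN represents the return distribution. The sketch below works that out for the default values and shows how a Q-value is recovered as the expectation over the atoms. `atom_support` and `expected_value` are hypothetical helpers written for this illustration.

```python
def atom_support(v_min=-10, v_max=10, n_atoms=51):
    """Return the evenly spaced atoms and the spacing between them."""
    delta = (v_max - v_min) / (n_atoms - 1)
    atoms = [v_min + i * delta for i in range(n_atoms)]
    return atoms, delta


def expected_value(probabilities, atoms):
    """Q-value as the expectation of the categorical return distribution."""
    return sum(p * z for p, z in zip(probabilities, atoms))


atoms, delta = atom_support()

# A uniform distribution over a symmetric support has expectation ~0.
uniform = [1.0 / len(atoms)] * len(atoms)
q_value = expected_value(uniform, atoms)
```

With the defaults the 51 atoms are spaced 0.4 apart between -10 and 10, so returns outside that range cannot be represented and the bounds should cover the environment's return scale.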
Actor Critic Builders
class mushroom_rl_benchmark.builders.actor_critic.a2c.A2CBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Advantage Actor Critic algorithm (A2C).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)
Constructor.
Parameters:
- policy_params (dict) – parameters for the policy;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 5) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0007, critic_lr=0.0007, eps_actor=0.003, eps_critic=1e-05, batch_size=64, max_grad_norm=0.5, ent_coeff=0.01, critic_network=<class 'mushroom_rl_benchmark.builders.network.a2c_network.A2CNetwork'>, n_features=64, preprocessors=None, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

class mushroom_rl_benchmark.builders.actor_critic.ddpg.DDPGBuilder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Deep Deterministic Policy Gradient algorithm (DDPG).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)
Constructor.
Parameters:
- policy_class (Policy) – policy class;
- policy_params (dict) – parameters for the policy;
- actor_params (dict) – parameters for the actor;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGCriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

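The `tau` default controls the soft (Polyak) update that DDPG applies to its target networks: each target weight moves a fraction `tau` of the way toward the corresponding online weight. A minimal sketch on flat lists of weights follows. `soft_update` is a hypothetical helper, and real networks would apply the same rule tensor by tensor.

```python
def soft_update(target_weights, online_weights, tau=0.001):
    """Move each target weight a fraction tau toward the online weight."""
    return [tau * w + (1.0 - tau) * t
            for w, t in zip(online_weights, target_weights)]


target = [0.0, 2.0]
online = [1.0, 2.0]
target = soft_update(target, online)
```

With `tau=0.001` the first weight moves from 0.0 to roughly 0.001, while the second, already equal to its online counterpart, stays where it is. A small `tau` keeps the targets slow-moving, which stabilizes the critic's bootstrapped updates.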
class mushroom_rl_benchmark.builders.actor_critic.ppo.PPOBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Proximal Policy Optimization algorithm (PPO).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)
Constructor.
Parameters:
- policy_params (dict) – parameters for the policy;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 3000) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(eps=0.2, n_epochs_policy=4, actor_lr=0.0003, critic_lr=0.0003, critic_fit_params=None, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, lam=0.95, batch_size=64, n_features=32, n_steps_per_fit=3000, preprocessors=None, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

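Assuming `eps` is PPO's clipping range, as the default of 0.2 suggests, it enters the clipped surrogate objective, which limits how far the probability ratio between the new and old policy can push the update. The per-sample term can be sketched as below. `clipped_surrogate` is a hypothetical helper for illustration, not a library function.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO objective: min of unclipped and clipped terms."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: gains from raising the ratio above 1 + eps are cut off.
gain_capped = clipped_surrogate(ratio=1.5, advantage=1.0)

# Negative advantage: the pessimistic (more negative) term is kept.
loss_kept = clipped_surrogate(ratio=0.5, advantage=-1.0)
```

In the first case the objective is capped at 1.2 instead of 1.5. In the second the clipped term -0.8 is lower than the unclipped -0.5 and is the one retained, so the objective never rewards moving the policy further than the clipping range allows.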
class mushroom_rl_benchmark.builders.actor_critic.sac.SACBuilder(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Soft Actor-Critic algorithm (SAC).

__init__(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)
Constructor.
Parameters:
- actor_mu_params (dict) – parameters for actor mu;
- actor_sigma_params (dict) – parameters for actor sigma;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_q_samples (int, 100) – number of samples to compute value function;
- n_steps_per_fit (int, 1) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0003, actor_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACActorNetwork'>, critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACCriticNetwork'>, initial_replay_size=64, max_replay_size=50000, n_features=64, warmup_transitions=100, batch_size=64, tau=0.005, lr_alpha=0.003, preprocessors=None, target_entropy=None, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

class mushroom_rl_benchmark.builders.actor_critic.td3.TD3Builder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Twin Delayed DDPG algorithm (TD3).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)
Constructor.
Parameters:
- policy_class (Policy) – policy class;
- policy_params (dict) – parameters for the policy;
- actor_params (dict) – parameters for the actor;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3ActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3CriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.

class mushroom_rl_benchmark.builders.actor_critic.trpo.TRPOBuilder(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Trust Region Policy Optimization algorithm (TRPO).

__init__(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)
Constructor.
Parameters:
- policy_params (dict) – parameters for the policy;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 3000) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)
Build and return the agent.
Parameters: mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)
Compute the Q-value of the given agent over a set of states.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.

classmethod default(critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, max_kl=0.01, ent_coeff=0.0, lam=0.95, batch_size=64, n_features=32, critic_fit_params=None, n_steps_per_fit=3000, n_epochs_line_search=10, n_epochs_cg=100, cg_damping=0.01, cg_residual_tol=1e-10, preprocessors=None, use_cuda=False, get_default_dict=False)
Create a default initialization for the specific AgentBuilder and return it.