Builders¶
class EnvironmentBuilder(env_name, env_params)[source]¶
Bases: object
Class to spawn instances of a MushroomRL environment.

__init__(env_name, env_params)[source]¶
Constructor.
Parameters:
- env_name – name of the environment to build;
- env_params – required parameters to build the specified environment.
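A minimal usage sketch (the environment name and parameters are illustrative, and the import path and build() call are assumptions based on the package layout):

    # Spawn a Gym-backed MushroomRL environment through the builder.
    from mushroom_rl_benchmark.builders import EnvironmentBuilder

    env_builder = EnvironmentBuilder('Gym', dict(name='CartPole-v1', horizon=500, gamma=0.99))
    env = env_builder.build()  # assumed builder method returning the environment instance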
class AgentBuilder(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]¶
Bases: object
Base class to spawn instances of a MushroomRL agent.

__init__(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]¶
Initialize the AgentBuilder.

set_preprocessors(preprocessors)[source]¶
Set the preprocessors for the specific AgentBuilder.
Parameters:
- preprocessors – list of preprocessor classes.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.
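Every concrete builder below follows the same pattern, sketched here (environment and step counts are illustrative; Core and Gym are standard MushroomRL classes):

    # Build an agent from the environment's MDPInfo and train it.
    from mushroom_rl.core import Core
    from mushroom_rl.environments import Gym

    env = Gym('CartPole-v1', horizon=500, gamma=0.99)
    builder = ...                        # any concrete AgentBuilder subclass from the sections below
    agent = builder.build(env.info)      # env.info is the MDPInfo consumed by build()
    core = Core(agent, env)
    core.learn(n_steps=10000, n_steps_per_fit=1)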
Policy Search Builders¶
Policy Gradient¶
class PolicyGradientBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for policy gradient methods. The current builder uses a state-dependent Gaussian policy with diagonal standard deviation and linear mean.

__init__(n_episodes_per_fit, optimizer, **kwargs)[source]¶
Constructor.
Parameters:
- n_episodes_per_fit – number of episodes per fit;
- optimizer (Optimizer) – optimizer to be used by the policy gradient algorithm;
- **kwargs – other algorithm parameters.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.
class REINFORCEBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]¶
Bases: mushroom_rl_benchmark.builders.policy_search.policy_gradient.PolicyGradientBuilder

alg_class¶
alias of mushroom_rl.algorithms.policy_search.policy_gradient.reinforce.REINFORCE

class GPOMDPBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]¶
Bases: mushroom_rl_benchmark.builders.policy_search.policy_gradient.PolicyGradientBuilder

alg_class¶
alias of mushroom_rl.algorithms.policy_search.policy_gradient.gpomdp.GPOMDP

class eNACBuilder(n_episodes_per_fit, optimizer, **kwargs)[source]¶
Bases: mushroom_rl_benchmark.builders.policy_search.policy_gradient.PolicyGradientBuilder

alg_class¶
alias of mushroom_rl.algorithms.policy_search.policy_gradient.enac.eNAC
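For example, a REINFORCE agent can be assembled as follows (a sketch; the AdaptiveOptimizer import path and hyperparameter values are assumptions based on MushroomRL's policy-gradient examples):

    # Episode-based policy gradient on a continuous-state task.
    from mushroom_rl.environments import Gym
    from mushroom_rl.utils.optimizers import AdaptiveOptimizer  # assumed import path

    env = Gym('Pendulum-v1', horizon=200, gamma=0.99)
    builder = REINFORCEBuilder(n_episodes_per_fit=25, optimizer=AdaptiveOptimizer(eps=0.01))
    agent = builder.build(env.info)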
Black-Box Optimization¶
class BBOBuilder(n_episodes_per_fit, **kwargs)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for black-box optimization methods. The current builder uses a simple deterministic linear policy and a diagonal Gaussian distribution.

__init__(n_episodes_per_fit, **kwargs)[source]¶
Constructor.
Parameters:
- n_episodes_per_fit – number of episodes per fit;
- **kwargs – other algorithm parameters.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.
class PGPEBuilder(n_episodes_per_fit, optimizer)[source]¶
Bases: mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder

alg_class¶
alias of mushroom_rl.algorithms.policy_search.black_box_optimization.pgpe.PGPE

class RWRBuilder(n_episodes_per_fit, beta)[source]¶
Bases: mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder

alg_class¶
alias of mushroom_rl.algorithms.policy_search.black_box_optimization.rwr.RWR

class REPSBuilder(n_episodes_per_fit, eps)[source]¶
Bases: mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder

alg_class¶
alias of mushroom_rl.algorithms.policy_search.black_box_optimization.reps.REPS

class ConstrainedREPSBuilder(n_episodes_per_fit, eps, kappa)[source]¶
Bases: mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder

alg_class¶
alias of mushroom_rl.algorithms.policy_search.black_box_optimization.constrained_reps.ConstrainedREPS
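Each subclass only fixes the distribution-update rule; construction stays uniform. A sketch with an illustrative KL bound, reusing env from the pattern above:

    # REPS constrains the KL divergence between successive search
    # distributions to at most eps.
    builder = REPSBuilder(n_episodes_per_fit=25, eps=0.5)
    agent = builder.build(env.info)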
Value-Based Builders¶
Temporal Difference¶
class TDFiniteBuilder(learning_rate, epsilon, epsilon_test, **alg_params)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for a generic TD algorithm (for finite states).

__init__(learning_rate, epsilon, epsilon_test, **alg_params)[source]¶
Constructor.
Parameters:
- learning_rate (Parameter) – learning rate;
- epsilon (Parameter) – exploration coefficient for learning;
- epsilon_test (Parameter) – exploration coefficient for test.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.
class QLearningBuilder(learning_rate, epsilon, epsilon_test)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder

alg_class¶
alias of mushroom_rl.algorithms.value.td.q_learning.QLearning

class SARSABuilder(learning_rate, epsilon, epsilon_test)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder

alg_class¶
alias of mushroom_rl.algorithms.value.td.sarsa.SARSA

class SpeedyQLearningBuilder(learning_rate, epsilon, epsilon_test)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder

alg_class¶
alias of mushroom_rl.algorithms.value.td.speedy_q_learning.SpeedyQLearning

class DoubleQLearningBuilder(learning_rate, epsilon, epsilon_test)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder

alg_class¶
alias of mushroom_rl.algorithms.value.td.double_q_learning.DoubleQLearning

class WeightedQLearningBuilder(learning_rate, epsilon, epsilon_test, sampling, precision)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder

alg_class¶
alias of mushroom_rl.algorithms.value.td.weighted_q_learning.WeightedQLearning
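A tabular sketch (the GridWorld constructor arguments are illustrative; Parameter is MushroomRL's wrapper for scalar hyperparameters):

    # Epsilon-greedy Q-Learning on a small grid world.
    from mushroom_rl.environments import GridWorld
    from mushroom_rl.utils.parameters import Parameter

    env = GridWorld(height=3, width=3, goal=(2, 2), start=(0, 0))
    builder = QLearningBuilder(learning_rate=Parameter(0.1),
                               epsilon=Parameter(0.3),
                               epsilon_test=Parameter(0.0))
    agent = builder.build(env.info)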
class TDTraceBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder
Builder for TD algorithms with eligibility traces and finite states.

class SARSALambdaBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_trace.TDTraceBuilder

alg_class¶
alias of mushroom_rl.algorithms.value.td.sarsa_lambda.SARSALambda

class QLambdaBuilder(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_trace.TDTraceBuilder

alg_class¶
alias of mushroom_rl.algorithms.value.td.q_lambda.QLambda
class SarsaLambdaContinuousBuilder(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_continuous.TDContinuousBuilder
AgentBuilder for SARSA(λ) with continuous states, using tiles as function approximator.

__init__(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶
Constructor.
Parameters:
- approximator (class) – Q-function approximator.

class TrueOnlineSarsaLambdaBuilder(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶
Bases: mushroom_rl_benchmark.builders.value.td.td_continuous.TDContinuousBuilder
AgentBuilder for True Online SARSA(λ) with continuous states, using tiles as function approximator.

__init__(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶
Constructor.
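A tile-coding sketch (the exact type expected for the policy argument is an assumption; all values are illustrative):

    # True Online SARSA(lambda) with 10 tilings of 10 tiles per dimension.
    from mushroom_rl.policy import EpsGreedy  # assumed policy class
    from mushroom_rl.utils.parameters import Parameter

    builder = TrueOnlineSarsaLambdaBuilder(policy=EpsGreedy,
                                           learning_rate=Parameter(0.1),
                                           lambda_coeff=0.9,
                                           epsilon=Parameter(0.1),
                                           epsilon_test=Parameter(0.0),
                                           n_tilings=10, n_tiles=10)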
DQN¶
class DQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Deep Q-Network (DQN).

__init__(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Constructor.
Parameters:
- policy (Policy) – policy class;
- approximator (class) – Q-function approximator;
- approximator_params (dict) – parameters of the Q-function approximator;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.

set_eval_mode(agent, eval)[source]¶
Set the evaluation mode for the agent. This function can be overwritten by any agent builder to set up a specific evaluation mode for the agent.
Parameters:
- agent (Agent) – the considered agent;
- eval (bool) – whether to set evaluation mode (True) or learning mode.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
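The default() classmethod is the quickest entry point. A sketch, with replay sizes scaled down from the defaults above for illustration and env as in the earlier pattern:

    # Default DQN builder with a smaller replay buffer.
    builder = DQNBuilder.default(lr=1e-4, initial_replay_size=5000,
                                 max_replay_size=100000, batch_size=32)
    agent = builder.build(env.info)
    builder.set_eval_mode(agent, True)  # switch to test-time exploration settings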
class DoubleDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder

class AveragedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=10, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
class PrioritizedDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.

class DuelingDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
class MaxminDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=3, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.

class NoisyDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.

class CategoricalDQNBuilder(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

classmethod default(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, v_min=-10, v_max=10, n_atoms=51, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
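Variant-specific keywords pass straight through default(); for instance (a sketch using the documented defaults):

    # Distributional DQN: 51 atoms spanning returns in [-10, 10].
    builder = CategoricalDQNBuilder.default(v_min=-10, v_max=10, n_atoms=51)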
Actor-Critic Builders¶
Classic AC¶
class StochasticACBuilder(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
Builder for the stochastic actor-critic algorithm, using a linear approximator with tiles for the mean, the standard deviation, and the value function. The value function approximator also uses a bias term.

__init__(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]¶
Constructor.
Parameters:
- std_0 (float) – initial standard deviation;
- alpha_theta (Parameter) – learning rate for the policy;
- alpha_v (Parameter) – learning rate for the value function;
- lambda_par (float) – lambda coefficient for the eligibility traces;
- n_tilings (int) – number of tilings to be used as approximator;
- n_tiles (int) – number of tiles for each state space dimension.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.
class COPDAC_QBuilder(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
Builder for the COPDAC-Q actor-critic algorithm, using a linear approximator with tiles for the mean and the value function.

__init__(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]¶
Constructor.
Parameters:
- std_exp (float) – exploration standard deviation;
- std_eval (float) – evaluation standard deviation;
- alpha_theta (Parameter) – learning rate for the policy;
- alpha_omega (Parameter) – learning rate for the advantage function;
- alpha_v (Parameter) – learning rate for the value function;
- n_tilings (int) – number of tilings to be used as approximator;
- n_tiles (int) – number of tiles for each state space dimension.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

set_eval_mode(agent, eval)[source]¶
Set the evaluation mode for the agent. This function can be overwritten by any agent builder to set up a specific evaluation mode for the agent.
Parameters:
- agent (Agent) – the considered agent;
- eval (bool) – whether to set evaluation mode (True) or learning mode.
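A classic actor-critic sketch (all values are illustrative; Parameter wraps scalar hyperparameters, and env is built as in the earlier pattern):

    # Tile-coded stochastic actor-critic.
    from mushroom_rl.utils.parameters import Parameter

    builder = StochasticACBuilder(std_0=1.0,
                                  alpha_theta=Parameter(0.001),
                                  alpha_v=Parameter(0.1),
                                  lambda_par=0.9,
                                  n_tilings=10, n_tiles=10)
    agent = builder.build(env.info)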
Deep AC¶
class A2CBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Advantage Actor-Critic algorithm (A2C).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]¶
Constructor.
Parameters:
- policy_params (dict) – parameters for the policy;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 5) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.

classmethod default(actor_lr=0.0007, critic_lr=0.0007, eps_actor=0.003, eps_critic=1e-05, batch_size=64, max_grad_norm=0.5, ent_coeff=0.01, critic_network=<class 'mushroom_rl_benchmark.builders.network.a2c_network.A2CNetwork'>, n_features=64, preprocessors=None, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
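For instance (a sketch; the values are the documented defaults, and the entropy bonus regularizes exploration):

    # Default A2C builder; ent_coeff weights an entropy bonus in the actor loss.
    builder = A2CBuilder.default(actor_lr=7e-4, critic_lr=7e-4, ent_coeff=0.01)
    agent = builder.build(env.info)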
class DDPGBuilder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Deep Deterministic Policy Gradient algorithm (DDPG).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶
Constructor.
Parameters:
- policy_class (Policy) – policy class;
- policy_params (dict) – parameters for the policy;
- actor_params (dict) – parameters for the actor;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.

classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGCriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
class PPOBuilder(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Proximal Policy Optimization algorithm (PPO).

__init__(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶
Constructor.
Parameters:
- policy_params (dict) – parameters for the policy;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 3000) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.

classmethod default(eps=0.2, ent_coeff=0.0, n_epochs_policy=4, actor_lr=0.0003, critic_lr=0.0003, critic_fit_params=None, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, lam=0.95, batch_size=64, n_features=32, n_steps_per_fit=3000, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
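For example (a sketch; eps is the documented clipping range of the PPO surrogate objective):

    # Default PPO builder fitting on batches of 3000 environment steps.
    builder = PPOBuilder.default(eps=0.2, n_epochs_policy=4, n_steps_per_fit=3000)
    agent = builder.build(env.info)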
class SACBuilder(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Soft Actor-Critic algorithm (SAC).

__init__(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]¶
Constructor.
Parameters:
- actor_mu_params (dict) – parameters for the actor mean;
- actor_sigma_params (dict) – parameters for the actor standard deviation;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_q_samples (int, 100) – number of samples to compute the value function;
- n_steps_per_fit (int, 1) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.

classmethod default(actor_lr=0.0003, actor_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACActorNetwork'>, critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACCriticNetwork'>, initial_replay_size=64, max_replay_size=50000, n_features=64, warmup_transitions=100, batch_size=64, tau=0.005, lr_alpha=0.003, preprocessors=None, target_entropy=None, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
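For example (a sketch; lr_alpha is the documented learning rate of the entropy temperature, which SAC tunes automatically):

    # Default SAC builder; with target_entropy=None the algorithm is
    # expected to derive a target from the action-space dimensionality.
    builder = SACBuilder.default(actor_lr=3e-4, critic_lr=3e-4, lr_alpha=3e-3)
    agent = builder.build(env.info)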
class TD3Builder(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Twin Delayed DDPG algorithm (TD3).

__init__(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶
Constructor.
Parameters:
- policy_class (Policy) – policy class;
- policy_params (dict) – parameters for the policy;
- actor_params (dict) – parameters for the actor;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.

classmethod default(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3ActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3CriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
class TRPOBuilder(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶
Bases: mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Trust Region Policy Optimization algorithm (TRPO).

__init__(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶
Constructor.
Parameters:
- policy_params (dict) – parameters for the policy;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 3000) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.

build(mdp_info)[source]¶
Build and return the agent.
Parameters:
- mdp_info (MDPInfo) – information about the environment.

compute_Q(agent, states)[source]¶
Compute the Q-value for the given agent.
Parameters:
- agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which to compute the Q function.

classmethod default(critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, max_kl=0.01, ent_coeff=0.0, lam=0.95, batch_size=64, n_features=32, critic_fit_params=None, n_steps_per_fit=3000, n_epochs_line_search=10, n_epochs_cg=100, cg_damping=0.01, cg_residual_tol=1e-10, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]¶
Create a default initialization for the specific AgentBuilder and return it.
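For example (a sketch; max_kl is the documented trust-region bound enforced by the conjugate-gradient step):

    # Default TRPO builder with a 0.01 KL trust region.
    builder = TRPOBuilder.default(max_kl=0.01, lam=0.95, n_steps_per_fit=3000)
    agent = builder.build(env.info)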