MushroomRL Benchmark¶
Reinforcement Learning Python library¶
MushroomRL Benchmark is a benchmarking tool for the MushroomRL library. Its focus is on benchmarking deep reinforcement learning algorithms, in particular Deep Actor-Critic methods. The idea behind MushroomRL Benchmark is to provide a complete platform for running batch comparisons of the Deep RL algorithms implemented in MushroomRL on a set of standard benchmark tasks.
With MushroomRL Benchmark you can:
- Run benchmarks on a local machine, both sequentially and in parallel;
- Run experiments on a SLURM-based cluster (see the sketch below).
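Below is a minimal sketch of how the two execution modes are typically selected, based on the BenchmarkSuite constructor documented in the Core functionality section. The keys of the parallel and slurm dictionaries shown here are assumptions; they are simply forwarded to the run_parallel and run_slurm methods of each experiment.

from mushroom_rl_benchmark import BenchmarkSuite  # assumed import path

# Local execution: the 'parallel' dict is forwarded to run_parallel of each experiment.
local_suite = BenchmarkSuite(
    log_dir='./logs',
    log_id='local_benchmark',
    parallel=dict(max_concurrent_runs=4),  # assumed key, see run_parallel below
)

# Cluster execution: the 'slurm' dict is forwarded to run_slurm of each experiment.
cluster_suite = BenchmarkSuite(
    log_dir='./logs',
    log_id='cluster_benchmark',
    slurm=dict(aggregation_job=True),  # assumed key, see run_slurm below
)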
Download and installation¶
MushroomRL Benchmark can be downloaded from the GitHub repository. Installation can be done by running:
cd mushroom-rl-benchmark
pip install -e .[all]
To compile the documentation:
cd mushroom-rl-benchmark/docs
make html
or, to compile the PDF version:
cd mushroom-rl-benchmark/docs
make latexpdf
Benchmarks¶
Policy Search Benchmarks¶
We provide the benchmarks for the following Policy Gradient algorithms:
- REINFORCE
- GPOMDP
- eNAC
We provide the benchmarks for the following Black-Box optimization algorithms:
- RWR
- REPS
- PGPE
- ConstrainedREPS
We consider a set of standard control environments in these benchmarks.
Actor-Critic Benchmarks¶
We provide the benchmarks for the following classical Actor-Critic algorithms:
- StochasticAC
- COPDAC_Q
We provide the benchmarks for the following Deep Actor-Critic algorithms:
- A2C
- PPO
- TRPO
- SAC
- DDPG
- TD3
We consider the following environments in the benchmark:
Classic Control Environments Benchmarks¶
Run Parameters:
n_runs: 25
n_epochs: 100
n_episodes: 10
n_episodes_test: 5
InvertedPendulum¶
Gym Control Environments Benchmarks¶
Run Parameters:
n_runs: 25
n_epochs: 10
n_steps: 30000
n_episodes_test: 10
Pendulum-v0¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: null
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 64
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 128
max_replay_size: 1000000
n_features:
- 64
- 64
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 64
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 4
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 64
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 128
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 64
preprocessors: null
target_entropy: null
tau: 0.001
warmup_transitions: 128
TD3:
actor_lr: 0.0001
actor_network: TD3ActorNetwork
batch_size: 64
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 128
max_replay_size: 1000000
n_features:
- 64
- 64
tau: 0.001
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
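For reference, the A2C entry above maps almost one-to-one onto the default() classmethod of A2CBuilder documented in the Builders section below. The following call is only a sketch of that mapping, and the import path is an assumption.

from mushroom_rl_benchmark.builders import A2CBuilder  # assumed import path

# Hyperparameters copied from the A2C entry for Pendulum-v0 above.
a2c_builder = A2CBuilder.default(
    actor_lr=0.0007,
    critic_lr=0.0007,
    eps_actor=0.003,
    eps_critic=1e-05,
    batch_size=64,
    max_grad_norm=0.5,
    ent_coeff=0.01,
    n_features=64,
    preprocessors=None,
)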
LunarLanderContinuous-v2¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: StandardizationPreprocessor
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 64
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 128
max_replay_size: 1000000
n_features:
- 64
- 64
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 64
critic_fit_params: null
critic_lr: 0.003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 4
n_features: 32
n_steps_per_fit: 3000
preprocessors: StandardizationPreprocessor
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 64
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 128
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 64
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 128
TD3:
actor_lr: 0.0001
actor_network: TD3ActorNetwork
batch_size: 64
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 128
max_replay_size: 1000000
n_features:
- 64
- 64
tau: 0.001
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.03
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 3000
preprocessors: StandardizationPreprocessor
MuJoCo Environments Benchmarks¶
Run Parameters:
n_runs: 25
n_epochs: 50
n_steps: 30000
n_episodes_test: 10
Hopper-v3¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: StandardizationPreprocessor
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 32
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 10
n_features: 32
n_steps_per_fit: 2000
preprocessors: StandardizationPreprocessor
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 1000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.001
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 1000
preprocessors: StandardizationPreprocessor
Walker2d-v3¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: StandardizationPreprocessor
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 32
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 10
n_features: 32
n_steps_per_fit: 2000
preprocessors: StandardizationPreprocessor
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 1000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.001
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 1000
preprocessors: StandardizationPreprocessor
HalfCheetah-v3¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: StandardizationPreprocessor
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 32
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 10
n_features: 32
n_steps_per_fit: 2000
preprocessors: StandardizationPreprocessor
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 10000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.001
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 1000
preprocessors: StandardizationPreprocessor
Ant-v3¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: StandardizationPreprocessor
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 32
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 10
n_features: 32
n_steps_per_fit: 2000
preprocessors: StandardizationPreprocessor
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 10000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.001
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 1000
preprocessors: StandardizationPreprocessor
Bullet Environments Benchmarks¶
Run Parameters:
n_runs: 25
n_epochs: 50
n_steps: 30000
n_episodes_test: 10
HopperBulletEnv-v0¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: null
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 64
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 4
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 1000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.003
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
Walker2DBulletEnv-v0¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: null
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 64
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 4
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 1000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.003
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
HalfCheetahBulletEnv-v0¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: null
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 64
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 4
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 10000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.003
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
AntBulletEnv-v0¶
A2C:
actor_lr: 0.0007
batch_size: 64
critic_lr: 0.0007
critic_network: A2CNetwork
ent_coeff: 0.01
eps_actor: 0.003
eps_critic: 1.0e-05
max_grad_norm: 0.5
n_features: 64
preprocessors: null
DDPG:
actor_lr: 0.0001
actor_network: DDPGActorNetwork
batch_size: 128
critic_lr: 0.001
critic_network: DDPGCriticNetwork
initial_replay_size: 5000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.001
PPO:
actor_lr: 0.0003
batch_size: 64
critic_fit_params: null
critic_lr: 0.0003
critic_network: TRPONetwork
eps: 0.2
lam: 0.95
n_epochs_policy: 4
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
SAC:
actor_lr: 0.0001
actor_network: SACActorNetwork
batch_size: 256
critic_lr: 0.0003
critic_network: SACCriticNetwork
initial_replay_size: 5000
lr_alpha: 0.0003
max_replay_size: 500000
n_features: 256
preprocessors: null
target_entropy: null
tau: 0.005
warmup_transitions: 10000
TD3:
actor_lr: 0.001
actor_network: TD3ActorNetwork
batch_size: 100
critic_lr: 0.001
critic_network: TD3CriticNetwork
initial_replay_size: 10000
max_replay_size: 1000000
n_features:
- 400
- 300
tau: 0.005
TRPO:
batch_size: 64
cg_damping: 0.01
cg_residual_tol: 1.0e-10
critic_fit_params: null
critic_lr: 0.003
critic_network: TRPONetwork
ent_coeff: 0.0
lam: 0.95
max_kl: 0.01
n_epochs_cg: 100
n_epochs_line_search: 10
n_features: 32
n_steps_per_fit: 3000
preprocessors: null
Value-Based Benchmarks¶
We provide the benchmarks for the following finite-state Temporal-Difference algorithms:
- SARSA
- QLearning
- SpeedyQLearning
- WeightedQLearning
- DoubleQLearning
- SARSALambda
- QLambda
We provide the benchmarks for the following continuous-state Temporal-Difference algorithms:
- SARSALambdaContinuous
- TrueOnlineSARSALambda
We provide the benchmarks for the following DQN algorithms:
- DQN
- PrioritizedDQN
- DoubleDQN
- AveragedDQN
- DuelingDQN
- MaxminDQN
- CategoricalDQN
- NoisyDQN
We consider the following environments in the benchmark:
Finite State Environment Benchmark¶
Run Parameters:
n_runs: 25
n_epochs: 100
n_steps: 100
n_steps_test: 1000
GridWorld¶
DoubleQLearning:
decay_eps: 0.5
decay_lr: 0.8
epsilon: ExponentialParameter
epsilon_test: 0.0
learning_rate: ExponentialParameter
QLambda:
decay_eps: 0.5
decay_lr: 0.8
epsilon: ExponentialParameter
epsilon_test: 0.0
lambda_coeff: 0.9
learning_rate: ExponentialParameter
trace: replacing
QLearning:
decay_eps: 0.5
decay_lr: 0.8
epsilon: ExponentialParameter
epsilon_test: 0.0
learning_rate: ExponentialParameter
SARSA:
decay_eps: 0.5
decay_lr: 0.8
epsilon: ExponentialParameter
epsilon_test: 0.0
learning_rate: ExponentialParameter
SARSALambda:
decay_eps: 0.5
decay_lr: 0.8
epsilon: ExponentialParameter
epsilon_test: 0.0
lambda_coeff: 0.9
learning_rate: ExponentialParameter
trace: replacing
SpeedyQLearning:
decay_eps: 0.5
decay_lr: 0.8
epsilon: ExponentialParameter
epsilon_test: 0.0
learning_rate: ExponentialParameter
WeightedQLearning:
decay_eps: 0.5
decay_lr: 0.8
epsilon: ExponentialParameter
epsilon_test: 0.0
learning_rate: ExponentialParameter
precision: 1000
sampling: true
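Reading the QLearning entry above, epsilon and learning_rate are exponentially decaying schedules with the listed decay exponents. The following is a hedged sketch of how that entry could be instantiated through the QLearningBuilder documented below; the initial values of the schedules and both import paths are assumptions.

from mushroom_rl.utils.parameters import Parameter, ExponentialParameter  # assumed import path
from mushroom_rl_benchmark.builders import QLearningBuilder  # assumed import path

# decay_eps / decay_lr above are read as the decay exponents of the schedules;
# the initial values (1.0) are assumptions, not taken from the benchmark configuration.
epsilon = ExponentialParameter(value=1.0, exp=0.5)
learning_rate = ExponentialParameter(value=1.0, exp=0.8)
epsilon_test = Parameter(value=0.0)

q_learning_builder = QLearningBuilder(
    learning_rate=learning_rate,
    epsilon=epsilon,
    epsilon_test=epsilon_test,
)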
Gym Environments Benchmarks¶
Run Parameters:
n_runs: 25
n_epochs: 100
n_steps: 1000
n_episodes_test: 10
MountainCar-v0¶
Atari Environment Benchmark¶
Run Parameters:
n_runs: 5
n_epochs: 200
n_steps: 250000
n_steps_test: 125000
BreakoutDeterministic-v4¶
AveragedDQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_approximators: 10
n_steps_per_fit: 4
network: DQNNetwork
target_update_frequency: 2500
CategoricalDQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_atoms: 51
n_features: 512
n_steps_per_fit: 4
network: DQNFeatureNetwork
target_update_frequency: 2500
v_max: 10
v_min: -10
DQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_steps_per_fit: 4
network: DQNNetwork
target_update_frequency: 2500
DoubleDQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_steps_per_fit: 4
network: DQNNetwork
target_update_frequency: 2500
DuelingDQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_features: 512
n_steps_per_fit: 4
network: DQNFeatureNetwork
target_update_frequency: 2500
MaxminDQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_approximators: 3
n_steps_per_fit: 4
network: DQNNetwork
target_update_frequency: 2500
NoisyDQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_features: 512
n_steps_per_fit: 4
network: DQNFeatureNetwork
target_update_frequency: 2500
PrioritizedDQN:
batch_size: 32
initial_replay_size: 50000
lr: 0.0001
max_replay_size: 1000000
n_steps_per_fit: 4
network: DQNNetwork
target_update_frequency: 2500
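As with the other environments, these entries correspond to the default() classmethods of the DQN builders documented below. For example, the plain DQN entry can be reproduced roughly as follows; the import path is an assumption.

from mushroom_rl_benchmark.builders import DQNBuilder  # assumed import path

# Values copied from the DQN entry for BreakoutDeterministic-v4 above;
# the network argument is left at its default (DQNNetwork).
dqn_builder = DQNBuilder.default(
    lr=0.0001,
    initial_replay_size=50000,
    max_replay_size=1000000,
    batch_size=32,
    target_update_frequency=2500,
    n_steps_per_fit=4,
)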
Core functionality¶
Suite¶
class BenchmarkSuite(log_dir=None, log_id=None, use_timestamp=True, parallel=None, slurm=None)[source]¶
Bases: object
Class to orchestrate the execution of multiple experiments.
-
__init__
(log_dir=None, log_id=None, use_timestamp=True, parallel=None, slurm=None)[source]¶ Constructor.
Parameters: - log_dir (str) – path to the log directory (Default: ./logs or /work/scratch/$USER)
- log_id (str) – log id (Default: benchmark[_YYYY-mm-dd-HH-MM-SS])
- use_timestamp (bool) – select if a timestamp should be appended to the log id
- parallel (dict, None) – parameters that are passed to the run_parallel method of the experiment
- slurm (dict, None) – parameters that are passed to the run_slurm method of the experiment
-
add_experiments
(environment_name, environment_builder_params, agent_names_list, agent_builders_params, **run_params)[source]¶ Add a set of experiments for the same environment to the suite.
Parameters: - environment_name (str) – name of the environment for the experiment (E.g. Gym.Pendulum-v0);
- environment_builder_params (dict) – parameters for the environment builder;
- agent_names_list (list) – list of names of the agents for the experiments;
- agent_builders_params (list) – list of dictionaries containing the parameters for the agent builder;
- run_params – Parameters that are passed to the run method of the experiment.
-
add_experiments_sweeps
(environment_name, environment_builder_params, agent_names_list, agent_builders_params, sweeps_list, **run_params)[source]¶ Add a set of experiments sweeps for the same environment to the suite.
Parameters: - environment_name (str) – name of the environment for the experiment (E.g. Gym.Pendulum-v0);
- environment_builder_params (dict) – parameters for the environment builder;
- agent_names_list (list) – list of names of the agents for the experiments;
- agent_builders_params (list) – list of dictionaries containing the parameters for the agent builder;
- sweeps_list (list) – list of dictionaries containing the parameter sweep to be executed;
- run_params – Parameters that are passed to the run method of the experiment.
-
add_environment
(environment_name, environment_builder_params, **run_params)[source]¶ Add an environment to the benchmarking suite.
Parameters: - environment_name (str) – name of the environment for the experiment (E.g. Gym.Pendulum-v0);
- environment_builder_params (dict) – parameters for the environment builder;
- run_params – Parameters that are passed to the run method of the experiment.
-
add_agent
(environment_name, agent_name, agent_params)[source]¶ Add an agent to the benchmarking suite.
Parameters: - environment_name (str) – name of the environment for the experiment (E.g. Gym.Pendulum-v0);
- agent_name (str) – name of the agent for the experiments;
- agent_params (list) – dictionary containing the parameters for the agent builder.
-
add_sweep
(environment_name, agent_name, agent_params, sweep_dict)[source]¶ Add an agent sweep to the benchmarking suite.
Parameters: - environment_name (str) – name of the environment for the experiment (E.g. Gym.Pendulum-v0);
- agent_name (str) – name of the agent for the experiments;
- agent_params (list) – dictionary containing the parameters for the agent builder;
- sweep_dict (dict) – dictionary with the sweep configurations.
-
save_parameters
()[source]¶ Save the experiment parameters as YAML files inside the parameters folder.
-
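A hedged usage sketch of the suite, combining the constructor and add_experiments documented above. The environment and agent parameter dictionaries and the agent names are placeholders, and the final run() call assumes the suite exposes a method that executes all added experiments.

from mushroom_rl_benchmark import BenchmarkSuite  # assumed import path

suite = BenchmarkSuite(log_dir='./logs', log_id='pendulum_benchmark',
                       parallel=dict(max_concurrent_runs=4))  # assumed key

# One environment, two agents; the builder parameter dicts are placeholders.
suite.add_experiments(
    environment_name='Gym.Pendulum-v0',
    environment_builder_params=dict(horizon=200, gamma=0.99),
    agent_names_list=['A2C', 'PPO'],
    agent_builders_params=[dict(actor_lr=7e-4), dict(actor_lr=3e-4)],
    n_runs=25, n_epochs=10, n_steps=30000, n_episodes_test=10,  # run_params
)

suite.save_parameters()
suite.run()  # assumption: executes all experiments added to the suite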
Experiment¶
class BenchmarkExperiment(agent_builder, env_builder, logger)[source]¶
Bases: object
Class to create and run an experiment using MushroomRL.
-
__init__
(agent_builder, env_builder, logger)[source]¶ Constructor.
Parameters: - agent_builder (AgentBuilder) – instance of a specific agent builder;
- env_builder (EnvironmentBuilder) – instance of an environment builder;
- logger (BenchmarkLogger) – instance of a benchmark logger.
-
run
(exec_type='sequential', **run_params)[source]¶ Execute the experiment.
Parameters: - exec_type (str, 'sequential') – type of execution for the experiment [sequential|parallel|slurm];
- **run_params – parameters for the selected execution type.
-
run_sequential
(n_runs, n_runs_completed=0, save_plot=True, **run_params)[source]¶ Execute the experiment sequentially.
Parameters: - n_runs (int) – number of total runs of the experiment;
- n_runs_completed (int, 0) – number of completed runs of the experiment;
- save_plot (bool, True) – select if a plot of the experiment should be saved to the log directory;
- **run_params – parameters for executing a benchmark run.
-
run_parallel
(n_runs, n_runs_completed=0, threading=False, save_plot=True, max_concurrent_runs=None, **run_params)[source]¶ Execute the experiment runs in parallel (processes by default, or threads if threading is set).
Parameters: - n_runs (int) – number of total runs of the experiment;
- n_runs_completed (int, 0) – number of completed runs of the experiment;
- threading (bool, False) – select to use threads instead of processes;
- save_plot (bool, True) – select if a plot of the experiment should be saved to the log directory;
- max_concurrent_runs (int, None) – maximum number of concurrent runs; by default, the number of cores is used;
- **run_params – parameters for executing a benchmark run.
-
run_slurm
(n_runs, n_runs_completed=0, aggregation_job=True, aggregate_hours=3, aggregate_minutes=0, aggregate_seconds=0, only_print=False, **run_params)[source]¶ Execute the experiment with SLURM.
Parameters: - n_runs (int) – number of total runs of the experiment;
- n_runs_completed (int, 0) – number of completed runs of the experiment;
- aggregation_job (bool, True) – select if an aggregation job should be scheduled;
- aggregate_hours (int, 3) – maximum number of hours for the aggregation job;
- aggregate_minutes (int, 0) – maximum number of minutes for the aggregation job;
- aggregate_seconds (int, 0) – maximum number of seconds for the aggregation job;
- only_print (bool, False) – if True, don’t launch the benchmarks, only print the submitted commands to the terminal;
- **run_params – parameters for executing a benchmark run.
-
extend_and_save_J
(J)[source]¶ Extend J (the list of discounted returns) with another data point and save the current state to the log directory.
-
extend_and_save_R
(R)[source]¶ Extend R (the list of undiscounted returns) with another data point and save the current state to the log directory.
-
extend_and_save_V
(V)[source]¶ Extend V (the list of value function estimates) with another data point and save the current state to the log directory.
-
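A single experiment can also be assembled directly from the builders documented below. This is a sketch under the assumption that the classes are importable as shown and that the environment parameters are placeholders; the run parameters mirror the run-parameter tables in the benchmark sections above.

from mushroom_rl_benchmark import BenchmarkExperiment, BenchmarkLogger  # assumed import paths
from mushroom_rl_benchmark.builders import EnvironmentBuilder, SACBuilder  # assumed import paths

logger = BenchmarkLogger(log_dir='./logs', log_id='sac_pendulum')
agent_builder = SACBuilder.default(actor_lr=1e-4, critic_lr=3e-4, n_features=64)
env_builder = EnvironmentBuilder('Gym.Pendulum-v0', dict(horizon=200, gamma=0.99))  # placeholder params

exp = BenchmarkExperiment(agent_builder, env_builder, logger)

# Four local runs in parallel; n_epochs/n_steps/n_episodes_test mirror the tables above.
exp.run(exec_type='parallel', n_runs=4, n_epochs=10, n_steps=30000, n_episodes_test=10)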
Logger¶
class BenchmarkLogger(log_dir=None, log_id=None, use_timestamp=True)[source]¶
Bases: mushroom_rl.core.logger.console_logger.ConsoleLogger
Class to handle all interactions with the log directory.
-
__init__
(log_dir=None, log_id=None, use_timestamp=True)[source]¶ Constructor.
Parameters: - log_dir (str, None) – path to the log directory, if not specified defaults to ./logs or to /work/scratch/$USER if the second directory exists;
- log_id (str, None) – log id, if not specified defaults to: benchmark[_YY-mm-ddTHH:MM:SS.zzz];
- use_timestamp (bool, True) – select if a timestamp should be appended to the log id.
-
set_log_dir
(log_dir)[source]¶ Set the directory for logging.
Parameters: log_dir (str) – path of the directory.
-
set_log_id
(log_id, use_timestamp=True)[source]¶ Set the id of the logged folder.
Parameters: - log_id (str) – id of the logged folder;
- use_timestamp (bool, True) – whether to use the timestamp or not.
-
get_path
(filename='')[source]¶ Get the path of the given file. If no filename is given, it returns the path of the logging folder.
Parameters: filename (str, '') – the name of the file. Returns: The complete path of the logged file.
-
get_params_path
(filename='')[source]¶ Get the path of the parameters of the given file. If no filename is given, it returns the path of the parameters folder.
Parameters: filename (str, '') – the name of the file. Returns: The complete path of the logged file.
-
get_figure_path
(filename='', subfolder=None)[source]¶ Get the path of the figures of the given file. If no filename is given, it returns the path of the figures folder.
Parameters: - filename (str, '') – the name of the file;
- subfolder (None) – the name of a subfolder to add.
Returns: The complete path of the logged file.
-
exists_value_function
()[source]¶ Returns: True if the log of the value function exists, False otherwise.
-
save_best_agent
(agent)[source]¶ Save the best agent in the respective path.
Parameters: agent (object) – the agent to save.
-
save_last_agent
(agent)[source]¶ Save the last agent in the respective path.
Parameters: agent (object) – the agent to save.
-
save_environment_builder
(env_builder)[source]¶ Save the environment builder using the respective path.
Parameters: env_builder (EnvironmentBuilder) – the environment builder to save.
-
save_agent_builder
(agent_builder)[source]¶ Save the agent builder using the respective path.
Parameters: agent_builder (AgentBuilder) – the agent builder to save.
-
save_config
(config)[source]¶ Save the config file using the respective path.
Parameters: config (str) – the config file to save.
-
save_stats
(stats)[source]¶ Save the statistic file using the respective path.
Parameters: stats (str) – the statistics file to save.
-
save_params
(env, params)[source]¶ Save the parameters file.
Parameters: - env (str) – the environment used;
- params (str) – the parameters file to save.
-
save_figure
(figure, figname, subfolder=None, as_pdf=False, transparent=True)[source]¶ Save the figure file using the respective path.
Parameters: - figure (object) – the figure to save;
- figname (str) – the name of the figure;
- subfolder (str, None) – optional subfolder where to save the figure;
- as_pdf (bool, False) – whether to save the figure in PDF or not;
- transparent (bool, True) – whether the figure should be transparent or not.
-
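A small sketch of how the logger is typically used on its own, based on the path helpers documented above; the file names used here are illustrative only, and the import path is an assumption.

from mushroom_rl_benchmark import BenchmarkLogger  # assumed import path

logger = BenchmarkLogger(log_dir='./logs', log_id='my_benchmark', use_timestamp=True)

print(logger.get_path())                # path of the logging folder
print(logger.get_path('stats.json'))    # path of a file inside it (name is illustrative)
print(logger.get_figure_path('J.png'))  # path inside the figures folder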
Visualizer¶
class BenchmarkVisualizer(logger, data=None, has_entropy=None, has_value=None, id=1)[source]¶
Bases: object
Class to handle all visualizations of the experiment.
-
plot_counter
= 0¶
-
__init__
(logger, data=None, has_entropy=None, has_value=None, id=1)[source]¶ Constructor.
Parameters: - logger (BenchmarkLogger) – logger to be used;
- data (dict, None) – dictionary with data points for visualization;
- has_entropy (bool, None) – select if entropy is available for the algorithm.
-
is_data_persisted
¶ Check if data was passed as dictionary or should be read from log directory.
-
save_report
(file_name='report_plot')[source]¶ Method to save an image of a report of the training metrics from a performed experiment.
-
show_report
()[source]¶ Method to show a report of the training metrics from a performed experiment.
-
class BenchmarkSuiteVisualizer(logger, is_sweep, color_cycle=None, y_limit=None, legend=None)[source]¶
Bases: object
Class to handle visualization of a benchmark suite.
-
plot_counter
= 0¶
-
__init__
(logger, is_sweep, color_cycle=None, y_limit=None, legend=None)[source]¶ Constructor.
Parameters: - logger (BenchmarkLogger) – logger to be used;
- is_sweep (bool) – whether the benchmark is a parameter sweep.
- color_cycle (dict, None) – dictionary with colors to be used for each algorithm;
- y_limit (dict, None) – dictionary with environment specific plot limits.
- legend (dict, None) – dictionary with environment specific legend parameters.
-
get_boxplot
(env, metric_type, data_type, selected_alg=None)[source]¶ Create boxplot with matplotlib for a given metric.
Parameters: - env (str) – The environment name;
- metric_type (str) – The metric to compute.
Returns: A figure with the desired boxplot of the given metric.
-
save_reports
(as_pdf=True, transparent=True, alg_sweep=False)[source]¶ Method to save an image of a report of the training metrics from a performed experiment.
Parameters: - as_pdf (bool, True) – whether to save the reports as pdf files or png;
- transparent (bool, True) – If true, the figure background is transparent and not white;
- alg_sweep (bool, False) – if true, the method will generate a separate figure for each algorithm sweep.
-
save_boxplots
(as_pdf=True, transparent=True, alg_sweep=False)[source]¶ Method to save boxplot images of the training metrics from a performed experiment.
Parameters: - as_pdf (bool, True) – whether to save the reports as pdf files or png;
- transparent (bool, True) – If true, the figure background is transparent and not white;
- alg_sweep (bool, False) – if true, the method will generate a separate figure for each algorithm sweep.
-
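A hedged sketch of producing a report for a finished experiment whose data is already persisted in the log directory; the import paths and the log id are assumptions.

from mushroom_rl_benchmark import BenchmarkLogger, BenchmarkVisualizer  # assumed import paths

# Point the logger at an existing experiment directory (no timestamp appended),
# then read the persisted data and save a report plot of the training metrics.
logger = BenchmarkLogger(log_dir='./logs', log_id='sac_pendulum', use_timestamp=False)
visualizer = BenchmarkVisualizer(logger)
visualizer.save_report(file_name='report_plot')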
Builders¶
class EnvironmentBuilder(env_name, env_params)[source]¶
Bases: object
Class to spawn instances of a MushroomRL environment.
-
__init__
(env_name, env_params)[source]¶ Constructor
Parameters: - env_name – name of the environment to build;
- env_params – required parameters to build the specified environment.
-
class AgentBuilder(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]¶
Bases: object
Base class to spawn instances of a MushroomRL agent.
-
__init__
(n_steps_per_fit=None, n_episodes_per_fit=None, compute_policy_entropy=True, compute_entropy_with_states=False, compute_value_function=True, preprocessors=None)[source]¶ Initialize AgentBuilder
-
set_preprocessors
(preprocessors)[source]¶ Set preprocessor for the specific AgentBuilder
Parameters: preprocessors – list of preprocessor classes.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
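The builders documented below all follow the same pattern: a subclass stores its algorithm parameters and overrides build() (and, where meaningful, compute_Q()). A schematic, non-functional sketch of that pattern, with all names hypothetical:

from mushroom_rl_benchmark.builders import AgentBuilder  # assumed import path


class MyAgentBuilder(AgentBuilder):
    """Hypothetical builder sketching the methods a subclass typically overrides."""

    def __init__(self, my_alg_param, n_steps_per_fit=1, **kwargs):
        self.my_alg_param = my_alg_param
        super().__init__(n_steps_per_fit=n_steps_per_fit, **kwargs)

    def build(self, mdp_info):
        # Construct and return a MushroomRL agent for the given MDPInfo.
        raise NotImplementedError

    def compute_Q(self, agent, states):
        # Return the agent's value estimate over the given set of states.
        raise NotImplementedError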
Policy Search Builders¶
Policy Gradient¶
-
class
PolicyGradientBuilder
(n_episodes_per_fit, optimizer, **kwargs)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Policy Gradient methods. The current builder uses a state-dependent Gaussian policy with diagonal standard deviation and a linear mean.
-
__init__
(n_episodes_per_fit, optimizer, **kwargs)[source]¶ Constructor.
Parameters: - optimizer (Optimizer) – optimizer to be used by the policy gradient algorithm;
- **kwargs – other algorithm parameters.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
-
class
REINFORCEBuilder
(n_episodes_per_fit, optimizer, **kwargs)[source]¶ Bases:
mushroom_rl_benchmark.builders.policy_search.policy_gradient.PolicyGradientBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.policy_search.policy_gradient.reinforce.REINFORCE
-
-
class
GPOMDPBuilder
(n_episodes_per_fit, optimizer, **kwargs)[source]¶ Bases:
mushroom_rl_benchmark.builders.policy_search.policy_gradient.PolicyGradientBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.policy_search.policy_gradient.gpomdp.GPOMDP
-
-
class
eNACBuilder
(n_episodes_per_fit, optimizer, **kwargs)[source]¶ Bases:
mushroom_rl_benchmark.builders.policy_search.policy_gradient.PolicyGradientBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.policy_search.policy_gradient.enac.eNAC
-
Black-Box optimization¶
-
class
BBOBuilder
(n_episodes_per_fit, **kwargs)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Black-Box optimization methods. The current builder uses a simple deterministic linear policy and a diagonal Gaussian distribution.
-
__init__
(n_episodes_per_fit, **kwargs)[source]¶ Constructor.
Parameters: - optimizer (Optimizer) – optimizer to be used by the policy gradient algorithm;
- **kwargs – other algorithm parameters.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
-
class
PGPEBuilder
(n_episodes_per_fit, optimizer)[source]¶ Bases:
mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.policy_search.black_box_optimization.pgpe.PGPE
-
-
class
RWRBuilder
(n_episodes_per_fit, beta)[source]¶ Bases:
mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.policy_search.black_box_optimization.rwr.RWR
-
-
class
REPSBuilder
(n_episodes_per_fit, eps)[source]¶ Bases:
mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.policy_search.black_box_optimization.reps.REPS
-
-
class
ConstrainedREPSBuilder
(n_episodes_per_fit, eps, kappa)[source]¶ Bases:
mushroom_rl_benchmark.builders.policy_search.black_box_optimization.BBOBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.policy_search.black_box_optimization.constrained_reps.ConstrainedREPS
-
Value Based Builders¶
Temporal Difference¶
-
class
TDFiniteBuilder
(learning_rate, epsilon, epsilon_test, **alg_params)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for a generic TD algorithm (for finite states).
-
__init__
(learning_rate, epsilon, epsilon_test, **alg_params)[source]¶ Constructor.
Parameters: - epsilon (Parameter) – exploration coefficient for learning;
- epsilon_test (Parameter) – exploration coefficient for test.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
-
class
QLearningBuilder
(learning_rate, epsilon, epsilon_test)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.value.td.q_learning.QLearning
-
-
class
SARSABuilder
(learning_rate, epsilon, epsilon_test)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.value.td.sarsa.SARSA
-
-
class
SpeedyQLearningBuilder
(learning_rate, epsilon, epsilon_test)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.value.td.speedy_q_learning.SpeedyQLearning
-
-
class
DoubleQLearningBuilder
(learning_rate, epsilon, epsilon_test)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.value.td.double_q_learning.DoubleQLearning
-
-
class
WeightedQLearningBuilder
(learning_rate, epsilon, epsilon_test, sampling, precision)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.value.td.weighted_q_learning.WeightedQLearning
-
-
class
TDTraceBuilder
(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_finite.TDFiniteBuilder
Builder for TD algorithms with eligibility traces and finite states.
-
class
SARSALambdaBuilder
(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_trace.TDTraceBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.value.td.sarsa_lambda.SARSALambda
-
-
class
QLambdaBuilder
(learning_rate, epsilon, epsilon_test, lambda_coeff, trace)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_trace.TDTraceBuilder
-
alg_class
¶ alias of
mushroom_rl.algorithms.value.td.q_lambda.QLambda
-
-
class
SarsaLambdaContinuousBuilder
(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_continuous.TDContinuousBuilder
AgentBuilder for SARSA(λ) with continuous states, using tiles as the function approximator.
-
__init__
(policy, approximator, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶ Constructor.
Parameters: approximator (class) – Q-function approximator.
-
-
class
TrueOnlineSarsaLambdaBuilder
(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.td.td_continuous.TDContinuousBuilder
AgentBuilder for True Online SARSA(λ) with continuous states, using tiles as the function approximator.
-
__init__
(policy, learning_rate, lambda_coeff, epsilon, epsilon_test, n_tilings, n_tiles)[source]¶ Constructor.
-
DQN¶
-
class
DQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Deep Q-Network (DQN).
-
__init__
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Constructor.
Parameters: - policy (Policy) – policy class;
- approximator (dict) – Q-function approximator;
- approximator_params (dict) – parameters of the Q-function approximator;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
set_eval_mode
(agent, eval)[source]¶ Set the eval mode for the agent. This function can be overridden by any agent builder to set up a specific evaluation mode for the agent.
Parameters: - agent (Agent) – the considered agent;
- eval (bool) – whether to set eval mode (true) or learn mode.
-
classmethod
default
(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
DoubleDQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder
-
class
AveragedDQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
classmethod
default
(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=10, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
PrioritizedDQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
classmethod
default
(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
DuelingDQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
classmethod
default
(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
MaxminDQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
classmethod
default
(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_steps_per_fit=1, n_approximators=3, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
NoisyDQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
classmethod
default
(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
CategoricalDQNBuilder
(policy, approximator, approximator_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.value.dqn.dqn.DQNBuilder
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
classmethod
default
(lr=0.0001, network=<class 'mushroom_rl_benchmark.builders.network.dqn_network.DQNFeatureNetwork'>, initial_replay_size=50000, max_replay_size=1000000, batch_size=32, target_update_frequency=2500, n_features=512, n_steps_per_fit=1, v_min=-10, v_max=10, n_atoms=51, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
Actor Critic Builders¶
Classic AC¶
-
class
StochasticACBuilder
(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
Builder for the stochastic actor-critic algorithm. It uses a linear approximator with tiles for the mean, the standard deviation and the value function; the value function approximator also uses a bias term.
-
__init__
(std_0, alpha_theta, alpha_v, lambda_par, n_tilings, n_tiles, **kwargs)[source]¶ Constructor.
Parameters: - std_0 (float) – initial standard deviation;
- alpha_theta (Parameter) – Learning rate for the policy;
- alpha_v (Parameter) – Learning rate for the value function;
- n_tilings (int) – number of tilings to be used as approximator;
- n_tiles (int) – number of tiles for each state space dimension.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
-
class
COPDAC_QBuilder
(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
Builder for the COPDAC-Q actor-critic algorithm. It uses a linear approximator with tiles for the mean and the value function.
-
__init__
(std_exp, std_eval, alpha_theta, alpha_omega, alpha_v, n_tilings, n_tiles, **kwargs)[source]¶ Constructor.
Parameters: - std_exp (float) – exploration standard deviation;
- std_eval (float) – evaluation standard deviation;
- alpha_theta (Parameter) – Learning rate for the policy;
- alpha_omega (Parameter) – Learning rate for the
- alpha_v (Parameter) – Learning rate for the value function;
- n_tilings (int) – number of tilings to be used as approximator;
- n_tiles (int) – number of tiles for each state space dimension.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
set_eval_mode
(agent, eval)[source]¶ Set the eval mode for the agent. This function can be overridden by any agent builder to set up a specific evaluation mode for the agent.
Parameters: - agent (Agent) – the considered agent;
- eval (bool) – whether to set eval mode (true) or learn mode.
-
Deep AC¶
-
class
A2CBuilder
(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Advantage Actor Critic algorithm (A2C)
-
__init__
(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=5, preprocessors=None)[source]¶ Constructor.
Parameters: - policy_params (dict) – parameters for the policy;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 5) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
classmethod
default
(actor_lr=0.0007, critic_lr=0.0007, eps_actor=0.003, eps_critic=1e-05, batch_size=64, max_grad_norm=0.5, ent_coeff=0.01, critic_network=<class 'mushroom_rl_benchmark.builders.network.a2c_network.A2CNetwork'>, n_features=64, preprocessors=None, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
DDPGBuilder
(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Deep Deterministic Policy Gradient algorithm (DDPG)
-
__init__
(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶ Constructor.
Parameters: - policy_class (Policy) – policy class;
- policy_params (dict) – parameters for the policy;
- actor_params (dict) – parameters for the actor;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
classmethod
default
(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.ddpg_network.DDPGCriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
PPOBuilder
(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Proximal Policy Optimization algorithm (PPO)
-
__init__
(policy_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶ Constructor.
Parameters: - policy_params (dict) – parameters for the policy;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 3000) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
classmethod
default
(eps=0.2, ent_coeff=0.0, n_epochs_policy=4, actor_lr=0.0003, critic_lr=0.0003, critic_fit_params=None, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, lam=0.95, batch_size=64, n_features=32, n_steps_per_fit=3000, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
SACBuilder
(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Soft Actor-Critic algorithm (SAC)
-
__init__
(actor_mu_params, actor_sigma_params, actor_optimizer, critic_params, alg_params, n_q_samples=100, n_steps_per_fit=1, preprocessors=None)[source]¶ Constructor.
Parameters: - actor_mu_params (dict) – parameters for actor mu;
- actor_sigma_params (dict) – parameters for actor sigma;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_q_samples (int, 100) – number of samples to compute value function;
- n_steps_per_fit (int, 1) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
classmethod
default
(actor_lr=0.0003, actor_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACActorNetwork'>, critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.sac_network.SACCriticNetwork'>, initial_replay_size=64, max_replay_size=50000, n_features=64, warmup_transitions=100, batch_size=64, tau=0.005, lr_alpha=0.003, preprocessors=None, target_entropy=None, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
TD3Builder
(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for Twin Delayed DDPG algorithm (TD3)
-
__init__
(policy_class, policy_params, actor_params, actor_optimizer, critic_params, alg_params, n_steps_per_fit=1)[source]¶ Constructor.
Parameters: - policy_class (Policy) – policy class;
- policy_params (dict) – parameters for the policy;
- actor_params (dict) – parameters for the actor;
- actor_optimizer (dict) – parameters for the actor optimizer;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 1) – number of steps per fit.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
classmethod
default
(actor_lr=0.0001, actor_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3ActorNetwork'>, critic_lr=0.001, critic_network=<class 'mushroom_rl_benchmark.builders.network.td3_network.TD3CriticNetwork'>, initial_replay_size=500, max_replay_size=50000, batch_size=64, n_features=[80, 80], tau=0.001, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
-
class
TRPOBuilder
(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶ Bases:
mushroom_rl_benchmark.builders.agent_builder.AgentBuilder
AgentBuilder for the Trust Region Policy Optimization algorithm (TRPO)
-
__init__
(policy_params, critic_params, alg_params, n_steps_per_fit=3000, preprocessors=None)[source]¶ Constructor.
Parameters: - policy_params (dict) – parameters for the policy;
- critic_params (dict) – parameters for the critic;
- alg_params (dict) – parameters for the algorithm;
- n_steps_per_fit (int, 3000) – number of steps per fit;
- preprocessors (list, None) – list of preprocessors.
-
build
(mdp_info)[source]¶ Build and return the AgentBuilder
Parameters: mdp_info (MDPInfo) – information about the environment.
-
compute_Q
(agent, states)[source]¶ Compute the Q Value for an AgentBuilder
Parameters: - agent (Agent) – the considered agent;
- states (np.ndarray) – the set of states over which we need to compute the Q function.
-
classmethod
default
(critic_lr=0.0003, critic_network=<class 'mushroom_rl_benchmark.builders.network.trpo_network.TRPONetwork'>, max_kl=0.01, ent_coeff=0.0, lam=0.95, batch_size=64, n_features=32, critic_fit_params=None, n_steps_per_fit=3000, n_epochs_line_search=10, n_epochs_cg=100, cg_damping=0.01, cg_residual_tol=1e-10, std_0=1.0, preprocessors=None, use_cuda=False, get_default_dict=False)[source]¶ Create a default initialization for the specific AgentBuilder and return it
-
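Example (sketch; import path assumed):
from mushroom_rl_benchmark.builders import TRPOBuilder  # import path assumed

# Default TRPO builder; the remaining hyperparameters keep the
# documented defaults.
trpo_builder = TRPOBuilder.default(max_kl=0.01, lam=0.95)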
Networks¶
-
class
A2CNetwork
(input_shape, output_shape, n_features, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, n_features, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state, **kwargs)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
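The note above applies to every network in this section: call the module instance rather than its forward method. A short sketch (the module path of A2CNetwork and the shapes are assumptions; inside the builders the shapes are derived from the environment):
import torch
from mushroom_rl_benchmark.builders.network.a2c_network import A2CNetwork  # module path assumed

# Critic network for a 3-dimensional observation space with a scalar
# output; n_features follows the benchmark configuration.
critic = A2CNetwork(input_shape=(3,), output_shape=(1,), n_features=64)

state = torch.rand(10, 3)  # batch of 10 states
value = critic(state)      # call the Module instance, not critic.forward(state)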
-
class
DDPGCriticNetwork
(input_shape, output_shape, n_features, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, n_features, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state, action)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
DDPGActorNetwork
(input_shape, output_shape, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
SACCriticNetwork
(input_shape, output_shape, n_features, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, n_features, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state, action)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
SACActorNetwork
(input_shape, output_shape, n_features, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, n_features, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
TD3CriticNetwork
(input_shape, output_shape, n_features, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, n_features, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state, action)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
TD3ActorNetwork
(input_shape, output_shape, n_features, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, n_features, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
TRPONetwork
(input_shape, output_shape, n_features, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
-
__init__
(input_shape, output_shape, n_features, **kwargs)[source]¶ Initializes internal Module state, shared by both nn.Module and ScriptModule.
-
forward
(state, **kwargs)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
Experiment¶
-
exec_run
(agent_builder, env_builder, n_epochs, n_steps=None, n_episodes=None, n_steps_test=None, n_episodes_test=None, seed=None, save_agent=False, quiet=True, **kwargs)[source]¶ Function that handles the execution of an experiment run.
Parameters: - agent_builder (AgentBuilder) – agent builder to spawn an agent;
- env_builder (EnvironmentBuilder) – environment builder to spawn an environment;
- n_epochs (int) – number of epochs;
- n_steps (int, None) – number of steps per epoch;
- n_episodes (int, None) – number of episodes per epoch;
- n_steps_test (int, None) – number of steps for testing;
- n_episodes_test (int, None) – number of episodes for testing;
- seed (int, None) – the seed;
- save_agent (bool, False) – whether the trained agent should be saved;
- quiet (bool, True) – whether to suppress execution information during the run.
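Example (sketch using the Gym benchmark parameters; the import paths and the EnvironmentBuilder signature are assumptions, and horizon/gamma are illustrative values):
from mushroom_rl_benchmark.experiment import exec_run  # import path assumed
from mushroom_rl_benchmark.builders import SACBuilder, EnvironmentBuilder  # import paths assumed

# Builders for the agent and the environment; EnvironmentBuilder(env_name,
# env_params) is assumed here.
agent_builder = SACBuilder.default()
env_builder = EnvironmentBuilder('Gym', dict(env_id='Pendulum-v0',
                                             horizon=200, gamma=0.99))

# 10 epochs of 30000 steps each, evaluated on 10 test episodes per epoch.
exec_run(agent_builder, env_builder,
         n_epochs=10, n_steps=30000, n_episodes_test=10,
         seed=0, save_agent=False, quiet=False)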
-
compute_metrics
(core, eval_params, agent_builder, env_builder)[source]¶ Function to compute the metrics.
Parameters: - core (Core) – the MushroomRL core used to run the evaluation;
- eval_params (dict) – parameters for running the evaluation;
- agent_builder (AgentBuilder) – the agent builder;
- env_builder (EnvironmentBuilder) – environment builder to spawn an environment.
-
print_metrics
(logger, epoch, J, R, V, E)[source]¶ Function that pretty-prints the metrics on the standard output.
Parameters: - logger (Logger) – MushroomRL logger object;
- epoch (int) – the current epoch;
- J (float) – the current value of J;
- R (float) – the current value of R;
- V (float) – the current value of V;
- E (float) – the current value of E (set to None if not defined).
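Example (sketch; the import paths are assumptions and the numeric values are purely illustrative):
from mushroom_rl.core import Logger
from mushroom_rl_benchmark.experiment import print_metrics  # import path assumed

logger = Logger('benchmark_demo', results_dir=None)

# J: discounted return, R: undiscounted return, V: value estimate,
# E: policy entropy (None if not defined).
print_metrics(logger, epoch=1, J=-150.3, R=-140.2, V=-130.5, E=None)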
Slurm utilities¶
-
aggregate_results
(res_dir, res_id, console_log_dir=None)[source]¶ Function to aggregate the benchmark results from running in SLURM mode.
Parameters: - res_dir (str) – path to the result directory;
- res_id (str) – log id of the result directory;
- console_log_dir (str, None) – path used for the console log.
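Example (sketch; the import path, directory layout and log id are assumptions):
from mushroom_rl_benchmark.utils import aggregate_results  # import path assumed

# Merge the per-job results of a SLURM benchmark stored under ./logs.
aggregate_results('./logs', 'sac_pendulum')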
-
read_arguments_run
(arg_string=None)[source]¶ Parse the arguments for the run script.
Parameters: arg_string (str, None) – the argument string to parse.
-
read_arguments_aggregate
(arg_string=None)[source]¶ Parse the arguments for the aggregate script.
Parameters: arg_string (str, None) – the argument string to parse.
-
create_slurm_script
(slurm_path, slurm_script_name='slurm.sh', **slurm_params)[source]¶ Function to create a slurm script in a specific directory.
Parameters: - slurm_path (str) – path to locate the slurm script;
- slurm_script_name (str, slurm.sh) – name of the slurm script;
- **slurm_params – parameters for generating the slurm file content.
Returns: The path to the slurm script.
-
generate_slurm
(exp_name, exp_dir_slurm, python_file, gres=None, project_name=None, n_exp=1, max_concurrent_runs=None, memory=2000, hours=24, minutes=0, seconds=0)[source]¶ Function to generate the slurm file content.
Parameters: - exp_name (str) – name of the experiment;
- exp_dir_slurm (str) – directory where the slurm log files are located;
- python_file (str) – path to the python file that should be executed;
- gres (str, None) – request cluster resources, e.g. to add a GPU on the IAS cluster specify gres='gpu:rtx2080:1';
- project_name (str, None) – name of the slurm project;
- n_exp (int, 1) – number of experiments in the slurm array;
- max_concurrent_runs (int, None) – maximum number of runs that should be executed in parallel on the SLURM cluster;
- memory (int, 2000) – memory limit in megabytes (MB) for the slurm jobs;
- hours (int, 24) – maximum number of execution hours for the slurm jobs;
- minutes (int, 0) – maximum number of execution minutes for the slurm jobs;
- seconds (int, 0) – maximum number of execution seconds for the slurm jobs.
Returns: The slurm script as string.
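Example (sketch; import path assumed). The keyword arguments passed to create_slurm_script besides the script name are forwarded to generate_slurm to produce the file content:
from mushroom_rl_benchmark.utils import create_slurm_script  # import path assumed

# Write ./logs/slurm/slurm.sh for a SLURM array of 25 runs with 2000 MB
# of memory and 24 hours of wall time per job.
script_path = create_slurm_script(
    slurm_path='./logs/slurm',
    exp_name='sac_pendulum',
    exp_dir_slurm='./logs/slurm',
    python_file='run_experiment.py',  # hypothetical experiment script
    n_exp=25,
    memory=2000,
    hours=24)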
Utils¶
-
get_init_states
(dataset)[source]¶ Get the initial states of a MushroomRL dataset.
Parameters: dataset (Dataset) – a MushroomRL dataset.
-
extract_arguments
(args, method)[source]¶ Extract the arguments from a dictionary that fit a method's parameters.
Parameters: - args (dict) – dictionary of arguments;
- method (function) – method for which the arguments should be extracted.
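Example (sketch; the import path is an assumption and the helper function is hypothetical; the return value is presumably the filtered dictionary):
from mushroom_rl_benchmark.utils import extract_arguments  # import path assumed

def build_critic(critic_lr, n_features):
    # hypothetical helper, used only to illustrate the filtering
    return dict(critic_lr=critic_lr, n_features=n_features)

params = dict(actor_lr=3e-4, critic_lr=3e-4, n_features=64, batch_size=64)

# Keeps only the entries matching build_critic's parameters, presumably
# {'critic_lr': 0.0003, 'n_features': 64}.
critic_args = extract_arguments(params, build_critic)
critic = build_critic(**critic_args)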
-
object_to_primitive
(obj)[source]¶ Convert an object into a string using its class name.
Parameters: obj – the object to convert. Returns: A string representing the object.
-
dictionary_to_primitive
(data)[source]¶ Function that converts a dictionary by transforming any objects inside into strings.
Parameters: data (dict) – the dictionary to convert. Returns: The converted dictionary.
-
get_mean_and_confidence
(data)[source]¶ Compute the mean and the 95% confidence interval.
Parameters: data (np.ndarray) – Array of experiment data of shape (n_runs, n_epochs). Returns: The mean of the dataset at each epoch along with the confidence interval.
-
plot_mean_conf
(data, ax, color='blue', line='-', facecolor=None, alpha=0.4, label=None)[source]¶ Function to plot the mean and confidence interval of the data on a pyplot axis.
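Example (sketch; the import path is an assumption and the data is random, only the documented (n_runs, n_epochs) shape matters):
import numpy as np
import matplotlib.pyplot as plt
from mushroom_rl_benchmark.utils import get_mean_and_confidence, plot_mean_conf  # import path assumed

# Fake learning curves: 25 runs, 100 epochs.
data = np.random.randn(25, 100).cumsum(axis=1)

# Presumably returns the per-epoch mean and the 95% confidence interval.
mean, ci = get_mean_and_confidence(data)

fig, ax = plt.subplots()
plot_mean_conf(data, ax, color='blue', label='SAC')
ax.set_xlabel('epoch')
ax.set_ylabel('J')
ax.legend()
plt.show()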
-
build_sweep_list
(algs, sweep_conf, base_name='c_')[source]¶ Build the sweep list for every considered algorithm from a compact dictionary specification.
Parameters: - algs (list) – list of algorithms to be considered;
- sweep_conf (dict) – dictionary with a compact sweep configuration for every algorithm;
- base_name (str, 'c_') – base name for the sweep configuration.
Returns: The sweep list to be used with the suite.
-
build_sweep_dict
(base_name='c_', **kwargs)[source]¶ Build the sweep dictionary from a set of variable specifications.
Parameters: - base_name (str, 'c_') – base name for the sweep configuiration;
- **kwargs – the parameter specifications for the sweep.
Returns: The sweep dictionary, where the key is the sweep name and the value is a dictionary with the sweep parameters.
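Example (sketch; the import path is an assumption, and using lists of values in the compact configuration is an assumption about the expected format):
from mushroom_rl_benchmark.utils import build_sweep_dict, build_sweep_list  # import path assumed

# Compact sweep over two actor learning rates.
sweep_dict = build_sweep_dict(actor_lr=[1e-4, 3e-4])

# Per-algorithm sweep list for SAC and TD3 from a compact configuration.
sweep_conf = dict(SAC=dict(actor_lr=[1e-4, 3e-4]),
                  TD3=dict(actor_lr=[1e-4, 3e-4]))
sweep_list = build_sweep_list(['SAC', 'TD3'], sweep_conf)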