Puppy

Introduction

Puppy experiments are executed within the [Webots] simulator. Since this module is linked to PuPy through the ActorCritic class, this is the natural approach. For Puppy, an adapted Actor-Critic is implemented in PuppyHDP, which handles Puppy's specifics. It inherits from CollectingADHDP and can hence be used in the same fashion.

Simulation with [Webots] is often time-consuming. Therefore, a method is provided to collect data during simulation and replay it later. This is implemented through OfflineCollector and puppy.offline_playback(). An example of this approach is documented in Puppy offline workflow.
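A minimal sketch of this two-stage workflow (the datafile name, the numeric values and the elided constructor arguments are placeholders; they depend on the concrete experiment setup):

```python
# Stage 1, inside the Webots controller: record sensor data instead of
# learning. Reservoir and plant are stubbed by OfflineCollector, so they
# need not be passed (remaining constructor arguments omitted here).
collector = HDPy.puppy.OfflineCollector(...)

# Stage 2, offline and without the simulator: replay the recorded episodes
# for a critic. The numeric values below are illustrative only.
critic = HDPy.puppy.PuppyHDP(...)
HDPy.puppy.offline_playback(
    'puppy_data.hdf5',        # placeholder: datafile written during stage 1
    critic,
    samples_per_action=150,   # must correspond to the recorded data
    ms_per_step=20,           # sensor sampling period
    min_episode_len=10,       # skip episodes shorter than this
)
```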

For the analysis of Puppy experiments, snapshot functions are implemented in a similar fashion as for the ePuck robot. For Puppy, however, the action is assumed to be two-dimensional, so the action snapshot is an image (a 2D plot). Through puppy_plot_action(), this figure is plotted at a specific state (identified by the epoch index of a recorded episode). Furthermore, the overall trajectory and the locations of the inspected states can be plotted through puppy_plot_inspected_trajectory(). These functions can be used either at isolated states or in a video-like fashion; for the latter case, PuppyActionVideo implements the necessary routines.

The environment plotting can be managed through the functions puppy_plot_linetarget(), puppy_plot_locationtarget() and puppy_plot_landmarks(), depending on the training target (as defined in Puppy Plants). For plotting the robot's trajectory, the functions puppy_plot_trajectory() and puppy_plot_all_trajectories() can be used.
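As a hedged illustration, the plotting helpers might be combined on a single axis like this (the analysis object and the landmark coordinates are assumptions; the target values are the documented defaults):

```python
fig = pylab.figure()
axis = fig.gca()

# Overlay all recorded trajectories with the environment of a
# location-target experiment.
HDPy.puppy.plot_all_trajectories(analysis, axis)
HDPy.puppy.plot_locationtarget(axis, target=(4.0, 4.0), distance=0.5)
HDPy.puppy.plot_landmarks(axis, landmarks=[(0.0, 0.0), (2.0, 2.0)])
pylab.show()
```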

Reference

class HDPy.puppy.PuppyHDP(*args, **kwargs)

Bases: HDPy.hdp.CollectingADHDP

ADHDP subtype for simulations using Puppy in webots.

This class adds some code to handle restarts of Puppy. It adds the optional argument tumbled_reward: after the supervisor has detected tumbling, the reward is forced to this value. If None (the default) is used, the reward remains unchanged.
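The mechanism can be illustrated with a minimal, hypothetical stand-in (this is not the actual HDPy implementation; class, method and message names are assumptions):

```python
# Hypothetical sketch of the tumbled_reward mechanism: a flag set by the
# tumbling event forces the reward used by the critic to a fixed value.

class TumbleAwareCritic:
    def __init__(self, tumbled_reward=None):
        self.tumbled_reward = tumbled_reward  # None leaves rewards unchanged
        self._has_tumbled = False

    def event_handler(self, msg):
        # The supervisor announces tumbling via a message (name assumed).
        if msg == 'tumbled':
            self._has_tumbled = True

    def new_episode(self):
        # Reset the tumbled state when a new episode starts.
        self._has_tumbled = False

    def effective_reward(self, reward):
        # Force the reward after tumbling, if a tumbled_reward was given.
        if self._has_tumbled and self.tumbled_reward is not None:
            return self.tumbled_reward
        return reward

critic = TumbleAwareCritic(tumbled_reward=0.0)
assert critic.effective_reward(1.5) == 1.5   # normal operation
critic.event_handler('tumbled')
assert critic.effective_reward(1.5) == 0.0   # reward forced after tumbling
critic.new_episode()
assert critic.effective_reward(1.5) == 1.5   # reset on restart
```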

new_episode()

After restarting, reset the tumbled values and start the new episode.

init_episode(epoch, time_start_ms, time_end_ms, step_size_ms)

Initial behaviour (after reset)

Note

Assuming identical initial trajectories, the initial state is the same and thus doesn't matter. Non-identical initial trajectories result in non-identical behaviour, in which case the initial state (w.r.t. the start of learning) should differ as well. For this reason, the critic is already updated during the initial trajectory.

_step(s_curr, s_next, a_curr, reward)

Enforce the tumbled reward and handle the behaviour between restarts, then invoke the parent's step.

event_handler(robot, epoch, current_time, msg)

Handle messages from the supervisor. Messages are expected when the robot has tumbled and thus the robot has to be reset.

class HDPy.puppy.OfflineCollector(*args, **kwargs)

Bases: HDPy.hdp.CollectingADHDP

Collect sensor data for Puppy in webots, such that it can be reused later to train a critic offline.

Note that, in contrast to CollectingADHDP, some structures (reservoir, plant) are not required. They are set to stubs and hence need not be passed.

Some extra metadata is stored in the datafile, which allows processing of the experiment in an offline fashion through the function puppy.offline_playback().

new_episode()

After restarting, reset the tumbled values and start the new episode.

__call__(epoch, time_start_ms, time_end_ms, step_size_ms)

Store the sensor measurements of an epoch in the datafile, together with relevant metadata. The robot detects whether the simulation was reverted and whether it has tumbled (through the supervisor message). Other guards are not considered, as none are covered by PuppyHDP.

_next_action_hook(a_next)

Defines the action sampling policy for offline data gathering. Note that this policy is very relevant to later experiments, hence this method should be overloaded (although a default policy is provided).
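As an illustration of such an overload, a self-contained stub of a bounded random-walk sampling policy (the policy, the bounds and the class itself are assumptions, not HDPy defaults):

```python
import random

# Hypothetical sketch of an exploration policy as it might be supplied by
# overloading _next_action_hook(): perturb the previous 2D action by a
# small random step, clipped to fixed bounds.

class RandomWalkPolicy:
    def __init__(self, bounds=((0.2, 1.0), (0.2, 1.0)), step=0.05, seed=42):
        self.bounds = bounds     # (low, high) per action dimension
        self.step = step         # maximum perturbation per control step
        self.rng = random.Random(seed)

    def _next_action_hook(self, a_next):
        # Replace the proposed action by a bounded random walk around it.
        new_action = []
        for value, (lo, hi) in zip(a_next, self.bounds):
            value += self.rng.uniform(-self.step, self.step)
            new_action.append(min(hi, max(lo, value)))
        return tuple(new_action)

policy = RandomWalkPolicy()
action = (0.5, 0.5)
for _ in range(100):
    action = policy._next_action_hook(action)
    assert all(0.2 <= v <= 1.0 for v in action)  # stays within bounds
```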

event_handler(robot, epoch, current_time, msg)

Handle messages from the supervisor. Messages are expected when the robot has tumbled and thus the robot has to be reset.

HDPy.puppy.offline_playback(pth_data, critic, samples_per_action, ms_per_step, episode_start=None, episode_end=None, min_episode_len=0)

Simulate an experiment run for the critic by using offline data. The data has to be collected in webots, using the respective robot and supervisor. Note that the behaviour of the simulation should match what's expected by the critic. The critic is fed the sensor data in order; of course, it cannot react to it, since the next action is predefined.

In addition to the sensor fields, a 'tumbling' dataset is expected, which indicates if and when the robot has tumbled. It is used so that the respective signals can be sent to the critic.

The critic won’t store any sensory data again.

pth_data
Path to the datafile with the sensory information (HDF5).
critic
PuppyHDP instance.
samples_per_action
Number of samples per control step. Must correspond to the data.
ms_per_step
Sensor sampling period.
episode_start
Lower limit on the episode number. Passed as an int; refers to the episode index, not its identifier.
episode_end
Upper limit on the episode number. Passed as an int; refers to the episode index, not its identifier.
min_episode_len
Only pick episodes longer than this threshold.
HDPy.puppy.plot_trajectory(analysis, axis, episode, step_width=1, offset=0, legend=True, **kwargs)

Plot the trajectory of an episode

HDPy.puppy.plot_all_trajectories(analysis, axis, step_width=1, **kwargs)

Plot all trajectories in analysis into axis.

HDPy.puppy.plot_linetarget(axis, origin=(2.0, 0.0), direction=(1.0, 1.0), range_=(-5.0, 5.0))

Plot a line given by origin and direction. The range_ may be supplied, which corresponds to the length of the line (from the origin).

HDPy.puppy.plot_locationtarget(axis, target=(4.0, 4.0), distance=0.5, **kwargs)

Plot a sphere of radius distance at the target location into axis to mark the target. kwargs are passed to all pylab calls.

HDPy.puppy.plot_landmarks(axis, landmarks, **kwargs)

Plot markers at landmark locations in axis.

HDPy.puppy.plot_action(analysis, episode, critic, reservoir, inspect_epochs, actions_range_x, actions_range_y, step_width, obs_offset, epoch_actions=None)

Along a trajectory episode of a conducted experiment, given by analysis, plot the predicted return over a 2D action at some fixed states. For each of the states (given by inspect_epochs), a figure is created showing the return prediction as an image (i.e. in 2D).

analysis
Analysis instance containing the experimental data.
episode
Episode which is analysed.
critic
critic() instance used to evaluate a certain critic input (action and state).
reservoir
Reservoir to be used. Note that this must be the same instance as used in critic.
inspect_epochs
Epoch numbers for which the predicted return should be plotted.
actions_range_x
Action range in the first dimension. The return is predicted for any combination of actions_range_x and actions_range_y.
actions_range_y
Action range in the second dimension. The return is predicted for any combination of actions_range_x and actions_range_y.
step_width
Number of observations per epoch. In terms of PuPy, this is the control period over the sensor polling period.
obs_offset
Offset between robot observations (e.g. GPS) and reinforcement learning data (i.e. actions). For offline data, the offset is one epoch (i.e. step_width), for online data, it is zero.
epoch_actions
A list of actually executed actions (as tuples), one for each inspected epoch. Each action is indicated in the plot by a marker. The argument or individual list items may be None, in which case nothing is plotted.
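A sketch of a typical call (the epoch indices, action ranges and step values are illustrative assumptions; analysis, critic and reservoir must come from the experiment setup):

```python
HDPy.puppy.plot_action(
    analysis, episode=0, critic=critic, reservoir=reservoir,
    inspect_epochs=[10, 50, 100],               # assumed epoch indices
    actions_range_x=numpy.linspace(0.2, 1.0, 20),
    actions_range_y=numpy.linspace(0.2, 1.0, 20),
    step_width=150,   # observations per epoch (illustrative)
    obs_offset=150,   # one epoch for offline data, 0 for online
)
```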
HDPy.puppy.plot_inspected_trajectory(analysis, episode_idx, step_width, axis, inspect_epochs, obs_offset)

Plot the robot trajectory of the experiment episode_idx found in analysis, with a marker at each of the inspect_epochs. This function was created to support puppy_plot_action() by giving an overview of the whole path.

axis
plotting canvas.
step_width
Number of observations per epoch. In terms of PuPy, this is the control period over the sensor polling period.
obs_offset
Offset between robot observations (e.g. GPS) and reinforcement learning data (i.e. actions). For offline data, the offset is one epoch (i.e. step_width), for online data, it is zero.
class HDPy.puppy.ActionVideo(data, critic, reservoir, actions_range_x, actions_range_y, step_width, obs_offset, with_actions=True)

Set up a structure such that the predicted return over 2D actions can be successively plotted in the same figure.

Todo

The selected action isn’t displayed correctly (offset?)

data
Observed data of the underlying experiment. Usually a H5CombinedGroup or [HDF5] group (e.g. through Analysis).
critic
critic() instance used to evaluate a certain critic input (action and state).
reservoir
Reservoir to be used. Note that this must be the same instance as used in critic.
actions_range_x
Action range in the first dimension. The return is predicted for any combination of actions_range_x and actions_range_y.
actions_range_y
Action range in the second dimension. The return is predicted for any combination of actions_range_x and actions_range_y.
step_width
Number of observations per epoch. In terms of PuPy, this is the control period over the sensor polling period.
obs_offset
Offset between robot observations (e.g. GPS) and reinforcement learning data (i.e. actions). For offline data, the offset is one epoch (i.e. step_width), for online data, it is zero.
with_actions
Plot markers, and lines between them, representing the actually selected actions.
draw_init(fig=None)

Set up the initial video figure. A new figure is created unless one is provided in fig.

draw_step(epoch, actions=None)

Update the figure by showing the action plot for epoch. If with_actions is set, a list of actions to be plotted should be present in actions.

draw_trajectory(loc_marker, epoch_idx)

Update the marker of the current state in a trajectory plot. The current state is read from data at epoch_idx; the marker plot is given in loc_marker.
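A hedged sketch of the video-like usage (argument values and the number of epochs are illustrative; data, critic and reservoir are assumed to be set up as for plot_action()):

```python
video = HDPy.puppy.ActionVideo(
    data, critic, reservoir,
    actions_range_x=numpy.linspace(0.2, 1.0, 20),
    actions_range_y=numpy.linspace(0.2, 1.0, 20),
    step_width=150, obs_offset=150,
)

fig = video.draw_init()
for epoch in range(num_epochs):   # num_epochs from the recorded episode
    video.draw_step(epoch)
    fig.canvas.draw()             # refresh the figure for the next frame
```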
