Reward Functions

🤗 Simulate provides a system to define simple and complex reward functions. It does this by combining “leaf” reward functions, such as sparse and dense rewards, with predicate reward functions.

(LINK TO REWARD PREDICATE DIAGRAM)

Reward functions can be parameterized with a variety of distance metrics. Currently “euclidean”, “cosine” and “best_euclidean” are supported. Combining predicates and leaf rewards makes it possible to build complex reward functions; a good example of this is the Move Boxes example.
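The two metrics with standard textbook definitions can be sketched as follows. This is an illustration, not Simulate's code: it assumes that “cosine” means cosine distance (one minus cosine similarity). “best_euclidean” is not sketched because its exact definition is not given here.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two positions."""
    return math.dist(a, b)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed cosine distance: 1 - cos(angle between a and b).

    0.0 when the vectors point the same way, 1.0 when orthogonal,
    2.0 when they point in opposite directions.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


print(euclidean([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))  # 5.0
print(cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))     # 1.0
```

Euclidean distance reacts to how far apart two objects are, while cosine distance only reacts to the angle between two vectors, which is why the two are useful for different kinds of reward.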

The following “leaf” rewards are available in Simulate:

“dense”: A reward that is non-zero at every time-step.
“sparse”: A reward that is triggered by the proximity of another object.
“timeout”: A timeout reward that is triggered after a certain number of time-steps.
“see”: Triggered when an object is in the field of view of an Actor.
“angle_to”: Triggered when the angle between two objects and a certain direction is less than a threshold.
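The leaf rewards that depend only on distance or elapsed time can be modelled as plain functions. This is a hedged sketch, not Simulate's implementation: the dense formula (the negative distance multiplied by `scalar`), the inclusive `<=` comparison in the sparse check, and the hypothetical `max_steps` argument of the timeout are all assumptions. “see” and “angle_to” are omitted because they need field-of-view and orientation data.

```python
import math
from typing import Sequence


def dense_reward(pos_a: Sequence[float], pos_b: Sequence[float], scalar: float = 1.0) -> float:
    """Assumed dense reward: negative distance, non-zero until the objects coincide."""
    return -scalar * math.dist(pos_a, pos_b)


def sparse_reward(pos_a: Sequence[float], pos_b: Sequence[float],
                  threshold: float = 1.0, scalar: float = 1.0) -> float:
    """Assumed sparse reward: pays `scalar` only when the objects are within `threshold`."""
    return scalar if math.dist(pos_a, pos_b) <= threshold else 0.0


def timeout_reward(step: int, max_steps: int, scalar: float = 1.0) -> float:
    """Assumed timeout reward: pays `scalar` once `max_steps` (hypothetical name) have elapsed."""
    return scalar if step >= max_steps else 0.0


agent, goal = [0.0, 0.0, 0.0], [0.5, 0.0, 0.0]
print(dense_reward(agent, goal))                # -0.5
print(sparse_reward(agent, goal))               # 1.0 (distance 0.5 is within threshold 1.0)
print(sparse_reward(agent, goal, scalar=-1.0))  # -1.0 (a negative scalar turns it into a cost)
print(timeout_reward(step=99, max_steps=100))   # 0.0 (not timed out yet)
```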

The “leaf” reward functions can be combined in a tree structure with the following predicate functions:

“not”: Triggers when its child reward is not triggered.
“and”: Triggers when both children of this node are returning a positive reward.
“or”: Triggers when one or both of the children of this node are returning a positive reward.
“xor”: Triggers when exactly one of the children of this node is returning a positive reward.
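The predicate semantics above can be illustrated with a small tree evaluator. `Node`, its `leaf` callable and the `triggered`/`reward` methods are hypothetical names for this sketch, not Simulate's API. It assumes that a node counts as triggered when it returns a positive reward, and that a predicate node pays out its own `scalar` when triggered.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical reward-tree node; NOT Simulate's RewardFunction class."""
    type: str                                  # "leaf", "not", "and", "or" or "xor"
    leaf: Optional[Callable[[], float]] = None  # reward source, used only by leaf nodes
    a: Optional["Node"] = None                 # first child (every predicate)
    b: Optional["Node"] = None                 # second child ("and", "or", "xor")
    scalar: float = 1.0                        # payout of a predicate node when triggered

    def triggered(self) -> bool:
        if self.type == "leaf":
            return self.leaf() > 0
        if self.type == "not":
            return not self.a.triggered()
        ta, tb = self.a.triggered(), self.b.triggered()
        if self.type == "and":
            return ta and tb
        if self.type == "or":
            return ta or tb
        if self.type == "xor":
            return ta != tb
        raise ValueError(f"unknown predicate: {self.type}")

    def reward(self) -> float:
        if self.type == "leaf":
            return self.leaf()
        return self.scalar if self.triggered() else 0.0


reached = Node("leaf", leaf=lambda: 1.0)    # e.g. a sparse reward that has fired
timed_out = Node("leaf", leaf=lambda: 0.0)  # e.g. a timeout that has not fired yet
# "Reach the goal AND NOT time out": pays only if the goal is reached in time.
goal_in_time = Node("and", a=reached, b=Node("not", a=timed_out))
print(goal_in_time.reward())                         # 1.0
print(Node("xor", a=reached, b=reached).reward())    # 0.0
```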

class simulate.RewardFunction


(
    type: Optional[str] = None,
    entity_a: Optional[Any] = None,
    entity_b: Optional[Any] = None,
    distance_metric: Optional[str] = None,
    direction: Optional[List[float]] = None,
    scalar: float = 1.0,
    threshold: float = 1.0,
    is_terminal: bool = False,
    is_collectable: bool = False,
    trigger_once: bool = True,
    reward_function_a: InitVar[Optional['RewardFunction']] = None,
    reward_function_b: InitVar[Optional['RewardFunction']] = None,
    name: InitVar[Optional[str]] = None,
    position: InitVar[Optional[List[float]]] = [0.0, 0.0, 0.0],
    rotation: InitVar[Optional[List[float]]] = [0.0, 0.0, 0.0],
    scaling: InitVar[Union[float, List[float], None]] = [1.0, 1.0, 1.0],
    transformation_matrix: InitVar[Optional[List[float]]] = None,
    parent: InitVar[Optional[Any]] = None,
    children: InitVar[Optional[List[Any]]] = None,
    created_from_file: InitVar[Optional[str]] = None,
)

Parameters

type (str, optional, defaults to "dense") — The type of reward function. Must be one of the following: [ “dense”, “sparse”, “or”, “and”, “not”, “see”, “timeout” ]
entity_a (Asset, optional, defaults to None) — The first entity in the reward function.
entity_b (Asset, optional, defaults to None) — The second entity in the reward function.
distance_metric (str, optional, defaults to "euclidean") — The distance metric to use. Must be one of the following: [ “euclidean” ]
direction (List[float], optional, defaults to [1.0, 0.0, 0.0]) — The direction to use for the reward function.
scalar (float, optional, defaults to 1.0) — A constant by which the reward is multiplied. Setting it to -1 makes the reward behave as a cost.
threshold (float, optional, defaults to 1.0) — The distance threshold at which the reward is given.
is_terminal (bool, optional, defaults to False) — Whether the reward is terminal, i.e. whether triggering it ends the episode.
is_collectable (bool, optional, defaults to False) — Whether the reward is collectable.
trigger_once (bool, optional, defaults to True) — Whether the reward can be triggered only once.
reward_function_a (RewardFunction, optional, defaults to None) — When combining rewards with a predicate (e.g. “and”, “or”), the first reward function to combine.
reward_function_b (RewardFunction, optional, defaults to None) — When combining rewards with a predicate (e.g. “and”, “or”), the second reward function to combine.
name (str, optional, defaults to None) — The name of the reward function.
position (List[float], optional, defaults to [0.0, 0.0, 0.0]) — The position of the reward function.
rotation (List[float], optional, defaults to [0.0, 0.0, 0.0]) — The rotation of the reward function.
scaling (List[float], optional, defaults to [1.0, 1.0, 1.0]) — The scaling of the reward function.
transformation_matrix (List[float], optional, defaults to None) — The transformation matrix of the reward function.
parent (Asset, optional, defaults to None) — The parent of the reward function.
children (List[Asset], optional, defaults to None) — The children of the reward function.
created_from_file (str, optional, defaults to None) — The file path of the file from which the reward function was created.

A reinforcement learning reward function.
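To make the interaction of `scalar`, `threshold`, `trigger_once` and `is_terminal` concrete, here is a hypothetical model of a single proximity reward. `OneShotReward` and its `step` method are illustrative names, not part of Simulate. The semantics are assumptions read off the parameter descriptions: fire when distance is at most `threshold`, pay `scalar`, stay silent after the first firing when `trigger_once` is set, and report the episode as done when `is_terminal` is set.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class OneShotReward:
    """Hypothetical proximity reward; NOT Simulate's implementation."""

    threshold: float = 1.0
    scalar: float = 1.0
    trigger_once: bool = True
    is_terminal: bool = False
    _fired: bool = field(default=False, init=False)  # internal: has it paid out yet?

    def step(self, distance: float) -> Tuple[float, bool]:
        """Return (reward, done) for one time-step given the current distance."""
        out_of_range = distance > self.threshold
        already_used = self.trigger_once and self._fired
        if out_of_range or already_used:
            return 0.0, False
        self._fired = True
        return self.scalar, self.is_terminal


# The agent approaches the target over three steps; the reward pays only once.
reward = OneShotReward(threshold=0.5, scalar=2.0)
print([reward.step(d) for d in (1.0, 0.3, 0.2)])
# [(0.0, False), (2.0, False), (0.0, False)]

# A terminal variant ends the episode on the step it fires.
print(OneShotReward(threshold=0.5, is_terminal=True).step(0.1))
# (1.0, True)
```

With `trigger_once=False` the same object would pay on every step the agent stays within range, which is usually what a shaping reward wants and rarely what a goal reward wants.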