
Reward Functions

🤗 Simulate provides a system for defining simple and complex reward functions. This is achieved by combining “leaf” reward functions, such as sparse and dense rewards, with predicate reward functions.

(LINK TO REWARD PREDICATE DIAGRAM)

Reward functions can be parameterized with a variety of distance metrics; currently “euclidean”, “cosine”, and “best_euclidean” are supported. By combining predicates and leaf rewards, complex reward functions can be created. A good example of this is the Move Boxes example.

The following “leaf” rewards are available in Simulate (a construction sketch follows the list):

  • “dense”: A reward that is non-zero at every time-step.
  • “sparse”: A reward that is triggered by the proximity of another object.
  • “timeout”: A timeout reward that is triggered after a certain number of time-steps.
  • “see”: Triggered when an object is in the field of view of an Actor.
  • “angle_to”: Triggered when the angle between two objects and a certain direction is less than a threshold.
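For instance, a sparse proximity reward might be constructed as in the sketch below. The scene and the two boxes are hypothetical stand-ins for whatever objects your environment defines; the RewardFunction arguments come from the signature documented further down.

```python
import simulate as sm

# Hypothetical scene with two objects; any Simulate assets would do here.
scene = sm.Scene()
actor = sm.Box(name="actor", position=[0.0, 0.0, 0.0])
target = sm.Box(name="target", position=[2.0, 0.0, 2.0])
scene += [actor, target]

# Sparse reward: pays `scalar` once when the euclidean distance between
# `entity_a` and `entity_b` drops below `threshold`, then ends the episode.
sparse_reward = sm.RewardFunction(
    type="sparse",
    entity_a=actor,
    entity_b=target,
    distance_metric="euclidean",
    scalar=1.0,
    threshold=0.5,
    is_terminal=True,
    trigger_once=True,
)
```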

The “leaf” reward functions can be combined in a tree structure with the following predicate functions (a combined example follows the list):

  • “not”: Triggers when its child reward function is not triggered.
  • “and”: Triggers when both children of this node are returning a positive reward.
  • “or”: Triggers when one or both of the children of this node are returning a positive reward.
  • “xor”: Triggers when exactly one of the children of this node is returning a positive reward.
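Continuing the sketch above, predicate nodes take their children through the reward_function_a and reward_function_b arguments. The entity arguments passed to the predicate nodes here are an assumption for illustration; the nesting itself follows the tree structure described above.

```python
# Hypothetical third object the actor should avoid.
obstacle = sm.Box(name="obstacle", position=[1.0, 0.0, 1.0])
scene += obstacle

near_target = sm.RewardFunction(
    type="sparse", entity_a=actor, entity_b=target,
    distance_metric="euclidean", threshold=0.5,
)
near_obstacle = sm.RewardFunction(
    type="sparse", entity_a=actor, entity_b=obstacle,
    distance_metric="euclidean", threshold=0.5,
)

# not(near_obstacle): positive only while the actor stays away from the obstacle.
not_near_obstacle = sm.RewardFunction(
    type="not",
    entity_a=actor, entity_b=obstacle,
    reward_function_a=near_obstacle,
)

# and(near_target, not(near_obstacle)): rewarded only when the actor is close
# to the target while staying clear of the obstacle.
combined = sm.RewardFunction(
    type="and",
    entity_a=actor, entity_b=target,
    reward_function_a=near_target,
    reward_function_b=not_near_obstacle,
)
```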

class simulate.RewardFunction


(
    type: typing.Optional[str] = None,
    entity_a: typing.Optional[typing.Any] = None,
    entity_b: typing.Optional[typing.Any] = None,
    distance_metric: typing.Optional[str] = None,
    direction: typing.Optional[typing.List[float]] = None,
    scalar: float = 1.0,
    threshold: float = 1.0,
    is_terminal: bool = False,
    is_collectable: bool = False,
    trigger_once: bool = True,
    reward_function_a: dataclasses.InitVar[typing.Optional[RewardFunction]] = None,
    reward_function_b: dataclasses.InitVar[typing.Optional[RewardFunction]] = None,
    name: dataclasses.InitVar[typing.Optional[str]] = None,
    position: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>,
    rotation: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>,
    scaling: dataclasses.InitVar[typing.Union[float, typing.List[float], NoneType]] = <property object>,
    transformation_matrix: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>,
    parent: dataclasses.InitVar[typing.Optional[typing.Any]] = None,
    children: dataclasses.InitVar[typing.Optional[typing.List[typing.Any]]] = None,
    created_from_file: dataclasses.InitVar[typing.Optional[str]] = None,
)

An RL reward function, either a “leaf” reward computed from the scene state or a predicate node combining the rewards of its two children.
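As a final usage sketch, a built reward function has to be registered with the scene before stepping the environment. Attaching it to the actor with the asset-tree += operator is an assumption here, suggested by the parent/children fields in the signature above; check the library's RL examples for the exact idiom.

```python
# Hypothetical: attach the combined reward to the actor so the backend
# evaluates it at every step (assumed asset-tree attachment idiom).
actor += combined
```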