Reward Functions
🤗 Simulate provides a system for defining simple and complex reward functions. It combines “leaf” reward functions, such as Sparse and Dense reward functions, with predicate reward functions.
(LINK TO REWARD PREDICATE DIAGRAM)
Reward functions can be parameterized with a variety of distance metrics. Currently “euclidean”, “cosine” and “best_euclidean” are supported. Combining predicates and leaf rewards makes it possible to build complex reward functions. A good example of this is the Move Boxes example.
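To make the distance metrics concrete, here is a minimal standard-library sketch of what “euclidean” and “cosine” distances conventionally compute. The function names are hypothetical and this is not Simulate's implementation; “best_euclidean” is left out because this page does not define it.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two positions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical positions, for illustration only.
actor_position = [0.0, 0.0, 0.0]
target_position = [3.0, 0.0, 4.0]
print(euclidean_distance(actor_position, target_position))  # 5.0
```

Euclidean distance depends on how far apart the two points are, while cosine distance depends only on direction, so the choice of metric changes what behaviour the reward encourages.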
The following “leaf” rewards are available in Simulate:
- “dense”: A reward that is non-zero at every time-step.
- “sparse”: A reward that is triggered by the proximity of another object.
- “timeout”: A timeout reward that is triggered after a certain number of time-steps.
- “see”: Triggered when an object is in the field of view of an Actor.
- “angle_to”: Triggered when the angle between two objects and a certain direction is less than a threshold.
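The difference between the first three leaf rewards can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, the dense reward is assumed to be the negative distance scaled by `scalar`, and the timeout is assumed to fire once the step count reaches `threshold`.

```python
import math


def sparse_reward(pos_a, pos_b, threshold=1.0, scalar=1.0):
    """Pay `scalar` only when the two entities are closer than `threshold`."""
    return scalar if math.dist(pos_a, pos_b) < threshold else 0.0


def dense_reward(pos_a, pos_b, scalar=1.0):
    """Assumed shaping term: negative distance, so a signal arrives every step."""
    return -scalar * math.dist(pos_a, pos_b)


def timeout_reward(step, threshold, scalar=1.0):
    """Assumed behaviour: pay `scalar` once `step` reaches `threshold` steps."""
    return scalar if step >= threshold else 0.0


# Hypothetical positions, for illustration only.
actor, box = [0.0, 0.0, 0.0], [3.0, 0.0, 4.0]
print(sparse_reward(actor, box))  # 0.0, the box is 5 units away
print(dense_reward(actor, box))   # -5.0
print(timeout_reward(step=100, threshold=100))  # 1.0
```

A sparse reward gives no signal until the agent is already close, whereas a dense reward gives a learning signal at every step, which is why the two are often combined.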
The “leaf” reward functions can be combined in a tree structure with the following predicate functions:
- “not”: Triggers when a reward is not triggered.
- “and”: Triggers when both children of this node are returning a positive reward.
- “or”: Triggers when one or both of the children of this node are returning a positive reward.
- “xor”: Triggers when only one of the children of this node is returning a positive reward.
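The tree structure described above can be modelled with a small recursive evaluator. This is a sketch only: `RewardNode` is a hypothetical stand-in, not Simulate's `RewardFunction`, and it assumes that a predicate node pays its own `scalar` when it fires and 0.0 otherwise.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RewardNode:
    kind: str  # "leaf", "not", "and", "or" or "xor"
    leaf: Optional[Callable[[], float]] = None  # used only when kind == "leaf"
    a: Optional["RewardNode"] = None            # first (or only) child
    b: Optional["RewardNode"] = None            # second child for and/or/xor
    scalar: float = 1.0

    def evaluate(self) -> float:
        if self.kind == "leaf":
            return self.leaf()
        # A child counts as "triggered" when it returns a positive reward.
        a_on = self.a.evaluate() > 0.0
        if self.kind == "not":
            fired = not a_on
        else:
            b_on = self.b.evaluate() > 0.0
            fired = {
                "and": a_on and b_on,
                "or": a_on or b_on,
                "xor": a_on != b_on,
            }[self.kind]
        return self.scalar if fired else 0.0


# Reward the agent for being near a box while the episode has not timed out.
near_box = RewardNode("leaf", leaf=lambda: 1.0)   # pretend the sparse leaf fired
timed_out = RewardNode("leaf", leaf=lambda: 0.0)  # pretend the timeout did not
tree = RewardNode("and", a=near_box, b=RewardNode("not", a=timed_out))
print(tree.evaluate())  # 1.0
```

Because each predicate node returns a reward itself, predicates can be nested to any depth, with leaves supplying the raw signals.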
class simulate.RewardFunction
(
    type: typing.Optional[str] = None,
    entity_a: typing.Optional[typing.Any] = None,
    entity_b: typing.Optional[typing.Any] = None,
    distance_metric: typing.Optional[str] = None,
    direction: typing.Optional[typing.List[float]] = None,
    scalar: float = 1.0,
    threshold: float = 1.0,
    is_terminal: bool = False,
    is_collectable: bool = False,
    trigger_once: bool = True,
    reward_function_a: dataclasses.InitVar[typing.Optional[ForwardRef('RewardFunction')]] = None,
    reward_function_b: dataclasses.InitVar[typing.Optional[ForwardRef('RewardFunction')]] = None,
    name: dataclasses.InitVar[typing.Optional[str]] = None,
    position: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>,
    rotation: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>,
    scaling: dataclasses.InitVar[typing.Union[float, typing.List[float], NoneType]] = <property object>,
    transformation_matrix: dataclasses.InitVar[typing.Optional[typing.List[float]]] = <property object>,
    parent: dataclasses.InitVar[typing.Optional[typing.Any]] = None,
    children: dataclasses.InitVar[typing.Optional[typing.List[typing.Any]]] = None,
    created_from_file: dataclasses.InitVar[typing.Optional[str]] = None
)
Parameters
- type (str, optional, defaults to "dense") — The type of reward function. Must be one of the following: [ “dense”, “sparse”, “or”, “and”, “not”, “see”, “timeout” ]
- entity_a (Asset, optional, defaults to None) — The first entity in the reward function.
- entity_b (Asset, optional, defaults to None) — The second entity in the reward function.
- distance_metric (str, optional, defaults to "euclidean") — The distance metric to use. Must be one of the following: [ “euclidean” ]
- direction (List[float], optional, defaults to [1.0, 0.0, 0.0]) — The direction to use for the reward function.
- scalar (float, optional, defaults to 1.0) — A constant factor that the reward is multiplied by. Setting it to -1 makes the reward behave as a cost.
- threshold (float, optional, defaults to 1.0) — The distance threshold at which the reward is given.
- is_terminal (bool, optional, defaults to False) — Whether the reward is terminal.
- is_collectable (bool, optional, defaults to False) — Whether the reward is collectable.
- trigger_once (bool, optional, defaults to True) — Whether the reward is triggered only once.
- reward_function_a (RewardFunction, optional, defaults to None) — When combining rewards (and, or), the first reward function to combine.
- reward_function_b (RewardFunction, optional, defaults to None) — When combining rewards (and, or), the second reward function to combine.
- name (str, optional, defaults to None) — The name of the reward function.
- position (List[float], optional, defaults to [0.0, 0.0, 0.0]) — The position of the reward function.
- rotation (List[float], optional, defaults to [0.0, 0.0, 0.0]) — The rotation of the reward function.
- scaling (List[float], optional, defaults to [1.0, 1.0, 1.0]) — The scaling of the reward function.
- transformation_matrix (List[float], optional, defaults to None) — The transformation matrix of the reward function.
- parent (Asset, optional, defaults to None) — The parent of the reward function.
- children (List[Asset], optional, defaults to None) — The children of the reward function.
- created_from_file (str, optional, defaults to None) — The path of the file from which the reward function was created.
A reinforcement learning reward function.
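To show how `direction`, `threshold` and `scalar` might interact, here is a hedged sketch of an “angle_to”-style check in plain Python. The function name and behaviour are assumptions made for illustration: the angle is measured in degrees between `direction` (taken in world coordinates) and the vector from the first entity to the second, and the reward is `scalar` when that angle is below `threshold`. The library may define these details differently.

```python
import math


def angle_to_reward(pos_a, pos_b, direction=(1.0, 0.0, 0.0),
                    threshold=1.0, scalar=1.0):
    """Pay `scalar` when entity b lies within `threshold` degrees of `direction`."""
    to_b = [b - a for a, b in zip(pos_a, pos_b)]
    norm_to_b = math.hypot(*to_b)
    norm_dir = math.hypot(*direction)
    if norm_to_b == 0.0 or norm_dir == 0.0:
        return 0.0  # the angle is undefined, so treat it as not triggered
    dot = sum(x * y for x, y in zip(to_b, direction))
    # Clamp to guard against rounding slightly outside acos's domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_to_b * norm_dir)))
    angle_deg = math.degrees(math.acos(cos_angle))
    return scalar if angle_deg < threshold else 0.0


# Hypothetical positions: the target is straight ahead along +x, then off to the side.
print(angle_to_reward([0.0, 0.0, 0.0], [5.0, 0.0, 0.0], threshold=10.0))  # 1.0
print(angle_to_reward([0.0, 0.0, 0.0], [0.0, 0.0, 5.0], threshold=10.0))  # 0.0
```

With `scalar=-1.0` the same check would return a cost instead of a reward, matching the description of `scalar` above.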