Add dtype choice in step type/functions #256

thomashirtz · 2024-11-03T17:16:37Z

Is your feature request related to a problem? Please describe

In a personal project, I had to be very careful with memory, and my rewards were taking up a lot of space. I needed to store the reward and the discount as float16 instead of float32, and to do that I had to copy the types file and modify it locally. I think other use cases may need this flexibility too.
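As a rough illustration of the saving described above, the sketch below compares the storage needed for the same number of scalar values in float32 and float16. The buffer size `num_steps` and the helper `buffer_nbytes` are hypothetical, chosen only for this example.

```python
import jax.numpy as jnp

# Hypothetical buffer size, chosen only for illustration.
num_steps = 1_000_000


def buffer_nbytes(dtype) -> int:
    """Bytes needed to store `num_steps` scalar values of the given dtype."""
    x = jnp.zeros((num_steps,), dtype=dtype)
    return x.size * x.dtype.itemsize


float32_bytes = buffer_nbytes(jnp.float32)
float16_bytes = buffer_nbytes(jnp.float16)

# float16 uses 2 bytes per value, float32 uses 4.
print(float32_bytes, float16_bytes)
```

The halved footprint comes at a cost: float16 carries only about three significant decimal digits and cannot represent values above 65504, so whether it is acceptable depends on the reward range of the environment.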

Describe the solution you'd like

Add an extra dtype parameter to the step functions (example with one of them):

def truncation(
    reward: Array,
    observation: Observation,
    discount: Optional[Array] = None,
    extras: Optional[Dict] = None,
    shape: Union[int, Sequence[int]] = (),
    dtype: jnp.dtype = jnp.float32,
) -> TimeStep:
    """Returns a `TimeStep` with `step_type` set to `StepType.LAST`.

    Args:
        reward: array.
        observation: array or tree of arrays.
        discount: array.
        extras: environment metric(s) or information returned by the environment but
            not observed by the agent (hence not in the observation). For example, it
            could be whether an invalid action was taken. In most environments, extras
            is None.
        shape: optional parameter to specify the shape of the rewards and discounts.
            Allows multi-agent environment compatibility. Defaults to () for
            scalar reward and discount.
        dtype: optional parameter to specify the dtype of the default discount.
            Defaults to jnp.float32.
    Returns:
        TimeStep identified as the truncation of an episode.
    """
    discount = discount if discount is not None else jnp.ones(shape, dtype=dtype)
    extras = extras or {}
    return TimeStep(
        step_type=StepType.LAST,
        reward=reward,
        discount=discount,
        observation=observation,
        extras=extras,
    )
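A minimal, self-contained sketch of how the proposed argument could be used. The `TimeStep` class and the `LAST` constant here are simplified stand-ins written for this example, not the library's actual definitions.

```python
from typing import Any, Dict, NamedTuple

import jax.numpy as jnp


class TimeStep(NamedTuple):
    """Simplified stand-in for the library's TimeStep (illustration only)."""

    step_type: int
    reward: Any
    discount: Any
    observation: Any
    extras: Dict


LAST = 2  # stand-in for StepType.LAST


def truncation(reward, observation, discount=None, extras=None, shape=(), dtype=jnp.float32):
    # Only the default discount is created with `dtype`; a reward or discount
    # supplied by the caller keeps whatever dtype it already has.
    discount = discount if discount is not None else jnp.ones(shape, dtype=dtype)
    extras = extras or {}
    return TimeStep(LAST, reward, discount, observation, extras)


# The caller picks float16 for both the reward and the default discount.
ts = truncation(
    reward=jnp.zeros((), dtype=jnp.float16),
    observation=jnp.zeros(3),
    dtype=jnp.float16,
)
print(ts.reward.dtype, ts.discount.dtype)
```

Note that in this form the reward is not cast: the caller is still responsible for creating it with the desired dtype.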

I would be happy to do the PR.

Misc

Check for duplicate requests.

sash-a · 2024-11-04T13:23:48Z

I think that would be a nice addition, happy to review it 😄

sash-a · 2024-11-04T13:26:35Z

To be honest, I think the discounts should probably be booleans while they are stored in the timestep, because to me they just indicate the end of an episode. But I think this would add nice flexibility.

thomashirtz · 2024-11-04T18:24:56Z

> To be honest, I think the discounts should probably be booleans while they are stored in the timestep, because to me they just indicate the end of an episode. But I think this would add nice flexibility.

I'm fine with both, as long as it doesn't take too much space. Should I go with an argument that defaults to boolean, or just use boolean?

sash-a · 2024-11-05T05:38:41Z

My only issue is that this strays from the original dm_env API, where it is a float so that it can represent both the RL discount (gamma) and done.

Let's definitely add it as an argument, but I'm not sure whether boolean or float32 is the best default. @clement-bonnet, any thoughts on this?

clement-bonnet · 2024-11-05T07:40:50Z

To my knowledge, having the discount as a float is more common than as a boolean for the reasons you mentioned @sash-a. I would keep it a float unless there are strong reasons to do otherwise :)
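To make the float-versus-boolean trade-off concrete, here is a small sketch of a one-step bootstrapped target that consumes the stored discount. The function `td_target` and the value of `gamma` are hypothetical and only illustrate the point: a float discount can hold 0, 1, or any value in between, while a boolean can only signal whether to bootstrap.

```python
import jax.numpy as jnp

gamma = 0.99  # hypothetical RL discount factor


def td_target(reward, discount, next_value):
    """One-step bootstrapped target: r + gamma * discount * V(s')."""
    # Cast so that a float16 or boolean discount does not lower the
    # precision of the computation.
    discount = discount.astype(next_value.dtype)
    return reward + gamma * (discount * next_value)


reward = jnp.array(1.0, dtype=jnp.float32)
next_value = jnp.array(10.0, dtype=jnp.float32)

# Termination: discount 0, so the next value is not bootstrapped.
terminated = td_target(reward, jnp.array(0.0, dtype=jnp.float16), next_value)
# Mid-episode or truncation: discount 1, so the next value is bootstrapped.
continuing = td_target(reward, jnp.array(1.0, dtype=jnp.float16), next_value)
# A boolean discount covers the two cases above...
as_bool = td_target(reward, jnp.array(True), next_value)
# ...but only a float can carry an intermediate, environment-specific discount.
partial = td_target(reward, jnp.array(0.5, dtype=jnp.float16), next_value)
```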

sash-a · 2024-11-05T07:51:45Z

Great, then if you could add the argument with a default of float32, I'm happy to accept the PR.

thomashirtz · 2024-11-07T07:42:29Z

> Great, then if you could add the argument with a default of float32, I'm happy to accept the PR.

The PR is available for review :)

thomashirtz added the enhancement label (New feature or request) on Nov 3, 2024

thomashirtz mentioned this issue Nov 5, 2024

Add dtype choice in step type/functions #262

Open
