The Maze2D domain involves moving a force-actuated ball (along the X and Y axes) to a fixed target location. The observation consists of the (x, y) location and velocities. The dataset consists of one continuous trajectory of the agent navigating to random goal locations, and thus has no terminal states. However, so that the trajectory can be split into smaller subtrajectories, the timeout field is used to mark the instances when the randomly selected navigation goal has been reached.

There are four maze layouts: open, umaze, medium, and large. The four environments maze2d-open-v0, maze2d-umaze-v0, maze2d-medium-v0, and maze2d-large-v0 use a sparse reward, which has a value of 1.0 when the agent (light green ball) is within a 0.5 unit radius of the target (light red ball). Each environment has a dense reward version, which instead uses the negative exponentiated distance as the reward.

The AntMaze domain uses the same umaze, medium, and large mazes from the Maze2D domain, but replaces the agent with the "Ant" robot from the OpenAI Gym MuJoCo benchmark. The dataset in 'antmaze-umaze-v0' is generated by commanding a fixed goal location from a fixed starting location (these are on opposite sides of the wall in the umaze). For harder tasks, the "diverse" dataset is generated by commanding random goal locations in the maze and navigating the ant to them. The "play" dataset is generated by commanding specific hand-picked goal locations from hand-picked initial positions.

The Minigrid domain is a discrete analog of Maze2D.
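To make the Maze2D reward definitions and the timeout-based splitting more concrete, here is a minimal sketch in plain Python. The function names (`sparse_reward`, `dense_reward`, `split_on_timeouts`) are hypothetical and are not part of any dataset library. The sketch assumes that "negative exponentiated distance" means exp(-distance), that the 0.5 radius includes the boundary, and that the timeout field is a sequence of booleans with one entry per step.

```python
import math

GOAL_RADIUS = 0.5  # radius stated in the text


def sparse_reward(position, target, radius=GOAL_RADIUS):
    """Return 1.0 when the agent is within `radius` of the target, else 0.0.

    Assumption: the boundary (distance == radius) counts as "within".
    """
    return 1.0 if math.dist(position, target) <= radius else 0.0


def dense_reward(position, target):
    """Dense variant of the reward.

    Assumption: "negative exponentiated distance" is read as exp(-distance),
    so the reward is 1.0 at the target and decays towards 0.0 far away.
    """
    return math.exp(-math.dist(position, target))


def split_on_timeouts(timeouts):
    """Split one continuous trajectory into subtrajectories.

    `timeouts` holds one boolean per step; a True entry marks the step at
    which a navigation goal was reached. Returns half-open (start, end)
    index ranges, one per subtrajectory.
    """
    segments = []
    start = 0
    for i, flag in enumerate(timeouts):
        if flag:
            segments.append((start, i + 1))
            start = i + 1
    # Keep any trailing steps that follow the last timeout.
    if start < len(timeouts):
        segments.append((start, len(timeouts)))
    return segments
```

Each returned range can be used to slice the observation, action, and reward arrays of the dataset, assuming they are aligned step by step with the timeout field.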