About the get_batch #531

leo-young · 2024-06-28T09:26:37Z

def get_batch(split):
    # We recreate np.memmap every batch to avoid a memory leak, as per
    # https://stackoverflow.com/questions/45132940/numpy-memmap-memory-usage-want-to-iterate-once/61472122#61472122
    if split == 'train':
        data = np.memmap(os.path.join(data_dir, 'train.bin'), dtype=np.uint16, mode='r')
    else:
        data = np.memmap(os.path.join(data_dir, 'val.bin'), dtype=np.uint16, mode='r')
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([torch.from_numpy((data[i:i+block_size]).astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy((data[i+1:i+1+block_size]).astype(np.int64)) for i in ix])
    if device_type == 'cuda':
        # pin arrays x,y, which allows us to move them to GPU asynchronously (non_blocking=True)
        x, y = x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
    else:
        x, y = x.to(device), y.to(device)
    return x, y

Hi，Thanks for the excellent work. I have a question about the get_batch，this code can't make the data across each distribute process unique, like DistributedSampler in pytorch does.(indices = indices[self.rank:self.total_size:self.num_replicas])The data maybe overlap。becase every process take data by the order ("x = torch.randint(len(data) - block_size, (batch_size,))" ). Is this a efficiency problem?

The text was updated successfully, but these errors were encountered:

Hannibal046 · 2024-08-11T04:17:24Z

fixed here: https://github.com/karpathy/build-nanogpt/blob/6104ab1b53920f6e2159749676073ff7d815c1fa/train_gpt2.py#L214

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

About the get_batch #531

About the get_batch #531

leo-young commented Jun 28, 2024 •

edited

Loading

Hannibal046 commented Aug 11, 2024

About the get_batch #531

About the get_batch #531

Comments

leo-young commented Jun 28, 2024 • edited Loading

Hannibal046 commented Aug 11, 2024

leo-young commented Jun 28, 2024 •

edited

Loading