DeepSpeed GPT-2 Megatron-LM problems

sammyboy123 51 Reputation points
2021-01-07T16:42:49.21+00:00

I am trying to build a GPT-2 model with DeepSpeed on an Azure VM. I found about two bugs that I was able to patch, but I am now stuck on a much harder one. The script says it needs PyTorch, so I installed PyTorch, but it still reports that PyTorch is missing. I have tried both pip and pip3 many times. When I install PyTorch from GitHub and run setup.py, it says I need Python 3, and it says the same thing after I install Python 3. When I try Google Colab instead, I get the following error:
Traceback (most recent call last):
  File "pretrain_gpt2.py", line 709, in <module>
    main()
  File "pretrain_gpt2.py", line 654, in main
    args.eod_token = get_train_val_test_data(args)
  File "pretrain_gpt2.py", line 600, in get_train_val_test_data
    args)
  File "/content/DeepSpeedExamples/Megatron-LM/configure_data.py", line 34, in apply
    return make_loaders(args)
  File "/content/DeepSpeedExamples/Megatron-LM/configure_data.py", line 170, in make_loaders
    train, tokenizer = data_utils.make_dataset(**data_set_args)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/__init__.py", line 109, in make_dataset
    ds = split_ds(ds, split)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/datasets.py", line 194, in split_ds
    rtn_ds[i] = SplitDataset(ds, split_inds)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/datasets.py", line 134, in __init__
    self.lens = itemgetter(*self.split_inds)(list(self.wrapped_data.lens))
TypeError: itemgetter expected 1 arguments, got 0

How do I fix both the Google Colab error and the Azure VM error?
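
For readers who hit the same Colab traceback: the final frame calls `itemgetter(*self.split_inds)`, and `operator.itemgetter` raises this `TypeError` when it is called with no arguments. That suggests, though the thread does not confirm it, that `split_inds` was empty for one of the splits, for example because the dataset has too few documents for the requested split. The sketch below only reproduces the exception outside Megatron-LM; `lens`, `select_lens`, and `safe_select_lens` are hypothetical names, not part of the library.

```python
from operator import itemgetter

# Hypothetical per-document lengths standing in for wrapped_data.lens.
lens = [12, 7, 30, 5]


def select_lens(lens, split_inds):
    """Mimic the failing line: pick the lengths at split_inds."""
    # itemgetter() with zero indices raises TypeError, which is the
    # error shown in the traceback when split_inds is empty.
    return itemgetter(*split_inds)(list(lens))


def safe_select_lens(lens, split_inds):
    """Illustrative alternative that tolerates an empty split."""
    # A comprehension always returns a list, even for zero or one index.
    return [lens[i] for i in split_inds]


if __name__ == "__main__":
    print(select_lens(lens, [0, 2]))
    print(safe_select_lens(lens, []))
    try:
        select_lens(lens, [])
    except TypeError as exc:
        print("empty split fails:", exc)
```

If this is the cause, checking how many documents the dataset contains and how they are divided across the train, validation, and test splits is a reasonable first step before changing any library code.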

Azure Machine Learning
Azure Virtual Machines

Accepted answer
  1. deherman-MSFT 38,021 Reputation points Microsoft Employee Moderator
    2021-01-11T23:51:27.243+00:00

    @sammyboy123
    I would start by installing PyTorch via pip. Instructions can be found here. There is also a verification section that tests whether PyTorch is installed correctly. You might also find the DeepSpeed Getting Started page helpful. It has specific tutorials for Azure, and a Docker image is also available.
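
One common reason for "installed but still missing" reports is that `pip` or `pip3` installs into a different Python interpreter than the one running the script. Whether that is the case here is an assumption; the thread does not confirm it. The following minimal sketch, with the hypothetical helper name `describe_module`, prints which interpreter is running and whether it can find `torch`:

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable by the interpreter running this script."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "python_version": "%d.%d" % sys.version_info[:2],
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    info = describe_module("torch")
    for key, value in info.items():
        print(f"{key}: {value}")
    if not info["found"]:
        # Installing through the interpreter itself avoids pip/pip3 mismatches.
        print(f"Try: {sys.executable} -m pip install torch")
```

Run it with the same command used to launch `pretrain_gpt2.py` (for example `python3`), so the check covers the interpreter that actually executes the training script.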

    Let me know if this doesn't work for you or you are still facing issues.

