DeepSpeed GPT-2 Megatron-LM problems

Question

sammyboy123 51

I am trying to make a GPT-2 model with DeepSpeed on an Azure VM. I found about two bugs that I was able to patch, but I have hit a really tough one. The script says I need PyTorch. No surprise, so I installed PyTorch, but it still says I don't have it. I tried both pip and pip3 many times. I then cloned PyTorch from GitHub and ran setup.py, which said I need Python 3. After I installed Python 3, it said the same thing. When I try Google Colab instead, I get the following error:
Traceback (most recent call last):
  File "pretrain_gpt2.py", line 709, in <module>
    main()
  File "pretrain_gpt2.py", line 654, in main
    args.eod_token = get_train_val_test_data(args)
  File "pretrain_gpt2.py", line 600, in get_train_val_test_data
    args)
  File "/content/DeepSpeedExamples/Megatron-LM/configure_data.py", line 34, in apply
    return make_loaders(args)
  File "/content/DeepSpeedExamples/Megatron-LM/configure_data.py", line 170, in make_loaders
    train, tokenizer = data_utils.make_dataset(**data_set_args)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/__init__.py", line 109, in make_dataset
    ds = split_ds(ds, split)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/datasets.py", line 194, in split_ds
    rtn_ds[i] = SplitDataset(ds, split_inds)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/datasets.py", line 134, in __init__
    self.lens = itemgetter(*self.split_inds)(list(self.wrapped_data.lens))
TypeError: itemgetter expected 1 arguments, got 0

How do I fix both the Google Colab error and the Azure VM error?

deherman-MSFT 38,021 Reputation points Microsoft Employee Moderator

2021-01-07T19:55:56.773+00:00

@sammyboy123
Unfortunately I have not worked with DeepSpeed before, but I am happy to do my best to help. Can you share the documentation you are following, and which commands or steps are failing for you? What type of VM and OS are you working with? Since this is related to DeepSpeed, it might also be worth posting on their issues page.

sammyboy123 51 Reputation points

2021-01-11T13:53:41.81+00:00

Hi @deherman-MSFT! Sorry for the slow response. I am using Ubuntu 18.04 LTS on a V1 machine. The instructions are at the following link: https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM There was an error saying that PyTorch wasn't installed. I installed PyTorch using pip and pip3. Neither worked, so I cloned the PyTorch repo and ran setup.py. It said I needed Python 3. I removed the python and python2.7 packages, but I got the same error. Do you know what is going wrong? I did not get this error on Google Colab.
Thank you for helping me out.

Accepted answer

Answer 1

@sammyboy123
I would start by installing PyTorch via pip. Instructions can be found here. There is also a verification section that tests whether PyTorch is installed correctly. You might also find the DeepSpeed Getting Started page helpful. It has specific tutorials for Azure, and a Docker image is available as well.
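
One common reason for pip reporting a successful install while a script still cannot find PyTorch is that pip and the script are using different Python interpreters. Whether that is what happened on this VM is an assumption, not something the thread confirms. A minimal sketch for checking it is below; the function name describe_environment is made up for illustration, and it only uses the standard library, so it runs even when PyTorch is missing.

```python
import importlib.util
import sys


def describe_environment(module_name="torch"):
    """Report which interpreter is running and whether it can locate the module.

    describe_environment is a hypothetical helper, not part of PyTorch or DeepSpeed.
    """
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with the same command that launches the training script (for example python3 if the script is started with python3). If module_found is False, installing with "python3 -m pip install torch" ties pip to that exact interpreter, which avoids a mismatch between pip, pip3, and the Python that actually runs the script.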

Let me know if this doesn't work for you or you are still facing issues.
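
For the Colab traceback in the question, the last line points at itemgetter(*self.split_inds) being called with no indices, which means split_inds was empty for at least one split. Why it was empty is not shown in the thread; a dataset with too few documents for the requested train/validation/test split is one plausible cause, but that is an assumption. The sketch below only reproduces the error in isolation. select_lengths mirrors the failing line, and safe_select_lengths is a hypothetical guard, not code from Megatron-LM.

```python
from operator import itemgetter


def select_lengths(lens, split_inds):
    """Mirror the failing line: pick the lengths at the given indices."""
    # With an empty split_inds this becomes itemgetter(), which raises TypeError.
    return itemgetter(*split_inds)(list(lens))


def safe_select_lengths(lens, split_inds):
    """Hypothetical guard: fail early with a clearer message for an empty split."""
    split_inds = list(split_inds)
    if not split_inds:
        raise ValueError("split is empty; check the dataset size and split ratios")
    lens = list(lens)
    return [lens[i] for i in split_inds]


if __name__ == "__main__":
    print(select_lengths([3, 5, 8], [0, 2]))
    try:
        select_lengths([3, 5, 8], [])
    except TypeError as err:
        print("TypeError:", err)
```

If this matches what is happening, checking how many documents the dataset actually contains, and which split ratios are passed to the script, would be the next thing to look at.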

-------------------------------

Please don't forget to "Accept the answer" and "up-vote" wherever the information provided helps you. This can be beneficial to other community members.
