I am trying to pretrain a GPT-2 model with DeepSpeed on an Azure VM. I found a couple of bugs that I was able to patch, but now I have hit a really tough one. The setup says I need PyTorch. No surprise. I install PyTorch, but it still says I don't have it. I have tried both pip and pip3 many times. I then built PyTorch from the GitHub source and ran setup.py, which said I need Python 3; after switching to Python 3 it said the same thing (I pasted the interpreter check I ran at the end of this post).

When I try Google Colab instead, it gives me the following error:
Traceback (most recent call last):
  File "pretrain_gpt2.py", line 709, in <module>
    main()
  File "pretrain_gpt2.py", line 654, in main
    args.eod_token = get_train_val_test_data(args)
  File "pretrain_gpt2.py", line 600, in get_train_val_test_data
    args)
  File "/content/DeepSpeedExamples/Megatron-LM/configure_data.py", line 34, in apply
    return make_loaders(args)
  File "/content/DeepSpeedExamples/Megatron-LM/configure_data.py", line 170, in make_loaders
    train, tokenizer = data_utils.make_dataset(**data_set_args)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/__init__.py", line 109, in make_dataset
    ds = split_ds(ds, split)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/datasets.py", line 194, in split_ds
    rtn_ds[i] = SplitDataset(ds, split_inds)
  File "/content/DeepSpeedExamples/Megatron-LM/data_utils/datasets.py", line 134, in __init__
    self.lens = itemgetter(*self.split_inds)(list(self.wrapped_data.lens))
TypeError: itemgetter expected 1 arguments, got 0
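From what I can tell from the last frame, operator.itemgetter raises exactly this TypeError whenever it is constructed with zero arguments, so I suspect self.split_inds is empty by the time SplitDataset is built (maybe one of the train/val/test split fractions rounds down to zero items?). A minimal repro outside Megatron-LM, with made-up values just to show the failure mode:

    from operator import itemgetter

    split_inds = []                   # my guess at what datasets.py ends up
                                      # with for one of the splits
    getter = itemgetter(*split_inds)  # unpacking an empty list raises here:
                                      # TypeError, same as in the traceback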
How do I fix both the Google Colab error and the Azure VM installation problem?
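For reference, here is the sanity check I ran on the Azure VM to see whether pip and python were even pointing at the same interpreter (a generic diagnostic, nothing DeepSpeed-specific):

    import sys

    print(sys.executable)  # the interpreter actually running this script
    print(sys.version)     # should report a 3.x version

    try:
        import torch
        print(torch.__version__)
    except ImportError:
        # if this fires even after `pip install torch`, pip is probably
        # installing into a different interpreter than the one above
        print("torch not importable from", sys.executable)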