Background
ViT is short for Vision Transformer. The model was proposed by Alexey Dosovitskiy et al. in 2020 and applies the Transformer architecture to image classification. Although it was not the first attempt to apply Transformers to visual tasks, its strong performance and high scalability made ViT a milestone for Transformer applications in computer vision. The following shows the schematic diagram of the model:
The following table shows the instance environment.

| Item | Value |
| --- | --- |
| Instance Type | pi2.2xlarge.4 |
| Region | Shanghai 7 |
| System Disk | 40 GB |
| Data Disk | 10 GB |
| OS | Ubuntu 18.04.5 LTS |
| EIP Bandwidth | 5 Mbps |
Procedure
1. Configure the PyTorch development environment.
a. Install the NVIDIA GPU driver, CUDA, and cuDNN.
Run the following commands to install the NVIDIA GPU driver.

apt install tar gcc g++ make build-essential
chmod +x NVIDIA-Linux-x86_64-515.65.01.run
./NVIDIA-Linux-x86_64-515.65.01.run --no-opengl-files
After the installation, run the nvidia-smi command to check whether the installation is successful.
Run the following commands to install CUDA and cuDNN.

./cuda_11.7.0_515.43.04_linux.run
tar xJvf cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
cd cudnn-linux-x86_64-8.5.0.96_cuda11-archive
sudo cp include/* /usr/local/cuda-11.7/include/
sudo cp lib/* /usr/local/cuda-11.7/lib64/
sudo chmod a+r /usr/local/cuda-11.7/include/cudnn*
sudo chmod a+r /usr/local/cuda-11.7/lib64/libcudnn*
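The commands above place the toolkit and cuDNN under /usr/local/cuda-11.7. To invoke nvcc from the shell, the toolkit path typically also needs to be added to the environment; this step is not shown in the original, so the following is a common setup rather than the guide's exact procedure:

```shell
# Add CUDA 11.7 to the shell environment (append these lines to ~/.bashrc to persist)
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH

# Verify that the toolkit is visible
nvcc -V
```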
b. Configure the Conda environment.
Run the following commands in sequence to configure the Conda environment.

wget -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
chmod +x Miniconda3-py39_4.12.0-Linux-x86_64.sh
./Miniconda3-py39_4.12.0-Linux-x86_64.sh
c. Edit the ~/.condarc file and add the following configuration to replace the conda software source with the Tsinghua source.
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  deepmodeling: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
For more information, see https://mirror.tuna.tsinghua.edu.cn/help/anaconda/
Run the conda info command to confirm that the software source has been replaced.
d. Run the following command to replace the pip source with the Tsinghua source.
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/
e. Install the PyTorch component.
Run the following command to install PyTorch.
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
Run the following command to check whether PyTorch is successfully installed.
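The verification command itself is omitted in the original; a minimal check, run inside the Conda environment configured above, is:

```python
import torch

# Print the installed version and whether the GPU stack is usable
print(torch.__version__)          # expected: 1.13.1+cu117 for the install above
print(torch.cuda.is_available())  # True once the driver and CUDA are set up correctly
```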
2. Prepare the dataset.
CIFAR-10 (Canadian Institute for Advanced Research-10) is a commonly used computer vision dataset for image classification tasks. It consists of 60,000 color images (32x32) from 10 categories, each containing 6,000 images. The dataset is divided into two parts: a training set of 50,000 images and a test set of 10,000 images. The categories are airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks, and each image is labeled with the category it belongs to. The dataset is widely used in computer vision for algorithm development, model training, and performance evaluation.
3. Train the model with ColossalAI-Examples.
In this topic, the model is developed and trained on the basis of Colossal AI, a distributed training framework. Colossal AI provides a convenient set of interfaces through which data parallelism, model parallelism, pipeline parallelism, or hybrid parallelism can be easily implemented.
a. Install Colossal AI and other components.
pip install colossalai timm titans
b. Train the ViT sample model.

Run the following commands to obtain the sample code:

git clone https://github.com/hpcaitech/ColossalAI-Examples.git
cd ColossalAI-Examples/image/vision_transformer/data_parallel

Because a single T4 GPU has limited memory, modify the config.py file and set BATCH_SIZE to 32. Then run the following command to start training:

colossalai run --nproc_per_node 1 train_with_cifar10.py --config config.py
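The edit to config.py might look like the following sketch. Only BATCH_SIZE = 32 comes from this guide; the other entry is a hypothetical placeholder for a value the repository's own config defines:

```python
# config.py (illustrative fragment)
BATCH_SIZE = 32   # reduced to fit a single T4's memory, per the step above
NUM_EPOCHS = 2    # hypothetical value; keep the repository's own setting
```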
The model running process is shown in the following figure: