Background
ViT is short for Vision Transformer. The model was proposed by Alexey Dosovitskiy et al. in 2020 and applies the Transformer architecture to image classification. Although it was not the first attempt to apply Transformers to visual tasks, its strong performance and high scalability made ViT a milestone for Transformer applications in computer vision. The following shows the schematic diagram of the model:
The following table shows the instance environment.

| Item | Value |
| --- | --- |
| Instance Type | pi2.2xlarge.4 |
| Region | Shanghai 7 |
| System Disk | 40 GB |
| Data Disk | 10 GB |
| OS | Ubuntu 18.04.5 LTS |
| EIP Bandwidth | 5 Mbps |
Procedure
1. Configure the PyTorch development environment.
a. Install the NVIDIA GPU driver, CUDA, and cuDNN.
Run the following commands to install the NVIDIA GPU driver.

apt install tar gcc g++ make build-essential
chmod +x NVIDIA-Linux-x86_64-515.65.01.run
./NVIDIA-Linux-x86_64-515.65.01.run --no-opengl-files
After the installation, run the nvidia-smi command to check whether the installation is successful.
Run the following commands to install CUDA and cuDNN.

./cuda_11.7.0_515.43.04_linux.run
tar xJvf cudnn-linux-x86_64-8.5.0.96_cuda11-archive.tar.xz
cd cudnn-linux-x86_64-8.5.0.96_cuda11-archive
sudo cp include/* /usr/local/cuda-11.7/include/
sudo cp lib/* /usr/local/cuda-11.7/lib64/
sudo chmod a+r /usr/local/cuda-11.7/include/cudnn*
sudo chmod a+r /usr/local/cuda-11.7/lib64/libcudnn*
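The commands above place the toolkit and cuDNN under /usr/local/cuda-11.7. To invoke nvcc from the shell, the toolkit path typically also needs to be added to the environment; this step is not shown in the original, so the following is a common setup rather than the guide's exact procedure:

```shell
# Add CUDA 11.7 to the shell environment (append these lines to ~/.bashrc to persist)
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH

# Verify that the toolkit is visible
nvcc -V
```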
b. Configure the Conda environment.
Run the following commands in sequence to configure the Conda environment.

wget -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh
chmod +x Miniconda3-py39_4.12.0-Linux-x86_64.sh
./Miniconda3-py39_4.12.0-Linux-x86_64.sh
c. Edit the ~/.condarc file and add the following configuration to replace the conda software source with the Tsinghua source.
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  deepmodeling: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
For more information, see https://mirror.tuna.tsinghua.edu.cn/help/anaconda/
Run the conda info command to confirm that the software source has been replaced.
d. Run the following command to replace the pip source with the Tsinghua source.
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/
e. Install the PyTorch component.
Run the following command to install PyTorch.
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
Run the following command to check whether PyTorch is successfully installed.
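The verification command itself is omitted in the original; a minimal check, run inside the Conda environment configured above, is:

```python
import torch

# Print the installed version and whether the GPU stack is usable
print(torch.__version__)          # expected: 1.13.1+cu117 for the install above
print(torch.cuda.is_available())  # True once the driver and CUDA are set up correctly
```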
2. Prepare the dataset.
CIFAR-10 (Canadian Institute for Advanced Research-10) is a commonly used computer vision dataset for image classification tasks. It consists of 60,000 color images (32x32) from 10 categories, each containing 6,000 images. The dataset is divided into two parts: a training set of 50,000 images and a test set of 10,000 images. The categories are airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks, and each image is labeled with the category it belongs to. The dataset is widely used in computer vision for algorithm development, model training, and performance evaluation.
3. Train the model with ColossalAI-Examples.
In this topic, the model is developed and trained on the basis of Colossal AI, a distributed training framework. Colossal AI provides a convenient set of interfaces through which data parallelism, model parallelism, pipeline parallelism, or hybrid parallelism can be easily implemented.
a. Install Colossal AI and other components.
pip install colossalai timm titans
b. Train the ViT sample model.

Run the following commands to obtain the sample code:

git clone https://github.com/hpcaitech/ColossalAI-Examples.git
cd ColossalAI-Examples/image/vision_transformer/data_parallel

Because a single T4 GPU has limited memory, modify the config.py file and set BATCH_SIZE to 32. Then run the following command to start training:

colossalai run --nproc_per_node 1 train_with_cifar10.py --config config.py
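The edit to config.py might look like the following sketch. Only BATCH_SIZE = 32 comes from this guide; the other entry is a hypothetical placeholder for a value the repository's own config defines:

```python
# config.py (illustrative fragment)
BATCH_SIZE = 32   # reduced to fit a single T4's memory, per the step above
NUM_EPOCHS = 2    # hypothetical value; keep the repository's own setting
```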
The model running process is shown in the following figure: