1. What is Llama 2
On July 18, 2023, Meta released Llama 2, an open-source large language model that is free to use for research and commercial purposes.
The Llama training method involves self-supervised pre-training followed by supervised fine-tuning, reward-model training, and reinforcement learning from human feedback. Llama 2 was trained on 2 trillion tokens, 40% more training data than Llama 1, and doubles Llama 1's context length. The Llama 2 model comes in three size variants: 7B, 13B, and 70B parameters.
According to official data published by Meta, Llama 2 outperforms other open language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests, and even outperforms some closed-source models in terms of helpfulness and safety.
Llama 2-Chat builds on Llama 2 with fine-tuning and safety improvements for dialogue use cases; the tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.
Llama 2-Chat is aimed primarily at chatbot scenarios and is mainly used in the following areas:
· Customer service: Llama 2-Chat can be used for online customer service to answer FAQs about products and services, and provide help and support to users.
· Social entertainment: Llama 2-Chat can act as a light-hearted chat partner, holding casual, relaxed conversations and offering entertainment such as jokes, riddles, and stories.
· Personal assistant: Llama 2-Chat can answer everyday questions, such as weather queries, and help with simple tasks such as setting timers and reminders.
· Mental health: Llama 2-Chat can serve as a simple mental-health support tool, chatting with users, offering advice and tips for emotional regulation and stress relief, and providing comfort and support.
2. Build the model runtime environment on a GPU cloud server
Step 1: Download the model and upload
Download the Llama-2-7b-chat-hf model from the Hugging Face website, then upload it to the GPU cloud server (a download sketch follows the note below).
Description
For more information on how to upload local files to the Linux-based cloud server, see How to Upload Local Files to Linux-based Cloud Server.
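As one option (a minimal sketch, not the only way), the model can be pulled on your local machine with the huggingface_hub Python package before uploading. The repo_id below is the official one on Hugging Face; the local_dir path and the token placeholder are example values, and your Hugging Face account must already have been granted access to the Llama 2 weights.
# Minimal download sketch (assumes huggingface_hub is installed and your
# account has been granted access to the Llama 2 weights on Hugging Face).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-2-7b-chat-hf",
    local_dir="./Llama-2-7b-chat-hf",  # example destination directory
    token="hf_xxx",                    # replace with your own access token
)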
Step 2: Build an environment
1. Upload and install the GPU driver
Download the GPU driver from the NVIDIA official website and upload it to the GPU cloud server, then install the driver in the following steps.
# Add the execution permission to the installation package
chmod +x NVIDIA-Linux-x86_64-515.105.01.run
# Install gcc and linux-kernel-headers
sudo apt-get install gcc linux-kernel-headers
# Run the driver installer
sudo sh NVIDIA-Linux-x86_64-515.105.01.run --disable-nouveau
# Check whether the driver is successfully installed
nvidia-smi
Description
For more information on how to select a driver, library, and software version, see How to Select a Driver, Library, or Software Version.
2. Install the NVIDIA CUDA Toolkit
# Download the CUDA Toolkit installer
wget http://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
# Install CUDA
bash cuda_11.7.0_515.43.04_linux.run
# Edit the environment variable file
vi ~/.bashrc
# Add environment variables
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Make the environment variables take effect
source ~/.bashrc
# Check whether it is successfully installed
nvcc -V
3. Install Miniconda
# Download the Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Install Miniconda3
bash Miniconda3-latest-Linux-x86_64.sh
# Configure environment variables of Conda
vim /etc/profile
# Add environment variables
export ANACONDA_PATH=~/miniconda3
export PATH=$PATH:$ANACONDA_PATH/bin
# Make the environment variables take effect
source /etc/profile
# Check whether it is successfully installed
which conda
conda --version
conda info -e
python --version
# Check the virtual environment
conda env list
4. Install cuDNN
Download the cuDNN archive from cudnn-download and upload it to the GPU cloud server. Install cuDNN in the following steps.
# Unzip
tar -xf cudnn-linux-x86_64-8.9.4.25_cuda11-archive.tar.xz
# Go to the directory
cd cudnn-linux-x86_64-8.9.4.25_cuda11-archive
# Copy
cp ./include/* /usr/local/cuda-11.7/include/
cp ./lib/libcudnn* /usr/local/cuda-11.7/lib64/
# Authorize
chmod a+r /usr/local/cuda-11.7/include/* /usr/local/cuda-11.7/lib64/libcudnn*
# Check whether it is successfully installed
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
5. Install dependencies
a. Download the code for Llama model
git clone https://github.com/facebookresearch/llama.git
b. Install dependencies online
python -m pip install --upgrade pip -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
# Install the llama package and its dependencies
pip install -e . -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
# Install transformers
pip install transformers -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
# Download peft
git clone https://github.com/huggingface/peft.git
# If installing on an offline server, upload the peft directory to it first.
# Then go to the peft directory and switch to the specific commit
cd peft
git checkout 13e53fc
# Install peft
pip install . -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
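Optionally, before going further you can check that the Python environment can see the GPU. A minimal sketch, assuming PyTorch was installed as a dependency of the llama package above:
# Optional GPU sanity check (assumes torch was installed by the steps above)
import torch

print("torch version :", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU           :", torch.cuda.get_device_name(0))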
Step 3: Package the image
To help you build the model runtime environment faster, after completing the operations in Step 1 and Step 2 we packaged the system disk of the GPU cloud server into a standard GPU cloud server image. The image has been uploaded to the eSurfing Cloud Chengdu 4 and Haikou 2 resource pools, and you can use it directly.
Package the image in the following steps:
echo "nameserver 114.114.114.114" > /etc/resolv.conf echo "localhost" > /etc/hostname
# Clear machine-id.
yes | cp -f /dev/null /etc/machine-id
# If /var/lib/dbus/machine-id exists,
# rm -f /var/lib/dbus/machine-id
# ln -s /etc/machine-id /var/lib/dbus/machine-id
# Clear cloud-init. If this command is unavailable, try: rm -rf /var/lib/cloud
cloud-init clean -l
# Clear the image script log
rm -f /tmp/*.log
# Clear /var/log log.
read -r -d '' script <<-"EOF"
import os

def clear_logs(base_path="/var/log"):
    files = os.listdir(base_path)
    for file in files:
        file_path = os.path.join(base_path, file)
        if os.path.isfile(file_path):
            with open(file_path, "w") as f:
                f.truncate()
        elif os.path.isdir(file_path):
            clear_logs(base_path=file_path)

if __name__ == "__main__":
    clear_logs()
EOF

if [ -e /usr/bin/python ]; then
    python -c "$script"
elif [ -e /usr/bin/python2 ]; then
    python2 -c "$script"
elif [ -e /usr/bin/python3 ]; then
    python3 -c "$script"
else
    echo "### no python env in /usr/bin. clear_logs failed ! ###"
fi
# Clear the history.
rm -f /root/.python_history
rm -f /root/.bash_history
rm -f /root/.wget-hsts
3. Rapidly deploy the model with the foundation model image
Step 1: Create a GPU cloud server
Log in to the eSurfing Cloud console, go to the ECS ordering page, select a compute-accelerated GPU cloud server, and select the foundation model image LLaMA2-7B-Chat from the public images.
The recommended minimum specification for the LLaMA2-7B-Chat foundation model image is p2v.2xlarge.4: 8 vCPUs, 32 GB of memory, and a single V100 GPU.
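As a rough, unofficial sanity check of this sizing, FP16 weights take about 2 bytes per parameter, so the weights of a 7B-parameter model alone occupy roughly 13 GiB (actual usage is higher once activations and the KV cache are included):
# Back-of-the-envelope estimate for the FP16 weights of a 7B model
# (illustration only, not official sizing guidance)
params = 7e9         # 7 billion parameters
bytes_per_param = 2  # FP16
print(f"~{params * bytes_per_param / 2**30:.1f} GiB for weights")  # ~13.0 GiB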
Step 2: Online inference
Log in to the GPU cloud server and run the inference task in the following steps.
# Go to the llama directory and run run.sh
cd /opt/llama && sh run.sh
# Enter your question after the "please input your question :" prompt
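run.sh wraps the model loading and generation code shipped with the image, so the command above is normally all you need. For reference only, a minimal inference sketch with the transformers library might look like the following; the model path and generation parameters are assumptions, not the exact contents of run.sh.
# Minimal inference sketch with transformers (model path and generation
# parameters are assumptions, not the exact contents of run.sh).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/opt/llama/Llama-2-7b-chat-hf"  # hypothetical local model directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")

question = "What can Llama 2-Chat be used for?"
inputs = tokenizer(question, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))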