
Machine Learning: Deploying WSL2 Ubuntu + PyTorch 2 on Win11 with an RTX 4070 Ti

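1 Installing WSL2 on Win11

This guide walks through the GPU version of the installation step by step.

It is based on WSL2 Ubuntu under Win11, but most of the steps also apply to a bare-metal Ubuntu installation.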

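1.1 Install WSL2 Ubuntu

WSL2 requires Windows build 18362 or later. That is a Windows 10 build number, so every Win11 release already meets it.

Prerequisites for installing WSL2 Ubuntu

1. Enable the Windows Subsystem for Linux

2. Enable the Virtual Machine Platform

3. Install the WSL2 update package

Then download Ubuntu 20.04.5 LTS from the Microsoft Store (the command output later in this guide comes from an Ubuntu-22.04 instance; the steps should be the same for both).

The Ubuntu WSL2 system is then ready to use.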

1.2 Fixes for a few common errors

1.2.1 Error 0x8007019e

The Windows Subsystem for Linux feature has not been enabled yet; enable it.

1.2.2 Error 0x800701bc

The WSL2 update package needs to be installed:

https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi

Set WSL 2 as the default version

wsl --set-default-version 2

Verify

wsl -l -v
  NAME            STATE           VERSION
* Ubuntu-22.04    Stopped         2
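
If you want to check the result from a script rather than by eye, the table printed by wsl -l -v is simple to parse. The following is only a sketch: the function name parse_wsl_list and the sample text are made up for illustration, and it assumes the output has already been decoded to a normal string (wsl.exe typically emits UTF-16 on Windows, which is why stray NUL characters are stripped).

```python
def parse_wsl_list(text):
    """Parse the table printed by `wsl -l -v` into one dict per distribution."""
    rows = []
    for line in text.splitlines():
        # Drop NUL bytes left over from UTF-16 output read as a narrow string
        line = line.replace("\x00", "").strip()
        if not line:
            continue
        is_default = line.startswith("*")  # the default distro is marked with *
        fields = line.lstrip("*").split()
        # Skip the NAME/STATE/VERSION header and anything that is not a data row
        if len(fields) < 3 or not fields[-1].isdigit():
            continue
        rows.append({
            "name": fields[0],
            "state": fields[1],
            "version": int(fields[-1]),
            "default": is_default,
        })
    return rows


sample = """  NAME            STATE           VERSION
* Ubuntu-22.04    Stopped         2
"""

for distro in parse_wsl_list(sample):
    status = "OK" if distro["version"] == 2 else "still on WSL 1"
    print(distro["name"], status)
```

A distribution that still reports version 1 can be converted with wsl --set-version followed by the distribution name and 2.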

1.2.3 Error 0x80370102

In "Turn Windows features on or off", enable the Virtual Machine Platform.

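1.3 Install the NVIDIA driver on Win11

This step is simple: install the regular NVIDIA driver on Windows. WSL2 uses the Windows driver, so no Linux display driver is installed inside WSL.

Note: on a bare-metal Ubuntu installation, the GPU driver has to be installed in Ubuntu itself

# nvidia-driver-525 is the driver itself; nvidia-utils-525 alone only provides tools such as nvidia-smi
apt install nvidia-driver-525 nvidia-utils-525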

Verify

(base) root@Ethan-4070Ti:~# nvidia-smi
Fri Apr 21 23:27:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 531.41       CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   30C    P8     3W / 285W |    980MiB / 12282MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
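
The "CUDA Version" in the nvidia-smi header is the highest CUDA version the driver supports, not the toolkit that is installed, so it is worth comparing it with the toolkit you plan to install. Below is a small sketch of that check; the function names parse_smi_header and version_tuple are invented for this example, and the regular expression assumes the header layout shown above.

```python
import re


def parse_smi_header(text):
    """Extract the tool, driver and CUDA versions from the nvidia-smi header."""
    pattern = re.compile(
        r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
        r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
        r"CUDA Version:\s+(?P<cuda>[\d.]+)"
    )
    match = pattern.search(text)
    if match is None:
        raise ValueError("no nvidia-smi header line found")
    return match.groupdict()


def version_tuple(version):
    """Turn '12.1' into (12, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


header = "| NVIDIA-SMI 525.105.17 Driver Version: 531.41 CUDA Version: 12.1 |"
info = parse_smi_header(header)

toolkit = "12.0"  # the toolkit version installed in section 1.4
supported = version_tuple(toolkit) <= version_tuple(info["cuda"])
print(info, "toolkit supported by driver:", supported)
```

With the values from the output above, the driver supports up to CUDA 12.1, so the CUDA 12.0 toolkit is within range.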

1.4 Install CUDA on WSL Ubuntu 22

Enter WSL2 Ubuntu from PowerShell

Pick the download link on the official NVIDIA site

Be sure to select WSL-Ubuntu as the distribution [this is important]. The WSL-Ubuntu package leaves out the Linux display driver, which must not be installed inside WSL.

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.0.1/local_installers/cuda-repo-wsl-ubuntu-12-0-local_12.0.1-1_amd64.deb
dpkg -i cuda-repo-wsl-ubuntu-12-0-local_12.0.1-1_amd64.deb
cp /var/cuda-repo-wsl-ubuntu-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cuda

Important

After the installation finishes, running nvcc -V reports that the nvcc command cannot be found

root@DESKTOP-B3ONOL3:~# nvcc -V
Command 'nvcc' not found, but can be installed with:
apt install nvidia-cuda-toolkit

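This does not mean the CUDA toolkit failed to install; the environment variables simply have not been configured yet

# This guide installs CUDA 12, so adjust the paths below for other versions
# Confirm that the install directory exists
ls -d /usr/local/cuda-12.0/lib64
/usr/local/cuda-12.0/lib64

# Edit ~/.bashrc and append the two export lines to the end of the file
vim ~/.bashrc
export PATH=/usr/local/cuda-12.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

After saving and exiting (:wq in vim), reload the environment variables:

source ~/.bashrc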

Running nvcc -V again now shows the CUDA version

nvcc -V
# Output
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
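
The ${PATH:+:${PATH}} expansion in the PATH export line is easy to misread. As a rough illustration (the function name prepend_dir is made up for this sketch), the Python below mirrors what the shell does: it adds the ":" separator only when the variable already has a value. That avoids a stray empty entry, which the shell and the dynamic loader would treat as the current directory.

```python
def prepend_dir(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + current only if current is non-empty."""
    if current:
        return new_dir + ":" + current
    return new_dir


cuda_bin = "/usr/local/cuda-12.0/bin"

# Variable already set: the new directory goes in front of the old value
print(prepend_dir(cuda_bin, "/usr/local/sbin:/usr/bin"))

# Variable empty or unset: no trailing ':' is produced
print(prepend_dir(cuda_bin, ""))

# Naive concatenation, for comparison, leaves an empty entry at the end
print(cuda_bin + ":" + "")
```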

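1.5 Install cuDNN on WSL Ubuntu 22

https://developer.nvidia.com/rdp/cudnn-archive

You must register an NVIDIA developer account first

Save the file in Windows, then copy it into the home directory of WSL2 Ubuntu, the same way files are copied and pasted within Windows [the File Explorer sidebar has a Linux entry]

In the WSL terminal, change to the home directory and extract the downloaded file

mkdir -p cudnn
tar -xvf cudnn-linux-x86_64-8.8.1.3_cuda12-archive.tar.xz -C cudnn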

Then copy the extracted files into the corresponding CUDA directories:

# Copy; adjust the paths to the version you actually installed
cp -r cudnn/cudnn-linux-x86_64-8.8.1.3_cuda12-archive/lib/libcudnn* /usr/local/cuda-12.0/lib64/
cp -r cudnn/cudnn-linux-x86_64-8.8.1.3_cuda12-archive/include/cudnn*.h /usr/local/cuda-12.0/include/

# Make the files readable by all users
chmod a+r /usr/local/cuda-12.0/lib64/libcu*
chmod a+r /usr/local/cuda-12.0/include/cud*

Verify

# Check that cuDNN was installed successfully
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

# The output should look like this
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 8
#define CUDNN_PATCHLEVEL 1
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

/* cannot use constexpr here since this is a C-only file */
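
The three #define lines combine into a single version number through the CUDNN_VERSION formula shown in the header. The sketch below reads them with a regular expression and applies that formula; the function name cudnn_version is invented for this example, and the sample text is the grep output from above rather than the full header.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn_version.h style text."""
    values = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(
            rf"^#define\s+CUDNN_{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            raise ValueError(f"CUDNN_{name} not found")
        values[name] = int(match.group(1))
    return values["MAJOR"], values["MINOR"], values["PATCHLEVEL"]


sample = """#define CUDNN_MAJOR 8
#define CUDNN_MINOR 8
#define CUDNN_PATCHLEVEL 1
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

major, minor, patch = cudnn_version(sample)
combined = major * 1000 + minor * 100 + patch  # same formula as CUDNN_VERSION
print(f"cuDNN {major}.{minor}.{patch} (CUDNN_VERSION = {combined})")
```

To run it against the real installation, pass the contents of /usr/local/cuda/include/cudnn_version.h instead of the sample string.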

1.6 Install Anaconda

https://www.anaconda.com/products/distribution

wget https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh

Install

chmod +x Anaconda3-2023.03-Linux-x86_64.sh
./Anaconda3-2023.03-Linux-x86_64.sh

Welcome to Anaconda3 py310_2023.03-0

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue

# Press ENTER
Anaconda3 will now be installed into this location:
/root/anaconda3

- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below

[/root/anaconda3] >>>
PREFIX=/root/anaconda3


# Installation finishes
installation finished.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes
no change /root/anaconda3/condabin/conda
no change /root/anaconda3/bin/conda
no change /root/anaconda3/bin/conda-env
no change /root/anaconda3/bin/activate
no change /root/anaconda3/bin/deactivate
no change /root/anaconda3/etc/profile.d/conda.sh
no change /root/anaconda3/etc/fish/conf.d/conda.fish
no change /root/anaconda3/shell/condabin/Conda.psm1
no change /root/anaconda3/shell/condabin/conda-hook.ps1
no change /root/anaconda3/lib/python3.10/site-packages/xontrib/conda.xsh
no change /root/anaconda3/etc/profile.d/conda.csh
modified /root/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:

conda config --set auto_activate_base false

Thank you for installing Anaconda3!

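Check the Python 3 version

python3 -V
Python 3.10.6

Create a Python 3.10 environment with conda

conda create -n pytorch python=3.10.0

Activate the environment

conda activate pytorch

Download and install PyTorch 2. The conda package bundles its own CUDA 11.8 runtime, so pytorch-cuda=11.8 does not have to match the system CUDA 12.0 toolkit

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia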

Verify

python3
# At the Python 3 prompt
import torch
torch.cuda.is_available()

The call should return True when PyTorch can see the GPU.
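
A single True does not say much about which build was installed, so a slightly longer check can be handy. The following is a sketch only: the function name cuda_report and the keys of the dictionary it returns are choices made for this example. It uses standard PyTorch calls (torch.version.cuda, torch.cuda.is_available, torch.cuda.get_device_name) and falls back gracefully when PyTorch is not importable.

```python
def cuda_report():
    """Collect basic facts about the PyTorch install and its view of the GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA runtime the PyTorch build was compiled against (None for CPU builds)
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # Run one small matrix multiplication on the GPU as a smoke test
        x = torch.rand(1024, 1024, device="cuda")
        report["matmul_ok"] = bool(torch.isfinite(x @ x).all())
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

Run it inside the activated pytorch environment; on the setup described here, built_with_cuda would be expected to read 11.8 because of the pytorch-cuda=11.8 package.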

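Managing Anaconda environments

# Deactivate the current environment
conda deactivate
# Remove an environment (replace env_name with the environment's name)
conda remove -n env_name --all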

1.7 JupyterLab

Install JupyterLab

# Activate the environment
conda activate pytorch
# Install jupyterlab
pip3 install jupyterlab

Generate the configuration file

jupyter-lab --generate-config
Writing default config to: /root/.jupyter/jupyter_lab_config.py

Set a password [afterwards you can log in with either a token or the password]

jupyter-lab password
Enter password:
Verify password:
[JupyterPasswordApp] Wrote hashed password to /root/.jupyter/jupyter_server_config.json

Start the server [the --allow-root flag is required when starting as root]

# Make sure the environment is activated first
conda activate pytorch
jupyter-lab --ip 0.0.0.0 --port 8888 --no-browser --allow-root

Enter the remote address in a browser to access JupyterLab

The resulting interface [screenshot from April 21, 2023]
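
If the page does not load, a quick first step is to check whether the JupyterLab port is reachable at all from the client machine. This is a minimal sketch using only the standard library; the function name port_is_open is made up here, and 127.0.0.1 with port 8888 are just the values from this guide, so substitute the address you actually connect to.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if port_is_open("127.0.0.1", 8888):
    print("JupyterLab port is reachable")
else:
    print("Nothing is answering on port 8888; check the server and the firewall")
```

A False result from another machine while the local check succeeds usually points at port forwarding or the firewall rather than at JupyterLab itself.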

1.7.1 Script for pinning the WSL2 IP address

Skip this step if you only access JupyterLab from the local machine

If you access it from outside the local network, through a fixed address or through your organization's server with a fixed public IP, the following script can serve as a reference

@echo off
setlocal enabledelayedexpansion
rem Run this script from an elevated (Administrator) prompt; netsh needs it
rem Stop any running WSL instances first (wsl --shutdown stops all distributions and takes no name)
wsl --shutdown
if !errorlevel! equ 0 (
    rem Check whether WSL already has the IP we need
    wsl -u root ip addr | findstr "192.168.3.100" > nul
    if !errorlevel! equ 0 (
        echo wsl ip has set
    ) else (
        rem Bind the IP if it is missing
        wsl -u root ip addr add 192.168.3.100/24 broadcast 192.168.3.255 dev eth0 label eth0:1
        echo set wsl ip success: 192.168.3.100
    )
    rem Check whether the Windows host already has the IP we need
    ipconfig | findstr "192.168.3.200" > nul
    if !errorlevel! equ 0 (
        echo windows ip has set
    ) else (
        rem Bind the IP if it is missing
        netsh interface ip add address "vEthernet (WSL)" 192.168.3.200 255.255.255.0
        echo set windows ip success: 192.168.3.200
    )
)
rem Forward the SSH and JupyterLab ports from the host to WSL
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=22 connectaddress=192.168.3.100 connectport=22
netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=8888 connectaddress=192.168.3.100 connectport=8888

pause

Start JupyterLab

# Make sure the environment is activated first
conda activate pytorch
jupyter-lab --ip 0.0.0.0 --port 8888 --no-browser --allow-root

1.7.2 Firewall

If the Win11 firewall is left enabled, create a new inbound rule that allows the forwarded ports (22 and 8888 in the script above)