NVIDIA CUDA on Ubuntu Server

In my previous post about Ethereum mining on Ubuntu, I ended by saying I wanted to look at what it would take to install NVIDIA’s CUDA drivers. Using the CUDA drivers unlocks additional performance from my NVIDIA GTX 1070 graphics card in certain applications, and in particular can improve Ethereum mining performance.

This post demonstrates two ways to install the CUDA drivers:

Ubuntu package archive
NVIDIA package archive

Ubuntu Package Archive

Installing from the Ubuntu archive is simple and quick. Install the nvidia-cuda-toolkit package, which also pulls in the required CUDA libraries and tools, then reboot:

sudo apt update
sudo apt install nvidia-cuda-toolkit
sudo shutdown -r now

After rebooting, there are three ways to verify that the CUDA drivers are installed and working:

Check that the nvidia* device files exist in /dev
Run nvcc -V to show which version of the CUDA toolkit (compiler) is installed
Run nvidia-smi to get detailed information about the device, such as power and fan status, processes using the GPU, and the driver version.
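The first of these checks is easy to script. This is a minimal sketch, assuming bash: the hypothetical function nvidia_devices_present only tests for the nvidiactl and nvidia0 nodes that appear in the example listings in this post, and takes the directory as an optional argument so it can be pointed somewhere other than /dev.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any NVIDIA tooling): succeed if the
# control node (nvidiactl) and the first GPU node (nvidia0) both exist.
nvidia_devices_present() {
    local dir="${1:-/dev}"   # directory to inspect, /dev by default
    [ -e "$dir/nvidiactl" ] && [ -e "$dir/nvidia0" ]
}

# Check the real /dev and report the result.
if nvidia_devices_present; then
    echo "NVIDIA device files found in /dev"
else
    echo "NVIDIA device files missing; the driver may not be loaded" >&2
fi
```

nvidia0 is only the first GPU; a multi-GPU mining rig would also have nvidia1 and so on, which this sketch does not check.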

Example output of all three is below:

$ ls /dev/nvidia*
/dev/nvidia0  /dev/nvidiactl /dev/nvidia-uvm
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
$ nvidia-smi
Sat Oct 28 14:17:15 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   49C    P8    11W / 185W |     99MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1170      G   /usr/lib/xorg/Xorg                            97MiB |
+-----------------------------------------------------------------------------+
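The nvcc check reports the toolkit release, which is what differs between the two install methods in this post (7.5 from the Ubuntu archive, 9.0 from NVIDIA's repository). As a sketch, assuming the "release X.Y," wording shown above stays stable, the hypothetical cuda_release function extracts that number and the hypothetical version_at_least function compares versions using GNU sort -V.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc -V` output on stdin and print the release
# number from the "Cuda compilation tools, release X.Y, ..." line.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Hypothetical helper: succeed if version $1 is >= version $2.
# Relies on GNU sort's -V (version sort), provided by Ubuntu's coreutils.
version_at_least() {
    local have="$1" min="$2"
    [ "$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)" = "$min" ]
}

# Only run against the real compiler if it is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    release=$(nvcc -V | cuda_release)
    echo "CUDA toolkit release: ${release}"
    if ! version_at_least "$release" 9.0; then
        echo "toolkit is older than 9.0" >&2
    fi
fi
```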

NVIDIA Package Archive

This section looks at installing directly from NVIDIA in order to get the latest version of the packages. NVIDIA's repository also contains a variety of metapackages that let you limit the install to just the libraries, runtime, or toolkit you need instead of installing everything.

NVIDIA runs an apt repository that can be added to the system and installed from directly, and using it keeps the install up to date. I will use the cuda metapackage, which installs all of the CUDA toolkit and driver packages and upgrades both as new versions are released:

echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt update
sudo apt install cuda
sudo shutdown -r now
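The repository line above is hard-coded for Ubuntu 16.04 (ubuntu1604). As a sketch, assuming NVIDIA keeps naming its repository directories ubuntuXXYY after the release number and that one exists for your release, the hypothetical cuda_repo_dist function derives that name from /etc/os-release. The script only prints the resulting source line for review rather than writing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: "16.04" -> "ubuntu1604", mirroring the directory
# name in the repository URL used above (an assumed naming pattern).
cuda_repo_dist() {
    local version_id="$1"
    printf 'ubuntu%s\n' "${version_id//./}"   # strip every dot
}

# Read the release in a subshell so /etc/os-release variables don't leak.
if [ -r /etc/os-release ]; then
    (
        . /etc/os-release
        if [ "$ID" = "ubuntu" ]; then
            dist=$(cuda_repo_dist "$VERSION_ID")
            echo "deb https://developer.download.nvidia.com/compute/cuda/repos/${dist}/x86_64/ /"
        fi
    )
fi
```

The signing key URL lives under the same directory, so it would need the same substitution; check NVIDIA's download page before relying on the generated line.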

The final, required step is to add the CUDA 9.0 binaries, such as nvcc, to your PATH:

export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
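That export only lasts for the current shell. A minimal sketch of a persistent version, assuming bash and ~/.bashrc: the hypothetical path_prepend function adds a directory only if it is not already on PATH, so re-sourcing the file does not keep growing PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend $1 to PATH unless it is already there.
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                          # already on PATH, nothing to do
        *) PATH="$1${PATH:+:${PATH}}" ;;      # same idiom as the export above
    esac
    export PATH
}

path_prepend /usr/local/cuda-9.0/bin
```

Both the function and the call could be appended to ~/.bashrc. Since the demo programs later in this post live under /usr/local/cuda, pointing at /usr/local/cuda/bin instead might survive a toolkit upgrade; that assumes the symlink tracks the newest installed version.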

Test CUDA

As with the Ubuntu archive install, below is the output from /dev, nvcc, and nvidia-smi:

$ ls /dev/nvidia*
/dev/nvidia0  /dev/nvidiactl  /dev/nvidia-modeset  /dev/nvidia-uvm  /dev/nvidia-uvm-tools
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
$ nvidia-smi
Sat Oct 28 15:57:59 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   41C    P8    10W / 185W |    121MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1181      G   /usr/lib/xorg/Xorg                           119MiB |
+-----------------------------------------------------------------------------+
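Both nvidia-smi tables in this post report driver 384.90, so switching to NVIDIA's repository changed the toolkit (7.5 to 9.0) but not the driver on this machine. As a sketch, the hypothetical driver_version function pulls the driver version out of the default table; nvidia-smi's own --query-gpu option with --format=csv is a sturdier alternative, and the script also runs it when nvidia-smi is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi's default table on stdin and print
# the number after "Driver Version:".
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver (parsed from table): $(nvidia-smi | driver_version)"
    # Machine-readable query built into nvidia-smi itself.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
fi
```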

The package also includes some prebuilt demo programs under /usr/local/cuda/extras/demo_suite that are interesting to play with.
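deviceQuery and bandwidthTest both finish with a "Result = PASS" line, so a quick smoke test can run them and look for it. This is a sketch with demo_passed as a hypothetical helper; it only runs those two programs because I have not checked whether every program in demo_suite runs on a headless server.

```shell
#!/usr/bin/env bash
DEMO_DIR=/usr/local/cuda/extras/demo_suite

# Hypothetical helper: succeed if the demo output read on stdin contains
# the "Result = PASS" line printed on success.
demo_passed() {
    grep -q '^Result = PASS'
}

# Run each demo only if it is installed and executable.
for demo in deviceQuery bandwidthTest; do
    if [ -x "$DEMO_DIR/$demo" ]; then
        if "$DEMO_DIR/$demo" | demo_passed; then
            echo "$demo: PASS"
        else
            echo "$demo: FAIL" >&2
        fi
    fi
done
```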

Device Query

$ /usr/local/cuda/extras/demo_suite/deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1070"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8111 MBytes (8504868864 bytes)
  (15) Multiprocessors, (128) CUDA Cores/MP:     1920 CUDA Cores
  GPU Max Clock rate:                            1797 MHz (1.80 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
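The compute capability line (6.1 for the GTX 1070) determines which GPU architecture to target when compiling with nvcc. As a sketch, the hypothetical sm_arch function maps that line to nvcc's sm_XY naming, assuming the deviceQuery wording shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read deviceQuery output on stdin and map the
# "CUDA Capability Major/Minor version number: X.Y" line to "sm_XY".
sm_arch() {
    sed -n 's|.*Major/Minor version number: *\([0-9]*\)\.\([0-9]*\).*|sm_\1\2|p'
}

DEVICE_QUERY=/usr/local/cuda/extras/demo_suite/deviceQuery
if [ -x "$DEVICE_QUERY" ]; then
    # Prints one line per detected GPU.
    "$DEVICE_QUERY" | sm_arch
fi
```

For this card the result is sm_61, so something like nvcc -arch=sm_61 -o vector_add vector_add.cu would build a hypothetical vector_add.cu for it.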

Bandwidth Test

$ /usr/local/cuda/extras/demo_suite/bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX 1070
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         12665.1

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         12916.1

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         190526.5

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
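As the note says, these numbers are not a benchmark, but they can be roughly compared with the theoretical peak implied by deviceQuery's figures. This sketch assumes the reported 4004 MHz memory clock is doubled for the effective data rate, which gives 256 GB/s from the 256-bit bus and matches the GTX 1070's published memory bandwidth. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: peak memory bandwidth in GB/s, assuming
#   GB/s = 2 * clock_MHz * (bus_width_bits / 8) / 1000
peak_bandwidth_gbs() {
    awk -v mhz="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", 2 * mhz * (bits / 8) / 1000 }'
}

# Hypothetical helper: measured MB/s as a whole-number percentage of a peak in GB/s.
bandwidth_efficiency() {
    awk -v measured="$1" -v peak="$2" 'BEGIN { printf "%.0f\n", measured / (peak * 1000) * 100 }'
}

# Values taken from the deviceQuery and bandwidthTest output in this post.
peak=$(peak_bandwidth_gbs 4004 256)
echo "theoretical peak: ${peak} GB/s"
echo "measured device to device: $(bandwidth_efficiency 190526.5 "$peak")% of peak"
```

With the numbers above this works out to roughly three quarters of the theoretical peak for device-to-device copies.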
