Device information¹
Runtime information
We can get basic information about the GPUs installed in the system by running the NVIDIA System Management Interface program (nvidia-smi) with the --query option:
nvidia-smi --query
The output reports basic information about each attached device, such as its name, brand, bus information, utilization, memory usage, clock rates, temperatures, and power consumption.
Output of nvidia-smi --query:
==============NVSMI LOG==============
Timestamp : Sat Jun 18 10:30:37 2022
Driver Version : 510.39.01
CUDA Version : 11.6
Attached GPUs : 1
GPU 00000000:81:00.0
Product Name : Tesla V100S-PCIE-32GB
Product Brand : Tesla
Product Architecture : Volta
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1562820002759
GPU UUID : GPU-57b2d021-f0e4-5d7f-b433-671b628da8cc
Minor Number : 0
VBIOS Version : 88.00.98.00.01
MultiGPU Board : No
Board ID : 0x8100
GPU Part Number : 900-2G500-0440-030
Module ID : 0
Inforom Version
Image Version : G500.0212.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x81
Device : 0x00
Domain : 0x0000
Device Id : 0x1DF610DE
Bus Id : 00000000:81:00.0
Sub System Id : 0x13D610DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 32768 MiB
Reserved : 257 MiB
Used : 0 MiB
Free : 32510 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Default
Utilization
Gpu : 4 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 32 C
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
GPU Target Temperature : N/A
Memory Current Temp : 29 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 37.29 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 250.00 W
Clocks
Graphics : 1245 MHz
SM : 1245 MHz
Memory : 1107 MHz
Video : 1132 MHz
Applications Clocks
Graphics : 1245 MHz
Memory : 1107 MHz
Default Applications Clocks
Graphics : 1245 MHz
Memory : 1107 MHz
Max Clocks
Graphics : 1597 MHz
SM : 1597 MHz
Memory : 1107 MHz
Video : 1432 MHz
Max Customer Boost Clocks
Graphics : 1597 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
Device properties
The CUDA runtime API provides many functions that help us manage devices. For example, we can use the following two functions to query detailed information about GPU devices:
cudaError_t cudaGetDeviceProperties(cudaDeviceProp* prop, int device)
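which fills the structure pointed to by prop with the properties of the selected device, and
cudaError_t cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device)
which stores in value a single integer attribute attr of the device (for example, the maximum number of threads per block). Both functions return a cudaError_t status code, which is cudaSuccess when the call succeeds.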
A description of both functions, their arguments, and the associated data structures can be found online in the CUDA Toolkit Documentation.
The code that retrieves the basic device information with these two functions produces the following output:
========== cudaDeviceGetProperties ============
Device 0: "Tesla V100S-PCIE-32GB"
GPU Clock Rate (MHz): 1597
Memory Clock Rate (MHz): 1107
Memory Bus Width (bits): 4096
Peak Memory Bandwidth (GB/s): 1133.57
CUDA Cores/MP: 64
CUDA Cores: 5120
Total amount of global memory: 32 GB
Total amount of shared memory per block: 48 kB
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
========== cudaDeviceGetAttribute ============
Device 0: "Tesla V100S-PCIE-32GB"
Max number of threads per block: 1024
Max block dimension X: 1024
Max block dimension Y: 1024
Max block dimension Z: 64
Max grid dimension X: 2147483647
Max grid dimension Y: 65535
Max grid dimension Z: 65535
Max shared memory per block: 49152
Warp size: 32
Peak clock frequency in kilohertz: 1597000
Peak memory clock frequency in kilohertz: 1107000
Global memory bus width in bits: 4096
Size of L2 cache in bytes: 6291456
Maximum resident threads per SM: 2048
Major compute capability version number: 7
Minor compute capability version number: 0
Max shared memory per SM in bytes: 98304
Max number of 32-bit registers per SM: 65536
Max per block shmem size on the device: 98304
Max thread blocks that can reside on a SM: 32
We can see that the Tesla V100S GPU has 5120 CUDA cores, 64 in each of its 80 streaming multiprocessors (SMs). The query also lists the available global memory, shared memory, and registers. Although the number of threads in a grid is very high, it is still limited, and programs should not exceed these limits.
The complete code is published in the folder 01-discover-devices of the workshop's repository.
¹ © Patricio Bulić, University of Ljubljana, Faculty of Computer and Information Science. The material is published under license CC BY-NC-SA 4.0.