Device information¹
We can obtain basic information about the devices present in the system by running the NVIDIA System Management Interface program:
$ nvidia-smi --query
If we are working on the Arnes cluster, we must run the program on one of the compute nodes that contain GPUs. We do this with the following command:
$ srun -G1 --partition=gpu nvidia-smi --query
The output reveals basic information about the devices present, such as the name, brand, bus information, utilization, memory usage, clock frequencies, and power draw.
Output of nvidia-smi --query
==============NVSMI LOG==============
Timestamp : Sat Jun 18 10:30:37 2022
Driver Version : 510.39.01
CUDA Version : 11.6
Attached GPUs : 1
GPU 00000000:81:00.0
Product Name : Tesla V100S-PCIE-32GB
Product Brand : Tesla
Product Architecture : Volta
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Disabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 1562820002759
GPU UUID : GPU-57b2d021-f0e4-5d7f-b433-671b628da8cc
Minor Number : 0
VBIOS Version : 88.00.98.00.01
MultiGPU Board : No
Board ID : 0x8100
GPU Part Number : 900-2G500-0440-030
Module ID : 0
Inforom Version
Image Version : G500.0212.00.02
OEM Object : 1.1
ECC Object : 5.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x81
Device : 0x00
Domain : 0x0000
Device Id : 0x1DF610DE
Bus Id : 00000000:81:00.0
Sub System Id : 0x13D610DE
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 32768 MiB
Reserved : 257 MiB
Used : 0 MiB
Free : 32510 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Default
Utilization
Gpu : 4 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Texture Shared : N/A
CBU : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 32 C
GPU Shutdown Temp : 90 C
GPU Slowdown Temp : 87 C
GPU Max Operating Temp : 83 C
GPU Target Temperature : N/A
Memory Current Temp : 29 C
Memory Max Operating Temp : 85 C
Power Readings
Power Management : Supported
Power Draw : 37.29 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 250.00 W
Clocks
Graphics : 1245 MHz
SM : 1245 MHz
Memory : 1107 MHz
Video : 1132 MHz
Applications Clocks
Graphics : 1245 MHz
Memory : 1107 MHz
Default Applications Clocks
Graphics : 1245 MHz
Memory : 1107 MHz
Max Clocks
Graphics : 1597 MHz
SM : 1597 MHz
Memory : 1107 MHz
Video : 1432 MHz
Max Customer Boost Clocks
Graphics : 1597 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
Printing device properties using the CUDA runtime
The CUDA runtime provides many functions that help us manage devices. For example, we can use the following two functions to obtain information about the GPUs:
cudaError_t cudaGetDeviceProperties(cudaDeviceProp* prop, int device)
which fills a cudaDeviceProp structure with the basic properties of the selected device, and
cudaError_t cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device)
which returns the value of a single device attribute.
Descriptions of both functions, together with their arguments and data structures, are available online in the CUDA Toolkit documentation.
The following code prints information about the device:
[Code listing (110 lines) not preserved in this copy.]
Output for an NVIDIA Tesla V100:
Output of gpuinfo
========== cudaDeviceGetProperties ============
Device 0: "Tesla V100S-PCIE-32GB"
GPU Clock Rate (MHz): 1597
Memory Clock Rate (MHz): 1107
Memory Bus Width (bits): 4096
Peak Memory Bandwidth (GB/s): 1133.57
CUDA Cores/MP: 64
CUDA Cores: 5120
Total amount of global memory: 32 GB
Total amount of shared memory per block: 48 kB
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
========== cudaDeviceGetAttribute ============
Device 0: "Tesla V100S-PCIE-32GB"
Max number of threads per block: 1024
Max block dimension X: 1024
Max block dimension Y: 1024
Max block dimension Z: 64
Max grid dimension X: 2147483647
Max grid dimension Y: 65535
Max grid dimension Z: 65535
Max shared memory per block: 49152
Warp size: 32
Peak clock frequency in kilohertz: 1597000
Peak memory clock frequency in kilohertz: 1107000
Global memory bus width in bits: 4096
Size of L2 cache in bytes: 6291456
Maximum resident threads per SM: 2048
Major compute capability version number: 7
Minor compute capability version number: 0
Max shared memory per SM in bytes: 98304
Max number of 32-bit registers per SM: 65536
Max per block shmem size on the device: 98304
Max thread blocks that can reside on a SM: 32
We see that the Tesla V100 GPU has 5120 CUDA cores, 64 on each of its 80 compute units (streaming multiprocessors, SMs). The query also reports the available global memory, shared memory, and registers. Although the maximum number of threads in a grid is very large, it is still finite, and programs must not exceed these limits.
The code above is published in the workshop repository, together with instructions on how to compile and run it on the Arnes cluster: 01-discover-devices.
¹ © Patricio Bulić, Davor Sluga, University of Ljubljana, Faculty of Computer and Information Science. This material is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.