Device information¹

Basic information about the devices present in the system can be obtained by running the NVIDIA System Management Interface program.

$ nvidia-smi --query

When working on the Arnes cluster, the program must be run on one of the compute nodes that contain GPUs. This is done with the following command:

$ srun -G1 --partition=gpu nvidia-smi --query

The output reveals basic information about the devices present, such as the name, brand, bus information, utilization, memory usage, clock frequencies and power consumption.

Output of nvidia-smi --query
==============NVSMI LOG==============

Timestamp                                 : Sat Jun 18 10:30:37 2022
Driver Version                            : 510.39.01
CUDA Version                              : 11.6

Attached GPUs                             : 1
GPU 00000000:81:00.0
    Product Name                          : Tesla V100S-PCIE-32GB
    Product Brand                         : Tesla
    Product Architecture                  : Volta
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1562820002759
    GPU UUID                              : GPU-57b2d021-f0e4-5d7f-b433-671b628da8cc
    Minor Number                          : 0
    VBIOS Version                         : 88.00.98.00.01
    MultiGPU Board                        : No
    Board ID                              : 0x8100
    GPU Part Number                       : 900-2G500-0440-030
    Module ID                             : 0
    Inforom Version
        Image Version                     : G500.0212.00.02
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x81
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1DF610DE
        Bus Id                            : 00000000:81:00.0
        Sub System Id                     : 0x13D610DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Not Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 257 MiB
        Used                              : 0 MiB
        Free                              : 32510 MiB
    BAR1 Memory Usage
        Total                             : 32768 MiB
        Used                              : 2 MiB
        Free                              : 32766 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 4 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
        Aggregate
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 32 C
        GPU Shutdown Temp                 : 90 C
        GPU Slowdown Temp                 : 87 C
        GPU Max Operating Temp            : 83 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : 29 C
        Memory Max Operating Temp         : 85 C
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 37.29 W
        Power Limit                       : 250.00 W
        Default Power Limit               : 250.00 W
        Enforced Power Limit              : 250.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 250.00 W
    Clocks
        Graphics                          : 1245 MHz
        SM                                : 1245 MHz
        Memory                            : 1107 MHz
        Video                             : 1132 MHz
    Applications Clocks
        Graphics                          : 1245 MHz
        Memory                            : 1107 MHz
    Default Applications Clocks
        Graphics                          : 1245 MHz
        Memory                            : 1107 MHz
    Max Clocks
        Graphics                          : 1597 MHz
        SM                                : 1597 MHz
        Memory                            : 1107 MHz
        Video                             : 1432 MHz
    Max Customer Boost Clocks
        Graphics                          : 1597 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Processes                             : None

Printing device properties using the CUDA runtime

The CUDA runtime provides many functions that help us manage devices. For example, the following two functions can be used to query the properties of GPU devices:

cudaError_t cudaGetDeviceProperties(cudaDeviceProp* prop, int device),

which returns the basic properties of the selected device, and

cudaError_t cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device),

which returns the value of a single device attribute.

Both functions, together with their arguments and data structures, are described online in the CUDA Toolkit documentation.

The code below prints information about the device:

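#include <stdio.h>
#include <stdlib.h>   // exit, EXIT_FAILURE
#include <cuda.h>
#include <cuda_runtime.h>

// helper_cuda.h comes from the NVIDIA CUDA samples and provides _ConvertSMVer2Cores()
#include "helper_cuda.h"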

int main(int argc, char **argv) {

  // Get number of GPUs
  int deviceCount = 0;
  cudaError_t error = cudaGetDeviceCount(&deviceCount);

  if (error != cudaSuccess) {
    printf("cudaGetDeviceCount error %d\n-> %s\n", error, cudaGetErrorString(error));
    exit(EXIT_FAILURE);
  }

  // Get the device properties and print them
  for (int dev = 0; dev < deviceCount; dev++) {
    struct cudaDeviceProp prop;
    int value;
    printf("\n ==========  cudaDeviceGetProperties ============  \n\n");
    cudaGetDeviceProperties(&prop, dev);
    printf("\nDevice %d: \"%s\"\n", dev, prop.name);
    printf("  GPU Clock Rate (MHz):                          %d\n", prop.clockRate/1000);
    printf("  Memory Clock Rate (MHz):                       %d\n", prop.memoryClockRate/1000);
    printf("  Memory Bus Width (bits):                       %d\n", prop.memoryBusWidth);
    printf("  Peak Memory Bandwidth (GB/s):                  %.2f\n", 2.0*prop.memoryClockRate*(prop.memoryBusWidth/8)/1.0e6);
    printf("  CUDA Cores/MP:                                 %d\n", _ConvertSMVer2Cores(prop.major, prop.minor));
    printf("  CUDA Cores:                                    %d\n", _ConvertSMVer2Cores(prop.major, prop.minor) *
           prop.multiProcessorCount);
    printf("  Total amount of global memory:                 %.0f GB\n", prop.totalGlobalMem / 1073741824.0f);
    printf("  Total amount of shared memory per block:       %zu kB\n",
           prop.sharedMemPerBlock/1024);
    printf("  Total number of registers available per block: %d\n",
           prop.regsPerBlock);
    printf("  Warp size:                                     %d\n",
           prop.warpSize);
    printf("  Maximum number of threads per block:           %d\n",
           prop.maxThreadsPerBlock);
    printf("  Max dimension size of a thread block (x,y,z): (%d, %d, %d)\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1],
           prop.maxThreadsDim[2]);
    printf("  Max dimension size of a grid size    (x,y,z): (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1],
           prop.maxGridSize[2]);

    printf("\n\n\n ==========  cudaDeviceGetAttribute ============  \n\n");
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxThreadsPerBlock, dev);
    printf("  Max number of threads per block:              %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxBlockDimX, dev);
    printf("  Max block dimension X:                        %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxBlockDimY, dev);
    printf("  Max block dimension Y:                        %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxBlockDimZ, dev);
    printf("  Max block dimension Z:                        %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxGridDimX, dev);
    printf("  Max grid dimension X:                         %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxGridDimY, dev);
    printf("  Max grid dimension Y:                         %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxGridDimZ, dev);
    printf("  Max grid dimension Z:                         %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxSharedMemoryPerBlock, dev);
    printf("  Max shared memory per block:                  %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrWarpSize, dev);
    printf("  Warp size:                                    %d\n",
           value);      
    cudaDeviceGetAttribute (&value, cudaDevAttrClockRate, dev);
    printf("  Peak clock frequency in kilohertz:            %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMemoryClockRate, dev);
    printf("  Peak memory clock frequency in kilohertz:     %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrGlobalMemoryBusWidth, dev);
    printf("  Global memory bus width in bits:              %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrL2CacheSize, dev);
    printf("  Size of L2 cache in bytes:                    %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxThreadsPerMultiProcessor, dev);
    printf("  Maximum resident threads per SM:              %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrComputeCapabilityMajor, dev);
    printf("  Major compute capability version number:      %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrComputeCapabilityMinor, dev);
    printf("  Minor compute capability version number:      %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxSharedMemoryPerMultiprocessor, dev);
    printf("  Max shared memory per SM in bytes:            %d\n",
           value);
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxRegistersPerMultiprocessor, dev);
    printf("  Max number of 32-bit registers per SM:        %d\n",
           value);  
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxSharedMemoryPerBlockOptin, dev);
    printf("  Max per block shmem size on the device:       %d\n",
           value);  
    cudaDeviceGetAttribute (&value, cudaDevAttrMaxBlocksPerMultiprocessor, dev);
    printf("  Max thread blocks that can reside on a SM:    %d\n",
           value);  
  }
}

Output for the NVIDIA Tesla V100:

Output of gpuinfo
==========  cudaDeviceGetProperties ============  

Device 0: "Tesla V100S-PCIE-32GB"
  GPU Clock Rate (MHz):                          1597
  Memory Clock Rate (MHz):                       1107
  Memory Bus Width (bits):                       4096
  Peak Memory Bandwidth (GB/s):                  1133.57
  CUDA Cores/MP:                                 64
  CUDA Cores:                                    5120
  Total amount of global memory:                 32 GB
  Total amount of shared memory per block:       48 kB
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)


==========  cudaDeviceGetAttribute ============  

Device 0: "Tesla V100S-PCIE-32GB"
  Max number of threads per block:              1024
  Max block dimension X:                        1024
  Max block dimension Y:                        1024
  Max block dimension Z:                        64
  Max grid dimension X:                         2147483647
  Max grid dimension Y:                         65535
  Max grid dimension Z:                         65535
  Max shared memory per block:                  49152
  Warp size:                                    32
  Peak clock frequency in kilohertz:            1597000
  Peak memory clock frequency in kilohertz:     1107000
  Global memory bus width in bits:              4096
  Size of L2 cache in bytes:                    6291456
  Maximum resident threads per SM:              2048
  Major compute capability version number:      7
  Minor compute capability version number:      0
  Max shared memory per SM in bytes:            98304
  Max number of 32-bit registers per SM:        65536
  Max per block shmem size on the device:       98304
  Max thread blocks that can reside on a SM:    32

We see that the Tesla V100 GPU has 5120 CUDA cores, with 64 cores on each of its 80 compute units (streaming multiprocessors). The query also reports the available global memory, shared memory and registers. Although the maximum number of threads in a grid is very large, it is still finite, and programs must not exceed these limits.

The code above is published in the workshop repository, together with instructions on how to compile and run it on the Arnes cluster: 01-discover-devices.


  1. © Patricio Bulić, Davor Sluga, University of Ljubljana, Faculty of Computer and Information Science. This material is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.