
Data management

Data handling

Clusters use various locations for data:

  • user space on the login node (the user's home directory, usually limited by a quota)
  • space on local storage (an agreement with the administrators is usually required)
  • temporary space for job input/output when using the ARC middleware
  • data available only via an ARC RTE (runtime environment)
  • short- and long-term storage on SRM/dCache servers (e.g. dcache.arnes.si)

ARC middleware additionally supports various protocols for access: ftp, gsiftp, http, https, httpg, dav, davs, ldap, srm, root, rucio, s3.

ARC caches input data and can optimize transfers (retrying transfers, performing only one transfer for jobs that use the same dataset, ...).

Storing data on remote dCache server

  • Arnes maintains a dCache server that is available to SLING users, members of gen.vo.sling.si and other VOs
  • 100 TB of space is available for job input/output data
  • members of the same VO can read other members' data, so the default setup is not appropriate for confidential unencrypted data
  • no backups are made of the data on the dCache server

Basic instructions for using the Arnes dCache are available on the SLING pages (currently only in Slovene).

The ARC client provides commands for direct data handling that can also be used for job input/output.

  • arcls lists files in remote storage
  • arccp copies files
  • arcrm removes files
  • arcmkdir creates a new directory
  • arcrename renames files

Examples for WebDAV protocol

Example for arcls:

$ arcls https://dcache.sling.si:2880/gen.vo.sling.si/test/

file1.txt
directory1/

Example for arccp:

$ arccp test.txt https://dcache.sling.si:2880/gen.vo.sling.si/test/directory2/

In the example above, the directory directory2 is created automatically if it does not already exist. The trailing slash indicates that directory2 is a directory; without it, the file test.txt would be copied to a file named directory2 on the server. The arcmkdir command does not work with the WebDAV protocol.

Example for arcrm:

$ arcrm https://dcache.sling.si:2880/gen.vo.sling.si/test/directory2

If the argument of arcrm is a directory, the entire contents of the directory will be removed.

Example of copy from one server to another:

$ arccp -r https://dcache.arnes.si:2880/data/arnes.si/gen.vo.sling.si/projekt1/ https://dcache.sling.si:2880/gen.vo.sling.si/projekt1/
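
The arcrename command from the list above can be used in the same way to rename a remote file in place; a minimal sketch, assuming the same WebDAV endpoint (the file names are illustrative):

$ arcrename https://dcache.sling.si:2880/gen.vo.sling.si/test/file1.txt https://dcache.sling.si:2880/gen.vo.sling.si/test/file1-old.txt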

Examples for GridFTP protocol

Deprecated protocol

The GridFTP protocol is deprecated and can cause problems. We recommend using WebDAV (above) whenever possible.

Example for arcls:

$ arcls srm://dcache.sling.si/gen.vo.sling.si/project_name/

centos7.sif
gmp_test.c
gmp_test.sh
gmp_test.xrsl
...

To use the gsiftp protocol, the CRL files of the relevant certificate authorities must be renewed daily with the fetch-crl command. It is advisable to rely on the cron job installed by the fetch-crl package for automatic regular updates.
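
A minimal sketch of refreshing the CRLs by hand, assuming the fetch-crl package is installed from the distribution repositories (the cron file path is typical, but may differ):

# manual refresh, usually run as root
sudo fetch-crl
# automatic updates are normally handled by the cron entry installed by the package, e.g. /etc/cron.d/fetch-crl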

Example for arccp:

arccp test.txt gsiftp://dcache.sling.si/gen.vo.sling.si/proj_name

Example for arcrm:

arcrm srm://dcache.sling.si/gen.vo.sling.si/proj_name/test

Example for arcmkdir:

arcmkdir srm://dcache.sling.si/gen.vo.sling.si/proj_name/test

S3 Object Storage Usage

HPC Vega offers object storage. To obtain credentials, the OpenStack client is needed. For data management, any S3 client should work; below is an example for s5cmd. Users of HPC Vega can use the OpenStack client on the login nodes. The initial user quota is set to 100 GB.
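
If the OpenStack client is not already available in the shell, it can usually be installed with pip; this is a general sketch rather than Vega-specific instructions, since the login nodes may already provide the client:

pip install --user python-openstackclient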

Obtaining the key and secret for accessing a project in the S3 object storage:

openstack --os-auth-url http://auth01.ijs.si:5000/v3 --os-project-domain-name sling --os-user-domain-name sling --os-project-name <project_name> --os-username <username> ec2 credentials create

The parameters can be saved as environment variables:

export OS_AUTH_URL=https://keystone.sling.si:5000/v3
export OS_PROJECT_NAME=<project_name>
export OS_PROJECT_DOMAIN_NAME=sling
export OS_USER_DOMAIN_NAME=sling
export OS_IDENTITY_API_VERSION=3
export OS_URL=https://keystone.sling.si:5000/v3
export OS_USERNAME=<username>

With these variables set, the command for obtaining the key and secret is simpler:

openstack ec2 credentials create
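
Previously created EC2 credentials (including the key and secret) can be listed again later; a related command, assuming the same environment variables are set:

openstack ec2 credentials list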

Example for s5cmd Client

For data transfer, the s5cmd client can be used.

The obtained key and secret should be written to the file ~/.aws/credentials. The file and the directory should be protected from reading by other users:

mkdir ~/.aws
chmod 700 ~/.aws
touch ~/.aws/credentials
chmod 600 ~/.aws/credentials
cat >~/.aws/credentials <<EOF
[default]
aws_access_key_id = <access>
aws_secret_access_key = <secret>
EOF
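
As an alternative to the credentials file, s5cmd also reads the standard AWS environment variables; a minimal sketch using the same key and secret (placeholders as above):

export AWS_ACCESS_KEY_ID=<access>
export AWS_SECRET_ACCESS_KEY=<secret>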

Listing the available buckets:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si ls

Example of bucket creation:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si mb s3://test1

Example of file copy into a bucket:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp primer.txt s3://test1/
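
For completeness, a sketch of listing the contents of the bucket and copying the file back to the current directory; the bucket and file names follow the examples above:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si ls s3://test1/
s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp s3://test1/primer.txt .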