Posts by Year

2023

Hpc network technologies

In the rapidly evolving landscape of High-Performance Computing (HPC) and Artificial Intelligence (AI), understanding the nuances between various networking ...

Nccl test on aks ndmv4 vm

This write-up aims to replicate the blog post Deploy NDm_v4 (A100) Kubernetes Cluster by Cormac Garvey. The original post assumes you have an existing ACR.

Azhop backbone cost analyses

In the rapidly evolving landscape of High-Performance Computing (HPC) and Artificial Intelligence (AI), the quest for optimizing operational cost without com...

Azhop add anf volume

Network topology: When you create AZHOP without NetApp volumes, a subnet for ANF is still created, as shown in the figure below. You can manually add an ANF volum...

Azure nccl test on ncv4

You can set up a SLURM cluster on Azure using AZHOP. This blog has details on how to deploy AZHOP.

Azhop e2e deployment

Azure HPC On-Demand Platform (az-hop) is a tool that provides an end-to-end deployment mechanism for a base HPC infrastructure on Azure. It uses industry sta...

H100 white paper

H100 is NVIDIA’s 9th-generation data center GPU. For today’s mainstream AI and HPC models, H100 with InfiniBand interconnect delivers up to 30 times the perf...

A100 white paper

Third-generation Tensor Cores: Tensor Cores are specialized high-performance compute cores that perform mixed-precision matrix multiply and accu...

Change docker root dir

System information # cat /etc/*release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS" NAME="Ubuntu"...

Run rstudio server with nvidia enroot

This post explains how to run an RStudio Server container image using NVIDIA Enroot/Pyxis. You also have the option to deploy the container using Docker. Here is a n...

Slurm hybrid cluster setup in azure

This post demonstrates how to set up SLURM federation between an on-prem cluster and an Azure cluster. Both clusters will be deployed by azhop. Please re...

2022

Azure Hpc Resources

Azure HPC documentation Azure high-performance computing High-performance computing (HPC) on Azure Azure HPC-Certification Run high-performance co...

Magnum io

Magnum IO SDK accelerates and enables developers to optimize all data access, movement and management between CPU, GPU, DPU and Storage.

Nvidia modulus singularity run on gpu

Get the Docker image from the NVIDIA website. You need to register and log in. The download file is modulus_image_v21.06.tar.gz (5.7G). Build a singularity image. ml ...

Mpi processes per gpu

Some guidelines on selecting the number of MPI processes per GPU: When using the GPU package, you cannot assign more than one GPU to a single MPI task. Mul...

Nvidia profilers

Profiling PyTorch (PyProf) PyProf is a tool that profiles and analyzes the GPU performance of PyTorch models. PyProf aggregates kernel performa...

2021

Nv dli fundamentals of dl

At times you will need to clear the GPU's memory, either to reset the GPU state when an experiment goes wrong, or in between notebooks when you need a fresh ...

2020

Lammps compilation on nersc

NOTE: NERSC provides compiled LAMMPS binaries as modules. This tutorial is for users with modified source code.

Download nimh data with ndatools

To access the NIMH Data Archive, users need to apply for an account using this link. The project PI needs to add the new user to their project for data acce...

Create R environment using conda

```bash #install Miniconda mkdir ~/conda; cd ~/conda wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh bash Miniconda3-latest-Linux-...

Rstudio installation with conda

Install Miniconda if you haven’t done so. mkdir ~/conda; cd ~/conda wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh bash Mi...

R jupyter notebook kernel

Install Miniconda and configure Jupyter notebook to run R kernels. First make sure X11 forwarding works on your local machine. Then follow the steps below. `...

Aspera quick setup

Aspera is IBM’s high-performance file transfer software, which allows for the transfer of large files and data sets with predictable, reliable and secure deliver...

2019

Ms word replace line break with return

This is mostly used for preparing a response letter for journal reviews. Copying/pasting reviewers’ comments to a MS Word file ends up with lots of line break...

Matlab parallel server

This document provides the steps to configure MATLAB to submit jobs to a cluster, retrieve results, and debug errors.

Check whether infiniband is being used

Check whether the counters in the files under /sys/class/infiniband/mlx4_0/ports/1/counters/ are incrementing: watch -n .5 -diff=cumm 'grep -H . /sys/class/infiniband/mlx4_0/ports...

Matlab alexnet support package install

If Deep Learning Toolbox Model for AlexNet Network support package is not installed, then the function provides a link to the required support package in the...

Create shared conda environment

create a shared location in $COMMON mkdir /common/GROUP/conda chown :GROUP conda chmod 770 conda create conda env in the shared location ...

Slurm_cgroup_memory_usage

Put the line below at the end of a slurm submit script to find out the memory usage. cgget -g memory /slurm/uid_${UID}/job_${SLURM_JOB_ID}

2018

Coldfront deployment

Install LDAP dependencies: yum install python-devel openldap-devel -y pip install django-auth-ldap pip install python-ldap ldap3

Notes on hcc purge

Crane/Tusker purged files are temporarily stored at /lustre/backup/robinhood/YYYY-WXX (YYYY=year and XX=week number) for 2 weeks. Files beyond 2 weeks are...

Smileimpi compilation

Compile OpenMPI with MPI_THREAD_MULTIPLE support. ./configure --enable-mpi-thread-multiple ml load compiler/gcc/4.9 openmpi/1.10 phdf5/1.8 python/3.6 ...

Anvil add new user

From Tusker/Crane, ssh beta.anvil.hcc.unl.edu sudo su - source /root/keystonerc_admin /util/accounts-mgmt-cli/anvilman.py -v -g GROUPNAME -u USERNAM...

Cp2k compilation with plumed

Building Your Own FFTW3 Interface Wrapper Library source /util/opt/lmod/lmod/init/profile ml load compiler/intel/16 cd /util/comp/intel/2016.3/mkl/interfaces...

Acs reviewing certificate

Click the button to see and download your peer review activity certificate. We hope you will display it with the same pride we take in working with such an a...

My elsevier reviews profile

Peer review is the cornerstone of science, and Elsevier is dedicated to supporting and recognizing our journals’ reviewers. My Elsevier Reviews Profile aims ...

El 7 docker upgrade to docker Ce

yum erase *docker* -y yum-config-manager --add-repo "https://download.docker.com/linux/centos/docker-ce.repo" yum install docker-ce -y systemctl stop...

Module usage search in splunk

https://slurp.unl.edu:8000/en-US/account/login?return_to=%2Fen-US%2Fapp%2Flauncher%2Fhome HCC username and password “Launch search app” “Lmod loggin...

Conda build

Tensorflow 1.10.1 https://www.tensorflow.org/install/source#tested_source_configurations https://github.com/tensorflow/tensorflow/blob/v1.10.1/tensorflow/too...

Hcc Puppet

https://hcc-git.unl.edu/red-puppet.git git clone git@hcc-git.unl.edu:red-puppet

Conda project

New Project ```bash mkdir packages/blast cd packages/blast cp ../../anaconda-project.yml.example ./anaconda-project.yml anaconda-project add-env-spec -n blas...

Boost openmpi

Boost $ ./bootstrap.sh --prefix=/home/username/usr $ vim boost_1_55_0/tools/build/v2/user-config.jam using mpi ;#add this to the last line $ ./b2 install --p...

Xdmod boot from snapshot

OpenStack Flavor: general-xlarge Boot from snapshot: “hcc-xdmod prod snap 3/8/18” Security Groups: Port 8080/tcp; default Net...

Slurm cluster usage per group

$ sreport cluster AccountUtilizationByUser accounts=swanson start=1/1/17 end=12/31/17 -t Hour ---------------------------------------------------------------...

Lustre Acl Control

Get the ACLs for /home/usera: [usera@login-damiana ~]# lfs lgetfacl /home/usera # file: home/usera # owner: usera # group: users user::rwx group::--- other:...

2017

Time sync using chrony

chrony is a versatile implementation of the Network Time Protocol (NTP). It can synchronize the system clock with NTP servers, reference clocks (e.g. GPS rec...

Apache ssl

xdmod.conf ```bash Redirect HTTP to HTTPS <VirtualHost *:80> ServerName hcc-xdmod.unl.edu Redirect / https://hcc-xdmod.unl.edu </VirtualHost>...

Xdmod fresh install

yum install epel-release -y yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y yum groupinstall "Development Tools" -y yum...

Xdmod purge

```bash mysql Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 1299 Server version: 5.5.50-MariaDB MariaDB Server

2016

Globus Connect Cli File Transfer

Add ssh key to your Globus account. instruction ssh jingchao@cli.globusonline.org $ transfer -- hcc#tusker/home/swanson/jingchao/file1 hcc#crane/home/sw...

Rmpi Cluster Installation

module load compiler/gcc openmpi R export R_LIBS=$HOME/R/x86_64-pc-linux-gnu-library/3.3 mkdir -p $HOME/R/x86_64-pc-linux-gnu-library/3.3 wget https://cran.r...