HPC network technologies
In the rapidly evolving landscape of High-Performance Computing (HPC) and Artificial Intelligence (AI), understanding the nuances between various networking ...
This write-up aims to replicate the blog post Deploy NDm_v4 (A100) Kubernetes Cluster by Cormac Garvey. The original post assumes you have an existing ACR.
In the rapidly evolving landscape of High-Performance Computing (HPC) and Artificial Intelligence (AI), the quest for optimizing operational cost without com...
Az-HOP in the Azure Marketplace
Network topology When you create AZHOP without NetApp volumes, a subnet for ANF is still created, as shown in the figure below: You can manually add an ANF volum...
To enable ssh to the ondemand node, you need to: change the NSG rule to allow inbound traffic on port 22; edit the /etc/ssh/sshd_config file on the ondemand ...
You can set up a SLURM cluster on Azure using AZHOP. This blog has details on how to deploy AZHOP.
Disk Details
Azure HPC On-Demand Platform (az-hop) is a tool that provides an end-to-end deployment mechanism for a base HPC infrastructure on Azure. It uses industry sta...
H100 is NVIDIA’s 9th-generation data center GPU. For today’s mainstream AI and HPC models, H100 with InfiniBand interconnect delivers up to 30 times the perf...
Third generation Tensor Cores Tensor Cores are specialized high-performance compute cores that perform mixed-precision matrix multiply and accu...
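The mixed-precision multiply-accumulate pattern mentioned above can be sketched in plain Python. This only illustrates the arithmetic (half-precision inputs, single-precision accumulation); it says nothing about how Tensor Cores are actually programmed. The helper names `to_fp16`, `to_fp32`, and `mma` are hypothetical and exist only for this example.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma(a, b, c):
    """Compute D = A * B + C with FP16 inputs and FP32 accumulation (illustrative only)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = to_fp32(c[i][j])  # accumulator starts from C in FP32
            for p in range(inner):
                # Inputs are rounded to FP16; the running sum is kept in FP32.
                acc = to_fp32(acc + to_fp16(a[i][p]) * to_fp16(b[p][j]))
            d[i][j] = acc
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
d = mma(a, b, c)
print(d)
print(to_fp16(0.1))  # shows the rounding introduced by the FP16 input format
```

The small integers used here are exactly representable in half precision, so the result matches ordinary arithmetic; a value such as 0.1 is not, which is why the accumulation is done in the wider format.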
System information # cat /etc/*release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS" NAME="Ubuntu"...
This post explains how to run an RStudio Server container image using NVIDIA Enroot/Pyxis. You also have the option to deploy the container using Docker. Here is a n...
This post demonstrates how to set up SLURM federation between an on-prem cluster and an Azure cluster. Both clusters will be deployed by azhop. Please re...
Azure HPC documentation Azure high-performance computing High-performance computing (HPC) on Azure Azure HPC-Certification Run high-performance co...
After doing az login and az account set -s XXX, I still got the following error message: Subscription (xxx) or Tenant (xxx) doesn't exists.
Magnum IO SDK accelerates and enables developers to optimize all data access, movement and management between CPU, GPU, DPU and Storage.
Background There are new opportunities and challenges for the High-Performance Computing (HPC) community to rethink and enhance communication middleware like...
Get the Docker image from the NVIDIA website. You need to register and log in. The downloaded file is modulus_image_v21.06.tar.gz (5.7G). Build a Singularity image. ml ...
```Python import tensorflow as tf
Some guidelines on selecting the number of MPI processes per GPU When using the GPU package, you cannot assign more than one GPU to a single MPI task Mul...
Impact of Transformers on NLP (and ML more broadly)
A more general definition of attention: Given a set of vector values, and a vector query, attention is a technique to compute a weighted sum of the values, d...
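A minimal sketch of that definition, in plain Python: the query is scored against each vector, the scores are normalized, and the output is the resulting weighted sum of the values. Dot-product scoring and softmax normalization are assumptions made for this example (the excerpt does not fix either choice), as is the optional `keys` argument, which defaults to the values themselves.

```python
import math


def attention(query, values, keys=None):
    """Return (output, weights): a weighted sum of `values` driven by `query`.

    Assumptions for illustration: scores are dot products, weights come
    from a softmax, and `keys` defaults to `values`.
    """
    if keys is None:
        keys = values
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
output, weights = attention(query, values, keys)
print(weights)
print(output)
```

The first key aligns with the query, so the first value receives the larger weight and dominates the output.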
What is AI, really? Jeff Dean, the head of Google’s AI efforts, explains the underlying technology that enables artificial intelligence to do all sorts of th...
Profiling PyTorch (PyProf) PyProf is a tool that profiles and analyzes the GPU performance of PyTorch models. PyProf aggregates kernel performa...
Code in this demo Github
Week 1 A simple intro to the Keras Tokenizer API ```python from tensorflow.keras.preprocessing.text import Tokenizer
Steps to implement Horovod Initialize Horovod and Select the GPU to Run On Print Verbose Logs Only on the First Worker Add Distributed Optimizer Ini...
Lab 2: Multi-GPU DL Training Implementation using Horovod Horovod is a distributed deep learning training framework. It is available for TensorFlow, Keras, P...
Lab 1: Gradient Descent vs Stochastic Gradient Descent, and the Effects of Batch Size Gradient Descent ```python #Generating a random dataset Numpy is a fund...
At times you will need to clear the GPU's memory, either to reset the GPU state when an experiment goes wrong or, between notebooks, when you need a fresh ...
Running these via mpirun or srun like on https://developer.nvidia.com/blog/how-to-run-ngc-deep-learning-containers-with-singularity/ would help to ensure cor...
Get a snapshot of GPU stats without DCGM.
NOTE: NERSC provides compiled LAMMPS binaries as modules. This tutorial is for users with modified source code.
In order to access NIMH Data Archive, users need to apply for an account using this link. Project PI needs to add the new user to their project for data acce...
```bash #install Miniconda mkdir ~/conda; cd ~/conda wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh bash Miniconda3-latest-Linux-...
In Anaconda Prompt conda create -n newEnv conda activate newEnv - y conda install spyder-kernels -y conda install ... where python Copy the Python environme...
Install Miniconda if you haven’t done so. mkdir ~/conda; cd ~/conda wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh bash Mi...
Install Miniconda and configure Jupyter notebook to run R kernels. First make sure X11 forwarding works on your local machine. Then follow the steps below. `...
Use the GNU du command [jingchao@login.crane hao]$ du --apparent-size -sh 11T . [jingchao@login.crane hao]$ du -sh 3.2T .
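The two numbers that `du` reports can be approximated in Python, which may help when the check has to run inside a script. This is a rough sketch, not a replacement for `du`: it counts regular files only, ignores the space taken by directories themselves, and does not deduplicate hard links. It assumes a Linux-like system where `st_blocks` is reported in 512-byte units. The function name `tree_sizes` is made up for this example.

```python
import os
import tempfile


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for the files under `root`."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size           # logical size, as in `du --apparent-size`
            allocated += st.st_blocks * 512  # space actually allocated on disk
    return apparent, allocated


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.bin"), "wb") as fh:
        fh.write(b"x" * 1000)
    apparent, allocated = tree_sizes(tmp)

print(apparent, allocated)
```

When the apparent size is much larger than the allocated size, as in the 11T versus 3.2T output above, sparse files or filesystem compression are likely explanations.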
Aspera is IBM’s high-performance file transfer software, which allows the transfer of large files and data sets with predictable, reliable and secure deliver...
This is mostly used for preparing a response letter for journal reviews. Copying/pasting reviewers’ comments to an MS Word file ends up with lots of line break...
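The same cleanup can also be done outside Word. The sketch below is an alternative approach, not the method the post describes: it joins hard-wrapped lines into paragraphs and treats blank lines as paragraph breaks. The function name `unwrap_paragraphs` and the sample text are invented for illustration.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines; keep blank lines as paragraph separators."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace (including newlines) to a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


pasted = "The method\nis unclear.\n\nPlease add\nreferences.\n"
cleaned = unwrap_paragraphs(pasted)
print(cleaned)
```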
```bash #!/bin/sh Change the job name, if you want to #SBATCH --job-name=pendulum
This document provides the steps to configure MATLAB to submit jobs to a cluster, retrieve results, and debug errors.
Check for incrementing counters in the files at /sys/class/infiniband/mlx4_0/ports/1/counters/ watch -n .5 -diff=cumm 'grep -H . /sys/class/infiniband/mlx4_0/ports...
If Deep Learning Toolbox Model for AlexNet Network support package is not installed, then the function provides a link to the required support package in the...
create a shared location in $COMMON mkdir /common/GROUP/conda chown :GROUP conda chmod 770 conda create conda env in the shared location ...
Put the line below at the end of a slurm submit script to find out the memory usage. cgget -g memory /slurm/uid_${UID}/job_${SLURM_JOB_ID}
Global lock. usermod.py --lock user
Install LDAP dependencies yum install python-devel openldap-devel -y pip install django-auth-ldap pip install python-ldap ldap3
```console [root@beta ~]# source keystonerc_admin
Crane/Tusker purged files are temporarily stored at /lustre/backup/robinhood/YYYY-WXX (YYYY=year and XX=week number) for 2 weeks. Files beyond 2 weeks are...
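To work out which backup directory covers a given purge date, the path can be built from the date. This sketch makes assumptions that the text above does not confirm: that the week number follows ISO 8601 numbering and that it is zero-padded to two digits. The helper name `backup_dir_for` is hypothetical; check an actual directory listing before relying on it.

```python
from datetime import date


def backup_dir_for(day):
    """Build the YYYY-WXX backup path for `day` (ISO week numbering assumed)."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"/lustre/backup/robinhood/{iso_year}-W{iso_week:02d}"


print(backup_dir_for(date(2018, 8, 22)))
```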
wget https://github.com/lammps/lammps/archive/stable_22Aug2018.tar.gz tar zxvf stable_22Aug2018.tar.gz cd lammps-stable_22Aug2018/src for i in BODY ...
Compile OpenMPI with MPI_THREAD_MULTIPLE support. ./configure --enable-mpi-thread-multiple ml load compiler/gcc/4.9 openmpi/1.10 phdf5/1.8 python/3.6 ...
From Tusker/Crane, ssh beta.anvil.hcc.unl.edu sudo su - source /root/keystonerc_admin /util/accounts-mgmt-cli/anvilman.py -v -g GROUPNAME -u USERNAM...
Building Your Own FFTW3 Interface Wrapper Library source /util/opt/lmod/lmod/init/profile ml load compiler/intel/16 cd /util/comp/intel/2016.3/mkl/interfaces...
Click the button to see and download your peer review activity certificate. We hope you will display it with the same pride we take in working with such an a...
Peer review is the cornerstone of science, and Elsevier is dedicated to supporting and recognizing our journals’ reviewers. My Elsevier Reviews Profile aims ...
SIESTA is both a method and its computer program implementation, to perform efficient electronic structure calculations and ab initio molecular dynamics simu...
yum erase *docker* -y yum-config-manager --add-repo "https://download.docker.com/linux/centos/docker-ce.repo" yum install docker-ce -y systemctl stop...
https://slurp.unl.edu:8000/en-US/account/login?return_to=%2Fen-US%2Fapp%2Flauncher%2Fhome HCC username and password “Launch search app” “Lmod loggin...
Tensorflow 1.10.1 https://www.tensorflow.org/install/source#tested_source_configurations https://github.com/tensorflow/tensorflow/blob/v1.10.1/tensorflow/too...
https://hcc-git.unl.edu/red-puppet.git git clone git@hcc-git.unl.edu:red-puppet
```bash module load anaconda conda create -n tensorflow-keras tensorflow keras ipykernel python=3.6
New Project ```bash mkdir packages/blast cd packages/blast cp ../../anaconda-project.yml.example ./anaconda-project.yml anaconda-project add-env-spec -n blas...
Octopus FFTW3 OpenMPI Intel-MKL GSL
Boost $ ./bootstrap.sh --prefix=/home/username/usr $ vim boost_1_55_0/tools/build/v2/user-config.jam using mpi ;#add this to the last line $ ./b2 install --p...
OpenStack Flavor: general-xlarge Boot from snapshot: “hcc-xdmod prod snap 3/8/18” Security Groups: Port 8080/tcp; default Net...
$ sreport cluster AccountUtilizationByUser accounts=swanson start=1/1/17 end=12/31/17 -t Hour ---------------------------------------------------------------...
Get the ACLs for /home/usera [usera@login-damiana ~]# lfs lgetfacl /home/usera # file: home/usera # owner: usera # group: users user::rwx group::--- other:...
OS is CentOS 7
Problem: the VM runs out of entropy cat /proc/sys/kernel/random/entropy_avail 13
To see the directories that R searches for libraries, from within R you can do: .libPaths();
OpenStack Manage Security Group Rules: XALT
chrony is a versatile implementation of the Network Time Protocol (NTP). It can synchronize the system clock with NTP servers, reference clocks (e.g. GPS rec...
xdmod.conf ```bash Redirect HTTP to HTTPS <VirtualHost *:80> ServerName hcc-xdmod.unl.edu Redirect / https://hcc-xdmod.unl.edu </VirtualHost>...
yum install epel-release -y yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y yum groupinstall "Development Tools" -y yum...
```bash mysql Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 1299 Server version: 5.5.50-MariaDB MariaDB Server
option 1 echo 'eval $(perl -Mlocal::lib)' >> ~/.bashrc source ~/.bashrc cpan cpan> install Module::Load::Conditional
Add ssh key to your Globus account. instruction ssh jingchao@cli.globusonline.org $ transfer -- hcc#tusker/home/swanson/jingchao/file1 hcc#crane/home/sw...
module load compiler/gcc openmpi R export R_LIBS=$HOME/R/x86_64-pc-linux-gnu-library/3.3 mkdir -p $HOME/R/x86_64-pc-linux-gnu-library/3.3 wget https://cran.r...
Only essential commands are listed. Confidential info varies by site.
To reduce noise, turn off SELinux and the firewall. Clone the simplesamlphp-duosecurity repo and put it in /usr/share/xdmod/vendor/simplesamlphp/simplesamlp...
Set up a daily cron job on each cluster running the command: