Server
# Basic User Information
user=XXX
passwd=YYY
homepath=/home/$user
group=g1
# Configure the user: password, groups and shell
useradd -m -d ${homepath} $user    # -m creates the home directory
passwd $user
usermod -a -G $group $user
usermod --shell /bin/bash $user
# Update NIS database
sudo make -C /var/yp
service portmap restart
service nis restart
Client
# Update NIS database
sudo make -C /var/yp
service portmap restart
service nis restart
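To confirm from the client that the NIS maps are actually being served, the standard check is to dump a map with ypcat (part of the nis package). This is a hedged sketch: it falls back to a message on machines where the NIS tools are not installed.

```shell
# Verify that the NIS passwd map is visible from this client.
# ypcat queries the bound NIS server; if the tools are missing we just say so.
if command -v ypcat >/dev/null 2>&1; then
    ypcat passwd | head -n 3    # first entries of the NIS passwd map
else
    echo "ypcat not available on this machine"
fi
```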
# Add New User:
As root:
adduser <new_user>
cd /var/yp
make
If you need to add the user to a group:
usermod -a -G <GROUP> <new_user>
(Without -a, the -G option replaces the user's supplementary groups instead of appending.)
If you want to add the user to the docker group, open /etc/group and append to the docker line:
docker:x:<ID>:user1,user2,....
Then check the consistency of the group files:
grpck
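To double-check the edit, getent resolves the group database through NSS (including NIS maps once they are rebuilt). A small sketch, with a fallback message for machines where the group does not exist:

```shell
# Print the resolved docker group entry; getent goes through NSS,
# so it also sees NIS-served groups, not just the local /etc/group.
getent group docker || echo "docker group not present"
# To list all groups of a given user, run: id <new_user>
```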
Tuesday, October 2, 2018
Monday, July 16, 2018
Empty buffers and cache on a Linux system
free -m && sync && echo 3 > /proc/sys/vm/drop_caches && free -m
(Writing 1 to drop_caches frees only the page cache, 2 frees dentries and inodes, 3 frees both.)
More details: https://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system
Sunday, April 29, 2018
Jupyterhub deployment on multiple nodes with GPU for single user
This post is based on two articles written by Andrea Zonca:
- Jupyterhub deployment on multiple nodes with Docker Swarm
- Jupyterhub Docker Spawner with GPU support
These articles helped me a lot in implementing my cluster, but I had many problems because the frameworks have changed their configuration. Here I update the information for the current frameworks:
- Docker version 18.03.1-ce
- Jupyterhub 0.8.1
- nvidia-docker2
In my particular case I need an internal cluster for research and my site won't be public, so I removed the authentication part and implemented my own authentication class. I set up the cluster for a single Ubuntu user and implemented authentication with specific usernames (this blog only shows a dummy authenticator for simplicity). I created a shared folder outside my home directory (/export); you can change this as in Zonca's article if you wish. I am not using OpenStack, but I hope to integrate it later.
As of this writing, nvidia-docker2 does not support Docker swarm mode, so I used the standalone Docker Swarm instead.
We start from this point:
- Ubuntu 16.04 on your machines.
1) Main Server
Setup Docker Swarm
You must log in as root.
Configure the file /etc/init/docker.conf and replace DOCKER_OPTS= in the start section with:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
This will be used for the server to communicate with the nodes. Then reload and restart the docker service:
systemctl daemon-reload
systemctl restart docker
You can check if your configuration is OK with the command:
service docker status
The docker service and its subprocesses will appear; the dockerd daemon must look like this:
CGroup: /system.slice/docker.service
├─12764 /usr/bin/dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
├─12776 docker-containerd --config /var/run/docker/containerd/containerd.toml
Tip: If after the restart your service docker status does not look like this, you can stop the docker service and launch the daemon manually:
service docker stop
dockerd -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock
service docker start
Now we need to run 2 Swarm services:
- Consul: a distributed key-value store listening on port 8500. It will store the information about the available nodes.
- Swarm Manager: which provides the interface to Docker Swarm.
Start Consul:
docker run --restart=always -d -p 8500:8500 --name=consul progrium/consul -server -bootstrap
Then start the Swarm manager. I recommend using your internal IP for HUB_LOCAL_IP:
HUB_LOCAL_IP=<THE IP IN YOUR PRIVATE NETWORK>
docker run --restart=always -d -p 4000:4000 dockerswarm/swarm:master manage -H :4000 --replication --advertise $HUB_LOCAL_IP:4000 consul://$HUB_LOCAL_IP:8500
You can check if the containers are running with:
docker ps -a
and then you can check if the connection with Docker Swarm on port 4000 works:
docker -H :4000 ps -a
Setup Jupyterhub
Create a user; in my case the username is user.
On your host you must install Jupyterhub. I installed it following the step-by-step guide from Zonca:
wget --no-check-certificate https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
(use all defaults; answer "yes" to modify PATH)
sudo apt-get install npm nodejs-legacy
sudo npm install -g configurable-http-proxy
conda install traitlets tornado jinja2 sqlalchemy
pip install jupyterhub
Then you must install dockerspawner:
pip install dockerspawner
You need a jupyterhub_config.py to configure your connection with Docker. You can use my configuration:
- I configure the nvidia runtime and give some examples of volumes (to share folders).
- I set some constraints to control the CPU core limits and the memory limits.
- I use a DummyAuthenticator as an example. You can change this for your specific case.
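The post does not include the file itself, so here is a minimal, hypothetical sketch of what such a jupyterhub_config.py could look like, based only on the bullets above. The image name, the limits, the extra_host_config keys and the dummyauthenticator module path are all assumptions for illustration, not the author's actual configuration; check them against the dockerspawner documentation for your installed version.

```shell
# Hypothetical sketch: write a minimal jupyterhub_config.py
# (all names and values below are assumptions, not the author's file):
cat > jupyterhub_config.py <<'EOF'
c = get_config()

# Spawn single-user servers as Docker containers; start jupyterhub with
# DOCKER_HOST pointing at the Swarm manager (e.g. tcp://localhost:4000)
# so the spawns are scheduled across the nodes.
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.JupyterHub.hub_ip = '0.0.0.0'                            # nodes must reach the hub
c.DockerSpawner.container_image = 'jupyterhub/singleuser'  # example image
c.DockerSpawner.volumes = {'/export': '/export'}           # shared NFS folder

# Constraints and the nvidia runtime, passed to the Docker host config:
c.Spawner.mem_limit = '8G'                                 # example memory limit
c.DockerSpawner.extra_host_config = {
    'runtime': 'nvidia',                                   # nvidia-docker2 runtime
    'cpuset_cpus': '0-3',                                  # example CPU core limit
}

# Dummy authentication for testing only; replace with your own class:
c.JupyterHub.authenticator_class = 'dummyauthenticator.DummyAuthenticator'
EOF
```

This only sketches the shape of the file; the real option names changed between dockerspawner releases, so treat every key as era-specific.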
Share user home via NFS
Install NFS with package manager:
sudo apt-get install nfs-kernel-server
Create a folder /export/nfs and edit /etc/exports:
/export *(rw,sync,no_subtree_check)
Then apply the exports with sudo exportfs -ra (or restart the nfs-kernel-server service).
2) Nodes
Setup the Docker Swarm nodes
Configure the file /etc/init/docker.conf and replace DOCKER_OPTS= in the start section with:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"
You must check that the DOCKER_OPTS are working, as in the first part.
Then run the container that interfaces with Swarm:
HUB_LOCAL_IP=10.XX.XX.XX
NODE_LOCAL_IP=$(ip route get 8.8.8.8 | awk 'NR==1 {print $NF}')
docker run --restart=always -d swarm join --advertise=$NODE_LOCAL_IP:2375 consul://$HUB_LOCAL_IP:8500
HUB_LOCAL_IP: the LOCAL IP of your manager computer.
NODE_LOCAL_IP: the LOCAL IP of your node computer.
Setup mounting the home filesystem
sudo apt-get install autofs
Mount the folder that will be shared across nodes and server:
sudo mount $HUB_LOCAL_IP:/export /export
After all this, you can enter your Jupyterhub server (MYIP:9000 in my case) and enjoy!
References
- https://zonca.github.io/2016/10/dockerspawner-cuda.html
- https://zonca.github.io/2016/05/jupyterhub-docker-swarm.html
- https://github.com/jupyterhub/dockerspawner
- https://hub.docker.com/_/swarm/
- https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
- https://docs.docker.com/install/
Author: Cristian Muñoz
e-mail: crisstrink@gmail.com
