
Limitations Enforced on CS Linux Machines

NOTE: Please consult this page before running big or long jobs. We may change these limits from time to time as we see fit.

The following limitations are enforced on CS Linux machines, and you should be aware of them.

Limits Set on ALL CS Machines:

1. X2Go and Remote Desktop Limits:

X2Go and Remote Desktop clients allow users to resume their work after disconnecting a session. Unfortunately, many users never return to their sessions, which wastes resources for long periods of time. Please log out when you are done using the machine. If you don't log out, your X2Go or Remote Desktop session will by default be terminated and all unsaved work will be lost.

If you would like to resume the same session at a later time, you have to opt in by opening a terminal window and typing:

    in bash: echo 5 > /var/run/user/$UID/KeepSession

    in csh: echo 5 > /var/run/user/$uid/KeepSession

The number 5 means you have 5 hours to get back to the session. The maximum you can set is 36 hours. To extend the time, run the command again; the time is counted from when you last ran it.
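If you extend sessions often, a small bash function can wrap the command above. This is only a sketch: keep_session is a hypothetical helper name, not something installed on CS machines, and it assumes the KeepSession file just holds a whole number of hours, as in the echo commands above. It rejects non-numeric input and caps the value at the documented 36-hour maximum itself.

```bash
#!/usr/bin/env bash
# keep_session HOURS [FILE]
# Hypothetical helper: writes HOURS to the KeepSession opt-in file.
# Assumes the file only needs a whole number of hours, as in the echo commands above.
keep_session() {
  local hours=$1
  # Default to the documented path; a second argument overrides it (handy for testing).
  local file=${2:-/var/run/user/$UID/KeepSession}

  # Accept only whole numbers.
  if ! [[ $hours =~ ^[0-9]+$ ]]; then
    echo "usage: keep_session HOURS [FILE]" >&2
    return 1
  fi
  hours=$((10#$hours))   # force base 10 so "08" is not read as octal
  if (( hours < 1 )); then
    echo "keep_session: HOURS must be at least 1" >&2
    return 1
  fi
  # The documented maximum is 36 hours.
  if (( hours > 36 )); then
    echo "keep_session: capping $hours at 36 hours" >&2
    hours=36
  fi
  echo "$hours" > "$file"
}
```

You could add it to ~/.bashrc and run keep_session 5 before disconnecting. As with the plain echo command, rerun it to extend the time.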

 

To prevent any one person from dominating the entire machine, and to prevent runaway processes or abuse, iLab machines by default enforce limits that terminate a process, without warning, when it reaches a limit.

The following limits are currently set on our iLab machines.

2. MEMORY LIMIT: Set on iLab/Grad machines

The limits below are preset on the system and can't be adjusted by end users. If your job needs a lot of memory, pick a machine with the most memory available to you. When you run out of memory, the Linux OOM killer will terminate your job automatically. A script watches the log and notifies you when this happens, so you are aware of the problem with your code. The current memory limits are:

- On ilab*.cs.rutgers.edu, the maximum memory per user is 48 GB.
ilab*.cs.rutgers.edu use the tuned profile virtual-host to reduce swappiness.

- On data*.cs.rutgers.edu and jupyter.cs.rutgers.edu (Hadoop cluster), the maximum memory per user is 32 GB.
data*.cs.rutgers.edu use the tuned profile virtual-guest to reduce swappiness.

- On other desktops, the maximum memory per user is 50% of physical memory.
Desktops use the default tuned profile, balanced. You could argue that desktops would be slightly better.

Note: ilab* and data* machines both have swap space on solid-state drives.
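On a desktop, the per-user cap is half of physical memory, which you can work out from /proc/meminfo (MemTotal is reported in kB, that is, KiB). The sketch below, with hypothetical function names, does that and also sums the resident set size (RSS) of your own processes with ps so you can compare the two. Treat the RSS total as a rough guide only: it counts memory shared between processes more than once, and this page does not say exactly how the system accounts for memory.

```bash
#!/usr/bin/env bash
# desktop_mem_cap_gib [MEMINFO]
# Hypothetical helper: prints 50% of MemTotal in GiB, the per-user cap on "other desktops".
desktop_mem_cap_gib() {
  local meminfo=${1:-/proc/meminfo}
  # MemTotal is in kB (KiB): halve it, then convert KiB -> GiB.
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 2 / 1024 / 1024 }' "$meminfo"
}

# my_rss_gib
# Hypothetical helper: prints the summed RSS of all your processes in GiB.
# Rough estimate only: shared pages are counted once per process.
my_rss_gib() {
  # rss= prints each process's resident set size in KiB with no header line.
  ps -u "$(id -un)" -o rss= | awk '{ sum += $1 } END { printf "%.1f\n", sum / 1024 / 1024 }'
}
```

For example, on a machine whose MemTotal is 33554432 kB, desktop_mem_cap_gib prints 16.0.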
 

3. GPU LIMITS: Set on the iLab server cluster

On machines with 8 GPUs, the maximum number of GPUs you can use is 4.
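If your job uses CUDA, one common way to keep it within the 4-GPU cap is the CUDA_VISIBLE_DEVICES environment variable, which CUDA applications use to decide which GPUs they can see. The sketch below assumes nvidia-smi is installed; pick_gpus is a hypothetical helper that reads nvidia-smi's CSV output on standard input and chooses the N GPUs with the least memory in use, as a rough proxy for the least busy ones. It only controls what your own job tries to use; it does not change the limit.

```bash
#!/usr/bin/env bash
# pick_gpus N
# Hypothetical helper: reads "index, memory.used" CSV lines on stdin and prints
# the indices of the N GPUs using the least memory, comma-separated.
pick_gpus() {
  local n=${1:-1}
  # The documented cap on 8-GPU machines is 4.
  if ! [[ $n =~ ^[1-4]$ ]]; then
    echo "pick_gpus: N must be between 1 and 4" >&2
    return 1
  fi
  # Sort numerically on the memory column, keep N rows, join their indices with commas.
  sort -t, -k2,2n | head -n "$n" | cut -d, -f1 | paste -sd, -
}
```

Usage, for a job that needs two GPUs:

    export CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits | pick_gpus 2)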
 

4. CPU LIMITS: Set on iLab{1,2,3}.cs.rutgers.edu machines [NOTE: NEW CHANGE as of Oct 10, 2018]

Because jobs sometimes run away and keep using CPU time without limit, we limit the amount of CPU time your jobs can use.

The limits only affect jobs that have used more than 24 hours of CPU time across all processes in one session. A session is roughly anything you start from a single login. If you log out and log in again, you'll have a new session.

That means if you are using 24 cores, your processes reach the 24 CPU-hour maximum after only 1 hour of wall-clock time. After 24 CPU hours, your processes will be terminated unless you have specified a time limit. This limit operates on sessions. You can see your sessions and the amount of CPU time they've used by typing sessions or sessions -l.
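The arithmetic is CPU time ≈ cores × wall-clock time when every core is kept busy. As a sketch, the hypothetical bash function below prints how many wall-clock minutes a session can run before reaching the 24 CPU-hour limit. It assumes each core is fully busy, so a real job that sometimes waits on I/O will take longer to reach the limit.

```bash
#!/usr/bin/env bash
# minutes_until_cpu_limit CORES [LIMIT_CPU_HOURS]
# Hypothetical helper: wall-clock minutes until processes keeping CORES cores
# fully busy use up LIMIT_CPU_HOURS of CPU time (default 24).
minutes_until_cpu_limit() {
  local cores=$1
  local limit_hours=${2:-24}
  if ! [[ $cores =~ ^[1-9][0-9]*$ && $limit_hours =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: minutes_until_cpu_limit CORES [LIMIT_CPU_HOURS]" >&2
    return 1
  fi
  # CPU minutes available divided by CPU minutes consumed per wall-clock minute.
  echo $(( limit_hours * 60 / cores ))
}
```

For example, minutes_until_cpu_limit 24 prints 60 (the 1-hour case above), and minutes_until_cpu_limit 8 prints 180.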

If you expect your job to use more than 24 hours of CPU time, you can declare a time limit. The system won't terminate the job unless it goes over the limit you have declared. To do that, open a terminal window and type one of the commands below:

    in bash: echo 48 > /run/user/$UID/LongjobLimit

    in csh: echo 48 > /run/user/$uid/LongjobLimit
 
The number 48 means your jobs can continue for up to 48 wall-clock hours. The maximum you can set is 80 hours (chosen to be a bit over 3 days); values over 80 are treated as 80. If a job will run longer than 3 days, you'll need to redo this every 3 days. The time runs from when the file was created or last updated.
 
(Yes, the 24 hour limit is *CPU* time and the limit in /run/user/$UID/LongjobLimit is *wall clock* time.)
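Because the clock starts when LongjobLimit is created or last updated, you can estimate how much declared time is left from the file's value and modification time. The sketch below is a hypothetical helper, not a system tool. It assumes the file holds a whole number of hours, applies the documented 80-hour cap, and uses GNU stat's %Y format (modification time in seconds since the epoch), which is standard on Linux.

```bash
#!/usr/bin/env bash
# longjob_hours_left [FILE]
# Hypothetical helper: prints the whole wall-clock hours left before the
# declared LongjobLimit runs out (0 once it has expired).
longjob_hours_left() {
  local file=${1:-/run/user/$UID/LongjobLimit}
  local limit mtime now left

  if [[ ! -r $file ]]; then
    echo "no LongjobLimit file: the default 24 CPU-hour limit applies" >&2
    return 1
  fi
  limit=$(<"$file")
  if ! [[ $limit =~ ^[0-9]+$ ]]; then
    echo "longjob_hours_left: unexpected contents in $file" >&2
    return 1
  fi
  limit=$((10#$limit))
  # Values over 80 are treated as 80.
  if (( limit > 80 )); then
    limit=80
  fi
  # The clock runs from when the file was created or last updated.
  mtime=$(stat -c %Y "$file")
  now=$(date +%s)
  left=$(( (mtime + limit * 3600 - now) / 3600 ))
  if (( left < 0 )); then
    left=0
  fi
  echo "$left"
}
```

Rerunning the echo command above rewrites the file and so restarts the clock.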

 

CPU LIMITS: Set on other iLab machines (these limits will soon be replaced by the limits above)

By default, a 24-hour CPU limit is set. That means if you are using 24 cores, your process will run for only 1 hour before reaching the 24 CPU-hour maximum. After 24 CPU hours, your processes will be terminated.

If you have a legitimate need to run longer than 24 CPU hours, you can set a larger limit as follows:

In bash, type the following, where NNNN is the CPU limit in seconds:

    ulimit -t NNNN

Do this in the shell before starting the long job. The limit applies to the shell where you set it and to any processes started from it, because they inherit the limit.

Please set a realistic limit, not an absurdly large one. If you abuse this limit, your process will be terminated without warning. 
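One way to follow this without changing the limit of your login shell is to set it in a subshell that then runs the job. In bash, ulimit -t without -S or -H sets both the soft and hard limits, and an ordinary user cannot raise a hard limit again afterwards, so a subshell keeps that change out of your interactive shell. run_with_cpu_limit is a hypothetical helper that converts hours into the seconds ulimit -t expects; it will fail if you ask for more than the hard limit your account already has.

```bash
#!/usr/bin/env bash
# run_with_cpu_limit HOURS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with a CPU-time limit of HOURS hours.
run_with_cpu_limit() {
  local hours=$1
  if ! [[ $hours =~ ^[1-9][0-9]*$ ]] || (( $# < 2 )); then
    echo "usage: run_with_cpu_limit HOURS COMMAND [ARGS...]" >&2
    return 1
  fi
  shift
  (
    # ulimit -t takes seconds; the subshell keeps the new limit out of your login shell.
    ulimit -t $(( hours * 3600 )) || exit 1
    # exec replaces the subshell, so the job runs under the limit directly.
    exec "$@"
  )
}
```

For example, run_with_cpu_limit 48 ./my_long_job (a placeholder job name) starts the job with a limit of 172800 seconds.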

 

5. Blacklisting System: Set on ALL CS machines

When our machines detect abnormal activity, they may put the remote machine responsible on a blacklist, which blocks any connection attempt from a listed machine. If you have trouble connecting to CS machines, check whether your IP address is blocked and how to get around the block.