Navy DSRC Introduction and Policy Guide

1. Introduction

1.1. Purpose

This document provides an overview of the Navy DSRC. This guide is intended to offer assistance to users and their S/AAAs in determining which systems will best meet specific computational needs.

To contact us with questions, comments, or suggestions about this guide, please visit the Contact Us page for complete contact information.

1.2. Overview of Supported CTAs

The Navy Department of Defense (DoD) Supercomputing Resource Center (Navy DSRC) is organizationally located with the Naval Meteorology and Oceanography Command (NAVMETOCCOM) and is collocated with the headquarters (Commander, Naval Meteorology and Oceanography Command - CNMOC) at the John C. Stennis Space Center, MS. NAVMETOCCOM/CNMOC provides oceanographic support to the Department of Defense through a wide range of oceanographic modeling, prediction and data collection techniques.

The Navy DSRC, formerly the NAVO MSRC, was the second of the four major shared DoD High Performance Computing (HPC) centers to be formed under the auspices of the DoD HPC Modernization Program. Now one of five such centers, the Navy DSRC provides specialized support in the following critical defense computational technology areas (CTAs):

Supported CTAs

CTA   Description
CWO   Climate/Weather/Ocean Modeling and Simulation
CFD   Computational Fluid Dynamics
CSM   Computational Structural Mechanics
CCM   Computational Chemistry, Biology, and Materials Science
CEA   Computational Electromagnetics and Acoustics
ENS   Electronics, Networking, and Systems/C4I
SIP   Signal/Image Processing
FMS   Forces Modeling and Simulation
EQM   Environmental Quality Modeling and Simulation
IMT   Integrated Modeling and Test Environments
SAS   Space and Astrophysical Science

DoD Supercomputing Resource Centers provide DoD scientists and engineers with most of the program's computational resources. Each center supports a full range of centralized systems and services, including vector machines, scalable parallel systems, clustered workstations, DoD scientific visualization resources, and training.

1.3. Requesting Assistance

The Consolidated Customer Assistance Center (CCAC) is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 11:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

You can contact the Navy DSRC Help Desk directly in any of the following ways for issues related to classified and non-HPCMP resources.

  • E-mail: dsrchelp@navo.hpc.mil
  • Phone: 1-800-993-7677 or 228-688-7677
  • Fax: 228-688-4356
  • U.S. Mail:
    Navy DoD Supercomputing Resource Center
    1002 Balch Blvd
    Stennis Space Center, MS 39522-5001

For more detailed contact information, please see the Contact Us page.

1.4. Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account". If you do not yet have a pIE User Account, please visit the Consolidated Customer Assistance Center (CCAC) Accounts page and follow the instructions there. Once you have an active pIE User Account, visit the Navy DSRC Accounts page for instructions on how to request accounts on the Navy DSRC HPC systems. If you need assistance with any part of this process, please contact CCAC at accounts@ccac.htp.mil.

1.5. Visitor Information

If you are planning to visit the Navy DSRC, it is important that you review the instructions on the Planning a Visit page. This page contains important information including pre-trip and on-arrival instructions that you will need to know to ensure that your visit to our center goes smoothly.

2. Hardware, Network, and Software

All HPC systems currently in operation at the Navy DSRC are seamlessly integrated with the Mass Storage Archive Server and the Defense Research and Engineering Network (DREN) via many high-speed networking technologies.

2.1. High Performance Computing

2.1.1. Cray XT5 Cluster (Einstein)

Einstein is a Cray XT5. Both the login nodes and the compute nodes are populated with 2.4-GHz AMD Opteron quad-core processors. Einstein uses a dedicated SeaStar2+ communications network for MPI messages and I/O traffic, and uses Lustre to manage its parallel file system, which targets its LSI arrays. Einstein has 1592 compute nodes; memory is shared within a node but not across nodes. Each compute node has two quad-core processors (8 cores) running its own Compute Node Linux (CNL) operating system and sharing 16 GBytes of DDR2 memory, with no user-accessible swap space. Einstein is rated at 123 peak TFLOPS and has 518 TBytes (formatted) of disk storage.

einstein.navo.hpc.mil
Cray XT5 - 123 TFLOPS

                         Login Nodes          Compute Nodes
Total Nodes              4                    1592
Operating System         SUSE Linux           Compute Node Linux (CNL)
Cores/Node               16                   8
Core Type                AMD Opteron 64-bit   AMD Opteron 64-bit
Core Speed               2.4 GHz              2.4 GHz
Memory/Node              128 GBytes           1584 nodes - 16 GBytes
                                              8 nodes - 31 GBytes
Accessible Memory/Node   16 GBytes            1584 nodes - 14 GBytes
                                              8 nodes - 30 GBytes
Memory Model             Shared on node       Shared on node.
                                              Distributed across cluster.
Interconnect Type        Ethernet             SeaStar2+

File Systems on Einstein

Path      Capacity     Type
/scr      516 TBytes   Lustre
/u/home   22 TBytes    Lustre

For detailed information on using Einstein, see the Einstein User Guide.

2.1.2. IBM P6 Cluster (Davinci)

Davinci is an IBM P6. The login and compute nodes are populated with Power6 processors. Davinci uses 4X DDR Infiniband as its high-speed network for MPI messages and IO traffic. Davinci uses IBM's General Parallel File System (GPFS) to manage its parallel file system that targets DCS9550 RAID arrays. Davinci has 150 compute nodes that share memory only on the node; memory is not shared across the nodes. 148 nodes have 32 processors with 64 GBytes of memory and 2 nodes have 32 processors with 256 GBytes of memory. Davinci is rated at 90 peak TFLOPS and has 437 TBytes (formatted) of disk storage.

Davinci also uses an advanced feature called Simultaneous Multi-Threading (SMT), which allows two threads to run concurrently on the same physical core, effectively creating an additional 32 "virtual" cores per node. If properly used, this feature may yield performance increases of 20% or more for some applications. For detailed information on using the SMT capability, see the SMT Guide.

davinci.navo.hpc.mil
IBM Power 6 - 90 TFLOPS

                         Login Nodes          Compute Nodes
Total Nodes              2                    148
Operating System         AIX                  AIX
Cores/Node               32                   32
Core Type                Power6               Power6
Core Speed               4.7 GHz              4.7 GHz
Memory/Node              64 GBytes            146 nodes - 64 GBytes
                                              2 nodes - 256 GBytes
Accessible Memory/Node   2 GBytes             146 nodes - 52 GBytes
                                              2 nodes - 246.4 GBytes
Memory Model             Shared on node       Shared on node.
                                              Distributed across cluster.
Interconnect Type        4x DDR Infiniband    4x DDR Infiniband

File Systems on Davinci

Path      Capacity     Type
/scr      391 TBytes   GPFS
/u/home   47 TBytes    GPFS

For detailed information on using Davinci, see the Davinci User Guide.
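
As a rough consistency check on the hardware figures above, the following sketch derives the total number of compute cores and the peak GFLOPS per core for each system, and the number of logical cores per Davinci node when SMT is in use. The node counts, cores per node, and peak ratings are copied from the two system tables; the function and variable names are illustrative only and do not correspond to any tool provided by the Center.

```python
# Figures copied from the Einstein and Davinci system tables in this guide.
# (The Davinci compute-node count used here is the one listed in its table.)
SYSTEMS = {
    "Einstein": {"compute_nodes": 1592, "cores_per_node": 8, "peak_tflops": 123},
    "Davinci": {"compute_nodes": 148, "cores_per_node": 32, "peak_tflops": 90},
}


def total_cores(name):
    """Total physical compute cores: nodes times cores per node."""
    system = SYSTEMS[name]
    return system["compute_nodes"] * system["cores_per_node"]


def peak_gflops_per_core(name):
    """Peak rating spread evenly across all compute cores, in GFLOPS."""
    return SYSTEMS[name]["peak_tflops"] * 1000 / total_cores(name)


def smt_logical_cores(cores_per_node, threads_per_core=2):
    """Logical cores per node when SMT runs two threads per physical core."""
    return cores_per_node * threads_per_core


if __name__ == "__main__":
    for name in SYSTEMS:
        print(name, total_cores(name), round(peak_gflops_per_core(name), 2))
    print("Davinci logical cores per node with SMT:", smt_logical_cores(32))
```

Dividing the peak rating evenly across cores is only an approximation for comparing systems; it says nothing about the performance an individual application will achieve.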

2.2. Mass Storage Archive Server

2.2.1. Sun M5000 (Newton)

The Resilient Mass Storage Server (RMSS) consists of one M5000 system, Newton. The system is configured with six dual-core 2.1-GHz processors, 32 GBytes of main memory, and over 60 TBytes of hard disk storage. For information on using the archive system, see the Archive User Guide.

2.3. Network Connectivity

Our site is a primary node of the Defense Research and Engineering Network, or DREN. DREN is a robust, high-speed network providing connectivity to user sites and centers nationwide. We connect to the DREN Wide Area Network (WAN) via an OC-48 circuit capable of data transfers up to 2.48 Gbits/sec and a secondary OC-12 circuit capable of data transfers up to 622 Mbits/sec to provide fault tolerance and additional bandwidth.

Our Local Area Network (LAN), a 10-Gigabit Ethernet connection, provides primary connectivity to the Navy DSRC infrastructure, HPCs, and mass storage assets. The users of the Navy DSRC are able to use this high-performance connectivity for interactive and data transfer functions.
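
To put the WAN circuit rates quoted above in perspective, the sketch below estimates the best-case time needed to move a given amount of data over each circuit. It assumes decimal units (1 TByte = 10^12 bytes) and ignores protocol overhead, contention, and disk speed, so real transfers will take longer. The names used are illustrative.

```python
# Line rates quoted in this section, in bits per second.
OC48_BITS_PER_SEC = 2.48e9
OC12_BITS_PER_SEC = 622e6


def transfer_seconds(size_bytes, bits_per_sec):
    """Best-case transfer time: payload bits divided by the line rate."""
    return size_bytes * 8 / bits_per_sec


if __name__ == "__main__":
    one_tbyte = 1e12  # assumption: decimal TByte
    for label, rate in (("OC-48", OC48_BITS_PER_SEC), ("OC-12", OC12_BITS_PER_SEC)):
        hours = transfer_seconds(one_tbyte, rate) / 3600
        print(f"{label}: about {hours:.1f} hours per TByte")
```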

2.4. Software Environment

All Navy DSRC systems run derivatives of the UNIX System V operating system with vendor-specific enhancements. A large variety of compiler environments, math libraries, programming tools and third-party analysis applications are available on the DSRC systems.

HPC Software Listings

System                Software Listing
Cray XT5 (Einstein)   http://www.navo.hpc.mil/software/index.html?sys=Einstein
IBM P6 (Davinci)      http://www.navo.hpc.mil/software/index.html?sys=Davinci

3. Data Storage

The Navy DSRC data storage consists of local home directories on each system, temporary disk storage on each system and long-term storage on the Resilient Mass Storage Server (RMSS). Files stored on the RMSS are subject to migration to off-line status that is controlled by Sun's Storage and Archive Manager/Quick File System (SAM/QFS) software.

3.1. Permanent File Storage

Users are allocated a home directory (referenced locally with the $HOME environment variable) on each Navy DSRC system with 1 GByte of non-migrated storage. $HOME is not backed up by the Center; therefore users are responsible for maintaining backup copies of any files in this directory.

3.2. Temporary File Storage

Each Navy DSRC system is configured with a large quantity of high-speed disk storage configured as the /scr file system. /scr is the globally accessible, high-speed working storage primarily for interactive and batch processing. Batch jobs use large amounts of temporary space. There are no limits on the size of individual files. Users are responsible for managing their own files in the /scr areas. The /scr file system is not backed up by the Center. Users are responsible for maintaining backup copies of any files in the temporary file system. Users can access their temporary storage by using the $WORKDIR environment variable. The table below lists the /scr allocations for each system.

Temporary Space Allocations on HPC Systems

System                /scr
Cray XT5 (Einstein)   516 TBytes
IBM P6 (Davinci)      391 TBytes
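
As an illustration of using the $WORKDIR environment variable from a program, the following minimal sketch builds a per-job directory path under the temporary file system. The function name, the job name, and the fallback path are hypothetical; only the WORKDIR variable itself comes from this guide.

```python
import os
from pathlib import Path


def job_scratch_dir(job_name, environ=None):
    """Return a per-job directory path under $WORKDIR (not created here)."""
    env = os.environ if environ is None else environ
    workdir = env.get("WORKDIR")
    if not workdir:
        raise RuntimeError("WORKDIR is not set")
    return Path(workdir) / job_name


if __name__ == "__main__":
    # Fall back to a made-up path so the sketch also runs away from the Center.
    demo_env = {"WORKDIR": os.environ.get("WORKDIR", "/scr/example_user")}
    print("Job files would go in:", job_scratch_dir("run_001", demo_env))
```

Because the temporary file system is not backed up, anything written under such a directory still needs to be copied elsewhere if it must be kept.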

3.3. Archival File Storage

All of our HPC systems have access to an online archival mass storage system that provides long-term storage for users' files. This petascale archive resides on a robotic tape library system. A 60-TByte disk cache front-ends the tape file system and temporarily holds files while they are being transferred to or from tape.

The environment variables $ARCHIVE_HOST and $ARCHIVE_HOME are automatically set for you. $ARCHIVE_HOST can be used to reference the archive server, and $ARCHIVE_HOME can be used to reference your archive directory on the server. These can be used when transferring files to/from archive. For information on using the archive system, see the Archive User Guide.
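
The sketch below shows one way the two archive variables might be combined into a destination string in the host:path form used by common remote-copy tools. It is an illustration only: the function name and the example host and directory are made up, and the transfer commands actually supported at the Center are described in the Archive User Guide, not assumed here.

```python
import os


def archive_target(filename, environ=None):
    """Build a 'host:path' destination string from the archive variables."""
    env = os.environ if environ is None else environ
    host = env.get("ARCHIVE_HOST")
    home = env.get("ARCHIVE_HOME")
    if not host or not home:
        raise RuntimeError("ARCHIVE_HOST and ARCHIVE_HOME must both be set")
    # Strip any trailing slash so the joined path has exactly one separator.
    return f"{host}:{home.rstrip('/')}/{filename}"


if __name__ == "__main__":
    # Hypothetical values so the sketch runs anywhere; on a DSRC system the
    # real values would come from the environment.
    demo_env = {"ARCHIVE_HOST": "archive.example", "ARCHIVE_HOME": "/archive/example_user"}
    print(archive_target("results.tar", demo_env))
```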

4. Processing Environment

4.1. Determining the Correct HPC System

Determining the correct HPC System for your needs can be a complex task. The following are just a few of the factors that might influence your choice:

4.1.1. Software Availability

If your work depends upon a specific Commercial Off-The-Shelf (COTS) application, you can verify its availability on any system in the HPCMP by checking the Consolidated Software List. Software information for Navy DSRC systems is also available on our local software page. You can also check the HPCMP Benchmarking site, which not only lists the specific benchmarked codes, but also shows each code's relative performance across all HPCMP systems. If you can't find the application that you need on these sites, contact CCAC for assistance.

4.1.2. Hardware Requirements

To ensure that your jobs will have access to sufficient cores and memory to run as needed, you can review the hardware specifications on our Hardware page. Additional details are available in each of the HPC User Guides, available from the Documentation page.

4.1.3. Queue Limits

If your jobs require exceptionally long run times or if you need an exceptionally large number of cores, you should verify that queue limits on the system you choose allow both the number of cores and run time that you need. To check this, see our Queue Summary page.

4.2. Processing Environment Overview and Philosophy

Navy DSRC provides both an interactive and a batch submission environment. Batch queue environments are available on all of the systems. The batch environment is the primary environment for most user work. All of the HPC systems at the Navy DSRC use the PBS batch queue system.

The batch queue environments allow users to submit, monitor and terminate their own batch jobs. This capability is intended for jobs requiring large amounts of memory and/or CPU time that generally run for many hours. Through the batch queue environments, the user submits a job either from the command line or through a shell script. Resource requirements (e.g., CPU time and number of processors) or runtime parameters (e.g., output file redirection) can be issued on the command line or embedded in the shell script for the batch job to be executed.
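
To make the idea of embedding resource requirements and runtime parameters in a batch script concrete, the following sketch generates the text of a minimal PBS script. It uses only widely supported PBS directives (-N, -q, -l walltime, -o) and the PBS_O_WORKDIR variable. The function name, job name, and command are hypothetical, and the core-count request is deliberately omitted because its syntax is system-specific; consult the relevant HPC User Guide for that.

```python
def pbs_script(job_name, queue, walltime, command, output_file=None):
    """Return the text of a minimal PBS batch script."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",            # job name
        f"#PBS -q {queue}",               # target queue, e.g. standard or debug
        f"#PBS -l walltime={walltime}",   # wall-clock limit as HH:MM:SS
    ]
    if output_file:
        lines.append(f"#PBS -o {output_file}")  # redirect standard output
    # Core-count requests are left out on purpose: their syntax differs
    # between systems, so take it from the system's user guide.
    lines += ["", "cd $PBS_O_WORKDIR", command, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(pbs_script("demo_job", "debug", "00:30:00", "./my_app", "demo_job.out"))
```

The generated text would be saved to a file and submitted with qsub; the same options can instead be given on the qsub command line.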

4.3. Job Scheduling/Queuing Environment and Policies

4.3.1. Cray XT5 Queue Usage Policies

Summary of Queues on the Cray XT5 - Einstein
(Queues are listed in order of decreasing priority.)

Priority  Queue Name  Job Class   Max Wall Clock Time  Max Cores Per Job  Comments
Highest   urgent      Urgent      TBD                  TBD                Designated Urgent Project by DoD HPCMP
          high        High        168 Hours            4096               Designated High-Priority Projects by DoD HPCMP
          challenge   Challenge   168 Hours            4096               Challenge Projects Only
          special     N/A         168 Hours            4096               Access Available by Request
          debug       Debug       30 Minutes           512                User Diagnostic Jobs
          standard    Standard    168 Hours            2048               Normal Priority Jobs
          bigmem      N/A         24 Hours             56                 Large Memory Jobs
          transfer    N/A         12 Hours             1                  Data Transfer Jobs
          analysis    N/A         8 Hours              1                  Serial Jobs
Lowest    background  Background  4 Hours              512                User jobs that will not be charged against the project allocation

4.3.2. IBM P6 Queue Usage Policies

Summary of Queues on the IBM P6 - Davinci
(Queues are listed in order of decreasing priority.)

Priority  Queue Name  Job Class   Max Wall Clock Time  Max Cores Per Job  Comments
Highest   high_share  N/A         168 Hours            1                  High Priority Serial Jobs
          special     N/A         24 Hours             3072               Access Available by Request
          urgent      Urgent      TBD                  TBD                Designated Urgent Project by DoD HPCMP
          high        High        168 Hours            3072               Designated High-Priority Projects by DoD HPCMP
          challenge   Challenge   168 Hours            3072               Challenge Projects Only
          debug       Debug       30 Minutes           512                User Diagnostic Jobs
          standard    Standard    168 Hours            2048               Non-Challenge User Jobs
          share       N/A         24 Hours             1                  Serial Jobs
          bigmem      N/A         24 Hours             32                 Large Memory Jobs
          transfer    N/A         12 Hours             1                  Data Transfer Jobs
Lowest    background  Background  4 Hours              1536               User jobs that will not be charged against the project allocation
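
The queue limits in the two summaries above can be checked mechanically before a job is submitted. The sketch below encodes the wall-clock and core limits from both summaries and reports which queues a job of a given size would fit. The data structure and function names are illustrative; the urgent queues are left out because their limits are listed as TBD, and fitting within a queue's limits does not imply that a project is authorized to use that queue.

```python
# (max wall-clock hours, max cores per job), copied from the queue summaries.
# The urgent queues are omitted because their limits are listed as TBD.
QUEUE_LIMITS = {
    "Einstein": {
        "high": (168, 4096), "challenge": (168, 4096), "special": (168, 4096),
        "debug": (0.5, 512), "standard": (168, 2048), "bigmem": (24, 56),
        "transfer": (12, 1), "analysis": (8, 1), "background": (4, 512),
    },
    "Davinci": {
        "high_share": (168, 1), "special": (24, 3072), "high": (168, 3072),
        "challenge": (168, 3072), "debug": (0.5, 512), "standard": (168, 2048),
        "share": (24, 1), "bigmem": (24, 32), "transfer": (12, 1),
        "background": (4, 1536),
    },
}


def fits_queue(system, queue, hours, cores):
    """True if the requested run time and core count are within the limits."""
    max_hours, max_cores = QUEUE_LIMITS[system][queue]
    return hours <= max_hours and cores <= max_cores


def eligible_queues(system, hours, cores):
    """Queues on a system whose limits accommodate the job, in table order."""
    return [q for q in QUEUE_LIMITS[system] if fits_queue(system, q, hours, cores)]


if __name__ == "__main__":
    print(eligible_queues("Einstein", hours=100, cores=2048))
    print(eligible_queues("Davinci", hours=48, cores=3000))
```

Queue limits change over time, so the Queue Summary page remains the authoritative source; the values hard-coded here are only a snapshot of this guide.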

4.4. Interactive CPU-time Limits

The Navy DSRC has implemented a 15-minute (900-second) interactive processing limit on login nodes for processes running outside of the batch scheduler. This limit also applies to systems that do not have a batch scheduler installed. An application run on a login node is allowed to accrue 900 seconds of CPU time (not elapsed real time) before it is terminated. This policy protects interactive access for all users.

Interactive CPU-Time Limits

System                CPU Time
Cray XT5 (Einstein)   15 mins
IBM P6 (Davinci)      15 mins
Sun M5000 (Newton)    15 mins
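
Because the interactive limit counts CPU time rather than elapsed time, a process that is mostly idle accrues time toward the limit far more slowly than one that computes continuously. The following sketch illustrates the difference by measuring both clocks for an idle task and a busy task. It is a demonstration of the two kinds of time only and makes no assumption about how the Center enforces the limit; the names are illustrative.

```python
import time

CPU_LIMIT_SECONDS = 900  # the 15-minute interactive limit described above


def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds) it consumed."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    work()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


if __name__ == "__main__":
    idle_cpu, idle_wall = measure(lambda: time.sleep(0.5))
    busy_cpu, busy_wall = measure(lambda: sum(i * i for i in range(2_000_000)))
    print(f"sleeping:  cpu={idle_cpu:.3f}s  wall={idle_wall:.3f}s")
    print(f"computing: cpu={busy_cpu:.3f}s  wall={busy_wall:.3f}s")
```

An editor session left open for hours therefore uses little of the allowance, while a compute-bound program can exhaust it in about 15 minutes of elapsed time, or sooner if it runs on several cores at once.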

5. Security/Authentication Environment and Policies

5.1. Account Protection

The Navy DSRC uses Kerberos in combination with SecurID cards to control access to unclassified DSRC resources. Kerberos is an authentication system that uses a series of encrypted messages sent between two systems to verify the identity of someone attempting access. SecurID is a card-based system that generates a unique passcode each time it is used.

Under no circumstances is the sharing of accounts or passwords allowed. Any user found sharing an account or password will have their access to all DSRC systems disabled pending contact with the PI and/or S/AAA.

6. Navy DSRC Specific Documentation

On-line documentation and information can be found through the Navy DSRC Web site, the message of the day (MOTD) that is displayed when logging on any system, and manual pages via the man command.