KIPAC Orange Cluster

Introduction


The KIPAC "orange" cluster comprises 90 compute nodes, 1 head node, a parallel filesystem, and a high-speed DDR InfiniBand interconnect. Each node hosts two dual-core AMD Opteron 2218 processors running at 2.6 GHz and 16 GB of memory. The 360-core cluster therefore has 4 GB per core, for a total of 1.44 TB of memory. Complementing the memory and compute capacity is an I/O system intended to allow memory-to-disk read/write cycles to complete in a reasonable amount of time. The expected throughput is 1 GB/s or better for the Lustre 1.6 based system. The nominal configuration will have 2 MDS/MGS machines as an HA pair, 6 OSS machines, and 6 OSTs connected to the OSSs via an FC switch. All nodes, compute and filesystem, will be interconnected via DDR InfiniBand in a tree configuration with a 50% blocking factor. The details of the components are shown in the table:

item                 qnty  description
interactive node     1     Sun X2200 w/2218 CPU, 16x1GB memory, 250 GB HD
compute node         90    Sun X2200 w/2218 CPU, 16x1GB memory, 250 GB HD
lustre MGS/MDS       2     Sun X2200 w/2218 CPU, 8x1GB memory, 250 GB HD
lustre OSS           6     Sun X2200 w/2218 CPU, 4x1GB memory, 250 GB HD
disk array           3     Sun 6140 w/2GB cache, 16x500 GB HD, dual controller, configured as 2x{6+1+1} RAID 5
FC switch            1     16 port McData FC switch
racks                4     Sun Rack 1000-38 w/power distribution etc.
IB switch            11    Cisco 7000D series DDR IB switches (4 core, 7 leaf (<=12))
IB HCA               100   Cisco DDR HCA
IB cables            156   100 1m, 16 3m, 40 5m DDR rated IB cables
Ethernet switch      2     48 port Cisco ~3500 class GigE switch as uplink to main network
management switch    4     Cisco ???, for service processor (SP) connection
serial concentrator  4     various for console access and logging distribution
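
The headline figures in the introduction follow from the per-node numbers in the table, and can be checked with a short calculation. The Python sketch below is illustrative only: the constant and function names are made up for this example, decimal units are used (1 TB = 1000 GB) to match the 1.44 TB quoted above, and the usable disk capacity assumes that each "6+1+1" RAID 5 group means 6 data disks, 1 parity disk and 1 spare, which the page does not state explicitly.

```python
# Back-of-the-envelope capacity figures for the cluster described above.
# Decimal units (1 TB = 1000 GB), matching the 1.44 TB quoted in the text.

COMPUTE_NODES = 90             # head node and Lustre servers are not counted
SOCKETS_PER_NODE = 2           # two Opteron 2218 processors per node
CORES_PER_SOCKET = 2           # dual-core
MEMORY_PER_NODE_GB = 16        # 16 x 1 GB
TARGET_THROUGHPUT_GB_PER_S = 1.0  # "1 GB/s or better"

DISK_ARRAYS = 3
RAID_GROUPS_PER_ARRAY = 2      # "2x{6+1+1}"
DATA_DISKS_PER_GROUP = 6       # ASSUMPTION: 6 data + 1 parity + 1 spare
DISK_SIZE_GB = 500


def cluster_summary():
    """Return the derived totals as a dictionary."""
    cores = COMPUTE_NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    memory_gb = COMPUTE_NODES * MEMORY_PER_NODE_GB
    raid_groups = DISK_ARRAYS * RAID_GROUPS_PER_ARRAY
    return {
        "cores": cores,
        "memory_gb": memory_gb,
        "memory_per_core_gb": memory_gb / cores,
        # Time to write (or read) all compute-node memory at the target rate.
        "memory_dump_minutes": memory_gb / TARGET_THROUGHPUT_GB_PER_S / 60,
        "raid_groups": raid_groups,
        "usable_disk_gb": raid_groups * DATA_DISKS_PER_GROUP * DISK_SIZE_GB,
    }


if __name__ == "__main__":
    for name, value in cluster_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the sketch reproduces the 360 cores, 1440 GB of memory and 4 GB per core given in the introduction, and puts a full memory-to-disk cycle at about 24 minutes at the 1 GB/s target. The 6 RAID groups across the 3 arrays also match the 6 OSTs in the nominal configuration.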

Current Status

The compute nodes are fully functional. Basic monitoring and job control/submission are functional; additional monitoring and control features will be added over time. Nodes 001..080 are available for batch processing. The Lustre file system is functional and is being tested on a subset of the compute nodes (081..090). Aggregate write/read speeds are typically around 700/800 MB/s with no optimizations yet applied. The Lustre file system is expected to be in service in early July. NFS space is available for job output and long-term storage.
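
To put the measured Lustre rates in context, the following Python sketch estimates how long it would take to move a given amount of memory to or from disk at a sustained aggregate rate. It is a rough illustration, not a benchmark: the function and constant names are hypothetical, 1 GB is taken as 1000 MB, and it assumes the quoted aggregate rate would hold for the whole transfer.

```python
# Rough transfer-time estimate at the aggregate rates quoted above.

MEMORY_PER_NODE_GB = 16
FULL_CLUSTER_MEMORY_GB = 90 * MEMORY_PER_NODE_GB  # all 90 compute nodes
TEST_SUBSET_MEMORY_GB = 10 * MEMORY_PER_NODE_GB   # nodes 081..090

WRITE_RATE_MB_PER_S = 700  # typical aggregate write speed
READ_RATE_MB_PER_S = 800   # typical aggregate read speed


def transfer_minutes(data_gb, rate_mb_per_s):
    """Minutes needed to move data_gb at rate_mb_per_s (1 GB = 1000 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return data_gb * 1000 / rate_mb_per_s / 60


if __name__ == "__main__":
    for label, size_gb in [("full cluster", FULL_CLUSTER_MEMORY_GB),
                           ("test subset", TEST_SUBSET_MEMORY_GB)]:
        write_min = transfer_minutes(size_gb, WRITE_RATE_MB_PER_S)
        read_min = transfer_minutes(size_gb, READ_RATE_MB_PER_S)
        print(f"{label}: write {write_min:.1f} min, read {read_min:.1f} min")
```

With these inputs, writing all 1440 GB of compute-node memory at 700 MB/s would take roughly 34 minutes, and reading it back at 800 MB/s about 30 minutes.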

How to Run jobs on this system -- click here



