Homegrown high-performance computing
Virginia Tech starts from scratch
At Virginia Tech's Advanced Research Institute (ARI), constructing an HPC cluster for cancer research has been an educational experience for the electrical and computer engineering grad students involved.
With little prior HPC experience, the students built a 16-node cluster and parallelized apps they had written in MATLAB, a numerical programming environment, over the course of several months. The project taps huge amounts of data acquired from biologists and physicians to perform molecular profiling of cancer patients. The students are also working on vehicle-related data for transportation projects.
Rather than make every aspect a learning experience, when it came time to choose an HPC platform, the students and professors decided to stick with what they already knew: Microsoft Windows.
"Our students had already been running MATLAB and all their other programs on Windows," says Dr. Saifur Rahman, director of ARI. "We didn't want to have to retrain them on Linux." As was the case at BAE Systems, there were also obvious advantages to a cluster that could integrate easily with the rest of ARI's Windows infrastructure, including Active Directory.
Microsoft had already approached Virginia Tech to be an early adopter of Windows Compute Cluster Server 2003, so Dr. Rahman and his team said yes and started looking for the right hardware. They vetted several vendors, but when they found out Microsoft was performing its own testing on Hewlett-Packard servers, they decided to go with HP. "We knew we'd need help from Microsoft to fix various bugs," says Dr. Rahman, "and since all their experience was on HP servers, we felt we'd have the most success with HP."
So with help from Microsoft and HP, ARI installed 16 HP ProLiant DL145 servers with dual-core 2.01GHz AMD Opteron 270 processors and 1GB of RAM each. On the same rack, ARI installed 1TB of HP FC storage. The rack also includes one head node, as well as an HP ProLiant DL385 G1 server with two dual-core 2.4GHz AMD64 processors and 4GB of RAM.
As did BAE Systems, ARI decided to stick with Gigabit Ethernet for its cluster interconnect, mainly because it was what the team knew. "There are other interconnects that are faster, but we've found that Gigabit Ethernet is pretty robust and works fine for our purposes," Dr. Rahman says. And after some servers overheated, ARI placed the entire cluster in a 55-degree Fahrenheit chilled server room.
ARI found parallelizing MATLAB apps to be a significant challenge requiring a number of iterations. "The students would work on parallelizing the algorithms, then run case studies to verify the results they were getting with the clustered applications were similar to results they got when they ran on one machine," Dr. Rahman says.
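The kind of check Dr. Rahman describes, comparing a parallelized run against a single-machine run, can be sketched in a few lines. The example below is an illustration only and is not ARI's code: it is written in Python rather than MATLAB, the function names (`profile_score`, `serial_run`, `parallel_run`, `results_agree`) are hypothetical, the "analysis" is a stand-in mean of squared values, and a local thread pool stands in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def profile_score(samples):
    """Hypothetical stand-in for an analysis step: mean of squared values."""
    return sum(x * x for x in samples) / len(samples)


def serial_run(samples):
    """Reference result computed on 'one machine'."""
    return profile_score(samples)


def parallel_run(samples, workers=4):
    """Same computation split into chunks; a thread pool stands in for cluster nodes."""
    chunk = math.ceil(len(samples) / workers)
    parts = [samples[i:i + chunk] for i in range(0, len(samples), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker returns a partial sum of squares for its chunk.
        partial_sums = list(pool.map(lambda p: sum(x * x for x in p), parts))
    # Combine partial sums and divide by the total count, not the chunk count.
    return sum(partial_sums) / len(samples)


def results_agree(samples, rel_tol=1e-9):
    """Case-study check: do serial and parallel results match within a tolerance?"""
    return math.isclose(serial_run(samples), parallel_run(samples), rel_tol=rel_tol)
```

A tolerance is used instead of exact equality because summing floating-point values in a different order can change the last few digits. A genuine mistake, such as averaging per-chunk means when the chunks differ in size, would fail the check by a much larger margin.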
At first, the results weren't coinciding, and the students had to learn more about how to parallelize effectively and clean up what they had already coded. "We missed some important relationships at first," Dr. Rahman says. With some help from MATLAB, it took two graduate students about a month to get the app parallelization right.
Dr. Rahman feels that the team's diverse expertise was a large factor in the project's success. One of the grad students had deep knowledge of molecular-level data quality, biomarkers, and the relevance of different data types; another offered a lot of hardware expertise; and the IT person had extensive experience working effectively with vendors. MATLAB provided help in determining which toolboxes were relevant to the task.
"When we went to MATLAB, they were just getting started with HPC," Dr. Rahman says. "I hope they will start to pay more attention, as it would be nice if they were all ready so we didn't have to spend months on this."
There were also hardware communications glitches.
"At first we had some problems controlling the servers as they talked to each other and the head node," Dr. Rahman says. "Sometimes they wouldn't respond. In other cases we wouldn't see any data coming through." Solving the problem took a lot of reconfiguring and reconnecting. "Perhaps we were giving the wrong commands at first. We're not sure," he adds. There were also problems with incorrect server and software license manager configurations.
Dr. Rahman says that managing the cluster has been relatively trouble-free with Windows Compute Cluster Server 2003. He adds that if he could do it all over again, he'd send his students to Microsoft for a longer time to learn more of what Microsoft itself has discovered about building clusters with HP servers. The use of HPC has enabled ARI researchers to dive much more deeply into molecular data, not only analysing differences in relationships among disparate classes of subjects, but also revealing more subtle but important variations within each class.