Scientific Computing Helps Researchers Explore the Universe
Mapping the universe generates an enormous amount of data: data about planets, stars, galaxies, and countless other objects dotting the sky. In fact, it took 4.5 petabytes of data to generate the world’s first image of a supermassive black hole. To crunch these massive datasets, researchers use high-performance machines to accelerate their scientific computing efforts.
Scientific Computing Has Unique Storage Demands
Scientists who study space, especially deep space, rely on data-intensive modeling and simulations. Their research covers a wide range of areas, including:
- Formation of galaxies
- Next-generation space vehicles
- Visualization of planetary environments
All that data has to be stored somewhere.
One early breakthrough that helped address this problem was the “Redundant Array of Independent Disks”, commonly known as RAID. RAID brings multiple disk drives together into a single logical unit. Depending on the configuration, this structure can improve redundancy, because data is mirrored or protected by parity across several drives so that a single drive failure does not destroy it, as well as performance, because reads and writes are spread across drives. Looking back, RAID helped set the foundation for distributed computing, which is a staple of high-performance computing today.
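To make the redundancy idea concrete, here is a minimal sketch of striping with a single XOR parity block per stripe. It is an illustration only: the function names, block size, and dedicated-parity layout are assumptions chosen for brevity, not a description of any particular RAID product or of the systems discussed in this article.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, data_disks: int, block_size: int):
    """Split data into stripes of `data_disks` blocks plus one parity block.

    Assumption: parity always goes in the last position of each stripe
    (a dedicated-parity layout); real arrays often rotate it across drives.
    """
    stripe_bytes = data_disks * block_size
    stripes = []
    for start in range(0, len(data), stripe_bytes):
        # Pad the final stripe with zero bytes so every block has the same length.
        chunk = data[start:start + stripe_bytes].ljust(stripe_bytes, b"\0")
        blocks = [chunk[i * block_size:(i + 1) * block_size] for i in range(data_disks)]
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


def rebuild_block(stripe, lost_index):
    """Recover one missing block by XOR-ing the surviving blocks of the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

Because XOR is its own inverse, any one block in a stripe, data or parity, can be rebuilt from the others. That is the property that lets an array of this kind survive a single drive failure.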
Once stored on drives, data that is not currently in use is archived for future use. Surprisingly, the long-term storage technology used for this has been around for a while: tape. Moving data from disk to tape frees up space on supercomputers for other experiments to run. Such archival systems can often store in excess of a hundred petabytes of data [1].
But there is more to scientific computing than storage alone. Data has to live and move across supercomputers, disk storage, and archival storage.
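As a rough sketch of how a disk-to-tape policy might pick what to move, the snippet below selects files that have not been accessed for a given number of days and reports how much disk space archiving them would free. The `FileRecord` type, the `select_for_archive` function, and the 90-day threshold are all hypothetical, introduced only for illustration; real archival systems use their own policies and tools.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical catalog entry for a file on disk storage."""
    path: str
    size_bytes: int
    days_since_access: int


def select_for_archive(files, idle_days=90):
    """Return the files idle for at least `idle_days` and the bytes they occupy.

    Assumption: idle time is the only criterion; a real policy would likely
    also weigh file size, project quotas, and disk utilization.
    """
    cold = [f for f in files if f.days_since_access >= idle_days]
    freed_bytes = sum(f.size_bytes for f in cold)
    return cold, freed_bytes
```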
How Data Travels on High Performance Networks
High-performance computing is highly demanding. Very high throughput with very low latency is not only desired, but expected. This demands seamless communication both between the individual nodes that make up a supercomputer and between those nodes and their attached network.
One popular communications protocol, InfiniBand [2], which is among the network transports supported by the growing NVMe-oF standard, acts as a bridge that facilitates speedy communication between different disk and tape storage systems. It is flexible and can be used as either a direct or switched interconnect between servers and storage. It is also scalable, which suits dynamically changing datasets. Data is sent and received over the network in packets that travel through a series of switches; together, the packets form a message.
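The idea that packets together form a message can be sketched in a few lines. This is a generic illustration, not the wire format of any specific interconnect: the sequence-number scheme, the function names, and the payload size are assumptions made for the example.

```python
def packetize(message: bytes, payload_size: int):
    """Split a message into (sequence_number, payload) packets."""
    return [
        (seq, message[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(message), payload_size))
    ]


def reassemble(packets):
    """Rebuild the message by ordering packets on their sequence numbers."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)
```

Numbering the packets means the receiver can rebuild the message even if they arrive in a different order than they were sent.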
Even with all their data properly stored, archived, and transported, researchers eventually need to see what that data describes. That’s where data visualization is key to scientific computing.
Visualizing Data during Supercomputer Simulations
As scientific simulations run in real time, supercomputers attached to advanced graphics displays enable scientists and engineers to see their results spatially. This could include mapping distant expanses of the universe, or even simulating deep space travel. One popular data visualization system is known as the “hyperwall”. The display is made up of an eight-by-sixteen array of 128 high-definition screens, each with its own processors and graphics processing units. Collectively, the “hyperwall” can perform nearly 57 trillion floating-point operations per second (57 teraflops).
In such high-performance computing, data visualization happens concurrently with the simulation: as data is ingested, analysis and interpretation are carried out at the same time. Perhaps the biggest innovation is the direct connection between the “hyperwall” and the supercomputer’s file system. By linking the two systems across a high-performance network, files from the supercomputer are read directly into the data visualization system. This connection avoids the costly penalty of copying stored data to the visualization system’s memory.
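A loose single-machine analogy for reading data in place rather than copying it first is memory-mapping a file. The sketch below is only that, an analogy: it does not describe how the “hyperwall” or its file system connection is implemented, and the file layout (little-endian 64-bit floats) and function names are assumptions made for the example.

```python
import mmap
import struct


def write_samples(path, values):
    """Write floats to `path` as little-endian 64-bit values (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def read_sample_in_place(path, index):
    """Read one sample through a memory map instead of loading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Only the bytes that are touched are fetched, on demand.
            return struct.unpack_from("<d", view, index * 8)[0]
```

With a mapping like this, a viewer that needs one slice of a very large simulation output can read just that slice, rather than first copying the entire file into its own memory.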
Read More Scientific Computing Breakthroughs
Advancements in technology have helped researchers continue to push the limits of science. Learn more about innovation in areas such as future HDD development, deep space, and precision medicine.
- We ran 2.5 million HPC tasks on 1 million virtual CPUs using an AWS cluster. Find out the result
- Our helium-sealed HDDs helped visualize the first image of a black hole. Here’s how it happened
- With our collaborator, we sequenced 48 human genomes in just 48 hours. This is how we did it
[1] Data Storage Systems. https://www.nas.nasa.gov/hecc/resources/storage_systems.html
[2] What is InfiniBand? – Definition from WhatIs.com. https://searchstorage.techtarget.com/definition/InfiniBand