Modern file storage accelerates the AI-driven search for cures

The hunt for new cures and treatments is at the leading edge of life sciences research. HPE GreenLake for File Storage delivers the next-gen file storage needed to accelerate that search. Learn how.

David Yu, HPE Storage Product Marketing

In recent decades, experts in life sciences and healthcare have made significant progress in their search to improve human health and quality of life. The effort spans everything from new medical devices, treatments, and vaccines to cures for diseases such as diabetes, leukemia, cancer, and Alzheimer’s.

Key research areas in the hunt for exciting breakthroughs are genomic sequencing and cryogenic electron microscopy (Cryo-EM). These two fields are representative of leading-edge efforts in life sciences and healthcare. The research is driven by new artificial intelligence (AI) capabilities and powerful new compute and data storage architectures, with modern file storage playing a key role. Let’s take a closer look at genomic sequencing and Cryo-EM.

Sequencing fundamental genetic information

Genomic sequencing takes a lot of lab work using special instruments and requires significant compute and data storage resources. However, it holds the promise of insights into diseases at the genetic level (including predispositions to certain disorders) and specialized treatments based on a patient’s unique genetic profile.

One example: targeted cancer treatments that use drugs specifically to attack cancer cells. In this process, genomic tests identify cancerous cells with mutated genes. Clinicians then prescribe an appropriate drug that targets those cancer cells by attaching to certain molecular cell structures or blocking their function. This is often an iterative process, as cancer cells tend to continue to mutate and grow. Of course, the ultimate goal is not just to provide targeted treatment but to find a cure, and genomic sequencing research also plays an important part in that effort.

Modeling the deep structures of life

Cryo-EM allows researchers to see and study cells, viruses, ribosomes, and protein structures at the molecular level using an electron microscope on samples that are frozen at cryogenic temperatures. With this technique, scientists have been able to learn a great deal about proteins and the ribosomes that produce them.

Leveraging Cryo-EM, life sciences researchers build 3-D models from thousands to millions of high-resolution images and video files. To do so, they run compute- and data-intensive tasks with Fourier transforms for image processing and motion correction for video files. The models hold the potential to improve medical diagnosis, aid drug discovery, and find explanations for the side effects and ineffectiveness of certain existing drugs.

The challenge of massive amounts of file data

As mentioned, genomic sequencing and Cryo-EM are representative of the applications at the new frontier of life sciences and healthcare. In the search for cures and personalized treatments, these technologies leverage AI capabilities and hold much promise. However, they also present a challenge because they generate so much file data.

The human genome has 3 billion nucleotide base pairs in DNA, and figuring out the correct sequence of a single human genome is quite complex. The techniques used involve AI and machine learning (ML) workflows through data pipelines that first ingest and then output huge amounts of data — sorting, analyzing, interpreting, and inferencing genomic information. (We outlined the workflows and data pipelines of AI applications in a previous blog on AI.)

Genomic sequencing is extremely data-intensive: storing the data from a single human genome requires anywhere from 100 to 200GB. To get statistically meaningful results, researchers need to do studies spanning large populations, so the amount of data shoots up very quickly. According to the National Human Genome Research Institute, “As biomedical research projects and large-scale collaborations grow rapidly, the amount of genomic data being generated is also increasing, with roughly 2 to 40 billion gigabytes of data now generated each year.” That’s 2 to 40 exabytes each and every year.
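To make that scale concrete, here is a rough back-of-the-envelope sketch in Python. The per-genome range and the yearly totals come from the figures above. The 100,000-genome cohort, the function names, and the use of decimal units (1 EB = 1 billion GB) are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope sizing for a population-scale sequencing study.
# Assumptions: decimal units; the cohort size used below is hypothetical.

GB_PER_PB = 1_000_000        # 1 PB = 10^6 GB (decimal units)
GB_PER_EB = 1_000_000_000    # 1 EB = 10^9 GB (decimal units)


def cohort_storage_pb(genomes, gb_per_genome_low=100, gb_per_genome_high=200):
    """Return the (low, high) storage estimate in PB for a cohort of genomes."""
    low = genomes * gb_per_genome_low / GB_PER_PB
    high = genomes * gb_per_genome_high / GB_PER_PB
    return low, high


def gigabytes_to_exabytes(gigabytes):
    """Convert a gigabyte count to exabytes."""
    return gigabytes / GB_PER_EB


if __name__ == "__main__":
    low, high = cohort_storage_pb(100_000)  # hypothetical 100,000-genome study
    print(f"100,000 genomes: {low:.0f} to {high:.0f} PB")
    print(f"2 billion GB = {gigabytes_to_exabytes(2e9):.0f} EB")
    print(f"40 billion GB = {gigabytes_to_exabytes(40e9):.0f} EB")
```

Under these assumptions, a 100,000-genome study alone lands in the 10 to 20PB range, before any intermediate or derived files are counted.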

Cryo-EM is similar to genomic sequencing in its AI workflow and the amount of file data that it generates. The image and video files that Cryo-EM creates are very large. One Cryo-EM machine can generate 3-5PB of data every year, and it’s not unusual for research organizations to have many such machines. With 10 machines, you can have up to 50PB of incremental data a year to store, process, and manage.
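The fleet arithmetic is easy to reproduce. The short Python sketch below uses the 3-5PB per machine per year range quoted above. The function names, the five-year horizon, and the assumption that no data is ever deleted are illustrative only.

```python
# Rough yearly data growth for a fleet of Cryo-EM instruments.
# The 3-5 PB per machine per year range is taken from the text above;
# the function names and the five-year horizon are illustrative assumptions.


def yearly_growth_pb(machines, pb_low=3, pb_high=5):
    """Return the (low, high) new data per year, in PB, for the whole fleet."""
    return machines * pb_low, machines * pb_high


def cumulative_high_pb(machines, years, pb_high=5):
    """Upper-bound total after a number of years, assuming no data is deleted."""
    return machines * pb_high * years


if __name__ == "__main__":
    low, high = yearly_growth_pb(10)
    print(f"10 machines: {low} to {high} PB of new data per year")
    print(f"Upper bound after 5 years: {cumulative_high_pb(10, 5)} PB")
```

For 10 machines this gives 30 to 50PB of new data a year, and an upper bound of 250PB after five years if everything is retained.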

Modern file storage powers the life sciences

Both genomic sequencing and Cryo-EM are data-centric and data-rich endeavors. They both utilize AI with ML workflows that include data collection, preparation, model building, training, and inferencing. The massive amounts of data involved present huge challenges to compute and data storage capabilities in terms of scalable performance and capacity, ease of management, headroom for growth, and support.

To optimize these workloads, you need computational and data storage resources that can crunch through massive amounts of data. Research organizations must invest in high-performance, GPU-powered servers. They also need modern file storage capable of feeding data to the GPUs fast enough to keep up. So the performance of the file system matters a great deal. In practice, you need a file system with performance scalable to exabyte capacities to keep your GPU-driven servers busy.

Legacy file systems will not be able to meet those demands, as they come with unavoidable trade-offs in performance, scalability, simplicity, feature set, and cost. The other alternative, public cloud storage, has drawbacks that include security, performance and latency, cost at scale, data sovereignty and management, flexibility, and egress fees. These limitations have pushed organizations to look for on-premises file storage that eliminates trade-offs and also delivers an intuitive cloud experience. Let’s detail what we really need in file storage for life sciences and healthcare.

Performance at scale that’s affordable: Your file storage solution must deliver performance at scale — for all your data, not just some of it — and it must be fast despite working with huge amounts of data. Moreover, it must be affordable, even while delivering performance at scale for both sequential and random I/O operations, the latter being common in genomic analysis. Your file storage cannot achieve affordability via tiering because spinning disks will introduce rotational delays. Tiers create additional problems, as you will need to move data around between faster and slower storage and constantly tune performance. Even with all that overhead, legacy file storage can’t guarantee that you will be able to access and store the right data in the right place, at the right time, and with the right performance.

Simple to use with enterprise features: Researchers and scientists need file storage that can be set up and run easily, so that they can focus on strategic initiatives rather than being weighed down by day-to-day operations. Life sciences researchers have enough complexity to deal with in their field. They should not be burdened with the additional complications of managing traditional file storage infrastructure. And they want file storage that is not only fast but has enterprise features such as native replication, no-overhead snapshots, and a multiprotocol, global namespace to facilitate collaboration.

Enter HPE GreenLake for File Storage

As covered in a previous blog on AI, HPE GreenLake for File Storage is AI-ready storage that meets all the criteria that life sciences and healthcare researchers are looking for. First, it delivers enterprise performance at scale with fast, sustained performance that spans the full scale of your data, supporting workloads that process extremely high data volumes. There are no storage tiers. You just get ultra-efficient, all-NVMe storage for blazing-fast performance. HPE GreenLake for File Storage delivers linear performance scaling while overhead remains flat.

HPE’s file storage solution also offers independent scaling of performance and capacity to lower costs. You get flexibility, efficiency, and affordability in meeting exactly the performance and capacity specs you need with a disaggregated, shared-everything architecture. And with the ultra-efficient Similarity data reduction algorithm, you also get up to 8:1 data reduction for additional savings and affordability.
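As a rough illustration of what a data reduction ratio means for capacity planning, the Python sketch below divides a logical data volume by an assumed ratio. The 8:1 figure is the "up to" value quoted above, so treat it as a best case. The function name and the other ratios are hypothetical, and the ratio you actually see will depend on your data.

```python
# Physical capacity needed for a given logical data volume and reduction ratio.
# Assumption: the ratio is a simple logical-to-physical divisor; 8:1 is the
# best-case figure quoted in the text, and the other ratios are illustrative.


def physical_capacity_pb(logical_pb, reduction_ratio):
    """Return the physical PB needed to hold logical_pb at the given ratio."""
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be at least 1 (1 means no reduction)")
    return logical_pb / reduction_ratio


if __name__ == "__main__":
    logical_pb = 50  # e.g. one year of data from 10 Cryo-EM machines
    for ratio in (1, 2, 4, 8):
        needed = physical_capacity_pb(logical_pb, ratio)
        print(f"{ratio}:1 reduction -> {needed:.2f} PB physical")
```

At the best-case 8:1 ratio, 50PB of logical data would occupy 6.25PB of physical capacity. At 2:1 it would occupy 25PB.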

Last but not least, HPE GreenLake for File Storage delivers an intuitive cloud experience with radically simple file data management that reduces operational and management overhead for your researchers and IT staff. Because it is powered by the HPE GreenLake edge-to-cloud platform, you’ll enjoy streamlined deployment, easy file share creation, unified storage management with a single cloud console, and automated, nondisruptive upgrades. In addition, you can modernize your data management with a rich set of enterprise features and a comprehensive suite of cloud data services. The simple, end-to-end, self-service cloud experience across the file storage management lifecycle, accessible from anywhere and on any device, empowers your researchers to devote themselves to breakthroughs in life sciences rather than being slowed down by managing file storage.

File storage that expedites the search for cures

It’s satisfying for us at HPE to help researchers and scientists in life sciences as they seek new medical cures and treatments. HPE GreenLake for File Storage meets the performance, cost, efficiency, feature, and simplicity requirements of the AI/ML workloads that accelerate progress in these worthy causes.

One of HPE’s company goals is to be a force for good. We work to make a difference every day for others, for our company, and for the planet. We’re excited that HPE GreenLake for File Storage enables us to make a positive impact on so many lives.
