A SCALABLE SOLUTION FOR PROCESSING HIGH RESOLUTION BRAIN CONNECTOMICS DATA
This NSF BRAIN EAGER project supports the development and evaluation of a computational infrastructure designed to meet the big data challenges facing a broad community of neuroscientists, with the ultimate goal of enabling the study of how large brains are wired. As sample preparation and microscopy technologies advance, it is becoming feasible to acquire imagery of large sections of brain tissue; however, the software infrastructure to visualize and analyze such data has not kept pace. To produce the connectomes critical to understanding brain function, the quantity of imagery that must be analyzed will necessitate effective use of High-Performance Computing (HPC) resources. This proposal will support the development of a strategic proof-of-concept demonstration of computational infrastructure to meet these needs. New services will enable the efficient use of existing and future parallel computing platforms in support of workflows typical of neuroscientific discovery, and will facilitate the scaling of data management and processing algorithms to the massive datasets that will be generated in the future. Furthermore, this project will bring together an interdisciplinary team leveraging expertise in HPC, visualization, analytics, and neuroscience to build new software tools and evaluate their practical efficiency and usability from an application perspective while focusing on real brain data.
To enable the scaling of science workflows to HPC systems, this project will produce a solution in which a user can interactively visualize, analyze, and iterate over local views of large data, and then run large-scale analytics on a remote computational infrastructure. This project will leverage the team’s experience in multi-scale data streaming algorithms for parallel computing platforms to allow efficient processing of large-scale brain models on modern computational infrastructures, with services that are easily accessible to neuroscientists. The work will be innovative in the automatic mapping of classical algorithms and data flows into a high-concurrency, parallel execution environment, and in the use of an efficient runtime system for their allocation and execution. Moreover, those capabilities will be provided as services to users who are not skilled in HPC architectures, allowing them to take advantage of the increasing performance of HPC systems to address the large data problems emerging in neuroscience.
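The local-view idea above can be illustrated with a minimal sketch: a multi-resolution (downsampled) pyramid over a volume, from which a client streams only small regions of interest for interactive viewing while full-resolution analytics run remotely. This is a hypothetical illustration, not the project's actual software; the function names (`build_pyramid`, `local_view`) and the in-memory NumPy representation are assumptions for clarity.

```python
import numpy as np

def build_pyramid(volume, levels):
    """Build a multi-resolution pyramid: each level averages 2x2x2 blocks
    of the previous one, halving the resolution per axis."""
    pyramid = [volume]
    for _ in range(levels - 1):
        v = pyramid[-1]
        # Trim to even dimensions so the volume splits into 2x2x2 blocks.
        v = v[: v.shape[0] // 2 * 2,
              : v.shape[1] // 2 * 2,
              : v.shape[2] // 2 * 2]
        v = v.reshape(v.shape[0] // 2, 2,
                      v.shape[1] // 2, 2,
                      v.shape[2] // 2, 2)
        pyramid.append(v.mean(axis=(1, 3, 5)))
    return pyramid

def local_view(pyramid, level, origin, size):
    """Extract a small region of interest at the requested resolution level.
    A client would stream only such regions for interactive visualization."""
    v = pyramid[level]
    z, y, x = origin
    dz, dy, dx = size
    return v[z:z + dz, y:y + dy, x:x + dx]

# Toy volume standing in for a large brain image stack.
vol = np.arange(64 ** 3, dtype=np.float64).reshape(64, 64, 64)
pyr = build_pyramid(vol, levels=3)

# Coarse overview for navigation, plus a small local view for inspection;
# the full-resolution analysis would be dispatched to a remote HPC system.
roi = local_view(pyr, level=2, origin=(0, 0, 0), size=(8, 8, 8))
print(pyr[2].shape, roi.shape)
```

In a real deployment the pyramid levels would live in out-of-core or distributed storage rather than in memory, and the remote analytics would be submitted through the proposed services rather than run in-process.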