Challenges associated with bringing cloud computing to neuroimaging include the storage of increasing amounts of data, the complexity of representations, and the need for sophisticated analytic techniques and infrastructures, said Michael Milham. Researchers must decide when cloud computing is appropriate for neuroimaging data and when running analyses locally makes more sense, he said. The economics and sustainability of cloud computing play a significant role in these decisions, as do concerns about privacy protection, said Milham. Neuroimaging datasets can be enormous, so the cost of cloud storage and computing becomes correspondingly greater. Relatedly, long-term storage of neuroimaging data can be expensive because of the size of the datasets. Although options for different types of storage exist (e.g., for archived data or data being actively used), a sustainable support plan and a plan for removing neuroimaging data when they may no longer be relevant or useful are helpful, according to Milham, Deanna Barch, Daniel Marcus, and Maryann Martone. Privacy protection for neuroimaging data is important because of the potential for facial morphology, or perhaps eventually brain function, to be extracted from neuroimaging data, said Benjamin Neale and Lyn Jakeman.
CURRENT PROMISING PRACTICES FOR NEUROIMAGING DATA IN THE CLOUD
The Collaborative Informatics and Neuroimaging Suite (COINS), developed at the Mind Research Network (MRN) to collect, manage, and share data of different modalities, including neuroimaging and phenotypic data from more than 700 studies (King et al., 2014), has recently made the transition from institution-based servers to AWS, said Jessica Turner, professor of psychology at Georgia State University. COINS allows for radiological review, data review, analysis, and data sharing through the COINS data exchange, which includes data that can be accessed by the public, said Turner. It has a centralized database with distributed repositories. Each investigator’s data go into a private section of the cloud with multiple copies so that multiple queries can be run simultaneously, said Turner.
Alan Evans described two other cloud-based infrastructures for neuroimaging data: CBRAIN¹ and LORIS. LORIS is essentially a data storage environment that includes behavioral, clinical, neuroimaging, and genetic data, said Evans, while CBRAIN is a cloud-based portal that enables neuroimaging researchers to analyze data by accessing High-Performance Computing (HPC) facilities in Canada and elsewhere. Evans said they have recently added the Boutiques environment, a tool that enables standardization across workflows (Glatard et al., 2018). Boutiques also allows investigators to set up containerized environments, said Evans. Such environments support robust and reliable resource and data sharing and make data available to more people, according to Barch. Some 900 users in 30 countries use CBRAIN, which at least for now is a free resource, said Evans.
___________________
1 For more information, see http://www.cbrain.ca (accessed November 11, 2019).
The Canadian Open Neuroscience Platform (CONP)² brings together different platforms for open sharing of neuroimaging and other data. CONP addresses many of the same issues discussed at this workshop, said Evans, including interoperability, scalability, training, ethics, and data governance.
ISSUES TO BE RESOLVED TO ADVANCE CLOUD-BASED NEUROIMAGING DATA RESOURCES
The economics of cloud-based computing for neuroimaging remain unclear, said Evans. He said it is up to platform providers to demonstrate that cloud computing, despite its relatively high cost, is more cost-effective than a lab maintaining an entire compute infrastructure locally. Investigators will not make the commitment to the cloud unless they are guaranteed that it provides reliable and better services. However, funders, researchers, and institutions may all have different perspectives on cost and on who should bear it, said Clare Mackay. For example, an institution may not be willing to pay the costs of managing data for multi-institutional projects, said Martone. In the end, solutions are needed that work across these three entities, said Mackay.
The cost of sustaining cloud infrastructure is another concern, said Barch. Although grants may pay for compute resources, the life cycle of grants may differ from the life cycle of data, she said. However, Marcus noted that the cloud offers tiers of storage, so that when a study shuts down, the data can be moved to a lower-cost platform. Nonetheless, someone has to pay for this storage, and at this point it is not clear who is responsible for stewarding that process: the researcher, the institution, NIH, or some other organization, said Martone.
Gayle Wittenberg of Janssen R&D noted that in the genomics field, NIH established dbGaP to provide centralized government storage of data. Evans added that NIMH established the NDA³ to store and share data, including clinical and neuroimaging data. However, Stacia Friedman-Hill, acting program chief for the Biomarker and Intervention Development for Childhood-Onset Mental Disorders Branch of the Division of Translational Research at NIMH, suggested that the responsibility for archiving data may need to be shared between NDA and others in the community. Moreover, a solution is needed that includes investigators not funded by NIH, said Mackay.
___________________
2 For more information, see https://conp.ca (accessed November 11, 2019).
3 For more information, see https://nda.nih.gov (accessed November 12, 2019).
Martone suggested that a framework or succession plan needs to be laid out for maintaining neuroimaging data. Given the proliferation of data, frameworks are also needed to evaluate which datasets are most valuable or most difficult to replace and to determine which should be archived and for how long, said Milham. Randal Burns added that guidelines for prioritizing data for preservation should include reproducibility, that is, whether data and analytical methods have been published. Friedman-Hill added that although neuroimaging technology changes so rapidly that data may become obsolete, she believes there are scientific reasons to preserve some of the older data, which may be useful for epidemiological studies.