Abstract

Matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS) reveals how molecular distributions correlate with histological features. Because MALDI IMS datasets are high-dimensional, hierarchical and spatial, they present numerous statistical challenges. In collaboration with the bioimaging team at GlaxoSmithKline (GSK), we have developed special-purpose statistical workflows in R that support the entire MALDI IMS analysis pipeline, from study design and assay quantification to functional pharmacology. These workflows draw on numerous R packages, in particular the “tidyverse” and “tidymodels” ecosystems, chosen for their modularity and interconnectedness. To protect GSK’s intellectual property, we are currently unable to share our code. Our workflows include robust smoothing and estimation of calibration curves; animal and tissue section sample size calculations via in silico experiments; and AI/ML methods that predict drug effects from the high-dimensional molecular space. These solutions addressed unique biological and quantitative challenges and yielded actionable insights for GSK’s bioimaging team.

Keywords

Bioimaging, R workflow, high dimensional data

MALDI Imaging Mass Spectrometry Data

The MALDI technology enables the mapping of molecular profiles to histology.

  1. Tissue sections are extracted from each animal.
  2. At each (x, y) coordinate of a tissue section, a molecular profile is recorded: the intensities of ions, each identified by its m/z value.
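
The data layout described above can be illustrated with a small sketch. This is not the authors' code (their workflows are written in R and are not shared); it is a dependency-free Python illustration. The `Pixel` record, the `ion_image` helper, the identifiers and the m/z values are all hypothetical and chosen only for the example.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    """One measured location on a tissue section (hypothetical record layout)."""
    animal_id: str
    section_id: str
    x: int
    y: int
    intensities: dict  # m/z value -> measured ion intensity


def ion_image(pixels, section_id, mz, fill=0.0):
    """Return a 2D grid (list of rows) of one ion's intensity for one section.

    Locations with no measurement for this ion are set to `fill`.
    """
    selected = [p for p in pixels if p.section_id == section_id]
    width = max(p.x for p in selected) + 1
    height = max(p.y for p in selected) + 1
    grid = [[fill] * width for _ in range(height)]
    for p in selected:
        grid[p.y][p.x] = p.intensities.get(mz, fill)
    return grid


# Made-up example data: two animals, one section each, two ions per pixel.
pixels = [
    Pixel("mouse_1", "m1_s1", 0, 0, {281.25: 10.0, 760.5: 3.5}),
    Pixel("mouse_1", "m1_s1", 1, 0, {281.25: 12.0, 760.5: 0.0}),
    Pixel("mouse_1", "m1_s1", 0, 1, {281.25: 8.0, 760.5: 4.0}),
    Pixel("mouse_2", "m2_s1", 0, 0, {281.25: 1.0, 760.5: 9.0}),
]

# Spatial image of a single ion on one tissue section.
image = ion_image(pixels, "m1_s1", 281.25)
```

Each pixel carries both its position and its place in the hierarchy (animal, then section), which is the information the downstream workflows rely on.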

All studies were conducted in accordance with the GSK Policy on the Care, Welfare and Treatment of Laboratory Animals and were reviewed by the Institutional Animal Care and Use Committee either at GSK or by the ethical review process at the institution where the work was performed.

Imaging Mass Spectrometry Workflows

In this note, we will focus on the study design and functional pharmacology workflows. We have also developed a workflow for assay quantification and calibration.

Sample Size Calculation Workflow

MALDI datasets are hierarchical, with tissue sections nested in animals. We simulated from historical data to develop guidelines for animal and tissue section sample sizes.
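
As an illustration of how an in silico sample size experiment for nested data might look, the sketch below estimates power by simulation for a two-group comparison. It is written in dependency-free Python and is not the authors' R workflow. Several things are assumptions made for the example: the two-group design, the normal random effects, the variance components `sd_animal` and `sd_section` (which in practice would be estimated from historical data), and the normal approximation to a two-sample test on per-animal means.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_animal_means(rng, n_animals, n_sections, group_mean,
                          sd_animal, sd_section):
    """Simulate one group; each animal is summarized by the mean of its sections."""
    means = []
    for _ in range(n_animals):
        # Animal-level random effect shared by all sections of this animal.
        animal_level = group_mean + rng.gauss(0.0, sd_animal)
        sections = [animal_level + rng.gauss(0.0, sd_section)
                    for _ in range(n_sections)]
        means.append(mean(sections))
    return means


def estimate_power(n_animals, n_sections, effect, sd_animal, sd_section,
                   n_sim=2000, alpha=0.05, seed=1):
    """Fraction of simulated studies in which the group difference is detected.

    Requires n_animals >= 2 per group. Uses a normal approximation to the
    two-sample test on animal means (an assumption to avoid dependencies).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        control = simulate_animal_means(rng, n_animals, n_sections, 0.0,
                                        sd_animal, sd_section)
        treated = simulate_animal_means(rng, n_animals, n_sections, effect,
                                        sd_animal, sd_section)
        se = ((variance(control) + variance(treated)) / n_animals) ** 0.5
        if abs(mean(treated) - mean(control)) / se > z_crit:
            rejections += 1
    return rejections / n_sim


# Compare hypothetical designs: (animals per group, sections per animal).
for n_animals, n_sections in [(4, 2), (4, 8), (8, 2)]:
    power = estimate_power(n_animals, n_sections, effect=1.0,
                           sd_animal=1.0, sd_section=1.0)
    print(f"animals={n_animals} sections={n_sections} power={power:.2f}")
```

Under this simple model the variance of an animal's mean is `sd_animal**2 + sd_section**2 / n_sections`, so extra sections only shrink the second term, while extra animals reduce the uncertainty of the group comparison as a whole. With few animals per group the normal approximation is somewhat anti-conservative, so a t-based or resampling test would be preferable in a real analysis.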

Functional Pharmacology Workflow

For the functional pharmacology analysis, the goal is to predict a binary drug response from ion intensities:

  1. Model selection and training: we use tidymodels (the rsample and tune packages) to tune XGBoost hyperparameters via cross-validation grouped by mouse.
  2. Prediction: we make predictions from molecular profiles, then map them to the spatial domain to identify interesting histological features.
  3. Interpretation: using SHAP values, we map the contributions of individual ions to the spatial domain.
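
The grouping in step 1 matters because pixels from the same mouse are correlated: if one mouse contributes pixels to both the training and the assessment side of a split, performance estimates can be optimistic. The sketch below illustrates the idea of grouped fold assignment in dependency-free Python. It is not the authors' code; the `grouped_folds` helper and the mouse identifiers are hypothetical. In R, the rsample package provides `group_vfold_cv()` for this kind of split, although the source does not state which function was used.

```python
import random


def grouped_folds(groups, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group is split across sides.

    `groups` holds one group label per observation (here: the mouse of each pixel).
    """
    unique = sorted(set(groups))
    if n_folds > len(unique):
        raise ValueError("n_folds cannot exceed the number of groups")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Assign whole groups to folds in round-robin order after shuffling.
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    for fold in range(n_folds):
        test_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, test_idx


# Made-up example: eight pixels from four mice, two pixels each.
mouse_ids = ["m1", "m1", "m2", "m2", "m3", "m3", "m4", "m4"]
splits = list(grouped_folds(mouse_ids, n_folds=4))

for train_idx, test_idx in splits:
    held_out = sorted({mouse_ids[i] for i in test_idx})
    print(f"held-out mice: {held_out}, training pixels: {len(train_idx)}")
```

Each candidate hyperparameter setting would then be fitted on the training indices and scored on the held-out mice, with the scores averaged over folds to choose the final setting.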

Availability of supporting source code and requirements

  • To protect GSK’s intellectual property, we are unable to share the source code at this time
  • Operating system(s): Platform independent
  • Programming language: R

Data availability

  • The data is confidential and is not publicly available

Declarations

List of abbreviations

  • AI/ML - Artificial Intelligence/Machine Learning
  • MALDI - Matrix-assisted laser desorption/ionization
  • IMS - Imaging mass spectrometry

Author contributions

Hoang Tran and Valeriia Sherina conducted the statistical analysis and Fang Xie conducted the experiments and prepared the data.

References

  • Chen, Tianqi, et al. “Xgboost: extreme gradient boosting.” R package version 0.4-2 1.4 (2015).
  • Lundberg, Scott M., et al. “From local explanations to global understanding with explainable AI for trees.” Nature machine intelligence 2.1 (2020): 56-67.