
Sparsely Connected Autoencoders: A Multi-Purpose Tool for Single Cell Omics Analysis

28 July 2022
In this Commentary article, we explore the opportunity afforded by Sparsely Connected Autoencoders to transform data in a controlled way, enabling the comprehension of hidden biological information in single-cell data. Autoencoders provide researchers with a simplified means of inspecting functional relationships among regulatory elements such as transcription factors.

Presented by: Raffaele Calogero, Associate Professor of Molecular Biology at the University of Turin

Edited by: Ben Norris

In single-cell omics analysis, different cell types are not easily quantifiable without the removal of experimental noise. As such, a reduction in dimensionality is crucial for the visualisation and interpretation of single-cell sequencing data. Raffaele Calogero, Associate Professor of Molecular Biology at the University of Turin, has recently developed a new class of autoencoders: artificial neural networks that perform a biologically driven data reduction on single-cell RNA-seq data. In his research, Sparsely Connected Autoencoders represent the bioinformatics version of a Swiss Army Knife for the extraction of hidden biological information from single-cell omics data.

Clarification of Cellular Structure: De-Noising Data

“Bioinformatics should be used on the basis of the biology that you are studying,” opened Raffaele Calogero at Oxford Global’s Spatial Biology Europe 2022 conference. He highlighted the importance of dimensionality reduction in single-cell analysis, adding that single cells are, in general, “very noisy and highly dimensional.” This motivates the use of Sparsely Connected Autoencoders (SCAs) to bring out biologically interesting features in single-cell omics data.

“You get, as input, your noisy single-cell data,” Calogero explained. “Everything is compressed and then deconvoluted in order to remove experimental noise and enhance cells’ biological characteristics.” This enables the reconstruction of the overall picture of the cells’ organisation in a tissue. The reduction in dimensionality is achieved by projecting high-dimensional data into low-dimensional spaces, which makes it possible to visualise cluster structures and differences in developmental trajectory.
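To make the idea of projecting high-dimensional data into a low-dimensional space concrete, here is a minimal sketch that uses principal component analysis (PCA). PCA is a stand-in chosen for brevity; it is not the SCA approach described in the talk, and the function name `pca_project` and the toy data are assumptions made for illustration.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project a cells x genes matrix onto its top principal components."""
    # Centre each gene so the components capture variance, not the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy data (hypothetical): two groups of 30 "cells" measured on 20 "genes",
# where the second group has higher values for the first 10 genes.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(30, 20))
group_b = rng.normal(size=(30, 20))
group_b[:, :10] += 5.0

Z = pca_project(np.vstack([group_a, group_b]))
# Z has one row per cell and two columns; plotting Z[:, 0] against Z[:, 1]
# would show the two groups as separate clusters.
```

In this toy case the two groups separate along the first component, which is the kind of cluster structure a low-dimensional view is meant to reveal.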


Sparsely Connected Autoencoders: Deep Learning Approaches

The solution Calogero and his team arrived at was a deep-learning approach utilising SCAs, which he described as being ‘very powerful’. The autoencoder is an unsupervised artificial neural network designed to reduce data dimensions by learning how to ignore both noise and anomalies in the data. “It’s a relatively simple neural network,” continued Calogero, “characterised by input and output data representing general gene expression.”

The network compresses and encodes data before reconstructing it from the reduced encoded representation. In doing so, it produces an output that is as close as possible to the original input (see Figure 1). The SCA uses a single layer with sparse connections to obtain values for biological features rather than for simple gene expression. “This approach has the advantage of being easily extended to any gene set designed around some biological feature, such as functional networks or gene regulatory elements such as transcription factors,” Calogero added.

Figure 1. The process by which an autoencoder de-noises data.
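The sparse connectivity described above can be sketched with a binary mask that only lets a gene feed a hidden node when the gene belongs to that node's gene set. The following is a minimal NumPy sketch, not the implementation used by Calogero's team: the function names, the tanh activation, the dense decoder, the mean squared error loss, and the plain gradient descent training loop are all assumptions made for illustration.

```python
import numpy as np

def build_mask(genes, gene_sets):
    """Return a genes x sets 0/1 matrix: 1 where a gene belongs to a set."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(gene_sets)))
    for k, members in enumerate(gene_sets.values()):
        for g in members:
            if g in index:
                mask[index[g], k] = 1.0
    return mask

def train_sca(X, mask, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder whose encoder obeys the mask."""
    rng = np.random.default_rng(seed)
    n_genes, n_sets = mask.shape
    W_enc = rng.normal(0.0, 0.1, (n_genes, n_sets)) * mask  # sparse encoder
    W_dec = rng.normal(0.0, 0.1, (n_sets, n_genes))         # dense decoder (assumption)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)   # one hidden value per gene set and cell
        X_hat = H @ W_dec        # reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_dec = H.T @ g_out
        g_pre = (g_out @ W_dec.T) * (1.0 - H ** 2)
        g_enc = (X.T @ g_pre) * mask  # masked connections never receive updates
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc, W_dec, losses

# Hypothetical gene names and transcription factor target sets.
genes = ["g1", "g2", "g3", "g4"]
gene_sets = {"TF_A": ["g1", "g2"], "TF_B": ["g3", "g4"]}
mask = build_mask(genes, gene_sets)
X = np.random.default_rng(1).normal(size=(40, 4))  # 40 toy cells
W_enc, W_dec, losses = train_sca(X, mask)
```

Because the mask is applied to both the initial weights and every gradient update, each hidden node only ever sees the genes in its own set, so its activation can be read as a score for that biological feature.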

A study undertaken by Calogero and his team attempted to reconstruct breast cancer data using a spatial transcriptomic dataset. “When you look at clustering on SCA-processed data, you realise that some clusters have intracluster heterogeneity which cannot be easily depicted using the simple gene expression,” he said.

Depicting Hidden Cell Type Characteristics

Calogero then moved on to discussing the Colon Immune Atlas. The colon, as a barrier tissue, represents a unique immune environment where immune cells display tolerance towards diverse communities of microbes collectively known as the microbiome. The Colon Schematic Study – an investigation of single-cell transcriptomes for 41,650 cells isolated from the caecum – revealed differences between the immune cells in different parts of the colon.

As Calogero explained, the immune cell types in the Colon Immune Atlas were annotated using a limited number of markers, and the separation among the immune compartments was not easily discernible using conventional single-cell transcription profiling. By representing the data using SCAs based on transcription factors, Calogero found that the separation among cell types became more evident along the lines of immunological cell origin, highlighting differences based on cell-specific transcription control. Additionally, cell type annotation could be refined using SCA transformation.

“In an oncological setting, the idea was to try to investigate if we could identify differences in, for example, a population that seems to be relatively homogenous,” said Calogero. The dataset analysed consisted of count matrices produced from a mixture of five human lung adenocarcinoma cell lines. “The interesting part is that, after SCA deconvolution by means of transforming expression data into cytoband expression, differences within a relatively homogenous cell line can be detected.”

Notes for Future Applications of Sparsely Connected Autoencoders

Cell subpopulations can be depicted simply using gene-level information. However, when expression data is aggregated on the basis of some biological feature or genome structure characteristic (for example, transcription factors or cytobands), subtle hidden cell population structures start to become apparent. “The work we are now doing is collecting data on how the open chromatin is changing in cell differentiation,” Calogero said.
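As a rough illustration of what aggregating expression by a genome structure characteristic means, the sketch below sums gene-level counts into one value per cytoband. This is a plain aggregation rather than an SCA, and the function name, the gene names, and the gene-to-cytoband assignments are hypothetical.

```python
import numpy as np

def aggregate_by_feature(X, genes, feature_of_gene):
    """Sum a cells x genes matrix into a cells x features matrix."""
    # Keep features in order of first appearance along the gene axis.
    features = list(dict.fromkeys(
        feature_of_gene[g] for g in genes if g in feature_of_gene))
    column = {f: j for j, f in enumerate(features)}
    membership = np.zeros((len(genes), len(features)))
    for i, g in enumerate(genes):
        if g in feature_of_gene:
            membership[i, column[feature_of_gene[g]]] = 1.0
    # Each output column is the sum of the genes mapped to that feature.
    return features, X @ membership

# Hypothetical mapping of four genes onto two cytobands.
genes = ["G1", "G2", "G3", "G4"]
band_of_gene = {"G1": "1p36", "G2": "1p36", "G3": "17q21", "G4": "17q21"}
X = np.array([[1, 2, 3, 4],
              [0, 1, 0, 5]])  # two cells, four genes

bands, agg = aggregate_by_feature(X, genes, band_of_gene)
# bands -> ["1p36", "17q21"]
# agg.tolist() -> [[3.0, 7.0], [1.0, 5.0]]
```

The same membership matrix could equally describe transcription factor targets, which is the sense in which the feature sets are interchangeable.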

Future projects will focus on building a new model in which different biological features, for example kinases, transcription factors, miRNAs, and chromatin organisation, are characterised by their interactions with one another. In such a model, kinase drugs could be tested in silico with a view to identifying key elements involved in their functionalities. “From a technical point of view, it’s very difficult to generate these kinds of models,” he said. However, the potential benefits for healthcare studies are manifold, with significant implications for the future of single-cell omics analysis.


SCAs offer the opportunity to transform data in a controlled way to enable the comprehension of hidden biological information in single-cell data. This information may include relations among cell populations. SCAs also provide researchers with a simplified means of inspecting functional relationships among regulatory elements such as transcription factors or miRNAs. “The peculiar ability of the autoencoder to retain only the important part of a signal can help in discriminating between true differences among cell populations and clustering overfitting,” concluded Calogero.


Speaker Biographies

Raffaele Calogero began his journey in bioinformatics in the 1990s when he developed a tool to simulate a molecular biology laboratory for PC. He established the Genomics and Bioinformatics Unit at the University of Turin in 1998, where he works as an Associate Professor in Molecular Biology. In collaboration with Professor Forni, he has focused on the identification of new targets for anti-tumoural vaccination protocols. His research experience has been devoted to the development and optimisation of analysis workflows and the mining of transcription-based experiments, mainly in the framework of oncology.


Copyright Oxford Global Marketing Limited. All rights reserved.
