Commit 2c405146 authored by Ivan Merelli

Initial commit

MIT License
Copyright (c) 2022 scVAR
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# scVAR
**scVAR** is a computational framework for extracting and integrating genetic variants from single-cell RNA sequencing (scRNA-seq) data. It uses a coupled variational autoencoder (VAE) to merge transcriptomic and variant-derived information into a unified latent representation, enabling deeper characterization of cellular heterogeneity in diseases such as leukemia.
## 🔍 Motivation
Acute myeloid leukemia (AML) and B-cell acute lymphoblastic leukemia (B-ALL) display extensive genetic and transcriptional heterogeneity, making clonal identification difficult when relying solely on gene expression.
While scRNA-seq is routinely used to quantify transcriptional states, it also contains information on expressed genetic variants.
**scVAR** is designed to simultaneously analyze both sources of information from a *single* scRNA-seq assay, without requiring matched DNA sequencing.
## 👉 Key Features
- Extracts expressed genetic variants from scRNA-seq BAMs
- Produces a variant-by-cell matrix using VarTrix
- Processes transcriptomic data using Scanpy-compatible workflows
- Integrates both modalities through a dual-encoder VAE
- Fuses RNA and variant embeddings via cross-attention
- Generates a unified latent space for clustering and visualization
- Robust under sparse and noisy 3′ scRNA-seq coverage
- Scales up to datasets with tens of thousands of cells
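To make the dual-encoder and cross-attention idea above concrete, here is a minimal PyTorch sketch. It is not the scVAR implementation: the class name `DualEncoderVAE`, the layer sizes, the token reshaping, and the mean-squared-error reconstruction terms are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class DualEncoderVAE(nn.Module):
    """Hypothetical two-encoder VAE with cross-attention fusion (illustrative only)."""

    def __init__(self, n_genes, n_variants, n_tokens=8, token_dim=16, latent=16, heads=4):
        super().__init__()
        hidden = n_tokens * token_dim
        self.n_tokens, self.token_dim = n_tokens, token_dim
        # One encoder per modality, each producing a flat hidden vector.
        self.rna_enc = nn.Sequential(nn.Linear(n_genes, hidden), nn.ReLU())
        self.var_enc = nn.Sequential(nn.Linear(n_variants, hidden), nn.ReLU())
        # Cross-attention: RNA tokens act as queries, variant tokens as keys/values.
        self.attn = nn.MultiheadAttention(token_dim, heads, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # One decoder per modality, both reading the shared latent vector.
        self.rna_dec = nn.Linear(latent, n_genes)
        self.var_dec = nn.Linear(latent, n_variants)

    def forward(self, rna, var):
        batch = rna.shape[0]
        # Split each hidden vector into a short sequence of tokens.
        q = self.rna_enc(rna).view(batch, self.n_tokens, self.token_dim)
        kv = self.var_enc(var).view(batch, self.n_tokens, self.token_dim)
        fused, _ = self.attn(q, kv, kv)
        fused = (fused + q).reshape(batch, -1)  # residual connection, then flatten
        mu, logvar = self.to_mu(fused), self.to_logvar(fused)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.rna_dec(z), self.var_dec(z), mu, logvar


def vae_loss(rna, var, rna_hat, var_hat, mu, logvar):
    """Reconstruction error for both modalities plus the KL term."""
    recon = nn.functional.mse_loss(rna_hat, rna) + nn.functional.mse_loss(var_hat, var)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

After training such a model, the per-cell `mu` vectors would play the role of the unified latent space used for clustering and visualization.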
## 🛠️ Installation
To install **scVAR**, create a new environment using `mamba` and install the package from source:
```
mamba create -n scvar_env python=3.10
mamba activate scvar_env
git clone http://www.bioinfotiget.it/gitlab/custom/scvar.git
cd scvar
pip install .
```
**Note:** the commands above pin **Python 3.10**; the package metadata declares `Requires-Python: >=3.10`.
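As a quick post-install check, the following sketch reports which of the dependencies listed in the package metadata cannot be found in the active environment. The helper `missing_modules` is hypothetical and not part of scVAR; the list uses the usual import names for these packages (for example `sklearn` for scikit-learn).

```python
import importlib.util
import sys

# Import names for the dependencies declared in the package metadata.
REQUIRED = ["numpy", "pandas", "scanpy", "torch", "umap", "leidenalg",
            "igraph", "anndata", "sklearn", "scipy", "matplotlib"]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    print("Missing modules:", missing_modules(REQUIRED) or "none")
```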
## 📁 Data Availability
All datasets used in the scVAR manuscript, including the AML, B-ALL, and synthetic benchmarking datasets, are publicly available at:
**https://www.dropbox.com/scl/fo/kc49b6y47hjf2zdle1zz2/AA-UA7lKpLpdHOTldAhasds?rlkey=4dkx4t5yxc407twomwqjte65p&dl=0**
The shared folder contains:
- 10x matrices
- VarTrix genotype matrices
- metadata
- synthetic datasets
- files required to reproduce manuscript figures
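For orientation, a VarTrix genotype matrix of the kind listed above can be read with SciPy. The sketch below assumes the file is in MatrixMarket format with variants as rows and cells as columns, and that it uses VarTrix's consensus coding (0 = no call, 1 = ref/ref, 2 = alt/alt, 3 = ref/alt). Both function names are hypothetical helpers, not part of the scVAR API, and the files in the shared folder may be organized differently.

```python
import numpy as np
from scipy import io, sparse


def load_vartrix_matrix(path):
    """Read a variant-by-cell MatrixMarket file; return a cells x variants CSR matrix."""
    mat = io.mmread(path)  # variants x cells
    return sparse.csr_matrix(mat).T.tocsr()


def alt_allele_indicator(genotypes):
    """Binary matrix: 1 where the consensus code is alt/alt (2) or ref/alt (3)."""
    out = genotypes.copy()
    out.data = np.isin(out.data, (2, 3)).astype(np.int8)
    out.eliminate_zeros()  # drop entries that became 0 (ref/ref calls)
    return out
```

Transposing to cells x variants matches the cells x genes orientation that Scanpy and AnnData use for expression data, which makes it easier to keep the two modalities aligned by cell.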
## 🚀 Getting Started
Two Jupyter notebooks are included in the `notebooks` directory:
- **Leukemia notebook:** full application of scVAR to a public AML dataset
- **Synthetic notebook:** benchmarking scVAR using the in silico datasets
## 🧪 Synthetic Dataset Generator
scVAR provides an in silico simulator designed to generate paired single-cell datasets containing both transcriptomic and variant-derived information for each simulated cell.
These synthetic datasets were used to benchmark the integration performance of scVAR under controlled noise, sparsity, and coverage conditions.
The simulator produces:
- gene expression matrices
- variant-by-cell genotype matrices
- configurable cell types and genotypes
- realistic dropout, sparsity, and allelic imbalance
- optional cross-modal label mismatches
- datasets ranging from 5,000 to 50,000+ cells
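The kind of paired output described above can be illustrated with a small NumPy sketch. This is not the scVAR simulator: the function name, parameters, distributions, and genotype coding (0 = not covered, 1 = reference, 2 = alternate) are assumptions, and allelic imbalance and cross-modal label mismatches are left out for brevity.

```python
import numpy as np


def simulate_paired_cells(n_cells=500, n_genes=200, n_variants=50,
                          n_types=3, dropout=0.3, coverage=0.2, seed=0):
    """Return (counts, genotypes, labels) for a toy paired dataset.

    counts is cells x genes, genotypes is cells x variants (transpose it for a
    variant-by-cell layout), and labels holds the simulated cell type per cell.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_types, size=n_cells)

    # Expression: each cell type has its own mean profile; counts are Poisson.
    type_means = rng.gamma(shape=2.0, scale=2.0, size=(n_types, n_genes))
    counts = rng.poisson(type_means[labels])
    counts[rng.random(counts.shape) < dropout] = 0  # technical dropout

    # Variants: each cell type carries a fixed random subset of alternate alleles.
    type_alt = rng.random((n_types, n_variants)) < 0.3
    genotypes = np.where(type_alt[labels], 2, 1)
    covered = rng.random((n_cells, n_variants)) < coverage
    genotypes[~covered] = 0  # sites without read coverage get no call

    return counts, genotypes, labels
```

Because both matrices are driven by the same `labels` vector, a joint embedding can be scored against a known ground truth, which is the point of benchmarking on synthetic data.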
## 📜 License
Distributed under the MIT License. See the `LICENSE` file for more information.
../aml
\ No newline at end of file
../bll
\ No newline at end of file
"""
scVAR package initialization
============================
Expose main analysis functions for transcriptomic and variant integration.
"""
from .scVAR import (
    transcriptomicAnalysis,
    variantAnalysis,
    calcOmicsClusters,
    weightsInit,
    save_all_umaps,
    omicsIntegration,
    pairedIntegrationTrainer,
    distributionClusters,
)

__all__ = [
    "transcriptomicAnalysis",
    "variantAnalysis",
    "calcOmicsClusters",
    "weightsInit",
    "omicsIntegration",
    "save_all_umaps",
    "pairedIntegrationTrainer",
    "distributionClusters",
]
This diff is collapsed.
Metadata-Version: 2.4
Name: scVAR
Version: 0.0.1
Summary: A tool to integrate genomics and transcriptomics in scRNA-seq data.
Author: Samuele Manessi
Author-email: samuele.manessi@itb.cnr.it
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scanpy
Requires-Dist: torch
Requires-Dist: umap
Requires-Dist: leidenalg
Requires-Dist: igraph
Requires-Dist: anndata
Requires-Dist: scikit-learn
Requires-Dist: scipy
Requires-Dist: matplotlib
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary
# scVAR
**scVAR** is a computational tool for extracting and integrating genetic variants from single-cell RNA-seq (scRNA-seq) data. It uses variational autoencoders to construct a latent space that combines transcriptional and genetic signals, helping to resolve cellular heterogeneity, particularly in complex diseases such as leukemia.
## 🔍 Motivation
Leukemias like AML and B-ALL exhibit high genetic and transcriptomic heterogeneity, making clonal analysis particularly challenging. Although scRNA-seq is widely used to study gene expression, it also contains valuable information on genetic variants. **scVAR** leverages this dual information to jointly analyze transcriptional and genetic signals from the same dataset, without requiring matched DNA sequencing.
## 🧠 What It Does
- Detects expressed genetic variants directly from scRNA-seq data
- Integrates transcriptomic and variant information using multi-input variational autoencoders
- Builds a shared latent space capturing both omics layers
- Enhances detection of rare subclones and subtle transcriptional states
- Recovers structure often missed when analyzing transcriptomic or genomic data in isolation
## 📊 Use Cases
- Clonal architecture analysis in AML and B-ALL
- Interpretation of relapse samples
- Joint modeling of gene expression and mutational signals
- Effective utilization of sparse variant data from 10x Genomics 5′ scRNA-seq
## 📁 Data & Results
In AML samples, **scVAR** identified subclones with distinct transcriptional programs that were not detectable using gene expression or variant data alone. In B-ALL, it revealed fine-grained cellular structures and helped disentangle overlapping transcriptional and genetic signals.
## 🚀 Getting Started
An example workflow is provided in the `example/` folder. A Jupyter notebook is also provided in the `notebooks/` folder.
## 🛠️ Installation
To install **scVAR**, create a new environment using `mamba` and install the package from source:
```
mamba create -n scvar_env python=3.10
mamba activate scvar_env
git clone http://www.bioinfotiget.it/gitlab/custom/scvar.git
cd scvar
pip install .
```
**Note:** scVAR requires **Python == 3.10**.
## 📜 License
Distributed under the MIT License. See the `LICENSE` file for more information.
LICENSE
README.md
setup.py
scVAR/__init__.py
scVAR/scVAR.py
scVAR/scVAR_muon.py
scVAR.egg-info/PKG-INFO
scVAR.egg-info/SOURCES.txt
scVAR.egg-info/dependency_links.txt
scVAR.egg-info/requires.txt
scVAR.egg-info/top_level.txt
\ No newline at end of file
numpy
pandas
scanpy
torch
umap
leidenalg
igraph
anndata
scikit-learn
scipy
matplotlib
"""
scVAR package initialization
============================
Expose main analysis functions for transcriptomic and variant integration.
"""
from .scVAR import (
    transcriptomicAnalysis,
    variantAnalysis,
    calcOmicsClusters,
    weightsInit,
    save_all_umaps,
    omicsIntegration,
    pairedIntegrationTrainer,
    distributionClusters,
)

__all__ = [
    "transcriptomicAnalysis",
    "variantAnalysis",
    "calcOmicsClusters",
    "weightsInit",
    "omicsIntegration",
    "save_all_umaps",
    "pairedIntegrationTrainer",
    "distributionClusters",
]