Initial commit

Commit 2c405146 authored Nov 25, 2025 by Ivan Merelli
--- a/LICENSE
+++ b/LICENSE
+MIT License
+
+Copyright (c) 2022 scVAR
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
+# scVAR
+
+**scVAR** is a computational framework for extracting and integrating genetic variants from single-cell RNA sequencing (scRNA-seq) data. It uses a coupled variational autoencoder (VAE) to merge transcriptomic and variant-derived information into a unified latent representation, enabling deeper characterization of cellular heterogeneity in diseases such as leukemia.
+
+
+## 🔍 Motivation
+
+Acute myeloid leukemia (AML) and B-cell acute lymphoblastic leukemia (B-ALL) display extensive genetic and transcriptional heterogeneity, which makes clonal identification difficult when relying on gene expression alone.  
+While scRNA-seq is routinely used to quantify transcriptional states, it also contains information on expressed genetic variants.  
+**scVAR** is designed to simultaneously analyze both sources of information from a *single* scRNA-seq assay, without requiring matched DNA sequencing.
+
+
+## 👉 Key Features
+
+- Extracts expressed genetic variants from scRNA-seq BAMs  
+- Produces a variant-by-cell matrix using VarTrix  
+- Processes transcriptomic data using Scanpy-compatible workflows  
+- Integrates both modalities through a dual-encoder VAE  
+- Fuses RNA and variant embeddings via cross-attention  
+- Generates a unified latent space for clustering and visualization  
+- Robust under sparse and noisy 3′ scRNA-seq coverage  
+- Scales up to datasets with tens of thousands of cells
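The dual-encoder VAE and cross-attention fusion listed above can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not scVAR's implementation. The layer sizes, the single-hidden-layer encoders, the identity Q/K/V projections, the reshaping of each latent vector into a few tokens, and the choice of RNA tokens as queries are all hypothetical. The sketch encodes each modality into a diagonal Gaussian posterior, samples latents with the reparameterization trick, and lets RNA tokens attend to variant tokens. It also computes the per-cell KL term that a VAE loss would include.

```python
import numpy as np


def init_encoder(n_in, n_hidden, n_latent, rng):
    """Random weights for a toy Gaussian encoder (hypothetical sizes, no training)."""
    return {
        "w_h": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b_h": np.zeros(n_hidden),
        "w_mu": rng.normal(0.0, 0.1, (n_hidden, n_latent)), "b_mu": np.zeros(n_latent),
        "w_lv": rng.normal(0.0, 0.1, (n_hidden, n_latent)), "b_lv": np.zeros(n_latent),
    }


def encode(x, p):
    """One ReLU hidden layer, then mean and log-variance heads."""
    h = np.maximum(x @ p["w_h"] + p["b_h"], 0.0)
    return h @ p["w_mu"] + p["b_mu"], h @ p["w_lv"] + p["b_lv"]


def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, logvar):
    """Per-cell KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention; Q/K/V projections are omitted (identity) for brevity."""
    d = queries.shape[-1]
    scores = queries @ np.swapaxes(keys_values, -1, -2) / np.sqrt(d)
    return softmax(scores) @ keys_values


rng = np.random.default_rng(0)
# Hypothetical toy sizes: 8 cells, 50 genes, 20 variants, 16-dim latent split into 4 tokens.
n_cells, n_genes, n_vars, n_latent, n_tokens = 8, 50, 20, 16, 4
rna = rng.poisson(1.0, (n_cells, n_genes)).astype(float)
variants = rng.integers(0, 4, (n_cells, n_vars)).astype(float)  # genotype codes used as raw features

mu_r, lv_r = encode(np.log1p(rna), init_encoder(n_genes, 32, n_latent, rng))
mu_v, lv_v = encode(variants, init_encoder(n_vars, 32, n_latent, rng))
z_rna = reparameterize(mu_r, lv_r, rng)
z_var = reparameterize(mu_v, lv_v, rng)

# RNA tokens query variant tokens; the residual keeps the RNA signal when variants are uninformative.
tok_rna = z_rna.reshape(n_cells, n_tokens, -1)
tok_var = z_var.reshape(n_cells, n_tokens, -1)
fused = (tok_rna + cross_attention(tok_rna, tok_var)).reshape(n_cells, n_latent)
kl = kl_to_standard_normal(mu_r, lv_r) + kl_to_standard_normal(mu_v, lv_v)
```

The residual connection is one common way to keep the transcriptomic signal when the variant modality is sparse. The fusion rule, attention direction, and loss weighting actually used by scVAR may differ.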
+
+
+## 🛠️ Installation
+
+To install **scVAR**, create a new environment using `mamba` and install the package from source:
+
+```bash
+mamba create -n scvar_env python=3.10
+mamba activate scvar_env
+git clone http://www.bioinfotiget.it/gitlab/custom/scvar.git
+cd scvar
+pip install .
+```
+
+**Note:** scVAR requires **Python == 3.10**.
+
+
+## 📁 Data Availability
+
+All datasets used in the scVAR manuscript, including the AML, B-ALL, and synthetic benchmarking datasets, are publicly available at:
+
+**https://www.dropbox.com/scl/fo/kc49b6y47hjf2zdle1zz2/AA-UA7lKpLpdHOTldAhasds?rlkey=4dkx4t5yxc407twomwqjte65p&dl=0**
+
+The repository contains:
+
+- 10x matrices
+- VarTrix genotype matrices
+- metadata
+- synthetic datasets
+- files required to reproduce manuscript figures
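The following sketch shows how the VarTrix genotype matrices can be inspected directly. It reads a Matrix Market coordinate file and labels each variant/cell pair. It assumes three things: VarTrix was run in consensus mode, matrix rows follow the input VCF variant order, and columns follow the barcode list. It also uses the code meanings given in the VarTrix documentation (0 no call, 1 ref/ref, 2 alt/alt, 3 alt/ref), which are worth verifying against your VarTrix version. The variant IDs, barcodes, and matrix contents are invented for illustration.

```python
from collections import Counter

# Consensus-mode codes as described in the VarTrix README; verify for your VarTrix version.
CONSENSUS_CODES = {0: "no_call", 1: "ref/ref", 2: "alt/alt", 3: "alt/ref"}


def read_mtx(text):
    """Parse Matrix Market 'coordinate' text into (n_rows, n_cols, {(row, col): value}), 0-based."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate header")
    body = [ln for ln in lines if not ln.startswith("%")]  # drop header and comment lines
    n_rows, n_cols, nnz = map(int, body[0].split())
    entries = {}
    for ln in body[1:]:
        i, j, value = ln.split()
        entries[(int(i) - 1, int(j) - 1)] = int(float(value))  # file indices are 1-based
    if len(entries) != nnz:
        raise ValueError(f"header declares {nnz} entries, found {len(entries)}")
    return n_rows, n_cols, entries


def genotype_calls(n_rows, n_cols, entries, variant_ids, barcodes):
    """Label every (variant, barcode) pair; entries absent from the sparse file are code 0."""
    if len(variant_ids) != n_rows or len(barcodes) != n_cols:
        raise ValueError("variant/barcode lists do not match the matrix shape")
    return {
        (variant_ids[r], barcodes[c]): CONSENSUS_CODES[entries.get((r, c), 0)]
        for r in range(n_rows)
        for c in range(n_cols)
    }


# Hypothetical toy file: 2 variants (rows) x 3 barcodes (columns).
example = """%%MatrixMarket matrix coordinate integer general
% toy example, not real data
2 3 3
1 1 1
1 3 3
2 2 2
"""
variant_ids = ["chr1:1000_A/G", "chr2:2000_C/T"]
barcodes = ["AAACCTGA-1", "AAACCTGC-1", "AAACCTGG-1"]
calls = genotype_calls(*read_mtx(example), variant_ids, barcodes)
summary = Counter(calls.values())
```

For real matrices with thousands of barcodes, `scipy.io.mmread` returns a sparse matrix and avoids the dense loop used here.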
+
+
+## 🚀 Getting Started
+ 
+Two Jupyter notebooks are included in the `notebook` directory:
+
+- **Leukemia notebook:** full application of scVAR to a public AML dataset
+- **Synthetic notebook:** benchmarking scVAR using the in silico datasets
+
+
+## 🧪 Synthetic Dataset Generator
+
+scVAR provides an in silico simulator designed to generate paired single-cell datasets containing both transcriptomic and variant-derived information for each simulated cell.  
+These synthetic datasets were used to benchmark the integration performance of scVAR under controlled noise, sparsity, and coverage conditions.
+
+The simulator produces:
+
+- gene expression matrices
+- variant-by-cell genotype matrices
+- configurable cell types and genotypes
+- realistic dropout, sparsity, and allelic imbalance
+- optional cross-modal label mismatches
+- datasets ranging from 5,000 to 50,000+ cells
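The kinds of structure the simulator controls can be illustrated with a toy NumPy generator. This is a hypothetical sketch, not scVAR's simulator. The gamma-Poisson expression model, the Beta(2, 2) allelic-imbalance draw, the clone-to-cell-type mapping, and every default parameter are assumptions. Variant calls use the same 0-3 codes as VarTrix's consensus mode, so the output resembles a genotype matrix.

```python
import numpy as np


def simulate(n_cells=1000, n_genes=200, n_variants=50, n_types=3, n_clones=3,
             dropout=0.7, coverage=0.1, mismatch=0.05, seed=0):
    """Toy paired simulator (hypothetical; not scVAR's generator).

    Returns expression counts (cells x genes), variant codes (cells x variants;
    0 = no call, 1 = ref/ref, 2 = alt/alt, 3 = alt/ref), and per-cell
    cell-type and clone labels.
    """
    rng = np.random.default_rng(seed)
    cell_type = rng.integers(0, n_types, n_cells)
    # Each clone is tied to a cell type; a `mismatch` fraction of cells gets a random clone.
    clone = cell_type % n_clones
    flip = rng.random(n_cells) < mismatch
    clone[flip] = rng.integers(0, n_clones, flip.sum())

    # Expression: type-specific gamma means, Poisson counts, then uniform random dropout.
    means = rng.gamma(shape=2.0, scale=1.0, size=(n_types, n_genes))
    counts = rng.poisson(means[cell_type])
    counts[rng.random(counts.shape) < dropout] = 0

    # Clone genotypes per variant: 0 = absent, 1 = heterozygous, 2 = homozygous alt.
    clone_gt = rng.choice([0, 1, 2], size=(n_clones, n_variants), p=[0.6, 0.3, 0.1])
    gt = clone_gt[clone]

    # Coverage sparsity: most cell/variant pairs have no reads; covered pairs get 1-5 reads.
    covered = rng.random(gt.shape) < coverage
    depth = rng.integers(1, 6, gt.shape)
    # Allelic imbalance: heterozygous sites draw a per-cell alt fraction from Beta(2, 2).
    alt_frac = np.where(gt == 0, 0.0, np.where(gt == 2, 1.0, rng.beta(2, 2, gt.shape)))
    alt_reads = rng.binomial(depth, alt_frac)
    ref_reads = depth - alt_reads

    codes = np.zeros(gt.shape, dtype=int)
    codes[covered & (alt_reads == 0)] = 1
    codes[covered & (ref_reads == 0) & (alt_reads > 0)] = 2
    codes[covered & (ref_reads > 0) & (alt_reads > 0)] = 3
    return counts, codes, cell_type, clone


counts, codes, cell_type, clone = simulate()
```

Because an alt/ref call needs reads from both alleles, low `coverage` and depth push many true heterozygous sites to ref/ref or alt/alt calls. This is one way sparse 3′ or 5′ data distorts genotypes.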
+
+
+## 📜 License
+
+Distributed under the MIT License. See the `LICENSE` file for more information.
--- a/aml
+++ b/aml
+../aml
\ No newline at end of file
--- a/bll
+++ b/bll
+../bll
\ No newline at end of file
--- a/build/lib/scVAR/__init__.py
+++ b/build/lib/scVAR/__init__.py
+"""
+scVAR package initialization
+============================
+Expose main analysis functions for transcriptomic and variant integration.
+"""
+
+from .scVAR import (
+    transcriptomicAnalysis,
+    variantAnalysis,
+    calcOmicsClusters,
+    weightsInit,
+    save_all_umaps,
+    omicsIntegration,
+    pairedIntegrationTrainer,
+    distributionClusters,
+)
+
+__all__ = [
+    "transcriptomicAnalysis",
+    "variantAnalysis",
+    "calcOmicsClusters",
+    "weightsInit",
+    "omicsIntegration",
+    "save_all_umaps",
+    "pairedIntegrationTrainer",
+    "distributionClusters",
+]
--- a/build/lib/scVAR/scVAR.py
+++ b/build/lib/scVAR/scVAR.py
--- a/build/lib/scVAR/scVAR_boh.py
+++ b/build/lib/scVAR/scVAR_boh.py
--- a/build/lib/scVAR/scVAR_muon.py
+++ b/build/lib/scVAR/scVAR_muon.py
--- a/build/lib/scVAR/scVAR_old.py
+++ b/build/lib/scVAR/scVAR_old.py
--- a/build/lib/scVAR/scVAR_umap.py
+++ b/build/lib/scVAR/scVAR_umap.py
--- a/notebook/aml_example.ipynb
+++ b/notebook/aml_example.ipynb
--- a/notebook/muon.ipynb
+++ b/notebook/muon.ipynb
--- a/notebook/scvar.ipynb
+++ b/notebook/scvar.ipynb
--- a/scVAR.egg-info/PKG-INFO
+++ b/scVAR.egg-info/PKG-INFO
+Metadata-Version: 2.4
+Name: scVAR
+Version: 0.0.1
+Summary: A tool to integrate genomics and transcriptomics in scRNA-seq data.
+Author: Samuele Manessi
+Author-email: samuele.manessi@itb.cnr.it
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy
+Requires-Dist: pandas
+Requires-Dist: scanpy
+Requires-Dist: torch
+Requires-Dist: umap
+Requires-Dist: leidenalg
+Requires-Dist: igraph
+Requires-Dist: anndata
+Requires-Dist: scikit-learn
+Requires-Dist: scipy
+Requires-Dist: matplotlib
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: license-file
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
+
+# scVAR
+
+**scVAR** is a computational tool for extracting and integrating genetic variants from single-cell RNA-seq (scRNA-seq) data. It uses variational autoencoders to construct a latent space that combines transcriptional and genetic signals, helping to resolve cellular heterogeneity — particularly in complex diseases such as leukemia.
+
+## 🔍 Motivation
+
+Leukemias like AML and B-ALL exhibit high genetic and transcriptomic heterogeneity, making clonal analysis particularly challenging. Although scRNA-seq is widely used to study gene expression, it also contains valuable information on genetic variants. **scVAR** leverages this dual information to jointly analyze transcriptional and genetic signals from the same dataset, without requiring matched DNA sequencing.
+
+## 🧠 What It Does
+
+- Detects expressed genetic variants directly from scRNA-seq data  
+- Integrates transcriptomic and variant information using multi-input variational autoencoders  
+- Builds a shared latent space capturing both omics layers  
+- Enhances detection of rare subclones and subtle transcriptional states  
+- Recovers structure often missed when analyzing transcriptomic or genomic data in isolation
+
+## 📊 Use Cases
+
+- Clonal architecture analysis in AML and B-ALL  
+- Interpretation of relapse samples  
+- Joint modeling of gene expression and mutational signals  
+- Effective utilization of sparse variant data from 10x Genomics 5′ scRNA-seq
+
+## 📁 Data & Results
+
+In AML samples, **scVAR** identified subclones with distinct transcriptional programs that were not detectable using gene expression or variant data alone. In B-ALL, it revealed fine-grained cellular structures and helped disentangle overlapping transcriptional and genetic signals.
+
+## 🚀 Getting Started
+
+An example workflow is provided in the `example/` folder, and a Jupyter notebook is provided in the `notebook/` folder.
+
+## 🛠️ Installation
+
+To install **scVAR**, create a new environment using `mamba` and install the package from source:
+
+```bash
+mamba create -n scvar_env python=3.10
+mamba activate scvar_env
+git clone http://www.bioinfotiget.it/gitlab/custom/scvar.git
+cd scvar
+pip install .
+```
+
+**Note:** scVAR requires **Python == 3.10**.
+
+## 📜 License
+
+Distributed under the MIT License. See the `LICENSE` file for more information.
--- a/scVAR.egg-info/SOURCES.txt
+++ b/scVAR.egg-info/SOURCES.txt
+LICENSE
+README.md
+setup.py
+scVAR/__init__.py
+scVAR/scVAR.py
+scVAR/scVAR_muon.py
+scVAR.egg-info/PKG-INFO
+scVAR.egg-info/SOURCES.txt
+scVAR.egg-info/dependency_links.txt
+scVAR.egg-info/requires.txt
+scVAR.egg-info/top_level.txt
\ No newline at end of file
--- a/scVAR.egg-info/dependency_links.txt
+++ b/scVAR.egg-info/dependency_links.txt
+
--- a/scVAR.egg-info/requires.txt
+++ b/scVAR.egg-info/requires.txt
+numpy
+pandas
+scanpy
+torch
+umap
+leidenalg
+igraph
+anndata
+scikit-learn
+scipy
+matplotlib
--- a/scVAR.egg-info/top_level.txt
+++ b/scVAR.egg-info/top_level.txt
+scVAR
--- a/scVAR/__init__.py
+++ b/scVAR/__init__.py
+"""
+scVAR package initialization
+============================
+Expose main analysis functions for transcriptomic and variant integration.
+"""
+
+from .scVAR import (
+    transcriptomicAnalysis,
+    variantAnalysis,
+    calcOmicsClusters,
+    weightsInit,
+    save_all_umaps,
+    omicsIntegration,
+    pairedIntegrationTrainer,
+    distributionClusters,
+)
+
+__all__ = [
+    "transcriptomicAnalysis",
+    "variantAnalysis",
+    "calcOmicsClusters",
+    "weightsInit",
+    "omicsIntegration",
+    "save_all_umaps",
+    "pairedIntegrationTrainer",
+    "distributionClusters",
+]
--- a/scVAR/__pycache__/__init__.cpython-310.pyc
+++ b/scVAR/__pycache__/__init__.cpython-310.pyc