Data processing articles within Nature Communications

Featured

  • Article
    | Open Access

    There has been a rapid rise in single-cell RNA-seq methods and associated pipelines. Here, the authors use simulated data to systematically evaluate the performance of 3000 possible pipelines and derive recommendations for data processing and analysis of different types of scRNA-seq experiments.

    • Beate Vieth, Swati Parekh & Ines Hellmann
  • Article
    | Open Access

    Sequencing platforms such as Oxford Nanopore or Pacific Biosciences generate long-read data that preserve long-range genomic information but have high error rates. Here, the authors develop MetaMaps, a computational tool for strain-level metagenomic assignment and compositional estimation using long reads.

    • Alexander T. Dilthey, Chirag Jain & Adam M. Phillippy
  • Article
    | Open Access

    Complete gene expression deconvolution remains a challenging problem. Here, the authors provide a solution based on the recognition that expression levels of cell type-specific genes are mutually linear across mixtures and that mutually linear gene clusters correspond to cell type-specific signatures.

    • Konstantin Zaitsev, Monika Bambouskova & Maxim N. Artyomov
  • Article
    | Open Access

    The increasing accessibility of single-cell omics technologies beyond transcriptomics demands parallel advances in analysis. Here, the authors introduce STREAM, a pipeline for reconstruction and visualization of differentiation trajectories from both single-cell RNA-seq and ATAC-seq data.

    • Huidong Chen, Luca Albergante & Luca Pinello
  • Article
    | Open Access

    With the increasing availability of multi-OMICs data comes the need for easy-to-use data analysis tools. Here, the authors introduce Metascape, a biologist-oriented portal that provides a gene list annotation, enrichment and interactome resource and enables integrated analysis of multi-OMICs datasets.

    • Yingyao Zhou, Bin Zhou & Sumit K. Chanda
  • Article
    | Open Access

    Inferring direct protein–protein interactions (PPIs) and modules in PPI networks remains a challenge. Here, the authors introduce an algorithm to infer potential direct PPIs from quantitative proteomic AP-MS data by identifying enriched interactions of each bait relative to the other baits.

    • Mihaela E. Sardiu, Joshua M. Gilmore & Michael P. Washburn
  • Article
    | Open Access

    Bacterial outer membrane vesicles (OMVs) are increasingly used as carriers for drug delivery. Here the authors encapsulate biopolymer melanin into OMVs, extending their use to optoacoustic imaging both in vitro and in vivo, and demonstrate the potential of this tool for photothermal therapy applications.

    • Vipul Gujrati, Jaya Prakash & Vasilis Ntziachristos
  • Article
    | Open Access

    Analyzing the organization of molecular complexes in multi-color single-molecule localization microscopy data requires heavy computational resources that are impractical for laboratory computers. Here, the authors develop a coordinate-based Triple-Correlation algorithm with improved speed and reduced computational cost.

    • Yandong Yin, Wei Ting Chelsea Lee & Eli Rothenberg
  • Article
    | Open Access

    Biomedical image analysis challenges have increased in the last ten years, but common practices have not been established yet. Here the authors analyze 150 recent challenges and demonstrate that outcome varies based on the metrics used and that limited information reporting hampers reproducibility.

    • Lena Maier-Hein, Matthias Eisenmann & Annette Kopp-Schneider
  • Article
    | Open Access

    Integrated analyses of multiple large-scale screenings can be complicated by batch effects and technical artefacts. McFarland et al. introduce DEMETER2, a hierarchical model coupled with model-based normalization, which allows the assessment of differential dependencies across genes and cell lines.

    • James M. McFarland, Zandra V. Ho & Aviad Tsherniak
  • Article
    | Open Access

    Sharing of whole genome sequencing (WGS) data improves study scale and power, but data from different groups are often incompatible. Here, US genome centers and NIH programs define WGS data processing standards and a flexible validation method, facilitating collaboration in human genetics research.

    • Allison A. Regier, Yossi Farjoun & Ira M. Hall
  • Article
    | Open Access

    Functional magnetic resonance imaging (fMRI) is a powerful technique for measuring human brain activity, but the statistical analysis of fMRI data can be difficult. Here, the authors introduce a new fMRI analysis tool, LISA, which provides increased statistical power compared to existing techniques.

    • Gabriele Lohmann, Johannes Stelzer & Klaus Scheffler
  • Article
    | Open Access

    Inference and representation of differentiation trajectories from single-cell RNA-seq data remain a challenge. Here, the authors offer a visualization approach that captures both continuous differentiation trajectories and discrete clusters representing metastable states along the trajectories.

    • Fabrizio Costa, Dominic Grün & Rolf Backofen
  • Article
    | Open Access

    DNA barcode swapping results in mislabelling of sequencing reads between multiplexed samples. Here, the authors investigate the severity and consequences of barcode swapping for single-cell RNA-seq data, and develop a computational method to exclude swapped reads.

    • Jonathan A. Griffiths, Arianne C. Richard & John C. Marioni
  • Article
    | Open Access

    Publicly available RNA-seq data is provided mostly in raw form, resulting in a barrier for integrative analyses. Here, Lachmann et al. develop a high-throughput processing infrastructure and search database (ARCHS4) that provides processed RNA-seq data for 187,946 publicly available mouse and human samples to support exploration and reuse.

    • Alexander Lachmann, Denis Torre & Avi Ma’ayan
  • Article
    | Open Access

    B and T cell receptor diversity can be studied by high-throughput immune receptor sequencing. Here, the authors develop a software tool, IGoR, that calculates the likelihoods of potential V(D)J recombination and somatic hypermutation scenarios from raw immune sequence reads.

    • Quentin Marcou, Thierry Mora & Aleksandra M. Walczak
  • Article
    | Open Access

    A central problem in biodiversity estimation from genetic markers is the ability of algorithms to retain ‘true’ species while discarding artefacts. Here, the authors present a new post-clustering curation algorithm using OTU co-occurrences to estimate plant biodiversity from soil samples.

    • Tobias Guldberg Frøslev, Rasmus Kjøller & Anders Johannes Hansen
  • Article
    | Open Access

    Normal tissue adjacent to the tumour (NAT) is often used as a control in cancer studies. Here, the authors analyse across cancer types the transcriptomes of healthy, NAT, and tumour tissues, and find that NAT presents a unique state, potentially due to inflammatory response of the NAT to the tumour tissue.

    • Dvir Aran, Roman Camarda & Atul J. Butte
  • Article
    | Open Access

    Somatic hypermutation of antibodies can occur in infants but is difficult to track. Here, the authors present a new method called MIDCIRS for deep quantitative repertoire sequencing with few cells, and show that infants as young as 3 months can expand antibody lineage complexity in response to malaria infection.

    • Ben S. Wendel, Chenfeng He & Ning Jiang
  • Article
    | Open Access

    Outsourcing computation for genomic data processing offers the ability to allocate massive computing power and storage on demand. Here, Popic and Batzoglou develop a hybrid cloud aligner for sequence read mapping that preserves privacy with competitive accuracy and speed.

    • Victoria Popic & Serafim Batzoglou
  • Article
    | Open Access

    We can often observe only a small fraction of a system, which leads to biases in the inference of its global properties. Here, the authors develop a framework that enables overcoming subsampling effects, apply it to recordings from developing neural networks, and find that neural networks become critical as they mature.

    • A. Levina & V. Priesemann
  • Article
    | Open Access

    Pathway analysis aids interpretation of large-scale gene expression data, but existing algorithms fall short of providing robust pathway identification. The method introduced here includes coexpression analysis and gene importance estimation to robustly identify relevant pathways and biomarkers for patient stratification.

    • Ivan V. Ozerov, Ksenia V. Lezhnina & Alex Zhavoronkov
  • Article
    | Open Access

    A wealth of gene expression data is publicly available, yet it is of little use without additional human curation. Ma’ayan and colleagues report a crowdsourcing project involving over 70 participants to annotate and analyse thousands of human disease-related gene expression datasets.

    • Zichen Wang, Caroline D. Monteiro & Avi Ma’ayan
  • Article
    | Open Access

    The paradigm of reservoir computing shows that, like the human brain, complex networks can perform efficient information processing. Here, a simple delay dynamical system is shown to perform such computations efficiently, making it capable of replacing a complex network in reservoir computing.

    • L. Appeltant, M.C. Soriano & I. Fischer