Project: PRJNA1031667

Chemical RNA modifications, collectively referred to as the ‘epitranscriptome’, have been intensively studied in recent years, largely facilitated by next-generation sequencing technologies. Recent efforts have turned towards the nanopore direct RNA sequencing (DRS) platform, as it allows simultaneous detection of diverse RNA modification types in full-length native RNA molecules. While RNA modifications can be identified as systematic basecalling ‘errors’ in DRS datasets, m6A modifications produce very modest ‘errors’, limiting the applicability of this approach to sites modified at high stoichiometries. Here, we demonstrate that alternative RNA basecalling models, trained with fully unmodified in vitro synthetic sequences, increase the ‘error’ signal of m6A modifications, leading to enhanced detection of RNA modifications even at lower stoichiometries. We then show that these models also enhance the detection of RNA modifications in previously published in vivo human samples when using third-party software for RNA modification detection. Moreover, our work provides a novel RNA basecalling model with a median accuracy of 97%, compared to 91% for previously available RNA basecalling models. Notably, this increase in accuracy leads not only to improved detection of RNA modifications but also to enhanced mappability of RNA reads, which is most evident for short RNA reads (50% increase). Altogether, our work stresses the importance of using fully unmodified RNA sequences for training RNA basecalling models, and shows how the choice of basecalling model can significantly affect the detection of RNA modifications and read mappability.
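The summary reports median read accuracies (97% vs 91%) without defining the metric. A common convention, assumed here rather than taken from the source, is matches / (matches + mismatches + insertions + deletions) per aligned read. The sketch below computes that from an extended CIGAR string (`=` for matches, `X` for mismatches); the helper names `read_accuracy` and `median_accuracy` are hypothetical.

```python
import re
from collections import Counter
from statistics import median

# Hypothetical helper: the source does not state how accuracy was computed.
# Assumes extended CIGAR strings, where '=' marks matches and 'X' mismatches.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_accuracy(cigar: str) -> float:
    """Return matches / (matches + mismatches + insertions + deletions)."""
    counts = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] += int(length)
    if counts["M"]:
        # Plain 'M' mixes matches and mismatches, so accuracy is undefined here.
        raise ValueError("'M' ops are ambiguous; use an extended CIGAR (=/X)")
    # Soft/hard clips (S/H) and skipped regions (N) are ignored by design.
    aligned = counts["="] + counts["X"] + counts["I"] + counts["D"]
    return counts["="] / aligned if aligned else 0.0


def median_accuracy(cigars):
    """Median per-read accuracy over a collection of aligned reads."""
    return median(read_accuracy(c) for c in cigars)
```

For example, `read_accuracy("90=5X3I2D")` gives 0.9, and soft-clipped bases in `"10S97=3X"` are excluded, giving 0.97. Other definitions (for example, using the NM tag or counting clipped bases) would yield different numbers.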
Overall design: To compare the error signatures produced by dRNA-seq across three basecalling models (default, IVT and SUP) and seven RNA modification types (m6A, m5C, hm5C, ac4C, Ψ, m1Ψ and m5U), we sequenced fully modified and unmodified in vitro constructs. These 'curlcake' sequences contain every possible 5-mer (n = 1024) in multiple broader sequence contexts (median occurrence per 5-mer, n = 10). To assess the effect of varying stoichiometries on the m6A modification signature, we additionally sequenced curlcakes generated with the following proportions of incorporated m6A: 12.5%, 25%, 50% and 75%. Subsequently, we tested whether our novel basecalling models improve the detection of m6A sites in in vivo data. We used publicly available dRNA-seq data from wild-type and Mettl3-/- HEK293T cells in biological triplicates, which we basecalled using three different models (default, IVT and SUP). To identify m6A-modified sites, we performed pairwise comparisons between wild-type and knockout samples using eligos2. To compare read accuracy and mappability across the three tested basecalling models, we used publicly available dRNA-seq data from several model species (H. sapiens, M. musculus, X. laevis, S. cerevisiae and A. thaliana). Finally, we tested the ability of two basecalling models (default and SUP) to detect m1Ψ in a synthetic mRNA vaccine.
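The design relies on the curlcake constructs covering all 4^5 = 1024 RNA 5-mers. The actual curlcake sequences are not included in this record, so the sketch below only illustrates how such coverage could be checked: it counts every 5-mer over the ACGU alphabet in a set of sequences and reports the total, the number missing, and the median occurrence. The toy input is a de Bruijn sequence (a standard construction in which every 5-mer occurs exactly once), not a curlcake. All function names are hypothetical.

```python
from itertools import product
from statistics import median

K = 5
ALPHABET = "ACGU"  # RNA alphabet; 4**5 = 1024 possible 5-mers


def kmer_counts(seqs, k=K):
    """Count occurrences of every possible k-mer across the given sequences."""
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows containing non-ACGU characters
                counts[kmer] += 1
    return counts


def coverage_summary(seqs, k=K):
    """Return (possible k-mers, missing k-mers, median occurrence per k-mer)."""
    counts = kmer_counts(seqs, k)
    missing = sum(1 for n in counts.values() if n == 0)
    return len(counts), missing, median(counts.values())


def de_bruijn(alphabet, n):
    """Cyclic de Bruijn sequence B(len(alphabet), n) (standard recursive algorithm)."""
    k = len(alphabet)
    a = [0] * k * n
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


# Toy example (hypothetical input): linearize the cyclic sequence by appending
# its first K - 1 bases so that every 5-mer appears exactly once.
toy = de_bruijn(ALPHABET, K)
toy_linear = toy + toy[:K - 1]
print(coverage_summary([toy_linear]))  # (1024, 0, 1)
```

Real curlcakes embed each 5-mer in several sequence contexts, so the same summary applied to them would report a median occurrence above 1 (the record states n = 10).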

General