- The featureCounts program summarizes read counts for genomic features (e.g., exons) and meta-features (e.g., genes) from genome-mapped RNA-seq or genomic DNA-seq reads (SAM/BAM files).
- featureCounts uses genomic annotations in GTF or SAF format to count genomic features and meta-features.
For differential gene expression analysis, it is convenient to have the counts for all samples in a single file (a gene count matrix). You get this gene count matrix file directly when you run featureCounts on all mapped files at once.
    # meta-feature (gene) level count
    featureCounts -t 'exon' -g 'gene_id' -a annotation.gtf -T 10 -o counts.txt library1.bam library2.bam library3.bam
    # use -f option for feature (exon) level count
However, when you run featureCounts individually on each of a large number of samples, the counts for each sample end up in a separate text file.
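As an illustration of the per-sample scenario, here is a minimal sketch that builds one featureCounts command per BAM file, reusing the options from the command above. The helper name `build_featurecounts_cmd` and the choice to name each output after its BAM file with a `.txt` extension are my own assumptions for this example, not part of featureCounts.

```python
from pathlib import Path


def build_featurecounts_cmd(bam, annotation="annotation.gtf", threads=10, exon_level=False):
    """Return the featureCounts argument list for a single BAM file (hypothetical helper)."""
    bam = Path(bam)
    # Assumption: name the output after the BAM file, e.g. library1.bam -> library1.txt
    out = bam.with_suffix(".txt")
    cmd = ["featureCounts", "-t", "exon", "-g", "gene_id",
           "-a", annotation, "-T", str(threads), "-o", str(out)]
    if exon_level:
        cmd.append("-f")  # feature (exon) level count instead of gene level
    cmd.append(str(bam))
    return cmd


commands = [build_featurecounts_cmd(b)
            for b in ["library1.bam", "library2.bam", "library3.bam"]]

for cmd in commands:
    # Only print here; to actually run a command, pass it to
    # subprocess.run(cmd, check=True) with featureCounts on your PATH.
    print(" ".join(cmd))
```

Each command writes its own count file, which is exactly the situation where a merge step becomes necessary.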
To merge the individual count files into a single gene count matrix, we will use bioinfokit v2.0.5:
    # run this Python code (in a Python interpreter) from a folder where all files are present
    from bioinfokit.analys import HtsAna

    # make sure all individual count files are present in same folder
    # by default, it assumes each count file has .txt extension
    HtsAna.merge_featureCount()
See the bioinfokit documentation for detailed usage.
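If you are curious what such a merge involves, here is a minimal standard-library sketch of the idea. It is not bioinfokit's implementation. It assumes the usual featureCounts table layout (a leading `#` comment line, then a tab-separated header starting with `Geneid` and ending with the BAM file name), integer counts, and that a gene missing from one sample should get a count of 0. The function names and the toy data are made up for illustration.

```python
import csv
import io


def read_counts(handle):
    """Parse one featureCounts table; return (sample_name, {gene_id: count})."""
    # Skip the leading '#' comment line that featureCounts writes
    lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows)
    sample = header[-1]  # last column is named after the BAM file
    counts = {row[0]: int(row[-1]) for row in rows if row}
    return sample, counts


def merge_counts(tables):
    """Merge [(sample, counts), ...] into a header row and one row per gene."""
    genes = sorted({gene for _, counts in tables for gene in counts})
    header = ["Geneid"] + [sample for sample, _ in tables]
    # Assumption: a gene absent from a sample is reported as 0
    rows = [[gene] + [counts.get(gene, 0) for _, counts in tables] for gene in genes]
    return header, rows


# Toy in-memory files standing in for two per-sample count files
file1 = io.StringIO(
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\n"
    "geneA\tchr1\t1\t100\t+\t100\t5\n"
    "geneB\tchr1\t200\t400\t-\t201\t0\n"
)
file2 = io.StringIO(
    "# Program:featureCounts (toy example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample2.bam\n"
    "geneA\tchr1\t1\t100\t+\t100\t7\n"
    "geneB\tchr1\t200\t400\t-\t201\t3\n"
)

header, rows = merge_counts([read_counts(file1), read_counts(file2)])
print(",".join(header))
for row in rows:
    print(",".join(str(value) for value in row))
```

With real data you would open each count file with `open(path, newline="")` instead of using `io.StringIO`.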
Once merge_featureCount() runs successfully, the output file gene_matrix_count.csv appears in the same folder, with the counts merged for all samples.
    # gene_matrix_count.csv
    Geneid,sample1.bam,sample2.bam,sample3.bam
    PGSC0003DMG400015133,0,7,2
    PGSC0003DMG400015132,72,95,155
    PGSC0003DMG400022764,42,78,77
    PGSC0003DMG400022799,2,3,5
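Before moving on to differential expression analysis, a quick sanity check on the merged matrix can be useful. The sketch below sums the counts per sample for the four example rows shown above. The function name `column_totals` is hypothetical, and the text is embedded in the script only so the example is self-contained.

```python
import csv
import io

# The four example rows from gene_matrix_count.csv shown above
matrix_text = """Geneid,sample1.bam,sample2.bam,sample3.bam
PGSC0003DMG400015133,0,7,2
PGSC0003DMG400015132,72,95,155
PGSC0003DMG400022764,42,78,77
PGSC0003DMG400022799,2,3,5
"""


def column_totals(handle):
    """Return {sample: total count} for a merged gene count matrix in CSV form."""
    reader = csv.DictReader(handle)
    samples = reader.fieldnames[1:]  # every column except Geneid
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        for sample in samples:
            totals[sample] += int(row[sample])
    return totals


totals = column_totals(io.StringIO(matrix_text))
for sample, total in totals.items():
    print(sample, total)
# sample1.bam 116
# sample2.bam 183
# sample3.bam 239
```

For the real file, replace the `io.StringIO` object with `open("gene_matrix_count.csv", newline="")`. A sample whose total is far lower than the others is worth a second look before analysis.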
- featureCounts: an efficient general purpose program for assigning sequence reads to genomic features
- featureCounts: a ultrafast and accurate read summarization program
This work is licensed under a Creative Commons Attribution 4.0 International License