How to Filter Mapped and Unmapped Sequence Reads with Samtools

Renesh Bedre

Samtools is a suite of utilities commonly used for analyzing aligned sequence data stored in the SAM (Sequence Alignment/Map) and BAM (Binary Alignment/Map) formats in bioinformatics and genomics.

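The samtools view command, combined with the -f or -F option and a flag value, is typically used to filter mapped and unmapped sequence reads from SAM/BAM files. The -f option keeps only the alignments whose flag contains all of the given bits, whereas -F discards the alignments whose flag contains any of them.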

The flag value (the FLAG field, the second column of each SAM record) is an integer whose individual bits encode properties of the read alignment. For example, the bit with value 4 (0x4) indicates that the sequence read does not have a valid alignment to the reference genome (an unmapped sequence read).
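Because each property occupies its own bit, a single FLAG can carry several properties at once. For example, 77 = 1 + 4 + 8 + 64 marks a paired read (0x1) that is unmapped (0x4), whose mate is also unmapped (0x8), and that is the first read of its pair (0x40). The minimal bash sketch below decodes a FLAG with shell arithmetic; the function name decode_flag and its short labels are hypothetical, and samtools itself provides a samtools flags subcommand for this conversion.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of the SAM FLAG bits set in $1.
# Bit order follows the SAM specification (0x1, 0x2, 0x4, ..., 0x800);
# the labels themselves are illustrative, not samtools output.
decode_flag() {
  local flag=$1 i
  local -a names out=()
  names=(PAIRED PROPER_PAIR UNMAPPED MATE_UNMAPPED REVERSE MATE_REVERSE
         READ1 READ2 SECONDARY QCFAIL DUPLICATE SUPPLEMENTARY)
  for i in "${!names[@]}"; do
    # Bit i has the value 2^i, so (flag & (1 << i)) is non-zero when it is set.
    if (( flag & (1 << i) )); then
      out+=("${names[i]}")
    fi
  done
  # Join the collected names with commas (prints an empty line if no bit is set).
  local IFS=,
  echo "${out[*]}"
}

decode_flag 4    # UNMAPPED
decode_flag 77   # PAIRED,UNMAPPED,MATE_UNMAPPED,READ1
```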

The following examples demonstrate how to filter mapped and unmapped sequence reads from a BAM file using samtools.

Filter unmapped sequence reads

You can use the following commands to extract only the unmapped sequence reads from the BAM file using samtools.

samtools view -b -f 4 input.bam > unmapped.bam

Here, the -b option writes the output in BAM format, and -f 4 keeps only the alignments whose flag has the 0x4 (unmapped) bit set, so unmapped.bam retains only the unmapped sequence reads.

The above command creates a new BAM file, unmapped.bam, containing only the unmapped reads from the input BAM file.
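The rule behind -f can be illustrated without samtools. The sketch below is a hypothetical helper, require_flags, that mimics samtools view -f MASK on plain-text SAM: a record is printed only when its FLAG contains every bit of MASK. It assumes tab-separated SAM records on standard input and uses a portable bitwise AND, because awk's and() is a gawk-only extension. It is meant for understanding the option, not as a replacement for samtools on real files.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools): mimic `samtools view -f MASK`
# on plain-text SAM by printing records whose FLAG contains ALL bits of MASK.
# Header lines (starting with @) are skipped, as samtools view does without -h.
require_flags() {
  awk -F '\t' -v mask="$1" '
    # Portable bitwise AND (POSIX awk has no and(); that is a gawk extension).
    function band(a, b,    r, p) {
      r = 0; p = 1
      while (a > 0 && b > 0) {
        if (a % 2 == 1 && b % 2 == 1) r += p
        a = int(a / 2); b = int(b / 2); p *= 2
      }
      return r
    }
    /^@/ { next }
    # FLAG is field 2; keep the record only if all mask bits are present.
    band($2 + 0, mask + 0) == mask + 0 { print }
  '
}
```

For example, samtools view input.bam | require_flags 4 should print the same records as samtools view -f 4 input.bam, and require_flags 12 mirrors -f 12 (read and mate both unmapped).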

If you want to create an output file in SAM format, you can use the following command.

samtools view -f 4 input.bam > unmapped.sam

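The above command creates a new SAM file, unmapped.sam, containing only the unmapped reads from the input BAM file. Note that samtools view omits the SAM header unless -h is given; use samtools view -h -f 4 input.bam > unmapped.sam if downstream tools need the header.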

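These commands work as written for single-end reads. For paired-end data, -f 4 selects each unmapped read individually, so a pair can be split (one mate kept, the other dropped); you should also consider the mate's mapping status (flag 0x8, mate unmapped) and whether the reads are properly paired (flag 0x2).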
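For paired-end data, the read's own unmapped bit (0x4) and its mate's unmapped bit (0x8) can be combined. The minimal bash sketch below uses a hypothetical pair_status function to classify a FLAG; the comment on each branch names the samtools view options that should select the same class, relying on -f requiring all listed bits and -F excluding any of them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a record by its own (0x4) and its mate's
# (0x8) unmapped bits. Comments show the samtools view options that select
# the same class (-f requires all listed bits, -F excludes any of them).
pair_status() {
  local flag=$1
  if (( (flag & 0x1) == 0 )); then
    echo "not_paired"          # 0x1 unset: single-end record
  elif (( flag & 0x4 && flag & 0x8 )); then
    echo "both_unmapped"       # samtools view -f 12
  elif (( flag & 0x4 )); then
    echo "read_unmapped"       # samtools view -f 4 -F 8
  elif (( flag & 0x8 )); then
    echo "mate_unmapped"       # samtools view -f 8 -F 4
  else
    echo "both_mapped"         # samtools view -F 12 (add -f 2 for proper pairs only)
  fi
}

pair_status 73    # mate_unmapped (73 = 64 + 8 + 1)
```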

Filter mapped sequence reads

You can use the following commands to extract only the mapped sequence reads from the BAM file using samtools.

samtools view -b -F 4 input.bam > mapped.bam

Here, the -b option writes the output in BAM format, and -F 4 discards the alignments whose flag has the 0x4 (unmapped) bit set, so mapped.bam retains only the mapped sequence reads.

The above command creates a new BAM file, mapped.bam, containing only the mapped reads from the input BAM file.
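The -F rule is the complement of -f: a record is dropped as soon as its FLAG shares any bit with the mask. The sketch below is a hypothetical pure-bash helper, exclude_flags, that illustrates this rule on plain-text SAM read from standard input. Reading line by line in bash is far slower than samtools, so treat it as an explanation rather than a tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools): mimic `samtools view -F MASK`
# on plain-text SAM read from stdin by printing only the records whose FLAG
# shares no bits with MASK. Header lines (starting with @) are skipped.
exclude_flags() {
  local mask=$1 line qname flag rest
  # `|| [[ -n $line ]]` also processes a last line that lacks a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line == @* ]] && continue
    # FLAG is the second tab-separated field of a SAM record.
    IFS=$'\t' read -r qname flag rest <<< "$line"
    # Skip malformed records whose FLAG is not a plain decimal integer.
    [[ $flag =~ ^[0-9]+$ ]] || continue
    # 10# forces base 10 so a leading zero is not read as octal.
    if (( (10#$flag & mask) == 0 )); then
      printf '%s\n' "$line"
    fi
  done
}
```

With a mask of 12, for instance, exclude_flags drops any record that is unmapped or whose mate is unmapped, mirroring samtools view -F 12.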

If you want to create an output file in SAM format, you can use the following command.

samtools view -F 4 input.bam > mapped.sam

The above command creates a new SAM file, mapped.sam, containing only the mapped reads from the input BAM file (add -h to include the SAM header in the output).

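As with the unmapped reads, these commands work as written for single-end reads. For paired-end data, -F 4 keeps every mapped read even when its mate is unmapped; to keep only pairs in which both mates are mapped, also consider the mate-unmapped flag (0x8) and whether the reads are properly paired (flag 0x2).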



This work is licensed under a Creative Commons Attribution 4.0 International License
