Samtools: Extract Mapped and Unmapped Paired-end Reads

Renesh Bedre

Samtools can be used for extracting mapped and unmapped sequence reads from SAM and BAM files.

Unlike single-end read filtering, extracting paired-end reads requires you to consider whether the reads are properly paired and whether both reads of a pair are mapped.

Paired-end reads are properly paired (concordant alignments) when both reads are mapped to the reference genome in the correct orientation for the library preparation protocol (e.g. first read on the forward strand and second read on the reverse strand). In addition, properly paired reads have the expected insert size (the distance between the mapped positions of the read pair).

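You can use the samtools view command with the -f or -F parameter and a flag value to extract mapped and unmapped paired-end reads from SAM/BAM files. The -f parameter keeps only the reads that have all of the given flag bits set, whereas -F excludes the reads that have any of the given flag bits set.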
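The bitwise logic behind -f and -F can be sketched in plain shell arithmetic. This is only an illustration of the filtering rule, not how samtools is implemented; `passes_filter` is a hypothetical helper name, and the FLAG values are hand-picked examples.

```shell
# Mimic how -f (require) and -F (exclude) are applied to a single FLAG value.
# passes_filter is a hypothetical helper, not part of samtools.
passes_filter() {
    flag=$1      # FLAG value of the alignment record
    require=$2   # value given to -f: all of these bits must be set
    exclude=$3   # value given to -F: none of these bits may be set
    if [ $(( flag & require )) -eq "$require" ] && [ $(( flag & exclude )) -eq 0 ]; then
        echo keep
    else
        echo drop
    fi
}

passes_filter 99 2 0   # 99 = paired + proper pair + mate reverse + first in pair -> keep
passes_filter 73 8 4   # 73 = paired + mate unmapped + first in pair             -> keep
passes_filter 77 8 4   # 77 = paired + read unmapped + mate unmapped + first     -> drop
```

Passing 0 as the third argument corresponds to running samtools view without -F.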

The following examples demonstrate how to extract mapped and unmapped paired-end reads from the BAM file using samtools.

Extract paired reads mapped in a proper pair

You can use the following command to extract the reads mapped in a proper pair.

samtools view -b -f 2 input.bam > mapped.bam

Here, the -b parameter specifies that the output should be in BAM format, and -f 2 keeps only the reads flagged as mapped in a proper pair.

The above command will create a new BAM file, mapped.bam, which contains the paired-end reads mapped in a proper pair.
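To see why a flag value of 2 selects these reads, it can help to decode a FLAG into its individual bits. The sketch below does this in plain shell for the first eight bits only; `describe_flag` is a hypothetical helper, and 99 and 147 are example FLAG values typical of a properly paired read and its mate.

```shell
# Decode the lower eight SAM FLAG bits into names.
# describe_flag is a hypothetical helper; higher bits (secondary, QC fail,
# duplicate, supplementary) are ignored in this sketch.
describe_flag() {
    flag=$1
    names=""
    for pair in 1:PAIRED 2:PROPER_PAIR 4:UNMAP 8:MUNMAP \
                16:REVERSE 32:MREVERSE 64:READ1 128:READ2; do
        bit=${pair%%:*}    # numeric bit value before the colon
        name=${pair#*:}    # bit name after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            names="${names:+$names,}$name"
        fi
    done
    echo "$names"
}

describe_flag 99    # PAIRED,PROPER_PAIR,MREVERSE,READ1
describe_flag 147   # PAIRED,PROPER_PAIR,REVERSE,READ2
```

Both example values have the PROPER_PAIR bit (2) set, so both records would be kept by -f 2.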

If you want to create an output file in SAM format, you can use the following command.

samtools view -f 2 input.bam > mapped.sam

Extract paired reads where one read is mapped and the other is unmapped

You can use the following command to extract the paired-end reads where one read is mapped and the other read is unmapped.

samtools view -b -F 4 -f 8 input.bam > mapped.bam

Here, the -b parameter specifies that the output should be in BAM format, -F 4 excludes reads that are themselves unmapped, and -f 8 keeps only reads whose mate is unmapped.

The above command will create a new file, mapped.bam, which contains the mapped reads whose mates are unmapped. The unmapped mates themselves are not included in this file; to extract them, swap the flag values (-f 4 -F 8).
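The difference between the mapped read and its unmapped mate can be illustrated on a handful of FLAG values without any BAM file. In this sketch, `select_flags` is a hypothetical helper that applies the -f/-F rule to a list of flags, and the six values are assumed examples rather than output from a real alignment.

```shell
# select_flags REQUIRE EXCLUDE FLAG...
# Prints the FLAG values that "-f REQUIRE -F EXCLUDE" would keep.
# Hypothetical helper for illustration; not part of samtools.
select_flags() {
    require=$1
    exclude=$2
    shift 2
    kept=""
    for flag in "$@"; do
        if [ $(( flag & require )) -eq "$require" ] && [ $(( flag & exclude )) -eq 0 ]; then
            kept="${kept:+$kept }$flag"
        fi
    done
    echo "$kept"
}

# Assumed example FLAGs:
#   99, 147 = properly paired reads
#   73, 133 = pair with one read mapped (73) and its mate unmapped (133)
#   77, 141 = pair with both reads unmapped
select_flags 8 4 99 147 73 133 77 141   # -F 4 -f 8: mapped read, unmapped mate -> 73
select_flags 4 8 99 147 73 133 77 141   # -f 4 -F 8: unmapped read, mapped mate -> 133
```

Each filter returns only one half of the pair, so both are needed to recover complete pairs of this kind.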

If you want to create an output file in SAM format, you can use the following command.

samtools view -F 4 -f 8 input.bam > mapped.sam

Extract paired-end reads where both reads are not mapped

You can use the following command to extract the paired-end reads where both reads are not mapped to the reference genome.

samtools view -b -f 12 input.bam > unmapped.bam

Here, the -b parameter specifies that the output should be in BAM format, and -f 12 keeps only the reads where both the read itself (flag 4) and its mate (flag 8) are unmapped.

The above command will create a new file, unmapped.bam, which contains the paired-end reads where both reads of the pair are not mapped.
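As a rough sanity check, the three categories covered above can be tallied from SAM text by looking at the FLAG column (the second tab-separated field). The sketch below is one possible way to do this in plain shell; `tally_pairs` is a hypothetical helper, and the records piped into it are made-up, truncated SAM lines used only for illustration.

```shell
# tally_pairs reads SAM records on stdin and counts reads per category
# based on the FLAG column. Hypothetical helper; not part of samtools.
tally_pairs() {
    proper=0; one_unmapped=0; both_unmapped=0; other=0
    while IFS="$(printf '\t')" read -r qname flag rest; do
        case $qname in @*) continue ;; esac   # skip SAM header lines
        [ -z "$flag" ] && continue            # skip empty or malformed lines
        unmapped_bits=$(( flag & 12 ))        # bit 4 = read unmapped, bit 8 = mate unmapped
        if [ "$unmapped_bits" -eq 12 ]; then
            both_unmapped=$(( both_unmapped + 1 ))
        elif [ "$unmapped_bits" -ne 0 ]; then
            one_unmapped=$(( one_unmapped + 1 ))
        elif [ $(( flag & 2 )) -ne 0 ]; then
            proper=$(( proper + 1 ))
        else
            other=$(( other + 1 ))
        fi
    done
    echo "proper_pair=$proper one_unmapped=$one_unmapped both_unmapped=$both_unmapped other=$other"
}

# Made-up records: only the read name, FLAG, and reference name are given.
printf 'r1\t99\tchr1\nr1\t147\tchr1\nr2\t73\tchr1\nr2\t133\tchr1\nr3\t77\t*\nr3\t141\t*\nr4\t65\tchr1\n' | tally_pairs
# proper_pair=2 one_unmapped=2 both_unmapped=2 other=1
```

With samtools installed, the same function could be fed real data, for example with samtools view input.bam | tally_pairs. The one_unmapped count includes both the mapped reads and their unmapped mates.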

This work is licensed under a Creative Commons Attribution 4.0 International License