blastn: Command-line Utility for Nucleotide Sequence Search

Renesh Bedre

blastn is a command-line utility from the NCBI BLAST+ toolkit for nucleotide-nucleotide sequence similarity searches using the BLAST algorithm.

blastn compares query nucleotide sequences against a nucleotide BLAST database to identify homologous sequences. To compare protein sequences against a protein BLAST database, use the blastp tool instead.

The general syntax of blastn looks like this:

# basic command
blastn -query query_fasta -db blast_nucl_db -outfmt output_format -out output_file


# command with commonly used advanced options
blastn -query query_fasta -db blast_nucl_db -evalue 1e-05 -perc_identity 60 \
    -max_target_seqs 5 -num_threads 10 -outfmt output_format -out output_file

Where,

Parameter          Description
-query             Input nucleotide sequences in FASTA format to search against the nucleotide BLAST database
-db                Formatted nucleotide BLAST database. See makeblastdb for creating a formatted BLAST database.
-evalue            Expectation value (E-value) threshold for reporting matches (default 10). Lower E-values indicate more significant matches
-perc_identity     Minimum percent identity (0 to 100) a match must have to be reported
-max_target_seqs   Maximum number of aligned sequences to report for each query (default 500). A value of 5 or more is recommended
-num_threads       Number of threads (CPU cores) to use for the search (default 1). More threads usually speed up the search, up to the number of available cores
-outfmt            Numerical value for a predefined output format, or a quoted string giving the format number followed by the fields to include in the output (default 0, pairwise)
-out               Name of the output file where results are saved

In addition to these frequently used parameters, run blastn -help to see all available parameters and their usage.

Note: blastn requires a formatted BLAST database. You can create one with the makeblastdb command or download a preformatted BLAST database from NCBI.

The following examples explain how to use blastn on the command line for nucleotide-nucleotide sequence similarity searches.

Let’s say you have input query nucleotide sequences (input.fasta) and a formatted nucleotide database (target_nucl_db).

Run a basic blastn command

blastn -query input.fasta -db target_nucl_db -outfmt 6 -out blastn_output.txt

The command above compares the nucleotide sequences in input.fasta against the formatted target_nucl_db database and saves the results in tabular format (-outfmt 6) to the blastn_output.txt file.

The output should look like this:

head -n6 blastn_output.txt
seq1    tar4    100.000 95      0       0       1       95      1       95      4.24e-49        176
seq1    tar3    100.000 95      0       0       1       95      1       95      4.24e-49        176
seq2    tar2    100.000 101     0       0       1       101     102     202     2.49e-52        187
seq3    tar2    100.000 101     0       0       1       101     1       101     2.06e-52        187
seq4    tar4    100.000 101     0       0       1       101     1       101     2.54e-52        187
seq4    tar3    100.000 101     0       0       1       101     1       101     2.54e-52        187

The columns in the output file (with -outfmt 6) are: query ID, target ID, percentage of identical matches, alignment length, number of mismatches, number of gap openings, query start, query end, target start, target end, E-value, and bit score.
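Because -outfmt 6 output is plain tab-separated text, standard Unix tools can post-process it without rerunning the search. The minimal sketch below keeps only the highest-scoring hit (by bit score, column 12) for each query. So that it runs without BLAST installed, it first writes the sample rows shown above to a hypothetical file, sample_outfmt6.txt; with real data, point the sort command at your own blastn_output.txt instead.

```shell
# Recreate the sample -outfmt 6 rows shown above. Real blastn output is already
# tab-separated; tr converts the spaces here to single tabs for this demo only.
tr -s ' ' '\t' > sample_outfmt6.txt <<'EOF'
seq1 tar4 100.000 95 0 0 1 95 1 95 4.24e-49 176
seq1 tar3 100.000 95 0 0 1 95 1 95 4.24e-49 176
seq2 tar2 100.000 101 0 0 1 101 102 202 2.49e-52 187
seq3 tar2 100.000 101 0 0 1 101 1 101 2.06e-52 187
seq4 tar4 100.000 101 0 0 1 101 1 101 2.54e-52 187
seq4 tar3 100.000 101 0 0 1 101 1 101 2.54e-52 187
EOF

TAB=$(printf '\t')

# Sort by query ID (column 1), then by bit score (column 12) in descending
# numeric order; awk then prints only the first row seen for each query ID.
LC_ALL=C sort -t "$TAB" -k1,1 -k12,12nr sample_outfmt6.txt \
  | awk -F'\t' '!seen[$1]++' > best_hits.txt

cat best_hits.txt
```

When several hits tie on bit score (as seq1 and seq4 do here), which one is kept depends on sort's fallback ordering. Setting LC_ALL=C makes that choice reproducible across machines, but the pick among tied hits is still arbitrary.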

Run blastn with customized options

blastn -query input.fasta -db target_nucl_db -evalue 1e-05 -perc_identity 60 -max_target_seqs 5 -num_threads 10 \
  -outfmt "6 qseqid qlen sseqid slen qstart qend sstart send nident pident length mismatch gaps qcovs evalue bitscore" \
  -out blastn_output.txt

The command above compares the nucleotide sequences in input.fasta against target_nucl_db using the given parameter cut-offs and saves the results in tabular format, with the customized fields, to the blastn_output.txt file.

The output should look like this:

head -n6 blastn_output.txt
seq1    95      tar4    101     1       95      1       95      95      100.000 95      0       0       100     4.24e-49        176
seq1    95      tar3    224     1       95      1       95      95      100.000 95      0       0       100     4.24e-49        176
seq2    120     tar2    202     1       101     102     202     101     100.000 101     0       0       84      2.49e-52        187
seq3    101     tar2    202     1       101     1       101     101     100.000 101     0       0       100     2.06e-52        187
seq4    122     tar4    101     1       101     1       101     101     100.000 101     0       0       83      2.54e-52        187
seq4    122     tar3    224     1       101     1       101     101     100.000 101     0       0       83      2.54e-52        187

The columns in the output file follow, in order, the fields listed in the -outfmt string.
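Because the columns follow the -outfmt string, the same field list can double as a header row, and a column can be located by name rather than by a hard-coded position. The minimal sketch below assumes a shell variable named FIELDS (a hypothetical name) holding the custom field list; it writes a header and keeps only hits whose query coverage (qcovs) is at least 90%. To run without BLAST installed, it first writes the sample rows shown above to a hypothetical file, sample_custom.txt; with real data, use your own blastn_output.txt.

```shell
# Custom field list used with -outfmt above, without the leading format number 6.
FIELDS="qseqid qlen sseqid slen qstart qend sstart send nident pident length mismatch gaps qcovs evalue bitscore"

# Recreate the sample rows shown above as tab-separated text (demo only).
tr -s ' ' '\t' > sample_custom.txt <<'EOF'
seq1 95 tar4 101 1 95 1 95 95 100.000 95 0 0 100 4.24e-49 176
seq1 95 tar3 224 1 95 1 95 95 100.000 95 0 0 100 4.24e-49 176
seq2 120 tar2 202 1 101 102 202 101 100.000 101 0 0 84 2.49e-52 187
seq3 101 tar2 202 1 101 1 101 101 100.000 101 0 0 100 2.06e-52 187
seq4 122 tar4 101 1 101 1 101 101 100.000 101 0 0 83 2.54e-52 187
seq4 122 tar3 224 1 101 1 101 101 100.000 101 0 0 83 2.54e-52 187
EOF

# Find the 1-based position of qcovs in the field list ($FIELDS is left
# unquoted on purpose so the shell splits it into one word per line).
QCOVS_COL=$(printf '%s\n' $FIELDS | awk '$0 == "qcovs" { print NR; exit }')

# Write a tab-separated header, then keep rows with query coverage >= 90%.
{
  printf '%s\n' "$FIELDS" | tr ' ' '\t'
  awk -F'\t' -v col="$QCOVS_COL" '$col >= 90' sample_custom.txt
} > blastn_filtered.tsv

cat blastn_filtered.tsv
```

With the sample rows, seq2 (84% coverage) and seq4 (83%) are dropped, leaving the header plus three hits. Looking up the column by name means the filter keeps working if you reorder or add fields in the -outfmt string.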



This work is licensed under a Creative Commons Attribution 4.0 International License
