Create a Local BLAST Database From a FASTA File

Renesh Bedre

A local BLAST database lets you run fast and efficient sequence searches on your own machine using the NCBI BLAST tools.

With a local BLAST database, you can search against a specific set of sequences instead of the whole NCBI database. A local BLAST database is also useful for reproducible sequence searches, because its contents stay fixed between runs.

The NCBI BLAST executables include the makeblastdb utility, which creates a local BLAST database from nucleotide or protein sequences.

The general syntax of makeblastdb looks like this:

# for nucleotide sequences
makeblastdb -in input.fasta -dbtype nucl -parse_seqids -out test 

# for protein sequences
makeblastdb -in input.fasta -dbtype prot -parse_seqids -out test 

Where:

Parameter       Description
-in             Input FASTA file (nucleotide or protein) used to create the BLAST database
-dbtype         Molecule type ("nucl" for nucleotide and "prot" for protein sequences)
-parse_seqids   Enable sequence ID parsing. This is useful for extracting sequences by their IDs using blastdbcmd. Optional but recommended.
-out            Name of the database (defaults to the input FASTA file name). Optional.
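
Because -parse_seqids indexes each record by its identifier, makeblastdb typically rejects input that contains duplicate sequence IDs. A minimal pre-check could look like the sketch below. It assumes the ID is the first whitespace-delimited token of each header line. The file name and the three short records are illustrative placeholders, not part of the examples in this article.

```shell
#!/bin/sh
# Sketch: report duplicate sequence IDs in a FASTA file before running makeblastdb.
# The file name and its contents are hypothetical placeholders.
workdir=$(mktemp -d)
fasta="$workdir/check_ids_example.fasta"
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq1 copy\nTTAA\n' > "$fasta"

# Count the records (every header line starts with ">")
n_records=$(grep -c '^>' "$fasta")

# Take the first token of each header, strip the ">", and print IDs seen a second time
dup_ids=$(awk '/^>/ { id = substr($1, 2); if (++seen[id] == 2) print id }' "$fasta")

echo "records: $n_records"
echo "duplicate IDs: ${dup_ids:-none}"
```

If the check reports any duplicate IDs, rename those records before building the database.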

The following examples show how to use makeblastdb to create a local BLAST database from a FASTA file.

Create a BLAST database for nucleotide sequences

For example, the sample_nucl.fasta file contains the following DNA sequences:

# example FASTA file (sample_nucl.fasta)
>seq1
TTCAGTTCCTCCATCTCTCTAAGCTGTTTTTCAGAAATGGTGTCTGGGTTGGAGACATCAAGA
>seq2
CTTCACGATCACGAATCACGATTACATAAACTCCACAACTTCACGGTTCCTTCCAATCAGTTCCAGTGT
>seq3
TTTTTGAGAGCTGGAACTATCTGGAGCATCAATTTTCCCAGGATTAGGGAATTGACATCTCT

Now use makeblastdb to create a local BLAST database from these DNA sequences:

makeblastdb -in sample_nucl.fasta -dbtype nucl -parse_seqids -out sample 

Building a new DB, current time: 07/30/2023 22:16:52
New DB name:   /home/renesh/lin_proj/atha_eg/temp/sample
New DB title:  sample_nucl.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3 sequences in 0.0218239 seconds.

Once makeblastdb completes successfully, you should see the database files sample.nhr, sample.nin, sample.nog, sample.nsd, sample.nsi, and sample.nsq (the exact set of files can vary with the BLAST version).
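
One way to confirm that a database was built correctly is to query it with blastdbcmd, the companion utility mentioned in the parameter table. The sketch below rebuilds a two-record database in a scratch directory (using seq1 and seq3 from the example file), prints its summary, and retrieves one record by ID. The scratch directory and the `checked` variable are assumptions made for illustration, and the BLAST steps are skipped if the tools are not on the PATH.

```shell
#!/bin/sh
# Sketch: build the example database in a scratch directory, then inspect it.
workdir=$(mktemp -d)
fasta="$workdir/sample_nucl.fasta"
db="$workdir/sample"

# Two records copied from the example FASTA file (seq1 and seq3)
cat > "$fasta" <<'EOF'
>seq1
TTCAGTTCCTCCATCTCTCTAAGCTGTTTTTCAGAAATGGTGTCTGGGTTGGAGACATCAAGA
>seq3
TTTTTGAGAGCTGGAACTATCTGGAGCATCAATTTTCCCAGGATTAGGGAATTGACATCTCT
EOF

if command -v makeblastdb >/dev/null 2>&1 && command -v blastdbcmd >/dev/null 2>&1; then
    makeblastdb -in "$fasta" -dbtype nucl -parse_seqids -out "$db"
    # Print a summary of the database (title, sequence count, total length)
    blastdbcmd -db "$db" -dbtype nucl -info
    # Fetch a single record by its ID; this relies on -parse_seqids
    blastdbcmd -db "$db" -dbtype nucl -entry seq3
    checked=yes
else
    echo "BLAST tools not found on PATH; skipping the database check"
    checked=no
fi
```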

Note: If your database is larger than the -max_file_sz limit (reported as "Maximum file size" in the output above; 1000000000B in this run), the BLAST database will be split into multiple volumes. For example, you should see database files like sample.00.nhr, sample.01.nhr, and so on.

You can use this formatted BLAST database to perform a local sequence search (e.g. blastn) with a FASTA file of query sequences as input.
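
As a rough illustration of such a search, the sketch below builds a one-record database from seq1 of the example file and searches it with a query that is a fragment of that same sequence. The names query.fasta and results.tsv are hypothetical, and tabular output (-outfmt 6) is just one convenient choice. The search is skipped if BLAST is not installed.

```shell
#!/bin/sh
# Sketch: search a query against a local nucleotide database with blastn.
# query.fasta and results.tsv are hypothetical file names.
workdir=$(mktemp -d)
fasta="$workdir/sample_nucl.fasta"
query="$workdir/query.fasta"
results="$workdir/results.tsv"

# Database record: seq1 from the example FASTA file
printf '>seq1\nTTCAGTTCCTCCATCTCTCTAAGCTGTTTTTCAGAAATGGTGTCTGGGTTGGAGACATCAAGA\n' > "$fasta"
# Query: a fragment taken from the start of seq1
printf '>query1\nTTCAGTTCCTCCATCTCTCTAAGCTGTTTTTCAGAAATGG\n' > "$query"

if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1; then
    makeblastdb -in "$fasta" -dbtype nucl -parse_seqids -out "$workdir/sample"
    # -db takes the database name given to -out, without any file extension
    blastn -query "$query" -db "$workdir/sample" -outfmt 6 -out "$results"
    cat "$results"
else
    echo "BLAST tools not found on PATH; skipping the search"
fi
```

The same pattern applies to a protein database, with -dbtype prot for makeblastdb and blastp in place of blastn.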

Create a BLAST database for protein sequences

For example, the sample_prot.fasta file contains the following protein sequences:

# example FASTA file (sample_prot.fasta)
>seq1
MERLNSKLYVENCYIMKENEKLRKKAELLNQENQQLLVQLKQKLSKANKNPNGSNNDNNVSSSSSASGKS
>seq2
KQKLSKANKNPNGSNNDNNVSSSSSASGKSNCYIMKENEKLRKKAELLNQENQQLL
>seq3
KLRKKAELLNQENQQLLVQLKQKLSKLVQLKQKLSKANKNPNGSNNDNNVSSSSNSKLYVENCYIMKEN

Now use makeblastdb to create a BLAST database from these protein sequences:

makeblastdb -in sample_prot.fasta -dbtype prot -parse_seqids -out sample 

Building a new DB, current time: 07/30/2023 22:25:49
New DB name:   /home/renesh/lin_proj/atha_eg/temp/sample
New DB title:  sample_prot.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3 sequences in 0.00608587 seconds.

Once makeblastdb completes successfully, you should see the database files sample.phr, sample.pin, sample.pog, sample.psd, sample.psi, and sample.psq (the exact set of files can vary with the BLAST version).

You can use this formatted BLAST database to perform a local sequence search (e.g. blastp) with a FASTA file of query sequences as input.

This work is licensed under a Creative Commons Attribution 4.0 International License