Understanding E-value, bit score, and p value in BLAST

Renesh Bedre, Shreya Udawant

BLAST (Basic Local Alignment Search Tool) reports an E-value for assessing the significance of the similarity between two sequences, which in turn helps to infer homology.

E-value (Expect or Expectation Value) is a statistical value that estimates the number of alignments with a bit score equal to or greater than the observed score that would be expected by chance when comparing a query sequence with the target database.

E-value depends on the bit score (S’), query sequence length (m), and the size of the target database (n: sum of all sequence lengths).

BLAST E-value formula: E = m × n × 2^(-S’)
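
A minimal Python sketch of this formula, assuming the standard form E = m × n × 2^(-S’). The function name and the example search sizes are hypothetical. BLAST itself applies length corrections (effective lengths) before computing the E-value, so the reported numbers can differ slightly from this simplified calculation.

```python
def blast_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bit_score`.

    Simplified form E = m * n * 2^(-S'); uses the raw lengths rather than
    the effective lengths that BLAST computes internally.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 250-residue query against a database
# containing 1e9 residues in total, with a hit of 60 bits.
e = blast_evalue(bit_score=60.0, query_len=250, db_len=1_000_000_000)
print(f"E-value: {e:.3g}")
```

Doubling the database size or the query length doubles the E-value for the same bit score, which is why the same alignment can look less significant in a larger database.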

What is a bit score (S’)?

The bit score (S’) represents the quality of the alignment between the query sequence and a target sequence (a higher bit score represents higher similarity).

The bit score is preferred over the raw score (S) because it is normalized (adjusted for the scoring parameters, such as the substitution matrix and gap penalties), which makes it useful for comparing alignments obtained with different scoring parameters.
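
The normalization can be sketched as follows, assuming the usual Karlin-Altschul form S’ = (λS - ln K) / ln 2, where λ and K are statistical parameters that depend on the scoring system. The λ and K values below are ones commonly reported for gapped BLOSUM62 searches and are used here only for illustration; BLAST prints the values it actually used in its output.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative (assumed) Karlin-Altschul parameters.
LAMBDA, K = 0.267, 0.041

print(round(bit_score(100, LAMBDA, K), 1))
```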

E-value and bit score are important measures that help assess the significance of sequence alignments. The E-value decreases exponentially as bit score increases.
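
Because of this exponential relationship, each additional bit halves the E-value. The relationship can also be inverted to ask how many bits are needed to reach a given E-value. The sketch below assumes the simplified form E = m × n × 2^(-S’); the function name and search sizes are hypothetical.

```python
import math


def min_bit_score(evalue: float, query_len: int, db_len: int) -> float:
    """Bit score needed to reach `evalue`, from S' = log2(m * n / E)."""
    return math.log2(query_len * db_len / evalue)


# Hypothetical search: 250-residue query, 1e9-residue database.
for threshold in (1.0, 1e-5, 1e-10):
    bits = min_bit_score(threshold, 250, 1_000_000_000)
    print(f"E <= {threshold:g} needs about {bits:.1f} bits")
```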

Lower E-values indicate a more significant match that is less likely to have occurred by chance and could be indicative of a biological relationship between the query and target sequences.

For example, an E-value of 0.01 means that you would expect to find a match with at least the same score as the one obtained in the current alignment about once in every 100 searches of a random database of the same size. In other words, there is roughly a 1 in 100 chance that this alignment occurred purely by chance (which implies there could be a true biological relationship).

There is no ideal E-value cut-off for inferring homology. The default E-value cut-off (expect threshold) is 0.05 on the NCBI BLAST web interface, whereas the standalone BLAST+ programs default to 10. Generally, an E-value < 1e-05 (0.00001) is considered a significant match and provides high confidence for a homologous relationship. But you should always interpret the E-value cautiously and also consider other factors such as bit score, query coverage, percent similarity, number of significant alignments, and biological context.
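
As a rough sketch of combining several of these criteria, the snippet below filters BLAST tabular output (the 12 default columns of `-outfmt 6`) by E-value, bit score, and percent identity. The cut-off values, the function name, and the two example hits are hypothetical and should be adapted to the biological question.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-5, min_bitscore=50.0, min_pident=30.0):
    """Return the hits that pass all three (illustrative) thresholds."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    kept = []
    for row in reader:
        if (float(row["evalue"]) <= max_evalue
                and float(row["bitscore"]) >= min_bitscore
                and float(row["pident"]) >= min_pident):
            kept.append(row)
    return kept


# Two made-up hits standing in for a real results file.
example = io.StringIO(
    "q1\ts1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180\n"
    "q1\ts2\t28.0\t60\t43\t1\t5\t64\t10\t69\t0.5\t30.1\n"
)
print([hit["sseqid"] for hit in filter_hits(example)])  # prints ['s1']
```

With a real file, the same function can be called on an open file handle, for example `filter_hits(open("results.tsv"))`, where `results.tsv` is a placeholder name.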

Difference between E-value and p value in BLAST search?

Sometimes the E-value gets confused with the p value when assessing the significance of an alignment, but the two are different in the context of a BLAST search.

In BLAST, the p value represents the probability of obtaining at least one alignment with a bit score >= S’ by chance in a database search. The two measures are related by p = 1 - e^(-E).

BLAST uses the E-value as the standard metric for ranking alignments because E-values are easier to interpret when comparing alignments.

For example, alignments with E-values of 5 and 10 are easier to compare than the corresponding p values of 0.993 and 0.99995.

The E-value and the p value are nearly identical when the E-value is < 0.01. The E-value can range from 0 to infinity (most observed E-values fall between 0 and 1), whereas the p value ranges from 0 to 1.
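
The numbers above can be checked with a short sketch, assuming the usual relationship p = 1 - e^(-E) for the probability of at least one chance alignment. The function name is hypothetical.

```python
import math


def evalue_to_pvalue(evalue: float) -> float:
    """p = 1 - exp(-E); expm1 keeps precision for very small E-values."""
    return -math.expm1(-evalue)


for e in (0.001, 0.01, 1, 5, 10):
    print(f"E = {e:<6} p = {evalue_to_pvalue(e):.5f}")
```

For E-values of 0.01 and below, the two numbers agree to within about half a percent, which is why they are often treated as interchangeable for small values.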


This work is licensed under a Creative Commons Attribution 4.0 International License
