Calculate All Possible Combinations of DNA bases

Renesh Bedre

There are four nucleotide bases (A, T, G, and C) in a DNA sequence. In genomic analysis, you sometimes need to enumerate all possible combinations of nucleotide bases of a given length.

You can use the formula 4^n to calculate the number of possible combinations of DNA bases of a given length (where n is the length of the nucleotide base combination).

For example, there are 16 possible combinations of two nucleotide bases, 64 possible combinations of three nucleotide bases, and so on. The number of possible combinations grows exponentially with the sequence length.
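As a quick check of the 4^n formula, the following minimal sketch compares the formula with a brute-force count for a few lengths. The names `BASES` and `count_combinations` are illustrative and not part of any library.

```python
from itertools import product

BASES = "ATGC"  # the four DNA nucleotide bases


def count_combinations(n):
    """Return the number of length-n base combinations using the 4^n formula."""
    return len(BASES) ** n


for n in (2, 3, 4):
    # Count by explicit enumeration to confirm the formula
    enumerated = sum(1 for _ in product(BASES, repeat=n))
    print(n, count_combinations(n), enumerated)
# output
# 2 16 16
# 3 64 64
# 4 256 256
```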

You can use the following Python code to generate all possible combinations of nucleotide bases of a given length.

Example 1: Calculate all combinations of two nucleotide bases

# import package
from itertools import product

# calculate all combinations of two bases
comb = [''.join(b) for b in product("ATGC", repeat=2)]
print(comb)
# output
['AA', 'AT', 'AG', 'AC', 'TA', 'TT', 'TG', 'TC', 'GA', 'GT', 'GG', 'GC', 
 'CA', 'CT', 'CG', 'CC']

There are 16 possible combinations of two nucleotide bases.

Example 2: Calculate all combinations of three nucleotide bases

# import package
from itertools import product

# calculate all combinations of three bases
comb = [''.join(b) for b in product("ATGC", repeat=3)]
print(comb)
# output
['AAA', 'AAT', 'AAG', 'AAC', 'ATA', 'ATT', 'ATG', 'ATC', 'AGA', 'AGT', 'AGG', 'AGC', 'ACA', 'ACT', 'ACG', 'ACC', 
 'TAA', 'TAT', 'TAG', 'TAC', 'TTA', 'TTT', 'TTG', 'TTC', 'TGA', 'TGT', 'TGG', 'TGC', 'TCA', 'TCT', 'TCG', 'TCC', 
 'GAA', 'GAT', 'GAG', 'GAC', 'GTA', 'GTT', 'GTG', 'GTC', 'GGA', 'GGT', 'GGG', 'GGC', 'GCA', 'GCT', 'GCG', 'GCC', 
 'CAA', 'CAT', 'CAG', 'CAC', 'CTA', 'CTT', 'CTG', 'CTC', 'CGA', 'CGT', 'CGG', 'CGC', 'CCA', 'CCT', 'CCG', 'CCC']

There are 64 possible combinations of three nucleotide bases.

Similarly, there are 256 combinations of four bases, and the number keeps growing exponentially with the sequence length.
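Because the count grows exponentially, building the full list can use a lot of memory for longer lengths (for example, 4^10 = 1,048,576 combinations). One possible approach, sketched here with a hypothetical helper named `iter_combinations`, is to generate the combinations lazily and consume only what you need.

```python
from itertools import islice, product


def iter_combinations(n, bases="ATGC"):
    """Yield every length-n combination of bases one at a time (hypothetical helper)."""
    for b in product(bases, repeat=n):
        yield "".join(b)


# take only the first five combinations of length 10 without building the full list
first_five = list(islice(iter_combinations(10), 5))
print(first_five)
# output
# ['AAAAAAAAAA', 'AAAAAAAAAT', 'AAAAAAAAAG', 'AAAAAAAAAC', 'AAAAAAAATA']
```

Since `iter_combinations` is a generator, you can also loop over it directly and process each combination as it is produced.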





This work is licensed under a Creative Commons Attribution 4.0 International License
