Concatenate and split VCF files

Renesh Bedre

What is a VCF file?

  • VCF stands for variant call format
  • It is a text file (with the .vcf file extension) that stores meta-information, marker, and genotype data for genetic variants
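
To make the file layout concrete, here is a minimal sketch that separates the three kinds of lines in a VCF file: meta-information lines (starting with `##`), the column header line (starting with `#CHROM`), and the tab-separated variant records. The sample content and the function name `classify_vcf_lines` are made up for illustration and are not part of bioinfokit.

```python
# Tiny made-up VCF content, for illustration only (fields are tab-separated)
vcf_text = """##fileformat=VCFv4.2
##source=example
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1
chr2\t200\t.\tC\tT\t60\tPASS\t.\tGT\t1/1
"""


def classify_vcf_lines(text):
    """Split VCF text into meta lines, header columns, and variant records."""
    meta, header, records = [], None, []
    for line in text.splitlines():
        if line.startswith("##"):
            # Meta-information line, e.g. file format or INFO definitions
            meta.append(line)
        elif line.startswith("#CHROM"):
            # Column header: eight fixed columns, then FORMAT and sample names
            header = line.split("\t")
        elif line.strip():
            # Variant record: one row per marker, genotypes in the sample columns
            records.append(line.split("\t"))
    return meta, header, records


meta, header, records = classify_vcf_lines(vcf_text)
print(len(meta), "meta lines;", len(records), "records; samples:", header[9:])
```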

How to merge multiple VCF files?

Sometimes it is necessary to concatenate several VCF files for analysis because the genotype information is stored across multiple files (for example, when you have a separate VCF file for each chromosome).

# I am using the interactive Python interpreter (Python 3.7.4)
# Go to the directory where all the VCF files are stored. Make sure all files are uncompressed.
# Make sure the VCF files are uniform, for example, multiple VCF files
# generated from the same source dataset
>>> from bioinfokit.analys import marker
# Concatenate the VCF files. Provide multiple VCF files as a single comma-separated string.
>>> marker.concatvcf("file_1.vcf,file_2.vcf,file_3.vcf,file_4.vcf")
# The merged VCF file will be stored in the same directory (concat_vcf.vcf)
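
For readers curious about what concatenation involves, the following is a rough sketch in plain Python. It is not bioinfokit's implementation. It assumes the input files are uncompressed and share the same header and sample columns, keeps the header lines of the first file only, and appends the variant records of every file in the order given. The function name `concat_vcf_sketch` and the demo file names are hypothetical.

```python
import os
import tempfile


def concat_vcf_sketch(paths, out_path):
    """Write the header of the first file, then the records of every file."""
    with open(out_path, "w") as out:
        for i, path in enumerate(paths):
            with open(path) as handle:
                for line in handle:
                    if not line.endswith("\n"):
                        # Guard against a missing newline at the end of a file
                        line += "\n"
                    if line.startswith("#"):
                        # Header and meta lines: copy them from the first file only
                        if i == 0:
                            out.write(line)
                    elif line.strip():
                        out.write(line)


# Demo with two tiny made-up files written to a temporary directory
header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
with tempfile.TemporaryDirectory() as tmp:
    file_1 = os.path.join(tmp, "chr1.vcf")
    file_2 = os.path.join(tmp, "chr2.vcf")
    with open(file_1, "w") as f:
        f.write(header + "chr1\t100\t.\tA\tG\t50\tPASS\t.\n")
    with open(file_2, "w") as f:
        f.write(header + "chr2\t200\t.\tC\tT\t60\tPASS\t.\n")

    merged = os.path.join(tmp, "merged.vcf")
    concat_vcf_sketch([file_1, file_2], merged)
    with open(merged) as f:
        merged_lines = f.read().splitlines()

print("\n".join(merged_lines))
```

This sketch does not check that the headers actually match, which is why uniform input files matter.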

Split VCF file by chromosome

Split a single VCF file containing variants for all chromosomes into individual VCF files, one for each chromosome.

>>> from bioinfokit.analys import marker
>>> marker.splitvcf(file="file.vcf")
# One VCF file per chromosome will be saved in the same directory
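
The splitting step can be sketched in plain Python as well. Again, this is an illustrative sketch under simple assumptions and not how bioinfokit does it: every header line is repeated in each output, and records are grouped by the value in the first (CHROM) column. The function name `split_vcf_text_by_chrom` and the sample data are hypothetical, and the sketch works on text in memory instead of writing files.

```python
from collections import defaultdict


def split_vcf_text_by_chrom(text):
    """Return a dict mapping each chromosome to its own VCF text."""
    header_lines = []
    records = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("#"):
            # Meta and column header lines are shared by every output
            header_lines.append(line)
        elif line.strip():
            # The first tab-separated field is the chromosome name
            chrom = line.split("\t", 1)[0]
            records[chrom].append(line)
    return {
        chrom: "\n".join(header_lines + lines) + "\n"
        for chrom, lines in records.items()
    }


# Made-up example with two chromosomes
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t150\t.\tT\tC\t40\tPASS\t.\n"
    "chr2\t200\t.\tC\tT\t60\tPASS\t.\n"
)
parts = split_vcf_text_by_chrom(vcf_text)
for chrom, content in parts.items():
    print(chrom, "->", len(content.splitlines()) - 2, "record(s)")
```

Each value in `parts` could then be written to its own file, for example one named after the chromosome.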

This work is licensed under a Creative Commons Attribution 4.0 International License