ENTM201L - General Entomology Laboratory | UC Riverside
After Sanger sequencing returns your mosquito COI sequences, you have genetic barcodes that identify species. But DNA sequences contain far more information than species names. They record evolutionary history: the branching pattern of descent from common ancestors.
Phylogenetic trees (also called phylogenies or evolutionary trees) visualize these relationships.
In ENTM201L, you will build a phylogenetic tree from your class COI sequences plus reference sequences from GenBank, using widely used software (MAFFT for alignment, IQ-TREE for tree building). This connects your lab work to real-world applications in vector surveillance, invasion biology, and evolutionary research.
A phylogenetic tree is a branching diagram showing evolutionary relationships among organisms or genes.
Basic components:
                ┌─── Aedes albopictus (USA)
          ┌─────┤
          │     └─── Aedes albopictus (Japan)
    ┌─────┤
    │     │     ┌─── Aedes aegypti (Brazil)
    │     └─────┤
────┤           └─── Aedes aegypti (Florida)
    │
    │     ┌───────── Culex pipiens
    └─────┤
          └───────── Culex tarsalis
Key terms:
| Term | Definition | Example |
|---|---|---|
| Tips (leaves) | Terminal nodes representing species/samples | Aedes albopictus (USA) |
| Branches | Lines connecting nodes, represent evolutionary change | Horizontal/diagonal lines |
| Nodes (internal) | Branching points representing common ancestors | Where Ae. albopictus USA & Japan connect |
| Root | Base of tree, most recent common ancestor of all tips | Leftmost point where all branches connect |
| Clade | Group consisting of a common ancestor and all of its descendants | All Aedes species = one clade |
| Sister taxa | Two closest relatives sharing immediate common ancestor | Ae. albopictus USA & Japan |
| Outgroup | Distantly related species used to root tree | Culex (if analyzing Aedes) |
Branch lengths are measured in substitutions per site:
- 0.01 = 1% divergence (1 substitution per 100 bp on average)
- 0.10 = 10% divergence
Typical ranges:
- Within species: 0.001-0.02 (0.1-2% divergence)
- Between species (same genus): 0.05-0.15 (5-15%)
- Between genera: 0.20-0.30 (20-30%)
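To make the divergence percentages above concrete, here is a minimal Python sketch that computes the uncorrected proportion of differing sites (the p-distance) between two aligned sequences. The function name and the toy sequences are illustrative assumptions. Note also that branch lengths in an ML tree are model-corrected, so they are usually somewhat larger than the raw p-distance.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, so the result is differences / compared sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy 20 bp alignment (hypothetical data) with one mismatch: 1/20 = 0.05
seq_1 = "ATCGATCGAATTGGCCAATT"
seq_2 = "ATCGATCGAATTGGCCAATA"
print(f"{p_distance(seq_1, seq_2):.2f}")  # prints 0.05, i.e. 5% divergence
```

A value of 0.05 would sit at the low end of the between-species range listed above.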
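Without alignment:
Sequence 1: ATCGATCGAA...
Sequence 2: ATCGTCGAA... (deleted one A)
Sequence 3: ATCGAATCGAA... (inserted one A)
Read position by position, the sequences fall out of register after the indel: the T at position 6 in Sequence 1 sits at position 5 in Sequence 2 and at position 7 in Sequence 3, and nothing in the raw sequences tells you so.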
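With alignment (gaps inserted to maintain positional homology):
Sequence 1: ATCGA-TCGAA (gap opposite the insertion in Sequence 3)
Sequence 2: ATCG--TCGAA (deletion, plus the gap opposite the insertion)
Sequence 3: ATCGAATCGAA (insertion)
            ||||**|||||
Positions are now comparable: every row has the same length and each column holds homologous sites.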
Alignment principle: Insert gaps (indels) to maximize similarity across sequences, ensuring each column represents one ancestral site.
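The column-by-column view of an alignment can be sketched in a few lines of Python. The example below assumes the three toy sequences have been padded with gaps to the same length (the padded strings are part of this sketch, not MAFFT output) and marks each column as fully conserved (`|`) or as containing a gap or mismatch (`*`).

```python
def column_markers(aligned: list[str]) -> str:
    """Return one marker per alignment column.

    '|' = every sequence has the same base in this column
    '*' = the column contains a gap or more than one base
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    markers = []
    for column in zip(*aligned):  # iterate over columns, not rows
        if "-" in column or len(set(column)) > 1:
            markers.append("*")
        else:
            markers.append("|")
    return "".join(markers)


# Toy alignment of the three example sequences (gap placement assumed here)
aligned = [
    "ATCGA-TCGAA",
    "ATCG--TCGAA",
    "ATCGAATCGAA",
]
print(column_markers(aligned))  # prints ||||**|||||
```

The length check matters in practice: if the rows differ in length, the file is not a valid alignment and tree-building software will reject it.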
Progressive alignment, the approach MAFFT builds on, runs in four steps:
1. Pairwise distance calculation: Compare all sequences to each other
2. Guide tree construction: Build rough tree showing approximate relationships
3. Progressive alignment: Align most similar sequences first, gradually add more distant sequences
4. Refinement: Iteratively improve alignment by realigning subgroups
Basic MAFFT command:
mafft --auto input.fasta > aligned.fasta
Alignment strategies:
- --auto: MAFFT selects an algorithm based on dataset size (recommended)
- --localpair: More accurate for divergent sequences (slower)
- --retree 2: Fast, good for large datasets
Quality checks after alignment:
1. Translate to protein: COI codes for cytochrome c oxidase subunit I
- Should have no stop codons (except terminal TAA)
- Conserved amino acids (e.g., active site residues) should align
2. Check for NUMTs: Nuclear mitochondrial pseudogenes often have stop codons or frameshifts
- If present, exclude from phylogeny
3. Trim poorly aligned regions: Remove first/last 10-20 bp if ambiguous
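The stop-codon check in step 1 can be sketched without any external library. This example assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are stop codons while TGA encodes tryptophan. The function names and test sequences are hypothetical.

```python
# Stop codons in the invertebrate mitochondrial code (NCBI table 5).
# Unlike the standard code, TGA encodes tryptophan here and is NOT a stop.
INVERT_MITO_STOPS = {"TAA", "TAG"}


def internal_stops(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based positions (in the ungapped sequence) of in-frame stops."""
    clean = seq.upper().replace("-", "")  # drop alignment gaps before reading codons
    positions = []
    for i in range(frame, len(clean) - 2, 3):
        if clean[i:i + 3] in INVERT_MITO_STOPS:
            positions.append(i)
    return positions


def cleanest_frame(seq: str) -> int:
    """Reading frame (0, 1, or 2) with the fewest stop codons."""
    return min(range(3), key=lambda frame: len(internal_stops(seq, frame)))


# Hypothetical fragments: the first reads GGA ACA TAA TAT TTT in frame 0
print(internal_stops("GGAACATAATATTTT"))  # prints [6]
print(internal_stops("GGATGAACA"))        # prints [] because TGA is not a stop here
```

A barcode fragment usually does not start at codon position 1, so checking all three frames with `cleanest_frame` before flagging a sequence as a possible NUMT is a reasonable precaution.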
Maximum likelihood (ML) tree search:
1. Propose a tree topology
2. Calculate likelihood: P(data | tree, model)
3. Optimize branch lengths to maximize likelihood
4. Try different topologies, keep best one
5. Repeat until convergence
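Steps 2 and 3 can be illustrated on the smallest possible case: two sequences joined by a single branch. The sketch below assumes the Jukes-Cantor (JC69) model, which is far simpler than the models IQ-TREE typically selects, and uses a plain grid search instead of a real optimizer. The counts (650 sites, 33 differences) are made-up illustration values.

```python
import math


def jc69_log_likelihood(d: float, n_sites: int, n_diff: int) -> float:
    """log P(data | branch length d) for two aligned sequences under JC69."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a base is unchanged
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    n_same = n_sites - n_diff
    # Each site contributes (stationary frequency 1/4) x (transition probability)
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)


def ml_distance(n_sites: int, n_diff: int, step: float = 0.0001,
                max_d: float = 2.0) -> float:
    """Branch length that maximizes the likelihood, found by grid search."""
    candidates = (k * step for k in range(1, int(max_d / step) + 1))
    return max(candidates, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))


n_sites, n_diff = 650, 33                               # hypothetical counts
p = n_diff / n_sites                                    # raw p-distance
closed_form = -0.75 * math.log(1.0 - 4.0 * p / 3.0)     # JC69 analytic estimate

print(f"raw p-distance:   {p:.4f}")
print(f"grid-search ML:   {ml_distance(n_sites, n_diff):.4f}")
print(f"closed-form JC69: {closed_form:.4f}")
```

The grid search and the closed-form JC69 formula should agree closely, and both exceed the raw p-distance because the model accounts for multiple substitutions at the same site. A full ML program repeats this kind of optimization for every branch of every candidate topology.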
Not all nucleotide changes are equally likely. Transitions (A↔G, C↔T) occur more frequently than transversions (A↔C, A↔T, G↔C, G↔T). Substitution models account for this difference.
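A short Python sketch shows how transitions and transversions can be counted between two aligned sequences. The function name and example sequences are illustrative assumptions; gaps and ambiguous bases are skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions


# Toy example: A/G, C/T, G/A are transitions; T/A is a transversion
print(count_ts_tv("AACCGGTT", "GACTGATA"))  # prints (3, 1)
```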
Example ModelFinder result (-m MFP flag):
Best-fit model: TN+F+I+G4
TN = Tamura-Nei (different transition rates for A↔G vs. C↔T)
+F = Empirical base frequencies (unequal, counted from the alignment)
+I = Proportion of invariable sites
+G4 = Gamma distribution of rates (4 categories)
Base frequencies:
A=29.8%, C=15.4%, G=16.1%, T=38.7% (AT-rich, typical for insect mtDNA)
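Base frequencies like those above can be counted directly from an alignment. The following is a minimal sketch with hypothetical toy sequences. It ignores gaps and ambiguous bases, which is an assumption of this example and not necessarily how IQ-TREE handles them.

```python
from collections import Counter


def base_frequencies(sequences: list[str]) -> dict[str, float]:
    """Fraction of A, C, G, and T across all sequences (gaps and N ignored)."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: counts[base] / total for base in "ACGT"}


# Toy alignment (hypothetical data)
freqs = base_frequencies(["AATT-ACGTT", "ATTTNACGTA"])
for base, value in freqs.items():
    print(f"{base}={value:.1%}")
print(f"AT content: {freqs['A'] + freqs['T']:.1%}")
```

A high AT content in the output is what the "AT-rich" note above refers to.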
Bootstrap procedure:
1. Take your alignment (e.g., 650 bp)
2. Resample with replacement: Randomly select 650 positions (some sampled multiple times, some not at all)
3. Build tree from resampled alignment
4. Repeat 1,000 times
5. Count how often each clade appears in the 1,000 bootstrap trees
6. Bootstrap value = % of trees containing that clade
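Steps 2, 5, and 6 can be sketched in Python. The tree-building step is left to external software, so the `replicate_clades` list below is made-up illustration data, and all names are assumptions of this sketch. Note that IQ-TREE's ultrafast bootstrap is an approximation and does not run this procedure literally.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (same columns for every row)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def support_percent(clade: set[str], replicate_clades: list[set[frozenset]]) -> float:
    """Percentage of replicate trees that contain the given clade."""
    hits = sum(1 for clades in replicate_clades if frozenset(clade) in clades)
    return 100.0 * hits / len(replicate_clades)


alignment = {                       # toy 10 bp alignment (hypothetical)
    "Ae_albopictus_USA":   "ATCGATCGAA",
    "Ae_albopictus_Japan": "ATCGATCGAT",
    "Ae_aegypti_Brazil":   "ATTGACCGAA",
}
replicate = bootstrap_alignment(alignment, random.Random(42))
for name, seq in replicate.items():
    print(name, seq)

# Pretend four replicate trees were built; three contain the albopictus clade
albopictus = {"Ae_albopictus_USA", "Ae_albopictus_Japan"}
replicate_clades = [{frozenset(albopictus)}] * 3 + [set()]
print(support_percent(albopictus, replicate_clades))  # prints 75.0
```

Sampling the same column indices for every sequence is the key detail: resampling each row independently would destroy the alignment.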
Example tree with bootstrap values:
                ┌─── Ae. albopictus (USA)
          ┌─────┤ 99%
          │     └─── Ae. albopictus (Japan)
    ┌─────┤
    │     │     ┌─── Ae. aegypti (Brazil)
────┤     └─────┤ 95%
    │ 85%       └─── Ae. aegypti (Florida)
    │
    └───────────── Culex pipiens (outgroup)
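Interpretation: the Ae. albopictus clade appears in 99% of bootstrap trees, the Ae. aegypti clade in 95%, and the clade uniting both Aedes species in 85%.
In IQ-TREE, the -bb 1000 flag requests 1,000 ultrafast bootstrap replicates.
These three trees are identical: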
Tree 1:             Tree 2:             Tree 3:
        ┌─── A              ┌─── B          ┌─────── C
    ┌───┤               ┌───┤           ────┤
    │   └─── B          │   └─── A          │   ┌─── A
────┤               ────┤                   └───┤
    └─────── C          └─────── C              └─── B
All three show: (A,B) are sisters, and C is outgroup. Order of tips does not matter.
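The point that tip order does not matter can be checked programmatically. In this sketch a rooted tree is written as nested Python tuples (a representation chosen for illustration), and each node is converted into an unordered set of its children, so rotating a node leaves the result unchanged.

```python
def canonical(tree):
    """Convert a nested-tuple tree into an order-independent form."""
    if isinstance(tree, str):          # a tip
        return tree
    return frozenset(canonical(child) for child in tree)


rotated_1 = (("A", "B"), "C")   # A and B sisters, C outside
rotated_2 = (("B", "A"), "C")   # same tree, A/B node rotated
rotated_3 = ("C", ("A", "B"))   # same tree, root rotated
different = ("A", ("B", "C"))   # B and C sisters: a different topology

print(canonical(rotated_1) == canonical(rotated_2) == canonical(rotated_3))  # True
print(canonical(rotated_1) == canonical(different))                          # False
```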
Implication: Do not assume species listed next to each other are more closely related than distant ones. Only branching pattern matters.
Unrooted tree:
        A
        |
    ────┼────
    |       |
    B       C
Did A diverge first, or B, or C? Cannot tell.
Rooting with outgroup: Add distantly related species known to be outside your group of interest.
Outgroup (Culex) ────┬────┬─── A (Ae. albopictus USA)
                     │    └─── B (Ae. albopictus Japan)
                     │
                     └──────── C (Ae. aegypti)
Now clear: Outgroup diverged first, then A&B lineage split from C, then A and B split.
Choosing outgroup: for an Aedes dataset, use a more distantly related mosquito genus such as Culex or Anopheles.
# Align sequences with MAFFT
mafft --auto mosquito_sequences.fasta > aligned.fasta
# Build ML tree with IQ-TREE
iqtree -s aligned.fasta -m MFP -bb 1000 -nt AUTO
# Parameters explained:
# -s aligned.fasta: Input alignment file
# -m MFP: Auto model selection (ModelFinder Plus)
# -bb 1000: Ultrafast bootstrap with 1000 replicates
# -nt AUTO: Automatically choose the number of CPU threads
Output files:
- aligned.fasta.treefile: Best ML tree (Newick format)
- aligned.fasta.iqtree: Report file (models tested, tree stats, bootstrap values)
- aligned.fasta.log: Progress log
Example with Docker:
docker run -v "$(pwd)":/data iqtree/iqtree -s /data/aligned.fasta -m MFP -bb 1000
Example with Apptainer (common on HPC clusters):
apptainer exec iqtree.sif iqtree -s aligned.fasta -m MFP -bb 1000
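Newick format example:
((A:0.1,B:0.15)85:0.05,C:0.25);
Interpretation:
- A:0.1 = tip A with branch length 0.1
- (A:0.1,B:0.15) = A and B are sisters
- 85 = bootstrap value for the node joining A and B
- C:0.25 = outgroup C
FigTree opens the .treefile and displays the tree graphically.
iTOL color annotation file (fields are tab-separated in the actual file):
TREE_COLORS
SEPARATOR TAB
DATA
Ae_albopictus_USA range #FF0000 USA
Ae_albopictus_Japan range #0000FF Japan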
Result: USA samples red, Japan samples blue, immediately shows geographic clustering.
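To see what a tree viewer reads from a .treefile, here is a minimal Python sketch that pulls tip branch lengths and node support values out of the small Newick string shown earlier. The regular expressions are a simplification that assumes plain tip names and numeric support labels. For real trees, a dedicated parser (for example Biopython's Bio.Phylo) is the safer choice.

```python
import re

newick = "((A:0.1,B:0.15)85:0.05,C:0.25);"

# Tip entries look like NAME:LENGTH, where NAME starts with a letter or underscore
tip_lengths = {
    name: float(length)
    for name, length in re.findall(r"([A-Za-z_][\w.]*):([0-9.]+)", newick)
}

# Support values are the numbers that directly follow a closing parenthesis
supports = [float(value) for value in re.findall(r"\)(\d+(?:\.\d+)?):", newick)]

print(tip_lengths)  # prints {'A': 0.1, 'B': 0.15, 'C': 0.25}
print(supports)     # prints [85.0]
```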
1. Sample collection: Collect Ae. albopictus from 50 sites across USA (California, Texas, Florida, New York)
2. DNA extraction + sequencing: COI barcoding (712 bp)
3. Reference sequences: Download 100 Ae. albopictus COI from GenBank (Asia, Europe, South America)
4. Alignment: MAFFT with 150 sequences
5. Phylogenetic tree: IQ-TREE with GTR+I+G model, 1,000 bootstrap replicates
6. Visualization: iTOL, color by geographic origin
Results (hypothetical but based on real studies):
                ┌─── USA West Coast (CA, WA)
          ┌─────┤ 99%
          │     └─── Japan (Tokyo, Osaka)
          │
    ┌─────┤     ┌─── USA East Coast (FL, NY)
    │     └─────┤ 95%
    │ 78%       └─── Southern China (Guangzhou)
────┤
    │           ┌─── Europe (Italy, France)
    │     ┌─────┤ 97%
    └─────┤     └─── Northern China (Beijing)
          │ 88%
          └───────── South America (Brazil, Argentina)
Interpretation:
1. Multiple introductions: USA populations are not monophyletic (West Coast clusters with Japan, East Coast with Southern China)
2. Source populations identified: West Coast most likely introduced from Japan (99% bootstrap), East Coast from Southern China (95% bootstrap)
3. Invasion routes: Likely via used tire shipments from specific Asian ports
4. No evidence of USA-Europe connection: European populations cluster with Northern China, independent from USA
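The monophyly argument in point 1 can be tested mechanically. This sketch encodes the hypothetical invasion tree as nested tuples (tip names shortened for illustration) and asks whether a set of tips corresponds exactly to one clade of the rooted tree.

```python
def tips(tree) -> frozenset:
    """All tip names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def clades(tree) -> set:
    """Tip sets of every node in the tree (tips and root included)."""
    result = {tips(tree)}
    if not isinstance(tree, str):
        for child in tree:
            result |= clades(child)
    return result


def is_monophyletic(tree, group) -> bool:
    """True if the group is exactly the set of tips under some node."""
    return frozenset(group) in clades(tree)


# Hypothetical tree from the case study, written as nested tuples
invasion_tree = (
    (("USA_West", "Japan"), ("USA_East", "S_China")),
    (("Europe", "N_China"), "S_America"),
)

print(is_monophyletic(invasion_tree, {"USA_West", "USA_East"}))  # False
print(is_monophyletic(invasion_tree, {"USA_West", "Japan"}))     # True
```

The first result is the formal version of "USA populations are not monophyletic": no single node has only the two USA groups as descendants.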
Example: Culex pipiens complex
          ┌─── Cx. p. pipiens (Europe, outdoor)
    ┌─────┤ 65%
    │     └─── Cx. p. molestus (Europe, basements)
────┤
    │     ┌─── Cx. p. quinquefasciatus (USA, tropics)
    └─────┤ 85%
      78% └─── Cx. p. pipiens (USA, outdoor)
Problem: Bootstrap support is weak for key groupings (65% and 78%, both below 80%), and Cx. p. pipiens does not form a single clade, so the forms are not clearly separated by COI.
Solution: Use additional markers.
1. Phylogenetic Inference:
- Felsenstein, J. (1981). Evolutionary trees from DNA sequences: A maximum likelihood approach. Journal of Molecular Evolution 17(6): 368-376. https://doi.org/10.1007/BF01734359
- Yang, Z., & Rannala, B. (2012). Molecular phylogenetics: Principles and practice. Nature Reviews Genetics 13(5): 303-314. https://doi.org/10.1038/nrg3186
2. IQ-TREE Software:
- Nguyen, L. T., et al. (2015). IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution 32(1): 268-274. https://doi.org/10.1093/molbev/msu300
- Kalyaanamoorthy, S., et al. (2017). ModelFinder: Fast model selection for accurate phylogenetic estimates. Nature Methods 14: 587-589. https://doi.org/10.1038/nmeth.4285
3. Bootstrap Methods:
- Felsenstein, J. (1985). Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39(4): 783-791. https://doi.org/10.2307/2408678
- Hoang, D. T., et al. (2018). UFBoot2: Improving the ultrafast bootstrap approximation. Molecular Biology and Evolution 35(2): 518-522. https://doi.org/10.1093/molbev/msx281
4. MAFFT:
- Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30(4): 772-780. https://doi.org/10.1093/molbev/mst010
5. Alignment Quality:
- Sievers, F., & Higgins, D. G. (2018). Clustal Omega for making accurate alignments of many protein sequences. Protein Science 27(1): 135-145. https://doi.org/10.1002/pro.3290
6. DNA Barcoding and Phylogeography:
- Kamgang, B., et al. (2011). Geographic and ecological distribution of the dengue and chikungunya virus vectors Aedes aegypti and Aedes albopictus in three major Cameroon towns. Medical and Veterinary Entomology 25(2): 132-141.
- Paupy, C., et al. (2009). Comparative role of Aedes albopictus and Aedes aegypti in the emergence of Dengue and Chikungunya in Central Africa. Vector-Borne and Zoonotic Diseases 9(6): 585-595.
7. Aedes albopictus Invasion Genetics:
- Goubert, C., et al. (2016). Population genetics of Aedes albopictus invading populations. PLoS ONE 11(1): e0147673. https://doi.org/10.1371/journal.pone.0147673
- Kotsakiozi, P., et al. (2017). Population genomics of the Asian tiger mosquito, Aedes albopictus: Insights into the recent worldwide invasion. Ecology and Evolution 7(23): 10143-10157.
8. Cryptic Species and COI Barcoding Limitations:
- Cywinska, A., et al. (2006). Identifying Canadian mosquito species through DNA barcodes. Medical and Veterinary Entomology 20(4): 413-424.
- Ratnasingham, S., & Hebert, P. D. N. (2013). A DNA-based registry for all animal species: The Barcode Index Number (BIN) system. PLoS ONE 8(7): e66213.
9. FigTree: http://tree.bio.ed.ac.uk/software/figtree/
10. iTOL: Letunic, I., & Bork, P. (2021). Interactive Tree Of Life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Research 49(W1): W293-W296. https://doi.org/10.1093/nar/gkab301
11. ggtree: Yu, G., et al. (2017). ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 8(1): 28-36. https://doi.org/10.1111/2041-210X.12628
DNA sequences contain information about ancestry. Trees visualize this as a branching pattern of descent from common ancestors.
Unrooted trees show relationships but not ancestry. Outgroup (distantly related species) roots the tree, revealing which lineages are ancestral vs. derived.
Phylogenetics informs vector surveillance, invasion biology, and evolutionary research.
In lab, you will:
1. Collect sequences:
- Your trimmed COI sequences from Sanger sequencing
- Classmates' sequences (different Aedes samples)
- Reference sequences from GenBank (outgroups, known species)
2. Multiple sequence alignment:
- Use MAFFT (via web server or command line)
- Export aligned FASTA file
3. Build phylogenetic tree:
- IQ-TREE with automatic model selection (-m MFP)
- Ultrafast bootstrap (-bb 1000)
- Examine .iqtree file (best model, log-likelihood)
4. Visualize tree:
- Load .treefile into FigTree
- Display bootstrap values
- Root with outgroup (Culex or Anopheles)
- Color tips by species or geographic origin
5. Interpret results:
- Are your samples monophyletic (cluster together)?
- What is % divergence within species? Between species?
- Do bootstrap values support major clades?
- Can you infer invasion routes or population structure?
6. Report findings:
- Include tree figure in lab report
- Discuss bootstrap support for key nodes
- Compare your results to published phylogenies
Remember: Phylogenetics is the capstone of your molecular workflow. It transforms individual DNA sequences into evolutionary insights, connecting your lab work to global scientific questions about mosquito evolution, invasion biology, and disease transmission.
Citation: Nguyen, L.-T., Schmidt, H. A., von Haeseler, A., & Minh, B. Q. (2015). IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular Biology and Evolution, 32(1), 268-274.
DOI: 10.1093/molbev/msu300 | PMID: 25371430
Introduced IQ-TREE, a fast stochastic maximum likelihood phylogenetic inference tool. Features novel tree search algorithm using stochastic perturbations, often finding trees with higher likelihoods compared to RAxML and PhyML. Significantly faster than competing ML methods while maintaining or improving accuracy. Built-in ModelFinder for automatic substitution model selection.
Citation: Trifinopoulos, J., Nguyen, L.-T., von Haeseler, A., & Minh, B. Q. (2016). W-IQ-TREE: A fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Research, 44(W1), W232-W235.
DOI: 10.1093/nar/gkw256 | PMID: 27084950
Web-based interface making IQ-TREE accessible without command-line experience. Automated workflow handles sequence alignment, model selection, tree inference, and bootstrapping. Ideal for teaching environments and researchers without bioinformatics background.
Citation: Minh, B. Q., Schmidt, H. A., Chernomor, O., Schrempf, D., Woodhams, M. D., von Haeseler, A., & Lanfear, R. (2020). IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution, 37(5), 1530-1534.
DOI: 10.1093/molbev/msaa015 | PMID: 32011700
Major software update with enhanced features for phylogenomics. New models include mixture models, heterotachy, and rate heterogeneity across sites. Introduced concordance factors for assessing phylogenetic signal. Better scalability for genome-scale datasets.
IQ-TREE is ideal for DNA barcoding phylogenetics because: (1) fast enough for classroom use (minutes, not hours), (2) automatic model selection via ModelFinder eliminates need to understand complex substitution models, (3) ultrafast bootstrap provides 100-1000 replicates in reasonable time, (4) user-friendly with relatively simple command-line syntax, (5) well-documented with extensive tutorials, and (6) actively maintained with regular updates. Performance benchmarks show IQ-TREE finds higher likelihood trees than RAxML in 87.1% of cases while being 2-10x faster than PhyML.