CCCTC-binding factor variants (2024)

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/672,682, filed on May 17, 2018 and U.S. Provisional Patent Application Ser. No. 62/828,277, filed on Apr. 2, 2019. The entire contents of the foregoing are hereby incorporated by reference.

This invention was made with Government support under Grant No. GM118158 awarded by the National Institutes of Health. The Government has certain rights in the invention.

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jun. 20, 2019, is named 29539-0339WO1 SL.txt and is 1,104,397 bytes in size.

The invention relates, at least in part, to engineered CCCTC-binding factor variants with altered DNA-binding specificities.

CCCTC-binding factor (CTCF) is a multi-domain protein that acts as an essential genome organizer by maintaining higher-order chromatin structure while also having a role in cell differentiation and the promotion or repression of gene expression (Ong and Corces, Nature Reviews Genetics (2014); Phillips and Corces, Cell (2009)). CTCF maintains topologically associated domains (TADs) spanning megabases of the genome as well as smaller-scale sub-TADs, leading to fine-tuned gene insulation or gene activation within gene clusters (Ali et al., Current Opinion in Genetics & Development (2016); Nora et al., Nature (2012); Rao et al., Cell (2014)). In addition, CTCF has been found to regulate mRNA splicing by influencing the rate of transcription and has more recently been implicated in promoting homologous recombination repair at double-strand breaks (Shukla et al., Nature (2011); Hilmi et al., Science Advances (2017); Han et al., Scientific Reports (2016)). CTCF binds throughout the genome via an 11-finger zinc finger (ZF) array that recognizes CTCF binding sites (CBSs). The CBS is typically 40 bp in length with a highly conserved 15 bp core sequence.

The present invention is based, at least in part, on the development of engineered CTCF variants that can bind to mutant CBSs with higher affinity than a wild-type CTCF.

The present invention relates to an engineered CCCTC-binding factor (CTCF) variant including at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF, where the engineered CTCF variant binds to a mutant CTCF binding sequence (CBS) with a higher affinity than wild-type CTCF, the mutant CBS including at least one nucleotide base that differs in sequence from the nucleotide sequence of a consensus CBS, where the at least one amino acid residue that differs in sequence from the amino acid sequence of a wild-type CTCF is selected from the group consisting of the amino acid residues at the position(s) −1, +1, +2, +3, +5, and +6 of any of ZF7, ZF6, ZF5, ZF4, and ZF3 of the engineered CTCF variant.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CTCF binding sequence (CBS) that has a Thymine (T), Adenine (A), or Guanine (G) residue at position 2 of the consensus CBS motif, the engineered CTCF including an amino acid residue threonine, asparagine, or histidine at ZF7 position +3.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a G residue at position 2 of the consensus CBS motif, the engineered CTCF including the amino acid sequence DHLQT (SEQ ID NO: 8), EHLNV (SEQ ID NO: 9), AHLQV (SEQ ID NO: 10), EHLRE (SEQ ID NO: 11), DHLQV (SEQ ID NO: 12), EHLKV (SEQ ID NO: 13), EHLVV (SEQ ID NO: 15), DHLRT (SEQ ID NO: 16), or DHLAT (SEQ ID NO: 17) at ZF7 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or C residue at position 3 of the consensus CBS motif, the engineered CTCF at ZF7 positions −1 to +3 including: the amino acid sequence RKHD (SEQ ID NO: 173) or RRSD (SEQ ID NO: 174), where the mutant CBS has a T residue at position 3 of the consensus CBS motif; the amino acid sequence RKAD (SEQ ID NO: 175), IPRI (SEQ ID NO: 176), RKHD (SEQ ID NO: 173), or RKDD (SEQ ID NO: 177), where the mutant CBS has a G residue at position 3 of the consensus CBS motif; or the amino acid sequence GIVN (SEQ ID NO: 178), ELLN (SEQ ID NO: 179), QALL (SEQ ID NO: 180) or PHRM (SEQ ID NO: 181), where the mutant CBS has a C residue at position 3 of the consensus CBS motif.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or A residue at position 5 of the consensus CBS motif, the engineered CTCF at ZF6 positions +2 to +6 including: the amino acid sequence NAMKR (SEQ ID NO: 30), GNMAR (SEQ ID NO: 182), EGMTR (SEQ ID NO: 183), SNMVR (SEQ ID NO: 184), or NAMRG (SEQ ID NO: 185), where the mutant CBS has a T residue at position 5 of the consensus CBS motif; or the amino acid sequence EHMGR (SEQ ID NO: 31), DHMNR (SEQ ID NO: 32), THMKR (SEQ ID NO: 33), EHMRR (SEQ ID NO: 34), or THMNR (SEQ ID NO: 35), where the mutant CBS has a G residue at position 5 of the consensus CBS motif.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, G, or C residue at position 6 of the consensus CBS motif, the engineered CTCF at ZF6 positions −1 to +3 including: the amino acid sequence MNES (SEQ ID NO: 36) or HRES (SEQ ID NO: 37), where the mutant CBS has a T residue at position 6 of the consensus CBS motif; or the amino acid sequence RPDT (SEQ ID NO: 38), RTDI (SEQ ID NO: 39), or RHDT (SEQ ID NO: 40), where the mutant CBS has a G residue at position 6 of the consensus CBS motif.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C, A, or T residue at position 7 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 including: the amino acid sequence HGLKV (SEQ ID NO: 41), HRLKE (SEQ ID NO: 42), HALKV (SEQ ID NO: 43), SRLKE (SEQ ID NO: 44), or DGLRV (SEQ ID NO: 45), where the mutant CBS has a T residue at position 7 of the consensus CBS motif; the amino acid sequence HTLKV (SEQ ID NO: 46), or HGLKV (SEQ ID NO: 41), where the mutant CBS has an A residue at position 7 of the consensus CBS motif; or the amino acid sequence SRLKE (SEQ ID NO: 44), HRLKE (SEQ ID NO: 42) or NRLKE (SEQ ID NO: 47), where the mutant CBS has a C residue at position 7 of the consensus CBS motif.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C, A, or T residue at position 8 of the consensus CBS motif, the engineered CTCF at ZF5 positions +2 to +6 including: the amino acid sequence ATLKR (SEQ ID NO: 48), QALRR (SEQ ID NO: 49), GGLVR (SEQ ID NO: 50), or HGLIR (SEQ ID NO: 51), where the mutant CBS has a T residue at position 8 of the consensus CBS motif; the amino acid sequence ANLSR (SEQ ID NO: 52), TGLTR (SEQ ID NO: 53), HGLVR (SEQ ID NO: 54), or GGLTR (SEQ ID NO: 55), where the mutant CBS has an A residue at position 8 of the consensus CBS motif; or the amino acid sequence HTLRR (SEQ ID NO: 56), TVLKR (SEQ ID NO: 57), ADLKR (SEQ ID NO: 58), or HGLRR (SEQ ID NO: 59), where the mutant CBS has a C residue at position 8 of the consensus CBS motif.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 10 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 including: the amino acid sequence AHLRK (SEQ ID NO: 60), wherein the mutant CBS has a T residue at position 10 of the consensus CBS motif; the amino acid sequence AKLRV (SEQ ID NO: 61), EKLRI (SEQ ID NO: 186), or AKLRI (SEQ ID NO: 63), where the mutant CBS has an A residue at position 10 of the consensus CBS motif; or the amino acid sequence TKLKV (SEQ ID NO: 64), wherein the mutant CBS has a C residue at position 10 of the consensus CBS motif.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 11 of the consensus CBS motif, the engineered CTCF at ZF4 positions +2 to +6 including: the amino acid sequence ATLRR (SEQ ID NO: 66) or RRLDR (SEQ ID NO: 67), where the mutant CBS has a T residue at position 11 of the consensus CBS motif; the amino acid sequence TNLRR (SEQ ID NO: 68), ANLRR (SEQ ID NO: 69), or GNLTR (SEQ ID NO: 70), where the mutant CBS has an A residue at position 11 of the consensus CBS motif; or the amino acid sequence AMLKR (SEQ ID NO: 71), HMLTR (SEQ ID NO: 72), AMLRR (SEQ ID NO: 73), or TMLRR (SEQ ID NO: 74), where the mutant CBS has a C residue at position 11 of the consensus CBS motif.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a T, A, or C residue at position 13 of the consensus CBS motif, the engineered CTCF at ZF3 positions +2 to +6 including: the amino acid sequence QQLIV (SEQ ID NO: 75), SQLIV (SEQ ID NO: 76), QQLLV (SEQ ID NO: 77), GELVV (SEQ ID NO: 78), or QQLLI (SEQ ID NO: 79), where the mutant CBS has a T residue at position 13 of the consensus CBS motif; the amino acid sequence GQLIV (SEQ ID NO: 80), GQLTV (SEQ ID NO: 81), GKLVT (SEQ ID NO: 187), TELII (SEQ ID NO: 82) or QGLLV (SEQ ID NO: 83), where the mutant CBS has an A residue at position 13 of the consensus CBS motif; or the amino acid sequence QQLLT (SEQ ID NO: 84), GQLLT (SEQ ID NO: 85), GELLT (SEQ ID NO: 86), or QQLLI (SEQ ID NO: 79), where the mutant CBS has a C residue at position 13 of the consensus CBS motif.
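The single-position embodiments above pair each mutated core position with one zinc finger and one recognition-helix window. A minimal Python sketch of that pairing as a lookup table is given here for illustration only. The names `CBS_POSITION_TO_WINDOW`, `POSITION_5_HELICES`, and `window_for` are hypothetical, and the entries are transcribed from the embodiments above rather than taken from any software described in this application.

```python
# Illustrative lookup: mutated CBS core position -> (zinc finger, helix window).
# All names are hypothetical; entries are transcribed from the text above.
CBS_POSITION_TO_WINDOW = {
    2: ("ZF7", "+2 to +6"),
    3: ("ZF7", "-1 to +3"),
    5: ("ZF6", "+2 to +6"),
    6: ("ZF6", "-1 to +3"),
    7: ("ZF5", "+2 to +6"),
    8: ("ZF5", "+2 to +6"),
    10: ("ZF4", "+2 to +6"),
    11: ("ZF4", "+2 to +6"),
    13: ("ZF3", "+2 to +6"),
}

# Example of a nucleotide-specific entry: ZF6 helix sequences (+2 to +6)
# listed for a mutant CBS with T or G at position 5.
POSITION_5_HELICES = {
    "T": ["NAMKR", "GNMAR", "EGMTR", "SNMVR", "NAMRG"],
    "G": ["EHMGR", "DHMNR", "THMKR", "EHMRR", "THMNR"],
}


def window_for(cbs_position):
    """Return (zinc finger, helix window) for a core position, or None if not listed."""
    return CBS_POSITION_TO_WINDOW.get(cbs_position)


print(window_for(6))   # ('ZF6', '-1 to +3')
print(window_for(1))   # None
```

Keeping the finger assignment separate from the nucleotide-specific sequence lists mirrors how the embodiments are worded: the finger and window depend only on the core position, while the listed helix sequences depend on the mutant nucleotide.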

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has A, G, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence AKLKK (SEQ ID NO: 88), AKLRK (SEQ ID NO: 89), AHLRV (SEQ ID NO: 90), AKLRV (SEQ ID NO: 61), or SKLRL (SEQ ID NO: 92) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence ERLRV (SEQ ID NO: 93), NRLKV (SEQ ID NO: 94), or SRLKE (SEQ ID NO: 44) at ZF5 positions +2 to +6 of the engineered CTCF; (iii) the amino acid sequence RPDT (SEQ ID NO: 38), RTET (SEQ ID NO: 98), or RADV (SEQ ID NO: 99) at ZF6 positions −1 to +3 of the engineered CTCF; and (iv) the amino acid sequence DNLLA (SEQ ID NO: 100), SNLLV (SEQ ID NO: 101), DNLMA (SEQ ID NO: 102), or DNLRV (SEQ ID NO: 103) at ZF7 positions +2 to +6 of the engineered CTCF.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, G, T, and T residues at positions 2, 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLRK (SEQ ID NO: 60), or GKLRI (SEQ ID NO: 106) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence SRLKE (SEQ ID NO: 44), DALRR (SEQ ID NO: 108), DGLKR (SEQ ID NO: 109), or TRLRE (SEQ ID NO: 110) at ZF5 positions +2 to +6 of the engineered CTCF; (iii) the amino acid sequence RPDTMKR (SEQ ID NO: 188) or RTENMKM (SEQ ID NO: 189) at ZF6 positions −1 to +6 of the engineered CTCF; and (iv) the amino acid sequence EHLKV (SEQ ID NO: 13), DHLLA (SEQ ID NO: 114), or HHLDV (SEQ ID NO: 115) at ZF7 positions +2 to +6 of the engineered CTCF.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has A, G, and A residues at positions 2, 5, and 11 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence SNLRR (SEQ ID NO: 116), GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLKR (SEQ ID NO: 119), ANLRR (SEQ ID NO: 69), NNLRR (SEQ ID NO: 121), or TNLRR (SEQ ID NO: 68) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), THMKR (SEQ ID NO: 33), EHMNR (SEQ ID NO: 126), or EHMAR (SEQ ID NO: 127) at ZF6 positions +2 to +6 of the engineered CTCF; and (iii) the amino acid sequence DNLLT (SEQ ID NO: 128), DNLLV (SEQ ID NO: 129), DNLQT (SEQ ID NO: 130), DNLLA (SEQ ID NO: 100), DNLAT (SEQ ID NO: 132), DNLQA (SEQ ID NO: 133), DNLMA (SEQ ID NO: 102), or DNLMT (SEQ ID NO: 135) at ZF7 positions +2 to +6 of the engineered CTCF.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, G, and A residues at positions 2, 5, and 11 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLAR (SEQ ID NO: 138), GNLMR (SEQ ID NO: 139), ANLRR (SEQ ID NO: 69), SNLRR (SEQ ID NO: 116), or NNLRR (SEQ ID NO: 121) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence EHMNR (SEQ ID NO: 126), EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), SHMNR (SEQ ID NO: 146), SHMRR (SEQ ID NO: 147), THMKR (SEQ ID NO: 33), or DHMNR (SEQ ID NO: 32) at ZF6 positions +2 to +6 of the engineered CTCF; and (iii) the amino acid sequence EHLKV (SEQ ID NO: 13), EHLAE (SEQ ID NO: 151), STLNE (SEQ ID NO: 152), DHLQV (SEQ ID NO: 12), EHLNV (SEQ ID NO: 9), DHLNT (SEQ ID NO: 155), EHLQA (SEQ ID NO: 156), or HHLMH (SEQ ID NO: 157) at ZF7 positions +2 to +6 of the engineered CTCF.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has G, T, and T residues at positions 6, 7, and 10 of the consensus CBS motif, respectively, the engineered CTCF including: (i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLKK (SEQ ID NO: 159), TKLRL (SEQ ID NO: 160), TKLKL (SEQ ID NO: 161), GHLRK (SEQ ID NO: 162), THLKK (SEQ ID NO: 163), or AHLRK (SEQ ID NO: 60) at ZF4 positions +2 to +6 of the engineered CTCF; (ii) the amino acid sequence TRLKE (SEQ ID NO: 165) or SRLKE (SEQ ID NO: 44) at ZF5 positions +2 to +6 of the engineered CTCF; and (iii) the amino acid sequence RADN (SEQ ID NO: 167), RHDT (SEQ ID NO: 40), RRDT (SEQ ID NO: 169), RPDT (SEQ ID NO: 38), RTSS (SEQ ID NO: 171), or RNDT (SEQ ID NO: 172) at ZF6 positions −1 to +3 of the engineered CTCF.

In some embodiments, the engineered CTCF variant includes at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF, where the engineered CTCF variant binds to a mutant CTCF binding sequence (CBS) with a higher affinity than wild-type CTCF, the mutant CBS including at least one nucleotide base that differs in sequence from the nucleotide sequence of a consensus CBS, where the at least one amino acid residue that differs in sequence from the amino acid sequence of a wild-type CTCF is selected from the group consisting of the amino acid residues at the position(s) −1, +1, +2, +3, +5, and +6 of any of ZF7, ZF6, ZF5, ZF4, and ZF3 of the engineered CTCF variant.
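The recognition-helix position labels used throughout (−1, +1, +2, and so on up to +6) can be easy to misread as ordinary string offsets. The following Python sketch shows one way to slice a helix by those labels. It assumes the seven labelled positions are contiguous residues and that there is no position 0; the names `HELIX_POSITIONS` and `helix_window` are hypothetical. The example residues are the ZF6 sequence RPDTMKR (SEQ ID NO: 188) given above for positions −1 to +6.

```python
# Position labels for a seven-residue recognition-helix window.
# Assumption: positions are contiguous and the numbering skips 0.
HELIX_POSITIONS = [-1, 1, 2, 3, 4, 5, 6]


def helix_window(helix, start, end):
    """Return the residues from position `start` to `end` (inclusive)."""
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected 7 residues covering positions -1 to +6")
    i = HELIX_POSITIONS.index(start)
    j = HELIX_POSITIONS.index(end)
    return helix[i:j + 1]


zf6 = "RPDTMKR"  # ZF6 positions -1 to +6 (SEQ ID NO: 188)
print(helix_window(zf6, -1, 3))  # RPDT
print(helix_window(zf6, 2, 6))   # DTMKR
```

Under these assumptions the −1 to +3 window of RPDTMKR is RPDT, which matches a four-residue sequence listed for ZF6 positions −1 to +3 (SEQ ID NO: 38).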

In some embodiments, the engineered CCCTC-binding factor (CTCF) variant binds with a higher affinity than a wild-type CTCF to a mutant CTCF binding sequence (CBS) that differs from a consensus CBS at position 2 of the consensus CBS motif, the engineered CTCF including an amino acid residue threonine, asparagine, or histidine at ZF7 position +3.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that has a C-to-G mutation at position 2 of the consensus CBS motif, the engineered CTCF including the amino acid sequence DHLQT (SEQ ID NO: 8), EHLNV (SEQ ID NO: 9), AHLQV (SEQ ID NO: 10), EHLRE (SEQ ID NO: 11), DHLQV (SEQ ID NO: 12), EHLKV (SEQ ID NO: 13), EHLVV (SEQ ID NO: 15), DHLRT (SEQ ID NO: 16), or DHLAT (SEQ ID NO: 17) at ZF7 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 3 of the consensus CBS motif, the engineered CTCF including the amino acid sequence RKHD (SEQ ID NO: 173), RRSD (SEQ ID NO: 174), GIVN (SEQ ID NO: 178), ELLN (SEQ ID NO: 179), or PHRM (SEQ ID NO: 181) at ZF7 positions −1 to +3.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 5 of the consensus CBS motif, the engineered CTCF including the amino acid sequence NAMKR (SEQ ID NO: 30), EHMGR (SEQ ID NO: 31), DHMNR (SEQ ID NO: 32), THMKR (SEQ ID NO: 33), EHMRR (SEQ ID NO: 34), or THMNR (SEQ ID NO: 35) at ZF6 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 6 of the consensus CBS motif, the engineered CTCF including the amino acid sequence MNES (SEQ ID NO: 36), HRES (SEQ ID NO: 37), RPDT (SEQ ID NO: 38), RTDI (SEQ ID NO: 39), or RHDT (SEQ ID NO: 40) at ZF6 positions −1 to +3.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 7 of the consensus CBS motif, the engineered CTCF including the amino acid sequence HGLKV (SEQ ID NO: 41), HRLKE (SEQ ID NO: 42), HALKV (SEQ ID NO: 43), SRLKE (SEQ ID NO: 44), DGLRV (SEQ ID NO: 45), HTLKV (SEQ ID NO: 46), or NRLKE (SEQ ID NO: 47) at ZF5 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 8 of the consensus CBS motif, the engineered CTCF including the amino acid sequence ATLKR (SEQ ID NO: 48), QALRR (SEQ ID NO: 49), GGLVR (SEQ ID NO: 50), HGLIR (SEQ ID NO: 51), ANLSR (SEQ ID NO: 52), TGLTR (SEQ ID NO: 53), HGLVR (SEQ ID NO: 54), GGLTR (SEQ ID NO: 55), HTLRR (SEQ ID NO: 56), TVLKR (SEQ ID NO: 57), ADLKR (SEQ ID NO: 58), or HGLRR (SEQ ID NO: 59) at ZF5 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 10 of the consensus CBS motif, the engineered CTCF including the amino acid sequence AHLRK (SEQ ID NO: 60), AKLRV (SEQ ID NO: 61), GGLGL (SEQ ID NO: 62), AKLRI (SEQ ID NO: 63), TKLKV (SEQ ID NO: 64), or SKLRV (SEQ ID NO: 65) at ZF4 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 11 of the consensus CBS motif, the engineered CTCF including the amino acid sequence ATLRR (SEQ ID NO: 66), RRLDR (SEQ ID NO: 67), TNLRR (SEQ ID NO: 68), ANLRR (SEQ ID NO: 69), GNLTR (SEQ ID NO: 70), AMLKR (SEQ ID NO: 71), HMLTR (SEQ ID NO: 72), AMLRR (SEQ ID NO: 73), or TMLRR (SEQ ID NO: 74) at ZF4 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at position 13 of the consensus CBS motif, the engineered CTCF including the amino acid sequence QQLIV (SEQ ID NO: 75), SQLIV (SEQ ID NO: 76), QQLLV (SEQ ID NO: 77), GELVV (SEQ ID NO: 78), QQLLI (SEQ ID NO: 79), GQLIV (SEQ ID NO: 80), GQLTV (SEQ ID NO: 81), TELII (SEQ ID NO: 82), QGLLV (SEQ ID NO: 83), QQLLT (SEQ ID NO: 84), GQLLT (SEQ ID NO: 85), or GELLT (SEQ ID NO: 86) at ZF3 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:

(i) the amino acid sequence AKLKK (SEQ ID NO: 88), AKLRK (SEQ ID NO: 89), AHLRV (SEQ ID NO: 90), AKLRV (SEQ ID NO: 61), or SKLRL (SEQ ID NO: 92) at ZF4 positions +2 to +6;

(ii) the amino acid sequence ERLRV (SEQ ID NO: 93), NRLKV (SEQ ID NO: 94), or SRLKE (SEQ ID NO: 44) at ZF5 positions +2 to +6;

(iii) the amino acid sequence RPDT (SEQ ID NO: 38), RTET (SEQ ID NO: 98), or RADV (SEQ ID NO: 99) at ZF6 positions −1 to +3; and

(iv) the amino acid sequence DNLLA (SEQ ID NO: 100), SNLLV (SEQ ID NO: 101), DNLMA (SEQ ID NO: 102), or DNLRV (SEQ ID NO: 103) at ZF7 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:

(i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLRK (SEQ ID NO: 60), or GKLRI (SEQ ID NO: 106) at ZF4 positions +2 to +6;

(ii) the amino acid sequence SRLKE (SEQ ID NO: 44), DALRR (SEQ ID NO: 108), DGLKR (SEQ ID NO: 109), or TRLRE (SEQ ID NO: 110) at ZF5 positions +2 to +6;

(iii) the amino acid sequence RPDTMKR (SEQ ID NO: 188) or RTENMKM (SEQ ID NO: 189) at ZF6 positions −1 to +6; and

(iv) the amino acid sequence EHLKV (SEQ ID NO: 13), DHLLA (SEQ ID NO: 114), or HHLDV (SEQ ID NO: 115) at ZF7 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 5, and 11 of the consensus CBS motif, the engineered CTCF including:

(i) the amino acid sequence SNLRR (SEQ ID NO: 116), GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLKR (SEQ ID NO: 119), ANLRR (SEQ ID NO: 69), NNLRR (SEQ ID NO: 121), or TNLRR (SEQ ID NO: 68) at ZF4 positions +2 to +6;

(ii) the amino acid sequence EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), THMKR (SEQ ID NO: 33), EHMNR (SEQ ID NO: 126), or EHMAR (SEQ ID NO: 127) at ZF6 positions +2 to +6; and

(iii) the amino acid sequence DNLLT (SEQ ID NO: 128), DNLLV (SEQ ID NO: 129), DNLQT (SEQ ID NO: 130), DNLLA (SEQ ID NO: 100), DNLAT (SEQ ID NO: 132), DNLQA (SEQ ID NO: 133), DNLMA (SEQ ID NO: 102), or DNLMT (SEQ ID NO: 135) at ZF7 positions +2 to +6.

In some embodiments, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 2, 5, and 11 of the consensus CBS motif, the engineered CTCF including:

(i) the amino acid sequence GNLVR (SEQ ID NO: 117), GNLRR (SEQ ID NO: 118), GNLAR (SEQ ID NO: 138), GNLMR (SEQ ID NO: 139), ANLRR (SEQ ID NO: 69), SNLRR (SEQ ID NO: 116), or NNLRR (SEQ ID NO: 121) at ZF4 positions +2 to +6;

(ii) the amino acid sequence EHMNR (SEQ ID NO: 126), EHMKR (SEQ ID NO: 123), EHMRR (SEQ ID NO: 34), SHMNR (SEQ ID NO: 146), SHMRR (SEQ ID NO: 147), THMKR (SEQ ID NO: 33), or DHMNR (SEQ ID NO: 32) at ZF6 positions +2 to +6; and

(iii) the amino acid sequence EHLKV (SEQ ID NO: 13), EHLAE (SEQ ID NO: 151), STLNE (SEQ ID NO: 152), DHLQV (SEQ ID NO: 12), EHLNV (SEQ ID NO: 9), DHLNT (SEQ ID NO: 155), EHLQA (SEQ ID NO: 156), or HHLMH (SEQ ID NO: 157) at ZF7 positions +2 to +6.

In one embodiment, the engineered CTCF variant binds with a higher affinity than a wild-type CTCF to a mutant CBS that differs from a consensus CBS at positions 6, 7, and 10 of the consensus CBS motif, the engineered CTCF including:

(i) the amino acid sequence GHLKK (SEQ ID NO: 158), AHLKK (SEQ ID NO: 159), TKLRL (SEQ ID NO: 160), TKLKL (SEQ ID NO: 161), GHLRK (SEQ ID NO: 162), THLKK (SEQ ID NO: 163), or AHLRK (SEQ ID NO: 60) at ZF4 positions +2 to +6;

(ii) the amino acid sequence TRLKE (SEQ ID NO: 165) or SRLKE (SEQ ID NO: 44) at ZF5 positions +2 to +6; and

(iii) the amino acid sequence RADN (SEQ ID NO: 167), RHDT (SEQ ID NO: 40), RRDT (SEQ ID NO: 169), RPDT (SEQ ID NO: 38), RTSS (SEQ ID NO: 171), or RNDT (SEQ ID NO: 172) at ZF6 positions −1 to +3.
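The multi-position embodiments name one helix sequence per zinc finger, so the candidate variants they cover can be enumerated as combinations. The Python sketch below does this for the embodiment directed to positions 6, 7, and 10, using the sequences listed in items (i) to (iii) immediately above. The function name `candidate_combinations` is hypothetical, and the enumeration is only a way of reading the claim language; nothing here indicates that every combination was constructed or tested.

```python
from itertools import product

# Helix sequences listed above for a mutant CBS altered at positions 6, 7, and 10.
ZF4 = ["GHLKK", "AHLKK", "TKLRL", "TKLKL", "GHLRK", "THLKK", "AHLRK"]  # positions +2 to +6
ZF5 = ["TRLKE", "SRLKE"]                                               # positions +2 to +6
ZF6 = ["RADN", "RHDT", "RRDT", "RPDT", "RTSS", "RNDT"]                 # positions -1 to +3


def candidate_combinations():
    """Enumerate one helix choice per finger (hypothetical helper)."""
    return [
        {"ZF4": zf4, "ZF5": zf5, "ZF6": zf6}
        for zf4, zf5, zf6 in product(ZF4, ZF5, ZF6)
    ]


combos = candidate_combinations()
print(len(combos))  # 84
print(combos[0])    # {'ZF4': 'GHLKK', 'ZF5': 'TRLKE', 'ZF6': 'RADN'}
```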

In some embodiments, the engineered CTCF variant interacts with cohesin to mediate the formation of an enhancer-promoter loop to modulate gene expression.

In another aspect, the invention features a method of treating a subject in need thereof, the method including administering to the subject a therapeutically effective amount of an engineered CTCF variant described herein.

In some embodiments, the subject can have cancer.

In another aspect, the invention features a method of activating or repressing expression of a gene which is under the control of a CBS bearing one or more mutations, the method including contacting an engineered CTCF described herein with a sequence of interest in the gene, such that the expression of the gene is regulated.

In another aspect, the invention features a pharmaceutical composition including an engineered CTCF variant described herein.

In another aspect, the invention features a gene expression system for regulation of a gene, the system including a nucleic acid encoding an engineered CTCF variant described herein.

In another aspect, the invention features a method of altering the structure of chromatin including contacting an engineered CTCF variant described herein with a sequence of interest to form a binding complex, such that the structure of the chromatin is altered.

In another aspect, the invention features a method of activating or repressing expression of a gene which is under the control of a CBS bearing one or more mutations, the method including contacting the CBS bearing one or more mutations with an engineered CTCF variant described herein.

In another aspect, the invention features a kit including an engineered CTCF variant described herein.

In this disclosure, “comprises,” “comprising,” “containing” and “having” and the like can have the meaning ascribed to them in U.S. Patent law and can mean “includes,” “including,” and the like; “consisting essentially of” or “consists essentially” likewise has the meaning ascribed in U.S. Patent law and the term is open-ended, allowing for the presence of more than that which is recited so long as basic or novel characteristics of that which is recited are not changed by the presence of more than that which is recited, but excludes prior art embodiments.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.

Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.

The following Detailed Description, given by way of example, but not intended to limit the invention to the specific embodiments described, may be understood in conjunction with the accompanying figures, incorporated herein by reference.

FIG. 1: Diagram of protein-DNA interactions between an exemplary 11-finger CTCF zinc finger array and the CTCF binding site. Each zinc finger of the 11-finger array contains a recognition alpha-helix in which protein-DNA base contacts are made by the amino acids at positions −1, 2, 3, and 6 of the alpha-helix. Only positions −1, 3, and 6 are depicted here, because position 2 makes a cross-strand contact with the opposite strand of the binding site, which is not shown. The sequence of the binding site was derived from ChIP-seq data (Nakahashi et al., 2013). The binding site is partitioned into three segments: 5′ flanking (gray line), core (black line), and 3′ flanking (light gray line). The positions of the nucleotides within each segment are numbered. Dashes indicate known DNA-protein contacts (black) and theoretical DNA-protein contacts (gray) between the zinc finger array and the binding site. Zinc fingers 3-7 of the array (white) make protein-DNA contacts with the core sequence (bold, black lined). There is a possible 5-6 base pair gap (represented by horizontal dashed lines) between zinc finger 8 and zinc fingers 9-11, as suggested by ChIP-exo and DNase I footprinting of CTCF-bound DNA fragments (Hashimoto, H. et al., 2017). Note that CTCF binds to its target site in the 3′-5′ direction, with the N-terminal side of the protein binding to the 3′ end of the binding site. FIG. 1 discloses SEQ ID NO: 5544.

FIG. 2: Diagram of the B2H beta-galactosidase reporter assay. The B2H reporter assay uses Gal11P-mediated recruitment of Gal4 to indicate binding. E. coli is transformed with two plasmids: one plasmid encodes both a zinc finger-Gal11P fusion and an alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion; the second plasmid contains a modifiable binding sequence upstream of a weak promoter that drives expression of the lacZ gene, which encodes β-galactosidase. A zinc finger-Gal11P fusion that is able to bind to the target sequence recruits the α-NTD-Gal4 fusion to the promoter, thereby inducing expression of lacZ. This increase in β-galactosidase levels is detected by a simple colorimetric ONPG-based assay. In this diagram, the CTCF zinc finger array-Gal11P fusion is bound to a CTCF binding site, recruiting the α-NTD-Gal4 fusion to the promoter region upstream of lacZ and leading to expression.

FIG. 3: Fold activation in the B2H β-gal assay is greatest when CTCF zinc fingers 1-11 of the 11-finger array interact with the full-length target site. Five target sites (sequences indicated in the legend) were tested along with the full CTCF zinc finger array and four different subsets (indicated on the x-axis). The core sequence (black and bolded), which is the most highly conserved sequence of CTCF binding sites, was tested independently and with different quantities of flanking sequence as derived from Hashimoto, H. et al. Mol. Cell. 2017 (black and light gray); Persikov, A and Singh, M. NAR. 2014 (medium gray); and Nakahashi, H. et al., Cell Rep. 2013 (very light gray and dark gray). The positive control reflects the binding activity of a known 3-finger zinc finger that binds strongly in bacterial and human contexts to a known sequence. The negative control reflects baseline beta-galactosidase levels when the alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion is not directly recruited to the promoter of lacZ. This baseline was used to calculate fold activation when the CTCF zinc finger array is fused to Gal11P. FIG. 3 discloses SEQ ID NOS 5545-5548 and 5544, respectively, in order of appearance.
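The figure descriptions state that a negative-control baseline is used to calculate fold activation but do not give the formula. One common reading, assumed here, is the ratio of the mean reporter reading for a sample to the mean reading of its matched background control. The Python sketch below illustrates that reading only; the function name `fold_activation` and the example numbers are made up for illustration and are not data from this application.

```python
from statistics import mean


def fold_activation(sample_readings, background_readings):
    """Ratio of mean sample signal to mean background signal (assumed definition)."""
    background = mean(background_readings)
    if background <= 0:
        raise ValueError("background must be positive")
    return mean(sample_readings) / background


# Made-up triplicate readings in arbitrary units, for illustration only.
sample = [6.0, 6.5, 5.5]
background = [2.0, 2.5, 1.5]
print(fold_activation(sample, background))  # 3.0
```

A value of 1.0 under this definition would mean the reporter signal is indistinguishable from background.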

FIG. 4: The CTCF zinc finger array is sensitive to sequence changes at certain positions of the core region within the CTCF binding site. Each of the four possible nucleotides at each position of the 40 bp reference CBS was tested for its ability to bind the CTCF zinc finger array in the B2H assay. Fold activation reflects binding activity above background β-galactosidase levels (background β-gal levels are obtained from samples with each binding site in the presence of the Gal4-RNA polymerase fusion with no zinc finger array fused to Gal11P). The reference sequence above is partitioned into three segments: 5′ flanking (dark gray lined), core (black lined), and 3′ flanking (gray lined). The positions of the nucleotides within each segment are numbered. Dashes indicate known DNA-protein contacts (black) and theoretical DNA-protein contacts (gray) between the zinc finger array and the binding site. Core sequence positions 1-15 of the binding site (black, bold) interact with zinc fingers 3-7 of the array (white, black outline) and appear to be most sensitive to changes in the binding sequence. Alterations to the 5′ flanking sequence as well as the 3′ flanking sequence did not negatively impact binding. FIG. 4 discloses SEQ ID NO: 5544.
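Testing all four nucleotides at each position of a 40 bp site amounts to the reference sequence plus three substitutions per position. A minimal Python sketch of that enumeration follows. The function name `single_nucleotide_variants` is hypothetical, positions are numbered 1 to 40 across the whole site rather than within each segment as in the figure, and the input is a placeholder string because the reference CBS (SEQ ID NO: 5544) is not reproduced in this section.

```python
NUCLEOTIDES = "ACGT"


def single_nucleotide_variants(reference):
    """Yield (1-based position, new base, variant sequence) for each substitution."""
    for i, ref_base in enumerate(reference):
        for base in NUCLEOTIDES:
            if base != ref_base:
                yield i + 1, base, reference[:i] + base + reference[i + 1:]


# Placeholder 40 bp sequence, NOT the reference CBS of SEQ ID NO: 5544.
placeholder = "ACGT" * 10
variants = list(single_nucleotide_variants(placeholder))
print(len(variants))  # 120
print(variants[0])    # (1, 'C', 'CCGTACGTACGTACGTACGTACGTACGTACGTACGTACGT')
```

Together with the unmodified reference, the 120 substitutions cover every nucleotide at every position of a 40 bp site.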

FIG. 5: Maximizing binding potential of the CTCF binding site. Modifications were made to the reference binding site (bottom) to combine nucleotide changes that, individually, showed increased binding activity of the CTCF zinc finger array. The core sequence motif is bold while changes made are underlined. Binding activity of the 11-finger CTCF zinc finger array was quantified in the B2H Beta-galactosidase reporter assay in triplicate. Fold activation reflects binding activity above background levels when no DNA binding protein is present. FIG. 5 discloses SEQ ID NOS 5549-5550 and 5544, respectively, in order of appearance.

FIG. 6: Diagram of B2H Beta-lactamase inhibitor selection. The selection system contained the same components as the reporter system, except that successful binding of the zinc finger array to the CBS drove expression of BlaC, a beta-lactamase that inactivates beta-lactam antibiotics, instead of lacZ. Expression of BlaC allowed for growth on carbenicillin plates. The selection was driven by the addition of clavulanic acid, a beta-lactamase inhibitor. Low level expression of BlaC can result in growth on carbenicillin plates, but the addition of clavulanic acid inhibits BlaC activity and results in the depletion of false positives and further enrichment of strong binders to any modification made to the binding site. Libraries of mutations in the zinc finger array fused to gal11P were selected for binders to an altered binding sequence through low stringency conditions followed by selection on a gradient of clavulanic acid. Growth on the highest stringency end of the gradient indicated variants in the zinc finger array that are strong binders to the new binding sequence.

FIGS. 7A-C: Binding activity of variants on altered CTCF binding sites. Variants picked from the high stringency gradient of the selective plates were tested for binding activity on sequences representing all four possible nucleotides at position 2 of the core sequence (gray star). Amino acid sequence of variants pulled out of the selection were listed above the heat map and the nucleotide present at position 2 of the core sequence was indicated on the y-axis. FIG. 7A: The nucleotide at position 2 is T. FIG. 7B: The nucleotide at position 2 is A. FIG. 7C: The nucleotide at Binding was quantified by the beta-galactosidase reporter system and colorimetric ONPG assay. Binding activity of wild-type CTCF zinc finger array on the wild-type binding site sequence was indicated by the white dot. A diagram of the ZF7 alpha recognition helix for each nucleotide change is on the left. It included the amino acid residues interacting with the triplet in the binding sequence. The amino acid at position 3 of the alpha helix was varied in the library and is indicated by an ‘X’. FIGS. 7A-C disclose “RKSXLGV” as SEQ ID NO: 5551.

FIG. 8: Increasing the variation within the recognition helix produced stronger binders. Four amino acids were targeted for variance in the library to allow for more flexibility in the selection and generate stronger binders to the modified binding site of choice. ZF7 targeting a C:G change at position 2 (gray star) of the core sequence was selected for variants using the expanded approach. Each amino acid codon was replaced with ‘VNS’ codons at the indicated sites (‘X’). Twelve colonies were picked from the high-stringency end of the selection and tested for their ability to bind to the CTCF binding site when the indicated nucleotide is at position 2 of the core sequence. Amino acid sequences of the variants selected are listed on the x-axis and the nucleotide at position 2 of the core sequence is on the y-axis. Wild-type zinc finger array binding activity on wild-type binding sequence is indicated by the white dot. FIG. 8 discloses “RKSXLGV” as SEQ ID NO: 5551, “AHLQV” as SEQ ID NO: 10, “DHLRT” as SEQ ID NO: 16, “DHLAT” as SEQ ID NO: 17, “DHLQT” as SEQ ID NO: 8, “DHLQV” as SEQ ID NO: 12, “SDLGV” as SEQ ID NO: 5552, “EHLKV” as SEQ ID NO: 13, “EHLVV” as SEQ ID NO: 15, “EHLNV” as SEQ ID NO: 9 and “EHLRE” as SEQ ID NO: 11.
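As an illustration of what a ‘VNS’ codon can encode, the following minimal sketch expands the degenerate codon using the standard IUPAC base codes (V = A/C/G, N = any base, S = C/G) and the standard genetic code. The function names are hypothetical and are not part of the disclosure; the sketch only enumerates codons and says nothing about library construction or sequencing.

```python
from itertools import product

# IUPAC degenerate bases that make up a 'VNS' codon.
IUPAC = {"V": "ACG", "N": "ACGT", "S": "CG"}

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_degenerate_codon(codon):
    """Return every concrete codon matched by a degenerate codon such as 'VNS'."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in codon))]


def encoded_residues(codon):
    """Return the sorted set of residues ('*' = stop) encoded by a degenerate codon."""
    return sorted({CODON_TABLE[c] for c in expand_degenerate_codon(codon)})


vns_codons = expand_degenerate_codon("VNS")
vns_residues = encoded_residues("VNS")

# Theoretical protein-level diversity when four helix positions are randomized.
four_position_diversity = len(vns_residues) ** 4

print(len(vns_codons), "codons ->", "".join(vns_residues))
print("four randomized positions:", four_position_diversity, "possible helices")
```

Because the first base of a VNS codon is never T, no stop codon is sampled, and F, Y, C and W are excluded from the randomized positions.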

FIGS. 9A-C: Selected variants binding altered binding sites sequence at position 3 of core motif in CBS. Selections performed on library of variants centered around alterations in position −1 to 3 of recognition helix in ZF7 of the 11 finger CTCF zinc finger array. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 3 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by dashed lines. (A) Selections performed on A:T change in the binding site, (B) A:G change, (C) A:C change. Most variants pulled out had relaxed binding specificity instead of altered specificity. FIGS. 9A-C disclose “RKSD” as SEQ ID NO: 711, “RKHD” as SEQ ID NO: 173, “RRSD” as SEQ ID NO: 174, “RKAD” as SEQ ID NO: 175, “IPRI” as SEQ ID NO: 176, “RKDD” as SEQ ID NO: 177, “QALL” as SEQ ID NO: 180, “PHRM” as SEQ ID NO: 181, “ELLN” as SEQ ID NO: 179 and “GIVN” as SEQ ID NO: 178.

FIGS. 10A-B: Selections performed targeting sequence changes at position 5 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of the ZF6 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 5 of the core motif of the CBS (gray star). Direct protein-DNA contacts are indicated by dashed lines. (A) Selections performed on C:T change in the binding site, (B) C:G change. No variants grew beyond the low stringency end of the gradient on selection plates for C:A change and were considered weak/insufficient binders. Most variants pulled out had relaxed binding specificity instead of altered specificity with the exception of ‘THMKR’ (SEQ ID NO: 33) targeting C:G change in the binding sequence. FIGS. 10A-B disclose “GNMAR” as SEQ ID NO: 182, “NAMKR” as SEQ ID NO: 30, “EGMTR” as SEQ ID NO: 183, “NAMRG” as SEQ ID NO: 185, “GTMKM” as SEQ ID NO: 1255, “SNMVR” as SEQ ID NO: 184, “DHMNR” as SEQ ID NO: 32, “EHMRR” as SEQ ID NO: 34, “EHMGR” as SEQ ID NO: 31, “THMNR” as SEQ ID NO: 35 and “THMKR” as SEQ ID NO: 33.

FIGS. 11A-C: Selections performed targeting sequence changes at position 6 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position −1 to 3 of ZF6 recognition helix. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 6 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by dashed lines. (A) Selections performed on A:T change in the binding site, (B) A:G change, (C) A:C change. Variants analyzed from the A:T selection had relaxed binding profile while variants from A:G selection showed strong binding for only the changed nucleotide. No good binders were identified in the A:C selection. FIGS. 11A-C disclose “NINES” as SEQ ID NO: 36, “QSGT” as SEQ ID NO: 1582, “HRES” as SEQ ID NO: 37, “RHDT” as SEQ ID NO: 40, “RPDT” as SEQ ID NO: 38, “RTDI” as SEQ ID NO: 39, “RADN” as SEQ ID NO: 167 and “ERKS” as SEQ ID NO: 1479.

FIGS. 12A-C: Selections performed targeting sequence changes at position 7 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF5 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 7 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. FIGS. 12A-C disclose “DGLRV” as SEQ ID NO: 45, “HGLKV” as SEQ ID NO: 41, “HRLKE” as SEQ ID NO: 42, “HALKV” as SEQ ID NO: 43, “YKLKR” as SEQ ID NO: 5553, “SRLKE” as SEQ ID NO: 44, “HTLKV” as SEQ ID NO: 46 and “NRLKE” as SEQ ID NO: 47.

FIGS. 13A-C: Selections performed targeting sequence changes at position 8 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF5 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 8 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. Note the different variants that appear with the same library being used to bind to the same changes in the sequence, but in a different position on the binding site. FIGS. 13A-C disclose “GGLVR” as SEQ ID NO: 50, “QALRR” as SEQ ID NO: 49, “HGLIR” as SEQ ID NO: 51, “YKLKR” as SEQ ID NO: 5553, “ATLKR” as SEQ ID NO: 48, “GGLTR” as SEQ ID NO: 55, “HGLVR” as SEQ ID NO: 54, “ANLSR” as SEQ ID NO: 52, “TGLTR” as SEQ ID NO: 53, “HGLRR” as SEQ ID NO: 59, “ADLKR” as SEQ ID NO: 58, “HTLRR” as SEQ ID NO: 56 and “TVLKR” as SEQ ID NO: 57.

FIGS. 14A-C: Selections performed targeting sequence changes at position 10 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF4 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 10 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. G:C selection did not produce any growth at the high stringency end of the gradient selective plates. Binding data reflects colonies picked from mid-tier region, which is why they did not perform well as binders. White dot indicates wild-type CTCF zinc finger array binding activity on wild-type binding sequence. FIGS. 14A-C disclose “GHLRK” as SEQ ID NO: 162, “AKLRL” as SEQ ID NO: 3311, “AHLRK” as SEQ ID NO: 60, “SKLKR” as SEQ ID NO: 3470, “GGLGL” as SEQ ID NO: 62, “AKLRI” as SEQ ID NO: 63, “AKLRV” as SEQ ID NO: 61, “EKLRI” as SEQ ID NO: 186, “SKLRV” as SEQ ID NO: 65 and “TKLKV” as SEQ ID NO: 64.

FIGS. 15A-C: Selections performed targeting sequence changes at position 11 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF4 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 11 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. FIGS. 15A-C disclose “RRLDR” as SEQ ID NO: 67, “SKLKR” as SEQ ID NO: 3470, “ATLRR” as SEQ ID NO: 66, “GNLTR” as SEQ ID NO: 70, “ANLRR” as SEQ ID NO: 69, “TNLRR” as SEQ ID NO: 68, “AMLRR” as SEQ ID NO: 73, “AMLKR” as SEQ ID NO: 71, “HMLTR” as SEQ ID NO: 72 and “TMLRR” as SEQ ID NO: 74.

FIGS. 16A-C: Selections performed targeting sequence changes at position 13 of the core motif in the CBS. Selections performed on library of variants centered around alterations in position 2 to 6 of ZF3 recognition helix, leaving the 4th position unchanged. ‘VNS’ codons were introduced at positions indicated by ‘X’ and selected against three different nucleotide changes at position 13 of the core motif in the CBS (gray star). Direct protein-DNA contacts are indicated by a line. (A) Selections performed on G:T change in the binding site, (B) G:A change, (C) G:C change. FIGS. 16A-C disclose “QQLLI” as SEQ ID NO: 79, “QQLLV” as SEQ ID NO: 77, “QQLIV” as SEQ ID NO: 75, “GELVV” as SEQ ID NO: 78, “GELVR” as SEQ ID NO: 5554, “SQLIV” as SEQ ID NO: 76, “QGLLV” as SEQ ID NO: 83, “GQLTV” as SEQ ID NO: 81, “GQLIV” as SEQ ID NO: 80, “GKLVT” as SEQ ID NO: 187, “TELII” as SEQ ID NO: 82, “GQLLT” as SEQ ID NO: 85, “QQLLT” as SEQ ID NO: 84, “GELLT” as SEQ ID NO: 86 and “ATLAD” as SEQ ID NO: 5555.

FIG. 17: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Diagram of the recognition helices of zinc fingers 4-7 out of the 11-finger array, binding to their respective triplets in the core motif of the CBS. Altered amino acids are indicated by ‘X’ and nucleotide changes to the wild-type CBS are indicated by a gray star in the diagram and by bolded letters. ZF1-3 and ZF8-11 were unmodified in this library. Protein-DNA contacts are indicated by lines between the ZF recognition helices and the CBS sequence. Wild-type CTCF 11-finger zinc finger array binding strength to wild-type CBS is indicated by a white dot. The amino acid sequences of each variant recognition helix in ZF4-7 are listed on the y-axis and binding activity on the modified CBS (changes in red) or the wild-type CBS is reflected by B2H β-gal reporter assay. FIG. 17 discloses “CGTGGTGCGAAC” as SEQ ID NO: 5556, “CAAGCGTGGTGCGCT” as SEQ ID NO: 5557, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “ERLRV” as SEQ ID NO: 93, “RPDT” as SEQ ID NO: 38, “DNLLA” as SEQ ID NO: 100, “AKLKK” as SEQ ID NO: 88, “AKLRK” as SEQ ID NO: 89, “NRLKV” as SEQ ID NO: 94, “RTET” as SEQ ID NO: 98, “SNLLV” as SEQ ID NO: 101, “AHLRV” as SEQ ID NO: 90, “SRLKE” as SEQ ID NO: 44, “DNLMA” as SEQ ID NO: 102, “AKLRV” as SEQ ID NO: 61, “SKLRL” as SEQ ID NO: 92, “RADV” as SEQ ID NO: 99 and “DNLRV” as SEQ ID NO: 103.

FIG. 18: Binding activity of multi-finger variants on multiple sequence changes to the CBS. The same selection as before except now there is a C:G change at position 2 of the CBS, where previously there was a C:A change. Variants pulled out of this selection had binding activity on the modified CBS without binding to the wild-type CBS. Wild-type 11-finger ZF array only showed binding activity on wild-type CBS (white dot) and no ability to bind to the modified CBS. Interestingly, the dominant variant selected for in the library contained a mutation that occurs at position 9 of the recognition helix that was either introduced during oligo synthesis (0.05% chance of the wrong nucleotide at each position) or through PCR while constructing these libraries. FIG. 18 discloses “CGTGGTGCGAGC” as SEQ ID NO: 5559, “CGAGCGTGGTGCGCT” as SEQ ID NO: 5560, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “GHLKK” as SEQ ID NO: 158, “SRLKE” as SEQ ID NO: 44, “EHLKV” as SEQ ID NO: 13, “RPDT(MK)R” as SEQ ID NO: 5561, “AHLRK” as SEQ ID NO: 60, “DALRR” as SEQ ID NO: 108, “RTEN” as SEQ ID NO: 112, “DHLLA” as SEQ ID NO: 114, “DGLKR” as SEQ ID NO: 109, “RPDT” as SEQ ID NO: 38, “HHLDV” as SEQ ID NO: 115, “GKLRI” as SEQ ID NO: 106 and “TRLRE” as SEQ ID NO: 110.

FIG. 19: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS, but not the modified one. FIG. 19 discloses “DTYKLKR” as SEQ ID NO: 3, “CAGGGGAGGAAC” as SEQ ID NO: 5562, “CAAGGAGGGGACGCT” as SEQ ID NO: 5563, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “SNLRR” as SEQ ID NO: 116, “EHMKR” as SEQ ID NO: 123, “DNLLT” as SEQ ID NO: 128, “GNLVR” as SEQ ID NO: 117, “EHMRR” as SEQ ID NO: 34, “DNLLV” as SEQ ID NO: 129, “GNLRR” as SEQ ID NO: 118, “THMKR” as SEQ ID NO: 33, “DNLQT” as SEQ ID NO: 130, “GNLKR” as SEQ ID NO: 119, “EHMNR” as SEQ ID NO: 126, “DNLLA” as SEQ ID NO: 100, “ANLRR” as SEQ ID NO: 69, “DNLAT” as SEQ ID NO: 132, “DNLQA” as SEQ ID NO: 133, “NNLRR” as SEQ ID NO: 121, “DNLMA” as SEQ ID NO: 102, “TNLRR” as SEQ ID NO: 68, “EHMAR” as SEQ ID NO: 127 and “DNLMT” as SEQ ID NO: 135.

FIG. 20: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS, but not the modified one. FIG. 20 discloses “DTYKLKR” as SEQ ID NO: 3, “CAGGGGAGGAGC” as SEQ ID NO: 5564, “CGAGGAGGGGACGCT” as SEQ ID NO: 5565, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “GNLVR” as SEQ ID NO: 117, “EHMNR” as SEQ ID NO: 126, “EHLKV” as SEQ ID NO: 13, “GNLRR” as SEQ ID NO: 118, “EHMKR” as SEQ ID NO: 123, “EHLAE” as SEQ ID NO: 151, “GNLAR” as SEQ ID NO: 138, “EHMRR” as SEQ ID NO: 34, “STLNE” as SEQ ID NO: 152, “GNLMR” as SEQ ID NO: 139, “SHMNR” as SEQ ID NO: 146, “DHLQV” as SEQ ID NO: 12, “ANLRR” as SEQ ID NO: 69, “SHMRR” as SEQ ID NO: 147, “EHLNV” as SEQ ID NO: 9, “SNLRR” as SEQ ID NO: 116, “DHLNT” as SEQ ID NO: 155, “EHLQA” as SEQ ID NO: 156, “NNLRR” as SEQ ID NO: 121, “THMKR” as SEQ ID NO: 33, “DHMNR” as SEQ ID NO: 32 and “HHLMH” as SEQ ID NO: 157.

FIG. 21: Binding activity of multi-finger variants on multiple sequence changes to the CBS. Variants from individual pooled high stringency selections were stitched together and selected against three changes to the wild-type CBS (indicated by gray stars or bolded). Variants were assayed for binding on the modified CBS and the wild-type CBS alongside wild-type CTCF zinc finger array. The variants picked out of the selection were able to bind to the modified CBS, but not the wild-type sequence. Inversely, the wild-type zinc finger array was able to bind to the wild-type CBS (white dot), but not the modified one. FIG. 21 discloses “CGTGGTGCGACC” as SEQ ID NO: 5566, “RKSDLGV” as SEQ ID NO: 5, “CCAGCGTGGTGCGCT” as SEQ ID NO: 5567, “CCAGCAGGGGGCGCT” as SEQ ID NO: 5558, “GHLKK” as SEQ ID NO: 158, “TRLKE” as SEQ ID NO: 165, “RADN” as SEQ ID NO: 167, “AHLKK” as SEQ ID NO: 159, “RHDT” as SEQ ID NO: 40, “TKLRL” as SEQ ID NO: 160, “SRLKE” as SEQ ID NO: 44, “RRDT” as SEQ ID NO: 169, “TKLKL” as SEQ ID NO: 161, “RPDT” as SEQ ID NO: 38, “GHLRK” as SEQ ID NO: 162, “RTSS” as SEQ ID NO: 171, “RNDT” as SEQ ID NO: 172, “THLKK” as SEQ ID NO: 163 and “AHLRK” as SEQ ID NO: 60.

FIG. 22: Wild-type CTCF has binding activity to wild-type CTCF target site and no binding activity to two variant target sites. To confirm in a human cell context that endogenous CTCF binds to the wild-type CBSs and not the variant binding sites, as seen in the B2H assay, we harvested K562 cells, a human erythroleukemia cell line, and examined CTCF binding through ChIP-qPCR. CTCF was assayed for binding to a known CTCF target site and to two endogenous variant binding site sequences using a CTCF specific antibody to enrich for genomic DNA crosslinked to CTCF. Two sets of qPCR primers were designed for each binding site (indicated by 1.1, 1.2, etc). Binding was determined by enrichment of target site above 1% input of crosslinked and sonicated sample not treated with antibody, which is to represent the levels of the site of interest as a fold increase over the frequency of the site of interest in a sample unenriched with antibody. Antibody based enrichment of each sample is quantified by fold enrichment above untreated, and therefore unenriched, input. The negative control reflects background qPCR amplification levels of a target site that CTCF does not bind to. Anything above this negative level is considered enriched, indicating CTCF binding, while anything below is considered not enriched, indicating no binding by CTCF. Wild-type CTCF binds to the wild-type target site with no detectable binding to the variant binding sites, as predicted by the bacterial B2H reporter assay.
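One common way to express ChIP-qPCR enrichment relative to a 1% input sample is the percent-input calculation sketched below. This is a minimal sketch only: it assumes roughly 100% primer efficiency (a doubling per cycle), the function names and Ct values are hypothetical, and it is not asserted to be the exact calculation used to generate FIG. 22.

```python
def percent_input(ip_ct, input_ct, input_fraction=0.01):
    """Percent of input recovered in the immunoprecipitated sample.

    input_fraction is the fraction of chromatin saved as input (0.01 = 1% input).
    Assumes the qPCR signal doubles with every cycle.
    """
    return 100.0 * input_fraction * 2.0 ** (input_ct - ip_ct)


def fold_over_negative(site_percent, negative_percent):
    """Enrichment of a site relative to a region that CTCF does not bind."""
    return site_percent / negative_percent


# Hypothetical Ct values, for illustration only.
wild_type_site = percent_input(ip_ct=24.0, input_ct=25.0)
negative_region = percent_input(ip_ct=28.0, input_ct=25.0)

print("wild-type site:", wild_type_site, "% of input")
print("negative region:", negative_region, "% of input")
print("fold over negative:", fold_over_negative(wild_type_site, negative_region))
```

Under this convention, a site is scored as bound when its value exceeds that of the negative control region.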

FIGS. 23A-B: Exogenous wild-type and variant CTCF binding activity in human cells. Two endogenous variant binding site sequences, matching one of the five variant binding sites that CTCF variants were selected on, were identified in the human genome (Variant site 1 and Variant site 2). Both wild-type CTCF with a 3×HA tag and one of the 3×HA tagged engineered CTCF variants, selected to bind to the variant binding site sequence of Variant site 1 and Variant site 2, were assayed for binding in human cells through ChIP-qPCR. FIG. 23A: 3×HA tagged wild-type CTCF binds to wild-type CTCF binding site and does not bind to either variant binding site. Human K562 cells were transfected with plasmid expressing 3×HA tagged CTCF and processed with HA antibody to enrich specifically for the exogenous CTCF (3×HA tagged) and not endogenous CTCF (no tag) binding. A negative control is provided to show ChIP-qPCR levels with no enrichment for a region that is not occupied by CTCF. These results demonstrate exogenous wild-type CTCF has the same binding activity as endogenous CTCF. FIG. 23B: 3×HA tagged variant CTCF binds to variant binding sites and does not bind to wild-type CTCF binding site. K562 cells expressing variant CTCF tagged with 3×HA were analyzed by ChIP-qPCR and treated with HA specific antibody. The same sites as in FIGS. 22 and 23A were investigated for variant CTCF binding. The variant CTCF could bind to the variant sites as indicated by enrichment with variant specific HA antibody and no detectable binding was seen at the wild-type binding site as indicated by lack of HA antibody-based enrichment.

FIGS. 24A-B: Changes in gene expression relative to wild-type control of genes located around variant binding sites. A variant CTCF selected to the G3 binding site sequence and a variant CTCF selected to the Other binding site sequence were expressed in wild-type K562s. The variant CTCFs were fused to GFP and RNA was isolated from GFP+ cells 72 hours post nucleofection. cDNA was generated from the RNA and quantified by RT-qPCR. Gene expression levels across samples were normalized to a housekeeping gene (HPRT). Changes in gene expression are relative to gene expression levels in wild-type K562s expressing wild-type CTCF tagged with GFP. FIG. 24A. Changes in gene expression of genes around G3 variant binding site in the presence of variant CTCF relative to the wild-type CTCF control. FIG. 24B. Changes in gene expression of genes around Other variant binding site relative to the wild-type control.
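Normalization to a housekeeping gene followed by comparison to a control sample is often computed with the comparative Ct (2^-ΔΔCt) approach. The sketch below illustrates that arithmetic under the assumption of equal, near-100% amplification efficiency for the target and reference (HPRT) assays. The text does not state which calculation was used, and the function name and Ct values are hypothetical.

```python
def relative_expression(target_ct_sample, ref_ct_sample,
                        target_ct_control, ref_ct_control):
    """Fold change of a target gene in a sample versus a control (2^-ddCt).

    Each target Ct is first normalized to the reference (housekeeping) gene
    measured in the same sample, then the sample is compared to the control.
    """
    delta_ct_sample = target_ct_sample - ref_ct_sample
    delta_ct_control = target_ct_control - ref_ct_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: variant-CTCF cells versus wild-type-CTCF control cells.
fold_change = relative_expression(
    target_ct_sample=22.0, ref_ct_sample=20.0,
    target_ct_control=23.0, ref_ct_control=20.0,
)
print("fold change relative to wild-type control:", fold_change)
```

A value of 1 indicates no change relative to the wild-type CTCF control; values above or below 1 indicate increased or decreased expression.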

FIG. 25: Introduction of variant binding sites upstream of MYC leads to reduction of endogenous MYC expression. The CTCF binding site ~2 kb upstream of the MYC TSS was replaced with one of six different sequences used for CTCF variant selections (listed in table). The introduction of these sequences, with 4-6 nucleotide changes from the wild-type CTCF binding site sequence, results in a reduction of endogenous MYC expression to the same levels as when the CTCF binding site is deleted and loop formation is disrupted. The WT_6 sequence has 4 point mutations from the native CTCF binding site, but these changes should have no impact on wild-type CTCF binding as indicated by results from the B2H reporter assay. This appears to be the case as MYC expression levels in the WT_6 cell line are comparable to wild-type K562 MYC expression levels. Because K562 vitality is linked to MYC expression, all variant cell lines were generated in a K562 cell line with exogenous MYC expressed off of a separate PGK promoter (exoMYC.K562). FIG. 25 discloses SEQ ID NOS 5568-5573, respectively, in order of appearance.

FIGS. 26A-B: Variant CTCFs are able to bind the engineered G3 variant binding site and recover MYC expression. CTCF variants selected to bind to the G3 variant binding site sequence were expressed in the G3_3.K562 cell line. Cells were analyzed for MYC expression and CTCF occupancy on the DNA 72 hours post nucleofection. Residues of the ZF helix of the variant and wild-type (indicated by (wt)) CTCFs are listed in the legend. The G3 binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagramed. FIG. 26A. Endogenous MYC levels are recovered to wild-type levels in the G3_3 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of G3_3 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines). FIG. 26B. CTCF variants are able to bind to the introduced variant binding site in G3_3 cell line while the wild-type CTCF does not. CTCF Ab specific enrichment captures both wild-type and variant CTCF while HA Ab will only detect HA-tagged CTCF (transiently expressed). exoMYC.K562 is included as a control for ChIP-qPCR and is separated by dashed line. exoMYC.K562 has the native sequence at the CTCF binding site upstream of MYC and should demonstrate wild-type CTCF binding. The exogenously expressed CTCFs (variant and wild-type) are HA tagged and expressed in the G3_3 cell line. ChIP-qPCR was performed to investigate CTCF binding to the variant CTCF site replacing the wild-type site ~2 kb upstream of MYC (MYC site). An endogenous G3 site elsewhere in the genome and a region with no known CTCF binding served as a positive and negative control respectively. The variant CTCFs are able to bind to the variant site as indicated by enrichment with both CTCF and HA antibody, while the wild-type CTCF does not. FIGS. 26A-B disclose “CAGGGGAGGAGC” as SEQ ID NO: 5564, “DTYKLKR” as SEQ ID NO: 3, “SNLRR” as SEQ ID NO: 116, “GNLRR” as SEQ ID NO: 118, “GNLVR” as SEQ ID NO: 117, “ANLRR” as SEQ ID NO: 69, “GNLMR” as SEQ ID NO: 139, “NNLRR” as SEQ ID NO: 121, “GNLAR” as SEQ ID NO: 138, “SKLKR” as SEQ ID NO: 3470, “EHMKR” as SEQ ID NO: 123, “EHMRR” as SEQ ID NO: 34, “EHMNR” as SEQ ID NO: 126, “SHMRR” as SEQ ID NO: 147, “SHMNR” as SEQ ID NO: 146, “THMKR” as SEQ ID NO: 33, “DHMNR” as SEQ ID NO: 32, “GTMKM” as SEQ ID NO: 1255, “DHLNT” as SEQ ID NO: 155, “EHLAE” as SEQ ID NO: 151, “DHLQV” as SEQ ID NO: 12, “EHLKV” as SEQ ID NO: 13, “STLQE” as SEQ ID NO: 225, “EHLNV” as SEQ ID NO: 9, “STLNE” as SEQ ID NO: 152, “EHLQA” as SEQ ID NO: 156, “HHLMH” as SEQ ID NO: 157 and “SDLGV” as SEQ ID NO: 5552.

FIGS. 27A-B: Variant CTCFs are able to bind the engineered A3 variant binding site and recover MYC expression. CTCF variants selected to bind to the A3 variant binding site sequence were expressed in the A3_4.K562 cell line. Cells were analyzed for MYC expression and CTCF occupancy on the DNA 72 hours post nucleofection. Residues of the ZF helix of the variant and wild-type (indicated by (wt)) CTCFs are listed in the legend. The A3 binding site sequence and interacting recognition helix of the CTCF zinc finger array are also diagramed. FIG. 27A. Endogenous MYC levels are recovered to wild-type levels in the A3_4 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of A3_4 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site are shown as a positive control (separated by dashed lines). FIG. 27B. CTCF variants are able to bind to the introduced variant binding site in A3_4 cell line while the wild-type CTCF does not. CTCF Ab specific enrichment captures both wild-type and variant CTCF while HA Ab will only detect HA-tagged CTCF (transiently expressed). exoMYC.K562 is included as a control for ChIP-qPCR and is separated by dashed line. exoMYC.K562 has the native sequence at the CTCF binding site upstream of MYC and should demonstrate wild-type CTCF binding. The exogenously expressed CTCFs (variant and wild-type) are HA tagged and expressed in the A3_4 cell line. ChIP-qPCR was performed to investigate CTCF binding to the variant CTCF site replacing the wild-type site ~2 kb upstream of MYC (MYC site). An endogenous A3 site elsewhere in the genome and a region with no known CTCF binding served as a positive and negative control respectively. The variant CTCFs are able to bind to the variant site as indicated by enrichment with both CTCF and HA antibody above the negative control, while the wild-type CTCF does not bind. FIGS. 27A-B disclose “CAGGGGAGGAAC” as SEQ ID NO: 5562, “DTYKLKR” as SEQ ID NO: 3, “GNLKR” as SEQ ID NO: 119, “GNLVR” as SEQ ID NO: 117, “SNLRR” as SEQ ID NO: 116, “ANLRR” as SEQ ID NO: 69, “GNLRR” as SEQ ID NO: 118, “NNLRR” as SEQ ID NO: 121, “TNLRR” as SEQ ID NO: 68, “SKLKR” as SEQ ID NO: 3470, “EHMNR” as SEQ ID NO: 126, “EHMRR” as SEQ ID NO: 34, “EHMKR” as SEQ ID NO: 123, “THMKR” as SEQ ID NO: 33, “EHMAR” as SEQ ID NO: 127, “GTMKM” as SEQ ID NO: 1255, “DNLLA” as SEQ ID NO: 100, “DNLLV” as SEQ ID NO: 129, “DNLQA” as SEQ ID NO: 133, “DNLLT” as SEQ ID NO: 128, “DNLAT” as SEQ ID NO: 132, “DNLQT” as SEQ ID NO: 130, “DNLMA” as SEQ ID NO: 102, “DNLMT” as SEQ ID NO: 135 and “SDLGV” as SEQ ID NO: 5552.

FIG. 28: Variant CTCFs recover MYC expression of the Other 10 variant binding site cell line. CTCF variants selected to bind to the Other variant binding site sequence were expressed in the Other 10.K562 cell line. Cells were analyzed for MYC expression 72 hours post nucleofection. Residues of ZF helix of the variant and wild-type CTCFs (indicated by (wt) are listed in the legend. Other binding site sequence and interacting recognition helix of the CTCF zinc finger array is also diagramed. A. Endogenous MYC levels are recovered to wild-type levels in the Other 10 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of Other 10 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site is shown as a positive control (separated by dashed lines). FIG. 28 discloses “RKSDLGV” as SEQ ID NO: 5, “CGTGGTGCGACC” as SEQ ID NO: 5574, “TKLRL” as SEQ ID NO: 160, “THLKK” as SEQ ID NO: 163, “GHLRK” as SEQ ID NO: 162, “TKLKL” as SEQ ID NO: 161, “AHLRK” as SEQ ID NO: 60, “AHLKK” as SEQ ID NO: 159, “SKLKR” as SEQ ID NO: 3470, “SRLKE” as SEQ ID NO: 44, “TRLKE” as SEQ ID NO: 165, “YKLKR” as SEQ ID NO: 5553, “RRDT” as SEQ ID NO: 169, “RPDT” as SEQ ID NO: 38, “RNDT” as SEQ ID NO: 172, “RADN” as SEQ ID NO: 167, “RHDT” as SEQ ID NO: 40 and “QSGT” as SEQ ID NO: 1582.

FIG. 29: Variant CTCFs recover MYC expression of the Aother_2 variant binding site cell line. CTCF variants selected to bind to the Aother variant binding site sequence were expressed in the Aother_2.K562 cell line. Cells were analyzed for MYC expression 72 hours post nucleofection. Residues of ZF helix of the variant and wild-type CTCFs (indicated by (wt) are listed in the legend. Aother binding site sequence and interacting recognition helix of the CTCF zinc finger array is also diagramed. A. Endogenous MYC levels are recovered to wild-type levels in the Aother_2 cell line when CTCF variants are expressed. Endogenous MYC expression levels were quantified by RT-qPCR and are relative to reduced endogenous MYC levels of Aother_2 cell line. Endogenous MYC levels from the exoMYC.K562 cell line without any alterations to the CTCF binding site is shown as a positive control (separated by dashed lines). FIG. 29 discloses “CGTGGTGCGAAC” as SEQ ID NO: 5575, “AKLRK” as SEQ ID NO: 89, “AKLRV” as SEQ ID NO: 61, “SKLRL” as SEQ ID NO: 92, “SKLKR” as SEQ ID NO: 3470, “NRLKV” as SEQ ID NO: 94, “SRLKE” as SEQ ID NO: 44, “YKLKR” as SEQ ID NO: 5553, “RTET” as SEQ ID NO: 98, “RPDT” as SEQ ID NO: 38, “RADV” as SEQ ID NO: 99, “QSGT” as SEQ ID NO: 1582, “SNLLV” as SEQ ID NO: 101, “DNLMA” as SEQ ID NO: 102, “DNLRV” as SEQ ID NO: 103 and “SDLGV” as SEQ ID NO: 5552.

To date, there are no engineered CTCF variants available that are designed to bind to mutant CBSs with higher affinity than wild-type CTCF. Therefore, there is a need for engineered CTCF variants that can bind to mutant CBSs with higher affinity than wild-type CTCF.

The present disclosure is based, at least in part, on the discovery that CTCF variants with alterations in the zinc finger array can be engineered to recognize CBSs that harbor one or more point mutations, i.e., mutant CBSs.

CTCF

CCCTC-binding factor (CTCF) is a multi-domain protein that acts as an essential genome organizer by maintaining higher-order chromatin structure while also having a role in cell differentiation and the promotion or repression of gene expression. CTCF maintains topologically associated domains (TADs) spanning megabases of the genome as well as smaller-scale sub-TADs, leading to fine-tuned gene insulation or gene activation within gene clusters. In addition, CTCF has been found to regulate mRNA splicing by influencing the rate of transcription and has more recently been implicated in promoting homologous recombination repair at double-strand breaks. Wild-type CTCF binds throughout the genome via an 11-finger zinc finger array that recognizes canonical CTCF binding sites (CBSs).

Wild-type CTCF ZF arrays comprise the following sequences at ZFs 3-7 positions −1 to +6:

(SEQ ID NO: 1)
ZF3 positions −1 to +6: TSGELVR
(SEQ ID NO: 2)
ZF4 positions −1 to +6: EVSKLKR
(SEQ ID NO: 3)
ZF5 positions −1 to +6: DTYKLKR
(SEQ ID NO: 4)
ZF6 positions −1 to +6: QSGTMKM
(SEQ ID NO: 5)
ZF7 positions −1 to +6: RKSDLGV

A wild-type CTCF has an amino acid sequence that has greater than 80%, greater than 85%, greater than 90%, greater than 95%, greater than 96%, greater than 97%, greater than 98% or greater than 99% sequence identity as compared to the amino acid sequence shown below:

(SEQ ID NO: 190)
MEGDAVEAIVEESETFIKGKERKTYQRRREGGQEEDACHLPQNQTDGGEV
VQDVNSSVQMVMMEQLDPTLLQMKTEVMEGTVAPEAEAAVDDTQIITLQV
VNMEEQPINIGELQLVQVPVPVTVPVATTSVEELQGAYENEVSKEGLAES
EPMICHTLPLPEGFQVVKVGANGEVETLEQGELPPQEDPSWQKDPDYQPP
AKKTKKTKKSKLRYTEEGKDVDVSVYDFEEEQQEGLLSEVNAEKVVGNMK
PPKPTKIKKKGVKKTFQCELCSYTCPRRSNLDRHMKSHTDERPHKCHLCG
RAFRTVTLLRNHLNTHTGTRPHKCPDCDMAFVTSGELVRHRRYKHTHEKP
FKCSMCDYASVEVSKLKRHIRSHTGERPFQCSLCSYASRDTYKLKRHMRT
HSGEKPYECYICHARFTQSGTMKMHILQKHTENVAKFHCPHCDTVIARKS
DLGVHLRKQHSYIEQGKKCRYCDAVFHERYALIQHQKSHKNEKRFKCDQC
DYACRQERHMIMHKRTHTGEKPYACSHCDKTFRQKQLLDMHFKRYHDPNF
VPAAFVCSKCGKTFTRRNTMARHADNCAGPDGVEGENGGETKKSKRGRKR
KMRSKKEDSSDSENAEPDLDDNEDEEEPAVEIEPEPEPQPVTPAPPPAKK
RRGRPPGRTNQPKQNQPTAIIQVEDQNTGAIENIIVEVKKEPDAEPAEGE
EEEAQPAATDAPNGDLTPEMILSMMDR

For the purpose of comparing two different nucleic acid or polypeptide sequences, one sequence (test sequence) may be described to be a specific percentage identical to another sequence (comparison sequence). The percentage identity can be determined by the algorithm of Karlin and Altschul, Proc. Natl. Acad. Sci. USA, 90:5873-5877 (1993), which is incorporated into various BLAST programs. The percentage identity can be determined by the “BLAST 2 Sequences” tool, which is available at the National Center for Biotechnology Information (NCBI) website. See Tatusova and Madden, FEMS Microbiol. Lett., 174(2):247-250 (1999). For pairwise DNA-DNA comparison, the BLASTN program is used with default parameters (e.g., Match: 1; Mismatch: −2; Open gap: 5 penalties; extension gap: 2 penalties; gap x_dropoff: 50; expect: 10; and word size: 11, with filter). For pairwise protein-protein sequence comparison, the BLASTP program can be employed using default parameters (e.g., Matrix: BLOSUM62; gap open: 11; gap extension: 1; x_dropoff: 15; expect: 10.0; and wordsize: 3, with filter). Percent identity of two sequences is calculated by aligning a test sequence with a comparison sequence using BLAST, determining the number of amino acids or nucleotides in the aligned test sequence that are identical to amino acids or nucleotides in the same position of the comparison sequence, and dividing the number of identical amino acids or nucleotides by the number of amino acids or nucleotides in the comparison sequence. When BLAST is used to compare two sequences, it aligns the sequences and yields the percent identity over defined, aligned regions. If the two sequences are aligned across their entire length, the percent identity yielded by the BLAST is the percent identity of the two sequences. 
If BLAST does not align the two sequences over their entire length, then the number of identical amino acids or nucleotides in the unaligned regions of the test sequence and comparison sequence is considered to be zero and the percent identity is calculated by adding the number of identical amino acids or nucleotides in the aligned regions and dividing that number by the length of the comparison sequence. Various versions of the BLAST programs can be used to compare sequences, e.g., BLAST 2.1.2 or BLAST+ 2.2.22.
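The percent identity arithmetic described above can be sketched as follows. This is a minimal illustration only: it does not run BLAST, it assumes the aligned regions have already been obtained (for example from BLASTP output), and the function name and the example test peptide are hypothetical.

```python
def percent_identity(aligned_test: str, aligned_comparison: str,
                     comparison_length: int) -> float:
    """Percent identity as described in the text.

    aligned_test / aligned_comparison: the aligned regions, position by
    position, with '-' marking gaps (assumed to come from a prior alignment).
    comparison_length: full length of the comparison sequence. Residues
    outside the aligned regions contribute zero identities.
    """
    if len(aligned_test) != len(aligned_comparison):
        raise ValueError("aligned regions must have equal length")
    if comparison_length <= 0:
        raise ValueError("comparison_length must be positive")
    identical = sum(
        1 for a, b in zip(aligned_test, aligned_comparison)
        if a == b and a != "-"
    )
    return 100.0 * identical / comparison_length


# Comparison sequence: wild-type ZF3 positions -1 to +6 (SEQ ID NO: 1).
# Test sequence: hypothetical peptide carrying QQLIV at positions +2 to +6.
print(round(percent_identity("TSQQLIV", "TSGELVR", 7), 2))

# Only five residues aligned (all identical) out of a 7-residue comparison.
print(round(percent_identity("TSGEL", "TSGEL", 7), 2))
```

Dividing by the full comparison length, rather than by the alignment length, is what makes unaligned regions count as zero identities.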

CTCF Binding Sites (CBSs)

The CBS is typically 40 bp in length with a highly conserved 15 bp core sequence (or core motif). Sequence flanking the core sequence is significantly less well conserved, but still important for CTCF binding at sites throughout the genome (FIG. 1).

Wild-type CTCF binds to a “consensus CBS motif” that contains the following core sequence: 5′-NCDNHNGRNGDNNNN-3′ (SEQ ID NO: 191).

In one embodiment, the consensus CBS motif contains the following core sequence: 5′-CCAGCAGGGGGCGCT-3′ (SEQ ID NO:6). Other core sequences are known in the art.
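As an illustration of how the degenerate core motif of SEQ ID NO: 191 relates to a concrete core sequence such as SEQ ID NO: 6, the following sketch expands the standard IUPAC nucleotide ambiguity codes into a regular expression. The function names are hypothetical, and matching the motif is only a sequence-level check; it says nothing about measured binding.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

CONSENSUS_CORE = "NCDNHNGRNGDNNNN"  # SEQ ID NO: 191


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Translate an IUPAC motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def matches_core(sequence: str, motif: str = CONSENSUS_CORE) -> bool:
    """True if the whole sequence is consistent with the degenerate motif."""
    return motif_to_regex(motif).fullmatch(sequence.upper()) is not None


print(matches_core("CCAGCAGGGGGCGCT"))  # SEQ ID NO: 6
print(matches_core("CAAGCAGGGGGCGCT"))  # hypothetical C-to-A change at position 2
```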

It is not known if the nucleotides flanking the core sequence are bound by the 11-finger ZF array present within CTCF. Co-crystal structures of the 11-finger zinc finger (ZF) array bound to a consensus CBS suggest that only ZFs 3-7 of the array appear to bind directly to the highly conserved core sequence, while ZFs 8-11 and 1-2 do not appear to mediate sequence-specific contacts. Progressive truncations of the ZF array suggest that ZFs 8-11 and ZFs 1-2 may improve DNA binding of CTCF to CBSs, and DNase I footprinting, as well as ChIP-Seq and ChIP-Exo data, suggest that ZFs 9-11 may make important protein-DNA contacts (Rhee and Pugh, Cell (2011); Nakahashi et al., Cell Reports (2013)). Interestingly, the co-crystal structure of the CTCF ZF array bound to a CBS contains only zinc fingers 2-9, with the other fingers not visible in the structure, consistent with the idea that zinc fingers interacting with flanking regions of the motif may not make stable contacts with the DNA (Hashimoto, et al., Molecular Cell (2017)). Thus, it remains unclear what impact all 11 fingers of the array have on the DNA-binding activity of CTCF and whether all zinc fingers, or only a subset, contact the DNA.

CTCF binding is sensitive to changes in the conserved 15 bp core motif of the CBS, where, in mice, single nucleotide changes at certain positions can lead to loss of CTCF binding (Nakahashi et al., Cell Reports (2013)). CTCF binding sites have been reported to be mutational hotspots in cancer with cancer-associated mutations localized to the core sequence of the CTCF binding site in primary samples from gastrointestinal cancer patients and with accompanying atypical gene expression profiles of oncogenic and tumor suppressor genes (Guo et al., Nature Communications (2018)). Small deletions of CTCF binding sites have also been shown to lead to loss of expression of genes such as MYC and PTGS2, which both play a role in cancer development (Schuijers et al., Cell Reports (2018); Kang et al., Oncogene (2015)).

Methods described herein can be used to select and generate engineered CTCF variants comprising a plurality of zinc fingers, where the selected polypeptide has at least one amino acid residue in at least one zinc finger that differs in sequence from a wild-type CTCF, and where the engineered CTCF variant binds to a DNA sequence of interest (e.g., CBS harboring at least one mutation in the consensus CBS sequence) but does not bind to a consensus CBS. Using methods of the present invention, a scaffold polypeptide is re-engineered into a new scaffold-based zinc-finger polypeptide that has different structural and functional features, such that the new polypeptide binds to a sequence of interest but does not bind to a naturally occurring DNA binding site of the scaffold protein.

The term “zinc finger” or “Zf” refers to a polypeptide having DNA binding domains that are stabilized by zinc. The individual DNA binding domains are typically referred to as “fingers.” A Zf protein has at least one finger, preferably 2 fingers, 3 fingers, or 6 fingers. A Zf protein having two or more Zfs is referred to as a “multi-finger” or “multi-Zf” protein. Each finger typically comprises an approximately 30 amino acid, zinc-chelating, DNA-binding domain. An exemplary motif characterizing one class of these proteins is -Cys-(X)(2-4)-Cys-(X)(12)-His-(X)(3-5)-His (SEQ ID NO:7), where X is any amino acid, which is known as the “C(2)H(2) class.” A single Zf of this class typically consists of an alpha helix containing the two invariant histidine residues co-ordinated with zinc along with the two cysteine residues.
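The C(2)H(2) motif given above maps directly onto a regular expression. The sketch below is illustrative only; the function name is hypothetical, and the example fragment is copied from the wild-type CTCF sequence of SEQ ID NO: 190.

```python
import re

# -Cys-(X)(2-4)-Cys-(X)(12)-His-(X)(3-5)-His, where X is any amino acid.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2(protein: str) -> list:
    """Return (start, matched_text) for each non-overlapping motif match.

    Start positions are 0-based. This is a pattern scan only; a match does
    not by itself establish that the region folds as a zinc finger.
    """
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein)]


# Fragment of SEQ ID NO: 190 containing one finger.
fragment = "FQCELCSYTCPRRSNLDRHMKSHTDERP"
for start, text in find_c2h2(fragment):
    print(start, text)
```

Using `finditer` reports non-overlapping matches from left to right, which suits tandem finger arrays in which each finger occupies its own stretch of sequence.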

The term “bind to” or “binding” with respect to a nucleic acid binding factor and its target nucleic acid, e.g., CTCF (variant or wild-type) and CBS, refers to sequence-dependent binding of the nucleic acid binding factor to the target nucleic acid sequence of a nucleic acid through intermolecular interactions, e.g., ionic, covalent, London dispersion, dipole-dipole, or hydrogen bonding, in such a way that the binding allows the nucleic acid binding factor to mediate a biologically significant function, e.g., transcriptional activation, recruitment of other proteins to the binding site, and/or alteration of chromatin structure. Such binding can result in modulation of expression of genes, such as activation, overexpression, suppression, or inactivation of gene expression.

The term “does not bind to” with respect to a nucleic acid binding factor and its target nucleic acid, e.g., CTCF (variant or wild-type) and CBS, refers to the lack of sequence-specific binding of the nucleic acid binding factor to a nucleic acid through intermolecular interactions, e.g., ionic, covalent, London dispersion, dipole-dipole, or hydrogen bonding, as a result of the absence of a target sequence in the nucleic acid (e.g., due to one or more point mutations in the CBS). Such non-binding does not allow the nucleic acid binding factor to mediate a biologically significant function, e.g., transcriptional activation, DNA modification, DNA cleavage, recruitment of other proteins to the binding site, and/or alteration of chromatin structure.

Each finger within a Zf protein binds to from about two to about five base pairs within a DNA sequence. Typically a single Zf within a Zf protein binds to a three or four base pair “subsite” within a DNA sequence. Accordingly, a “subsite” is a DNA sequence that is bound by a single zinc finger. A “multi-subsite” is a DNA sequence that is bound by more than one zinc finger, and comprises at least 4 bp, preferably 6 bp or more. A multi-Zf protein binds at least two, and typically three, four, five, six or more subsites, i.e., one for each finger of the protein.

Compositions and Methods

Described herein are engineered CTCF variants that can bind to mutant CBSs with higher affinity than wild-type CTCF. The engineered CTCF variants can be used in regulating genes that are under the control of mutant CBSs (CBSs having at least one nucleic acid that is different in sequence from the nucleic acid sequence of a consensus CBS). The CTCF variants have at least one amino acid residue in at least one zinc finger that differs in sequence from the amino acid sequence of a wild-type CTCF.

Exemplary engineered CTCF variants include those that contain:

(1) the amino acid sequence DHLQT (SEQ ID NO:8), EHLNV (SEQ ID NO:9), AHLQV (SEQ ID NO:10), EHLRE (SEQ ID NO:11), DHLQV (SEQ ID NO:12), EHLKV (SEQ ID NO:13), DHLQV (SEQ ID NO:14), EHLVV (SEQ ID NO:15), DHLRT (SEQ ID NO:16), DHLAT (SEQ ID NO:17), or DHLQT (SEQ ID NO:18) at ZF7 positions +2 to +6;

(2) the amino acid sequence DHLQT (SEQ ID NO:19), EHLNV (SEQ ID NO:20), AHLQV (SEQ ID NO:21), EHLRE (SEQ ID NO:22), DHLQV (SEQ ID NO:23), EHLKV (SEQ ID NO:24), DHLQV (SEQ ID NO:25), EHLVV (SEQ ID NO:26), DHLRT (SEQ ID NO:27), DHLAT (SEQ ID NO:28), or DHLQT (SEQ ID NO:29) at ZF7 positions +2 to +6;

(3) the amino acid sequence NAMKR (SEQ ID NO:30), EHMGR (SEQ ID NO:31), DHMNR (SEQ ID NO:32), THMKR (SEQ ID NO:33), EHMRR (SEQ ID NO:34), or THMNR (SEQ ID NO:35) at ZF6 positions +2 to +6;

(4) the amino acid sequence MNES (SEQ ID NO:36), HRES (SEQ ID NO:37), RPDT (SEQ ID NO:38), RTDI (SEQ ID NO:39), or RHDT (SEQ ID NO:40) at ZF6 positions −1 to +3;

(5) the amino acid sequence HGLKV (SEQ ID NO:41), HRLKE (SEQ ID NO:42), HALKV (SEQ ID NO:43), SRLKE (SEQ ID NO:44), DGLRV (SEQ ID NO:45), HTLKV (SEQ ID NO:46), or NRLKE (SEQ ID NO:47) at ZF5 positions +2 to +6;

(6) the amino acid sequence ATLKR (SEQ ID NO:48), QALRR (SEQ ID NO:49), GGLVR (SEQ ID NO:50), HGLIR (SEQ ID NO:51), ANLSR (SEQ ID NO:52), TGLTR (SEQ ID NO:53), HGLVR (SEQ ID NO:54), GGLTR (SEQ ID NO:55), HTLRR (SEQ ID NO:56), TVLKR (SEQ ID NO:57), ADLKR (SEQ ID NO:58), or HGLRR (SEQ ID NO:59) at ZF5 positions +2 to +6;

(7) the amino acid sequence AHLRK (SEQ ID NO:60), AKLRV (SEQ ID NO:61), GGLGL (SEQ ID NO:62), AKLRI (SEQ ID NO:63), TKLKV (SEQ ID NO:64), or SKLRV (SEQ ID NO:65) at ZF4 positions +2 to +6;

(8) the amino acid sequence ATLRR (SEQ ID NO:66), RRLDR (SEQ ID NO:67), TNLRR (SEQ ID NO:68), ANLRR (SEQ ID NO:69), GNLTR (SEQ ID NO:70), AMLKR (SEQ ID NO:71), HMLTR (SEQ ID NO:72), AMLRR (SEQ ID NO:73), or TMLRR (SEQ ID NO:74) at ZF4 positions +2 to +6;

(9) the amino acid sequence QQLIV (SEQ ID NO:75), SQLIV (SEQ ID NO:76), QQLLV (SEQ ID NO:77), GELVV (SEQ ID NO:78), QQLLI (SEQ ID NO:79), GQLIV (SEQ ID NO:80), GQLTV (SEQ ID NO:81), TELII (SEQ ID NO:82), QGLLV (SEQ ID NO:83), QQLLT (SEQ ID NO:84), GQLLT (SEQ ID NO:85), GELLT (SEQ ID NO:86), or QQLLI (SEQ ID NO:87) at ZF3 positions +2 to +6;

(10) the amino acid sequence AKLKK (SEQ ID NO:88), AKLRK (SEQ ID NO:89), AHLRV (SEQ ID NO:90), AKLRV (SEQ ID NO:91), or SKLRL (SEQ ID NO:92) at ZF4 positions +2 to +6; the amino acid sequence ERLRV (SEQ ID NO:93), NRLKV (SEQ ID NO:94), SRLKE (SEQ ID NO:95), or NRLKV (SEQ ID NO:96) at ZF5 positions +2 to +6; the amino acid sequence RPDT (SEQ ID NO:97), RTET (SEQ ID NO:98), or RADV (SEQ ID NO:99) at ZF6 positions −1 to +3; and the amino acid sequence DNLLA (SEQ ID NO:100), SNLLV (SEQ ID NO:101), DNLMA (SEQ ID NO:102), or DNLRV (SEQ ID NO:103) at ZF7 positions +2 to +6;

(11) the amino acid sequence GHLKK (SEQ ID NO:104), AHLRK (SEQ ID NO:105), or GKLRI (SEQ ID NO:106) at ZF4 positions +2 to +6; the amino acid sequence SRLKE (SEQ ID NO:107), DALRR (SEQ ID NO:108), DGLKR (SEQ ID NO:109), or TRLRE (SEQ ID NO:110) at ZF5 positions +2 to +6; the amino acid sequence RPDT (SEQ ID NO:111) or RTEN (SEQ ID NO:112) at ZF6 positions −1 to +3; and the amino acid sequence EHLKV (SEQ ID NO:113), DHLLA (SEQ ID NO:114), or HHLDV (SEQ ID NO:115) at ZF7 positions +2 to +6;

(12) the amino acid sequence SNLRR (SEQ ID NO:116), GNLVR (SEQ ID NO:117), GNLRR (SEQ ID NO:118), GNLKR (SEQ ID NO:119), ANLRR (SEQ ID NO:120), NNLRR (SEQ ID NO:121), or TNLRR (SEQ ID NO:122) at ZF4 positions +2 to +6; the amino acid sequence EHMKR (SEQ ID NO:123), EHMRR (SEQ ID NO:124), THMKR (SEQ ID NO:125), EHMNR (SEQ ID NO:126), or EHMAR (SEQ ID NO:127) at ZF6 positions +2 to +6; and the amino acid sequence DNLLT (SEQ ID NO:128), DNLLV (SEQ ID NO:129), DNLQT (SEQ ID NO:130), DNLLA (SEQ ID NO:131), DNLAT (SEQ ID NO:132), DNLQA (SEQ ID NO:133), DNLMA (SEQ ID NO:134), or DNLMT (SEQ ID NO:135) at ZF7 positions +2 to +6;

(13) the amino acid sequence GNLVR (SEQ ID NO:136), GNLRR (SEQ ID NO:137), GNLAR (SEQ ID NO:138), GNLMR (SEQ ID NO:139), ANLRR (SEQ ID NO:140), SNLRR (SEQ ID NO:141), or NNLRR (SEQ ID NO:142) at ZF4 positions +2 to +6; the amino acid sequence EHMNR (SEQ ID NO:143), EHMKR (SEQ ID NO:144), EHMRR (SEQ ID NO:145), SHMNR (SEQ ID NO:146), SHMRR (SEQ ID NO:147), THMKR (SEQ ID NO:148), or DHMNR (SEQ ID NO:149) at ZF6 positions +2 to +6; and the amino acid sequence EHLKV (SEQ ID NO:150), EHLAE (SEQ ID NO:151), STLNE (SEQ ID NO:152), DHLQV (SEQ ID NO:153), EHLNV (SEQ ID NO:154), DHLNT (SEQ ID NO:155), EHLQA (SEQ ID NO:156), or HHLMH (SEQ ID NO:157) at ZF7 positions +2 to +6; or

(14) the amino acid sequence GHLKK (SEQ ID NO:158), AHLKK (SEQ ID NO:159), TKLRL (SEQ ID NO:160), TKLKL (SEQ ID NO:161), GHLRK (SEQ ID NO:162), THLKK (SEQ ID NO:163), or AHLRK (SEQ ID NO:164) at ZF4 positions +2 to +6; the amino acid sequence TRLKE (SEQ ID NO:165) or SRLKE (SEQ ID NO:166) at ZF5 positions +2 to +6; and the amino acid sequence RADN (SEQ ID NO:167), RHDT (SEQ ID NO:168), RRDT (SEQ ID NO:169), RPDT (SEQ ID NO:170), RTSS (SEQ ID NO:171), or RNDT (SEQ ID NO:172) at ZF6 positions −1 to +3.

In some embodiments, the engineered CTCF variants contain two or more combinations of the above-listed amino acid sequences.

In one embodiment of the present disclosure, mutations at certain positions within the consensus CBS substantially reduced binding by the wild-type CTCF zinc finger array in a bacterial two-hybrid system, and that system was used to select variants from randomized libraries that are capable of recognizing the mutated CBS sequence. Combining fingers together can be used to generate variant CTCF zinc finger arrays capable of recognizing CBSs harboring multiple point mutations. In some embodiments of the present disclosure, CTCF proteins harboring these zinc finger array variants are used to restore CTCF binding activity at sites bearing one or more mutations within a CBS (i.e., non-canonical CBSs). In some embodiments of the present disclosure, CTCF variants are capable of recognizing alternative non-CBS sites in the genome. In some embodiments, such CTCF variants can be used to create artificial TADs and/or enhancer-promoter loops that can purposefully insulate genes and/or perturb the higher-order structure of the genome and thereby alter expression of certain target genes of interest.

Diagnosis and Treatment of Diseases

The engineered CTCF variants described herein can be used for treating diseases where aberrant gene regulation due to a mutant CBS is an underlying factor. The engineered CTCF variants described herein can, for example, bind to mutant CBSs that do not bind wild-type CTCF, thereby altering or restoring gene regulation in a way that can reverse or slow down progression of disease. CTCF binding has been shown to regulate expression of oncogenes, such as MYC. Mutations accumulated in CTCF binding sites and loss of wild-type CTCF binding are associated with dysregulation of oncogenes and increased risk of carcinogenesis. Transcriptional dysregulation of MYC is one of the most frequent events in aggressive tumor cells, and the dysregulation is a result of mutations in the CTCF binding site that disrupt the enhancer-promoter loop. Engineered CTCF variants can bind to the mutated sites and restore normal gene expression levels, reducing the risk of cancer development. In another case, Fragile X Syndrome is the result of a duplication in a repetitive region and the loss of FMR1 expression. Duplication of a repeat region in the X chromosome disrupts a CTCF binding site, leading to the loss of an enhancer-promoter loop driving the expression of FMR1. The engineered CTCF variants could restore the enhancer-promoter loop, leading to restoration of FMR1 expression. Human Papilloma Virus (HPV) and other integrating viruses (such as HIV) are often silenced by CTCF-mediated insulation of the viral genome from nearby enhancers. In the case of HPV18, there is a CTCF binding site in the promoter region of the viral genome. HPV18 genomes that have mutations in the CTCF binding site are not silenced because the mutated binding site can no longer be recognized by CTCF. Engineered CTCF variants would be able to bind to the mutated integrated HPV genomes and restore the insulating loop.

Kits

Also provided herein are kits comprising the engineered CTCF variant, and/or nucleic acids encoding an engineered CTCF variant as described herein and instructions for use.

Other Applications for the Engineered CTCF Variants

The engineered CTCF variants described herein can be used in a number of other applications, some of which are disclosed herein.

In some embodiments, the engineered CTCF variant, or nucleic acids encoding such an engineered CTCF variant, can be used to further elucidate the complex interactions of CTCF and other chromatin organization proteins. The structural maintenance of chromosomes is tightly regulated within cells, and CTCF plays a major role. It remains unclear how higher-order structures are inherited across cell division and maintained through cell differentiation; the use of CTCF variants can help clarify that role. CTCF variants might be used to investigate how loops are formed across the genome and to modify or restore normal genomic architecture in a manner that impacts endogenous gene expression for research and therapeutic applications. They might also be used to re-establish ancestral CTCF binding sites, to better understand the evolutionary implications of TAD-based genome organization and epigenetic regulation of gene expression, or to create alternative genomic architectures that impact endogenous gene expression for research and therapeutic applications.

Materials and Methods

The following materials and methods were used in the examples set forth below.

Construction of B2H Reporter Assay Components

The zinc-finger bacterial expression plasmid contained the CTCF zinc finger array (or variants) fused to gal11P. The amino-terminal end of all or part of the CTCF 11-finger zinc finger array was fused to the carboxy-terminal end of gal11P with a Flag tag linker between them. The zinc finger expression plasmid contains a Kanamycin resistance gene. The second plasmid, known as the bacterial reporter plasmid, contained a CTCF binding site sequence that was introduced via BsaI restriction digest followed by T4-mediated ligation of annealed oligos containing the CTCF binding site. The reporter plasmid contained a bacterial lac promoter that promoted the expression of lacZ when the CTCF binding site was bound. The reporter plasmid also has a Chloramphenicol resistance gene.

Bacterial-Two-Hybrid (B2H) Randomized Library Construction

Complementary oligos were synthesized by IDT with ‘VNS’ or ‘NNS’ variation introduced in the sequence by design. Oligos were annealed and ligated into the zinc finger expression plasmid (previously digested with XbaI and BamHI) using T4 ligase. The ligation reaction was purified using a Qiagen MinElute column, and the purified product was electro-transformed into the electro-competent XL1-Blue E. coli strain. After 1 hour of recovery in SOC at 37° C., the transformation was inoculated into 150 mL of Luria broth (LB) with 50 ug/mL of Kanamycin. After the culture reached an OD600 of 0.400-0.600 (about 10 hours of growth at 37° C.), the culture was spun down and the library was harvested using a Qiagen Maxiprep kit.
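To make the ‘VNS’ and ‘NNS’ designations concrete, the following sketch expands a degenerate codon into its member codons and lists the amino acids they encode under the standard genetic code. The function names are hypothetical, and the number of randomized codons per library is not specified here, so no library sizes are computed.

```python
from itertools import product

# IUPAC ambiguity codes used in degenerate codon design.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "V": "ACG",   # not T
    "S": "CG",    # strong (C or G)
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(degenerate_codon: str) -> list:
    """All concrete codons represented by a degenerate codon such as 'NNS'."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded(degenerate_codon: str) -> set:
    """Amino acids ('*' = stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


for scheme in ("NNS", "VNS"):
    codons = expand(scheme)
    print(scheme, len(codons), "codons:", "".join(sorted(encoded(scheme))))
```

Because V excludes T at the first codon position, VNS contains no stop codons but also cannot encode Phe, Tyr, Cys, or Trp, whereas NNS covers all 20 amino acids and includes a single stop codon (TAG).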

Bacterial-Two-Hybrid (B2H) Reporter Assay

600 ng of gal11P-zinc finger expression plasmid and 600 ng of reporter plasmid with the CTCF binding site of interest were chemically transformed into 150 uL of Δλ E. coli strain with an alpha N-terminal domain of RNA polymerase (α-NTD)-Gal4 fusion. The plasmid and cell mixture was incubated on ice for 30 minutes, heat shocked at 42° C. for 1 minute, recovered on ice for 2 minutes, and then recovered in 500 uL of Luria broth for 1 hour. Post-recovery, the transformation was plated on Kanamycin (50 ug/mL), Chloramphenicol (12.5 ug/mL) selective LB agar plates. After 14-16 hours of growth at 37° C., colonies were picked and grown overnight in 1 mL of induction media (Luria broth with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 10 ug/mL of ZnCl2, and 500 ug/mL of IPTG). After 15-17 hours of growth, 25 uL of the overnight culture was sub-cultured into 1 mL of fresh induction media and grown for 2 hours at 37° C. or until cultures were between OD595 0.157-0.268 as measured by spectrophotometer. 100 uL of the subculture was then lysed for a minimum of 15 minutes using 11 uL of a 1:10 mixture of lysozyme and PopCulture soap. 15 uL of the lysis mixture was then analyzed for fold activation of LacZ by a previously described colorimetric ONPG assay. Binding was quantified by fold activation of LacZ. Fold activation was determined by calculating the fold increase of β-gal levels of a sample above the β-gal levels of the negative control (no zinc finger protein fused to gal11P).
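The fold activation calculation described above reduces to a ratio of β-gal levels. The sketch below assumes that β-gal levels are already expressed in consistent units for the sample and the negative control; the function name and the numeric values are hypothetical.

```python
def fold_activation(sample_bgal: float, negative_control_bgal: float) -> float:
    """Fold increase of a sample's beta-gal level over the negative control.

    The negative control is the strain with no zinc finger protein fused
    to gal11P. Both values must be in the same units.
    """
    if negative_control_bgal <= 0:
        raise ValueError("negative control beta-gal level must be positive")
    return sample_bgal / negative_control_bgal


# Hypothetical beta-gal readings, for illustration only.
print(fold_activation(sample_bgal=300.0, negative_control_bgal=100.0))
```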

Bacterial-Two-Hybrid (B2H) Selection Assay

Plasmids involved in the selection assay are the same as before with only one difference: the reporter plasmid is made to be a selective plasmid by swapping LacZ with BlaC, an antibiotic resistance gene for the β-lactam class of antibiotics, such as Carbenicillin. Selections are carried out by constructing libraries of variants from a pool of oligos ligated into the zinc finger-gal11P expression plasmid. These are electro-transformed into an electro-competent E. coli strain containing the selective plasmid with the CTCF binding site of interest. Cells are recovered in 1 mL of SOC for 1 hour at 37° C., followed by induction of the selective plasmid for 3 additional hours at 37° C. in 4 mL of induction media (previously described). After four total hours, transformations are plated on low stringency plates (LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of zinc chloride, 200 ug/mL of IPTG, and 0.45 ug/mL of Clavulanic acid). Plates are grown at 37° C. for 20-24 hours and then colonies are harvested off the surface with 2 mL of LB. 50 uL of the scraped colonies are sub-cultured into 1 mL of terrific broth (TB) with 50 ug/mL of Kanamycin and 12.5 ug/mL of Chloramphenicol and grown for 14-16 hours at 37° C. The next day, plasmid is harvested from the overnight cultures and chemically transformed into a chemically competent Δλ E. coli strain containing the same selective plasmid with the CTCF binding site of interest as before. The chemical transformation is performed as previously described, with the addition of 2 hours of growth in induction media following the 1 hour recovery at 37° C. After a total of 3 hours of growth, cells are plated on high stringency selective gradient plates. 
The high stringency gradient plates contain 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG, with a gradient of Clavulanic acid from ~1 up to 4.0 ug/mL in concentration. Plates were incubated for 20-24 hours at 37° C. Colonies that grew on the region of the gradient with the highest levels of Clavulanic acid were picked and grown overnight in 1 mL of TB with 50 ug/mL of Kanamycin in order to harvest the plasmid. The variant plasmid was then Sanger sequenced and analyzed for binding activity in the B2H β-gal reporter assay.

High Stringency Gradient Plates

The high stringency gradient plates contain 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG, with a gradient of Clavulanic acid from ~1 to 4.0 ug/mL in concentration. To obtain a gradient of Clavulanic acid, rectangular plates are elevated using a pipette tip so as to have a ~25° slope (enough of an angle so that the thin end of the wedge is only barely covered with LB agar). 20-25 mL of LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, 200 ug/mL of IPTG and 4 ug/mL of Clavulanic acid is added to the inclined plate to form the bottom wedge. Once the agar has solidified, the plates are laid flat and 20-25 mL of LB agar with 50 ug/mL of Kanamycin, 12.5 ug/mL of Chloramphenicol, 100 ug/mL of Carbenicillin, 10 ug/mL of ZnCl2, and 200 ug/mL of IPTG (with no Clavulanic acid) is poured on top. This creates plates with a gradient of Clavulanic acid ranging from ~1 ug/mL up to 4.0 ug/mL.

CTCF Binding Assay Using ChIP-qPCR

K562 cells were seeded 18-24 hours in advance of transfection at a density of 3×10^5 cells/mL. 3 million K562 cells per variant were transfected using Lonza Kit V with the provided optimized protocol and pooled in a 10 cm dish. 5 ug of plasmid expressing HA epitope-tagged CTCF (wild-type or variant) from a pCAG promoter was used for each 1 million cell reaction. 72 hours post transfection, approximately 10 million cells were crosslinked with 1% Formaldehyde at 37° C. for 10 mins. The reaction was quenched with 1.2 mL of 2.5M Glycine for 5 mins at 37° C. Cells were pelleted at 430 g for 10 mins and sonicated on an SFX250 Branson sonifier for 5.5 mins at 32% Amplitude, 1.3s off, 0.7s on. The samples were then split in half; one half was precipitated overnight, rotating at 4° C., with antibody specific to CTCF and the other half was precipitated overnight with HA-specific antibody. The next day, antibody-bound chromatin complexes were incubated with G-dynabeads for 2 hours at 4° C., rotating. Beads were washed three times in 1 mL of ice-cold RIPA 150 Wash Buffer (0.1% SDS, 0.1% DOC, 1% Triton X-100, 1 mM EDTA, 10 mM Tris-HCl pH 8, 150 mM NaCl), three times in 1 mL of ice-cold RIPA 500 wash buffer (0.1% SDS, 0.1% DOC, 1% Triton X-100, 1 mM EDTA, 10 mM Tris-HCl pH 8, 500 mM NaCl), three times in 1 mL of ice-cold LiCl wash buffer (10 mM Tris-HCl pH8, 250 mM LiCl, 0.5% Triton X-100, 0.5% DOC), and once in 1 mL of ice-cold 10 mM Tris-HCl pH 8.5. The antibody chromatin complex was eluted from the beads in 100 uL of Elution Buffer (10 mM Tris-HCl pH 8, 0.1% SDS, 150 mM NaCl) with 5 mM DTT added fresh. Beads were incubated with elution buffer at 65° C. for 1 hour, shaking at 900 rpm. Beads were pelleted by magnet and the supernatant was moved to a clean tube where, after cooling to room temp, 1 uL of RNAse (Roche 11119915001) was added to the sample and incubated at 37° C. for 30 mins at 600 rpm. 3 uL of Proteinase K [20 mg/mL] (Lifetech #100005393) was added to the samples, which were incubated overnight at 65° C. 
The next day, 100 uL of SPRI beads with 160 uL of PEG/NaCl (20% PEG, 2.5M NaCl) were added to the samples, which were vortexed and incubated at room temp for 5 minutes before the beads were pelleted on a magnet. The pellet was washed twice with 80% ethanol and air dried for 5 minutes before final elution in 150 uL of 10 mM Tris-HCl pH 8. 3 uL of recovered supernatant was mixed with 5 uL of SYBR qPCR master mix and 2 uL of primer mix for quantification of fragment enrichment over a 1% input sample (not treated with antibody) by real-time qPCR.
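One common way to express ChIP-qPCR enrichment relative to a 1% input sample is the percent-input calculation sketched below. This is an assumption about the arithmetic, not a statement of the exact analysis used here: it assumes 100% amplification efficiency (a doubling per cycle), and the function name and Ct values are hypothetical.

```python
import math


def percent_input(ct_ip: float, ct_input: float,
                  input_fraction: float = 0.01) -> float:
    """Percent of input recovered in the immunoprecipitated (IP) sample.

    ct_ip: Ct of the IP sample.
    ct_input: Ct of the input sample that was set aside before IP.
    input_fraction: fraction of chromatin kept as input (0.01 for 1% input).
    Assumes perfect doubling per PCR cycle.
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Shift the input Ct to what it would be for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values, for illustration only.
print(round(percent_input(ct_ip=25.0, ct_input=25.0), 3))
print(round(percent_input(ct_ip=24.0, ct_input=25.0), 3))
```

An IP sample with the same Ct as the 1% input corresponds to 1% recovery, and each cycle earlier doubles that figure.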

Generation of Variant Binding Site Cell Lines

Cell lines with the variant binding site introduced at the CTCF binding site ~2 kb upstream of the MYC TSS were generated by nucleofecting exoMYC.K562 with SpCas9-P2A-GFP, a gRNA targeting the CTCF binding site, and one of 6 distinct ssODNs as HDR templates to introduce the 6 different variant binding sites. exoMYC.K562 is a K562 cell line transduced with an exogenous MYC construct expressed from a PGK promoter. This was necessary because any reduction of endogenous MYC expression can impact the survival of K562 cells. GFP+ cells were sorted at a high dilution into a 96 well plate for single-cell clonal expansion. Once the clones had expanded, gDNA and RNA were extracted to genotype and phenotype the clonal cell population. Clonal lines that had a reduction of endogenous MYC and also appeared homozygous at the target site for the desired HDR event were used in the study.

Quantifying MYC Expression by RT-qPCR

Three million K562 cells genome edited to harbor the variant binding site upstream of MYC were nucleofected with 5 ug of plasmid expressing a variant CTCF following the Lonza Kit V protocol. 72 hours post nucleofection, 1 million cells were isolated for RNA extraction following the NucleoSpin RNA Plus RNA isolation protocol. The RNA was converted to cDNA using the Thermo High-Capacity RNA-to-cDNA Kit. 3 uL of a 1:20 dilution of cDNA was mixed with 5 uL of Thermo Fast SYBRgreen Master Mix and run on an RT-qPCR machine following a standard PCR amplification protocol.
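The text does not state how Ct values were converted into relative MYC expression. A widely used option is the 2^(-ΔΔCt) calculation, sketched below under two loudly stated assumptions: a reference (housekeeping) gene was measured alongside MYC, and amplification efficiency is close to 100%. The function name, the reference gene, and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_reference_sample: float,
                        ct_target_calibrator: float,
                        ct_reference_calibrator: float) -> float:
    """Relative expression by the 2^(-ddCt) calculation.

    sample: e.g., an edited cell line expressing a CTCF variant.
    calibrator: the condition that values are expressed relative to,
    e.g., the edited cell line with reduced endogenous MYC.
    The reference gene is an assumption; none is named in the text.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
print(relative_expression(22.0, 18.0, 24.0, 18.0))
```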

Results

Single Nucleotide Substitution at CBS Affecting CTCF Binding Efficiency

We reasoned that we could use a bacterial two-hybrid (B2H) system to evolve the zinc finger array of CTCF to bind to mutated CBSs bearing single or multiple sequence changes that disrupt wild-type CTCF binding (Wright et al. Nature Protocols (2006); Sander et al., Nature Methods (2010); Maeder et al. Molecular Cell. (2008)). We first used a previously described B2H system to systematically define the impact of single nucleotide substitutions within a previously defined consensus CBS (Joung et al., PNAS (2000)). In the B2H system, the binding of a DNA-binding zinc finger array to a target site of interest can be configured to result in increased transcription of a reporter gene (e.g., beta-galactosidase or an antibiotic resistance gene) (FIG. 2). To do this, two fusions are expressed in an E. coli cell bearing a reporter construct. The first fusion consists of a zinc finger array fused to a fragment of the yeast Gal11P protein, which interacts with a fragment of the yeast Gal4 protein. The second fusion consists of the N-terminal domain of the E. coli RNA polymerase alpha subunit fused to the yeast Gal4 fragment (the α-Gal4 fusion). The reporter construct consists of a weak E. coli promoter that drives expression of the reporter gene of interest, with a binding site for the zinc finger array positioned upstream of the promoter. Binding of the zinc finger-Gal11P fusion to the zinc finger binding site results in recruitment of RNA polymerase complexes harboring the α-Gal4 fusion, resulting in increased transcription of the reporter gene. If the reporter gene is lacZ, which encodes β-galactosidase (β-gal), the level of β-gal expression can be easily quantified using a well-established colorimetric ONPG-based assay (FIG. 2).
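The text specifies only that β-gal levels were quantified by a colorimetric ONPG-based assay. As a sketch of how such readings are often turned into a fold-activation value, the example below uses the classical Miller unit formula; treating the assay as a Miller-type assay is an assumption, and all absorbance readings, times, and volumes are hypothetical.

```python
def miller_units(a420: float, a550: float, a600: float,
                 minutes: float, volume_ml: float) -> float:
    """Classical Miller units for an ONPG beta-galactosidase assay.

    a420: absorbance of the o-nitrophenol product
    a550: light scattering by cell debris (corrected with factor 1.75)
    a600: culture density at the time of the assay
    """
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * a600)


def fold_activation(units_test: float, units_control: float) -> float:
    """Reporter activity relative to a control lacking the DNA-binding fusion."""
    return units_test / units_control


# Hypothetical readings, for illustration only.
test_units = miller_units(a420=0.9, a550=0.02, a600=0.5, minutes=10, volume_ml=0.1)
control_units = miller_units(a420=0.2, a550=0.02, a600=0.5, minutes=10, volume_ml=0.1)
activation = fold_activation(test_units, control_units)
print(round(test_units), round(control_units), round(activation, 2))
```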

In this B2H reporter assay, we determined that the entire zinc finger array (ZF1-11) and the full CTCF binding site (CBS), not just the 15 bp consensus CBS sequence, were required for optimal expression of the lacZ gene (FIG. 3), which mimics observed CTCF binding requirements in human cells (refs. 10, 11). After optimizing the position of the CBS relative to the transcription start site, we systematically introduced point mutations into the CBS and tested their impact on lacZ expression. These results demonstrated that mutation of nucleotides outside the 15 bp core sequence had little impact on lacZ expression. By contrast, certain substitutions at certain positions within the core sequence resulted in reduced or no binding (FIG. 4). Our results closely match ChIP-Seq data for CTCF binding sites in human cells and reflect other studies in the literature in which point mutations in the CTCF core lead to loss of CTCF binding. Taken together, these results strongly suggest that the binding activity of the CTCF zinc finger array in the B2H system mimics the binding activity of intact CTCF protein in human cells.
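The systematic introduction of point mutations described above amounts to enumerating every single-nucleotide substitution of the binding site. A minimal sketch follows; the 15-nt sequence is a hypothetical placeholder and is NOT the CTCF consensus, and the function name is illustrative.

```python
def single_substitutions(site: str):
    """Yield (position, ref, alt, mutant) for every single-base change.

    Positions are 1-based. Each position gives three mutants, so a site of
    length n yields 3 * n mutant sequences.
    """
    for i, ref in enumerate(site):
        for alt in "ACGT":
            if alt != ref:
                yield i + 1, ref, alt, site[:i] + alt + site[i + 1:]


# Hypothetical placeholder sequence (not the real CBS core motif).
placeholder_site = "ACGTACGTACGTACG"
mutants = list(single_substitutions(placeholder_site))
print(len(mutants))   # 45 mutants for a 15-nt site
print(mutants[0])
```

Each mutant can then be paired with its reporter readout to build a position-by-nucleotide activity map of the kind summarized in FIG. 4.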

Although most sequence changes in the flanking regions of the binding site had little impact on binding efficiency, certain alterations appeared to slightly improve the fold-activation of lacZ expression. We therefore tested whether a more “optimized” CBS bearing the “best” nucleotides as defined in the B2H assay might lead to higher fold-activation of lacZ expression, but we did not observe any higher activity compared with the original consensus sequence (derived from Nakahashi et al. ChIP-seq data) (FIG. 5).

Generation of Engineered CTCF Variants That Bind to Mutated CBSs with Single Altered Nucleotide

Next, we sought to determine if we could use the B2H system to select for CTCF zinc finger array variants capable of recognizing mutated CBSs not recognized by the wild-type CTCF zinc finger array. To do this, we modified the B2H reporter construct, replacing the lacZ gene with the blaC gene (FIG. 6), which encodes beta-lactamase and therefore confers resistance to beta-lactam antibiotics (e.g., carbenicillin). This modification enables us to select for cells that express a CTCF zinc finger array variant that can efficiently bind a mutant CBS positioned upstream of the weak promoter driving blaC expression. Increasingly higher levels of blaC expression can be selected for by using media containing carbenicillin and increasingly higher concentrations of the beta-lactamase inhibitor clavulanic acid. Gradients of clavulanic acid can be created within a single agar plate (FIG. 6; see Materials and Methods), thereby enabling sampling of cells at various concentrations of the inhibitor.

With this modified B2H selection system, we first sought to identify CTCF zinc finger array variants that can bind to CBSs bearing single point mutations that abolish binding by the wild-type CTCF zinc finger array in this system. In an initial set of selection experiments, we sought to identify CTCF zinc finger array variants that could bind to mutant CBSs bearing mutations of the C that is contacted by an aspartic acid (D) present at the third position (+3) of the alpha-helical recognition helix of ZF7 (as shown by the previously published co-crystal structures cited above). We created a randomized library of CTCF zinc finger array variants in which the codon encoding the ZF7 +3 position was randomized using a degenerate NNS codon (where N=G, A, C, or T and S=G or C). We then used the B2H selection system to interrogate this library to identify variants capable of recognizing CBSs bearing C to T, C to G, and C to A substitutions at the position contacted by ZF7 +3. Selections were initially performed on low stringency plates (with clavulanic acid gradients ranging from 0 to 0.45 ug/ml); surviving colonies were harvested, and plasmids encoding the variant zinc finger arrays were purified. This selected subset of variants was then subjected to high stringency selection in the B2H system on plates with carbenicillin and gradients of clavulanic acid ranging from 0 to 4 ug/ml. Plasmids encoding variant zinc finger arrays were purified from colonies that grew on the end of the gradient plate with the highest concentration of clavulanic acid, sequenced, and then tested in the B2H reporter assay by beta-galactosidase assay.
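To make the degenerate NNS codon concrete, the sketch below expands it into its individual codons and translates them with the standard genetic code. The helper names are illustrative; the only inputs taken from the text are the definitions N=G, A, C, or T and S=G or C.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC degenerate bases used in the text, plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "S": "CG"}


def expand(pattern: str) -> list:
    """List every concrete codon matching a degenerate codon pattern."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in pattern))]


nns_codons = expand("NNS")
encoded = {CODON_TABLE[c] for c in nns_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = [c for c in nns_codons if CODON_TABLE[c] == "*"]

print(len(nns_codons), len(amino_acids), stop_codons)
```

An NNS codon therefore samples all 20 amino acids with 32 codons while admitting only one of the three stop codons (TAG), which is the usual reason for preferring it over a fully random NNN codon.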

As can be seen in FIGS. 7A-C, we obtained CTCF zinc finger array variants that showed preferential binding activity (as judged by the B2H reporter assay) for the mutated CBS compared with the original consensus CBS. These clones also showed selection for a particular amino acid at the ZF7 +3 position: for the C to T site, a threonine (T) was selected; for the C to A site, an asparagine (N) was selected; and for the C to G site, a histidine (H) was selected. The identities of these amino acids are consistent with what might be expected to recognize the mutant nucleotide based on previous zinc finger selections using the Zif268 zinc finger array. However, although we successfully selected for mutants that had altered binding activity, in most cases the binding activity of the variant for the mutated CBS was not as strong (as judged by the B2H reporter assay) as that of the wild-type CTCF zinc finger array for the consensus CBS (FIGS. 7A-C).

Based on our previous experience with re-engineering the DNA-binding specificities of the Zif268 zinc finger array, we hypothesized that obtaining stronger binding variants might require alteration of amino acids flanking the +3 position in ZF7. To test this idea, we created a larger library of variants in which we randomized positions +2, +3, +5 and +6 of ZF7 using degenerate VNS codons (where V=G, A, or C). Position +4 of ZF7 was not altered because it faces the internal core of the ZF domain and is not expected to make contacts with the DNA. We then performed B2H selections as described above using this library to identify variants that could recognize a mutant CBS with a C to G mutation at the position contacted by ZF7 +3 in the wild-type CTCF zinc finger array. These selections identified variants that showed stronger binding activity for the mutant CBS and showed some degree of consensus in the identities of the amino acids selected (FIG. 8).
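The size of such a library follows directly from the codon degeneracy. The sketch below counts the DNA-level diversity of a single NNS position and of four VNS positions, and adds a standard Poisson estimate of how many independent clones are needed to sample a given fraction of the library. The coverage estimate and the 95% target are assumptions added for illustration and are not taken from the text.

```python
import math

# Number of bases allowed by each IUPAC symbol used in the text.
IUPAC_DEGENERACY = {"N": 4, "V": 3, "S": 2}


def codon_variants(pattern: str) -> int:
    """Number of concrete codons matched by one degenerate codon."""
    return math.prod(IUPAC_DEGENERACY[ch] for ch in pattern)


def library_size(pattern: str, positions: int) -> int:
    """DNA-level diversity when `positions` codons are randomized independently."""
    return codon_variants(pattern) ** positions


def clones_for_coverage(size: int, coverage: float = 0.95) -> int:
    """Clones needed so that a given variant is present with probability `coverage`.

    Poisson approximation: P(variant missing) = exp(-clones / size).
    """
    return math.ceil(-math.log(1.0 - coverage) * size)


nns_single = library_size("NNS", 1)   # one randomized position
vns_four = library_size("VNS", 4)     # positions +2, +3, +5 and +6
needed = clones_for_coverage(vns_four)
print(nns_single, vns_four, needed)
```

Moving from one randomized position to four raises the diversity from 32 to 331,776 DNA sequences, which is why a selection rather than a screen is needed to interrogate the larger library.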

Based on this success, we generated additional randomized libraries in which we randomized positions −1, +1, +2, and +3 or positions +2, +3, +5 and +6 of ZF7, ZF6, ZF5, ZF4, and ZF3. We then performed selections as described above using these libraries against various matched mutant CBSs harboring nucleotide substitutions at positions expected to be contacted by the residues randomized in the libraries (FIGS. 9-16). Analysis of variants from individual surviving colonies at the most selective end of the high stringency selection plates showed that many of these selections yielded variants with high activity for the mutant CBS of interest. Sequencing of these clones showed that there was generally a degree of consensus in the amino acid sequences, suggesting that selection was successfully occurring (FIGS. 9-16).

Generation of Engineered CTCF Variants That Bind to Mutated CBSs with Multiple Altered Nucleotides

Having successfully identified CTCF zinc finger variants that could recognize CBSs with a single altered nucleotide position, we next sought to identify variants that could recognize CBSs bearing multiple mutated nucleotides. To do this, we sought to recombine ZF variants each selected to bind to different “subsites” within the CBS that bear individual mutations. However, because of well-known context-dependent effects that exist between ZFs in a multi-finger array, we undertook a strategy in which we recombined pools of selected ZF variants (rather than a single variant) for any given altered subsite to identify the combinations of mutated ZFs that work best together to recognize a CBS bearing multiple mutations. To isolate pools of ZF variants for various mutated CBS subsites, we harvested all remaining clones from the high stringency plates of the selections we performed with the CBSs bearing single mutations (depicted in FIGS. 9-16). Deep sequencing of the various selected clones in these pools yielded a variety of sequences with some degree of consensus within each selection, as expected (Table 1).

We then recombined pools of variants for ZFs 4, 5, 6, and 7 to create CTCF zinc finger arrays that harbored various altered recognition helices for these positions and then performed B2H selections (see Materials and Methods) against five different mutated CBSs bearing combinations of various nucleotide substitutions in subsites for ZFs 4, 5, 6, and 7 (FIGS. 17-21). Sequencing of clones from these selections showed that certain recognition helix sequences for each finger were selected multiple times, suggesting that the selections were identifying combinations that work well together. Importantly, for all five of the multiply mutated CBSs, several of the CTCF zinc finger array variants identified showed good binding activity on the site for which they were selected, as judged by B2H assay (FIGS. 17-21). In addition, for four of the five mutant CBSs, we were able to identify variants that not only bind to the mutant CBS but also fail to bind to the original unmutated (consensus) CBS. Thus, we conclude that using the approach described here we are able to identify CTCF ZF array variants capable of recognizing multiply mutated CBSs that are not efficiently bound by the original wild-type CTCF zinc finger array.
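The combinatorial space created by recombining per-finger pools can be sketched as a Cartesian product over the pools. In the example below, the helix sequences are the most abundant entries of Table 1 (ZF7) and Table 9 (ZF6); pairing these two particular pools is illustrative only, pools for ZFs 4 and 5 would be added in the same way, and the function name is hypothetical. The actual recombination was performed on pooled DNA, not in silico.

```python
from itertools import product

# Small illustrative pools of recognition-helix sequences (positions 2 to 6).
pools = {
    "ZF6": ["EHMRR", "GHMRR", "THMRR"],  # top entries of Table 9
    "ZF7": ["DHLQT", "EHLVV"],           # top entries of Table 1
}


def recombine(finger_pools: dict) -> list:
    """Enumerate every combination taking one helix variant per finger."""
    fingers = sorted(finger_pools)
    return [dict(zip(fingers, combo))
            for combo in product(*(finger_pools[f] for f in fingers))]


combos = recombine(pools)
print(len(combos))   # 3 x 2 = 6 candidate arrays
print(combos[0])
```

Because the number of combinations is the product of the pool sizes, even modest pools for four fingers generate a library large enough that selection is required to find the combinations that work well together.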

Binding Specificity of Engineered CTCF Variants to Mutant and Wild-Type CBSs in Human Cells

Having successfully engineered variants that can recognize CBSs with multiple sequence changes across the motif, we next wanted to investigate whether the variants can bind to these same mutant binding sites in a human cell context while not binding to wild-type CBSs. First, we identified a collection of sites in the human genome that matched the 15 bp core sequence of each of the five mutated binding sites that we had selected CTCF variants to bind (described in FIGS. 17-21). We then examined two variant binding sites with a sequence that matched one of the five mutated binding sites (sequence depicted in FIG. 20), as well as known CBSs, to determine whether endogenous CTCF could bind to the wild-type CBSs and not bind to the variant binding sites, as the B2H reporter assay would suggest (FIG. 20). Human K562 cells, an erythroleukemia cell line, were harvested and analyzed by ChIP-qPCR using a CTCF-specific antibody to detect CTCF-DNA binding. Wild-type CTCF showed no detectable binding to two different target sites that matched the mutated CBS but showed strong enrichment at the wild-type CTCF binding site, supporting the results of the B2H reporter assay (FIG. 22). Next, we wanted to see whether overexpressed, exogenous, 3×HA-tagged wild-type CTCF delivered by plasmid transfection in K562 cells had the same binding profile observed with endogenous CTCF. Wild-type K562 cells were transfected with 3×HA-CTCF and 72 hours later were harvested and processed for ChIP-qPCR analysis with HA-specific antibodies. Exogenous wild-type 3×HA-CTCF could bind to the wild-type CBSs and could not bind to the variant binding sites, the same as endogenous wild-type CTCF, suggesting that overexpression of CTCF by plasmid delivery reflects biologically relevant behavior (FIG. 23A). Based on these results, we next examined the ability of a variant CTCF to bind to the variant binding sites native to the human genome. The variant chosen was one recovered from the B2H selection assay and shown by the B2H reporter assay to bind to the variant site with the same sequence as variant sites 1 and 2 used in FIGS. 22-23B. K562 cells were transfected with the 3×HA-tagged CTCF variant, and the same sites as before were examined for binding activity by ChIP-qPCR. Variant-specific HA enrichment was present at the variant binding sites and lacking at the wild-type sites, suggesting that we successfully evolved a variant that can specifically bind to a mutant CBS with as few as three nucleotide changes without binding native CBSs (FIG. 23B).
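Finding genomic sites that match a core sequence can be sketched as an exact-match scan of both strands. The motif and the sequence below are hypothetical placeholders (not a CBS and not genomic sequence), and the function names are illustrative; a real search would run over the reference genome with the 15 bp mutated core sequences.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(sequence: str, motif: str) -> list:
    """Return sorted (0-based start, strand) pairs for exact motif matches.

    The forward strand is searched for the motif itself and the reverse
    strand is covered by searching for its reverse complement. A palindromic
    motif would be reported once per strand.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical placeholder motif and sequence, for illustration only.
toy_motif = "GATTACA"
toy_sequence = "CCGATTACAGGTGTAATCTT"
sites = find_sites(toy_sequence, toy_motif)
print(sites)
```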

Gene Expression Regulation by Engineered CTCF Variants Via Looping

CTCF has the capacity to alter gene expression through CTCF-cohesin mediated looping of the genome. We wanted to determine whether the variant CTCFs could reproduce the gene regulatory capacity of wild-type CTCF when bound to the endogenous variant binding sites. To investigate gene expression changes, we focused on genes within a 1 Mb region of the variant binding sites. Eleven genes were identified within the 1 Mb region for Variant sites 1.1 and 1.2, and another 10 genes were identified for Variant sites 2.1 and 2.2. K562 cells were nucleofected with variant CTCFs fused to GFP that had the capacity to bind to Variant site 1 and Variant site 2. At 72 hours post nucleofection, RNA was isolated from GFP+ cells, and gene expression levels were compared to those of RNA extracted from K562 cells nucleofected with a wild-type CTCF control. Of the 11 genes for Variant sites 1.1 and 1.2, 6 genes showed a change in gene expression relative to cells nucleofected with the wild-type CTCF control (JJ388) (FIG. 24A). Two of the 10 genes identified for Variant sites 2.1 and 2.2 had altered gene expression levels relative to the wild-type control (FIG. 24B). These data suggest that the variant CTCF proteins not only bind to their target sequences in human cells but also reproduce the biological role of native CTCF in regulating gene expression, possibly through the formation of loops or sub-TADs.

Next, we wanted to demonstrate that the CTCF variants could replicate the biological function of wild-type CTCF at a known CTCF binding site that creates an enhancer-promoter loop. MYC expression is maintained by a loop formed between a CTCF binding site ~2 kb upstream of the transcriptional start site (TSS) of MYC and a CTCF binding site ~1 kb downstream of the MYC TSS (ref. 14). When CTCF is bound to both sites, cohesin links the two CTCF molecules via CTCF's cohesin-interaction domain, creating a loop that maintains the expression of MYC. If one or both of the CTCF binding sites are disrupted, the CTCF-mediated loop is lost and MYC expression is reduced (ref. 14). Five cell lines were generated containing the 5 different variant binding site sequences (defined in FIG. 25) at the CTCF binding site ~2 kb upstream of the MYC TSS. This was done in a K562 background transduced with a lentiviral construct expressing exogenous MYC from a phosphoglycerate kinase (PGK) promoter (exoMYC.K562) to compensate for any reduction in cell fitness that reduced endogenous MYC expression may cause. An additional sixth cell line was generated in which point mutations were made to the CTCF binding site that should have no effect on wild-type CTCF binding, as indicated by results from the B2H reporter assay. RNA was isolated from the clonal cell lines homozygous for the variant binding sites, and endogenous MYC gene expression levels were assayed by reverse transcriptase real-time qPCR (RT-qPCR). Each of the isolated cell lines with a variant CTCF binding site showed a reduced level of MYC expression, suggesting that the CTCF-mediated loop is disrupted (FIG. 25).

Based on this result, we wanted to see whether the variant CTCFs, when expressed in these modified cell lines, could bind to the engineered sites and restore MYC expression. HA-tagged wild-type CTCF and HA-tagged CTCF variants were expressed in the cell line that contained their matching variant binding site. Variants selected to bind to the G3 variant binding site were expressed in the G3_3 cell line, A3 variants in the A3_4 cell line, etc. HA-tagged wild-type CTCF was also tested in each of the variant cell lines for binding and for recovery of endogenous MYC expression. The level of endogenous MYC expression in exoMYC.K562 served as the wild-type control, as there is no alteration to the CTCF binding site upstream of the MYC TSS in this line. CTCF variants expressed in the engineered cell lines recovered endogenous MYC expression, while expression of wild-type CTCF in these cell lines failed to recover MYC expression (FIGS. 26A-29). The same samples were analyzed by ChIP-qPCR for occupancy of the variant binding sites by wild-type CTCF or the variant CTCFs, enriching for CTCF-bound DNA fragments with CTCF or HA antibody. Wild-type CTCF had a reduced occupancy of the variant binding sites, consistent with the continued reduction of MYC expression, while variant CTCF proteins could bind to the variant site they were selected for as well as rescue MYC expression (FIGS. 26A-29). Together, these data suggest that we have evolved CTCF variants that can bind to novel sequences and still interact with cohesin to form loops that maintain gene expression profiles.

Tables

Amino acid sequences of variants selected on different CTCF binding sites. All amino acid sequences are listed from N to C terminus. Colonies growing at the highest stringency of selection were scraped off and pooled, and the plasmid encoding the zinc finger array was isolated and deep sequenced. The number of reads reflects how prominent the variant was in the population pooled from selections performed in triplicate.
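Each table row concatenates a SEQ ID NO, a recognition-helix sequence, and a read count (for example, 8DHLQT2981 is SEQ ID NO: 8, sequence DHLQT, 2981 reads). The sketch below splits such rows and computes a read-weighted amino acid count per helix position, which is one simple way to view the degree of consensus. The function names are hypothetical, and rows that do not fit the pattern (such as an elided entry) are returned as None rather than guessed.

```python
import re
from collections import Counter

# SEQ ID NO (digits), helix sequence (capital letters), read count (digits).
ROW = re.compile(r"^(\d+)([A-Z]+)(\d+)$")


def parse_row(row: str):
    """Split one concatenated table row into (seq_id, sequence, reads)."""
    match = ROW.match(row.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3))


def weighted_position_counts(records: list) -> list:
    """Per-position amino acid counts, weighting each sequence by its reads."""
    counts = [Counter() for _ in range(len(records[0][1]))]
    for _seq_id, helix, reads in records:
        for i, residue in enumerate(helix):
            counts[i][residue] += reads
    return counts


# The five most abundant rows of Table 1.
rows = ["8DHLQT2981", "15EHLVV2413", "155DHLNT1517", "16DHLRT1442", "13EHLKV1434"]
records = [parse_row(r) for r in rows]
counts = weighted_position_counts(records)
consensus = "".join(c.most_common(1)[0][0] for c in counts)
print(records[0], consensus)
```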

TABLE 1
ZF7 selection on C:G change at nt 2 of core motif in CBS. Sequences reflect positions 2 through 6.
SEQ ID NO:  Sequence  # Reads
8DHLQT2981
15EHLVV2413
155DHLNT1517
16DHLRT1442
13EHLKV1434
192KDLVV1357
193DHLQA1114
194DHLLV1076
195DHLLT881
196EHLTV803
197STLME786
17DHLAT777
9EHLNV736
12DHLQV574
198DHLKT541
199EHLKE517
200DHLLE506
201EHLRV503
202STLRE498
203DHLMV431
204DHLKV427
205DHLRV394
206DHLNV389
114DHLLA380
207DHLKE368
208DHLNE330
11EHLRE330
209STLLE323
210DHLMA305
211KDLTV296
212DHLVT284
213AHLNV270
214AHLTV268
215HTLME245
216DHLRA237
217DHLAV221
218HHLAE221
219GHLMD207
220DHLST199
221EHLMV197
222AHLVV196
223EHLAV192
224HTLAE187
225STLQE181
226DHLAE167
227AHLQE163
228SSLNE158
229GHLNV155
230EHLVE144
231DHLME143
232DHLRE134
233AHLNA120
234HTLVE120
235STLKE112
236EHLQV107
237GTLME106
238HHLAV102
239HSLME101
240HSLTE97
241EHLMA97
242DHLHT94
10AHLQV94
243DHLTV93
244EHLIV90
245SGLNE89
246AHLLV85
247EHLLV84
248VKLKI83
249DHLQE80
250HTLTE77
251STLHE76
252DHLVV76
253AGLAL70
254STLND69
255DHLKA68
256KDLTQ66
257DKLMN66
258GTLRE66
259GHLTV66
260RLLTA65
261SSLRE63
262HTLKE62
263GHLAV60
264RLLAQ58
265KDLAV57
266EHLQE57
267SHLNV57
268AGLPI57
269TTLME56
90AHLRV56
270AHLMV55
271EHLME55
272EHLQT55
273EVLNR55
274HHLVV54
275KDLSV54
276RHLVM53
277THLNE50
278RDLRT49
279LLLGS49
280MVLGN48
281KTLIE47
282AHLGV46
283SGLLA46
284DHLHV45
285EHLNT45
286STLLQ44
287AHLKV44
288AHLAV42
289TNLID41
290GTLNE41
291QVLTQ40
292SSLME39
293GHLVE38
294HSLLE38
295SGLLE38
296GGLLE36
297STLRV36
298HTLAD35
299SHLME35
300DHLAI35
301EHLLA35
302HNLLL34
303PHLVV34
304KALGT33
305PHLVI31
306VLLII30
307HHLRE29
308GALRM29
309RGLHE29
310AHLLE28
311EHLKA28
312DTLLV27
313EHLRT26
314SSLRD24
156EHLQA23
315EHLAT23
316SGLGE22
317ATLQE22
318DHLSA22
101SNLLV22
319SHLLV21
320KDLMV21
321DHLQQ20
322ATLME20
323GHLQA20
324RTLTE20
325RRLAH20
326DTLQA20
327GHLEV19
328HQLKL19
329EHLLT19
330DGLRT18
331THLRP18
132DNLAT18
332EHLNA17
333STLVV17
135DNLMT17
334DTLLA17
335STLDE16
336KDLVA15
337AHLHA15
338KDLQV15
339HHLTV15
340SGLLD15
341ANLME14
129DNLLV14
342EHLKT13
343GSLAI13
344EHLSV13
345EHLNE13
346EHLVI13
347KDLKV13
348EGLGT13
130DNLQT12
349STLMS12
350AHLMM12
351IKLDG12
352VLLGA12
353PGLSA12
354AELNR12
355HQLVI12
356GHLVV12
357PHLLV11
358PRLAL11
359DHLNA11
360KDLDV11
361AHLHV11
362RVLGG11
363AHLQA11
364RQLRT10
365AHLQT10
100DNLLA10
151EHLAE10
366EHLAM10
367DRLSI10
368GGLGA10
369GHLNT10
370AHLRT10
371DTLRV10
372MSLRG9
373DHLTI9
374THLIV9
375DTLMA9
376MKLQE9
377TALGT9
378GHLLV9
379GQLAI8
380ANLES8
381AHLNT8
382EHLLE8
383SNLTV8
384STLLV8
385STLMV8
386GTLVS7
387DNLKT7
388GHLQT7
128DNLLT7
389EHLVT7
390GALRE7
391SSLAE7
392DTLRQ7
393KALLG7
394AMLNP6
395DTLHQ6
396DNLLQ6
397EHLAH6
398AHLKE6
399ATLAE6
400EHLMD6
401STLHM6
402DTLAV6
403DHLVE6
404PTLGE6
405KGLPL6
406DTLLQ6
407AHLNE6
408AHLAE6
409GHLKV6
410SGLQV5
411HHLLV5
412EPLLP5
413DNLAV5
414AHLLT5
415AHLST5
133DNLQA5
416DNLRT5
417DTLAL5
418DTLQV5
419EHLRA5
420SNLQV5
421KDLRV5
422DTLAT5
423DTLRA5
424QHLRV4
425SSLLE4
426SNLMV4
427SDLGG4
428DNLHT4
429DNLTA4
430DTLMV4
431EHLST4
432DTLSV4
102DNLMA4
433EHLVM4
434STLAE4
435KDLAE4
436SSLNV4
437SSLLV4
438AHLKT4
439AHLRE4
440KDLLV4
TABLE 2
ZF7 selection on C:T change at nt 2 of core motif in CBS. Sequences reflect positions 2 through 6.
SEQ ID NO:  Sequence  # Reads
312DTLLV3772
334DTLLA1720
406DTLLQ1681
326DTLQA1340
371DTLRV1048
418DTLQV715
423DTLRA643
375DTLMA620
430DTLMV538
402DTLAV451
422DTLAT406
441DSLLV373
432DTLSV359
442DTLLM339
392DTLRQ334
443DTLLI306
444DTLTQ300
434STLAE269
445DTLAA268
395DTLHQ246
446DTLSA227
447DTLKA216
384STLLV213
448STLQQ201
449DTLQQ200
450DTLLL194
451DTLMQ189
225STLQE189
452DTLNA180
453STLLA176
454DTLKV163
455STLNA162
456DTLRE161
457DTLTA152
458DTLQD146
459DTLVA137
460DTLLS123
461STLTQ122
462DSLLA116
463DTLRT116
464DTLQI115
465DTLMN114
466STLSE114
467SSLQV112
468TNLAV109
469DTLVV108
470DTLHA107
471DTLMT107
437SSLLV107
209STLLE107
472DSLRV106
473DTLAE105
474STLNV105
475DTLRN101
476DTLNV100
477DTLRD99
478DSLAV94
479DTLVQ94
480DTLQE93
481STLLD92
482DTLTH89
483SSLND88
484STLTV88
385STLMV87
485DTLML86
286STLLQ85
202STLRE85
486STLQA84
487DTLLD83
488DTLKQ82
489DTLLT81
417DTLAL76
490DTLII75
491DTLLN75
492DSLLQ73
493STLEQ73
494DTLGV71
495DVLRE67
496STLSA66
497DSLSV65
498DTLLE63
499STLAA63
500DTLKI62
501DTLKM62
502DTLQN60
197STLME60
503TTLMT60
504TTLAE59
505STLTE58
506VELVQ57
507TTLNQ56
508DTLMI54
509TTLMD54
510STLMA51
511DVLLA50
512DVLLT49
235STLKE49
513TTLNE49
514MTLPT48
292SSLME48
251STLHE48
515HTLVV47
269TTLME46
516ATLTQ45
517STLAS45
333STLVV44
425SSLLE43
518SSLVE42
519DALQA41
520DVLDA41
521GSLMQ41
522DTLTM40
523STLAQ39
524STLMI38
525DTLAM37
526DTLHT37
527DTLQL37
528DSLKQ36
529DSLRA36
530STLHV35
531STLMQ35
532DGLMA34
533DTLRL34
534SSLLT34
535DSLQA33
536DTLRI33
537STLGE33
538DALKE32
539STLRA31
540DTLHH30
541DTLRG30
542DTLRM30
543DVLMT30
544DTLEI29
228SSLNE29
545DTLHV28
546GTLDE28
547SSLAV28
548STLKQ28
549DTLMD27
550GTLQT27
551SSLVQ27
297STLRV27
552LMLMG25
553STLRQ25
554STLTA25
8DHLQT24
555DSLVA23
556SSLRV23
557DSLRE22
558GRLQD22
559MALQD22
560STLLH21
561STLVQ21
562VRLTA21
563AVLGD20
564PILVT20
565STLDD20
566DSLMI19
567STLID19
568TKLDT19
569ATLVA18
570DTLIA18
571DTLTE18
572GTLNH17
573STLAI17
282AHLGV16
129DNLLV16
574DQLVQ16
575MPLIL16
576TTLHQ16
577TTLQV16
578ATLLE15
579DVLHE15
580ETLRA15
581KVLRS15
101SNLLV15
135DNLMT14
582DSLRQ14
583DTLAN14
584GTLNV14
585HNLMV14
586QTLQA14
587RQLTT14
588DTLSI13
589DRLVG12
590ETLRQ12
591SSLGE12
592SSLVV12
193DHLQA11
128DNLLT11
593DTLME11
594DTLTV11
595DTLVG11
596ETLKA11
597GVLSQ11
598LALMR11
599RTLVE11
600TTLLI11
601TTLNV11
602DTLSE10
391SSLAE10
603STLAV10
TABLE 3
ZF7 selection on C:A change at nt 2 of core motif in CBS. Sequences reflect positions 2 through 6.
SEQ ID NO:  Sequence  # Reads
100DNLLA2659
101SNLLV2616
135DNLMT2555
130DNLQT1983
129DNLLV1945
128DNLLT1922
132DNLAT1457
604DNLRA1117
102DNLMA1038
605DNLMV901
606DNLQV845
607DNLQQ841
396DNLLQ813
387DNLKT582
133DNLQA571
420SNLQV565
608DNLRQ494
426SNLMV459
383SNLTV458
609DNLNT412
428DNLHT389
610SNLVV349
611SNLQQ334
429DNLTA323
612DNLLS322
413DNLAV316
416DNLRT309
613DNLTT300
614DNLAA295
615SNLLA295
616SNLLQ278
617SNLAV257
618DNLNA240
619DNLGT240
103DNLRV239
620DNLKA167
621DNLMQ156
622DNLKV148
623SNLNV132
624SNLMA128
625SVLQD113
626DNLQS110
627DNLSA105
628DNLAQ103
629DNLMS98
630DNLSQ95
631DNLNV87
632DNLGV87
633SNLLT87
634DNLIA83
635DNLNQ83
636SNLQT80
637SNLRV79
638SNLIV79
639DNLSV74
640SNLQA60
641SNLLL57
642SNLDV56
643DNLVQ54
644SNLLI54
645TGLAL52
646SNLMQ51
647DQLKI40
648GDLGT40
649SNLKV39
650VPLVD38
651DNLRI37
652DNLLI37
653TNLDV36
654HDLKI35
655DNLVV35
312DTLLV32
656DNLTV31
657DNLVT31
658SNLAQ30
659DNLIV28
660SNLMT27
465DTLMN25
661SNLTQ23
662EILRI23
663IGLEA22
664HRLGG22
8DHLQT21
665DNLST20
666MRLHV19
667SNLTT18
668SNLGV16
669SNLAT16
15EHLVV16
670ANLMV14
671HVLVG14
672SNLRA13
673HNLQL12
674DNLVA12
675SNLTA12
676KGLRM12
334DTLLA12
677PMLGV11
678GVLVA11
679DNLQD11
680MKLGT11
406DTLLQ11
TABLE 4
ZF7 selection on A:T change at nt 3 of core motif in CBS. Sequences reflect positions −1 to 3.
SEQ ID NO:  Sequence  # Reads
173RKHD4641
175RKAD1938
174RRSD1299
681RRHD868
682RKTD182
683NVSM146
684RQSD76
685RKND69
686SENV69
687VDHR60
688AQIV58
689KTPH56
690PKIV51
691GAEP42
692MLVE40
693VVGN40
694KGPE36
695GKVM33
696TEPG33
697TPHN32
698MPGG31
699DLEK28
700GTDN27
701ISRL25
702ATGL21
703ASNP19
704GAPT17
705HSPN17
706RPVA16
177RKDD6
707MLVD4
708RHRK3
709RKHV3
710RKQD3
711RKSD3
712DHHT2
713GKHD2
714MKAD2
715RKAE2
716RRAD2
717APIG1
718AQNR1
719DMDA1
720EAPM1
721EEMM1
722EPIR1
723GALE1
724GENV1
725GKAD1
726GKVD1
727GPLA1
728GRIE1
729IEKL1
730KAAS1
731KEEH1
732LKVD1
733LLVE1
734LMTQ1
735MASL1
736MGIG1
737MPGD1
738MSLG1
739NDMT1
740NMHT1
741NRIV1
742PENA1
743QKHD1
744QVPD1
745RASD1
746REHD1
747RGHD1
748RKHA1
749RKHY1
750RKLD1
751RKPD1
752RKVD1
753RKYD1
754RMSD1
755RRLD1
756RRND1
757RRRD1
758RRSG1
759RWHD1
760SHRL1
761SQHV1
762SSHD1
763TTHV1
764VHHV1
765WKAD1
766WKHD1
TABLE 5
ZF7 selection on A:G change at nt 3 of core motif in CBS. Sequences reflect positions −1 to 3.
SEQ ID NO:  Sequence  # Reads
174RRSD2997
173RKHD2731
175RKAD1867
177RKDD667
682RKTD475
767HADA411
710RKQD376
768RKWD296
745RASD265
681RRHD169
685RKND126
754RMSD40
769RKGD5
743QKHD3
757RRRD3
711RKSD3
752RKVD2
180QALL2
753RKYD2
756RRND2
720EAPM1
770RRCD1
771MLPA1
772RATD1
773RKDV1
774KKPV1
775GEHG1
776HPVR1
777RQHD1
778RMMQ1
779RRGD1
780GREV1
781REQD1
782DRDM1
783SKHD1
784RLSD1
785VPTV1
786HKWD1
787KKND1
788RRSE1
749RKHY1
789READ1
790RNTD1
791MVRA1
792RKED1
793KTMG1
794NEPN1
795RGSD1
796RKRD1
797RWSD1
798TPLP1
799RKAN1
800RKAY1
801QLPL1
709RKHV1
802QGTS1
803DTMV1
804LKWD1
805MNTL1
806HADV1
697TPHN1
750RKLD1
807GRAH1
704GAPT1
808MKHD1
809HEDA1
712DHHT1
810RMLS1
811WRSD1
812DDAT1
735MASL1
730KAAS1
TABLE 6
ZF7 selection on A:C change at nt 3 of core motif in CBS. Sequences reflect positions −1 to 3.
SEQ ID NO:  Sequence  # Reads
173RKHD9
813DTEN6
775GEHG5
814STKN5
815NIEI5
801QLPL4
780GREV4
712DHHT4
782DRDM4
816MVIN4
817VPDT4
818NIVP4
819MVPS4
820PNHP4
821KTDV4
794NEPN3
760SHRL3
736MGIG3
822HIKM3
823ILQI3
741NRIV3
824IVMQ3
825QTNS3
826ENMD3
827TVER3
828THDR3
829IRSP3
771MLPA3
721EEMM2
830ARIA2
785VPTV2
831EELI2
832KPLR2
812DDAT2
833NRLS2
834PTLR2
835MHIL2
836GGGP2
837MVEN2
719DMDA2
838IVAT2
839TLDR2
840MEPL2
841DTGV2
842TSRS2
843VLSI2
844STVQ2
845GPAQ2
846VEQP2
847MTKK2
848PLIM2
802QGTS2
849AMTV2
850SPMR2
851EPNV2
735MASL2
852MQIN2
853ALDE2
728GRIE2
854ALEH2
855REKD2
856ELLA2
857GVAR2
858VDTL2
859GHEN2
730KAAS2
860ELES2
861DPDT2
862SLEL2
863TMNV2
764VHHV2
864IQPV2
865MLQE1
866VMTV1
867MVEE1
868VARP...1
869KAIG1
870DRSM1
871KNSI1
872DDVS1
873KPQP1
874PHVP1
875DTLQ1
876KLGT1
877IDPH1
878HPNT1
879KSRG1
880RQMA1
881KKEN1
882QVLD1
722EPIR1
883RRQM1
798TPLP1
884ILKN1
885HQMK1
179ELLN1
886MDGG1
887AAGS1
888STVV1
889PARA1
890ALQG1
891SAPG1
892PVLN1
742PENA1
893TSLL1
731KEEH1
894HLDV1
895IHIR1
896SVTL1
897VKDR1
898KMTI1
899AGEM1
900GDSE1
901QPVK1
902KVEA1
903EQER1
729IEKL1
904GHHV1
905GMHL1
906RLRR1
907ATIR1
908RMDI1
909SVIH1
910MDIG1
911LART1
912RLMA1
913RQPP1
914MTMT1
915EDTR1
739NDMT1
916MRGR1
917ELHA1
918TNGQ1
919VNLT1
920MHIR1
921MLLQ1
922GRGE1
923NLRG1
924HIML1
807GRAH1
805MNTL1
763TTHV1
793KTMG1
925MTSV1
926RLSM1
803DTMV1
720EAPM1
927DMGM1
928MLMM1
929LMEM1
930QAVS1
931SRVL1
932DEDP1
933SGDR1
934MMNC1
935NIGM1
936MVQR1
937APHR1
938LDAG1
939RLAN1
940MKGS1
941KKLV1
942VNQE1
943ILKQ1
944PVIP1
945VESL1
946IKQN1
947EDNI1
948THRD1
949IPAG1
950GLNH1
951VDGR1
181PHRM1
952RTGA1
953VSPD1
954KVGD1
TABLE 7
ZF6 selection on C:T change at nt 5 of core motif in CBS. Sequences reflect positions 2 to 6.
SEQ ID NO:  Sequence  # Reads
955GHMRR29
956GHMNR23
34EHMRR23
957THMRR19
33THMKR17
126EHMNR17
958GHMKR12
127EHMAR11
959EHMQR10
147SHMRR10
960SAMRR9
961ENMGR8
962SHMKR8
35THMNR7
963NHMRR7
964EGMRR7
965GNMGR7
146SHMNR6
966NGMRI6
967EGMAR6
968ESMRR6
969GHMSR5
970EGMHR5
971TAMRR5
972TNMQR5
973VNMRR5
974AHMKR4
975NGMTA4
976DGMRR4
977GHMTR4
978EHMSR4
123EHMKR4
979GSMRR4
980TNMLR4
981NHMKR4
982ENMLR4
983SPMGV3
984TNMGR3
985SSMAR3
986GGMRR3
987GGMKL3
988SGMVR3
989EHMHR3
990THMSR3
991GSMKI3
992EKMKE3
993NGMAR3
994QNMVR3
995DNMRR3
996ENMER3
997NSMRR3
998SGMKR3
999ANMQR3
1000GHMQR3
1001ANMGR3
1002DNMVR3
1003QAMRE2
1004GNMSR2
1005ESMQR2
1006TPMKV2
1007SNMGR2
1008GAMRI2
1009ANMNR2
1010DNMMR2
1011GSMKM2
31EHMGR2
1012GNMAQ2
1013EGMKG2
1014SSMKI2
1015TSMRR2
1016DGMKR2
1017DNMAR2
1018SSMRR2
1019GNMMR2
185NAMRG2
1020THMKL2
1021ENMAR2
1022NNMVR2
1023TGMKR2
1024TAMKR2
1025AHMNR2
1026QNMGR2
1027TNMVR2
1028NHMNR2
1029EHMTR2
1030GNMIR2
1031SGMRR2
1032NHMSR2
1033GGMRL2
1034SPMKV2
1035TNMRR2
1036GNMRE2
1037ENMMR2
1038THMER1
1039QKMRT1
1040GAMRR1
1041TPMEV1
1042GGMRE1
1043GDMDR1
1044GAMRA1
1045PNMSR1
1046EGMGR1
1047EGTHR1
1048QSMRE1
1049THMKG1
1050NNMGR1
1051GHMNS1
1052IDMKG1
1053ESMTR1
1054SHMKI1
1055HNMMR1
184SNMVR1
1056TAMKV1
1057DSMKR1
1058SNMAR1
1059ESMGR1
1060EAMRR1
1061GNMVR1
1062ANMRR1
1063DGMKI1
1064SHMHR1
1065GAMKE1
1066ESMRE1
1067GSMLR1
1068THMEV1
1069TSMGR1
1070EAMSK1
1071NAMRQ1
1072EGMRT1
1073SHMQR1
1074NGMKR1
1075ESMKE1
1076ANMHR1
1077DHTKR1
1078NGMRE1
1079GSMRA1
1080EGMNQ1
1081GGMRM1
1082PNMKR1
1083NGMKI1
1084SNMLR1
1085SNMRR1
1086SHMTR1
1087TGMRR1
1088SGMRI1
1089DNMGR1
183EGMTR1
TABLE 8
ZF6 selection on C:A change at nt 5 of core motif in CBS. Sequences reflect positions 2 to 6.
SEQ ID NO:  Sequence  # Reads
965GNMGR873
968ESMRR784
964EGMRR772
967EGMAR672
970EGMHR648
994QNMVR597
980TNMLR556
998SGMKR486
975NGMTA479
979GSMRR453
1003QAMRE452
961ENMGR434
960SAMRR431
993NGMAR401
1079GSMRA390
996ENMER389
1007SNMGR378
1046EGMGR376
1017DNMAR368
1063DGMKI347
999ANMQR342
1040GAMRR322
973VNMRR297
997NSMRR295
1005ESMQR293
1018SSMRR289
1087TGMRR289
1009ANMNR279
1044GAMRA275
183EGMTR273
126EHMNR265
1004GNMSR263
971TAMRR260
972TNMQR257
1010DNMMR253
976DGMRR241
1026QNMGR240
1082PNMKR228
1089DNMGR226
1090ETMRR225
1091DNMKI224
1014SSMKI224
995DNMRR221
1053ESMTR214
1042GGMRE214
984TNMGR211
1031SGMRR204
986GGMRR203
1022NNMVR201
1092TNMER197
1083NGMKI195
1021ENMAR194
1059ESMGR194
1019GNMMR193
1036GNMRE193
1002DNMVR187
1093TNMAR186
34EHMRR182
1066ESMRE181
1027TNMVR181
1015TSMRR175
988SGMVR173
1024TAMKR170
1030GNMIR169
985SSMAR163
991GSMKI159
1094EHMKQ149
982ENMLR149
1016DGMKR144
1012GNMAQ139
1095SGMQR138
1084SNMLR133
1061GNMVR130
1001ANMGR129
1096HNMRR129
1050NNMGR128
1081GGMRM127
1033GGMRL124
1097QNMER124
1057DSMKR122
1035TNMRR122
1008GAMRI115
1058SNMAR115
1056TAMKV114
1098VSMKR113
966NGMRI112
1099TNMMR110
1013EGMKG109
1071NAMRQ108
123EHMKR107
1032NHMSR106
1100GAMRM102
1070EAMSK100
1101TAMNQ99
1102ESMSR96
1103GGMNQ95
1048QSMRE95
185NAMRG92
1104GGMKR89
184SNMVR84
1105ESMRL83
1075ESMKE81
1106SAMRE80
1107GGMQM76
1023TGMKR73
1037ENMMR69
1108NSMKM69
1109ESMKN66
1072EGMRT64
987GGMKL64
1110TNMSR63
1111DAMRV61
1112GNMER60
1113GAMRE59
182GNMAR54
1114EGMRK53
1011GSMKM50
1115SGMAR50
TABLE 9
ZF6 selection on C:G change at nt 5 of core motif in CBS.
Sequences reflect position 2 to 6.
SEQ ID NO:   Sequence   # Read
34EHMRR3207
955GHMRR2397
957THMRR2025
956GHMNR1880
33THMKR1415
35THMNR1341
958GHMKR1208
978EHMSR1038
127EHMAR927
962SHMKR771
959EHMQR764
126EHMNR676
146SHMNR646
147SHMRR579
123EHMKR511
1029EHMTR460
963NHMRR436
992EKMKE381
32DHMNR374
981NHMKR342
983SPMGV322
977GHMTR318
1028NHMNR285
1116DHMKR264
969GHMSR258
1025AHMNR247
989EHMHR232
974AHMKR227
31EHMGR210
1117GHMHR129
1118THMKV129
1020THMKL117
1006TPMKV110
1000GHMQR105
1119DHMRR105
990THMSR97
1120AHMRR92
1121EKMRE86
1122GHMAR84
1074NGMKR81
1123VHMNR77
1052IDMKG72
1124NHMTR65
1032NHMSR64
964EGMRR57
1125THMTR57
1126GHMKI56
1073SHMQR52
1127EHMVR43
1086SHMTR43
1128TKMKE42
1129EHMER38
1130THMKT37
1043GDMDR36
1131NGMRR35
1132EPMLM34
1133GHMVR31
1134THMRT29
968ESMRR28
1135PHMKR26
1136EHMRQ24
1137EHMRT23
1138DHMSR22
1039QKMRT22
1139ETMMI21
1034SPMKV21
1140SHMKL21
1141TPMKL21
1142GHMKM20
965GNMGR19
1143RQMLI19
1144GHMRM18
1145EGMKR17
1146EHMKA17
1147QIMPL17
1148SHMKV16
1149SGMNR16
1150THMAR16
1151QGMKR15
960SAMRR14
1152TKMEG14
1153RPMGR14
1154VHMRR13
1155THMRV13
1068THMEV12
1156NHMKS11
1049THMKG11
1157AAMST11
980TNMLR11
996ENMER10
1158GKMRD10
1159THMEL10
998SGMKR10
1160TPMRV10
1161SPMRV10
1104GGMKR10
967EGMAR10
1162THMGV9
971TAMRR9
995DNMRR9
966NGMRI9
961ENMGR9
1163MGMGR8
973VNMRR8
1164GKPSM8
975NGMTA8
1165SHMRV8
1166SPMNR8
1167SAMNR8
1168SHMSR8
1169NGMPR8
972TNMQR8
1170SPMRR8
994QNMVR8
970EGMHR8
1017DNMAR7
1026QNMGR7
1171GHMGV7
1172THMRL7
979GSMRR7
1173QHMKR7
1174THMGR7
976DGMRR7
1175THMQR6
1038THMER6
1021ENMAR6
1176RHMKR6
1018SSMRR6
1177EHMRV6
1178KHMKR6
1179QHMNR6
1180RAMKV6
993NGMAR6
984TNMGR6
1002DNMVR6
1066ESMRE6
1181GHMRV6
982ENMLR6
185NAMRG5
1014SSMKI5
1182TPMGV5
1040GAMRR5
1183GHMKV5
1184RHMNR5
1009ANMNR5
1185TPMEL5
1022NNMVR5
988SGMVR5
1186SPMKL5
1187SPMKR5
1035TNMRR5
1082PNMKR5
1188LAMEE5
1044GAMRA5
1100GAMRM5
1046EGMGR5
1033GGMRL5
1189PGMMS5
986GGMRR5
991GSMKI5
1089DNMGR5
183EGMTR4
1190SHMEV4
1004GNMSR4
1191GMMLT4
1003QAMRE4
997NSMRR4
1087TGMRR4
1192TPMKG4
1041TPMEV4
1193THMHR4
1194SHMGV4
1063DGMKI4
1016DGMKR4
1195THMKS4
1196THMRG4
1197GHMKT4
1015TSMRR4
1019GNMMR4
999ANMQR4
1079GSMRA4
1036GNMRE4
1083NGMKI4
1008GAMRI4
1050NNMGR4
1198THMRS4
1013EGMKG4
1199NHMQR4
1007SNMGR4
1200SHMAR3
1061GNMVR3
1201EAMKR3
1202GSMRE3
1203SPMEL3
1204AHMAR3
1057DSMKR3
1205PPMMV3
1027TNMVR3
1096HNMRR3
1206KHMNR3
1030GNMIR3
1084SNMLR3
1207TPMKR3
1208QSMKR3
1209RHMRR3
1075ESMKE3
1210DHMQR3
1056TAMKV3
1211AHMSR3
1212EHMRS3
1213AHMTR3
1214GHINR3
1048QSMRE3
1093TNMAR3
1215EYMRR3
1216GQMNR3
1217GHMKE3
1011GSMKM3
1064SHMHR3
1059ESMGR3
1005ESMQR3
1051GHMNS3
1058SNMAR3
1012GNMAQ3
1023TGMKR3
1031SGMRR3
1001ANMGR3
987GGMKL3
1218EHMMR2
1219SHMRL2
1072EGMRT2
1107GGMQM2
1220GGMKA2
1070EAMSK2
1221EHMPR2
1222AHMKS2
1223AHMQR2
1224GHTRR2
1225GHMKG2
1226EPMKV2
1227EHMAK2
1228GYMNR2
1229THMSS2
1230GDMNR2
1231GHMRT2
1094EHMKQ2
1232QRMGV2
1233GSMRQ2
1234DHMTR2
1235VEMER2
1236SPMEV2
1237GPMKV2
1238TPMER2
1239EHMDR2
1240EHVRR2
1091DNMKI2
1241GGMAR2
1242HHMKR2
1243GHMRS2
1244EYMAR2
1245KHMRR2
1246EHMSS2
1247TPMRL2
1248GHMSL2
1249VHMKR2
1250GHTNR2
1251GPMRT2
1081GGMRM2
1092TNMER2
1109ESMKN2
1252EQMRR2
1053ESMTR2
1253EHMKS2
1254THMKM2
1065GAMKE2
1024TAMKR2
1010DNMMR2
985SSMAR2
1037ENMMR2
1255GTMKM1
1256VHRIR1
1257DHMNK1
1258TPMNM1
1259RQMII1
1260EHMRW1
1261SPMRL1
1262GVMRA1
1263GHMQV1
1264GPMKL1
1265IDMKR1
1266PGMMG1
1267KHMER1
1268TPMNV1
1269EHVQR1
1270ENMKE1
1271DHMKM1
1272SHMNQ1
1108NSMKM1
1273GLMKR1
1274APMNL1
1275RHMSR1
1276EHMRG1
1277DWMRR1
1278GHMRH1
1279QNMHR1
1280CHMRR1
1281ERMRR1
1282EHMKE1
1283EPMKR1
1284AHINR1
1285SHMRT1
1286PHMNR1
1287AHMKV1
1288THMGM1
1289NGMKM1
1290EKMKR1
1291EHMIR1
1292NNMHR1
1293GNMNR1
1294KRMQR1
1295EKMRR1
1296TQMKQ1
1297EHMKV1
1298DHMKE1
1299EHTTR1
1300SPMRM1
1301GKMNR1
1302TNMKR1
1303THKRR1
1304SQTNR1
1305THLKR1
1306SHMQS1
1307THMSV1
1308THMRH1
1309DPMKV1
1310PHMMS1
1311SHVKR1
1102ESMSR1
1312SHMGL1
1313TDMVA1
1314PQMMS1
1315KHMQR1
1316EHMQL1
1317EHISR1
1318SHMKK1
1319EQMTR1
1320TPMRG1
1321GHISR1
1322GPMGV1
1323GYMRR1
1324GHMTV1
1325APMIM1
1326THINR1
1327DHMMS1
1328GHMKL1
1329EKMEE1
1330DPMRM1
1331SHMKT1
1332SPMGL1
1333SPMGE1
1334DHISR1
1335TPMKQ1
1336GHMKW1
1337EHMCR1
1338NNMKR1
1339ESMKR1
1340TEMLI1
1341SHMKM1
1342EHVNR1
1343GHMER1
1344NHMDR1
1345GHMWR1
1346THMKI1
1347QKMKE1
1348THMNK1
1349AHMKQ1
1350DHMGR1
1351EGMKW1
1352TQMKE1
1353TRMRR1
1354AHMGR1
1355TRMKR1
1356KNLTR1
1357PEMMS1
1358EHLTL1
1359RHMKV1
1360PGMIR1
1361THTKR1
1362EHIRR1
1363THMPR1
1364GKMKQ1
1365GPMRV1
1366AHVNR1
1367EPMSR1
1368PRMMV1
1369ELMSR1
1090ETMRR1
1370SNMNR1
1371TSMKT1
1372GNMHR1
1373TQMRR1
1374SHMKG1
1375DHMRT1
1376EHMRE1
1377SQLNR1
1378SHMGR1
1379GHKNR1
1380THMNL1
1381GYMKR1
1382SNMKV1
1383GHMRC1
1384NHMRV1
1385SGMKT1
1386EHLRR1
1387VPMRR1
1388DLMKR1
1389TSMKL1
1390APMTV1
1105ESMRL1
1391EHMLM1
1392EKMNR1
1393THRRR1
1111DAMRV1
1394ERMNR1
1395NHMHR1
1396DLMNR1
1397GQMQR1
1398RGMMI1
1399TQMKR1
1400EHMGV1
1401AHMTQ1
1402TPMMV1
1403GHKRR1
1404GPMER1
1405EPMQV1
1101TAMNQ1
1406GDMRR1
1407EHLKR1
1408DHMKK1
1409GDIDR1
1410GHMKK1
1411TQMMI1
1412SGMKA1
1413TPMRM1
1414SPMKG1
1415KQLNR1
1416NHMKT1
1417TKMRE1
1098VSMKR1
1418EHMAV1
1419EHMNS1
1420DHMHR1
1421AHMVR1
1422GRMRR1
1423GHMNV1
1424GHMNL1
1425GHVSR1
1426GQMHR1
1427EKMAR1
1428NHMGL1
1429EHMKG1
1430EPMAL1
1431AHLTR1
1432KHMTR1
1433GHMTM1
1434EPMSG1
1435NHMNM1
1436GQMKR1
1437TPMEG1
1438KHMRV1
1439SLMKR1
1440DGMRN1
1441RQMHI1
1442EPMRV1
1113GAMRE1
1443SHMRM1
1444EQMAR1
1445SHMRS1
1446EHMQV1
1447EPMPM1
1448IDMNR1
1449TKMKQ1
1450RQMLS1
1451ATMML1
1452PQMMI1
1453NAMKI1
1454GHMQS1
1455EAMKK1
1456THMRK1
1457PHMRR1
1458GHMKA1
1459AHMNH1
1460EYMSR1
1461EHMAW1
1462NHMGR1
1463GHMKS1
1464EHMRL1
1465ENMTR1
1099TNMMR1
1466QAMRV1
1467EHMQP1
1468THMSM1
1469IDMKE1
1047EGTHR1
1055HNMMR1
1045PNMSR1
184SNMVR1
1062ANMRR1
1042GGMRE1
1060EAMRR1
1067GSMLR1
1054SHMKI1
1076ANMHR1
1069TSMGR1
1077DHTKR1
1078NGMRE1
1071NAMRQ1
1080EGMNQ1
1085SNMRR1
1088SGMRI1
TABLE 10
ZF6 selection on A:C change at nt 6 of core motif in CBS.
Sequences reflect position −1 to 3.
SEQ ID NO:   Sequence   Read #
37HRES6362
36MNES5959
1470VKES3337
1471LRDS2986
1472HLES1799
1473TRES1285
1474MREA648
1475VRET601
1476MRET284
1477LLES222
1478MRTS192
1479ERKS122
1480IKES111
38RPDT95
1481VRVT61
1482RNES51
1483HVES41
98RTET40
1484LSHT33
1485RPES33
1486SRES32
1487ENKA25
167RADN24
1488TREN23
1489DSPQ21
1490RRES20
1491RGEN17
1492VRES17
1493HRDS15
1494HREA15
1495LRDT15
1496RVES15
1497EKKS14
1498GRES13
1499RMES13
1500LRES12
1501RTDN12
1502HADH12
1503VNES12
1504ANES12
112RTEN12
1505RNEH11
1506MNET11
1507RLDT11
99RADV10
1508RLET9
1509HRET9
HMR...9
1510NRES8
1511TGEA8
1512TGES8
1513RHET8
1514MRES7
172RNDT7
1515LVES7
1516VGSS7
40RHDT7
1517RIDT7
1518VREA6
1519HMES6
1520ERKN5
1521RPEA5
1522TPPI5
1523RREA5
1524RQEN5
1525VKDS4
1526RKES4
1527MLGL...4
1528DRPN4
1529RKEA4
1530VMLGL...4
1531TRDS4
1532HLET4
1533HLDS4
1534PPAT4
1535ENAS4
1536VKET4
1537GREA4
1538TREA4
H...4
1539IRDS3
1540MNDS3
1541LLDS3
1542RTES3
1543RPET3
1544IDVH3
1545RTEH3
1546TRET3
1547HGES3
1548TMES3
1549LRVS2
1550PREA2
1551EGKN2
1552TSES2
1553VKFGHIFCVLL*NV...2
1554YRES2
1555MKES2
39RTDI2
1556MNEG2
1557MIES2
1558QRES2
1559MMEA2
1560MNER2
RGS2
171RTSS2
1561RNAS2
1562RTDT2
1563TRVS1
1564TFNV1
1565VRVS1
1566FRDS1
1567IKER1
1568RLEN1
1569IKET1
1570HRVS1
1571DRKG1
1572VKEC1
1573MSEA1
1574LRDR1
1575INES1
1576MSES1
1577NLES1
1578LQDS1
1579HAPT1
HRR...1
1580HRKA1
1581LRGS1
1582QSGT1
1583HQES1
1584ETGS1
SGT...1
1585MLGF...1
1586MNGS1
1587MRED1
1588TKES1
1589RPDH1
1590HRGS1
1591GNES1
1592LWDS1
1593MRDS1
1594IHES1
1595LRDG1
1596LRDC1
1597MYES1
1598RPNI1
1599EGRS1
TRR...1
1600RLES1
1601LGLPTGR...1
1602ARES1
1603HLGS1
1604HSES1
1605PRTS1
1606MNKS1
1607RRDS1
1608RREN1
1609QGES1
1610LREA1
1611LLET1
1612MREV1
1613VEES1
1614MNEA1
1615RNEN1
1616HWES1
1617RHEA1
1618MTES1
1619GRDS1
1620VSET1
1621MRKA1
1622FKES1
1623ERKG1
VKR...1
1624RNDH1
1625VPDA1
TGR...1
1626RKDA1
1627SPDT1
1628TTTL1
1629RKDS1
1630RRLT1
1631RTSN1
LRT...1
1632RQSA1
1633ARFT1
1634DRKS1
169RRDT1
1635RMDS1
1636HRKS1
1637GTTP1
1638DKRN1
1639RPERE...1
1640SGDS1
TAG1
GR...1
T...1
1582...QSGT...0
TABLE 11
ZF6 selection on A:G change at nt 6 of core motif in CBS.
Sequences reflect position −1 to 3.
SEQ ID NO:   Sequence   # Reads
38RPDT6216
1482RNES2750
98RTET1736
1485RPES1565
167RADN1412
112RTEN973
1499RMES860
1507RLDT734
1490RRES690
1501RTDN588
1496RVES584
1505RNEH575
1517RIDT557
1521RPEA516
1491RGEN467
99RADV455
172RNDT452
1513RHET413
1529RKEA340
1508RLET297
1543RPET263
1523RREA252
40RHDT247
37HRES239
1526RKES231
1524RQEN199
1641RGSA186
171RTSS154
39RTDI152
1479ERKS123
36MNES104
1561RNAS90
1608RREN88
1642RLDP82
169RRDT80
1545RTEH80
1626RKDA63
1470VKES61
1643RRET53
1471LRDS44
1562RTDT36
1568RLEN35
1564TFNV29
1644RADT28
1472HLES28
1473TRES27
1645RKET24
1646ATNM23
1647RREH22
1648RTDH21
1632RQSA21
1542RTES20
1649RNET20
1650RPDN19
1651THVP19
1633ARFT18
1487ENKA18
1637GTTP17
1652EASN16
1653RMEG14
1654RTAA14
1589RPDH14
1627SPDT14
1489DSPQ14
1497EKKS13
1474MREA13
1655RNEP12
1656VHDN12
1657RKEN12
1658RPYT12
1659RQES11
1660RSGS11
1661RPDS10
1475VRET10
1662MTGN7
1530VMLGL...7
1615RNEN7
1663RGET6
1664RKGS6
1600RLES5
1476MRET5
1624RNDH5
1665RNDS5
1666STET5
1537GREA5
1667SNES5
1668RPDA4
1669RNER4
1670RPEN4
1671RVET4
1672RAET4
1673SHET4
1674RSDT4
Q...4
1535ENAS3
1675LPDT3
1676MMES3
1677SPES3
1678RMEN3
1679RVEI3
1607RRDS3
1680RMET3
1681SADN3
1682RAES3
1683RPDV3
1684RTEA3
1685RHES3
1686RQEA3
1478MRTS3
1520ERKN3
1687RNRS2
1688RAEA2
1689RVDN2
1690RNEG2
1691RVEG2
1692RAEN2
1693RVDT2
1694RDDN2
1695RLEA2
1696RPNT2
1697RGES2
1698SPEA2
1699RTAG2
1700MKEA2
1486SRES2
1701WNES2
1591GNES2
1629RKDS2
1628TTTL2
1702RVEN2
1635RMDS2
1703RMEH2
1630RRLT2
1704RKEH1
1705ENRS1
1706RNKS1
1707RPGE...1
1708RKDT1
1625VPDA1
1709RGEA1
1710WIDT1
1711RNEY1
1712RADI1
1713RADY1
1714RTDD1
1715RVDS1
1716HTET1
1717HTEN1
1718SGEN1
1719RTST1
1720RAGR...1
1721SNAS1
1722RPGT1
1723RAEH1
1724MHDT1
1725REDN1
1726REEV1
RRR...1
1727RMEW1
1728RRER1
1729RLDN1
RPT...1
1730MVES1
1510NRES1
1731RIPA1
1732RMEA1
1733RHNT1
1734RNSS1
1735LPES1
1736SLDP1
1737STEN1
1738RPKS1
ATS...1
1739MIDT1
1740PPDT1
1741GLDA1
1742RPEGE...1
1743RHYT1
1744RTEI1
1745SPEN1
APR...1
LSL...1
1746RHEN1
1747REDV1
1748RLKT1
1749RIET1
1750RIES1
1477LLES1
1751RPDI1
1752MNDT1
1753RLYT1
1504ANES1
1754RAYN1
1755RADS1
1756KNES1
1757RVSA1
1758RPED1
1759RGEH1
1728RRER...1
1760LTET1
1761LADN1
GTR...1
1762RPER...1
1763MLGLPGTR...1
1764RPDP1
1765QADV1
1599EGRS1
RGR...1
1766MADV1
1767HTDN1
1768RKEV1
1769RADA1
1770RDAS1
1771MLDT1
1772RPGS1
1773RTEY1
1774SLDT1
1775RWES1
1776ERKA1
1777RIYT1
1778TPVP1
1779RQDA1
1780RMER1
1631RTSN1
LRT...1
1559MMEA1
1481VRVT1
1634DRKS1
1488TREN1
1636HRKS1
1500LRES1
1639RPERE...1
1638DKRN1
1781VGTV1
1582...QSGT...0
TABLE 12
ZF6 selection on A:C change at nt 6 of core motif in CBS.
Sequences reflect position −1 to 3.
SEQ ID NO:   Sequence   # Reads
37HRES7487
1479ERKS7125
1489DSPQ876
1487ENKA801
1497EKKS508
1473TRES141
38RPDT126
1520ERKN120
1537GREA112
1535ENAS103
1471LRDS95
36MNES89
1504ANES84
1571DRKG73
1634DRKS72
1599EGRS69
1584ETGS67
1482RNES60
1470VKES57
1486SRES50
98RTET42
1625VPDA39
1630RRLT37
167RADN30
1485RPES30
1782ERGG27
1472HLES25
1638DKRN25
112RTEN21
1628TTTL19
1636HRKS19
1490RRES19
1499RMES18
1551EGKN17
1623ERKG16
1491RGEN16
1705ENRS15
1498GRES15
1501RTDN15
1507RLDT13
1496RVES13
1517RIDT13
1510NRES13
1505RNEH12
1783EKGT11
1513RHET11
1474MREA10
1543RPET9
QGK9
1519HMES9
1475VRET9
99RADV9
HMR...9
1784ERNS8
1524RQEN8
172RNDT8
40RHDT8
1493HRDS7
171RTSS7
1529RKEA7
1785ENNS6
1776ERKA6
1523RREA5
RGS5
QEK...5
1478MRTS5
1500LRES4
1526RKES4
1786HREN4
1521RPEA4
1547HGES4
39RTDI4
1508RLET4
1477LLES3
1626RKDA3
1476MRET3
1590HRGS3
1787ERKR3
1561RNAS3
1788ERKI3
1789ERRS2
1642RLDP2
1604HSES2
1790YSPQ2
1791EGKS2
1792HRER2
QVK...2
1793DRKA2
1794ESGN2
QG...2
1795ERES2
1796HKES2
1797ESKS2
1558QRES2
1798EMKS2
1627SPDT2
169RRDT2
1527MLGL...2
1633ARFT2
1562RTDT2
1799KRKS1
1652EASN1
1800TGDA1
1801NRKS1
RGK1
1802EKNS1
HRE...1
1803QGKS1
1662MTGN1
1804DSTQ1
TGE...1
1805VRKS1
1509HRET1
1806ENKV1
1568RLEN1
1732RMEA1
1494HREA1
1692RAEN1
1774SLDT1
R...1
1512TGES1
1644RADT1
QAK...1
1807DIPQ1
QGT...1
1808ERKC1
1809HSPQ1
1542RTES1
1538TREA1
1810RTAT1
QGR...1
1811TRKS1
1812GRKS1
1813ESKA1
ERK...1
1554YRES1
1814EKRN1
MGK...1
1815DSPH1
1816ERNG1
1817VSPQ1
QWK...1
1818EKKC1
1601LGLPTGR...1
1819ERNN1
1643RRET1
1820TNES1
1821HRKN1
RLF...1
1822DKSN1
1823DRNS1
KRN1
1824ERMS1
1608RREN1
1825EIAS1
1826HREC1
1827ERKT1
1828ETGN1
1632RQSA1
1631RTSN1
1635RMDS1
1545RTEH1
1559MMEA1
1629RKDS1
LRT...1
1481VRVT1
1488TREN1
1639RPERE...1
1637GTTP1
1640SGDS1
1582...QSGT...0
TABLE 13
ZF5 selection on G:T change at nt 7 of core motif in CBS.
Sequences reflect position 2 to 6.
SEQ ID NO:   Sequence   # Read
165TRLKE2129
42HRLKE1938
44SRLKE1530
110TRLRE1078
1829HRLRE1073
47NRLKE1015
1830QRLRE769
1831DALKR700
109DGLKR681
1832SRLRE534
43HALKV389
94NRLKV381
93ERLRV375
1833DGLKK374
41HGLKV335
1834HRLKV315
1835ERLRM295
1836QRLKE243
1837DGLVR235
46HTLKV233
1838NRLRE195
1839ARLRE168
108DALRR168
1840ERLRQ141
1841ARLKE135
1842TRLRD125
1843DGLRR118
1844SRLNE118
1845TGLKV92
1846HRLSE91
1847HRLNE78
1848SHLKV75
1849TTLKV75
1850HRLGE68
1851STLKV66
1852DGLKV65
1853DGLRK61
1854HRLTE60
1855DRLKV59
1856HSLKV56
45DGLRV47
1857SRLKV45
1858QRLKV44
1859HGLTV43
1860HRLME43
1861RLLPN42
1862ERLKV41
1863NRLRV35
1864TRLKV34
1865DGLKE29
454DTLKV29
1866HGLRV29
1867SALKT28
1868HRLAE25
1869ERLIS23
1870DGLTR22
1871DALVR21
1872HRLKR21
1873ERLRE20
1874HQLKV20
1875TTLKQ18
1876SRLKR17
1877DRLKQ16
1878HRLRV16
1879TRLKR16
1880TRLNE16
1881NRLKQ15
1882TRLKD14
1883TRLRV14
1884EALKR13
1885HTLKQ13
1886NALKV13
1887SALKV13
1888SRLKD13
1889DGLRE12
1890ERLKE12
488DTLKQ11
1891HKLKV11
1892GTLKV10
1893ERLRR9
1894HALKT9
1895HGLKE9
1896HHLVQ9
1897NGLKV9
538DALKE8
1898DALKV8
1899HALKE8
1900HHLKQ8
1901HHLKV8
1902TRLKK8
1903DRLRT7
1904DRLRV7
371DTLRV7
1905HRLKK7
262HTLKE7
1906NRLKK7
235STLKE7
1907SRLIE6
1908TRLME6
1909ATLKV5
1910HGLVV5
1911HRLRM5
1912HRLRQ5
1913HTLKA5
1914NRLRD5
1915TGLKE5
1916TGLKT5
1917TRLRQ5
1918TTLKI5
1919TTLRV5
1920DRLKE4
1921HRLKA4
1922HRLKD4
1923HSLKE4
1924NRLKI4
1925NRLKR4
1926STLKA4
548STLKQ4
1927TRLKA4
1928TRLKQ4
1929TRLRR4
447DTLKA3
1930HALKR3
1931HGLKA3
1932HGLKR3
1933HPEG...3
1934HRLK...3
1935HRLRK3
1936HTLRV3
1937NTLKQ3
1938QRLRV3
1939SRLME3
1940SRPKE3
1941TQLKV3
1942TRLQE3
1943TRLR...3
1944ARLKR2
1945ARLKV2
1946ARLR...2
1947ARLRV2
1948ARLVR2
1949DALKK2
1950DALRV2
1951DAPKR2
1952DRLRE2
1953EGLKV2
1954ERLLV2
1955ERLRA2
1956ERMRM2
1957GGLKV2
1958GGLVT2
1959HALRE2
1960HGLRE2
1961HHLKE2
1962HILKA2
1963HRLQE2
1964HRLRR2
1965KRLKE2
1966KTLKQ2
1967NALKE2
1968NRLNE2
1969NTLKV2
1970QRLKR2
1971QRLRQ2
1972QSLIA2
1973QTLKV2
1974RKLRS2
1975RRLRE2
1976SALKE2
1977SRLKK2
1978SRLRK2
1979SRLRV2
297STLRV2
1980TMLKE2
1981TRLKG2
1982TRLRM2
1983TRLTE2
1984TRRKE2
1985AALKR1
1986AGLKR1
1987AGLKV1
1988AGLVR1
1989ARLGE1
1990ARLME1
1991ARLNE1
1992ARLRD1
1993ARLRM1
1994CRLKE1
1995DALDR1
1996DALKT1
1997DALKW1
1998DALRK1
1999DALTV1
2000DELKR1
2001DELPG1
2002DGLK...1
2003DGLKG1
2004DGLKW1
2005DGLLR1
2006DGLRQ1
2007DGLTV1
2008DGLVW1
1016DGMKR1
2009DKLKQ1
2010DKLRQ1
2011DRLRK1
2012DTHAG...1
2013DTLKT1
2014DVLKK1
2015EAAG...1
2016EHLRQ1
2017ELLKV1
2018EPLRV1
2019ERLCV1
2020ERLKK1
1893ERLRR...1
2021ERLVR1
2022ERLWE1
2023ERPRM1
2024ERPRV1
2025ERQRM1
2026GGLKQ1
2027GGLKR1
2028GMLKV1
2029GRLKE1
2030GTLKQ1
2031HALKA1
2032HALKG1
2033HALPV1
2034HAPEV1
2035HGLKK1
2036HGLKQ1
2037HGLMV1
2038HGLPV1
2039HGLRD1
54HGLVR1
2040HGQKE1
2041HGRKV1
2042HGRRG1
2043HHLRV1
2044HILIA1
2045HKLKE1
2046HKLRV1
2047HMLKR1
2048HMLRE1
2049HNLKV1
2050HPLKV1
2051HQLKE1
2052HQLRE1
2053HQLRV1
HR*A...1
2054HRGCG...1
2055HRLDE1
2056HRLIE1
2057HRLKF1
2058HRLKG1
2059HRLKL1
2060HRLMV1
2061HRLN...1
2062HRLR...1
2063HRLRA1
2064HRLS...1
2065HRLVR1
2066HRMRE1
2067HRPKE1
2068HRPNE1
2069HRQRE1
2070HRRKE1
2071HRRME1
2072HRRRE1
2073HRVRE1
2074HSACG...1
2075HSLNV1
2076HSLRV1
2077HTLAQ1
2078HTLNV1
2079HTMKV1
2080HVLKV1
2081HWLRE1
2082KGLKQ1
2083MHLRS1
2084MRLRE1
2085MRLRM1
2086NALKR1
2087NGLKE1
2088NLLRE1
2089NMLKE1
2090NMLNV1
2091NPLRE1
2092NRFKE1
2093NRLIE1
2094NRLKA1
2095NRLKF1
2096NRLKL1
2097NRLKT1
2098NRLME1
2099NRLND1
2100NRLNV1
2101NRLQE1
2102NRLR...1
2103NRLRM1
2104NRLRQ1
2105NRMKE1
2106NRPKE1
2107NRPKV1
2108NRQKE1
2109NSLKE1
2110NTLTV1
2111PRLKE1
2112PRLLP1
2113PRLRE1
2114PRLTE1
2115QAEG...1
2116QRLIS1
2117QRLKK1
2118QRLME1
2119QRLRG1
2120QRLRM1
2121QRLTE1
2122QTA*R...1
2123QTAW...1
2124QTG*S...1
R...1
2125RGLKV1
2126RRLGD1
2127RRLKE1
2128RRLNE1
2129RRLTK1
2130SALKK1
2131SALKR1
2132SCLKE1
2133SGLAM1
2134SGLAV1
2135SGLKV1
2136SHLKE1
2137SKLKV1
649SNLKV1
2138SQLKV1
2139SRLIG1
2140SRLK...1
2141SRLKA1
2142SRLKG1
2143SRLQE1
2144SRLR...1
2145SRLRA1
2146SRLRM1
2147SRLRQ1
2148SRLTE1
2149SRQRE1
2150SSLKE1
2151SSLKV1
2152SSQRE1
2153STLKR1
TAG...1
2154TGLKG1
2155TGLKQ1
2156TGLKS1
2157TGLRV1
2158TGRRG1
2159TLLRE1
2160TMQKE1
2161TRL*L1
2162TRLAE1
2163TRLE...1
2164TRLEE1
2165TRLGE1
2166TRLK...1
2167TRLKY1
2168TRLRG1
2169TRLRK1
2170TRLSE1
2171TRPKE1
2172TRQRD1
2173TRRRD1
2174TRVRE1
2175TSLRE1
2176TTLKA1
2177TTLKE1
2178TTLKL1
2179TTLKT1
2180TTPRG1
2181TTRKQ1
2182TWLRE1
2183VRRKV1
2184YGLKR1
2185YRLKE1
2186YTLKV1
TABLE 14
ZF5 selection on G:C change at nt 7 of core motif in CBS.
Sequences reflect position 2 to 6.
SEQ ID NO:   Sequence   Read #
44SRLKE2533
165TRLKE2146
42HRLKE1984
47NRLKE1528
1829HRLRE1001
1832SRLRE799
110TRLRE625
46HTLKV499
41HGLKV320
1830QRLRE299
1851STLKV249
1841ARLKE238
1836QRLKE135
235STLKE126
1849TTLKV102
447DTLKA95
1891HKLKV87
454DTLKV84
43HALKV82
1962HILKA80
1845TGLKV80
1839ARLRE78
1850HRLGE75
1838NRLRE75
1854HRLTE61
1861RLLPN55
1852DGLKV50
1834HRLKV46
1856HSLKV43
1931HGLKA37
94NRLKV30
1901HHLKV27
1972QSLIA26
371DTLRV25
1864TRLKV25
2177TTLKE25
262HTLKE24
1888SRLKD23
1948ARLVR20
2187SKLKE20
1855DRLKV19
93ERLRV19
1857SRLKV19
1831DALKR18
109DGLKR18
2029GRLKE18
1892GTLKV18
1842TRLRD17
1913HTLKA16
1868HRLAE15
488DTLKQ14
1895HGLKE14
2188HILKT14
1974RKLRS14
2133SGLAM12
1875TTLKQ12
1926STLKA11
1833DGLKK10
2126RRLGD10
1882TRLKD10
2189TSLKV10
1837DGLVR9
1835ERLRM9
1961HHLKE9
1896HHLVQ9
1847HRLNE9
1885HTLKQ9
1880TRLNE9
2190HRLHE8
1848SHLKV8
2191SKLRM8
45DGLRV7
1862ERLKV7
2192GTLRV7
1921HRLKA7
2193HTLKS7
1844SRLNE7
1915TGLKE7
108DALRR6
2194HGLKT6
1859HGLTV6
2045HKLKE6
1860HRLME6
1887SALKV6
1909ATLKV5
2195DTLKE5
2196GILND5
2135SGLKV5
2141SRLKA5
1871DALVR4
2197ETLKV4
1846HRLSE4
1923HSLKE4
1936HTLRV4
1969NTLKV4
1858QRLKV4
2140SRLK...4
2198THLKE4
1928TRLKQ4
1945ARLKV3
1853DGLRK3
1843DGLRR3
1840ERLRQ3
1957GGLKV3
1960HGLRE3
1900HHLKQ3
1965KRLKE3
2199NALRV3
1897NGLKV3
2200NRLGE3
1906NRLKK3
1975RRLRE3
2132SCLKE3
2137SKLKV3
2201SRLRD3
1979SRLRV3
548STLKQ3
1927TRLKA3
1942TRLQE3
2186YTLKV3
2202APLLR2
2009DKLKQ2
2203DKLKV2
1920DRLKE2
1873ERLRE2
1899HALKE2
2043HHLRV2
2051HQLKE2
2204HRLEE2
1878HRLRV2
2205HTLKG2
1966KTLKQ2
2206MVLVV2
2094NRLKA2
2207NRLKD2
1881NRLKQ2
2101NRLQE2
2108NRQKE2
2208NTLKA2
1938QRLRV2
1973QTLKV2
2127RRLKE2
2209SRLKQ2
2151SSLKV2
553STLRQ2
297STLRV2
1983TRLTE2
2175TSLRE2
1987AGLKV1
2210AQMKE1
1991ARLNE1
1992ARLRD1
2211ARRRE1
2212CRLM...1
2213CRLMV1
538DALKE1
1898DALKV1
2001DELPG1
1865DGLKE1
2010DKLRQ1
2214DRLKA1
2215DRLKT1
1952DRLRE1
1903DRLRT1
2013DTLKT1
2216DTPKA1
1869ERLIS1
1893ERLRR...1
2023ERPRM1
2026GGLKQ1
2028GMLKV1
2217GRLKA1
2218GRLKV1
2030GTLKQ1
2219GVLKE1
2220GVLTG1
2221HALDV1
2031HALKA1
2222HELKV1
2223HGLEA1
2036HGLKQ1
2224HGLRG1
2225HGMKA1
2226HGPKV1
2044HILIA1
2227HILKE1
2228HILKV1
2229HILNA1
2230HKLKG1
2231HKLKQ1
2046HKLRV1
2048HMLRE1
1933HPEG...1
2232HPLKE1
1874HQLKV1
2233HRLGV1
1922HRLKD1
2058HRLKG1
2059HRLKL1
1872HRLKR1
2234HRLLE1
2235HRLQG1
2063HRLRA1
2236HRLRS1
2237HRLTV1
2065HRLVR1
2066HRMRE1
2072HRRRE1
2238HSG*G...1
2239HSLKQ1
2240HSLRE1
2241HSVKA1
2242HTG*R...1
2077HTLAQ1
2243HTLEV1
215HTLME1
2244HTLMV1
2245HTLQE1
2246HTLRQ1
2080HVLKV1
2247IRLKE1
2248IRQEE1
2082KGLKQ1
2249KRLKV1
2250LRLKK1
2251NKLKE1
2252NKLKG1
2092NRFKE1
2253NRLAE1
2254NRLEE1
1925NRLKR1
2255NRLKS1
2097NRLKT1
1914NRLRD1
2256NRLRG1
1863NRLRV1
2257NRLTE1
2109NSLKE1
1937NTLKQ1
2258PAEG...1
2259PPPPE1
2113PRLRE1
2115QAEG...1
2260QGRRE1
2261QRLEE1
2119QRLRG1
2262QSLGR1
2134SGLAV1
2263SKLK...1
2264SMLRE1
2265SRLAE1
2266SRLCE1
2142SRLKG1
2267SRLLE1
2143SRLQE1
2145SRLRA1
1978SRLRK1
1940SRPKE1
2149SRQRE1
2268SRRKE1
2150SSLKE1
2152SSQRE1
539STLRA1
202STLRE1
2155TGLKQ1
2269TGLRE1
2270THLKV1
2271TILYE1
2272TLLKE1
1981TRLKG1
1908TRLME1
1883TRLRV1
2273TRLTV1
2274TRMGE1
2275TRMKQ1
2176TTLKA1
1918TTLKI1
2178TTLKL1
2276YTLKE1
TABLE 15
ZF5 selection on G:A change at nt 7 of core motif in CBS.
Sequences reflect position 2 to 6.
SEQ ID NO:   Sequence   Read #
46HTLKV3934
41HGLKV2682
1851STLKV2167
1861RLLPN1887
1849TTLKV1471
43HALKV923
454DTLKV888
1875TTLKQ754
1891HKLKV571
1885HTLKQ513
1845TGLKV482
1892GTLKV473
488DTLKQ462
1852DGLKV443
1856HSLKV352
1896HHLVQ298
1901HHLKV259
1834HRLKV210
42HRLKE190
371DTLRV189
44SRLKE186
165TRLKE178
1887SALKV177
1909ATLKV155
1900HHLKQ149
1926STLKA140
1897NGLKV136
47NRLKE124
548STLKQ118
1973QTLKV112
1874HQLKV94
2135SGLKV91
1829HRLRE89
1936HTLRV88
297STLRV78
447DTLKA75
1957GGLKV75
1928TRLKQ75
1966KTLKQ69
2277HTL*A66
1913HTLKA64
1832SRLRE61
110TRLRE58
1937NTLKQ56
2278SKLKQ55
1830QRLRE53
2203DKLKV51
1919TTLRV48
2151SSLKV43
1848SHLKV42
2030GTLKQ40
1864TRLKV40
2270THLKV38
1969NTLKV37
553STLRQ35
2279HALRV34
1931HGLKA33
2009DKLKQ32
109DGLKR29
1953EGLKV29
2197ETLKV29
2280GILKV28
1855DRLKV26
1866HGLRV24
2281SVLKQ23
1831DALKR22
93ERLRV22
2282GQLHV21
2283TTLRQ21
45DGLRV20
2284DTLKN20
2179TTLKT20
2285GVLKV17
2010DKLRQ16
2286GTLKA16
2026GGLKQ15
2036HGLKQ15
2043HHLRV15
94NRLKV15
2192GTLRV14
262HTLKE14
2287SVLKV14
2155TGLKQ14
1835ERLRM13
1838NRLRE13
2137SKLKV13
649SNLKV13
2288TVLKV13
1841ARLKE12
1839ARLRE12
1833DGLKK12
2289HHLRQ12
2205HTLKG12
2080HVLKV12
1917TRLRQ12
2290NTLRQ11
2134SGLAV11
108DALRR10
2291QTLKQ10
2292RTLKQ10
235STLKE10
1987AGLKV9
2013DTLKT9
274HHLVV9
2049HNLKV9
1836QRLKE9
2293STLKG9
2294TVLKQ9
1837DGLVR8
2295GGLVV8
2296HGLQV8
1850HRLGE8
1854HRLTE8
2246HTLRQ8
1857SRLKV8
2297DTLKG7
2298GGLTV7
2299GVLKA7
2031HALKA7
2194HGLKT7
2176TTLKA7
2300GTLRQ6
2301HALKQ6
1844SRLNE6
2302STLKT6
1842TRLRD6
2303ATLKA5
2304ATLKQ5
2305DGLKQ5
1843DGLRR5
1862ERLKV5
2306GTLNA5
2307GVLKN5
1895HGLKE5
1910HGLVV5
2308TTLKG5
1853DGLRK4
1840ERLRQ4
2309ETLRV4
2310HGLKG4
2311HGLNV4
1859HGLTV4
1961HHLKE4
1846HRLSE4
1886NALKV4
484STLTV4
2312VGLGE4
2186YTLKV4
2313AGLAT3
1948ARLVR3
2314D*LPG3
2003DGLKG3
2315DKLRV3
1899HALKE3
1860HRLME3
2239HSLKQ3
2078HTLNV3
2079HTMKV3
2316HTQKV3
2262QSLGR3
1974RKLRS3
474STLNV3
2177TTLKE3
1871DALVR2
2001DELPG2
2317DGLRA2
2318DVLKV2
2319GALRV2
2320GGLVQ2
2321GNLKV2
2322GPLKV2
2323GTLKG2
2324GVLKQ2
2325GVLRV2
678GVLVA2
2032HALKG2
2326HDLKV2
2327HGLEV2
2226HGPKV2
2328HHMVQ2
1962HILKA2
2329HKLKA2
2045HKLKE2
2231HKLKQ2
1921HRLKA2
2330HRLKQ2
1847HRLNE2
2082KGLKQ2
2331KTLKV2
2332PTLKV2
1972QSLIA2
2333RLLPY2
2334RLRPN2
2335RTLAQ2
2336RTLKV2
2337SALTV2
2338STLKL2
1916TGLKT2
2339TKLKQ2
1918TTLKI2
2340TTPKV2
2341AGLAS1
2342AGLKM1
2343APLKV1
1945ARLKV1
1992ARLRD1
2344ATLKG1
538DALKE1
1898DALKV1
2345DELRQ1
2346DGLKA1
1865DGLKE1
2347DGLKL1
2348DKLKG1
1877DRLKQ1
1952DRLRE1
1904DRLRV1
2349DSLKV1
2195DTLKE1
2350DTLNQ1
326DTLQA1
423DTLRA1
533DTLRL1
2351DTLWQ1
2352DTMKV1
2353EGLKQ1
1955ERLRA1
1873ERLRE1
2023ERPRM1
2354ETLKE1
2355ETRRV1
2356GGLAV1
2357GGLRG1
2358GGLRV1
2359GHLKA1
2196GILND1
2028GMLKV1
2360GPLRA1
2361GQQHV1
2362GTLQA1
2363GTPKV1
2364HALES1
2365HALKF1
2366HALMV1
2033HALPV1
2367HAMKV1
2368HARKV1
2222HELKV1
2369HGLKD1
2370HGLKL1
2371HGLKM1
2372HGLKW1
2373HGRKI1
2041HGRKV1
2374HHLAQ1
2375HHLGQ1
2376HHLMQ1
2377HHMKV1
2044HILIA1
2228HILKV1
2230HKLKG1
2378HKLKM1
2379HKLNV1
2380HKLQE1
2046HKLRV1
2381HMLNV1
2382HPLDV1
2050HPLKV1
2383HPLQV1
2384HQLKA1
2385HQLKG1
2386HQLKT1
1868HRLAE1
2058HRLKG1
2059HRLKL1
1872HRLKR1
1912HRLRQ1
2065HRLVR1
2067HRPKE1
2387HSLKA1
1923HSLKE1
2388HSLKG1
2389HSLKL1
2241HSVKA1
2077HTLAQ1
2390HTLAV1
2243HTLEV1
2391HTLKN1
2244HTLMV1
2392HTLNA1
2393HTLQV1
250HTLTE1
2394HTLTV1
2395HTPKV1
2396HTRKQ1
2397HVLKF1
2398HVMKV1
2399HWLKV1
2400KADTV1
2401KGLKG1
2402KRLKQ1
2403KTLAQ1
2404KTLRV1
2405KTLTQ1
2406LHLKV1
2407LTLKQ1
2408LTLKV1
2409MGLKV1
2410MPPK1
2411MRLKQ1
2412NAVTE1
2413NGLKG1
2414NGLKL1
2415NRLKG1
1914NRLRD1
1863NRLRV1
2416NTLRV1
2417PGLKV1
2418QGLKV1
1858QRLKV1
1938QRLRV1
2419QRQRV1
2420QTLKA1
2421QTLKG1
2422QTLKK1
2423QTLKM1
2424QTLMV1
2125RGLKV1
2425RHLVQ1
2426RLLPT1
2427RLLSN1
2428RLMPD1
2429RMLPN1
2126RRLGD1
2430RSLKV1
2431RTLKG1
2432SALKQ1
2433SALRQ1
2434SELKV1
2435SFLKV1
2133SGLAM1
2436SGLKQ1
2437SHLKQ1
2438SKLKA1
2187SKLKE1
1888SRLKD1
2145SRLRA1
556SSLRV1
2152SSQRE1
2439STLKK1
2440STLKM1
385STLMV1
448STLQQ1
554STLTA1
2441STMKA1
2442STMKV1
2443TALKV1
2444TGLKA1
2445TGLKD1
1915TGLKE1
2154TGLKG1
2446TGLMV1
2198THLKE1
2447THLKG1
2448THLKL1
2449THLKQ1
2450THLMV1
64TKLKV1
2451TPLQV1
1882TRLKD1
1981TRLKG1
2452TRLPQ1
1942TRLQE1
2453TTLEV1
2454TTLHV1
507TTLNQ1
577TTLQV1
2455TTLRG1
2456TTLYV1
2457TTMKV1
2458TVLRQ1
2459VGLGG1
2460VTLKV1
TABLE 16
ZF5 selection on G:A change at position 8 of the CBS core motif.
Sequences reflect position 2 to 6.
SEQ ID NO:   Sequence   # Read
2461GGLRR341
50GGLVR336
2462TGLRR274
2463EGLRR267
1843DGLRR232
2464SGLRR206
2465AGLAR179
2466SGLAR178
2467GGLAR177
55GGLTR168
2468DGLAR152
1986AGLKR148
2469TGLAR135
1837DGLVR129
2470GGLQR127
70GNLTR124
117GNLVR123
2471HGLAR123
2027GGLKR111
2472TGLVR108
2473AGLTR105
2474SGLSR102
2475AGLRR100
2476GGLSR94
59HGLRR91
54HGLVR87
2477SGLTR84
2478NGLVR80
2479AGLQR79
118GNLRR79
2480AGLHR76
2481GNLER76
2482HNLLR76
138GNLAR73
1870DGLTR72
2483HALRR69
2484HGLQR69
2485NGLRR69
2486SGLVR68
2487SNLDR67
68TNLRR66
2488HGLTR63
2489SSLRR63
108DALRR61
2490EGLTR61
2491GGLER61
109DGLKR60
2492TGLQR60
56HTLRR59
1985AALKR58
1988AGLVR55
2493AGLIR54
1932HGLKR54
2494ANLVR53
2495EGLKR53
2496SNLLR51
2497EGLAR50
2498AGLSR49
2499DGLIR48
2500TGLKR48
2501SGLQR46
2502ETLKR45
2503HGLLR45
2504NGLQR45
2505TGLMR45
69ANLRR43
2506DNLVR42
2507TGLLR42
2508DGLMR41
2509ASLKR39
2510QGLRR38
2511TNLVR38
2512NGLTR37
2513SGLDR37
2514SGLHR37
2515TGLNR37
2516TGLSR37
2517GNLLR36
2518NNLVR36
2519TGLIR36
2520DMLRR35
2521GALKR35
2522GNLDR35
2523SALRR35
2524SNLAR35
2525SGLLR34
2526TNLNR33
2527AGLLR31
2528GGLIR31
2529DGLHR30
2530DTLRR30
2531HLLKR30
2532SALAR30
2533SMLAR30
2534VGLKR30
2535DNLLR28
2536GGLMR28
2537SGLMR28
2538AALRR27
2539ETLRR27
2540NGLAR27
2157TGLRV27
53TGLTR27
2541TNLQR27
2542ANLAR26
2543NNLAR26
2544SNLSR26
2545STLSR26
2546AALAR25
2547HALVR25
2548HGLSR25
2549SGLNR25
2550STLAR25
2551ANLIR24
2552DGLDR24
2553DGLSR24
2554GTLKR24
1884EALKR23
2555NGLSR23
2556SMLRR23
2557HNLHR22
2558HNLRR22
2559SGLKR22
2560TGLGR22
2561TNLMR22
1871DALVR21
2562GTLTR21
2563DGLNR20
2564SSLVR20
2565TGLER20
2566DTLKR19
2567GNLSR19
51HGLIR19
2568HSLVR19
2569AGLNR18
2570DALAR18
2571GGLHR18
2572NGLIR18
2573QGLTR18
2574QMLKR18
2575QNLRR18
1845TGLKV18
2576AILKR17
119GNLKR17
139GNLMR17
2577HNLTR17
2578HTLAR17
2579QGLKR17
2580SGLER17
2581SGLGR17
2582SNLVR17
2583EALRR16
2584GTLRR16
2585HGLGR16
2586HTLMR16
2587NTLRR16
2588TGLHR16
2589TSLRR16
2590TTLQR16
2591DNLKR15
2592GALTR15
2593QTLRR15
2594SGLIR15
2595TNLKR15
2596DGLGR14
2597DSLQR14
2598EGLNR14
2599ENLRR14
2600GSLRR14
2601NGLNR14
2602QALKR14
2603SALSR14
2604SSLGR14
2605VNLKR14
66ATLRR13
2005DGLLR13
2606EMLKR13
2607GALVR13
2608GNLGR13
2609GNLQR13
2610HALAR13
2611HSLIR13
2612HTLER13
2613HTLQR13
2614NGLER13
2615NGLMR13
2616QGLVR13
2617TALKR13
2618TTLMR13
2619VGLRR13
2620ANLKR12
2621ANLNR12
2622ATLTR12
2623DNLRR12
2624ENLKR12
2625GGLLR12
2626GTLVR12
2627HNLSR12
2628NTLKR12
2629SALER12
2630SSLTR12
2631TALVR12
52ANLSR11
2632DNLAR11
2633ENLSR11
2634ESLRR11
2635NALRR11
2636NGLKR11
2637NNLLR11
2418QGLKV11
116SNLRR11
2638STLRR11
2639VNLSR11
2640DMLKR10
2641GALRR10
2642GGLDR10
2643HGLMR10
2644HNLVR10
2645HQLIR10
2086NALKR10
1969NTLKV10
2646QNLQR10
1887SALKV10
2647SMLIR10
2648TALRV10
2649TNLAR10
2650TQLKR10
1849TTLKV10
2651TTLTR10
2652VGLQR10
2653AALSR9
2654ATLAR9
2655DALGR9
2656DTLNR9
2657EILKR9
2658ESLKR9
2659GGLNR9
2660GSLTR9
2661HNLAR9
2662MGLKR9
2663NGLHR9
2664NMLKR9
2665PNLKR9
2666SALTR9
2667SDLKR9
2668STLGR9
2669AGLER8
2670DILRR8
2671DMLNR8
2672DTLAR8
2673HALLR8
2674HALSR8
2675HNLGR8
2676NALVR8
2677SMLTR8
2678TALAR8
2679TNLER8
2680TNLGR8
2681TTLNR8
2682DALLR7
2683DSLAR7
2684GTLAR7
2685GTLLV7
2686HALIR7
2687HGLDR7
2688HGLER7
2689HTLLR7
2690NNLIR7
2691NNLMR7
2692QSLKR7
2693SALGR7
2694SALVR7
2695SNLMR7
2696SQLRR7
2697STLQR7
2698STLVR7
2699SVLKR7
2189TSLKV7
2700AALTR6
2701DSLKR6
2702DSLRR6
2703DTLMR6
2704EGLLR6
2705ENLAR6
2706GNLNR6
2707GTLQR6
2708HALDR6
2709HVLER6
2710IGLRR6
2711INLTR6
2712NMLRR6
2713QMLRR6
2714TNLHR6
2715TSLHR6
2716VGLAR6
2717AALQR5
2718AGLDR5
48ATLKR5
1833DGLKK5
2719DTLQR5
2720DVLKR5
2721GALSR5
2722GMLKR5
2723GTLSR5
2724HNLER5
2725NGLLV5
2726NNLTR5
2727QALAV5
2728QGLAR5
2729QNLHR5
2730SALMR5
2731SLLLR5
2732SVLAR5
2733SVLTR5
2734TALRR5
74TMLRR5
2735TQLRV5
2736TTLLR5
2737TTLRR5
2738AALNR4
2739ATLVR4
2740DALHR4
2741DALMR4
2742DGLER4
2743DGLQR4
45DGLRV4
2744DLLRR4
1855DRLKV4
2745GGLGR4
2746GNLHR4
1892GTLKV4
2747GTLNR4
2748HALHR4
2749HALMR4
2750HILTR4
2751HLLLR4
2752HNLQR4
2753HTLGR4
2754IGLTG4
2755NGLLR4
2756NSLRR4
2757PNLIR4
2758PNLRR4
2759SALIR4
2760SILGR4
2761SPLVR4
2762STLTR4
2763TALKT4
2764TALTR4
2765TGLDR4
2766TSLKR4
2767TTLVR4
2768VGLQN4
2769VNLRR4
2770AALVR3
58ADLKR3
2771ANLGR3
2772ATLSR3
2773DNLQR3
2774DNLTR3
2775DRLRR3
2776DTLVR3
2777EGLVR3
2778GALNR3
2779GDLKR3
2780GDLTR3
62GGLGL3
2781GSLQR3
1930HALKR3
2782HGLHR3
1866HGLRV3
2783HTLKR3
2784HVLKR3
2785NGLDR3
2786NMLAR3
2787NSLAR3
2788NTLAR3
2789QGLHR3
2134SGLAV3
2790SILTR3
2791SILVR3
2792SQLKR3
2793SSLQR3
2794TALHR3
2795TALNR3
2796TALSR3
2797AGLGR2
2798AGLMR2
2799ASLQR2
2800ASLVR2
2801ATLMR2
2802AVLKR2
2803DALNR2
2804DALQR2
2805DALSR2
1853DGLRK2
2806DHLHR2
2807DHLVR2
2808DNLSR2
2809DTLSR2
2810DTLTR2
2811DVLRR2
2812EGLIR2
2813EGLSR2
2814GAEE...2
2815GALQR2
2319GALRV2
2816GDLRR2
2817GDLVR2
1957GGLKV2
2358GGLRV2
2818GSLAR2
2819GSLKR2
2820HDLRR2
2821HGLNR2
2822HHLIR2
2047HMLKR2
2823HMLRR2
2824HQLVR2
2825HSLAR2
2826HSLHR2
2827HSLRR2
46HTLKV2
2828HTLNR2
2829HTLTR2
2830HTLVR2
2831IGLKR2
2832ITLKR2
2833MTLKR2
2834NALHR2
2835NALSR2
2836NGLGR2
2837NTLHR2
2838QDLKR2
2839QGLLR2
2840QNLLR2
2841QNLRW2
2842QSLRR2
2843QTLKR2
2131SALKR2
2844SALRV2
2845SSLAR2
2846SSLSR2
2847STLDR2
2848STLER2
2849STLHR2
1851STLKV2
2850STLMR2
2851TALGR2
2852TGLAT2
2853TGLSV2
2854TGLVT2
2855TNLKV2
2856TNLSR2
2857TTLAR2
2858TTLGR2
2859TTLIR2
2860TTLKR2
2179TTLKT2
2861TVLRM2
2862VQLAM2
2863VTLTR2
A*S...1
2864AALLR1
2865AALMR1
2866AAPER1
2867ADLRR1
2868AGLAW1
2869AGLRW1
2870AGLTS1
2871AILTR1
71AMLKR1
2872ANLPR1
1944ARLKR1
2873ARLQR1
2874ARLTR1
2875ASLRR1
2876ASLTR1
2877ATLDR1
2878ATLER1
2879ATLIR1
2880ATLLR1
2881ATLQR1
2882AVLRR1
1831DALKR1
1950DALRV1
2883DGLSV1
2884DILHR1
2885DQLRR1
2886DSLSR1
2887DTLAK1
2888DVLLR1
2889EALNR1
2890EALTR1
1953EGLKV1
2891EGLMR1
2892EGLQR1
2893EGLRL1
2894EGLRV1
2895EGVRR1
2896ELLRR1
2897ENLER1
2898ETLLR1
2899GALHR1
2900GGHRR1
2901GGLAG1
2356GGLAV1
2902GGLDV1
2903GGLGS1
2904GGLQE1
2905GGLVL1
1958GGLVT1
2906GGPSH1
2907GGPSR1
2908GGQRR1
2909GGVRR1
2910GGWR...1
2911GILER1
2912GKLRR1
2913GMLAR1
2914GNLIR1
2915GSLER1
2916GSLVR1
2917GTLER1
2918GTLGR1
2919GTLHR1
2920GTQVR1
2921GVLRR1
2922GVLTR1
2923HALGR1
43HALKV1
2924HDLAK1
2925HGAAR1
2035HGLKK1
2371HGLKM1
41HGLKV1
2926HGLSV1
2927HGLTW1
2928HGPAR1
2929HKLAR1
2930HNLLS1
2931HRLSR1
2932HSLNR1
2933HSLSR1
2934HTLHR1
2935HVLAR1
2936INLSR1
2937NALAR1
2938NHLVQ1
2939NTLIR1
2940NTLNR1
2941NTLQR1
2942NVLKR1
2943PALKR1
2944PGLLR1
PWS...1
2945QAAWG...1
2946QALAR1
2947QALTR1
2948QDLIR1
2949QTLAR1
2950QTLQR1
2951QVLRR1
2952RGLTR1
2953RGLVR1
2954SALDR1
2955SALMC1
2956SALNR1
2957SDLAR1
2958SDLQR1
2959SDLRR1
2960SGPRR1
2961SLLSD1
2962SMLHR1
2963SNLQR1
2964SSLIR1
2965SSLKR1
2966STLLR1
2967STLNR1
2968STLRK1
2969SVLGR1
2970SVLRR1
2971TALER1
2972TALRT1
2973TDLAR1
2974TDLRR1
2975TGLQV1
2976TGLVRR1
2977TGPAR1
2978TMLKR1
2979TNLPR1
2980TSLAR1
2981TSLGG1
2982TSLGR1
2983TSLQR1
2984TSLVR1
2985VALAR1
2986VALKR1
2987VALSR1
2988VGLKC1
2989VGLSR1
2990VGLTM1
2991VNLAR1
2992VNLIR1
2993VNLNR1
2994VTLGR1
2995VTLKR1
2996VTLMR1
2997VTLRR1
2998WGLER1
TABLE 17
ZF5 selection on G:C change at nt 8 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO:    Sequence    Read #
1843DGLRR498
108DALRR388
2463EGLRR348
1871DALVR288
1837DGLVR262
2468DGLAR261
1986AGLKR257
1870DGLTR255
2462TGLRR237
2530DTLRR196
59HGLRR192
66ATLRR176
2539ETLRR149
2464SGLRR142
2584GTLRR136
50GGLVR132
2545STLSR132
2707GTLQR131
2553DGLSR127
2027GGLKR126
2684GTLAR123
2578HTLAR114
2486SGLVR111
2779GDLKR109
2593QTLRR107
2472TGLVR106
2668STLGR103
2776DTLVR102
2563DGLNR100
2811DVLRR100
2698STLVR100
2720DVLKR99
48ATLKR96
2461GGLRR93
2638STLRR93
2802AVLKR91
2816GDLRR90
2554GTLKR89
1932HGLKR89
56HTLRR89
2492TGLQR87
2559SGLKR86
2672DTLAR84
2654ATLAR83
2848STLER81
2737TTLRR80
2495EGLKR79
2562GTLTR79
2469TGLAR75
2529DGLHR74
54HGLVR74
2828HTLNR73
2967STLNR71
2489SSLRR69
2516TGLSR68
2772ATLSR67
2656DTLNR67
2788NTLAR66
58ADLKR65
2570DALAR65
2626GTLVR64
2719DTLQR62
2739ATLVR61
2478NGLVR61
109DGLKR59
2467GGLAR59
2568HSLVR59
2804DALQR58
2507TGLLR58
2640DMLKR57
55GGLTR56
2867ADLRR55
2474SGLSR55
2564SSLVR54
2500TGLKR53
2475AGLRR52
2550STLAR52
2783HTLKR51
2587NTLRR51
2857TTLAR51
2622ATLTR49
2817GDLVR49
2667SDLKR49
2767TTLVR49
2466SGLAR48
2847STLDR48
2850STLMR48
2515TGLNR48
2502ETLKR47
2970SVLRR47
2849STLHR46
2959SDLRR45
2699SVLKR44
2488HGLTR43
2702DSLRR42
2974TDLRR42
2471HGLAR40
2586HTLMR40
2477SGLTR40
2966STLLR40
2736TTLLR40
2636NGLKR39
2810DTLTR38
2598EGLNR37
2723GTLSR37
2978TMLKR37
2589TSLRR37
2801ATLMR36
2999DALTR36
2697STLQR36
2762STLTR36
2780GDLTR35
2476GGLSR35
51HGLIR35
2509ASLKR34
2630SSLTR34
1985AALKR33
3000DALIR33
2859TTLIR33
2490EGLTR32
2753HTLGR32
2613HTLQR32
2692QSLKR32
2701DSLKR31
2131SALKR31
2845SSLAR31
2618TTLMR31
2878ATLER30
2086NALKR30
2594SGLIR30
2556SMLRR30
3001GVLKR29
53TGLTR29
2497EGLAR28
2612HTLER28
2766TSLKR28
3002GDLHR27
2644HNLVR27
1936HTLRV27
2465AGLAR26
3003GDLNR26
2503HGLLR26
3004SILKR26
2858TTLGR26
2499DGLIR25
2732SVLAR25
2590TTLQR25
2473AGLTR24
1988AGLVR24
2805DALSR24
3005DTLIR24
2777EGLVR24
2579QGLKR24
2820HDLRR23
2784HVLKR23
3006NTLTR23
2957SDLAR23
2965SSLKR23
2973TDLAR23
2803DALNR22
3007HTLIR22
2628NTLKR22
2838QDLKR22
2860TTLKR22
3008EVLRR21
3009GDLSR21
3010HVLRR21
2837NTLHR21
3011TDLTR21
2681TTLNR21
1833DGLKK20
2520DMLRR20
2919GTLHR20
2833MTLKR20
2980TSLAR20
3012ATLHR19
3013DSLVR19
3014GTLDR19
2830HTLVR19
3015NTLLR19
2843QTLKR19
2634ESLRR18
3016HDLQR18
2821HGLNR18
2823HMLRR18
57TVLKR18
3017ATLNR17
2596DGLGR17
2485NGLRR17
2549SGLNR17
2501SGLQR17
3018STLIR16
2617TALKR16
2519TGLIR16
3019TTLSR16
3020DILKR15
3021ETLNR15
2916GSLVR15
3022MDLKR15
2504NGLQR15
2949QTLAR15
2964SSLIR15
2538AALRR14
2818GSLAR14
2484HGLQR14
2512NGLTR14
3023QDLRR14
2588TGLHR14
3024TSLTR14
71AMLKR13
3025ATLGR13
3026GDLQR13
2470GGLQR13
2819GSLKR13
3027NTLVR13
3028SILRR13
2582SNLVR13
2846SSLSR13
2995VTLKR13
2880ATLLR12
2597DSLQR12
2659GGLNR12
2548HGLSR12
2525SGLLR12
2792SQLKR12
2505TGLMR12
2982TSLGR12
2479AGLQR11
2670DILRR11
3029DTLER11
3030DTLLR11
2917GTLER11
2689HTLLR11
2540NGLAR11
2663NGLHR11
3031SDLTR11
3032SMLKR11
1849TTLKV11
2879ATLIR10
2722GMLKR10
2600GSLRR10
3033GTLLR10
2510QGLRR10
2480AGLHR9
2498AGLSR9
2740DALHR9
2005DGLLR9
3034DTLGR9
3035GDLAR9
1930HALKR9
2782HGLHR9
46HTLKV9
3036HVLVR9
2664NMLKR9
2939NTLIR9
3037QDLAR9
2560TGLGR9
2875ASLRR8
2881ATLQR8
3038ETLAR8
2592GALTR8
2607GALVR8
2547HALVR8
2643HGLMR8
3039HILKR8
3040HMLVR8
2827HSLRR8
3041NTLSR8
2948QDLIR8
3042SDLVR8
2537SGLMR8
2677SMLTR8
2189TSLKV8
2651TTLTR8
2700AALTR7
3043ETLQR7
2521GALKR7
2641GALRR7
2528GGLIR7
117GNLVR7
3044HDLGR7
3045HDLTR7
2826HSLHR7
2934HTLHR7
2942NVLKR7
2678TALAR7
3046TDLKR7
1845TGLKV7
3047TSLNR7
2983TSLQR7
3048VDLKR7
2014DVLKK6
3049GILKR6
2921GVLRR6
2610HALAR6
2483HALRR6
2531HLLKR6
3050HNLKR6
2834NALHR6
3051QDLQR6
2616QGLVR6
2532SALAR6
3052SDLGR6
2514SGLHR6
2302STLKT6
3053TDLSR6
2565TGLER6
2742DGLER5
3054DILVR5
2566DTLKR5
1884EALKR5
2657EILKR5
3055GVLVG5
3056HSLTR5
3057HTLDR5
2937NALAR5
2572NGLIR5
2555NGLSR5
3058QQLQR5
2523SALRR5
2694SALVR5
2513SGLDR5
2581SGLGR5
2496SNLLR5
3059SVLLR5
3060TDLGR5
3061TDLQR5
2534VGLKR5
2493AGLIR4
2576AILKR4
3062ALLKR4
2683DSLAR4
2886DSLSR4
3063DTLRK4
3064ETLTR4
3065GELTR4
70GNLTR4
2660GSLTR4
2918GTLGR4
2748HALHR4
3066HDLNR4
2482HNLLR4
3067MTLRR4
2615NGLMR4
3068NTLER4
2956SALNR4
2958SDLQR4
3069SELKR4
2580SGLER4
2604SSLGR4
3070STLSM4
3071TDLMR4
68TNLRR4
2650TQLKR4
3072TSLLR4
3073TSLMR4
2984TSLVR4
3074TTLER4
3075TVLRR4
2738AALNR3
3076ADLTR3
2669AGLER3
2542ANLAR3
69ANLRR3
2877ATLDR3
2741DALMR3
3077DILTR3
3078DMLQR3
2632DNLAR3
2591DNLKR3
2809DTLSR3
3079DVLVR3
2583EALRR3
2813EGLSR3
3080ETLRK3
2481GNLER3
3081GTLMR3
2747GTLNR3
3082HAEG . . .3
3083HDLMR3
3084HMLQR3
2577HNLTR3
3085HSLKR3
2829HTLTR3
2935HVLAR3
2835NALSR3
2518NNLVR3
3086QSLNR3
3087SILAR3
2962SMLHR3
297STLRV3
2733SVLTR3
3088SVLVR3
2734TALRR3
2981TSLGG3
2994VTLGR3
2546AALAR2
2864AALLR2
2770AALVR2
3089ADLVR2
2569AGLNR2
2494ANLVR2
3090ASLAR2
3091ASLIR2
2800ASLVR2
2655DALGR2
2552DGLDR2
2743DGLQR2
1853DGLRK2
2506DNLVR2
3092DVLMR2
3093DVLQR2
3094EGLGR2
3095EGLHR2
2892EGLQR2
2658ESLKR2
2536GGLMR2
138GNLAR2
139GNLMR2
3096HDLSR2
2687HGLDR2
2585HGLGR2
2371HGLKM2
3097HILMR2
2557HNLHR2
2627HNLSR2
2611HSLIR2
3098HSLQR2
3099HVLHR2
3100IDLKR2
2755NGLLR2
3101NILVR2
2943PALKR2
3102PGLAR2
3103PTLMR2
2573QGLTR2
2574QMLKR2
2842QSLRR2
3104QTLSR2
2759SALIR2
2603SALSR2
3105SELRR2
2487SNLDR2
116SNLRR2
2544SNLSR2
2696SQLRR2
2153STLKR2
2968STLRK2
3106TDLHR2
3107TDLVR2
3108TGLKL2
2157TGLRV2
3109TMLNR2
2649TNLAR2
2595TNLKR2
2511TNLVR2
3110TSLIR2
2176TTLKA2
3111VDLRR2
3112VTLAR2
3113AALHR1
2717AALQR1
2866AAPER1
3114ADLNR1
3115ADLRV1
2868AGLAW1
3116AGLKK1
2527AGLLR1
3117AILRR1
2621ANLNR1
3118ASLKS1
2799ASLQR1
2876ASLTR1
3119ASMKR1
3120ATPVP1
2882AVLRR1
3121AVLTR1
3122CGLRR1
3123DAEA . . .1
3124DALER1
1831DALKR1
2682DALLR1
3125DALPR1
3126DARRR1
3127DDLNR1
3128DGAAE . . .1
1852DGLKV1
3129DGLWR1
3130DGPAR1
3131DGPKK1
3132DGRRR1
3133DGVRR1
3134DMLTR1
2535DNLLR1
2808DNLSR1
3135DSLNR1
3136DTLDR1
371DTLRV1
3137DVLRK1
3138DVLRS1
3139DVLSR1
3140DVQKR1
3141EALVR1
2812EGLIR1
3142EGLKM1
2704EGLLR1
2891EGLMR1
3143EGLQC1
3144EGLRS1
2894EGLRV1
3145EGRRR1
2895EGVRR1
3146EGWS . . .1
2705ENLAR1
2633ENLSR1
3147ESLAR1
3148ETGWG . . .1
3149ETLER1
3150ETLHR1
3151ETLVR1
3152ETRRR1
3153EVLKR1
2814GAEE . . .1
3154GALAR1
2778GALNR1
3155GDLYR1
3156GDPAP . . .1
2642GGLDR1
2745GGLGR1
2904GGLQE1
3157GGQTR1
3158GGVVR1
3159GHLQR1
3160GILRR1
3161GMLRR1
2522GNLDR1
3162GNLLL1
2517GNLLR1
2609GNLQR1
3163GNLVM1
2685GTLLV1
2192GTLRV1
3164GTLRW1
3165GTPHR1
3166GVLAR1
3167GVLNR1
3168GVLVR1
3169GWLSR1
3170HAEA . . .1
43HALKV1
3171HDLKR1
3172HELTR1
3173HGLRW1
3174HGMRR1
3175HILIR1
3176HLLNR1
2661HNLAR1
3177HPAP . . .1
2645HQLIR1
2825HSLAR1
2933HSLSR1
3178HTLNK1
3179HTLRA1
3180HTLRG1
3181HTLSR1
2709HVLER1
3182HWLLR1
2710IGLRR1
2754IGLTG1
2711INLTR1
3183ITLTR1
3184KGLPG1
3185MDVKG1
3186MTLIR1
2635NALRR1
2676NALVR1
2614NGLER1
2938NHLVQ1
2786NMLAR1
2543NNLAR1
2637NNLLR1
2787NSLAR1
2940NTLNR1
2941NTLQR1
3187P*MGS1
3188PALKP1
3189PGWAG1
3190PTLKR1
3191PTLRR1
PWS . . .1
2602QALKR1
2947QALTR1
3192QDLAT1
3193QDLVR1
2728QGLAR1
2729QNLHR1
2646QNLQR1
2575QNLRR1
2841QNLRW1
3194QPACV1
3195QTLHR1
2950QTLQR1
3196QTLTR1
3197RGLKR1
3198RPAA . . .1
2336RTLKV1
3199SALHR1
1887SALKV1
2955SALMC1
2730SALMR1
3200SDLKS1
3201SILKV1
3202SILNR1
2791SILVR1
2533SMLAR1
3203SMLLR1
3204SMLR1
2524SNLAR1
3205SNLHR1
2963SNLQR1
3206SPLHR1
3207SSLKW1
3208STPER1
3209STQVR1
3210SVLQR1
3211SVLSR1
2795TALNR1
2631TALVR1
2765TGLDR1
3212TGLKW1
3213TGLNV1
3214TGLQC1
3215TGLRQ1
2977TGPAR1
3216TGPNR1
3217TGQRR1
74TMLRR1
2561TNLMR1
2526TNLNR1
3218TRLVR1
3219TSLIS1
3220TTLDR1
3221TTLKK1
3222TTLRT1
1919TTLRV1
2861TVLRM1
2985VALAR1
3223VALRR1
3224VGLHR1
3225VGLNR1
2652VGLQR1
2619VGLRR1
2990VGLTM1
2605VNLKR1
3226YGLAR1
3227YGLVR1
3228YILRR1
TABLE 18
ZF5 selection on G:T change at nt 8 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO:    Sequence    Read #
50GGLVR178
2538AALRR174
2607GALVR170
2462TGLRR162
2464SGLRR158
2461GGLRR152
2463EGLRR148
2475AGLRR143
2641GALRR126
56HTLRR125
2027GGLKR117
2700AALTR111
2473AGLTR108
2521GALKR104
2465AGLAR102
54HGLVR101
1932HGLKR99
2610HALAR97
1986AGLKR96
59HGLRR96
1985AALKR94
2466SGLAR93
66ATLRR90
2539ETLRR90
2471HGLAR90
2495EGLKR83
2477SGLTR82
2488HGLTR79
1843DGLRR77
2592GALTR75
2467GGLAR74
2483HALRR74
2523SALRR71
2486SGLVR70
2734TALRR69
3154GALAR66
2500TGLKR66
55GGLTR63
2694SALVR61
2875ASLRR57
108DALRR57
2530DTLRR52
2819GSLKR50
2748HALHR46
2568HSLVR46
2546AALAR45
2131SALKR45
2583EALRR44
2770AALVR42
1884EALKR42
2827HSLRR42
2532SALAR42
2666SALTR42
2489SSLRR41
2654ATLAR40
1930HALKR40
2587NTLRR40
2956SALNR40
2479AGLQR39
1837DGLVR38
2502ETLKR38
49QALRR38
2678TALAR36
2857TTLAR36
2737TTLRR36
2547HALVR35
2578HTLAR35
2476GGLSR34
2738AALNR33
2470GGLQR33
2564SSLVR33
2656DTLNR31
2600GSLRR31
2586HTLMR30
2559SGLKR30
2550STLAR30
2498AGLSR29
1988AGLVR29
2509ASLKR29
2684GTLAR29
3229QALVR29
2594SGLIR29
2545STLSR29
2472TGLVR29
2468DGLAR28
2701DSLKR28
2762STLTR28
2653AALSR27
2674HALSR27
2603SALSR27
2850STLMR26
2828HTLNR25
1870DGLTR24
51HGLIR24
2628NTLKR24
2589TSLRR24
2997VTLRR24
2569AGLNR23
2721GALSR23
2630SSLTR22
2480AGLHR21
2778GALNR21
2753HTLGR21
2593QTLRR21
53TGLTR21
2717AALQR20
2562GTLTR20
2643HGLMR20
2617TALKR20
2799ASLQR19
2739ATLVR19
1831DALKR19
2634ESLRR19
2659GGLNR19
2622ATLTR18
2528GGLIR18
2660GSLTR18
2554GTLKR18
2707GTLQR18
2636NGLKR18
2667SDLKR18
2698STLVR18
2584GTLRR17
2525SGLLR17
2493AGLIR16
2800ASLVR16
2818GSLAR16
2934HTLHR16
2549SGLNR16
2474SGLSR16
1871DALVR15
2916GSLVR15
2782HGLHR15
2878ATLER14
3098HSLQR14
2501SGLQR14
2519TGLIR14
2516TGLSR14
2858TTLGR14
2767TTLVR14
2995VTLKR14
2772ATLSR13
2702DSLRR13
2759SALIR13
2631TALVR13
2736TTLLR13
2864AALLR12
3230HALTR12
2616QGLVR12
2469TGLAR12
2880ATLLR11
2563DGLNR11
2626GTLVR11
2602QALKR11
3231SALLR11
3232SSLHR11
2967STLNR11
2492TGLQR11
2590TTLQR11
2876ASLTR10
109DGLKR10
2756NSLRR10
2692QSLKR10
2537SGLMR10
2849STLHR10
2638STLRR10
3113AALHR9
2879ATLIR9
3017ATLNR9
2672DTLAR9
2566DTLKR9
2484HGLQR9
2933HSLSR9
2943PALKR9
2964SSLIR9
2764TALTR9
2588TGLHR9
2881ATLQR8
3007HTLIR8
2829HTLTR8
2941NTLQR8
2579QGLKR8
2699SVLKR8
3047TSLNR8
3233AALIR7
2865AALMR7
2999DALTR7
2719DTLQR7
3234GSLHR7
2781GSLQR7
2548HGLSR7
2478NGLVR7
2965SSLKR7
2848STLER7
2795TALNR7
48ATLKR6
2802AVLKR6
3038ETLAR6
2503HGLLR6
2830HTLVR6
2784HVLKR6
3235NALQR6
2485NGLRR6
3236NSLVR6
2580SGLER6
2514SGLHR6
2860TTLKR6
3237AALER5
3238AALGR5
3025ATLGR5
2598EGLNR5
2904GGLQE5
70GNLTR5
2086NALKR5
2788NTLAR5
2843QTLKR5
2950QTLQR5
2505TGLMR5
2515TGLNR5
2980TSLAR5
2743DGLQR4
2703DTLMR4
2777EGLVR4
2745GGLGR4
2536GGLMR4
3239GSLIR4
3240GSLNR4
2673HALLR4
2783HTLKR4
46HTLKV4
2938NHLVQ4
2510QGLRR4
3241QVLKR4
3199SALHR4
2845SSLAR4
2668STLGR4
3018STLIR4
2966STLLR4
3242TALQR4
3073TSLMR4
3243AALDR3
2527AGLLR3
2542ANLAR3
69ANLRR3
3244ASLSR3
3012ATLHR3
2570DALAR3
2804DALQR3
2499DGLIR3
2553DGLSR3
2520DMLRR3
2497EGLAR3
2490EGLTR3
2658ESLKR3
2491GGLER3
2625GGLLR3
138GNLAR3
117GNLVR3
3245GSLSR3
3246HALQR3
2577HNLTR3
3085HSLKR3
2613HTLQR3
2832ITLKR3
2833MTLKR3
2787NSLAR3
3247NSLSR3
2940NTLNR3
2947QALTR3
2573QGLTR3
3195QTLHR3
3248QTLVR3
2730SALMR3
2496SNLLR3
2604SSLGR3
2847STLDR3
2970SVLRR3
2507TGLLR3
2561TNLMR3
68TNLRR3
3249TSLER3
2618TTLMR3
2534VGLKR3
2718AGLDR2
2669AGLER2
2797AGLGR2
3250ASLMR2
3251ASLNR2
2552DGLDR2
2529DGLHR2
2591DNLKR2
2535DNLLR2
2623DNLRR2
2506DNLVR2
2683DSLAR2
3030DTLLR2
2809DTLSR2
2810DTLTR2
2720DVLKR2
2811DVLRR2
2890EALTR2
3043ETLQR2
3252GALDR2
2779GDLKR2
2780GDLTR2
3253GGPRR2
2917GTLER2
3254HALNR2
2820HDLRR2
2687HGLDR2
2585HGLGR2
2821HGLNR2
2482HNLLR2
2826HSLHR2
3255MPLTR2
2834NALHR2
2540NGLAR2
2572NGLIR2
2755NGLLR2
2504NGLQR2
2512NGLTR2
2837NTLHR2
2939NTLIR2
2942NVLKR2
2948QDLIR2
2838QDLKR2
2842QSLRR2
3004SILKR2
2556SMLRR2
2793SSLQR2
2697STLQR2
2971TALER2
2851TALGR2
2157TGLRV2
2978TMLKR2
2511TNLVR2
2715TSLHR2
3019TTLSR2
2651TTLTR2
3256AALTG1
2866AAPER1
58ADLKR1
2868AGLAW1
3257AGVIR1
3258AGVTR1
71AMLKR1
2621ANLNR1
3090ASLAR1
3259ASLRG1
2801ATLMR1
3260ATLRM1
3261ATPRR1
3262AVLAR1
2882AVLRR1
3263AVLVR1
2803DALNR1
2596DGLGR1
1833DGLKK1
1853DGLRK1
3129DGLWR1
3264DGPAA . . .1
2640DMLKR1
2597DSLQR1
2776DTLVR1
2014DVLKK1
3265EALHR1
3266EALSR1
3095EGLHR1
2891EGLMR1
3267EGLRG1
2894EGLRV1
2705ENLAR1
2633ENLSR1
2814GAEE . . .1
3268GALER1
3269GALGK1
3270GALIR1
3271GALKV1
3272GALMR1
2815GALQR1
3273GAPRR1
3003GDLNR1
2817GDLVR1
2642GGLDR1
2571GGLHR1
3274GGPAR1
3275GGPVR1
3276GGQVR1
3277GGVAR1
3278GGWP . . .1
2913GMLAR1
2481GNLER1
139GNLMR1
2609GNLQR1
3279GSLRV1
2918GTLGR1
2919GTLHR1
3081GTLMR1
2747GTLNR1
2723GTLSR1
3280HAAQ . . .1
3281HALAS1
3282HALER1
3283HALVH1
3284HAMRR1
3285HAQHR1
3286HGLTL1
3287HGLVM1
2531HLLKR1
2661HNLAR1
2557HNLHR1
3050HNLKR1
2627HNLSR1
2644HNLVR1
3177HPAP . . .1
2645HQLIR1
3288HSLGR1
1936HTLRV1
2935HVLAR1
2710IGLRR1
2754IGLTG1
2711INLTR1
3184KGLPG1
3289MPLQR1
2937NALAR1
2663NGLHR1
2615NGLMR1
2555NGLSR1
2664NMLKR1
2543NNLAR1
2637NNLLR1
3006NTLTR1
PWS . . .1
3290QAPWP . . .1
3023QDLRR1
2728QGLAR1
2574QMLKR1
2729QNLHR1
2646QNLQR1
2841QNLRW1
3104QTLSR1
3291RGLQR1
2629SALER1
2693SALGR1
2955SALMC1
3292SALQR1
3293SAQR . . .1
3294SARVR1
2957SDLAR1
3295SDLNR1
2958SDLQR1
2959SDLRR1
3105SELRR1
3296SGADA . . .1
3297SGLR . . .1
3298SGLVC1
3299SGPDP . . .1
2533SMLAR1
2487SNLDR1
2963SNLQR1
2544SNLSR1
2696SQLRR1
3300SSLPR1
2302STLKT1
2968STLRK1
3301STPSR1
2733SVLTR1
3302TALLR1
3303TAPTR1
2973TDLAR1
2974TDLRR1
3304TGLIK1
2977TGPAR1
3217TGQRR1
2595TNLKR1
2526TNLNR1
2766TSLKR1
2983TSLQR1
2859TTLIR1
1849TTLKV1
2681TTLNR1
2861TVLRM1
3305TWLRR1
2985VALAR1
3306VALQR1
2652VGLQR1
2990VGLTM1
2605VNLKR1
3307VSLKR1
3308VSLRR1
3112VTLAR1
2994VTLGR1
TABLE 19
ZF4 selection on G:T change at nt 10 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO:    Sequence    Read #
60AHLRK4967
158GHLKK1446
3309THLRA1429
1386EHLRR1293
162GHLRK1082
3310HHLTK876
63AKLRI867
61AKLRV641
3311AKLRL625
3312AKLKI599
3313SHLRK566
159AHLKK560
163THLKK496
160TKLRL486
92SKLRL475
2137SKLKV466
161TKLKL466
3314QHLRK457
3315AKLKL443
3316GHLVK419
3317GKLKI302
3318THLRK268
3319AKLKV258
106GKLRI246
3320GKLRL224
3321GHLRL213
3322TKLKI199
3323RSLGL178
90AHLRV177
3324AHLRL153
3325TKLRV152
3326SKLKI146
3327SHLVG132
3328GKLKL116
64TKLKV108
3329THLRT107
3330GHLRR102
*R . . .92
3331SHLRL90
65SKLRV80
3332GALV . . .79
3333GHLKM75
3334SKLRI74
3335GILS . . .71
3336SK*VL63
3337SKLVL62
TR . . .61
3338IRLGV59
3339MALGL58
3340EHLRK54
3341GHLRM54
1407EHLKR50
3342ITLM . . .48
3343AHLVK40
3344THLRL40
3345GKLKV38
3346GHLKL34
3347AHLRR32
3348GHLIK30
3349EHLVR28
3350GKLRV27
3351TALSM26
3352EHLQR25
3353EKLKV25
3354QHLVK25
3355TKLNL25
3356GHLRA23
3357GRLPK21
NGR . . .21
3358SKLKL21
3359THLTK21
3360RLLSG20
3361TKLRI19
3362AHLRI18
409GHLKV16
3363GHLRV16
3364GLLPG16
3365AKLRT14
3366RHLRV14
3367AALRK11
3368AHLHK11
3369GHLTK11
3370QHLRR11
3371RSHS . . .11
3372SHLNK11
3373AHLQK10
3374GHLMK10
3375SKLRT10
287AHLKV9
3376AHLRA9
370AHLRT9
3377EHLRL9
3378GHLKI9
3379SHLKL9
3380EHLKK8
3381GHLRT8
3382GKLKM8
3383HHLKK8
3384SKLTI8
3385THEKP . . .8
*G . . .7
3386AKLIL7
3387AKLTI7
3388HALAA7
3389TKLQV7
3390AKLRM6
3391EHLRI6
3392GHLAK6
3393GHLKR6
3394GKLTL6
3395SHLKK6
3396SHLRR6
3397AILKA5
89AKLRK5
3398AKLTL5
3399ASLTG5
201EHLRV5
3400EVLTM5
3401GHLKT5
3402NGRS . . .5
3403THLRR5
3404AHLKL4
3405GALVH4
3406GKLVL4
3407NGRSPV . . .4
3408QALSI4
3409SHLRT4
TRS . . .4
3410AALRL3
3411AHLMK3
439AHLRE3
3412AHLRQ3
3413AKLNL3
3414AKLRA3
3415APLRK3
186EKLRI3
3416GALMG3
3417GALTG3
3418GHLRG3
3419GHLTL3
3420GKLRK3
3421GKLTV3
187GKLVT3
3422HHLRK3
3423MGLVG3
1848SHLKV3
3424SHLRI3
3425SKLIL3
3426SKLMV3
3427SLLAG3
3428THLKI3
3429THLQK3
3430VPLAG3
3431AGLLG2
3432AHLKM2
3433AHLRN2
3434AHLTK2
3435AKLIV2
3436AKLKA2
88AKLKK2
3437AKLTV2
3438AKLVL2
3439AKSRI2
3440AMLMQ2
3441AQLRI2
3442DALR . . .2
419EHLRA2
313EHLRT2
3443EKLKL2
3444GGLQK2
3445GGLTM2
GH*R . . .2
3446GHLLR2
3447GHLRI2
3448GHLVG2
3449GHLVR2
3450GKLNL2
2912GKLRR2
3451GKLVP2
3452GLLGL2
3453GNLGM2
3454GVLQK2
3455HGLLP2
2043HHLRV2
3456HLLEN2
3457IGLQR2
3458KTLGV2
3459LSLLK2
3460MRLGE2
3461NSLTR2
3462NVLNK2
3463PHLRK2
3464PLLMP2
3465PRLRH2
3466QKLHL2
3467QKLNL2
3468SHLRV2
3469SKLHL2
3470SKLKR2
3471SKLNL2
3472SPLAE2
3473SVLML2
TH*R . . .2
2448THLKL2
3474THLRV2
3475TKLIL2
3476TKLMV2
3477TPLNI2
3478TRLQK2
3024TSLTR2
3479VGLGQ2
3480VHLRK2
3481AALES1
3482AALRI1
3483ADLRK1
3484AELLG1
3485AELRI1
3486AGLAA1
1986AGLKR1
3487AGLMD1
3488AHLGL1
3489AHLK . . .1
3490AHLKA1
3491AHLKI1
438AHLKT1
3492AHLNK1
3493AHLR . . .1
3494AHLSK1
3495AHLSP1
214AHLTV1
3496AHLWK1
3497AKFKI1
3498AKIKH1
3499AKIRI1
3500AKIRL1
3501AKIRV1
3502AKLHT1
3503AKLKE1
3504AKLKG1
3505AKLKM1
3506AKLMN1
3507AKLNI1
3508AKLQL1
3509AKLRG1
3510AKLRR1
3511AKLSM1
3512AKSRV1
3513AKVKL1
3514AKVRI1
3515ALLMA1
3516ALLRR1
3517AMLIM1
3518AMLKI1
3519AMLRG1
3520AMLRL1
3521ANLSN1
3522ANVAQ1
3523APLKK1
3524AQFRK1
3525AQLVD1
3526ARLAG1
3527ARLGT1
3528ARLRA1
3529ARLRK1
3530ASLRM1
3531ATLKL1
3532ATLRV1
3533C*LKI1
3534DELMR1
3535DELRV1
3536DGLES1
2005DGLLR1
3537DGLMD1
3538DGLVG1
3539DHLKK1
3540DHLRK1
3541DHLRR1
3542DKLRK1
3543DLLGV1
3544DLLLN1
3545DNLRE1
3546DPLAR1
3547DSLGE1
3548EALMA1
3549EDLVK1
3550EELGL1
3551EELMM1
3267EGLRG1
3552EGLVE1
3553EHLG . . .1
3554EHLHK1
3555EHLKL1
3556EHLKM1
2016EHLRQ1
3557EHLRS1
3558EHLSE1
3559EHLSR1
3560EHLTK1
3561EHLVK1
3562EQLGP1
3563ERLAA1
3564ERLGR1
1893ERLRR1
3565ESLMA1
3566ETLSH1
3567EVLGI1
3568FFLRV1
3569GALGR1
3570GALIM1
3571GDLSG1
3572GGLDL1
3573GGLDQ1
1957GGLKV1
3574GGLNM1
3575GGLPE1
2295GGLVV1
3576GHFKT1
3577GHFQN1
3578GHLK . . .1
3579GHLMN1
3580GHLMV1
3159GHLQR1
3581GHLR . . .1
3582GILAG1
3583GKLHE1
3584GKLKA1
3585GKLKF1
3586GKLKT1
3587GKLR . . .1
3588GKLRA1
3589GKLRM1
3590GKLVA1
3591GKLVV1
3592GLLGE1
3593GLLLD1
3594GLLMG1
3595GLLRG1
3596GMLGG1
3597GPLGV1
3598GPLRV1
3599GRLKI1
3600GRLKK1
3601GSLST1
3602GSLVK1
2554GTLKR1
3603GVLAG1
3604GVLLV1
3605GVLS . . .1
3606GYLRK1
3607HALRT1
3608HALVN1
3609HGLTG1
3610HHLAK1
3611HHLRR1
3612HIRS . . .1
3613HTHEK1
3614IELVQ1
3615IGLGL1
3616IKLRL1
3617IMLRE1
3618IMLVE1
3619IPLGD1
3620IQLRK1
3621IRLG . . .1
3622IRLGG1
3623IRLVV1
3624IVLAA1
3625KHLRA1
3626KHLRL1
3627KILPE1
3628KKLLE1
3629KMLPP1
3630KNLIK1
3631KSLMP1
3632LALGG1
3633LGLGA1
3634LGLVG1
3635LHLTK1
LQ . . .1
3636LRLIG1
LTE . . .1
3637LTLQR1
3638LVLRR1
3639MA*SHMK1
3640MALRL1
3641MALTR1
3642MGLDP1
3643MGLGE1
3644MGLQN1
3645MHLRM1
3646MKLEQ1
3647MLLRN1
3648MLLSH1
3649MLLVN1
3650MPLRA1
3651MQLGG1
3652MRLAR1
3653MRLMG1
3654MRLVG1
3655MSLER1
3656MTLPL1
3657MTLSD1
3658MVLAG1
NG . . .1
2615NGLMR1
2504NGLQR1
3659NKLRL1
3660NLAH1
3661NLLPT1
3662NRLES1
3663NRLGG1
3664NTLPK1
3665PGLHG1
3666PGLRA1
3667PHFTK1
3668PILLQ1
3669PKLGL1
3670PLLKS1
3671PQLTG1
3672PREAM1
3673PTLQR1
3674QELGR1
3675QGLPV1
3676QHLKK1
3677QHLQR1
3678QHLR . . .1
3679QHLRI1
3680QHLRL1
3681QHLTK1
3682QILLH1
3683QKLRI1
3684QNLHK1
3685QPLIK1
3686QQVTA . . .1
3687QTLAE1
3688QVTLA1
3689RALSA1
RGL . . .1
3690RGLGA1
3691RGLTA1
2953RGLVR1
3692RGLVV1
3693RHLRA1
3694RHLRE1
3695RHLRM1
3696RHLRR1
3697RILPR1
3698RKLIV1
3699RKLKL1
3700RLLGA1
3701RLLMP1
3702RLLRR1
3703RMLVP1
3704RRLEG1
3705RRLVN1
3706RTLML1
3707RTLTQ1
3708SDLHV1
3709SDLRK1
2581SGLGR1
3710SGLLV1
2486SGLVR1
3711SHLKM1
3712SHLRA1
3713SHLRE1
3714SHLRG1
3715SHLTK1
3716SHLTM1
3717SHLV . . .1
3718SHLVK1
3719SKIRL1
3720SKLEG1
3721SKLGA1
3722SKLKG1
2191SKLRM1
3723SKLRN1
3724SKLRR1
3725SLLEE1
3726SLLGT1
3727SLLNG1
2138SQLKV1
3728SQLLE1
3729SRLMA1
3730STLLM1
3731STLVG1
3732TALRG1
TG . . .1
2469TGLAR1
3733TGLGL1
3734TGLLK1
2157TGLRV1
3735TGLVD1
3385THEKP1
3736THFRT1
3737THIR . . .1
3738THLAR1
2449THLKQ1
3739THLLK1
3740THLMK1
331THLRP1
3741THLVK1
3742THMK1
3743THVKK1
3744TKLKM1
3745TKLKR1
3746TKLNM1
3747TKLRK1
3748TKLRP1
3749TKLS . . .1
3750TKLTI1
3751TMLGG1
3752TMLKL1
3753TMLPG1
3754TPLKR1
3755TPLRA1
3756TQLKK1
3757TQLKL1
1941TQLKV1
3758TR*RL1
3759TRLKL1
110TRLRE1
TS . . .1
3760TTLGI1
3761TYLKK1
3762VELDP1
3763VELVN1
3764VKLQQ1
3765VKLRL1
3766VKLRN1
3767VKLRV1
3768VLLKS1
3769VLLQM1
3770VMLKD1
3771VMLMG1
3772VPLAL1
3773VPLER1
3774VPLNT1
3775VPLSS1
3776VPLVP1
VQ*G . . .1
3777VRLEE1
3778VRLQA1
3779VVTA . . .1
3780WHLKK1
YG . . .1
TABLE 20
ZF4 selection on G:C change at nt 10 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO:    Sequence    Read #
61AKLRV5924
3325TKLRV4888
64TKLKV3542
2137SKLKV3056
3319AKLKV2451
65SKLRV1583
3375SKLRT474
3350GKLRV320
63AKLRI254
3345GKLKV237
3312AKLKI164
1986AGLKR132
3322TKLKI129
1957GGLKV78
3326SKLKI76
3334SKLRI76
3527ARLGT64
3781VALGS48
3454GVLQK46
TRS . . .39
60AHLRK30
3782AKLVV26
3783TKLRA24
3784LGLRG18
3652MRLAR15
3785TKLKA14
3722SKLKG13
3361TKLRI13
3365AKLRT12
NGR . . .12
3786PNLAV12
3787GGLEV10
158GHLKK10
3788PREAI10
3789TKLKG10
3790TKLIV9
3791WILRA9
3792AK*RG8
3414AKLRA8
3311AKLRL8
3793EK*KV8
106GKLRI8
3310HHLTK8
3385THEKP . . .8
3794TK*RG8
3795TKLRT8
3315AKLKL7
3796AKLRE7
3437AKLTV7
3353EKLKV7
2187SKLKE7
3797TKLRG7
3509AKLRG6
1386EHLRR6
3798EKLRV6
3799RALW . . .6
2438SKLKA6
3504AKLKG5
3390AKLRM5
3400EVLTM5
3314QHLRK5
3800SKLVV5
1851STLKV5
3801TKLKE5
3802TKLNV5
3316GHLVK4
3320GKLRL4
3803KDALQYESECG . . .4
3804LSLVD4
3805QKLKV4
3806RELKE . . .4
3807RILGS4
163THLKK4
3309THLRA4
3808TKIRV4
160TKLRL4
3809TKLRM4
3810TKLVV4
3811TKVRV4
3812TRSHSR . . .4
159AHLKK3
3436AKLKA3
3813AKLRD3
1909ATLKV3
3532ATLRV3
3536DGLES3
3814GGLKG3
3418GHLRG3
162GHLRK3
3815GKLIV3
3816GKLKG3
3317GKLKI3
3451GKLVP3
3817KKLHW . . .3
3408QALSI3
3818RTLS . . .3
3819SKLRA3
3820SKVRV3
3427SLLAG3
3821TK*SV3
3822TKLAV3
3823TKLRE3
3824TKSRV3
3825TKVKV3
3826VMLMM3
3430VPLAG3
3431AGLLG2
3827AILQV2
3501AKIRV2
3435AKLIV2
3503AKLKE2
3828AKLMV2
3829AKLSV2
3830AKVKV2
3521ANLSN2
2315DKLRV2
3831ETLMH2
3416GALMG2
3444GGLQK2
3445GGLTM2
3333GHLKM2
3832GKSKV2
3592GLLGE2
3452GLLGL2
3453GNLGM2
2554GTLKR2
3456HLLEN2
3457IGLQR2
3833IKLRV2
3834KALHT2
3835KGLMM2
3836MELAE2
3423MGLVG2
3460MRLGE2
3656MTLPL2
2615NGLMR2
3402NGRS . . .2
3837NKLKV2
3838PRLLA2
3465PRLRH2
3839PRLSR2
3840QGLEA2
2434SELKV2
3470SKLKR2
3841SKLRE2
3842SKLRG2
TH*R . . .2
3843TKIKV2
161TKLKL2
3476TKLMV2
3389TKLQV2
3844TKLRD2
3845TKLSV2
3477TPLNI2
3478TRLQK2
3024TSLTR2
1919TTLRV2
V2
3481AALES1
3846AELKA1
3847AELKV1
3484AELLG1
3486AGLAA1
3848AGLKH1
2475AGLRR1
2498AGLSR1
2473AGLTR1
1988AGLVR1
3490AHLKA1
287AHLKV1
90AHLRV1
3495AHLSP1
3849AKIRE1
3850AKLAV1
3851AKLGV1
3852AKLMI1
3853AKLNV1
3854AKLRF1
3855AKLRN1
3387AKLTI1
3856AKLWV1
3857AKRRV1
3858AKSKV1
3859AKVRG1
3860ALLKV1
3517AMLIM1
3861AMLKV1
3440AMLMQ1
3519AMLRG1
3862AQLKV1
3863AQLRV1
3525AQLVD1
1945ARLKV1
3864ARLRI1
1993ARLRM1
1947ARLRV1
3865ATLQV1
3866AVLKV1
3867AYPRE1
3868CGLHW . . .1
3869CKLRV1
1995DALDR1
3535DELRV1
1852DGLKV1
2005DGLLR1
3537DGLMD1
3870DGLTG1
3538DGLVG1
3871DHLKR1
206DHLNV1
3543DLLGV1
3544DLLLN1
3545DNLRE1
3546DPLAR1
3872DRLTI1
3873DVLKG1
3874DVLRG1
3875EALVH1
3551EELMM1
3267EGLRG1
3552EGLVE1
201EHLRV1
3349EHLVR1
3562EQLGP1
3876EQLMT1
3564ERLGR1
3565ESLMA1
3566ETLSH1
3877EVLAA1
3567EVLGI1
G . . .1
3571GDLSG1
3573GGLDQ1
3878GGLKD1
3879GGLKI1
2659GGLNR1
3575GGLPE1
GH*R . . .1
3393GHLKR1
3446GHLLR1
3580GHLMV1
3330GHLRR1
3363GHLRV1
3419GHLTL1
3448GHLVG1
3582GILAG1
3880GILRM1
3881GK*RG1
3584GKLKA1
3382GKLKM1
3882GKLML1
3883GKLQV1
3588GKLRA1
3884GKLRQ1
3885GKLRT1
3394GKLTL1
3593GLLLD1
3594GLLMG1
3364GLLPG1
3595GLLRG1
3886GPLGQ1
3597GPLGV1
3887GPLMG1
3888GQLKA1
3889GRLAV1
3890GRLNA1
3601GSLST1
3602GSLVK1
3603GVLAG1
3604GVLLV1
3607HALRT1
3455HGLLP1
3612HIRS . . .1
3891HPLTV1
3892HRLTR1
3614IELVQ1
3615IGLGL1
3893IKLKV1
3894IMLKS1
3618IMLVE1
3895IQSGE1
3896IQVTLA1
3897IRLAL1
3621IRLG . . .1
3338IRLGV1
3342ITLM . . .1
3624IVLAA1
3898KALRG1
3628KKLLE1
3899KKLRE1
3900KKLVR1
3629KMLPP1
3630KNLIK1
3631KSLMP1
3458KTLGV1
3632LALGG1
3633LGLGA1
3634LGLVG1
LQ . . .1
3636LRLIG1
3901LSLDG1
3637LTLQR1
3638LVLRR1
MA . . .1
3339MALGL1
3641MALTR1
3902MELDR1
3642MGLDP1
3643MGLGE1
3644MGLQN1
3646MKLEQ1
3903MKLQA1
3904MKLRV1
3647MLLRN1
3649MLLVN1
3905MPLLA1
3650MPLRA1
3906MRLARHIRSHTGERP . . .1
3653MRLMG1
3655MSLER1
3907MSLVN1
3657MTLSD1
3658MVLAG1
3908MVLQE1
3909MVLVG1
N . . .1
3910NDALEYESECGP . . .1
3911NDALQYESVCVP . . .1
2504NGLQR1
3912NGLVV1
3913NK*NV1
3914NKLRV1
3660NLAH1
3661NLLPT1
3663NRLGG1
3664NTLPK1
NV . . .1
3915NVLGG1
3462NVLNK1
3916PGLAA1
3665PGLHG1
3669PKLGL1
3917PKLRA1
3670PLLKS1
3464PLLMP1
3918PNLAG1
3919PNYW . . .1
3671PQLTG1
3672PREAM1
3673PTLQR1
3920PVLDH1
Q1
3921QALTN1
3674QELGR1
3675QGLPV1
3682QILLH1
3467QKLNL1
3684QNLHK1
3685QPLIK1
3687QTLAE1
3922QVLRK1
3689RALSA1
3923RELVR1
RGL . . .1
3924RGLDM1
3925RGLDR1
3691RGLTA1
3926RGLVA1
2953RGLVR1
3692RGLVV1
3694RHLRE1
3697RILPR1
3698RKLIV1
3927RKLKA1
3928RKLKV1
3929RKLRE1
3930RKLRV1
3931RKVRV1
3700RLLGA1
3701RLLMP1
3932RMLQE1
3703RMLVP1
3933RPLEV1
3705RRLVN1
3706RTLML1
3707RTLTQ1
S*G . . .1
3708SDLHV1
2581SGLGR1
3710SGLLV1
2486SGLVR1
1848SHLKV1
3331SHLRL1
3934SKFKV1
3935SKFRV1
3936SKIRT1
3469SKLHL1
3937SKLKD1
3358SKLKL1
3938SKLKM1
3939SKLQI1
92SKLRL1
3940SKLSV1
3941SKLTV1
3337SKLVL1
3942SKSRT1
3943SKVKV1
3944SKVRT1
3725SLLEE1
3726SLLGT1
3945SNLKG1
3946SNLTH1
3728SQLLE1
1857SRLKV1
3730STLLM1
3947TALIS1
3732TALRG1
3948TELIG1
3949TELKV1
TG*S . . .1
2469TGLAR1
3733TGLGL1
2157TGLRV1
3385THEKP1
3737THIR . . .1
3738THLAR1
3429THLQK1
3318THLRK1
3344THLRL1
3329THLRT1
3950TKLHV1
3951TKLKD1
3744TKLKM1
3745TKLKR1
3952TKLKT1
3953TKLMA1
3746TKLNM1
3954TKLQI1
3955TKLR . . .1
3956TKLTV1
3957TKLWV1
3958TKSRD1
3751TMLGG1
3959TMLKV1
3753TMLPG1
3960TMLRV1
3754TPLKR1
1864TRLKV1
110TRLRE1
2168TRLRG1
1883TRLRV1
3961TRSHS . . .1
3962TTIRV1
3760TTLGI1
1849TTLKV1
3963TTLSA1
3964TTLVP1
3965TVLAP1
3966TVLPM1
3967VALTK1
3763VELVN1
3479VGLGQ1
3968VGLLR1
3969VKLLV1
3764VKLQQ1
3766VKLRN1
3767VKLRV1
3768VLLKS1
3970VLLMA1
3971VLLPS1
3770VMLKD1
3771VMLMG1
3972VNLLE1
3772VPLAL1
3773VPLER1
3774VPLNT1
3775VPLSS1
3776VPLVP1
VQ*G . . .1
3973VQLPV1
3777VRLEE1
3778VRLQA1
2994VTLGR1
3974YTHMK1
TABLE 21
ZF4 selection on G:A change at nt 10 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO:    Sequence    Read #
61AKLRV408
3350GKLRV294
TRS180
64TKLKV170
3320GKLRL166
3402NGRS155
3325TKLRV124
3390AKLRM109
160TKLRL109
3345GKLKV107
3312AKLKI92
3319AKLKV88
186EKLRI84
3655MSLER68
3975NGRSPVC67
3416GALMG66
3976AELIR63
2581SGLGR63
3915NVLGG61
3977RGLT61
3978TLLMG58
3451GKLVP57
3430VPLAG57
3682QILLH55
3979TLPL55
3980*MLTS54
3981EMLTS53
2137SKLKV53
3615IGLGL52
3322TKLKI52
3495AHLSP51
3828AKLMV51
3982DALRG51
3633LGLGA51
3805QKLKV51
3408QALSI50
3983PLLET49
3984PSLM49
3452GLLGL48
3985TLLVG48
3766VKLRN48
62GGLGL47
3419GHLTL47
3986GPLHI46
3649MLLVN46
3987VELNS46
3988AKLIT45
3394GKLTL45
3946SNLTH45
3989AT*RR44
3544DLLLN44
3596GMLGG44
3923RELVR44
3990SPLLS44
3991DKLRR43
3570GALIM43
3992GLLG43
3993GLMM42
3994IHLAD42
3995TLTQ42
3996TRSHSS42
3997ALMQ41
1947ARLRV41
3321GHLRL41
3456HLLEN41
3998HTLNM41
3999PMLVD41
3469SKLHL41
4000GK*KL40
3440AMLMQ39
3546DPLAR39
3328GKLKL39
3914NKLRV39
3732TALRG39
3827AILQV38
3435AKLIV38
3311AKLRL38
3612HIRS38
3382GKLKM37
3592GLLGE37
3453GNLGM37
3582GILAG36
4001GPLAL36
3908MVLQE36
3669PKLGL36
4002ARLGL35
4003EELLK35
3647MLLRN35
3685QPLIK35
288AHLAV34
3400EVLTM34
3460MRLGE34
3548EALMA33
4004PLLGV33
3671PQLTG33
3877EVLAA32
4005HPLQQ32
3916PGLAA32
3467QKLNL32
4006SKLNN32
4007TRLRN32
3438AKLVL31
4008DLLV31
462DSLLA31
4009GELRT31
4010RLLGV31
2700AALTR30
3444GGLQK30
2615NGLMR30
4011NRLQ30
4012PALGN30
4013PLLGM30
4014PPLMQ30
4015TQLEE30
4016VGLEG30
3543DLLGV29
3572GGLDL29
3418GHLRG29
4017KTLRE29
4018PRLR29
4019PSLGV29
4020RR*PS29
3735TGLVD29
3429THLQK29
4021DGLMDHIRSHTGERPF28
3459LSLLK28
4022MVLVP28
4023SELTG28
4024SGLKH28
3754TPLKR28
4025VGLG28
60AHLRK27
3506AKLMN27
63AKLRI27
4026DRLGP27
4027GLLGR27
3617IMLRE27
4028KQLQP27
MA*S27
NGR27
3694RHLRE27
4029RPLLR27
4030RSLRL27
65SKLRV27
3427SLLAG27
3760TTLGI27
3484AELLG26
2473AGLTR26
3538DGLVG26
4031GALG26
4032GDLSP26
3573GGLDQ26
3580GHLMV26
3317GKLKI26
4033GKLSL26
3603GVLAG26
4034LRLNL26
4035MTLGN26
4036PMLAA26
3375SKLRT26
3746TKLNM26
4037ALIG25
4038AQLAN25
4039DGLAM25
3575GGLPE25
4040GLPV25
3631KSLMP25
2601NGLNR25
4041SHMK25
3477TPLNI25
3965TVLAP25
4042VLLME25
3431AGLLG24
4043GALPR24
4044GKLIL24
3882GKLML24
3604GVLLV24
4045KQLTD24
4046LKLIG24
3636LRLIG24
4047LRLMS24
3663NRLGG24
4048PNYWP24
4049RHLVP24
4050SRLGA24
3855AKLRN23
4051DRLAS23
3547DSLGE23
3563ERLAA23
106GKLRI23
4052GSLS23
664HRLGG23
4053MDLLL23
4054MTLGA23
4055PPLER23
4056PVLPG23
3674QELGR23
3818RTLS23
4057SLLQG23
2157TGLRV23
3476TKLMV23
3773VPLER23
4058APLGM22
1386EHLRR22
2607GALVR22
2659GGLNR22
3446GHLLR22
4059GILAK22
4060GMLPD22
3597GPLGV22
4061GSLPM22
3602GSLVK22
3166GVLAR22
3634LGLVG22
3637LTLQR22
4062NGRSPVET22
3666PGLRA22
4063PMLRV22
4064TLML22
90AHLRV21
3515ALLMA21
4065ASLGQ21
3870DGLTG21
3267EGLRG21
223EHLAV21
4066ELILE21
4067GH*RS21
4068GHLAM21
3589GKLRM21
4069GLLP21
4070GTLAI21
4071IRLKK21
4072KELRR21
3627KILPE21
4073LHLPI21
3423MGLVG21
3905MPLLA21
4074NELRG21
3462NVLNK21
4075PHLNG21
3464PLLMP21
4076RLLGS21
4077RTLIS21
4078SC*AS21
3708SDLHV21
92SKLRL21
4079VKLMN21
4080VTLIG21
4081AGLQE20
4082ALHT20
4083DPLVD20
E20
4084EALDA20
4085GALAT20
4052GSLS20
4086GTLLM20
4087IKLRP20
LQ20
NGP20
3684QNLHK20
4088RRLLD20
3726SLLGT20
3948TELIG20
4089TGLMG20
4090TKLLL20
4091TTLGA20
4092VE*DP20
3968VGLLR20
4093AGLGI19
4094AGLLQ19
3526ARLAG19
4095AVLSH19
3535DELRV19
4096DRLAG19
4097ERLSN19
4098ETLM19
4099GELRG19
3590GKLVA19
4100GRLNR19
4101GRLRL19
4102IMLAG19
4103IVLDP19
4104KVLAP19
4105LMLGM19
3641MALTR19
4106MPLRE19
4107RLLGP19
3819SKLRA19
4108SMYRS19
4109THLAK19
3762VELDP19
4110VGLTR19
3775VPLSS19
4111VQLPT19
2538AALRR18
4112AGLD18
3517AMLIM18
3519AMLRG18
4113DVLPG18
3562EQLGP18
3393GHLKR18
3880GILRM18
4114GLLV18
4115GLMN18
4116GMLVG18
4117GPLTI18
4118GRLE18
4119GSLQS18
4120GVLVS18
4121HKLLK18
3614IELVQ18
3619IPLGD18
3632LALGG18
3648MLLSH18
4122MRLKV18
4123MRLRS18
4124MSLSP18
4125PALGG18
3665PGLHG18
3673PTLQR18
4126QPLAG18
4127SK*VV18
3842SKLRG18
4128TLIN18
4129TLLTP18
4130DALME17
4131EALNK17
4132EGLPT17
4133ELLKS17
4134GELTD17
3884GKLRQ17
3161GMLRR17
4135GPLVS17
4136GQLMM17
4137GQLVG17
4138KGLEG17
4139QGLDN17
4140RALVS17
4141RGLAT17
3426SKLMV17
3800SKLVV17
3729SRLMA17
4142TLHE17
2168TRLRG17
3864ARLRI16
201EHLRV16
4143GHLKS16
4144GLLKH16
3890GRLNA16
4145GVLSI16
4146GVLST16
3607HALRT16
3900KKLVR16
3638LVLRR16
4147MPLVP16
3661NLLPT16
4148PKLQP16
4149PVLMG16
4150QALIG16
4151RGLIT16
3691RGLTA16
3705RRLVN16
4152RVQD16
3725SLLEE16
4153TELPM16
TGL16
3751TMLGG16
3776VPLVP16
4154APLDL15
4155ARLGR15
4156DALSA15
4157EGLAG15
50GGLVR15
4158GGLVS15
3363GHLRV15
3815GKLIV15
3595GLLRG15
4159GMLGT15
4160GPLLG15
4161HIRSH15
3457IGLQR15
4162IMLV15
3897IRLAL15
304KALGT15
3898KALRG15
4163LHLQG15
4164MELMT15
4165MPLGG15
4166PGLAD15
4167PTLEV15
4168RQLGM15
4169RVLRG15
2525SGLLR15
4170SVLRV15
3733TGLGL15
4171TVLAG15
4172VGLA15
4173VGLRG15
3770VMLKD15
3774VPLNT15
2994VTLGR15
WR15
A14
4174AALHH14
3490AHLKA14
4175ALLGV14
3525AQLVD14
4176ARLHA14
4177DGLG14
4178DHLVG14
4179DILRG14
4180DQLVE14
4181DQLVG14
4182EKLMM14
4183ELLTP14
3564ERLGR14
4184GALRS14
3445GGLTM14
3583GKLHE14
4185GKLNI14
3406GKLVL14
4186GRLLE14
3628KKLLE14
3458KTLGV14
4187MALPE14
3653MRLMG14
4188NDALQYES14
3662NRLES14
3461NSLTR14
4189PKLRS14
4190PRLPP14
4191PVLKL14
4192QKLAN14
4193QKLKL14
4194RALPK14
3697RILPR14
4195THLGR14
3753TMLPG14
4196VALGT14
4197VKLHE14
4198VTLG14
4199ARLLG13
4200ARLTG13
4201ASLGA13
4202DLLSG13
3545DNLRE13
4203EALTI13
3551EELMM13
4204ETLS13
4205GALGS13
3381GHLRT13
4206GPLVL13
4207GRLGA13
4208GRSYMA13
4209GVLGS13
4210HPLLV13
4211ITLSP13
3642MGLDP13
4212MLLNG13
4213MRLAE13
4214NMLSR13
4215PGLGG13
4216PGLVP13
3670PLLKS13
3468SHLRV13
4217SRLGV13
2469TGLAR13
4218TLMG13
4219TRLMM13
4220TRLREHIRSHTGERPF13
4221VELGP13
4222VHLAR13
4223VKLVG13
3486AGLAA12
4224APLRV12
4225EALV12
4226EVLPE12
4227GALMN12
4228GLQA12
4229GLTG12
4230GTLGD12
4231HLLGP12
4232LKLKL12
4233MALRK12
4234MVLTG12
4235NGLIE12
4236NKLVV12
4237PALNV12
4238PMLRL12
4239PQLLG12
4240PVLRV12
4241QPLKR12
3924RGLDM12
4242RGLEN12
3700RLLGA12
4243RRLMV12
2486SGLVR12
4244SPLSG12
3728SQLLE12
4245SRLGR12
4246TGLVG12
3403THLRR12
3809TKLRM12
4247TKLVM12
4248TLLG12
4249TMLPR12
4250TNLRL12
4251TPLGE12
4252TPLVG12
4253TRLLT12
4254VGLGR12
4255VKLQ12
3768VLLKS12
4256AGLML11
3398AKLTL11
3521ANLSN11
4257ARLLT11
2880ATLLR11
4258EGLGG11
4259EGLHL11
3333GHLKM11
3889GRLAV11
4260GVLG11
4261LGLEG11
4262LNLQP11
4263LRLRT11
4264MELGD11
4265MLLQR11
4266MLPP11
4267MSLGG11
4268PKLII11
4269PNLQT11
4270PPLLS11
4271PTLGM11
4272QKLMT11
3687QTLAE11
3701RLLMP11
4273RRLVG11
4274SNLIM11
3730STLLM11
3738THLAR11
4275TLTM11
4276TRLGG11
3478TRLQK11
4277VGLLA11
4278VKLRM11
4279VLLGG11
4280VQ*GG11
3777VRLEE11
4281AGLSG10
4282AGLTE10
4283AGLVA10
4284ALSA10
4285ATLMK10
2468DGLAR10
206DHLNV10
4286EALAI10
4287EELVE10
4288EMLIP10
4289EPLAA10
4290ERLQE10
3878GGLKD10
3588GKLRA10
3591GKLVV10
4291GMLRV10
4292GPLME10
4293GVLSP10
4294IKLMG10
4295IPLNR10
4296MLLKG10
4297MRLPR10
4298MSLRE10
3918PNLAG10
4299PPLMV10
4300PTLGV10
4301RGLRN10
3692RGLVV10
4302RSLIV10
4303RTLGE10
4304SSLGV10
3947TALIS10
4305TGLGT10
3344THLRL10
3822TKLAV10
4306TKLLG10
4307TLIG10
4308TNLLR10
4309TTLGG10
4310VILGA10
3972VNLLE10
3481AALES9
4311AALGL9
4312AELMR9
4313AGLDG9
1988AGLVR9
3534DELMR9
4314DSLVI9
4315EKLKA9
3798EKLRV9
4316GKLIA9
4317GNLVT9
4318GRLLI9
4319GRLRS9
3239GSLIR9
2554GTLKR9
4320HELMK9
4321KMLGG9
4322LGLIQ9
4323LKLER9
4324LPLNG9
4325MGLGV9
3658MVLAG9
3909MVLVG9
2540NGLAR9
3668PILLQ9
4326PMLTV9
4327PPLII9
4328QRLVE9
3698RKLIV9
4329RKLKE9
4330RRLHE9
4331RVLGA9
2532SALAR9
4332SC*RP9
4333SGLDA9
4334SQLDR9
2507TGLLR9
3952TKLKT9
4335TSLTE9
2342AGLKM8
4336AGLRS8
4337AHLGQ8
3493AHLR8
4338ALME8
2875ASLRR8
1995DALDR8
4339DGLHG8
4340DGLLQ8
3550EELGL8
4341EKLRS8
3876EQLMT8
4342ERLAR8
3569GALGR8
4343GELKA8
2295GGLVV8
3341GHLRM8
4344GLML8
4345GLQN8
4346GLTA8
4347GMLGE8
4348GPLRR8
4349GVLDT8
4350GVLNT8
4351IQLAD8
4352KGLTM8
4353MELGN8
4354MPLMR8
3657MTLSD8
4355NGLAM8
4356NGLQD8
4357NTLDV8
4358PHLSM8
4359PILLG8
4360PVLQG8
4361QGLGG8
4362QKLQI8
4363QPLIA8
3926RGLVA8
3727SLLNG8
4364SRLTD8
4365TLLGD8
4366TRSHSSV8
3024TSLTR8
4367TTLGD8
4368VKLAP8
3973VQLPV8
3367AALRK7
159AHLKK7
4369AKLHP7
4370AVLEN7
3571GDLSG7
4371GELGV7
187GKLVT7
3593GLLLD7
3594GLLMG7
4372GLMA7
4373GLNR7
4374GLVV7
4375GPLPV7
4376GSLTQ7
4377GVLRG7
4378HPLAV7
4379HTLGM7
4380IQLGG7
4381KLLGD7
3630KNLIK7
4382MALAR7
4383MELEP7
4384MGLAN7
3643MGLGE7
4385MPLDG7
4386NVLGR7
4387PGLPE7
4388PHLQN7
4389PRLGS7
4390PSLLV7
4391PTLAR7
4392QMLER7
4393RDLGS7
4394RGLGN7
4395RLLEK7
3703RMLVP7
4396SVLSG7
4397TGLVN7
4398TLA*SH7
4399TRLHT7
3967VALTK7
3771VMLMG7
4400VVLAG7
4401AGLVG6
3315AKLKL6
4402AR*PS6
1945ARLKV6
2005DGLLR6
4403DKLHR6
2203DKLKV6
4404ERLPV6
4405GDLVE6
4406GELGE6
4407GGLMQ6
4408GLLT6
4409GLPG6
4410GSLRT6
4411GTLQV6
4412GVLKS6
4413HGLVN6
4414IELGR6
4415KPLEL6
4416MKLE6
3664NTLPK6
4417PALMR6
303PHLVV6
4418PPLVV6
4419QALVP6
4420QELGG6
3370QHLRR6
4421QTLGV6
4422RILEP6
4423RLLMN6
4424RPLVG6
4425RRLEP6
4426SGLRA6
4427SKLMA6
3940SKLSV6
4428TMLEP6
4429TRSQ6
4430VALRK6
4431VDLSG6
4432VMLLG6
4433VPLSE6
2718AGLDR5
4434ARLPV5
4435ARYGC5
1909ATLKV5
2317DGLRA5
4436ERLLQ5
4437ETLMG5
4438GHLML5
4439GHLQG5
4440GKLMV5
4441GPLG5
4442GPLTM5
4443GQLV5
4444GSLTL5
4445GTLRA5
4446GTLTG5
3310HHLTK5
4447IVLVR5
4448MALVR5
4449MELGK5
4450MGLEG5
4451MGLMA5
4452MPLNR5
4453NMLGG5
4454NPLEL5
4455NSLGG5
4456PRLLQ5
4457PRLVK5
2953RGLVR5
4458RHLRS5
4459RSLVV5
4460RSPV*ERMWILRA5
4461RTLNA5
4462TELN5
4463VKLRA5
4464VLLQD5
4465VMLG5
4466AGLNG4
4467AHLRM4
3414AKLRA4
4468AR*RA4
4469ARLPE4
4470AVLNK4
4471DALQYESECGGLNH4
3030DTLLR4
4472EGLRD4
4473ESLMG4
G4
4474GELV4
4475GGLRP4
158GHLKK4
3584GKLKA4
4476GLIG4
4477GLIS4
4478GLLGN4
4479GMLVN4
4480GPLED4
4481GPLQA4
4482GTLTV4
4483GVLGI4
4484IDLGM4
4485IELGG4
4486IGLAT4
4487KKLMP4
4488KLLGE4
4489KLLLG4
3629KMLPP4
4490MGLTL4
4491MNLGM4
4492MPLMV4
3650MPLRA4
3651MQLGG4
2085MRLRM4
4493PALTV4
4494PGLAL4
4495PGLMG4
4496PHLMS4
4497PQLSA4
4498PRLKA4
4499QKLIR4
4500RELGV4
4501RGLHQ4
4502RGLIG4
4503RGLMG4
4504RTRSH4
4505SQLDT4
4506TELGG4
163THLKK4
3309THLRA4
4507TKLGV4
4508TMLEG4
4509VSLGV4
4510VSLTA4
4511VSLVG4
1986AGLKR3
4512AGLQN3
4513AGLRV3
3516ALLRR3
4514ARLRT3
4515ASLQK3
4516ASLR3
2772ATLSR3
4517DILGE3
4518EELRM3
4519EGLTG3
4520EMLKE3
4521ESLLG3
3565ESLMA3
4522ETLAG3
4523EVLVQ3
2521GALKR3
2745GGLGR3
162GHLRK3
4524GKLRS3
4525GLKT3
4526GLLGV3
4527GMLLP3
4528GMLSG3
3887GPLMG3
4529GRLAP3
4530GSLLR3
4531GTLTM3
GVI3
4532ILLQQ3
4533KLLQM3
4534LGLPG3
4535MELVL3
4536MGLAG3
4537MGLPV3
3644MGLQN3
4538MQLAD3
4539MSLLR3
4540MSLPE3
4541NGLKQ3
2504NGLQR3
4542NGRSPV*E3
4543NPLSR3
4544NQLVA3
4545NTLGL3
4546PRLRV3
4547PVLLM3
4548PVLTG3
3314QHLRK3
4549QQLL3
4550RGLVN3
4551RHLVV3
4552RLLAE3
4553RLLPG3
4554RPLIT3
4555RVLMN3
4556RVLQR3
2580SGLER3
161TKLKL3
4557TLLPG3
110TRLRE3
3249TSLER3
4558VGLPA3
4559VPLRP3
4560VRLMP3
4561VSLGE3
4562AALTK2
4563AALVK2
4564AHLTP2
4565AILRT2
4566AKLNS2
3853AKLNV2
3509AKLRG2
4567ALLGA2
4568ARLLR2
3528ARLRA2
4569DVLG2
4570EELQS2
3552EGLVE2
4571ELLGP2
4572ERMC2
4573EVLAG2
4574GALGE2
4575GDLVP2
4576GELRI2
4577GGLEL2
4578GHLSP2
4579GKLEA2
4580GKLKR2
2912GKLRR2
4581GKLVI2
4582GLHQ2
4583GLLR2
4584GLMV2
4585GLTL2
117GNLVR2
4586GPLVG2
4587GQLVD2
4588GRLSV2
4589GVLAV2
3609HGLTG2
4590HVLEL2
4591IELEM2
4592IGLQA2
4593KGLGN2
4594KILPV2
4595KPLPG2
4596KSLRM2
4597KTLGT2
4598LGLAA2
4599LGLGG2
4600LVLQE2
4601MGLAS2
4602MLLEE2
771MLPA2
3652MRLAR2
4603MSLRQ2
4604MTLGT2
4605NGLIV2
4606NHLRM2
NLA2
4607PALIM2
4608PGLAG2
4609PLLRA2
4610PPLDG2
4611PPLIM2
4612PPLLG2
4613PQLTE2
4614PVLDG2
4615QGLTT2
4616QRLAV2
4617RELGG2
4618RGLDG2
4619RGLTE2
4620RHLGA2
4621RSLMI2
4622RSLRP2
3721SKLGA2
4623SKLGE2
T*LT2
2443TALKV2
4624THLR2
1864TRLKV2
4625TRLPP2
4626VELGD2
3763VELVN2
2459VGLGG2
4627VGLKD2
4628VKLHV2
4629VKLLS2
4630VQLTK2
4631VRLK2
4632VRLPP2
4633AALEN1
4634AALGP1
4635AALGT1
4636AALKI1
4637AALMN1
4638AALMQ1
2865AALMR1
4639AALRV1
4640AALSS1
4641AELGP1
4642AELRA1
3485AELRI1
4643AGIAA1
4644AGILQ1
4645AGLDS1
4646AGLG1
4647AGLGG1
4648AGLGN1
4649AGLGP1
4650AGLGQ1
4651AHFRV1
4652AHLRG1
4653AHLRP1
4654AKFRM1
4655AKLE1
4656AKLGE1
4657AKLGL1
4658AKLHA1
3504AKLKG1
4659AKLLG1
4660AKLML1
4661AKLQP1
3854AKLRF1
4662AKLRQ1
4663AKLS1
4664AKLTN1
4665AKLWL1
4666ALDA1
4667ALIM1
4668ALKG1
4669ALLGE1
4670ALLRS1
4671ALTG1
4672ALTR1
4673AMLPD1
4674AMLR1
4675APLAG1
4676APLGP1
4677AQLAD1
4678AQLLL1
4679AR*RG1
4680ARLAA1
3527ARLGT1
4681ARLMS1
4682ARLRS1
4683ARLTE1
4684ARYGR1
4685ASLGP1
4686ASLRP1
4687AT*RS1
4688ATLAK1
4689ATLEV1
4690ATLKI1
4691ATLMG1
4692ATLNM1
4693ATLNV1
4694AVIG1
4695CGLGR1
4696DALQP1
1999DALTV1
4697DELM1
4698DELMN1
4699DELRA1
4700DGLE1
4701DGLEK1
3536DGLES1
4702DGLML1
4703DGLTGHIRSHTGERPF1
4704DGVAM1
4705DHLVD1
4706DILG1
4707DILRT1
2348DKLKG1
4708DKLMM1
4709DLLA1
4710DLLAR1
103DNLRV1
4711DRLAA1
4712DRLGG1
4713DSLPE1
4714DSLV1
3874DVLRG1
4715DYLNV1
4716EALA1
4717EALKV1
4718EALMV1
4719EALTN1
4720EELAP1
4721EELMMHIRSHTGERPF1
4722EELVEHIRSHTGERPF1
3377EHLRL1
3349EHLVR1
4723EKLIV1
3353EKLKV1
4724ELLAR1
4725ELLPS1
4726EMLVA1
4727EQLGT1
4728ERLAV1
93ERLRV1
4729ETLNS1
4730ETSSH1
4731EVLAV1
3567EVLGI1
4732EVLIQ1
4733EVLQE1
4734GALGL1
4735GALGV1
4736GALIS1
4737GALMQ1
4738GALRD1
4739GALRG1
4740GAVMN1
4741GE*GI1
4742GELKV1
4743GELML1
4744GELMR1
4745GELRV1
4746GELTG1
4747GFLAR1
4748GGFRD1
4749GGLA1
4750GGLAE1
368GGLGA1
4751GGLGE1
4752GGLGP1
4753GGLHP1
1957GGLKV1
4754GGLMD1
4755GGLMT1
4756GGLNI1
2357GGLRG1
4757GGLRL1
4758GGLSG1
4759GGLVG1
4760GGVGL1
4761GHLAI1
4762GHLQC1
3159GHLQR1
3330GHLRR1
4763GHLSV1
3448GHLVG1
3316GHLVK1
4764GILAR1
4765GILSG1
4766GKLAI1
4767GKLGG1
4768GKLIG1
4769GKLII1
4770GKLIT1
4771GKLKMHIRSHTGERPF1
4772GKLLK1
4773GKLNA1
4774GKLPT1
4775GKLQA1
3587GKLR1
3588GKLRA1
4776GKLRE1
4777GKLT1
4778GKLTM1
4779GLAA1
4780GLIV1
4781GLLEK1
4782GLLGG1
4783GLLMV1
3364GLLPG1
4784GLLQD1
4785GLLTG1
4786GLSG1
4787GLSGR1
4788GLSV1
4789GLVN1
4790GLVQ1
4791GMLAG1
4792GNLSN1
727GPLA1
4793GPLKP1
4794GPLRP1
4795GPLVP1
4796GQLGP1
4797GQLLE1
4798GR*ML1
4799GRLGG1
4800GRLLG1
4801GRLMP1
4802GRLVS1
4803GRYGC1
3279GSLRV1
4804GSLSK1
4805GSLSP1
4806GTLKL1
4807GTLLL1
2685GTLLV1
4808GTLMT1
2192GTLRV1
4809GTLTE1
4810GVIN1
GVL1
4811GVLDN1
4812GVLE1
4813GVLKD1
3454GVLQK1
4814GVLRL1
4815GVLSG1
2220GVLTG1
4816GVMN1
4817GVPV1
4818HELMR1
4819HLLVP1
4820HPLDR1
4821HPLLS1
4822HPVKE1
4823HTLKM1
4824HTLLK1
4825HTLNI1
3178HTLNK1
4826HTLRP1
4827IALPG1
4828IELAL1
4829IELG1
4830IELHL1
4831IGIQR1
4832IGLGA1
4833IGLRL1
4834IHLAG1
4835IHLRM1
4836IKLTG1
4837IMLPR1
4838IQLMG1
4839IQLRL1
4840IRLAA1
4841IRLGP1
3338IRLGV1
4842IRLRR1
4843ISLVG1
4844ITLMV1
4845ITLRG1
4846ITLRP1
4847ITLVG1
4848IVLPG1
KG1
4849KGLAT1
4850KGLDL1
4851KGLMR1
4852KGRSPVET1
4853KIIV1
4854KILLA1
4855KKLAG1
4856KKLGV1
4857KKLRI1
4858KLLAG1
4859KLLRV1
4860KPLAA1
4861KPLMV1
4862KRLEG1
4863KSLVG1
4864KTLEG1
4865KTLRG1
2404KTLRV1
4866KTLVG1
4867KVLPV1
4868LAHGT1
4869LGLGP1
4870LGLGV1
4871LKVKL1
4872LNLHT1
4873LRLIM1
4874LRVIG1
4875LSLSG1
4876LTLQQ1
4877LVLRG1
4878MALRG1
4879MELIG1
4880MGLRV1
4881MLAA1
4882MLLIS1
4883MLLLP1
4884MLLMV1
4885MLLPP1
4886MLLPV1
4887MLLV1
4888MLLVG1
4889MLVG1
4890MMLDP1
4891MPLGA1
4892MPLGL1
4893MPLLG1
4894MRLEE1
4895MRLGA1
4896MRLGG1
4897MRLGR1
3654MRLVG1
4898MSLHG1
4899MSLQQ1
4900MTLER1
MVL1
4901MVLMN1
4902MVLNT1
4903MVLRG1
4904MVLVT1
4905MVVAS1
4906NDALQYD1
4907NDALQYESECGP1
4908NELLR1
4909NELMR1
4910NELRV1
4911NGLG1
4912NGLIVHIRSHTGERPF1
NGR1
4913NGRPPG*E1
4914NGRSPVR1
4915NILMG1
4916NKLAR1
4917NKLRA1
4918NKLRG1
4919NKLVA1
4920NKLVK1
4921NMLGV1
4922NNLIN1
1838NRLRE1
4923NRLRI1
4924NSLV1
4925NSLVA1
4926NVHP*VVGLAA1
4927NVLGE1
4928PALAG1
4929PALGP1
4930PALV1
4931PASV1
4932PDLRA1
4933PGITE1
4934PGLAP1
4935PGLHE1
4936PGVAA1
4937PGVVP1
4938PHLKR1
4939PKLIF1
4940PLRG1
4941PMLAG1
4942PMLTM1
4943PNLAS1
3786PNLAV1
3919PNYW1
4944PNYWS1
4945PQLVV1
4946PQSRG*RG1
4947PR*GA1
4948PRLRL1
4949PSFQ1
4950PTLAK1
4951PVLKV1
4952PVLMT1
2602QALKR1
4953QALRG1
4954QALSP1
4955QGLHL1
3675QGLPV1
4956QILLQ1
4957QILLRHIRSHTGERPF1
4958QILLY1
4959QILPE1
4960QMLAR1
4961QPLAV1
4962QPLTM1
4963QRLGG1
4964QTLAV1
4965QTLGG1
4966QTLGP1
4967REIVR1
4968RELRR1
4969RGLAA1
4970RGLDN1
4971RGLNS1
4972RGLRS1
4973RGLTG1
4974RGLVE1
4975RGYGT1
RHE1
4976RHLKM1
4977RLLGL1
4978RP*SG1
4979RPLAG1
4980RQLGK1
4981RQLLE1
4982RRLEA1
4983RRLET1
2126RRLGD1
4984RRLGS1
4985RRLSE1
4986RRLTP1
4987RRVVG1
RSH1
4988RTLKL1
4989RTLVG1
4990RVLEP1
4991RVLRE1
SC**A1
4992SCLK1
4993SGILV1
4994SGLGG1
4995SGLGL1
4996SGLGT1
4997SGLLG1
4998SGLNL1
4999SGLRL1
5000SGLVG1
3331SHLRL1
3425SKLIL1
2438SKLKA1
3722SKLKG1
5001SKLLG1
3334SKLRI1
2191SKLRM1
3337SKLVL1
5002SL*HG1
5003SLLRT1
5004SNLTY1
5005SNYWP1
5006SPLIG1
5007SPLKI1
5008SPLRN1
2138SQLKV1
5009SQMK1
SR*G1
1857SRLKV1
5010SRLMT1
5011SRLVT1
5012SSLGA1
5013SSLGL1
5014STLQK1
5015SVLVG1
5016SVLVS1
T1
5017TALEA1
5018TALKG1
5019TELE1
5020TELIR1
5021TELPR1
5022TELRV1
5023TGLAD1
5024TGLGA1
5025THLAN1
5026THLAV1
3318THLRK1
3808TKIRV1
3785TKLKA1
5027TKLLR1
5028TKLME1
3802TKLNV1
3955TKLR1
3783TKLRA1
3361TKLRI1
5029TKLRR1
5030TKLVL1
5031TKSGV1
5032TLIS1
5033TLLIR1
5034TLLM1
5035TLLMQ1
5036TLNG1
5037TLQP1
5038TMLDP1
5039TMLRE1
5040TNLVG1
5041TPLIV1
5042TPLMQ1
5043TPLSD1
5044TPLSI1
5045TQLED1
5046TRLGA1
5047TRLMI1
5048TRLRL1
1883TRLRV1
5049TRLTG1
5050TSLSE1
5051TTLEP1
5052TTLGV1
1849TTLKV1
1919TTLRV1
5053TVLGG1
5054TVLT1
V*KS1
5055VALHT1
5056VDLLL1
5057VELAP1
5058VELN1
5059VELNN1
5060VELRV1
5061VGLPV1
5062VGLQA1
2652VGLQR1
5063VGLRN1
5064VGLRV1
5065VGLSP1
5066VGLSQ1
5067VHLAL1
5068VKLMA1
5069VKLQN1
3765VKLRL1
5070VLLAA1
5071VLLIE1
5072VLLKI1
5073VLLTP1
5074VLMV1
5075VLQR1
5076VMLRG1
3772VPLAL1
5077VPLVG1
5078VQLPM1
5079VQLRV1
5080VRLEG1
5081VRLGG1
3778VRLQA1
5082VRLVR1
VTG1
5083VTLER1
5084VTLGS1
WRN1
TABLE 22. ZF4 selection on G:A change at nt 11 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO: | Sequence | Read #
118GNLRR3407
69ANLRR1937
117GNLVR1794
116SNLRR1771
5085SNLKR1208
68TNLRR862
119GNLKR850
138GNLAR805
2582SNLVR764
2609GNLQR562
70GNLTR531
121NNLRR486
2914GNLIR475
2494ANLVR455
2706GNLNR373
2517GNLLR360
2620ANLKR326
2524SNLAR269
2963SNLQR261
139GNLMR251
2695SNLMR228
2746GNLHR220
5086SNLTR209
5087NNLKR202
5088SNLIR199
5089ANLMR191
2621ANLNR179
74TMLRR158
5090SNLNR155
5091ANLTR136
5092ANLQR125
2595TNLKR118
73AMLRR111
2567GNLSR107
2542ANLAR102
66ATLRR96
2558HNLRR90
2538AALRR81
2496SNLLR77
5093ANLER73
2556SMLRR62
5094ANLHR59
5095ANLLR58
3032SMLKR51
2544SNLSR47
2541TNLQR47
2521GALKR44
2641GALRR44
3347AHLRR42
2823HMLRR40
2047HMLKR36
5096RNLQR35
71AMLKR31
2722GMLKR31
3161GMLRR29
2131SALKR28
5097SNLER26
5098KNLQR25
5099RNLRR24
2584GTLRR21
2978TMLKR21
2481GNLER20
5100QNLKR19
67RRLDR19
2638STLRR19
2526TNLNR17
2575QNLRR16
2523SALRR16
2714TNLHR16
2551ANLIR15
1985AALKR14
48ATLKR14
2875ASLRR13
2587NTLRR13
2511TNLVR13
3330GHLRR12
2691NNLMR12
2617TALKR12
5101KNLER11
2518NNLVR11
3403THLRR11
5102SMLQR10
2561TNLMR10
2737TTLRR10
2475AGLRR9
2622ATLTR9
3050HNLKR9
5103KNLVR9
2464SGLRR9
2769VNLRR9
5104AMLTR8
2882AVLRR8
3393GHLKR8
5105TNLTR8
3017ATLNR7
2739ATLVR7
5106HNLMR7
2734TALRR7
4308TNLLR7
5107AMLQR6
52ANLSR6
2509ASLKR6
2876ASLTR6
2801ATLMR6
5108GMLER6
5109RLLIN6
5110SGLLK6
2649TNLAR6
5111AHLVR5
3012ATLHR5
2881ATLQR5
2599ENLRR5
3084HMLQR5
72HMLTR5
5112ISLRV5
2543NNLAR5
3205SNLHR5
2153STLKR5
5113AHLKR4
2879ATLIR4
2623DNLRR4
2592GALTR4
5114GNLRK4
5115KKLLR4
5116MNLRR4
5117MVLLR4
5118NNLQR4
5119QNLVR4
5120RNLAR4
3396SHLRR4
2962SMLHR4
2679TNLER4
5121TVLLV4
2738AALNR3
2770AALVR3
1986AGLKR3
2539ETLRR3
3159GHLQR3
3449GHLVR3
5122GMLNR3
5123GMLTR3
5124GMLVR3
2608GNLGR3
5125GNLRG3
5126GNLVK3
2600GSLRR3
2554GTLKR3
56HTLRR3
3010HVLRR3
5127KNLRR3
5128MNLKR3
3407NGRSPV...3
2712NMLRR3
2757PNLIR3
3370QHLRR3
2956SALNR3
5129STLEV3
2967STLNR3
5130TALRS3
1305THLKR3
5131TNLIR3
2700AALTR2
5132AMLNR2
5133ANLRL2
5134ANLRW2
2654ATLAR2
5135DALLV2
2528GGLIR2
4764GILAR2
3160GILRR2
GN*S...2
2522GNLDR2
5136GNLNK2
5137GNLRP2
5138GNLRS2
5139GTLIR2
3081GTLMR2
2626GTLVR2
5140HGLET2
5141HMLNR2
2644HNLVR2
5142KNLMR2
2637NNLLR2
2756NSLRR2
5143PGLLG2
5144RNLVR2
5145SMLNR2
2677SMLTR2
2487SNLDR2
2850STLMR2
2970SVLRR2
2462TGLRR2
5146TMLQR2
2766TSLKR2
2860TTLKR2
3075TVLRR2
5147AALRS1
5148ADLER1
3089ADLVR1
2798AGLMR1
1431AHLTR1
2871AILTR1
5149AMLAR1
5150AMLHR1
5151AMLIR1
5152ANFRR1
5153ANIQR1
5154ANLDR1
2771ANLGR1
5155ANLVG1
5156ANSRR1
5157ANVRR1
5158APLRR1
2799ASLQR1
2880ATLLR1
5159ATLRS1
5160AYFRR1
5161CNLAR1
5162CNLNR1
5163CNLVR1
2591DNLKR1
2506DNLVR1
2778GALNR1
3035GDLAR1
2816GDLRR1
2780GDLTR1
2027GGLKR1
2461GGLRR1
2909GGVRR1
5164GHLNR1
5165GNFRR1
5166GNFVG1
5167GNLAG1
5168GNLAS1
5169GNLHK1
5170GNLLS1
5171GNLMS1
5172GNLNH1
5173GNLQS1
5174GNLRH1
5175GNLS...1
5176GNLTK1
5177GNLTQ1
5178GNLTW1
5179GNLVW1
5180GNLWR1
5181GNSKR1
5182GNSQR1
5183GNSRR1
5184GNVQR1
5185GNVTR1
5186GQLAL1
2819GSLKR1
2747GTLNR1
5187GY*LR1
2661HNLAR1
2752HNLQR1
5188ITLQR1
5189KILGN1
5190KNLKR1
1356KNLTR1
5191KSLRR1
5192LNLRR1
5193LNLVR1
2664NMLKR1
2690NNLIR1
5194NNLNR1
2726NNLTR1
5195NNSRR1
2788NTLAR1
2939NTLIR1
2628NTLKR1
2940NTLNR1
5196PRLRG1
5197QHLKR1
2574QMLKR1
2593QTLRR1
5198RLIIN1
5199RNLKR1
3292SALQR1
2559SGLKR1
5200SHLKR1
3202SILNR1
5201SKLTR1
2647SMLIR1
5202SMLVR1
5203SNLFR1
5204SNLIH1
5205SNLRK1
5206SNLRQ1
5207SNLSG1
5208SNLTS1
5209SNLVW1
5210SNSRR1
5211SNVKR1
5212SNVRG1
2698STLVR1
5213TMFRR1
3109TMLNR1
2680TNLGR1
5214TNLLS1
5215TPTRS1
5216TQLVL1
2589TSLRR1
5217VNLTR1
2997VTLRR1
TABLE 23. ZF4 selection on G:C change at nt 11 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO: | Sequence | Read #
73AMLRR3064
74TMLRR2212
2556SMLRR1556
3161GMLRR1320
2722GMLKR1160
3032SMLKR1049
71AMLKR797
2978TMLKR515
2823HMLRR478
2047HMLKR429
66ATLRR261
5102SMLQR248
5107AMLQR212
5132AMLNR125
5104AMLTR124
5146TMLQR123
2712NMLRR119
2664NMLKR102
2677SMLTR98
72HMLTR93
5123GMLTR88
5150AMLHR72
5122GMLNR68
2962SMLHR63
5145SMLNR59
48ATLKR58
5124GMLVR50
5141HMLNR47
3084HMLQR47
5149AMLAR46
5218AMLVR45
3109TMLNR38
5219GMLHR34
5202SMLVR34
2533SMLAR29
2638STLRR27
2970SVLRR27
67RRLDR26
118GNLRR24
2737TTLRR24
2882AVLRR23
5151AMLIR22
2913GMLAR22
5220GMLQR22
2584GTLRR19
2875ASLRR18
5221HMLAR17
2587NTLRR17
69ANLRR16
2713QMLRR16
3017ATLNR15
2574QMLKR15
5222RRLKN15
5223AMLMR14
2801ATLMR14
5224GMLIR14
5225EMLRR13
117GNLVR13
5226RTLAL13
5227SMLSR13
116SNLRR13
2647SMLIR12
1986AGLKR11
TRS11
2739ATLVR10
TRS...10
2538AALRR9
3012ATLHR9
2582SNLVR9
5228TMLTR9
68TNLRR9
5229TMLVR8
3075TVLRR8
2027GGLKR7
2914GNLIR7
2609GNLQR7
3407NGRSPV...7
2559SGLKR7
5230TMLMR7
2860TTLKR7
2881ATLQR6
2622ATLTR6
5231GMLMR6
70GNLTR6
2554GTLKR6
5085SNLKR6
2965SSLKR6
5232AMLER5
5233AMVRR5
2494ANLVR5
119GNLKR5
5086SNLTR5
5234TMLAR5
3987VELNS5
2654ATLAR4
2879ATLIR4
2606EMLKR4
138GNLAR4
139GNLMR4
5087NNLKR4
5235SMLMR4
2153STLKR4
2462TGLRR4
5093ANLER3
2620ANLKR3
2621ANLNR3
5092ANLQR3
2509ASLKR3
2520DMLRR3
2641GALRR3
2706GNLNR3
5236HLLRR3
5237HMLHR3
3010HVLRR3
5238KTLRR3
LL...3
121NNLRR3
2477SGLTR3
5239SMLKN3
3203SMLLR3
2963SNLQR3
2967STLNR3
1985AALKR2
2738AALNR2
3516ALLRR2
5240AMLLR2
5241AMLRH2
5242AMLRS2
5243AMLRW2
5244AMLSR2
5094ANLHR2
2802AVLKR2
5108GMLER2
5245GMLKN2
5246GMLRW2
5247GMVRR2
2600GSLRR2
2921GVLRR2
3039HILKR2
5248HILRR2
5249HMLRS2
3040HMLVR2
2558HNLRR2
56HTLRR2
5250MGLST2
5251NMLIR2
2628NTLKR2
2593QTLRR2
5252RMLKR2
5253RMLQR2
RN*P...2
5254SMFKR2
2524SNLAR2
2850STLMR2
5255TLLRR2
5256TMIRR2
5257TMVRR2
5258VIKR...2
5259AKLQR1
3062ALLKR1
5260AMFRR1
5261AMIRR1
5262AMITR1
5263AMKTR1
5264AMLCR1
5265AMLHS1
5266AMLPR1
4674AMLR...1
3519AMLRG1
5267AMLRK1
5268AMLTM1
5269AMLWR1
5270AMYT...1
2542ANLAR1
5271ARLRR1
4682ARLRS1
1947ARLRV1
3251ASLNR1
2878ATLER1
3025ATLGR1
5159ATLRS1
2772ATLSR1
5272CMLRR1
2640DMLKR1
3078DMLQR1
5273DMVKR1
5274EMLNS1
2539ETLRR1
5275GLLKR1
5276GLLQS1
5277GLLSR1
5278GMIKR1
5279GMLKT1
5280GMLRM1
5281GMLTW1
2746GNLHR1
2517GNLLR1
5282GRLKR1
5283GRLKS1
5284GRLRV1
2747GTLNR1
2626GTLVR1
3001GVLKR1
2483HALRR1
2531HLLKR1
5285HLLNS...1
5286HMLLR1
5287HMLMR1
5288HMVRR1
5106HNLMR1
2784HVLKR1
5189KILGN1
5289KMLKR1
5290LMLGK1
5291MLRR1
5292NLLKR1
5293NMLGR1
5294NTFRR1
2939NTLIR1
2940NTLNR1
5295PMLMR1
5296PVVKR1
2692QSLKR1
5297RMFRR1
5298RMLRR1
2956SALNR1
2523SALRR1
2464SGLRR1
3004SILKR1
3470SKLKR1
5201SKLTR1
5299SLLNR1
5300SMFRR1
5301SMIKR1
5302SMLGR1
5303SMLKW1
5304SMSRR1
5305SMVKR1
2496SNLLR1
5090SNLNR1
2792SQLKR1
1876SRLKR1
5306SRLRR1
2845SSLAR1
2698STLVR1
2699SVLKR1
5307TILRR1
5308TMLER1
5309TMLGR1
5310TMLHR1
5311TMLLR1
5312TMLRH1
5313TMLWR1
2595TNLKR1
2856TNLSR1
5215TPTRS1
5314VMLKR1
5315VSLRK1
2997VTLRR1
5316WMLKR1
5317WMLRR1
5318YMLKR1
5319YMLRR1
TABLE 24. ZF4 selection on G:T change at nt 11 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO: | Sequence | Read #
66ATLRR6399
67RRLDR1155
2584GTLRR1073
2737TTLRR1024
2638STLRR970
3017ATLNR770
2739ATLVR727
48ATLKR708
2587NTLRR670
2538AALRR657
2801ATLMR456
2654ATLAR418
2554GTLKR399
2875ASLRR366
2622ATLTR363
2593QTLRR298
2539ETLRR292
2881ATLQR291
2879ATLIR261
2153STLKR252
2628NTLKR237
56HTLRR227
2882AVLRR208
2880ATLLR171
1985AALKR141
2878ATLER134
3012ATLHR130
2860TTLKR125
2509ASLKR95
73AMLRR93
3010HVLRR81
2523SALRR63
5248HILRR60
74TMLRR59
2967STLNR58
2131SALKR47
2738AALNR46
2483HALRR44
2641GALRR41
2843QTLKR41
2783HTLKR39
3032SMLKR39
1930HALKR36
2970SVLRR36
2802AVLKR35
2556SMLRR34
3161GMLRR33
2722GMLKR31
2850STLMR31
2698STLVR31
2626GTLVR28
2521GALKR27
2747GTLNR27
2590TTLQR27
2921GVLRR25
118GNLRR24
116SNLRR24
2589TSLRR24
69ANLRR23
2997VTLRR23
2700AALTR22
71AMLKR22
2697STLQR22
5320ATLRK21
117GNLVR21
2823HMLRR20
2772ATLSR17
5321RTLQR17
2734TALRR17
2819GSLKR16
3018STLIR16
2717AALQR15
2800ASLVR15
2849STLHR15
2489SSLRR14
2978TMLKR14
3075TVLRR14
2876ASLTR13
3081GTLMR13
2047HMLKR13
2966STLLR13
2762STLTR13
2681TTLNR13
70GNLTR12
5189KILGN12
68TNLRR11
3864ARLRI10
2502ETLKR10
2600GSLRR10
2684GTLAR10
5322KTLER10
5323QTLMR10
3028SILRR10
5085SNLKR10
2617TALKR10
2799ASLQR9
3001GVLKR9
121NNLRR9
2877ATLDR8
138GNLAR8
2914GNLIR8
5324KTLQR8
5325RTLRR8
5102SMLQR8
2965SSLKR8
1947ARLRV7
2607GALVR7
5139GTLIR7
2784HVLKR7
3067MTLRR7
5086SNLTR7
2582SNLVR7
2620ANLKR6
119GNLKR6
5326HILNR6
5327MTLMR6
2770AALVR5
5107AMLQR5
2609GNLQR5
2940NTLNR5
3027NTLVR5
3196QTLTR5
5328RTLKR5
2666SALTR5
2699SVLKR5
5104AMLTR4
2621ANLNR4
2494ANLVR4
5158APLRR4
3025ATLGR4
5329ATVRR4
2530DTLRR4
3160GILRR4
5122GMLNR4
3033GTLLR4
2707GTLQR4
5330GVLSR4
5331HRLKI4
2830HTLVR4
5332KTLIR4
5238KTLRR4
5087NNLKR4
2756NSLRR4
2939NTLIR4
2677SMLTR4
2524SNLAR4
2963SNLQR4
2550STLAR4
5333TILAR4
2766TSLKR4
2857TTLAR4
2618TTLMR4
3117AILRR3
5089ANLMR3
3090ASLAR3
5334ASLHR3
5335ATLNK3
5336ATLRG3
2583EALRR3
3049GILKR3
5123GMLTR3
2706GNLNR3
4375GPLPV3
5337GPLVR3
3245GSLSR3
72HMLTR3
2827HSLRR3
5338HVLNR3
5339NSLKR3
5340NTLMR3
5341NVLRR3
2950QTLQR3
5342RRLNR3
2956SALNR3
3292SALQR3
2733SVLTR3
1986AGLKR2
2475AGLRR2
1988AGLVR2
5150AMLHR2
5151AMLIR2
5343ARLKI2
3251ASLNR2
3244ASLSR2
5344ATFRR2
5345ATLNW2
5346ATLRW2
2634ESLRR2
3151ETLVR2
2778GALNR2
2815GALQR2
5124GMLVR2
2517GNLLR2
3230HALTR2
5141HMLNR2
2558HNLRR2
2586HTLMR2
2613HTLQR2
5347IALAG2
5348MSLRR2
5349MTLLR2
5350MTLVR2
3407NGRSPV...2
2664NMLKR2
2712NMLRR2
3191PTLRR2
5351QRLSV2
4424RPLVG2
5352RRIDR2
5353RRLDS2
5354RRVDR2
5355RSLIR2
5356RTLIR2
5357SDLTV2
2962SMLHR2
5358SRLKI2
2564SSLVR2
5359STVRR2
2651TTLTR2
2767TTLVR2
57TVLKR2
2546AALAR1
2864AALLR1
5360AALNS1
3367AALRK1
3410AALRL1
5147AALRS1
5361AAVRR1
5259AKLQR1
3510AKLRR1
3062ALLKR1
5149AMLAR1
5132AMLNR1
5218AMLVR1
5094ANLHR1
5092ANLQR1
5091ANLTR1
AP*C...1
5362APLHR1
5363APLKR1
5364APLMR1
5365APLVR1
5366APYP...1
5271ARLRR1
2874ARLTR1
5367ARLVG1
5368ASFRR1
5369ASLER1
3250ASLMR1
AT*G...1
5370ATFKR1
5371ATFRT1
5372ATFTR1
5373ATIRR1
5374ATLES1
5375ATLFR1
5376ATLHW1
5377ATLIS1
5378ATLNH1
5379ATLNS1
5380ATLQG1
5381ATLQW1
5382ATLRI1
5383ATLRP1
5384ATLWR1
5385ATSVR1
5386ATVAR1
5387AVLGR1
5388AVLLR1
5389AVLNR1
3121AVLTR1
3991DKLRR1
2640DMLKR1
5390DRLRA1
2656DTLNR1
5391EPLVM1
3038ETLAR1
3043ETLQR1
2592GALTR1
2816GDLRR1
2913GMLAR1
139GNLMR1
5392GPFKR1
5393GPLGL1
5394GPLKR1
5395GSLGA1
2781GSLQR1
2660GSLTR1
5396GTFRR1
3014GTLDR1
2917GTLER1
2918GTLGR1
5397GTLMW1
5398GTLRK1
2562GTLTR1
386GTLVS1
5399GTSNR1
5400GTSRR1
5401GVLRK1
5402GVVRR1
2749HALMR1
3246HALQR1
3039HILKR1
5403HILQR1
2578HTLAR1
2689HTLLR1
2828HTLNR1
3180HTLRG1
3181HTLSR1
3099HVLHR1
5404KTLLR1
5405KTLVR1
5406MALRM1
5407MPLAR1
4452MPLNR1
5408MPLVR1
MRS1
2833MTLKR1
4923NRLRI1
2788NTLAR1
2837NTLHR1
3015NTLLR1
2941NTLQR1
5409NTLRW1
3006NTLTR1
5410NTLVS1
5411NTVRR1
2942NVLKR1
5412PPLKR1
5413PSLKR1
5414PTFHR1
5415QKLA...1
2574QMLKR1
2692QSLKR1
3195QTLHR1
5416QTLIR1
5417QTLRQ1
3248QTLVR1
RN*P...1
5418RRLAG1
5419RRLAR1
5420RRLDG1
5421RRLHR1
5422RRLVR1
5423RRSDR1
5424RRVEK1
5425RTLER1
5426RTLNR1
5427RTLRG1
5428SAVKR1
2559SGLKR1
5201SKLTR1
2647SMLIR1
5145SMLNR1
5304SMSRR1
5088SNLIR1
5429SPLRR1
5430SRLRI1
5431STLCR1
2848STLER1
5432STLKS1
5433STLRI1
5434STSRR1
5435SVLRK1
5436TALIR1
5437TALMR1
2764TALTR1
5146TMLQR1
5438TMLRG1
5131TNLIR1
2595TNLKR1
5439TPIMM1
5215TPTRS1
1883TRLRV1
5440TRSP...1
2858TTLGR1
2859TTLIR1
5441TTLRS1
5442TVLNR1
3308VSLRR1
2995VTLKR1
5443VTLQR1
5444VVLGN1
5445WRLDR1
5446WTLRR1
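The read counts in the selection tables are easier to compare as shares of a total. The following is a minimal sketch, not part of the described methods, assuming each row is written as an optional SEQ ID NO (digits), then the residues (letters, with `*` for a stop and `...` for a truncated read), then the read count (digits), as in `66ATLRR6399`. The names `parse_row` and `read_fractions` are hypothetical, and the fractions are relative only to the rows supplied, not to the whole selected library.

```python
import re

# Assumed row layout: [SEQ ID NO digits][residues: A-Z, '*', '.'][read-count digits].
# The SEQ ID NO is optional because very short sequences are listed without one.
ROW = re.compile(r"^(\d*)([A-Z*.]+?)(\d+)$")


def parse_row(row):
    """Split one table row into (seq_id or None, residues, reads)."""
    match = ROW.match(row.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {row!r}")
    seq_id, residues, reads = match.groups()
    return (int(seq_id) if seq_id else None, residues, int(reads))


def read_fractions(rows):
    """Return each variant's share of the reads among the given rows."""
    parsed = [parse_row(r) for r in rows]
    total = sum(reads for _, _, reads in parsed)
    return {residues: reads / total for _, residues, reads in parsed}


if __name__ == "__main__":
    # First five rows of the ZF4 G:T selection at nt 11 (Table 24).
    top_rows = ["66ATLRR6399", "67RRLDR1155", "2584GTLRR1073",
                "2737TTLRR1024", "2638STLRR970"]
    for residues, share in read_fractions(top_rows).items():
        print(f"{residues}\t{share:.3f}")
```

Passing every row of a table rather than the first five gives shares of that table's total reads.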
TABLE 25. ZF3 selection on G:A change at nt 13 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO: | Sequence | Read #
81 | GQLTV | 1094
5447 | GQLVV | 906
78 | GELVV | 766
5448 | AELIV | 643
5449 | TELIV | 552
5450 | QELLV | 528
5451 | GELIV | 525
5452 | GELTV | 505
80 | GQLIV | 476
5453 | QELLT | 457
5454 | SELIV | 416
5455 | GQLLV | 372
5456 | SGLIV | 372
5457 | GQLII | 361
5458 | AELLV | 311
5459 | VELLI | 277
5460 | AELVV | 271
5461 | AQLIV | 267
76 | SQLIV | 265
82 | TELII | 251
83 | QGLLV | 247
5462 | SQLII | 243
79 | QQLLI | 224
5463 | AGLIV | 221
5464 | QELVV | 209
5465 | GELLV | 206
86 | GELLT | 202
5466 | SQLLV | 199
5467 | GELVI | 194
75 | QQLIV | 179
5468 | QELII | 177
5469 | TQLIV | 176
5470 | VELII | 172
5471 | VELLV | 160
5472 | GELLI | 151
85 | GQLLT | 150
5473 | NELLI | 149
5474 | GQLLI | 148
5475 | SQLLI | 140
5476 | AQLLV | 136
5477 | GQLIT | 132
5478 | GQLTI | 129
5479 | TELIT | 122
5480 | TELLI | 118
5481 | TELLV | 116
5482 | QELLI | 112
5483 | AQLVV | 106
5484 | GSLLV | 104
5485 | AQLLI | 102
5486 | HPPEE | 100
5487 | SQLVV | 100
77 | QQLLV | 98
5488 | QELIV | 95
5489 | SELII | 91
5490 | AQLII | 90
5491 | QQLVV | 90
5492 | TGLLV | 88
5493 | NQLII | 88
5494 | GQLVI | 81
5495 | AGLLV | 80
5496 | NQLLV | 73
5497 | QELGV | 69
5498 | GALVV | 68
5499 | SQLTV | 67
5500 | GELTT | 67
5501 | GELII | 65
3710 | SGLLV | 63
5502 | AELII | 60
5503 | TQLII | 59
5504 | QQLII | 59
5505 | AQLIT | 58
5506 | SQLIT | 58
5507 | SSLIV | 57
5508 | SELTV | 57
5509 | NELLV | 57
5510 | TQLLV | 56
5511 | QGLIV | 55
5512 | QELVI | 55
5513 | NELIV | 55
5514 | TELLT | 53
TABLE 26. ZF3 selection on G:T change at nt 13 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO: | Sequence | Read #
79 | QQLLI | 1145
5452 | GELTV | 1108
81 | GQLTV | 933
5474 | GQLLI | 748
5447 | GQLVV | 545
5457 | GQLII | 518
80 | GQLIV | 479
78 | GELVV | 477
5515 | GELIT | 438
5466 | SQLLV | 432
5462 | SQLII | 431
85 | GQLLT | 404
5516 | SQLSM | 365
84 | QQLLT | 349
75 | QQLIV | 312
5486 | HPPEE | 308
5453 | QELLT | 300
5475 | SQLLI | 282
4773 | GKLNA | 281
5451 | GELIV | 263
5455 | GQLLV | 225
76 | SQLIV | 219
5517 | RALLI | 216
5518 | ENLLI | 201
5476 | AQLLV | 174
5519 | PDLKR | 174
86 | GELLT | 172
5505 | AQLIT | 164
5520 | GQLVT | 138
5521 | GQLLS | 116
5450 | QELLV | 112
5522 | GELNP | 112
5523 | GQLIQ | 98
5524 | PTLVG | 98
5525 | LVLAD | 95
5526 | EALRA | 94
5467 | GELVI | 87
1926 | STLKA | 87
5494 | GQLVI | 85
5463 | AGLIV | 82
5527 | GQLTL | 82
5528 | NVLGT | 81
5529 | KGLGP | 79
5530 | MQLRR | 79
3026 | GDLQR | 75
5531 | VLLPN | 71
5532 | MRLGD | 69
5533 | GQLAQ | 67
4074 | NELRG | 67
5500 | GELTT | 66
5534 | GELVT | 64
333 | STLVV | 63
5535 | VDLAV | 61
5536 | AQLTI | 59
5537 | DALPA | 57
5538 | SVLQL | 57
5539 | GPLGN | 56
5540 | GHLLL | 52
5541 | DVLDP | 51
5542 | SSLSI | 50
5543 | KMLAD | 50
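TABLE 27. ZF3 selection on G:C change at nt 13 of core motif in CBS. Sequences reflect position 2 to 6.
SEQ ID NO: | Sequence | # Reads
173 | RKHD | 4641
175 | RKAD | 1938
174 | RRSD | 1299
681 | RRHD | 868
682 | RKTD | 182
683 | NVSM | 146
684 | RQSD | 76
685 | RKND | 69
686 | SENV | 69
687 | VDHR | 60
688 | AQIV | 58
689 | KTPH | 56
690 | PKIV | 51
691 | GAEP | 42
692 | MLVE | 40
693 | VVGN | 40
694 | KGPE | 36
695 | GKVM | 33
696 | TEPG | 33
697 | TPHN | 32
698 | MPGG | 31
699 | DLEK | 28
700 | GTDN | 27
701 | ISRL | 25
702 | ATGL | 21
703 | ASNP | 19
704 | GAPT | 17
705 | HSPN | 17
706 | RPVA | 16
177 | RKDD | 6
707 | MLVD | 4
708 | RHRK | 3
709 | RKHV | 3
710 | RKQD | 3
711 | RKSD | 3
712 | DHHT | 2
713 | GKHD | 2
714 | MKAD | 2
715 | RKAE | 2
716 | RRAD | 2
717 | APIG | 1
718 | AQNR | 1
719 | DMDA | 1
720 | EAPM | 1
721 | EEMM | 1
722 | EPIR | 1
723 | GALE | 1
724 | GENV | 1
725 | GKAD | 1
726 | GKVD | 1
727 | GPLA | 1
728 | GRIE | 1
729 | IEKL | 1
730 | KAAS | 1
731 | KEEH | 1
732 | LKVD | 1
733 | LLVE | 1
734 | LMTQ | 1
735 | MASL | 1
736 | MGIG | 1
737 | MPGD | 1
738 | MSLG | 1
739 | NDMT | 1
740 | NMHT | 1
741 | NRIV | 1
742 | PENA | 1
743 | QKHD | 1
744 | QVPD | 1
745 | RASD | 1
746 | REHD | 1
747 | RGHD | 1
748 | RKHA | 1
749 | RKHY | 1
750 | RKLD | 1
751 | RKPD | 1
752 | RKVD | 1
753 | RKYD | 1
754 | RMSD | 1
755 | RRLD | 1
756 | RRND | 1
757 | RRRD | 1
758 | RRSG | 1
759 | RWHD | 1
760 | SHRL | 1
761 | SQHV | 1
762 | SSHD | 1
763 | TTHV | 1
764 | VHHV | 1
765 | WKAD | 1
766 | WKHD | 1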
1. Ong, C.-T. & Corces, V. P., Nat Rev Genet. 2014 Apr.; 15(4):234-46.
2. Phillips, J. & Corces, V. P., Cell. 2009 Jun. 26; 137(7):1194-1211.
3. Ali, T. et al., Curr Opin Genet Dev. 2016 Apr.; 37:17-26.
4. Nora, E. P. et al., Nature. 2012 Apr. 11; 485(7398):381-5.
5. Rao, S. S. et al., Cell. 2014 Dec. 18; 159(7):1665-1680.
6. Phillips-Cremins, J. E. et al., Cell. 2013 Jun. 6; 153(6):1281-1295.
7. Shukla, S. et al., Nature. 2011 Nov. 3; 479(7371):74-9.
8. Hilmi, K. et al., Sci Adv. 2017 May 24; 3(5):e1601898.
9. Han, D. et al., Sci Rep. 2017 Mar. 6; 7:43530.
10. Rhee, H. S. & Pugh, B. F., Cell. 2011 Dec. 9; 147(6):1408-19.
11. Nakahashi, H. et al., Cell Rep. 2013 May 30; 3(5):1678-1689.
12. Hashimoto, et al., Mol Cell. 2017 Jun. 1; 66(5):711-720.e3.
13. Guo, A. et al., Nat Commun. 2018 Apr. 18; 9(1):1520.
14. Schuijers, J. et al., Cell Rep. 2018 Apr. 10; 23(2):349-360.
15. Kang, J. Y. et al., Oncogene. 2015 Nov. 5; 34(45):5677-84.
16. Wright, D. et al., Nat Protoc. 2006; 1(3):1637-52.
17. Sander, J. et al., Nat Methods. 2011 Jan.; 8(1):67-9.
18. Maeder, M. et al., Mol Cell. 2008 Jul. 25; 31(2):294-301.
19. Joung, J. K. et al., Proc Natl Acad Sci USA. 2000 Jun. 20; 97(13):7382-7.

It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
