Predicted amino acid motif repeats in proteins potentially encode extensive multivalent macromolecular assemblies in the human proteome.


Department of Biochemistry, Faculty of Medicine, University of Montreal, Canada.


There is growing interest in understanding higher-order assemblies of biopolymers within and between cells, such as protein-protein and protein-RNA biomolecular condensates. These condensates are thought to assemble and disassemble via multivalent interactions, including those mediated by unique repeated amino acid motifs (URMs). We asked how common proteins with such URMs are, and measured the incidence and abundance of URMs, by exhaustively enumerating repeating motifs of length 3-10 in the human proteome. We found that URMs are very common and widely distributed across the human proteome. Moreover, neither the number of repetitions of a motif nor the intervals between repetitions correlates with motif length, which suggests that the number of repeats among proteins in the proteome is independent of length, contrary to the notion that short motifs are more abundant than long motifs. Finally, we describe two examples of URMs in proteins known to form higher-order biopolymer assemblies: multi-PDZ domain-containing proteins and the FUS family of RNA-binding proteins. For the FUS family, we predicted a known sequence 'grammar': specific motifs and interval sequence compositions that are essential to phase separation and to the material properties of condensates formed by this family of proteins. In PDZ domain-containing proteins, we found a novel repeated motif that, surprisingly, occurs both within and between individual PDZ domains. We speculate that these motifs could be binding sites for multivalent interactions or a residual result of the mechanism by which PDZ-domain duplications occurred, or that the linker sequences between PDZ domains may encode cryptic PDZ domains.
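To make the enumeration step concrete, the following is a minimal Python sketch of how repeated motifs of length 3-10 might be listed for a single protein sequence. It is an illustration under stated assumptions, not the procedure used in the study: the abstract does not give the exact definition of a URM, so here a motif counts as repeated if it occurs at least twice without overlapping itself, and the function names `find_repeated_motifs` and `intervals`, the `min_repeats` parameter, and the toy sequence are all hypothetical.

```python
from collections import defaultdict


def find_repeated_motifs(sequence, min_len=3, max_len=10, min_repeats=2):
    """Return {motif: [start positions]} for motifs repeated in `sequence`.

    Assumption (not from the source): occurrences of the same motif
    must not overlap each other to count as separate repeats.
    """
    repeats = {}
    for k in range(min_len, max_len + 1):
        # Record every start position of every k-mer in the sequence.
        positions = defaultdict(list)
        for start in range(len(sequence) - k + 1):
            positions[sequence[start:start + k]].append(start)
        for motif, starts in positions.items():
            # Greedily keep occurrences that do not overlap the previous one.
            kept = []
            for s in starts:
                if not kept or s >= kept[-1] + k:
                    kept.append(s)
            if len(kept) >= min_repeats:
                repeats[motif] = kept
    return repeats


def intervals(starts, motif_len):
    """Number of residues between consecutive occurrences of a motif."""
    return [b - (a + motif_len) for a, b in zip(starts, starts[1:])]


# Toy sequence for illustration only; it is not a real protein.
toy = "SYGQAASYGQTTSYGQ"
for motif, starts in sorted(find_repeated_motifs(toy).items()):
    print(motif, starts, intervals(starts, len(motif)))
```

Applying such a function to every sequence in a proteome, and tabulating repeat counts and interval lengths per motif length, would give the kind of summary statistics the abstract refers to. A proteome-scale analysis would also need choices this sketch leaves out, such as how to treat motifs nested inside longer repeated motifs.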

