The gluten proteins of wheat (a monocotyledonous plant) and certain closely related species - including rye, barley, and, with qualifications, oats - are unique among plant storage proteins in having exceptionally high proportions of the amino acids glutamine and proline. It was suggested (Kasarda 1980, 1981) that wheat prolamins are a late evolutionary development, on the basis of their unique composition, the extensive occurrence of repeating sequences based largely on glutamine and proline, and their occurrence only in recently evolved grass species. Subsequently, with the development of molecular biological techniques, gene sequencing provided evidence that most wheat prolamins contain short amino acid sequences homologous to sequences found in storage globulins of more distantly related dicotyledonous plant species (Kreis et al. 1985). This discovery pushes the possible age of the ancestral genes back to within the period during which the flowering plants have existed - perhaps 100 million years.
However, repetitive sequences make up major parts of all gluten proteins, whereas the homologous sequences just mentioned occur only in nonrepetitive regions. These repetitive sequences, having major proportions of glutamine and proline, apparently do not have counterparts in proteins of species outside the grass family. Accordingly, it seems likely that at least the repetitive domains of gluten proteins, which have slightly differing repeat motifs according to type, are of more recent origin (less than about 65 million years). All of the various repeating sequences include glutamine and proline residues, and although the repeats are often imperfect, comparison allows a consensus repeating sequence to be recognized. It is at least possible that the peptides most active in celiac disease result from variations on the themes represented by the consensus sequences, as a consequence of these imperfections, rather than from the consensus sequences themselves, but that remains to be established.
There is reasonably strong evidence that peptides with sequences found in the repeat domain of alpha-gliadin are capable of triggering the intestinal damage characteristic of celiac disease. Furthermore, significant sequence similarities between gluten proteins and other proteins are rare. The first important similarity found was between alpha-gliadin and the E1b protein (Kagnoff et al. 1984), which is produced in conjunction with infection by adenovirus 12, a virus that infects the human gastrointestinal tract. Recently, similarities have also been found with peptides produced in conjunction with infection by other types of adenovirus (Lahdeaho et al. 1993).
The recent evolution of repeating sequence domains in gluten proteins through extensive duplication of the DNA codons (for glutamine and proline, along with a few other amino acids) corresponding to the repeat motifs may be the basis for the lack of homologies or similarities with other proteins. Most proteins do not have large amounts of glutamine and proline. Hence, the sequences active in celiac disease are likely to be confined to the grass family.