Opinion Article - (2021) Volume 9, Issue 5

Design and engineering of electron transfer cofactors, proteins, and protein networks Biodesign for Bioenergetics
Joseph M. Brisendine*
 
Department of Physics, the City College of New York, New York, NY 10031, United States
 
*Correspondence: Joseph M. Brisendine, Department of Physics, the City College of New York, New York, NY 10031, United States

Published: 26-Nov-2021

Introduction

The accumulated results of thirty years of rational and computational de novo protein design have taught us important lessons about the stability, information content, and evolution of natural proteins. First, de novo protein design has complicated the assertion that biological function is equivalent to biological structure, demonstrating the capacity to abstract active sites from natural contexts and paste them into non-native topologies without loss of function. The structure-function relationship has thus been revealed to be either a generality or strictly true only in a local sense. Second, the simplification to “maquette” topologies carried out by rational protein design has also demonstrated that even sophisticated functions such as conformational switching, cooperative ligand binding, and light-activated electron transfer can be achieved with low-information design approaches. This is because for simple topologies the functional footprint in sequence space is enormous and easily exceeds the number of structures that could possibly have existed in the history of life on Earth. Finally, the pervasiveness of extraordinary stability in designed proteins challenges accepted models for the “marginal stability” of natural proteins, suggesting that there must be a selection pressure against highly stable proteins. This can be explained using recent theories that relate non-equilibrium thermodynamics and self-replication. This article is part of a Special Issue entitled Biodesign for Bioenergetics: The design and engineering of electron transfer cofactors, proteins and protein networks, edited by Ronald L. Koder and J.L. Ross Anderson.

The central tenets of the theory of biopolymer folding are implied by the “Anfinsen principle”: the information required to reach the native state is contained in the primary sequence of amino acids. The well-known experimental proof of this principle lies in the reversible nature of the folding process in dilute aqueous conditions: if information from an outside source were needed to specify the native state, then the protein would not spontaneously refold in isolation. This simple principle places strong thermodynamic constraints on the nature of the folding process, and it provided a starting point both for a more comprehensive thermodynamic picture of folding and for the development of lattice models of folding. It has also been known for some time, however, that this simple picture only strictly holds for small proteins and protein subunits, that some proteins require chaperones to reach their native state, and furthermore that some proteins are intrinsically disordered in their native state and thus do not “fold” at all in the sense defined by the theory.

The clearest implication of the Anfinsen principle is that protein folding is spontaneous in the appropriate environment and can be regarded as a phase transition from random-coil states with continuous energy spectra to an ordered or semi-ordered set of states characterized by discrete, well-separated energy levels. The transition from the unfolded state to the folded state must therefore have a negative free energy change whose magnitude is large enough, relative to thermal noise, for the folded state to be populated a sufficient fraction of the time to perform its biological function (a timescale which itself varies considerably for different proteins).
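The stability requirement can be illustrated numerically. The sketch below assumes a simple two-state (folded/unfolded) equilibrium, which the text does not commit to, and uses a hypothetical helper `fraction_folded` together with illustrative free energy values in kcal/mol. It shows how quickly the folded population saturates once the folding free energy is several times the thermal energy RT.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(delta_g_fold, temperature=298.15):
    """Equilibrium folded fraction in an assumed two-state model.

    delta_g_fold is G(folded) - G(unfolded) in kcal/mol, so a negative
    value favors the folded state.
    """
    rt = R_KCAL * temperature  # thermal energy, about 0.59 kcal/mol at 25 C
    return 1.0 / (1.0 + math.exp(delta_g_fold / rt))


# Illustrative values only; they are not taken from the article.
for dg in (0.0, -1.0, -5.0, -10.0):
    print(f"dG = {dg:6.1f} kcal/mol -> fraction folded = {fraction_folded(dg):.6f}")
```

With a folding free energy of zero the two states are equally populated, while a stabilization of only a few kcal/mol already leaves the unfolded state as a small minority.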

Computational models of folding

The first computational evidence for the present framework was provided by lattice models, in which a polymer chain is folded onto a 2D or 3D lattice and the energy of a structure is calculated directly from the energy of all contact pairs:

$$E = \sum_{i<j} E_{ij}\,\Delta(\mathbf{r}_i - \mathbf{r}_j)$$

where the delta function $\Delta(\mathbf{r}_i - \mathbf{r}_j)$ equals 1 when non-adjacent residues i and j are in contact and 0 otherwise, so that the sum runs over the non-adjacent i,j pairs in contact, and $E_{ij}$ is the energy of amino acid i in contact with amino acid j. Folding results from contact pairings in the native state, both between the residues of the chain and between the surface residues and solvent. The simplest “HP model”, in which each chain element is either Hydrophobic (H) or Polar (P), requires assigning only three potentials, corresponding to H-H, P-P, and H-P contacts between the chain elements. More detailed models with all 20 amino acids make frequent use of the Miyazawa-Jernigan matrix of contact potentials, which utilizes experimentally derived energies for all 20 × 20 pairwise interactions of the amino acids. With a set of contact potentials, the energy of any sequence in any structure can be readily computed, and there is consensus that these simple models confirm the predictions of the “thermodynamic hypothesis” and, further, generically reproduce many other features of natural proteins, such as native states with high degrees of symmetry and fast folding kinetics.
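As a minimal sketch of how such a contact energy can be computed, the Python example below evaluates an HP chain on a 2D square lattice. The function name `lattice_energy`, the contact energies in `CONTACT_ENERGY`, and the example conformation are assumptions chosen for illustration; they are not values from the article or from the Miyazawa-Jernigan matrix.

```python
# Hypothetical HP contact energies in arbitrary units (illustrative only).
CONTACT_ENERGY = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.0,
    ("P", "H"): 0.0,
    ("P", "P"): 0.0,
}


def lattice_energy(sequence, coords):
    """Sum E_ij over non-adjacent residue pairs on neighboring 2D lattice sites."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation must be self-avoiding")
    # Consecutive residues must occupy neighboring lattice sites.
    for (xa, ya), (xb, yb) in zip(coords, coords[1:]):
        if abs(xa - xb) + abs(ya - yb) != 1:
            raise ValueError("chain is not connected on the lattice")

    energy = 0.0
    for i in range(len(sequence)):
        # Start at i + 2 so that chain neighbors are never counted as contacts.
        for j in range(i + 2, len(sequence)):
            (xi, yi), (xj, yj) = coords[i], coords[j]
            if abs(xi - xj) + abs(yi - yj) == 1:  # the delta function is 1 here
                energy += CONTACT_ENERGY[(sequence[i], sequence[j])]
    return energy


# A six-residue chain folded into a compact 2 x 3 hairpin.
hairpin = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
for seq in ("HHPPHH", "HPPPPH", "PPHHPP"):
    print(seq, lattice_energy(seq, hairpin))
```

In this conformation only the residue pairs (0, 5) and (1, 4) are in non-adjacent contact, so the energy depends on which of those positions carry hydrophobic residues. Swapping `CONTACT_ENERGY` for a 20 × 20 table would extend the same loop to a full amino acid alphabet.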

One important concept to emerge from the analysis of these models is the notion of “designability,” defined for a particular structure S as the number of sequences which have that structure as their native state. The distribution of designabilities of different structures was found to deviate significantly from the expectation for a Poisson distribution, with many structures having designabilities orders of magnitude larger than the mean of the distribution. The implications of these findings for protein evolution have been discussed and debated extensively. Highly designable structures are, by definition, more tolerant of mutation and require less sequence information per amino acid to encode, suggesting a number of reasons that natural selection would have favored more designable structures. Additionally, in a purely random search of sequence space with no biasing on the part of the environment whatsoever, the probability of finding a structure S is simply $N_S/A^N$, the designability $N_S$ of that structure divided by the total number of sequences of equal length (length $N$ drawn from an alphabet of size $A$). If we now note that the requirement for reversible folding is equivalent, in Shannon's information theory terms, to the claim that the uncertainty about the structure given the sequence is zero, then we see that reversible folding is a noiseless channel