Research article
Calculation of the relative metastabilities of proteins using the CHNOSZ software package
Department of Earth and Planetary Science, University of California, Berkeley, CA 94720, USA
Geochemical Transactions 2008, 9:10 doi:10.1186/1467-4866-9-10
The electronic version of this article is the complete one and can be found online at: http://www.geochemicaltransactions.com/content/9/1/10
© 2008 Dick; licensee BioMed Central Ltd.
Abstract
Background
Proteins of various compositions are required by organisms inhabiting different environments. The energetic demands for protein formation are a function of the compositions of proteins as well as geochemical variables including temperature, pressure, oxygen fugacity and pH. The purpose of this study was to explore the dependence of metastable equilibrium states of protein systems on changes in the geochemical variables.
Results
A software package called CHNOSZ implementing the revised Helgeson-Kirkham-Flowers (HKF) equations of state and group additivity for ionized unfolded aqueous proteins was developed. The program can be used to calculate standard molal Gibbs energies and other thermodynamic properties of reactions and to make chemical speciation and predominance diagrams that represent the metastable equilibrium distributions of proteins. The approach takes account of the chemical affinities of reactions in open systems characterized by the chemical potentials of basis species. The thermodynamic database included with the package permits application of the software to mineral and other inorganic systems as well as systems of proteins or other biomolecules.
Conclusion
Metastable equilibrium activity diagrams were generated for model cell-surface proteins from archaea and bacteria adapted to growth in environments that differ in temperature and chemical conditions. The predicted metastable equilibrium distributions of the proteins can be compared with the optimal growth temperatures of the organisms and with geochemical variables. The results suggest that a thermodynamic assessment of protein metastability may be useful for integrating bio- and geochemical observations.
Background
Owing to the growing body of compositional data for microbial proteins and the exploration of environments that are extreme from the human standpoint, it has become possible in recent years to draw correlations between the compositions of proteins and environmental parameters such as temperature [1]. Accounting for the underlying causes of the observed correlations between environmental parameters and protein composition is an ongoing challenge. Biochemical approaches are based in part on the notion that proteins from thermophilic and hyperthermophilic organisms should have greater structural stabilities than their mesophilic counterparts [2]. Compositional features of thermophilic proteins that may enhance their structural stabilities include increased numbers of hydrophobic residues, stronger charge interactions on the protein surfaces, and other properties of the amino acid sequence [3]. However, it has also been suggested that, at least for sulfur, the elemental makeup of proteins is correlated with the chemical compositions of the environment [4]. This study was motivated by the desire to explore a possible thermodynamic explanation for the relationship between protein composition and the extracellular environment, which is shaped in part by geochemical constraints.
A thermodynamic assessment of protein metastability provides a framework for describing the relationship between geochemistry and protein composition that until now has received relatively little attention. The geochemical literature abounds with examples of theoretical calculation of the compositions of stable and/or metastable equilibrium reference states as a way to predict the distributions of, and reaction pathways among, minerals and inorganic or organic aqueous species [5,6]. In recent years, the calculation [7-11] and experimental investigation [12-14] of metastable equilibrium states in biogeochemical systems has gained traction.
The primary advantage of extending a framework of this type to proteins and other biomacromolecules is that it places biochemical reactions in the same context as observations on the inorganic systems to which microbial metabolic pathways are coupled. Temperature, pressure, oxidation state and pH are just some of the variables that are commonly measured in geochemical studies that also appear explicitly in the thermodynamic representation of protein metastability reactions. This study was undertaken in order to explore the thermodynamic relationships between geochemical variables and protein composition for model proteins from a number of organisms adapted to different environments. The cell-surface glycoproteins in archaea and the surface-layer proteins in bacteria [15,16] were chosen for this purpose because they are intimately associated with the extracellular aquatic and mineralogical setting. Because experimental values of the standard molal Gibbs energies of the model proteins were not available, they were calculated using previously reported group additivity and equations of state algorithms that are referenced to ionized unfolded aqueous proteins [17,18]. These values are requisite for calculating the composition of the metastable equilibrium state in an open system described by chemical potentials of basis species, or perfectly mobile components [19-22]. The predicted chemical activities of species can then be displayed on chemical predominance and/or speciation diagrams whose axes correspond to intensive chemical variables. Because of the lack of integration of algorithms for calculating thermodynamic properties of proteins in available geochemical equilibrium software packages, the task of calculating and graphically representing the metastable equilibrium distributions of the proteins was managed through development of the CHNOSZ software package, which is introduced in this study. 
The implementation of the thermodynamic algorithms and data into the package is described first below. The results of the calculations for the model system of proteins are then described and are displayed primarily in the form of diagrams depicting the calculated metastable equilibrium distributions of the proteins. The graphical depictions shown below are only limited portrayals of the metastable equilibrium states of systems of proteins, which are in fact multidimensional functions of thermodynamic variables. The predicted response of at least one of the metastability reactions between proteins from hyperthermophilic and mesophilic organisms appears to be aligned with the differences in temperature, pressure and oxidation state between their environments. However, more tests in other systems will be required to assess the generality of the approach. Some potential implications of the findings are addressed briefly in the concluding remarks, and the paper is finished with a section devoted to the methods adopted for writing protein metastability reactions and computing their thermodynamic properties.
Implementation
The CHNOSZ software package consists of source code, data files, and documentation. It is written for the cross-platform R software environment [23]. The package can be freely downloaded from the project website at http://www.chnosz.net. The features of the package, its basic program structure, and the thermodynamic database are summarized in the following paragraphs.
Features
CHNOSZ was developed in order to ease calculations of 1) the standard molal thermodynamic properties of chemical species and reactions as a function of temperature and pressure, 2) the standard molal thermodynamic properties and equations of state parameters of neutral and ionized proteins using group additivity algorithms, 3) the chemical affinities of formation reactions of species of interest from basis species describing the system, and to assist in 4) generating metastable equilibrium activity diagrams for systems of biomolecules and/or other species. The functions provided in CHNOSZ are suitable for either interactive use or scripted operation. The diagrams that are produced can be viewed on screen or saved as postscript files. Because the thermodynamic database includes the chemical formulas of species in addition to their standard molal thermodynamic properties, functions operating on user-input chemical reactions have the option to check, and possibly automatically correct, the mass balance of the reactions. This feature can speed up user interaction with the program and the writing of program scripts.
The program has been designed with features in mind and is not presently optimized for speed. Most of the diagrams shown below can be produced in under a minute, but temperature-pressure diagrams of the same resolution require substantially more computational time, owing to the number of times the equations of state subroutines are called. The package was developed with the goal of analyzing protein reactions, but the range of systems that can be studied using the software is limited only by the species available in the thermodynamic database, to which the user can make either temporary or persistent additions or updates. Complete documentation of the functions, including examples derived from the geochemical literature and this study, is provided with the package. Usage of the major functions in CHNOSZ is summarized below.
Standard molal properties
The relationships among the primary functions provided in CHNOSZ and some of the accessory functions are depicted in the flowchart shown in Fig. 1. Calculation of the standard molal thermodynamic properties of species and chemical reactions as a function of temperature and pressure is implemented in the primary function subcrt. The name of this function is a variation of the name of the SUPCRT92 software package [24]. The temperature and pressure ranges of calculations possible using subcrt are the same as those for SUPCRT92.
The accessory function water implements two computational options for calculating the thermodynamic and electrostatic properties of liquid H2O as a function of temperature and pressure. The first of these options provides an interface to the FORTRAN subroutine named H2O92D.F that was distributed with SUPCRT92 [24] and that is included in the CHNOSZ source package. The calculation of the properties of liquid H2O in this case is consistent with data and equations from Refs. [25-27] and others (see Ref. [24]). The stated temperature and pressure limits of applicability for these calculations, described in Ref. [24], are from 0.01°C and PSAT (i.e., 1 bar at temperatures below 100°C and the saturation vapor pressure of H2O at higher temperatures) to 2250°C and 30000 bar. However, electrostatic properties of the solvent, which are required by the revised Helgeson-Kirkham-Flowers (HKF) equations of state for aqueous species, cannot be computed above 1000°C and 5000 bar. An alternative computational option for the properties of liquid H2O corresponds to the IAPWS-95 formulation for thermodynamic properties [28] coupled with equations for electrostatic properties taken from Ref. [29].
The functions denoted by eos in Fig. 1 actually consist of two functions: hkf, for calculating as a function of temperature and pressure the standard molal thermodynamic properties of aqueous species using the revised HKF equations of state [30-33], and cgl, for calculating the properties of crystalline, gaseous and liquid (except H2O) species. The heat capacity equation implemented in CHNOSZ for these species contains up to six terms, as used in Ref. [34]; the first three terms are those in the Maier-Kelley equation [35,36], which is used in the SUPCRT92 package. The accessory function info provides a bridge between the thermodynamic and protein databases and the other functions.
The function known as makeup is concerned with conversion between various computer- and human-readable representations of the chemical compositions of species. Its primary purpose is to transform the chemical formulas of species contained in the thermodynamic database (e.g., 'C4H6NO4-' for aspartate) into dataframe objects (which in R are similar to matrices with named columns and rows) so that other functions or makeup itself can perform further calculations on the stoichiometries of species. This function is also responsible for transforming a compositional dataframe back into a one-line chemical formula, and for calculating the reaction coefficients of basis species in formation reactions of the species of interest. It is with the aid of this function that subcrt checks whether a user-input chemical reaction is balanced with respect to mass and charge and automatically corrects the reaction if the necessary basis species have been defined. Examples of the usage of the info and subcrt functions are shown in the program transcript in Fig. 2. The standard molal thermodynamic properties at 25°C and 1 bar and the equations of state parameters of chicken lysozyme (LYSC_CHICK, accession no. P00698 in the Swiss-Prot database [37]) can be retrieved using the code shown in Fig. 2a. The properties and parameters whose values appear in the example are standard molal Gibbs energy (ΔG°) and enthalpy (ΔH°) of formation from the elements (cal mol-1), standard molal entropy (S°), heat capacity (
Chemical affinities and metastability diagrams
The primary function subcrt and the related accessory functions permit calculation of the standard molal Gibbs energies of protein formation reactions and corresponding values of the equilibrium constants (Kr in Eqn. M7). Calculation of the activity products and chemical affinities of reactions (Qr and Ar in Eqn. M7) is implemented in the sequence of primary functions basis, species, affinity that is depicted in Fig. 1.
Two conditions are required of a valid set of basis species in CHNOSZ: 1) the number of basis species is equal to the number of elements (and charge, if present); 2) the stoichiometric matrix denoting the elemental composition (and charge, if present) of the basis species, which is square according to condition (1), is non-singular and has a real inverse. These two conditions ensure that a formation reaction for any species of interest in the system can be written using only positive or negative real numbers as reaction coefficients on the basis species. The basis species themselves can be any species that are present in the thermodynamic database, including nonionized proteins.
The function basis also permits redefining the physical states of basis species (if a corresponding species in that state is present in the thermodynamic database) and/or setting the activities (a) or fugacities (f) of the basis species to be used in the following calculations. These values have default settings given by log a = -3 for aqueous species, log f = 0 for gases and log a = 0 for other species. The function basis can also be used to assign a buffer to one or more basis species so that the activities or fugacities of those basis species are taken from the buffer system. After defining the basis species, the user can select any number of species of interest using the primary function species.
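The two conditions on the basis species can be illustrated with a small sketch. It is not code from CHNOSZ (which is written in R); it is a Python illustration, using exact fractions, of how a square, non-singular stoichiometric matrix yields unique formation-reaction coefficients. The basis species are those named in the Methods of this paper; the function name solve_basis, the use of "Z" as a label for charge, and the choice of glycine as a test species are assumptions made for the example.

```python
from fractions import Fraction

ELEMENTS = ["C", "H", "N", "O", "S", "Z"]  # "Z" is a label for charge (assumed here)
BASIS = {
    "CO2": {"C": 1, "O": 2},
    "H2O": {"H": 2, "O": 1},
    "NH3": {"N": 1, "H": 3},
    "H2S": {"H": 2, "S": 1},
    "H+": {"H": 1, "Z": 1},
    "O2": {"O": 2},
}

def solve_basis(basis, elements, species):
    """Coefficients nu_i such that sum_i nu_i * (basis species i) = species.

    A negative coefficient means the basis species is released rather than
    consumed when the species is formed.
    """
    names = list(basis)
    n = len(elements)
    if len(names) != n:
        raise ValueError("condition 1 fails: one basis species per element (and charge)")
    # Augmented matrix: one row per element, one column per basis species,
    # last column holds the composition of the species of interest.
    rows = [[Fraction(basis[b].get(e, 0)) for b in names] + [Fraction(species.get(e, 0))]
            for e in elements]
    for col in range(n):  # Gauss-Jordan elimination with exact arithmetic
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("condition 2 fails: stoichiometric matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return {name: rows[i][n] for i, name in enumerate(names)}

# Glycine, C2H5NO2: 2 CO2 + H2O + NH3 -> C2H5NO2 + 3/2 O2
glycine = solve_basis(BASIS, ELEMENTS, {"C": 2, "H": 5, "N": 1, "O": 2})
```

Removing any one basis species from BASIS makes the function raise an error for condition (1), and replacing one with a linear combination of the others triggers condition (2).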
The user may also call species to remove species or to alter the chemical activities or fugacities of the species of interest to be used in the calculations of chemical affinity. These values default to log a = -3 for aqueous species, log f = 0 for gases and log a = 0 for other species. The function affinity permits calculation of log Qr and Ar of formation reactions (such as those represented generically by Reaction M1) using Eqn. (M7) taking into account the activities and/or fugacities of the basis species and the species of interest. The contributions of the Qr and Kr terms to the calculation are denoted conceptually in Fig. 1 by the two arrows, from the top and left, respectively, pointing toward the box labeled affinity. The calculations of chemical affinity can be carried out at a single point in temperature, pressure, chemical activity space, or as a function of one or two of T, P and logarithms of chemical activity or fugacity of the basis species. The accessory function buffer is invoked by affinity if one or more basis species were previously associated with a buffer system; the activities or fugacities of the basis species constrained in this way are then used by the program to calculate log Qr using Eqn. (M5). The results of the calculations performed by affinity are accepted as input by diagram, which produces the diagrams using plotting functions provided in the R distribution. Many options are available for adding labels and legends and otherwise customizing the plot style.
Thermodynamic database
The database of thermodynamic properties packaged with CHNOSZ is contained in a file named OBIGT.csv. Work on this database was motivated by a software project developed by H. C. Helgeson and coworkers, named OrganoBioGeoTherm, that provides a Windows interface to the SUPCRT92 program (J. J. Donovan, personal communication).
The thermodynamic data file has records for over 2500 inorganic, organic and biochemical crystalline, gaseous, liquid and aqueous species. The thermodynamic data were originally taken from the data file distributed with the SUPCRT92 package. Updates since that time were taken from the SLOP98 data file downloaded from http://geopig.asu.edu and from recent reports of thermodynamic data and revised HKF equations of state parameters for aqueous inorganic and organic species, as well as proteins and other species of biogeochemical interest [38-40, and others]. The records in the data file include the names, states and chemical formulas of the species, up to two literature citations, and values of the standard molal thermodynamic properties at 25°C and 1 bar and equations of state parameters. The comma-separated-value (.csv) file format permits rapid reading of the data file by the CHNOSZ program or other software as well as addition to or modification of the file contents by the user. The CHNOSZ package also provides utility functions that can be used to export or import thermodynamic data to or from the SUPCRT92 data file format.
The data file protein.csv of amino acid compositions of proteins has records for over 200 proteins including those referred to in the present study. The user can add the composition of a protein to CHNOSZ by modifying this file, or at run time by inputting the amino acid composition of the protein at the command line or requesting a search of the online Swiss-Prot database (http://www.expasy.org) [37] through the function called protein.
Results
The model cell-surface proteins used in this study are listed in Table 1. The selected organisms were chosen to represent diverse geochemical environments. It can be seen from the optimal growth temperatures given in Table 1 that three of the organisms (M. jannaschii, M. sociabilis and M. fervidus) are hyperthermophilic, others such as M. voltae are mesophilic, and one organism (M. burtonii) is psychrotolerant. The chemical formulas and standard molal Gibbs energies of the proteins shown in Table 1 are those calculated for the nonionized aqueous proteins. Although the real proteins form crystalline or paracrystalline lattices on the cell surface [41], we are restricted at this time to using an aqueous group additivity model for lack of a crystalline analog. The present formulation is also restricted to the polypeptide molecules of proteins and does not take account of the presence of the carbohydrate chains in the glycoproteins. The standard molal Gibbs energies of ionized proteins were calculated in the present study by combining those of the nonionized proteins with ionization contributions (see Ref. [18] and the Methods).
Table 1. Model proteins used in the present study.
The relative metastabilities of the model proteins were calculated as a function of temperature, pressure and chemical activities or fugacities of basis species. Results of the calculations are presented below primarily on metastable equilibrium activity diagrams depicting either the predominant protein species as a function of two intensive variables, or on speciation diagrams showing the metastable equilibrium chemical activities of proteins as a function of a single variable. The computations were carried out using the CHNOSZ software package together with a program script for use with the package that is provided in Additional File 1.
Additional file 1. Program script for generating figures. This text file contains the program script used to make the diagrams in Figs. 3, 4, 5, 6, 7. Use the commands listed at the top of the file to generate one or all of the figures on screen or in postscript format.
Predominance diagrams
To assess the relative metastabilities of surface-layer proteins from different organisms as a function of temperature, pressure and oxidation state, we can first write a reaction between the cell-surface proteins from M. voltae and M. jannaschii as Reaction 1, which is a specific statement of Reaction M2 for the ionized proteins. The coefficient in front of each of the protein formulas is the reciprocal of the number of amino acid residues in the corresponding protein. Hence, protein length is conserved in Reaction 1. Let us now write a specific statement of Eqn. (M8) for Reaction 1 as Eqn. (2), where R stands for the gas constant and log K1 and A1 denote, respectively, the logarithm of the equilibrium constant and the chemical affinity of Reaction 1. The equal-activity boundary shown in Fig. 3a between CSG_METVO and CSG_METJA is consistent with metastable equilibrium between the proteins, or A1 = 0. The location of the boundary can be calculated by combining Eqn. (2) with A1 = 0, the equilibrium constant of the reaction, and the reference activities of the basis species and proteins. In this study, the reference activities of the proteins were set to 10⁻³ and those of the basis species set to the values listed in the Methods.
In Reaction 1 it can be noted that O2(g) appears on the same side of the reaction as
Figure 3a was generated in CHNOSZ using a sequence of commands similar to the following (Example 3). The complete program script for this and the other figures is provided in Additional File 1.
Execution of the first command shown in Example 3 defines the basis species characterizing the chemical system. Here, 'CHNOS+' is a keyword that identifies the basis species used in this paper and that appear in Reaction 1. The second command defines the species of interest, corresponding to the proteins listed in Table 1. With the third command, the chemical affinities of the formation reactions of each of the proteins are calculated on a two-dimensional grid as a function of pH and log
The approach used in CHNOSZ to make predominance diagrams does not rely on writing metastability reactions as represented by Reaction 1 but instead on using formation reactions for the proteins. For example, a specific statement of Reaction M1 for CSG_METJA in its computed ionization state at 25°C, 1 bar and pH 7 is Reaction 4. Using CHNOSZ, the chemical affinities of Reaction 4 and its counterparts for any other specified proteins of interest are first computed using Eqn. (M7). The chemical affinities of the formation reactions are then compared with one another to determine the theoretically predominant protein given the input conditions, which is the one with the highest chemical affinity of formation per residue. In this way, it is possible to generate predominance diagrams like those shown in Figs. 3a and 3b for any number of proteins. The diagram shown in Fig. 3a was produced using all ten proteins listed in Table 1, but only some of the proteins predominate at different points in the diagram. Removing these proteins from consideration leads to the results shown in Fig. 3b, where the metastability relationships among some of the less metastable proteins are depicted.
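The comparison step described above, in which the predominant protein is the one with the highest chemical affinity of formation per residue, can be sketched in a few lines. This is a Python illustration rather than the R code of CHNOSZ. The protein lengths (530 and 553 residues) are those implied by the residue activities quoted in this paper for protein activities of 10⁻³; the affinity values are invented purely for illustration, and the function name predominant_protein is hypothetical.

```python
def predominant_protein(affinity, length):
    """Return the name with the highest affinity of formation per residue,
    together with the per-residue values used in the comparison."""
    per_residue = {name: affinity[name] / length[name] for name in affinity}
    return max(per_residue, key=per_residue.get), per_residue

# Lengths implied by the residue activities quoted in the text.
length = {"CSG_METJA": 530, "CSG_METVO": 553}
# Hypothetical affinities at one grid point, expressed as A/(2.303RT); NOT values from the paper.
affinity = {"CSG_METJA": 1060.0, "CSG_METVO": 1050.7}

winner, per_residue = predominant_protein(affinity, length)
```

Repeating this comparison at every point of a two-dimensional grid of conditions would give the field labels of a predominance diagram.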
Chemical activity (speciation) diagrams
To calculate the chemical activities of proteins in metastable equilibrium, let us consider two ways of writing the formulas of proteins in chemical reactions. The first is represented in Reaction 1 above, in which the whole formulas of the proteins are entered. If the conditions are such that metastable equilibrium between the proteins in this reaction corresponds to activities of the proteins each equal to 10⁻³, we have in Eqn. (2)
Let us propose to write the formulas of proteins in metastability reactions as residue equivalents instead of whole protein formulas. The chemical formula or any standard molal thermodynamic property of a residue equivalent of a protein is defined to be that of the protein divided by the length of the protein. In contrast, assuming activity coefficients of proteins and residue equivalents to be unity, the chemical activity of the residue equivalent of the jth protein (a_residue,j) is equal to the chemical activity of the protein (a_j) multiplied by the length of the protein (n_j):
a_residue,j = n_j × a_j. (5)
We can rewrite Reaction 1 in terms of the residue equivalents of the proteins as Reaction 6. In Reaction 6, the coefficients on the reactant and product residue equivalents are both set to unity. Hence, in both Reactions 1 and 6 protein length is conserved. Using Eqn. (M8) we can write Eqn. (7) for Reaction 6. Let us now consider conditions such that the metastable equilibrium activities of the proteins are each equal to 10⁻³. From Eqn. (5) we have a_residue,CSG_METJA = 0.530 and a_residue,CSG_METVO = 0.553, so log (a_residue,CSG_METJA/a_residue,CSG_METVO) = -0.018. Now, if log
The diagram shown in Fig. 5b was actually constructed using CHNOSZ by taking account of the formation reactions of residue equivalents of the proteins, instead of the metastability reaction represented by Reaction 6. To demonstrate this procedure, let us write the formation reaction for the residue equivalent of CSG_METVO as Reaction 8 and that for the residue equivalent of CSG_METJA as Reaction 9. Specific statements of Eqn. (M8) for Reactions 8 and 9 are Eqns. (10) and (11), respectively. At metastable equilibrium, A8 = A9, i.e. the chemical affinities of the formation reactions of the residue equivalents are equal. Values of log K8 = -367.714 and log K9 = -379.687 can be obtained using standard molal Gibbs energies at 25°C and 1 bar of the basis species and of the ionized proteins at pH 7 (see Fig. 4b).
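As a quick numerical check of Eqn. (5), the residue activities and the logarithm of their ratio quoted above can be reproduced as follows. This is a Python sketch; the protein lengths of 530 and 553 residues are inferred from the quoted residue activities and the stated protein activity of 10⁻³, and the function name residue_activity is hypothetical.

```python
import math

def residue_activity(protein_activity, length):
    """Eqn. (5): activity of the residue equivalent = length x protein activity."""
    return length * protein_activity

a_res_ja = residue_activity(1e-3, 530)  # CSG_METJA, length inferred from the text
a_res_vo = residue_activity(1e-3, 553)  # CSG_METVO, length inferred from the text
log_ratio = math.log10(a_res_ja / a_res_vo)  # rounds to -0.018, as quoted in the text
```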
Let us also substitute the reference activities of the basis species described in the Methods and log
There are three unknowns in Eqns. (12) and (13). Conservation of protein length leads to a third equation, Eqn. (14), where the value on the right-hand side corresponds to initial activities of the proteins each equal to 10⁻³. Solving Eqns. (12)–(14) gives
The addition of any protein to the system increases by one the number of unknowns in Eqn. (14) but also provides another equation in the form of Eqns. (12) and (13). The procedure to set up and solve these equations has been encoded in a general form in CHNOSZ and was used to produce the diagrams shown in Fig. 5. The CHNOSZ program includes options to analyze the protein formation reactions using whole protein formulas or their residue equivalents, which were used to construct Figs. 5a and 5b, respectively. The logarithm of total activity of protein residues is 0.8211 in each of these figures, which corresponds to the sum of the activities of the residue equivalents of the ten model proteins whose starting activities are 10⁻³.
Another way of representing the chemical speciation in a protein system is on a degree of formation diagram. The degree of formation of the kth protein (α_k) can be calculated from
Additional file 2. Degree of formation diagram. This file contains the degree of formation diagram related to Fig. 5b (see text) together with the program script used to make the figure. This additional material is the source of the graphical abstract for this paper.
The residue-equivalent approach was used in this study only to produce the diagrams shown in Fig. 5b and Additional File 2. The predominance diagrams shown elsewhere were produced using whole protein formulas in the formation reactions.
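The structure of the calculation described above (equal affinities of the formation reactions of the residue equivalents, plus conservation of the total activity of residues) can be sketched generally. This is a Python illustration under stated assumptions, not the CHNOSZ implementation: each formation reaction is taken to have a coefficient of one on the residue equivalent, so that A_j/(2.303RT) = x_j - log a_residue,j, where x_j collects log K_j and the basis-species terms. Setting all A_j equal and fixing the sum of the residue activities then gives a_residue,j proportional to 10^x_j. The x values below are invented for illustration; the total of 1.083 is the sum 0.530 + 0.553 implied by the text for the two-protein case.

```python
def equilibrium_residue_activities(x, total):
    """Residue activities that equalize the formation affinities.

    x[j]  : A_j/(2.303RT) evaluated at unit activity of residue equivalent j
            (log K_j minus the basis-species part of log Q_j); assumed inputs.
    total : conserved sum of the activities of the residue equivalents.
    """
    shift = max(x.values())  # subtract the largest exponent to avoid overflow
    weights = {name: 10.0 ** (value - shift) for name, value in x.items()}
    norm = sum(weights.values())
    return {name: total * w / norm for name, w in weights.items()}

# Hypothetical x values (NOT from the paper); total from 0.530 + 0.553.
x = {"CSG_METJA": -2.0, "CSG_METVO": -3.0}
a_res = equilibrium_residue_activities(x, total=1.083)

# Protein activities follow from Eqn. (5) by dividing by the protein lengths.
lengths = {"CSG_METJA": 530, "CSG_METVO": 553}
a_protein = {name: a_res[name] / lengths[name] for name in a_res}
```

Adding another protein adds one entry to x and one unknown, which mirrors the statement above that each additional protein contributes one more equation of the same form.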
Extending the residue-equivalent method to these diagrams would subtly alter the positions of the predominance field boundaries, more so for reactions between proteins that differ significantly in length. The differences in the locations of the predominance field boundaries can be assessed in part by comparing the locations of the crossover between predominant proteins in Figs. 5a and 5b.
Temperature and pressure diagrams
The approach described above for constructing Fig. 3 in CHNOSZ was used to produce the diagrams shown in Figs. 6a and 6b. These diagrams portray the metastabilities among the predominant model proteins as a function of temperature or pressure and log
We can recover nominal values of log
It appears in Fig. 6b that increasing pressure also generally favors those proteins in lower oxidation states, but that the dependence of equilibrium log
Proteins as chemical activity buffers
The chemical activities of basis species buffered by reacting protein assemblages correspond to the locations of the (pseudo)invariant points on metastable equilibrium predominance diagrams. Equal activities of three proteins correspond to the triple point, which is a pseudoinvariant point, in the predominance diagram shown in Fig. 3b. The number of independent variables on the axes of this diagram is two; in an eight-dimensional predominance diagram (of temperature, pressure and six chemical activities) one could distinguish the true invariant points in this system where nine proteins coexist with equal metastable equilibrium activities.
Let us ask what are the activities of CO2(aq), H2O, NH3(aq) and H2S(aq) if they are buffered by a hypothetical metastable assemblage made up of the proteins from the METXX organisms listed in Table 1, at T = 100°C, P = 1000 bar, pH 7, log
A rearranged statement of Eqn. (M8) for this reaction can be written as a matrix equation, Eqn. (18), where the rows on the right-hand side and in the stoichiometric matrix on the left-hand side correspond to the proteins from the METXX organisms listed in Table 1. Solving Eqn. (18) gives
The pseudoinvariant point representing the buffer assemblage described above is shown in Figs. 7a and 7b. The same pseudoinvariant point is present in both figures, but different variables are projected onto each diagram. The temperature-pressure relationships appearing in Fig. 7a suggest that metastability of CSG_METJA increases relative to that of CSG_METVO with increasing temperature and/or pressure, but that the sensitivity to temperature is much greater than that to pressure. These relationships are also apparent in Figs. 6a and 6b. In the projection of Fig. 7a not all of the proteins at the pseudoinvariant point are visible, but in Fig. 7b convergence of the five predominance fields is apparent. Note the similarity in Figs. 7b and 3a of the reaction boundary between CSG_METVO and CSG_METJA, as well as the nearly horizontal boundary between CSG_METSC and CSG_METFE, which would be expected from the closeness of their ionization states as a function of pH (see Fig. 4a for the ionization states at 25°C).
Concluding remarks
A computer program called CHNOSZ was introduced in this paper for producing metastable equilibrium chemical activity diagrams for proteins. The methods used here were borrowed from geochemistry, and the program with the accompanying thermodynamic database is suitable for performing thermodynamic calculations in inorganic and mineral systems as well as organic and biochemical systems, or combinations thereof. To investigate the utility of the program for a geochemical description of protein reactions, metastability diagrams were produced for surface-layer proteins from a number of bacteria and archaea. The diagrams show either the metastably predominant proteins as a function of two intensive variables or the metastable equilibrium chemical activities of proteins as a function of one variable. The primary variables of interest in this study were log
In the preceding sections we have considered the theoretical metastable equilibrium relationships among only a few model proteins. Because the software is now available to do so, a plethora of predictions concerning the energetically favorable outcomes of any number of overall protein mutation reactions is now within reach. Consideration of the results presented above, and of the wide range of model systems that could potentially be investigated in a similar manner, leads to the conclusion that the metastable equilibrium distribution of proteins in many cases does not mirror geobiochemical reality. Nevertheless, the ability to quantify the characteristics of metastable equilibrium reference states as a function of geochemical variables may be of utility in identifying specific pathways in evolution where the resulting proteins are relatively energetically favored. These particular outcomes may reflect a tendency for natural selection to increase the fit between phenotypes and their environments [46].
A thermodynamic and geochemical perspective on the relative metastabilities of proteins permits a quantitative integration of observations on the geosphere and biosphere. This study has only touched the surface of the myriad possible environments and organisms, the properties and chemical compositions of which are becoming better constrained through experiment and observation. As these data grow in abundance, they will provide other opportunities where thermodynamic description of the chemical speciation of proteins can be tested and calibrated.

Methods

The thermodynamic conventions and relations used to compute the relative metastabilities of proteins in the present study are summarized below. The computational assessment depends first on the adoption of standard states for the species appearing in chemical reactions.

Standard state conventions

The standard state convention adopted for aqueous species other than H2O corresponds to unit activity of a hypothetical one molal solution referenced to infinite dilution at any temperature and pressure [30,47]. The conventional standard molal thermodynamic properties of both the aqueous electron and proton are taken to be zero at all temperatures and pressures [48]. For gases, the standard state convention is unit fugacity of the hypothetical pure ideal gas at 1 bar and any temperature. The standard state convention adopted for solids and liquids, including H2O, corresponds to unit activity of the pure substance at any temperature and pressure.

Protein formation and metastability reactions

The compositions of species of interest, such as proteins, are represented by linear combinations of the compositions of basis species in a system (for an application in geochemical systems, see Ref. [49]). The number of basis species is the minimum required to write formation reactions for all possible species of interest.
There are no thermodynamic restrictions on the actual identities of the basis species, and the basis species do not necessarily correspond to thermodynamic components in the system of interest [50]. Hence, the choice of basis species may be constrained by the chemical activities that can be measured in a system or that are thought to behave as perfectly mobile components [22]. The basis species used in the present study are CO2(aq), H2O, NH3(aq), H2S(aq), H+ and O2(g). Let a generic chemical formula for the jth ionized protein be written as

The reaction coefficients on the basis species in Reaction M1 are completely determined by the chemical formulas of the protein and of the basis species. Depending on the sign of the coefficients in front of the basis species, they would appear in specific statements of Reaction M1 as reactants or products. A generic metastability reaction between two proteins (j = 1 and j = 2) can be written as

which corresponds to the difference between specific statements of Reaction M1 for j = 2 and j = 1, divided by n2 or n1, respectively. Here, 1/n1 and 1/n2 denote the conservation coefficients for the corresponding proteins. Reaction M2 is balanced with respect to mass and charge for any values of n1 and n2. If n1 = n2 = 1, Reaction M2 denotes the mass balance constraints for the formation of one mole of product protein at the expense of one mole of reactant protein. Other values may be chosen for n1 and n2, depending on what is specified about the conservation constraints in the system. For example, if n1 = C1 and n2 = C2, the protein metastability reaction conserves carbon [18] (i.e., the coefficient on CO2(aq) in Reaction M2 becomes zero). The protein metastability reactions considered in the present study are written for nj equal to the length of the jth protein.
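The statement that the coefficients in Reaction M1 are completely determined by the chemical formulas can be illustrated with a short element and charge balance. The sketch below is written in Python and is not taken from CHNOSZ; the function names, the dictionary layout and the sign convention (a positive value means the basis species is consumed in forming one mole of the species) are assumptions made here for illustration. It assumes a species containing only C, H, N, O and S with net charge Z, and the six basis species named above.

```python
from fractions import Fraction

def formation_coefficients(C, H, N, O, S, Z=0):
    """Moles of each basis species consumed to form one mole of a species
    with formula C_C H_H N_N O_O S_S and net charge Z (assumed convention)."""
    co2 = Fraction(C)        # carbon balance: only CO2 carries C
    nh3 = Fraction(N)        # nitrogen balance: only NH3 carries N
    h2s = Fraction(S)        # sulfur balance: only H2S carries S
    h_plus = Fraction(Z)     # charge balance: only H+ carries charge
    # hydrogen balance: 3*NH3 + 2*H2S + 2*H2O + H+ = H
    h2o = (Fraction(H) - 3 * nh3 - 2 * h2s - h_plus) / 2
    # oxygen balance: 2*CO2 + H2O + 2*O2 = O
    o2 = (Fraction(O) - 2 * co2 - h2o) / 2
    return {"CO2": co2, "H2O": h2o, "NH3": nh3,
            "H2S": h2s, "H+": h_plus, "O2": o2}

def metastability_coefficients(formula1, n1, formula2, n2):
    """Basis-species coefficients for (1/n1) species 1 -> (1/n2) species 2,
    taken as the difference of the two formation reactions."""
    c1 = formation_coefficients(**formula1)
    c2 = formation_coefficients(**formula2)
    return {name: c2[name] / n2 - c1[name] / n1 for name in c1}

# Small molecules stand in for proteins to keep the arithmetic checkable.
glycine = {"C": 2, "H": 5, "N": 1, "O": 2, "S": 0}
alanine = {"C": 3, "H": 7, "N": 1, "O": 2, "S": 0}
print(formation_coefficients(**glycine))
# Choosing n equal to the carbon number removes CO2 from the reaction.
print(metastability_coefficients(glycine, 2, alanine, 3))
```

For glycine (C2H5NO2) the balance gives 2 CO2 + NH3 + H2O - 3/2 O2 on the reactant side. With n1 and n2 set to the carbon numbers, the CO2 coefficient in the difference reaction is zero, which is the carbon-conserving case described in the text.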
Relation of reaction energetics to activities of basis species

The standard Gibbs energy of the rth formation or metastability reaction ( where The equilibrium constant, like where ai represents the chemical activity of the ith species in the reaction. For gaseous species, ai in Eqn. (M5) is replaced by the fugacity of the species (fi). Activity and fugacity coefficients are taken in a first approximation in this study to be unity. The activity or fugacity of the ith aqueous or gaseous component is related to its chemical potential (μi) by [6] where The chemical affinities of reactions (Ar) can be computed from [51]

Ar = 2.303RT log (Kr/Qr), (M7)

which can be combined with Eqn. (M5) to write for Reaction M2 In an equilibrium state, Ar = 0 for metastability reactions and Eqn. (M8) reduces to the logarithmic analog of the law of mass action equation for Reaction M2.

Reference activities of basis species and proteins

The reference temperature and pressure correspond to 25°C and 1 bar, respectively. The reference chemical activities of basis species used in this study are given by log

Equations of state

The standard molal thermodynamic properties of aqueous species as a function of temperature and pressure can be evaluated using the revised Helgeson-Kirkham-Flowers (HKF) equations of state [30-33,54,55]. The temperature dependence of the standard molal thermodynamic properties of crystalline, gaseous and liquid species other than H2O is calculated using a standard equation for heat capacity [34,35,56]. For the basis species other than H+ and e-, values of the standard molal thermodynamic properties and of the equations of state parameters were taken from Refs. [55,57] (CO2(aq), NH3(aq) and H2S(aq)) and [58,59] (O2(g)). The equations of state adopted for liquid H2O in the present study are those used in the SUPCRT92 software package [24].
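The affinity relation of Eqn. (M7), introduced earlier in this section, can be sketched numerically. The Python fragment below is an illustration only and is not CHNOSZ code. It assumes that log Qr is the stoichiometry-weighted sum of log activities (positive coefficients for products, negative for reactants), that activity coefficients are unity as stated in the text, and that R is expressed in cal K^-1 mol^-1. The function names and the example reaction with its numbers are hypothetical.

```python
import math

R_CAL = 1.9872  # gas constant, cal K^-1 mol^-1

def log_activity_product(stoich, log_act):
    """log Q = sum of nu_i * log a_i (nu_i > 0 for products, < 0 for reactants)."""
    return sum(nu * log_act[name] for name, nu in stoich.items())

def affinity(log_k, stoich, log_act, temp_k=298.15):
    """Chemical affinity A = ln(10) * R * T * log(K/Q), in cal per mole of reaction.
    ln(10) is the factor written as 2.303 in Eqn. (M7)."""
    log_q = log_activity_product(stoich, log_act)
    return math.log(10) * R_CAL * temp_k * (log_k - log_q)

# Hypothetical reaction X -> Y with made-up values, for illustration only.
stoich = {"X": -1, "Y": 1}
log_act = {"X": 0.0, "Y": -3.0}
print(affinity(-3.0, stoich, log_act))  # log Q equals log K, so the affinity is zero
print(affinity(-1.0, stoich, log_act))  # K > Q, so the affinity is positive
```

A positive affinity indicates that the reaction as written is energetically favored at the specified activities, and a value of zero corresponds to the equilibrium condition described for Eqn. (M8).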
Group additivity algorithms for ionized proteins

The standard molal properties and revised HKF equations of state parameters of ionized proteins are calculated in the present study using group additivity algorithms and data taken from Ref. [18] and outlined briefly below. The standard molal Gibbs energy of the jth unfolded protein with net charge denoted by Zj ( where where, for the ith type of ionizable sidechain or backbone group, ni,j represents the number of moles of the group in one mole of protein, αi denotes the degree of ionization of the group (0 < αi < 1), and Although where Zi denotes the charge (+1 or -1) of the ith ionized group and αi (also in Eqn. M10) is given by where pKi represents the negative logarithm of the equilibrium constant for the deprotonation reaction of the ith ionizable group. For a protein composed of a single polypeptide chain, the values of where represents the total number of amino acid residues, or length of the protein. Values of

The thermodynamic properties of unfolded aqueous proteins calculated using the above equations are taken in a first approximation to be representative of the proteins of interest, which may be folded and/or present in crystalline form in cells. Two observations lend support to the applicability of the unfolded protein reference state for the present calculations: 1) The standard molal Gibbs energies of protein folding would tend to cancel each other in metastability reactions, in which proteins appear on both sides of the reaction. 2) The Gibbs energy of unfolding for a small to average-sized protein is about two or three orders of magnitude smaller than the standard molal Gibbs energy for the unfolded protein itself. For example, the Gibbs energy of unfolding of chicken lysozyme is ~14.5 kcal mol^-1 at 25°C [61], but the standard molal Gibbs energy of this protein at 25°C and 1 bar is approximately -4.2 × 10^3 kcal mol^-1 (see Figs. 2a and 2b).
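Two numerical points in this section can be checked with a short Python sketch, which is not CHNOSZ code. The first part assumes that the degree of ionization follows the standard Henderson-Hasselbalch form, written so that a single expression covers groups of charge +1 and -1, and that the net charge is the sum of ni,j × Zi × αi over groups; the exact form of the equations in the paper may differ. The pK values of 4 and 10 are illustrative round numbers, not the values used in the study. The second part compares the two lysozyme energies quoted above.

```python
import math

def degree_of_ionization(pH, pK, charge):
    """Fraction of a group in its charged form; charge is +1 or -1.
    Assumed form: alpha = 1 / (1 + 10**(charge * (pH - pK)))."""
    return 1.0 / (1.0 + 10.0 ** (charge * (pH - pK)))

def net_charge(groups, pH):
    """groups: iterable of (count, pK, charge) for each ionizable group type."""
    return sum(n * z * degree_of_ionization(pH, pK, z) for n, pK, z in groups)

# One acidic group (pK 4, charge -1) and one basic group (pK 10, charge +1);
# illustrative values only.
groups = [(1, 4.0, -1), (1, 10.0, +1)]
for pH in (2.0, 7.0, 12.0):
    print(pH, net_charge(groups, pH))

# Relative size of the lysozyme unfolding energy, using the values in the text.
g_unfold = 14.5       # kcal/mol
g_protein = 4.2e3     # kcal/mol, magnitude of the standard molal Gibbs energy
print(g_unfold / g_protein, math.log10(g_protein / g_unfold))
```

In this sketch the net charge is positive at low pH, near zero at pH 7 and negative at high pH. For the lysozyme values quoted above, the unfolding energy is about 0.35% of the magnitude of the standard molal Gibbs energy, or roughly 2.5 orders of magnitude smaller.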
The size of the unfolding property in this case is much smaller than the ca. ± 5% uncertainty ascribed to the group additivity algorithm [18]. It should be noted, however, that the compositional consequences of protein folding include changes in ionization state, and preferential surface exposure of charged residues [1], which would be manifested by changes in the reaction coefficients of basis species that might affect the outcome of metastability calculations to a greater extent than the differences in Gibbs free energy alone.

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

I would like to acknowledge the late Professor Harold C. Helgeson for his friendship and advice during the Ph.D. research project that provided the foundation for this paper. This work was supported by grants EAR-0309829 from the U.S. National Science Foundation and DE-FG02-03ER15418 from the U.S. Department of Energy.