|
|
||||||||
Review Article |
1 Centre for Mathematical Biology, Mathematical Institute, University of Oxford, 2429 St Giles, Oxford, OX1 3LB, UK2 Bioengineering Institute, University of Auckland, Private Bag 92019, Auckland, New Zealand3 University Laboratory of Physiology, Parks Road, Oxford, OX1 3PT, UK
| Abstract |
|---|
|
|
|---|
(Received 21 October 2003;
; first published online 5 November 2003)
Corresponding author P. Hunter: Bioengineering Institute, University of Auckland, Private Bag 92019, Auckland, New Zealand. Email: p.hunter{at}auckland.ac.nz
| Introduction |
|---|
|
|
|---|
The physical sciences, on the other hand, have for the past 200 years confronted nature's complexity with the development of mathematical models of natural phenomena. Our ability to understand complex fluid flow, for example, in order to design aircraft or forecast weather, is a testament to the physicists and mathematicians of the 19th and 20th centuries, who identified nature's physical conservation laws and developed the mathematical framework to describe them. The successful application of these laws to the solution of engineering problems has also advanced greatly in the past 50 years from the development of computers and numerical analysis [every new complex engineering device is, these days, designed with the aid of mathematical models and finite element (or similar) analysis].
The use of mathematical modelling in physiology gained prominence in the 1950s with the successful prediction by Hodgkin and Huxley of the speed of action potential propagation along a nerve fibre from cable theory coupled to models of ion channel conduction and gating kinetics. Other early successes in applying techniques from the physical sciences to physiological systems were the engineering analysis of blood flow in arteries using computational fluid dynamics, and orthopaedic stress-strain analysis using linear elasticity theory and finite element analysis. In both cases, however, the biological problems are not significantly different from other engineering problems blood can be treated as a viscous Newtonian fluid like water, albeit with higher viscosity and some unusual characteristics at low shear rates, and bone behaves as an orthotropic linear elastic material, at least until one considers its ability to grow and remodel. It was not possible in these early applications of mathematics and engineering to biology, which pre-date modern molecular biology, to link physiological behaviour to molecular detail.
In this article we argue that the discipline of physiology should now embrace the new era of computational physiology in which mathematicians and bioengineers will work alongside physiologists and molecular biologists to link the physiology of cells, tissues and organs to the growing genomic and proteomic databases. The time is right genomic and proteomic data are routinely collected; it is clear that the timing and spatial location of gene expression is controlled by environmental factors conveyed to the transcription factors in the gene regulatory regions by complex and redundant pathways which can only be analysed with network models; the physiological significance of genetic diseases will only be understood by linking quantitative models of tissue and organ physiology with the signal transduction cascades, metabolic pathways and other cellular processes. However, the tools required for this so-called Physiome Project[the term physiome comes from physio (life) and ome (as a whole) and is intended to convey a quantitative description of physiological dynamics and functional behaviour of the intact organism] are significantly different from the tools of standard engineering analysis for a number of reasons. Biological materials are almost always inhomogeneous, anisotropic and exhibit non-linear behaviour, but nowadays even these characteristics are not unusual in engineering materials. Biological processes (such as signal transduction pathways) exhibit enormous complexity, often with extraordinary degrees of apparent redundancy, but again so do engineered systems such as the electronic circuits in a Boeing 777 aircraft. The really significant and unique characteristic of biological materials is their ability to grow and remodel in response to changing environments determined partly by genes and partly by their physical environment. An important consequence is that structure and function are intimately linked in a way that no engineering material or system can emulate. Capturing these structure-function relationships in a computationally efficient manner is the key to successful computational physiology and requires models and software that are fundamentally different to those found in the engineering world.
Our physiome theme is developed in several stages. We begin with a discussion of biological complexity and the sources of data for modellers at the molecular level. We briefly describe the framework of ontologies and markup language standards that are being developed for handling our knowledge of biological systems at all levels of biological organization from genes to organisms, integrating across species, age and pathological state. We then discuss progress in modelling biological systems, beginning with gene regulation and cell function and ending with anatomically and biophysically based models at the organ level. The heart and lungs are used to illustrate the use of models that integrate structure and function across multiple spatial scales. Throughout the paper we use the heart as our primary example of multiscale modelling because the heart has provided the first example of a physiome model of an organ.
| Biological complexity |
|---|
|
|
|---|
Biological complexity and the direction of causality
But why should we bother with such complexity? Having broken biological systems down into their smallest components, have we not begun to reach our goal of understanding, at least in principle? Wasn't the aim to simplify, not to complexify? Could we not now compute the behaviour of living systems (Brenner, 1998) from the automatic and inevitable interactions of their component parts, starting with genes coding for proteins, proteins interacting to form pathways, in turn to form cells, tissues and organs and so on up to the highest levels? There are at least two reasons why this is not possible. The first reason is that it is computationally impossible and would be lacking in insight (Noble, 2002a). Being able to reconstruct a process is not sufficient, by itself, to understand the process. There will be many parameters in the reconstruction equations that will be in need of explanation, and we will also want to achieve more general insight into how the process reacts to perturbations, including most importantly, those of disease states. The second reason is that causation does not simply run in one direction (i.e. upwards). Lower level events, such as gene expression, are controlled by higher level processes. Without understanding the higher level logic in its own right (Noble & Boyd, 1993), the lower level data will often be just that: masses and masses of unexplained data.
Levels of Selection
But do the higher levels have a logic? Or are they just the lumbering robots (Dawkins, 1976) of gene-level selection, with about as much logic as a cloud formation? The issue of levels of selection is now the subject of important debates in the theory of biology (Williams, 1992; Keller, 1999; Gould, 2002; Krakauer, 2002) The question here is the level at which selection operates. If it really were the case that selection is entirely gene-orientated, then the lumbering robots (i.e. the physiology of higher level function) would indeed be of limited value, even irrelevant to a general theory of biology. However, this gene-orientated view confuses several different processes. Genes are not simply physical stretches of DNA, just as the information on a magnetic surface is not the surface itself. Genes are the carriers of information that any part of an organism can read, information which is transmitted from one generation to another. This transmission is parsimonious in the sense that it is not necessary to code for everything that happens in the development and life of an organism. The properties of water, lipids and the rules of self-assembly, for example, must all be taken for granted. As a book of life the genome is necessarily incomplete (Noble, 2002b). It may also be repetitive and redundant, though the extent to which junk DNA is functional, rather than truly redundant, is also an open and controversial issue.
Replicators and interactors
These considerations are usually expressed by distinguishing between replicators, containing information that gets copied from one generation to another, and interactors, such as macromolecules (even DNA itself, as a chemical substance, is in this sense an interactor), biochemical pathways, cells, organisms, and species and clades, which both carry the replicator information and are subject to selection. Since genes interact so that many genes contribute to any given higher-level function, and each higher-level function depends on many genes for its transmission, natural selection must act at the levels at which functionality appears and this will only very rarely depend on a single gene. One of the challenges for theoretical biology is to identify the levels at which different functions can be said to be expressed and therefore to identify the levels at which natural selection must operate. This is a challenge to which the Physiome Project can respond since, when it is complete enough, it should reveal the modular nature of gene interaction, and the extent of that modularity, at all levels.
Gene numbers and functions
Analysing biological complexity at various levels is therefore necessary. But there must be very many different ways of measuring and assessing complexity, some more appropriate to one level rather than another, with each level having its own criteria. One way to look at complexity from a genome level is to ask how many biological functions are available to evolution in any given genome. For a genome of 40 000 genes if we make the extreme assumption that only two genes are needed to define a function then the total number of possible functions would be 0.5 x 40 000 x 39 999 = 799,980,000. Even with this absurdly minimalist view, the number of possible combinations of effects is very large. With more realistic assumptions about the number of genes involved in each function, the figures are really huge: thus at 100 genes/function we obtain
1.5 e302; and if we allow for all possible combinations we get
2 e166713! This has two consequences that are relevant to complexity. The first is that with such huge potential combinations of gene product interactions, trying to understand the system purely at the genome level is clearly impossible. Nature can have used only a minute fraction of these combinations and we need to look at these from above to identify them, i.e. we need to look at the genome through the eyes of higher levels, including physiology. Since only a minute fraction of possible interactions are used, this introduces another way of viewing the question of how contingent evolution has been. With such enormous possibilities, the chances of humans (or any other species) evolving may be exceedingly small. The counterargument would be that evolution really is constrained. But constrained by what? This must be constraint by the logic of life at a higher level. In both cases, though, we need to understand the higher-level logic, either as a constraint or as a chance historical success, or most probably both: chance and necessity interact.
The (imperfect) logic of life
We are a long way from being able to judge such broad questions. But it is worth noting here that we would not expect the logic to be perfect from an engineering point of view. Evolution is a dynamic process and there is no reason to suppose that what has evolved is fully in equilibrium with its environment (though Gould's view of stasis in evolution comes close to this see Williams, 1992, chapter 9). Moreover, evolution is an historical process, with all the accidents and dead ends that a blind journey will produce (see particularly chapter 6, Historicity and Constraint, in Williams, 1992). Genes are perhaps better viewed as prisoners of the successful physiological systems that carry them than the determinants of those systems. They simply, and necessarily, code what has worked well at the level of the interactors. Worked well is the correct description. Works best would require design that avoids the historical accidents (using existing structures for a new function, such as the formation of the mammalian middle ear from modified jaw and hyoid arch bones) and evolutionary traps (such as the convoluted path of the sperm ducts following descent of the testes, or the inversion of the retina). Ultimately, we must be able to account for the evolutionary history of functional development. As Jared Diamond has insisted (Diamond, 1993), physiology should re-connect with the mainstream biological theory of evolution.
Comparison of genomes
These calculations put into perspective the frequent comparisons of similarity of genomes. If we differ from species x by only 1% then, from a genome point of view, the differences look insignificant. Using the same calculations as above, however, we get a totally different perspective. Introducing less than 1% of 1% of a genome of 40,000, i.e. a single gene, doubles the available functions. The argument is remarkably simple. We still have all the possible functions of the original set of genes, and all of those again in combination with the new gene. If we differ from another species at the genome level by, say, 1% (i.e. 400 genes) then the potential differences at a functional level are truly enormous. Even a 0.1% difference corresponds to 40 genes and also a huge number of new possibilities. To these calculations, we have also to add the influence of the environment through its control of gene expression levels. Suppose, for example, that gene expression levels might vary for a given protein by around an order of magnitude. This would mean that, to the original 40 000, we can add a further 400 000 elements to play a part in combinations. That puts the naturenurture argument in perspective, and the absurdity of trying to arrive at percentages for the influence of each. Of course, only a very small proportion of the possible combinations actually occur.
| Functional genomics and molecular physiology |
|---|
|
|
|---|
The enormous challenge lying ahead of the human genome project, and the ultimate goal of functional genomics, is to understand the role of the genome in development, normal physiology and, perhaps most significantly, in disease. New experimental technologies are accelerating the identification of patterns of gene expression that delineate tissue types and underlie pathology. Altered gene expression in response to changing stimuli or through disease gives rise to reprogramming and changing function within tissue. Examples in cardiac ventricular tissue that have received recent attention include the molecular changes underlying the sequence of structural and functional modifications during remodelling after myocardial infarction (Cohn et al. 2000; Sehl et al. 2000; Stanton et al. 2000) and those responsible for modified electrical properties generating cardiac arrhythmia (Antzelevitch et al. 2001; Noble, 2002c). Variation in gene expression between individuals gives rise to increased disease susceptibility or responsiveness to therapy and it is a major aim of functional genomics to unravel the combinations of environmental and genetic factors that underlie complex disease, which to this point have proved elusive. These examples highlight the position of functional genomics at the interface between molecular biology and physiology. [This has been recognized in recent titles for the quadrennial IUPS International Congress of Physiological Sciences, held in 2001 in Christchurch, New Zealand, under the banner from Molecule to Malady, and due to be held in San Diego in 2005 with the focus from Genomes to Functions].
Progress on understanding the role of the genome in cell, tissue and organ function is a highly integrative problem, which requires significant computational input. The major emphasis is currently on genome (and proteome) informatics: databasing, annotation and so forth (Winslow & Boguski, 2003). In addition to sequencing and annotation projects for different species, genomic databases with a particular tissue or disease focus are now being developed, for example, the Cardiac Gene Expression knowledgebase (CaGE; http://www.cage.wbmei.jhu.edu), which aims to catalogue all genes expressed in human cardiac tissue (Bober et al. 2002). While this is of great importance for making genomic data available and usable, it is increasingly apparent that the challenge ahead is to give quantitative predictions to physiological outcomes from genomic knowledge. As more and more data accumulates, a computational framework is required which can reach down to the genomic level via modelling of regulatory processes; in particular gene regulatory networks, interactions with signalling and metabolic pathways and how these influence and effect higher level organization and function.
The trend in current biology is towards comprehensive, automated measurement techniques. The development of high-throughput technologies for the systematic measurement of gene transcripts, proteins and their interactions, coupled to advances in computational data processing, provides a method for probing the functional role of large numbers of genes. DNA microarrays have emerged as the pre-eminent technology for large scale parallel gene expression studies (see, for example, The Chipping Forecast II, 2002).
Gene expression profiling
Microarrays exploit the complementary binding properties of nucleic acids: fluorescently labelled gene transcripts (mRNA) isolated from a tissue sample are allowed to hybridize with molecules of known sequence (ESTs or cDNA) immobilized at particular locations in an array. The fluorescence intensity at a particular spot on the microarray reveals the amount of the gene transcript which has hybridized at that location and the ratio of intensities of red and green fluorescent labels on the same array provides a measure of the differential expression level of the corresponding genes between two samples. These data on the up- and down-regulation of genes can be collected across different tissue types, disease states and in response to different stimuli. Data collected on temporal gene expression profiles, by sampling mRNA at several different times following experimental perturbation, provide gene expression time series data which are crucial for the development of dynamic models of gene regulation.
Studies of human disease using microarrays for gene expression are increasingly common, as shown in several recent reviews on microarrays in cardiovascular research and medicine (Cook & Rosenzweig, 2002; Napoli et al. 2003). Disease and tissue-specific microarrays have been developed for example, a cDNA CardioChip, comprising over 10 000 cardiovascular-related ESTs (Barrans et al. 2001; Barrans et al. 2002). Most of these studies aim to identify differentially expressed genes (Henriksen & Kotelevtsev, 2002; Simkhovich et al. 2003). Clustering gene expression profiles is a common starting point for the analysis of microarray data and is used to find functional groupings of genes (Eisen et al. 1998; Slonim, 2002). If the expression profiles of a set of genes are closely correlated over many different experimental conditions then this co-expression is taken as a possible indication of a similar functional role for the cluster of genes or, even, of shared regulatory control mechanisms. Similarly, principal component analysis (PCA) can be used to reduce the data to a few simple underlying expression patterns, in particular for periodic temporal patterns of gene expression (Alter et al. 2000; Holter et al. 2000). The guiltby association of new genes with genes of known function via clustering suggests new targets for study.
Data requirements for modelling regulatory networks
High-throughput gene expression techniques generate large amounts of data. The primary concern for the analysis and modelling of large scale data sets is the so -called curse of dimensionality: the parameter space for a model grows exponentially with the number of variables (genes). This makes finding appropriate parameter values for large scale gene network studies a major challenge. The key to this problem is the use of prior knowledge on interactions and components of the network which have already been identified, features common to biological networks such as sparseness (Jeong et al. 2000), and the use of simplified response functions in the model, can all be used to reduce the number of undetermined parameters. To fully characterize network behaviour it is necessary to sample gene expression profiles under many different combinations of inputs and perturbations, for example point mutations, gene deletions and over-expression, inhibition of translation by antisense RNA, and perturbation of signalling and metabolic pathways.
Localized information on gene expression, required for spatially extended models, can be obtained from immunofluorescence studies. Laser capture microdissection (LCM), which automates the dissection of specific cells and multicell structures which have first been identified under the microscope (Emmert-Buck et al. 1996; Mills et al. 2001), allows gene (and protein) profiling studies to be conducted at high spatial resolution. The quantity of mRNA which can be extracted from such small samples may not be sufficient for high quality microarray studies. For detection of low abundance transcripts quantitative reverse-transcription PCR (RT-PCR) can give high sensitivity measurements of transcript levels, with the added advantage that RT-PCR can measure absolute, rather than relative concentrations of RNA (Schnell & Mendoza, 1997). PCR primers are required for each gene of interest and so, as with microarrays, the technique cannot be used to analyse previously unidentified genes. Transcripts of unknown genes can be measured using Serial Analysis of Gene Expression (SAGE; Velculescu et al. 1995; Yamamoto et al. 2001; Patino et al. 2003), where large numbers of transcripts are counted and analysed efficiently by sequencing only short tags from each RNA molecule. These short sequences can usually identify the corresponding gene (as long as its sequence is known). All tags collected from a sample are assembled into a single molecule which is then amplified and sequenced, providing quantitative information on the amount of each transcript in the sample.
Proteomics and metabolomics
Gene expression profiles describe the transcriptional state of cells or tissue, but do not directly reflect protein abundance. Proteomics and metabolomics aim to provide an unbiased identification and quantification of all proteins and metabolites present in a biological sample. The proteome encompasses modifications of protein
molecules and interactions between proteins. Knowledge of the changing protein content of cells, and functional interactions between proteins is of primary importance in developing models of cellular function. Gene expression studies provide the link between the (relatively) static genome and the highly dynamic protein content of a cell, responding to environmental changes, signalling pathways, as well as normal homeostatic protein turnover. The relation between gene expression profiles (the transcriptional state), as measured by mRNA level, and net protein synthesis depends on translation and post-transcriptional modification rates, as well as on the rate of protein decay, whether the molecule is actively degraded or transformed.
Protein molecules are intrinsically less amenable to high-throughput studies than are nucleic acids due to their secondary and tertiary structure. Technologies for large scale proteomics analyses include separation techniques (2D-PAGE, Poly Acrylamide Gel Electrophoresis) and mass spectrometry. Microarray-based techniques are also available for protein studies, although these platforms are less developed than DNA microarrays, both for detecting protein abundance by using different affinity reagents (for example antibodies, as a multiplexed immunoassay) to bind to proteins selectively, and for proteinprotein interactions (MacBeath, 2002), also studied in the Yeast Two-Hybrid system. The identification and measurement of interactions, both stable and transient, between biomolecules in vivo is probably the widest current knowledge gap, and greatest experimental challenge.
Issues and standards for gene expression profiling experiments
Important for the development of robust and predictive models is the quality of the data available, good data annotation, and a methodology for assessing and managing the sources of variability in the data. There remain many issues to do with data reproducibility, variability and experimental design for microarray experiments (Churchill, 2002; Speed, 2003), which we will not discuss here. Differences between nominally similar tissue samples reflecting normal biological variability need to be quantified to minimize false positives in the identification of differentially expressed genes. Modelling may be a useful tool for identifying the sources of robustness to normal biological variability in regulatory networks (Alon et al. 1999; von Dassow et al. 2000).
The details of experiments performed, on sample preparation and so on, have significant bearing on the interpretation of profiling studies. With this in mind, ontologies and languages for the annotation of gene expression and proteome data are being developed (Stoeckert et al. 2002; Taylor et al. 2003). Standards have been proposed for describing microarray experiments, MIAME (Minimum Information About a Microarray Experiment) by the Microarray Gene Expression Data society (MGED; http://www.mged.org), which in late 2002 were adopted by several major journals including Nature, The Lancet and Cell as requirements for publication of the results of microarray studies. Annotated data is required to be deposited in one of several public databases: for example Gene Expression Omnibus (GEO) at NCBI (http://www.ncbi.nlm.nih.gov) and ArrayExpress at EBI (http://www.ebi.ac.uk/arrayexpress), which conform to the MIAME standards. A mark-up language for data exchange between databases, MicroArray Gene Expression Markup Language (MAGE-ML; http://www.mged.sourceforge.net), has been developed by MGED and others, under the auspices of the Object Management Group (OMG; http://www.omg.org).
| Physiome ontologies and markup language standards |
|---|
|
|
|---|
Levels of representation
The physiome modelling framework can be considered at three levels:
The goal is to represent the details of these levels in a way that can be used to explore the meaning of ideas and observations across the levels. Biology is usually represented using biological terms and relationships in a natural language context. Mathematics is a language in which we can represent theories for the processes and structures that the biological terms and relationships describe. At the computer representation language level, both the biological and mathematical interpretations need to be represented in constructs that can be used in various computer applications, such as simulation and visualization tools, and data repositories.
The three levels of representation are intimately related. Biological concepts link to mathematical concepts and simulation architectures, which link to computer based representations. At each level there is a description of the model, and at each level we find information that feeds across into the other levels. An example; a biological concept of predatorprey interactions can be modelled mathematically as two coupled ordinary differential equations. The way we express this model makes a certain claim on the relationship of the rates of change of predators and prey. Simulation of the model helps us to understand the longterm behaviour of such a description, and allows us to measure data appropriate for validating the biological reality of the model.
The requirements for this multilevelled modelling framework to be effective are: (i) tools for experts to represent models and solutions within particular levels using the language and interfaces that are natural to them, and (ii) computer based representations of what is described using these tools.
One toolset under the first of these requirements that is being developed is a visual editor. Visual building blocks and interfaces are a very natural way for people to navigate libraries and to build and interpret models (see Fig. 1). To develop a framework for building toolsets, we need to have good solutions to the second requirement, i.e. computer based representations of what modellers are describing. To this end, various representation languages, including ontologies are being developed. The integration of ontologies into the physiome project will provide an unambiguous, and machine interpretable, representation of concepts across these levels of modelling, helping to communicate biological models through tools for building, sharing, interpreting, and visualization. One such representation language, CellML (http://www.cellml.org) has been developed over the last four years, and aims to represent models at the cellular level. The CellML language itself is a set of constructs that have elegant interpretations within both the computational simulation domain and the object orientated programming domain. As a modelling representation language, it is sufficiently generic to represent any mathematical representation of any biological model, not just cells, so serves as a base for the development of a more generalized modelling representation environment. The evolution of this generalized representation environment includes the integration of ontology data, which provides a machine-interpretable pathway through the levels of modelling, and the further development and integration of FieldML (http://www.bioeng.auckland.ac.nz/physiome/markup.php) which facilitates the representation of structural and continuum based information about biological and physical entities.
|
Ontologies are a vehicle for providing unambiguous descriptions of terms and their relationship to one another. To a computer scientist, they provide a formal framework for describing the properties and relationships of concepts that have both a formal logical foundation and a structure amenable to machine processing, interpretation and sharing. To a biologist or modeller, ontologies provide a thesaurus and structure for understanding and binding terms, ideas, data sets, and visualizations, etc.
Many different groups are constructing ontologies for various biological domains. One approach to integrating ontologies from different biological domains would be to generate a large composite ontology. However, this is not the intention of the IUPS project's ontological framework because the biological ontologies that currently exist do not form pieces of the same puzzle they may have biology in common, but it usually stops there. There is no currently agreed core framework or methodology that can be used to guide the development of compatible domain specific ontologies, but there are efforts underway to promote such a platform. The Unified Medical Language System (UMLS), for example, is attempting to bring together various ontologies from different domains into a composite ontology that fully integrates these knowledgebases.
The current view on the IUPS project's ontological framework is shown in Fig. 2. Some new ontologies are being built from scratch while some existing ones will need to be extended and integrated as both a common framework and data source (i.e. a composite approach). The focus at present is to describe constructs for interpreting our computer based model representations within the biological and mathematical levels of modelling. The domains of modelling theory, and the ML library domains (Fig. 2) capture representations of mathematical relationships, model architectures, and component structure (both physical and abstract).
|
|
| Modelling gene function |
|---|
|
|
|---|
|
One way to greatly simplify the mathematical representation of networks is to ignore the details of molecular interactions and focus instead on their outcomes, namely whether a gene is on or off, according to whether its transcription level is above a given threshold. Regulation is then represented by logical operations (AND, NOR, etc.) on the gene expression states, according to whether interactions activate or repress transcription. Gene expression levels are logical variables which are updated synchronously according to a rule table (a set of if then instructions) which describes the logical operations representing the regulatory interactions between genes. A Boolean network thus represents a wiring diagram for the gene network (Bolouri & Davidson, 2002; Davidson et al. 2002). The aim of a reverse engineering approach is to infer the logical rule table directly from data. Techniques have been proposed which demonstrate that in principle a Boolean network can be constructed from data using no prior knowledge (Liang et al. 1998).
Although the properties of Boolean networks are much simpler than their continuous variable counterparts, they retain many of the properties of networks that are important in terms of gene function. For example, steady states are achieved as the network settles down into a stationary state or a repeating pattern of states (oscillation). The appeal of this modelling framework is its simplicity, but much biological detail is clearly lost. In reality, gene expression levels recorded in time course microarray studies spend much of their time at intermediate levels, rather than quickly saturating at maximal expression rate, or falling to negligible levels. Logical networks may therefore be a good modelling strategy when the data quality is poor, and where intermediate expression levels cannot be resolved.
Kinetic modelling
Systems of differential equations provide a very natural modelling framework for the kinetic behaviour of gene networks, which easily extends to encorporate stochastic effects on gene transcription (McAdams & Arkin, 1999; Kepler & Elston, 2001) and modelling of transport processes in spatially extended systems. For large networks, the curse of dimensionality limits the possibility of inferring parameters of a model from data. A reverse engineering technique has been proposed for linear systems, motivated by the observation that gene networks are sparse (there are relatively very few regulatory interactions between genes) demonstrating that progress can be made even with limited data (Yeung et al. 2002).
For well characterized networks, in which the relevant genes have been identified and the wiring diagram determined, biophysically realistic nonlinear kinetic functions can be assumed for the regulatory interactions. This allows quantitative prediction of transcription rates etc. in response to perturbations of the network. Kinetic parameters can be determined for individual reactions (for example the binding of transcription factors, degradation rates of mRNAs, etc.) using time-course data from microarrays, or GFP reporter gene approaches (Ronen et al. 2002). Interactions between cells are critical in determining ptterns of differentiation into distinct tissue types. Models of spatially extended systems must therefore incorporate mechanisms for signalling between cells, transport of gene products as signalling molecules and signalling networks within cells (Mjolsness et al. 1991; von Dassow et al. 2000; Davidson et al. 2002).
Cellular modelling
The coupling of gene expression to tissue and whole organ function requires a number of intermediate models at physiologically based spatial scales. A number of groups have developed sophisticated representations of ion transport between subcellular compartments and via the membrane in, for example, cardiac (Noble et al. 1998 and Luo & Rudy, 1994), pulmonary (Smirnov & Aaronson, 1994), and smooth muscle (Yang et al. 2003) cell types. Based on systems of coupled ordinary differential equations, the development of these modelling techniques is illustrated below for the cardiac myocyte by considering their application to understanding excitation, contraction and metabolism.
| Multi-scale modelling in the heart |
|---|
|
|
|---|
Cardiac cell models
The key cellular processes which are currently characterized within the organ level modelling framework are ion channel electrophysiology, myofilament mechanics and cellular metabolism. The most advanced of these models are the ion channel models pioneered by Noble (DiFrancesco & Noble, 1985; Noble et al. 1998; Noble & Rudy, 2001) following the seminal modelling work of Hodgkin & Huxley (1952) on squid axon excitation. The Luo & Rudy (1991, 1994) models are alternative frameworks focusing on the physiological behaviours of premature stimulation and arrhythmogenic activity of the single myocyte. Their modelling framework has since been extended by Jafri et al. (1998), among others, to accommodate more complex calcium kinetics which are important for contraction coupling.
In parallel with electrophysiology, Wong (1972), Panerai (1980), and Guccione & McCulloch (1993) have developed specific cardiac models of active tension generation largely based on experimental data from inhomogeneous papillary preparations. Hunter et al. (1998) has published a model fitted from more current data measured from intact and skinned trabeculae and the isolated cardiac cell. This model consists of two components, the binding kinetics that regulate the maximum number of cross-bridges (the force-generating bonds formed between contractile proteins) and the phenomenological characterization of the binding and unbinding kinetics of the cross-bridges themselves.
The modelling of metabolic regulation in cardiac cells has, until recently, lagged behind the characterization of excitation and contraction. This is in part due to the immense complexity of what are highly integrated metabolic pathways which exhibit multistate nonlinear regulation via a large number of state variables (Smith et al. 2002). However, a number of recent studies focusing on regulation of ion transport (Smirnov & Aaronson, 1994; Michailova & McCulloch, 2001; Ushimaru & Fukushima, 2002) and contraction (Saucerman et al. 2003; Smith, 2003) in the myocyte indicate that a fully coupled excitation-contraction model regulated by metabolism will soon be developed.
These cellular-based models now encompass sufficient biophysical detail such that the mediation of specific membrane exchanger and pump ionic currents via gene expression levels can be directly accounted for and linked to specific pathologies (Snyders & Chaudhary, 1996; Clancy & Rudy, 2002; Winslow et al. 1998; Antzelevitch et al. 2001; Noble, 2002e; Mazhari et al. 2001). However, to conclusively link gene regulation with clinically relevant whole organ pathology requires a further step in the increasing spatial scale. A continuum framework supplies an effective means of representing the spatial properties of geometry, conductivity and stiffness tensors, and the spatial variation of different cell types and properties. As will be demonstrated in the following section, using cellular models of active function at grid points embedded in the continuum geometry, in combination with governing equations, and appropriate numerical solution methods, presents a powerful tool for linking function through a number of spatial scales.
Tissue modelling
The major physical processes operating in the heart at the level of the intact organ are (i) the mechanical deformation of the myocardium (ii) the fluid mechanics of blood flow in the atria, ventricles and coronaries (iii) electrical activation of the conducting system from the sino-atrial node to Purkinje fibres and the myocardium, and (iv) transport of metabolic substrates between the coronaries and the myocardium to support the energy demands of the working heart. [Heat flow associated with temperature gradients does not appear to be significant in the in -vivo heart unlike the lungs where warm blood is in close proximity to cold air]. These tissue level processes are of course supported by a diverse range of physical processes at the cellular level, as discussed in earlier sections. In each case the physical process (mechanics, electrical current flow, etc.) is governed by well-established conservation laws. In the following sections we describe these laws and how they are linked to physical processes at the cell level.
Continuum fields
A key concept underlying all of these tissue-level equations is the concept of a continuum representation of spatially varying quantities. This is illustrated for the spatial variation of transmembrane potential in Fig. 5. There are about 1010 myocytes in the heart [a myocyte is about 100 µm long and 20 µm diameter, or
.105 mm3 volume and the human adult myocardium occupies a volume of about 4.105 mm3]. Each cell has a slightly different resting potential (e.g. due to small differences in the expression of Na/K pumps) and these transmembrane potential differences become larger during activation, as illustrated for a one-dimensional row of cells in Fig. 5A. Since the cells are electrically coupled, the differences between adjacent cells are small and to a good approximation the potential field varies smoothly and continuously as shown in Fig. 5B. The spatial variation of transmembrane potential is called a field and is best represented mathematically by dividing the spatial domain into subdomains called finite elements, as shown in Fig. 5B. The potential field within an element is then modelled by interpolating values of the potential defined at the nodes using the basis functions shown in Fig. 5C. Since a node is shared between two elements the potential is continuous across element boundaries. If the spatial derivative of potential is also defined at the nodes and the set of basis functions is extended to include interpolation of the derivative (called cubic Hermite basis functions), the finite element representation of the potential field can maintain continuity of its first derivative (and hence current) across the whole domain.
|
Any number of such fields can coexist in the same physical domain. For example, the extracellular potential provides another dependent variable field governed by conservation laws and linked to the transmembrane potential in bidomain models of cardiac electrophysiology.
Model of ventricular geometry
The one-dimensional cubic Hermite basis functions illustrated in Fig. 5 can be extended to three dimensions and used to fit a finite element model of ventricular geometry, as shown in Fig. 6. Anatomically based models have now been developed for the dog (Nielsen et al. 1991), pig (Stevens & Hunter, 2003) and rabbit heart (Vetter & McCulloch, 2000).
|
In addition to the geometry, fundamental to predicting whole organ function, is an accurate representation of the spatial variation in tissue properties which are often based on the underlying microstructure. This fibrous-sheet structure of the heart is illustrated in Fig. 7. The fibre axis is aligned with the local myocytes (i.e. the direction of sarcomere length changes), the sheet axis is orthogonal to the fibre axis, and is in the plane in which myocytes are tightly bound by endomysial collagen into layers 34 cells thick. The third sheet-normal axis is orthogonal to this plane. The fibrous-sheet structure for the whole myocardium is illustrated in Fig. 8
|
|
To predict whole organ function such as myocardial mechanics or activation requires the application of physical conservation laws which couple the discrete cellular models into a common spatial framework. These laws are expressed mathematically in the form of integral equations or PDEs (partial differential equations) which balance forces (expressed per unit area as stress) or fluxes (such as an electrical current). These equations are derived from the basic physical laws of nature and are independent of the particular properties which characterize the material under consideration (such as soft biological tissue). To solve these balance equations for a particular material requires another equation, called a material law or constitutive equation, which is derived from experimental observation of the particular material. For example, in the case of simulating myocardium contraction, to solve the stress equilibrium equations requires an additional material law linking components of stress to the components of material strain (deformation gradients). The experimentally obtained parameters in this relation are the elastic constants of the tissue. For conservation of current the material law links current to voltage gradients with parameters that express the tissue conductivity.
The systems of equations governing each function are typically non-linear and are solved using numerical methods which reflect the spatial-temporal scale of the underlying phenomena. The rapid upstroke in the action potential of the cardiac myocyte produces large spatial gradients which require representation at high spatial resolution (see Fig. 9). The relative efficiency of the finite difference technique or low order finite element method lends itself to this application. Conversely the continuous stress and strain fields generated by cardiac deformation are effectively predicted using finite elements with high order C1 continuous basis functions. The coupling of systems with large differences in the solution scales provides much of the challenge for modelling computational modelling at this scale in the context of the Physiome Project. Two examples of system coupled modelling are given below.
|
The electrophysiology model of Noble et al. (1998), linked via calcium kinetics to thin filaments kinetics and active force generation by Nickerson et al. (2001), provides the voltage source and tension dynamics at a cellular level. Stretch activated channels in the electrophysiological model produce a reverse coupling where mechanical stretch alters the cell action potential. Smith et al. (2003a) have embedded this cellular model in a grid of finite difference points in a two dimensional finite element model of the cardiac ventricles (see Fig. 10). Excitation was initiated by applying a spatial pattern of stimulus currents based on measurements of Durrer et al. (1970), inducing
|
Coupled myocardial mechanics and coronary blood flow
The anatomically-based model of the largest six generations of the coronary arterial tree generated by Smith et al. (2000) has been embedded at material points in the model of the cardiac ventricles presented above. By assuming the functional form of the axial velocity profile, the three dimensional Navier-Stokes equations governing blood flow have been reduced to one dimension to create an efficient model of coronary haemodynamics (Smith et al. 2002b). Using the myocardial mechanics framework of Nash & Hunter, (2000) the temporal compression and deformation of each vessel segment was determined. The effect of this spatialtemporal variation in intramyocardial stress was then coupled to blood flow via an elastic pressure radius relationship based on local arterial mechanics. This coupled model of myocardial mechanics and coronary blood flow has been used to quantify the effect of myocardial contraction on coronary blood flow (also known as systolic flow impediment) in the heart (Smith et al. 2002b), as illustrated in Fig. 11.
|