What is the difference between discrete and continuous traits?

Many human phenotypes, such as IQ, learning ability, and blood pressure, are quantitative traits.

These traits are controlled by multiple genes, each segregating according to Mendel's laws, and can also be affected by the environment to varying degrees. Many traits of concern in daily life are quantitative in this sense, each controlled by a number of genes. The two photographs above demonstrate variability for Indian Paintbrush flower color.

The parents in the left photo are either yellow or reddish orange, whereas the F2 individuals show a distribution of colors from yellow to reddish orange.

The development and widespread adoption of statistical phylogenetic methods has revolutionized disciplines as disparate as evolutionary biology, epidemiology, and systematics.

Studies using maximum-likelihood (ML) and Bayesian approaches have become the preferred means of analyzing molecular data, largely eclipsing parsimony and distance methods. Despite this, approaches that draw inference from morphological data have remained comparatively underdeveloped (but see the relevant discussion and citations below).

As a result, non-probabilistic tree inference methods have continued to be used for the phylogenetic analysis of morphological characters. Nonetheless, several landmark advances in the development of statistical morphological phylogenetic methods have demonstrated the benefits of further developing this framework. This will be particularly important in the near future, as emerging approaches that enable the rapid collection of morphological data may begin to outstrip the methods available to analyze them (Chang and Alfaro a, b).

This may significantly alter and enhance our view of the tree of life, especially considering that the majority of macro-organisms, represented by fossil taxa, can only be analyzed from their morphology.

A foundational contribution in morphological phylogenetics has been the Mk model of discrete trait evolution (Lewis). This is a version of the Jukes-Cantor model of nucleotide substitution (Jukes and Cantor), generalized to accommodate varying numbers of character states. Extensions to this model accommodate biased sampling of parsimony-informative characters (Lewis), rate heterogeneity between sites (Wagner), and asymmetric transition rates (Ronquist and Huelsenbeck; Wright et al.).
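The Mk transition probabilities have a simple closed form, which makes the relationship to Jukes-Cantor concrete. The Python sketch below assumes a symmetric parameterization in which each state is left at a total rate `mu`, shared evenly among the other k - 1 states, so that `mu * t` is measured in expected changes per character. The function name is hypothetical, and programs such as MrBayes may scale rates differently.

```python
import math


def mk_transition_matrix(k, mu, t):
    """Transition probabilities P(t) = exp(Qt) for a symmetric k-state Mk model.

    Assumes each state is left at total rate `mu`, split evenly among the
    other k - 1 states, so branch lengths mu * t are in expected changes.
    """
    decay = math.exp(-k * mu * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay   # end in the starting state
    p_other = 1.0 / k - decay / k            # end in one specific different state
    return [[p_same if i == j else p_other for j in range(k)] for i in range(k)]


# k = 4 recovers the Jukes-Cantor model; k = 2 is the binary Mk model.
for k in (2, 4):
    row = mk_transition_matrix(k, mu=1.0, t=0.5)[0]
    print(k, [round(p, 4) for p in row])
```

As t grows, every entry approaches 1/k, which is the saturation behavior discussed later for discrete characters.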

The deployment of this model has demonstrated the utility of statistical approaches to morphological phylogenetics. Such approaches improve estimates of uncertainty over non-probabilistic approaches, enable a clearer statement of modeling assumptions, and enable branch length estimation.

These approaches have also enabled the application of tip-dating methods to the combined analysis of extinct taxa, represented by morphological data, with extant taxa (Nylander et al.). These total-evidence tip-dating methods have been widely used since their introduction and are implemented in programs such as BEAST (Bouckaert et al.). They have more clearly resolved the timing of species divergences and the relationships between fossil and living taxa (Wiens et al.). Overall, probabilistic approaches to morphological phylogenetics appear to improve accuracy compared to cladistic methods, and they are indispensable for their distinct ability to estimate branch lengths and evolutionary rates.

The benefits of a statistical total-evidence framework as applied to fossil taxa will only become clearer as more data become available and improved methods are developed (Pennell and Harmon; Lee and Palci). Despite these strides, discrete character models represent an imperfect solution in their current usage. Although Bayesian inference under Mk appears to outperform parsimony under certain conditions, error increases at high evolutionary rates (Wright and Hillis). Concerns have also been raised about missing data; however, these have been assuaged, and any issues arising from missing data are likely not specific to probabilistic approaches (Wright and Hillis; Guillerme and Cooper). Another potential issue is the lack of clarity in interpreting the Mk model biologically.

Although transition rates have a strong theoretical and empirical basis in population genetics, their significance beyond serving as nuisance parameters is less straightforward when applied to morphological data. Discrete morphological characters may not undergo change in a manner analogous to nucleotides, which are well understood to alternate between states repeatedly.

Conversely, many characters used for phylogenetic inference consist of single, parsimony-informative directional changes between taxa (Klopfstein et al.). It is unclear how adequately discrete Markov models describe such variation. The Mk model itself does not accommodate directional evolution, and previous researchers have questioned the adequacy of existing discrete character models (Ronquist et al.). This is particularly relevant given the importance of branch lengths in the total-evidence tip-dating methods discussed above, but model inadequacy may also be expected to mislead inference of topology.

Aside from the modeling concerns discussed above, discrete morphological characters present a non-trivial set of challenges to phylogenetics that are distinct from those posed by molecular data. Perhaps foremost among these is disagreement between researchers in the categorization, ordering, and weighting of discrete character states (Farris; Hauser and Presch; Pleijel; Wilkinson). Despite extensive discussion among comparative biologists, the interpretive nature of character coding has continued to leave major palaeontological questions unresolved (Upchurch; Wilson and Sereno; Bloch and Boyer; Kirk et al.).

The use of continuous characters may help to address some of the concerns with discrete traits discussed above. They can be collected more objectively than qualitative observations and do not require ordering of states. Their use in phylogenetic inference was discussed in some of the earliest work in statistical phylogenetics (Cavalli-Sforza and Edwards; Felsenstein), and their phylogenetic informativeness has been demonstrated empirically (Goloboff et al.).

Still, the use of continuous characters to infer phylogenetic topology has remained uncommon, and methods for their use in phylogenetics remain relatively poorly examined beyond the foundational works referenced above. Although many paleontological studies incorporate continuous measurements, these are generally binned into categories and analyzed as discrete characters. However, because fossil data are often scarce, it may be beneficial to maximize the amount of information gleaned from available specimens by representing such variation in its entirety.

Another potential benefit of inferring phylogeny from continuous characters is the wealth of models developed in phylogenetic comparative methods to describe their evolution. Most comparative models of continuous trait evolution belong to the Gaussian class, which is also widely used in fields as disparate as physics, economics, and engineering.

In comparative biology, they are used to describe stochastic Markovian movement through continuous trait space along continuous time. Two major benefits of Gaussian models in phylogenetics are their relatively straightforward interpretability and the relative ease of deriving mathematical extensions to describe a range of biological processes.

Given the existence of well understood and clearly interpretable models describing their evolution, the use of continuous traits may offer several advantages over discrete characters in phylogenetic inference. However, their behavior is not well understood when applied to the inference of phylogenetic topology, and so further investigation is needed. In addition, there are potential hurdles to their efficacy.

Possibly foremost among these is the widespread covariance between continuous measurements that is expected from both genetic and morphometric perspectives (Lynch et al.). However, the expected magnitude of covariance among continuous morphological measurements, and the robustness of phylogenetic methods to this violation, are not known.

Furthermore, it is generally reasonable to expect evolutionary covariance between nucleotide sites, yet phylogenetic methods that do not accommodate it are routinely applied to molecular data. In this study, I carry out simulations to compare the relative performance of binary discrete and continuous characters in reconstructing phylogenetic relationships. Simulations of continuous characters were designed to reflect a range of scenarios that may influence accuracy, including overall evolutionary rate and matrix size.

I also conduct inference on continuous traits that have undergone correlated evolution, an important violation of the assumptions of single-rate Brownian motion (BM) that is thought to be widespread in continuous character evolution.

I generated a set of pure-birth trees, each containing ten taxa, using the phytools package (Revell) in R (R Core Team). All trees were ultrametric and were generated with a total depth (root height) of 1. These trees were used to simulate continuous characters evolving under an unbounded BM process, again using phytools. BM is a Markovian process in continuous time in which the variance of the process increases without bound through time.
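A minimal Python sketch of the BM simulation step follows. It stands in for the phytools routines actually used, and the toy four-taxon tree, the `TREE` dictionary layout, and `simulate_bm` are illustrative assumptions. Each branch of length t adds a Normal(0, σ²t) increment, so the covariance between two tips equals σ² times the branch length they share from the root (0.4 for A and B in the toy tree).

```python
import random

# Hypothetical ultrametric toy tree (root height 1.0): parent -> [(child, branch length)].
TREE = {
    "root": [("n1", 0.4), ("n2", 0.6)],
    "n1": [("A", 0.6), ("B", 0.6)],
    "n2": [("C", 0.4), ("D", 0.4)],
}


def simulate_bm(tree, sigma2=1.0, root_state=0.0, rng=None):
    """Simulate one Brownian-motion character from the root to the tips.

    Each branch of length t adds an independent Normal(0, sigma2 * t) step,
    so the variance grows linearly, and without bound, through time.
    """
    rng = rng or random.Random()
    states = {"root": root_state}
    stack = ["root"]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):
            states[child] = states[parent] + rng.gauss(0.0, (sigma2 * t) ** 0.5)
            stack.append(child)
    # Tips are the nodes that have no children of their own.
    return {node: x for node, x in states.items() if node not in tree}


rng = random.Random(42)
characters = [simulate_bm(TREE, rng=rng) for _ in range(5)]  # five independent traits
```

With `sigma2=1.0`, repeated draws give tip variances near 1 (the root height) and a covariance near 0.4 between A and B, while tips in different root subclades are uncorrelated.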

Because the process under which traits were simulated is unbounded, phylogenetic signal is expected to remain consistent across rates (Revell et al.). Discrete characters were simulated with phytools (Revell) under an Mk model with homogeneous transition probabilities.
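Discrete characters can be simulated on the same kind of tree. In this sketch, again a hypothetical stand-in for phytools using a toy tree, a binary Mk character starts from a random root state and ends each branch of length t flipped with the closed-form probability (1 - exp(-2μt))/2. At high rates this probability approaches 1/2 regardless of how closely related two tips are.

```python
import math
import random

# Hypothetical ultrametric toy tree (root height 1.0): parent -> [(child, branch length)].
TREE = {
    "root": [("n1", 0.4), ("n2", 0.6)],
    "n1": [("A", 0.6), ("B", 0.6)],
    "n2": [("C", 0.4), ("D", 0.4)],
}


def simulate_binary_mk(tree, mu, rng):
    """Simulate one binary (0/1) character under a symmetric Mk model with rate mu.

    The root state is drawn from the stationary distribution (1/2, 1/2); along a
    branch of length t the state ends up flipped with probability
    (1 - exp(-2 * mu * t)) / 2, which accounts for multiple hidden changes.
    """
    states = {"root": rng.randrange(2)}
    stack = ["root"]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):
            p_flip = 0.5 * (1.0 - math.exp(-2.0 * mu * t))
            flipped = rng.random() < p_flip
            states[child] = 1 - states[parent] if flipped else states[parent]
            stack.append(child)
    return {node: s for node, s in states.items() if node not in tree}


def simulate_matrix(tree, mu, n_chars, seed=0):
    """Return n_chars independent characters, each a dict of tip -> state."""
    rng = random.Random(seed)
    return [simulate_binary_mk(tree, mu, rng) for _ in range(n_chars)]


matrix = simulate_matrix(TREE, mu=0.5, n_chars=100)
```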

Traits were generated at a range of transition rates (0.…). All character matrices were generated without rate heterogeneity and include invariable sites. Matrices of … traits were generated and randomly subsampled to create smaller sets of 20 and … characters to reflect a range of sampling depths.

Figure: Dots denote incorrect bipartitions.

These sizes were chosen because many published morphological matrices fall within this range. The subsampled matrix sizes represent reasonably sized paleontological data sets, while the …-trait matrices were tested to assess performance when data are abundant.

While such large data sets are uncommon in morphology, several studies have produced character matrices of this size, and for continuous characters, it may be feasible to generate such large data sets from morphometric data.

When phylogenetic half-life is set to be equal between continuous traits simulated under an Ornstein-Uhlenbeck (OU) process and binary discrete characters, phylogenetic constraint should be the same between both sets of characters, in the sense that they reach saturation over the same timescale. This comparison examines whether either data source performs inherently better when phylogenetic signal is held constant. These data were generated at an evolutionary rate of 0.… in matrices of … traits. Because the phylogenetic information content of both sets of constrained traits should be the same, both sets are expected to perform similarly.
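Phylogenetic half-life gives a common currency for the two models. Under an OU process with pull strength α, the half-life is ln 2 / α. For a symmetric k-state Mk model with total rate μ, one natural analogue, assumed here rather than taken from the study, is the time for the excess probability of remaining in the starting state to halve; that excess decays as exp(-kμt/(k - 1)), giving ln 2 / (2μ) for binary characters. Equating the two half-lives gives the α that matches a given Mk rate (α = 2μ in the binary case). The function names are hypothetical.

```python
import math


def ou_half_life(alpha):
    """Phylogenetic half-life of an OU process with pull strength alpha."""
    return math.log(2.0) / alpha


def mk_half_life(mu, k=2):
    """Assumed Mk analogue: time for the excess probability of being in the
    starting state, proportional to exp(-k * mu * t / (k - 1)), to halve."""
    return (k - 1) * math.log(2.0) / (k * mu)


def matching_ou_alpha(mu, k=2):
    """OU alpha whose half-life equals that of a k-state Mk model with rate mu."""
    return math.log(2.0) / mk_half_life(mu, k)


for mu in (0.1, 0.5, 1.0):
    print(mu, round(mk_half_life(mu), 3), round(matching_ou_alpha(mu), 3))
```

Under this definition, adding states lengthens the Mk half-life for a fixed μ, in line with the later observation that multi-state characters retain signal longer than binary ones.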

Nevertheless, this comparison provides a control by assessing whether unknown differences in the behavior of each model, or other properties of each method, themselves lead to differences in reconstruction accuracy. Data were also generated under a correlated BM process to mimic inference in the presence of multidimensionality. These data sets were constructed at several covariance strengths (0.…) and dimensionalities, chosen to represent situations where traits range from loosely to tightly correlated with one another, and where the number of correlated dimensions ranges from small to large.
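One way to generate correlated BM traits is sketched below under stated assumptions: each block of d traits shares a single pairwise correlation r (the covariance strength), and correlated increments are built from a shared normal factor. The function names and toy tree are hypothetical, and the study's own simulation may construct the covariance matrix differently.

```python
import math
import random

# Hypothetical ultrametric toy tree (root height 1.0): parent -> [(child, branch length)].
TREE = {
    "root": [("n1", 0.4), ("n2", 0.6)],
    "n1": [("A", 0.6), ("B", 0.6)],
    "n2": [("C", 0.4), ("D", 0.4)],
}


def correlated_normals(d, r, rng):
    """Draw d standard normal deviates with common pairwise correlation r (0 <= r <= 1).

    x_i = sqrt(r) * z0 + sqrt(1 - r) * z_i has variance 1, and any two x_i
    share covariance r through the common factor z0.
    """
    z0 = rng.gauss(0.0, 1.0)
    return [math.sqrt(r) * z0 + math.sqrt(1.0 - r) * rng.gauss(0.0, 1.0) for _ in range(d)]


def simulate_correlated_bm(tree, d, r, sigma2=1.0, rng=None):
    """Simulate one block of d BM traits whose increments are correlated by r."""
    rng = rng or random.Random()
    states = {"root": [0.0] * d}
    stack = ["root"]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):
            scale = math.sqrt(sigma2 * t)
            steps = correlated_normals(d, r, rng)
            states[child] = [x + scale * e for x, e in zip(states[parent], steps)]
            stack.append(child)
    return {node: v for node, v in states.items() if node not in tree}


# A hypothetical matrix of 4 blocks x 5 traits, each block with covariance strength 0.5.
rng = random.Random(7)
blocks = [simulate_correlated_bm(TREE, d=5, r=0.5, rng=rng) for _ in range(4)]
```

Setting r = 0 recovers independent BM traits, while r = 1 makes every trait in a block an exact copy of the others.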

Although they differ, these values were chosen to loosely follow the scheme of Adams and Felice. Trait likelihoods were computed following Felsenstein. Markov chain Monte Carlo (MCMC) simulations were run for … to 1,… generations and checked manually for convergence in Tracer (v1.…).
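For reference, the kind of pruning calculation Felsenstein described for BM can be sketched as follows. Each internal node contributes the density of the contrast between its two children, and the weighted-average node value passes its uncertainty up as extra branch length. This assumes a fully bifurcating tree with positive branch lengths; it is an illustration, not the RevBayes implementation, and the function names and toy tree are hypothetical.

```python
import math

# Hypothetical ultrametric toy tree (root height 1.0): parent -> [(child, branch length)].
TREE = {
    "root": [("n1", 0.4), ("n2", 0.6)],
    "n1": [("A", 0.6), ("B", 0.6)],
    "n2": [("C", 0.4), ("D", 0.4)],
}


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def _prune(tree, tips, sigma2, node):
    """Return (node value, extra branch length, log-likelihood of contrasts below node)."""
    if node not in tree:                            # a tip: observed value, no extra length
        return tips[node], 0.0, 0.0
    (left, t_left), (right, t_right) = tree[node]   # assumes a bifurcating tree
    x1, extra1, ll1 = _prune(tree, tips, sigma2, left)
    x2, extra2, ll2 = _prune(tree, tips, sigma2, right)
    v1, v2 = t_left + extra1, t_right + extra2
    # The contrast x1 - x2 is Normal(0, sigma2 * (v1 + v2)).
    ll = ll1 + ll2 + normal_logpdf(x1 - x2, 0.0, sigma2 * (v1 + v2))
    # The inverse-length-weighted average becomes this node's value; its
    # uncertainty lengthens the branch above by v1 * v2 / (v1 + v2).
    x = (x1 * v2 + x2 * v1) / (v1 + v2)
    return x, v1 * v2 / (v1 + v2), ll


def bm_loglik(tree, tips, sigma2, root_state):
    """Full BM log-likelihood of the tip values given the root state."""
    x, extra, ll = _prune(tree, tips, sigma2, "root")
    return ll + normal_logpdf(x, root_state, sigma2 * extra)


tips = {"A": 0.3, "B": -0.2, "C": 1.1, "D": 0.7}
print(round(bm_loglik(TREE, tips, sigma2=0.8, root_state=0.1), 4))
```

The result equals the multivariate normal density of the tips with covariance σ² times shared path length, but the pruning visits each node once instead of inverting a covariance matrix.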

Runs were accepted when the effective sample size (ESS) for logged parameters exceeded a set threshold (…). Trees were inferred from discrete data in MrBayes (version 3.…). Different programs were used because, although MrBayes remains the standard in the field for Bayesian phylogenetic inference, its current version does not implement likelihood functions for continuous character models.
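Tracer reports ESS from the autocorrelation of each logged trace. The sketch below uses a simple textbook estimator, ESS = N / (1 + 2Σρ_k), truncating the sum at the first non-positive autocorrelation. Tracer's own estimator may differ in detail, the function names are hypothetical, and the acceptance threshold used in the study is not reproduced here.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of the trace x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x):
    """Simple ESS estimate: N / (1 + 2 * sum of positive autocorrelations).

    The sum stops at the first lag whose estimated autocorrelation is not
    positive, a common truncation rule; the trace must not be constant.
    """
    n = len(x)
    rho_sum = 0.0
    for lag in range(1, n // 2):
        rho = autocorrelation(x, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# A strongly autocorrelated AR(1) chain standing in for a sluggish MCMC trace.
rng = random.Random(1)
phi, x, chain = 0.9, 0.0, []
for _ in range(5000):
    x = phi * x + (1 - phi ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    chain.append(x)
print(round(effective_sample_size(chain)))
```

For an AR(1) chain with φ = 0.9, the theoretical ESS is about N(1 - φ)/(1 + φ), roughly 263 for N = 5000, so 5000 correlated samples carry about as much information as a few hundred independent ones.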

The continuous-character analyses were therefore implemented in RevBayes; however, I preferred to remain with the standard, proven implementation where possible. For both continuous and discrete characters, I incorporated a birth-death prior on node heights, so that branch lengths obtained through both methods would be scaled to time and could be compared evenly. Tree distributions were summarized as maximum clade credibility (MCC) trees using TreeAnnotator (version 2.…). The MCC tree is the sampled tree that maximizes the product of the posterior probabilities of its clades, as estimated across all trees sampled during the MCMC simulation.
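The MCC summary can be sketched by representing each sampled tree as its set of clades. This simplified Python version, with hypothetical helper names, picks the sampled tree with the largest product of clade frequencies; it ignores the node-height annotation that TreeAnnotator also performs.

```python
import math
from collections import Counter


def clade(*taxa):
    return frozenset(taxa)


def clade_frequencies(trees):
    """Posterior probability of each clade: the fraction of sampled trees containing it."""
    counts = Counter(c for tree in trees for c in tree)
    return {c: n / len(trees) for c, n in counts.items()}


def mcc_tree(trees):
    """Return the sampled tree whose clades have the largest product of frequencies."""
    freqs = clade_frequencies(trees)
    return max(trees, key=lambda tree: sum(math.log(freqs[c]) for c in tree))


# Rooted four-taxon trees, each given by its non-trivial clades.
T1 = frozenset({clade("A", "B"), clade("C", "D")})        # ((A,B),(C,D))
T2 = frozenset({clade("A", "B"), clade("A", "B", "C")})   # (((A,B),C),D)
T3 = frozenset({clade("A", "C"), clade("A", "B", "C")})   # (((A,C),B),D)
posterior_sample = [T1, T1, T1, T2, T3]
best = mcc_tree(posterior_sample)
```

Summing log frequencies is equivalent to maximizing the product and avoids numerical underflow when trees contain many clades.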

I assessed topological accuracy on simulated trait data using the symmetric Robinson-Foulds distance (Robinson and Foulds), which gives the topological distance between true and inferred trees. Symmetric distance is calculated as the number of partitions present in one of the compared trees but not the other. These values were then scaled to the maximum possible symmetric distance for interpretability. Additionally, I measured error in branch length reconstruction using the branch length distance (BLD; Kuhner and Felsenstein). This is calculated by summing the vector of individual differences between the branch lengths of all shared bipartitions.
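Both accuracy measures can be written in terms of bipartitions. The sketch below represents each unrooted tree as a mapping from its internal bipartitions to branch lengths (terminal branches are omitted for brevity). It computes the RF distance scaled by its maximum, 2(n - 3) for fully resolved unrooted trees, and the usual Kuhner-Felsenstein branch score, which takes the square root of summed squared differences over the union of bipartitions, counting a missing branch as length 0. The BLD variant used in the study may differ in detail, and all names here are hypothetical.

```python
import math


def split(side, taxa, reference="A"):
    """Canonical form of a bipartition: the side that does not contain `reference`."""
    side = frozenset(side)
    return frozenset(taxa) - side if reference in side else side


def rf_distance(splits1, splits2):
    """Symmetric difference: bipartitions found in exactly one of the two trees."""
    return len(set(splits1) ^ set(splits2))


def normalized_rf(splits1, splits2, n_taxa):
    """RF divided by its maximum, 2 * (n - 3), for fully resolved unrooted trees."""
    return rf_distance(splits1, splits2) / (2 * (n_taxa - 3))


def branch_score(lengths1, lengths2):
    """Kuhner-Felsenstein branch score over the union of bipartitions (missing = 0)."""
    keys = set(lengths1) | set(lengths2)
    return math.sqrt(sum((lengths1.get(s, 0.0) - lengths2.get(s, 0.0)) ** 2 for s in keys))


TAXA = "ABCDE"
true_tree = {split("AB", TAXA): 0.2, split("DE", TAXA): 0.3}   # ((A,B),C,(D,E))
inferred = {split("AC", TAXA): 0.1, split("DE", TAXA): 0.5}    # ((A,C),B,(D,E))
print(normalized_rf(true_tree, inferred, len(TAXA)), branch_score(true_tree, inferred))
```

Because branch lengths enter the score directly, comparing trees of different total length inflates it, which is why rescaling trees to a common root height matters.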

The scale of this value depends on the lengths of the trees under comparison. If trees of different lengths are compared, BLD can be very high.

However, in this study, all trees are scaled to a root height of 1 to allow comparison of topological and internal branch length reconstruction error. Summary barplots were constructed using ggplot2 (Wickham).

Topological reconstruction error is lower overall for trees estimated from continuous characters than for those estimated from binary discrete characters (Fig. …).

For discrete characters, symmetric distance increases substantially at high evolutionary rates, likely due to saturation and loss of phylogenetic signal. Distance also increases for discrete characters when the rate is very slow, because there is too little time for phylogenetic signal to develop. This pattern is similar to that recovered by Wright and Hillis in their tests of Bayesian inference under Mk, which revealed the highest topological error at very low and very high rates. As expected, continuous characters perform consistently across rates because saturation cannot occur under unbounded BM, even at very fast rates.
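The contrast in saturation behavior can be made concrete. Under a symmetric k-state Mk model with total rate μ, the probability that two tips separated by path length d differ is (k - 1)/k · (1 - exp(-kμd/(k - 1))), which plateaus at (k - 1)/k; under BM, the expected squared difference is σ²d and keeps growing. The function names in this sketch are hypothetical.

```python
import math


def mk_prob_differ(mu, path_length, k=2):
    """Probability that two tips differ for one symmetric k-state Mk character.

    `mu` is the total rate of leaving a state; `path_length` is the summed
    branch length separating the two tips.
    """
    return (k - 1) / k * (1.0 - math.exp(-k * mu * path_length / (k - 1)))


def bm_expected_sq_diff(sigma2, path_length):
    """Expected squared difference between two tips under BM: sigma2 * path length."""
    return sigma2 * path_length


# Close relatives (path 0.5) versus distant ones (path 2.0) at increasing rates.
for rate in (0.1, 1.0, 10.0):
    mk = [round(mk_prob_differ(rate, d), 3) for d in (0.5, 2.0)]
    bm = [bm_expected_sq_diff(rate, d) for d in (0.5, 2.0)]
    print(rate, mk, bm)
```

At the fastest rate the binary character barely distinguishes close from distant relatives, whereas the BM expectation for distant pairs stays four times that for close pairs at every rate.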

Because of the differing sensitivities of each data type to evolutionary rate, topological error should also be compared at the most favorable rate class for discrete characters (0.…).

Even at this rate, continuous reconstruction performs more consistently than discrete reconstruction, with error more tightly distributed around a slightly lower mean. A likely explanation is that discrete characters retain less information than continuous characters. The small state space of the binary character model likely causes phylogenetic signal to saturate more quickly at fast rates, and to develop more slowly at slow rates, than in multi-state characters.

BM and Mk appear to perform fairly similarly in reconstructing branch lengths (Fig. …).

The patterns across rates and matrix sizes are very similar between BLD and symmetric distance, with the fastest rates producing the most error.

This likely results from increased saturation at fast rates, which causes hidden character changes to be underestimated.

Figure: Topological error, calculated as the proportion of maximum symmetric distance, across trees estimated from independently evolving continuous characters.

Matrix size has a major impact on tree reconstruction accuracy. Estimates from both discrete and continuous traits improve substantially with each increase in matrix size (Fig. …). Estimates from …-character matrices have fairly high error in both data types, with approximately 1 in 5 bipartitions estimated incorrectly from continuous characters and 2 in 5 estimated incorrectly from discrete data.

Increasing matrix size to … traits improves accuracy substantially, with both data types estimating approximately 1 in 10 bipartitions incorrectly. Although mean symmetric distance is close between data types at several rates, error from continuous characters tends to be less widely distributed, so continuous characters appear to reconstruct trees with more consistent accuracy.

When matrix size is increased to … characters, both continuous and discrete characters recover phylogeny with very high accuracy, except at very fast rates, where discrete characters estimate approximately half of all bipartitions incorrectly on average.

Phylogenies inferred from continuous traits simulated under an Ornstein-Uhlenbeck (OU) model achieve virtually identical performance to those inferred from binary discrete characters simulated under the same phylogenetic constraint (Fig. …).

This result demonstrates that any performance increases observed for continuous traits over discrete traits result from differences in realized phylogenetic information.

Figure: Topological error achieved after reconstructing trees from discrete traits simulated under Mk at rate 0.…

Tree inference under BM appears relatively robust to the violation of independence posed by coevolving continuous characters.

When correlated traits are of low dimensionality and covariance strength, reconstruction appears nearly as accurate as with uncorrelated traits, with all bipartitions estimated correctly on average. Although statistical significance cannot be estimated for BLD and symmetric distance, estimation under low to intermediate trait covariance appears at least qualitatively similar to, albeit slightly worse than, estimation from uncorrelated continuous and binary discrete characters.
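Why covariance costs accuracy can be approximated with a standard heuristic, offered here as an illustration rather than a calculation from the study. For d traits sharing pairwise correlation r, the variance of their mean is σ²(1 + (d - 1)r)/d, the same as for d / (1 + (d - 1)r) independent traits. The function name is hypothetical.

```python
def effective_number_of_traits(d, r):
    """Heuristic effective number of independent traits in a block of d traits
    that all share pairwise correlation r (0 <= r <= 1)."""
    return d / (1.0 + (d - 1) * r)


# Larger blocks and stronger correlations shrink the effective data most.
for d in (2, 10, 50):
    print(d, [round(effective_number_of_traits(d, r), 2) for r in (0.1, 0.5, 0.9)])
```

Under this heuristic, a large block of strongly correlated traits behaves almost like a single trait, consistent with the loss of accuracy seen only when dimensionality and covariance strength are both high.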

The decreases in accuracy observed can likely be attributed to the decrease in total information content caused by covariance, which reduces the effective amount of data from which to draw inference. This is reflected in the results, with higher covariances and dimensionalities reconstructing trees with error of a similar magnitude to that of the …-character data sets.

Figure: Dimensions refers to the number of traits within covarying blocks. Covariance strength refers to the strength of the correlation between covarying characters, with 0 describing complete independence and 1 describing perfect correlation.

The results demonstrate that phylogenetic reconstruction from continuous trait data can provide a reasonable supplement or alternative to inference from discrete characters.

Continuous characters that are unconstrained and unbounded in their evolution outperform discrete characters, and they perform equally well when constrained by selection to the same degree. Unconstrained continuous characters are therefore able to retain phylogenetic information at high evolutionary rates that may cause rampant saturation in discrete characters (Fig. …). Further work is needed to investigate the extent to which continuous characters are bounded and constrained in their evolution relative to discrete characters.

This will be especially important moving forward, as temporal variation in evolutionary regimes and model parameters can interact in complex ways, sometimes extending the maintenance of phylogenetic signal through time (Revell et al.). Although continuous characters in empirical data sets are undoubtedly constrained in their evolution, the added information contained in continuous character data sets may lessen the extent of saturation relative to discrete characters in practice. The finding that performance becomes equal when the amount of phylogenetic constraint is held constant between both data sources identifies the major source of the performance increase observed for unconstrained BM traits compared to discrete traits.

The average amount of phylogenetic constraint exhibited by discrete and continuous traits, however, is not well understood in empirical data sets. By contrast, the susceptibility of discrete traits to the loss of phylogenetic signal at high evolutionary rates and deep timescales has long been recognized (Hillis and Huelsenbeck; Yang). Although this effect is known to affect molecular data, discrete morphological data sets may be more susceptible to it because of the frequent use of binary character coding schemes.

Constraining discrete characters to fewer states increases signal loss at high evolutionary rates because of increased homoplasy, saturation, and lower information content overall (Donoghue and Ree). The extent to which continuous traits are constrained in their evolution on average is not well understood. However, the results here suggest that researchers would benefit from treating continuous traits as continuous and inferring phylogenies under continuous trait models in order to maximize the usable information contained in data sets.

My results demonstrate that the fundamental issues in comparing continuous and discrete traits are state space, selective constraint, and evolutionary boundedness.

When selective constraint in continuous characters restricts phylogenetic signal as strongly as in binary characters, reconstruction accuracy is predictably equal. Nevertheless, it is unclear to what extent phylogenetic half-life tends to differ between continuous and discrete traits in empirical data sets. Continuous characters may be expected to commonly evolve under some form of selective constraint, but it is unclear whether such effects typically mask phylogenetic signal to the same extent as in rapidly saturating binary traits.

Discrete traits with more than two states have a substantially longer phylogenetic half-life than binary characters, but could be supplanted by continuous characters in many cases. Although empirical morphological data sets typically incorporate discrete characters with more than two states, these are usually fewer in number than binary-coded characters. Multi-state characters are also typically discretized codings of continuous measurements.

The predominance of binary characters in morphological matrices should encourage further consideration of continuous traits in future empirical and theoretical studies.

Error in branch length estimation was fairly high with the …-trait matrices but decreased substantially when matrix size was increased to … traits.

Although BM and Mk achieve similar accuracy in estimating branch lengths in this study, careful thought should continue to be applied when relying on Mk branch length estimates in the future. Branch length error may be higher when inferring under Mk from empirical data sets, since many discrete morphological matrices are constructed to include only parsimony-informative characters. In these cases, characters are expected to have undergone only single synapomorphic changes. Although the lack of invariable sites in data sets tailored to parsimony is addressed through the ascertainment bias correction developed by Lewis, it is unclear how meaningfully the directional single character changes often observed in these data sets can inform evolutionary rates.
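A toy case shows why rates may be weakly informed when only variable characters are sampled. With two taxa separated by path length d under binary Mk, the ascertainment correction of Lewis divides each character's likelihood by the probability that a character is variable. The sketch below (hypothetical function names, two taxa only) shows that this corrected likelihood equals 1/2 for any rate or branch length, so in this extreme case variable-only data carry no information about d; with more taxa, some information remains.

```python
import math


def two_taxon_pattern_probs(mu, d):
    """Site-pattern probabilities for two tips separated by path length d under
    binary Mk with total rate mu and a stationary (1/2, 1/2) root."""
    same = 0.5 * (1.0 + math.exp(-2.0 * mu * d))   # probability the tips match
    return {"00": same / 2, "11": same / 2, "01": (1 - same) / 2, "10": (1 - same) / 2}


def corrected_likelihood(pattern, mu, d):
    """Likelihood of a pattern conditioned on the character being variable."""
    probs = two_taxon_pattern_probs(mu, d)
    p_variable = 1.0 - probs["00"] - probs["11"]
    return probs[pattern] / p_variable


for mu, d in [(0.1, 0.5), (1.0, 0.5), (1.0, 2.0)]:
    raw = two_taxon_pattern_probs(mu, d)["01"]
    print(mu, d, round(raw, 4), corrected_likelihood("01", mu, d))
```

The uncorrected probability of the variable pattern changes with μd, but after conditioning on variability that dependence cancels.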

This mode of change, which may characterize much of discrete character evolution, differs from the population dynamics of nucleotide substitution.

Although continuous traits may often follow covarying evolutionary trajectories in nature, this appears to have a relatively minor impact on reconstruction. Accuracy was greatly lowered only when dimensionality and covariance strength were both very high. Offering further support for the ability of continuous characters to reconstruct phylogeny despite evolutionary covariance, Adams and Felice also report phylogenetic information in multidimensional characters, even when the number of dimensions is greater than the number of taxa.

Despite these generally positive findings, it should be noted that inference may be misled if sampling is significantly biased to include relatively small numbers of strongly correlated measurements. In these cases, it would be beneficial to examine the correlation structure and information content of the data set to assess the amount of biased redundancy in signal. Use of continuous traits has the benefit of reducing subjectivity in the construction of data matrices in many cases. Categorizing qualitative characters often requires subjective interpretation.

However, quantitative measurements can be taken without this source of human error. Although the likelihood approaches to morphological phylogenetics enabled by the Mk model represent a major step in this direction, discordance in tree estimates can still be attributed to differences in qualitative categorization of variation by researchers.

Translating morphological observations into analyzable data can present serious complications for discrete characters. Steps such as deciding whether to order states, choosing the total number of states used to describe characters, and assigning character states can vary greatly and often yield widely different results (Hauser and Presch; Pleijel; Wilkinson; Hawkins et al.).

Continuous measurements avoid many of these issues because they can be measured, by definition, objectively and quantitatively. In addition, they may better describe variation than discrete characters.

Several workers have suggested that the majority of biological variation is fundamentally continuous (Thiele; Rae; Wiens).