Title: | Joint Modelling of Functional Traits |
---|---|
Description: | Fitting and analyzing a Joint Trait Distribution Model. The Joint Trait Distribution Model is implemented in the Bayesian framework using conjugate priors and posteriors, thus guaranteeing fast inference. In particular the package computes joint probabilities and multivariate confidence intervals, and enables the investigation of how they depend on the environment through partial response curves. The method implemented by the package is described in Poggiato et al. (2023) <doi:10.1111/geb.13706>. |
Authors: | Giovanni Poggiato [aut, cre, cph] |
Maintainer: | Giovanni Poggiato <[email protected]> |
License: | GPL-3 |
Version: | 0.1-3 |
Built: | 2024-11-08 05:30:07 UTC |
Source: | https://github.com/giopogg/jtdm |
Partial response curve of the pairwise most suitable community-level strategy and of the pairwise envelop of possible community-level strategy. In order to build the response curve, the function builds a dataframe where the focal variable varies along a gradient and the other (non-focal) variables are fixed to their mean (but see FixX parameter for fixing non-focal variables to user-defined values). The chosen traits are specified in indexTrait. Then uses the jtdm_predict function to compute the most suitable community-level strategy and the residual covariance matrix to build the envelop of possible CWM combinations.
ellipse_plot( m, indexGradient, indexTrait, FullPost = FALSE, grid.length = 20, FixX = NULL, confL = 0.95 )
ellipse_plot( m, indexGradient, indexTrait, FullPost = FALSE, grid.length = 20, FixX = NULL, confL = 0.95 )
m |
a model fitted with |
indexGradient |
The name (as specified in the column names of X) of the focal variable. |
indexTrait |
A vector of the two names (as specified in the column names of Y) containing the two (or more!) traits we want to compute the community level strategy of. |
FullPost |
If FullPost = TRUE, the function returns samples from the predictive distribution of joint probabilities. If FullPost= FALSE, joint probabilities are computed only using the posterior mean of the parameters. |
grid.length |
The number of points along the gradient of the focal variable. Default to 20 (which ensures a fair visualization). |
FixX |
Optional. A parameter to specify the value to which non-focal variables are fixed. This can be useful for example if we have some categorical variables (e.g. forest vs meadows) and we want to obtain the partial response curve for a given value of the variable. It has to be a list of the length and names of the columns of X. For example, if the columns of X are "MAT","MAP","Habitat" and we want to fix "Habitat" to 1, then FixX=list(MAT=NULL,MAP=NULL,Habitat=1.). Default to NULL. |
confL |
The confidence level of the confidence ellipse (i.e. of the envelop of possible community-level strategies). Default is 0.95. |
Plot of the partial response curve of the pairwise most suitable community-level strategy and of the pairwise envelop of possible community-level strategy
data(Y) data(X) # Short MCMC to obtain a fast example: results are unreliable ! m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # plot the pairwise SLA-LNC partial response curve along the GDD gradient ellipse_plot(m,indexTrait = c("SLA","LNC"),indexGradient="GDD") # plot the pairwise SLA-LNC partial response curve along the GDD gradient # in forest (i.e. when forest=1) ellipse_plot(m,indexTrait = c("SLA","LNC"),indexGradient="GDD", FixX=list(GDD=NULL,FDD=NULL,forest=1))
data(Y) data(X) # Short MCMC to obtain a fast example: results are unreliable ! m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # plot the pairwise SLA-LNC partial response curve along the GDD gradient ellipse_plot(m,indexTrait = c("SLA","LNC"),indexGradient="GDD") # plot the pairwise SLA-LNC partial response curve along the GDD gradient # in forest (i.e. when forest=1) ellipse_plot(m,indexTrait = c("SLA","LNC"),indexGradient="GDD", FixX=list(GDD=NULL,FDD=NULL,forest=1))
Get the samples from the posterior distribution of the residual covariance matrix, together with the posterior mean and quantiles.
get_sigma(m)
get_sigma(m)
m |
a model fitted with |
A list containing:
Ssamples |
Sample from the posterior distribution of the residual covariance matrix. It is an array where the first two dimensions are the rows and columns of the matrix, and the third dimensions are the samples from the posterior distribution |
Smean |
Posterior mean of the residual covariance matrix. |
Sq975 , Sq025
|
97.5% and 0.25% posterior quantiles of the residual covariance matrix. |
data(Y) data(X) # Short MCMC to obtain a fast example: results are unreliable ! m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # get the inferred residual covariance Sigma =get_sigma(m)
data(Y) data(X) # Short MCMC to obtain a fast example: results are unreliable ! m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # get the inferred residual covariance Sigma =get_sigma(m)
Get the samples from the posterior distribution of the regression coefficient matrix B, together with the posterior mean and quantiles. The regression coefficient matrix B is a matrix where the number of rows is defined by the number of traits that are modeled, and the number of columns is the number of columns of the matrix m$X (the number of explanatory variables after transformation via formula)
getB(m)
getB(m)
m |
a model fitted with |
A list containing:
Bsamples |
Sample from the posterior distribution of the regression coefficient matrix. It is an array where the first dimension is the number of traits, the second the number of columns in m$X (the number of variables after transformation via formula) and the third the number of MCMC samples. |
Bmean |
Posterior mean of the regression coefficient matrix. |
Bq975 , Bq025
|
97.5% and 0.25% posterior quantiles of the regression coefficient matrix. |
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # get the inferred regression coefficients B=getB(m)
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # get the inferred regression coefficients B=getB(m)
Computes the joint probability of CWM traits in regions in the community-trait space specified by bounds and in sites specified in Xnew.
joint_trait_prob( m, indexTrait, bounds, Xnew = NULL, FullPost = FALSE, samples = NULL, parallel = FALSE )
joint_trait_prob( m, indexTrait, bounds, Xnew = NULL, FullPost = FALSE, samples = NULL, parallel = FALSE )
m |
a model fitted with |
indexTrait |
A vector of the names (as specified in the column names of Y) of the two (or more!) traits we want to compute the joint probabilities of. |
bounds |
The parameter to specify a region in the community-trait space where the function computes the joint probabilities of traits. It is a list of the length of "indexTrait", each element of the list is a vector of length two. The vector represents the inferior and superior bounds of the region for the specified trait. For example, if we consider two traits, bounds=list(c(10,Inf),c(10,Inf)) corresponds to the region in the community-trait space where both traits both take values greater than 10. |
Xnew |
Optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used. |
FullPost |
If FullPost = TRUE, the function returns samples from the predictive distribution of joint probabilities, thus allowing the computation of credible intervals. If FullPost= FALSE, joint probabilities are computed only using the posterior mean of the parameters. FullPost cannot be equal to "mean" here. |
samples |
Optional, default to NULL, only works when FullPost=FALSE. Defines the number of posterior samples to compute the posterior distribution of joint probabilities. Needs to be between 1 the total number of samples drawn from the posterior distribution. |
parallel |
Optional, only works when |
This function is time consuming when FullPost = TRUE
. Consider setting parallel = TRUE
and/or to set samples
to a value smaller than the total number of posterior samples .
A list containing:
PROBsamples |
Samples from the posterior distribution of the joint probability.NULL if FullPost=FALSE. |
PROBmean |
Posterior mean of the joint probability. |
PROBq975 , PROBq025
|
97.5% and 0.25% posterior quantiles of the joint probability. NULL if FullPost=FALSE. |
data(Y) data(X) #We sample only few samples from the posterior in order to reduce # the computational time of the examples. #Increase the number of samples to obtain robust results m = jtdm_fit(Y = Y, X = X, formula = as.formula("~GDD+FDD+forest"), sample = 10) # Compute probability of SLA and LNC to be joint-high at sites in the studies joint = joint_trait_prob(m, indexTrait = c("SLA","LNC"), bounds = list(c(mean(Y[,"SLA"]),Inf), c(mean(Y[,"SLA"]),Inf)), FullPost = TRUE)
data(Y) data(X) #We sample only few samples from the posterior in order to reduce # the computational time of the examples. #Increase the number of samples to obtain robust results m = jtdm_fit(Y = Y, X = X, formula = as.formula("~GDD+FDD+forest"), sample = 10) # Compute probability of SLA and LNC to be joint-high at sites in the studies joint = joint_trait_prob(m, indexTrait = c("SLA","LNC"), bounds = list(c(mean(Y[,"SLA"]),Inf), c(mean(Y[,"SLA"]),Inf)), FullPost = TRUE)
Computes the partial responses curves of joint probability of CWM traits as a function of a focal variable. The regions in which joint probabilities are computed are specified by bounds. In order to build the response curve, the function builds a dataframe where the focal variable varies along a gradient and the other (non-focal) variables are fixed to their mean (but see FixX parameter for fixing non-focal variables to user-defined values). Then, uses joint_trait_prob to compute the joint probability in these dataset.
joint_trait_prob_gradient( m, indexTrait, indexGradient, bounds, grid.length = 200, XFocal = NULL, FixX = NULL, FullPost = FALSE, samples = NULL, parallel = FALSE )
joint_trait_prob_gradient( m, indexTrait, indexGradient, bounds, grid.length = 200, XFocal = NULL, FixX = NULL, FullPost = FALSE, samples = NULL, parallel = FALSE )
m |
A model fitted with |
indexTrait |
A vector of the names (as specified in the column names of Y) of the two (or more!) traits we want to compute the joint probabilities of. |
indexGradient |
The name (as specified in the column names of X) of the focal variable. |
bounds |
The parameter to specify a region in the community-trait space where the function computes the joint probabilities of traits. It is a list of the length of "indexTrait", each element of the list is a vector of length two. The vector represents the inferior and superior bounds of the region for the specified trait. For example, if we consider two traits, bounds=list(c(10,Inf),c(10,Inf)) corresponds to the region in the community-trait space where both traits both take values greater than 10. |
grid.length |
The number of points along the gradient of the focal variable. Default to 200. |
XFocal |
Optional. A gradient of the focal variable provided by the user. If provided, the function will used this gradient instead of building a regular one. Default to NULL. |
FixX |
Optional. A parameter to specify the value to which non-focal variables are fixed. This can be useful for example if we have some categorical variables (e.g. forest vs meadows) and we want to obtain the partial response curve for a given value of the variable. It has to be a list of the length and names of the columns of X. For example, if the columns of X are "MAT","MAP","Habitat" and we want to fix "Habitat" to 1, then FixX=list(MAT=NULL,MAP=NULL,Habitat=1.). Default to NULL. |
FullPost |
If FullPost = TRUE, the function returns samples from the predictive distribution of joint probabilities, thus allowing the computation of credible intervals. If FullPost= FALSE, joint probabilities are computed only using the posterior mean of the parameters. FullPost cannot be equal to "mean" here. |
samples |
Optional, default to NULL, only works when FullPost=FALSE. Defines the number of samples to compute the posterior distribution of joint probabilities. Needs to be between 1 the total number of samples drawn from the posterior distribution. |
parallel |
Optional, only works when FullPost = TRUE. When TRUE, the function uses mclapply to parallelise the calculation of the posterior distribution joint probabilities. |
This function is time consuming when FullPost = TRUE
. Consider setting parallel = TRUE
and/or to set samples
to a value smaller than the total number of posterior samples.
A list containing:
GradProbssamples |
Sample from the posterior distribution of the joint probability along the gradient. It is a vector whose length is the number of posterior samples. NULL if FullPost=FALSE. |
GradProbsmean |
Posterior mean of the joint probability along the gradient. |
GradProbsq975 , GradProbsq025
|
97.5% and 0.25% posterior quantiles of the joint probability along the gradient. NULL if FullPost=FALSE. |
gradient |
The gradient of the focal variable built by the function. |
data(Y) data(X) # We sample only few samples from the posterior in order to reduce # the computational time of the examples. # Increase the number of samples to obtain robust results m = jtdm_fit(Y = Y, X = X, formula = as.formula("~GDD+FDD+forest"), sample = 10) # Compute probability of SLA and LNC to be joint-high at sites in the studies # Compute the joint probability of SLA and LNC # to be joint-high along the GDD gradient joint = joint_trait_prob_gradient(m,indexTrait = c("SLA","LNC"), indexGradient = "GDD", bounds = list(c(mean(Y[,"SLA"]),Inf),c(mean(Y[,"SLA"]),Inf)), FullPost = TRUE) # Compute the joint probability of SLA and LNC to be joint-high along the # GDD gradient when forest = 1 (i.e. in forests) joint = joint_trait_prob_gradient(m, indexTrait = c("SLA","LNC"), indexGradient = "GDD", bounds = list(c(mean(Y[,"SLA"]),Inf), c(mean(Y[,"SLA"]),Inf)), FixX = list(GDD = NULL, FDD = NULL, forest = 1), FullPost = TRUE)
data(Y) data(X) # We sample only few samples from the posterior in order to reduce # the computational time of the examples. # Increase the number of samples to obtain robust results m = jtdm_fit(Y = Y, X = X, formula = as.formula("~GDD+FDD+forest"), sample = 10) # Compute probability of SLA and LNC to be joint-high at sites in the studies # Compute the joint probability of SLA and LNC # to be joint-high along the GDD gradient joint = joint_trait_prob_gradient(m,indexTrait = c("SLA","LNC"), indexGradient = "GDD", bounds = list(c(mean(Y[,"SLA"]),Inf),c(mean(Y[,"SLA"]),Inf)), FullPost = TRUE) # Compute the joint probability of SLA and LNC to be joint-high along the # GDD gradient when forest = 1 (i.e. in forests) joint = joint_trait_prob_gradient(m, indexTrait = c("SLA","LNC"), indexGradient = "GDD", bounds = list(c(mean(Y[,"SLA"]),Inf), c(mean(Y[,"SLA"]),Inf)), FixX = list(GDD = NULL, FDD = NULL, forest = 1), FullPost = TRUE)
jtdm_fit is used to fit a Joint trait distribution model. Requires the response variable Y (the sites x traits matrix) and the explanatory variables X.This function samples from the posterior distribution of the parameters, which has been analytically determined. Therefore, there is no need for classical MCMC convergence checks.
jtdm_fit(Y, X, formula, sample = 1000)
jtdm_fit(Y, X, formula, sample = 1000)
Y |
The sites x traits matrix containing community (weighted) means of each trait at each site. |
X |
The design matrix, i.e. sites x predictor matrix containing the value of each explanatory variable (e.g. the environmental conditions) at each site. |
formula |
An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'. |
sample |
Number of samples from the posterior distribution. Since we sample from the exact posterior distribution, the number of samples is relative lower than MCMC samplers. As a rule of thumb, 1000 samples should provide correct inference. |
A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae.
A list containing:
model |
An object of class "jtdm_fit", containing the samples from the posterior distribution of the regression coefficients (B) and residual covariance matrix (Sigma), together with the likelihood of the model. |
Y |
A numeric vector of standard errors on parameters |
X_raw |
The design matrix specified as input |
X |
The design matrix transformed as specified in formula |
formula |
The formula specified as input |
data(Y) data(X) m = jtdm_fit(Y = Y, X = X, formula = as.formula("~GDD+FDD+forest"), sample = 1000)
data(Y) data(X) m = jtdm_fit(Y = Y, X = X, formula = as.formula("~GDD+FDD+forest"), sample = 1000)
Obtains predictions from a fitted joint trait distribution model and optionally computes their R squared and root mean square error (RMSE)
jtdm_predict( m = m, Xnew = NULL, Ynew = NULL, validation = FALSE, FullPost = "mean" )
jtdm_predict( m = m, Xnew = NULL, Ynew = NULL, validation = FALSE, FullPost = "mean" )
m |
a model fitted with |
Xnew |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used |
Ynew |
Optional. The observed response variables at sites specified in Xnew. It is used to compute goodness of fit metrics when validation= T. |
validation |
boolean parameter to decide whether we want to compute goodness of fit measures. If true, then Ynew is needed. |
FullPost |
The type of predictions to be obtain. If FullPost = TRUE, the function returns samples from the predictive distribution, the credible intervals are thus the predictive credible interval. If FullPost="mean", the function computes the posterior distribution of the regression term |
To obtain a full assessment of the posterior distribution, the function should be ran with FullPost=TRUE, although this can be time consuming. FullPost="mean" is used to compute partial response curves, while FullPost=FALSE is used to compute goodness of fit metrics.
A list containing:
Pred |
Sample from the posterior distribution of the posterior predictive distribution. It is an array where the first dimension is the number of sites in Xnew, the second is the number of traits modelled and the third the number of MCMC samples. NULL if FullPost=FALSE. |
PredMean |
Posterior mean of posterior predictive distribution |
Predq975 , Predq025
|
97.5% and 0.25% posterior quantiles of the posterior predictive distribution. NULL if FullPost=FALSE. |
R2 |
R squared of predictions (squared Pearson correlation between Ynew and the predictions). NULL if validation=FALSE. |
RMSE |
Root square mean error between squared of predictions. NULL if validation=FALSE. |
data(Y) data(X) m = jtdm_fit(Y = Y, X = X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # marginal predictions of traits in the sites of X pred = jtdm_predict(m)
data(Y) data(X) m = jtdm_fit(Y = Y, X = X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # marginal predictions of traits in the sites of X pred = jtdm_predict(m)
Run K-fold cross validation predictions of the model m on a specified dataset.
jtdmCV(m, K = 5, sample = 1000, partition = NULL)
jtdmCV(m, K = 5, sample = 1000, partition = NULL)
m |
a model fitted with |
K |
The number of folds of the K-fold cross validation |
sample |
Number of samples from the posterior distribution. Since we sample from the exact posterior distribution, the number of samples is relative lower than MCMC samplers. As a rule of thumb, 1000 samples should provide correct inference. |
partition |
A partition of the dataset specified by the user. It is a vector (whose length are the number of sites), where each element specifies the fold index of the site. |
A list containing:
Pred |
Sample from the posterior predictive distribution in cross validation. It is an array where the first dimension is the number of sites in Xnew, the second is the number of traits modeled and the third the number of MCMC samples. NULL if FullPost=FALSE. |
PredMean |
Posterior mean of posterior predictive distribution in cross validation. |
Predq975 , Predq025
|
97.5% and 0.25% posterior quantiles of the posterior predictive distribution in cross validation. NULL if FullPost=FALSE. |
R2 |
R squared of predictions in cross validation. |
RMSE |
Root square mean error between squared of predictions in cross validation. |
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # Run 3-fold cross validation on m pred = jtdmCV(m, K = 5, sample = 1000)
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # Run 3-fold cross validation on m pred = jtdmCV(m, K = 5, sample = 1000)
Computes and plots the trait-environment relationship of a given CWM trait and a focal environmental variable. In order to build the response curve, the function builds a dataframe where the focal environmental variable varies along a gradient and the other (non-focal) variables are fixed to their mean (but see FixX parameter for fixing non-focal variables to user-defined values).
partial_response( m, indexGradient, indexTrait, XFocal = NULL, grid.length = 200, FixX = NULL, FullPost = "mean" )
partial_response( m, indexGradient, indexTrait, XFocal = NULL, grid.length = 200, FixX = NULL, FullPost = "mean" )
m |
a model fitted with |
indexGradient |
The name (as specified in the column names of X) of the focal variable. |
indexTrait |
The name (as specified in the column names of Y) of the focal trait. |
XFocal |
Optional. A gradient of the focal variable provided by the user. If provided, the function will used this gradient instead of building a regular one. Default to NULL. |
grid.length |
The number of points along the gradient of the focal variable. Default to 200. |
FixX |
Optional. A parameter to specify the value to which non-focal variables are fixed. This can be useful for example if we have some categorical variables (e.g. forest vs meadows) and we want to obtain the partial response curve for a given value of the variable. It has to be a list of the length and names of the columns of X. For example, if the columns of X are "MAT","MAP","Habitat" and we want to fix "Habitat" to 1, then FixX=list(MAT=NULL,MAP=NULL,Habitat=1.). Default to NULL. |
FullPost |
The type of predictions to be obtain. If FullPost = TRUE, the function returns samples from the predictive distribution. If FullPost="mean", the function computes the posterior distribution of the regression term B%*%X). Default to "mean", here FullPost cannot be FALSE. |
A list containing:
p |
A plot of the trait-environment relationship. |
predictions |
A data frame containing the predicted trait-environmental relationships including the gradient of the focal environmental variable, mean trait predictions and quantiles (can be useful to code customized plot). |
data(Y) data(X) # Short MCMC to obtain a fast example: results are unreliable ! m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # SLA-GDD relationship plot = partial_response(m,indexGradient="GDD",indexTrait="SLA") plot$p # SLA-GDD relationship in forest (i.e. when forest=1) plot = partial_response(m,indexGradient="GDD",indexTrait="SLA", FixX=list(GDD=NULL,FDD=NULL,forest=1)) plot$p
data(Y) data(X) # Short MCMC to obtain a fast example: results are unreliable ! m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) # SLA-GDD relationship plot = partial_response(m,indexGradient="GDD",indexTrait="SLA") plot$p # SLA-GDD relationship in forest (i.e. when forest=1) plot = partial_response(m,indexGradient="GDD",indexTrait="SLA", FixX=list(GDD=NULL,FDD=NULL,forest=1)) plot$p
Plots the regression coefficients and covariance matrix of a fitted jtdm
## S3 method for class 'jtdm_fit' plot(x, ...)
## S3 method for class 'jtdm_fit' plot(x, ...)
x |
a model fitted with |
... |
additional arguments |
A plot of the regression coefficients and covariance matrix of the fitted model
Giovanni Poggiato
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) plot(m)
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) plot(m)
Prints the summary of a fitted jtdm
## S3 method for class 'jtdm_fit' summary(object, ...)
## S3 method for class 'jtdm_fit' summary(object, ...)
object |
a model fitted with |
... |
additional arguments |
A printed summary of the fitted jtdm
Giovanni Poggiato
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) summary(m)
data(Y) data(X) m = jtdm_fit(Y=Y, X=X, formula=as.formula("~GDD+FDD+forest"), sample = 1000) summary(m)
Includes the Growing Degree Days (GDD) during the growing season and Freezing Degree Days (FDD) during the growing season averaged over the period 1989-2019
data(X) data(X)
data(X) data(X)
A matrix
Orchamp consortium
data(X)
data(X)
A site x CWM traits dataset computed using pinpoint abundances of plants and species mean
data(Y)
data(Y)
A matrix
Orchamp Consortium
data(Y)
data(Y)