Chapter 6. Bayesian Genomic Linear Regression

Published online: January 14, 2022.

The Bayesian paradigm for parameter estimation is introduced and linked to the main problem of genomic-enabled prediction: predicting the trait of interest for non-phenotyped individuals from genotypic information, environmental variables, or other covariates. In this situation, a convenient practice is to include the individuals to be predicted in the posterior distribution to be sampled. We explain how the Bayesian Ridge regression (BRR) method is derived and illustrate it with data from plant breeding genomic selection. Other Bayesian methods (BayesA, BayesB, BayesC, and the Bayesian Lasso) are also described and exemplified for genome-based prediction. The chapter presents several examples implemented in the Bayesian generalized linear regression (BGLR) library for continuous response variables. Under all these Bayesian methods, the predictor includes main effects (of environments and genotypes) as well as interaction terms related to genotype × environment interaction.

Keywords:

Bayesian methods, Genomic selection, Main and interaction terms

6.1. Bayes Theorem and Bayesian Linear Regression

Unlike classic statistical inference, which assumes that the parameter θ defining the model is a fixed unknown quantity, Bayesian inference treats θ as a random variable whose variation represents the knowledge or ignorance about it before the data are collected (Box and Tiao 1992). The probability density function that describes this variation is known as the prior distribution, and it is an additional component in the specification of the complete model in a Bayesian framework.

Given a data set y = (y1, …, yn) whose distribution is assumed to be f(y ∣ θ), and a prior distribution f(θ) for the parameter θ, Bayesian analysis uses Bayes' theorem to combine these two pieces of information and obtain the posterior distribution of the parameters, on which the inference is fully based (Christensen et al. 2011):

$$f(\theta\mid \boldsymbol{y})=\frac{f(\boldsymbol{y},\theta)}{f(\boldsymbol{y})}=\frac{f(\theta)f(\boldsymbol{y}\mid\theta)}{f(\boldsymbol{y})}\propto f(\theta)L(\theta;\boldsymbol{y}),$$
where $f(\boldsymbol{y})=\int f(\boldsymbol{y}\mid\theta)f(\theta)\,d\theta=E_\theta\left[f(\boldsymbol{y}\mid\theta)\right]$ is the marginal distribution of y. This conditional distribution describes what is known about θ after the data are collected, and can be thought of as the prior knowledge about θ updated with the information contained in the data, which enters through the likelihood function L(θ; y) (Box and Tiao 1992).
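As a quick numerical illustration of this updating mechanism, the following minimal R sketch (all data and hyperparameter values are toy choices for illustration) contrasts the analytic posterior of a normal mean (known variance, conjugate normal prior) with a brute-force grid approximation of prior × likelihood:

set.seed(1)
y    = rnorm(20, mean = 2, sd = 1)  # data, f(y | theta) = N(theta, 1)
mu0  = 0; tau0 = 10                 # prior: theta ~ N(mu0, tau0^2)
n    = length(y)
# analytic posterior (known-variance normal-normal conjugacy)
post_var  = 1/(1/tau0^2 + n)
post_mean = post_var*(mu0/tau0^2 + sum(y))
# grid approximation: prior x likelihood, normalized
theta = seq(-5, 5, length.out = 2001)
prior = dnorm(theta, mu0, tau0)
lik   = sapply(theta, function(t) prod(dnorm(y, t, 1)))
post  = prior*lik/sum(prior*lik*diff(theta)[1])
c(analytic = post_mean, grid = sum(theta*post)*diff(theta)[1])  # nearly identical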

In general, because the posterior distribution does not always have a recognizable form and it is often not easy to simulate from it, numerical approximation methods are employed. Once a sample of the posterior distribution is obtained, a parameter estimate is typically computed by averaging the sampled values, or by averaging a function of the sampled values when another quantity is of interest. For example, in genomic prediction with dense molecular markers, the main interest is to predict the trait of interest for the non-phenotyped individuals that have only genotypic information, environmental variables, or other covariates. In this situation, a convenient practice is to include the individuals to be predicted (yp) in the posterior distribution to be sampled.
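In BGLR, which is used throughout this chapter, this practice corresponds to assigning NA phenotypes to the individuals to be predicted, so that their values are sampled together with the model parameters. A minimal sketch (the index vector idx_predict and the predictor list ETA are assumed to be already defined):

y_NA = y
y_NA[idx_predict] = NA   # non-phenotyped individuals to be predicted
# A = BGLR(y = y_NA, ETA = ETA, nIter = 1e4, burnIn = 1e3)
# A$yHat[idx_predict]    # posterior means of the predicted phenotypes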

Specifically, a standard Bayesian framework for a normal linear regression model (see Chap. 3)

$$Y=\beta_0+\sum_{j=1}^{p}X_j\beta_j+\epsilon \tag{6.1}$$
with ϵ a random error with normal distribution with mean 0 and variance σ², is fully specified by assuming the following non-informative prior distribution: β and log(σ) approximately independent and locally uniform, that is,
$$f\left(\beta,\sigma^2\right)\propto\sigma^{-2}, \tag{6.2}$$
which is not a proper distribution because it does not integrate to 1 (Box and Tiao 1992; Gelman et al. 2013). However, when X is of full column rank, the posterior distribution is a proper distribution and is given by
$$
\begin{aligned}
f\left(\beta,\sigma^2\mid \boldsymbol{y},\boldsymbol{X}\right)&\propto\left(\sigma^2\right)^{-\frac{n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left(\boldsymbol{y}-\boldsymbol{X}\beta\right)^{T}\left(\boldsymbol{y}-\boldsymbol{X}\beta\right)\right]\sigma^{-2}\\
&\propto\left(\sigma^2\right)^{-\frac{p+1}{2}}\exp\left[-\frac{1}{2\sigma^2}\left(\beta-\tilde{\beta}\right)^{T}\boldsymbol{X}^{T}\boldsymbol{X}\left(\beta-\tilde{\beta}\right)\right]\left(\sigma^2\right)^{-\frac{n-p-1}{2}}\exp\left[-\frac{(n-p-1)\tilde{\sigma}^2}{2\sigma^2}\right],
\end{aligned}
$$
where $\tilde{\beta}=\left(\boldsymbol{X}^{T}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{T}\boldsymbol{y}$, $\tilde{\sigma}^2=\frac{1}{n-p-1}\boldsymbol{y}^{T}\left(\boldsymbol{I}-\boldsymbol{H}\right)\boldsymbol{y}$, and $\boldsymbol{H}=\boldsymbol{X}\left(\boldsymbol{X}^{T}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{T}$. From here, the marginal posterior distribution of σ² is $\sigma^2\mid\boldsymbol{y},\boldsymbol{X}\sim \mathrm{IG}\left(\frac{n-p-1}{2},\frac{(n-p-1)\tilde{\sigma}^2}{2}\right)$, with mean $\frac{(n-p-1)\tilde{\sigma}^2}{n-p-3}\approx\tilde{\sigma}^2$, and, given σ², the posterior conditional distribution of β is $\beta\mid\sigma^2,\boldsymbol{y},\boldsymbol{X}\sim N\left(\tilde{\beta},\sigma^2\left(\boldsymbol{X}^{T}\boldsymbol{X}\right)^{-1}\right)$.

6.2. Bayesian Genome-Based Ridge Regression

When p > n, X is not of full column rank and the posterior of model (6.1) under prior (6.2) may not be proper (Gelman et al. 2013), so a solution is instead to consider independent proper prior distributions, β ∼ N(𝟎, σ₀²I) and σ² ∼ IG(α₀, α₀), which for large values of σ₀² (e.g., 10⁶) and small values of α₀ (e.g., 10⁻³) is an approximation to the standard non-informative prior given in (6.2) (Christensen et al. 2011). A similar prior specification is adopted in genomic prediction, where different models are obtained by adopting different prior distributions for the parameters. For example, the Bayesian Ridge regression (BRR; Pérez and de los Campos 2014) with standardized covariates (Xj's) is given by

$$Y=\mu+\sum_{j=1}^{p}X_j\beta_j+\epsilon \tag{6.3}$$
with a flat prior for the mean parameter (μ), f(μ) ∝ 1, which can be approximately specified as μ ∼ N(0, σ₀²) with a large value of σ₀² (e.g., 10¹⁰); a multivariate normal distribution with mean vector 𝟎 and covariance matrix σβ²Iₚ for the beta coefficients, β₀ = (β₁, …, βₚ)ᵀ ∣ σβ² ∼ Nₚ(𝟎, Iₚσβ²); and scaled inverse Chi-square distributions as priors for the variance components: σβ² ∼ χ⁻²(vβ, Sβ) (prior for the variance of the regression coefficients βj) and σ² ∼ χ⁻²(v, S) (prior for the variance of the random errors ϵ), where χ⁻²(v, S) denotes the scaled inverse Chi-square distribution with shape parameter v and scale parameter S. The joint posterior distribution of the parameters of this model, θ = (μ, β₀ᵀ, σ², σβ²)ᵀ, is given by
$$
\begin{aligned}
f\left(\mu,\boldsymbol{\beta}_0,\sigma_\beta^2,\sigma^2\mid\boldsymbol{y},\boldsymbol{X}\right)&\propto L\left(\mu,\boldsymbol{\beta}_0,\sigma^2;\boldsymbol{y}\right)f(\mu)\,f\left(\boldsymbol{\beta}_0\mid\sigma_\beta^2\right)f\left(\sigma_\beta^2\right)f\left(\sigma^2\right)\\
&\propto\left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\lVert\boldsymbol{y}-\boldsymbol{1}_n\mu-\boldsymbol{X}_1\boldsymbol{\beta}_0\right\rVert^2\right]\times\exp\left(-\frac{\mu^2}{2\sigma_0^2}\right)\\
&\quad\times\left(\frac{1}{\sigma_\beta^2}\right)^{\frac{p}{2}}\exp\left(-\frac{\boldsymbol{\beta}_0^{T}\boldsymbol{\beta}_0}{2\sigma_\beta^2}\right)\times\frac{\left(\frac{S_\beta}{2}\right)^{\frac{v_\beta}{2}}}{\Gamma\left(\frac{v_\beta}{2}\right)}\left(\sigma_\beta^2\right)^{-\left(1+\frac{v_\beta}{2}\right)}\exp\left(-\frac{S_\beta}{2\sigma_\beta^2}\right)\\
&\quad\times\frac{\left(\frac{S}{2}\right)^{\frac{v}{2}}}{\Gamma\left(\frac{v}{2}\right)}\left(\sigma^2\right)^{-\left(1+\frac{v}{2}\right)}\exp\left(-\frac{S}{2\sigma^2}\right).
\end{aligned}
$$
This distribution has no recognizable form, and it is not easy to simulate from it directly, so numerical methods are required to explore it. One way to simulate values from this distribution is the Gibbs sampler, which consists of alternately generating samples from the full conditional distribution of each variable (or block of variables) given the rest of the parameters (Casella and George 1992).

The full conditional posteriors needed to implement the Gibbs sampler are derived below.

The conditional posterior distribution of β0 is given by

$$
\begin{aligned}
f\left(\boldsymbol{\beta}_0\mid -\right)&\propto L\left(\mu,\boldsymbol{\beta}_0,\sigma^2;\boldsymbol{y}\right)f\left(\boldsymbol{\beta}_0\mid\sigma_\beta^2\right)\\
&\propto\exp\left[-\frac{1}{2\sigma^2}\left\lVert\boldsymbol{y}-\boldsymbol{1}_n\mu-\boldsymbol{X}_1\boldsymbol{\beta}_0\right\rVert^2-\frac{1}{2\sigma_\beta^2}\boldsymbol{\beta}_0^{T}\boldsymbol{\beta}_0\right]\\
&\propto\exp\left\{-\frac{1}{2}\left[\boldsymbol{\beta}_0^{T}\left(\sigma_\beta^{-2}\boldsymbol{I}_p+\sigma^{-2}\boldsymbol{X}_1^{T}\boldsymbol{X}_1\right)\boldsymbol{\beta}_0-2\sigma^{-2}\left(\boldsymbol{y}-\boldsymbol{1}_n\mu\right)^{T}\boldsymbol{X}_1\boldsymbol{\beta}_0\right]\right\}\\
&\propto\exp\left[-\frac{1}{2}\left(\boldsymbol{\beta}_0-\tilde{\boldsymbol{\beta}}_0\right)^{T}\tilde{\boldsymbol{\Sigma}}_0^{-1}\left(\boldsymbol{\beta}_0-\tilde{\boldsymbol{\beta}}_0\right)\right],
\end{aligned}
$$

where Σ̃₀ = (σβ⁻²Iₚ + σ⁻²X₁ᵀX₁)⁻¹ and β̃₀ = σ⁻²Σ̃₀X₁ᵀ(y − 1ₙμ). That is, β₀ ∣ − ∼ Nₚ(β̃₀, Σ̃₀). Similarly, the full conditional distribution of μ is μ ∣ − ∼ N(μ̃, σ̃₀²), where σ̃₀² = σ²/n and μ̃ = (1/n)1ₙᵀ(y − X₁β₀).

The conditional distribution of σ2 is

$$
f\left(\sigma^2\mid -\right)\propto L\left(\mu,\boldsymbol{\beta}_0,\sigma^2;\boldsymbol{y}\right)f\left(\sigma^2\right)\propto\left(\sigma^2\right)^{-\frac{n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\lVert\boldsymbol{y}-\boldsymbol{1}_n\mu-\boldsymbol{X}_1\boldsymbol{\beta}_0\right\rVert^2\right]\left(\sigma^2\right)^{-\left(1+\frac{v}{2}\right)}\exp\left(-\frac{S}{2\sigma^2}\right)\propto\left(\sigma^2\right)^{-\left(1+\frac{\tilde{v}}{2}\right)}\exp\left(-\frac{\tilde{S}}{2\sigma^2}\right),
$$
where ṽ = v + n and S̃ = S + ‖y − 1ₙμ − X₁β₀‖². So σ² ∣ − ∼ χ⁻²(ṽ, S̃), where χ⁻²(v, S) denotes a scaled inverse Chi-square distribution with parameters v and S. Similarly, σβ² ∣ − ∼ χ⁻²(ṽβ, S̃β), where ṽβ = vβ + p and S̃β = Sβ + β₀ᵀβ₀.

In summary, for the Ridge regression model, a Gibbs sampler consists of the following steps:

1. Choose initial values for μ, β₀, and σ².

2. Simulate from the full conditional distribution of σβ²,
$$\sigma_\beta^2\mid\mu,\boldsymbol{\beta}_0,\sigma^2\sim\chi^{-2}\left(\tilde{v}_\beta,\tilde{S}_\beta\right),$$
a scaled inverse Chi-square distribution with shape parameter ṽβ = vβ + p and scale parameter S̃β = Sβ + β₀ᵀβ₀.

3. Simulate from the full conditional posterior distribution of β₀,
$$\boldsymbol{\beta}_0\mid\mu,\sigma_\beta^2,\sigma^2\sim N_p\left(\tilde{\boldsymbol{\beta}}_0,\tilde{\boldsymbol{\Sigma}}_0\right),$$
where Σ̃₀ = (σβ⁻²Iₚ + σ⁻²X₁ᵀX₁)⁻¹ and β̃₀ = σ⁻²Σ̃₀X₁ᵀ(y − 1ₙμ).

4. Simulate from the full conditional distribution of μ,
$$\mu\mid\boldsymbol{\beta}_0,\sigma_\beta^2,\sigma^2\sim N\left(\tilde{\mu},\tilde{\sigma}_\mu^2\right),$$
where σ̃μ² = σ²/n and μ̃ = (1/n)1ₙᵀ(y − X₁β₀).

5. Simulate from the full conditional distribution of σ²,
$$\sigma^2\mid\mu,\boldsymbol{\beta}_0,\sigma_\beta^2\sim\chi^{-2}\left(\tilde{v},\tilde{S}\right),$$
where ṽ = v + n and S̃ = S + ‖y − 1ₙμ − X₁β₀‖².

6. Repeat steps 2–5 for as many draws of the parameter vector (μ, β₀ᵀ, σβ², σ²) as you wish to simulate. Usually a large number of iterations is needed; an initial part of them (the burn-in) is discarded, and the remaining draws of each parameter are averaged to obtain its estimate. A from-scratch sketch of this sampler is given below.
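The following R sketch implements steps 1–6 on simulated data (toy dimensions and hyperparameter values assumed; scaled inverse Chi-square draws use the fact that a χ⁻²(v, S) variable is S divided by a χ²ᵥ draw):

set.seed(3)
n = 150; p = 50
X1 = matrix(rnorm(n*p), n, p)
y  = 10 + X1 %*% rnorm(p, 0, 0.2) + rnorm(n)
v = vb = 5; S = Sb = 1                      # illustrative hyperparameter values
mu = mean(y); b0 = rep(0, p); s2 = var(y)   # step 1: initial values
nIter = 2000; burnIn = 500
keep = matrix(NA, nIter, p + 3)
XtX = crossprod(X1)
for (it in 1:nIter) {
  s2b  = (Sb + sum(b0^2))/rchisq(1, vb + p)       # step 2: sigma_beta^2
  Sig0 = solve(diag(p)/s2b + XtX/s2)              # step 3: beta_0
  bt0  = (Sig0 %*% crossprod(X1, y - mu))/s2
  b0   = drop(bt0 + t(chol(Sig0)) %*% rnorm(p))
  e    = y - X1 %*% b0
  mu   = rnorm(1, mean(e), sqrt(s2/n))            # step 4: mu
  s2   = (S + sum((e - mu)^2))/rchisq(1, v + n)   # step 5: sigma^2
  keep[it, ] = c(mu, b0, s2b, s2)
}
est = colMeans(keep[-(1:burnIn), ])               # step 6: posterior means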

The Gibbs sampler described above can be implemented easily with the BGLR R package: if the hyperparameters (S, v) and (Sβ, vβ) are not specified, by default the BGLR function assigns v = vβ = 5, and to S and Sβ it assigns values such that the modes of the priors of σ² and σβ² (scaled inverse Chi-square) match a certain proportion of the total variance (1 − R² and R², respectively): S = Var(y) × (1 − R²) × (v + 2) and Sβ = Var(y) × R² × (vβ + 2)/Sx², where Sx² is the sum of the variances of the columns of X (see Appendix 2 for more details). Explicitly, in BGLR this model can be implemented by running the following R code:

ETA = list( list( model = 'BRR', X = X1, df0 = v_beta, S0 = S_beta, R2 = 1-R2 ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)
where nIter = 1e4 and burnIn = 1e3 are the desired number of iterations and the number of them to be discarded when computing the estimates of the parameters. Remember that when the hyperparameter values are not given, they are set to the default values described previously.

A sub-model of the BRR that does not induce shrinkage of the beta coefficients is obtained by assuming (β₁, …, βₚ)ᵀ ∣ σβ² ∼ Nₚ(𝟎, Iₚσβ²), ignoring the prior distribution of σβ² and fixing it at a very large value (e.g., 10¹⁰). Note that this model is very similar to the Bayesian model obtained by adopting the prior (6.2), under which the beta coefficients are estimated solely with the information contained in the likelihood function (Pérez and de los Campos 2014). This prior model can also be implemented in the BGLR package, where it is called FIXED. The Gibbs sampler steps for its implementation are the same as those described before for the BRR, except that step 2 is removed (no simulations are obtained for σβ²) and σβ⁻² is set equal to zero in the full conditional of β₀ (step 3).

6.3. Bayesian GBLUP Genomic Model

In genomic-enabled prediction, the number of markers used to predict the performance of a trait of interest is often very large compared to the number of phenotyped individuals in the sample (p ≫ n); for this reason, some computational difficulties may arise when exploring the posterior distribution of the beta coefficients. When the main objective is to use this model for predictive purposes, a solution consists of reducing the dimension of the problem by directly simulating values of g = X₁β₀ (breeding values or genomic effects; Lehermeier et al. 2013) instead of values of β₀. To do this, first note that because β₀ ∣ σβ² ∼ Nₚ(𝟎, Iₚσβ²), the induced prior for g is g = X₁β₀ ∣ σβ² ∼ Nₙ(𝟎, σβ²X₁X₁ᵀ) = Nₙ(𝟎, σg²G), where σg² = pσβ² and G = (1/p)X₁X₁ᵀ, which is known as the genomic relationship matrix (VanRaden 2007). Then, under this parameterization (g = X₁β₀ and σg² = pσβ²), the model specified in (6.3) takes the following matrix form:

$$\boldsymbol{Y}=\boldsymbol{1}_n\mu+\boldsymbol{g}+\boldsymbol{\epsilon} \tag{6.4}$$
with a flat prior for the mean parameter (μ), σ² ∼ χ⁻²(v, S), and the induced priors g ∣ σg² ∼ Nₙ(𝟎, σg²G) and σg² ∼ χ⁻²(vg, Sg) (with vg = vβ and Sg = pSβ).
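A minimal R sketch of how the genomic relationship matrix in this parameterization is built (a marker matrix M, individuals × markers, is assumed), matching the computation used in Appendix 3:

X1 = scale(M)                  # centered and standardized markers
G  = tcrossprod(X1)/ncol(X1)   # G = X1 %*% t(X1)/p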

Similarly to what was done for model (6.3), the full conditional posterior distribution of g in model (6.4) is given by

$$
f\left(\boldsymbol{g}\mid -\right)\propto L\left(\mu,\boldsymbol{g},\sigma^2;\boldsymbol{y}\right)f\left(\boldsymbol{g}\mid\sigma_g^2\right)\propto\left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\lVert\boldsymbol{y}-\boldsymbol{1}_n\mu-\boldsymbol{g}\right\rVert^2\right]\left(\sigma_g^2\right)^{-\frac{n}{2}}\exp\left(-\frac{1}{2\sigma_g^2}\boldsymbol{g}^{T}\boldsymbol{G}^{-1}\boldsymbol{g}\right)\propto\exp\left[-\frac{1}{2}\left(\boldsymbol{g}-\tilde{\boldsymbol{g}}\right)^{T}\tilde{\boldsymbol{G}}^{-1}\left(\boldsymbol{g}-\tilde{\boldsymbol{g}}\right)\right],
$$
where G̃ = (σg⁻²G⁻¹ + σ⁻²Iₙ)⁻¹ and g̃ = σ⁻²G̃(y − 1ₙμ); from here, g ∣ − ∼ Nₙ(g̃, G̃). The mean/mode of g ∣ −, g̃ = σ⁻²G̃(y − 1ₙμ), is also the best linear unbiased predictor (BLUP) of g under Henderson's (1975) mixed model equations, that is, the one obtained with the machinery of the classic linear mixed model described in the previous chapter for model (6.4), after recognizing the prior distribution of g as the distribution of the random effects, ignoring the prior specification of the rest of the parameters, and assuming they are known (Henderson 1975). For this reason, model (6.4) is often referred to as GBLUP. If G is replaced by the pedigree matrix A, the resulting model is known as PBLUP or ABLUP.

The full conditional posteriors of the rest of the parameters are similar to those of the BRR model: μ ∣ − ∼ N(μ̃, σ̃₀²), where σ̃₀² = σ²/n and μ̃ = (1/n)1ₙᵀ(y − g); σ² ∣ − ∼ χ⁻²(ṽ, S̃), where ṽ = v + n and S̃ = S + ‖y − 1ₙμ − g‖²; and σg² ∣ − ∼ χ⁻²(ṽg, S̃g), where ṽg = vg + n and S̃g = Sg + gᵀG⁻¹g.

Note that when p ≫ n, the dimension of the parameter space of the posterior of the GBLUP model is lower than that of the BRR.

The GBLUP model (6.4) can also be implemented easily with the BGLR R package; when the hyperparameters (S, v) and (Sg, vg) are not specified, v = vg = 5 is used by default and the scale parameters are set similarly to those of the BRR.

The BGLR code to fit this model:

ETA = list( list( model = 'RKHS', K = G, df0 = vg, S0 = Sg, R2 = 1-R2 ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

The GBLUP can be equivalently expressed, and consequently fitted, with the BRR model by making the design matrix equal to the lower triangular factor of the Cholesky decomposition of the genomic relationship matrix, that is, X = L, where G = LLᵀ. So, with the BGLR package, the BRR implementation of a GBLUP model is:

L = t(chol(G))
ETA = list( list( model = 'BRR', X = L, df0 = v_beta, S0 = S_beta, R2 = 1-R2 ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

When there is more than one record per individual in the data at hand, or a more sophisticated design was used in the data collection, model (6.4) can be specified in a more general way to take this structure into account, as follows:

$$\boldsymbol{Y}=\boldsymbol{1}_n\mu+\boldsymbol{Z}\boldsymbol{g}+\boldsymbol{\epsilon} \tag{6.5}$$
with Z the incidence matrix of the genotypes. This model cannot be fitted directly in BGLR, and some precalculation is needed first to compute the "covariance" matrix of the predictor Zg in model (6.5): K_L = σg⁻²Var(Zg) = ZGZᵀ. The BGLR code for implementing this model is the following:
Z = model.matrix(~0+GID, data = dat_F, xlev = list(GID = unique(dat_F$GID)))
K_L = Z %*% G %*% t(Z)
ETA = list( list( model = 'RKHS', K = K_L, df0 = vg, S0 = Sg, R2 = 1-R2 ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)
where dat_F is the data set that contains the necessary phenotypic information (GID: Lines or individuals; y: response variable of the trait of interest).

6.4. Genomic-Enabled Prediction BayesA Model

Another variant of the standard Bayesian model (6.1) is the BayesA model proposed by Meuwissen et al. (2001), a slight modification of the BRR model obtained with the same prior distributions, except that a specific variance σβj² is now assumed for each covariate (marker) effect, that is, βj ∣ σβj² ∼ N(0, σβj²), and these variance parameters are assumed to be independent random variables with a scaled inverse Chi-square distribution with parameters vβ and Sβ, σβj² ∼ χ⁻²(vβ, Sβ). Furthermore, a gamma distribution is assigned to Sβ, Sβ ∼ G(s, r), where s and r denote the shape and rate parameters, respectively. By providing a different prior variance for each βj, this model has the potential of inducing covariate-specific (heterogeneous) shrinkage of the estimated effects (Pérez and de los Campos 2013).

Note that, taking very large values of vβ (with the scale parameter adjusted so that the prior concentrates around Sβ), the prior of σβj² collapses to a degenerate distribution at Sβ, and the BRR model is obtained, but with a gamma distribution with shape s and rate r as the prior for the common variance of the effects, σβ² = Var(βj) = Sβ, instead of χ⁻²(vβ, Sβ). Furthermore, the marginal distribution of each beta coefficient βj given Sβ, that is, the distribution of βj ∣ Sβ with σβj² integrated out, is a scaled Student's t distribution (scaled by √(Sβ/vβ)). Compared to the normal, this distribution has heavier tails and puts higher mass around 0, which, relative to the BRR, induces less shrinkage of the estimates of covariates with sizable effects and stronger shrinkage toward zero of the estimates of covariates with smaller effects (de los Campos et al. 2013).
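This scaled-t marginal can be checked by simulation; in the R sketch below (toy values of vβ and Sβ assumed), draws of βj obtained by first drawing σβj² ∼ χ⁻²(vβ, Sβ) and then βj ∼ N(0, σβj²) are compared with t quantiles scaled by √(Sβ/vβ):

set.seed(4)
vb = 5; Sb = 0.5
s2b  = Sb/rchisq(1e5, vb)            # scaled inverse Chi-square draws
beta = rnorm(1e5, 0, sqrt(s2b))      # marginal prior draws of beta_j
qqplot(qt(ppoints(500), df = vb)*sqrt(Sb/vb), beta,
       main = 'scaled-t quantiles vs simulated BayesA prior')
abline(0, 1)                         # points fall near this line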

A Gibbs sampler implementation for estimating the parameters of this model can be done following steps 1–6 of the BRR model, with the second step replaced by steps 2.1 and 2.2 below, and with steps 3 and 6 modified as follows:

2.1. Simulate from the full conditional of each σβj²,
$$\sigma_{\beta j}^2\mid\mu,\boldsymbol{\beta}_0,\boldsymbol{\sigma}_{-j}^2,S_\beta,\sigma^2\sim\chi^{-2}\left(\tilde{v}_{\beta j},\tilde{S}_{\beta j}\right),$$
with shape parameter ṽβj = vβ + 1 and scale parameter S̃βj = Sβ + βj², where σ₋j² denotes the vector (σβ1², …, σβp²) without its jth entry.

2.2. Simulate from the full conditional of Sβ,
$$
f\left(S_\beta\mid -\right)\propto\prod_{j=1}^{p}f\left(\sigma_{\beta j}^2\mid S_\beta\right)f\left(S_\beta\right)\propto\prod_{j=1}^{p}\frac{\left(\frac{S_\beta}{2}\right)^{\frac{v_\beta}{2}}}{\Gamma\left(\frac{v_\beta}{2}\right)}\left(\sigma_{\beta j}^2\right)^{-\left(1+\frac{v_\beta}{2}\right)}\exp\left(-\frac{S_\beta}{2\sigma_{\beta j}^2}\right)S_\beta^{s-1}\exp\left(-rS_\beta\right)\propto S_\beta^{s+\frac{pv_\beta}{2}-1}\exp\left[-\left(r+\frac{1}{2}\sum_{j=1}^{p}\frac{1}{\sigma_{\beta j}^2}\right)S_\beta\right],
$$
which corresponds to the kernel of a gamma distribution with rate parameter $\tilde{r}=r+\frac{1}{2}\sum_{j=1}^{p}\sigma_{\beta j}^{-2}$ and shape parameter $\tilde{s}=s+\frac{pv_\beta}{2}$, so Sβ ∣ − ∼ Gamma(s̃, r̃).

3. Simulate from the full conditional posterior distribution of β₀: β₀ ∣ μ, σβ1², …, σβp², σ² ∼ Nₚ(β̃₀, Σ̃₀), where Σ̃₀ = (Dₚ⁻¹ + σ⁻²X₁ᵀX₁)⁻¹, β̃₀ = σ⁻²Σ̃₀X₁ᵀ(y − 1ₙμ), and Dₚ = Diag(σβ1², …, σβp²).

6. Repeat steps 2–5 (as given for the BRR method) for as many draws of the parameter vector (μ, β₀ᵀ, σβ1², …, σβp², σ², Sβ) as desired. A sketch of the BayesA-specific updates is given below.
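As referenced in step 6, an R sketch of the BayesA-specific updates (the current values b0, s2, XtX and the hyperparameters vb, s, r are assumed to exist, as in the BRR sketch above):

s2b_j = (Sb + b0^2)/rchisq(p, vb + 1)       # step 2.1: one variance per marker
Sb    = rgamma(1, shape = s + p*vb/2,       # step 2.2: update the scale S_beta
               rate  = r + 0.5*sum(1/s2b_j))
# step 3 then uses Dp = Diag(s2b_j) in place of s2b*I_p:
# Sig0 = solve(diag(1/s2b_j) + XtX/s2)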

When implementing this model in the BGLR package, by default v = vβ = 5 is used and S = Var(y) × (1 − R²) × (v + 2), which makes the mode of the prior of σ² (χ⁻²(v, S)) match a certain proportion of the total variance (1 − R²). If the hyperparameters of the prior for Sβ, s and r, are not specified, by default BGLR takes s = 1.1, giving a prior Sβ ∼ G(s, r) that is relatively non-informative, with a coefficient of variation (1/√s) of approximately 95%. To the rate parameter it then assigns the value r = (s − 1)/Sβ, with Sβ = Var(y) × R² × (vβ + 2)/Sx², where Sx² is the sum of the variances of the columns of X and R² is the proportion of the total variance that a priori is attributed to the covariates in X. The BGLR code for implementing this model is

ETA = list( list( model = 'BayesA', X = X1, df0 = v_beta, rate0 = r, shape0 = s, R2 = 1-R2 ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

6.5. Genomic-Enabled Prediction BayesB and BayesC Models

Other variants of model (6.1) are the BayesC and BayesB models, which can be considered direct extensions of the BRR and BayesA models, respectively, obtained by adding a parameter π that represents the prior proportion of covariates with nonzero effect (Pérez and de los Campos 2014).

The BayesC model is the same as the BRR, but instead of assuming a priori that the beta coefficients are independent normal random variables with mean 0 and variance σβ², it assumes that with probability πp each βj comes from a N(0, σβ²) and with probability 1 − πp from a degenerate distribution at zero, that is, β₁, …, βₚ ∣ σβ², πp ∼iid πpN(0, σβ²) + (1 − πp)DG(0) (a mixture of a normal distribution with mean 0 and variance σβ² and a degenerate distribution at zero). In addition, a beta distribution is assigned as the prior for πp, πp ∼ Beta(πp0, ϕ₀), where πp0 = E(πp) is the mean and ϕ₀⁻¹ is the "dispersion" parameter (Var(πp) = πp0(1 − πp0)/(ϕ₀ + 1)). If ϕ₀ = 2 and πp0 = 0.5, the prior for πp is a uniform distribution on (0, 1). For large values of ϕ₀, the distribution of πp is highly concentrated around πp0, and so BayesC reduces to the BRR when πp0 = 1 and ϕ₀ is large.

For this model, the full conditional distributions of μ and σ² are the same as in the model described before, that is, μ ∣ − ∼ N(μ̃, σ̃μ²) and σ² ∣ − ∼ χ⁻²(ṽ, S̃). For the rest of the parameters, however, the full conditionals do not have a known form, and it is not easy to simulate from them. A solution is to introduce a latent variable to represent the prior distribution of each βj, and compute all the conditional distributions in this augmented scheme, including the distribution corresponding to the latent variable. To do this, note that this prior can be specified by assuming that, conditional on a binary latent variable Zj,

$$\beta_j\mid\sigma_\beta^2,Z_j=z\sim\begin{cases}N\left(0,\sigma_\beta^2\right)&\text{if } z=1\\ \mathrm{DG}(0)&\text{if } z=0,\end{cases}$$
where Zj is a Bernoulli random variable with parameter πp (Zj ∼ Ber(πp)). With this introduced latent variable, all the full conditionals can be derived, as is described next.

If the current value of zj is 1, the full conditional posterior of βj is

$$
\begin{aligned}
f\left(\beta_j\mid -\right)&\propto L\left(\mu,\boldsymbol{\beta}_0,\sigma^2;\boldsymbol{y}\right)f\left(\beta_j\mid\sigma_\beta^2,z_j\right)\\
&\propto\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_{ij}-x_{ij}\beta_j\right)^2\right]\frac{1}{\sqrt{2\pi\sigma_\beta^2}}\exp\left(-\frac{\beta_j^2}{2\sigma_\beta^2}\right)\\
&\propto\exp\left\{-\frac{1}{2}\left[\left(\sigma_\beta^{-2}+\sigma^{-2}\sum_{i=1}^{n}x_{ij}^2\right)\beta_j^2-2\sigma^{-2}\sum_{i=1}^{n}x_{ij}y_{ij}\beta_j\right]\right\}\propto\exp\left[-\frac{\left(\beta_j-\tilde{\beta}_j\right)^2}{2\tilde{\sigma}_j^2}\right],
\end{aligned}
$$
where $y_{ij}=y_i-\mu-\sum_{k\neq j}x_{ik}\beta_k$, $\tilde{\sigma}_j^2=\left(\sigma_\beta^{-2}+\sigma^{-2}\sum_{i=1}^{n}x_{ij}^2\right)^{-1}$, and $\tilde{\beta}_j=\sigma^{-2}\tilde{\sigma}_j^2\sum_{i=1}^{n}x_{ij}y_{ij}$. That is, when the current value of zj is 1, βj ∣ − ∼ N(β̃j, σ̃j²). However, if zj = 0, the full conditional posterior of βj is a degenerate random variable at 0, that is, βj ∣ − ∼ DG(0).

The full conditional distribution of Zj is

$$f\left(z_j\mid\beta_j,-\right)\propto f\left(\beta_j\mid\sigma_\beta^2,z_j\right)f\left(z_j\right)\propto f\left(\beta_j\mid\sigma_\beta^2,z_j\right)\pi_p^{z_j}\left(1-\pi_p\right)^{1-z_j},$$
from which, conditional on the rest of the parameters and βj, Zj is a Bernoulli random variable with parameter
$$\tilde{\pi}_{pj}=\frac{\pi_p\left(2\pi\sigma_\beta^2\right)^{-\frac{1}{2}}\exp\left(-\frac{\beta_j^2}{2\sigma_\beta^2}\right)}{\pi_p\left(2\pi\sigma_\beta^2\right)^{-\frac{1}{2}}\exp\left(-\frac{\beta_j^2}{2\sigma_\beta^2}\right)+\left(1-\pi_p\right)\delta_0\left(\beta_j\right)}.$$
Note, however, that if βj ≠ 0, then π̃pj = 1 and Zj = 1 with probability 1; and since simulating from the full conditional posterior of βj given zj = 1 always yields values different from zero, this cyclic behavior remains permanent. On the other hand, if βj = 0, π̃pj = πp(2πσβ²)^(−1/2)/[πp(2πσβ²)^(−1/2) + (1 − πp)] is not 0, so the next simulated value of βj will again be different from 0, and in the subsequent steps of the Gibbs sampler Zj will always be 1. The chain therefore has absorbing states and will not explore the entire sampling space. A solution to this problem consists of simulating from the joint conditional distribution of βj and Zj, that is, from βj, Zj ∣ −. This full joint conditional distribution can be computed as
$$f\left(\beta_j,z_j\mid -\right)\propto f\left(z_j\mid -\right)f\left(\beta_j\mid z_j,-\right)\propto f\left(z_j\right)L\left(\mu,\boldsymbol{\beta}_0,\sigma^2;\boldsymbol{y}\right)f\left(\beta_j\mid\sigma_\beta^2,z_j\right),$$
where f(zj ∣ −) is the marginal conditional distribution of Zj given all the parameters except βj. Specifically, this distribution is obtained by integrating f(βj, zj ∣ −) with respect to βj:

$$f\left(z_j\mid -\right)\propto\int L\left(\mu,\boldsymbol{\beta}_0,\sigma^2;\boldsymbol{y}\right)f\left(\beta_j\mid\sigma_\beta^2,z_j\right)f\left(z_j\right)d\beta_j\propto\begin{cases}\pi_p\sqrt{\dfrac{\tilde{\sigma}_j^2}{\sigma_\beta^2}}\exp\left(\dfrac{\tilde{\beta}_j^2}{2\tilde{\sigma}_j^2}\right)&\text{if } z_j=1\\[2mm]1-\pi_p&\text{if } z_j=0.\end{cases}$$

From here, Zj ∣ − is a Bernoulli random variable with parameter π̃p = πp√(σ̃j²/σβ²)exp[β̃j²/(2σ̃j²)] / {πp√(σ̃j²/σβ²)exp[β̃j²/(2σ̃j²)] + 1 − πp}. With this and the full conditional distribution derived above for βj, an easy way to simulate values from βj, Zj ∣ − consists of first simulating a value zj from Zj ∣ − ∼ Ber(π̃p) and then, if zj = 1, simulating a value of βj from βj ∣ − ∼ N(β̃j, σ̃j²); otherwise, setting βj = 0.
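An R sketch of this joint update for one marker j (assumed current values: y_star, the residuals excluding marker j, that is, y − 1μ − X1β₀ + X1[, j]β₀[j], plus s2, s2b, pi_p, and the indicator vector z):

xj    = X1[, j]
s2j   = 1/(1/s2b + sum(xj^2)/s2)               # sigma_tilde_j^2
bj    = s2j*sum(xj*y_star)/s2                  # beta_tilde_j
odds  = pi_p*sqrt(s2j/s2b)*exp(bj^2/(2*s2j))/(1 - pi_p)
z[j]  = rbinom(1, 1, odds/(1 + odds))          # draw Z_j from Ber(pi_tilde_p)
b0[j] = if (z[j] == 1) rnorm(1, bj, sqrt(s2j)) else 0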

Now, note that the full conditional distribution of σβ2 is

$$
f\left(\sigma_\beta^2\mid -\right)\propto\prod_{j:z_j=1}f\left(\beta_j\mid\sigma_\beta^2,z_j\right)f\left(\sigma_\beta^2\right)\propto\left(\frac{1}{\sigma_\beta^2}\right)^{\frac{p\bar{z}_p}{2}}\exp\left(-\frac{1}{2\sigma_\beta^2}\sum_{j=1}^{p}z_j\beta_j^2\right)\frac{\left(\frac{S_\beta}{2}\right)^{\frac{v_\beta}{2}}}{\Gamma\left(\frac{v_\beta}{2}\right)}\left(\sigma_\beta^2\right)^{-\left(1+\frac{v_\beta}{2}\right)}\exp\left(-\frac{S_\beta}{2\sigma_\beta^2}\right)\propto\left(\sigma_\beta^2\right)^{-\left(1+\frac{\tilde{v}_\beta}{2}\right)}\exp\left(-\frac{\tilde{S}_\beta}{2\sigma_\beta^2}\right),
$$
where z̄p = (1/p)∑ⱼzⱼ, S̃β = Sβ + ∑ⱼzⱼβⱼ², and ṽβ = vβ + pz̄p. That is, σβ² ∣ μ, β₀, σ², z ∼ χ⁻²(ṽβ, S̃β). The full conditional distributions of μ and σ² are the same as in the BRR, that is, μ ∣ − ∼ N(μ̃, σ̃μ²) and σ² ∣ − ∼ χ⁻²(ṽ, S̃), with σ̃μ² = σ²/n, μ̃ = (1/n)1ₙᵀ(y − X₁β₀), ṽ = v + n, and S̃ = S + ‖y − 1ₙμ − X₁β₀‖².

The full conditional distribution of πp is

$$
f\left(\pi_p\mid -\right)\propto\prod_{j=1}^{p}f\left(z_j\mid\pi_p\right)f\left(\pi_p\right)\propto\pi_p^{p\bar{z}_p}\left(1-\pi_p\right)^{p\left(1-\bar{z}_p\right)}\pi_p^{\phi_0\pi_{p0}-1}\left(1-\pi_p\right)^{\phi_0\left(1-\pi_{p0}\right)-1}\propto\pi_p^{\phi_0\pi_{p0}+p\bar{z}_p-1}\left(1-\pi_p\right)^{\phi_0\left(1-\pi_{p0}\right)+p\left(1-\bar{z}_p\right)-1},
$$
which means that πp ∣ − ∼ Beta(π̃p0, ϕ̃₀), with ϕ̃₀ = ϕ₀ + p and π̃p0 = (ϕ₀πp0 + pz̄p)/(ϕ₀ + p).

The BayesB model is a variant of BayesA that assumes almost the same priors for the parameters, except that instead of independent normal random variables for the beta coefficients, it adopts a mixture distribution, that is, βj ∣ σβj², π ∼iid πN(0, σβj²) + (1 − π)DG(0), with π ∼ Beta(π₀, ϕ₀). This model has the potential to perform variable selection and produce covariate-specific shrinkage estimates (Pérez et al. 2010).

This model can also be considered an extension of the BayesC model with a gamma distribution as the prior for the scale parameter of the prior distribution of the variances of the beta coefficients, that is, Sβ ∼ G(s, r). It is interesting to point out that if π = 1, this model reduces to BayesA, which is obtained by taking π₀ = 1 and letting ϕ₀ go to ∞. It also reduces to BayesC by fixing the prior mean of Sβ, s/r, at a given value Sβ0 and choosing a very large value for r, so that the gamma prior collapses at this value.

To explore the posterior distribution of this model, the same Gibbs sampler given for BayesC can be used, adding to the process the full conditional posterior distribution of Sβ: Sβ ∣ − ∼ Gamma(s̃, r̃), with rate parameter $\tilde{r}=r+\frac{1}{2}\sum_{j=1}^{p}\sigma_{\beta j}^{-2}$ and shape parameter $\tilde{s}=s+\frac{pv_\beta}{2}$.

When implementing both these models in the BGLR R package, by default πp0 = 0.5 and ϕ₀ = 10 are assigned as the hyperparameters of the prior of πp, Beta(πp0, ϕ₀), which results in a weakly informative prior. For the remaining hyperparameters of the BayesC model, by default BGLR assigns values like those assigned to the BRR model, but with some modifications, because a priori only a proportion π₀ of the covariates (columns of X) now has nonzero effects:

$$v=v_\beta=5,\quad S=\mathrm{Var}(y)\times\left(1-R^2\right)\times\left(v+2\right),\quad S_\beta=\frac{\mathrm{Var}(y)\times R^2\times\left(v_\beta+2\right)}{S_x^2\,\pi_0}.$$

For the remaining hyperparameters of BayesB, by default BGLR also assigns values similar to those of BayesA: v = vβ = 5, S = Var(y) × (1 − R²) × (v + 2), and r = (s − 1)/Sβ, with Sβ = Var(y) × R² × (vβ + 2)/(Sx²π₀), where Sx² is the sum of the variances of the columns of X.

The BGLR codes to implement these models are, respectively:

ETA = list( list( model = 'BayesC', X = X1 ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, df0 = v, S0 = S, probIn = pi_p0, counts = phi_0, R2 = R2)
and
ETA = list( list( model = 'BayesB', X = X1 ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, df0 = v, rate0 = r, shape0 = s, probIn = pi_p0, counts = phi_0, R2 = R2)

6.6. Genomic-Enabled Prediction Bayesian Lasso Model

Another variant of model (6.1) is the Bayesian Lasso linear regression model (BL). This model assumes independent Laplace (double-exponential) distributions for the beta coefficients, with location parameter 0 and scale parameter √σ²/λ, that is, β₁, …, βₚ ∣ σ², λ ∼iid L(0, √σ²/λ). The priors for the parameters μ and σ² are the same as in the models described before, while for λ² a gamma distribution with shape sλ and rate rλ is often adopted.

Compared to the normal distribution, the Laplace distribution has fatter tails and puts higher density around 0, so this prior induces stronger shrinkage of the estimates of covariates with relatively small effects and less shrinkage of the estimates of covariates with larger effects (Pérez et al. 2010).

A more convenient specification of the prior for the beta coefficients in this model is obtained with the representation proposed by Park and Casella (2008), a continuous scale mixture of normal distributions: βj ∣ τj ∼ N(0, τjσ²) and τj ∼ Exp(2/λ²), j = 1, …, p, where Exp(θ) denotes an exponential distribution with scale parameter θ. So, unlike the normal prior used by the BRR model, this prior puts higher mass at zero and has heavier tails.
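This scale-mixture representation can be verified by simulation; the R sketch below (σ² = 1 and an arbitrary λ assumed) draws τj from the exponential distribution and βj from the conditional normal, and compares the result with the Laplace density with scale 1/λ:

set.seed(5)
lambda = 2
tau  = rexp(1e5, rate = lambda^2/2)  # scale 2/lambda^2 <=> rate lambda^2/2
beta = rnorm(1e5, 0, sqrt(tau))      # beta_j | tau_j ~ N(0, tau_j), sigma^2 = 1
# the marginal Laplace density with scale b = 1/lambda is exp(-|x|/b)/(2b)
hist(beta, breaks = 100, freq = FALSE, main = 'normal-exponential mixture')
curve(lambda/2*exp(-lambda*abs(x)), add = TRUE, col = 2)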

Note that the prior distribution of the beta coefficients in BayesB and BayesC can be equivalently expressed by modelling the prior variance as a mixture of a scaled inverse Chi-square distribution with parameters vβ and Sβ and a degenerate distribution at zero, that is, βj ∣ σβj² ∼ N(0, σβj²) with σβj² ∼ πpχ⁻²(vβ, Sβ) + (1 − πp)DG(0). Based on this result and the connections between the models described before, the main difference between all these models lies in the manner in which the prior variance of the predictor variables is modelled.

Example 1

To illustrate how to use the models described before, here we consider the prediction of grain yield (tons/ha) based on marker information. The data set used consists of 30 lines in four environments with one or two repetitions, and the genotypic information contains 500 markers for each line. The numbers of lines with one (two) repetition(s) are 6 (24), 2 (28), 0 (30), and 3 (27) in Environments 1, 2, 3, and 4, respectively, resulting in 229 observations. The prediction performance of all these models was evaluated with 10 random partitions in a cross-validation strategy, where 80% of the complete data set was used to fit the model and the rest to evaluate it in terms of the mean squared error of prediction (MSE).

The results for all models (shown in Table 6.1) were obtained by iterating the corresponding Gibbs sampler 10,000 times and discarding the first 1000 iterations, using the default hyperparameter values implemented in BGLR. They indicate that the behavior of all the models is similar, except for BayesC, whose MSE is slightly greater than that of the rest.

Table 6.1

Mean squared error (MSE) of prediction across 10 random partitions, with 80% for training and the rest for testing, in five Bayesian linear models

The R code to obtain the results in Table 6.1 is given in Appendix 3.

What happens when other hyperparameter values are used? The values used here (proposed by Pérez et al. 2010) did not always produce the best prediction performance (Lehermeier et al. 2013), and there are other ways to propose hyperparameter values for these models (Habier et al. 2010, 2011). It is important to point out, however, that the default values in BGLR work reasonably well and that it is not easy to find other combinations that work better in all applications; if you want to use other combinations of hyperparameters, you need to be very careful, because they can dramatically affect the predictive performance relative to the default hyperparameters.

Indeed, by means of simulated and experimental data, Lehermeier et al. (2013) observed a strong influence of the hyperparameters of the prior distributions on the predictive performance of BayesA, BayesB, and the Bayesian Lasso with fixed λ. Specifically, in the first two models they observed that the scale parameter Sβ of the prior distribution of the variance of βj had a strong effect on predictive ability: overfitting occurred when too large a value was chosen, whereas underfitting was observed when the value was too small. Note that this is to be expected, because in both models (BayesA and BayesB) Var(βj) = E(σβj²) = Sβ/(vβ − 2), which is almost the inverse of the regularization parameter in a Ridge regression model.

6.7. Extended Predictor in Bayesian Genomic Regression Models

All the Bayesian formulations of model (6.1) described before can be extended, in terms of the predictor, to take into account the effects of other factors. For example, effects of environments and environment–marker interactions can be added as

$$\boldsymbol{y}=\boldsymbol{1}_n\mu+\boldsymbol{X}_E\boldsymbol{\beta}_E+\boldsymbol{X}_1\boldsymbol{\beta}+\boldsymbol{X}_{EM}\boldsymbol{\beta}_{EM}+\boldsymbol{\epsilon}, \tag{6.6}$$
where XE and XEM are the design matrices of the environments and of the environment–marker interactions, respectively, while βE and βEM are the corresponding vectors of environment effects and interaction effects, with prior distributions that can be specified as was done for β. Indeed, all of this is possible with the BGLR function, and all the options described before can also be adopted for the rest of the effects added to the model: FIXED, BRR, BayesA, BayesB, BayesC, and BL.

Under the RKHS model with genomic and environment × genotype interaction effects in the predictor, the modified model (6.6) is expressed as

$$\boldsymbol{Y}=\boldsymbol{1}_n\mu+\boldsymbol{X}_E\boldsymbol{\beta}_E+\boldsymbol{Z}_L\boldsymbol{g}+\boldsymbol{Z}_{LE}\boldsymbol{g}_E+\boldsymbol{\epsilon}, \tag{6.7}$$
where ZL and ZLE are the incidence matrices of the genomic and environment × genotype interaction effects, respectively. Similarly to model (6.5), this model cannot be fitted directly in BGLR, and some precalculations are needed first to compute the "covariance" matrices of the predictors ZLg and ZLEgE, which are K_L = σg⁻²Var(ZLg) = ZLGZLᵀ and K_LE = σgE⁻²Var(ZLEgE) = ZLE(I_I ⊗ G)ZLEᵀ, respectively, where I is the number of environments. The BGLR code for implementing this model is the following:

I = length(unique(dat_F$Env))
XE = model.matrix(~0+Env, data = dat_F)[,-1]
Z_L = model.matrix(~0+GID, data = dat_F, xlev = list(GID = unique(dat_F$GID)))
K_L = Z_L %*% G %*% t(Z_L)
Z_LE = model.matrix(~0+GID:Env, data = dat_F,
xlev = list(GID = unique(dat_F$GID), Env = unique(dat_F$Env)))
K_LE = Z_LE %*% kronecker(diag(I), G) %*% t(Z_LE)
ETA = list( list( model = 'FIXED', X = XE ),
list( model = 'RKHS', K = K_L, df0 = vg, S0 = Sg, R2 = 1-R2 ),
list( model = 'RKHS', K = K_LE ) )
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)
where dat_F is the data set that contains the necessary phenotypic information (GID: Lines or individuals; Env: Environment; y: response variable of trait under study).

Example 2 (Includes Models with Only Env Effects and Models with Env and Line × Env Effects)

To illustrate how to fit the extended genomic regression models described before, here we consider the prediction of grain yield (tons/ha) based on marker information and the genomic relationship derived from it. The data set used consists of 30 lines in four environments, with genotypic information on 500 markers for each line. The prediction performance of 18 models was evaluated with five-fold cross-validation, where 80% of the complete data set was used to fit the model and the rest to evaluate it in terms of the mean squared error of prediction (MSE). These models were obtained by considering different predictors (marker, environment, and/or environment–marker interaction) and different prior models for the parameters of each predictor included.

Model M1 considered only the marker effects in the predictor, from which six sub-models were obtained by adopting one of the six options (BRR, RKHS, BayesA, BayesB, BayesC, or BL) as the prior model of β (or g). Model M2 is model M1 plus the environment effects with a FIXED prior model, for all prior sub-models in the marker predictor. Model M3 is model M2 plus the environment–marker interaction, with a prior model of the same family as the one chosen for the marker predictor (see Table 6.2 for all the models compared).

Table 6.2

Fitted models: Mmd, m = 1, 2, 3, d = 1, …, 6

The prediction performance of the models presented in Table 6.2 is shown in Table 6.3. The first column gives the kind of prior model used for both the marker effects and the env:marker interaction terms, when the latter are included in the model. For each of the first five prior models, model M2 resulted in better MSE performance, while when the BL prior model was used, model M3, the model with the interaction term, was better. The greatest difference is between M1 and M2: the average MSE across all priors of the first model is approximately 21% greater than the corresponding average of the M2 model. Similar behavior was observed with Pearson's correlation, the average of this criterion across all priors being about 32% greater in model M2 than in M1. So the inclusion of the environment effect was important, but not the environment:marker interaction.

Table 6.3

Performance prediction of models in Table 6.2: Mean squared error of prediction (MSE) and average Pearson’s correlation (PC), each with its standard deviation across the five partitions

The best prediction performance in terms of MSE was obtained with sub-model M25 (M2 with a BayesC prior), followed by M21 (M2 with a BRR prior). However, the difference between these and sub-models M22, M23, and M24, also derived from M2, is only slight, and somewhat larger with respect to M26, which, as commented before, performed worse among the models that assume a BL prior than M36 (M3 with a BL prior for the marker effects and the environment–marker interaction).

6.8. Bayesian Genomic Multi-trait Linear Regression Model

The univariate models described for continuous outcomes do not exploit the possible correlation between traits when the selection of better individuals is based on several traits and these univariate models are fitted separately to each trait. Some advantages of jointly modelling multiple traits are that the correlation among the traits is appropriately accounted for, and that this can help to improve statistical power and the precision of parameter estimation. Both are very important in genomic selection because they can help to improve prediction accuracy and reduce trait selection bias (Schaeffer 1984; Pollak et al. 1984; Montesinos-López et al. 2018).

An example of this is when crop breeders collect phenotypic data for multiple traits, such as grain yield and its components (grain type, grain weight, biomass, etc.), tolerance to biotic and abiotic stresses, and grain quality (taste, shape, color, and/or nutrient content) (Montesinos-López et al. 2016). In this and many other cases, the interest is sometimes to predict traits that are difficult or expensive to measure using those that are easy to measure, or the aim can be to improve all these correlated traits simultaneously (Henderson and Quaas 1976; Calus and Veerkamp 2011; Jiang et al. 2015). Along these lines, there is evidence of the usefulness of multi-trait modelling. For example, Jia and Jannink (2012) showed that, compared to single-trait modelling, the prediction accuracy of low-heritability traits can be increased by using a multi-trait model when the degree of correlation between traits is at least moderate. They also found that multi-trait models had better prediction accuracy when phenotypes were not available for all individuals and traits. Joint modelling has also been found useful for increasing prediction accuracy when the traits of interest were not measured in the individuals of the testing set but were observed, along with other traits, in the individuals of the training set (Pszczola et al. 2013).

6.8.1. Genomic Multi-trait Linear Model

The genomic multi-trait linear model adopts a univariate genomic linear model structure for each trait, but with correlated residuals and genotypic effects across traits in the same individual. Assuming that nT traits (Yjt, t = 1, …, nT) are measured on individual j, the model is

$$\begin{bmatrix}Y_{j1}\\Y_{j2}\\\vdots\\Y_{jn_T}\end{bmatrix}=\begin{bmatrix}\mu_1\\\mu_2\\\vdots\\\mu_{n_T}\end{bmatrix}+\begin{bmatrix}\boldsymbol{x}_j^{T}\boldsymbol{\beta}_1\\\boldsymbol{x}_j^{T}\boldsymbol{\beta}_2\\\vdots\\\boldsymbol{x}_j^{T}\boldsymbol{\beta}_{n_T}\end{bmatrix}+\begin{bmatrix}g_{j1}\\g_{j2}\\\vdots\\g_{jn_T}\end{bmatrix}+\begin{bmatrix}\epsilon_{j1}\\\epsilon_{j2}\\\vdots\\\epsilon_{jn_T}\end{bmatrix},$$
where μt, t = 1, …, nT, are the trait-specific intercepts, xj is a vector of covariates common to all traits, gjt, t = 1, …, nT, are the trait-specific genotype effects, and ϵjt, t = 1, …, nT, are the random error terms corresponding to each trait. In matrix notation, this can be expressed as
$$\boldsymbol{Y}_j=\boldsymbol{\mu}+\boldsymbol{B}^{T}\boldsymbol{x}_j+\boldsymbol{g}_j+\boldsymbol{\epsilon}_j, \tag{6.8}$$
where Yj = (Yj1, …, YjnT)ᵀ, μ = (μ₁, …, μnT)ᵀ, B = [β₁, …, βnT], gj = (gj1, …, gjnT)ᵀ, and ϵj = (ϵj1, …, ϵjnT)ᵀ. The residual vectors are assumed to be independent with multivariate normal distribution, ϵj ∼ NnT(𝟎, R), and the vector of all random genotype effects is assumed to be g = (g₁ᵀ, …, g_Jᵀ)ᵀ ∼ N(𝟎, G ⊗ ΣT), with ⊗ the Kronecker product. For a full Bayesian specification of this model, we suppose that β = vec(B) ∼ N(β₀, Σβ), that is, marginally, a multivariate normal prior is adopted for the fixed effects of each trait, βt ∼ Np(βt0, Σβt), t = 1, …, nT; a flat prior for the intercepts, f(μ) ∝ 1; and independent inverse Wishart distributions for the covariance matrix of residuals R and for ΣT, that is, ΣT ∼ IW(vT, ST) and R ∼ IW(vR, SR).

Putting all the information together, with the measured traits of each individual (Yj) accommodated in the rows of a response matrix (Y), model (6.8) can be expressed as

$$\boldsymbol{Y}=\boldsymbol{1}_J\boldsymbol{\mu}^{T}+\boldsymbol{X}\boldsymbol{B}+\boldsymbol{Z}_1\boldsymbol{b}_1+\boldsymbol{E}, \tag{6.9}$$
where Y = [Y₁, …, Y_J]ᵀ, X = [x₁, …, x_J]ᵀ, b₁ = [g₁, …, g_J]ᵀ, and E = [ϵ₁, …, ϵ_J]ᵀ. Note that under this notation, Eᵀ ∼ MN_{nT×J}(𝟎, R, I_J), or equivalently E ∼ MN_{J×nT}(𝟎, I_J, R), and b₁ᵀ ∼ MN_{nT×J}(𝟎, ΣT, G), or b₁ ∼ MN_{J×nT}(𝟎, G, ΣT). Here Z ∼ MN_{J×nT}(M, U, V) means that the random matrix Z follows the matrix variate normal distribution with parameters M, U, and V, or equivalently, that the JnT-dimensional random vector vec(Z) is distributed as N_{JnT}(vec(M), V ⊗ U), with vec(·) denoting the vectorization of a matrix, which stacks its columns into a single column vector. Note that when ΣT and R are diagonal matrices, model (6.9) is equivalent to separately fitting a univariate GBLUP model to each trait.
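The vec/Kronecker correspondence used in this notation can be checked numerically; the following R sketch (toy dimensions) simulates Z = U^{1/2} E V^{1/2} with standard normal E and compares the empirical covariance of vec(Z) with V ⊗ U:

set.seed(6)
J = 4; nT = 2
U = crossprod(matrix(rnorm(J^2), J))     # J x J row covariance
V = crossprod(matrix(rnorm(nT^2), nT))   # nT x nT column covariance
B = 1e4
vecZ = t(replicate(B, {
  E = matrix(rnorm(J*nT), J, nT)
  as.vector(t(chol(U)) %*% E %*% chol(V))  # Z ~ MN(0, U, V); as.vector = vec(Z)
}))
max(abs(cov(vecZ) - kronecker(V, U)))      # small; Monte Carlo error only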

The conditional distribution of all traits is given by

$$
\begin{aligned}
f\left(\boldsymbol{Y}\mid\boldsymbol{\mu},\boldsymbol{B},\boldsymbol{b}_1,\boldsymbol{\Sigma}_T,\boldsymbol{R}\right)&=\frac{\left\lvert\boldsymbol{R}\right\rvert^{-\frac{J}{2}}}{(2\pi)^{\frac{Jn_T}{2}}}\exp\left\{-\frac{1}{2}\mathrm{tr}\left[\boldsymbol{R}^{-1}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{X}\boldsymbol{B}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)^{T}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{X}\boldsymbol{B}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)\right]\right\}\\
&=\frac{\left\lvert\boldsymbol{R}\right\rvert^{-\frac{J}{2}}}{(2\pi)^{\frac{Jn_T}{2}}}\exp\left[-\frac{1}{2}\mathrm{vec}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{X}\boldsymbol{B}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)^{T}\left(\boldsymbol{R}^{-1}\otimes\boldsymbol{I}_J\right)\mathrm{vec}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{X}\boldsymbol{B}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)\right],
\end{aligned}
$$
and the joint posterior of parameters μ, B, b1, ΣT, and R is given by
$$f\left(\boldsymbol{\mu},\boldsymbol{B},\boldsymbol{b}_1,\boldsymbol{\Sigma}_T,\boldsymbol{R}\mid\boldsymbol{Y}\right)\propto f\left(\boldsymbol{Y}\mid\boldsymbol{\mu},\boldsymbol{B},\boldsymbol{b}_1,\boldsymbol{\Sigma}_T,\boldsymbol{R}\right)f\left(\boldsymbol{b}_1\mid\boldsymbol{\Sigma}_T\right)f\left(\boldsymbol{\Sigma}_T\right)f\left(\boldsymbol{\beta}\right)f\left(\boldsymbol{R}\right),$$
where f(b₁ ∣ ΣT) denotes the conditional distribution of the genotype effects, and f(ΣT), f(β), and f(R) denote the prior densities of ΣT, β, and R, respectively. This joint posterior distribution of the parameters does not have a closed form; for this reason, the full conditional distributions needed for a Gibbs sampler implementation are derived next.

Let β0 and Σβ be the prior mean and variance of β = vec(B). Because tr(AB) = vec(AT)Tvec(B) = vec(B)Tvec(AT) and vec (AXB) = (BT ⊗ A)vec(X), we have that

$$
\begin{aligned}
f\left(\boldsymbol{\beta}\mid -\right)&\propto f\left(\boldsymbol{Y}\mid\boldsymbol{\mu},\boldsymbol{B},\boldsymbol{b}_1,\boldsymbol{\Sigma}_T,\boldsymbol{R}\right)f\left(\boldsymbol{\beta}\right)\\
&\propto\exp\left\{-\frac{1}{2}\left[\mathrm{vec}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)-\left(\boldsymbol{I}_{n_T}\otimes\boldsymbol{X}\right)\boldsymbol{\beta}\right]^{T}\left(\boldsymbol{R}^{-1}\otimes\boldsymbol{I}_J\right)\left[\mathrm{vec}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)-\left(\boldsymbol{I}_{n_T}\otimes\boldsymbol{X}\right)\boldsymbol{\beta}\right]-\frac{1}{2}\left(\boldsymbol{\beta}-\boldsymbol{\beta}_0\right)^{T}\boldsymbol{\Sigma}_\beta^{-1}\left(\boldsymbol{\beta}-\boldsymbol{\beta}_0\right)\right\}\\
&\propto\exp\left[-\frac{1}{2}\left(\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_0\right)^{T}\tilde{\boldsymbol{\Sigma}}_\beta^{-1}\left(\boldsymbol{\beta}-\tilde{\boldsymbol{\beta}}_0\right)\right],
\end{aligned}
$$
where Σ̃β = (Σβ⁻¹ + R⁻¹ ⊗ XᵀX)⁻¹ and β̃₀ = Σ̃β[Σβ⁻¹β₀ + (R⁻¹ ⊗ Xᵀ)vec(Y − 1_Jμᵀ − Z₁b₁)]. So, the full conditional distribution of β is N(β̃₀, Σ̃β). Similarly, the full conditional distribution of g = vec(b₁) is N(g̃, G̃), with G̃ = (ΣT⁻¹ ⊗ G⁻¹ + R⁻¹ ⊗ Z₁ᵀZ₁)⁻¹ and g̃ = G̃(R⁻¹ ⊗ Z₁ᵀ)vec(Y − 1_Jμᵀ − XB). Now, because vec(1_Jμᵀ) = (I_{nT} ⊗ 1_J)μ, similarly as before, the full conditional of μ is N_{nT}(μ̃, Σ̃μ), where Σ̃μ = J⁻¹R and μ̃ = Σ̃μ(R⁻¹ ⊗ 1_Jᵀ)vec(Y − XB − Z₁b₁).

The full conditional distribution of ΣT is
$$
f\left(\boldsymbol{\Sigma}_T\mid -\right)\propto f\left(\boldsymbol{b}_1\mid\boldsymbol{\Sigma}_T\right)f\left(\boldsymbol{\Sigma}_T\right)\propto\left\lvert\boldsymbol{\Sigma}_T\right\rvert^{-\frac{J}{2}}\left\lvert\boldsymbol{G}\right\rvert^{-\frac{n_T}{2}}\exp\left[-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{b}_1^{T}\boldsymbol{G}^{-1}\boldsymbol{b}_1\boldsymbol{\Sigma}_T^{-1}\right)\right]\left\lvert\boldsymbol{\Sigma}_T\right\rvert^{-\frac{v_T+n_T+1}{2}}\exp\left[-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{S}_T\boldsymbol{\Sigma}_T^{-1}\right)\right]\propto\left\lvert\boldsymbol{\Sigma}_T\right\rvert^{-\frac{v_T+J+n_T+1}{2}}\exp\left\{-\frac{1}{2}\mathrm{tr}\left[\left(\boldsymbol{b}_1^{T}\boldsymbol{G}^{-1}\boldsymbol{b}_1+\boldsymbol{S}_T\right)\boldsymbol{\Sigma}_T^{-1}\right]\right\}.
$$

From here we have that ΣT ∣ − ∼ IW(vT + J, b₁ᵀG⁻¹b₁ + ST). Now, because

$$
f\left(\boldsymbol{R}\mid -\right)\propto f\left(\boldsymbol{Y}\mid\boldsymbol{\mu},\boldsymbol{B},\boldsymbol{b}_1,\boldsymbol{\Sigma}_T,\boldsymbol{R}\right)f\left(\boldsymbol{R}\right)\propto\left\lvert\boldsymbol{R}\right\rvert^{-\frac{J}{2}}\exp\left\{-\frac{1}{2}\mathrm{tr}\left[\boldsymbol{R}^{-1}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{X}\boldsymbol{B}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)^{T}\left(\boldsymbol{Y}-\boldsymbol{1}_J\boldsymbol{\mu}^{T}-\boldsymbol{X}\boldsymbol{B}-\boldsymbol{Z}_1\boldsymbol{b}_1\right)\right]\right\}\left\lvert\boldsymbol{R}\right\rvert^{-\frac{v_R+n_T+1}{2}}\exp\left[-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{S}_R\boldsymbol{R}^{-1}\right)\right],
$$
the full conditional distribution of R is IW(ṽR, S̃R), where ṽR = vR + J and S̃R = SR + (Y − 1_Jμᵀ − XB − Z₁b₁)ᵀ(Y − 1_Jμᵀ − XB − Z₁b₁).

In summary, a Gibbs sampler exploration of the joint posterior distribution of μ, β, g, ΣT, and R can be done with the following steps:

1. Simulate β from a multivariate normal distribution N(β̃₀, Σ̃β), where Σ̃β = (Σβ⁻¹ + R⁻¹ ⊗ XᵀX)⁻¹ and β̃₀ = Σ̃β[Σβ⁻¹β₀ + (R⁻¹ ⊗ Xᵀ)vec(Y − 1_Jμᵀ − Z₁b₁)].

2. Simulate μ from N_{nT}(μ̃, Σ̃μ), where Σ̃μ = J⁻¹R and μ̃ = Σ̃μ(R⁻¹ ⊗ 1_Jᵀ)vec(Y − XB − Z₁b₁).

3. Simulate g = vec(b₁) from N(g̃, G̃), where G̃ = (ΣT⁻¹ ⊗ G⁻¹ + R⁻¹ ⊗ Z₁ᵀZ₁)⁻¹ and g̃ = G̃(R⁻¹ ⊗ Z₁ᵀ)vec(Y − 1_Jμᵀ − XB).

4. Simulate ΣT from IW(vT + J, b₁ᵀG⁻¹b₁ + ST).

5. Simulate R from IW(ṽR, S̃R), where ṽR = vR + J and S̃R = SR + (Y − 1_Jμᵀ − XB − Z₁b₁)ᵀ(Y − 1_Jμᵀ − XB − Z₁b₁). (A base-R sketch for the inverse Wishart draws in steps 4 and 5 is given after this list.)

6. Return to step 1 or terminate when the chain length is adequate to meet convergence diagnostics and the required sample size has been reached.
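As referenced in step 5, the inverse Wishart draws of steps 4 and 5 can be done in base R using the fact that Σ ∼ IW(v, S) if and only if Σ⁻¹ ∼ Wishart(v, S⁻¹); a minimal sketch (the inputs in the commented example are the assumed current Gibbs values):

riw = function(v, S) solve(rWishart(1, df = v, Sigma = solve(S))[, , 1])
# e.g., step 4 with current values b1, G and hyperparameters vT, ST:
# SigmaT = riw(vT + J, crossprod(b1, solve(G) %*% b1) + ST)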

An implementation of this model is available in the GitHub version of the BGLR R library, which can be accessed at https://github.com/gdlc/BGLR-R and installed directly in the R console by running the following commands: install.packages('devtools'); library(devtools); install_git('https://github.com/gdlc/BGLR-R'). This implementation also uses a flat prior for the fixed effects regression coefficients β; in that case, the corresponding full conditional of this parameter is the same as in step 1 of the Gibbs sampler given before, but with Σβ⁻¹ and Σβ⁻¹β₀ removed from Σ̃β and β̃₀, respectively. Specifically, model (6.8) can be implemented with this version of the BGLR package as follows:

ETA = list( list( X = X, model = 'FIXED' ), list( K = Z1 %*% G %*% t(Z1), model = 'RKHS' ) )
A = Multitrait(y = Y, ETA = ETA, resCov = list( type = 'UN', S0 = SR, df0 = vR ), nIter = nI, burnIn = nb)

The first argument of the Multitrait function is the response variable, typically a phenotype matrix in which each row contains the measurements of the nT traits on one individual. The second argument is the list of predictors: the first sub-list specifies the design matrix and prior model of the fixed effects part of the predictor in model (6.9), and the second sub-list specifies the distribution of the random genetic effects b₁, where K = Z₁GZ₁ᵀ is the (co)variance structure built from the genomic relationship matrix G, which accounts for the similarity between individuals based on marker information, and df0 = vT and S0 = ST are the degrees of freedom parameter (vT) and the scale matrix parameter (ST) of the inverse Wishart prior distribution for ΣT, respectively. In the third argument (resCov), S0 and df0 are the scale matrix parameter (SR) and the degrees of freedom parameter (vR) of the inverse Wishart prior distribution for R. The last two arguments are the required number of iterations (nI) and the burn-in period (nb) for running the Gibbs sampler.

Similarly to the univariate case, model (6.9) can be equivalently described and implemented as a multivariate Ridge regression model, as follows:

$$\boldsymbol{Y}=\boldsymbol{1}_J\boldsymbol{\mu}^{T}+\boldsymbol{X}\boldsymbol{B}+\boldsymbol{X}_1\boldsymbol{B}_1+\boldsymbol{E}, \tag{6.10}$$
where X₁ = Z₁L_G, L_G is the lower triangular factor of the Cholesky decomposition G = L_GL_Gᵀ, B₁ = L_G⁻¹b₁ ∼ MN_{J×nT}(𝟎, I_J, ΣT), and the specifications of the rest of the parameters and prior distributions are the same as given for model (6.8). A Gibbs sampler implementation of this model is very similar to the one described before, with only minor modifications. Indeed, a Gibbs implementation with the same Multitrait function is as follows:
LG = t(chol(G))
X1 = Z1 %*% LG
ETA = list( list( X = X, model = 'FIXED' ), list( X = X1, model = 'BRR' ) )
A = Multitrait(y = Y, ETA = ETA, resCov = list( type = 'UN', S0 = SR, df0 = vR ), nIter = nI, burnIn = nb)
with the only change in the second sub-list predictor, where now the design matrix X1 and the Ridge regression model (BRR) are specified.

Example 3

To illustrate the prediction power of these models and how to implement them in R, we considered a reduced data set consisting of 50 wheat lines grown in two environments, with two traits measured on each individual: FLRSDS and MIXTIM. The evaluation was done with five-fold cross-validation, where lines were evaluated in some environments for all traits but are missing for all traits in the other environments. Model (6.9) was fitted, with the environment effect treated as fixed.

The results are shown in Table 6.4, where the average (standard deviation) of two performance criteria is reported for each trait in each environment: the average Pearson's correlation (PC) and the mean squared error of prediction (MSE). Table 6.4 shows good correlation performance for both traits in both environments, with better predictions obtained in environment 2 under both criteria. The larger magnitude of the MSE for the first trait is mainly because its measurement scale is greater than that of the second trait.

Table 6.4

Average Pearson's correlation (PC) and mean squared error of prediction (MSE) between predicted and observed values across five random partitions where lines were evaluated in some environments with all traits but are missing for all traits in the other environments

The R codes to reproduce these results (Table 6.4) are shown in Appendix 5.

6.9. Bayesian Genomic Multi-trait and Multi-environment Model (BMTME)

Model (6.9) does not take into account the possible trait–genotype–environment interaction (T × G × E) when environment information is available. An extension of this model is the one proposed by Montesinos-López et al. (2016), who added this interaction term to allow the trait-specific genetic effects (gj) to vary across environments. If the information of nT traits of J lines is collected in I environments, this model is given by

$$\boldsymbol{Y}=\boldsymbol{1}_{IJ}\boldsymbol{\mu}^{T}+\boldsymbol{X}\boldsymbol{B}+\boldsymbol{Z}_1\boldsymbol{b}_1+\boldsymbol{Z}_2\boldsymbol{b}_2+\boldsymbol{E}, \tag{6.11}$$
where Y = [Y₁, …, Y_{IJ}]ᵀ, X = [x₁, …, x_{IJ}]ᵀ, Z₁ and Z₂ are the incidence matrices of the lines and of the environment–line combinations, b₁ = [g₁, …, g_J]ᵀ, b₂ = [g₂₁, …, g₂_{IJ}]ᵀ, and E = [ϵ₁, …, ϵ_{IJ}]ᵀ. Here, b₂ ∣ ΣT, ΣE ∼ MN_{IJ×nT}(𝟎, ΣE ⊗ G, ΣT), and, similar to model (6.9), b₁ ∣ ΣT ∼ MN_{J×nT}(𝟎, G, ΣT) and E ∼ MN_{IJ×nT}(𝟎, I_{IJ}, R). The complete Bayesian specification of this model assumes independent multivariate normal distributions for the columns of B, that is, a multivariate normal prior for the fixed effects of each trait, βt ∼ Np(βt0, Σβt), t = 1, …, nT; a flat prior for the intercepts, f(μ) ∝ 1; independent inverse Wishart distributions for the covariance matrices of the residuals R and of ΣT, ΣT ∼ IW(vT, ST) and R ∼ IW(vR, SR); and an inverse Wishart distribution for ΣE, ΣE ∼ IW(vE, SE).

The full conditional distributions of μ, B, b1, b2, and R can be derived as in model (6.9). The full conditional distribution of ΣT is

$$
f\left(\boldsymbol{\Sigma}_T\mid -\right)\propto f\left(\boldsymbol{b}_1\mid\boldsymbol{\Sigma}_T\right)f\left(\boldsymbol{b}_2\mid\boldsymbol{\Sigma}_T,\boldsymbol{\Sigma}_E\right)f\left(\boldsymbol{\Sigma}_T\right)\propto\left\lvert\boldsymbol{\Sigma}_T\right\rvert^{-\frac{J}{2}}\left\lvert\boldsymbol{G}\right\rvert^{-\frac{n_T}{2}}\exp\left[-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{b}_1^{T}\boldsymbol{G}^{-1}\boldsymbol{b}_1\boldsymbol{\Sigma}_T^{-1}\right)\right]\times\left\lvert\boldsymbol{\Sigma}_T\right\rvert^{-\frac{IJ}{2}}\left\lvert\boldsymbol{\Sigma}_E\otimes\boldsymbol{G}\right\rvert^{-\frac{n_T}{2}}\exp\left[-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{b}_2^{T}\left(\boldsymbol{\Sigma}_E\otimes\boldsymbol{G}\right)^{-1}\boldsymbol{b}_2\boldsymbol{\Sigma}_T^{-1}\right)\right]\times f\left(\boldsymbol{\Sigma}_T\right)\propto\left\lvert\boldsymbol{\Sigma}_T\right\rvert^{-\frac{v_T+J+IJ+n_T+1}{2}}\exp\left\{-\frac{1}{2}\mathrm{tr}\left[\left(\boldsymbol{b}_1^{T}\boldsymbol{G}^{-1}\boldsymbol{b}_1+\boldsymbol{b}_2^{T}\left(\boldsymbol{\Sigma}_E^{-1}\otimes\boldsymbol{G}^{-1}\right)\boldsymbol{b}_2+\boldsymbol{S}_T\right)\boldsymbol{\Sigma}_T^{-1}\right]\right\},
$$
that is, ΣT ∣ − ∼ IW(vT + J + IJ, b₁ᵀG⁻¹b₁ + b₂ᵀ(ΣE⁻¹ ⊗ G⁻¹)b₂ + ST).

Now, let b₂* be the JnT × I matrix such that vec(b₂*) = vec(b₂ᵀ). Then, because b₂ᵀ ∣ ΣT, ΣE ∼ MN_{nT×IJ}(𝟎, ΣT, ΣE ⊗ G), we have vec(b₂ᵀ) ∣ ΣT, ΣE ∼ N(𝟎, ΣE ⊗ G ⊗ ΣT) and b₂* ∣ ΣT, ΣE ∼ MN_{JnT×I}(𝟎, G ⊗ ΣT, ΣE), so the full conditional posterior distribution of ΣE is

$$
f\left(\boldsymbol{\Sigma}_E\mid -\right)\propto f\left(\boldsymbol{b}_2^{*}\mid\boldsymbol{\Sigma}_T,\boldsymbol{\Sigma}_E\right)f\left(\boldsymbol{\Sigma}_E\right)\propto\left\lvert\boldsymbol{\Sigma}_E\right\rvert^{-\frac{Jn_T}{2}}\left\lvert\boldsymbol{G}\otimes\boldsymbol{\Sigma}_T\right\rvert^{-\frac{I}{2}}\exp\left\{-\frac{1}{2}\mathrm{tr}\left[\boldsymbol{\Sigma}_E^{-1}\boldsymbol{b}_2^{*T}\left(\boldsymbol{G}^{-1}\otimes\boldsymbol{\Sigma}_T^{-1}\right)\boldsymbol{b}_2^{*}\right]\right\}\times\left\lvert\boldsymbol{\Sigma}_E\right\rvert^{-\frac{v_E+I+1}{2}}\exp\left[-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{S}_E\boldsymbol{\Sigma}_E^{-1}\right)\right]\propto\left\lvert\boldsymbol{\Sigma}_E\right\rvert^{-\frac{v_E+Jn_T+I+1}{2}}\exp\left\{-\frac{1}{2}\mathrm{tr}\left[\left(\boldsymbol{b}_2^{*T}\left(\boldsymbol{G}^{-1}\otimes\boldsymbol{\Sigma}_T^{-1}\right)\boldsymbol{b}_2^{*}+\boldsymbol{S}_E\right)\boldsymbol{\Sigma}_E^{-1}\right]\right\},
$$
which means that ΣE ∣ − ∼ IW(vE + JnT, b₂*ᵀ(G⁻¹ ⊗ ΣT⁻¹)b₂* + SE).

A Gibbs sampler to explore the joint posterior distribution of parameters of model (6.11), μ, β, b1,b2, ΣT, ΣE, and R, can be implemented with the following steps:

1. Simulate β from a multivariate normal distribution N(β̃₀, Σ̃β), where Σ̃β = (Σβ⁻¹ + R⁻¹ ⊗ XᵀX)⁻¹ and β̃₀ = Σ̃β[Σβ⁻¹β₀ + (R⁻¹ ⊗ Xᵀ)vec(Y − 1_{IJ}μᵀ − Z₁b₁ − Z₂b₂)].

2. Simulate μ from N_{nT}(μ̃, Σ̃μ), where Σ̃μ = (IJ)⁻¹R and μ̃ = Σ̃μ(R⁻¹ ⊗ 1_{IJ}ᵀ)vec(Y − XB − Z₁b₁ − Z₂b₂).

3. Simulate g₁ = vec(b₁) from N(g̃₁, G̃₁), where G̃₁ = (ΣT⁻¹ ⊗ G⁻¹ + R⁻¹ ⊗ Z₁ᵀZ₁)⁻¹ and g̃₁ = G̃₁(R⁻¹ ⊗ Z₁ᵀ)vec(Y − 1_{IJ}μᵀ − XB − Z₂b₂).

4. Simulate g₂ = vec(b₂) from N(g̃₂, G̃₂), where G̃₂ = (ΣT⁻¹ ⊗ ΣE⁻¹ ⊗ G⁻¹ + R⁻¹ ⊗ Z₂ᵀZ₂)⁻¹ and g̃₂ = G̃₂(R⁻¹ ⊗ Z₂ᵀ)vec(Y − 1_{IJ}μᵀ − XB − Z₁b₁).

5. Simulate ΣT from IW(vT + J + IJ, b₁ᵀG⁻¹b₁ + b₂ᵀ(ΣE⁻¹ ⊗ G⁻¹)b₂ + ST).

6. Simulate ΣE from IW(vE + JnT, b₂*ᵀ(G⁻¹ ⊗ ΣT⁻¹)b₂* + SE).

7. Simulate R from IW(ṽR, S̃R), where ṽR = vR + IJ and S̃R = SR + (Y − 1_{IJ}μᵀ − XB − Z₁b₁ − Z₂b₂)ᵀ(Y − 1_{IJ}μᵀ − XB − Z₁b₁ − Z₂b₂).

8. Return to step 1 or terminate when the chain length is adequate to meet convergence diagnostics and the required sample size has been reached.

A similar Gibbs sampler is implemented in the BMTME R package, with the main difference that this package does not allow specifying a general fixed effects design matrix X, only the design matrix corresponding to the environment effects; also, the intercept vector μ is ignored because it is included in the fixed environment effects. Specifically, to fit model (6.11) when the only fixed effects taken into account are the environment effects, the R code with the BMTME package is as follows:

XE = model.matrix(~Env, data = dat_F)
Z1 = model.matrix(~0+GID, data = dat_F)
Lg = t(chol(G))
Z1_a = Z1 %*% Lg
Z2 = model.matrix(~0+GID:Env, data = dat_F)
G2 = kronecker(diag(dim(XE)[2]), Lg)
Z2_a = Z2 %*% G2
A = BMTME(Y = Y, X = XE, Z1 = Z1_a, Z2 = Z2_a, nIter = nI, burnIn = nb, thin = 2, bs = 50)
where Y is the matrix of response variables in which each row corresponds to the measurements of the nT traits on one individual, XE is the design matrix for the environment effects, Z1 is the incidence matrix of the genetic effects, Z2 is the design matrix of the genetic–environment interaction effects, nI and nb are the required number of iterations and the burn-in period, and bs is the number of blocks used internally to sample from vec(b₂).

Example 4

To illustrate how to implement this model with the BMTME R package, we considered the data in Example 3, but now the explored model includes the trait–genotype–environment interaction.

The average results of the prediction performance in terms of PC and MSE, using the same five-fold cross-validation as in Example 3, are shown in Table 6.5. These results show an improvement in prediction performance with this model in all trait–environment combinations and under both criteria (PC and MSE), except for trait MIXTIM in Env 2, where the MSE is slightly greater than the one obtained with model (6.9), which does not take into account the triple interaction (trait–genotype–environment).

Table 6.5

Average Pearson's correlation (PC) and mean squared error of prediction (MSE) between predicted and observed values across five random partitions where lines were evaluated in some environments with all traits but are missing for all traits in the other environments

The R code used to obtain these results is given in Appendix 5.

Appendix 1

• The probability density function (pdf) of the scaled inverse Chi-square distribution with v degrees of freedom and scale parameter S, χ⁻²(v, S), is given by
$$f\left(\sigma^2;v,S\right)=\frac{\left(\frac{S}{2}\right)^{\frac{v}{2}}}{\Gamma\left(\frac{v}{2}\right)}\left(\sigma^2\right)^{-\left(1+\frac{v}{2}\right)}\exp\left(-\frac{S}{2\sigma^2}\right),$$
and the mean, mode, and variance of this distribution are $\frac{S}{v-2}$, $\frac{S}{v+2}$, and $\frac{2S^2}{(v-2)^2(v-4)}$, respectively.
• The pdf of the gamma distribution with shape parameter s and rate parameter r is
$$f\left(x;s,r\right)=\frac{r^{s}x^{s-1}}{\Gamma(s)}\exp\left(-rx\right).$$
The mean, mode, and variance of this distribution are s/r, (s − 1)/r, and s/r², respectively.
• The pdf of a beta distribution with mean μ and precision parameter ϕ ("dispersion" parameter ϕ⁻¹) is given by
$$f\left(x;\mu,\phi\right)=\frac{1}{B\left(\mu\phi,\left(1-\mu\right)\phi\right)}x^{\mu\phi-1}\left(1-x\right)^{\left(1-\mu\right)\phi-1},$$
where the relation with the standard parameterization of this distribution, Beta(α, β), is μ = α/(α + β) and ϕ = α + β.
• The pdf of a Laplace distribution with location 0 and scale parameter b is
$$f\left(x;b\right)=\frac{1}{2b}\exp\left(-\frac{\left\lvert x\right\rvert}{b}\right),\quad b>0,\ x\in\mathbb{R}.$$
The mean and variance of this distribution are 0 and 2b².
• A random matrix Σ of dimension p × p is distributed as an inverse Wishart distribution with parameters v and S, Σ ∼ IW(v, S), if it has density function
$$f\left(\boldsymbol{\Sigma}\right)=\frac{\left\lvert\boldsymbol{S}\right\rvert^{\frac{v}{2}}}{2^{\frac{vp}{2}}\Gamma_p\left(\frac{v}{2}\right)}\left\lvert\boldsymbol{\Sigma}\right\rvert^{-\frac{v+p+1}{2}}\exp\left[-\frac{1}{2}\mathrm{tr}\left(\boldsymbol{S}\boldsymbol{\Sigma}^{-1}\right)\right],$$
where Γp(v/2) is the multivariate gamma function, v > 0, and Σ and S are positive definite matrices. The mean matrix of this distribution is S/(v − p − 1).
• A (p × q) random matrix Z follows the matrix normal distribution with matrix parameters M (p × q), U (p × p), and V (q × q), Z ∼ MN_{p,q}(M, U, V), if it has density
$$f\left(\boldsymbol{Z};\boldsymbol{M},\boldsymbol{U},\boldsymbol{V}\right)=\frac{\exp\left\{-\frac{1}{2}\mathrm{tr}\left[\boldsymbol{V}^{-1}\left(\boldsymbol{Z}-\boldsymbol{M}\right)^{T}\boldsymbol{U}^{-1}\left(\boldsymbol{Z}-\boldsymbol{M}\right)\right]\right\}}{\left(2\pi\right)^{\frac{pq}{2}}\left\lvert\boldsymbol{V}\right\rvert^{\frac{p}{2}}\left\lvert\boldsymbol{U}\right\rvert^{\frac{q}{2}}}.$$

Appendix 2: Setting Hyperparameters for the Prior Distributions of the BRR Model

The following rules are those used in Pérez and de los Campos (2014); they provide proper but weakly informative prior distributions. In general, the idea is to assign a certain proportion of the total variance of the phenotypes to the different components of the model.

Specifically, for model (6.3), the total variance of y is first partitioned into two components, the linear predictor and the error:

$$\mathrm{Var}\left(y_j\right)=\mathrm{Var}\left(\boldsymbol{x}_j^{T}\boldsymbol{\beta}_0\right)+\sigma^2.$$

Therefore, the average of the variances of the individuals, called the total variance, is equal to

$$\frac{1}{n}\sum_{j=1}^{n}\mathrm{Var}\left(y_j\right)=\frac{1}{n}\sum_{j=1}^{n}\mathrm{Var}\left(\boldsymbol{x}_j^{T}\boldsymbol{\beta}_0\right)+\sigma^2=\frac{1}{n}\mathrm{tr}\left(\boldsymbol{X}\boldsymbol{X}^{T}\right)\sigma_\beta^2+\sigma^2=V_M+V_\epsilon.$$

Then, setting R₁² as the proportion of the total variance (V_y) that is explained by the markers a priori, V_M = R₁²V_y, and replacing σβ² in V_M by its prior mode, Sβ/(vβ + 2), we have that

$$\frac{1}{n}\mathrm{tr}\left(\boldsymbol{X}\boldsymbol{X}^{T}\right)\frac{S_\beta}{v_\beta+2}=R_1^2V_y.$$

From here, once we have set a value for vβ, the scale parameter is given by

$$S_\beta=\frac{R_1^2V_y\left(v_\beta+2\right)}{\frac{1}{n}\mathrm{tr}\left(\boldsymbol{X}\boldsymbol{X}^{T}\right)}.$$

A commonly used value of the shape parameter is vβ = 5, and for the proportion of explained variance, R₁² = 0.5.

Because the model has only these two components and R₁² was set as the proportion of the total variance explained by the markers a priori, the corresponding proportion explained by the error a priori is R₂² = 1 − R₁². Then, similarly to what was done before, once a value v is chosen for the shape parameter of the prior distribution of σ², the value of the scale parameter is given by

$$S=\left(1-R_1^2\right)V_y\left(v+2\right).$$

By default, v = 5 is often used.
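A minimal R sketch of these rules (a phenotype vector y and marker matrix X are assumed to exist):

vb = 5; v = 5; R1sq = 0.5
Vy = var(y)
Sb = R1sq*Vy*(vb + 2)/mean(diag(tcrossprod(X)))  # denominator = (1/n) tr(X X^T)
S  = (1 - R1sq)*Vy*(v + 2)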

Appendix 3: R Code Example 1

rm(list=ls())
library(BGLR)
load('dat_ls_E1.RData',verbose=TRUE)
#Phenotypic data
dat_F = dat_ls$dat_F
head(dat_F)
#Marker data
dat_M = dat_ls$dat_M
dim(dat_M)
dat_F = transform(dat_F, GID = as.character(GID))
head(dat_F,5)
#Matrix design of markers
Pos = match(dat_F$GID,row.names(dat_M))
XM = dat_M[Pos,]
XM = scale(XM)
dim(XM)
n = dim(dat_F)[1]
y = dat_F$y
#10 random partitions
K = 10
set.seed(1)
PT = replicate(K,sample(n,0.20*n))
#BRR
ETA_BRR = list(list(model='BRR',X=XM))
Tab = data.frame(PT = 1:K,MSEP = NA)
set.seed(1)
for(k in 1:K)
{
Pos_tst = PT[,k]
y_NA = y
y_NA[Pos_tst] = NA
A = BGLR(y=y_NA,ETA=ETA_BRR,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab$MSEP[k] = mean((y[Pos_tst]-yp_ts)^2)
}
#GBLUP
dat_M = scale(dat_M)
G = tcrossprod(XM)/dim(XM)[2]
dim(G)
#Matrix design of GIDs
Z = model.matrix(~0+GID,data=dat_F,xlev = list(GID=unique(dat_F$GID)))
K_L = Z%*%G%*%t(Z)
ETA_GB = list(list(model='RKHS',K = K_L))
#Tab = data.frame(PT = 1:K,MSEP = NA)
set.seed(1)
for(k in 1:K)
{
Pos_tst = PT[,k]
y_NA = y
y_NA[Pos_tst] = NA
A = BGLR(y=y_NA,ETA=ETA_GB,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab$MSEP_GB[k] = mean((y[Pos_tst]-yp_ts)^2)
}
#BA
ETA_BA = list(list(model='BayesA',X=XM))
#Tab = data.frame(PT = 1:K,MSEP = NA)
set.seed(1)
for(k in 1:K)
{
Pos_tst = PT[,k]
y_NA = y
y_NA[Pos_tst] = NA
A = BGLR(y=y_NA,ETA=ETA_BA,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab$MSEP_BA[k] = mean((y[Pos_tst]-yp_ts)^2)
}
#BB
ETA_BB = list(list(model='BayesB',X=XM))
#Tab = data.frame(PT = 1:K,MSEP = NA)
set.seed(1)
for(k in 1:K)
{
Pos_tst = PT[,k]
y_NA = y
y_NA[Pos_tst] = NA
A = BGLR(y=y_NA,ETA=ETA_BB,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab$MSEP_BB[k] = mean((y[Pos_tst]-yp_ts)^2)
}
#BC
ETA_BC = list(list(model='BayesC',X=XM))
#Tab = data.frame(PT = 1:K,MSEP = NA)
set.seed(1)
for(k in 1:K)
{
Pos_tst = PT[,k]
y_NA = y
y_NA[Pos_tst] = NA
A = BGLR(y=y_NA,ETA=ETA_BC,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab$MSEP_BC[k] = mean((y[Pos_tst]-yp_ts)^2)
}
#BL
ETA_BL = list(list(model='BL',X=XM))
#Tab = data.frame(PT = 1:K,MSEP = NA)
set.seed(1)
for(k in 1:K)
{
Pos_tst = PT[,k]
y_NA = y
y_NA[Pos_tst] = NA
A = BGLR(y=y_NA,ETA=ETA_BL,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab$MSEP_BL[k] = mean((y[Pos_tst]-yp_ts)^2)
}
Tab
#Mean and SD across the 10 partitions
apply(Tab[,-1],2,function(x)c(mean(x),sd(x)))

Appendix 4: R Code Example 2

rm(list=ls())
library(BGLR)
library(BMTME)
load('dat_ls_E2.RData',verbose=TRUE)
#Phenotypic data
dat_F = dat_ls$dat_F
head(dat_F)
dim(dat_F)
#Marker data
dat_M = dat_ls$dat_M
dim(dat_M)
dat_F = transform(dat_F, GID = as.character(GID))
head(dat_F,5)
#Design matrix of markers
Pos = match(dat_F$GID,row.names(dat_M))
XM = dat_M[Pos,]
dim(XM)
XM = scale(XM)
#Environment design matrix
XE = model.matrix(~0+Env,data=dat_F)[,-1]
head(XE)
#Environment–marker design matrix
XEM = model.matrix(~0+XM:XE)
#GID design matrix and Environment-GID design matrix
#for RKHS models
Z_L = model.matrix(~0+GID,data=dat_F,xlev = list(GID=unique(dat_F$GID)))
Z_LE = model.matrix(~0+GID:Env,data=dat_F,
xlev = list(GID=unique(dat_F$GID),Env = unique(dat_F$Env)))
#Genomic relationship matrix derived from markers
dat_M = scale(dat_M)
G = tcrossprod(dat_M)/dim(dat_M)[2]
dim(G)
#Covariance matrix for Zg
K_L = Z_L%*%G%*%t(Z_L)
#Covariance matrix for the random effects ZEg (diag(4): four environments in this data set)
K_LE = Z_LE%*%kronecker(diag(4),G)%*%t(Z_LE)
n = dim(dat_F)[1]
y = dat_F$y
#Partitions for a 5-fold cross-validation by GID
K = 5
PT = CV.KFold(dat_F,DataSetID = 'GID', K=5, set_seed = 1)
Models = c('BRR','RKHS','BayesA','BayesB','BayesC','BL')
Tab = data.frame()
for(m in 1:6)
{
ETA1 = list(list(model=Models[m],X=XM))
ETA2 = list(list(model='FIXED',X=XE),list(model=Models[m],X=XM))
ETA3 = list(list(model='FIXED',X=XE),list(model=Models[m],X=XM),
list(model=Models[m],X=XEM))
if(Models[m]=='RKHS')
{
ETA1 = list(list(model='RKHS',K=K_L))
ETA2 = list(list(model='FIXED',X=XE),list(model='RKHS',K=K_L))
ETA3 = list(list(model='FIXED',X=XE),list(model='RKHS',K=K_L),
list(model='RKHS',K=K_LE))
}
Tab1_m = data.frame(PT = 1:K,MSEP = NA)
Tab2_m = Tab1_m
Tab3_m = Tab1_m
set.seed(1)
for(k in 1:K)
{
Pos_tst = PT$CrossValidation_list[[k]]
y_NA = y
y_NA[Pos_tst] = NA
A = BGLR(y=y_NA,ETA=ETA1,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab1_m$MSEP[k] = mean((y[Pos_tst]-yp_ts)^2)
Tab1_m$Cor[k] = cor(y[Pos_tst],yp_ts)
A = BGLR(y=y_NA,ETA=ETA2,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab2_m$MSEP[k] = mean((y[Pos_tst]-yp_ts)^2)
Tab2_m$Cor[k] = cor(y[Pos_tst],yp_ts)
A = BGLR(y=y_NA,ETA=ETA3,nIter = 1e4,burnIn = 1e3,verbose = FALSE)
yp_ts = A$yHat[Pos_tst]
Tab3_m$MSEP[k] = mean((y[Pos_tst]-yp_ts)^2)
Tab3_m$Cor[k] = cor(y[Pos_tst],yp_ts)
}
Tab = rbind(Tab,data.frame(Model=Models[m],Tab1_m,Tab2_m,Tab3_m))
}
Tab

Appendix 5

6.1.1. R Code Example 3
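
This script fits a multi-trait model with the Multitrait() function of BGLR, with a fixed environment effect and an RKHS genomic term, under an unstructured ('UN') residual covariance matrix; prediction performance (correlation and MSEP per trait and environment) over a five-fold cross-validation by GID is summarized with the helper function PC_MM_f defined at the end of the script.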

rm(list=ls(all=TRUE))
library(BGLR)
library(BMTME)
library(dplyr)
load('dat_ls.RData',verbose=TRUE)
dat_F = dat_ls$dat_F
head(dat_F)
Y = as.matrix(dat_F[,-(1:2)])
dat_F$Env = as.character(dat_F$DS)
G = dat_ls$G
J = dim(G)[1]
XE = matrix(model.matrix(~0+Env,data=dat_F)[,-1],nc=1)
Z = model.matrix(~0+GID,data=dat_F)
K = Z%*%G%*%t(Z)
#Partitions for a 5-fold cross-validation (5-FCV)
PT_ls = CV.KFold(dat_F, DataSetID='GID',K=5,set_seed = 123)
PT_ls = PT_ls$CrossValidation_list
#Predictor BGLR
ETA = list(list(X=XE,model='FIXED'),list(K=K,model='RKHS'))
#Function to summarize the performance prediction: PC_MM_f
source('PC_MM.R') #See the definition of PC_MM_f below
Tab = data.frame()
set.seed(1)
for(p in 1:5)
{
Y_NA = Y
Pos_NA = PT_ls[[p]]
Y_NA[Pos_NA,] = NA
A = Multitrait(y = Y_NA, ETA=ETA,
resCov = list( type = 'UN', S0 = diag(2), df0 = 5 ),
nIter = 5e3, burnIn = 1e3)
PC = PC_MM_f(Y[Pos_NA,],A$yHat[Pos_NA,],Env=dat_F$Env[Pos_NA])
Tab = rbind(Tab,data.frame(PT=p,PC))
cat('PT=',p,'\n')
}
Tab_R = Tab%>%group_by(Env,Trait)%>%select(Cor,MSEP)%>%summarise(Cor_mean = mean(Cor),
Cor_sd = sd(Cor),
MSEP_mean = mean(MSEP),
MSEP_sd = sd(MSEP))
Tab_R = as.data.frame(Tab_R)
Tab_R
#Save the following function in the same folder, in a file named "PC_MM.R"
#Performance criteria
PC_MM_f<-function(y,yp,Env=NULL)
{
if(is.null(Env))
{
Cor = diag(cor(as.matrix(y),as.matrix(yp)))
MSEP = colMeans((y-yp)^2)
PC = data.frame(Trait = colnames(y),Cor=Cor, MSEP=MSEP)
}
else
{
PC = data.frame()
Envs = unique(Env)
nE = length(Envs)
for(e in 1:nE)
{
y_e = y[Env==Envs[e],]
yp_e = yp[Env==Envs[e],]
Cor = diag(cor(as.matrix(y_e),as.matrix(yp_e)))
MSEP = colMeans((y_e-yp_e)^2)
PC = rbind(PC,data.frame(Trait = colnames(y),Env=Envs[e],Cor=Cor, MSEP=MSEP))
}
}
PC
}

6.1.2. R Code Example 4
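
This script fits the Bayesian multi-trait and multi-environment (BMTME) model to the same data as Example 3; the genomic covariance structure is introduced through the lower-triangular Cholesky factor of G applied to the design matrices of genotypes (Z1) and genotype×environment combinations (Z2), and performance over a five-fold cross-validation by GID is again summarized with PC_MM_f.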

rm(list=ls(all=TRUE))
library(BMTME)
library(dplyr)
load('dat_ls.RData',verbose=TRUE)
dat_F = dat_ls$dat_F
head(dat_F)
Y = as.matrix(dat_F[,-(1:2)])
dat_F$Env = as.character(dat_F$DS)
G = dat_ls$G
Lg = t(chol(G)) #Lower-triangular Cholesky factor of G
XE = model.matrix(~Env,data=dat_F)
Z1 = model.matrix(~0+GID,data=dat_F)
Z1_a = Z1%*%Lg
Z2 = model.matrix(~0+GID:Env,data=dat_F)
L2 = kronecker(diag(dim(XE)[2]),Lg)
Z2_a = Z2%*%L2
#Partitions for a 5-fold cross-validation (5-FCV)
PT_ls = CV.KFold(dat_F, DataSetID='GID',K=5,set_seed = 123)
PT_ls = PT_ls$CrossValidation_list
source('PC_MM.R') #See the file "PC_MM.R" defined in Example 3
Tab = data.frame()
set.seed(1)
for(p in 1:5)
{
Y_NA = Y
Pos_NA = PT_ls[[p]]
Y_NA[Pos_NA,] = NA
A = BMTME(Y = Y_NA, X = XE, Z1 = Z1_a, Z2 = Z2_a,
nIter = 3e3, burnIn = 5e2, thin = 2, bs = 50)
PC = PC_MM_f(Y[Pos_NA,],A$yHat[Pos_NA,],Env=dat_F$Env[Pos_NA])
Tab = rbind(Tab,data.frame(PT=p,PC))
cat('PT=',p,'\n')
}
Tab_R = Tab%>%group_by(Env,Trait)%>%select(Cor,MSEP)%>%summarise(Cor_mean = mean(Cor),
Cor_sd = sd(Cor),
MSEP_mean = mean(MSEP),
MSEP_sd = sd(MSEP))
Tab_R = as.data.frame(Tab_R)
Tab_R

References

  • Box GEP, Tiao GC (1992) Bayesian inference in statistical analysis. Wiley, New York
  • Calus MP, Veerkamp RF (2011) Accuracy of multi-trait genomic selection using different methods. Genet Sel Evol 43(1):26
  • Casella G, George EI (1992) Explaining the Gibbs sampler. Am Stat 46(3):167–174
  • Christensen R, Johnson W, Branscum A, Hanson TE (2011) Bayesian ideas and data analysis: an introduction for scientists and statisticians. Chapman & Hall/CRC, Boca Raton, FL
  • de los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MP (2013) Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193(2):327–345
  • Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis. Chapman and Hall/CRC, Boca Raton, FL
  • Habier D, Tetens J, Seefried FR, Lichtner P, Thaller G (2010) The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol 42(1):5
  • Habier D, Fernando RL, Kizilkaya K, Garrick DJ (2011) Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics 12(1):186
  • Henderson C (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31(2):423–447. https://doi.org/10.2307/2529430
  • Henderson CR, Quaas RL (1976) Multiple trait evaluation using relatives' records. J Anim Sci 43:11–88
  • Jia Y, Jannink JL (2012) Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192(4):1513–1522
  • Jiang J, Zhang Q, Ma L, Li J, Wang Z, Liu JF (2015) Joint prediction of multiple quantitative traits using a Bayesian multivariate antedependence model. Heredity 115(1):29–36
  • Lehermeier C, Wimmer V, Albrecht T, Auinger HJ, Gianola D, Schmid VJ, Schön CC (2013) Sensitivity to prior specification in Bayesian genome-based prediction models. Stat Appl Genet Mol Biol 12(3):375–391
  • Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4):1819–1829
  • Montesinos-López OA, Montesinos-López A, Crossa J, Toledo FH, Pérez-Hernández O, Eskridge KM, Rutkoski J (2016) A genomic Bayesian multi-trait and multi-environment model. G3 6(9):2725–2744
  • Montesinos-López OA, Montesinos-López A, Montesinos-López JC, Crossa J, Luna-Vázquez FJ, Salinas-Ruiz J (2018) A Bayesian multiple-trait and multiple-environment model using the matrix normal distribution. In: Physical methods for stimulation of plant and mushroom development. IntechOpen, Croatia, p 19
  • Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103(482):681–686
  • Pérez P, de los Campos G (2013) BGLR: a statistical package for whole genome regression and prediction. R package version 1(0.2)
  • Pérez P, de los Campos G (2014) Genome-wide regression and prediction with the BGLR statistical package. Genetics 198(2):483–495
  • Pérez P, de los Campos G, Crossa J, Gianola D (2010) Genomic-enabled prediction based on molecular markers and pedigree using the Bayesian linear regression package in R. Plant Genome 3(2):106–116
  • Pollak EJ, Van der Werf J, Quaas RL (1984) Selection bias and multiple trait evaluation. J Dairy Sci 67(7):1590–1595
  • Pszczola M, Veerkamp RF, De Haas Y, Wall E, Strabel T, Calus MPL (2013) Effect of predictor traits on accuracy of genomic breeding values for feed intake based on a limited cow reference population. Animal 7(11):1759–1768
  • Schaeffer LR (1984) Sire and cow evaluation under multiple trait models. J Dairy Sci 67(7):1567–1580
  • VanRaden PM (2007) Genomic measures of relationship and inbreeding. Interbull Bull 7:33–36
Copyright 2022, The Author(s)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
