Over half of the world’s population depend on rice for survival as it serves as a major staple food for an estimated 66% of people in almost all regions. Therefore, increased rice production will play a crucial role in improving food security, eradicating poverty and supplementing energy demands as it currently provides over 20% of energy. Predictions have shown that when compared with 1995, 60% more rice will have to be produced by 2030 (Saleh et al., 2020).
Influenced by quantitative trait loci (QTL) genes and environmental factors, yield is one of the most significant and complex trait in rice. In rice species, yield is determined by certain factors, classified as either direct or indirect. Direct determinants of yield include 1000 or 100 – grain weight depending on the size of the grain, filled grains per panicle and panicle number per unit area and/or per plant, while indirect determinants include traits like plant height, growth period, tillering ability, panicle length, seed length, seed setting rate, and grains per panicle (Li et al., 2019).
One major predictor of milling yield is test weight, a component used for the comparison of grain densities and it is derived by weighing a known grain volume. It is dependent on embryo size and reserved nutrient quantity which is used for germination and growth. Lower test weight scores occur due to unfavorable field environmental conditions or poor grain filling prior to harvest. Depending on grain size, the 100 or 1000 grain weight is an important measure of seed quality which has a positive effect on seedling growth, seed potential, sprouting and overall plant performance (Deivasigamani S & Swaminathan C, 2018).
Over the years, studies have focused on overall rice yield prediction using agronomic traits (Miedaner et al., 2019; Prakash Poudel, 2018; Puji Lestari et al., 2015; Venkateswarlu et al., n.d.), where test weight has been the leading determinant of grain yield. However, the realization of yield potential which is measured by multiplying the number of spikelets per unit area by average grain weight is highly dependent on some latent factors (Venkateswarlu et al., n.d.). Considering the strong association between test weight and increased yield potential, it will be relatively important to explore factors that might influence or control test weight in rice cultivars. Therefore, this study seeks to;
Explore trait variation across subpopulations and regions.
Examine the extent of association between test weight (100 seed weight) and yield attributing traits across rice subpopulations and geographical regions.
Identify direct and latent factors influencing test weight across subpopulations.
Rice Diversity Panel 1 Dataset
The RDP1 (Reg. No. MP-6, NSL 500357 MAP) dataset is used in this study here. It broadly represents the wide range of phenotypic and genotypic variation within O. sativa. It consists of 421 purified homozygous rice (Oryza sativa L.) accessions (GSOR 301001 through GSOR 301421; GSOR 312001 through 312020) which include both landraces and elite rice cultivars. Accessions are classified into five subpopulation groups namely; indica (95 accessions) and aus (60), which belong to the Indica varietal group, and tropical japonica (106), temperate japonica (111), and aromatic (Group V) (16) which comprise the Japonica varietal group. Thirty-three accessions shared <60% ancestry with a single group and were classified as admixtures (Eizenga et al., 2014). Data can be downloaded here (usda.gov). The data selected for the analysis included the following trait data; 100 seed weight (HHGW), panicle number per unit area (PNN), filled grain number per panicle (FGN), plant height (PTH), panicle length (PNL), total florets per panicle (FLN).
Statistical Analysis
All statistical analysis were carried out in R version 4.0.2. Data was loaded into R studio and all missing values and unknown groups were filtered out. For descriptive statistics, boxplots were plotted using the ‘ggplot2’ package to show the distribution of traits across all regions and subpopulations. Before inferential statistical analysis were performed, data was preprocessed and normalized using the caret package after which density plots were drawn to visualize normality. PCA analysis was performed using the ‘ggbiplot’ package. Standardized major axis (SMA) slopes were used for the summary of bivariate trait relationships and to test whether the regression of test weight and other agronomic traits differed among subpopulations and regions. SMA regression was carried out using the ‘smart’ package. Multivariate causal network for all selected traits was quantified with a Structural Equation Model (SEM) using the ‘lavaan’ package. SEM combines multiple regression analysis and factor analysis for the evaluation of structural relationships, estimating interrelated dependence in a single analysis (Li et al., 2019; Regression & Models, 2005). A general linear regression was performed, where the most correlated variable, FLN was used as predictor for test weight across subpopulation and region. All relevant assumptions were tested and post hoc analyses were carried out.
# load all required libraries
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(smatr)
library(Cairo)
library(extrafont)
## Registering fonts with R
library(clusterSim)
## Loading required package: cluster
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
library(grid)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(ggpubr)
library(lavaan)
## This is lavaan 0.6-7
## lavaan is BETA software! Please report any bugs.
library(semPlot)
## Registered S3 methods overwritten by 'lme4':
## method from
## cooks.distance.influence.merMod car
## influence.merMod car
## dfbeta.influence.merMod car
## dfbetas.influence.merMod car
## Registered S3 methods overwritten by 'BDgraph':
## method from
## plot.sim lava
## print.sim lava
## Registered S3 methods overwritten by 'huge':
## method from
## plot.roc pROC
## plot.sim BDgraph
## print.roc pROC
## print.sim BDgraph
library(devtools)
## Loading required package: usethis
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
##
## Attaching package: 'survival'
## The following object is masked from 'package:caret':
##
## cluster
## Loading required package: TH.data
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
library(emmeans)
##
## Attaching package: 'emmeans'
## The following object is masked from 'package:devtools':
##
## test
library(modelr)
#library(ggbiplot)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
Distribution of subpopulations across regions
# load data and clean names
rice <- read.csv("./raw_data/rice_phenotype_data.csv") %>%
janitor::clean_names() %>%
filter(!subpopulation_group_pca %in% "N/A") %>%
filter(!region %in% "Unknown") %>%
na.omit()
# view data
head(rice)
## nsftv_id gsor_id irgc_id name orignal_providing_country
## 1 1 301001 126380 Agostano Italy
## 2 3 301003 117636 Ai-Chiao-Hong China
## 3 6 301006 117603 ARC 7229 India
## 4 7 301007 126381 Arias Indonesia
## 5 8 301008 117643 Asse Y Pung Philippines
## 6 9 301009 117647 Baber India
## region subpopulation_group_structure subpopulation_group_pca
## 1 West Europe TEJ TEJ
## 2 East Asia IND IND
## 3 South Asia AUS AUS
## 4 Southeast Asia TRJ TRJ
## 5 Southeast Asia TRJ TRJ
## 6 South Asia ADMIX_J TEJ
## subpopulation_group_fast_structure amycn alkdig dhulprotcn dthd ptht flflg
## 1 temperate-japonica 15.6 6.1 8.4 75.1 110.9 28.4
## 2 indica 23.1 5.8 9.2 89.5 143.5 39
## 3 aus 23.2 5.5 8.5 89.1 148.3 36.9
## 4 tropical-japonica 20.1 6.3 11.8 105 119.6 37
## 5 tropical-japonica 22.2 6.6 8.7 101.6 147.4 42
## 6 temperate-japonica 22.1 5.4 9.9 73 128.9 24.4
## flfwd pnnb pnlg pbrnb flnbppn unfilgrnb filgrnb hulgrlg hulgrwd hulgrlgwdro
## 1 1.3 21.5 20.5 9.3 136.3 16.5 119.8 8.1 3.7 2.2
## 2 1 57.5 26.8 9.2 113 28.3 84.8 7.7 3 2.6
## 3 1.8 17.4 30.9 11.2 291.9 37.6 254.3 7.1 3.3 2.2
## 4 1.6 17.4 24.6 13.3 229.5 21.4 208.1 7.3 2.5 2.9
## 5 1.9 13.8 31.5 12.2 328.6 72.1 256.4 8.6 2.7 3.2
## 6 1.2 18.6 20.2 7.9 114.4 12.9 101.5 7.7 3.4 2.3
## hulgrvol hhulgrwt dhulgrlg dhulgrwd dhulgrvol culmhab sdsh awnplu lflpubes
## 1 13.3 3.1 5.8 3.1 7.7 4 yes 0 yes
## 2 8.3 2.3 5.6 2.6 5.1 8 yes 0 yes
## 3 9 2.2 5.1 2.8 5.6 6 yes 0 yes
## 4 5.9 2.2 5.5 2.2 3.6 3 yes 1 yes
## 5 7.7 2 6.5 2.2 4.5 3 yes 0 yes
## 6 10.5 2.7 5.5 2.9 6.4 7 no 5 yes
## hulcl dhulgrcl lfblrs strthead
## 1 2 2 6 4.8
## 2 2 1 3 7.8
## 3 4 2 0 8.2
## 4 2 2 1.5 8.6
## 5 3 2 1.5 7
## 6 6 2 6.5 7.2
# check data structure
str(rice)
## 'data.frame': 383 obs. of 38 variables:
## $ nsftv_id : chr "1" "3" "6" "7" ...
## $ gsor_id : chr "301001" "301003" "301006" "301007" ...
## $ irgc_id : chr "126380" "117636" "117603" "126381" ...
## $ name : chr "Agostano" "Ai-Chiao-Hong" "ARC 7229" "Arias" ...
## $ orignal_providing_country : chr "Italy" "China" "India" "Indonesia" ...
## $ region : chr "West Europe" "East Asia" "South Asia" "Southeast Asia" ...
## $ subpopulation_group_structure : chr "TEJ" "IND" "AUS" "TRJ" ...
## $ subpopulation_group_pca : chr "TEJ" "IND" "AUS" "TRJ" ...
## $ subpopulation_group_fast_structure: chr "temperate-japonica" "indica" "aus" "tropical-japonica" ...
## $ amycn : chr "15.6" "23.1" "23.2" "20.1" ...
## $ alkdig : chr "6.1" "5.8" "5.5" "6.3" ...
## $ dhulprotcn : chr "8.4" "9.2" "8.5" "11.8" ...
## $ dthd : chr "75.1" "89.5" "89.1" "105" ...
## $ ptht : chr "110.9" "143.5" "148.3" "119.6" ...
## $ flflg : chr "28.4" "39" "36.9" "37" ...
## $ flfwd : chr "1.3" "1" "1.8" "1.6" ...
## $ pnnb : chr "21.5" "57.5" "17.4" "17.4" ...
## $ pnlg : chr "20.5" "26.8" "30.9" "24.6" ...
## $ pbrnb : chr "9.3" "9.2" "11.2" "13.3" ...
## $ flnbppn : chr "136.3" "113" "291.9" "229.5" ...
## $ unfilgrnb : chr "16.5" "28.3" "37.6" "21.4" ...
## $ filgrnb : chr "119.8" "84.8" "254.3" "208.1" ...
## $ hulgrlg : chr "8.1" "7.7" "7.1" "7.3" ...
## $ hulgrwd : chr "3.7" "3" "3.3" "2.5" ...
## $ hulgrlgwdro : chr "2.2" "2.6" "2.2" "2.9" ...
## $ hulgrvol : chr "13.3" "8.3" "9" "5.9" ...
## $ hhulgrwt : chr "3.1" "2.3" "2.2" "2.2" ...
## $ dhulgrlg : chr "5.8" "5.6" "5.1" "5.5" ...
## $ dhulgrwd : chr "3.1" "2.6" "2.8" "2.2" ...
## $ dhulgrvol : chr "7.7" "5.1" "5.6" "3.6" ...
## $ culmhab : chr "4" "8" "6" "3" ...
## $ sdsh : chr "yes" "yes" "yes" "yes" ...
## $ awnplu : chr "0" "0" "0" "1" ...
## $ lflpubes : chr "yes" "yes" "yes" "yes" ...
## $ hulcl : chr "2" "2" "4" "2" ...
## $ dhulgrcl : chr "2" "1" "2" "2" ...
## $ lfblrs : chr "6" "3" "0" "1.5" ...
## $ strthead : chr "4.8" "7.8" "8.2" "8.6" ...
## - attr(*, "na.action")= 'omit' Named int [1:22] 9 10 38 48 55 72 74 80 91 114 ...
## ..- attr(*, "names")= chr [1:22] "9" "10" "38" "48" ...
# convert characters to class numeric
rice2 <- rice %>%
mutate(hhulgrwt1 = as.numeric(hhulgrwt),
hulgrvol1 = as.numeric(hulgrvol),
ptht1 = as.numeric(ptht),
pnlg1 = as.numeric(pnlg),
pnnb1 = as.numeric(pnnb),
dthd1 = as.numeric(dthd),
filgrnb1 = as.numeric(filgrnb),
flnbppn1 = as.numeric(flnbppn))
# str data
str(rice2)
## 'data.frame': 383 obs. of 46 variables:
## $ nsftv_id : chr "1" "3" "6" "7" ...
## $ gsor_id : chr "301001" "301003" "301006" "301007" ...
## $ irgc_id : chr "126380" "117636" "117603" "126381" ...
## $ name : chr "Agostano" "Ai-Chiao-Hong" "ARC 7229" "Arias" ...
## $ orignal_providing_country : chr "Italy" "China" "India" "Indonesia" ...
## $ region : chr "West Europe" "East Asia" "South Asia" "Southeast Asia" ...
## $ subpopulation_group_structure : chr "TEJ" "IND" "AUS" "TRJ" ...
## $ subpopulation_group_pca : chr "TEJ" "IND" "AUS" "TRJ" ...
## $ subpopulation_group_fast_structure: chr "temperate-japonica" "indica" "aus" "tropical-japonica" ...
## $ amycn : chr "15.6" "23.1" "23.2" "20.1" ...
## $ alkdig : chr "6.1" "5.8" "5.5" "6.3" ...
## $ dhulprotcn : chr "8.4" "9.2" "8.5" "11.8" ...
## $ dthd : chr "75.1" "89.5" "89.1" "105" ...
## $ ptht : chr "110.9" "143.5" "148.3" "119.6" ...
## $ flflg : chr "28.4" "39" "36.9" "37" ...
## $ flfwd : chr "1.3" "1" "1.8" "1.6" ...
## $ pnnb : chr "21.5" "57.5" "17.4" "17.4" ...
## $ pnlg : chr "20.5" "26.8" "30.9" "24.6" ...
## $ pbrnb : chr "9.3" "9.2" "11.2" "13.3" ...
## $ flnbppn : chr "136.3" "113" "291.9" "229.5" ...
## $ unfilgrnb : chr "16.5" "28.3" "37.6" "21.4" ...
## $ filgrnb : chr "119.8" "84.8" "254.3" "208.1" ...
## $ hulgrlg : chr "8.1" "7.7" "7.1" "7.3" ...
## $ hulgrwd : chr "3.7" "3" "3.3" "2.5" ...
## $ hulgrlgwdro : chr "2.2" "2.6" "2.2" "2.9" ...
## $ hulgrvol : chr "13.3" "8.3" "9" "5.9" ...
## $ hhulgrwt : chr "3.1" "2.3" "2.2" "2.2" ...
## $ dhulgrlg : chr "5.8" "5.6" "5.1" "5.5" ...
## $ dhulgrwd : chr "3.1" "2.6" "2.8" "2.2" ...
## $ dhulgrvol : chr "7.7" "5.1" "5.6" "3.6" ...
## $ culmhab : chr "4" "8" "6" "3" ...
## $ sdsh : chr "yes" "yes" "yes" "yes" ...
## $ awnplu : chr "0" "0" "0" "1" ...
## $ lflpubes : chr "yes" "yes" "yes" "yes" ...
## $ hulcl : chr "2" "2" "4" "2" ...
## $ dhulgrcl : chr "2" "1" "2" "2" ...
## $ lfblrs : chr "6" "3" "0" "1.5" ...
## $ strthead : chr "4.8" "7.8" "8.2" "8.6" ...
## $ hhulgrwt1 : num 3.1 2.3 2.2 2.2 2 2.7 2.6 1.6 4.4 2.4 ...
## $ hulgrvol1 : num 13.3 8.3 9 5.9 7.7 10.5 9.1 6.1 10 7.4 ...
## $ ptht1 : num 111 144 148 120 147 ...
## $ pnlg1 : num 20.5 26.8 30.9 24.6 31.5 20.2 21.1 17.6 22.5 30.6 ...
## $ pnnb1 : num 21.5 57.5 17.4 17.4 13.8 18.6 36.7 39 26.7 30.9 ...
## $ dthd1 : num 75.1 89.5 89.1 105 101.6 ...
## $ filgrnb1 : num 119.8 84.8 254.3 208.1 256.4 ...
## $ flnbppn1 : num 136 113 292 230 329 ...
## - attr(*, "na.action")= 'omit' Named int [1:22] 9 10 38 48 55 72 74 80 91 114 ...
## ..- attr(*, "names")= chr [1:22] "9" "10" "38" "48" ...
# Distribution of subpopulations across all regions
# create an object for total region counts
totals <- rice2 %>%
count(region)
# create a barplot object
barplot <- rice2 %>%
count(subpopulation_group_pca, region) %>%
filter(n !=10) %>%
ggplot(aes(y = region, x = n)) +
geom_bar(stat = "identity", aes(fill = subpopulation_group_pca)) +
theme(axis.text.x = element_text(angle = 90, vjust=0.3, hjust=1)) +
geom_text(data = totals, aes(label = n), cex = 3) +
labs(y ="Regions", title = "Distribution of subpopulations across regions")
# visulaize barplot
barplot
East Asia had the highest number of accessions (76) consisting of four sub-populations, with Temperate Japonica (TEJ) and Indica varieties (IND) occupying a significant proportion. The Admixed (ADMIX) subpopulation can be found in almost all regions except Oceania, Central Asia, Central America and the Caribbean regions with the lowest number of accessions (6)
Trait differences across regions and sub-populations
# trait differences across regions and across sub-populations
hhulgrwt <- ggplot(rice2,
aes(y = hhulgrwt1, fill = region)) +
geom_boxplot() +
labs(y ="100-Seed weight")
ptht <- ggplot(rice2,
aes(y = ptht1, fill = region)) +
geom_boxplot() +
labs(y ="Plant Height")
pnlg <- ggplot(rice2,
aes(y = pnlg1, fill = region)) +
geom_boxplot() +
labs(y ="Panicle length
Inflorescence length")
pnnb <- ggplot(rice2,
aes(y = pnnb1, fill = region)) +
geom_boxplot() +
labs(y ="Panicles per plant
Inflorescence per plant")
dthd <- ggplot(rice2,
aes(y = dthd1, fill = region)) +
geom_boxplot() +
labs(y ="Days to heading")
filgrnb <- ggplot(rice2,
aes(y = filgrnb1, fill = region)) +
geom_boxplot() +
labs(y ="Seeds per panicle
Filled florets per panicle")
flnbppn <- ggplot(rice2,
aes(y = flnbppn1, fill = region)) +
geom_boxplot() +
labs(y ="Total Florets per panicle")
ggarrange(hhulgrwt, ptht, pnlg, pnnb, dthd, filgrnb, flnbppn + rremove("x.text"), common.legend = TRUE, ncol = 2, nrow = 2)
## $`1`
##
## $`2`
##
## attr(,"class")
## [1] "list" "ggarrange"
#------------------------------------------------------------------------------
# trait differences across subpopulations
hhulgrwt <- ggplot(rice2,
aes(y = hhulgrwt1, fill = subpopulation_group_pca)) +
geom_boxplot() +
labs(y ="100-Seed weight")
ptht <- ggplot(rice2,
aes(y = ptht1, fill = subpopulation_group_pca)) +
geom_boxplot() +
labs(y ="Plant Height")
pnlg <- ggplot(rice2,
aes(y = pnlg1, fill = subpopulation_group_pca)) +
geom_boxplot() +
labs(y ="Panicle length
Inflorescence length")
pnnb <- ggplot(rice2,
aes(y = pnnb1, fill = subpopulation_group_pca)) +
geom_boxplot() +
labs(y ="Panicles per plant
Inflorescence per plant")
dthd <- ggplot(rice2,
aes(y = dthd1, fill = subpopulation_group_pca)) +
geom_boxplot() +
labs(y ="Days to heading")
filgrnb <- ggplot(rice2,
aes(y = filgrnb1, fill = subpopulation_group_pca)) +
geom_boxplot() +
labs(y ="Seeds per panicle
Filled florets per panicle")
flnbppn <- ggplot(rice2,
aes(y = flnbppn1, fill = subpopulation_group_pca)) +
geom_boxplot() +
labs(y ="Total Florets per panicle")
# arrange all plots side by side using ggarrange
ggarrange(hhulgrwt, ptht, pnlg, pnnb, dthd, filgrnb, flnbppn, rremove("x.text"), common.legend = TRUE, ncol = 2, nrow = 2)
## $`1`
##
## $`2`
##
## attr(,"class")
## [1] "list" "ggarrange"
Data normalization for regression analysis
# scale and center data using Caret library
preproc1 <- preProcess(rice2[,39:46], method=c("center", "scale"))
# normalize data with mean 0 and standard deviation 1
norm1 <- predict(preproc1, rice2[,39:46]) %>%
mutate(region = rice2$region,
subpopulation =rice2$subpopulation_group_pca) %>%
na.omit() %>%
dplyr::rename(HHGW = hhulgrwt1,
HGV = hulgrvol1,
PTH = ptht1,
PNL = pnlg1,
PNN = pnnb1,
DTH = dthd1,
FGN = filgrnb1,
FLN = flnbppn1)
# summary of normalized data
summary(norm1)
## HHGW HGV PTH PNL
## Min. :-3.13283 Min. :-1.85980 Min. :-2.404944 Min. :-2.46837
## 1st Qu.:-0.63691 1st Qu.:-0.69574 1st Qu.:-0.781939 1st Qu.:-0.61774
## Median :-0.22092 Median :-0.07214 Median :-0.017243 Median :-0.04441
## Mean : 0.02193 Mean : 0.01695 Mean : 0.006418 Mean : 0.01261
## 3rd Qu.: 0.61105 3rd Qu.: 0.64501 3rd Qu.: 0.674510 3rd Qu.: 0.63778
## Max. : 3.93895 Max. : 3.17060 Max. : 3.746671 Max. : 3.36654
## PNN DTH FGN FLN
## Min. :-1.80455 Min. :-2.41050 Min. :-2.293937 Min. :-1.929527
## 1st Qu.:-0.74798 1st Qu.:-0.60661 1st Qu.:-0.626699 1st Qu.:-0.610398
## Median :-0.22914 Median :-0.15163 Median :-0.039020 Median :-0.116219
## Mean :-0.02606 Mean :-0.02416 Mean : 0.003102 Mean : 0.006202
## 3rd Qu.: 0.56907 3rd Qu.: 0.52461 3rd Qu.: 0.526551 3rd Qu.: 0.521558
## Max. : 3.07712 Max. : 5.11907 Max. : 3.608639 Max. : 3.564864
## region subpopulation
## Length:364 Length:364
## Class :character Class :character
## Mode :character Mode :character
##
##
##
# View distributions after data normalization}
norm1 %>%
group_by(subpopulation) %>%
summarise(subpopulation, HHGW) %>%
ggplot(aes(x = HHGW, fill = subpopulation)) +
geom_density(alpha = 0.7) +
facet_wrap(~subpopulation, ncol = 2) +
labs(title = "Distribution of grain weight after normalization")
## `summarise()` regrouping output by 'subpopulation' (override with `.groups` argument)
Principal Component Analysis
# I encountered an error downloading the ggbiplot package from GitHub so I copied the function code from the GitHub repo.
ggbiplot <- function(pcobj, choices = 1:2, scale = 1, pc.biplot = TRUE,
obs.scale = 1 - scale, var.scale = scale,
groups = NULL, ellipse = FALSE, ellipse.prob = 0.68,
labels = NULL, labels.size = 3, alpha = 1,
var.axes = TRUE,
circle = FALSE, circle.prob = 0.69,
varname.size = 3, varname.adjust = 1.5,
varname.abbrev = FALSE, ...)
{
library(ggplot2)
library(plyr)
library(scales)
library(grid)
stopifnot(length(choices) == 2)
# Recover the SVD
if(inherits(pcobj, 'prcomp')){
nobs.factor <- sqrt(nrow(pcobj$x) - 1)
d <- pcobj$sdev
u <- sweep(pcobj$x, 2, 1 / (d * nobs.factor), FUN = '*')
v <- pcobj$rotation
} else if(inherits(pcobj, 'princomp')) {
nobs.factor <- sqrt(pcobj$n.obs)
d <- pcobj$sdev
u <- sweep(pcobj$scores, 2, 1 / (d * nobs.factor), FUN = '*')
v <- pcobj$loadings
} else if(inherits(pcobj, 'PCA')) {
nobs.factor <- sqrt(nrow(pcobj$call$X))
d <- unlist(sqrt(pcobj$eig)[1])
u <- sweep(pcobj$ind$coord, 2, 1 / (d * nobs.factor), FUN = '*')
v <- sweep(pcobj$var$coord,2,sqrt(pcobj$eig[1:ncol(pcobj$var$coord),1]),FUN="/")
} else if(inherits(pcobj, "lda")) {
nobs.factor <- sqrt(pcobj$N)
d <- pcobj$svd
u <- predict(pcobj)$x/nobs.factor
v <- pcobj$scaling
d.total <- sum(d^2)
} else {
stop('Expected a object of class prcomp, princomp, PCA, or lda')
}
# Scores
choices <- pmin(choices, ncol(u))
df.u <- as.data.frame(sweep(u[,choices], 2, d[choices]^obs.scale, FUN='*'))
# Directions
v <- sweep(v, 2, d^var.scale, FUN='*')
df.v <- as.data.frame(v[, choices])
names(df.u) <- c('xvar', 'yvar')
names(df.v) <- names(df.u)
if(pc.biplot) {
df.u <- df.u * nobs.factor
}
# Scale the radius of the correlation circle so that it corresponds to
# a data ellipse for the standardized PC scores
r <- sqrt(qchisq(circle.prob, df = 2)) * prod(colMeans(df.u^2))^(1/4)
# Scale directions
v.scale <- rowSums(v^2)
df.v <- r * df.v / sqrt(max(v.scale))
# Change the labels for the axes
if(obs.scale == 0) {
u.axis.labs <- paste('standardized PC', choices, sep='')
} else {
u.axis.labs <- paste('PC', choices, sep='')
}
# Append the proportion of explained variance to the axis labels
u.axis.labs <- paste(u.axis.labs,
sprintf('(%0.1f%% explained var.)',
100 * pcobj$sdev[choices]^2/sum(pcobj$sdev^2)))
# Score Labels
if(!is.null(labels)) {
df.u$labels <- labels
}
# Grouping variable
if(!is.null(groups)) {
df.u$groups <- groups
}
# Variable Names
if(varname.abbrev) {
df.v$varname <- abbreviate(rownames(v))
} else {
df.v$varname <- rownames(v)
}
# Variables for text label placement
df.v$angle <- with(df.v, (180/pi) * atan(yvar / xvar))
df.v$hjust = with(df.v, (1 - varname.adjust * sign(xvar)) / 2)
# Base plot
g <- ggplot(data = df.u, aes(x = xvar, y = yvar)) +
xlab(u.axis.labs[1]) + ylab(u.axis.labs[2]) + coord_equal()
if(var.axes) {
# Draw circle
if(circle)
{
theta <- c(seq(-pi, pi, length = 50), seq(pi, -pi, length = 50))
circle <- data.frame(xvar = r * cos(theta), yvar = r * sin(theta))
g <- g + geom_path(data = circle, color = muted('white'),
size = 1/2, alpha = 1/3)
}
# Draw directions
g <- g +
geom_segment(data = df.v,
aes(x = 0, y = 0, xend = xvar, yend = yvar),
arrow = arrow(length = unit(1/2, 'picas')),
color = muted('red'))
}
# Draw either labels or points
if(!is.null(df.u$labels)) {
if(!is.null(df.u$groups)) {
g <- g + geom_text(aes(label = labels, color = groups),
size = labels.size)
} else {
g <- g + geom_text(aes(label = labels), size = labels.size)
}
} else {
if(!is.null(df.u$groups)) {
g <- g + geom_point(aes(color = groups), alpha = alpha)
} else {
g <- g + geom_point(alpha = alpha)
}
}
# Overlay a concentration ellipse if there are groups
if(!is.null(df.u$groups) && ellipse) {
theta <- c(seq(-pi, pi, length = 50), seq(pi, -pi, length = 50))
circle <- cbind(cos(theta), sin(theta))
ell <- ddply(df.u, 'groups', function(x) {
if(nrow(x) <= 2) {
return(NULL)
}
sigma <- var(cbind(x$xvar, x$yvar))
mu <- c(mean(x$xvar), mean(x$yvar))
ed <- sqrt(qchisq(ellipse.prob, df = 2))
data.frame(sweep(circle %*% chol(sigma) * ed, 2, mu, FUN = '+'),
groups = x$groups[1])
})
names(ell)[1:2] <- c('xvar', 'yvar')
g <- g + geom_path(data = ell, aes(color = groups, group = groups))
}
# Label the variable axes
if(var.axes) {
g <- g +
geom_text(data = df.v,
aes(label = varname, x = xvar, y = yvar,
angle = angle, hjust = hjust),
color = 'darkred', size = varname.size)
}
# Change the name of the legend for groups
# if(!is.null(groups)) {
# g <- g + scale_color_brewer(name = deparse(substitute(groups)),
# palette = 'Dark2')
# }
# TODO: Add a second set of axes
return(g)
}
# compute PCA
rice.pca <- prcomp(norm1[,1:8], center = TRUE,scale. = TRUE)
# view pca summary
summary(rice.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.7248 1.4214 1.1228 0.80165 0.66286 0.63019 0.43245
## Proportion of Variance 0.3719 0.2525 0.1576 0.08033 0.05492 0.04964 0.02338
## Cumulative Proportion 0.3719 0.6244 0.7820 0.86234 0.91726 0.96690 0.99028
## PC8
## Standard deviation 0.27883
## Proportion of Variance 0.00972
## Cumulative Proportion 1.00000
# view pca structure
str(rice.pca)
## List of 5
## $ sdev : num [1:8] 1.725 1.421 1.123 0.802 0.663 ...
## $ rotation: num [1:8, 1:8] -0.3657 -0.3812 0.3156 0.4178 0.0747 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:8] "HHGW" "HGV" "PTH" "PNL" ...
## .. ..$ : chr [1:8] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:8] 0.02193 0.01695 0.00642 0.01261 -0.02606 ...
## ..- attr(*, "names")= chr [1:8] "HHGW" "HGV" "PTH" "PNL" ...
## $ scale : Named num [1:8] 0.989 0.995 0.984 0.994 0.954 ...
## ..- attr(*, "names")= chr [1:8] "HHGW" "HGV" "PTH" "PNL" ...
## $ x : num [1:364, 1:8] -2.227 0.552 3.9 2.64 4.948 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:364] "1" "2" "3" "4" ...
## .. ..$ : chr [1:8] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
P <- ggbiplot(rice.pca,
obs.scale = 1,
var.scale=1,
ellipse=F,
circle=F,
varname.size=4,
var.axes=T,
groups=norm1$subpopulation, #no need for coloring, I'm making the points invisible
alpha=0) #invisible points, I add them below
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following object is masked from 'package:ggpubr':
##
## mutate
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
P$layers <- c(geom_point(aes(color=norm1$subpopulation), cex=3), P$layers) #add geom_point in a layer underneath (only way I have to change the size of the points in ggbiplot)
P
For the principal component plot, PC1 accounted for 37.2% of the total variation and was associated with FGN, FLN, PNL, DTH and PNN. The first quadrat of PC1 formed a cluster comprising ADMIX, AUS and IND scoring high for FGN, FLN, PNL and DTH, while IND and AUS formed the major cluster in the second quadrat scoring low for PTH and PNN. TEJ and ADMIX were the major clusters found in PC2 which accounted for 25.3%. It was associated with HHGW and HGV, which scored high on the first quadrat, while the second quadrat contained some clusters of TEJ, ADMIX and AUS.
Standardized Major Axis (SMA) regression
Relationship between Test weight and other agronomic traits
# To plot relationship between seed per panicle and test weight across subpopulations
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~FGN*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(subpopulation = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between seed per panicle and test weight across subpopulations
seeds_per_panicle <- ggplot(norm1) +
geom_point(aes(x=FGN, y=HHGW, color=subpopulation)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Seeds per panicle", y = "100 seed weight",
title = "Relationship between test weight and seeds per panicle across subpopulations")
# visualize relationship
seeds_per_panicle
# make a facet-wrap to view relationship across groups
seeds_per_panicle + facet_wrap(~subpopulation, ncol=2)
# Perform sma analysis for seed per panicle across subpopulations
filgrnb_sma <-sma(HHGW~FGN*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
filgrnb_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ADMIX | 56 | 0.0121245 | 0.4191634 | -1.0579973 | -1.3831041 | -0.8093088 | 0.1553618 | -0.2030996 | 0.5138232 | NA | NA | NA | NA |
ARO | 7 | 0.2221491 | 0.2856813 | -3.3189881 | -8.0916075 | -1.3613713 | -3.0181351 | -5.9428513 | -0.0934189 | NA | NA | NA | NA |
AUS | 51 | 0.0722018 | 0.0565743 | -0.6931603 | -0.9108492 | -0.5274981 | -0.5983793 | -0.8477217 | -0.3490369 | NA | NA | NA | NA |
IND | 73 | 0.0003167 | 0.8812093 | 0.8160154 | 0.6454752 | 1.0316138 | -0.5489209 | -0.8172319 | -0.2806098 | NA | NA | NA | NA |
TEJ | 92 | 0.0000000 | 0.9992490 | 0.9523719 | 0.7735907 | 1.1724705 | 0.7747140 | 0.5149058 | 1.0345222 | NA | NA | NA | NA |
TRJ | 85 | 0.2515723 | 0.0000010 | -0.9518213 | -1.1484191 | -0.7888790 | 0.7758076 | 0.5609367 | 0.9906784 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To plot the relationship between Total Florets per panicle and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~FLN*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(subpopulation = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between seed per panicle and test weight across subpopulations
florets_per_panicle <- ggplot(norm1) +
geom_point(aes(x=FLN, y=HHGW, color=subpopulation)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Total Florets per Panicle", y = "100 seed weight",
title = "Relationship between test weight and total florets per panicle across subpopulations")
# visualize relationship
florets_per_panicle
# make a facet-wrap to view relationship across groups
florets_per_panicle + facet_wrap(~subpopulation, ncol=2)
# Perform sma analysis for Total Florets per panicle across subpopulations
flnbppn_sma <-sma(HHGW~FLN*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
flnbppn_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ADMIX | 56 | 0.1613282 | 0.0021530 | -1.0019881 | -1.2831417 | -0.7824390 | 0.4253753 | 0.1263962 | 0.7243544 | NA | NA | NA | NA |
ARO | 7 | 0.3560280 | 0.1572758 | -3.3027580 | -7.5404121 | -1.4466332 | -2.9293795 | -5.4986269 | -0.3601320 | NA | NA | NA | NA |
AUS | 51 | 0.0767219 | 0.0490988 | -0.7737894 | -1.0161392 | -0.5892401 | -0.8708629 | -1.1211628 | -0.6205630 | NA | NA | NA | NA |
IND | 73 | 0.0280656 | 0.1565744 | -0.8576659 | -1.0807839 | -0.6806086 | -0.5036280 | -0.7507534 | -0.2565026 | NA | NA | NA | NA |
TEJ | 92 | 0.0000873 | 0.9295503 | 1.0138130 | 0.8235053 | 1.2480998 | 0.8539348 | 0.5901924 | 1.1176772 | NA | NA | NA | NA |
TRJ | 85 | 0.3162021 | 0.0000000 | -0.9617684 | -1.1509440 | -0.8036867 | 0.8723962 | 0.6641858 | 1.0806066 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To estimate the relationship between panicles per plant and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~FLN*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(subpopulation = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between panicles per plant and test weight across subpopulations
panicles_per_plant <- ggplot(norm1) +
geom_point(aes(x=PNN, y=HHGW, color=subpopulation)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Panicle per plant", y = "100 seed weight",
title = "Relationship between test weight and panicles per plant across subpopulations")
# visualize relationship
panicles_per_plant
# make a facet-wrap to view relationship across groups
panicles_per_plant + facet_wrap(~subpopulation, ncol=2)
# Perform sma analysis for panicles per plant across subpopulations
pnnb_sma <-sma(HHGW~PNN*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
pnnb_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ADMIX | 56 | 0.1381634 | 0.0047924 | -1.3066327 | -1.678839 | -1.0169464 | -0.2327722 | -0.5533354 | 0.0877910 | NA | NA | NA | NA |
ARO | 7 | 0.0148219 | 0.7948473 | 3.1650317 | 1.190632 | 8.4135364 | -0.3755597 | -1.8688436 | 1.1177243 | NA | NA | NA | NA |
AUS | 51 | 0.0877770 | 0.0347723 | -0.8982791 | -1.177736 | -0.6851323 | -0.1998707 | -0.4796015 | 0.0798600 | NA | NA | NA | NA |
IND | 73 | 0.0212668 | 0.2182957 | -1.0111003 | -1.275145 | -0.8017317 | 0.5352962 | 0.1808400 | 0.8897523 | NA | NA | NA | NA |
TEJ | 92 | 0.0681682 | 0.0119433 | -1.4595375 | -1.784107 | -1.1940148 | 0.1097310 | -0.1148818 | 0.3343438 | NA | NA | NA | NA |
TRJ | 85 | 0.0135029 | 0.2895732 | -2.6793730 | -3.322629 | -2.1606505 | -2.0072094 | -2.5961890 | -1.4182299 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To estimate the relationship between plant height and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~PTH*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(subpopulation = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between plant height and test weight across subpopulations
plant_height <- ggplot(norm1) +
geom_point(aes(x=PTH, y=HHGW, color=subpopulation)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Plant height", y = "100 seed weight",
title = "Relationship between test weight and plant height across regions")
# visualize relationship
plant_height
# make a facet-wrap to view relationship across groups
plant_height + facet_wrap(~subpopulation, ncol=2)
# Perform sma analysis for panicles per plant across subpopulations
ptht_sma <-sma(HHGW~PTH*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
ptht_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ADMIX | 56 | 0.0118435 | 0.4246377 | -0.9903552 | -1.2947248 | -0.7575381 | 0.2082924 | -0.1502028 | 0.5667876 | NA | NA | NA | NA |
ARO | 7 | 0.0128631 | 0.8086956 | -1.2907307 | -3.4336862 | -0.4851887 | 1.7019968 | -1.0996128 | 4.5036064 | NA | NA | NA | NA |
AUS | 51 | 0.0263307 | 0.2552656 | -1.4129056 | -1.8687461 | -1.0682576 | 0.3222571 | -0.0715528 | 0.7160670 | NA | NA | NA | NA |
IND | 73 | 0.0482383 | 0.0618992 | -0.7246572 | -0.9110112 | -0.5764232 | -0.3988271 | -0.6400974 | -0.1575568 | NA | NA | NA | NA |
TEJ | 92 | 0.0282577 | 0.1092141 | 1.2331440 | 1.0045823 | 1.5137078 | 1.2298064 | 0.9489479 | 1.5106649 | NA | NA | NA | NA |
TRJ | 85 | 0.0402848 | 0.0654966 | 1.1154818 | 0.9021361 | 1.3792814 | 0.4182288 | 0.1609324 | 0.6755252 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To estimate the relationship between days to heading and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~DTH*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(subpopulation = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between days to heading and test weight across subpopulations
days_to_heading <- ggplot(norm1) +
geom_point(aes(x=DTH, y=HHGW, color=subpopulation)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Days to heading", y = "100 seed weight",
title = "Relationship between test weight and Days to Heading across subpopulations")
# visualize relationship
days_to_heading
# make a facet-wrap to view relationship across groups
days_to_heading + facet_wrap(~subpopulation, ncol=2)
# Perform sma analysis for days to heading across subpopulations
dthd_sma <-sma(HHGW~DTH*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
dthd_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ADMIX | 56 | 0.0570007 | 0.0763853 | -1.1270561 | -1.4645429 | -0.8673391 | 0.1121810 | -0.2200131 | 0.4443752 | NA | NA | NA | NA |
ARO | 7 | 0.4841719 | 0.0825206 | -1.3669631 | -2.9013274 | -0.6440459 | -0.4074222 | -1.2867789 | 0.4719346 | NA | NA | NA | NA |
AUS | 51 | 0.0192425 | 0.3316554 | 1.1706593 | 0.8842261 | 1.5498786 | -0.8271417 | -1.0979854 | -0.5562980 | NA | NA | NA | NA |
IND | 73 | 0.0295793 | 0.1456815 | -0.6419890 | -0.8088564 | -0.5095464 | -0.2516907 | -0.5068615 | 0.0034801 | NA | NA | NA | NA |
TEJ | 92 | 0.0186075 | 0.1947779 | -0.9552551 | -1.1737686 | -0.7774210 | -0.1698103 | -0.4337191 | 0.0940985 | NA | NA | NA | NA |
TRJ | 85 | 0.0007544 | 0.8029571 | -1.5460831 | -1.9198803 | -1.2450635 | 0.7033897 | 0.4122409 | 0.9945385 | NA | NA | NA | NA |
# To relationship between seed per panicle and test weight across regions
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~FGN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(region = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between seed per panicle and test weight across regions
seeds_per_panicle <- ggplot(norm1) +
geom_point(aes(x=FGN, y=HHGW, color=region)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Seeds per panicle", y = "100 seed weight",
title = "Relationship between test weight and seeds per panicle across regions")
# visualize relationship
seeds_per_panicle
# make a facet-wrap to view relationship across groups
seeds_per_panicle + facet_wrap(~region, ncol=2)
# Perform sma analysis for seed per panicle across regions
filgrnb1_sma <-sma(HHGW~FGN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
filgrnb1_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Africa | 37 | 0.1038062 | 0.0518053 | -1.1395415 | -1.5683439 | -0.8279783 | 0.7981303 | 0.4009233 | 1.1953373 | NA | NA | NA | NA |
Caribbean | 6 | 0.0685432 | 0.6162613 | -0.6306579 | -1.8993192 | -0.2094063 | 0.5911244 | -0.1275153 | 1.3097641 | NA | NA | NA | NA |
Central America | 6 | 0.4047575 | 0.1744462 | -0.8800483 | -2.2321099 | -0.3469744 | 0.8680227 | 0.1612131 | 1.5748324 | NA | NA | NA | NA |
Central Asia | 6 | 0.1373502 | 0.4695396 | -1.0977920 | -3.2067302 | -0.3758181 | -0.9904780 | -2.4043457 | 0.4233898 | NA | NA | NA | NA |
East Asia | 74 | 0.0007424 | 0.8177540 | 0.8618698 | 0.6829122 | 1.0877233 | -0.0957317 | -0.3602370 | 0.1687735 | NA | NA | NA | NA |
East Europe | 15 | 0.0045454 | 0.8113138 | -0.6786493 | -1.1963794 | -0.3849656 | 0.3255896 | -0.3758267 | 1.0270060 | NA | NA | NA | NA |
North America | 43 | 0.0158674 | 0.4208852 | -0.7349767 | -1.0000780 | -0.5401487 | 0.2770053 | -0.0168251 | 0.5708357 | NA | NA | NA | NA |
Oceania | 6 | 0.3608716 | 0.2073036 | -0.7256853 | -1.8894745 | -0.2787120 | -0.3652131 | -1.2710989 | 0.5406727 | NA | NA | NA | NA |
South America | 30 | 0.0610612 | 0.1880170 | -1.2935255 | -1.8667460 | -0.8963234 | 0.0973052 | -0.3656056 | 0.5602159 | NA | NA | NA | NA |
South Asia | 63 | 0.0494662 | 0.0797754 | -0.7445339 | -0.9532235 | -0.5815328 | -0.5703187 | -0.8336862 | -0.3069512 | NA | NA | NA | NA |
Southeast Asia | 46 | 0.0105168 | 0.4976533 | -0.8542723 | -1.1506184 | -0.6342513 | -0.1038009 | -0.4983028 | 0.2907009 | NA | NA | NA | NA |
West Asia | 11 | 0.0089617 | 0.7818864 | -0.9335995 | -1.8681939 | -0.4665511 | -0.6187488 | -1.5346333 | 0.2971357 | NA | NA | NA | NA |
West Europe | 21 | 0.0023361 | 0.8351923 | -0.9910938 | -1.5745282 | -0.6238484 | 0.9547820 | 0.4382989 | 1.4712650 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To estimate the relationship between Total Florets per panicle and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~FLN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(region= x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between seed per panicle and test weight across regions
florets_per_panicle <- ggplot(norm1) +
geom_point(aes(x=FLN, y=HHGW, color=region)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Total Florets per Panicle", y = "100 seed weight",
title = "Relationship between test weight and total florets per panicle across regions")
# visualize relationship
florets_per_panicle
# make a facet-wrap to view relationship across groups
florets_per_panicle + facet_wrap(~region, ncol=2)
# Perform sma analysis for Total Florets per panicle across regions
flnbppn1_sma <-sma(HHGW~FLN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
flnbppn1_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Africa | 37 | 0.1824934 | 0.0083630 | -1.2247427 | -1.6623304 | -0.9023444 | 0.9025426 | 0.5346671 | 1.2704181 | NA | NA | NA | NA |
Caribbean | 6 | 0.0058945 | 0.8850630 | -0.6217080 | -1.9221328 | -0.2010896 | 0.4849523 | -0.2637442 | 1.2336489 | NA | NA | NA | NA |
Central America | 6 | 0.5675139 | 0.0837613 | -0.9795673 | -2.2206821 | -0.4320979 | 0.7868129 | 0.2336541 | 1.3399717 | NA | NA | NA | NA |
Central Asia | 6 | 0.2078680 | 0.3634980 | -0.9024660 | -2.5495213 | -0.3194501 | -0.7732753 | -1.9320294 | 0.3854787 | NA | NA | NA | NA |
East Asia | 74 | 0.0000042 | 0.9862292 | -0.8872140 | -1.1198035 | -0.7029346 | -0.3062358 | -0.5754577 | -0.0370139 | NA | NA | NA | NA |
East Europe | 15 | 0.0024563 | 0.8607635 | -0.7441415 | -1.3125404 | -0.4218892 | 0.2996701 | -0.4128345 | 1.0121747 | NA | NA | NA | NA |
North America | 43 | 0.0952134 | 0.0440929 | -0.7525207 | -1.0114193 | -0.5598938 | 0.4403637 | 0.1625446 | 0.7181829 | NA | NA | NA | NA |
Oceania | 6 | 0.0381806 | 0.7106325 | -0.9618998 | -2.9344850 | -0.3153028 | -0.3175809 | -1.5986496 | 0.9634879 | NA | NA | NA | NA |
South America | 30 | 0.2966206 | 0.0018599 | -1.2485895 | -1.7181157 | -0.9073753 | 0.2048406 | -0.1542327 | 0.5639140 | NA | NA | NA | NA |
South Asia | 63 | 0.0774085 | 0.0272509 | -0.7995030 | -1.0199343 | -0.6267120 | -0.7127386 | -0.9639363 | -0.4615409 | NA | NA | NA | NA |
Southeast Asia | 46 | 0.0531385 | 0.1232532 | -0.7999274 | -1.0706494 | -0.5976596 | 0.0055228 | -0.3641482 | 0.3751938 | NA | NA | NA | NA |
West Asia | 11 | 0.0015596 | 0.9082219 | 1.0253296 | 0.5112472 | 2.0563454 | 0.9416803 | -0.0834793 | 1.9668400 | NA | NA | NA | NA |
West Europe | 21 | 0.0047711 | 0.7660808 | -1.0065084 | -1.5981729 | -0.6338858 | 0.8928619 | 0.3737050 | 1.4120189 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To estimate the relationship between panicles per plant and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~PNN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(region = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between panicles per plant and test weight across regions
panicles_per_plant <- ggplot(norm1) +
geom_point(aes(x=PNN, y=HHGW, color=region)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Panicle per plant", y = "100 seed weight",
title = "Relationship between test weight and panicles per plant across regions")
# visualize relationship
panicles_per_plant
# make a facet-wrap to view relationship across regions
panicles_per_plant + facet_wrap(~region, ncol=2)
# Perform sma analysis for panicles per plant across regions
pnnb1_sma <-sma(HHGW~PNN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
pnnb1_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Africa | 37 | 0.0982540 | 0.0588716 | -1.2364862 | -1.7033950 | -0.8975593 | 0.2075433 | -0.2294960 | 0.6445827 | NA | NA | NA | NA |
Caribbean | 6 | 0.0276460 | 0.7528919 | -0.4763870 | -1.4597221 | -0.1554711 | 0.1371494 | -0.6263329 | 0.9006317 | NA | NA | NA | NA |
Central America | 6 | 0.2714903 | 0.2891587 | 3.2463278 | 1.1868026 | 8.8798628 | 3.0949157 | -0.0113749 | 6.2012063 | NA | NA | NA | NA |
Central Asia | 6 | 0.1525534 | 0.4439209 | 1.2164528 | 0.4193714 | 3.5285131 | 0.4598387 | -0.8589333 | 1.7786107 | NA | NA | NA | NA |
East Asia | 74 | 0.1473644 | 0.0007344 | -0.8401930 | -1.0420001 | -0.6774705 | 0.0796029 | -0.1373261 | 0.2965319 | NA | NA | NA | NA |
East Europe | 15 | 0.2781364 | 0.0433516 | -1.6136370 | -2.6321650 | -0.9892330 | 0.0772868 | -0.4913607 | 0.6459344 | NA | NA | NA | NA |
North America | 43 | 0.0003400 | 0.9065756 | 1.2394581 | 0.9087676 | 1.6904832 | 0.7702358 | 0.3952447 | 1.1452269 | NA | NA | NA | NA |
Oceania | 6 | 0.4271472 | 0.1592367 | -0.5701319 | -1.4260224 | -0.2279420 | -0.2029340 | -1.0481804 | 0.6423123 | NA | NA | NA | NA |
South America | 30 | 0.2715049 | 0.0031527 | -1.0937199 | -1.5132481 | -0.7905004 | 0.0680064 | -0.3023325 | 0.4383453 | NA | NA | NA | NA |
South Asia | 63 | 0.0551354 | 0.0639668 | -0.9367392 | -1.1984358 | -0.7321880 | -0.2173011 | -0.5045463 | 0.0699441 | NA | NA | NA | NA |
Southeast Asia | 46 | 0.1993628 | 0.0018681 | -0.8358917 | -1.0934764 | -0.6389849 | -0.2066121 | -0.5153668 | 0.1021427 | NA | NA | NA | NA |
West Asia | 11 | 0.3602667 | 0.0508948 | -0.9851505 | -1.7446157 | -0.5562953 | 0.3023817 | -0.2227482 | 0.8275117 | NA | NA | NA | NA |
West Europe | 21 | 0.1134283 | 0.1354687 | -1.2945534 | -2.0060117 | -0.8354231 | 0.6785127 | 0.1948718 | 1.1621536 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To estimate the relationship between plant height and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~PNN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(region = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between plant height and test weight across regions
plant_height <- ggplot(norm1) +
geom_point(aes(x=PNN, y=HHGW, color=region)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Plant height", y = "100 seed weight",
title = "Relationship between test weight and plant height across regions")
# visualize relationship
plant_height
# make a facet-wrap to view relationship across regions
plant_height + facet_wrap(~region, ncol=2)
# Perform sma analysis for panicles per plant across regions
ptht1_sma <-sma(HHGW~PNN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
ptht1_sma$groupsummary%>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Africa | 37 | 0.0982540 | 0.0588716 | -1.2364862 | -1.7033950 | -0.8975593 | 0.2075433 | -0.2294960 | 0.6445827 | NA | NA | NA | NA |
Caribbean | 6 | 0.0276460 | 0.7528919 | -0.4763870 | -1.4597221 | -0.1554711 | 0.1371494 | -0.6263329 | 0.9006317 | NA | NA | NA | NA |
Central America | 6 | 0.2714903 | 0.2891587 | 3.2463278 | 1.1868026 | 8.8798628 | 3.0949157 | -0.0113749 | 6.2012063 | NA | NA | NA | NA |
Central Asia | 6 | 0.1525534 | 0.4439209 | 1.2164528 | 0.4193714 | 3.5285131 | 0.4598387 | -0.8589333 | 1.7786107 | NA | NA | NA | NA |
East Asia | 74 | 0.1473644 | 0.0007344 | -0.8401930 | -1.0420001 | -0.6774705 | 0.0796029 | -0.1373261 | 0.2965319 | NA | NA | NA | NA |
East Europe | 15 | 0.2781364 | 0.0433516 | -1.6136370 | -2.6321650 | -0.9892330 | 0.0772868 | -0.4913607 | 0.6459344 | NA | NA | NA | NA |
North America | 43 | 0.0003400 | 0.9065756 | 1.2394581 | 0.9087676 | 1.6904832 | 0.7702358 | 0.3952447 | 1.1452269 | NA | NA | NA | NA |
Oceania | 6 | 0.4271472 | 0.1592367 | -0.5701319 | -1.4260224 | -0.2279420 | -0.2029340 | -1.0481804 | 0.6423123 | NA | NA | NA | NA |
South America | 30 | 0.2715049 | 0.0031527 | -1.0937199 | -1.5132481 | -0.7905004 | 0.0680064 | -0.3023325 | 0.4383453 | NA | NA | NA | NA |
South Asia | 63 | 0.0551354 | 0.0639668 | -0.9367392 | -1.1984358 | -0.7321880 | -0.2173011 | -0.5045463 | 0.0699441 | NA | NA | NA | NA |
Southeast Asia | 46 | 0.1993628 | 0.0018681 | -0.8358917 | -1.0934764 | -0.6389849 | -0.2066121 | -0.5153668 | 0.1021427 | NA | NA | NA | NA |
West Asia | 11 | 0.3602667 | 0.0508948 | -0.9851505 | -1.7446157 | -0.5562953 | 0.3023817 | -0.2227482 | 0.8275117 | NA | NA | NA | NA |
West Europe | 21 | 0.1134283 | 0.1354687 | -1.2945534 | -2.0060117 | -0.8354231 | 0.6785127 | 0.1948718 | 1.1621536 | NA | NA | NA | NA |
#------------------------------------------------------------------------------
# To estimate the relationship between days to heading and 100 seed weight
# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
# make object for sma model and subset population
obj <- sma(HHGW~DTH*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
data.frame(region = x,
intercept=obj$coef[[1]][1, 1],
slope=obj$coef[[1]][2, 1])
})) -> fits
# plot relationship between days to heading and test weight across regions
days_to_heading <- ggplot(norm1) +
geom_point(aes(x=DTH, y=HHGW, color=region)) +
geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
labs(x = "Days to heading", y = "100 seed weight",
title = "Relationship between test weight and Days to Heading across regions")
# visualize relationship
days_to_heading
# make a facet-wrap to view relationship across regions
days_to_heading + facet_wrap(~region, ncol=2)
# Perform sma analysis for Days to heading across regions
dthd1_sma <-sma(HHGW~DTH*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")
# view group summary
dthd1_sma$groupsummary %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
group | n | r2 | pval | Slope | Slope_lowCI | Slope_highCI | Int | Int_lowCI | Int_highCI | Slope_test | Slope_test_p | Elev_test | Elev_test_p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Africa | 37 | 0.0146114 | 0.4760598 | 0.8908774 | 0.6376806 | 1.2446081 | 0.5296634 | 0.0713506 | 0.9879763 | NA | NA | NA | NA |
Caribbean | 6 | 0.0131383 | 0.8288195 | 1.0412775 | 0.3377972 | 3.2097924 | 0.6798661 | -0.1550681 | 1.5148004 | NA | NA | NA | NA |
Central America | 6 | 0.1222994 | 0.4968149 | 1.2897887 | 0.4385342 | 3.7934441 | 0.0422902 | -1.0048885 | 1.0894690 | NA | NA | NA | NA |
Central Asia | 6 | 0.2522037 | 0.3100299 | -0.5894852 | -1.6286757 | -0.2133591 | -0.7191027 | -1.7961556 | 0.3579501 | NA | NA | NA | NA |
East Asia | 74 | 0.0144835 | 0.3070841 | -0.8577000 | -1.0807544 | -0.6806813 | -0.2869803 | -0.5394275 | -0.0345331 | NA | NA | NA | NA |
East Europe | 15 | 0.0133345 | 0.6819456 | -0.9361115 | -1.6465107 | -0.5322193 | -0.3903905 | -1.3054272 | 0.5246461 | NA | NA | NA | NA |
North America | 43 | 0.0183218 | 0.3867970 | 1.2024663 | 0.8840452 | 1.6355785 | 0.1531816 | -0.1332323 | 0.4395954 | NA | NA | NA | NA |
Oceania | 6 | 0.0674391 | 0.6192211 | -1.4786715 | -4.4553515 | -0.4907513 | -0.3157077 | -1.5444846 | 0.9130691 | NA | NA | NA | NA |
South America | 30 | 0.1428722 | 0.0394516 | -1.0683908 | -1.5178379 | -0.7520295 | 0.1522019 | -0.2676111 | 0.5720149 | NA | NA | NA | NA |
South Asia | 63 | 0.0662274 | 0.0417346 | -0.8890929 | -1.1358622 | -0.6959348 | -0.4514204 | -0.7147067 | -0.1881341 | NA | NA | NA | NA |
Southeast Asia | 46 | 0.0644515 | 0.0886649 | -1.0929733 | -1.4603881 | -0.8179953 | 0.2561081 | -0.1278416 | 0.6400579 | NA | NA | NA | NA |
West Asia | 11 | 0.0259993 | 0.6357593 | 0.8877299 | 0.4459374 | 1.7672086 | 0.3375603 | -0.4217561 | 1.0968766 | NA | NA | NA | NA |
West Europe | 21 | 0.0052198 | 0.7556367 | -1.0477267 | -1.6634589 | -0.6599089 | 0.3486499 | -0.2971674 | 0.9944671 | NA | NA | NA | NA |
There was a strong relationship between HHGW - FGN for all rice subpopulations (Table 1). Negative associations were observed in ADMIX (slope = -1.06, p < 0.01), ARO (slope = -3.32, p < 0.01), AUS (slope = -0.69, p < 0.01) and TRJ (slope = -0.95, p < 0.01), while IND (slope = 0.82, p < 0.01) and TEJ (slope = 0.95, p < 0.01) showed positive associations. Across regions, only Africa (slope = -1.14, p = 0.052) showed a significant negative association between these traits.
For HHGW - FLN, all subpopulations showed a strong negative association except TEJ (slope = 1.01, p < 0.01) which had a positive association. Africa (slope = -1.22, p < 0.01), North America (slope = -0.75, p = 0.04), South America (slope = -1.25, p < 0.01) and South Asia (slope = -0.80, p = 0.03) were the only regions with significant relationships and they all exhibited a negative association.
ADMIX (slope = -1.31, p < 0.01), AUS (slope = -0.90, p = 0.03) and TEJ (slope = -1.46, p = 0.01) all showed strong negative associations for HHGW – PNN relationship. East Asia (slope = -0.84, p < 0.01), East Europe (slope = -1.61, p = 0.04), South America (slope = -1.09, p < 0.01), Southeast Asia (slope = -0.84, p < 0.01) and West Asia (slope = -1.00, p = 0.05) all showed a strong negative association.
No significant association was observed across all subpopulations for HHGW- PTH relationship. East Asia (slope = -0.84, p < 0.01), East Europe (slope = -1.61, p < 0.04), South America (slope = -1.09, p < 0.01), Southeast Asia (slope = -0.84, p < 0.01) and West Asia (slope = -0.99, p = 0.05) regions all showed strong negative associations.
Significant associations were not found across subpopulations for HHGW – DTH relationship. Nevertheless, significant negative associations were found only in South America (slope = -1.07, p = 0.04) and South Asia (slope = -0.89, p = 0.04).
Structural Equation Modeling (SEM)
#sem for ADMIX
data<-subset(norm1,subpopulation=='ADMIX')
sitemod <- 'HHGW~FGN+PNN+FLN
FGN~~PNN
PNN~~FLN
FGN~DTH+PTH
PNN~DTH+PTH
FLN~DTH+PTH
DTH~~PTH'
# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)
#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 18
##
## Number of observations 56
##
## Model Test User Model:
##
## Test statistic 61.587
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 137.088
## Degrees of freedom 15
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.520
## Tucker-Lewis Index (TLI) -1.399
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -409.267
## Loglikelihood unrestricted model (H1) -378.474
##
## Akaike (AIC) 854.535
## Bayesian (BIC) 890.991
## Sample-size adjusted Bayesian (BIC) 834.418
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.591
## 90 Percent confidence interval - lower 0.467
## 90 Percent confidence interval - upper 0.723
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.324
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## HHGW ~
## FGN 0.403 0.130 3.096 0.002
## PNN -0.472 0.150 -3.143 0.002
## FLN -0.774 0.110 -7.014 0.000
## FGN ~
## DTH 0.233 0.165 1.414 0.157
## PTH -0.028 0.145 -0.196 0.844
## PNN ~
## DTH 0.123 0.148 0.836 0.403
## PTH 0.074 0.130 0.570 0.569
## FLN ~
## DTH 0.501 0.162 3.090 0.002
## PTH -0.075 0.142 -0.527 0.598
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .FGN ~~
## .PNN -0.413 0.112 -3.694 0.000
## .PNN ~~
## .FLN 0.150 0.084 1.791 0.073
## DTH ~~
## PTH 0.463 0.131 3.520 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .HHGW 0.579 0.109 5.292 0.000
## .FGN 0.829 0.157 5.292 0.000
## .PNN 0.665 0.124 5.361 0.000
## .FLN 0.803 0.152 5.292 0.000
## DTH 0.763 0.144 5.292 0.000
## PTH 0.988 0.187 5.292 0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more.
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("ADMIX")
#------------------------------------------------------------------------------
#sem for ARO
data<-subset(norm1,subpopulation=='ARO')
sitemod <- 'HHGW~FGN+PNN+FLN
FGN~~PNN
PNN~~FLN
FGN~DTH+PTH
PNN~DTH+PTH
FLN~DTH+PTH
DTH~~PTH'
# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)
#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 60 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 18
##
## Number of observations 7
##
## Model Test User Model:
##
## Test statistic 9.832
## Degrees of freedom 3
## P-value (Chi-square) 0.020
##
## Model Test Baseline Model:
##
## Test statistic 46.483
## Degrees of freedom 15
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.783
## Tucker-Lewis Index (TLI) -0.085
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11.625
## Loglikelihood unrestricted model (H1) -6.709
##
## Akaike (AIC) 59.249
## Bayesian (BIC) 58.276
## Sample-size adjusted Bayesian (BIC) 5.594
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.570
## 90 Percent confidence interval - lower 0.199
## 90 Percent confidence interval - upper 0.985
## P-value RMSEA <= 0.05 0.022
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.094
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## HHGW ~
## FGN 1.570 0.893 1.758 0.079
## PNN 3.189 0.912 3.498 0.000
## FLN -4.850 1.094 -4.435 0.000
## FGN ~
## DTH 0.329 0.189 1.734 0.083
## PTH -0.165 0.179 -0.923 0.356
## PNN ~
## DTH -0.101 0.206 -0.491 0.623
## PTH 0.345 0.195 1.773 0.076
## FLN ~
## DTH 0.310 0.127 2.452 0.014
## PTH 0.044 0.119 0.369 0.712
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .FGN ~~
## .PNN -0.029 0.021 -1.370 0.171
## .PNN ~~
## .FLN 0.029 0.018 1.595 0.111
## DTH ~~
## PTH 0.400 0.255 1.567 0.117
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .HHGW 0.215 0.115 1.871 0.061
## .FGN 0.059 0.032 1.871 0.061
## .PNN 0.070 0.034 2.068 0.039
## .FLN 0.026 0.014 1.871 0.061
## DTH 0.514 0.275 1.871 0.061
## PTH 0.577 0.308 1.871 0.061
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more.
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("ARO")
#------------------------------------------------------------------------------
#sem for AUS
data<-subset(norm1,subpopulation=='AUS')
sitemod <- 'HHGW~FGN+PNN+FLN
FGN~~PNN
PNN~~FLN
FGN~DTH+PTH
PNN~DTH+PTH
FLN~DTH+PTH
DTH~~PTH'
# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)
#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 31 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 18
##
## Number of observations 51
##
## Model Test User Model:
##
## Test statistic 200.907
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 290.159
## Degrees of freedom 15
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.281
## Tucker-Lewis Index (TLI) -2.596
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -297.642
## Loglikelihood unrestricted model (H1) -197.188
##
## Akaike (AIC) 631.283
## Bayesian (BIC) 666.056
## Sample-size adjusted Bayesian (BIC) 609.543
##
## Root Mean Square Error of Approximation:
##
## RMSEA 1.137
## 90 Percent confidence interval - lower 1.007
## 90 Percent confidence interval - upper 1.273
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.392
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## HHGW ~
## FGN 0.460 0.090 5.102 0.000
## PNN -0.562 0.119 -4.732 0.000
## FLN -0.982 0.094 -10.395 0.000
## FGN ~
## DTH -0.414 0.193 -2.149 0.032
## PTH 1.084 0.233 4.657 0.000
## PNN ~
## DTH -0.103 0.156 -0.659 0.510
## PTH -0.729 0.188 -3.882 0.000
## FLN ~
## DTH -0.343 0.169 -2.031 0.042
## PTH 1.035 0.204 5.081 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .FGN ~~
## .PNN -0.183 0.083 -2.200 0.028
## .PNN ~~
## .FLN -0.054 0.066 -0.811 0.417
## DTH ~~
## PTH 0.009 0.043 0.209 0.834
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .HHGW 0.326 0.065 5.050 0.000
## .FGN 0.703 0.139 5.050 0.000
## .PNN 0.458 0.091 5.056 0.000
## .FLN 0.539 0.107 5.050 0.000
## DTH 0.371 0.073 5.050 0.000
## PTH 0.255 0.050 5.050 0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more.
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("AUS")
#------------------------------------------------------------------------------
#sem for IND
data<-subset(norm1,subpopulation=='IND')
sitemod <- 'HHGW~FGN+PNN+FLN
FGN~~PNN
PNN~~FLN
FGN~DTH+PTH
PNN~DTH+PTH
FLN~DTH+PTH
DTH~~PTH'
# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)
#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 29 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 18
##
## Number of observations 73
##
## Model Test User Model:
##
## Test statistic 129.492
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 184.562
## Degrees of freedom 15
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.254
## Tucker-Lewis Index (TLI) -2.730
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -582.198
## Loglikelihood unrestricted model (H1) -517.452
##
## Akaike (AIC) 1200.397
## Bayesian (BIC) 1241.625
## Sample-size adjusted Bayesian (BIC) 1184.906
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.760
## 90 Percent confidence interval - lower 0.651
## 90 Percent confidence interval - upper 0.875
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.442
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## HHGW ~
## FGN 0.708 0.083 8.538 0.000
## PNN -0.352 0.119 -2.954 0.003
## FLN -0.958 0.106 -9.045 0.000
## FGN ~
## DTH -0.001 0.091 -0.007 0.995
## PTH -0.179 0.103 -1.746 0.081
## PNN ~
## DTH -0.129 0.076 -1.707 0.088
## PTH 0.161 0.086 1.886 0.059
## FLN ~
## DTH 0.152 0.085 1.788 0.074
## PTH -0.166 0.096 -1.730 0.084
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .FGN ~~
## .PNN 0.082 0.078 1.054 0.292
## .PNN ~~
## .FLN -0.395 0.097 -4.060 0.000
## DTH ~~
## PTH 0.174 0.166 1.051 0.293
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .HHGW 0.489 0.081 6.042 0.000
## .FGN 0.943 0.156 6.042 0.000
## .PNN 0.655 0.108 6.061 0.000
## .FLN 0.827 0.137 6.042 0.000
## DTH 1.588 0.263 6.042 0.000
## PTH 1.246 0.206 6.042 0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more.
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("IND")
#------------------------------------------------------------------------------
#sem for TEJ
data<-subset(norm1,subpopulation=='TEJ')
sitemod <- 'HHGW~FGN+PNN+FLN
FGN~~PNN
PNN~~FLN
FGN~DTH+PTH
PNN~DTH+PTH
FLN~DTH+PTH
DTH~~PTH'
# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)
#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 18
##
## Number of observations 92
##
## Model Test User Model:
##
## Test statistic 196.728
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 334.740
## Degrees of freedom 15
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.394
## Tucker-Lewis Index (TLI) -2.029
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -573.818
## Loglikelihood unrestricted model (H1) -475.454
##
## Akaike (AIC) 1183.636
## Bayesian (BIC) 1229.028
## Sample-size adjusted Bayesian (BIC) 1172.210
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.838
## 90 Percent confidence interval - lower 0.741
## 90 Percent confidence interval - upper 0.939
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.169
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## HHGW ~
## FGN -0.016 0.107 -0.150 0.881
## PNN -0.424 0.156 -2.711 0.007
## FLN -0.073 0.126 -0.580 0.562
## FGN ~
## DTH 0.533 0.096 5.531 0.000
## PTH 0.124 0.125 0.996 0.319
## PNN ~
## DTH 0.087 0.079 1.103 0.270
## PTH -0.353 0.102 -3.454 0.001
## FLN ~
## DTH 0.511 0.089 5.777 0.000
## PTH 0.146 0.114 1.275 0.202
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .FGN ~~
## .PNN 0.081 0.039 2.066 0.039
## .PNN ~~
## .FLN -0.194 0.045 -4.336 0.000
## DTH ~~
## PTH 0.286 0.070 4.109 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .HHGW 0.658 0.097 6.782 0.000
## .FGN 0.518 0.076 6.782 0.000
## .PNN 0.350 0.051 6.845 0.000
## .FLN 0.436 0.064 6.782 0.000
## DTH 0.780 0.115 6.782 0.000
## PTH 0.468 0.069 6.782 0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more.
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("TEJ")
#------------------------------------------------------------------------------
#sem for TRJ
data<-subset(norm1,subpopulation=='TRJ')
sitemod <- 'HHGW~FGN+PNN+FLN
FGN~~PNN
PNN~~FLN
FGN~DTH+PTH
PNN~DTH+PTH
FLN~DTH+PTH
DTH~~PTH'
# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)
#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 18
##
## Number of observations 85
##
## Model Test User Model:
##
## Test statistic 130.740
## Degrees of freedom 3
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 226.559
## Degrees of freedom 15
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.396
## Tucker-Lewis Index (TLI) -2.019
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -517.290
## Loglikelihood unrestricted model (H1) -451.920
##
## Akaike (AIC) 1070.580
## Bayesian (BIC) 1114.548
## Sample-size adjusted Bayesian (BIC) 1057.762
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.708
## 90 Percent confidence interval - lower 0.607
## 90 Percent confidence interval - upper 0.814
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.216
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## HHGW ~
## FGN -0.051 0.079 -0.645 0.519
## PNN -0.938 0.234 -4.002 0.000
## FLN -0.611 0.081 -7.525 0.000
## FGN ~
## DTH 0.558 0.189 2.957 0.003
## PTH -0.092 0.136 -0.675 0.500
## PNN ~
## DTH 0.058 0.067 0.856 0.392
## PTH -0.087 0.049 -1.789 0.074
## FLN ~
## DTH 0.349 0.191 1.823 0.068
## PTH 0.030 0.138 0.215 0.829
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .FGN ~~
## .PNN -0.049 0.033 -1.474 0.140
## .PNN ~~
## .FLN -0.071 0.034 -2.069 0.039
## DTH ~~
## PTH 0.233 0.060 3.868 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .HHGW 0.500 0.077 6.519 0.000
## .FGN 0.864 0.133 6.519 0.000
## .PNN 0.110 0.017 6.528 0.000
## .FLN 0.890 0.136 6.519 0.000
## DTH 0.363 0.056 6.519 0.000
## PTH 0.698 0.107 6.519 0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more.
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("TRJ")
Based on the Root Mean Square Error of Approximation (RMSEA > 0.5) and Standardized Root Mean Square Residual (SRMSR < 0.5), the model was a good fit for the data. Test weight was significantly affected in all rice sub-populations.
Direct and Indirect Effects
For all subpopulations, a negative direct effect on HHGW was observed in FLN. In ADMIX subpopulation, FGN, PNN and FLN accounted for 58% (R2 = 0.58) of the variation in HHGW. FLN had the strongest negative effect on HHGW (β = -0.77, p < 0.05), this implies that if FLN was increased by one standard deviation with FGN and PNN held constant, HHGW will decrease by 0.77 standard deviations. PNN negatively affected HHGW (β = -0.47), while FGN positively affected HHGW (β = 0.40) at p < 0.05. There was a significant indirect effect (p < 0.05) of PTH on HHGW via DTH and FLN. The double-headed arrow between DTH and PTH indicates a significant relationship (p < 0.05) but assumes no causality.
In ARO subpopulation, FLN, PNN and FGN accounted for 22 % (R2 = 0.22) of the variation in HHGW. The strongest negative direct effect was observed in FLN (β = -0.85, p < 0.05), while PNN and FGN had positive effects on HHGW. All indirect effects of PTH and DTH via FGN, PNN and FLN were all insignificant (p > 0.05). In AUS subpopulation, all agronomic traits accounted for 33% (R2 = 0.33) of the variation in HHGW. The strongest direct effect on HHGW was observed in FLN (β = -0.98, p < 0.05). PNN had a direct negative effect (β = -0.56, p < 0.05) while FGN had a direct positive effect (β = 0.46, p < 0.05). PTH and DTH had significant indirect effects on HHGW. PTH had effect on HHGW via FLN, PNN and FGN while DTH effect was via FGN and FLN.
For IND, the only significant direct effects on HHGW were observed in FLN (β = -0.96, p < 0.05) and FGN (β = 0.71, p < 0.05) which had negative and positive respectively. These traits accounted for 49% (R2 = 0.49) of the variation in HHGW. All indirect effects were insignificant (p > 0.05). For TEJ, only PNN accounted for all the variation in HHGW (R2 = 0.66) and was the only trait with a significant direct effect on HHGW (β = -0.42, p < 0.05). A significant indirect effect on HHGW was observed in PTH via DTH, FLN and PNN and also a negative effect via PNN (p < 0.05). DTH also had a significant indirect effect via FLN and PNN (p < 0.05).
For TRJ, FLN and PNN accounted for the total variation in HHGW (R2 = 0.50). Both traits had a negative direct effect on HHGW, (β = -0.61, p < 0.05) and (β = -0.94, p < 0.05) for FLN and PNN respectively. A significant indirect effect was observed in PTHT via DTH and FLN (p < 0.05), while other indirect effects were insignificant (p > 0.05).
Path Summary
# Path summary
# I created a result summary table on excel
sem <- read.csv("./raw_data/SEM_result.csv")
sem %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px") # Using KableExtra table formatting
Effect | ADMIX | ARO | AUS | IND | TEJ | TRJ | Proposed.interpretation |
---|---|---|---|---|---|---|---|
FLN_HHG | -0.77 | -4.88 | 0.98 | -0.98 | NS | -0.61 | HHG variation is strongly influenced by FLN; Decreasing FLN increases grain weight except in TEJ |
PNN_HHG | -0.44 | 0.18 | -0.55 | NS | -0.42 | -0.9 | HHG variation is strongly influenced by PNN except in IND |
FGN_HHG | -0.42 | 1.57 | 0.46 | 0.71 | NS | NS | Increase in filled grain per panicle increases test weight except in TEJ and TRJ |
DTH_PTH | -0.46 | NS | NS | NS | 0.21 | 0.23 | There is a relationship between DTH and PTH in ADMIX, TEJ and TRJ, although causality is not asumed |
DTH_FGN | NS | NS | -0.41 | NS | 0.53 | 0.56 | DTH either increases or decreses FGN depending on the rice subpopulation |
DTH_PNN | NS | NS | NS | NS | NS | NS | Panicle number is not dependent on days to heading |
DTH_FLN | 0.5 | NS | NS | NS | 0.53 | 0.35 | DTH regulates FLN |
PTH_FGN | NS | NS | NS | NS | NS | NS | PTH controls FGN in AUS |
PTH_FLN | NS | NS | NS | NS | 0.5 | NS | PTH controls FLN in TEJ |
PTH_PNN | NS | NS | NS | NS | -0.35 | NS | PTH controls PNN in TEJ |
Predicting test weight with Florescence per panicle
# Does florescence per panicle predict test weight?
# Explore data
ggplot(norm1,
aes(x = HHGW, y = FLN)) +
geom_point() +
stat_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
# Fit a model
rice_mlr <- glm(FLN ~ HHGW + region,
data = norm1,
family = gaussian(link = "identity"))
# assumptions
plot(rice_mlr, which = 1)
plot(rice_mlr, which = 2)
shapiro.test(residuals(rice_mlr))
##
## Shapiro-Wilk normality test
##
## data: residuals(rice_mlr)
## W = 0.96349, p-value = 6.921e-08
plot(rice_mlr, which = 4)
# view residuals
residualPlots(rice_mlr)
## Test stat Pr(>|Test stat|)
## HHGW 0.0214 0.8838
# include residuals into data
norm1 <- norm1 %>%
add_residuals(rice_mlr)
# plot residuals
qplot(subpopulation, resid, data = norm1, geom = "boxplot")
# test of interaction between mass and species
rice_int <- glm(FLN ~ HHGW * region * subpopulation,
data = norm1,
family = gaussian(link = "identity"))
residualPlots(rice_int)
## Test stat Pr(>|Test stat|)
## HHGW 0.0028 0.958
Anova(rice_int)
## Analysis of Deviance Table (Type II tests)
##
## Response: FLN
## LR Chisq Df Pr(>Chisq)
## HHGW 22.643 1 1.950e-06 ***
## region 10.697 12 0.55507
## subpopulation 41.480 5 7.504e-08 ***
## HHGW:region 8.996 12 0.70330
## HHGW:subpopulation 6.178 5 0.28931
## region:subpopulation 45.525 30 0.03447 *
## HHGW:region:subpopulation 16.529 19 0.62177
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(rice_mlr)
## Analysis of Deviance Table (Type II tests)
##
## Response: FLN
## LR Chisq Df Pr(>Chisq)
## HHGW 19.288 1 1.124e-05 ***
## region 40.182 12 6.708e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# post-hocs what is the effect of subpopulation and region at the average level of HHGW
rice_em <- emmeans(rice_mlr, specs = ~ region,
at = list(HHGW = 4)) # for specificity
# check which slopes are statistically different
rice_em %>%
contrast(method = "tukey") %>%
data.frame() %>%
kbl(format = 'html') %>%
kable_paper() %>%
scroll_box(width = "700px",
height = "350px")
contrast | estimate | SE | df | z.ratio | p.value |
---|---|---|---|---|---|
Africa - Caribbean | 0.0337395 | 0.4092806 | Inf | 0.0824361 | 1.0000000 |
Africa - Central America | -0.0732953 | 0.4088549 | Inf | -0.1792698 | 1.0000000 |
Africa - Central Asia | 0.9839374 | 0.4125749 | Inf | 2.3848698 | 0.4546777 |
Africa - East Asia | 0.5121375 | 0.1945159 | Inf | 2.6328831 | 0.2902848 |
Africa - East Europe | 0.6681605 | 0.2842514 | Inf | 2.3505973 | 0.4795893 |
Africa - North America | -0.1902123 | 0.2119240 | Inf | -0.8975497 | 0.9995857 |
Africa - Oceania | 0.4180241 | 0.4131480 | Inf | 1.0118022 | 0.9986235 |
Africa - South America | 0.2539267 | 0.2304817 | Inf | 1.1017221 | 0.9968816 |
Africa - South Asia | 0.4909530 | 0.2105321 | Inf | 2.3319626 | 0.4932626 |
Africa - Southeast Asia | 0.0977997 | 0.2125899 | Inf | 0.4600395 | 0.9999997 |
Africa - West Asia | 1.1566380 | 0.3214016 | Inf | 3.5987313 | 0.0193922 |
Africa - West Europe | 0.3249966 | 0.2551319 | Inf | 1.2738379 | 0.9884219 |
Caribbean - Central America | -0.1070348 | 0.5362086 | Inf | -0.1996141 | 1.0000000 |
Caribbean - Central Asia | 0.9501979 | 0.5372101 | Inf | 1.7687641 | 0.8653087 |
Caribbean - East Asia | 0.4783980 | 0.3954042 | Inf | 1.2098963 | 0.9926346 |
Caribbean - East Europe | 0.6344211 | 0.4489458 | Inf | 1.4131352 | 0.9725897 |
Caribbean - North America | -0.2239518 | 0.4050367 | Inf | -0.5529173 | 0.9999978 |
Caribbean - Oceania | 0.3842846 | 0.5374829 | Inf | 0.7149708 | 0.9999625 |
Caribbean - South America | 0.2201873 | 0.4154044 | Inf | 0.5300552 | 0.9999987 |
Caribbean - South Asia | 0.4572135 | 0.4017533 | Inf | 1.1380455 | 0.9957791 |
Caribbean - Southeast Asia | 0.0640602 | 0.4044683 | Inf | 0.1583814 | 1.0000000 |
Caribbean - West Asia | 1.1228985 | 0.4716060 | Inf | 2.3810096 | 0.4574661 |
Caribbean - West Europe | 0.2912571 | 0.4326647 | Inf | 0.6731706 | 0.9999806 |
Central America - Central Asia | 1.0572327 | 0.5379487 | Inf | 1.9653040 | 0.7557753 |
Central America - East Asia | 0.5854329 | 0.3963321 | Inf | 1.4771271 | 0.9612775 |
Central America - East Europe | 0.7414559 | 0.4486327 | Inf | 1.6527013 | 0.9130482 |
Central America - North America | -0.1169170 | 0.4055887 | Inf | -0.2882649 | 1.0000000 |
Central America - Oceania | 0.4913194 | 0.5382968 | Inf | 0.9127296 | 0.9995083 |
Central America - South America | 0.3272221 | 0.4157816 | Inf | 0.7870046 | 0.9998949 |
Central America - South Asia | 0.5642483 | 0.4034667 | Inf | 1.3985001 | 0.9747789 |
Central America - Southeast Asia | 0.1710950 | 0.4054356 | Inf | 0.4220030 | 0.9999999 |
Central America - West Asia | 1.2299333 | 0.4720944 | Inf | 2.6052697 | 0.3068046 |
Central America - West Europe | 0.3982919 | 0.4316315 | Inf | 0.9227592 | 0.9994506 |
Central Asia - East Asia | -0.4717999 | 0.3941696 | Inf | -1.1969464 | 0.9933115 |
Central Asia - East Europe | -0.3157769 | 0.4516975 | Inf | -0.6990892 | 0.9999707 |
Central Asia - North America | -1.1741497 | 0.4050387 | Inf | -2.8988582 | 0.1593394 |
Central Asia - Oceania | -0.5659133 | 0.5361288 | Inf | -1.0555548 | 0.9979246 |
Central Asia - South America | -0.7300107 | 0.4159540 | Inf | -1.7550275 | 0.8716449 |
Central Asia - South Asia | -0.4929844 | 0.3977910 | Inf | -1.2393052 | 0.9908876 |
Central Asia - Southeast Asia | -0.8861377 | 0.4030556 | Inf | -2.1985494 | 0.5922718 |
Central Asia - West Asia | 0.1727006 | 0.4715593 | Inf | 0.3662330 | 1.0000000 |
Central Asia - West Europe | -0.6589408 | 0.4379017 | Inf | -1.5047686 | 0.9554425 |
East Asia - East Europe | 0.1560230 | 0.2677101 | Inf | 0.5828058 | 0.9999961 |
East Asia - North America | -0.7023499 | 0.1786253 | Inf | -3.9319726 | 0.0056072 |
East Asia - Oceania | -0.0941134 | 0.3942204 | Inf | -0.2387331 | 1.0000000 |
East Asia - South America | -0.2582108 | 0.2020626 | Inf | -1.2778753 | 0.9881016 |
East Asia - South Asia | -0.0211846 | 0.1623590 | Inf | -0.1304798 | 1.0000000 |
East Asia - Southeast Asia | -0.4143378 | 0.1743633 | Inf | -2.3762901 | 0.4608816 |
East Asia - West Asia | 0.6445004 | 0.3003711 | Inf | 2.1456808 | 0.6311814 |
East Asia - West Europe | -0.1871409 | 0.2433461 | Inf | -0.7690318 | 0.9999179 |
East Europe - North America | -0.8583729 | 0.2807830 | Inf | -3.0570690 | 0.1055434 |
East Europe - Oceania | -0.2501364 | 0.4521913 | Inf | -0.5531651 | 0.9999978 |
East Europe - South America | -0.4142338 | 0.2951159 | Inf | -1.4036310 | 0.9740272 |
East Europe - South Asia | -0.1772076 | 0.2791821 | Inf | -0.6347383 | 0.9999899 |
East Europe - Southeast Asia | -0.5703608 | 0.2810882 | Inf | -2.0291166 | 0.7136716 |
East Europe - West Asia | 0.4884774 | 0.3704347 | Inf | 1.3186602 | 0.9844510 |
East Europe - West Europe | -0.3431639 | 0.3153776 | Inf | -1.0881050 | 0.9972271 |
North America - Oceania | 0.6082364 | 0.4052299 | Inf | 1.5009663 | 0.9562811 |
North America - South America | 0.4441391 | 0.2209937 | Inf | 2.0097362 | 0.7267339 |
North America - South Asia | 0.6811653 | 0.1894020 | Inf | 3.5963995 | 0.0195514 |
North America - Southeast Asia | 0.2880121 | 0.1976721 | Inf | 1.4570191 | 0.9651515 |
North America - West Asia | 1.3468503 | 0.3137507 | Inf | 4.2927399 | 0.0012543 |
North America - West Europe | 0.5152090 | 0.2559775 | Inf | 2.0127116 | 0.7247432 |
Oceania - South America | -0.1640974 | 0.4162045 | Inf | -0.3942709 | 1.0000000 |
Oceania - South Asia | 0.0729289 | 0.3975169 | Inf | 0.1834611 | 1.0000000 |
Oceania - Southeast Asia | -0.3202244 | 0.4030811 | Inf | -0.7944416 | 0.9998839 |
Oceania - West Asia | 0.7386139 | 0.4717179 | Inf | 1.5657958 | 0.9403280 |
Oceania - West Europe | -0.0930275 | 0.4386902 | Inf | -0.2120573 | 1.0000000 |
South America - South Asia | 0.2370262 | 0.2126491 | Inf | 1.1146353 | 0.9965213 |
South America - Southeast Asia | -0.1561270 | 0.2191544 | Inf | -0.7124065 | 0.9999640 |
South America - West Asia | 0.9027112 | 0.3273863 | Inf | 2.7573277 | 0.2225328 |
South America - West Europe | 0.0710699 | 0.2708806 | Inf | 0.2623661 | 1.0000000 |
South Asia - Southeast Asia | -0.3931532 | 0.1824945 | Inf | -2.1543297 | 0.6248588 |
South Asia - West Asia | 0.6656850 | 0.3068328 | Inf | 2.1695369 | 0.6136967 |
South Asia - West Europe | -0.1659563 | 0.2596732 | Inf | -0.6390970 | 0.9999891 |
Southeast Asia - West Asia | 1.0588382 | 0.3120686 | Inf | 3.3929662 | 0.0387769 |
Southeast Asia - West Europe | 0.2271969 | 0.2582749 | Inf | 0.8796709 | 0.9996630 |
West Asia - West Europe | -0.8316413 | 0.3520620 | Inf | -2.3622016 | 0.4711173 |
# plot comparison
multcomp::cld(rice_em, adjust="tukey") %>%
ggplot(aes(x = region, y = emmean,
ymin = asymp.LCL, ymax = asymp.UCL,
color = factor(.group))) +
geom_pointrange() +
coord_flip() +
labs(title = "Slope comparison across regions")
## Note: adjust = "tukey" was changed to "sidak"
## because "tukey" is only appropriate for one set of pairwise comparisons
# visualize!
rice_newdat <- modelr::data_grid(norm1,
region = unique(region),
HHGW = seq_range(HHGW, n = 100))
# create data predictions
rice_predict <- predict(rice_mlr,
newdata = rice_newdat,
type = "response",
interval = "confidence")
# add data predictions
rice_newdat <- rice_newdat %>%
mutate(FLN = rice_predict)
# plot regression on predicted data
ggplot(norm1,
aes(x = HHGW, y = FLN, color = region)) +
geom_point()+
geom_line(data = rice_newdat) +
labs(title = "Regression on predicted data")
# compare means of new fit
rice_newfit <- emmeans(rice_mlr, specs = ~ region + HHGW,
at = list(HHGW = seq(1,3, length.out = 100))) %>%
as_tibble() %>%
mutate(FLN = emmean)
# plot regresion slopes with prediction intervals
ggplot(norm1,
aes(x = HHGW, y = FLN, color = region)) +
geom_point() +
geom_line(data = rice_newfit) +
geom_ribbon(data = rice_newfit,
aes(ymin = asymp.LCL, ymax = asymp.UCL, group = region, colour = region),
alpha = 0.1, color = "lightgrey") +
theme_classic() +
labs(title = "Regression on predicted data showing prediction intervals")
The linear regression passed all necessary assumptions. All residuals were clustered close to the regression line and the cook distance was 0.04 indicating normality of data. Comparison of slopes with ‘emmeans’ showed that slopes were not statistically different. Regression with the predicted data revealed slope alignment within the predicted values while all regression lines fell within the prediction interval.
Total florets per panicle has the most significant association with test weight which agrees with the result of (Ren et al., 2020), where altering meristem determinacy and restoring lateral floret formation significantly increased grains per panicle and subsequently yield. Plant height is a significant morphological trait influencing yield performance. Molecular cloning and functional analyses of several genes associated with PTH in rice have revealed connections to the synthesis and regulation of the phytohormone (gibberellin) (Miedaner et al., 2019). TEJ was observed to have the highest test weight which shows a significant relationship with plant height based on the SEM analysis. Indirect effect traits like DTH affects test weight via FLN and should be fully considered for breeding purposes in TEJ and TRJ subpopulations.
This research was able to assess the relationship between test weight and other agronomic traits and identify significant associations. The major determinant of test weight is the total floret per panicle except in Temperate japonica, where test weight is determined by plant height. Also the prediction of test weight across regions using floret number gave a positive result. Results obtained from this study will improve crop management techniques in rice breeders.
Deivasigamani S, & Swaminathan C. (2018). Evaluation of Seed Test Weight on Major Field Crops. Www.Arcjournals.Org International Journal of Research Studies in Agricultural Sciences, 4(1), 2454–6224. crossref
Eizenga, G. C., Ali, M. L., Bryant, R. J., Yeater, K. M., McClung, A. M., & McCouch, S. R. (2014). Registration of the Rice Diversity Panel 1 for Genomewide Association Studies. Journal of Plant Registrations, 8(1), 109–116. crossref
Li, R., Li, M., Ashraf, U., Liu, S., & Zhang, J. (2019). Exploring the relationships between yield and yield-related traits for rice varieties released in china from 1978 to 2017. Frontiers in Plant Science. crossref
Miedaner, T., Laidig, F., Zhao, J., Zhang, J., Li, R., Li, M., Ashraf, U., & Liu, S. (2019). Exploring the Relationships Between Yield and Yield-Related Traits for Rice Varieties Released in China From 1978 to 2017. crossref
Prakash Poudel, A. (2018). Analysis of Growth and Yield Attributing Characteristic of Direct Seeded Upland Rice Genotypes in Western Hills of Nepal. In Acta Scientific Agriculture (Vol. 2).
Puji Lestari, A., Sopandie, D., & Aswidinnoor, H. (2015). Panicle Length and Weight Performance of F3 Population from Local and Introduction Hybridization of Rice Varieties. HAYATI Journal of Biosciences April, 22(2), 87–92. crossref
Regression, M., & Models, E. (2005). c ontRibutions Commentary. Commentary, 86(October), 283–295. [crossref](http://www.esajournals.org/doi/abs/10.1890/0012-9623(2007)88[72:AHOTES]2.0.CO;2
Ren, D., Li, Y., He, G., & Qian, Q. (2020). Multifloret spikelet improves rice yield. New Phytologist, 225(6), 2301–2306.[crossref] (https://doi.org/10.1111/nph.16303)
Saleh, M. M., Salem, K. F. M., & Elabd, A. B. (2020). Definition of selection criterion using correlation and path coefficient analysis in rice (Oryza sativa L .) genotypes.
Venkateswarlu, B., Prasad, G. S. V, & Prasad, A. S. R. (n.d.). STUDIES ON THE NATURE OF RELATIONSHIPS BETWEEN GRAIN SIZE , SPIKELET NUMBER , GRAIN YIELD AND SPIKELET FILLING IN LATE DURATION VARIETIES OF RICE ( OR YZA SA TIVA L .) Grain number Grain size Productivity Rate of grain growth Rice Spikelet filling. 130(1981), 123–130.