Introduction

Over half of the world’s population depend on rice for survival as it serves as a major staple food for an estimated 66% of people in almost all regions. Therefore, increased rice production will play a crucial role in improving food security, eradicating poverty and supplementing energy demands as it currently provides over 20% of energy. Predictions have shown that when compared with 1995, 60% more rice will have to be produced by 2030 (Saleh et al., 2020).

Influenced by quantitative trait loci (QTL) genes and environmental factors, yield is one of the most significant and complex trait in rice. In rice species, yield is determined by certain factors, classified as either direct or indirect. Direct determinants of yield include 1000 or 100 – grain weight depending on the size of the grain, filled grains per panicle and panicle number per unit area and/or per plant, while indirect determinants include traits like plant height, growth period, tillering ability, panicle length, seed length, seed setting rate, and grains per panicle (Li et al., 2019).

One major predictor of milling yield is test weight, a component used for the comparison of grain densities and it is derived by weighing a known grain volume. It is dependent on embryo size and reserved nutrient quantity which is used for germination and growth. Lower test weight scores occur due to unfavorable field environmental conditions or poor grain filling prior to harvest. Depending on grain size, the 100 or 1000 grain weight is an important measure of seed quality which has a positive effect on seedling growth, seed potential, sprouting and overall plant performance (Deivasigamani S & Swaminathan C, 2018).

Over the years, studies have focused on overall rice yield prediction using agronomic traits (Miedaner et al., 2019; Prakash Poudel, 2018; Puji Lestari et al., 2015; Venkateswarlu et al., n.d.), where test weight has been the leading determinant of grain yield. However, the realization of yield potential which is measured by multiplying the number of spikelets per unit area by average grain weight is highly dependent on some latent factors (Venkateswarlu et al., n.d.). Considering the strong association between test weight and increased yield potential, it will be relatively important to explore factors that might influence or control test weight in rice cultivars. Therefore, this study seeks to;

  1. Explore trait variation across subpopulations and regions.

  2. Examine the extent of association between test weight (100 seed weight) and yield attributing traits across rice subpopulations and geographical regions.

  3. Identify direct and latent factors influencing test weight across subpopulations.

Materials and Methods

Rice Diversity Panel 1 Dataset

The RDP1 (Reg. No. MP-6, NSL 500357 MAP) dataset is used in this study here. It broadly represents the wide range of phenotypic and genotypic variation within O. sativa. It consists of 421 purified homozygous rice (Oryza sativa L.) accessions (GSOR 301001 through GSOR 301421; GSOR 312001 through 312020) which include both landraces and elite rice cultivars. Accessions are classified into five subpopulation groups namely; indica (95 accessions) and aus (60), which belong to the Indica varietal group, and tropical japonica (106), temperate japonica (111), and aromatic (Group V) (16) which comprise the Japonica varietal group. Thirty-three accessions shared <60% ancestry with a single group and were classified as admixtures (Eizenga et al., 2014). Data can be downloaded here (usda.gov). The data selected for the analysis included the following trait data; 100 seed weight (HHGW), panicle number per unit area (PNN), filled grain number per panicle (FGN), plant height (PTH), panicle length (PNL), total florets per panicle (FLN).

Statistical Analysis

All statistical analysis were carried out in R version 4.0.2. Data was loaded into R studio and all missing values and unknown groups were filtered out. For descriptive statistics, boxplots were plotted using the ‘ggplot2’ package to show the distribution of traits across all regions and subpopulations. Before inferential statistical analysis were performed, data was preprocessed and normalized using the caret package after which density plots were drawn to visualize normality. PCA analysis was performed using the ‘ggbiplot’ package. Standardized major axis (SMA) slopes were used for the summary of bivariate trait relationships and to test whether the regression of test weight and other agronomic traits differed among subpopulations and regions. SMA regression was carried out using the ‘smart’ package. Multivariate causal network for all selected traits was quantified with a Structural Equation Model (SEM) using the ‘lavaan’ package. SEM combines multiple regression analysis and factor analysis for the evaluation of structural relationships, estimating interrelated dependence in a single analysis (Li et al., 2019; Regression & Models, 2005). A general linear regression was performed, where the most correlated variable, FLN was used as predictor for test weight across subpopulation and region. All relevant assumptions were tested and post hoc analyses were carried out.

Analysis and Results

# load all required libraries
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(smatr)
library(Cairo)
library(extrafont)
## Registering fonts with R
library(clusterSim)
## Loading required package: cluster
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(grid)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(ggpubr)
library(lavaan)
## This is lavaan 0.6-7
## lavaan is BETA software! Please report any bugs.
library(semPlot)
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car
## Registered S3 methods overwritten by 'BDgraph':
##   method    from
##   plot.sim  lava
##   print.sim lava
## Registered S3 methods overwritten by 'huge':
##   method    from   
##   plot.roc  pROC   
##   plot.sim  BDgraph
##   print.roc pROC   
##   print.sim BDgraph
library(devtools)
## Loading required package: usethis
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:caret':
## 
##     cluster
## Loading required package: TH.data
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
library(emmeans)
## 
## Attaching package: 'emmeans'
## The following object is masked from 'package:devtools':
## 
##     test
library(modelr)
#library(ggbiplot)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Distribution of subpopulations across regions

# load data and clean names
rice <- read.csv("./raw_data/rice_phenotype_data.csv") %>%
  janitor::clean_names() %>%
  filter(!subpopulation_group_pca %in% "N/A") %>%
  filter(!region %in% "Unknown") %>%
  na.omit()
  
# view data
head(rice)
##   nsftv_id gsor_id irgc_id          name orignal_providing_country
## 1        1  301001  126380      Agostano                     Italy
## 2        3  301003  117636 Ai-Chiao-Hong                     China
## 3        6  301006  117603      ARC 7229                     India
## 4        7  301007  126381         Arias                 Indonesia
## 5        8  301008  117643   Asse Y Pung               Philippines
## 6        9  301009  117647         Baber                     India
##           region subpopulation_group_structure subpopulation_group_pca
## 1    West Europe                           TEJ                     TEJ
## 2      East Asia                           IND                     IND
## 3     South Asia                           AUS                     AUS
## 4 Southeast Asia                           TRJ                     TRJ
## 5 Southeast Asia                           TRJ                     TRJ
## 6     South Asia                       ADMIX_J                     TEJ
##   subpopulation_group_fast_structure amycn alkdig dhulprotcn  dthd  ptht flflg
## 1                 temperate-japonica  15.6    6.1        8.4  75.1 110.9  28.4
## 2                             indica  23.1    5.8        9.2  89.5 143.5    39
## 3                                aus  23.2    5.5        8.5  89.1 148.3  36.9
## 4                  tropical-japonica  20.1    6.3       11.8   105 119.6    37
## 5                  tropical-japonica  22.2    6.6        8.7 101.6 147.4    42
## 6                 temperate-japonica  22.1    5.4        9.9    73 128.9  24.4
##   flfwd pnnb pnlg pbrnb flnbppn unfilgrnb filgrnb hulgrlg hulgrwd hulgrlgwdro
## 1   1.3 21.5 20.5   9.3   136.3      16.5   119.8     8.1     3.7         2.2
## 2     1 57.5 26.8   9.2     113      28.3    84.8     7.7       3         2.6
## 3   1.8 17.4 30.9  11.2   291.9      37.6   254.3     7.1     3.3         2.2
## 4   1.6 17.4 24.6  13.3   229.5      21.4   208.1     7.3     2.5         2.9
## 5   1.9 13.8 31.5  12.2   328.6      72.1   256.4     8.6     2.7         3.2
## 6   1.2 18.6 20.2   7.9   114.4      12.9   101.5     7.7     3.4         2.3
##   hulgrvol hhulgrwt dhulgrlg dhulgrwd dhulgrvol culmhab sdsh awnplu lflpubes
## 1     13.3      3.1      5.8      3.1       7.7       4  yes      0      yes
## 2      8.3      2.3      5.6      2.6       5.1       8  yes      0      yes
## 3        9      2.2      5.1      2.8       5.6       6  yes      0      yes
## 4      5.9      2.2      5.5      2.2       3.6       3  yes      1      yes
## 5      7.7        2      6.5      2.2       4.5       3  yes      0      yes
## 6     10.5      2.7      5.5      2.9       6.4       7   no      5      yes
##   hulcl dhulgrcl lfblrs strthead
## 1     2        2      6      4.8
## 2     2        1      3      7.8
## 3     4        2      0      8.2
## 4     2        2    1.5      8.6
## 5     3        2    1.5        7
## 6     6        2    6.5      7.2
# check data structure
str(rice)
## 'data.frame':    383 obs. of  38 variables:
##  $ nsftv_id                          : chr  "1" "3" "6" "7" ...
##  $ gsor_id                           : chr  "301001" "301003" "301006" "301007" ...
##  $ irgc_id                           : chr  "126380" "117636" "117603" "126381" ...
##  $ name                              : chr  "Agostano" "Ai-Chiao-Hong" "ARC 7229" "Arias" ...
##  $ orignal_providing_country         : chr  "Italy" "China" "India" "Indonesia" ...
##  $ region                            : chr  "West Europe" "East Asia" "South Asia" "Southeast Asia" ...
##  $ subpopulation_group_structure     : chr  "TEJ" "IND" "AUS" "TRJ" ...
##  $ subpopulation_group_pca           : chr  "TEJ" "IND" "AUS" "TRJ" ...
##  $ subpopulation_group_fast_structure: chr  "temperate-japonica" "indica" "aus" "tropical-japonica" ...
##  $ amycn                             : chr  "15.6" "23.1" "23.2" "20.1" ...
##  $ alkdig                            : chr  "6.1" "5.8" "5.5" "6.3" ...
##  $ dhulprotcn                        : chr  "8.4" "9.2" "8.5" "11.8" ...
##  $ dthd                              : chr  "75.1" "89.5" "89.1" "105" ...
##  $ ptht                              : chr  "110.9" "143.5" "148.3" "119.6" ...
##  $ flflg                             : chr  "28.4" "39" "36.9" "37" ...
##  $ flfwd                             : chr  "1.3" "1" "1.8" "1.6" ...
##  $ pnnb                              : chr  "21.5" "57.5" "17.4" "17.4" ...
##  $ pnlg                              : chr  "20.5" "26.8" "30.9" "24.6" ...
##  $ pbrnb                             : chr  "9.3" "9.2" "11.2" "13.3" ...
##  $ flnbppn                           : chr  "136.3" "113" "291.9" "229.5" ...
##  $ unfilgrnb                         : chr  "16.5" "28.3" "37.6" "21.4" ...
##  $ filgrnb                           : chr  "119.8" "84.8" "254.3" "208.1" ...
##  $ hulgrlg                           : chr  "8.1" "7.7" "7.1" "7.3" ...
##  $ hulgrwd                           : chr  "3.7" "3" "3.3" "2.5" ...
##  $ hulgrlgwdro                       : chr  "2.2" "2.6" "2.2" "2.9" ...
##  $ hulgrvol                          : chr  "13.3" "8.3" "9" "5.9" ...
##  $ hhulgrwt                          : chr  "3.1" "2.3" "2.2" "2.2" ...
##  $ dhulgrlg                          : chr  "5.8" "5.6" "5.1" "5.5" ...
##  $ dhulgrwd                          : chr  "3.1" "2.6" "2.8" "2.2" ...
##  $ dhulgrvol                         : chr  "7.7" "5.1" "5.6" "3.6" ...
##  $ culmhab                           : chr  "4" "8" "6" "3" ...
##  $ sdsh                              : chr  "yes" "yes" "yes" "yes" ...
##  $ awnplu                            : chr  "0" "0" "0" "1" ...
##  $ lflpubes                          : chr  "yes" "yes" "yes" "yes" ...
##  $ hulcl                             : chr  "2" "2" "4" "2" ...
##  $ dhulgrcl                          : chr  "2" "1" "2" "2" ...
##  $ lfblrs                            : chr  "6" "3" "0" "1.5" ...
##  $ strthead                          : chr  "4.8" "7.8" "8.2" "8.6" ...
##  - attr(*, "na.action")= 'omit' Named int [1:22] 9 10 38 48 55 72 74 80 91 114 ...
##   ..- attr(*, "names")= chr [1:22] "9" "10" "38" "48" ...
# convert characters to class numeric
rice2 <- rice %>%
  mutate(hhulgrwt1 = as.numeric(hhulgrwt),
         hulgrvol1 = as.numeric(hulgrvol),
         ptht1 = as.numeric(ptht),
         pnlg1 = as.numeric(pnlg),
         pnnb1 = as.numeric(pnnb),
         dthd1 = as.numeric(dthd),
         filgrnb1 = as.numeric(filgrnb),
         flnbppn1 = as.numeric(flnbppn))

# str data
str(rice2)
## 'data.frame':    383 obs. of  46 variables:
##  $ nsftv_id                          : chr  "1" "3" "6" "7" ...
##  $ gsor_id                           : chr  "301001" "301003" "301006" "301007" ...
##  $ irgc_id                           : chr  "126380" "117636" "117603" "126381" ...
##  $ name                              : chr  "Agostano" "Ai-Chiao-Hong" "ARC 7229" "Arias" ...
##  $ orignal_providing_country         : chr  "Italy" "China" "India" "Indonesia" ...
##  $ region                            : chr  "West Europe" "East Asia" "South Asia" "Southeast Asia" ...
##  $ subpopulation_group_structure     : chr  "TEJ" "IND" "AUS" "TRJ" ...
##  $ subpopulation_group_pca           : chr  "TEJ" "IND" "AUS" "TRJ" ...
##  $ subpopulation_group_fast_structure: chr  "temperate-japonica" "indica" "aus" "tropical-japonica" ...
##  $ amycn                             : chr  "15.6" "23.1" "23.2" "20.1" ...
##  $ alkdig                            : chr  "6.1" "5.8" "5.5" "6.3" ...
##  $ dhulprotcn                        : chr  "8.4" "9.2" "8.5" "11.8" ...
##  $ dthd                              : chr  "75.1" "89.5" "89.1" "105" ...
##  $ ptht                              : chr  "110.9" "143.5" "148.3" "119.6" ...
##  $ flflg                             : chr  "28.4" "39" "36.9" "37" ...
##  $ flfwd                             : chr  "1.3" "1" "1.8" "1.6" ...
##  $ pnnb                              : chr  "21.5" "57.5" "17.4" "17.4" ...
##  $ pnlg                              : chr  "20.5" "26.8" "30.9" "24.6" ...
##  $ pbrnb                             : chr  "9.3" "9.2" "11.2" "13.3" ...
##  $ flnbppn                           : chr  "136.3" "113" "291.9" "229.5" ...
##  $ unfilgrnb                         : chr  "16.5" "28.3" "37.6" "21.4" ...
##  $ filgrnb                           : chr  "119.8" "84.8" "254.3" "208.1" ...
##  $ hulgrlg                           : chr  "8.1" "7.7" "7.1" "7.3" ...
##  $ hulgrwd                           : chr  "3.7" "3" "3.3" "2.5" ...
##  $ hulgrlgwdro                       : chr  "2.2" "2.6" "2.2" "2.9" ...
##  $ hulgrvol                          : chr  "13.3" "8.3" "9" "5.9" ...
##  $ hhulgrwt                          : chr  "3.1" "2.3" "2.2" "2.2" ...
##  $ dhulgrlg                          : chr  "5.8" "5.6" "5.1" "5.5" ...
##  $ dhulgrwd                          : chr  "3.1" "2.6" "2.8" "2.2" ...
##  $ dhulgrvol                         : chr  "7.7" "5.1" "5.6" "3.6" ...
##  $ culmhab                           : chr  "4" "8" "6" "3" ...
##  $ sdsh                              : chr  "yes" "yes" "yes" "yes" ...
##  $ awnplu                            : chr  "0" "0" "0" "1" ...
##  $ lflpubes                          : chr  "yes" "yes" "yes" "yes" ...
##  $ hulcl                             : chr  "2" "2" "4" "2" ...
##  $ dhulgrcl                          : chr  "2" "1" "2" "2" ...
##  $ lfblrs                            : chr  "6" "3" "0" "1.5" ...
##  $ strthead                          : chr  "4.8" "7.8" "8.2" "8.6" ...
##  $ hhulgrwt1                         : num  3.1 2.3 2.2 2.2 2 2.7 2.6 1.6 4.4 2.4 ...
##  $ hulgrvol1                         : num  13.3 8.3 9 5.9 7.7 10.5 9.1 6.1 10 7.4 ...
##  $ ptht1                             : num  111 144 148 120 147 ...
##  $ pnlg1                             : num  20.5 26.8 30.9 24.6 31.5 20.2 21.1 17.6 22.5 30.6 ...
##  $ pnnb1                             : num  21.5 57.5 17.4 17.4 13.8 18.6 36.7 39 26.7 30.9 ...
##  $ dthd1                             : num  75.1 89.5 89.1 105 101.6 ...
##  $ filgrnb1                          : num  119.8 84.8 254.3 208.1 256.4 ...
##  $ flnbppn1                          : num  136 113 292 230 329 ...
##  - attr(*, "na.action")= 'omit' Named int [1:22] 9 10 38 48 55 72 74 80 91 114 ...
##   ..- attr(*, "names")= chr [1:22] "9" "10" "38" "48" ...
# Distribution of subpopulations across all regions

# create an object for total region counts
totals <- rice2 %>% 
  count(region)

# create a barplot object
barplot <- rice2 %>%
  count(subpopulation_group_pca, region) %>%
  filter(n !=10) %>%
  ggplot(aes(y = region, x = n)) +
  geom_bar(stat = "identity", aes(fill = subpopulation_group_pca)) +
  theme(axis.text.x = element_text(angle = 90, vjust=0.3, hjust=1)) +
  geom_text(data = totals, aes(label = n), cex = 3) +
  labs(y ="Regions", title = "Distribution of subpopulations across regions") 

# visulaize barplot
barplot

East Asia had the highest number of accessions (76) consisting of four sub-populations, with Temperate Japonica (TEJ) and Indica varieties (IND) occupying a significant proportion. The Admixed (ADMIX) subpopulation can be found in almost all regions except Oceania, Central Asia, Central America and the Caribbean regions with the lowest number of accessions (6)

Trait differences across regions and sub-populations

# trait differences across regions and across sub-populations
hhulgrwt <- ggplot(rice2,
       aes(y = hhulgrwt1, fill = region)) +
  geom_boxplot() +
  labs(y ="100-Seed weight") 


ptht <- ggplot(rice2,
       aes(y = ptht1, fill = region)) +
  geom_boxplot() +
  labs(y ="Plant Height") 


pnlg <- ggplot(rice2,
       aes(y = pnlg1, fill = region)) +
  geom_boxplot() +
  labs(y ="Panicle length
Inflorescence length") 


pnnb <- ggplot(rice2,
       aes(y = pnnb1, fill = region)) +
  geom_boxplot() +
  labs(y ="Panicles per plant
Inflorescence per plant") 


dthd <- ggplot(rice2,
       aes(y = dthd1, fill = region)) +
  geom_boxplot() +
  labs(y ="Days to heading") 


filgrnb <- ggplot(rice2,
       aes(y = filgrnb1, fill = region)) +
  geom_boxplot() +
  labs(y ="Seeds per panicle
Filled florets per panicle")


flnbppn <- ggplot(rice2,
       aes(y = flnbppn1, fill = region)) +
  geom_boxplot() +
  labs(y ="Total Florets per panicle")



ggarrange(hhulgrwt, ptht, pnlg, pnnb, dthd, filgrnb, flnbppn  + rremove("x.text"), common.legend = TRUE, ncol = 2, nrow = 2)
## $`1`

## 
## $`2`

## 
## attr(,"class")
## [1] "list"      "ggarrange"
#------------------------------------------------------------------------------


# trait differences across subpopulations
hhulgrwt <- ggplot(rice2,
       aes(y = hhulgrwt1, fill = subpopulation_group_pca)) +
  geom_boxplot() +
  labs(y ="100-Seed weight") 


ptht <- ggplot(rice2,
       aes(y = ptht1, fill = subpopulation_group_pca)) +
  geom_boxplot() +
  labs(y ="Plant Height") 


pnlg <- ggplot(rice2,
       aes(y = pnlg1, fill = subpopulation_group_pca)) +
  geom_boxplot() +
  labs(y ="Panicle length
Inflorescence length") 


pnnb <- ggplot(rice2,
       aes(y = pnnb1, fill = subpopulation_group_pca)) +
  geom_boxplot() +
  labs(y ="Panicles per plant
Inflorescence per plant") 


dthd <- ggplot(rice2,
       aes(y = dthd1, fill = subpopulation_group_pca)) +
  geom_boxplot() +
  labs(y ="Days to heading") 


filgrnb <- ggplot(rice2,
       aes(y = filgrnb1, fill = subpopulation_group_pca)) +
  geom_boxplot() +
  labs(y ="Seeds per panicle
Filled florets per panicle")


flnbppn <- ggplot(rice2,
       aes(y = flnbppn1, fill = subpopulation_group_pca)) +
  geom_boxplot() +
  labs(y ="Total Florets per panicle")


# arrange all plots side by side using ggarrange
ggarrange(hhulgrwt, ptht, pnlg, pnnb, dthd, filgrnb, flnbppn, rremove("x.text"), common.legend = TRUE, ncol = 2, nrow = 2)
## $`1`

## 
## $`2`

## 
## attr(,"class")
## [1] "list"      "ggarrange"

Data normalization for regression analysis

# scale and center data using Caret library
preproc1 <- preProcess(rice2[,39:46], method=c("center", "scale"))

# normalize data with mean 0 and standard deviation 1 
norm1 <- predict(preproc1, rice2[,39:46]) %>%
  mutate(region = rice2$region,
         subpopulation =rice2$subpopulation_group_pca) %>%
  na.omit() %>%
  dplyr::rename(HHGW = hhulgrwt1,
         HGV = hulgrvol1,
         PTH = ptht1,
         PNL = pnlg1,
         PNN = pnnb1,
         DTH = dthd1,
         FGN = filgrnb1,
         FLN = flnbppn1)

# summary of normalized data 
summary(norm1)
##       HHGW               HGV                PTH                 PNL          
##  Min.   :-3.13283   Min.   :-1.85980   Min.   :-2.404944   Min.   :-2.46837  
##  1st Qu.:-0.63691   1st Qu.:-0.69574   1st Qu.:-0.781939   1st Qu.:-0.61774  
##  Median :-0.22092   Median :-0.07214   Median :-0.017243   Median :-0.04441  
##  Mean   : 0.02193   Mean   : 0.01695   Mean   : 0.006418   Mean   : 0.01261  
##  3rd Qu.: 0.61105   3rd Qu.: 0.64501   3rd Qu.: 0.674510   3rd Qu.: 0.63778  
##  Max.   : 3.93895   Max.   : 3.17060   Max.   : 3.746671   Max.   : 3.36654  
##       PNN                DTH                FGN                 FLN           
##  Min.   :-1.80455   Min.   :-2.41050   Min.   :-2.293937   Min.   :-1.929527  
##  1st Qu.:-0.74798   1st Qu.:-0.60661   1st Qu.:-0.626699   1st Qu.:-0.610398  
##  Median :-0.22914   Median :-0.15163   Median :-0.039020   Median :-0.116219  
##  Mean   :-0.02606   Mean   :-0.02416   Mean   : 0.003102   Mean   : 0.006202  
##  3rd Qu.: 0.56907   3rd Qu.: 0.52461   3rd Qu.: 0.526551   3rd Qu.: 0.521558  
##  Max.   : 3.07712   Max.   : 5.11907   Max.   : 3.608639   Max.   : 3.564864  
##     region          subpopulation     
##  Length:364         Length:364        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 
# View distributions after data normalization}

norm1 %>%
  group_by(subpopulation) %>%
  summarise(subpopulation, HHGW) %>%
  ggplot(aes(x = HHGW, fill = subpopulation)) +
  geom_density(alpha = 0.7) +
  facet_wrap(~subpopulation, ncol = 2) +
  labs(title = "Distribution of grain weight after normalization")
## `summarise()` regrouping output by 'subpopulation' (override with `.groups` argument)

Principal Component Analysis

# I encountered an error downloading the ggbiplot package from GitHub so I copied the function code from the GitHub repo.

ggbiplot <- function(pcobj, choices = 1:2, scale = 1, pc.biplot = TRUE, 
                     obs.scale = 1 - scale, var.scale = scale, 
                     groups = NULL, ellipse = FALSE, ellipse.prob = 0.68, 
                     labels = NULL, labels.size = 3, alpha = 1, 
                     var.axes = TRUE, 
                     circle = FALSE, circle.prob = 0.69, 
                     varname.size = 3, varname.adjust = 1.5, 
                     varname.abbrev = FALSE, ...)
{
  library(ggplot2)
  library(plyr)
  library(scales)
  library(grid)
  
  stopifnot(length(choices) == 2)
  
  # Recover the SVD
  if(inherits(pcobj, 'prcomp')){
    nobs.factor <- sqrt(nrow(pcobj$x) - 1)
    d <- pcobj$sdev
    u <- sweep(pcobj$x, 2, 1 / (d * nobs.factor), FUN = '*')
    v <- pcobj$rotation
  } else if(inherits(pcobj, 'princomp')) {
    nobs.factor <- sqrt(pcobj$n.obs)
    d <- pcobj$sdev
    u <- sweep(pcobj$scores, 2, 1 / (d * nobs.factor), FUN = '*')
    v <- pcobj$loadings
  } else if(inherits(pcobj, 'PCA')) {
    nobs.factor <- sqrt(nrow(pcobj$call$X))
    d <- unlist(sqrt(pcobj$eig)[1])
    u <- sweep(pcobj$ind$coord, 2, 1 / (d * nobs.factor), FUN = '*')
    v <- sweep(pcobj$var$coord,2,sqrt(pcobj$eig[1:ncol(pcobj$var$coord),1]),FUN="/")
  } else if(inherits(pcobj, "lda")) {
    nobs.factor <- sqrt(pcobj$N)
    d <- pcobj$svd
    u <- predict(pcobj)$x/nobs.factor
    v <- pcobj$scaling
    d.total <- sum(d^2)
  } else {
    stop('Expected a object of class prcomp, princomp, PCA, or lda')
  }
  
  # Scores
  choices <- pmin(choices, ncol(u))
  df.u <- as.data.frame(sweep(u[,choices], 2, d[choices]^obs.scale, FUN='*'))
  
  # Directions
  v <- sweep(v, 2, d^var.scale, FUN='*')
  df.v <- as.data.frame(v[, choices])
  
  names(df.u) <- c('xvar', 'yvar')
  names(df.v) <- names(df.u)
  
  if(pc.biplot) {
    df.u <- df.u * nobs.factor
  }
  
  # Scale the radius of the correlation circle so that it corresponds to 
  # a data ellipse for the standardized PC scores
  r <- sqrt(qchisq(circle.prob, df = 2)) * prod(colMeans(df.u^2))^(1/4)
  
  # Scale directions
  v.scale <- rowSums(v^2)
  df.v <- r * df.v / sqrt(max(v.scale))
  
  # Change the labels for the axes
  if(obs.scale == 0) {
    u.axis.labs <- paste('standardized PC', choices, sep='')
  } else {
    u.axis.labs <- paste('PC', choices, sep='')
  }
  
  # Append the proportion of explained variance to the axis labels
  u.axis.labs <- paste(u.axis.labs, 
                       sprintf('(%0.1f%% explained var.)', 
                               100 * pcobj$sdev[choices]^2/sum(pcobj$sdev^2)))
  
  # Score Labels
  if(!is.null(labels)) {
    df.u$labels <- labels
  }
  
  # Grouping variable
  if(!is.null(groups)) {
    df.u$groups <- groups
  }
  
  # Variable Names
  if(varname.abbrev) {
    df.v$varname <- abbreviate(rownames(v))
  } else {
    df.v$varname <- rownames(v)
  }
  
  # Variables for text label placement
  df.v$angle <- with(df.v, (180/pi) * atan(yvar / xvar))
  df.v$hjust = with(df.v, (1 - varname.adjust * sign(xvar)) / 2)
  
  # Base plot
  g <- ggplot(data = df.u, aes(x = xvar, y = yvar)) + 
    xlab(u.axis.labs[1]) + ylab(u.axis.labs[2]) + coord_equal()
  
  if(var.axes) {
    # Draw circle
    if(circle) 
    {
      theta <- c(seq(-pi, pi, length = 50), seq(pi, -pi, length = 50))
      circle <- data.frame(xvar = r * cos(theta), yvar = r * sin(theta))
      g <- g + geom_path(data = circle, color = muted('white'), 
                         size = 1/2, alpha = 1/3)
    }
    
    # Draw directions
    g <- g +
      geom_segment(data = df.v,
                   aes(x = 0, y = 0, xend = xvar, yend = yvar),
                   arrow = arrow(length = unit(1/2, 'picas')), 
                   color = muted('red'))
  }
  
  # Draw either labels or points
  if(!is.null(df.u$labels)) {
    if(!is.null(df.u$groups)) {
      g <- g + geom_text(aes(label = labels, color = groups), 
                         size = labels.size)
    } else {
      g <- g + geom_text(aes(label = labels), size = labels.size)      
    }
  } else {
    if(!is.null(df.u$groups)) {
      g <- g + geom_point(aes(color = groups), alpha = alpha)
    } else {
      g <- g + geom_point(alpha = alpha)      
    }
  }
  
  # Overlay a concentration ellipse if there are groups
  if(!is.null(df.u$groups) && ellipse) {
    theta <- c(seq(-pi, pi, length = 50), seq(pi, -pi, length = 50))
    circle <- cbind(cos(theta), sin(theta))
    
    ell <- ddply(df.u, 'groups', function(x) {
      if(nrow(x) <= 2) {
        return(NULL)
      }
      sigma <- var(cbind(x$xvar, x$yvar))
      mu <- c(mean(x$xvar), mean(x$yvar))
      ed <- sqrt(qchisq(ellipse.prob, df = 2))
      data.frame(sweep(circle %*% chol(sigma) * ed, 2, mu, FUN = '+'), 
                 groups = x$groups[1])
    })
    names(ell)[1:2] <- c('xvar', 'yvar')
    g <- g + geom_path(data = ell, aes(color = groups, group = groups))
  }
  
  # Label the variable axes
  if(var.axes) {
    g <- g + 
      geom_text(data = df.v, 
                aes(label = varname, x = xvar, y = yvar, 
                    angle = angle, hjust = hjust), 
                color = 'darkred', size = varname.size)
  }
  # Change the name of the legend for groups
  # if(!is.null(groups)) {
  #   g <- g + scale_color_brewer(name = deparse(substitute(groups)), 
  #                               palette = 'Dark2')
  # }
  
  # TODO: Add a second set of axes
  
  return(g)
}
# compute PCA 
rice.pca <- prcomp(norm1[,1:8], center = TRUE,scale. = TRUE)

# view pca summary
summary(rice.pca)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     1.7248 1.4214 1.1228 0.80165 0.66286 0.63019 0.43245
## Proportion of Variance 0.3719 0.2525 0.1576 0.08033 0.05492 0.04964 0.02338
## Cumulative Proportion  0.3719 0.6244 0.7820 0.86234 0.91726 0.96690 0.99028
##                            PC8
## Standard deviation     0.27883
## Proportion of Variance 0.00972
## Cumulative Proportion  1.00000
# view pca structure
str(rice.pca)
## List of 5
##  $ sdev    : num [1:8] 1.725 1.421 1.123 0.802 0.663 ...
##  $ rotation: num [1:8, 1:8] -0.3657 -0.3812 0.3156 0.4178 0.0747 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:8] "HHGW" "HGV" "PTH" "PNL" ...
##   .. ..$ : chr [1:8] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : Named num [1:8] 0.02193 0.01695 0.00642 0.01261 -0.02606 ...
##   ..- attr(*, "names")= chr [1:8] "HHGW" "HGV" "PTH" "PNL" ...
##  $ scale   : Named num [1:8] 0.989 0.995 0.984 0.994 0.954 ...
##   ..- attr(*, "names")= chr [1:8] "HHGW" "HGV" "PTH" "PNL" ...
##  $ x       : num [1:364, 1:8] -2.227 0.552 3.9 2.64 4.948 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:364] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:8] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"
P <- ggbiplot(rice.pca,
         obs.scale = 1, 
         var.scale=1,
         ellipse=F,
         circle=F,
         varname.size=4,
         var.axes=T,
         groups=norm1$subpopulation, #no need for coloring, I'm making the points invisible
         alpha=0) #invisible points, I add them below
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:ggpubr':
## 
##     mutate
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
P$layers <- c(geom_point(aes(color=norm1$subpopulation), cex=3), P$layers) #add geom_point in a layer underneath (only way I have to change the size of the points in ggbiplot)

P

For the principal component plot, PC1 accounted for 37.2% of the total variation and was associated with FGN, FLN, PNL, DTH and PNN. The first quadrat of PC1 formed a cluster comprising ADMIX, AUS and IND scoring high for FGN, FLN, PNL and DTH, while IND and AUS formed the major cluster in the second quadrat scoring low for PTH and PNN. TEJ and ADMIX were the major clusters found in PC2 which accounted for 25.3%. It was associated with HHGW and HGV, which scored high on the first quadrat, while the second quadrat contained some clusters of TEJ, ADMIX and AUS.

Standardized Major Axis (SMA) regression

Relationship between Test weight and other agronomic traits

# To plot relationship between seed per panicle and test weight across subpopulations

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~FGN*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(subpopulation = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between seed per panicle and test weight across subpopulations
seeds_per_panicle <- ggplot(norm1) +
  geom_point(aes(x=FGN, y=HHGW, color=subpopulation)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Seeds per panicle", y = "100 seed weight",
       title = "Relationship between test weight and seeds per panicle across subpopulations")

# visualize relationship
seeds_per_panicle

# make a facet-wrap to view relationship across groups
seeds_per_panicle + facet_wrap(~subpopulation, ncol=2)

# Perform sma analysis for seed per panicle across subpopulations
filgrnb_sma <-sma(HHGW~FGN*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
filgrnb_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")       # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
ADMIX 56 0.0121245 0.4191634 -1.0579973 -1.3831041 -0.8093088 0.1553618 -0.2030996 0.5138232 NA NA NA NA
ARO 7 0.2221491 0.2856813 -3.3189881 -8.0916075 -1.3613713 -3.0181351 -5.9428513 -0.0934189 NA NA NA NA
AUS 51 0.0722018 0.0565743 -0.6931603 -0.9108492 -0.5274981 -0.5983793 -0.8477217 -0.3490369 NA NA NA NA
IND 73 0.0003167 0.8812093 0.8160154 0.6454752 1.0316138 -0.5489209 -0.8172319 -0.2806098 NA NA NA NA
TEJ 92 0.0000000 0.9992490 0.9523719 0.7735907 1.1724705 0.7747140 0.5149058 1.0345222 NA NA NA NA
TRJ 85 0.2515723 0.0000010 -0.9518213 -1.1484191 -0.7888790 0.7758076 0.5609367 0.9906784 NA NA NA NA
#------------------------------------------------------------------------------

# To plot the relationship between Total Florets per panicle and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~FLN*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(subpopulation = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between seed per panicle and test weight across subpopulations
florets_per_panicle <- ggplot(norm1) +
  geom_point(aes(x=FLN, y=HHGW, color=subpopulation)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Total Florets per Panicle", y = "100 seed weight",
       title = "Relationship between test weight and total florets per panicle across subpopulations")

# visualize relationship
florets_per_panicle

# make a facet-wrap to view relationship across groups
florets_per_panicle + facet_wrap(~subpopulation, ncol=2)

# Perform sma analysis for Total Florets per panicle across subpopulations
flnbppn_sma <-sma(HHGW~FLN*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
flnbppn_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")        # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
ADMIX 56 0.1613282 0.0021530 -1.0019881 -1.2831417 -0.7824390 0.4253753 0.1263962 0.7243544 NA NA NA NA
ARO 7 0.3560280 0.1572758 -3.3027580 -7.5404121 -1.4466332 -2.9293795 -5.4986269 -0.3601320 NA NA NA NA
AUS 51 0.0767219 0.0490988 -0.7737894 -1.0161392 -0.5892401 -0.8708629 -1.1211628 -0.6205630 NA NA NA NA
IND 73 0.0280656 0.1565744 -0.8576659 -1.0807839 -0.6806086 -0.5036280 -0.7507534 -0.2565026 NA NA NA NA
TEJ 92 0.0000873 0.9295503 1.0138130 0.8235053 1.2480998 0.8539348 0.5901924 1.1176772 NA NA NA NA
TRJ 85 0.3162021 0.0000000 -0.9617684 -1.1509440 -0.8036867 0.8723962 0.6641858 1.0806066 NA NA NA NA
#------------------------------------------------------------------------------


# To estimate the relationship between panicles per plant and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~FLN*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(subpopulation = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between panicles per plant and test weight across subpopulations
panicles_per_plant <- ggplot(norm1) +
  geom_point(aes(x=PNN, y=HHGW, color=subpopulation)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Panicle per plant", y = "100 seed weight",
       title = "Relationship between test weight and panicles per plant across subpopulations")

# visualize relationship
panicles_per_plant

# make a facet-wrap to view relationship across groups
panicles_per_plant + facet_wrap(~subpopulation, ncol=2)

# Perform sma analysis for panicles per plant across subpopulations
pnnb_sma <-sma(HHGW~PNN*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
pnnb_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")            # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
ADMIX 56 0.1381634 0.0047924 -1.3066327 -1.678839 -1.0169464 -0.2327722 -0.5533354 0.0877910 NA NA NA NA
ARO 7 0.0148219 0.7948473 3.1650317 1.190632 8.4135364 -0.3755597 -1.8688436 1.1177243 NA NA NA NA
AUS 51 0.0877770 0.0347723 -0.8982791 -1.177736 -0.6851323 -0.1998707 -0.4796015 0.0798600 NA NA NA NA
IND 73 0.0212668 0.2182957 -1.0111003 -1.275145 -0.8017317 0.5352962 0.1808400 0.8897523 NA NA NA NA
TEJ 92 0.0681682 0.0119433 -1.4595375 -1.784107 -1.1940148 0.1097310 -0.1148818 0.3343438 NA NA NA NA
TRJ 85 0.0135029 0.2895732 -2.6793730 -3.322629 -2.1606505 -2.0072094 -2.5961890 -1.4182299 NA NA NA NA
#------------------------------------------------------------------------------

# To estimate the relationship between plant height and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~PTH*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(subpopulation = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between plant height and test weight across subpopulations
plant_height <- ggplot(norm1) +
  geom_point(aes(x=PTH, y=HHGW, color=subpopulation)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Plant height", y = "100 seed weight",
       title = "Relationship between test weight and plant height across regions")

# visualize relationship
plant_height

# make a facet-wrap to view relationship across groups
plant_height + facet_wrap(~subpopulation, ncol=2)

# Perform sma analysis for panicles per plant across subpopulations
ptht_sma <-sma(HHGW~PTH*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
ptht_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")            # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
ADMIX 56 0.0118435 0.4246377 -0.9903552 -1.2947248 -0.7575381 0.2082924 -0.1502028 0.5667876 NA NA NA NA
ARO 7 0.0128631 0.8086956 -1.2907307 -3.4336862 -0.4851887 1.7019968 -1.0996128 4.5036064 NA NA NA NA
AUS 51 0.0263307 0.2552656 -1.4129056 -1.8687461 -1.0682576 0.3222571 -0.0715528 0.7160670 NA NA NA NA
IND 73 0.0482383 0.0618992 -0.7246572 -0.9110112 -0.5764232 -0.3988271 -0.6400974 -0.1575568 NA NA NA NA
TEJ 92 0.0282577 0.1092141 1.2331440 1.0045823 1.5137078 1.2298064 0.9489479 1.5106649 NA NA NA NA
TRJ 85 0.0402848 0.0654966 1.1154818 0.9021361 1.3792814 0.4182288 0.1609324 0.6755252 NA NA NA NA
#------------------------------------------------------------------------------


# To estimate the relationship between days to heading and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$subpopulation), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~DTH*subpopulation, data=subset(norm1, subpopulation == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(subpopulation = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between days to heading and test weight across subpopulations
days_to_heading <- ggplot(norm1) +
  geom_point(aes(x=DTH, y=HHGW, color=subpopulation)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Days to heading", y = "100 seed weight",
       title = "Relationship between test weight and Days to Heading across subpopulations")

# visualize relationship
days_to_heading

# make a facet-wrap to view relationship across groups
days_to_heading + facet_wrap(~subpopulation, ncol=2)

# Perform sma analysis for days to heading across subpopulations
dthd_sma <-sma(HHGW~DTH*subpopulation, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
dthd_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")            # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
ADMIX 56 0.0570007 0.0763853 -1.1270561 -1.4645429 -0.8673391 0.1121810 -0.2200131 0.4443752 NA NA NA NA
ARO 7 0.4841719 0.0825206 -1.3669631 -2.9013274 -0.6440459 -0.4074222 -1.2867789 0.4719346 NA NA NA NA
AUS 51 0.0192425 0.3316554 1.1706593 0.8842261 1.5498786 -0.8271417 -1.0979854 -0.5562980 NA NA NA NA
IND 73 0.0295793 0.1456815 -0.6419890 -0.8088564 -0.5095464 -0.2516907 -0.5068615 0.0034801 NA NA NA NA
TEJ 92 0.0186075 0.1947779 -0.9552551 -1.1737686 -0.7774210 -0.1698103 -0.4337191 0.0940985 NA NA NA NA
TRJ 85 0.0007544 0.8029571 -1.5460831 -1.9198803 -1.2450635 0.7033897 0.4122409 0.9945385 NA NA NA NA
# To relationship between seed per panicle and test weight across regions

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~FGN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(region = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between seed per panicle and test weight across regions
seeds_per_panicle <- ggplot(norm1) +
  geom_point(aes(x=FGN, y=HHGW, color=region)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Seeds per panicle", y = "100 seed weight",
       title = "Relationship between test weight and seeds per panicle across regions")

# visualize relationship
seeds_per_panicle

# make a facet-wrap to view relationship across groups
seeds_per_panicle + facet_wrap(~region, ncol=2)

# Perform sma analysis for seed per panicle across regions
filgrnb1_sma <-sma(HHGW~FGN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
filgrnb1_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")      # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
Africa 37 0.1038062 0.0518053 -1.1395415 -1.5683439 -0.8279783 0.7981303 0.4009233 1.1953373 NA NA NA NA
Caribbean 6 0.0685432 0.6162613 -0.6306579 -1.8993192 -0.2094063 0.5911244 -0.1275153 1.3097641 NA NA NA NA
Central America 6 0.4047575 0.1744462 -0.8800483 -2.2321099 -0.3469744 0.8680227 0.1612131 1.5748324 NA NA NA NA
Central Asia 6 0.1373502 0.4695396 -1.0977920 -3.2067302 -0.3758181 -0.9904780 -2.4043457 0.4233898 NA NA NA NA
East Asia 74 0.0007424 0.8177540 0.8618698 0.6829122 1.0877233 -0.0957317 -0.3602370 0.1687735 NA NA NA NA
East Europe 15 0.0045454 0.8113138 -0.6786493 -1.1963794 -0.3849656 0.3255896 -0.3758267 1.0270060 NA NA NA NA
North America 43 0.0158674 0.4208852 -0.7349767 -1.0000780 -0.5401487 0.2770053 -0.0168251 0.5708357 NA NA NA NA
Oceania 6 0.3608716 0.2073036 -0.7256853 -1.8894745 -0.2787120 -0.3652131 -1.2710989 0.5406727 NA NA NA NA
South America 30 0.0610612 0.1880170 -1.2935255 -1.8667460 -0.8963234 0.0973052 -0.3656056 0.5602159 NA NA NA NA
South Asia 63 0.0494662 0.0797754 -0.7445339 -0.9532235 -0.5815328 -0.5703187 -0.8336862 -0.3069512 NA NA NA NA
Southeast Asia 46 0.0105168 0.4976533 -0.8542723 -1.1506184 -0.6342513 -0.1038009 -0.4983028 0.2907009 NA NA NA NA
West Asia 11 0.0089617 0.7818864 -0.9335995 -1.8681939 -0.4665511 -0.6187488 -1.5346333 0.2971357 NA NA NA NA
West Europe 21 0.0023361 0.8351923 -0.9910938 -1.5745282 -0.6238484 0.9547820 0.4382989 1.4712650 NA NA NA NA
#------------------------------------------------------------------------------

# To estimate the relationship between Total Florets per panicle and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~FLN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(region= x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between seed per panicle and test weight across regions
florets_per_panicle <- ggplot(norm1) +
  geom_point(aes(x=FLN, y=HHGW, color=region)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Total Florets per Panicle", y = "100 seed weight",
       title = "Relationship between test weight and total florets per panicle across regions")

# visualize relationship
florets_per_panicle

# make a facet-wrap to view relationship across groups
florets_per_panicle + facet_wrap(~region, ncol=2)

# Perform sma analysis for Total Florets per panicle across regions
flnbppn1_sma <-sma(HHGW~FLN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
flnbppn1_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")       # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
Africa 37 0.1824934 0.0083630 -1.2247427 -1.6623304 -0.9023444 0.9025426 0.5346671 1.2704181 NA NA NA NA
Caribbean 6 0.0058945 0.8850630 -0.6217080 -1.9221328 -0.2010896 0.4849523 -0.2637442 1.2336489 NA NA NA NA
Central America 6 0.5675139 0.0837613 -0.9795673 -2.2206821 -0.4320979 0.7868129 0.2336541 1.3399717 NA NA NA NA
Central Asia 6 0.2078680 0.3634980 -0.9024660 -2.5495213 -0.3194501 -0.7732753 -1.9320294 0.3854787 NA NA NA NA
East Asia 74 0.0000042 0.9862292 -0.8872140 -1.1198035 -0.7029346 -0.3062358 -0.5754577 -0.0370139 NA NA NA NA
East Europe 15 0.0024563 0.8607635 -0.7441415 -1.3125404 -0.4218892 0.2996701 -0.4128345 1.0121747 NA NA NA NA
North America 43 0.0952134 0.0440929 -0.7525207 -1.0114193 -0.5598938 0.4403637 0.1625446 0.7181829 NA NA NA NA
Oceania 6 0.0381806 0.7106325 -0.9618998 -2.9344850 -0.3153028 -0.3175809 -1.5986496 0.9634879 NA NA NA NA
South America 30 0.2966206 0.0018599 -1.2485895 -1.7181157 -0.9073753 0.2048406 -0.1542327 0.5639140 NA NA NA NA
South Asia 63 0.0774085 0.0272509 -0.7995030 -1.0199343 -0.6267120 -0.7127386 -0.9639363 -0.4615409 NA NA NA NA
Southeast Asia 46 0.0531385 0.1232532 -0.7999274 -1.0706494 -0.5976596 0.0055228 -0.3641482 0.3751938 NA NA NA NA
West Asia 11 0.0015596 0.9082219 1.0253296 0.5112472 2.0563454 0.9416803 -0.0834793 1.9668400 NA NA NA NA
West Europe 21 0.0047711 0.7660808 -1.0065084 -1.5981729 -0.6338858 0.8928619 0.3737050 1.4120189 NA NA NA NA
#------------------------------------------------------------------------------

# To estimate the relationship between panicles per plant and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~PNN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(region = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between panicles per plant and test weight across regions
panicles_per_plant <- ggplot(norm1) +
  geom_point(aes(x=PNN, y=HHGW, color=region)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Panicle per plant", y = "100 seed weight",
       title = "Relationship between test weight and panicles per plant across regions")

# visualize relationship
panicles_per_plant

# make a facet-wrap to view relationship across regions
panicles_per_plant + facet_wrap(~region, ncol=2)

# Perform sma analysis for panicles per plant across regions
pnnb1_sma <-sma(HHGW~PNN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
pnnb1_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")          # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
Africa 37 0.0982540 0.0588716 -1.2364862 -1.7033950 -0.8975593 0.2075433 -0.2294960 0.6445827 NA NA NA NA
Caribbean 6 0.0276460 0.7528919 -0.4763870 -1.4597221 -0.1554711 0.1371494 -0.6263329 0.9006317 NA NA NA NA
Central America 6 0.2714903 0.2891587 3.2463278 1.1868026 8.8798628 3.0949157 -0.0113749 6.2012063 NA NA NA NA
Central Asia 6 0.1525534 0.4439209 1.2164528 0.4193714 3.5285131 0.4598387 -0.8589333 1.7786107 NA NA NA NA
East Asia 74 0.1473644 0.0007344 -0.8401930 -1.0420001 -0.6774705 0.0796029 -0.1373261 0.2965319 NA NA NA NA
East Europe 15 0.2781364 0.0433516 -1.6136370 -2.6321650 -0.9892330 0.0772868 -0.4913607 0.6459344 NA NA NA NA
North America 43 0.0003400 0.9065756 1.2394581 0.9087676 1.6904832 0.7702358 0.3952447 1.1452269 NA NA NA NA
Oceania 6 0.4271472 0.1592367 -0.5701319 -1.4260224 -0.2279420 -0.2029340 -1.0481804 0.6423123 NA NA NA NA
South America 30 0.2715049 0.0031527 -1.0937199 -1.5132481 -0.7905004 0.0680064 -0.3023325 0.4383453 NA NA NA NA
South Asia 63 0.0551354 0.0639668 -0.9367392 -1.1984358 -0.7321880 -0.2173011 -0.5045463 0.0699441 NA NA NA NA
Southeast Asia 46 0.1993628 0.0018681 -0.8358917 -1.0934764 -0.6389849 -0.2066121 -0.5153668 0.1021427 NA NA NA NA
West Asia 11 0.3602667 0.0508948 -0.9851505 -1.7446157 -0.5562953 0.3023817 -0.2227482 0.8275117 NA NA NA NA
West Europe 21 0.1134283 0.1354687 -1.2945534 -2.0060117 -0.8354231 0.6785127 0.1948718 1.1621536 NA NA NA NA
#------------------------------------------------------------------------------

# To estimate the relationship between plant height and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~PNN*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(region = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between plant height and test weight across regions
plant_height <- ggplot(norm1) +
  geom_point(aes(x=PNN, y=HHGW, color=region)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Plant height", y = "100 seed weight",
       title = "Relationship between test weight and plant height across regions")

# visualize relationship
plant_height

# make a facet-wrap to view relationship across regions
plant_height + facet_wrap(~region, ncol=2)

# Perform sma analysis for panicles per plant across regions
ptht1_sma <-sma(HHGW~PNN*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
ptht1_sma$groupsummary%>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")          # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
Africa 37 0.0982540 0.0588716 -1.2364862 -1.7033950 -0.8975593 0.2075433 -0.2294960 0.6445827 NA NA NA NA
Caribbean 6 0.0276460 0.7528919 -0.4763870 -1.4597221 -0.1554711 0.1371494 -0.6263329 0.9006317 NA NA NA NA
Central America 6 0.2714903 0.2891587 3.2463278 1.1868026 8.8798628 3.0949157 -0.0113749 6.2012063 NA NA NA NA
Central Asia 6 0.1525534 0.4439209 1.2164528 0.4193714 3.5285131 0.4598387 -0.8589333 1.7786107 NA NA NA NA
East Asia 74 0.1473644 0.0007344 -0.8401930 -1.0420001 -0.6774705 0.0796029 -0.1373261 0.2965319 NA NA NA NA
East Europe 15 0.2781364 0.0433516 -1.6136370 -2.6321650 -0.9892330 0.0772868 -0.4913607 0.6459344 NA NA NA NA
North America 43 0.0003400 0.9065756 1.2394581 0.9087676 1.6904832 0.7702358 0.3952447 1.1452269 NA NA NA NA
Oceania 6 0.4271472 0.1592367 -0.5701319 -1.4260224 -0.2279420 -0.2029340 -1.0481804 0.6423123 NA NA NA NA
South America 30 0.2715049 0.0031527 -1.0937199 -1.5132481 -0.7905004 0.0680064 -0.3023325 0.4383453 NA NA NA NA
South Asia 63 0.0551354 0.0639668 -0.9367392 -1.1984358 -0.7321880 -0.2173011 -0.5045463 0.0699441 NA NA NA NA
Southeast Asia 46 0.1993628 0.0018681 -0.8358917 -1.0934764 -0.6389849 -0.2066121 -0.5153668 0.1021427 NA NA NA NA
West Asia 11 0.3602667 0.0508948 -0.9851505 -1.7446157 -0.5562953 0.3023817 -0.2227482 0.8275117 NA NA NA NA
West Europe 21 0.1134283 0.1354687 -1.2945534 -2.0060117 -0.8354231 0.6785127 0.1948718 1.1621536 NA NA NA NA
#------------------------------------------------------------------------------

# To estimate the relationship between days to heading and 100 seed weight

# assembly sma model coefficients into smaller objects using do.call and rbind
do.call(rbind, lapply(unique(norm1$region), function(x) {
  # make object for sma model and subset population
  obj <- sma(HHGW~DTH*region, data=subset(norm1, region == x), multcomp=TRUE,multcompmethod="adjusted")
  data.frame(region = x, 
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
})) -> fits

# plot relationship between days to heading and test weight across regions
days_to_heading <- ggplot(norm1) +
  geom_point(aes(x=DTH, y=HHGW, color=region)) +
  geom_abline(data=fits, aes(slope=slope, intercept=intercept)) +
  labs(x = "Days to heading", y = "100 seed weight",
       title = "Relationship between test weight and Days to Heading across regions")

# visualize relationship
days_to_heading

# make a facet-wrap to view relationship across regions
days_to_heading + facet_wrap(~region, ncol=2)

# Perform sma analysis for Days to heading across regions
dthd1_sma <-sma(HHGW~DTH*region, data=norm1,multcomp=TRUE,multcompmethod="adjusted")

# view group summary
dthd1_sma$groupsummary %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")          # Using KableExtra table formatting
group n r2 pval Slope Slope_lowCI Slope_highCI Int Int_lowCI Int_highCI Slope_test Slope_test_p Elev_test Elev_test_p
Africa 37 0.0146114 0.4760598 0.8908774 0.6376806 1.2446081 0.5296634 0.0713506 0.9879763 NA NA NA NA
Caribbean 6 0.0131383 0.8288195 1.0412775 0.3377972 3.2097924 0.6798661 -0.1550681 1.5148004 NA NA NA NA
Central America 6 0.1222994 0.4968149 1.2897887 0.4385342 3.7934441 0.0422902 -1.0048885 1.0894690 NA NA NA NA
Central Asia 6 0.2522037 0.3100299 -0.5894852 -1.6286757 -0.2133591 -0.7191027 -1.7961556 0.3579501 NA NA NA NA
East Asia 74 0.0144835 0.3070841 -0.8577000 -1.0807544 -0.6806813 -0.2869803 -0.5394275 -0.0345331 NA NA NA NA
East Europe 15 0.0133345 0.6819456 -0.9361115 -1.6465107 -0.5322193 -0.3903905 -1.3054272 0.5246461 NA NA NA NA
North America 43 0.0183218 0.3867970 1.2024663 0.8840452 1.6355785 0.1531816 -0.1332323 0.4395954 NA NA NA NA
Oceania 6 0.0674391 0.6192211 -1.4786715 -4.4553515 -0.4907513 -0.3157077 -1.5444846 0.9130691 NA NA NA NA
South America 30 0.1428722 0.0394516 -1.0683908 -1.5178379 -0.7520295 0.1522019 -0.2676111 0.5720149 NA NA NA NA
South Asia 63 0.0662274 0.0417346 -0.8890929 -1.1358622 -0.6959348 -0.4514204 -0.7147067 -0.1881341 NA NA NA NA
Southeast Asia 46 0.0644515 0.0886649 -1.0929733 -1.4603881 -0.8179953 0.2561081 -0.1278416 0.6400579 NA NA NA NA
West Asia 11 0.0259993 0.6357593 0.8877299 0.4459374 1.7672086 0.3375603 -0.4217561 1.0968766 NA NA NA NA
West Europe 21 0.0052198 0.7556367 -1.0477267 -1.6634589 -0.6599089 0.3486499 -0.2971674 0.9944671 NA NA NA NA

There was a strong relationship between HHGW - FGN for all rice subpopulations (Table 1). Negative associations were observed in ADMIX (slope = -1.06, p < 0.01), ARO (slope = -3.32, p < 0.01), AUS (slope = -0.69, p < 0.01) and TRJ (slope = -0.95, p < 0.01), while IND (slope = 0.82, p < 0.01) and TEJ (slope = 0.95, p < 0.01) showed positive associations. Across regions, only Africa (slope = -1.14, p = 0.052) showed a significant negative association between these traits.

For HHGW - FLN, all subpopulations showed a strong negative association except TEJ (slope = 1.01, p < 0.01) which had a positive association. Africa (slope = -1.22, p < 0.01), North America (slope = -0.75, p = 0.04), South America (slope = -1.25, p < 0.01) and South Asia (slope = -0.80, p = 0.03) were the only regions with significant relationships and they all exhibited a negative association.

ADMIX (slope = -1.31, p < 0.01), AUS (slope = -0.90, p = 0.03) and TEJ (slope = -1.46, p = 0.01) all showed strong negative associations for HHGW – PNN relationship. East Asia (slope = -0.84, p < 0.01), East Europe (slope = -1.61, p = 0.04), South America (slope = -1.09, p < 0.01), Southeast Asia (slope = -0.84, p < 0.01) and West Asia (slope = -1.00, p = 0.05) all showed a strong negative association.

No significant association was observed across all subpopulations for HHGW- PTH relationship. East Asia (slope = -0.84, p < 0.01), East Europe (slope = -1.61, p < 0.04), South America (slope = -1.09, p < 0.01), Southeast Asia (slope = -0.84, p < 0.01) and West Asia (slope = -0.99, p = 0.05) regions all showed strong negative associations.

Significant associations were not found across subpopulations for HHGW – DTH relationship. Nevertheless, significant negative associations were found only in South America (slope = -1.07, p = 0.04) and South Asia (slope = -0.89, p = 0.04).

Structural Equation Modeling (SEM)

#sem for ADMIX
data<-subset(norm1,subpopulation=='ADMIX')
sitemod <- 'HHGW~FGN+PNN+FLN
            FGN~~PNN
            PNN~~FLN
            FGN~DTH+PTH
            PNN~DTH+PTH
            FLN~DTH+PTH
            DTH~~PTH'        

# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)

#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 25 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         18
##                                                       
##   Number of observations                            56
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                61.587
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               137.088
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.520
##   Tucker-Lewis Index (TLI)                      -1.399
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -409.267
##   Loglikelihood unrestricted model (H1)       -378.474
##                                                       
##   Akaike (AIC)                                 854.535
##   Bayesian (BIC)                               890.991
##   Sample-size adjusted Bayesian (BIC)          834.418
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.591
##   90 Percent confidence interval - lower         0.467
##   90 Percent confidence interval - upper         0.723
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.324
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   HHGW ~                                              
##     FGN               0.403    0.130    3.096    0.002
##     PNN              -0.472    0.150   -3.143    0.002
##     FLN              -0.774    0.110   -7.014    0.000
##   FGN ~                                               
##     DTH               0.233    0.165    1.414    0.157
##     PTH              -0.028    0.145   -0.196    0.844
##   PNN ~                                               
##     DTH               0.123    0.148    0.836    0.403
##     PTH               0.074    0.130    0.570    0.569
##   FLN ~                                               
##     DTH               0.501    0.162    3.090    0.002
##     PTH              -0.075    0.142   -0.527    0.598
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .FGN ~~                                              
##    .PNN              -0.413    0.112   -3.694    0.000
##  .PNN ~~                                              
##    .FLN               0.150    0.084    1.791    0.073
##   DTH ~~                                              
##     PTH               0.463    0.131    3.520    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HHGW              0.579    0.109    5.292    0.000
##    .FGN               0.829    0.157    5.292    0.000
##    .PNN               0.665    0.124    5.361    0.000
##    .FLN               0.803    0.152    5.292    0.000
##     DTH               0.763    0.144    5.292    0.000
##     PTH               0.988    0.187    5.292    0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more. 
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("ADMIX")

#------------------------------------------------------------------------------

#sem for ARO
data<-subset(norm1,subpopulation=='ARO')
sitemod <- 'HHGW~FGN+PNN+FLN
            FGN~~PNN
            PNN~~FLN
            FGN~DTH+PTH
            PNN~DTH+PTH
            FLN~DTH+PTH
            DTH~~PTH'        

# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)

#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 60 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         18
##                                                       
##   Number of observations                             7
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 9.832
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.020
## 
## Model Test Baseline Model:
## 
##   Test statistic                                46.483
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.783
##   Tucker-Lewis Index (TLI)                      -0.085
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)                -11.625
##   Loglikelihood unrestricted model (H1)         -6.709
##                                                       
##   Akaike (AIC)                                  59.249
##   Bayesian (BIC)                                58.276
##   Sample-size adjusted Bayesian (BIC)            5.594
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.570
##   90 Percent confidence interval - lower         0.199
##   90 Percent confidence interval - upper         0.985
##   P-value RMSEA <= 0.05                          0.022
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.094
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   HHGW ~                                              
##     FGN               1.570    0.893    1.758    0.079
##     PNN               3.189    0.912    3.498    0.000
##     FLN              -4.850    1.094   -4.435    0.000
##   FGN ~                                               
##     DTH               0.329    0.189    1.734    0.083
##     PTH              -0.165    0.179   -0.923    0.356
##   PNN ~                                               
##     DTH              -0.101    0.206   -0.491    0.623
##     PTH               0.345    0.195    1.773    0.076
##   FLN ~                                               
##     DTH               0.310    0.127    2.452    0.014
##     PTH               0.044    0.119    0.369    0.712
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .FGN ~~                                              
##    .PNN              -0.029    0.021   -1.370    0.171
##  .PNN ~~                                              
##    .FLN               0.029    0.018    1.595    0.111
##   DTH ~~                                              
##     PTH               0.400    0.255    1.567    0.117
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HHGW              0.215    0.115    1.871    0.061
##    .FGN               0.059    0.032    1.871    0.061
##    .PNN               0.070    0.034    2.068    0.039
##    .FLN               0.026    0.014    1.871    0.061
##     DTH               0.514    0.275    1.871    0.061
##     PTH               0.577    0.308    1.871    0.061
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more. 
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("ARO")

#------------------------------------------------------------------------------

#sem for AUS
data<-subset(norm1,subpopulation=='AUS')
sitemod <- 'HHGW~FGN+PNN+FLN
            FGN~~PNN
            PNN~~FLN
            FGN~DTH+PTH
            PNN~DTH+PTH
            FLN~DTH+PTH
            DTH~~PTH'        

# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)

#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 31 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         18
##                                                       
##   Number of observations                            51
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               200.907
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               290.159
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.281
##   Tucker-Lewis Index (TLI)                      -2.596
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -297.642
##   Loglikelihood unrestricted model (H1)       -197.188
##                                                       
##   Akaike (AIC)                                 631.283
##   Bayesian (BIC)                               666.056
##   Sample-size adjusted Bayesian (BIC)          609.543
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          1.137
##   90 Percent confidence interval - lower         1.007
##   90 Percent confidence interval - upper         1.273
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.392
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   HHGW ~                                              
##     FGN               0.460    0.090    5.102    0.000
##     PNN              -0.562    0.119   -4.732    0.000
##     FLN              -0.982    0.094  -10.395    0.000
##   FGN ~                                               
##     DTH              -0.414    0.193   -2.149    0.032
##     PTH               1.084    0.233    4.657    0.000
##   PNN ~                                               
##     DTH              -0.103    0.156   -0.659    0.510
##     PTH              -0.729    0.188   -3.882    0.000
##   FLN ~                                               
##     DTH              -0.343    0.169   -2.031    0.042
##     PTH               1.035    0.204    5.081    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .FGN ~~                                              
##    .PNN              -0.183    0.083   -2.200    0.028
##  .PNN ~~                                              
##    .FLN              -0.054    0.066   -0.811    0.417
##   DTH ~~                                              
##     PTH               0.009    0.043    0.209    0.834
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HHGW              0.326    0.065    5.050    0.000
##    .FGN               0.703    0.139    5.050    0.000
##    .PNN               0.458    0.091    5.056    0.000
##    .FLN               0.539    0.107    5.050    0.000
##     DTH               0.371    0.073    5.050    0.000
##     PTH               0.255    0.050    5.050    0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more. 
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("AUS")

#------------------------------------------------------------------------------

#sem for IND
data<-subset(norm1,subpopulation=='IND')
sitemod <- 'HHGW~FGN+PNN+FLN
            FGN~~PNN
            PNN~~FLN
            FGN~DTH+PTH
            PNN~DTH+PTH
            FLN~DTH+PTH
            DTH~~PTH'        

# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)

#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 29 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         18
##                                                       
##   Number of observations                            73
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               129.492
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               184.562
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.254
##   Tucker-Lewis Index (TLI)                      -2.730
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -582.198
##   Loglikelihood unrestricted model (H1)       -517.452
##                                                       
##   Akaike (AIC)                                1200.397
##   Bayesian (BIC)                              1241.625
##   Sample-size adjusted Bayesian (BIC)         1184.906
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.760
##   90 Percent confidence interval - lower         0.651
##   90 Percent confidence interval - upper         0.875
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.442
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   HHGW ~                                              
##     FGN               0.708    0.083    8.538    0.000
##     PNN              -0.352    0.119   -2.954    0.003
##     FLN              -0.958    0.106   -9.045    0.000
##   FGN ~                                               
##     DTH              -0.001    0.091   -0.007    0.995
##     PTH              -0.179    0.103   -1.746    0.081
##   PNN ~                                               
##     DTH              -0.129    0.076   -1.707    0.088
##     PTH               0.161    0.086    1.886    0.059
##   FLN ~                                               
##     DTH               0.152    0.085    1.788    0.074
##     PTH              -0.166    0.096   -1.730    0.084
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .FGN ~~                                              
##    .PNN               0.082    0.078    1.054    0.292
##  .PNN ~~                                              
##    .FLN              -0.395    0.097   -4.060    0.000
##   DTH ~~                                              
##     PTH               0.174    0.166    1.051    0.293
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HHGW              0.489    0.081    6.042    0.000
##    .FGN               0.943    0.156    6.042    0.000
##    .PNN               0.655    0.108    6.061    0.000
##    .FLN               0.827    0.137    6.042    0.000
##     DTH               1.588    0.263    6.042    0.000
##     PTH               1.246    0.206    6.042    0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more. 
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("IND")

#------------------------------------------------------------------------------

#sem for TEJ
data<-subset(norm1,subpopulation=='TEJ')
sitemod <- 'HHGW~FGN+PNN+FLN
            FGN~~PNN
            PNN~~FLN
            FGN~DTH+PTH
            PNN~DTH+PTH
            FLN~DTH+PTH
            DTH~~PTH'        

# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)

#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 28 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         18
##                                                       
##   Number of observations                            92
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               196.728
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               334.740
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.394
##   Tucker-Lewis Index (TLI)                      -2.029
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -573.818
##   Loglikelihood unrestricted model (H1)       -475.454
##                                                       
##   Akaike (AIC)                                1183.636
##   Bayesian (BIC)                              1229.028
##   Sample-size adjusted Bayesian (BIC)         1172.210
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.838
##   90 Percent confidence interval - lower         0.741
##   90 Percent confidence interval - upper         0.939
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.169
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   HHGW ~                                              
##     FGN              -0.016    0.107   -0.150    0.881
##     PNN              -0.424    0.156   -2.711    0.007
##     FLN              -0.073    0.126   -0.580    0.562
##   FGN ~                                               
##     DTH               0.533    0.096    5.531    0.000
##     PTH               0.124    0.125    0.996    0.319
##   PNN ~                                               
##     DTH               0.087    0.079    1.103    0.270
##     PTH              -0.353    0.102   -3.454    0.001
##   FLN ~                                               
##     DTH               0.511    0.089    5.777    0.000
##     PTH               0.146    0.114    1.275    0.202
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .FGN ~~                                              
##    .PNN               0.081    0.039    2.066    0.039
##  .PNN ~~                                              
##    .FLN              -0.194    0.045   -4.336    0.000
##   DTH ~~                                              
##     PTH               0.286    0.070    4.109    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HHGW              0.658    0.097    6.782    0.000
##    .FGN               0.518    0.076    6.782    0.000
##    .PNN               0.350    0.051    6.845    0.000
##    .FLN               0.436    0.064    6.782    0.000
##     DTH               0.780    0.115    6.782    0.000
##     PTH               0.468    0.069    6.782    0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more. 
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("TEJ")

#------------------------------------------------------------------------------

#sem for TRJ
data<-subset(norm1,subpopulation=='TRJ')
sitemod <- 'HHGW~FGN+PNN+FLN
            FGN~~PNN
            PNN~~FLN
            FGN~DTH+PTH
            PNN~DTH+PTH
            FLN~DTH+PTH
            DTH~~PTH'        

# model/sem anylysis
sitemod.fit <- lavaan::sem(sitemod, data = data)

#summary of the results
summary(sitemod.fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 28 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         18
##                                                       
##   Number of observations                            85
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               130.740
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               226.559
##   Degrees of freedom                                15
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.396
##   Tucker-Lewis Index (TLI)                      -2.019
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -517.290
##   Loglikelihood unrestricted model (H1)       -451.920
##                                                       
##   Akaike (AIC)                                1070.580
##   Bayesian (BIC)                              1114.548
##   Sample-size adjusted Bayesian (BIC)         1057.762
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.708
##   90 Percent confidence interval - lower         0.607
##   90 Percent confidence interval - upper         0.814
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.216
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   HHGW ~                                              
##     FGN              -0.051    0.079   -0.645    0.519
##     PNN              -0.938    0.234   -4.002    0.000
##     FLN              -0.611    0.081   -7.525    0.000
##   FGN ~                                               
##     DTH               0.558    0.189    2.957    0.003
##     PTH              -0.092    0.136   -0.675    0.500
##   PNN ~                                               
##     DTH               0.058    0.067    0.856    0.392
##     PTH              -0.087    0.049   -1.789    0.074
##   FLN ~                                               
##     DTH               0.349    0.191    1.823    0.068
##     PTH               0.030    0.138    0.215    0.829
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .FGN ~~                                              
##    .PNN              -0.049    0.033   -1.474    0.140
##  .PNN ~~                                              
##    .FLN              -0.071    0.034   -2.069    0.039
##   DTH ~~                                              
##     PTH               0.233    0.060    3.868    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HHGW              0.500    0.077    6.519    0.000
##    .FGN               0.864    0.133    6.519    0.000
##    .PNN               0.110    0.017    6.528    0.000
##    .FLN               0.890    0.136    6.519    0.000
##     DTH               0.363    0.056    6.519    0.000
##     PTH               0.698    0.107    6.519    0.000
#plot our CFA. you can change layout with layout = argument. see ?semPaths() for more. 
semPlot::semPaths(sitemod.fit, "par", edge.label.cex = 1.2, fade = FALSE)
title("TRJ")

Based on the Root Mean Square Error of Approximation (RMSEA > 0.5) and Standardized Root Mean Square Residual (SRMSR < 0.5), the model was a good fit for the data. Test weight was significantly affected in all rice sub-populations.

Direct and Indirect Effects

For all subpopulations, a negative direct effect on HHGW was observed in FLN. In ADMIX subpopulation, FGN, PNN and FLN accounted for 58% (R2 = 0.58) of the variation in HHGW. FLN had the strongest negative effect on HHGW (β = -0.77, p < 0.05), this implies that if FLN was increased by one standard deviation with FGN and PNN held constant, HHGW will decrease by 0.77 standard deviations. PNN negatively affected HHGW (β = -0.47), while FGN positively affected HHGW (β = 0.40) at p < 0.05. There was a significant indirect effect (p < 0.05) of PTH on HHGW via DTH and FLN. The double-headed arrow between DTH and PTH indicates a significant relationship (p < 0.05) but assumes no causality.

In ARO subpopulation, FLN, PNN and FGN accounted for 22 % (R2 = 0.22) of the variation in HHGW. The strongest negative direct effect was observed in FLN (β = -0.85, p < 0.05), while PNN and FGN had positive effects on HHGW. All indirect effects of PTH and DTH via FGN, PNN and FLN were all insignificant (p > 0.05). In AUS subpopulation, all agronomic traits accounted for 33% (R2 = 0.33) of the variation in HHGW. The strongest direct effect on HHGW was observed in FLN (β = -0.98, p < 0.05). PNN had a direct negative effect (β = -0.56, p < 0.05) while FGN had a direct positive effect (β = 0.46, p < 0.05). PTH and DTH had significant indirect effects on HHGW. PTH had effect on HHGW via FLN, PNN and FGN while DTH effect was via FGN and FLN.

For IND, the only significant direct effects on HHGW were observed in FLN (β = -0.96, p < 0.05) and FGN (β = 0.71, p < 0.05) which had negative and positive respectively. These traits accounted for 49% (R2 = 0.49) of the variation in HHGW. All indirect effects were insignificant (p > 0.05). For TEJ, only PNN accounted for all the variation in HHGW (R2 = 0.66) and was the only trait with a significant direct effect on HHGW (β = -0.42, p < 0.05). A significant indirect effect on HHGW was observed in PTH via DTH, FLN and PNN and also a negative effect via PNN (p < 0.05). DTH also had a significant indirect effect via FLN and PNN (p < 0.05).

For TRJ, FLN and PNN accounted for the total variation in HHGW (R2 = 0.50). Both traits had a negative direct effect on HHGW, (β = -0.61, p < 0.05) and (β = -0.94, p < 0.05) for FLN and PNN respectively. A significant indirect effect was observed in PTHT via DTH and FLN (p < 0.05), while other indirect effects were insignificant (p > 0.05).

Path Summary

# Path summary
# I created a result summary table on excel
sem <- read.csv("./raw_data/SEM_result.csv")

sem %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")          # Using KableExtra table formatting
Effect ADMIX ARO AUS IND TEJ TRJ Proposed.interpretation
FLN_HHG -0.77 -4.88 0.98 -0.98 NS -0.61 HHG variation is strongly influenced by FLN; Decreasing FLN increases grain weight except in TEJ
PNN_HHG -0.44 0.18 -0.55 NS -0.42 -0.9 HHG variation is strongly influenced by PNN except in IND
FGN_HHG -0.42 1.57 0.46 0.71 NS NS Increase in filled grain per panicle increases test weight except in TEJ and TRJ
DTH_PTH -0.46 NS NS NS 0.21 0.23 There is a relationship between DTH and PTH in ADMIX, TEJ and TRJ, although causality is not asumed
DTH_FGN NS NS -0.41 NS 0.53 0.56 DTH either increases or decreses FGN depending on the rice subpopulation
DTH_PNN NS NS NS NS NS NS Panicle number is not dependent on days to heading
DTH_FLN 0.5 NS NS NS 0.53 0.35 DTH regulates FLN
PTH_FGN NS NS NS NS NS NS PTH controls FGN in AUS
PTH_FLN NS NS NS NS 0.5 NS PTH controls FLN in TEJ
PTH_PNN NS NS NS NS -0.35 NS PTH controls PNN in TEJ

Predicting test weight with Florescence per panicle

# Does florescence per panicle predict test weight?

# Explore data
ggplot(norm1,
       aes(x = HHGW, y = FLN)) +
  geom_point() +
  stat_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

# Fit a model
rice_mlr <- glm(FLN ~ HHGW + region,
                 data = norm1,
                 family = gaussian(link = "identity"))

# assumptions
plot(rice_mlr, which = 1)

plot(rice_mlr, which = 2)

shapiro.test(residuals(rice_mlr))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(rice_mlr)
## W = 0.96349, p-value = 6.921e-08
plot(rice_mlr, which = 4)

# view residuals
residualPlots(rice_mlr)

##      Test stat Pr(>|Test stat|)
## HHGW    0.0214           0.8838
# include residuals into data
norm1 <- norm1 %>%
  add_residuals(rice_mlr)

# plot residuals 
qplot(subpopulation, resid, data = norm1, geom = "boxplot")

# test of interaction between mass and species
rice_int <- glm(FLN ~ HHGW * region * subpopulation,
                 data = norm1,
                 family = gaussian(link = "identity"))

residualPlots(rice_int)

##      Test stat Pr(>|Test stat|)
## HHGW    0.0028            0.958
Anova(rice_int)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FLN
##                           LR Chisq Df Pr(>Chisq)    
## HHGW                        22.643  1  1.950e-06 ***
## region                      10.697 12    0.55507    
## subpopulation               41.480  5  7.504e-08 ***
## HHGW:region                  8.996 12    0.70330    
## HHGW:subpopulation           6.178  5    0.28931    
## region:subpopulation        45.525 30    0.03447 *  
## HHGW:region:subpopulation   16.529 19    0.62177    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(rice_mlr)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FLN
##        LR Chisq Df Pr(>Chisq)    
## HHGW     19.288  1  1.124e-05 ***
## region   40.182 12  6.708e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# post-hocs what is the effect of subpopulation and region at the average level of HHGW
rice_em <- emmeans(rice_mlr, specs = ~ region,
                    at = list(HHGW = 4))  # for specificity


# check which slopes are statistically different
rice_em %>%
  contrast(method = "tukey") %>%
  data.frame() %>%
  kbl(format = 'html') %>%
  kable_paper() %>%
  scroll_box(width = "700px",
             height = "350px")
contrast estimate SE df z.ratio p.value
Africa - Caribbean 0.0337395 0.4092806 Inf 0.0824361 1.0000000
Africa - Central America -0.0732953 0.4088549 Inf -0.1792698 1.0000000
Africa - Central Asia 0.9839374 0.4125749 Inf 2.3848698 0.4546777
Africa - East Asia 0.5121375 0.1945159 Inf 2.6328831 0.2902848
Africa - East Europe 0.6681605 0.2842514 Inf 2.3505973 0.4795893
Africa - North America -0.1902123 0.2119240 Inf -0.8975497 0.9995857
Africa - Oceania 0.4180241 0.4131480 Inf 1.0118022 0.9986235
Africa - South America 0.2539267 0.2304817 Inf 1.1017221 0.9968816
Africa - South Asia 0.4909530 0.2105321 Inf 2.3319626 0.4932626
Africa - Southeast Asia 0.0977997 0.2125899 Inf 0.4600395 0.9999997
Africa - West Asia 1.1566380 0.3214016 Inf 3.5987313 0.0193922
Africa - West Europe 0.3249966 0.2551319 Inf 1.2738379 0.9884219
Caribbean - Central America -0.1070348 0.5362086 Inf -0.1996141 1.0000000
Caribbean - Central Asia 0.9501979 0.5372101 Inf 1.7687641 0.8653087
Caribbean - East Asia 0.4783980 0.3954042 Inf 1.2098963 0.9926346
Caribbean - East Europe 0.6344211 0.4489458 Inf 1.4131352 0.9725897
Caribbean - North America -0.2239518 0.4050367 Inf -0.5529173 0.9999978
Caribbean - Oceania 0.3842846 0.5374829 Inf 0.7149708 0.9999625
Caribbean - South America 0.2201873 0.4154044 Inf 0.5300552 0.9999987
Caribbean - South Asia 0.4572135 0.4017533 Inf 1.1380455 0.9957791
Caribbean - Southeast Asia 0.0640602 0.4044683 Inf 0.1583814 1.0000000
Caribbean - West Asia 1.1228985 0.4716060 Inf 2.3810096 0.4574661
Caribbean - West Europe 0.2912571 0.4326647 Inf 0.6731706 0.9999806
Central America - Central Asia 1.0572327 0.5379487 Inf 1.9653040 0.7557753
Central America - East Asia 0.5854329 0.3963321 Inf 1.4771271 0.9612775
Central America - East Europe 0.7414559 0.4486327 Inf 1.6527013 0.9130482
Central America - North America -0.1169170 0.4055887 Inf -0.2882649 1.0000000
Central America - Oceania 0.4913194 0.5382968 Inf 0.9127296 0.9995083
Central America - South America 0.3272221 0.4157816 Inf 0.7870046 0.9998949
Central America - South Asia 0.5642483 0.4034667 Inf 1.3985001 0.9747789
Central America - Southeast Asia 0.1710950 0.4054356 Inf 0.4220030 0.9999999
Central America - West Asia 1.2299333 0.4720944 Inf 2.6052697 0.3068046
Central America - West Europe 0.3982919 0.4316315 Inf 0.9227592 0.9994506
Central Asia - East Asia -0.4717999 0.3941696 Inf -1.1969464 0.9933115
Central Asia - East Europe -0.3157769 0.4516975 Inf -0.6990892 0.9999707
Central Asia - North America -1.1741497 0.4050387 Inf -2.8988582 0.1593394
Central Asia - Oceania -0.5659133 0.5361288 Inf -1.0555548 0.9979246
Central Asia - South America -0.7300107 0.4159540 Inf -1.7550275 0.8716449
Central Asia - South Asia -0.4929844 0.3977910 Inf -1.2393052 0.9908876
Central Asia - Southeast Asia -0.8861377 0.4030556 Inf -2.1985494 0.5922718
Central Asia - West Asia 0.1727006 0.4715593 Inf 0.3662330 1.0000000
Central Asia - West Europe -0.6589408 0.4379017 Inf -1.5047686 0.9554425
East Asia - East Europe 0.1560230 0.2677101 Inf 0.5828058 0.9999961
East Asia - North America -0.7023499 0.1786253 Inf -3.9319726 0.0056072
East Asia - Oceania -0.0941134 0.3942204 Inf -0.2387331 1.0000000
East Asia - South America -0.2582108 0.2020626 Inf -1.2778753 0.9881016
East Asia - South Asia -0.0211846 0.1623590 Inf -0.1304798 1.0000000
East Asia - Southeast Asia -0.4143378 0.1743633 Inf -2.3762901 0.4608816
East Asia - West Asia 0.6445004 0.3003711 Inf 2.1456808 0.6311814
East Asia - West Europe -0.1871409 0.2433461 Inf -0.7690318 0.9999179
East Europe - North America -0.8583729 0.2807830 Inf -3.0570690 0.1055434
East Europe - Oceania -0.2501364 0.4521913 Inf -0.5531651 0.9999978
East Europe - South America -0.4142338 0.2951159 Inf -1.4036310 0.9740272
East Europe - South Asia -0.1772076 0.2791821 Inf -0.6347383 0.9999899
East Europe - Southeast Asia -0.5703608 0.2810882 Inf -2.0291166 0.7136716
East Europe - West Asia 0.4884774 0.3704347 Inf 1.3186602 0.9844510
East Europe - West Europe -0.3431639 0.3153776 Inf -1.0881050 0.9972271
North America - Oceania 0.6082364 0.4052299 Inf 1.5009663 0.9562811
North America - South America 0.4441391 0.2209937 Inf 2.0097362 0.7267339
North America - South Asia 0.6811653 0.1894020 Inf 3.5963995 0.0195514
North America - Southeast Asia 0.2880121 0.1976721 Inf 1.4570191 0.9651515
North America - West Asia 1.3468503 0.3137507 Inf 4.2927399 0.0012543
North America - West Europe 0.5152090 0.2559775 Inf 2.0127116 0.7247432
Oceania - South America -0.1640974 0.4162045 Inf -0.3942709 1.0000000
Oceania - South Asia 0.0729289 0.3975169 Inf 0.1834611 1.0000000
Oceania - Southeast Asia -0.3202244 0.4030811 Inf -0.7944416 0.9998839
Oceania - West Asia 0.7386139 0.4717179 Inf 1.5657958 0.9403280
Oceania - West Europe -0.0930275 0.4386902 Inf -0.2120573 1.0000000
South America - South Asia 0.2370262 0.2126491 Inf 1.1146353 0.9965213
South America - Southeast Asia -0.1561270 0.2191544 Inf -0.7124065 0.9999640
South America - West Asia 0.9027112 0.3273863 Inf 2.7573277 0.2225328
South America - West Europe 0.0710699 0.2708806 Inf 0.2623661 1.0000000
South Asia - Southeast Asia -0.3931532 0.1824945 Inf -2.1543297 0.6248588
South Asia - West Asia 0.6656850 0.3068328 Inf 2.1695369 0.6136967
South Asia - West Europe -0.1659563 0.2596732 Inf -0.6390970 0.9999891
Southeast Asia - West Asia 1.0588382 0.3120686 Inf 3.3929662 0.0387769
Southeast Asia - West Europe 0.2271969 0.2582749 Inf 0.8796709 0.9996630
West Asia - West Europe -0.8316413 0.3520620 Inf -2.3622016 0.4711173
# plot comparison
multcomp::cld(rice_em, adjust="tukey") %>%
  ggplot(aes(x = region, y = emmean, 
             ymin = asymp.LCL, ymax = asymp.UCL,
             color = factor(.group))) +
  geom_pointrange() +
  coord_flip() +
  labs(title = "Slope comparison across regions")
## Note: adjust = "tukey" was changed to "sidak"
## because "tukey" is only appropriate for one set of pairwise comparisons

# visualize!
rice_newdat <- modelr::data_grid(norm1,
                                  region = unique(region),
                                  HHGW = seq_range(HHGW, n = 100))


# create data predictions
rice_predict <- predict(rice_mlr,
                         newdata = rice_newdat,
                         type = "response",
                         interval = "confidence")

# add data predictions
rice_newdat <- rice_newdat %>%
  mutate(FLN = rice_predict)

# plot regression on predicted data
ggplot(norm1,
       aes(x = HHGW, y = FLN, color = region)) +
  geom_point()+
  geom_line(data = rice_newdat) +
  labs(title = "Regression on predicted data")

# compare means of new fit
rice_newfit <- emmeans(rice_mlr, specs = ~ region + HHGW,
                        at = list(HHGW = seq(1,3, length.out = 100))) %>%
  as_tibble() %>%
  mutate(FLN = emmean)

# plot regresion slopes with prediction intervals
ggplot(norm1,
       aes(x = HHGW, y = FLN, color = region)) +
  geom_point() +
  geom_line(data = rice_newfit) +
  geom_ribbon(data = rice_newfit,
              aes(ymin = asymp.LCL, ymax = asymp.UCL, group = region, colour = region),
              alpha = 0.1, color = "lightgrey") +
  theme_classic() +
  labs(title = "Regression on predicted data showing prediction intervals")

The linear regression passed all necessary assumptions. All residuals were clustered close to the regression line and the cook distance was 0.04 indicating normality of data. Comparison of slopes with ‘emmeans’ showed that slopes were not statistically different. Regression with the predicted data revealed slope alignment within the predicted values while all regression lines fell within the prediction interval.

Discussion

Total florets per panicle has the most significant association with test weight which agrees with the result of (Ren et al., 2020), where altering meristem determinacy and restoring lateral floret formation significantly increased grains per panicle and subsequently yield. Plant height is a significant morphological trait influencing yield performance. Molecular cloning and functional analyses of several genes associated with PTH in rice have revealed connections to the synthesis and regulation of the phytohormone (gibberellin) (Miedaner et al., 2019). TEJ was observed to have the highest test weight which shows a significant relationship with plant height based on the SEM analysis. Indirect effect traits like DTH affects test weight via FLN and should be fully considered for breeding purposes in TEJ and TRJ subpopulations.

Conclusion

This research was able to assess the relationship between test weight and other agronomic traits and identify significant associations. The major determinant of test weight is the total floret per panicle except in Temperate japonica, where test weight is determined by plant height. Also the prediction of test weight across regions using floret number gave a positive result. Results obtained from this study will improve crop management techniques in rice breeders.

References

Deivasigamani S, & Swaminathan C. (2018). Evaluation of Seed Test Weight on Major Field Crops. Www.Arcjournals.Org International Journal of Research Studies in Agricultural Sciences, 4(1), 2454–6224. crossref

Eizenga, G. C., Ali, M. L., Bryant, R. J., Yeater, K. M., McClung, A. M., & McCouch, S. R. (2014). Registration of the Rice Diversity Panel 1 for Genomewide Association Studies. Journal of Plant Registrations, 8(1), 109–116. crossref

Li, R., Li, M., Ashraf, U., Liu, S., & Zhang, J. (2019). Exploring the relationships between yield and yield-related traits for rice varieties released in china from 1978 to 2017. Frontiers in Plant Science. crossref

Miedaner, T., Laidig, F., Zhao, J., Zhang, J., Li, R., Li, M., Ashraf, U., & Liu, S. (2019). Exploring the Relationships Between Yield and Yield-Related Traits for Rice Varieties Released in China From 1978 to 2017. crossref

Prakash Poudel, A. (2018). Analysis of Growth and Yield Attributing Characteristic of Direct Seeded Upland Rice Genotypes in Western Hills of Nepal. In Acta Scientific Agriculture (Vol. 2).

Puji Lestari, A., Sopandie, D., & Aswidinnoor, H. (2015). Panicle Length and Weight Performance of F3 Population from Local and Introduction Hybridization of Rice Varieties. HAYATI Journal of Biosciences April, 22(2), 87–92. crossref

Regression, M., & Models, E. (2005). c ontRibutions Commentary. Commentary, 86(October), 283–295. [crossref](http://www.esajournals.org/doi/abs/10.1890/0012-9623(2007)88[72:AHOTES]2.0.CO;2

Ren, D., Li, Y., He, G., & Qian, Q. (2020). Multifloret spikelet improves rice yield. New Phytologist, 225(6), 2301–2306.[crossref] (https://doi.org/10.1111/nph.16303)

Saleh, M. M., Salem, K. F. M., & Elabd, A. B. (2020). Definition of selection criterion using correlation and path coefficient analysis in rice (Oryza sativa L .) genotypes.

Venkateswarlu, B., Prasad, G. S. V, & Prasad, A. S. R. (n.d.). STUDIES ON THE NATURE OF RELATIONSHIPS BETWEEN GRAIN SIZE , SPIKELET NUMBER , GRAIN YIELD AND SPIKELET FILLING IN LATE DURATION VARIETIES OF RICE ( OR YZA SA TIVA L .) Grain number Grain size Productivity Rate of grain growth Rice Spikelet filling. 130(1981), 123–130.