Workflow

  1. Loading libraries and data
  • Loading necessary libraries
  • Loading necessary data
  1. Estimation of the number of bi-clusters
  • Seriation of data
  • Seriation of normalized data
  • Estimation of the number of bi-clusters
  1. Applying iBBiG on normalized data
  • Calculation of top league bi-cluster
  • Calculation of F-tests of the top league bi-cluster
  • Heat maps of the top league
  • Checking stability of the top league bi-cluster
  • Calculation of partial rankings
  • Comparison of partial rankings to the adequate part of the original ranking
  1. Applying iBBiG on reversed normalized data
  • Calculation of lower league bi-cluster
  • Calculation of F-tests of the lower league bi-cluster
  • Heat maps of the lower league
  • Checking stability of the lower league bi-cluster
  • Calculation of partial rankings
  • Comparison of partial rankings to the adequate part of the original ranking
  1. Applying BicARE
  • Calculation of middle league bi-cluster
  • Calculation of F-tests of the middle league bi-cluster
  • Heat maps of the middle league
  • Checking stability of the middle league bi-cluster
  • Calculation of partial rankings
  • Comparison of partial rankings to the adequate part of the original ranking
  1. Specifying overlaps
  • Venn diagram of indicators
  • Venn diagram of countries
  1. Summing up the conclusions

Loading libraries and data

Loading necessary libraries

#Seriation and biclustering
library(seriation) #For seriate a matrix
library(biclust) #General biclustering functions
library(iBBiG) #iBBiG biclustering
library(BicARE) #BicARE biclustering
library(BcDiag) #Diagnostics of biclusters

#Additional statistic packages
library(factoextra) #Additional distance functions
library(corrgram) #Correlation diagram
library(VennDiagram) #VennDiagram
library(venneuler) #VennDiagram

#For formatted and interactive tables and figures
library(knitr) #Formatted tables, figures

Citations

#Seriation and biclustering
print(citation("seriation"),style="text") #For seriate a matrix
## Hahsler M, Buchta C, Hornik K (2018). _seriation: Infrastructure
## for Ordering Objects Using Seriation_. R package version 1.2-3,
## <URL: https://CRAN.R-project.org/package=seriation>.
## 
## Hahsler M, Hornik K, Buchta C (2008). "Getting things in order: An
## introduction to the R package seriation." _Journal of Statistical
## Software_, *25*(3), 1-34. ISSN 1548-7660, <URL:
## http://www.jstatsoft.org/v25/i03/>.
print(citation("biclust"),style="text") #General biclustering functions
## Kaiser S, Santamaria R, Khamiakova T, Sill M, Theron R, Quintales
## L, Leisch F, De Troyer. E (2018). _biclust: BiCluster Algorithms_.
## R package version 2.0.1, <URL:
## https://CRAN.R-project.org/package=biclust>.
print(citation("iBBiG"),style="text") #iBBiG biclustering
## Gusenleitner D, Culhane A (2011). _iBBiG: Iterative Binary
## Biclustering of Genesets_. R package version 1.24.0, <URL:
## http://bcb.dfci.harvard.edu/~aedin/publications/>.
print(citation("BicARE"),style="text") #BicARE biclustering
## Gestraud P (2008). _BicARE: Biclustering Analysis and Results
## Exploration_. R package version 1.38.0, <URL:
## http://bioinfo.curie.fr>.
print(citation("BcDiag"),style="text") #Diagnostics of biclusters
## Mengsteab A, Otava M, Khamiakova T, De Troyer E (2015). _BcDiag:
## Diagnostics Plots for Bicluster Data_. R package version 1.0.10,
## <URL: https://CRAN.R-project.org/package=BcDiag>.
#Additional statistic packages
print(citation("factoextra"),style="text") #Additional distance functions
## Kassambara A, Mundt F (2017). _factoextra: Extract and Visualize
## the Results of Multivariate Data Analyses_. R package version
## 1.0.5, <URL: https://CRAN.R-project.org/package=factoextra>.
print(citation("VennDiagram"),style="text") #VennDiagram
## Chen H (2018). _VennDiagram: Generate High-Resolution Venn and
## Euler Plots_. R package version 1.6.20, <URL:
## https://CRAN.R-project.org/package=VennDiagram>.
print(citation("venneuler"),style="text") #VennDiagram
## Wilkinson L (2011). _venneuler: Venn and Euler Diagrams_. R
## package version 1.1-0, <URL:
## https://CRAN.R-project.org/package=venneuler>.
#For formatted and interactive tables and figures
print(citation("knitr"),style="text") #Formatted tables, figures
## Xie Y (2018). _knitr: A General-Purpose Package for Dynamic Report
## Generation in R_. R package version 1.20, <URL:
## https://yihui.name/knitr/>.
## 
## Xie Y (2015). _Dynamic Documents with R and knitr_, 2nd edition.
## Chapman and Hall/CRC, Boca Raton, Florida. ISBN 978-1498716963,
## <URL: https://yihui.name/knitr/>.
## 
## Xie Y (2014). "knitr: A Comprehensive Tool for Reproducible
## Research in R." In Stodden V, Leisch F, Peng RD (eds.),
## _Implementing Reproducible Computational Research_. Chapman and
## Hall/CRC. ISBN 978-1466561595, <URL:
## http://www.crcpress.com/product/isbn/9781466561595>.

Loading necessary data

load("U21_BIC.RData")
#ALL: 50 by 24 dataframe of normalized data, with two letters indicator names
#U21: 50 by 24 dataframe of normalized data, with original indicator names
#mtx: 50 by 24 matrix of normalized data, with two letters indicator names
#orig_mtx: 50 by 24 matrix of unnormalized data, with two letters indicator names
#U21_rank: 50 by 1 matrix (vector) of U21'S country ranks
#weights: 1 by 24 matrix (vector) of weights 

Estimation of the number of bi-clusters

Seriation of data

Firstly, to obtain a preliminary picture of the possible bi-clusters, the seriation of the original data is done.

ORIG_SER <- c(seriate(get_dist(orig_mtx,"euclidean"),method="HC_COMPLETE"),seriate(get_dist(t(orig_mtx),"spearman"),method="HC_COMPLETE"))
ORIG_ORDERED<-orig_mtx[get_order(ORIG_SER,dim=1),get_order(ORIG_SER,dim=2)]
rownames(ORIG_ORDERED) <- rownames(mtx[get_order(ORIG_SER,dim=1),get_order(ORIG_SER,dim=2)])

After seriation, plotting the heat map of the original data (see Fig. 1) can help to detect the possible bi-clusters. On Fig. 1 blue cells indicate a lower value of the indicators, and red cells mean a higher value of the indicators.

hmap(as.matrix(ORIG_ORDERED),col=bluered(100),showdist="both")
**Fig. 1** Heat map of the original matrix after seriation

Fig. 1 Heat map of the original matrix after seriation

Seriation of normalized data

After the normalization of the original data, seriation is done on the normalized data.

ALL_SER <- c(seriate(get_dist(ALL,"euclidean"),method="HC_COMPLETE"),seriate(get_dist(t(ALL),"spearman"),method="HC_COMPLETE"))
ALL_ORDERED<-ALL[get_order(ALL_SER,dim=1),get_order(ALL_SER,dim=2)]
rownames(ALL_ORDERED) <- rownames(mtx[get_order(ALL_SER,dim=1),get_order(ALL_SER,dim=2)])

Estimating the number of bi-clusters

After seriation, plotting the heat map of the normalized original data (see Fig. 2) can help to detect the possible bi-clusters.

hmap(as.matrix(ALL_ORDERED),col=bluered(100),showdist="both")
**Fig. 2** Heat map of the normalized and seriated matrix

Fig. 2 Heat map of the normalized and seriated matrix

After seriation, two bigger homogeneous blocks can be identified based on Fig. 2. The block of the red cells on the top left corner on Fig. 2 indicates the top league, while the bigger blue block, which indicates the remaining (lower) league, can be discovered in the bottom of the figure. The dendrogram of two-way clustering also shows that regarding rows and columns two main blocks can be specified.

Applying iBBiG on normalized data

Applying the iBBiG method on the original normalized data can specify the top league bi-clusters. First of all, the number of bi-clusters needed to be determined.

res <- iBBiG(binaryMatrix=binarize(mtx,threshold = 0.50),nModules = 2,alpha=0.05,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.8)
## Module:  1 ... done
## Module:  2 ... done
summary(res)
## 
## An object of class iBBiG 
## 
## Number of Clusters found:  2 
## 
## First  2  Cluster scores and sizes:
##                         M 1      M 2
## Cluster Score      287.3794 78.86695
## Number of Rows:     23.0000 22.00000
## Number of Columns:  19.0000  5.00000

The results above show that the iBBiG algorithm found 2 bi-clusters. In the first bi-cluster there are 23 rows (countries) and 19 columns (indicators), in the second bi-cluster there are 22 rows (countries) and 5 columns (indicators). The next step is to calculate F-tests to determine which bi-cluster is significant.

Obs.FStat <- NULL
for (i in c(1:res@Number)){
  Obs.FStat[[i]] <- computeObservedFstat(x=mtx,bicResult=res,number=i)
}
kable(Obs.FStat,caption = "**Table 1** The results of row and column effects for two bi-clusters")
Table 1 The results of row and column effects for two bi-clusters
Fstat PValue
Row Effect 5.357729 0.000000
Column Effect 17.509067 0.000000
Tukey test 4.561540 0.033312
Fstat PValue
Row Effect 1.3197912 0.1865824
Column Effect 17.5620858 0.0000000
Tukey test 0.2274441 0.6346788

In Table 1 the results of the F-tests can be seen. In the case of the first bi-cluster, both the row and column effect are significant. On the other hand, in the case of the second bi-cluster the row effect is not significant; thus the second bi-cluster is not significant. Only the first bi-cluster is significant which is the top league (League A).

Calculating top league bi-cluster

The next step is to identify the top league bi-cluster.

res <- iBBiG(binaryMatrix=binarize(mtx,threshold = 0.50),nModules = 1,alpha=0.05,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.8)
## Module:  1 ... done
summary(res)
## 
## An object of class iBBiG 
## 
## There was one cluster found with Score 287.3794 and 
##   23 Rows and  19 columns

The results of the iBBiG algorithm can be seen above. The algorithm found one bi-cluster in which there are 23 rows (countries) and 19 columns (indicators). The following step is to calculate the F-tests to determine whether the bi-cluster is significant or not.

Calculating F-tests of the top league bi-cluster

In Table 2 the results of the row and column effects (F-tests) can be read. Both the row and column effects are significant, so the top league bi-cluster is significant.

Obs.FStat <- NULL
for (i in c(1:res@Number)){
  Obs.FStat[[i]] <- computeObservedFstat(x=mtx,bicResult=res,number=i)
}
kable(Obs.FStat,caption = "**Table 2** The results of row and column effects for the top league")
Table 2 The results of row and column effects for the top league
Fstat PValue
Row Effect 5.357729 0.000000
Column Effect 17.509067 0.000000
Tukey test 4.561540 0.033312

To see the differences between those countries that are included in the top league and those that are not in the top league, one can plot the means, medians, variances, and median absolute deviations (MAD) for rows (countries) (see Fig. 3). (The horizontal axis of Fig. 3 shows the countries, the vertical axis shows the mentioned parameters on normalized (0-1) data of indicators.) On Fig. 3 the red line indicates those countries that are included in the top league, while the black line shows the remaining countries. The countries in the top league have higher means and medians, and have lower variances.

exploreBic(dset=mtx,bres=res,mname='biclust',pfor='all',gby='genes',bnum=1)
**Fig. 3** Mean, median, variance and MAD for the top league countries

Fig. 3 Mean, median, variance and MAD for the top league countries

One can do the same for the columns (indicators). Fig. 4 shows the differences between the means, medians, variances, and median absolute deviations (MAD) for the indicators. (The horizontal axis of Fig. 4 shows the indicators, the vertical axis shows the mentioned parameters on normalized (0-1) data of countries.) The red line indicates those indicators that are in the top league, while the black line shows the remaining indicators.

exploreBic(dset=mtx,bres=res,mname='biclust',pfor='all',gby='conditions',bnum=1)
**Fig. 4** Mean, median, variance and MAD for the top league indicators

Fig. 4 Mean, median, variance and MAD for the top league indicators

Heat maps of the top league

Fig. 5 depicts the heat map of the top league. The top league contains 23 countries and 19 indicators. The red cells mean higher values in the case of indicators, while blue cells symbol lower values.

drawHeatmap(x=mtx,bicResult=res,number=1,beamercolor = TRUE,paleta=bluered(100))
**Fig. 5** Heat map of the top league

Fig. 5 Heat map of the top league

Fig. 6 shows the heat map of the results of iBBiG on normalized data. The top league can be seen in the upper left corner of Fig. 6 which is enlarged on Fig.5.

drawHeatmap(x=mtx,bicResult=res,local=FALSE,number=1,beamercolor = TRUE,paleta=bluered(100))
**Fig. 6** Heat map of the output after applying iBBiG on normalized data

Fig. 6 Heat map of the output after applying iBBiG on normalized data

Checking stability of the top league bi-cluster

The next step is to check the stability of the top league bi-cluster. The results of the bootstrap can be seen below.

Bootstrap <- diagnoseColRow(x=as.matrix(ALL),bicResult=res,number=1,
  nResamplings=100,replace=TRUE)
Bootstrap
## $bootstrapFstats
##         fval.row  fval.col
##   [1,] 1.2026837 1.3205742
##   [2,] 1.3676736 1.7373911
##   [3,] 0.8118937 0.8880638
##   [4,] 1.5261295 0.2777014
##   [5,] 1.3993341 1.1837395
##   [6,] 0.6385318 1.4633654
##   [7,] 0.6782392 0.8177886
##   [8,] 0.6932455 0.7241169
##   [9,] 1.2680688 0.9662818
##  [10,] 1.2486366 1.2786787
##  [11,] 0.7145626 0.4491881
##  [12,] 1.2788351 0.8709748
##  [13,] 1.3980956 1.0535610
##  [14,] 0.7759009 0.7984441
##  [15,] 0.6002676 0.8329667
##  [16,] 0.6129999 0.6856887
##  [17,] 0.7342018 1.1730026
##  [18,] 0.9764269 0.9670248
##  [19,] 1.3275337 0.7802138
##  [20,] 0.7321205 1.5716447
##  [21,] 1.3674956 1.4694315
##  [22,] 0.9054332 0.9276000
##  [23,] 1.0326867 0.4573337
##  [24,] 1.1422485 0.5513133
##  [25,] 0.8244479 1.1786556
##  [26,] 0.7301205 1.1256426
##  [27,] 0.8888907 1.5445409
##  [28,] 0.7298719 1.1039908
##  [29,] 1.0712000 0.7759592
##  [30,] 1.4146496 1.6594966
##  [31,] 1.0158473 1.2888455
##  [32,] 0.6567816 0.7917884
##  [33,] 0.9938556 1.3326657
##  [34,] 0.9829198 0.7671811
##  [35,] 1.0054850 1.2350922
##  [36,] 1.1379685 0.9727460
##  [37,] 1.4185984 0.5458038
##  [38,] 0.8212765 0.6746406
##  [39,] 0.7687311 0.9305091
##  [40,] 0.6367571 1.4488547
##  [41,] 1.2353665 1.6661099
##  [42,] 0.9055664 1.1430397
##  [43,] 0.6303128 1.5898334
##  [44,] 0.9404475 1.0736001
##  [45,] 0.9473171 0.5897293
##  [46,] 1.2439256 0.7454299
##  [47,] 1.4762053 0.7966633
##  [48,] 2.0130147 1.0435483
##  [49,] 0.8742999 0.7333514
##  [50,] 0.8033141 0.9072758
##  [51,] 1.0645694 1.8677187
##  [52,] 1.1703943 1.5806795
##  [53,] 0.9365559 0.7206504
##  [54,] 1.2260734 1.1338262
##  [55,] 0.8655757 1.1302788
##  [56,] 1.4777824 0.7803232
##  [57,] 0.7955131 1.2670401
##  [58,] 1.0430997 1.0832247
##  [59,] 0.9365087 0.5431144
##  [60,] 0.5856033 1.4584063
##  [61,] 1.0306506 0.6672936
##  [62,] 0.7769869 0.3898064
##  [63,] 0.6773326 1.1247551
##  [64,] 1.5228648 0.8385164
##  [65,] 0.8937860 1.0508234
##  [66,] 0.7677707 1.0700413
##  [67,] 0.4700065 0.8493705
##  [68,] 0.9131953 1.2175865
##  [69,] 1.0241261 1.1784070
##  [70,] 0.8704849 0.7199816
##  [71,] 0.7734390 0.7956813
##  [72,] 0.8869042 0.7604852
##  [73,] 0.8432648 0.7902705
##  [74,] 1.8327000 0.9006882
##  [75,] 0.7023261 1.4872491
##  [76,] 1.0718623 0.6153486
##  [77,] 1.4625926 1.1640272
##  [78,] 0.8384632 0.9418639
##  [79,] 1.1045710 1.3423119
##  [80,] 1.3179668 0.9028345
##  [81,] 0.7559192 1.1921405
##  [82,] 0.8579936 0.6461717
##  [83,] 1.5849493 0.8311512
##  [84,] 0.5828022 1.3476813
##  [85,] 1.0394824 0.8311297
##  [86,] 0.7720017 0.8853921
##  [87,] 0.6229655 1.4731351
##  [88,] 0.6388157 0.9331130
##  [89,] 0.8828443 0.8890287
##  [90,] 0.7123351 0.7222937
##  [91,] 1.3333453 1.6154011
##  [92,] 1.1806568 1.3225893
##  [93,] 0.8281207 0.5309578
##  [94,] 0.7788410 1.4400201
##  [95,] 0.8769279 0.5160188
##  [96,] 0.9999855 0.9088493
##  [97,] 1.1408533 1.1005132
##  [98,] 1.4108602 0.6626816
##  [99,] 1.0242650 0.7655642
## [100,] 1.0936086 0.8243887
## 
## $observedFstatRow
## [1] 12.32355
## 
## $observedFstatCol
## [1] 18.58519
## 
## $bootstrapPvalueRow
## [1] 0
## 
## $bootstrapPvalueCol
## [1] 0

Since the bootstrapPvalueRow and the bootstrapPvalueCol are lower than 0.01, the bi-cluster is stable.

Calculating partial rankings

The next step is to calculate partial rankings based on the indicators and countries in the top league. The calculation of each score of each indicator of the countries are done according to the method used in the original U21 ranking. The original weights of U21’s indicators are used, and the total scores for the countries are calculated using the selected indicators in the given bi-cluster. Table 3 shows the result of partial ranking in the case of the top league.

C<-biclust::bicluster(orig_mtx,res,number=1)
B<-as.data.frame(C[[1]])
selectedweight<-weights[,colnames(B)]
colsR <- grep("R", names(B), value=T)
colsC <- grep("C", names(B), value=T)
colsO <- grep("O", names(B), value=T)
colsE <- grep("E", names(B), value=T)
swR<-weights[,colsR]
swC<-weights[,colsC]
swO<-weights[,colsO]
swE<-weights[,colsE]
BR<-rowSums(B[,colsR]*swR)*100/max(rowSums(B[,colsR]*swR))
BC<-rowSums(B[,colsC]*swC)*100/max(rowSums(B[,colsC]*swC))
BO<-rowSums(B[,colsO]*swO)*100/max(rowSums(B[,colsO]*swO))
BE<-rowSums(B[,colsE]*swE)*100/max(rowSums(B[,colsE]*swE))
B$R_Score<-BR
B$E_Score<-BE
B$C_Score<-BC
B$O_Score<-BO
B$Overall_Score<-rowSums(cbind(BR,BE,BC,BO)*c(0.2,0.2,0.2,0.4))*100/max(rowSums(cbind(BR,BE,BC,BO)*c(0.2,0.2,0.2,0.4)))
B$Rank<-rank(-B$Overall_Score)
kable(B,caption = "**Table 3** League A by iBBiG",digits = 2)
Table 3 League A by iBBiG
R2 R3 R4 R5 E1 E2 E3 E4 C2 C3 C4 C5 C6 O2 O3 O4 O6 O7 O8 R_Score E_Score C_Score O_Score Overall_Score Rank
Australia 57.1 59.2 59.4 63.8 100.0 84.0 95.5 93.2 67.30 32.20 86.30 73.10 44.20 95.50 79.30 62.20 82.60 71.70 55.20 70.67 88.03 72.54 90.18 84.45 9
Austria 53.6 58.7 72.9 79.8 100.0 77.9 100.0 85.1 88.30 20.90 41.90 67.20 84.10 62.60 78.00 46.00 70.40 36.20 59.40 73.98 96.70 72.37 71.22 79.74 15
Belgium 50.0 59.3 47.8 47.6 100.0 91.2 100.0 94.6 91.00 36.60 30.80 74.10 63.00 66.00 85.60 52.30 68.70 64.70 49.50 61.14 95.73 64.94 78.13 77.97 16
Canada 96.4 87.9 64.5 70.5 100.0 84.0 88.6 77.7 69.10 33.60 80.80 85.60 64.50 84.10 81.60 50.30 94.60 96.00 58.10 89.08 95.42 83.56 93.86 93.02 4
Denmark 67.9 74.2 100.0 100.0 100.0 84.0 95.5 75.7 89.25 34.95 40.45 89.15 96.25 92.45 92.45 73.05 72.95 62.95 91.05 100.00 86.27 86.75 97.94 94.30 3
Finland 67.9 65.4 78.7 74.6 100.0 100.0 100.0 92.6 77.20 43.30 28.50 87.80 79.40 90.90 77.80 58.10 94.80 73.50 100.00 82.33 99.67 79.98 100.00 91.15 7
France 53.6 58.9 49.3 46.0 100.0 84.0 95.5 87.2 78.40 8.40 9.30 67.50 44.90 51.90 73.80 25.30 56.60 55.70 51.30 61.96 89.36 50.21 63.54 67.78 23
Germany 45.6 62.2 54.1 54.2 100.0 78.6 97.7 77.0 70.50 37.50 42.70 86.80 61.30 50.20 80.40 32.70 56.10 51.50 54.30 63.61 95.72 67.14 65.68 73.37 18
Hong Kong 43.7 73.6 37.3 48.2 100.0 84.0 90.9 97.3 99.80 100.00 84.60 79.40 38.60 76.80 78.70 44.10 59.40 32.20 39.00 57.78 87.98 95.36 66.69 81.63 11
Ireland 57.1 62.6 47.7 52.1 100.0 75.3 100.0 85.8 81.80 13.00 31.50 81.00 52.90 70.70 77.50 37.40 72.90 70.50 45.70 66.51 96.51 69.30 75.68 77.82 17
Israel 60.7 42.0 53.3 42.6 100.0 84.0 90.9 85.8 70.20 68.20 39.60 100.00 50.10 72.90 71.40 78.30 61.90 86.80 16.30 57.63 87.91 86.38 78.29 80.13 14
Netherlands 60.7 67.1 75.8 85.3 100.0 79.1 100.0 100.0 76.60 44.40 58.80 85.20 89.50 88.40 98.30 63.30 75.80 59.90 43.40 88.24 98.31 87.47 86.67 92.59 5
New Zealand 57.1 40.7 43.2 32.9 100.0 100.0 100.0 95.9 79.60 14.00 40.00 62.60 50.20 74.50 75.70 59.60 80.10 73.50 50.70 51.75 100.00 53.46 83.64 80.20 13
Norway 60.7 72.4 59.0 83.1 100.0 84.0 95.5 86.5 80.80 25.10 31.90 79.40 76.20 75.30 79.30 59.70 72.50 71.20 73.30 80.06 94.29 73.54 87.11 84.26 10
Portugal 53.6 41.4 59.4 38.9 100.0 87.5 100.0 85.8 73.40 30.50 48.50 55.60 38.50 52.70 63.80 18.60 65.40 32.30 60.90 57.67 89.18 64.65 59.32 68.09 22
Singapore 50.7 93.6 57.3 86.8 98.7 73.3 88.6 91.9 80.80 18.00 58.00 81.50 47.30 75.20 84.20 29.60 52.10 71.30 80.40 84.80 94.86 71.04 79.34 85.56 8
Slovenia 46.4 37.9 27.5 21.1 100.0 78.2 100.0 85.8 63.60 78.40 45.50 42.30 56.50 60.10 53.30 21.80 84.40 46.90 57.90 39.29 83.66 70.68 65.52 70.71 20
Spain 46.4 52.3 38.9 32.0 100.0 79.6 100.0 77.0 63.80 62.60 63.50 49.80 43.00 43.30 64.40 12.20 82.00 59.00 38.30 52.61 96.05 68.89 60.43 71.54 19
Sweden 64.3 76.5 91.4 95.7 100.0 85.9 100.0 88.5 86.90 58.50 63.30 96.90 100.00 100.00 83.60 94.50 73.40 65.80 69.70 97.22 90.91 100.00 98.36 100.00 1
Switzerland 54.1 85.6 71.6 84.3 98.5 74.1 97.7 77.0 100.00 40.90 61.80 98.30 60.50 93.40 100.00 100.00 53.90 65.80 42.60 88.53 94.22 94.78 92.04 94.48 2
Taiwan 66.4 50.4 14.4 14.4 97.8 67.2 88.6 89.2 28.90 83.40 38.40 80.70 36.20 65.00 56.20 21.40 64.50 74.60 61.40 44.39 74.91 70.23 69.30 68.83 21
United Kingdom 50.0 62.0 45.7 44.6 100.0 87.0 100.0 85.8 76.20 26.60 72.40 83.00 64.60 79.20 91.90 55.20 60.70 73.70 56.30 61.63 97.68 75.79 84.23 81.49 12
United States 100.0 100.0 41.6 53.3 100.0 95.4 100.0 95.9 48.30 31.90 100.00 95.10 62.60 63.20 95.40 49.90 94.60 79.40 61.00 85.59 97.07 81.18 89.58 91.37 6

Comparing partial rankings to the adequate part of the original ranking

The final step is to compare the results of the partial rankings to the corresponding part of the original rankings. The results of the Spearman’s rank correlation shows a strong positive relationship between the two rankings.

cor(B$Rank,U21_rank[rownames(B),],method="spearman")
## [1] 0.9041502
cor.test(B$Rank,U21_rank[rownames(B),],method="spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  B$Rank and U21_rank[rownames(B), ]
## S = 194, p-value = 3.284e-06
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9041502

Applying iBBiG on reversed normalized data

To identify the lower league(s), the reverse data matrix is needed to be calculated firstly. Applying the iBBiG method on the reversed normalized data can specify the lower league bi-clusters.

rmtx <- 1-mtx 
rres <- iBBiG(binaryMatrix=binarize(rmtx,threshold = NA),nModules = 2,alpha=0.08,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.8)
## [1] "Threshold:  0.5"
## Module:  1 ... done
## Module:  2 ... done
summary(rres) 
## 
## An object of class iBBiG 
## 
## Number of Clusters found:  2 
## 
## First  2  Cluster scores and sizes:
##                         M 1      M 2
## Cluster Score      535.1661 52.10892
## Number of Rows:     38.0000 11.00000
## Number of Columns:  19.0000  7.00000

The results above show that the iBBiG algorithm found 2 bi-clusters on the reversed data. In the first bi-cluster there are 38 rows (countries) and 19 columns (indicators), in the second bi-cluster there are 11 rows (countries) and 7 columns (indicators). The next step is to calculate F-tests to determine which bi-cluster is significant.

Obs.FStat <- NULL 
for (i in c(1:rres@Number)){
  Obs.FStat[[i]] <- computeObservedFstat(x=rmtx,bicResult=rres,number=i) 
  } 
kable(Obs.FStat,caption = "**Table 4** The results of row and column effects of the reversed data for the two bi-clusters")
Table 4 The results of row and column effects of the reversed data for the two bi-clusters
Fstat PValue
Row Effect 8.318348 0.0000000
Column Effect 15.026782 0.0000000
Tukey test 4.135302 0.0423941
Fstat PValue
Row Effect 0.4131750 0.9350382
Column Effect 7.6626744 0.0000039
Tukey test 0.0194771 0.8894826

In Table 4 the results of the F-tests can be seen. In the case of the first bi-cluster, both the row and column effect are significant. On the other hand, in the case of the second bi-cluster the row effect is not significant; thus the second bi-cluster is not significant. Only the first bi-cluster is significant which is the lower league (League C).

Calculating lower league bi-cluster

The next step is to identify the lower league bi-cluster.

## [1] "Threshold:  0.5"
## Module:  1 ... done
## 
## An object of class iBBiG 
## 
## There was one cluster found with Score 435.976 and 
##   38 Rows and  19 columns

The results of the iBBiG algorithm on the reversed data can be seen above. The algorithm found one bi-cluster in which there are 38 rows (countries) and 19 columns (indicators). The following step is to calculate the F-tests to determine whether the bi-cluster is significant or not.

Calculating F-tests of the lower league bi-cluster

In Table 5 the results of the row and column effects (F-tests) can be read. Both the row and column effects are significant, so the lower league bi-cluster is significant.

Obs.FStat <- NULL 
for (i in c(1:rres@Number)){
  Obs.FStat[[i]] <- computeObservedFstat(x=rmtx,bicResult=rres,number=i) 
  } 
kable(Obs.FStat,caption = "**Table 5** The results of row and column effects of the lower league")
Table 5 The results of row and column effects of the lower league
Fstat PValue
Row Effect 8.318348 0.0000000
Column Effect 15.026782 0.0000000
Tukey test 4.135302 0.0423941

To see the differences between those countries that are included in the lower league and those that are not in the lower league, one can plot the means, medians, variances, and median absolute deviations (MAD) for rows (countries) (see Fig. 7). On Fig. 7 the red line indicates those countries that are included in the lower league, while the black line shows the remaining countries. The countries in the lower league have lower means, medians, and variances.

exploreBic(dset=mtx,bres=rres,mname='biclust',pfor='all',gby='genes',bnum=1)
**Fig. 7** Mean, median, variance and MAD for lower leauge countries

Fig. 7 Mean, median, variance and MAD for lower leauge countries

One can do the same for the columns (indicators). Fig. 8 shows the differences between the means, medians, variances, and median absolute deviations (MAD) for the indicators. The red line indicates those indicators that are in the lower league, while the black line shows the remaining indicators.

exploreBic(dset=mtx,bres=rres,mname='biclust',pfor='all',gby='conditions',bnum=1)
**Fig. 8** Mean, median, variance and MAD for lower leauge indicators

Fig. 8 Mean, median, variance and MAD for lower leauge indicators

Heat maps of the lower league

Fig. 9 depicts the heat map of the lower league. The lower league contains 38 countries and 19 indicators. The red cells mean higher values in the case of indicators, while blue cells symbol lower values.

drawHeatmap(x=mtx,bicResult=rres,number=1,paleta=bluered(100),beamercolor = TRUE)
**Fig. 9** Heat map of the lower leauge

Fig. 9 Heat map of the lower leauge

Fig. 10 shows results of iBBiG on reversed normalized data. The lower league can be seen in the upper left corner of Fig. 10 which is enlarged on Fig.9.

drawHeatmap(x=mtx,bicResult=rres,local=FALSE,number=1,paleta=bluered(100),beamercolor = TRUE) 
**Fig. 10** Heat map of the output after applying iBBiG on reversed normalized data

Fig. 10 Heat map of the output after applying iBBiG on reversed normalized data

Checking stability of the lower league bi-cluster

The next step is to check the stability of the lower league bi-cluster. The results of the bootstrap can be seen below.

Bootstrap <- diagnoseColRow(x=as.matrix(ALL),bicResult=rres,number=1,
  nResamplings=100,replace=TRUE)
Bootstrap
## $bootstrapFstats
##         fval.row  fval.col
##   [1,] 1.4237254 0.4483903
##   [2,] 1.4660081 0.5321757
##   [3,] 1.0756345 0.6481464
##   [4,] 0.6161736 0.4184992
##   [5,] 0.8319811 0.9652949
##   [6,] 0.9994922 0.6431964
##   [7,] 1.0388106 1.0590753
##   [8,] 1.0280197 1.1136457
##   [9,] 0.8342056 1.2137646
##  [10,] 1.2702670 0.6305286
##  [11,] 1.4993470 1.5994159
##  [12,] 1.4072294 1.0876665
##  [13,] 0.6683582 0.6523523
##  [14,] 1.3002412 0.9732727
##  [15,] 0.8109792 1.4170151
##  [16,] 1.3957309 1.1748224
##  [17,] 0.7704592 1.7317029
##  [18,] 0.8286294 1.0210135
##  [19,] 0.7952631 0.8611124
##  [20,] 0.8699187 0.9030868
##  [21,] 0.4598349 0.9110233
##  [22,] 0.9199109 1.2887099
##  [23,] 1.7248299 0.6981561
##  [24,] 0.8492007 1.1058571
##  [25,] 0.9534347 1.1333928
##  [26,] 1.0724603 1.1659252
##  [27,] 0.8559765 1.3293447
##  [28,] 1.2010227 0.8924740
##  [29,] 0.8670476 1.2684569
##  [30,] 0.8881162 1.1947255
##  [31,] 1.2930667 1.0348202
##  [32,] 1.1513937 0.4929972
##  [33,] 1.2167372 1.5201769
##  [34,] 1.1054566 0.9970977
##  [35,] 0.9720816 1.3678695
##  [36,] 1.2324870 0.8888273
##  [37,] 0.8758754 0.9113400
##  [38,] 1.0504969 0.6959262
##  [39,] 0.9481555 0.8698783
##  [40,] 0.8574165 0.7278181
##  [41,] 1.1682113 1.7338856
##  [42,] 1.0295635 1.0645590
##  [43,] 0.8167552 1.0235224
##  [44,] 1.1345097 0.3714476
##  [45,] 1.1068983 1.0585954
##  [46,] 0.9331512 1.0730467
##  [47,] 0.9081018 1.2677403
##  [48,] 1.1199429 1.2298908
##  [49,] 0.9518410 1.6922662
##  [50,] 0.8673587 0.9918538
##  [51,] 1.5022798 2.0779756
##  [52,] 0.9230245 0.9648401
##  [53,] 1.0639705 1.0593616
##  [54,] 0.9967370 0.7841320
##  [55,] 1.1538045 0.8541580
##  [56,] 0.7391143 1.1935220
##  [57,] 0.9548899 1.4376680
##  [58,] 0.7845879 0.9574194
##  [59,] 0.8685536 0.6267966
##  [60,] 0.7850624 0.6005306
##  [61,] 1.5168676 0.7378354
##  [62,] 0.6869145 0.8986626
##  [63,] 0.8461919 1.9766450
##  [64,] 0.6967023 0.6841110
##  [65,] 1.5876723 1.0910674
##  [66,] 1.2528341 0.9940788
##  [67,] 1.3605008 1.1436539
##  [68,] 1.0891424 0.7832727
##  [69,] 0.9117973 1.0458372
##  [70,] 1.4436080 1.0489678
##  [71,] 1.0215558 1.0596523
##  [72,] 0.6877975 1.1566910
##  [73,] 0.8797977 0.5376220
##  [74,] 1.0785625 0.9498554
##  [75,] 0.9369067 0.9476483
##  [76,] 0.8628462 0.7576334
##  [77,] 1.3429453 0.9157648
##  [78,] 0.8931006 0.7982962
##  [79,] 0.7780720 1.0305868
##  [80,] 1.3539476 0.9532098
##  [81,] 1.1326380 0.7152527
##  [82,] 0.8734360 0.6844782
##  [83,] 0.8730422 1.2308409
##  [84,] 0.5771400 1.0570330
##  [85,] 1.2306217 0.7401573
##  [86,] 0.9894361 0.4473581
##  [87,] 0.6143227 0.4550165
##  [88,] 1.4916273 0.9966152
##  [89,] 1.2194575 1.0031049
##  [90,] 1.0397591 0.8130222
##  [91,] 1.0424340 0.7657433
##  [92,] 0.7727385 1.0516976
##  [93,] 0.6500320 0.6626070
##  [94,] 1.0223807 1.2730589
##  [95,] 0.9393109 1.4116153
##  [96,] 1.0940706 1.2636057
##  [97,] 0.7369514 1.0152232
##  [98,] 1.1536404 1.1711906
##  [99,] 1.1032971 1.8034645
## [100,] 1.1624509 1.1195184
## 
## $observedFstatRow
## [1] 9.115407
## 
## $observedFstatCol
## [1] 14.83461
## 
## $bootstrapPvalueRow
## [1] 0
## 
## $bootstrapPvalueCol
## [1] 0

Since the bootstrapPvalueRow and the bootstrapPvalueCol are lower than 0.01, the bi-cluster is stable.

Calculating partial rankings

The next step is to calculate partial rankings based on the indicators and countries in the lower league. The calculation of each score of each indicator of the countries are done according to the method used in the original U21 ranking. The original weights of U21’s indicators are used, and the total scores for the countries are calculated using the selected indicators in the given bi-cluster. Table 6 shows the result of partial ranking in the case of the lower league.

C<-biclust::bicluster(orig_mtx,rres,number=1)
B<-as.data.frame(C[[1]])
selectedweight<-weights[,colnames(B)]
colsR <- grep("R", names(B), value=T)
colsC <- grep("C", names(B), value=T)
colsO <- grep("O", names(B), value=T)
swR<-weights[,colsR]
swC<-weights[,colsC]
swO<-weights[,colsO]
BR<-rowSums(B[,colsR]*swR)*100/max(rowSums(B[,colsR]*swR))
BC<-rowSums(B[,colsC]*swC)*100/max(rowSums(B[,colsC]*swC))
BO<-rowSums(B[,colsO]*swO)*100/max(rowSums(B[,colsO]*swO))
B$R_Score<-BR
B$C_Score<-BC
B$O_Score<-BO
B$Overall_Score<-rowSums(cbind(BR,BC,BO)*c(0.2,0.2,0.4))*100/max(rowSums(cbind(BR,BC,BO)*c(0.2,0.2,0.4)))
B$Rank<-rank(-B$Overall_Score)
kable(B,caption = "**Table 6** League C by iBBiG",digits = 2)
Table 6 League C by iBBiG
R1 R2 R3 R4 R5 C1 C3 C4 C5 C6 O1 O2 O3 O4 O5 O6 O7 O8 O9 R_Score C_Score O_Score Overall_Score Rank
Argentina 49.3 53.6 18.3 14.4 14.4 1.6 14.2 9.8 52.0 13.0 1.5 7.2 47.4 2.0 7.5 74.2 25.5 15.7 53.9 47.65 28.62 33.61 38.99 34
Belgium 59.0 50.0 59.3 47.8 47.6 40.5 36.6 30.8 74.1 63.0 3.7 66.0 85.6 52.3 28.2 68.7 64.7 49.5 68.1 88.27 75.66 81.37 91.96 4
Brazil 38.4 42.9 51.4 14.4 14.4 1.0 24.7 23.9 47.4 20.6 10.2 10.2 41.1 1.8 18.8 52.3 21.7 9.5 85.5 52.82 33.75 75.11 60.39 19
Bulgaria 23.3 38.9 17.6 6.4 2.4 17.9 6.0 5.8 29.7 29.0 0.2 6.1 38.5 0.0 0.0 59.2 44.1 22.3 82.2 27.99 25.50 35.39 32.20 37
Chile 30.9 85.7 27.8 12.2 5.4 1.5 9.4 13.8 50.6 24.4 1.2 14.0 50.2 5.1 8.4 70.0 54.0 4.2 37.6 57.69 29.92 36.24 45.07 27
China 30.0 47.5 17.4 15.5 3.2 1.5 6.2 36.9 50.2 20.5 58.9 8.6 37.4 1.2 22.5 24.1 6.7 13.2 50.6 33.72 31.30 48.74 41.52 31
Croatia 32.7 35.5 24.4 22.1 10.7 3.0 82.4 6.8 30.2 45.8 0.6 29.2 42.3 10.2 4.2 58.3 33.7 21.1 66.5 42.02 59.02 53.02 59.98 20
Czech Republic 42.5 42.9 29.9 38.3 27.6 42.0 70.0 51.0 51.0 49.1 2.0 37.7 51.9 6.2 6.1 64.1 34.1 39.4 89.1 62.62 78.28 63.92 75.65 12
France 55.7 53.6 58.9 49.3 46.0 58.8 8.4 9.3 67.5 44.9 16.6 51.9 73.8 25.3 40.3 56.6 55.7 51.3 61.4 84.70 54.55 87.08 87.56 8
Germany 46.1 45.6 62.2 54.1 54.2 37.2 37.5 42.7 86.8 61.3 20.7 50.2 80.4 32.7 37.0 56.1 51.5 54.3 98.2 87.12 78.10 77.07 90.19 5
Greece 56.5 50.0 31.9 14.4 14.4 24.7 27.5 30.3 43.0 38.2 2.6 45.6 63.1 9.4 9.9 88.7 48.7 25.2 55.9 55.45 46.78 53.53 58.92 21
Hong Kong 35.5 43.7 73.6 37.3 48.2 32.8 100.0 84.6 79.4 38.6 2.8 76.8 78.7 44.1 19.6 59.4 32.2 39.0 53.6 79.01 100.00 81.94 95.70 2
Hungary 36.6 37.9 34.2 26.4 13.9 21.3 36.1 39.7 52.7 61.1 1.1 21.5 51.7 9.8 9.1 59.0 39.5 31.3 100.0 51.32 62.85 49.83 63.86 17
India 51.9 50.5 13.1 14.4 14.4 0.8 4.0 4.2 45.6 13.1 7.8 1.3 44.7 0.0 5.0 23.1 39.0 1.7 39.0 45.92 15.50 23.43 30.49 38
Indonesia 22.1 25.0 3.8 2.8 0.3 0.5 83.3 29.9 65.3 24.3 0.1 0.1 37.2 0.0 0.0 27.0 14.0 1.2 36.1 17.07 54.03 33.58 34.27 36
Iran 43.3 43.4 14.6 21.5 7.5 0.5 8.6 2.8 49.3 6.4 5.4 14.1 42.9 0.6 4.2 48.4 34.6 9.7 51.1 40.19 22.77 45.72 37.00 35
Ireland 55.8 57.1 62.6 47.7 52.1 32.1 13.0 31.5 81.0 52.9 1.6 70.7 77.5 37.4 16.0 72.9 70.5 45.7 85.8 90.66 64.58 100.00 100.00 1
Israel 43.3 60.7 42.0 53.3 42.6 5.4 68.2 39.6 100.0 50.1 2.8 72.9 71.4 78.3 33.3 61.9 86.8 16.3 60.4 81.99 76.11 96.73 94.81 3
Italy 33.2 35.7 37.5 37.6 30.9 18.5 55.4 26.6 47.8 49.9 14.0 45.8 75.2 18.9 17.2 63.4 27.9 23.8 46.9 57.32 52.98 53.21 60.94 18
Japan 22.8 53.6 62.6 39.1 37.3 17.8 11.5 7.1 66.8 80.2 24.4 37.9 54.1 12.3 44.8 59.4 86.8 69.7 63.3 75.77 49.71 84.35 82.81 10
Malaysia 72.1 91.4 49.7 31.7 12.9 30.2 14.7 15.3 81.2 2.7 2.2 14.9 41.5 1.5 4.2 36.8 34.0 21.8 50.6 80.61 48.83 52.70 73.96 13
Mexico 42.8 50.0 30.8 14.2 5.4 1.6 25.1 24.1 51.9 13.4 2.3 4.0 42.0 0.7 7.5 27.5 32.4 5.4 37.2 47.44 37.31 22.60 40.73 33
New Zealand 45.6 57.1 40.7 43.2 32.9 77.0 14.0 40.0 62.6 50.2 1.7 74.5 75.7 59.6 16.3 80.1 73.5 50.7 49.7 75.05 62.88 91.18 90.16 6
Poland 45.0 53.6 34.7 28.9 14.9 4.9 13.0 15.3 27.1 24.7 4.5 22.9 41.1 2.7 9.9 72.9 44.4 22.4 79.4 56.38 23.47 61.25 55.59 23
Portugal 44.6 53.6 41.4 59.4 38.9 16.8 30.5 48.5 55.6 38.5 2.8 52.7 63.8 18.6 14.1 65.4 32.3 60.9 55.4 76.36 56.16 66.36 71.80 15
Romania 48.5 57.1 19.6 14.0 4.7 9.1 12.7 14.8 45.1 30.4 1.4 13.0 34.6 0.0 0.0 51.2 27.9 10.2 64.7 44.51 32.35 34.51 41.06 32
Russia 43.2 57.1 27.5 13.5 5.9 9.4 8.8 10.4 32.5 5.1 2.1 3.0 28.9 1.2 15.7 74.9 100.0 42.8 82.4 47.53 20.35 84.15 56.17 22
Saudi Arabia 100.0 90.0 63.7 14.4 14.4 16.9 88.9 7.7 49.3 15.8 0.6 4.0 48.5 8.4 18.5 42.8 27.9 16.5 33.5 100.00 52.58 28.40 65.75 16
Serbia 59.6 56.8 25.4 45.3 12.6 17.8 57.2 6.1 49.3 0.1 0.7 17.1 43.5 7.0 5.0 50.0 32.7 21.0 61.0 64.57 27.92 35.68 46.12 25
Slovakia 29.4 32.1 27.0 24.5 15.0 19.1 22.5 6.2 37.7 32.7 0.4 16.4 44.3 0.0 0.0 54.6 35.1 38.2 89.8 41.51 32.74 81.00 55.39 24
Slovenia 47.0 46.4 37.9 27.5 21.1 9.1 78.4 45.5 42.3 56.5 0.6 60.1 53.3 21.8 4.2 84.4 46.9 57.9 70.9 58.72 65.34 66.24 71.96 14
South Africa 27.7 30.6 13.6 19.5 5.6 36.6 30.5 26.5 53.2 50.3 1.9 7.4 61.4 3.2 15.2 51.8 11.3 5.2 51.8 32.41 59.83 31.74 43.83 28
Korea, Rep. (South) 31.8 92.9 39.0 37.2 30.3 9.2 6.5 11.8 64.2 76.9 11.1 44.1 55.5 12.6 21.0 100.0 75.6 71.6 47.6 80.74 47.67 87.44 83.49 9
Spain 46.9 46.4 52.3 38.9 32.0 15.9 62.6 63.5 49.8 43.0 10.1 43.3 64.4 12.2 18.2 82.0 59.0 38.3 67.2 68.56 64.55 82.85 78.96 11
Taiwan 37.8 66.4 50.4 14.4 14.4 9.9 83.4 38.4 80.7 36.2 7.6 65.0 56.2 21.4 19.9 64.5 74.6 61.4 50.6 64.61 68.23 89.37 87.71 7
Thailand 34.8 36.4 11.8 5.5 1.4 4.0 45.5 59.0 57.3 37.5 1.5 4.6 50.5 0.0 0.0 52.2 23.9 4.6 50.5 28.32 56.04 50.69 45.99 26
Turkey 34.8 36.4 13.4 44.0 15.8 4.0 12.7 10.1 57.7 15.3 5.7 15.2 44.8 0.6 4.2 60.2 29.2 11.7 47.6 48.59 33.52 33.37 41.95 30
Ukraine 78.3 67.9 10.5 4.7 0.9 7.5 12.9 8.8 41.9 0.3 0.4 1.6 23.4 0.0 0.0 78.6 71.0 17.1 50.4 64.33 16.88 33.97 41.99 29

Comparing partial rankings to the adequate part of the original ranking

The final step is to compare the results of the partial rankings to the corresponding part of the original rankings. The results of the Spearman’s rank correlation shows a strong positive relationship between the two ranking.

cor(B$Rank,U21_rank[rownames(B),],method="spearman")
## [1] 0.9295836
cor.test(B$Rank,U21_rank[rownames(B),],method="spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  B$Rank and U21_rank[rownames(B), ]
## S = 643.54, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9295836

Applying BicARE on normalized data

With the BicARE method the middle league(s) can be specified. In order to use the BicARE method, the first step is to convert the BicARE object to a Biclust object.

bicare2biclust <- function(x){
    if(class(x)=="Biclust"){ 
        return(x)
    } else if(class(x)=="biclustering"){
        Parameters <- list(numberofbicluster=x$param[1,2],residuthreshold=x$param[2,2],genesinitialprobability=x$param[3,2],samplesinitialprobability=x$param[4,2],numberofiterations=x$param[5,2],date=x$param[6,2])
        RowxNumber <- t(x$bicRow==1)
        NumberxCol <- x$bicCol==1
        Number <- as.numeric(dim(RowxNumber)[2])
        info <- list()
        return(new("Biclust",Parameters=Parameters,RowxNumber=RowxNumber,NumberxCol=NumberxCol,Number=Number,info=info))
    }
}

In the next step the BicARE algorithm determines the number of the bi-clusters.

BICARE_res <- FLOC(Data=as.matrix(ALL),k=2,pGene=1,pSample=1,r=1e-16,N=17,M=6,t=10000,blocGene=NULL,blocSample=NULL)
bres <- bicare2biclust(BICARE_res)
summary(bres)
## 
## An object of class Biclust 
## 
## call:
##  NULL
## 
## Number of Clusters found:  2 
## 
## Cluster sizes:
##                    BC 1 BC 2
## Number of Rows:      17   17
## Number of Columns:    6    6
bres@Parameters$residuthreshold
## [1] "1e-16"

The results above show that the BiCARE algorithm found 2 bi-clusters. In both bi-clusters there are 17 rows (countries) and 6 columns (indicators) which means that the first and the second bi-cluster are same. Because of this, in the following, only the first bi-cluster will be considered. This bi-cluster contains the middle league countries and indicators.

Calculating middle league bi-cluster

The next step is to identify the middle league bi-cluster.

BICARE_res <- FLOC(Data=as.matrix(ALL),k=1,pGene=1,pSample=1,r=1e-16,N=17,M=6,t=10000,blocGene=NULL,blocSample=NULL)
bres <- bicare2biclust(BICARE_res)
summary(bres)
## 
## An object of class Biclust 
## 
## call:
##  NULL
## 
## There was one cluster found with
##   17 Rows and  6 columns
bres@Parameters$residuthreshold
## [1] "1e-16"

The results of the BicARE algorithm can be seen above. The algorithm found one bi-cluster in which there are 17 rows (countries) and 6 columns (indicators). The following step is to calculate the F-tests to determine whether the bi-cluster is significant or not.

Calculating F-tests of the middle league bi-cluster

In Table 7 the results of the row and column effects (F-tests) can be read. Both the row and column effects are significant, so the middle league bi-cluster is significant.

RObs.FStat <- NULL
for (i in c(1:bres@Number)){
  RObs.FStat[[i]] <- computeObservedFstat(x=mtx,bicResult=bres,number=i)
}
kable(RObs.FStat,caption = "**Table 7** The results of row and column effects of the middle league")
Table 7 The results of row and column effects of the middle league
Fstat PValue
Row Effect 24.05448 0.0000000
Column Effect 28.60087 0.0000000
Tukey test 11.37256 0.0011565

To see the differences between those countries that are included in the middle league and those that are not in the middle league, one can plot the means, medians, variances, and median absolute deviations (MAD) for rows (countries) (see Fig. 11). On Fig. 11 the red line indicates those countries that are included in the middle league, while the black line shows the remaining countries. Fig. 11. shows that in the case of the countries in the middle league the variances differ the most.

exploreBic(dset=mtx,bres=bres,mname='biclust',pfor='all',gby='genes',bnum=1)
**Fig. 11** Mean, median, variance and MAD for middle league countries

Fig. 11 Mean, median, variance and MAD for middle league countries

One can do the same for the columns (indicators). Fig. 12 shows the differences between the means, medians, variances, and median absolute deviations (MAD) for the indicators. The red line indicates those indicators that are in the middle league, while the black line shows the remaining indicators. In the case of the indicators in the middle league, all parameters are different.

exploreBic(dset=mtx,bres=bres,mname='biclust',pfor='all',gby='conditions',bnum=1)
**Fig. 12** Mean, median, variance and MAD for middle league indicators

Fig. 12 Mean, median, variance and MAD for middle league indicators

Heat maps of the middle league

Fig. 13 depicts the heat map of the middle league. The middle league contains 17 countries and 6 indicators. The red cells mean higher values in the case of indicators, while blue cells symbol lower values.

drawHeatmap(x=mtx,bicResult=bres,number=1,paleta=bluered(100),beamercolor = TRUE) 
**Fig. 13** Heat map of the middle league

Fig. 13 Heat map of the middle league

Fig. 14 shows the results of BicARE on normalized data. The middle league can be seen in the upper left corner of Fig. 14 which is enlarged on Fig.13.

drawHeatmap(x=mtx,bicResult=bres,local=FALSE,number=1,paleta=bluered(100),beamercolor = TRUE) 
**Fig. 14** Heat map of the output after applying BicARE on normalized data

Fig. 14 Heat map of the output after applying BicARE on normalized data

Checking stability of the middle league bi-cluster

The next step is to check the stability of the middle league bi-cluster. The results of the bootstrap can be seen below.

Bootstrap <- diagnoseColRow(x=as.matrix(ALL),bicResult=bres,number=1,
  nResamplings=100,replace=TRUE)
Bootstrap
## $bootstrapFstats
##         fval.row   fval.col
##   [1,] 1.2797138 0.92973265
##   [2,] 0.6308360 2.26969130
##   [3,] 0.5531226 0.34184899
##   [4,] 0.7398099 0.60701036
##   [5,] 0.8751443 0.75850736
##   [6,] 0.6384033 1.38538046
##   [7,] 0.9264259 1.59386590
##   [8,] 1.0424404 1.41667547
##   [9,] 1.2313374 1.66053918
##  [10,] 1.6187034 1.93170581
##  [11,] 1.0002165 1.82831682
##  [12,] 1.1659334 0.31544320
##  [13,] 0.9686587 3.02555502
##  [14,] 0.7889407 0.24219823
##  [15,] 1.1651444 0.51430220
##  [16,] 0.7664935 0.25188752
##  [17,] 1.2966583 0.75479037
##  [18,] 0.6087243 0.60416462
##  [19,] 1.6635851 1.11425445
##  [20,] 1.5994296 3.72220362
##  [21,] 0.4524083 1.68140405
##  [22,] 1.3018355 1.40929519
##  [23,] 0.7737949 0.67066220
##  [24,] 0.4313269 0.63478370
##  [25,] 0.7197905 0.63137392
##  [26,] 0.3107913 0.08439050
##  [27,] 0.5925387 1.05212284
##  [28,] 1.0170342 0.91129531
##  [29,] 1.3987804 0.60384385
##  [30,] 0.9740528 1.60382279
##  [31,] 0.9998474 0.38161015
##  [32,] 0.9621023 0.61654700
##  [33,] 1.2777245 0.69383626
##  [34,] 1.0652778 0.25271195
##  [35,] 1.3949112 1.24467219
##  [36,] 0.9137101 2.40609908
##  [37,] 0.3231827 0.81714002
##  [38,] 0.6636984 0.60313247
##  [39,] 0.5652549 0.67989827
##  [40,] 1.0661977 0.07615586
##  [41,] 0.5950481 0.70955485
##  [42,] 1.4749913 0.90076425
##  [43,] 1.3541272 1.05899153
##  [44,] 0.8173940 0.31513892
##  [45,] 1.1536514 1.40289697
##  [46,] 0.8475528 0.38835338
##  [47,] 1.2035240 1.00488266
##  [48,] 1.7110558 0.60071718
##  [49,] 0.7605314 0.23575262
##  [50,] 0.6358308 0.33679445
##  [51,] 0.7873911 0.85714294
##  [52,] 0.6774348 1.11798979
##  [53,] 1.0210549 2.58339697
##  [54,] 1.3124527 0.86583984
##  [55,] 1.1102214 0.45232343
##  [56,] 1.0448184 0.73400262
##  [57,] 1.0455843 1.00166559
##  [58,] 0.6963777 0.32632050
##  [59,] 0.6558770 0.95636937
##  [60,] 1.1892755 1.16442647
##  [61,] 1.6105777 0.48507106
##  [62,] 0.8275210 0.68998180
##  [63,] 1.3919422 0.82283795
##  [64,] 0.5333156 0.66179741
##  [65,] 1.7159211 0.79098281
##  [66,] 1.1097524 1.71013298
##  [67,] 2.3399421 1.92874703
##  [68,] 1.8179426 1.35168277
##  [69,] 0.8439948 1.33697959
##  [70,] 1.4131508 2.23182750
##  [71,] 1.1032874 1.85557228
##  [72,] 0.8321214 1.68305501
##  [73,] 0.7636602 0.68242062
##  [74,] 0.2614331 0.73621046
##  [75,] 1.0502828 1.22074287
##  [76,] 0.7249837 3.15642511
##  [77,] 1.1959766 1.37508923
##  [78,] 0.7127359 0.23914445
##  [79,] 0.9366184 0.23630663
##  [80,] 1.9354833 1.24137594
##  [81,] 0.9424695 0.50241750
##  [82,] 1.0753967 1.72812940
##  [83,] 0.7579631 1.06972436
##  [84,] 0.5639394 0.05701886
##  [85,] 1.0104365 1.21466801
##  [86,] 0.8774873 0.64495578
##  [87,] 0.6519611 0.89153511
##  [88,] 1.4148459 2.22847710
##  [89,] 1.1513600 0.47132863
##  [90,] 1.3629299 1.09582861
##  [91,] 1.3945072 0.46625133
##  [92,] 1.0723225 1.40314437
##  [93,] 0.9980748 0.69521239
##  [94,] 0.7459515 1.80375122
##  [95,] 0.6668875 0.01625024
##  [96,] 0.7424207 1.14138576
##  [97,] 1.0617370 1.89492581
##  [98,] 1.3710553 0.94145384
##  [99,] 0.6751472 1.17410028
## [100,] 0.9992803 0.38839751
## 
## $observedFstatRow
## [1] 34.66895
## 
## $observedFstatCol
## [1] 63.27824
## 
## $bootstrapPvalueRow
## [1] 0
## 
## $bootstrapPvalueCol
## [1] 0

Since the bootstrapPvalueRow and the bootstrapPvalueCol are lower than 0.01, the bi-cluster is stable.

Calculating partial rankings

The next step is to calculate partial rankings based on the indicators and countries in the middle league. The calculation of each score of each indicator of the countries are done according to the method used in the original U21 ranking. The original weights of U21’s indicators are used, and the total scores for the countries are calculated using the selected indicators in the given bi-cluster. Table 8 shows the result of partial ranking in the case of the middle league.

C<-biclust::bicluster(orig_mtx,bres,number=1)
B<-as.data.frame(C[[1]])
selectedweight<-weights[,colnames(B)]
colsR <- grep("R", names(B), value=T)
colsO <- grep("O", names(B), value=T)
swR<-weights[,colsR]
swO<-weights[,colsO]
BR<-rowSums(B[,colsR]*swR)*100/max(rowSums(B[,colsR]*swR))
BO<-rowSums(B[,colsO]*swO)*100/max(rowSums(B[,colsO]*swO))
B$R_Score<-BR
B$O_Score<-BO
B$Overall_Score<-rowSums(cbind(BR,BO)*c(0.2,0.4))*100/max(rowSums(cbind(BR,BO)*c(0.2,0.4)))
B$Rank<-rank(-B$Overall_Score)
kable(B,caption = "**Table 8** League B by BicARE",digits = 2)
Table 8 League B by BicARE
R3 R4 R5 O3 O4 O5 R_Score O_Score Overall_Score Rank
Bulgaria 17.6 6.4 2.4 38.5 0.0 0.0 14.90 19.67 18.08 16
Croatia 24.4 22.1 10.7 42.3 10.2 4.2 26.85 28.97 27.56 13
Czech Republic 29.9 38.3 27.6 51.9 6.2 6.1 41.79 32.81 35.80 9
France 58.9 49.3 46.0 73.8 25.3 40.3 72.16 71.23 71.85 3
Germany 62.2 54.1 54.2 80.4 32.7 37.0 76.06 76.70 76.49 2
Hungary 34.2 26.4 13.9 51.7 9.8 9.1 29.94 36.08 31.98 11
Iran 14.6 21.5 7.5 42.9 0.6 4.2 19.71 24.37 22.82 15
Ireland 62.6 47.7 52.1 77.5 37.4 16.0 71.15 66.89 69.73 4
Malaysia 49.7 31.7 12.9 41.5 1.5 4.2 36.30 24.12 28.18 12
Netherlands 67.1 75.8 85.3 98.3 63.3 34.1 100.00 100.00 100.00 1
Portugal 41.4 59.4 38.9 63.8 18.6 14.1 67.42 49.31 55.35 6
Romania 19.6 14.0 4.7 34.6 0.0 0.0 14.56 17.68 15.60 17
Russia 27.5 13.5 5.9 28.9 1.2 15.7 25.19 23.40 24.00 14
Slovenia 37.9 27.5 21.1 53.3 21.8 4.2 38.60 40.52 39.24 8
South Africa 13.6 19.5 5.6 61.4 3.2 15.2 15.00 40.78 32.19 10
Spain 52.3 38.9 32.0 64.4 12.2 18.2 59.43 48.44 55.77 5
Taiwan 50.4 14.4 14.4 56.2 21.4 19.9 31.70 49.82 43.78 7

Comparing partial rankings to the adequate part of the original ranking

The final step is to compare the results of the partial rankings to the corresponding part of the original rankings. The results of the Spearman’s rank correlation shows a strong positive relationship between the two ranking.

cor(B$Rank,U21_rank[rownames(B),],method="spearman")
## [1] 0.9019608
cor.test(B$Rank,U21_rank[rownames(B),],method="spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  B$Rank and U21_rank[rownames(B), ]
## S = 80, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9019608

Specifying overlaps

Bi-cluster might (or might not) overlap. Analyzing the overlaps can give a subtler picture of countries and indicators.

Venn diagram of indicators

Fig. 15 shows the Venn diagram of the indicators. The red circle contains those indicators that are in the upper league (League A), in the the blue circle those indicators can be found that are in the lower league (League C). The green circle indicates the middle league’s (League B) indicators.

T<-biclust::bicluster(orig_mtx,res,number = 1)
League_A<-T[[1]]
R<-biclust::bicluster(orig_mtx,rres,number = 1)
League_C<-R[[1]]
B<-biclust::bicluster(orig_mtx,bres,number = 1)
League_B<-B[[1]]
draw.triple.venn(area1=nrow(as.matrix(colnames(League_A))),area2=nrow(as.matrix(colnames(League_B))),area3=nrow(as.matrix(colnames(League_C))),n12=nrow(as.matrix(intersect(colnames(League_A),colnames(League_B)))),n13=nrow(as.matrix(intersect(colnames(League_A),colnames(League_C)))),n23=nrow(as.matrix(intersect(colnames(League_B),colnames(League_C)))),n123=nrow(as.matrix(intersect(intersect(colnames(League_A),colnames(League_B)),colnames(League_C)))),fill=c("red","green","blue"),category=c("League A","League B","League C"),main="Venn Diagram of indicators")
**Fig. 15** Venn diagram of the indicators

Fig. 15 Venn diagram of the indicators

## (polygon[GRID.polygon.17], polygon[GRID.polygon.18], polygon[GRID.polygon.19], polygon[GRID.polygon.20], polygon[GRID.polygon.21], polygon[GRID.polygon.22], text[GRID.text.23], text[GRID.text.24], text[GRID.text.25], text[GRID.text.26], text[GRID.text.27], text[GRID.text.28], text[GRID.text.29], text[GRID.text.30])

One interesting finding is, that all of the indicators in the middle league are part of the lower league’s indicators.

Venn diagram of countries

Fig. 16 shows the Venn diagram of the countries. The red circle contains those countries that are in the upper league (League A), in the the blue circle those countries can be found that are in the lower league (League C). The green circle indicates the middle league’s (League B) countries.

draw.triple.venn(area1=nrow(as.matrix(rownames(League_A))),area2=nrow(as.matrix(rownames(League_B))),area3=nrow(as.matrix(rownames(League_C))),n12=nrow(as.matrix(intersect(rownames(League_A),rownames(League_B)))),n13=nrow(as.matrix(intersect(rownames(League_A),rownames(League_C)))),n23=nrow(as.matrix(intersect(rownames(League_B),rownames(League_C)))),n123=nrow(as.matrix(intersect(intersect(rownames(League_A),rownames(League_B)),rownames(League_C)))),fill=c("red","green","blue"),category=c("League A","League B","League C"),main="Venn Diagram of countries")
**Fig. 16** Venn diagram of the countries

Fig. 16 Venn diagram of the countries

## (polygon[GRID.polygon.31], polygon[GRID.polygon.32], polygon[GRID.polygon.33], polygon[GRID.polygon.34], polygon[GRID.polygon.35], polygon[GRID.polygon.36], text[GRID.text.37], text[GRID.text.38], text[GRID.text.39], text[GRID.text.40], text[GRID.text.41], text[GRID.text.42], text[GRID.text.43], text[GRID.text.44], text[GRID.text.45])

Seven countries are in all of the three leagues. There are 11 countries which belong only to the upper league, and there are 18 countries which are just in the lower league.

Summing up the conclusions

The above proposed bi-clustering method can help to identify which countries can be compared based on which indicators in university rankings. With the iBBiG method the upper and lower league(s) (League(s) A and League(s) C) can be determined and the BicARE method can specify the middle league(s) (League(s) B).