Workflow
#Seriation and biclustering
library(seriation) #For seriate a matrix
library(biclust) #General biclustering functions
library(iBBiG) #iBBiG biclustering
library(BicARE) #BicARE biclustering
library(BcDiag) #Diagnostics of biclusters
#Additional statistic packages
library(factoextra) #Additional distance functions
library(VennDiagram) #VennDiagram
library(venneuler) #VennDiagram
#For formatted and interactive tables and figures
library(knitr) #Formatted tables, figures
Citations
#Seriation and biclustering
print(citation("seriation"),style="text") #For seriate a matrix
## Hahsler M, Buchta C, Hornik K (2018). _seriation: Infrastructure
## for Ordering Objects Using Seriation_. R package version 1.2-3,
## <URL: https://CRAN.R-project.org/package=seriation>.
##
## Hahsler M, Hornik K, Buchta C (2008). "Getting things in order: An
## introduction to the R package seriation." _Journal of Statistical
## Software_, *25*(3), 1-34. ISSN 1548-7660, <URL:
## http://www.jstatsoft.org/v25/i03/>.
print(citation("biclust"),style="text") #General biclustering functions
## Kaiser S, Santamaria R, Khamiakova T, Sill M, Theron R, Quintales
## L, Leisch F, De Troyer. E (2018). _biclust: BiCluster Algorithms_.
## R package version 2.0.1, <URL:
## https://CRAN.R-project.org/package=biclust>.
print(citation("iBBiG"),style="text") #iBBiG biclustering
## Gusenleitner D, Culhane A (2011). _iBBiG: Iterative Binary
## Biclustering of Genesets_. R package version 1.24.0, <URL:
## http://bcb.dfci.harvard.edu/~aedin/publications/>.
print(citation("BicARE"),style="text") #BicARE biclustering
## Gestraud P (2008). _BicARE: Biclustering Analysis and Results
## Exploration_. R package version 1.38.0, <URL:
## http://bioinfo.curie.fr>.
print(citation("BcDiag"),style="text") #Diagnostics of biclusters
## Mengsteab A, Otava M, Khamiakova T, De Troyer E (2015). _BcDiag:
## Diagnostics Plots for Bicluster Data_. R package version 1.0.10,
## <URL: https://CRAN.R-project.org/package=BcDiag>.
#Additional statistic packages
print(citation("factoextra"),style="text") #Additional distance functions
## Kassambara A, Mundt F (2017). _factoextra: Extract and Visualize
## the Results of Multivariate Data Analyses_. R package version
## 1.0.5, <URL: https://CRAN.R-project.org/package=factoextra>.
print(citation("VennDiagram"),style="text") #VennDiagram
## Chen H (2018). _VennDiagram: Generate High-Resolution Venn and
## Euler Plots_. R package version 1.6.20, <URL:
## https://CRAN.R-project.org/package=VennDiagram>.
print(citation("venneuler"),style="text") #VennDiagram
## Wilkinson L (2011). _venneuler: Venn and Euler Diagrams_. R
## package version 1.1-0, <URL:
## https://CRAN.R-project.org/package=venneuler>.
#For formatted and interactive tables and figures
print(citation("knitr"),style="text") #Formatted tables, figures
## Xie Y (2018). _knitr: A General-Purpose Package for Dynamic Report
## Generation in R_. R package version 1.20, <URL:
## https://yihui.name/knitr/>.
##
## Xie Y (2015). _Dynamic Documents with R and knitr_, 2nd edition.
## Chapman and Hall/CRC, Boca Raton, Florida. ISBN 978-1498716963,
## <URL: https://yihui.name/knitr/>.
##
## Xie Y (2014). "knitr: A Comprehensive Tool for Reproducible
## Research in R." In Stodden V, Leisch F, Peng RD (eds.),
## _Implementing Reproducible Computational Research_. Chapman and
## Hall/CRC. ISBN 978-1466561595, <URL:
## http://www.crcpress.com/product/isbn/9781466561595>.
In the case of this analysis an extended database is used. The extended database contains data from the U21 ranking (R1-5, E1-4, C1-6, O1-9), the Global Competitiveness Index (GCI) and the indicators of its two pillars related to higher education and R&D and GDP from World Bank. New indicators are the following:
5th pillar of the GCI (Higher education and training):
12th pillar of the GCI (Innovation):
Overall GCI:
World Bank/World Development Indicators/GDP:
load("U21_AND_MORE_BIC.RData")
#ALL: 50 by 24 dataframe of normalized data, with two letters indicator names
#U21: 50 by 24 dataframe of normalized data, with original indicator names
#LESS_NORM: 50 by 46 dataframe of normalized U21 and Global Competitive Index indicators for 50 countries
#LESS_ORIG: 50 by 46 dataframe of original U21 and Global Competitive Index indicators for 50 countries
#orig_mtx: 50 by 46 matrix of original U21 and Global Competitive Index indicators for 50 countries
#orig_mtx_u21: 50 by 24 matrix of original U21 indicators for 50 countries
#mtx: 50 by 46 matrix of normalized U21 and Global Competitive Index indicators for 50 countries
#mtx_u21: 50 by 34 matrix of normalized U21 indicators for 50 countries
#U21_rank: 50 by 1 matrix (vector) of U21'S country ranks
#weights: 1 by 46 matrix (vector) of equally weights (in this case all weights are 1/46)
Firstly, to obtain a preliminary picture of the possible bi-clusters, seriation of the normalized data is done.
ALL_SER <- c(seriate(dist(mtx,"euclidean"),method="MDS"),seriate(get_dist(t(mtx),"spearman"),method="MDS"))
ALL_ORDERED<-mtx[get_order(ALL_SER,dim=1),get_order(ALL_SER,dim=2)]
rownames(ALL_ORDERED) <- rownames(mtx[get_order(ALL_SER,dim=1),get_order(ALL_SER,dim=2)])
After seriation, plotting the heat map of the normalized data (see Fig. 1) can help to detect the possible bi-clusters. On Fig. 1 the blue cells indicate lower values of the indicators, while red cells mean higher values.
hmap(as.matrix(ALL_ORDERED),col=bluered(100),showdist="both")
After seriation, two-two bigger homogeneous blocks can be identified based on Fig. 1. Two blocks of the red cells on the top left corner on Fig. 1 indicates the top leagues, while the two bigger blue blocks, which indicates the remaining (lower) leagues, can be discovered in the bottom of the figure. The dendrogram of two-way clustering also shows that regarding rows and columns two-two main blocks can be specified.
Applying the iBBiG method on the normalized data can specify the top league bi-clusters. First of all, the number of bi-clusters needed to be determined.
res <- iBBiG(binaryMatrix=binarize(mtx,threshold = NA),nModules = 2,alpha=0.3,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.6)
## [1] "Threshold: 0.5"
## Module: 1 ... done
## Module: 2 ... done
summary(res)
##
## An object of class iBBiG
##
## Number of Clusters found: 2
##
## First 2 Cluster scores and sizes:
## M 1 M 2
## Cluster Score 524.6154 89.85694
## Number of Rows: 24.0000 25.00000
## Number of Columns: 36.0000 7.00000
The results above show that the iBBiG algorithm found 2 bi-clusters. In the first bi-cluster there are 24 rows (countries) and 36 columns (indicators), in the second bi-cluster there are 25 rows (countries) and 7 columns (indicators). The next step is to calculate F-tests to determine which bi-cluster is significant.
Obs.FStat <- NULL
for (i in c(1:res@Number)){
Obs.FStat[[i]] <- computeObservedFstat(x=mtx,bicResult=res,number=i)
}
kable(Obs.FStat,caption = "**Table 1** The results of row and column effects for two bi-clusters")
|
|
In Table 1 the results of the F-tests can be seen. In the case of the first bi-cluster, both the row and column effect are significant. On the other hand, in the case of the second bi-cluster the row effect is not significant at p = 0.01 level; thus the second bi-cluster is not significant. Only the first bi-cluster is significant which is the top league (League A).
The next step is to identify the top league bi-cluster.
res <- iBBiG(binaryMatrix=binarize(mtx,threshold = 0.50),nModules = 1,alpha=0.3,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.8)
## Module: 1 ... done
summary(res)
##
## An object of class iBBiG
##
## There was one cluster found with Score 524.6154 and
## 24 Rows and 36 columns
The results of the iBBiG algorithm can be seen above. The algorithm found one bi-cluster in which there are 24 rows (countries) and 36 columns (indicators). The following step is to calculate the F-tests to determine whether the bi-cluster is significant or not.
In Table 2 the results of the row and column effects (F-tests) can be read. Both the row and column effects are significant, so the top league bi-cluster is significant.
Obs.FStat <- NULL
for (i in c(1:res@Number)){
Obs.FStat[[i]] <- computeObservedFstat(x=mtx,bicResult=res,number=i)
}
kable(Obs.FStat,caption = "**Table 2** The results of row and column effects for the top league")
|
To see the differences between those countries that are included in the top league and those that are not in the top league, one can plot the means, medians, variances, and median absolute deviations (MAD) for rows (countries) (see Fig. 2). (The horizontal axis of Fig. 2 shows the countries, and the vertical axis depicts the mentioned parameters on normalized (0-1) data of indicators.) On Fig. 2 the red line indicates those countries that are included in the top league, while the black line shows the remaining countries. The countries in the top league have higher means and medians.
exploreBic(dset=mtx,bres=res,mname='biclust',pfor='all',gby='genes',bnum=1)
One can do the same for the columns (indicators). Fig. 3 shows the differences between the means, medians, variances, and median absolute deviations (MAD) for the indicators. (The horizontal axis of Fig. 3 shows the indicators, and the vertical axis depicts the mentioned parameters on normalized (0-1) data of countries.) The red line indicates those indicators that are in the top league, while the black line shows the remaining indicators.
exploreBic(dset=mtx,bres=res,mname='biclust',pfor='all',gby='conditions',bnum=1)
Fig. 4 depicts the heat map of the top league. The top league contains 24 countries and 36 indicators. The red cells mean higher values in the case of indicators, while blue cells symbol lower values.
drawHeatmap(x=mtx,bicResult=res,number=1,beamercolor = TRUE,paleta=bluered(100))
Fig. 5 shows the heat map of the results of iBBiG on normalized data. The top league can be seen in the upper left corner of Fig.5, which is enlarged on Fig.4.
drawHeatmap(x=mtx,bicResult=res,local=FALSE,number=1,beamercolor = TRUE,paleta=bluered(100))
The next step is to check the stability of the top league bi-cluster. The results of the bootstrap can be seen below.
Bootstrap <- diagnoseColRow(x=mtx,bicResult=res,number=1,
nResamplings=100,replace=TRUE)
Bootstrap
## $bootstrapFstats
## fval.row fval.col
## [1,] 0.4558573 1.2457839
## [2,] 0.7762345 0.7598299
## [3,] 0.8820925 1.2526881
## [4,] 0.6692145 0.6348535
## [5,] 0.9052724 1.0013984
## [6,] 1.1753236 0.8978720
## [7,] 1.0272293 0.7682869
## [8,] 0.9653420 1.4250487
## [9,] 1.2893017 1.3926588
## [10,] 1.1110981 1.6054483
## [11,] 0.9458103 0.9886969
## [12,] 0.8346419 0.8635576
## [13,] 1.1191162 0.9559220
## [14,] 0.6836625 1.2205601
## [15,] 1.0312550 1.0788385
## [16,] 0.9493767 1.1851422
## [17,] 0.7678664 0.7791395
## [18,] 1.1122661 1.3074732
## [19,] 0.6424177 0.9130398
## [20,] 1.3827550 0.7626709
## [21,] 1.1457097 0.8648573
## [22,] 1.3361727 1.2474525
## [23,] 1.1112548 0.5710432
## [24,] 1.1390997 0.6208631
## [25,] 1.1138205 1.1016886
## [26,] 0.6206800 0.9748964
## [27,] 1.5427918 0.6365833
## [28,] 1.1807683 1.1337138
## [29,] 1.1469962 1.2715420
## [30,] 1.0504918 1.3011868
## [31,] 1.4030936 1.2870953
## [32,] 1.2225875 0.8520711
## [33,] 0.8377245 1.2852197
## [34,] 0.8244972 1.1141902
## [35,] 1.4580365 1.2429602
## [36,] 1.0191745 1.3459453
## [37,] 0.6079635 0.8073483
## [38,] 0.6027129 0.8911671
## [39,] 1.1186381 0.7813470
## [40,] 0.8440753 1.2352131
## [41,] 0.9991487 0.9056138
## [42,] 0.7488974 1.0719661
## [43,] 1.2087327 0.8868491
## [44,] 0.7187704 1.3342927
## [45,] 1.3792451 1.0448297
## [46,] 1.2232797 0.9134026
## [47,] 0.9807438 1.2319932
## [48,] 1.0937240 0.7643169
## [49,] 1.0069267 0.8439561
## [50,] 1.0554553 0.7672389
## [51,] 0.3991724 0.7518581
## [52,] 0.3975236 0.5158957
## [53,] 1.1472599 1.0219702
## [54,] 0.9002998 0.7226747
## [55,] 1.5158563 0.7355529
## [56,] 1.1911368 0.7386849
## [57,] 0.8558949 0.7115147
## [58,] 0.9025096 1.4479635
## [59,] 1.5137745 1.0294836
## [60,] 1.5480152 1.2708263
## [61,] 0.8614656 0.8841339
## [62,] 1.2848611 1.1394600
## [63,] 1.0059435 0.9341937
## [64,] 0.6423531 0.9372097
## [65,] 1.0677445 0.7347199
## [66,] 1.1592102 0.8873006
## [67,] 0.8702847 1.1062927
## [68,] 0.6011507 0.8901371
## [69,] 0.8682281 1.2872815
## [70,] 0.9228611 1.2186835
## [71,] 1.0301156 0.6209046
## [72,] 0.5282748 0.7284243
## [73,] 1.2344511 1.1267588
## [74,] 0.8141651 0.9895322
## [75,] 0.5924007 0.8045624
## [76,] 1.0001861 1.1210527
## [77,] 1.1179881 0.8128473
## [78,] 0.8513709 0.6978781
## [79,] 1.1746256 0.7061067
## [80,] 1.0393032 0.9121229
## [81,] 1.1141229 1.0538603
## [82,] 1.4043178 0.9318470
## [83,] 0.8170598 0.7481587
## [84,] 0.9658959 0.8928029
## [85,] 1.4394232 0.5875308
## [86,] 0.8082113 0.8927686
## [87,] 1.3945176 0.7694480
## [88,] 1.0719340 1.4111355
## [89,] 1.4402063 0.8448667
## [90,] 0.6760628 0.8433685
## [91,] 1.0152765 1.3606787
## [92,] 1.0978468 1.0129742
## [93,] 0.9068615 1.4025693
## [94,] 1.4808052 1.2251358
## [95,] 1.1234761 0.9579472
## [96,] 0.8776120 0.8975560
## [97,] 0.8126690 0.9773330
## [98,] 1.9600443 1.5367003
## [99,] 1.2707879 1.5715004
## [100,] 0.9669200 0.8821530
##
## $observedFstatRow
## [1] 10.66838
##
## $observedFstatCol
## [1] 9.711019
##
## $bootstrapPvalueRow
## [1] 0
##
## $bootstrapPvalueCol
## [1] 0
Since the bootstrapPvalueRow and the bootstrapPvalueCol are lower than 0.01, the bi-cluster is stable.
The next step is to calculate partial rankings based on the indicators and countries in the top league. The calculation of each score of each indicator of the countries are done according to the method used in the original U21 ranking. The original weights of U21’s indicators are used, and the total scores for the countries are calculated using the selected indicators in the given bi-cluster. Table 3 shows the result of partial ranking in the case of the top league.
C<-biclust::bicluster(orig_mtx,res,number=1)
B<-as.data.frame(C[[1]])
selectedweight<-weights[,colnames(B)]
B$Overall_Score<-rowSums(B*selectedweight)*100/max(rowSums(B*selectedweight))
B$Rank<-rank(-B$Overall_Score)
kable(B,caption = "**Table 3** League A by iBBiG",digits = 2)
R3 | R4 | R5 | E1 | E2 | E3 | E4 | C2 | C5 | C6 | O2 | O3 | O6 | O7 | O8 | GCI_64_VALUE_Secondary_education_enrollment,gross%* (a nagyobb jobb) | GCI_65_VALUE_Tertiary_education_enrollment,gross%* (a nagyobb jobb) | GCI_66_VALUE_A._Quantity_of_education (a nagyobb jobb) | GCI_67_VALUE_Quality_of_the_education_system,1-7(best) | GCI_68_VALUE_Quality_of_math_and_science_education,1-7(best) | GCI_69_VALUE_Quality_of_management_schools,1-7(best) | GCI_70_VALUE_Internet_access_in_schools,1-7(best) | GCI_71_VALUE_B._Quality_of_education (a nagyobb jobb) | GCI_72_VALUE_Availability_of_research_and_training_services,1-7(best) | GCI_73_VALUE_Extent_of_staff_training,1-7(best) | GCI_74_VALUE_C._On-the-job_training (a nagyobb jobb) | GCI_75_VALUE_5th_pillar:_Higher_education_and_training (a nagyobb jobb) | GCI_149_VALUE_Capacity_for_innovation,1-7(best) | GCI_150_VALUE_Quality_of_scientific_research_institutions,1-7(best) | GCI_151_VALUE_Company_spending_on_R&D,1-7(best) | GCI_152_VALUE_University-industry_collaboration_in_R&D,1-7(best) | GCI_153_VALUE_Gov’t_procurement_of_advanced_tech_products,1-7(best) | GCI_154_VALUE_Availability_of_scientists_and_engineers,1-7(best) | GCI_156_VALUE_12th_pillar:_Innovation (a nagyobb jobb) | GCI_158_VALUE_Global_Competitiveness_Index (a nagyobb jobb) | WB_WDI_223_GDP_per_capita,PPP(constant_2011_international_$)_[NY.GDP.PCAP.PP.KD] | Overall_Score | Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Australia | 59.2 | 59.4 | 63.8 | 100.0 | 84.0 | 95.5 | 93.2 | 67.30 | 73.10 | 44.20 | 95.50 | 79.30 | 82.60 | 71.70 | 55.20 | 135.54 | 86.33 | 7.00 | 4.80 | 4.61 | 5.05 | 6.16 | 5.16 | 5.21 | 4.52 | 4.87 | 5.67 | 4.61 | 5.78 | 3.57 | 4.84 | 3.42 | 4.72 | 4.41 | 5.08 | 43395.57 | 54.83 | 11 |
Austria | 58.7 | 72.9 | 79.8 | 100.0 | 77.9 | 100.0 | 85.1 | 88.30 | 67.20 | 84.10 | 62.60 | 78.00 | 70.40 | 36.20 | 59.40 | 97.69 | 72.44 | 6.48 | 4.49 | 4.61 | 4.63 | 5.58 | 4.83 | 5.93 | 4.82 | 5.37 | 5.56 | 4.96 | 5.01 | 4.80 | 4.68 | 3.66 | 4.27 | 4.82 | 5.16 | 44122.65 | 55.66 | 10 |
Belgium | 59.3 | 47.8 | 47.6 | 100.0 | 91.2 | 100.0 | 94.6 | 91.00 | 74.10 | 63.00 | 66.00 | 85.60 | 68.70 | 64.70 | 49.50 | 107.26 | 70.83 | 6.42 | 5.33 | 6.01 | 6.01 | 5.87 | 5.81 | 6.00 | 5.11 | 5.56 | 5.93 | 5.16 | 6.08 | 4.84 | 5.58 | 3.51 | 4.47 | 4.89 | 5.18 | 41355.16 | 52.27 | 14 |
Canada | 87.9 | 64.5 | 70.5 | 100.0 | 84.0 | 88.6 | 77.7 | 69.10 | 85.60 | 64.50 | 84.10 | 81.60 | 94.60 | 96.00 | 58.10 | 103.40 | 58.88 | 5.93 | 5.25 | 5.10 | 5.77 | 6.25 | 5.59 | 5.29 | 4.70 | 4.99 | 5.50 | 4.63 | 5.47 | 3.93 | 4.90 | 3.69 | 5.06 | 4.54 | 5.24 | 42946.36 | 54.32 | 13 |
Denmark | 74.2 | 100.0 | 100.0 | 100.0 | 84.0 | 95.5 | 75.7 | 89.25 | 89.15 | 96.25 | 92.45 | 92.45 | 72.95 | 62.95 | 91.05 | 124.66 | 79.60 | 6.78 | 4.81 | 4.54 | 5.21 | 5.96 | 5.13 | 5.32 | 4.94 | 5.13 | 5.68 | 5.28 | 5.38 | 4.79 | 4.90 | 3.35 | 4.57 | 5.06 | 5.29 | 45082.15 | 57.11 | 8 |
Finland | 65.4 | 78.7 | 74.6 | 100.0 | 100.0 | 100.0 | 92.6 | 77.20 | 87.80 | 79.40 | 90.90 | 77.80 | 94.80 | 73.50 | 100.00 | 107.68 | 93.72 | 7.00 | 5.86 | 6.26 | 5.58 | 6.47 | 6.04 | 5.90 | 5.32 | 5.61 | 6.22 | 5.59 | 5.72 | 5.68 | 5.97 | 4.07 | 6.25 | 5.78 | 5.50 | 39017.54 | 49.68 | 15 |
France | 58.9 | 49.3 | 46.0 | 100.0 | 84.0 | 95.5 | 87.2 | 78.40 | 67.50 | 44.90 | 51.90 | 73.80 | 56.60 | 55.70 | 51.30 | 109.71 | 58.30 | 5.90 | 4.38 | 5.19 | 5.73 | 4.71 | 5.00 | 5.29 | 4.47 | 4.88 | 5.26 | 4.77 | 5.56 | 4.71 | 4.58 | 3.75 | 4.83 | 4.74 | 5.08 | 37531.43 | 47.44 | 17 |
Germany | 62.2 | 54.1 | 54.2 | 100.0 | 78.6 | 97.7 | 77.0 | 70.50 | 86.80 | 61.30 | 50.20 | 80.40 | 56.10 | 51.50 | 54.30 | 101.27 | 61.65 | 6.04 | 5.24 | 5.09 | 4.98 | 5.05 | 5.09 | 6.03 | 5.02 | 5.52 | 5.55 | 5.60 | 5.78 | 5.46 | 5.34 | 4.19 | 4.92 | 5.47 | 5.49 | 43417.73 | 54.69 | 12 |
Hong Kong | 73.6 | 37.3 | 48.2 | 100.0 | 84.0 | 90.9 | 97.3 | 99.80 | 79.40 | 38.60 | 76.80 | 78.70 | 59.40 | 32.20 | 39.00 | 88.65 | 59.67 | 5.89 | 4.75 | 5.42 | 5.41 | 6.05 | 5.41 | 5.42 | 4.63 | 5.02 | 5.44 | 4.50 | 4.75 | 3.93 | 4.59 | 3.97 | 4.48 | 4.38 | 5.46 | 52700.52 | 66.02 | 4 |
Ireland | 62.6 | 47.7 | 52.1 | 100.0 | 75.3 | 100.0 | 85.8 | 81.80 | 81.00 | 52.90 | 70.70 | 77.50 | 72.90 | 70.50 | 45.70 | 119.12 | 71.24 | 6.43 | 5.43 | 5.01 | 5.32 | 5.35 | 5.28 | 5.03 | 4.78 | 4.90 | 5.54 | 5.02 | 5.50 | 4.58 | 5.24 | 3.53 | 4.95 | 4.68 | 4.98 | 48885.62 | 61.45 | 6 |
Israel | 42.0 | 53.3 | 42.6 | 100.0 | 84.0 | 90.9 | 85.8 | 70.20 | 100.00 | 50.10 | 72.90 | 71.40 | 61.90 | 86.80 | 16.30 | 101.70 | 65.85 | 6.21 | 3.72 | 3.95 | 4.86 | 5.48 | 4.50 | 4.61 | 3.95 | 4.28 | 5.00 | 5.82 | 6.27 | 5.31 | 5.50 | 4.34 | 5.20 | 5.56 | 4.95 | 31812.63 | 40.48 | 21 |
Japan | 62.6 | 39.1 | 37.3 | 92.0 | 35.8 | 100.0 | 92.6 | 36.50 | 66.80 | 80.20 | 37.90 | 54.10 | 59.40 | 86.80 | 69.70 | 101.81 | 61.46 | 6.03 | 4.43 | 5.09 | 4.23 | 5.33 | 4.77 | 5.64 | 5.41 | 5.53 | 5.44 | 5.38 | 5.81 | 5.83 | 5.00 | 4.09 | 5.44 | 5.54 | 5.47 | 37322.85 | 47.13 | 18 |
Korea, Rep. (South) | 39.0 | 37.2 | 30.3 | 79.3 | 68.2 | 100.0 | 78.4 | 39.20 | 64.20 | 76.90 | 44.10 | 55.50 | 100.00 | 75.60 | 71.60 | 97.20 | 98.38 | 7.00 | 3.62 | 4.70 | 4.21 | 6.25 | 4.69 | 4.67 | 4.22 | 4.44 | 5.38 | 4.70 | 4.98 | 4.50 | 4.62 | 4.14 | 4.42 | 4.83 | 4.96 | 33425.69 | 42.40 | 20 |
Malaysia | 49.7 | 31.7 | 12.9 | 100.0 | 99.8 | 95.5 | 80.4 | 44.80 | 81.20 | 2.70 | 14.90 | 41.50 | 36.80 | 34.00 | 21.80 | 67.24 | 35.97 | 3.74 | 5.26 | 5.20 | 5.13 | 5.41 | 5.25 | 5.44 | 5.35 | 5.40 | 4.80 | 5.19 | 5.21 | 4.93 | 5.33 | 5.18 | 5.22 | 4.67 | 5.16 | 24195.90 | 30.75 | 24 |
Netherlands | 67.1 | 75.8 | 85.3 | 100.0 | 79.1 | 100.0 | 100.0 | 76.60 | 85.20 | 89.50 | 88.40 | 98.30 | 75.80 | 59.90 | 43.40 | 129.91 | 77.34 | 6.69 | 5.30 | 5.44 | 5.70 | 6.38 | 5.70 | 6.13 | 5.03 | 5.58 | 5.99 | 5.23 | 5.87 | 4.65 | 5.38 | 4.00 | 4.62 | 5.25 | 5.45 | 45668.44 | 57.73 | 7 |
New Zealand | 40.7 | 43.2 | 32.9 | 100.0 | 100.0 | 100.0 | 95.9 | 79.60 | 62.60 | 50.20 | 74.50 | 75.70 | 80.10 | 73.50 | 50.70 | 119.54 | 79.78 | 6.79 | 5.32 | 5.33 | 5.18 | 6.01 | 5.46 | 4.91 | 4.93 | 4.92 | 5.72 | 5.07 | 5.26 | 3.81 | 4.91 | 3.45 | 4.44 | 4.42 | 5.20 | 34468.77 | 43.81 | 19 |
Norway | 72.4 | 59.0 | 83.1 | 100.0 | 84.0 | 95.5 | 86.5 | 80.80 | 79.40 | 76.20 | 75.30 | 79.30 | 72.50 | 71.20 | 73.30 | 111.06 | 74.10 | 6.55 | 5.05 | 4.55 | 5.26 | 6.52 | 5.34 | 5.52 | 5.16 | 5.34 | 5.75 | 5.02 | 5.20 | 4.49 | 5.02 | 4.20 | 4.51 | 4.85 | 5.35 | 63286.69 | 79.20 | 2 |
Portugal | 41.4 | 59.4 | 38.9 | 100.0 | 87.5 | 100.0 | 85.8 | 73.40 | 55.60 | 38.50 | 52.70 | 63.80 | 65.40 | 32.30 | 60.90 | 112.85 | 68.86 | 6.34 | 4.27 | 4.54 | 5.92 | 5.74 | 5.12 | 5.11 | 4.18 | 4.64 | 5.37 | 4.29 | 5.38 | 3.57 | 4.68 | 3.77 | 5.24 | 4.08 | 4.54 | 26023.67 | 33.33 | 23 |
Singapore | 93.6 | 57.3 | 86.8 | 98.7 | 73.3 | 88.6 | 91.9 | 80.80 | 81.50 | 47.30 | 75.20 | 84.20 | 52.10 | 71.30 | 80.40 | 107.10 | 81.30 | 6.85 | 5.80 | 6.32 | 5.83 | 6.36 | 6.08 | 5.46 | 5.25 | 5.36 | 6.09 | 4.99 | 5.61 | 4.85 | 5.58 | 5.10 | 4.94 | 5.18 | 5.65 | 80305.45 | 100.00 | 1 |
Sweden | 76.5 | 91.4 | 95.7 | 100.0 | 85.9 | 100.0 | 88.5 | 86.90 | 96.90 | 100.00 | 100.00 | 83.60 | 73.40 | 65.80 | 69.70 | 98.38 | 70.03 | 6.38 | 4.58 | 4.42 | 5.16 | 6.32 | 5.12 | 5.42 | 5.10 | 5.26 | 5.59 | 5.50 | 5.48 | 5.36 | 5.33 | 4.02 | 4.88 | 5.37 | 5.41 | 44167.63 | 55.95 | 9 |
Switzerland | 85.6 | 71.6 | 84.3 | 98.5 | 74.1 | 97.7 | 77.0 | 100.00 | 98.30 | 60.50 | 93.40 | 100.00 | 53.90 | 65.80 | 42.60 | 96.31 | 55.56 | 5.79 | 5.99 | 5.94 | 6.16 | 6.10 | 6.05 | 6.50 | 5.69 | 6.09 | 5.98 | 5.89 | 6.35 | 5.94 | 5.79 | 3.95 | 4.77 | 5.70 | 5.70 | 56680.44 | 71.11 | 3 |
Taiwan | 50.4 | 14.4 | 14.4 | 97.8 | 67.2 | 88.6 | 89.2 | 28.90 | 80.70 | 36.20 | 65.00 | 56.20 | 64.50 | 74.60 | 61.40 | 100.25 | 83.88 | 6.95 | 3.91 | 5.27 | 4.81 | 6.11 | 5.02 | 5.44 | 4.35 | 4.90 | 5.63 | 4.75 | 5.17 | 4.63 | 5.09 | 4.05 | 4.99 | 5.10 | 5.25 | 31579.77 | 40.05 | 22 |
United Kingdom | 62.0 | 45.7 | 44.6 | 100.0 | 87.0 | 100.0 | 85.8 | 76.20 | 83.00 | 64.60 | 79.20 | 91.90 | 60.70 | 73.70 | 56.30 | 95.42 | 61.88 | 6.05 | 4.63 | 4.29 | 5.83 | 6.35 | 5.27 | 5.67 | 4.67 | 5.17 | 5.50 | 5.27 | 6.35 | 4.78 | 5.67 | 3.70 | 4.79 | 4.96 | 5.41 | 37983.13 | 48.12 | 16 |
United States | 100.0 | 41.6 | 53.3 | 100.0 | 95.4 | 100.0 | 95.9 | 48.30 | 95.10 | 62.60 | 63.20 | 95.40 | 94.60 | 79.40 | 61.00 | 93.67 | 94.28 | 7.00 | 4.56 | 4.39 | 5.58 | 6.06 | 5.15 | 5.64 | 5.00 | 5.32 | 5.82 | 5.88 | 6.11 | 5.49 | 5.85 | 4.35 | 5.32 | 5.49 | 5.54 | 51830.99 | 65.20 | 5 |
The final step is to compare the results of the partial rankings to the corresponding part of the original U21 ranking. The results of the Spearman’s rank correlation show a moderate positive relationship between the two rankings.
cor(B$Rank,U21_rank[rownames(B),],method="spearman")
## [1] 0.6608696
cor.test(B$Rank,U21_rank[rownames(B),],method="spearman")
##
## Spearman's rank correlation rho
##
## data: B$Rank and U21_rank[rownames(B), ]
## S = 780, p-value = 0.0005987
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.6608696
An interesting question is how the League A of the U21 and the League A of the extended database (U21 + GCI + GDP) are overlap. In order to answer this question, one can make a Venn diagram of the two leagues to detect overlaps. The first step before plotting the Venn diagrams (one for the indicators and one for the countries), is to calculate the top league bi-cluster of the U21 database.
res_U21 <- iBBiG(binaryMatrix=binarize(mtx_u21,threshold = 0.50),nModules = 1,alpha=0.05,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.8)
## Module: 1 ... done
T<-biclust::bicluster(orig_mtx,res,number = 1)
T_U21<-biclust::bicluster(orig_mtx_u21,res_U21,number = 1)
League_A<-T[[1]]
League_A_U21<-T_U21[[1]]
Fig.6 shows the Venn diagram of the indicators in the two different top leagues. The green circle contains those indicators that are in the U21’s top league (League A) while in the red circle the extended database’s top league’s indicators can be found. 15 indicators are the same in the two top leagues.
draw.pairwise.venn(area1=nrow(as.matrix(colnames(League_A))),area2=nrow(as.matrix(colnames(League_A_U21))),cross.area=nrow(as.matrix(intersect(colnames(League_A),colnames(League_A_U21)))),fill=c("red","green"),category=c("Extended","U21"),main="Venn Diagram of indicators on League As")
## (polygon[GRID.polygon.13], polygon[GRID.polygon.14], polygon[GRID.polygon.15], polygon[GRID.polygon.16], text[GRID.text.17], text[GRID.text.18], text[GRID.text.19], text[GRID.text.20], text[GRID.text.21])
One can draw a Venn diagram for the countries in the two top leagues. On Fig. 7 the green circle contains those countries that are in the U21’s top league while in the red circle the extended database’s top league’s countries can be found. 21 countries are the same in the two top leagues.
draw.pairwise.venn(area1=nrow(as.matrix(rownames(League_A))),area2=nrow(as.matrix(rownames(League_A_U21))),cross.area=nrow(as.matrix(intersect(rownames(League_A),rownames(League_A_U21)))),fill=c("red","green"),category=c("Extended","U21"),main="Venn Diagram of countries on League As")
## (polygon[GRID.polygon.22], polygon[GRID.polygon.23], polygon[GRID.polygon.24], polygon[GRID.polygon.25], text[GRID.text.26], text[GRID.text.27], text[GRID.text.28], text[GRID.text.29], text[GRID.text.30])
To identify the lower league(s), the reverse data matrix is needed to be calculated firstly. Applying the iBBiG method on the reversed normalized data can specify the lower league bi-clusters.
rmtx <- 1-mtx
rres <- iBBiG(binaryMatrix=binarize(rmtx,threshold = NA),nModules = 2,alpha=0.3,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.6)
## [1] "Threshold: 0.5"
## Module: 1 ... done
## Module: 2 ... done
summary(rres)
##
## An object of class iBBiG
##
## Number of Clusters found: 2
##
## First 2 Cluster scores and sizes:
## M 1 M 2
## Cluster Score 773.3831 91.93064
## Number of Rows: 29.0000 22.00000
## Number of Columns: 39.0000 9.00000
The results above show that the iBBiG algorithm found 2 bi-clusters on the reversed data. In the first bi-cluster there are 29 rows (countries) and 39 columns (indicators), in the second bi-cluster there are 22 rows (countries) and 9 columns (indicators). The next step is to calculate F-tests to determine which bi-cluster is significant.
Obs.FStat <- NULL
for (i in c(1:rres@Number)){
Obs.FStat[[i]] <- computeObservedFstat(x=rmtx,bicResult=rres,number=i)
}
kable(Obs.FStat,caption = "**Table 4** The results of row and column effects of the reversed data for two bi-clusters")
|
|
In Table 4 the results of the F-tests can be seen. In the case of the first bi-cluster, both the row and column effect are significant. On the other hand, in the case of the second bi-cluster the row effect is not significant; thus the second bi-cluster is not significant. Only the first bi-cluster is significant which is the lower league (League C).
The next step is to identify the lower league bi-cluster.
## [1] "Threshold: 0.5"
## Module: 1 ... done
##
## An object of class iBBiG
##
## There was one cluster found with Score 773.3831 and
## 29 Rows and 39 columns
The results of the iBBiG algorithm on the reversed data can be seen above. The algorithm found one bi-cluster in which there are 29 rows (countries) and 39 columns (indicators). The following step is to calculate the F-tests to determine whether the bi-cluster is significant or not.
In Table 5 the results of the row and column effects (F-tests) can be read. Both the row and column effects are significant, so the lower league bi-cluster is significant.
Obs.FStat <- NULL
for (i in c(1:rres@Number)){
Obs.FStat[[i]] <- computeObservedFstat(x=rmtx,bicResult=rres,number=i)
}
kable(Obs.FStat,caption = "**Table 5** The results of row and column effects of the lower league")
|
To see the differences between those countries that are included in the lower league and those that are not in the lower league, one can plot the means, medians, variances, and median absolute deviations (MAD) for rows (countries) (see Fig. 8). On Fig. 8 the red line indicates those countries that are included in the lower league, while the black line shows the remaining countries. The countries in the lower league have lower means and medians.
exploreBic(dset=mtx,bres=rres,mname='biclust',pfor='all',gby='genes',bnum=1)
One can do the same for the columns (indicators). Fig. 9 shows the differences between the means, medians, variances, and median absolute deviations (MAD) for the indicators. The red line indicates those indicators that are in the lower league, while the black line shows the remaining indicators.
exploreBic(dset=mtx,bres=rres,mname='biclust',pfor='all',gby='conditions',bnum=1)
Fig. 10 depicts the heat map of the lower league. The lower league contains 29 countries and 39 indicators. The red cells mean higher values in the case of indicators, while blue cells symbol lower values.
drawHeatmap(x=mtx,bicResult=rres,number=1,paleta=bluered(100),beamercolor = TRUE)
Fig. 11 shows the heat map of the results of iBBiG on reversed normalized data. The lower league can be seen in the upper left corner of Fig. 11, which is enlarged on Fig.10.
drawHeatmap(x=mtx,bicResult=rres,local=FALSE,number=1,paleta=bluered(100),beamercolor = TRUE)
The next step is to check the stability of the lower league bi-cluster. The results of the bootstrap can be seen below.
Bootstrap <- diagnoseColRow(x=mtx,bicResult=rres,number=1,
nResamplings=100,replace=TRUE)
Bootstrap
## $bootstrapFstats
## fval.row fval.col
## [1,] 0.8787951 1.1993345
## [2,] 1.1479031 0.8119857
## [3,] 0.9185714 1.3819028
## [4,] 1.0515971 1.2870996
## [5,] 0.6711014 1.1869916
## [6,] 0.7256949 1.3622916
## [7,] 1.1097199 1.3266477
## [8,] 1.0173044 1.2075056
## [9,] 1.2001393 1.0269856
## [10,] 1.2875160 0.6932334
## [11,] 0.9397023 0.9578160
## [12,] 0.9254670 0.9772501
## [13,] 0.9739202 1.0086570
## [14,] 0.8057073 1.2884739
## [15,] 1.5034866 0.9482308
## [16,] 0.6151914 0.9449776
## [17,] 1.4159744 0.8233872
## [18,] 0.6950945 1.0090267
## [19,] 0.7447388 0.9996370
## [20,] 0.9293428 0.6893537
## [21,] 1.1753038 1.1526448
## [22,] 0.7429555 1.0307952
## [23,] 0.6219049 0.7573396
## [24,] 1.0111328 0.9726158
## [25,] 1.0113143 1.2827577
## [26,] 1.2316483 0.7151238
## [27,] 1.2353553 1.0921821
## [28,] 1.2241078 0.9375155
## [29,] 0.7286721 0.8996095
## [30,] 1.2138118 1.0991131
## [31,] 1.0748515 1.0166719
## [32,] 0.5032830 0.9733850
## [33,] 1.0445495 1.0934007
## [34,] 1.2926866 1.0033226
## [35,] 1.1000232 1.0031260
## [36,] 1.0071052 0.8099202
## [37,] 1.2508099 0.7534137
## [38,] 0.9551263 1.2858541
## [39,] 0.3465532 0.8366278
## [40,] 0.7751429 0.9808323
## [41,] 0.8573466 1.3089716
## [42,] 0.9404418 0.8184324
## [43,] 0.7119250 1.3058550
## [44,] 1.0064976 1.3067592
## [45,] 0.7907035 0.9199236
## [46,] 0.9281528 1.0198737
## [47,] 0.6669084 0.7307468
## [48,] 0.9660137 0.9953439
## [49,] 1.0873105 0.7482719
## [50,] 0.8220739 0.6219358
## [51,] 0.8944025 1.1141820
## [52,] 1.1483426 1.0547553
## [53,] 0.8639051 1.0415858
## [54,] 1.0156392 1.0290273
## [55,] 0.7594707 0.8070350
## [56,] 1.0063043 1.1970833
## [57,] 0.9496060 1.1385846
## [58,] 0.9643459 0.7935547
## [59,] 0.6304430 1.0704103
## [60,] 0.8211591 0.8507563
## [61,] 0.8714773 1.1699452
## [62,] 1.2386100 1.4304289
## [63,] 0.7784997 0.9700217
## [64,] 0.8695585 1.5800689
## [65,] 1.1858165 0.8027987
## [66,] 1.1088257 1.2027392
## [67,] 1.6205025 1.0034114
## [68,] 0.7371161 1.0834809
## [69,] 0.9777323 1.3261343
## [70,] 1.0243051 1.0839284
## [71,] 0.7243905 1.2884683
## [72,] 0.9842515 0.9378520
## [73,] 0.5377015 0.9844052
## [74,] 0.9582576 1.2670905
## [75,] 0.6990923 0.9718825
## [76,] 0.6688881 0.9551415
## [77,] 0.9539975 1.2000477
## [78,] 1.1018907 0.7914655
## [79,] 0.6309446 0.8638333
## [80,] 0.9339823 0.3076932
## [81,] 1.2841541 1.2029371
## [82,] 1.3483125 0.9638501
## [83,] 1.2250632 0.9440469
## [84,] 0.7596095 1.0701843
## [85,] 0.9719936 0.7823847
## [86,] 1.0898750 0.9861345
## [87,] 1.6464125 0.9385529
## [88,] 0.9311192 0.7878789
## [89,] 1.1811276 0.8706689
## [90,] 1.1635742 0.5973442
## [91,] 1.1607369 1.3041162
## [92,] 1.1474227 0.9049634
## [93,] 1.0795009 0.6811819
## [94,] 1.0779367 1.0681114
## [95,] 0.7780336 1.2273647
## [96,] 1.0265458 1.6387280
## [97,] 1.2477054 0.9540996
## [98,] 0.4876795 1.0352828
## [99,] 1.5877732 1.4472059
## [100,] 0.6446398 0.9540343
##
## $observedFstatRow
## [1] 10.43374
##
## $observedFstatCol
## [1] 12.11962
##
## $bootstrapPvalueRow
## [1] 0
##
## $bootstrapPvalueCol
## [1] 0
Since the bootstrapPvalueRow and the bootstrapPvalueCol are lower than 0.01, the bi-cluster is stable.
The next step is to calculate partial rankings based on the indicators and countries in the lower league. The calculation of each score of each indicator of the countries are done according to the method used in the original U21 ranking. The original weights of U21’s indicators are used, and the total scores for the countries are calculated using the selected indicators in the given bi-cluster. Table 6 shows the result of partial ranking in the case of the lower league.
C<-biclust::bicluster(orig_mtx,rres,number=1)
B<-as.data.frame(C[[1]])
selectedweight<-weights[,colnames(B)]
B$Overall_Score<-rowSums(B*selectedweight)*100/max(rowSums(B*selectedweight))
B$Rank<-rank(-B$Overall_Score)
kable(B,caption = "**Table 6** League C by reverse iBBiG",digits = 2)
R1 | R2 | R3 | R4 | R5 | C1 | C2 | C3 | C4 | C5 | C6 | O1 | O2 | O3 | O4 | O5 | O6 | O7 | O8 | O9 | GCI_64_VALUE_Secondary_education_enrollment,gross%* (a nagyobb jobb) | GCI_65_VALUE_Tertiary_education_enrollment,gross%* (a nagyobb jobb) | GCI_67_VALUE_Quality_of_the_education_system,1-7(best) | GCI_69_VALUE_Quality_of_management_schools,1-7(best) | GCI_71_VALUE_B._Quality_of_education (a nagyobb jobb) | GCI_72_VALUE_Availability_of_research_and_training_services,1-7(best) | GCI_73_VALUE_Extent_of_staff_training,1-7(best) | GCI_74_VALUE_C._On-the-job_training (a nagyobb jobb) | GCI_75_VALUE_5th_pillar:_Higher_education_and_training (a nagyobb jobb) | GCI_149_VALUE_Capacity_for_innovation,1-7(best) | GCI_150_VALUE_Quality_of_scientific_research_institutions,1-7(best) | GCI_151_VALUE_Company_spending_on_R&D,1-7(best) | GCI_152_VALUE_University-industry_collaboration_in_R&D,1-7(best) | GCI_153_VALUE_Gov’t_procurement_of_advanced_tech_products,1-7(best) | GCI_154_VALUE_Availability_of_scientists_and_engineers,1-7(best) | GCI_155_VALUE_PCT_patents,_applications/million_pop.* (a nagyobb jobb) | GCI_156_VALUE_12th_pillar:_Innovation (a nagyobb jobb) | GCI_158_VALUE_Global_Competitiveness_Index (a nagyobb jobb) | WB_WDI_223_GDP_per_capita,PPP(constant_2011_international_$)_[NY.GDP.PCAP.PP.KD] | Overall_Score | Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Argentina | 49.3 | 53.6 | 18.3 | 14.4 | 14.4 | 1.6 | 66.2 | 14.2 | 9.8 | 52.0 | 13.0 | 1.5 | 7.2 | 47.4 | 2.0 | 7.5 | 74.2 | 25.5 | 15.7 | 53.9 | 91.94 | 78.63 | 2.98 | 4.82 | 3.79 | 4.21 | 3.73 | 3.97 | 4.83 | 3.67 | 4.10 | 2.80 | 3.64 | 2.54 | 3.81 | 1.36 | 3.04 | 3.79 | 18797.55 | 38.41 | 18 |
Brazil | 38.4 | 42.9 | 51.4 | 14.4 | 14.4 | 1.0 | 35.8 | 24.7 | 23.9 | 47.4 | 20.6 | 10.2 | 10.2 | 41.1 | 1.8 | 18.8 | 52.3 | 21.7 | 9.5 | 85.5 | 99.35 | 65.62 | 2.72 | 4.53 | 3.38 | 4.45 | 4.31 | 4.38 | 4.92 | 4.10 | 4.03 | 3.53 | 3.80 | 3.37 | 3.31 | 3.23 | 3.31 | 4.34 | 15371.00 | 31.73 | 22 |
Bulgaria | 23.3 | 38.9 | 17.6 | 6.4 | 2.4 | 17.9 | 76.1 | 6.0 | 5.8 | 29.7 | 29.0 | 0.2 | 6.1 | 38.5 | 0.0 | 0.0 | 59.2 | 44.1 | 22.3 | 82.2 | 93.11 | 62.70 | 3.39 | 3.39 | 4.02 | 3.41 | 3.30 | 3.35 | 4.49 | 3.30 | 3.51 | 2.83 | 3.00 | 3.15 | 3.59 | 5.05 | 2.94 | 4.37 | 16302.22 | 33.41 | 21 |
Chile | 30.9 | 85.7 | 27.8 | 12.2 | 5.4 | 1.5 | 81.4 | 9.4 | 13.8 | 50.6 | 24.4 | 1.2 | 14.0 | 50.2 | 5.1 | 8.4 | 70.0 | 54.0 | 4.2 | 37.6 | 89.01 | 74.39 | 3.69 | 5.41 | 4.41 | 4.46 | 4.22 | 4.34 | 5.09 | 3.71 | 4.03 | 3.06 | 4.20 | 3.79 | 4.63 | 6.69 | 3.54 | 4.60 | 22226.45 | 45.24 | 15 |
China | 30.0 | 47.5 | 17.4 | 15.5 | 3.2 | 1.5 | 22.5 | 6.2 | 36.9 | 50.2 | 20.5 | 58.9 | 8.6 | 37.4 | 1.2 | 22.5 | 24.1 | 6.7 | 13.2 | 50.6 | 88.98 | 26.70 | 4.03 | 3.93 | 4.39 | 4.35 | 4.29 | 4.32 | 4.42 | 4.24 | 4.34 | 4.29 | 4.40 | 4.30 | 4.41 | 11.66 | 3.91 | 4.89 | 12758.65 | 26.35 | 25 |
Croatia | 32.7 | 35.5 | 24.4 | 22.1 | 10.7 | 3.0 | 49.1 | 82.4 | 6.8 | 30.2 | 45.8 | 0.6 | 29.2 | 42.3 | 10.2 | 4.2 | 58.3 | 33.7 | 21.1 | 66.5 | 98.43 | 61.63 | 3.23 | 4.20 | 4.29 | 4.14 | 3.22 | 3.68 | 4.67 | 3.11 | 4.00 | 3.07 | 3.39 | 2.65 | 3.93 | 9.98 | 3.10 | 4.13 | 20136.09 | 41.16 | 16 |
Czech Republic | 42.5 | 42.9 | 29.9 | 38.3 | 27.6 | 42.0 | 55.9 | 70.0 | 51.0 | 51.0 | 49.1 | 2.0 | 37.7 | 51.9 | 6.2 | 6.1 | 64.1 | 34.1 | 39.4 | 89.1 | 96.55 | 64.17 | 3.58 | 4.27 | 4.42 | 4.87 | 4.14 | 4.51 | 5.02 | 4.60 | 4.55 | 3.70 | 4.00 | 2.98 | 4.24 | 15.83 | 3.67 | 4.53 | 29119.62 | 59.26 | 5 |
Greece | 56.5 | 50.0 | 31.9 | 14.4 | 14.4 | 24.7 | 58.2 | 27.5 | 30.3 | 43.0 | 38.2 | 2.6 | 45.6 | 63.1 | 9.4 | 9.9 | 88.7 | 48.7 | 25.2 | 55.9 | 107.86 | 113.98 | 3.00 | 3.88 | 3.82 | 3.83 | 3.55 | 3.69 | 4.84 | 3.30 | 3.74 | 2.62 | 3.06 | 2.56 | 5.38 | 7.62 | 3.18 | 4.04 | 23989.14 | 49.10 | 13 |
Hungary | 36.6 | 37.9 | 34.2 | 26.4 | 13.9 | 21.3 | 68.3 | 36.1 | 39.7 | 52.7 | 61.1 | 1.1 | 21.5 | 51.7 | 9.8 | 9.1 | 59.0 | 39.5 | 31.3 | 100.0 | 101.62 | 59.63 | 3.29 | 4.29 | 4.31 | 3.91 | 3.63 | 3.77 | 4.68 | 3.01 | 5.08 | 2.86 | 4.27 | 3.18 | 4.24 | 24.97 | 3.50 | 4.28 | 24016.30 | 49.10 | 12 |
India | 51.9 | 50.5 | 13.1 | 14.4 | 14.4 | 0.8 | 31.3 | 4.0 | 4.2 | 45.6 | 13.1 | 7.8 | 1.3 | 44.7 | 0.0 | 5.0 | 23.1 | 39.0 | 1.7 | 39.0 | 68.51 | 24.77 | 4.16 | 4.43 | 4.16 | 4.21 | 3.94 | 4.08 | 3.86 | 4.02 | 4.01 | 3.78 | 3.87 | 3.55 | 4.36 | 1.54 | 3.53 | 4.21 | 5389.90 | 11.68 | 29 |
Indonesia | 22.1 | 25.0 | 3.8 | 2.8 | 0.3 | 0.5 | 78.2 | 83.3 | 29.9 | 65.3 | 24.3 | 0.1 | 0.1 | 37.2 | 0.0 | 0.0 | 27.0 | 14.0 | 1.2 | 36.1 | 82.54 | 31.51 | 4.46 | 4.61 | 4.65 | 4.44 | 4.66 | 4.55 | 4.53 | 4.76 | 4.26 | 4.03 | 4.55 | 4.22 | 4.62 | 0.07 | 3.93 | 4.57 | 10003.09 | 20.88 | 27 |
Iran | 43.3 | 43.4 | 14.6 | 21.5 | 7.5 | 0.5 | 26.7 | 8.6 | 2.8 | 49.3 | 6.4 | 5.4 | 14.1 | 42.9 | 0.6 | 4.2 | 48.4 | 34.6 | 9.7 | 51.1 | 86.28 | 55.16 | 3.01 | 3.75 | 3.49 | 3.86 | 3.03 | 3.44 | 4.17 | 3.49 | 4.15 | 2.72 | 3.18 | 3.24 | 4.35 | 0.07 | 3.13 | 4.03 | 16450.72 | 33.53 | 20 |
Italy | 33.2 | 35.7 | 37.5 | 37.6 | 30.9 | 18.5 | 63.0 | 55.4 | 26.6 | 47.8 | 49.9 | 14.0 | 45.8 | 75.2 | 18.9 | 17.2 | 63.4 | 27.9 | 23.8 | 46.9 | 100.66 | 62.47 | 3.74 | 5.08 | 4.26 | 4.82 | 3.19 | 4.00 | 4.78 | 4.26 | 4.50 | 3.62 | 3.73 | 2.65 | 4.78 | 53.84 | 3.73 | 4.42 | 33945.84 | 68.69 | 2 |
Korea, Rep. (South) | 31.8 | 92.9 | 39.0 | 37.2 | 30.3 | 9.2 | 39.2 | 6.5 | 11.8 | 64.2 | 76.9 | 11.1 | 44.1 | 55.5 | 12.6 | 21.0 | 100.0 | 75.6 | 71.6 | 47.6 | 97.20 | 98.38 | 3.62 | 4.21 | 4.69 | 4.67 | 4.22 | 4.44 | 5.38 | 4.70 | 4.98 | 4.50 | 4.62 | 4.14 | 4.42 | 201.52 | 4.83 | 4.96 | 33425.69 | 68.25 | 3 |
Malaysia | 72.1 | 91.4 | 49.7 | 31.7 | 12.9 | 30.2 | 44.8 | 14.7 | 15.3 | 81.2 | 2.7 | 2.2 | 14.9 | 41.5 | 1.5 | 4.2 | 36.8 | 34.0 | 21.8 | 50.6 | 67.24 | 35.97 | 5.26 | 5.13 | 5.25 | 5.44 | 5.35 | 5.40 | 4.80 | 5.19 | 5.21 | 4.93 | 5.33 | 5.18 | 5.22 | 12.62 | 4.67 | 5.16 | 24195.90 | 49.16 | 11 |
Mexico | 42.8 | 50.0 | 30.8 | 14.2 | 5.4 | 1.6 | 61.2 | 25.1 | 24.1 | 51.9 | 13.4 | 2.3 | 4.0 | 42.0 | 0.7 | 7.5 | 27.5 | 32.4 | 5.4 | 37.2 | 85.68 | 28.99 | 2.81 | 4.24 | 3.37 | 4.33 | 3.95 | 4.14 | 3.99 | 3.72 | 3.94 | 3.09 | 3.97 | 3.40 | 3.95 | 1.83 | 3.31 | 4.27 | 16459.06 | 33.59 | 19 |
Poland | 45.0 | 53.6 | 34.7 | 28.9 | 14.9 | 4.9 | 44.5 | 13.0 | 15.3 | 27.1 | 24.7 | 4.5 | 22.9 | 41.1 | 2.7 | 9.9 | 72.9 | 44.4 | 22.4 | 79.4 | 97.66 | 73.19 | 3.57 | 4.00 | 4.20 | 4.81 | 3.97 | 4.39 | 5.04 | 3.76 | 3.88 | 2.83 | 3.50 | 3.24 | 4.17 | 7.15 | 3.26 | 4.48 | 24346.21 | 49.45 | 10 |
Portugal | 44.6 | 53.6 | 41.4 | 59.4 | 38.9 | 16.8 | 73.4 | 30.5 | 48.5 | 55.6 | 38.5 | 2.8 | 52.7 | 63.8 | 18.6 | 14.1 | 65.4 | 32.3 | 60.9 | 55.4 | 112.85 | 68.86 | 4.27 | 5.92 | 5.12 | 5.11 | 4.18 | 4.64 | 5.37 | 4.29 | 5.38 | 3.57 | 4.68 | 3.77 | 5.24 | 13.01 | 4.08 | 4.54 | 26023.67 | 53.30 | 8 |
Romania | 48.5 | 57.1 | 19.6 | 14.0 | 4.7 | 9.1 | 43.9 | 12.7 | 14.8 | 45.1 | 30.4 | 1.4 | 13.0 | 34.6 | 0.0 | 0.0 | 51.2 | 27.9 | 10.2 | 64.7 | 95.01 | 51.60 | 3.83 | 4.21 | 4.40 | 4.17 | 3.56 | 3.86 | 4.63 | 3.75 | 3.98 | 3.13 | 3.59 | 3.41 | 4.03 | 2.24 | 3.28 | 4.30 | 19666.95 | 40.00 | 17 |
Russia | 43.2 | 57.1 | 27.5 | 13.5 | 5.9 | 9.4 | 50.8 | 8.8 | 10.4 | 32.5 | 5.1 | 2.1 | 3.0 | 28.9 | 1.2 | 15.7 | 74.9 | 100.0 | 42.8 | 82.4 | 95.30 | 76.14 | 3.48 | 3.75 | 4.17 | 4.34 | 3.84 | 4.09 | 4.96 | 3.77 | 3.96 | 3.16 | 3.63 | 3.34 | 4.06 | 7.13 | 3.29 | 4.37 | 24880.08 | 50.51 | 9 |
Saudi Arabia | 100.0 | 90.0 | 63.7 | 14.4 | 14.4 | 16.9 | 87.5 | 88.9 | 7.7 | 49.3 | 15.8 | 0.6 | 4.0 | 48.5 | 8.4 | 18.5 | 42.8 | 27.9 | 16.5 | 33.5 | 116.17 | 50.94 | 4.11 | 4.16 | 4.23 | 4.08 | 4.09 | 4.08 | 4.64 | 3.98 | 4.17 | 3.59 | 4.20 | 4.57 | 4.35 | 6.65 | 3.80 | 5.06 | 49958.44 | 100.00 | 1 |
Serbia | 59.6 | 56.8 | 25.4 | 45.3 | 12.6 | 17.8 | 56.6 | 57.2 | 6.1 | 49.3 | 0.1 | 0.7 | 17.1 | 43.5 | 7.0 | 5.0 | 50.0 | 32.7 | 21.0 | 61.0 | 91.71 | 52.38 | 3.06 | 3.55 | 3.80 | 3.52 | 3.09 | 3.30 | 4.25 | 2.97 | 3.74 | 2.45 | 3.24 | 2.88 | 3.88 | 2.31 | 2.89 | 3.90 | 13112.82 | 27.35 | 24 |
Slovakia | 29.4 | 32.1 | 27.0 | 24.5 | 15.0 | 19.1 | 72.1 | 22.5 | 6.2 | 37.7 | 32.7 | 0.4 | 16.4 | 44.3 | 0.0 | 0.0 | 54.6 | 35.1 | 38.2 | 89.8 | 93.88 | 55.11 | 2.76 | 3.78 | 4.02 | 4.48 | 3.83 | 4.16 | 4.65 | 3.54 | 3.86 | 3.05 | 3.36 | 2.93 | 3.96 | 9.20 | 3.18 | 4.15 | 27237.62 | 55.06 | 7 |
Slovenia | 47.0 | 46.4 | 37.9 | 27.5 | 21.1 | 9.1 | 63.6 | 78.4 | 45.5 | 42.3 | 56.5 | 0.6 | 60.1 | 53.3 | 21.8 | 4.2 | 84.4 | 46.9 | 57.9 | 70.9 | 97.57 | 86.02 | 4.07 | 4.37 | 4.91 | 4.43 | 3.70 | 4.07 | 5.33 | 3.71 | 4.74 | 3.08 | 3.96 | 2.98 | 3.90 | 62.98 | 3.64 | 4.22 | 28459.91 | 58.19 | 6 |
South Africa | 27.7 | 30.6 | 13.6 | 19.5 | 5.6 | 36.6 | 70.9 | 30.5 | 26.5 | 53.2 | 50.3 | 1.9 | 7.4 | 61.4 | 3.2 | 15.2 | 51.8 | 11.3 | 5.2 | 51.8 | 101.89 | 19.20 | 2.22 | 5.16 | 3.11 | 4.49 | 4.91 | 4.70 | 4.04 | 4.33 | 4.72 | 3.41 | 4.49 | 2.96 | 3.54 | 6.48 | 3.64 | 4.35 | 12462.03 | 25.96 | 26 |
Spain | 46.9 | 46.4 | 52.3 | 38.9 | 32.0 | 15.9 | 63.8 | 62.6 | 63.5 | 49.8 | 43.0 | 10.1 | 43.3 | 64.4 | 12.2 | 18.2 | 82.0 | 59.0 | 38.3 | 67.2 | 130.81 | 84.57 | 3.44 | 5.93 | 4.52 | 4.69 | 3.72 | 4.20 | 5.23 | 3.85 | 4.52 | 3.31 | 3.77 | 3.08 | 5.19 | 39.59 | 3.69 | 4.55 | 31193.33 | 63.64 | 4 |
Thailand | 34.8 | 36.4 | 11.8 | 5.5 | 1.4 | 4.0 | 61.3 | 45.5 | 59.0 | 57.3 | 37.5 | 1.5 | 4.6 | 50.5 | 0.0 | 0.0 | 52.2 | 23.9 | 4.6 | 50.5 | 86.98 | 51.23 | 3.45 | 4.13 | 4.01 | 4.17 | 4.41 | 4.29 | 4.58 | 3.75 | 3.91 | 3.24 | 3.95 | 2.94 | 4.26 | 1.23 | 3.28 | 4.66 | 14853.46 | 30.61 | 23 |
Turkey | 34.8 | 36.4 | 13.4 | 44.0 | 15.8 | 4.0 | 27.8 | 12.7 | 10.1 | 57.7 | 15.3 | 5.7 | 15.2 | 44.8 | 0.6 | 4.2 | 60.2 | 29.2 | 11.7 | 47.6 | 86.11 | 69.39 | 3.42 | 3.79 | 3.84 | 4.36 | 3.81 | 4.09 | 4.69 | 3.70 | 3.87 | 2.95 | 3.69 | 4.16 | 4.22 | 6.83 | 3.42 | 4.46 | 22401.88 | 45.37 | 14 |
Ukraine | 78.3 | 67.9 | 10.5 | 4.7 | 0.9 | 7.5 | 60.5 | 12.9 | 8.8 | 41.9 | 0.3 | 0.4 | 1.6 | 23.4 | 0.0 | 0.0 | 78.6 | 71.0 | 17.1 | 50.4 | 97.77 | 79.70 | 3.67 | 3.89 | 4.16 | 3.91 | 3.78 | 3.84 | 4.93 | 3.64 | 3.77 | 3.13 | 3.50 | 2.87 | 4.33 | 3.19 | 3.16 | 4.14 | 8243.47 | 17.70 | 28 |
The final step is to compare the results of the partial rankings to the corresponding part of the original U21 ranking. The results of the Spearman’s rank correlation show a strong positive relationship between the two ranking.
cor(B$Rank,U21_rank[rownames(B),],method="spearman")
## [1] 0.7801453
cor.test(B$Rank,U21_rank[rownames(B),],method="spearman")
##
## Spearman's rank correlation rho
##
## data: B$Rank and U21_rank[rownames(B), ]
## S = 892.61, p-value = 6.038e-07
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.7801453
An interesting question is how the League C of the U21 and the League C of the extended database (U21 + GCI + GDP) are overlap. In order to answer this question, one can make a Venn diagram of the two leagues to detect overlaps. The first step before plotting the Venn diagrams (one for indicators and one for countries), is to calculate the lower league bi-cluster of the U21 database.
rres_U21 <- iBBiG(binaryMatrix=binarize(1-mtx_u21,threshold = 0.50),nModules = 1,alpha=0.08,pop_size = 100,mutation = 0.08,stagnation = 50,selection_pressure = 1.2,max_sp = 15,success_ratio = 0.8)
## Module: 1 ... done
T<-biclust::bicluster(orig_mtx,rres,number = 1)
T_U21<-biclust::bicluster(orig_mtx_u21,rres_U21,number = 1)
League_C<-T[[1]]
League_C_U21<-T_U21[[1]]
Fig.12 shows the Venn diagram of the indicators in the two different lower leagues. The green circle contains those indicators that are in the U21’s lower league (League C) while in the red circle the extended database’s lower league’s indicators can be found. 19 indicators are the same in the two lower leagues.
draw.pairwise.venn(area1=nrow(as.matrix(colnames(League_C))),area2=nrow(as.matrix(colnames(League_C_U21))),cross.area=nrow(as.matrix(intersect(colnames(League_C),colnames(League_C_U21)))),fill=c("red","green"),category=c("Extended","U21"),main="Venn Diagram of indicators on League Cs")
## (polygon[GRID.polygon.33], polygon[GRID.polygon.34], polygon[GRID.polygon.35], polygon[GRID.polygon.36], text[GRID.text.37], text[GRID.text.38], text[GRID.text.39], text[GRID.text.40])
One can draw a Venn diagram for the countries in the two lower leagues. On Fig. 13 the green circle contains those countries that are in the U21’s lower league while in the brown circle the extended database’s lower league’s countries can be found. An interesting finding is that all of the countries in the extended database (29) are part of the U21 countries.
draw.pairwise.venn(area1=nrow(as.matrix(rownames(League_C))),area2=nrow(as.matrix(rownames(League_C_U21))),cross.area=nrow(as.matrix(intersect(rownames(League_C),rownames(League_C_U21)))),fill=c("red","green"),category=c("Extended","U21"),main="Venn Diagram of countries on League Cs")
## (polygon[GRID.polygon.41], polygon[GRID.polygon.42], polygon[GRID.polygon.43], polygon[GRID.polygon.44], text[GRID.text.45], text[GRID.text.46], lines[GRID.lines.47], text[GRID.text.48], text[GRID.text.49])
With the BicARE method the middle league(s) can be specified. In order to use the BiCARE method, the first step is to convert the BicARE object to a Biclust object.
bicare2biclust <- function(x){
if(class(x)=="Biclust"){
return(x)
} else if(class(x)=="biclustering"){
Parameters <- list(numberofbicluster=x$param[1,2],residuthreshold=x$param[2,2],genesinitialprobability=x$param[3,2],samplesinitialprobability=x$param[4,2],numberofiterations=x$param[5,2],date=x$param[6,2])
RowxNumber <- t(x$bicRow==1)
NumberxCol <- x$bicCol==1
Number <- as.numeric(dim(RowxNumber)[2])
info <- list()
return(new("Biclust",Parameters=Parameters,RowxNumber=RowxNumber,NumberxCol=NumberxCol,Number=Number,info=info))
}
}
In the next step the BicARE algorithm determines the number of the bi-clusters.
BICARE_res <- FLOC(Data=mtx,k=2,pGene=1.0,pSample=1.0,r=NULL,N=8,M=6,t=500,blocGene=NULL,blocSample=NULL)
bres <- bicare2biclust(BICARE_res)
summary(bres)
##
## An object of class Biclust
##
## call:
## NULL
##
## Number of Clusters found: 2
##
## Cluster sizes:
## BC 1 BC 2
## Number of Rows: 35 27
## Number of Columns: 6 6
bres@Parameters$residuthreshold
## [1] "0.00329199454357613"
The results above show that the BicARE algorithm found 2 bi-clusters. In the first bi-cluster there are 35 rows (countries) and 6 columns (indicators), in the second bi-cluster there are 27 rows (countries) and 6 columns (indicators). The next step is to calculate F-tests to determine which bi-cluster is significant.
RObs.FStat <- NULL
for (i in c(1:bres@Number)){
RObs.FStat[[i]] <- computeObservedFstat(x=mtx,bicResult=bres,number=i)
}
kable(RObs.FStat,caption = "**Table 7** The results of row and column effects of the middle league for two bi-clusters")
|
|
In Table 7 the results of the F-tests can be seen. Although both bi-clusters are significant, in the following only the first bi-cluster is considered because based on the U21 data only one bi-cluster was found and can be compared with the extended database’s results. The first bi-cluster will be considered as the middle league (League B).
The next step is to identify the middle league bi-cluster.
BICARE_res <- FLOC(Data=mtx,k=1,pGene=1.0,pSample=1.0,r=NULL,N=8,M=6,t=500,blocGene=NULL,blocSample=NULL)
bres <- bicare2biclust(BICARE_res)
summary(bres)
##
## An object of class Biclust
##
## call:
## NULL
##
## There was one cluster found with
## 27 Rows and 6 columns
bres@Parameters$residuthreshold
## [1] "0.00329199454357613"
The results of the BicARE algorithm can be seen above. The algorithm found one bi-cluster in which there are 27 rows (countries) and 6 columns (indicators). The following step is to calculate the F-tests to determine whether the bi-cluster is significant or not.
In Table 8 the results of the row and column effects (F-tests) can be read. Both the row and column effects are significant, so the middle league bi-cluster is significant.
RObs.FStat <- NULL
for (i in c(1:bres@Number)){
RObs.FStat[[i]] <- computeObservedFstat(x=mtx,bicResult=bres,number=i)
}
kable(RObs.FStat,caption = "**Table 8** The results of row and column effects of the middle league")
|
To see the differences between those countries that are included in the middle league and those that are not in the middle league, one can plot the means, medians, variances, and median absolute deviations (MAD) for rows (countries) (see Fig. 14). On Fig. 14 the red line indicates those countries that are included in the middle league, while the black line shows the remaining countries. Fig. 14 shows that in the case of the countries in the middle league the variances differ the most.
exploreBic(dset=mtx,bres=bres,mname='biclust',pfor='all',gby='genes',bnum=1)
One can do the same for the columns (indicators). Fig. 15 shows the differences between the means, medians, variances, and median absolute deviations (MAD) for the indicators. The red line indicates those indicators that are in the middle league, while the black line shows the remaining indicators.
exploreBic(dset=mtx,bres=bres,mname='biclust',pfor='all',gby='conditions',bnum=1)
Fig. 16 depicts the heat map of the middle league. The middle league contains 27 countries and 6 indicators. The red cells mean higher values in the case of indicators, while blue cells symbol lower values.
drawHeatmap(x=mtx,bicResult=bres,number=1,paleta=bluered(100),beamercolor = TRUE)
Fig. 17 shows the heat map of the results of BicARE on normalized data.The middle league can be seen in the upper left corner of Fig. 17, which is enlarged on Fig.16.
drawHeatmap(x=mtx,bicResult=bres,local=FALSE,number=1,paleta=bluered(100),beamercolor = TRUE)
The next step is to check the stability of the middle league bi-cluster. The results of the bootstrap can be seen below.
Bootstrap <- diagnoseColRow(x=mtx,bicResult=bres,number=1,
nResamplings=100,replace=TRUE)
Bootstrap
## $bootstrapFstats
## fval.row fval.col
## [1,] 1.2078600 0.61240007
## [2,] 1.5806960 0.85319402
## [3,] 0.9273756 1.36239109
## [4,] 0.6872762 1.79701157
## [5,] 0.7660947 1.07387646
## [6,] 1.1139033 0.26647840
## [7,] 0.8795790 1.69844993
## [8,] 0.9123604 1.66501644
## [9,] 2.3435876 0.49093603
## [10,] 0.7157546 0.31756348
## [11,] 0.5568714 0.46800481
## [12,] 0.6603036 1.03764537
## [13,] 0.9614088 1.05868740
## [14,] 1.1161979 1.27977908
## [15,] 0.8939627 1.27184215
## [16,] 0.8575280 0.75544638
## [17,] 0.7414932 2.09229150
## [18,] 1.1472397 2.06786233
## [19,] 0.6297470 0.53184250
## [20,] 0.7956746 1.48659577
## [21,] 1.8246519 4.00042634
## [22,] 0.7848388 0.36980379
## [23,] 0.8926824 0.57060718
## [24,] 0.7910152 0.45822001
## [25,] 0.6405035 1.92382946
## [26,] 0.9168117 1.30711295
## [27,] 2.1284440 0.50411194
## [28,] 0.6608382 1.28195683
## [29,] 0.6708781 0.94941481
## [30,] 0.9400408 1.41871798
## [31,] 1.1387900 1.42750599
## [32,] 0.6048057 0.98925913
## [33,] 1.0817870 1.50933300
## [34,] 0.8913641 0.47729945
## [35,] 0.9048450 0.82701836
## [36,] 0.7274354 1.23834031
## [37,] 0.8536817 0.94550524
## [38,] 0.9972651 2.12438988
## [39,] 0.7917958 0.64773618
## [40,] 0.9944008 0.71253015
## [41,] 0.7635362 0.94899826
## [42,] 0.8574021 0.67111886
## [43,] 0.9974994 0.35971525
## [44,] 0.7574384 0.39283201
## [45,] 1.1448646 1.76582684
## [46,] 1.9931620 2.15707081
## [47,] 0.8269813 0.94746880
## [48,] 1.1662655 0.89833189
## [49,] 0.8304346 1.60601105
## [50,] 0.5241232 1.78577964
## [51,] 0.8979847 1.66020154
## [52,] 0.6964299 0.54443764
## [53,] 0.6960671 0.85740038
## [54,] 0.8195451 1.17689029
## [55,] 0.9794541 0.31672513
## [56,] 0.8682385 3.50394981
## [57,] 0.8277749 0.55470189
## [58,] 1.0689082 1.91161123
## [59,] 0.8997694 2.21317099
## [60,] 1.0505329 0.28597662
## [61,] 0.7615445 0.94014020
## [62,] 1.1851639 0.99557101
## [63,] 1.1819092 2.13056893
## [64,] 0.5472015 1.44855670
## [65,] 0.8222250 0.82266190
## [66,] 1.2966959 0.98496003
## [67,] 0.8420572 1.02265808
## [68,] 1.0241865 0.22234867
## [69,] 1.1494583 3.20653343
## [70,] 1.1477168 0.79660900
## [71,] 0.9805434 1.11897143
## [72,] 1.3189989 1.74830315
## [73,] 1.0974138 1.63691935
## [74,] 0.9473180 0.35818774
## [75,] 1.7656438 0.78717292
## [76,] 1.0617085 0.07751164
## [77,] 1.2535956 0.74280781
## [78,] 0.8902942 0.97886108
## [79,] 1.4919405 1.29303458
## [80,] 0.8876010 3.00128630
## [81,] 0.8065096 1.12045311
## [82,] 0.6580799 0.50252792
## [83,] 1.5335705 0.86472714
## [84,] 0.6407898 1.14873848
## [85,] 1.2930996 1.80945642
## [86,] 1.4546359 0.54653723
## [87,] 1.1740167 0.91574036
## [88,] 1.4850134 1.71722679
## [89,] 0.8171080 0.69132360
## [90,] 1.0804226 1.00283869
## [91,] 0.5552350 0.53821270
## [92,] 0.8880356 0.30116660
## [93,] 0.7923035 1.01005978
## [94,] 0.9249327 1.35695494
## [95,] 1.8142046 0.54950519
## [96,] 1.1446460 1.89536025
## [97,] 1.0660994 2.33089157
## [98,] 1.0268796 0.81621546
## [99,] 0.8612494 0.68431171
## [100,] 0.7424431 1.76134082
##
## $observedFstatRow
## [1] 125.4684
##
## $observedFstatCol
## [1] 6.115956
##
## $bootstrapPvalueRow
## [1] 0
##
## $bootstrapPvalueCol
## [1] 0
Since the bootstrapPvalueRow and the bootstrapPvalueCol are lower than 0.01, the bi-cluster is stable.
The next step is to calculate partial rankings based on the indicators and countries in the middle league. The calculation of each score of each indicator of the countries are done according to the method used in the original U21 ranking. The original weights of U21?s indicators are used, and the total scores for the countries are calculated using the selected indicators in the given bi-cluster. Table 9 shows the result of partial ranking in the case of the middle league.
C<-biclust::bicluster(orig_mtx,bres,number=1)
B<-as.data.frame(C[[1]])
selectedweight<-weights[,colnames(B)]
B$Overall_Score<-rowSums(B*selectedweight)*100/max(rowSums(B*selectedweight))
B$Rank<-rank(-B$Overall_Score)
kable(B,caption = "**Table 9** League B by BicARE",digits = 2)
GCI_72_VALUE_Availability_of_research_and_training_services,1-7(best) | GCI_73_VALUE_Extent_of_staff_training,1-7(best) | GCI_74_VALUE_C._On-the-job_training (a nagyobb jobb) | GCI_149_VALUE_Capacity_for_innovation,1-7(best) | GCI_152_VALUE_University-industry_collaboration_in_R&D,1-7(best) | GCI_156_VALUE_12th_pillar:_Innovation (a nagyobb jobb) | Overall_Score | Rank | |
---|---|---|---|---|---|---|---|---|
Argentina | 4.21 | 3.73 | 3.97 | 3.67 | 3.64 | 3.04 | 62.42 | 21 |
Australia | 5.21 | 4.52 | 4.87 | 4.61 | 4.84 | 4.41 | 79.85 | 11 |
Austria | 5.93 | 4.82 | 5.37 | 4.96 | 4.68 | 4.82 | 85.78 | 8 |
Belgium | 6.00 | 5.11 | 5.56 | 5.16 | 5.58 | 4.89 | 90.60 | 4 |
Brazil | 4.45 | 4.31 | 4.38 | 4.10 | 3.80 | 3.31 | 68.29 | 13 |
Bulgaria | 3.41 | 3.30 | 3.35 | 3.30 | 3.00 | 2.94 | 54.15 | 26 |
Canada | 5.29 | 4.70 | 4.99 | 4.63 | 4.90 | 4.54 | 81.50 | 9 |
Croatia | 4.14 | 3.22 | 3.68 | 3.11 | 3.39 | 3.10 | 57.91 | 24 |
Germany | 6.03 | 5.02 | 5.52 | 5.60 | 5.34 | 5.47 | 92.48 | 2 |
Greece | 3.83 | 3.55 | 3.69 | 3.30 | 3.06 | 3.18 | 57.79 | 25 |
Hong Kong | 5.42 | 4.63 | 5.02 | 4.50 | 4.59 | 4.38 | 80.01 | 10 |
India | 4.21 | 3.94 | 4.08 | 4.02 | 3.87 | 3.53 | 66.37 | 15 |
Indonesia | 4.44 | 4.66 | 4.55 | 4.76 | 4.55 | 3.93 | 75.41 | 12 |
Mexico | 4.33 | 3.95 | 4.14 | 3.72 | 3.97 | 3.31 | 65.73 | 17 |
Netherlands | 6.13 | 5.03 | 5.58 | 5.23 | 5.38 | 5.25 | 91.46 | 3 |
Norway | 5.52 | 5.16 | 5.34 | 5.02 | 5.02 | 4.85 | 86.70 | 7 |
Romania | 4.17 | 3.56 | 3.86 | 3.75 | 3.59 | 3.28 | 62.31 | 22 |
Russia | 4.34 | 3.84 | 4.09 | 3.77 | 3.63 | 3.29 | 64.39 | 19 |
Serbia | 3.52 | 3.09 | 3.30 | 2.97 | 3.24 | 2.89 | 53.33 | 27 |
Singapore | 5.46 | 5.25 | 5.36 | 4.99 | 5.58 | 5.18 | 89.24 | 6 |
Slovakia | 4.48 | 3.83 | 4.16 | 3.54 | 3.36 | 3.18 | 63.24 | 20 |
Slovenia | 4.43 | 3.70 | 4.07 | 3.71 | 3.96 | 3.64 | 65.92 | 16 |
Spain | 4.69 | 3.72 | 4.20 | 3.85 | 3.77 | 3.69 | 67.07 | 14 |
Sweden | 5.42 | 5.10 | 5.26 | 5.50 | 5.33 | 5.37 | 89.70 | 5 |
Switzerland | 6.50 | 5.69 | 6.09 | 5.89 | 5.79 | 5.70 | 100.00 | 1 |
Turkey | 4.36 | 3.81 | 4.09 | 3.70 | 3.69 | 3.42 | 64.67 | 18 |
Ukraine | 3.91 | 3.78 | 3.84 | 3.64 | 3.50 | 3.16 | 61.22 | 23 |
The final step is to compare the results of the partial rankings to the corresponding part of the original U21 ranking. The results of the Spearman?s rank correlation show a moderate positive relationship between the two ranking.
cor(B$Rank,U21_rank[rownames(B),],method="spearman")
## [1] 0.6941392
cor.test(B$Rank,U21_rank[rownames(B),],method="spearman")
##
## Spearman's rank correlation rho
##
## data: B$Rank and U21_rank[rownames(B), ]
## S = 1002, p-value = 9.038e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.6941392
An interesting question is how the League B of the U21 and the League B of the extended database (U21 + GCI + GDP) are overlap. In order to answer this question, one can make a Venn diagram of the two leagues to detect overlaps. The first step before plotting the Venn diagrams(one for indicators and one for countries), is to calculate the middle league bi-cluster of the U21 database.
BICARE_res_U21 <- FLOC(Data=as.matrix(ALL),k=1,pGene=1,pSample=1,r=1e-16,N=17,M=6,t=10000,blocGene=NULL,blocSample=NULL)
bres_U21 <- bicare2biclust(BICARE_res_U21)
T<-biclust::bicluster(orig_mtx,bres,number = 1)
T_U21<-biclust::bicluster(orig_mtx_u21,bres_U21,number = 1)
League_B<-T[[1]]
League_B_U21<-T_U21[[1]]
Fig.18 shows the Venn diagram of the indicators in the two different top leagues. The green circle contains those indicators that are in the U21’s middle league (League B) while in the red circle the extended database’s middle league’s indicators can be found. There is no common indicator in the two middle leagues.
draw.pairwise.venn(area1=nrow(as.matrix(colnames(League_B))),area2=nrow(as.matrix(colnames(League_B_U21))),cross.area=nrow(as.matrix(intersect(colnames(League_B),colnames(League_B_U21)))),fill=c("red","green"),category=c("Extended","U21"),main="Venn Diagram of indicators on League Bs")
## (polygon[GRID.polygon.52], polygon[GRID.polygon.53], polygon[GRID.polygon.54], polygon[GRID.polygon.55], text[GRID.text.56], text[GRID.text.57], text[GRID.text.58], text[GRID.text.59])
One can draw a Venn diagram for the countries in the two middle leagues. On Fig. 19 the green circle contains those countries that are in the U21’s middle league while in the red circle the extended database’s middle league’s countries can be found. 8 countries are the same in the two middle leagues.
draw.pairwise.venn(area1=nrow(as.matrix(rownames(League_B))),area2=nrow(as.matrix(rownames(League_B_U21))),cross.area=nrow(as.matrix(intersect(rownames(League_B),rownames(League_B_U21)))),fill=c("red","green"),category=c("Extended","U21"),main="Venn Diagram of countries on League Bs")
## (polygon[GRID.polygon.60], polygon[GRID.polygon.61], polygon[GRID.polygon.62], polygon[GRID.polygon.63], text[GRID.text.64], text[GRID.text.65], text[GRID.text.66], text[GRID.text.67], text[GRID.text.68])
Bi-clusters might (or might not) overlap. Analyzing the overlaps can give a subtler picture of countries and indicators.
Fig. 20 shows the Venn diagram of the extended database’s (U21 + GCI + GDP) indicators. The red circle contains those indicators that are in the upper league (League A), in the blue circle those indicators can be found that are in the lower league (League C). The green circle indicates the middle league’s (League B) indicators.
T<-biclust::bicluster(orig_mtx,res,number = 1)
League_A<-T[[1]]
R<-biclust::bicluster(orig_mtx,rres,number = 1)
League_C<-R[[1]]
B<-biclust::bicluster(orig_mtx,bres,number = 1)
League_B<-B[[1]]
draw.triple.venn(area1=nrow(as.matrix(colnames(League_A))),area2=nrow(as.matrix(colnames(League_B))),area3=nrow(as.matrix(colnames(League_C))),n12=nrow(as.matrix(intersect(colnames(League_A),colnames(League_B)))),n13=nrow(as.matrix(intersect(colnames(League_A),colnames(League_C)))),n23=nrow(as.matrix(intersect(colnames(League_B),colnames(League_C)))),n123=nrow(as.matrix(intersect(intersect(colnames(League_A),colnames(League_B)),colnames(League_C)))),fill=c("red","green","blue"),category=c("League A","League B","League C"),main="Venn Diagram of indicators of the extended database")
## (polygon[GRID.polygon.69], polygon[GRID.polygon.70], polygon[GRID.polygon.71], polygon[GRID.polygon.72], polygon[GRID.polygon.73], polygon[GRID.polygon.74], text[GRID.text.75], text[GRID.text.76], text[GRID.text.77], text[GRID.text.78], text[GRID.text.79], text[GRID.text.80], text[GRID.text.81])
One interesting finding is, that all of the indicators in the middle league are part of the lower league’s indicators.
Fig. 21 shows the Venn diagram of the extended database’s (U21 + GCI +GDP) countries. The red circle contains those countries that are in the upper league (League A), in the the blue circle those countries can be found that are in the lower league (League C). The green circle indicates the middle league’s (League B) countries.
draw.triple.venn(area1=nrow(as.matrix(rownames(League_A))),area2=nrow(as.matrix(rownames(League_B))),area3=nrow(as.matrix(rownames(League_C))),n12=nrow(as.matrix(intersect(rownames(League_A),rownames(League_B)))),n13=nrow(as.matrix(intersect(rownames(League_A),rownames(League_C)))),n23=nrow(as.matrix(intersect(rownames(League_B),rownames(League_C)))),n123=nrow(as.matrix(intersect(intersect(rownames(League_A),rownames(League_B)),rownames(League_C)))),fill=c("red","green","blue"),category=c("League A","League B","League C"),main="Venn Diagram of countries of the extended database")
## (polygon[GRID.polygon.82], polygon[GRID.polygon.83], polygon[GRID.polygon.84], polygon[GRID.polygon.85], polygon[GRID.polygon.86], polygon[GRID.polygon.87], text[GRID.text.88], text[GRID.text.89], text[GRID.text.90], text[GRID.text.91], text[GRID.text.92], text[GRID.text.93], text[GRID.text.94], text[GRID.text.95], text[GRID.text.96], text[GRID.text.97])
If the extended database (U21 + GCI + GDP) is used there are no countries which belong to all of the three leagues at the same time.
The above proposed bi-clustering methods can help to identify which countries can be compared based on which indicators in higher education system rankings. With the iBBiG method the upper and lower league(s) (League(s) A and League(s) C) can be determined and the BicARE method can specify the middle league(s) (League(s) B). In this case an extended database was used during the bi-clustering. The exentended database contains not just indicators of the U21 ranking, but indicators of the Global Competitive Index (GCI) and an indicator from the World Bank (WB/GDP). The results show that using this extended database do not change significantly the countries and the indicators in the upper and lower league compared to the results of the leagues of the U21 database. On the other hand, in the case of the middle league there are differences between the results of the extended database and the U21 database.