Reproducibility in paediatric versus early adults’ groups#

This notebook investigates whether different age stages (paediatric versus adults) have different reproducibility behaviour. The age range of the participants is from 6.5 to 56.2 years, and the analysis was carried out for two age groups: paediatric and adults. In this analysis the ABIDE I dataset was used. ABIDE I is a multi-site data sharing initiative (17 sites in total) with neurotypical controls and subjects diagnosed with Autism Spectrum Disorder (ASD).

#@title

import os
import math
import matplotlib
import numpy as np
import pandas as pd
import pingouin as pg
import scipy.stats
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter

from sklearn.linear_model import LinearRegression
from utils import get_regression_metrics, bland_altman_plot
from settings import RESOURCE_DIR, ABIDE_MAP_SITES

# Tables format
#%matplotlib qt

pd.set_option('display.float_format', lambda x: f"{x: 0.2e}")

# Visualization format
font = {'size'   : 18}
matplotlib.rc('font', **font)
sns.set_theme(style="whitegrid", font_scale=2)

# Global variables
var_compare = "adult"
template_name = "a2009s"
metric_analysis = "corticalThicknessAverage"
path_csv_cortical_data = "cortical_thicknesss_paediatric_vs_early_adults_groups.csv"

color_pallete = {"ADULT": "#25654B", "PAEDIATRIC": "#C94444"}


var_analyse = ["r_square", "slope", "intercept"]

pipeline1 = "ACPC_CAT12"
pipeline2 = "FREESURFER"
/home/fmachado/anaconda3/envs/fs-cat12/lib/python3.9/site-packages/outdated/utils.py:14: OutdatedPackageWarning: The package pingouin is out of date. Your version is 0.5.1, the latest is 0.5.2.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
  return warn(
#@title

df_names_rois = pd.read_csv(os.path.join(RESOURCE_DIR, "template", f"{template_name}_atlas_labels.csv"))
df_areas = pd.read_csv(os.path.join(RESOURCE_DIR, "template", f"{template_name}_rois_areas.csv"))

Data#

#@title
df_software_raw = pd.read_csv(os.path.join(RESOURCE_DIR, "data", path_csv_cortical_data), low_memory=False)
df_software_raw.set_index("path", inplace=True)
# Get only the template ROIs
df_software_raw = df_software_raw.loc[df_software_raw.roiName.str[1:].isin(df_names_rois.label.to_list())]

df_software_raw.site = df_software_raw.site.apply(lambda x: ABIDE_MAP_SITES[x])
df_software_raw.head(4)
repositoryName site adult subjectID sessionID run age gender software roiName corticalThicknessAverage template repository_and_site cjv cnr snr_total
path
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz ABIDE_I California Institute of Technology ADULT sub-0051475 1 1 4.42e+01 MALE ACPC_CAT12 lG_Ins_lg_and_S_cent_ins 3.01e+00 a2009s ABIDE_I|CaliforniaInstituteofTechnology 4.40e-01 3.16e+00 1.52e+01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz ABIDE_I California Institute of Technology ADULT sub-0051475 1 1 4.42e+01 MALE ACPC_CAT12 lG_and_S_cingul-Ant 3.11e+00 a2009s ABIDE_I|CaliforniaInstituteofTechnology 4.40e-01 3.16e+00 1.52e+01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz ABIDE_I California Institute of Technology ADULT sub-0051475 1 1 4.42e+01 MALE ACPC_CAT12 lG_and_S_cingul-Mid-Ant 2.94e+00 a2009s ABIDE_I|CaliforniaInstituteofTechnology 4.40e-01 3.16e+00 1.52e+01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz ABIDE_I California Institute of Technology ADULT sub-0051475 1 1 4.42e+01 MALE ACPC_CAT12 lG_and_S_cingul-Mid-Post 2.73e+00 a2009s ABIDE_I|CaliforniaInstituteofTechnology 4.40e-01 3.16e+00 1.52e+01

Check preprocessing problems#

Some images were only ran by one of the softwares, in the code below images with preprocessing problems are discoved, and those subjects/session/run are removed from the analysis.

#@title
roi_name = df_software_raw.loc[(~df_software_raw[metric_analysis].isna()) & (df_software_raw.template == template_name)].roiName.to_list()[1]

df_group = df_software_raw[df_software_raw["roiName"] == roi_name].copy()
df_group = df_group[df_group.template == template_name].groupby(by=['path']).apply(lambda x: list(x["software"]))

# Get the images run by only one
df_problems_running = df_group[df_group.apply(len) == 1].to_frame()


software_problems = [item for sublist in df_problems_running[0].to_list() for item in sublist]


print("\n".join([f"{number_problems} images were only processed by: {soft_name}" for soft_name, number_problems in Counter(software_problems).items()]))
2 images were only processed by: ACPC_CAT12
3 images were only processed by: FREESURFER

Exclude the subjects only run by only one software#

#@title
df_software_raw = df_software_raw[~df_software_raw.index.isin(df_problems_running.index)]

Demographics#

#@title
df_subjects = df_software_raw[["repositoryName", "site", var_compare, "repository_and_site", "subjectID", "sessionID", "run", "age", "gender", "snr_total"]].drop_duplicates().copy()
df_subjects.site = df_subjects.site.apply(lambda x: "" if type(x)==float else x)

if len(df_subjects["repositoryName"].unique()) == 1: 
    groupby_col = ["site"] 
else:
    groupby_col = ["repositoryName", "site"]
    
df_summary = df_subjects.groupby(groupby_col).apply(lambda x: pd.Series({"Total of participants": len(x), 
                                                                         "Number males": (x["gender"]=="MALE").sum(),
                                                                         "Mean and standard deviation [years]": f"{x['age'].mean(): 0.2f}+/-{x['age'].std(): 0.2f}",
                                                                         "Min Age [years]": f"{x['age'].min(): 0.2f}",
                                                                         "Max Age [years]": f"{x['age'].max(): 0.2f}"}))


df_summary
Total of participants Number males Mean and standard deviation [years] Min Age [years] Max Age [years]
site
California Institute of Technology 19 15 28.87+/- 11.21 17.00 56.20
Carnegie Mellon University 13 10 26.85+/- 5.74 20.00 40.00
Kennedy Krieger Institute 33 24 10.16+/- 1.26 8.07 12.77
Ludwig Maximilians University Munich 32 28 26.19+/- 9.96 7.00 48.00
NYU Langone Medical Center 104 78 15.83+/- 6.28 6.47 31.78
Olin Institute of Living at Hartford Hospital 16 14 16.94+/- 3.68 10.00 23.00
Oregon Health and Science University 15 15 10.06+/- 1.08 8.20 11.99
San Diego State University 22 16 14.22+/- 1.90 8.67 16.88
Social Brain Lab 15 15 33.73+/- 6.61 20.00 42.00
Stanford University 19 15 9.97+/- 1.64 7.75 12.43
Trinity Centre for Health Sciences 25 25 17.08+/- 3.77 12.04 25.66
University of California Los Angeles 45 39 12.96+/- 1.92 9.21 17.79
University of Leuven 35 30 18.17+/- 4.99 12.20 29.00
University of Michigan 75 58 14.82+/- 3.62 8.20 28.80
University of Pittsburgh School of Medicine 27 23 18.88+/- 6.64 9.44 33.24
University of Utah School of Medicine 43 43 21.36+/- 7.64 8.77 39.39
Yale Child Study Center 28 20 12.68+/- 2.75 7.66 17.83
#@title
df_software = df_software_raw.reset_index().pivot(['path',  'subjectID', 'sessionID', 'run', 'template', var_compare,'roiName'], ['software'], [metric_analysis]).reset_index().set_index("path")
df_software.head(4)
subjectID sessionID run template adult roiName corticalThicknessAverage
software ACPC_CAT12 FREESURFER
path
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz sub-0051475 1 1 a2009s ADULT lG_Ins_lg_and_S_cent_ins 3.01e+00 3.14e+00
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz sub-0051475 1 1 a2009s ADULT lG_and_S_cingul-Ant 3.11e+00 3.02e+00
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz sub-0051475 1 1 a2009s ADULT lG_and_S_cingul-Mid-Ant 2.94e+00 2.73e+00
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz sub-0051475 1 1 a2009s ADULT lG_and_S_cingul-Mid-Post 2.73e+00 2.60e+00

Search subjects with at least one ROI#

#@title
df_rois_nan = df_software.loc[df_software[[('corticalThicknessAverage', 'FREESURFER'), ('corticalThicknessAverage', 'ACPC_CAT12')]].isna().sum(axis=1)>0]
df_rois_nan
subjectID sessionID run template adult roiName corticalThicknessAverage
software ACPC_CAT12 FREESURFER
path

Exclude subjects with at least one ROI#

#@title
df_software_raw = df_software_raw.loc[~df_software_raw.index.isin(df_rois_nan.index.get_level_values(0).to_list())]
df_software = df_software.loc[~df_software.index.isin(df_rois_nan.index.get_level_values(0).to_list())]

Analysis#

Overall Analysis#

Bland-Altman plot#

from IPython.display import display, HTML

for var_value in df_software[var_compare].unique():
    df_soft_filter = df_software.loc[df_software[var_compare] == var_value]
    diff = (df_soft_filter[("corticalThicknessAverage", "FREESURFER")] - df_soft_filter[("corticalThicknessAverage", "ACPC_CAT12")])
    test = pg.ttest(diff, 0)


    df_means_diff = df_soft_filter[[("corticalThicknessAverage", "FREESURFER"), 
                                    ("corticalThicknessAverage", "ACPC_CAT12")]].mean().to_frame().T.rename(columns={"corticalThicknessAverage": "Cortical Thickness Average Mean"})
    df_means_diff["Difference FREESURFER - CAT12"] = diff.mean()
    
    
    test = pg.ttest(diff, 0)
    display(HTML(f"<br><h5>{var_value}</h5>"))
    display(HTML(df_means_diff.to_html()))
    
    display(HTML(f"<br><p>One sample t-test to verify whether the means difference is equal to zero</p>"))
    display(HTML(test.to_html()))
    print(f"Confidence interval (95%) of the means difference: [{round(diff.mean() - 1.96*diff.std(), 3)};{round(diff.mean() + 1.96*diff.std(), 3)}]")

ADULT
Cortical Thickness Average Mean Difference FREESURFER - CAT12
software FREESURFER ACPC_CAT12
0 2.49e+00 2.45e+00 3.42e-02

One sample t-test to verify whether the means difference is equal to zero

T dof alternative p-val CI95% cohen-d BF10 power
T-test 2.30e+01 26935 two-sided 3.26e-116 [0.03, 0.04] 1.40e-01 7.157e+111 1.00e+00
Confidence interval (95%) of the means difference: [-0.444;0.512]
/home/fmachado/anaconda3/envs/fs-cat12/lib/python3.9/site-packages/pingouin/bayesian.py:146: RuntimeWarning: divide by zero encountered in double_scalars
  bf10 = 1 / ((1 + t**2 / df)**(-(df + 1) / 2) / integr)
/home/fmachado/anaconda3/envs/fs-cat12/lib/python3.9/site-packages/pingouin/bayesian.py:146: RuntimeWarning: divide by zero encountered in double_scalars
  bf10 = 1 / ((1 + t**2 / df)**(-(df + 1) / 2) / integr)

PAEDIATRIC
Cortical Thickness Average Mean Difference FREESURFER - CAT12
software FREESURFER ACPC_CAT12
0 2.60e+00 2.52e+00 8.00e-02

One sample t-test to verify whether the means difference is equal to zero

T dof alternative p-val CI95% cohen-d BF10 power
T-test 6.47e+01 56831 two-sided 0.00e+00 [0.08, 0.08] 2.71e-01 inf 1.00e+00
Confidence interval (95%) of the means difference: [-0.498;0.658]
#@title
y_label_txt = "Difference Freesurfer - CAT12 [mm]"

for var in df_software[var_compare].unique():
    df_filt = df_software.loc[df_software[var_compare] == var]
    bland_altman_plot(df_filt.copy().reset_index(),
                      ('corticalThicknessAverage', 'FREESURFER'), 
                      ('corticalThicknessAverage', 'ACPC_CAT12'),
                      var_compare, color_pallete, x_label_reg="FREESURFER [mm]", y_label_reg="CAT12 [mm]",
                      y_label_diff="Difference Freesurfer - CAT12 [mm]")
../_images/obj_b_peadiatric_vs_adults_21_0.png ../_images/obj_b_peadiatric_vs_adults_21_1.png
#@title

df = df_software.reset_index().set_index(["subjectID", "sessionID", "run"]).join(df_subjects.reset_index().set_index(["subjectID", "sessionID", "run"]))
df.columns = ["".join(el) for el in df.columns]
df.head(3)
/tmp/ipykernel_42235/928943096.py:3: FutureWarning: merging between different levels is deprecated and will be removed in a future version. (2 levels on the left, 1 on the right)
  df = df_software.reset_index().set_index(["subjectID", "sessionID", "run"]).join(df_subjects.reset_index().set_index(["subjectID", "sessionID", "run"]))
path template adult roiName corticalThicknessAverageACPC_CAT12 corticalThicknessAverageFREESURFER path repositoryName site adult repository_and_site age gender snr_total
subjectID sessionID run
sub-0050030 1 1 ABIDE_I/UniversityofPittsburghSchoolofMedicine... a2009s ADULT lG_Ins_lg_and_S_cent_ins 3.43e+00 2.96e+00 ABIDE_I/UniversityofPittsburghSchoolofMedicine... ABIDE_I University of Pittsburgh School of Medicine ADULT ABIDE_I|UniversityofPittsburghSchoolofMedicine 2.51e+01 MALE 1.23e+01
1 ABIDE_I/UniversityofPittsburghSchoolofMedicine... a2009s ADULT lG_and_S_cingul-Ant 2.62e+00 2.57e+00 ABIDE_I/UniversityofPittsburghSchoolofMedicine... ABIDE_I University of Pittsburgh School of Medicine ADULT ABIDE_I|UniversityofPittsburghSchoolofMedicine 2.51e+01 MALE 1.23e+01
1 ABIDE_I/UniversityofPittsburghSchoolofMedicine... a2009s ADULT lG_and_S_cingul-Mid-Ant 3.05e+00 2.79e+00 ABIDE_I/UniversityofPittsburghSchoolofMedicine... ABIDE_I University of Pittsburgh School of Medicine ADULT ABIDE_I|UniversityofPittsburghSchoolofMedicine 2.51e+01 MALE 1.23e+01

Participant Analysis#

#@title
print(f'Number of subjects in this analysis is: {len(df_software.reset_index()["subjectID"].unique())}')
Number of subjects in this analysis is: 566

Compute metrics#

#@title

sub_grouped_by, sub_grouped_by_stats = get_regression_metrics(df_software, ['path'], (metric_analysis, pipeline1), (metric_analysis, pipeline2))

df_ct = df_software.reset_index()[["path", var_compare, "roiName", "corticalThicknessAverage"]].melt(id_vars=["path", var_compare, "roiName"])

all_subjects_icc = []

for path_sub in df_ct["path"].unique():
    df_icc_roi = pg.intraclass_corr(data=df_ct.loc[df_ct["path"] == path_sub], 
                                    targets='roiName', raters='software',
                                    ratings='value')
    df_icc_roi["path"] = path_sub

    all_subjects_icc.append(df_icc_roi.loc[df_icc_roi["Type"]=="ICC3"])

df_icc_sub = pd.concat(all_subjects_icc).set_index("path")

df_subject_info = df_subjects.reset_index().set_index(['path']).join(sub_grouped_by_stats).join(df_icc_sub[["ICC"]])
df_subject_info = df_subject_info[~df_subject_info["slope"].isna()]

df_subject_info.head(5)
repositoryName site adult repository_and_site subjectID sessionID run age gender snr_total slope intercept r_value p_value std_err r_square ICC
path
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz ABIDE_I California Institute of Technology ADULT ABIDE_I|CaliforniaInstituteofTechnology sub-0051475 1 1 4.42e+01 MALE 1.52e+01 7.74e-01 7.09e-01 8.12e-01 6.82e-36 4.61e-02 6.59e-01 8.11e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051476/anat/sub-0051476_T1w.nii.gz ABIDE_I California Institute of Technology ADULT ABIDE_I|CaliforniaInstituteofTechnology sub-0051476 1 1 3.93e+01 MALE 1.64e+01 9.63e-01 2.11e-01 8.99e-01 4.25e-54 3.89e-02 8.07e-01 8.96e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051477/anat/sub-0051477_T1w.nii.gz ABIDE_I California Institute of Technology ADULT ABIDE_I|CaliforniaInstituteofTechnology sub-0051477 1 1 4.25e+01 MALE 1.60e+01 8.36e-01 4.95e-01 8.24e-01 8.18e-38 4.76e-02 6.79e-01 8.24e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051478/anat/sub-0051478_T1w.nii.gz ABIDE_I California Institute of Technology ADULT ABIDE_I|CaliforniaInstituteofTechnology sub-0051478 1 1 1.97e+01 MALE 1.56e+01 9.04e-01 3.54e-01 7.86e-01 2.85e-32 5.89e-02 6.17e-01 7.78e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051479/anat/sub-0051479_T1w.nii.gz ABIDE_I California Institute of Technology ADULT ABIDE_I|CaliforniaInstituteofTechnology sub-0051479 1 1 2.00e+01 FEMALE 1.52e+01 9.14e-01 3.74e-01 8.36e-01 7.58e-40 4.97e-02 6.99e-01 8.32e-01

Reproducibility metrics analysis#

#@title
df_subject_info.groupby(by=var_compare).apply(lambda x: pd.Series([f'{x["ICC"].mean(): 0.2f}+/-{x["ICC"].std(): 0.2f}',
                                                                   f'{x["r_square"].mean(): 0.2f}+/-{x["r_square"].std(): 0.2f}'], 
                                                                   index=['ICC', '$R^2$']))
ICC $R^2$
adult
ADULT 0.79+/- 0.06 0.64+/- 0.09
PAEDIATRIC 0.69+/- 0.11 0.51+/- 0.14

Analysis of age effect on reproducibility metrics#

def latex_float(float_str):
    
    if "e" in float_str:
        base, exponent = float_str.split("e")
        return r"{0} \times 10^{{{1}}}".format(base, int(exponent))
    else:
        return float_str
import re, seaborn as sns
import numpy as np

from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import ListedColormap
from IPython.display import display, Math


reproducibility_measures = [("ICC", "ICC"), ("r_square", "$R^2$")]

for rep_measure, label_measure in reproducibility_measures:
    
    for el in df_subject_info[var_compare].unique():
        legend = el

        data_filtered = df_subject_info.loc[df_subject_info[var_compare] == el]
                
        x_values = data_filtered[["age", "snr_total"]]
        y_values = data_filtered[rep_measure]
        
        model = LinearRegression().fit(x_values, y_values)

        
        age_weight = latex_float(f'{model.coef_[0]: 0.1e}')
        SNR_weight = latex_float(f'{model.coef_[1]: 0.1e}')
        intercept = latex_float(f'{model.intercept_: 0.1e}')
        
        eq = f'${label_measure.replace("$", "")}_' + "{" + legend + "}" + f'={age_weight}age + {SNR_weight}SNR + {intercept}$'
        display(Math(r'{}'.format(eq)))
        
    fig = plt.figure(figsize=(6,6))
    ax = Axes3D(fig, auto_add_to_figure=False)
    fig.add_axes(ax)

        
    for el in df_subject_info["site"].unique():

        df_filt = df_subject_info.loc[df_subject_info["site"] == el]
        x= df_filt["age"].values
        y= df_filt["snr_total"].values
        z= df_filt[rep_measure].values
        # axes instance

        # get colormap from seaborn
        cmap = ListedColormap(sns.color_palette("husl", 256).as_hex())

        # plot
        sc = ax.scatter(x, y, z, s=40, marker='o', cmap=cmap, alpha=1)

    ax.set_xlabel('Age', labelpad=15)
    ax.set_ylabel('SNR', labelpad=30)
    ax.set_zlabel(label_measure, labelpad=30)
    
    for label in ax.get_yaxis().get_ticklabels()[::2]:
        label.set_visible(False)
    ax.view_init(elev=10., azim=-80)

    plt.subplots_adjust(bottom=0.2, top=2)
    plt.setp(ax.get_yticklabels(), rotation=0)

    plt.show()
    
\[\displaystyle ICC_{ADULT}= 2.0 \times 10^{-3}age + 2.3 \times 10^{-3}SNR + 7.1 \times 10^{-1}\]
\[\displaystyle ICC_{PAEDIATRIC}= 8.9 \times 10^{-3}age + 8.9 \times 10^{-3}SNR + 5.0 \times 10^{-1}\]
../_images/obj_b_peadiatric_vs_adults_31_2.png
\[\displaystyle R^2_{ADULT}= 2.9 \times 10^{-3}age + 4.4 \times 10^{-3}SNR + 5.2 \times 10^{-1}\]
\[\displaystyle R^2_{PAEDIATRIC}= 1.2 \times 10^{-2}age + 1.3 \times 10^{-2}SNR + 2.3 \times 10^{-1}\]
../_images/obj_b_peadiatric_vs_adults_31_5.png

ROI Analysis#

#@title
roi_grouped_by, roi_grouped_by_stats = get_regression_metrics(df_software, ["roiName", var_compare], (metric_analysis, pipeline1), (metric_analysis, pipeline2))

all_rois = []

for var in df_ct[var_compare].unique():
    for roi_name in df_ct["roiName"].unique():
        df_icc_roi = pg.intraclass_corr(data=df_ct.loc[(df_ct["roiName"] == roi_name) & (df_ct[var_compare] == var)], 
                                        targets='path', raters='software',
                                        ratings='value')
        df_icc_roi["roiName"] = roi_name
        df_icc_roi[var_compare] = var
        
        all_rois.append(df_icc_roi.loc[df_icc_roi["Type"]=="ICC3"])
        
df_icc_rois = pd.concat(all_rois)


roi_grouped_by_stats = roi_grouped_by_stats.join(pd.concat(all_rois).set_index(["roiName", var_compare])[["ICC", "CI95%"]])


map_abbrev_to_name = df_names_rois.set_index("label")[["name"]].to_dict("series")["name"]
hem_label = {"r": "Right", "l": "Left"}

roi_grouped_by_stats["name"] = roi_grouped_by_stats.index.to_frame().roiName.apply(lambda x: map_abbrev_to_name[x[1:]])
roi_grouped_by_stats["hem"] = roi_grouped_by_stats.index.to_frame().roiName.apply(lambda x: hem_label[x[0].lower()])


dfs = []

rois_info = roi_grouped_by_stats.copy()
rois_info["label"] = roi_grouped_by_stats.index.get_level_values(0).str[1:].str.strip()
rois_info["hem"] = roi_grouped_by_stats.index.get_level_values(0).str[0]
rois_info = rois_info.reset_index().set_index(["label", "hem", var_compare])

for var in rois_info.index.get_level_values(2).unique():
    for hem in rois_info.index.get_level_values(1).unique():
        df_areas_metrics_var = pd.concat([df_areas, df_areas["Label"].apply(lambda x: rois_info.loc[(x, hem, var)])], axis=1)
        df_areas_metrics_var[var_compare] = var
        dfs.append(df_areas_metrics_var)
        
df_areas_metrics = pd.concat(dfs)
df_areas_metrics["hem"] = df_areas_metrics["roiName"].str[0]


df_icc = df_areas_metrics.pivot_table(index="name", columns=[var_compare, "hem"], values=["ICC"])

df_ci = df_areas_metrics.pivot_table(index="name", columns=[var_compare, "hem"], values=["CI95%"])

df_ci_icc = df_icc.join(df_ci) 
df_ci_icc.columns = df_ci_icc.columns.swaplevel(0, 2)
df_ci_icc.columns = df_ci_icc.columns.swaplevel(0, 1)
df_ci_icc = df_ci_icc[[df_ci_icc.columns[0], df_ci_icc.columns[-4], 
                     df_ci_icc.columns[1], df_ci_icc.columns[-3], 
                     df_ci_icc.columns[2], df_ci_icc.columns[-2], 
                     df_ci_icc.columns[3], df_ci_icc.columns[-1]]] 

roi_grouped_by_stats
slope intercept r_value p_value std_err r_square ICC CI95% name hem
roiName adult
lG_Ins_lg_and_S_cent_ins ADULT 4.62e-01 1.68e+00 5.13e-01 1.34e-13 5.77e-02 2.63e-01 5.10e-01 [0.39, 0.61] Long insular gyrus and central sulcus of the i... Left
PAEDIATRIC 2.59e-01 2.34e+00 4.21e-01 6.50e-18 2.86e-02 1.77e-01 3.76e-01 [0.29, 0.46] Long insular gyrus and central sulcus of the i... Left
lG_and_S_cingul-Ant ADULT 4.65e-01 1.46e+00 5.08e-01 2.36e-13 5.88e-02 2.58e-01 5.06e-01 [0.39, 0.61] Anterior part of the cingulate gyrus and sulcus Left
PAEDIATRIC 4.36e-01 1.60e+00 4.16e-01 1.64e-17 4.88e-02 1.73e-01 4.16e-01 [0.33, 0.5] Anterior part of the cingulate gyrus and sulcus Left
lG_and_S_cingul-Mid-Ant ADULT 6.04e-01 9.65e-01 6.61e-01 3.11e-24 5.11e-02 4.37e-01 6.58e-01 [0.57, 0.73] Middle-anterior part of the cingulate gyrus an... Left
... ... ... ... ... ... ... ... ... ... ... ...
rS_temporal_inf PAEDIATRIC 5.48e-01 1.12e+00 3.89e-01 2.54e-15 6.64e-02 1.51e-01 3.67e-01 [0.28, 0.45] Inferior temporal sulcus Right
rS_temporal_sup ADULT 7.52e-01 5.71e-01 7.86e-01 1.99e-39 4.41e-02 6.18e-01 7.85e-01 [0.72, 0.84] Superior temporal sulcus Right
PAEDIATRIC 6.82e-01 8.02e-01 6.52e-01 7.49e-48 4.06e-02 4.25e-01 6.51e-01 [0.59, 0.71] Superior temporal sulcus Right
rS_temporal_transverse ADULT 8.75e-01 4.57e-01 6.37e-01 4.40e-22 7.90e-02 4.05e-01 6.06e-01 [0.51, 0.69] Transverse temporal sulcus Right
PAEDIATRIC 3.11e-01 1.88e+00 3.06e-01 8.60e-10 4.95e-02 9.39e-02 3.06e-01 [0.21, 0.39] Transverse temporal sulcus Right

296 rows × 10 columns

#@title
df_sorted = pd.pivot_table(roi_grouped_by_stats.reset_index(), index="name", values=["r_square"], columns=[var_compare, "hem"])
df_sorted
r_square
adult ADULT PAEDIATRIC
hem Left Right Left Right
name
Angular gyrus 3.56e-01 4.28e-01 3.94e-01 4.06e-01
Anterior occipital sulcus and preoccipital notch 4.29e-01 4.40e-01 2.98e-01 2.61e-01
Anterior part of the cingulate gyrus and sulcus 2.58e-01 1.50e-01 1.73e-01 1.60e-01
Anterior segment of the circular sulcus of the insula 9.09e-02 1.62e-01 1.85e-01 1.25e-01
Anterior transverse collateral sulcus 6.69e-02 6.02e-02 1.67e-02 7.26e-02
... ... ... ... ...
Temporal pole 3.04e-01 2.76e-01 4.37e-01 5.09e-01
Transverse frontopolar gyri and sulci 3.84e-01 4.27e-01 3.06e-01 2.96e-01
Transverse temporal sulcus 4.81e-01 4.05e-01 3.32e-01 9.39e-02
Triangular part of the inferior frontal gyrus 3.83e-01 3.92e-01 3.82e-01 2.76e-01
Vertical ramus of the anterior segment of the lateral sulcus 1.62e-01 1.22e-01 1.47e-01 1.29e-01

74 rows × 4 columns

Reproducibility metrics analysis#

#@title
roi_grouped_by_stats.groupby(by=var_compare).apply(lambda x: pd.Series([f'{x["ICC"].mean(): 0.2f}+/-{x["ICC"].std(): 0.2f}',
                                                                        f'{x["r_square"].mean(): 0.2f}+/-{x["r_square"].std(): 0.2f}'], 
                                                                        index=['ICC', '$R^2$']))
ICC $R^2$
adult
ADULT 0.59+/- 0.15 0.38+/- 0.16
PAEDIATRIC 0.52+/- 0.14 0.30+/- 0.14
df_roi_t = roi_grouped_by_stats.reset_index()

for var in ["r_square","ICC"]:

    df_result = pg.ttest(df_roi_t.loc[df_roi_t[var_compare] == "PAEDIATRIC"][var], 
                         df_roi_t.loc[df_roi_t[var_compare] == "ADULT"][var])
    
    display(HTML(f"<br><h5>{var}</h5>"))
    display(HTML(df_result.to_html()))
    

r_square
T dof alternative p-val CI95% cohen-d BF10 power
T-test -4.79e+00 294 two-sided 2.59e-06 [-0.12, -0.05] 5.57e-01 5415.382 9.98e-01

ICC
T dof alternative p-val CI95% cohen-d BF10 power
T-test -3.98e+00 294 two-sided 8.78e-05 [-0.1, -0.03] 4.62e-01 209.663 9.77e-01
Mean per lobe#
#@title

to_group_by = ["Area"]
df_areas_metrics_grouped = df_areas_metrics.groupby(to_group_by + [var_compare, "hem"]).mean()
df_lobes = pd.pivot_table(df_areas_metrics_grouped.reset_index(), index=to_group_by, values=["r_square"], columns=[var_compare])

df_lobes
r_square
adult ADULT PAEDIATRIC
Area
Frontal Lobe 3.47e-01 2.82e-01
Insula 2.42e-01 1.58e-01
Limbic lobe 3.65e-01 2.69e-01
Parietal lobe 4.91e-01 4.07e-01
Temporal and occipital lobes 4.14e-01 3.08e-01

Paired t-test#

all_df = []
for var in df_ct[var_compare].unique():
    for roiname in df_ct['roiName'].unique():
        df_posthoc = pg.pairwise_ttests(data=df_ct.loc[(df_ct['roiName'] == roiname) & (df_ct[var_compare] == var)], 
                                        dv='value', within='software', subject='path',
                                     parametric=True, padjust='fdr_bh', effsize='cohen')

        # Pretty printing of table
        df_posthoc['roiName'] = roiname
        df_posthoc[var_compare] = var
        all_df.append(df_posthoc)
df_f_p_value = pd.concat(all_df)
df_f_p_value.head(4)
Contrast A B Paired Parametric T dof alternative p-unc BF10 cohen roiName adult
0 software ACPC_CAT12 FREESURFER True True -5.27e+00 1.81e+02 two-sided 3.87e-07 2.789e+04 -3.87e-01 lG_Ins_lg_and_S_cent_ins ADULT
0 software ACPC_CAT12 FREESURFER True True -2.17e+00 1.81e+02 two-sided 3.16e-02 0.81 -1.60e-01 lG_and_S_cingul-Ant ADULT
0 software ACPC_CAT12 FREESURFER True True 1.21e+01 1.81e+02 two-sided 4.82e-25 7.93e+21 7.41e-01 lG_and_S_cingul-Mid-Ant ADULT
0 software ACPC_CAT12 FREESURFER True True 3.18e+00 1.81e+02 two-sided 1.73e-03 10.574 1.59e-01 lG_and_S_cingul-Mid-Post ADULT
df_f_p_value["hem"] = df_f_p_value.apply(lambda x: hem_label[x.roiName[0].lower()], axis=1)
df_f_p_value["name"] = df_f_p_value.apply(lambda x: map_abbrev_to_name[x.roiName[1:]], axis=1)

# Bonferroni correction
significance_level = .05

df_f_p_value["p-value"] = df_f_p_value["p-unc"].apply(lambda x: min(x*df_f_p_value.shape[0], 1))
df_f_p_value_rejected = df_f_p_value[df_f_p_value["p-value"] < significance_level]

df_f_p_value["cohen"] = df_f_p_value["cohen"].astype(float).round(2).abs()
df_f_p_value["cohen-p-value"] = df_f_p_value[["p-value", "cohen"]].apply(lambda x: f'{x["cohen"]}*' if x['p-value'] < significance_level else f'{x["cohen"]}', axis=1).astype(str)


df_p_value = pd.pivot_table(df_f_p_value.reset_index(), index="name", values=["cohen-p-value"], columns=[var_compare, "hem"], aggfunc=lambda x: ' '.join(x))
df_p_value
cohen-p-value
adult ADULT PAEDIATRIC
hem Left Right Left Right
name
Angular gyrus 0.45* 0.73* 0.85* 0.9*
Anterior occipital sulcus and preoccipital notch 0.08 0.02 0.04 0.06
Anterior part of the cingulate gyrus and sulcus 0.16 0.2 0.34* 0.14
Anterior segment of the circular sulcus of the insula 0.08 0.17 0.09 0.45*
Anterior transverse collateral sulcus 0.28 0.38* 0.45* 0.48*
... ... ... ... ...
Temporal pole 0.9* 0.93* 0.75* 0.74*
Transverse frontopolar gyri and sulci 0.68* 0.49* 1.0* 0.74*
Transverse temporal sulcus 0.26* 0.52* 0.56* 0.82*
Triangular part of the inferior frontal gyrus 0.56* 0.38* 1.02* 0.76*
Vertical ramus of the anterior segment of the lateral sulcus 0.9* 0.45* 0.55* 0.16

74 rows × 4 columns

#@title
print(f"Number of ROIs with at least on group that rejected the paired t-test hypothesis: {df_f_p_value_rejected.reset_index().roiName.unique().shape[0]}")
Number of ROIs with at least on group that rejected the paired t-test hypothesis: 137
Number of regions per site in which the hypothesis was rejected#
#@title
df_f_p_value_rejected.reset_index().groupby(var_compare).count()[["name"]]
name
adult
ADULT 109
PAEDIATRIC 130