Reproducibility in paediatric versus early adults’ groups#

This notebook investigates whether different age stages (paediatric versus adults) have different reproducibility behaviour. The age range of the participants is from 6.5 to 56.2 years, and the analysis was carried out for two age groups: paediatric and adults. In this analysis the ABIDE I dataset was used. ABIDE I is a multi-site data sharing initiative (17 sites in total) with neurotypical controls and subjects diagnosed with Autism Spectrum Disorder (ASD).

#@title

import os
import math
import matplotlib
import numpy as np
import pandas as pd
import pingouin as pg
import scipy.stats
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter

from sklearn.linear_model import LinearRegression
from utils import get_regression_metrics, bland_altman_plot
from settings import RESOURCE_DIR, ABIDE_MAP_SITES

# Tables format
#%matplotlib qt

pd.set_option('display.float_format', lambda x: f"{x: 0.2e}")

# Visualization format
font = {'size'   : 18}
matplotlib.rc('font', **font)
sns.set_theme(style="whitegrid", font_scale=2)

# Global variables
var_compare = "adult"
template_name = "a2009s"
metric_analysis = "corticalThicknessAverage"
path_csv_cortical_data = "cortical_thicknesss_paediatric_vs_early_adults_groups.csv"

color_pallete = {"ADULT": "#25654B", "PAEDIATRIC": "#C94444"}


var_analyse = ["r_square", "slope", "intercept"]

pipeline1 = "ACPC_CAT12"
pipeline2 = "FREESURFER"

/home/fmachado/anaconda3/envs/fs-cat12/lib/python3.9/site-packages/outdated/utils.py:14: OutdatedPackageWarning: The package pingouin is out of date. Your version is 0.5.1, the latest is 0.5.2.
Set the environment variable OUTDATED_IGNORE=1 to disable these warnings.
  return warn(

#@title

df_names_rois = pd.read_csv(os.path.join(RESOURCE_DIR, "template", f"{template_name}_atlas_labels.csv"))
df_areas = pd.read_csv(os.path.join(RESOURCE_DIR, "template", f"{template_name}_rois_areas.csv"))

Data#

#@title
df_software_raw = pd.read_csv(os.path.join(RESOURCE_DIR, "data", path_csv_cortical_data), low_memory=False)
df_software_raw.set_index("path", inplace=True)
# Get only the template ROIs
df_software_raw = df_software_raw.loc[df_software_raw.roiName.str[1:].isin(df_names_rois.label.to_list())]

df_software_raw.site = df_software_raw.site.apply(lambda x: ABIDE_MAP_SITES[x])
df_software_raw.head(4)

	repositoryName	site	adult	subjectID	sessionID	run	age	gender	software	roiName	corticalThicknessAverage	template	repository_and_site	cjv	cnr	snr_total
path
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	sub-0051475	1	1	4.42e+01	MALE	ACPC_CAT12	lG_Ins_lg_and_S_cent_ins	3.01e+00	a2009s	ABIDE_I\|CaliforniaInstituteofTechnology	4.40e-01	3.16e+00	1.52e+01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	sub-0051475	1	1	4.42e+01	MALE	ACPC_CAT12	lG_and_S_cingul-Ant	3.11e+00	a2009s	ABIDE_I\|CaliforniaInstituteofTechnology	4.40e-01	3.16e+00	1.52e+01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	sub-0051475	1	1	4.42e+01	MALE	ACPC_CAT12	lG_and_S_cingul-Mid-Ant	2.94e+00	a2009s	ABIDE_I\|CaliforniaInstituteofTechnology	4.40e-01	3.16e+00	1.52e+01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	sub-0051475	1	1	4.42e+01	MALE	ACPC_CAT12	lG_and_S_cingul-Mid-Post	2.73e+00	a2009s	ABIDE_I\|CaliforniaInstituteofTechnology	4.40e-01	3.16e+00	1.52e+01

Check preprocessing problems#

Some images were only ran by one of the softwares, in the code below images with preprocessing problems are discoved, and those subjects/session/run are removed from the analysis.

#@title
roi_name = df_software_raw.loc[(~df_software_raw[metric_analysis].isna()) & (df_software_raw.template == template_name)].roiName.to_list()[1]

df_group = df_software_raw[df_software_raw["roiName"] == roi_name].copy()
df_group = df_group[df_group.template == template_name].groupby(by=['path']).apply(lambda x: list(x["software"]))

# Get the images run by only one
df_problems_running = df_group[df_group.apply(len) == 1].to_frame()

software_problems = [item for sublist in df_problems_running[0].to_list() for item in sublist]

print("\n".join([f"{number_problems} images were only processed by: {soft_name}" for soft_name, number_problems in Counter(software_problems).items()]))

2 images were only processed by: ACPC_CAT12
3 images were only processed by: FREESURFER

Exclude the subjects only run by only one software#

#@title
df_software_raw = df_software_raw[~df_software_raw.index.isin(df_problems_running.index)]

Demographics#

#@title
df_subjects = df_software_raw[["repositoryName", "site", var_compare, "repository_and_site", "subjectID", "sessionID", "run", "age", "gender", "snr_total"]].drop_duplicates().copy()
df_subjects.site = df_subjects.site.apply(lambda x: "" if type(x)==float else x)

if len(df_subjects["repositoryName"].unique()) == 1: 
    groupby_col = ["site"] 
else:
    groupby_col = ["repositoryName", "site"]
    
df_summary = df_subjects.groupby(groupby_col).apply(lambda x: pd.Series({"Total of participants": len(x), 
                                                                         "Number males": (x["gender"]=="MALE").sum(),
                                                                         "Mean and standard deviation [years]": f"{x['age'].mean(): 0.2f}+/-{x['age'].std(): 0.2f}",
                                                                         "Min Age [years]": f"{x['age'].min(): 0.2f}",
                                                                         "Max Age [years]": f"{x['age'].max(): 0.2f}"}))


df_summary

	Total of participants	Number males	Mean and standard deviation [years]	Min Age [years]	Max Age [years]
site
California Institute of Technology	19	15	28.87+/- 11.21	17.00	56.20
Carnegie Mellon University	13	10	26.85+/- 5.74	20.00	40.00
Kennedy Krieger Institute	33	24	10.16+/- 1.26	8.07	12.77
Ludwig Maximilians University Munich	32	28	26.19+/- 9.96	7.00	48.00
NYU Langone Medical Center	104	78	15.83+/- 6.28	6.47	31.78
Olin Institute of Living at Hartford Hospital	16	14	16.94+/- 3.68	10.00	23.00
Oregon Health and Science University	15	15	10.06+/- 1.08	8.20	11.99
San Diego State University	22	16	14.22+/- 1.90	8.67	16.88
Social Brain Lab	15	15	33.73+/- 6.61	20.00	42.00
Stanford University	19	15	9.97+/- 1.64	7.75	12.43
Trinity Centre for Health Sciences	25	25	17.08+/- 3.77	12.04	25.66
University of California Los Angeles	45	39	12.96+/- 1.92	9.21	17.79
University of Leuven	35	30	18.17+/- 4.99	12.20	29.00
University of Michigan	75	58	14.82+/- 3.62	8.20	28.80
University of Pittsburgh School of Medicine	27	23	18.88+/- 6.64	9.44	33.24
University of Utah School of Medicine	43	43	21.36+/- 7.64	8.77	39.39
Yale Child Study Center	28	20	12.68+/- 2.75	7.66	17.83

#@title
df_software = df_software_raw.reset_index().pivot(['path',  'subjectID', 'sessionID', 'run', 'template', var_compare,'roiName'], ['software'], [metric_analysis]).reset_index().set_index("path")
df_software.head(4)

	subjectID	sessionID	run	template	adult	roiName	corticalThicknessAverage
software							ACPC_CAT12	FREESURFER
path
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	sub-0051475	1	1	a2009s	ADULT	lG_Ins_lg_and_S_cent_ins	3.01e+00	3.14e+00
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	sub-0051475	1	1	a2009s	ADULT	lG_and_S_cingul-Ant	3.11e+00	3.02e+00
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	sub-0051475	1	1	a2009s	ADULT	lG_and_S_cingul-Mid-Ant	2.94e+00	2.73e+00
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	sub-0051475	1	1	a2009s	ADULT	lG_and_S_cingul-Mid-Post	2.73e+00	2.60e+00

Search subjects with at least one ROI#

#@title
df_rois_nan = df_software.loc[df_software[[('corticalThicknessAverage', 'FREESURFER'), ('corticalThicknessAverage', 'ACPC_CAT12')]].isna().sum(axis=1)>0]
df_rois_nan

	subjectID	sessionID	run	template	adult	roiName	corticalThicknessAverage
software							ACPC_CAT12	FREESURFER
path

Exclude subjects with at least one ROI#

#@title
df_software_raw = df_software_raw.loc[~df_software_raw.index.isin(df_rois_nan.index.get_level_values(0).to_list())]
df_software = df_software.loc[~df_software.index.isin(df_rois_nan.index.get_level_values(0).to_list())]

Analysis#

Overall Analysis#

Bland-Altman plot#

from IPython.display import display, HTML

for var_value in df_software[var_compare].unique():
    df_soft_filter = df_software.loc[df_software[var_compare] == var_value]
    diff = (df_soft_filter[("corticalThicknessAverage", "FREESURFER")] - df_soft_filter[("corticalThicknessAverage", "ACPC_CAT12")])
    test = pg.ttest(diff, 0)


    df_means_diff = df_soft_filter[[("corticalThicknessAverage", "FREESURFER"), 
                                    ("corticalThicknessAverage", "ACPC_CAT12")]].mean().to_frame().T.rename(columns={"corticalThicknessAverage": "Cortical Thickness Average Mean"})
    df_means_diff["Difference FREESURFER - CAT12"] = diff.mean()
    
    
    test = pg.ttest(diff, 0)
    display(HTML(f"<br><h5>{var_value}</h5>"))
    display(HTML(df_means_diff.to_html()))
    
    display(HTML(f"<br><p>One sample t-test to verify whether the means difference is equal to zero</p>"))
    display(HTML(test.to_html()))
    print(f"Confidence interval (95%) of the means difference: [{round(diff.mean() - 1.96*diff.std(), 3)};{round(diff.mean() + 1.96*diff.std(), 3)}]")

ADULT

	Cortical Thickness Average Mean		Difference FREESURFER - CAT12
software	FREESURFER	ACPC_CAT12
0	2.49e+00	2.45e+00	3.42e-02

One sample t-test to verify whether the means difference is equal to zero

	T	dof	alternative	p-val	CI95%	cohen-d	BF10	power
T-test	2.30e+01	26935	two-sided	3.26e-116	[0.03, 0.04]	1.40e-01	7.157e+111	1.00e+00

Confidence interval (95%) of the means difference: [-0.444;0.512]

/home/fmachado/anaconda3/envs/fs-cat12/lib/python3.9/site-packages/pingouin/bayesian.py:146: RuntimeWarning: divide by zero encountered in double_scalars
  bf10 = 1 / ((1 + t**2 / df)**(-(df + 1) / 2) / integr)
/home/fmachado/anaconda3/envs/fs-cat12/lib/python3.9/site-packages/pingouin/bayesian.py:146: RuntimeWarning: divide by zero encountered in double_scalars
  bf10 = 1 / ((1 + t**2 / df)**(-(df + 1) / 2) / integr)

PAEDIATRIC

	Cortical Thickness Average Mean		Difference FREESURFER - CAT12
software	FREESURFER	ACPC_CAT12
0	2.60e+00	2.52e+00	8.00e-02

One sample t-test to verify whether the means difference is equal to zero

	T	dof	alternative	p-val	CI95%	cohen-d	BF10	power
T-test	6.47e+01	56831	two-sided	0.00e+00	[0.08, 0.08]	2.71e-01	inf	1.00e+00

Confidence interval (95%) of the means difference: [-0.498;0.658]

#@title
y_label_txt = "Difference Freesurfer - CAT12 [mm]"

for var in df_software[var_compare].unique():
    df_filt = df_software.loc[df_software[var_compare] == var]
    bland_altman_plot(df_filt.copy().reset_index(),
                      ('corticalThicknessAverage', 'FREESURFER'), 
                      ('corticalThicknessAverage', 'ACPC_CAT12'),
                      var_compare, color_pallete, x_label_reg="FREESURFER [mm]", y_label_reg="CAT12 [mm]",
                      y_label_diff="Difference Freesurfer - CAT12 [mm]")

../_images/obj_b_peadiatric_vs_adults_21_0.png

../_images/obj_b_peadiatric_vs_adults_21_1.png

#@title

df = df_software.reset_index().set_index(["subjectID", "sessionID", "run"]).join(df_subjects.reset_index().set_index(["subjectID", "sessionID", "run"]))
df.columns = ["".join(el) for el in df.columns]
df.head(3)

/tmp/ipykernel_42235/928943096.py:3: FutureWarning: merging between different levels is deprecated and will be removed in a future version. (2 levels on the left, 1 on the right)
  df = df_software.reset_index().set_index(["subjectID", "sessionID", "run"]).join(df_subjects.reset_index().set_index(["subjectID", "sessionID", "run"]))

			path	template	adult	roiName	corticalThicknessAverageACPC_CAT12	corticalThicknessAverageFREESURFER	path	repositoryName	site	adult	repository_and_site	age	gender	snr_total
subjectID	sessionID	run
sub-0050030	1	1	ABIDE_I/UniversityofPittsburghSchoolofMedicine...	a2009s	ADULT	lG_Ins_lg_and_S_cent_ins	3.43e+00	2.96e+00	ABIDE_I/UniversityofPittsburghSchoolofMedicine...	ABIDE_I	University of Pittsburgh School of Medicine	ADULT	ABIDE_I\|UniversityofPittsburghSchoolofMedicine	2.51e+01	MALE	1.23e+01
		1	ABIDE_I/UniversityofPittsburghSchoolofMedicine...	a2009s	ADULT	lG_and_S_cingul-Ant	2.62e+00	2.57e+00	ABIDE_I/UniversityofPittsburghSchoolofMedicine...	ABIDE_I	University of Pittsburgh School of Medicine	ADULT	ABIDE_I\|UniversityofPittsburghSchoolofMedicine	2.51e+01	MALE	1.23e+01
		1	ABIDE_I/UniversityofPittsburghSchoolofMedicine...	a2009s	ADULT	lG_and_S_cingul-Mid-Ant	3.05e+00	2.79e+00	ABIDE_I/UniversityofPittsburghSchoolofMedicine...	ABIDE_I	University of Pittsburgh School of Medicine	ADULT	ABIDE_I\|UniversityofPittsburghSchoolofMedicine	2.51e+01	MALE	1.23e+01

Participant Analysis#

#@title
print(f'Number of subjects in this analysis is: {len(df_software.reset_index()["subjectID"].unique())}')

Number of subjects in this analysis is: 566

Compute metrics#

#@title

sub_grouped_by, sub_grouped_by_stats = get_regression_metrics(df_software, ['path'], (metric_analysis, pipeline1), (metric_analysis, pipeline2))

df_ct = df_software.reset_index()[["path", var_compare, "roiName", "corticalThicknessAverage"]].melt(id_vars=["path", var_compare, "roiName"])

all_subjects_icc = []

for path_sub in df_ct["path"].unique():
    df_icc_roi = pg.intraclass_corr(data=df_ct.loc[df_ct["path"] == path_sub], 
                                    targets='roiName', raters='software',
                                    ratings='value')
    df_icc_roi["path"] = path_sub

    all_subjects_icc.append(df_icc_roi.loc[df_icc_roi["Type"]=="ICC3"])

df_icc_sub = pd.concat(all_subjects_icc).set_index("path")

df_subject_info = df_subjects.reset_index().set_index(['path']).join(sub_grouped_by_stats).join(df_icc_sub[["ICC"]])
df_subject_info = df_subject_info[~df_subject_info["slope"].isna()]

df_subject_info.head(5)

	repositoryName	site	adult	repository_and_site	subjectID	sessionID	run	age	gender	snr_total	slope	intercept	r_value	p_value	std_err	r_square	ICC
path
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051475/anat/sub-0051475_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	ABIDE_I\|CaliforniaInstituteofTechnology	sub-0051475	1	1	4.42e+01	MALE	1.52e+01	7.74e-01	7.09e-01	8.12e-01	6.82e-36	4.61e-02	6.59e-01	8.11e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051476/anat/sub-0051476_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	ABIDE_I\|CaliforniaInstituteofTechnology	sub-0051476	1	1	3.93e+01	MALE	1.64e+01	9.63e-01	2.11e-01	8.99e-01	4.25e-54	3.89e-02	8.07e-01	8.96e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051477/anat/sub-0051477_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	ABIDE_I\|CaliforniaInstituteofTechnology	sub-0051477	1	1	4.25e+01	MALE	1.60e+01	8.36e-01	4.95e-01	8.24e-01	8.18e-38	4.76e-02	6.79e-01	8.24e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051478/anat/sub-0051478_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	ABIDE_I\|CaliforniaInstituteofTechnology	sub-0051478	1	1	1.97e+01	MALE	1.56e+01	9.04e-01	3.54e-01	7.86e-01	2.85e-32	5.89e-02	6.17e-01	7.78e-01
ABIDE_I/CaliforniaInstituteofTechnology/sub-0051479/anat/sub-0051479_T1w.nii.gz	ABIDE_I	California Institute of Technology	ADULT	ABIDE_I\|CaliforniaInstituteofTechnology	sub-0051479	1	1	2.00e+01	FEMALE	1.52e+01	9.14e-01	3.74e-01	8.36e-01	7.58e-40	4.97e-02	6.99e-01	8.32e-01

Reproducibility metrics analysis#

#@title
df_subject_info.groupby(by=var_compare).apply(lambda x: pd.Series([f'{x["ICC"].mean(): 0.2f}+/-{x["ICC"].std(): 0.2f}',
                                                                   f'{x["r_square"].mean(): 0.2f}+/-{x["r_square"].std(): 0.2f}'], 
                                                                   index=['ICC', '$R^2$']))

	ICC	$R^2$
adult
ADULT	0.79+/- 0.06	0.64+/- 0.09
PAEDIATRIC	0.69+/- 0.11	0.51+/- 0.14

Analysis of age effect on reproducibility metrics#

def latex_float(float_str):
    
    if "e" in float_str:
        base, exponent = float_str.split("e")
        return r"{0} \times 10^{{{1}}}".format(base, int(exponent))
    else:
        return float_str

import re, seaborn as sns
import numpy as np

from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.colors import ListedColormap
from IPython.display import display, Math


reproducibility_measures = [("ICC", "ICC"), ("r_square", "$R^2$")]

for rep_measure, label_measure in reproducibility_measures:
    
    for el in df_subject_info[var_compare].unique():
        legend = el

        data_filtered = df_subject_info.loc[df_subject_info[var_compare] == el]
                
        x_values = data_filtered[["age", "snr_total"]]
        y_values = data_filtered[rep_measure]
        
        model = LinearRegression().fit(x_values, y_values)

        
        age_weight = latex_float(f'{model.coef_[0]: 0.1e}')
        SNR_weight = latex_float(f'{model.coef_[1]: 0.1e}')
        intercept = latex_float(f'{model.intercept_: 0.1e}')
        
        eq = f'${label_measure.replace("$", "")}_' + "{" + legend + "}" + f'={age_weight}age + {SNR_weight}SNR + {intercept}$'
        display(Math(r'{}'.format(eq)))
        
    fig = plt.figure(figsize=(6,6))
    ax = Axes3D(fig, auto_add_to_figure=False)
    fig.add_axes(ax)

        
    for el in df_subject_info["site"].unique():

        df_filt = df_subject_info.loc[df_subject_info["site"] == el]
        x= df_filt["age"].values
        y= df_filt["snr_total"].values
        z= df_filt[rep_measure].values
        # axes instance

        # get colormap from seaborn
        cmap = ListedColormap(sns.color_palette("husl", 256).as_hex())

        # plot
        sc = ax.scatter(x, y, z, s=40, marker='o', cmap=cmap, alpha=1)

    ax.set_xlabel('Age', labelpad=15)
    ax.set_ylabel('SNR', labelpad=30)
    ax.set_zlabel(label_measure, labelpad=30)
    
    for label in ax.get_yaxis().get_ticklabels()[::2]:
        label.set_visible(False)
    ax.view_init(elev=10., azim=-80)

    plt.subplots_adjust(bottom=0.2, top=2)
    plt.setp(ax.get_yticklabels(), rotation=0)

    plt.show()
    

\[\displaystyle ICC_{ADULT}= 2.0 \times 10^{-3}age + 2.3 \times 10^{-3}SNR + 7.1 \times 10^{-1}\]

\[\displaystyle ICC_{PAEDIATRIC}= 8.9 \times 10^{-3}age + 8.9 \times 10^{-3}SNR + 5.0 \times 10^{-1}\]

../_images/obj_b_peadiatric_vs_adults_31_2.png

\[\displaystyle R^2_{ADULT}= 2.9 \times 10^{-3}age + 4.4 \times 10^{-3}SNR + 5.2 \times 10^{-1}\]

\[\displaystyle R^2_{PAEDIATRIC}= 1.2 \times 10^{-2}age + 1.3 \times 10^{-2}SNR + 2.3 \times 10^{-1}\]

../_images/obj_b_peadiatric_vs_adults_31_5.png

ROI Analysis#

#@title
roi_grouped_by, roi_grouped_by_stats = get_regression_metrics(df_software, ["roiName", var_compare], (metric_analysis, pipeline1), (metric_analysis, pipeline2))

all_rois = []

for var in df_ct[var_compare].unique():
    for roi_name in df_ct["roiName"].unique():
        df_icc_roi = pg.intraclass_corr(data=df_ct.loc[(df_ct["roiName"] == roi_name) & (df_ct[var_compare] == var)], 
                                        targets='path', raters='software',
                                        ratings='value')
        df_icc_roi["roiName"] = roi_name
        df_icc_roi[var_compare] = var
        
        all_rois.append(df_icc_roi.loc[df_icc_roi["Type"]=="ICC3"])
        
df_icc_rois = pd.concat(all_rois)


roi_grouped_by_stats = roi_grouped_by_stats.join(pd.concat(all_rois).set_index(["roiName", var_compare])[["ICC", "CI95%"]])


map_abbrev_to_name = df_names_rois.set_index("label")[["name"]].to_dict("series")["name"]
hem_label = {"r": "Right", "l": "Left"}

roi_grouped_by_stats["name"] = roi_grouped_by_stats.index.to_frame().roiName.apply(lambda x: map_abbrev_to_name[x[1:]])
roi_grouped_by_stats["hem"] = roi_grouped_by_stats.index.to_frame().roiName.apply(lambda x: hem_label[x[0].lower()])


dfs = []

rois_info = roi_grouped_by_stats.copy()
rois_info["label"] = roi_grouped_by_stats.index.get_level_values(0).str[1:].str.strip()
rois_info["hem"] = roi_grouped_by_stats.index.get_level_values(0).str[0]
rois_info = rois_info.reset_index().set_index(["label", "hem", var_compare])

for var in rois_info.index.get_level_values(2).unique():
    for hem in rois_info.index.get_level_values(1).unique():
        df_areas_metrics_var = pd.concat([df_areas, df_areas["Label"].apply(lambda x: rois_info.loc[(x, hem, var)])], axis=1)
        df_areas_metrics_var[var_compare] = var
        dfs.append(df_areas_metrics_var)
        
df_areas_metrics = pd.concat(dfs)
df_areas_metrics["hem"] = df_areas_metrics["roiName"].str[0]


df_icc = df_areas_metrics.pivot_table(index="name", columns=[var_compare, "hem"], values=["ICC"])

df_ci = df_areas_metrics.pivot_table(index="name", columns=[var_compare, "hem"], values=["CI95%"])

df_ci_icc = df_icc.join(df_ci) 
df_ci_icc.columns = df_ci_icc.columns.swaplevel(0, 2)
df_ci_icc.columns = df_ci_icc.columns.swaplevel(0, 1)
df_ci_icc = df_ci_icc[[df_ci_icc.columns[0], df_ci_icc.columns[-4], 
                     df_ci_icc.columns[1], df_ci_icc.columns[-3], 
                     df_ci_icc.columns[2], df_ci_icc.columns[-2], 
                     df_ci_icc.columns[3], df_ci_icc.columns[-1]]] 

roi_grouped_by_stats

		slope	intercept	r_value	p_value	std_err	r_square	ICC	CI95%	name	hem
roiName	adult
lG_Ins_lg_and_S_cent_ins	ADULT	4.62e-01	1.68e+00	5.13e-01	1.34e-13	5.77e-02	2.63e-01	5.10e-01	[0.39, 0.61]	Long insular gyrus and central sulcus of the i...	Left
lG_Ins_lg_and_S_cent_ins	PAEDIATRIC	2.59e-01	2.34e+00	4.21e-01	6.50e-18	2.86e-02	1.77e-01	3.76e-01	[0.29, 0.46]	Long insular gyrus and central sulcus of the i...	Left
lG_and_S_cingul-Ant	ADULT	4.65e-01	1.46e+00	5.08e-01	2.36e-13	5.88e-02	2.58e-01	5.06e-01	[0.39, 0.61]	Anterior part of the cingulate gyrus and sulcus	Left
lG_and_S_cingul-Ant	PAEDIATRIC	4.36e-01	1.60e+00	4.16e-01	1.64e-17	4.88e-02	1.73e-01	4.16e-01	[0.33, 0.5]	Anterior part of the cingulate gyrus and sulcus	Left
lG_and_S_cingul-Mid-Ant	ADULT	6.04e-01	9.65e-01	6.61e-01	3.11e-24	5.11e-02	4.37e-01	6.58e-01	[0.57, 0.73]	Middle-anterior part of the cingulate gyrus an...	Left
...	...	...	...	...	...	...	...	...	...	...	...
rS_temporal_inf	PAEDIATRIC	5.48e-01	1.12e+00	3.89e-01	2.54e-15	6.64e-02	1.51e-01	3.67e-01	[0.28, 0.45]	Inferior temporal sulcus	Right
rS_temporal_sup	ADULT	7.52e-01	5.71e-01	7.86e-01	1.99e-39	4.41e-02	6.18e-01	7.85e-01	[0.72, 0.84]	Superior temporal sulcus	Right
rS_temporal_sup	PAEDIATRIC	6.82e-01	8.02e-01	6.52e-01	7.49e-48	4.06e-02	4.25e-01	6.51e-01	[0.59, 0.71]	Superior temporal sulcus	Right
rS_temporal_transverse	ADULT	8.75e-01	4.57e-01	6.37e-01	4.40e-22	7.90e-02	4.05e-01	6.06e-01	[0.51, 0.69]	Transverse temporal sulcus	Right
rS_temporal_transverse	PAEDIATRIC	3.11e-01	1.88e+00	3.06e-01	8.60e-10	4.95e-02	9.39e-02	3.06e-01	[0.21, 0.39]	Transverse temporal sulcus	Right

296 rows × 10 columns

#@title
df_sorted = pd.pivot_table(roi_grouped_by_stats.reset_index(), index="name", values=["r_square"], columns=[var_compare, "hem"])
df_sorted

	r_square
adult	ADULT		PAEDIATRIC
hem	Left	Right	Left	Right
name
Angular gyrus	3.56e-01	4.28e-01	3.94e-01	4.06e-01
Anterior occipital sulcus and preoccipital notch	4.29e-01	4.40e-01	2.98e-01	2.61e-01
Anterior part of the cingulate gyrus and sulcus	2.58e-01	1.50e-01	1.73e-01	1.60e-01
Anterior segment of the circular sulcus of the insula	9.09e-02	1.62e-01	1.85e-01	1.25e-01
Anterior transverse collateral sulcus	6.69e-02	6.02e-02	1.67e-02	7.26e-02
...	...	...	...	...
Temporal pole	3.04e-01	2.76e-01	4.37e-01	5.09e-01
Transverse frontopolar gyri and sulci	3.84e-01	4.27e-01	3.06e-01	2.96e-01
Transverse temporal sulcus	4.81e-01	4.05e-01	3.32e-01	9.39e-02
Triangular part of the inferior frontal gyrus	3.83e-01	3.92e-01	3.82e-01	2.76e-01
Vertical ramus of the anterior segment of the lateral sulcus	1.62e-01	1.22e-01	1.47e-01	1.29e-01

74 rows × 4 columns

Reproducibility metrics analysis#

#@title
roi_grouped_by_stats.groupby(by=var_compare).apply(lambda x: pd.Series([f'{x["ICC"].mean(): 0.2f}+/-{x["ICC"].std(): 0.2f}',
                                                                        f'{x["r_square"].mean(): 0.2f}+/-{x["r_square"].std(): 0.2f}'], 
                                                                        index=['ICC', '$R^2$']))

	ICC	$R^2$
adult
ADULT	0.59+/- 0.15	0.38+/- 0.16
PAEDIATRIC	0.52+/- 0.14	0.30+/- 0.14

df_roi_t = roi_grouped_by_stats.reset_index()

for var in ["r_square","ICC"]:

    df_result = pg.ttest(df_roi_t.loc[df_roi_t[var_compare] == "PAEDIATRIC"][var], 
                         df_roi_t.loc[df_roi_t[var_compare] == "ADULT"][var])
    
    display(HTML(f"<br><h5>{var}</h5>"))
    display(HTML(df_result.to_html()))
    

r_square

	T	dof	alternative	p-val	CI95%	cohen-d	BF10	power
T-test	-4.79e+00	294	two-sided	2.59e-06	[-0.12, -0.05]	5.57e-01	5415.382	9.98e-01

ICC

	T	dof	alternative	p-val	CI95%	cohen-d	BF10	power
T-test	-3.98e+00	294	two-sided	8.78e-05	[-0.1, -0.03]	4.62e-01	209.663	9.77e-01

Mean per lobe#

#@title

to_group_by = ["Area"]
df_areas_metrics_grouped = df_areas_metrics.groupby(to_group_by + [var_compare, "hem"]).mean()
df_lobes = pd.pivot_table(df_areas_metrics_grouped.reset_index(), index=to_group_by, values=["r_square"], columns=[var_compare])

df_lobes

	r_square
adult	ADULT	PAEDIATRIC
Area
Frontal Lobe	3.47e-01	2.82e-01
Insula	2.42e-01	1.58e-01
Limbic lobe	3.65e-01	2.69e-01
Parietal lobe	4.91e-01	4.07e-01
Temporal and occipital lobes	4.14e-01	3.08e-01

Paired t-test#

all_df = []
for var in df_ct[var_compare].unique():
    for roiname in df_ct['roiName'].unique():
        df_posthoc = pg.pairwise_ttests(data=df_ct.loc[(df_ct['roiName'] == roiname) & (df_ct[var_compare] == var)], 
                                        dv='value', within='software', subject='path',
                                     parametric=True, padjust='fdr_bh', effsize='cohen')

        # Pretty printing of table
        df_posthoc['roiName'] = roiname
        df_posthoc[var_compare] = var
        all_df.append(df_posthoc)
df_f_p_value = pd.concat(all_df)
df_f_p_value.head(4)

Contrast	A	B	Paired	Parametric	T	dof	alternative	p-unc	BF10	cohen	roiName	adult
software	ACPC_CAT12	FREESURFER	True	True	-5.27e+00	1.81e+02	two-sided	3.87e-07	2.789e+04	-3.87e-01	lG_Ins_lg_and_S_cent_ins	ADULT
software	ACPC_CAT12	FREESURFER	True	True	-2.17e+00	1.81e+02	two-sided	3.16e-02	0.81	-1.60e-01	lG_and_S_cingul-Ant	ADULT
software	ACPC_CAT12	FREESURFER	True	True	1.21e+01	1.81e+02	two-sided	4.82e-25	7.93e+21	7.41e-01	lG_and_S_cingul-Mid-Ant	ADULT
software	ACPC_CAT12	FREESURFER	True	True	3.18e+00	1.81e+02	two-sided	1.73e-03	10.574	1.59e-01	lG_and_S_cingul-Mid-Post	ADULT

df_f_p_value["hem"] = df_f_p_value.apply(lambda x: hem_label[x.roiName[0].lower()], axis=1)
df_f_p_value["name"] = df_f_p_value.apply(lambda x: map_abbrev_to_name[x.roiName[1:]], axis=1)

# Bonferroni correction
significance_level = .05

df_f_p_value["p-value"] = df_f_p_value["p-unc"].apply(lambda x: min(x*df_f_p_value.shape[0], 1))
df_f_p_value_rejected = df_f_p_value[df_f_p_value["p-value"] < significance_level]

df_f_p_value["cohen"] = df_f_p_value["cohen"].astype(float).round(2).abs()
df_f_p_value["cohen-p-value"] = df_f_p_value[["p-value", "cohen"]].apply(lambda x: f'{x["cohen"]}*' if x['p-value'] < significance_level else f'{x["cohen"]}', axis=1).astype(str)


df_p_value = pd.pivot_table(df_f_p_value.reset_index(), index="name", values=["cohen-p-value"], columns=[var_compare, "hem"], aggfunc=lambda x: ' '.join(x))
df_p_value

	cohen-p-value
adult	ADULT		PAEDIATRIC
hem	Left	Right	Left	Right
name
Angular gyrus	0.45*	0.73*	0.85*	0.9*
Anterior occipital sulcus and preoccipital notch	0.08	0.02	0.04	0.06
Anterior part of the cingulate gyrus and sulcus	0.16	0.2	0.34*	0.14
Anterior segment of the circular sulcus of the insula	0.08	0.17	0.09	0.45*
Anterior transverse collateral sulcus	0.28	0.38*	0.45*	0.48*
...	...	...	...	...
Temporal pole	0.9*	0.93*	0.75*	0.74*
Transverse frontopolar gyri and sulci	0.68*	0.49*	1.0*	0.74*
Transverse temporal sulcus	0.26*	0.52*	0.56*	0.82*
Triangular part of the inferior frontal gyrus	0.56*	0.38*	1.02*	0.76*
Vertical ramus of the anterior segment of the lateral sulcus	0.9*	0.45*	0.55*	0.16

74 rows × 4 columns

#@title
print(f"Number of ROIs with at least on group that rejected the paired t-test hypothesis: {df_f_p_value_rejected.reset_index().roiName.unique().shape[0]}")

Number of ROIs with at least on group that rejected the paired t-test hypothesis: 137

Number of regions per site in which the hypothesis was rejected#

#@title
df_f_p_value_rejected.reset_index().groupby(var_compare).count()[["name"]]

	name
adult
ADULT	109
PAEDIATRIC	130

FreeSurfer and CAT12 - A matter of reproducibility

Reproducibility in paediatric versus early adults’ groups

Contents

Reproducibility in paediatric versus early adults’ groups#

Data#

Check preprocessing problems#

Exclude the subjects only run by only one software#

Demographics#

Search subjects with at least one ROI#

Exclude subjects with at least one ROI#

Analysis#

Overall Analysis#

Bland-Altman plot#

ADULT

PAEDIATRIC

Participant Analysis#

Compute metrics#

Reproducibility metrics analysis#

Analysis of age effect on reproducibility metrics#

ROI Analysis#

Reproducibility metrics analysis#

r_square

ICC

Mean per lobe#

Paired t-test#

Number of regions per site in which the hypothesis was rejected#