When should you code your variables as levels of a factor, and when should you enter them as binary variables in a model? As it turns out, this isn’t necessarily a straightforward question to answer, and as cliché as it sounds, it depends on your research question.
Using examples from a research project studying the associations between an ML-estimated metric of aging and bipolar disorder, we’ll discuss:
- Factors to consider when deciding whether to recode your variables into multiple levels of a factor
- Implications of making such a decision
Background
For some context, bipolar disorder (BD) is known to be associated with both clinical and biological markers of premature aging, as is the case with other brain disorders such as schizophrenia and traumatic brain injury. Studies have shown that individuals with BD exhibit signs of advanced or accelerated aging compared to healthy individuals. The following are some of the reasons for studying aging in bipolar disorder:
- Age-related Comorbidities: Patients with BD frequently present with age-related health issues earlier than expected, such as cardiovascular diseases, diabetes, and neurocognitive decline. These conditions contribute to the overall burden of the illness and impact the quality of life and longevity of affected individuals (Wrigglesworth et al., 2021).
- Cognitive Decline: Cognitive deficits are common in BD and often worsen with age. Studies have reported that people with BD show more significant cognitive decline over time compared to healthy individuals, indicating that BD may contribute to premature brain aging (Lewandowski et al., 2014).
- Neuroimaging Findings: Neuroimaging studies have shown that individuals with BD exhibit structural brain changes, such as reduced grey matter volume, white matter hyperintensities, and hippocampal atrophy, which are often observed in aging populations. These findings support the hypothesis that BD is linked to accelerated brain aging (Kaufman et al., 2019).
Brain age models are machine learning models trained on neuroimaging data, such as MRI or PET scans, to predict an individual’s chronological age. Such models have been applied to various neuropsychiatric disorders (Baecker et al., 2021), and the difference between the predicted age and the chronological age is called the Brain-Predicted Age Difference (Brain-PAD). Evidence from previous studies also suggests that Brain-PAD represents traits that are genetically influenced, and that genetic variants associated with Brain-PAD in healthy controls (HCs) overlap partially with those observed in Alzheimer’s disease (AD), autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), major depressive disorder (MDD), schizophrenia (SZ), and BD (Kaufman et al., 2019).
- Multivariate Measure of Aging: The predicted age and the resulting Brain-PAD are considered multivariate measures because they are derived from multiple neuroimaging features, such as regional subcortical and lateral ventricle volumes, cortical thickness, and surface area. This allows researchers to measure the impact of BD on cognitive functioning in a more holistic manner.
- Investigating Clinical Relationships: Predicted age or Brain-PAD can also be correlated with various clinical and demographic factors, such as illness duration, severity, subtype of BD, and lifestyle factors. This helps in understanding the impact of BD on brain health and aging.
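To make the definition of Brain-PAD concrete, here is a minimal sketch of the calculation. It assumes the common sign convention of predicted age minus chronological age, so that positive values indicate an "older-looking" brain. The ages are made-up illustrative values and `brain_pad` is a hypothetical helper, not part of any brain age library.

```python
# Hypothetical example values; none of these numbers come from the study.
chronological_age = [25.0, 40.0, 63.0]
predicted_age = [29.5, 38.0, 70.0]  # assumed output of some brain age model


def brain_pad(predicted, chronological):
    """Return Brain-PAD per person, assumed here to be predicted minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


pads = brain_pad(predicted_age, chronological_age)
print(pads)  # [4.5, -2.0, 7.0]
```

Under this convention, the first and third individuals have brains that look older than their chronological age, while the second looks slightly younger.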
In this study, we’re primarily interested in two questions, and the focus of this article is on the second question:
- How does brain aging, as estimated by a machine learning model through a metric called Brain-Predicted Age Difference (Brain-PAD), differ in people with bipolar disorder (BD) compared to healthy individuals?
- Among people with BD, how does the use of different medications (antiepileptics, second-generation antipsychotics, lithium) relate to Brain-PAD? Specifically, is there evidence that antiepileptics and second-generation antipsychotics are associated with more advanced brain aging, and is lithium use associated with less advanced brain aging?
Impact of medications on Brain-Predicted Age Difference
Prior work has suggested that lithium is associated with better brain integrity measures from neuroimaging data (Abé et al., 2022). Second-generation antipsychotics (SGAs) are often combined with mood stabilizers to manage mania and depression in BD. However, previous studies of their effects on cortical thickness, volume, and area have shown mixed results. On the other hand, there is considerable evidence of a negative impact of antiepileptic drugs (AEDs) on the brain. While the negative side effects of these medications are known, their use is justified by their effectiveness in managing psychiatric symptoms and improving patients’ immediate quality of life.
In the original dataset, medication use was coded as a set of binary variables: SGA, AED, first-generation antipsychotics (FGA), lithium, and antidepressants. However, we decided to recode the medication variables into a single factor with 8 levels, one for each combination of SGA, AED, and lithium use. FGA and antidepressants were excluded from the new medication variable because only 5% of the BD sample (N ≈ 1,500) were on FGA, whereas each of the other medications was used by at least ~30% of the sample, and antidepressant use was found to be stable in recent prescription practices. Antipsychotics, antiepileptics, and lithium are commonly involved in polypharmacy for BD.
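The recoding described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual code: the column names (`sga`, `aed`, `lithium`), the level labels, the helper `medication_level`, and the four example patients are all hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed names for the three binary (0/1) medication indicators.
MEDS = ("sga", "aed", "lithium")


def medication_level(row):
    """Collapse three 0/1 indicators into one label, e.g. 'sga+lithium' or 'none'."""
    taken = [name for name in MEDS if row[name] == 1]
    return "+".join(taken) if taken else "none"


# Enumerate every possible combination: 2**3 = 8 factor levels.
all_levels = [
    medication_level(dict(zip(MEDS, combo)))
    for combo in product((0, 1), repeat=len(MEDS))
]

# Made-up patients, for illustration only.
patients = [
    {"sga": 1, "aed": 0, "lithium": 1},
    {"sga": 0, "aed": 0, "lithium": 0},
    {"sga": 1, "aed": 1, "lithium": 1},
    {"sga": 1, "aed": 0, "lithium": 1},
]
levels = [medication_level(p) for p in patients]
counts = Counter(levels)  # how many patients fall into each level

print(len(all_levels))  # 8
print(levels)           # ['sga+lithium', 'none', 'sga+aed+lithium', 'sga+lithium']
```

Tabulating `counts` before modelling is worth the extra step: a factor with 8 levels splits the sample into 8 cells, and sparsely populated cells tend to give imprecise estimates.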
The resulting model relates Brain-PAD to the recoded medication variable while controlling for chronological age and sex as covariates. We include age to account for a well-known regression-to-the-mean effect, in which younger individuals tend to be predicted as older than they are and older individuals as younger. Sex is included due to known developmental differences.
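One way to write such a model down, assuming an ordinary linear regression in which the 8-level medication factor is dummy-coded against a reference level, is shown below. The estimation method and the choice of reference level (here assumed to be "none of the three medications") are assumptions for illustration, as are the symbols $\beta_k$, $\gamma_1$, and $\gamma_2$.

```latex
\text{Brain-PAD}_i = \beta_0
  + \sum_{k=1}^{7} \beta_k \, \mathbb{1}[\text{med}_i = k]
  + \gamma_1 \, \text{age}_i
  + \gamma_2 \, \text{sex}_i
  + \varepsilon_i
```

Under this coding, each $\beta_k$ is the covariate-adjusted difference in mean Brain-PAD between medication combination $k$ and the reference group. By contrast, entering SGA, AED, and lithium as three separate binary variables with main effects only would estimate three medication coefficients and assume their effects are additive. The 8-level factor drops that assumption and is equivalent to including all two-way and three-way interactions among the three binary variables.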