Data Systems
To date, research on youth suicide trends has focused primarily on individual-level characteristics, with less attention to the contexts and policies in which youth develop.
Imagine a world in which:
we could identify places, easily and in real time, where progress has been made on the antecedents of suicide…
state suicide prevention centers could find communities whose populations “looked like theirs” demographically, to try out new ideas locally…
we could understand not only how rates of suicidal behavior differ across demographic groups, but also whether they differ because of the conditions, services, and policies under which youth in those groups live…
The goal of our data systems work is to bring together publicly available administrative data at the national, state, and district levels to better identify trends in suicidal behaviors, the place-based factors that contribute to these trends, and the policy levers for changing them, in order to offer new solutions for youth suicide prevention. While this work is in the pipeline, we are actively working to make information openly and easily accessible (read about the YRBS and Arcadia packages below!).
YRBS package
To safeguard access to publicly funded data, we developed an open-source package that cleans and stores Youth Risk Behavior Surveillance System (YRBSS) data from 2015 to 2023 (Cañizares & Cardozo, 2025). Since 2023, the package has streamlined research by removing the need for SPSS or SAS licenses, providing the data in CSV, SPSS, and Parquet formats, and offering intuitive variable names and seamless merging across survey years. Upcoming updates will add functionality and improve documentation while continuing to ensure free, reliable access to YRBSS data for researchers and the public.
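To illustrate what intuitive variable names and merging across years involve, here is a minimal Python sketch using only the standard library. It is not the package's code: the raw item codes (`q1`, `q2`, `q26`), the readable names, and the functions `read_year` and `stack_years` are all hypothetical. Real YRBSS files contain many more variables and would more likely be read from a format such as Parquet than parsed from CSV text.

```python
import csv
import io

# Hypothetical mapping from raw item codes to readable names (illustrative only;
# the actual YRBS package's codes and names are not reproduced here).
RENAME = {"q1": "age", "q2": "sex", "q26": "considered_suicide"}


def read_year(raw_csv: str, year: int) -> list[dict]:
    """Parse one year's CSV text, rename known columns, and tag each row with its year."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Unknown column names pass through unchanged.
        row = {RENAME.get(col, col): value for col, value in record.items()}
        row["year"] = year
        rows.append(row)
    return rows


def stack_years(files: dict[int, str]) -> list[dict]:
    """Stack several survey years into one table; columns absent in a year become None."""
    all_rows = [
        row for year, text in sorted(files.items()) for row in read_year(text, year)
    ]
    # The union of all column names, so every row has the same keys.
    columns = sorted({col for row in all_rows for col in row})
    return [{col: row.get(col) for col in columns} for row in all_rows]
```

With `files = {2021: "q1,q2,q26\n15,F,1\n", 2023: "q1,q2\n16,M\n"}`, `stack_years(files)` returns two rows with the same keys. The 2023 row gets `considered_suicide` set to `None` because, in this made-up example, that item was not asked that year. Filling gaps explicitly, rather than dropping columns, keeps multi-year analyses aligned.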
📌 Learn more and access the package here!
Arcadia package
The Arcadia package is being developed to facilitate data sharing among team members and IRB-approved researchers in the lab. It provides access to cleaned versions of the three-time-point dataset from the randomized controlled trial. To give researchers flexibility in the questions they can explore, the package offers both cross-sectional and longitudinal data: a separate dataset for each time point and a longitudinally structured version. It includes:
Data dictionaries, accessible directly through the package, that let users filter by specific constructs or instruments and retrieve only the relevant items, streamlining analysis.
Functions to support data analysis, including one that scores instruments using either sums or averages, with optional imputation of missing values. A function to reverse-score items will be added soon, giving users greater control over item scoring.
Tutorials on working with the data and using package features. Current tutorials cover building word clouds, creating tables and running comparison analyses, conducting confirmatory factor analysis, and calculating reliability (e.g., Cronbach’s alpha).
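The scoring workflow described above can be sketched in Python: look up an instrument's items in a data dictionary, then sum or average them with optional imputation. The dictionary schema, item names, 1-to-5 response scale, and the `items_for` and `score` functions are hypothetical and are not the Arcadia package's API. Missing items are filled with person-mean imputation (the mean of the respondent's answered items). That is one common choice, not necessarily the method Arcadia uses. Reverse scoring is included as an assumption, since it is described as a planned feature.

```python
from statistics import mean

# Hypothetical data dictionary entries; Arcadia's actual schema is not shown here.
DICTIONARY = [
    {"item": "hope_1", "instrument": "hope_scale", "reverse": False},
    {"item": "hope_2", "instrument": "hope_scale", "reverse": True},
    {"item": "hope_3", "instrument": "hope_scale", "reverse": False},
    {"item": "sleep_1", "instrument": "sleep_index", "reverse": False},
]


def items_for(instrument: str) -> list[dict]:
    """Return only the dictionary entries that belong to one instrument."""
    return [entry for entry in DICTIONARY if entry["instrument"] == instrument]


def score(response: dict, instrument: str, method: str = "sum", impute: bool = True,
          scale_min: int = 1, scale_max: int = 5, max_missing: int = 1):
    """Score one respondent on one instrument, or return None if it cannot be scored.

    Reverse-keyed items are flipped as (scale_min + scale_max - value).
    When impute=True and at most max_missing items are missing, each missing
    item is replaced by the mean of the respondent's answered items.
    """
    values, n_missing = [], 0
    for entry in items_for(instrument):
        value = response.get(entry["item"])
        if value is None:
            n_missing += 1
            continue
        if entry["reverse"]:
            value = scale_min + scale_max - value
        values.append(value)
    if not values:
        return None  # nothing answered, nothing to impute from
    if n_missing:
        if not impute or n_missing > max_missing:
            return None
        values += [mean(values)] * n_missing  # person-mean imputation
    if method == "sum":
        return sum(values)
    if method == "mean":
        return mean(values)
    raise ValueError("method must be 'sum' or 'mean'")
```

Consider a respondent with `{"hope_1": 4, "hope_2": 2, "hope_3": None}`. `score(response, "hope_scale")` flips `hope_2` to 4, imputes `hope_3` as 4, and returns 12. The `max_missing` threshold reflects a common convention of scoring only when most items were answered.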
As development progresses, additional functions and tutorials will be created based on user needs.
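Cronbach's alpha is one of the reliability statistics covered in the tutorials. It is computed from item and total-score variances: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below implements that standard formula with sample variances on complete cases only. `cronbach_alpha` is a hypothetical helper, not the tutorial's code.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for complete-case data: rows are respondents, columns are items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*rows))  # transpose: one tuple of responses per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])  # variance of summed scores
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

With two items that move in lockstep, `cronbach_alpha([[1, 2], [2, 3], [3, 4]])` returns 1.0. Respondents with any missing item should be dropped (or imputed) before calling it, since the formula assumes a complete matrix.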
Team