About

I am a PhD student at CU Boulder advised by Maria Antoniak. I’m interested in cultural analytics, with a focus on how narratives structure meaning in large-scale text data and how those structures are learned by large language models. My current work is at the intersection of computational narratology and LLM pretraining data curation, where I study how sequences of events, agents, and causal relations are distributed across web-scale corpora and how these narrative patterns shape model behavior. By developing methods to detect and analyze narrative structure in pretraining data, I aim to understand how language models internalize cultural signals and how data curation choices influence downstream generation and interpretation.

Prior to CU Boulder, I worked as an ML engineer at General Motors in Austin, TX, where I developed and scaled up generative AI solutions. During my undergraduate studies at Carleton College, I participated in research spanning NLP, computational fluid dynamics, and statistical graphics.

I’m always happy to chat. Reach out at teagrjohnson (at) gmail (dot) com.