Characterizing Narrative Content in Web-scale LLM Pretraining Data
Published in arXiv, 2026
Recommended citation: Johnson, T., Ash, E., Piper, A., & Antoniak, M. (2026). "Characterizing Narrative Content in Web-scale LLM Pretraining Data." arXiv preprint arXiv:2606.19468.
BibTeX Citation
@misc{johnson2026characterizingnarrativecontentwebscale,
      title={Characterizing Narrative Content in Web-scale LLM Pretraining Data},
      author={Teagan Johnson and Elliott Ash and Andrew Piper and Maria Antoniak},
      year={2026},
      eprint={2606.19468},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.19468}
}