Characterizing Narrative Content in Web-scale LLM Pretraining Data

Published in arXiv, 2026

Recommended citation: Johnson, T., Ash, E., Piper, A., & Antoniak, M. (2026). "Characterizing Narrative Content in Web-scale LLM Pretraining Data." arXiv preprint arXiv:2606.19468.

BibTeX Citation
@misc{johnson2026characterizingnarrativecontentwebscale,
      title={Characterizing Narrative Content in Web-scale LLM Pretraining Data},
      author={Teagan Johnson and Elliott Ash and Andrew Piper and Maria Antoniak},
      year={2026},
      eprint={2606.19468},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.19468}
}