Copyright-Adjacent Questions When Training Data Intersects Manuscripts
Introduction
Large language models (LLMs) rely on vast corpora of text for training. When part of that corpus consists of manuscripts (texts that may be under copyright protection), the question arises whether the training process infringes the rights of the copyright holders. The issue is not a direct violation of ...