Newspaper archive content

Creative works published before 1924 are in the public domain in the US, and many newspapers still in existence are older than that. I’m not a lawyer, but in theory we should be able to freely use pre-1924 articles and, unlike with Wikipedia, potentially use every single sentence in an article.

Is there any legal reason why these couldn’t be used? @nukeador

Some sources like http://www.newspapers.com are behind paywalls but the New York Times archive is pretty open and the restrictions are not unreasonable. Also the Library of Congress has newspaper archives (although their site is slow and times out a lot).

Wikipedia also has a big list of archives: https://en.wikipedia.org/wiki/Wikipedia:List_of_online_newspaper_archives#United_States


There is of course the question of OCR sentence quality, but browsing some of the OCR-ed articles here, cleaning them up doesn’t strike me as vastly different from the effort already made to clean up the Wikipedia content.

What do the sentences look like? Do they appear to be written in an outdated style?

Here’s a random article from 1921:
https://cdnc.ucr.edu/?a=d&d=MJ19210210.2.2&e=-------en--20--1--txt-txIN--------1

Ignoring the sentences with extra symbols, which would be filtered out by the import algorithm, it doesn’t appear particularly dated to me.

Looking at the sentences by themselves, they don’t seem to be vastly different in style to the kinds of sentences we already have.

Dense clouds of smoke rolled into the classrooms on the second floor just as the students were gathering for their afternoon session.

With admirable coolness they obeyed the fire drill orders of the teachers and marched from the building.

As they left, smoke was pouring from every window.

By the time they reached the schoolyard the firemen were inside the building fighting the flames and it was not long before the fire was quenched.

The work of thoroughly stamping it out, however, continued for some time.

Holes were chopped in the roof and every crack was drenched with water.

Fire Chief Schneider declared that the machine shop should be detached from the main structure.

It is understood that the building and contents, except for the machines, were fully covered by insurance.
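
To illustrate the kind of symbol filtering I mean, here is a minimal sketch. It is not the actual import script: the character whitelist, the function names, and the noisy third sentence are all made up for the example, and a real filter would probably need more rules.

```python
import re

# Hypothetical whitelist: ASCII letters, spaces, and common punctuation only.
# Digits and stray OCR symbols such as | ~ * are deliberately excluded.
ALLOWED_CHARS = re.compile(r"[A-Za-z ,.'\-?!]+")


def is_clean(sentence: str) -> bool:
    """Return True if the sentence consists only of whitelisted characters."""
    return bool(ALLOWED_CHARS.fullmatch(sentence.strip()))


def filter_sentences(sentences):
    """Keep only the sentences that pass the whitelist check."""
    return [s for s in sentences if is_clean(s)]


candidates = [
    "As they left, smoke was pouring from every window.",
    "Holes were chopped in the roof and every crack was drenched with water.",
    # Invented example of OCR noise, not taken from the article.
    "Fire Chief Schne|der dec1ared th~t the mach*ne shop",
]

for sentence in filter_sentences(candidates):
    print(sentence)
```

With a filter along these lines, the two clean sentences are kept and the noisy one is dropped. A whitelist is stricter than a blacklist of known bad symbols, so it would also discard sentences containing numbers or accented names, which may or may not be what we want.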

Interesting. @lsaunders is this something we can add to the agenda in our next meeting with legal?

I think this sounds like a great idea and I’ll add it now. Thanks for bringing this to our attention!