A recent NYPL project has paid for the already-digitized registration records to be marked up as XML. (I was not involved, BTW, apart from saying "yes, this would work" four years ago.) Now for anything that's unambiguously a "book", we have a parseable record of its pre-1964 interactions with the Copyright Office: the initial registration and any potential renewal.
The two datasets are in different formats, but a little elbow grease will mesh them together. It turns out that 80% of 1924–1963 books never had their copyright renewed. More importantly, with a couple of caveats about foreign publication and such, we now know which 80%.
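To give a feel for what that elbow grease might look like, here is a minimal sketch of matching registrations against renewals. Everything specific in it is an assumption on my part: the field names `regnum` and `orig_regnum`, the `normalize` helper, and the toy records are made up for illustration and are not the actual NYPL schema or data.

```python
import re

def normalize(regnum):
    """Reduce a registration number to bare uppercase letters and digits.

    Hypothetical helper: assumes the two datasets differ only in case,
    spacing, and punctuation, which real records may not honor.
    """
    return re.sub(r"[^A-Z0-9]", "", regnum.upper())

def split_by_renewal(registrations, renewals):
    """Partition registrations by whether any renewal cites them."""
    renewed_ids = {normalize(r["orig_regnum"]) for r in renewals}
    renewed, unrenewed = [], []
    for reg in registrations:
        if normalize(reg["regnum"]) in renewed_ids:
            renewed.append(reg)
        else:
            unrenewed.append(reg)
    return renewed, unrenewed

# Toy records, invented for this example (not real registrations).
registrations = [
    {"regnum": "A12345", "title": "First Book"},
    {"regnum": "A 12346", "title": "Second Book"},
    {"regnum": "a-12347", "title": "Third Book"},
    {"regnum": "A12348", "title": "Fourth Book"},
    {"regnum": "A12349", "title": "Fifth Book"},
]
renewals = [
    {"orig_regnum": "A-12346"},
]

renewed, unrenewed = split_by_renewal(registrations, renewals)
share_unrenewed = len(unrenewed) / len(registrations)
print(f"{len(unrenewed)} of {len(registrations)} not renewed ({share_unrenewed:.0%})")
```

The real matching is messier than a single key lookup, which is exactly where the caveats about foreign publication and such come in.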
Of course, the details matter, and NYPL provides those in its own post, U.S. Copyright History 1923–1964. I'll note that there is more work to be done to ensure that the data is correct, including harmonizing the variations of an author's name. For now, read the long post from the NYPL and begin to envision how this will change your use of pre-1964 materials!