Recently there have been various calls for a U.S. National Digital Library. Well, perhaps Prof. Darnton and David Rothman may disagree, but I think we already have a major start in the Hathi Trust. Or maybe they don’t disagree at all; certainly it isn’t my place to put words in their mouths.
But for my money, the Hathi Trust is the closest thing we have to a National Digital Library, and it’s fairly close from a variety of perspectives:
- It has the goods. There are now well over 7 million volumes (and over 4 million book titles) in the Hathi Trust collection. And it continues to grow.
- It has the vision. See the Hathi Trust mission and goals.
- It has the technical chops. See this.
- It’s working on the governance. In a recent press release, a “constitutional convention” was announced for 2011 at which the members “will define HathiTrust’s next phase of governance and shape future directions for the partnership.”
Sure, you can point out all the ways in which it might not (yet) fulfill all the potential roles of a National Digital Library. But it may someday, and perhaps even without the kind of significant government investment being suggested by some as a requirement.

Thanks for helping to keep the national digital library issue alive, Roy. (I’ll remind LJ readers of the existence of my related pieces on the Atlantic site–see Web addresses below.)
At the same time, let’s please remember that the culture-oriented priorities of Prof. Darnton and the institutional priorities of HathiTrust are not right for society at large. We need a truly universal national digital library system. It should serve everyone from schoolchildren to Darnton-type academics in the humanities and other scholars in the scientific, technical and medical communities. The system should also help disseminate wealth-creating content such as guidance for small business people and multimedia for job-training.
In the case of HathiTrust, may I quote from the group’s own mission-and-goals statement? “To build a reliable and increasingly comprehensive digital archive of library materials converted from print that is co-owned and managed by a number of academic institutions.” The needs of academic institutions will not necessarily be the same as those of public library-users. Furthermore, libraries should develop born-digital collections, among other content, and be interactive at the system level. And what’s this about “converted from print”?
I’ve been writing on these issues for some 18 years as part of the evolving TeleRead proposal and now the National Information Stimulus Plan. I continue to be disappointed by the insistence of so many on seeing national digital libraries in terms of their own needs–or their institutional allies’–rather than those of society at large. I’m especially frustrated over the treatment of public libraries as an afterthought, whether the issue is content or governance. Hathi’s mission-and-goals statement could not be a clearer example of the problem: “To dramatically improve access to these materials in ways that, first and foremost, meet the needs of the co-owning institutions.” Oh, Hathi could always try to doll itself up as a “National Digital Library,” but, given statements like the one just quoted, I don’t think that would fly. Hathi’s heart just would not be in the right place.
Simply put, America needs a true national digital library system answerable to the citizenry–a genuine public system (albeit with philanthropic donations helping out, not just tax money alone, and also with links to other collections).
I don’t want the academically oriented Hathi, the profit-driven Google or anyone else to reduce the importance of public libraries, even though I can see Hathi and the others as contractors and other participants in the national system serving a variety of needs. Furthermore, I think this should be a government investment with some help from the nonprofit sector.
To borrow a good argument used by educators, we don’t hold bake sales to finance the Pentagon, and similarly a national digital library system should not be so reliant on foundations, which so often are out of touch with the needs of ordinary Americans. I recognize the financial challenges of a genuine public approach and have addressed them in my National Information Stimulus Plan on the Atlantic Web site.
Thank you,
David Rothman
davidrothman@pobox.com
703-370-6540
My Atlantic Web articles:
The library system vision…
http://www.theatlantic.com/personal/archive/2010/11/why-we-cant-afford-not-to-create-a-well-stocked-national-digital-library-system/66111/
Cost justification for the library system, via the National Information Stimulus Plan….
http://www.theatlantic.com/technology/archive/2010/06/guest-post-david-rothman-on-the-ipad-stimulus-plan/58539/
——- ——- ——-
My earlier thoughts on Roy’s views and the Darnton plan:
http://www.solomonscandals.com/?p=8707
——- ——- ——-
Hathi Mission and Goals
http://www.hathitrust.org/mission_goals
A governance-related news release without a single mention of public libraries
http://ns.umich.edu/htdocs/releases/story.php?id=8121
It is a good start, but I am cautious about HathiTrust’s founding business relationship with Google being a hindrance to what would otherwise be policy goals. HathiTrust right now won’t let you download a PDF of even an out-of-copyright or public domain book — unless you are an authenticated HT member. My impression (and I would be pleased if HT could clarify if I am correct or not) is that this is because of contractual restrictions from Google, for the google-scanned items that form the foundation (and I believe still the vast majority) of HT’s holdings.
Of course, HT could conceivably enforce such a restriction for Google-scanned images while not enforcing it for non-Google-scanned images. That this is not being done now probably has something to do with the fact that the vast majority of holdings right now are Google-scanned, and also with the increased technical difficulty of keeping track of and enforcing disparate contractual restrictions from disparate sources (even for public domain materials).
While those technical difficulties aren’t too onerous when talking about a feature like allowing PDF downloads, it gets a lot more confusing if you start talking about aggregate analysis of the entire corpus. We haven’t seen Google’s contract with HT, but we’ve seen the terms proposed as part of the GBS settlement for an HT-like entity, and they put restrictions on who such an entity is allowed to provide aggregate or whole-index access to, and what uses are allowed to those people. Mainly, from my reading, to make sure that such an entity can’t compete on a large scale with Google Books itself, but can only be used by ‘researchers’.
A repository founded on a contract with a particular commercial entity, with terms limiting uses and access to prevent competition with that commercial entity — may not be the best foundation for a national library serving the public with a public policy goal.
These barriers are not insurmountable. Google arguably (and I use that word ‘arguably’ intentionally and carefully) has a right to restrict what uses the images it scanned can be put to, but the HT repository is not restricted to google-scanned content, and MAY get an increasing amount of non-google content. IF someone scans it and deposits it with HT. And if HT gets enough resources to create the infrastructure to enforce google-demanded restrictions on google-content without over-encumbering non-google content that (we hope) carries no such restrictions. Non-trivial tasks all. So it’s definitely not enough just to think “Oh, HT will do it” without someone providing the resources.
The same thing regarding resources could be said of the Internet Archive. Which is trying to do some of the same stuff as HT, but generally without google scans, and generally without contracts with anyone limiting what they can do with public domain material. I’m curious why you choose to highlight HT here instead of IA.
Jonathan, we do keep track of the source and provide full PDFs when appropriate. See this book from the CDL’s Internet Archive work: http://babel.hathitrust.org/cgi/pt?id=uc2.ark:/13960/t87h1fb2j and this manuscript scanned at Michigan: http://babel.hathitrust.org/cgi/pt?id=mdp.39015079129147 . Image source and any restrictions included are part of the overall metadata we track and build into the access system. We are committed to providing as much access as legally possible.
Thanks to Jonathan Rochkind for his informative comments about HT-style organizations.
Given all the infrastructure and other work needed–and most of all, the cost of actual content–I myself believe we might as well start the national digital library system from scratch while using the contracting model with existing collections. Build the system around the needs of libraries as a whole, not nonprofits established to serve special constituencies! That’s just one component.
What’s more, an HT-based national digital library system would face all kinds of issues not mentioned by Jonathan Rochkind, such as governance to include public libraries if it’s to be a true system. Or how about development of collections of K-12 textbooks and other appropriate reading for people in that age group? Or integration with local schools and libraries? Public institutions will relate better to another truly public institution. Or are people saying we ought to turn our library and school systems over to foundations? Sad (even though I think students at private schools most certainly should be able to use public libraries and I also don’t mind if charter schools benefit along the way).
Of course I’m fervently in favor of picking up existing content from Google, HT, you name it, as long as the national digital library system treats them like contractors rather than letting them run the show. What’s more, I want to see fair compensation for commercial publishers–rather than problematic copyright-law modifications to clip them (even though, yes, our copyright laws badly need reform in areas like treatment of orphans and perhaps academic journals, given the outrageous charges). These costs will add up, and to sell the library system idea politically you’ll need cost-justification, the very stuff about which I’ve written in my National Information Stimulus Plan on the Atlantic Web site (see my earlier post for the URL). A genuinely public national digital library system would lend itself far, far better to this approach.
Thanks,
David Rothman
davidrothman@pobox.com
703-370-6540
Wow, thanks Chris, that’s great that you do provide non-authenticated full PDF where you are allowed to by the contracts that gave you the material.
It would be awesome if HT could document this stuff better: tell us clearly on a web page somewhere what sources allow full PDFs to the public at large, and what don’t, and what proportion of your collection is currently in which category. I realize that NDAs on (for instance) contracts with Google may prevent you from going into great detail, but it would be very helpful if you were as transparent as possible.
Of course, for content such as textbooks and fiction, ePub would probably be MUCH more useful than PDF–especially future versions. Typical library users will feel much more comfortable with ePub-style reflowability. Even with some new wrinkles, PDF can really stink on small mobile devices.
That detail aside, it would be terrific at this point to think a little less about technology and rights issues and a lot more about the mission of a national digital library system, as well as governance issues. So far, no one else in this dialogue is saying how much of a voice, if any, the public libraries would have in an HT-centric national digital library system.
And what about HT’s existing constituency? Is it possible that HT could better serve these institutions by staying focused on specialized academic needs while at the same time participating in a genuine national digital library system as a contractor and in other ways?
The needs of publibs and K-12 libs just aren’t in the DNA of an organization like HI.
Thanks,
David Rothman
davidrothman@pobox.com
703-370-6540
(URLs of related writings mentioned in my first comment here)
> The needs of publibs and K-12 libs just aren’t in the DNA of an organization like HI.
Er, HT. By any name or initials! – DR
[...] Roy Tennant declares on his blog at Library Journal that HathiTrust is our national digital library. [...]
Has anyone considered that:
* Digitizing copyrighted works (or at least distributing those works, whether for profit or not) without the permission of the copyright holders is illegal?
* That publishers and writers expend enormous amounts of time and money to create books, and that they need to get paid?
* That “lending” e-books is equivalent to giving them away to all borrowers and everyone those borrowers choose to give the books to? Whatever DRM is or is not used, copies of the page images end up on the user’s hardware and can be easily used.
* That not paying for books may save libraries money, but will destroy our writing and publishing culture?
It’s exploitative to expect writers, editors, illustrators, graphic designers, indexers, translators, marketers, and all the other people who work to create books, to labor for free. Creators of works need to pay for their groceries just like the readers of books do, and just like librarians do. Furthermore, considering how low the price of most books is, they are not exactly breaking the budget of the average reader. Most books cost less than a large pizza.
Sorry, I’m not supporting libraries at the expense of supporting my own profession in writing and publishing. In fact, given the parasitic and even predatory attitude they are now displaying towards copyrighted books, I no longer support libraries in any way at all.
Frances, I really think you have the wrong impression. Libraries are not distributing (or making accessible online) copyrighted works. I actually think you would find librarians very mindful of both copyright and the living it can afford those writers who are either lucky or good or both. Many of us are also writers ourselves.
Although there are many copyrighted works that Google and others have digitized, they are not openly accessible anywhere. We understand that for works under copyright we must have a solid legal footing to do what we do with those works. So I’m a bit puzzled by your phrase “parasitic and even predatory attitude [libraries] are now displaying towards copyrighted books…” I assure you that we aren’t seeking to wrest your works from your hands and throw them open for all the world. However, there are many, many books that are presently unavailable that could be made available because no one has cared enough to maintain the copyright — so-called “orphan” works. It would be a shame to toss those works in the same lockbox as books that are still protected. And yet that is exactly where things stand now.
Roy,
Certain libraries made millions of copyrighted works available to Google for scanning. Without, apparently, either the libraries or Google ever checking copyright dates or in-print status. Many in-print works were scanned, including some of mine and others by authors I know. No one attempted to locate the copyright holders, or even to establish any standards for what is and is not a so-called orphan.
The proposed Google Settlement lumps together ALL works by ALL authors and publishers. That is, except for the largest publishers who reserved the right, within the Settlement, to cut better deals for themselves outside the Settlement, as many have already done for the new Google e-book store. The rest of us, including many live and active authors and micropresses, may be stuck with the terms unless we have already explicitly opted out. Many rights holders did not opt either in or out because they did not hear about the Settlement before the deadline, or did not understand its 350+ pages of confusing legalese. I paid a fair chunk of change to hire a lawyer to guide me through it and write an opt-out letter for me–not everyone was willing and able to do that. If the Settlement is approved, all these live, active rights holders will be opted in by default.
The terms of the proposed Settlement are onerous. As just one example, they remove the right of an author to sue either their publisher or Google if the terms of the Settlement are violated. Yet, some authors opted in for fear Google would sell their work anyway and just not pay them for it.
I’ve been a book collector for over 30 years. The so-called orphan works were by definition published in the US after 1923–which means they are in many cases not especially rare. Their not being in print does not by a long shot constitute unavailability. An enormous number are cheaply available in used bookstores and for free borrowing or ILL from libraries–where many of them have been languishing with little demand. Not being available free or in e-form does not mean these works are unavailable and does not excuse massive copyright violation.
And, I think a lot of those copyright owners could be located. My deceased parents wrote works that are probably considered “orphan” because they have been out of print for a while, and to which I and/or my brother have inherited copyrights (the estate is in probate). (I know at least one was digitized by Google and was made available for free download.) It’s easy to cry “orphan” when you’ve made no attempt whatever to locate the “parent.”
I don’t think libraries are any more altruistic than Google. Both are seizing works for economic reasons and then plastering altruistic PR over their actions.
Roy,
I will add that “no one has cared enough to maintain the copyright” means nothing legally. During the period in which registration renewal was required, renewals were done with the US Copyright Office, which still has the records of them. Now, renewals are no longer required. The copyright expires when it expires. There is no legal need to “maintain” it. Which is as it should be. It can take time for a work to be right for a market, sometimes many years–for example Tolkien’s works. It can take time to revise a work for a new edition. It can take time for an author to find a new publisher. It is not the right of either Google or libraries to snatch the work before the copyright has expired, and say, “Since you’re not making any money off it, I get to make money off it.” Or in the case of libraries, save on shelf space or acquisitions fees.
If you want to change copyright law, go to Congress, don’t just scan works and then try to seize the rights to them with a private lawsuit. And when you go to Congress, I’ll be fighting you as hard as I can. But, I bet Congress would at least come up with some definition of “orphan” more believable than “hasn’t been in print for a few months.”
In case this really is not clear to anyone, US copyright protection does not depend on the work being in print, it does not depend on how long the work has been out of print, and it does not depend on the work ever having been published at all. There is no legal obligation to “maintain” a copyright in any way, either by keeping the work in print, by re-registering (except in the time period that required it, and as I said you can look that up in the public records), or by any other means. A publisher and author have no legal or moral obligation to constantly make their work available to the public. They have every right to do so only when, how, and as it benefits them financially.
And, I think it is highly disingenuous for the Google Partner libraries to lend Google millions of copyrighted books from which they knew Google as well as themselves would massively benefit financially, craft contracts with Google and language in the Settlement that gave the libraries all the benefits of it and full legal immunity, set up a system to exchange free copyrighted books on a major scale, and then say, “We’re not commercial, we’re just trying to confer benefits on the public.”
Hello ALA, where are we??? Trying to project my role and the private school library of the future has me looking at the tech piece, looking at the reading specialist piece, and still wanting to be a librarian guide on the side..who tries to perpetuate appreciation for the book.., as well as use all the online databases and ebooks….Why must I give up shelving for Presentation Creation Space? We need substance taught first..creation is a mere integration sideline in learning..come what bells and whistles may..Find the professional presentations online and off first! Then empower new creators to do something with what is already out there, to expand it, to go in depth into the subject. What if we just try to focus on this a while professionally! My holiday present was a student who said they had read beyond ten minutes, and discovered the amazing diversion of a book!! Low tech, low budget library model suits the economy..so why are we neglecting this? NYT still has a weekly review section so someone must be reading..offline as well as on..Focus, folks.
[...] For the complete article: http://blog.libraryjournal.com/tennantdigitallibraries/2010/11/19/the-hathi-trust-is-our-national-di... [...]