AI and the Case of the Disappearing Textbooks

Artificial Intelligence is making the transition to electronic-only publishing a necessity for textbook publishers. GETTY

Stop the Presses! In a recent story, the BBC reported that Pearson, one of the largest textbook publishing companies in the world, is getting out of the print business. This is very much like Ford Motor Company announcing recently that they will stop producing cars. While the jury is still out on whether the latter is a good idea, in many respects the decision by Pearson has been inevitable for a while.

It is a matter of economics. Most people tend to see publishers as being focused primarily on creating books, magazines, and other content. While that's not an unreasonable assumption to make, it's a bit misleading. A publisher is, in fact, more like an investment bank. Traditionally, when a publisher commissions a book, what they are doing is making a bet that the book will earn back its total investment costs and exceed them by a significant amount. Magazine publishers took another route, relying upon the idea that if the content they are producing is compelling enough, advertisers will be willing to place side bets that placing their advertising content within the editorial content will lead to sales of their clients' products. In both cases, though, the model ultimately is that the publisher is investing a certain amount of money to seek a profit, as with any other company.

Textbooks are an interesting quandary in the publishing world. The cost of actually creating the original content for the textbook is a comparatively small percentage of the overall costs. The other costs (securing the rights for images or commissioning them outright, editing the content, indexing, prepress, printing, promoting and distributing) mean that most textbooks cost between $50,000 and $100,000 to make, and some can end up costing more than a million dollars.

What makes this even more of a risk is that a given textbook's primary audience is students. For secondary education and below, this cost is ameliorated by a school district buying the books for use by all the schools within the district. For college textbooks, the publisher is reliant upon individual teachers deciding to buy their particular book for a class. Either way, the audience is comparatively small by publishing standards, which is one of the reasons that the cost of textbooks tends to be higher than it is for general entertainment content.



The Push towards Digital Publishing

The rise of digital publishing, the Internet and increasingly AI have completely upended that equation. Until comparatively recently, those students represented a captive audience — if they wanted to take the class, they had to buy (or have someone subsidize the buying of) the textbooks. Because this created a (larger) market, the cost per book including profit was lower, though still high by book cost standards.

The Internet (and most notably Amazon) ate away at the distribution side, initially by making it easier to sell slightly used books at a considerably lower price, sales that (from the publisher's perspective) represented money not coming to them. Not surprisingly, publishers were forced into a position where they had to raise the prices of their books to eke out ever-smaller margins. This pushed the costs of textbooks into the stratosphere, which is where the second whammy hit publishers like Pearson.

Professors, facing uprisings from students already burdened with crippling student loans, began to use more and more material from the Internet (or to publish their own works to the Internet). Not only was it far less expensive, but professors could teach their students what was important to them, not what was important to the publishers.

This material was also searchable, which meant that students were more likely to go to just the information that they were seeking, rather than having to read the entire passage. Advances in semantics and machine learning also meant that the indexes could be turned into semantic links that made it easier to navigate across related content and made for easier synopses. Now, purists might argue that one of the reasons that you assign reading in the first place is so that students will read everything, but when you're dealing with a heavy course load, taking the time to read hundreds of pages a night almost invariably places a far second to getting the information that you need when you need it. This latter paradigm is ingrained in students these days, and the arguments against it, while arguably compelling, have nonetheless failed to make much of a dent in students' behavior.

Pearson (full disclosure: I have had books published by Pearson) looked at this landscape, did the math, mostly involving bloody red figures, and bowed to inevitability. Most publishers, Pearson included, were early to the digital publication game. Long gone are the layout boards covered with wax, the lab that did photo-emulsion processing, the rooms filled with editors marking up content by hand. The printing presses of today look like the hive mother of a colony of digital printers, where PDF files go in one end and textbooks come out the other.

Yet it is likely that over the next decade, even those sleek modern presses will disappear as well, because they represent the absolute limit to the efficiencies that can be wrung from the system. Pearson is not the first to recognize this, but it is one of the largest. The days when they could sell processed dead trees to students have come to an end.


Entity extraction creates associations between a document and a knowledge base.   KURT CAGLE



Semantics and AI Making Textbooks Relatable

That does not mean that they cannot still make a profit, and this is one area where publishing, semantic technologies and AI all factor in. One of the areas where publishers are beginning to see significant value is in the domain of semantic publishing. Put simply, when a book is produced, a semantic publishing system breaks that book down to its constituent chapters and sections, assigning each of them an identifier.

Each section is then scanned, often through some kind of machine learning algorithm, to develop both summarized descriptions and semantic relationships with known data, and this information is then stored in semantic knowledge bases. Proximity makes a difference (the closer two sections are, the more likely they are to be highly related), but other factors, including the degree of semantic overlap, also determine fitness. If you have two sections in different books that share several common tags, chances are pretty good that they are similar in content.
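To make the idea concrete, here is a minimal sketch of how such a relatedness score might be computed. Everything in it is an assumption for illustration: the `Section` record, the identifier format, the sample tags, the use of Jaccard overlap for shared tags, and the 0.7 weighting are not taken from any particular publisher's system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Section:
    """One identified section of a book (all fields are illustrative)."""
    section_id: str   # hypothetical identifier, e.g. "bio101/ch3/s1"
    book: str
    position: int     # ordinal position of the section within its book
    tags: frozenset = field(default_factory=frozenset)


def tag_overlap(a: Section, b: Section) -> float:
    """Jaccard overlap of the two tag sets, from 0.0 to 1.0."""
    union = a.tags | b.tags
    if not union:
        return 0.0
    return len(a.tags & b.tags) / len(union)


def proximity(a: Section, b: Section) -> float:
    """1.0 for neighboring sections of one book, decaying with distance.

    Sections in different books get 0.0, since position is only
    meaningful within a single book.
    """
    if a.book != b.book:
        return 0.0
    distance = abs(a.position - b.position)
    return 1.0 / distance if distance else 1.0


def relatedness(a: Section, b: Section, overlap_weight: float = 0.7) -> float:
    """Weighted blend of tag overlap and proximity (weight is an assumption)."""
    return (overlap_weight * tag_overlap(a, b)
            + (1.0 - overlap_weight) * proximity(a, b))


# Hypothetical sample data: two adjacent sections of one book, a section of
# a second book on the same topic, and an unrelated history section.
s1 = Section("bio101/ch3/s1", "bio101", 1, frozenset({"cell", "mitosis", "dna"}))
s2 = Section("bio101/ch3/s2", "bio101", 2, frozenset({"cell", "meiosis", "dna"}))
s3 = Section("gen200/ch1/s4", "gen200", 4,
             frozenset({"cell", "mitosis", "dna", "chromosome"}))
s4 = Section("hist10/ch1/s1", "hist10", 1, frozenset({"rome", "empire"}))

# Rank the other sections by how related they are to s1.
ranked = sorted([s2, s3, s4], key=lambda s: relatedness(s1, s), reverse=True)
```

In this toy data the adjacent section ranks first, the section from a different book with heavily overlapping tags ranks second, and the unrelated section scores zero. A production system would presumably draw its tags from entity extraction against a knowledge base rather than from hand-written sets.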

What has held back adoption of such approaches is that, until comparatively recently, the tagging process needed to be done by hand. Increasingly, though, machine learning is becoming sophisticated enough to do that automatically (with perhaps some manual curation in place to better meet editorial objectives), meaning that documents can be tied in with other documents down to the section level, creating deep relationships that can both provide value to students and be used by applications to better shape the presentation of that content to different audiences.

This also provides significant dividends to publishers. They can see, at a glance, which topics are most in demand and which are likely to be poor areas for commissioning new content. They can track usage at the conceptual level to see which authors are considered authoritative and which aren't, and they can also use this information to feed social media strategies at a micro-level. Publishers can also customize content to each user based upon that user's access level, familiarity with the concept, and sophistication.

Each publisher also establishes themselves as the host of their knowledge bases, securing their brands in an era where brands are often swamped by social media signals. For book publishers, in particular, this also opens up a possibility that has usually been out of their reach — the ability to secure advertising dollars to help buttress the bottom line.

The other big aspect of this shift is that textbook publishers are now going the route of other media companies and exploring subscription as a basis. Subscription offers both risks and benefits, but for textbook companies, the big question was just how to make it feasible. As a student, you would subscribe to a book online until the class was done, rather than buy the book outright. You could, of course, purchase it directly to keep it in your digital library, or even opt to purchase it as a print-on-demand book if you wanted to have it available to read in hard copy. Subscription, however, is the Holy Grail, especially with the idea that you could lock in a subscription fee at the beginning of your time in college (or, especially for school districts, could lock in a block contract for subscription licenses) that would allow you access to a set number of books over that period, regardless of the books themselves.


While digital books can't capture that new book experience, they can go a long way towards making textbooks intelligent. GETTY



Summary

Watching giants like Pearson go all out in the digital space is bittersweet. New generations will grow up not knowing the satisfying feeling of cracking open the spine of a new book, of paging through mint-hard pages in pursuit of knowledge, of the slightly oily smell of ink on paper, of doodling in the margins. However, print-on-demand can give you that same “new car” experience.

Faced with a choice between being priced out of the market entirely and creating a product that benefits both them and their readership, publishers have a clear path: textbooks are going virtual.


Kurt Cagle is Managing Editor for Cognitive World, and is a contributing writer for Forbes, focusing on future technologies, science, enterprise data management, and technology ethics. He also runs his own consulting company, Semantical LLC, specializing in Smart Data, and is the author of more than twenty books on web technologies, search and data. He lives in Issaquah, WA with his wife, Cognitive World Editor Anne Cagle, daughters and cat (Bright Eyes).