Your business has its own language. If you sell cars, then you need not only make, model and year, but also MSRP, leather bucket seats and dealer incentives. If you are a dentist, knowing about bicuspids, prostheses and various forms of anesthesia is a must. Media companies have producers and writers, actors and grips, distribution networks and video masters. Business language is code.
This language is critical not only for communicating with others in your organization and with your customers; it also shapes how programmers and data scientists identify the things that their data systems track, accept and analyze. How things relate to one another makes its way into programs and data designs, influences IT purchases, and ultimately dictates whether the data that you collect is actually of value to your organization or simply a waste of a database farm.
From Business Language to Machine Language
While there are a number of ways to describe this language, one of the more useful is called an ontology. In philosophy the word refers to the study of being, of what kinds of things exist; in data work it has come to mean a formal naming of the things in a domain and of how they relate. The idea behind ontologies is relatively simple, though it has some profound implications. In effect, imagine everything that your company sells, buys or uses to process stuff as classes of resources. For a car dealership, automobiles, dealerships, salespeople, contracts and so forth would be resources, while for a publishing company, books, authors, editors, publishers, printing machines, printers and so forth would be.
Resources in turn either have associated sets of values (such as an author's name, or a car's mileage) or have some relationship to another thing or category. Thus, an individual car will be sold from a given dealership (a relationship between different resource entities) and may be gas-powered, diesel or electric (a category). Moreover, each resource has a unique identifier and potentially several externally issued ones (such as a car's VIN). The categories can be thought of as adjectives, while the relationships can be thought of as prepositional phrases (such as “has VIN 31EAF54915” or “carried by dealership X”).
The concepts and the relationships or properties that connect them are collectively referred to as an ontology. In effect, the ontology describes the things, the sets of descriptions and the relationships, while the data is what fills a particular cell. A good analogy is a spreadsheet, where each sheet represents a class of resources, each row a given thing in that class, and each column a property for that class, with the cell being either a value or a link to another resource in a different sheet. The set of rules that describes what properties are in each sheet and how the sheets are related is the ontology, and the set of all cells (the data, or triples) is a dataset or triple store that follows that ontology.
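To make the spreadsheet analogy concrete, here is a minimal Python sketch that flattens a couple of made-up sheets into triples. The sheet names, column names and values are all hypothetical, and the "type" property is simply one way of recording which sheet a row came from; a real triple store would use full identifiers rather than short strings.

```python
# Hypothetical "sheets": each sheet is a class, each row a resource,
# each column a property. All names and values are illustrative.
sheets = {
    "Car": [
        {"id": "car1", "make": "Honda", "fuel": "electric", "carriedBy": "dealer1"},
        {"id": "car2", "make": "Ford", "fuel": "diesel", "carriedBy": "dealer1"},
    ],
    "Dealership": [
        {"id": "dealer1", "name": "Dealership X"},
    ],
}


def sheets_to_triples(sheets):
    """Flatten every cell into a (subject, property, value) triple."""
    triples = []
    for class_name, rows in sheets.items():
        for row in rows:
            subject = row["id"]
            # Record which sheet (class) the row belongs to.
            triples.append((subject, "type", class_name))
            for column, cell in row.items():
                if column != "id":
                    triples.append((subject, column, cell))
    return triples


triples = sheets_to_triples(sheets)
```

Note that the "carriedBy" cell holds the identifier of a row in another sheet, which is the link between resources that the analogy describes.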
Collectively, the rules, structures and query languages for getting data from this dataset are known as semantics, and the dataset as a whole is known as a knowledge base. In other words, the goal of semantics is to make your business language machine-readable.
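As a rough illustration of what querying a knowledge base means, the Python sketch below matches a pattern against a handful of invented triples. The data and the match helper are assumptions made for this example; production systems typically use a dedicated query language such as SPARQL rather than hand-written filters.

```python
# A tiny hypothetical knowledge base of (subject, property, value) triples.
KNOWLEDGE_BASE = [
    ("car1", "type", "Car"),
    ("car1", "fuel", "electric"),
    ("car1", "carriedBy", "dealer1"),
    ("car2", "type", "Car"),
    ("car2", "fuel", "diesel"),
    ("car2", "carriedBy", "dealer1"),
    ("dealer1", "type", "Dealership"),
]


def match(triples, subject=None, prop=None, value=None):
    """Return the triples that agree with every pattern part that is not None."""
    return [
        (s, p, v)
        for (s, p, v) in triples
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (value is None or v == value)
    ]


# "Which resources are electric?"
electric = [s for (s, _, _) in match(KNOWLEDGE_BASE, prop="fuel", value="electric")]

# "What does dealer1 carry?"
carried = [s for (s, _, _) in match(KNOWLEDGE_BASE, prop="carriedBy", value="dealer1")]
```

Leaving part of the pattern as None plays the role of a variable in a query: the question is phrased in business terms (fuel, carriedBy), and the machine fills in the blanks.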
Semantics: Better Search, Better Relevance, Better SEO
When you read a newspaper article or a web page, a magical transformation takes place where you parse through words and layout, building up new ideas from each sentence, mentally building a summary that lets you abstract why the article is important (or dismiss it if it isn't), identifying key people, places and things, and ultimately gaining new insight. Despite a lot of claims made about artificial intelligence, the vast majority of computers are only just beginning to develop the barest rudiments to allow them to do the same thing.
As the web first emerged in the 1990s, one of the things it most needed was an index, where you could look up a keyword, then trace that keyword back to the document(s) that had that keyword. At first, this process was done manually, culminating in directories, but as the number of documents climbed beyond the first ten thousand or so, it became increasingly evident that the only way to keep up was to automate the process.
The first search engines were intended to do just this, first by reading all the terms in a document, then by stripping out those that were unnecessary (so-called stop words), then by creating a link between document and word set. One of the first to do this at scale was AltaVista, but the first to be truly successful as a business was Google. When a set of words was presented as part of a search, these engines would use certain sets of criteria to determine which documents best fit those terms (and, by extension, appeared at the top of the list).
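The indexing steps described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular search engine worked: the stop-word list, the sample documents and the ranking rule (count how many query words a document contains) are all invented for illustration.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; real engines use much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "is", "to"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(documents):
    """Map each remaining word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many of the query's words they contain."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "doc1": "The history of the electric car",
    "doc2": "A guide to diesel engines",
    "doc3": "Electric car incentives for dealers",
}
index = build_index(docs)
results = search(index, "electric car incentives")
```

Here doc3 ranks above doc1 because it contains all three query words, while doc2 contains none and is left out. The "certain sets of criteria" that real engines apply are far richer than a simple word count.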
It turned out, from a business sense, that being at the top of that list was far better than being on page 2, which was light years better than being on page 3. Nobody wanted to be on page 3. This launched a grand melee, and firms appeared overnight to help companies optimize their search ranking, a practice that soon became known as Search Engine Optimization, or SEO. Pretty soon, this became an evolutionary war, with the search engines working to keep their rankings as fair (or at least as optimized for them) as possible, while SEO firms tried to trick the search engines into pushing a given listing upward by a few slots.
One temporary truce in this war came when Google, Yahoo, Microsoft and other search engine providers established a set of HTML tags for representing the keywords that most closely matched the topics of the associated article, a process called tagging. These keywords typically involved terms or concepts that the company was most interested in capturing, which in turn were part of that company's business language. This metadata had higher priority than scanned content, and it, in turn, made it easier to put articles in the right set of buckets.
Towards a Smart Data Ecosystem
Semantics takes this up to eleven. By marking an article up semantically, you can identify all kinds of interesting things - what things the article (or catalog page) discusses, how much they cost at a certain time, where they are located, what their intended target audience is, what features those things have, what events occurred where these things were significant and even an idea about how reliable the page itself is. You can tie in resources with hash-tag keywords used in viral campaigns, you can embed tracking codes to show how popular an item is - in essence, you can create and shape your own tracking categories. This is having a profound influence on SEO, and is influencing how both web apps and mobile apps get written to provide Big Data measures.
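As one hedged illustration of semantic markup, the Python sketch below builds a JSON-LD description of a catalog item using schema.org terms (Product, Offer, Audience) and wraps it in the script tag commonly used to embed such data in a page. The product_markup helper and the product details are invented for this example, and JSON-LD with schema.org is only one of several ways to mark a page up semantically.

```python
import json


def product_markup(name, price, currency, audience):
    """Build a JSON-LD description of a product using schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "audience": {"@type": "Audience", "audienceType": audience},
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }


# Hypothetical catalog entry.
markup = product_markup("Example Electric Sedan", "32000.00", "USD", "Commuters")

# Embed the description in the page as a JSON-LD script element.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(markup, indent=2)
    + "</script>"
)
```

The point is that what the page is about, what it costs and who it is for are stated explicitly in a form a machine can read, rather than left for a crawler to infer from the prose.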
Semantics also makes building such web applications easier and more cost-effective. Comprehensive use of ontologies means that you can build full-cycle pipelines from data acquisition to user interfaces to data analysis to dashboards. Google, Bing and other search engines are now accepting semantic data as part of their search strategy, and increasingly social media sites such as Facebook, Instagram, Pinterest and others are exposing semantically oriented endpoints (programmable web functions) that can tie semantic identifiers through social media, and from there into big data ingesters.
Our company, Semantic SEO Solutions, has been tracking the rise of semantics and ontologies for the last decade, focusing on building the best solutions not only for increasing the visibility of your sites, but also for making your sites more intelligent, easier to build, and far more flexible. Our latest offerings use semantic and machine learning solutions to help your business create smarter catalogs, more dynamic interfaces and more relevant content.
Who's doing this? Seventy-five percent of the Fortune 500 companies have some kind of smart data or semantics program underway, most under the banner of 360° initiatives, comprehensive enterprise data systems, or machine learning/data science projects. Amazon has recently added linked data capabilities to their AWS infrastructure with the
Ontologies - it's a big word, but it will have an even bigger impact upon your business, today and tomorrow.
Kurt Cagle is a writer, data scientist and futurist focused on the intersection of computer technologies and society. He is the founder of Semantical, LLC, a smart data company.