Data Privacy, By Ref Sharing and the End of Business Cards (Maybe)


You're at a conference. If you're properly prepared, you've made sure that you have a box of business cards with nice shiny graphics, embossed letters and no doubt a witty or profound saying on the card. If, on the other hand, you're someone like me, that box of shiny cards is sitting on the bedside table at your house, where you set it down while packing everything else and then promptly forgot about it. Your house is now sixteen hundred miles away, and so you're forced to cadge everyone else's cards to remember contacts.

At the end of the conference, you have a couple of dozen cards in your pockets: some from people with whom you had long, potentially lucrative conversations, some from booths that showed potentially worthwhile tech or services, and one from Barney the Clown that mystifies you. Somewhere in your office, a card scanner you bought a few years back in a moment of industriousness and tech lust sits gathering dust, so most of those cards end up in a box near the laundry, where they won't end up as pink lint in your dryer when you wash your clothes.

Data by Copy - The Way the Data World Works Now

There they sit for six months until the box is overflowing, forcing you to take the time to sort through those cards and figure out which, if any, are worth keeping. Once you enter this information into your contact list, the data from these cards becomes (mostly) immutable. Yes, you can update that information, but once a contact exists, you're more likely to create a new contact than to update the existing one, because most applications make adding easier than editing.


 

Data by Copy is how most databases tend to be updated, creating multiple records for the same individual or resource.  KURT CAGLE



The problem with most contact lists (and, for that matter, most database entries) is that they are fundamentally obsolete the moment they are created. Programmers describe this by saying that a data record is a temporal snapshot: an indication of the state of a given entity at a specific time. People change jobs, move, and change their names (and occasionally even their genders). The same person may appear under different names and different keys (identifiers), sometimes maliciously, but usually just because the data entry system was too crude to recognize that two records referred to the same person.

The biggest issue is that once such duplicates are in the database, finding and culling them can take a significant amount of time, if it is done at all. This complexity can mean that information isn't sent to the right address, payments may be made but never completed, resources may be mislabeled and lost or sent to the wrong person, medications meant for one person may be sent to another, and so forth. For an organization, the combined cost of rectifying the errors, the lost goodwill and the risk of security breaches can reach into the millions of dollars.
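To make the temporal-snapshot problem concrete, here is a minimal Python sketch of data by copy. The `Contact` type, the example addresses and the name-matching rule in `likely_duplicates` are hypothetical; real deduplication (record linkage) uses far more robust matching than comparing names.

```python
import copy
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical contact record; the fields are illustrative assumptions."""
    name: str
    email: str
    employer: str


# The person's own, current details: the only "true" version of the record.
source = Contact("Jane Doe", "jane@oldco.example", "OldCo")

# Data by copy: the contact list stores its own snapshot at meeting time.
contacts = [copy.copy(source)]

# Jane changes jobs. The snapshot in the contact list never finds out.
source.email = "jane@newco.example"
source.employer = "NewCo"

# Meeting her again, adding is easier than editing, so a second record appears.
contacts.append(copy.copy(source))


def likely_duplicates(records):
    """Group records whose normalized names match (a deliberately crude rule)."""
    groups = {}
    for r in records:
        groups.setdefault(r.name.strip().lower(), []).append(r)
    return {name: rs for name, rs in groups.items() if len(rs) > 1}


for c in contacts:
    print(c.email)
# jane@oldco.example
# jane@newco.example
print(list(likely_duplicates(contacts)))  # ['jane doe']
```

The first record is stale the moment Jane moves, and the second one is a duplicate that someone now has to find and reconcile.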

Data by Reference - The Way the Data World Should Work

One solution to this is to move from a By Copy model to a By Reference model for data. In a by copy model, information from one data source is copied into another data source. In a by reference or by ref model, the owner of the data doesn't give the person requesting the data the content itself. Instead, they give the requester a key. That key will allow the requester to access the data and use it, with the understanding that the information so contained could change at any time.
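A by reference store can be sketched in a few lines of Python. This is a hypothetical illustration, not any particular product's design: the `DataOwner` class and its methods are assumptions, and the key is simply a random token from Python's `secrets` module that the owner maps back to the single authoritative record.

```python
import secrets


class DataOwner:
    """A minimal by-reference store: requesters get keys, never the record itself.

    All names here are hypothetical, for illustration only.
    """

    def __init__(self, record):
        self._record = record      # the single authoritative copy
        self._keys = set()         # keys currently honored

    def issue_key(self):
        key = secrets.token_urlsafe(16)  # unguessable random token
        self._keys.add(key)
        return key

    def resolve(self, key):
        """Return the *current* record for a valid key."""
        if key not in self._keys:
            raise PermissionError("unknown or revoked key")
        return dict(self._record)  # a fresh read every time

    def update(self, **changes):
        self._record.update(changes)


owner = DataOwner({"name": "Jane Doe", "email": "jane@oldco.example"})
key = owner.issue_key()

# The owner changes the data after handing out the key...
owner.update(email="jane@newco.example")

# ...and the requester sees the change on the next lookup.
print(owner.resolve(key)["email"])  # jane@newco.example
```

Because the requester only ever holds the key, there is no stale copy to drift out of date.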

This has several implications. In a by reference model, a person can give out different keys, each of which allows access to only certain information. In a Patient Health Information setting, a given record may have one key that lets patients see their prognoses and treatment information but not the doctors' internal notes; another that lets doctors see the patient's prognoses and treatment information along with their private notes; and a third that gives medical researchers access to medical conditions and treatments without any personal information that could identify the patient.
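The one-record, many-keys idea might look something like the following minimal Python sketch, assuming each key is bound to a named scope that whitelists fields. The record, field names and scope names are hypothetical, and a real health information system would also need authentication, auditing and proper de-identification (dropping the name alone is rarely enough).

```python
import secrets

# Hypothetical patient record; field names are illustrative assumptions.
RECORD = {
    "patient_name": "J. Doe",
    "date_of_birth": "1980-04-02",
    "condition": "type 2 diabetes",
    "treatment": "metformin",
    "prognosis": "good with management",
    "doctor_notes": "monitor adherence",
}

# Each scope is a different "window" onto the same record.
SCOPES = {
    "patient": {"patient_name", "condition", "treatment", "prognosis"},
    "doctor": {"patient_name", "condition", "treatment", "prognosis", "doctor_notes"},
    "researcher": {"condition", "treatment"},  # no identifying fields
}


class ScopedKeys:
    def __init__(self, record, scopes):
        self._record = record
        self._scopes = scopes
        self._grants = {}  # key -> scope name

    def issue(self, scope):
        if scope not in self._scopes:
            raise ValueError(f"unknown scope: {scope}")
        key = secrets.token_urlsafe(16)
        self._grants[key] = scope
        return key

    def resolve(self, key):
        """Return only the fields the key's scope allows."""
        scope = self._grants.get(key)
        if scope is None:
            raise PermissionError("invalid key")
        allowed = self._scopes[scope]
        return {f: v for f, v in self._record.items() if f in allowed}


store = ScopedKeys(RECORD, SCOPES)
research_key = store.issue("researcher")
print(store.resolve(research_key))
# {'condition': 'type 2 diabetes', 'treatment': 'metformin'}
```

There is still only one record; the keys differ only in which fields they expose.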

By reference systems also provide a degree of protection that by copy systems don't. If a stalker gains access to your contact information, you can revoke their key. If the key is self-renewing, a stale key won't provide any information at all. Indeed, banks and financial systems increasingly use such systems to ensure that credit card numbers and similar financial identifiers are never passed directly, but only by reference using cryptographic keys.
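Revocation and self-renewing keys can be sketched as keys with an expiry time that the legitimate holder periodically swaps for a fresh one. This is a hypothetical Python illustration: `RotatingKeyStore`, its `ttl` parameter and the injectable `clock` are assumptions, the clock being there only so expiry can be demonstrated without waiting.

```python
import secrets
import time


class RotatingKeyStore:
    """Keys expire after `ttl` seconds unless renewed, and can be revoked at any time.

    A hypothetical sketch; the names and parameters are illustrative assumptions.
    """

    def __init__(self, record, ttl=3600.0, clock=time.monotonic):
        self._record = record
        self._ttl = ttl
        self._clock = clock
        self._expiry = {}  # key -> expiry timestamp

    def issue(self):
        key = secrets.token_urlsafe(16)
        self._expiry[key] = self._clock() + self._ttl
        return key

    def renew(self, key):
        """Swap a still-valid key for a fresh one; the old key stops working."""
        self._check(key)
        del self._expiry[key]
        return self.issue()

    def revoke(self, key):
        self._expiry.pop(key, None)

    def resolve(self, key):
        self._check(key)
        return dict(self._record)

    def _check(self, key):
        expiry = self._expiry.get(key)
        if expiry is None or self._clock() >= expiry:
            self._expiry.pop(key, None)
            raise PermissionError("key revoked or expired")


# Usage with a fake clock so expiry can be shown instantly.
now = [0.0]
store = RotatingKeyStore({"phone": "555-0100"}, ttl=10, clock=lambda: now[0])
stalker_key = store.issue()
store.revoke(stalker_key)  # the owner cuts off access
friend_key = store.issue()
now[0] = 11                # time passes without renewal
# Both store.resolve(stalker_key) and store.resolve(friend_key) now raise PermissionError.
```

A holder who renews regularly keeps access; a key that leaks and then sits unused simply stops working.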



In the data by reference model, the data record on the client's side holds a key, issued by the server, that accesses the appropriate profile. The app then synchronizes regularly with the server to ensure the keys are always up to date. KURT CAGLE



This is how the chips on modern debit cards work. The chips don't themselves contain the information that identifies the card and authorizes transactions. Instead, they contain by ref keys that establish that the holder of the card has the private keys needed for the transaction. This means that the vendor never has access to the customer's financial information; instead, it has to settle for the answer to a query put simultaneously to the customer and their financial provider: if the transaction were performed, would the customer have the means to pay for it?
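The challenge-response idea behind chip cards can be illustrated with a greatly simplified Python sketch. It is not EMV: real cards compute standardized application cryptograms from transaction data using card-specific keys and transaction counters. Here HMAC-SHA256 over a random challenge and the amount stands in for that, and the `Card` and `Issuer` classes and the opaque `card_id` are hypothetical.

```python
import hashlib
import hmac
import secrets


def _message(challenge: bytes, amount_cents: int) -> bytes:
    # Bind the response to this specific challenge and amount.
    return challenge + amount_cents.to_bytes(8, "big")


class Card:
    """The chip: holds a secret key that it never reveals."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def respond(self, challenge: bytes, amount_cents: int) -> bytes:
        return hmac.new(self._secret, _message(challenge, amount_cents), hashlib.sha256).digest()


class Issuer:
    """The bank: holds each card's secret and the account balance."""

    def __init__(self):
        self._secrets = {}
        self._balances = {}

    def enroll(self, card_id: str, balance_cents: int) -> Card:
        secret = secrets.token_bytes(32)
        self._secrets[card_id] = secret
        self._balances[card_id] = balance_cents
        return Card(secret)

    def authorize(self, card_id: str, challenge: bytes, amount_cents: int, response: bytes) -> bool:
        secret = self._secrets.get(card_id)
        if secret is None:
            return False
        expected = hmac.new(secret, _message(challenge, amount_cents), hashlib.sha256).digest()
        # Constant-time comparison, then answer only "can this customer pay?"
        return hmac.compare_digest(expected, response) and self._balances[card_id] >= amount_cents


# The merchant relays a fresh challenge and gets back a yes/no, never the secret or balance.
bank = Issuer()
card = bank.enroll("card-123", balance_cents=5_000)
challenge = secrets.token_bytes(16)
ok = bank.authorize("card-123", challenge, 2_500, card.respond(challenge, 2_500))
print(ok)  # True
```

Because each challenge is random, a recorded response is useless for a different transaction, which is what makes a skimmed copy far less valuable than a stolen card number.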

This approach has worked quite successfully in the real world. While credit card number theft was a huge problem in 2010, the introduction of by ref cryptographic keys in chipped cards has reduced such thefts dramatically, to the extent that cybercrime has shifted more toward ransomware. Separate encryption chips and blockchain-like ledgers will likely have a similar effect on ransomware, by requiring that any system-level command be validated through some form of rotating keys that can be verified by cross-comparing different blockchain ledgers.

Data access by ref can be taken one step further. If you have a consistent ontology, queries can be answered without revealing identifying information (for example, “Are you old enough to drink alcohol in the state you’re currently in?”). Failing that, it is still possible to limit the information provided to a given requester to the minimum necessary.
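Answering a yes/no question without handing over the underlying data can be sketched like this in Python. The `Wallet` class and the jurisdiction table are hypothetical. The sketch also omits the hard part in practice, proving to the requester that the birth date is genuine, which real systems approach with signed credentials or zero-knowledge proofs.

```python
from datetime import date


class Wallet:
    """Holds the raw attribute (a birth date) and answers only yes/no questions about it."""

    def __init__(self, birth_date: date):
        self._birth_date = birth_date  # never returned to requesters

    def is_at_least(self, years: int, today: date) -> bool:
        b = self._birth_date
        # Subtract one if this year's birthday hasn't happened yet.
        age = today.year - b.year - ((today.month, today.day) < (b.month, b.day))
        return age >= years


# Hypothetical lookup table: the requester supplies the threshold, not the user.
MIN_DRINKING_AGE = {"WA": 21, "CA": 21}

wallet = Wallet(date(2001, 6, 15))
print(wallet.is_at_least(MIN_DRINKING_AGE["WA"], today=date(2022, 6, 14)))  # False
print(wallet.is_at_least(MIN_DRINKING_AGE["WA"], today=date(2022, 6, 15)))  # True
```

The bartender learns one bit, "old enough" or not, rather than a name, address and birth date.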


GDPR ultimately needs data by ref to have any teeth, as it moves control of data access from the data consumer (marketing companies and the like) back to data providers (individuals). GETTY



Profiles and Privacy

App creator Stay Touch applied this same type of by reference key model to its eponymous application. In this case, the owner of the application can set up multiple profiles, each of which provides a particular window into their data space. In an environment such as a conference, the user can choose which profile to make available to other owners of the app (or can email the other person a link that lets them download the app). The recipient ends up seeing a by reference version of the user’s data, configured to show only the information the owner wishes to share.

What's more, if the owner later changes their email or other information, the change is automatically synced to everyone holding one of the exchanged keys, updating the information in real time. If the owner of that data decides they would rather the other person not have their info, they can revoke the key at any time, which also removes the information from the other person's system. This makes it possible to minimize the impact of data sharing that you later regret.
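Stay Touch's internal design isn't described here, so the following is only a hypothetical Python sketch of the sync-and-revoke behavior: a recipient's address book caches each profile view, refreshes it on every sync, and drops it once the owner's server stops honoring the key. `OwnerService`, `AddressBook` and the placeholder key `"k1"` are assumptions. Note that removal relies on the recipient's app honoring revocation; it cannot erase a copy made by other means, such as a screenshot.

```python
class OwnerService:
    """Stand-in for the owner's server: maps keys to profile 'windows'."""

    def __init__(self, data, profiles):
        self.data = data          # the owner's full, current data
        self.profiles = profiles  # profile name -> list of fields it shows
        self.grants = {}          # key -> profile name

    def grant(self, key, profile):
        self.grants[key] = profile

    def revoke(self, key):
        self.grants.pop(key, None)

    def fetch(self, key):
        """Current view for a key, or None if the key has been revoked."""
        profile = self.grants.get(key)
        if profile is None:
            return None
        return {f: self.data[f] for f in self.profiles[profile] if f in self.data}


class AddressBook:
    """The recipient's app: caches views, refreshes them, drops revoked ones."""

    def __init__(self, service):
        self.service = service
        self.entries = {}  # key -> cached view

    def add(self, key):
        view = self.service.fetch(key)
        if view is not None:
            self.entries[key] = view

    def sync(self):
        for key in list(self.entries):
            view = self.service.fetch(key)
            if view is None:
                del self.entries[key]     # revoked: drop the local copy too
            else:
                self.entries[key] = view  # refresh with the owner's latest data


# Usage. "k1" is a hard-coded placeholder; a real key would be an unguessable token.
svc = OwnerService(
    {"name": "Jane Doe", "work_email": "jane@oldco.example", "mobile": "555-0100"},
    {"conference": ["name", "work_email"]},  # the mobile number stays private
)
svc.grant("k1", "conference")
book = AddressBook(svc)
book.add("k1")

svc.data["work_email"] = "jane@newco.example"
book.sync()
print(book.entries["k1"])  # {'name': 'Jane Doe', 'work_email': 'jane@newco.example'}

svc.revoke("k1")
book.sync()
print(book.entries)  # {}
```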

This approach is surprisingly powerful and provides an obvious benefit to the owner of a set of data by making sure that only the most up-to-date information is available to others. However, it could prove costly for marketers, as their supply of accurate, timely data eventually dries up.

This cost may not be a bad thing. For the last several decades, there has been an imbalance in which marketers gained an inordinate amount of information about everyone while remaining largely anonymous themselves. As in any market flooded with supply, this has pushed the value of up-to-date personal information to near zero and prompted massive abuses of privacy and social manipulation.

This also prompted efforts such as the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). The challenge for these laws is that so long as information is passed via copies, they are relatively toothless. Controlling access to information by reference, rather than handing over the information itself, shifts the balance of power into the hands of the data owners in a manageable way, and actually turns the information about a person into a more valuable commodity that can be bartered for services.

As a side note, this by reference approach is very much in accord with the way that information is managed in Linked Data networks. Information is managed by pointer (another way of saying "by ref"), and systems can be set up such that the access to information can be controlled at the property, class and graph level, as well as via constraints in multiple dimensions.
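Access control at the graph, property and class levels can be sketched over a handful of quads (graph, subject, predicate, object). Plain Python tuples stand in for an RDF store here; a production system would rely on a triple store's own access-control features. The `ex:` and `g:` prefixes, the data and the `Policy` structure are hypothetical, while `foaf:name`, `foaf:mbox` and `foaf:phone` are real FOAF vocabulary terms.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    graph: str
    subject: str
    predicate: str
    obj: str


RDF_TYPE = "rdf:type"

QUADS = [
    Quad("g:public", "ex:jane", RDF_TYPE, "foaf:Person"),
    Quad("g:public", "ex:jane", "foaf:name", "Jane Doe"),
    Quad("g:public", "ex:jane", "foaf:mbox", "mailto:jane@newco.example"),
    Quad("g:private", "ex:jane", "foaf:phone", "tel:+1-555-0100"),
    Quad("g:public", "ex:jane-health", RDF_TYPE, "ex:HealthRecord"),
    Quad("g:public", "ex:jane-health", "ex:condition", "asthma"),
]


class Policy(NamedTuple):
    graphs: frozenset             # graph level: which named graphs are visible
    hidden_properties: frozenset  # property level: predicates never shown
    hidden_classes: frozenset     # class level: hide every resource of these types


def visible(quads, policy):
    """Filter quads by graph, property and class rules."""
    hidden_subjects = {
        q.subject for q in quads
        if q.predicate == RDF_TYPE and q.obj in policy.hidden_classes
    }
    return [
        q for q in quads
        if q.graph in policy.graphs
        and q.predicate not in policy.hidden_properties
        and q.subject not in hidden_subjects
    ]


conference = Policy(
    graphs=frozenset({"g:public"}),
    hidden_properties=frozenset({"foaf:mbox"}),
    hidden_classes=frozenset({"ex:HealthRecord"}),
)
for q in visible(QUADS, conference):
    print(q.subject, q.predicate, q.obj)
# ex:jane rdf:type foaf:Person
# ex:jane foaf:name Jane Doe
```

Because every resource is addressed by identifier rather than copied, each policy is just a different view over the same pointers.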

So, while I seriously doubt that the small rectangular business card will go away any time soon, a “by ref” world may well prove to be a better model for sharing data for all of us.



Kurt Cagle is Managing Editor for Cognitive World, and is a contributing writer for Forbes, focusing on future technologies, science, enterprise data management, and technology ethics. He also runs his own consulting company, Semantical LLC, specializing in Smart Data, and is the author of more than twenty books on web technologies, search and data. He lives in Issaquah, WA with his wife, Cognitive World Editor Anne Cagle, daughters and cat (Bright Eyes).