By now it's no secret that digital data is generated by the truckload and that it can be worth its weight in gold.
But that knowledge isn't half as important as understanding how you can tame the data beast and then wring out every drop of its value.
Naturally, creative and resourceful people in one place or another are always finding new processes and applications that'll make better use of their data. So we'll explore some of today's dominant data utilization trends and leave predicting tomorrow's technology for the pundits.
This chapter was taken from my book, Keeping Up: Backgrounders to All the Big Technology Trends You Can't Afford to Ignore.
What Exactly Is Data?
Before priming our understanding of what's available to help us work productively with data, it's a good idea to first define exactly what data is.
Sure, we saw plenty of great individual examples in my previous article on Managing Data Storage, including the huge volumes of performance and status information produced by digital components of complex systems like cars. But that's not the same as a definition.
So then let's define it. Data, for our purposes, is any digital information that is generated by, or used for, your compute operations. That will include log messages produced by a compute device, weather information relayed through remote sensors to a server, digital imaging files (like CT and ultrasound scans), and the numbers you enter into a spreadsheet. And everything in between.
Which brings us to big data – another one of those buzz phrases that get thrown around a lot, often without accompanying context or clarity.
At first glance, you'd probably figure that big data describes data sets that come in volumes higher than traditional data software and hardware solutions can handle.
And you'd be largely correct, although we could add one or two secondary characteristics. The complexity of a data set, for instance, is also something that could force you to consider big data solutions. And sets of data that must be consumed and analyzed while in motion (streaming data) are also often better addressed using big data tools.
It's worth mentioning that big data workloads will often seek to solve large-scale predictive analytics or behavior analytics problems. Such problems are common within domains like healthcare, the Internet of Things (IoT), and information technology.
With that out of the way, we can now get to work understanding how – and why – all that data is being used.
Virtual Reality and Augmented Reality
What? Plain old reality suddenly not good enough for you?
Well yes, in some cases, plain old reality really isn't good enough. At least if you have a strong interest in engaging in experiences that are difficult or impossible under normal conditions.
A virtual reality (VR) device lets you immerse yourself in a non-existent environment.
The most common examples of currently available VR technology feature some kind of headset that projects visual images in front of your eyes while tracking your head movements and, in some cases, the way you're moving other parts of your body. The visual images will adapt to your physical movements, giving you the sensation that you're actually within and manipulating the virtual projection.
VR has potential applications in educational, healthcare, research, and military fields. The ability to simulate distant, prohibitively expensive, or theoretical environments can make training more realistic and immediate than would be otherwise possible.
VR technologies have been arriving – and then disappearing – for decades already. For the most part, they've focused on providing immersive gaming and entertainment environments. But they've never really caught on in a big way beyond the niche product level.
This might be partly due to high prices, and partly because some people experienced forms of motion sickness and disorientation.
But maybe – just maybe – (insert the current year here) will finally be the year VR hits the big time.
But where VR can leverage data in a really meaningful way is when, rather than blocking out your physical surroundings, the virtual environment is overlaid on top of your actual field of vision.
Imagine you're a technician working on electrical switching hardware under a sidewalk. You're wearing goggles that let you see the equipment in front of you, but that also project text and icons clearly identifying labels for each part and that show you where a replacement part should go and how it's connected. This is augmented reality.
I'm sure you can easily imagine how powerful this kind of dynamic display could be in the right conditions.
Surgeons are able to access a patient's history or even consult relevant medical literature without having to divert their eyes from the operation. Military pilots can similarly enjoy "heads up" displays that show them timely status reports describing their own aircraft and broader air traffic conditions without distraction.
Artificial Intelligence and Machine Learning
As a rule, computers are even better at performing dull, repetitive tasks over and over again than bored teenagers pretending to do homework. And they make less noise in the process.
The trick with computers is to cleverly string lots of dull, repetitive tasks together so that they can approximate intelligent and useful behavior.
The prize at the end of that road is called automation. Or, in other words, a state where computers can be confidently left alone to perform complex and useful tasks without supervision.
In many ways, we've been living in an age of sophisticated computer automation for decades. Domains as diverse as security monitoring, urban traffic control, book manufacturing, and heavy industry are already being handled with little or no human supervision.
But artificial intelligence (AI) seeks to go beyond relatively simple repetition to train computers to think for themselves – and thereby efficiently solve far more difficult problems.
Great idea. Somewhat harder to achieve in the real world.
What can AI actually do?
Understanding how effective AI can be will depend on what you expect it to do. Can you design software to search for and flag a handful of suspicious financial transactions from among the millions of credit card transactions a large bank processes? Yes. (Although I'm not quite sure that's truly AI at work and not just automation.)
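To show why that kind of flagging can be plain automation rather than "intelligence," here's a toy Python sketch (entirely my own invention, with made-up figures and thresholds, nothing like a real bank's system): it flags a charge that falls far outside a customer's historical spending pattern.

```python
# Toy rule-based transaction filter: flag a new charge if it lies more
# than z_threshold standard deviations from the customer's average.
from statistics import mean, stdev

def flag_suspicious(amounts, new_amount, z_threshold=3.0):
    """Return True if new_amount is a statistical outlier relative
    to the customer's past charges (amounts)."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > z_threshold

history = [25.0, 40.0, 32.0, 28.0, 35.0]   # past card charges
print(flag_suspicious(history, 30.0))       # typical charge -> False
print(flag_suspicious(history, 5000.0))     # wild outlier   -> True
```

No learning is happening here; it's a fixed rule applied over and over, which is exactly the kind of dull, repetitive task computers excel at.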
Can you deploy "intelligent" chatbots on your website to help customers solve their problems without needing actual (and expensive) human interaction? Yes. In fact, I just had a surprisingly effective conversation with my mobile phone carrier's chatbot that did quickly solve my problem.
Can the first stage of a rocket you've just used to launch a payload into space use AI to guide itself to a safe landing on a moving platform in the middle of the ocean? If you'd asked me, I'd have said it was impossible. But SpaceX went ahead anyway and did it multiple times. Good thing they didn't ask me.
But can AI reliably make strategic decisions that intelligently account for all the many moving parts and complexity that exist in your industry? Can an AI-powered machine pass the Turing test (where a human evaluator can't reliably tell whether they're conversing with a machine or with another human)? Perhaps not just yet. And perhaps never.
One tool used in many AI processes is the neural network. The original neural network consists of the many neurons that carry information about the state of a biological environment to the brain.
Artificial and virtual neural networks are systems for assessing, processing, and responding to the large physical or virtual data sets that feed AI-controlled systems.
Such data can come from cameras or other physical sensors, or from multiple data sources. The processed data can sometimes be used for predictive modeling, where the likelihoods of possible future outcomes are estimated and compared.
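To make the neuron idea a bit more concrete, here's a minimal Python sketch of a single artificial neuron (the sensor readings and weights are invented for illustration): it computes a weighted sum of its inputs and squashes the result through a sigmoid activation into a 0-to-1 "confidence" value.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a
    bias, passed through a sigmoid activation (output between 0 and 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

# Hypothetical sensor readings and hand-picked weights.
readings = [0.8, 0.2, 0.5]
weights = [1.5, -2.0, 0.7]
print(round(neuron(readings, weights, bias=0.1), 3))  # -> 0.777
```

A real network chains many of these neurons into layers and, crucially, learns the weights from data rather than having them hand-picked.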
Exciting stuff, to be sure. But the tools used for some of the most significant accomplishments attributed to artificial intelligence aren't actually artificial. Nor did they necessarily require all that much intelligence.
For example, Amazon Mechanical Turk (MTurk) is a service that connects client companies with remote freelancing "human intelligence" workers. The workers will, for what usually amounts to dreadfully low pay, perform "mechanical" tasks like labelling the content of hundreds or thousands of images. The labelling answers questions like "is the subject male or female?" or "is the subject a car or a bus?"
It could be that, over time, services like Mechanical Turk will become less important as improving AI methodologies might one day completely replace the human element for this kind of work. But in the meantime, MTurk and its competitors are still steaming along at full speed, churning out millions of units of "artificial" artificial intelligence.
One methodology that can help reduce reliance on human intervention is machine learning (ML).
How can machine learning help?
ML works by leveraging various kinds of manual assistance to help achieve greater task automation. An ML system can hopefully "learn" how to manage our tasks by being exposed to existing training data. Only once the system has demonstrated sufficient skill at solving the problems you have for it will it be let loose on "real world" data.
These are some common approaches to training your ML system:
- Supervised learning lets the ML software read data sets that include both "problems" (images, for example) and their "solutions" (full labels). By seeing enough of the provided examples, the system should be able to apply its experience to similar problems that arrive without solutions.
- Unsupervised learning simply throws raw data without any associated solutions at the system. The goal is for the software to recognize enough patterns in the data to allow it to solve the problems on its own.
- Reinforcement learning learns from interactions with its environment. Ideally, the software recognizes and understands positive results and evolves its methodology to reliably and consistently produce similar results.
- Deep learning algorithms apply multiple layers of analysis to transform the raw target data. The chain of transformations linking input to output is known as the credit assignment path (CAP), and deep learning is distinguished by a substantial CAP depth.
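To make the supervised case concrete, here's a miniature Python sketch (the training points and labels are invented): a nearest-neighbour "model" that labels a new example by finding the closest labeled example it was trained on.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Labeled training data: (height_cm, weight_kg) -> made-up class label.
training = [
    ((150, 50), "small"),
    ((160, 60), "small"),
    ((180, 90), "large"),
    ((190, 100), "large"),
]

def predict(point):
    """Supervised learning in miniature: give a new, unlabeled point
    the label of its nearest labeled training example."""
    _, label = min(training, key=lambda example: dist(example[0], point))
    return label

print(predict((155, 55)))  # -> small
print(predict((185, 95)))  # -> large
```

The "learning" here is trivially just memorizing the training set, but the workflow is the supervised pattern in a nutshell: labeled examples in, predictions for unlabeled data out.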
AI in general, and ML in particular, are effective at building tools for tasks like autonomous driving, drug discovery, email filtering, and speech recognition, and for performing sentiment analysis on massive data sets of human communications.
What AI and ML share in common with all the other technologies like virtual reality and augmented reality that we've discussed here – and in that other "How to Manage Data Storage" article – is the need to control and make better sense of the endless streams of information our digital products keep generating. The better we get at this kind of control, the more value we'll get from our data.