An overview of big data
What sounds esoteric is a basic statement in Niklas Luhmann’s systems theory. Whether cell structures, societies or the psyche: in the 1980s the social theorist anticipated many developments that leave us speechless today. Big data is one of them.
Digitalisation creates masses of data
Data is the smallest unit of information, one that once existed only as an image in the mind, later in speech, writing and books, and now as a file on tablets and computers. One can already guess: technical progress has always ensured that less and less knowledge is lost. Digitization now makes it possible to forget nothing at all. The more communication becomes digital, the more data is generated, transferred and stored. At least temporarily.
The system, communication itself, is archived and so ensures its own survival. Following Luhmann’s dictum, the communication metadata, that is, the data about communications (what, who, when, where and how), are no less significant than the content of the communication itself. Whether digital communication or sensor and process data, read correctly they are all of interest. 300 billion Twitter messages have been sent to date; every second, 5,000 more are added.
Let’s look at some of the main examples and applications of big data in real life:
With Big Data you gain insights from the data masses
The data mountain grows ever bigger, completely automatically. As much as possible is stored in the search for benefits and advantages. Companies that disregard people’s interests are called data octopuses; those who use the data to make the world better, more efficient, more resource-saving or faster, are called innovators.
The larger the mountain, the more difficult it becomes to derive relationships, patterns and statements from it. At the same time it is clear: the larger the mountain and the richer the data, the greater the benefit that can be derived. Big data makes even oversized data mountains usable.
Whether the data is loosely connected, fast-changing, growing or incomplete, big data is the digital solution to the digital problem of gaining insights from digital data collections.
Intelligent systems built on cloud computing make it possible to detect patterns in the data stream and to derive statements from them. The global data volume doubles every two years (Klaus Manhart: IDC Data Growth Study – Double Data Volume Every Two Years, in: CIO 2011). The amount of data on the world’s computers is so great that a new word will soon be needed: the yottabyte, a one followed by 24 zeros.
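As a rough illustration of these magnitudes, here is a minimal back-of-the-envelope sketch in Python. The 1.8-zettabyte starting value for 2011 is an assumption taken from the commonly cited IDC estimate, not a figure from this article.

```python
# Back-of-the-envelope sketch: if the global data volume doubles every
# two years, how long until one yottabyte (10**24 bytes) is reached?
start_bytes = 1.8 * 10**21   # assumed 2011 volume: 1.8 zettabytes (IDC estimate)
yottabyte = 10**24

years = 0
volume = start_bytes
while volume < yottabyte:
    volume *= 2              # doubling every two years
    years += 2

print(f"About {years} years of doubling until one yottabyte.")  # -> 20
```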
Data is obtained from innumerable sources, above all from science, the internet and communication
Any movement can be understood as a data source: radio waves, electrical impulses or light. The world’s sensors and keyboards digitize content data, metadata, transaction data from banking and business, behavioral records of geographic and surfing movements, health records, financial data, scientific measurements, the Internet of Things and private surveillance systems.
This mass creation of data can hardly be captured in full. Particularly in science, the internet and communication, the volume of data generated exceeds every storage option: 99 percent of all measurements generated in the LHC particle accelerator must be discarded. The question of selection and ad-hoc evaluation is therefore urgent.
The search for useful insights: data mining
If you want to use data, you can buy it from providers such as market research companies, or draw on existing public and private sources, historical and current: statistical databases, websites, online stores, address lists, production data and so on. Data is available everywhere in large quantities. But even if an adequate solution to the storage problem were found, the raw data could not yet be called a profit.
Data mining is the search for knowledge in the data mountain. The essence of these data fruits consists of patterns, models, statements and hypothesis tests. Skilled engineers, programmers and statisticians, people who look for reliable statements and can interpret the results, need a good technical infrastructure to extract useful information from the information jungle.
As with wine and coffee, harvest and preprocessing are crucial. The analysts’ search for the essence of their fruits is far less enjoyable, however: solving the abstract and technical problems involved is hard creative work.
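To make this less abstract, here is a minimal sketch of one common data mining step, clustering with scikit-learn. The toy customer table and the choice of three clusters are invented for illustration; real pipelines differ in scale, not in these basic steps of preprocessing and pattern search.

```python
# Minimal data mining sketch: preprocessing plus pattern search
# (clustering) on a tiny invented dataset.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Invented toy data: rows are customers, columns are
# [orders per year, average basket value in euros].
X = np.array([
    [2, 15.0], [3, 18.5], [40, 22.0],
    [38, 25.5], [4, 210.0], [5, 195.0],
])

# Preprocessing: put features on a comparable scale first, otherwise
# the euro column dominates the distance measure.
X_scaled = StandardScaler().fit_transform(X)

# Pattern search: group similar customers into three clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)  # e.g. occasional, frequent, and big-basket buyers
```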
Knowledge discovery in databases and statistical significance
Knowledge discovery in databases (“KDD”) describes this part of the big data world better: it is not data but knowledge that is gained in data mining. And new knowledge is good if it is statistically significant, novel and useful; otherwise a lot of work was for nothing. But what is statistical significance?
Not everyone has to become an analyst, so here is a brief summary: a relationship between A and B must not, by statistical criteria, be attributable to chance, but must, as far as one can tell, have a systematic origin.
However, this enormously complex problem of statistics can sometimes be avoided in business practice. If you can test the analysis results experimentally, you can save a lot of time and scientific effort.
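A minimal sketch of such an experimental check, assuming an invented A/B test with ten users per variant; the two-sample t-test from SciPy stands in for the many test procedures used in practice.

```python
# Minimal significance check: did variant B really convert better than
# variant A, or could the difference be chance? Numbers are invented.
from scipy import stats

conversions_a = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]  # 3 of 10 converted
conversions_b = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 7 of 10 converted

t_stat, p_value = stats.ttest_ind(conversions_a, conversions_b)
print(f"p = {p_value:.3f}")

# Common convention: below 0.05 the difference counts as significant,
# i.e. unlikely to be pure chance.
if p_value < 0.05:
    print("Difference is statistically significant.")
else:
    print("Difference could still be chance.")
```

With these invented numbers the p-value comes out around 0.08, so even a 3-to-7 gap across ten users each would not yet count as significant; this is exactly the discipline that the significance criterion imposes.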
Big Data has arrived in the economy, for example in advertising and agriculture
To be ahead of one’s time, or at least ahead of the competition: for even small advantages, people will go far. It is therefore not surprising that big data is slowly moving from the research context into the world of industry and medium-sized companies.
Today, advertising is by revenue the largest market for big data services, followed immediately by data licensing. Companies are promised a new world of business: production and delivery systems that adapt individually to the market situation are expected to increase efficiency and reduce costs, and planning demand and sales on the basis of a large number of influencing factors that could hardly be considered before promises near-perfect management.
An example is the optimized use of fields in agriculture, depending on climate, soil, sowing technology and demand. The limits and scarcities of reality are shifted enormously.
Big Data is changing our world: from manipulation to new business fields
Equally important are sentiment analyses that can show product attractiveness in real time, or media that, as Facebook showed in one study, can systematically manipulate the mood of their users. Adam Kramer of Facebook builds a Gross National Happiness index from company data; the researcher searches specifically for the potential of digitized communication.
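A minimal sketch of what sentiment analysis does at its simplest, assuming invented word lists and messages. Production systems use trained models rather than fixed lexicons, but the idea of scoring text for mood is the same.

```python
# Minimal lexicon-based sentiment sketch: score short messages by
# counting positive and negative words. Word lists and messages invented.
POSITIVE = {"great", "love", "happy", "excellent", "good"}
NEGATIVE = {"bad", "hate", "awful", "broken", "terrible"}

def sentiment(text: str) -> int:
    """Return a crude score: positive minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

messages = [
    "love the new phone great camera",
    "awful battery screen arrived broken",
]
for m in messages:
    print(sentiment(m), m)  # 2 for the first message, -2 for the second
```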
Experiments with millions of users are technically possible, and they are being undertaken, because implementation and evaluation are no problem thanks to the network’s big data infrastructure.
New technology leads to new business areas. New solutions to old problems are possible:
- Sharing economy based on sensor monitoring
- Cloud services for publicly available information
- Advertising effectiveness analysis
- Market research
- Fraud prevention
- Diagnostics in medicine
- Automatic and exact billing in the energy and communications sector
The world is changing, everywhere.
Big Data and its Political Importance: From Census to Election Campaign Planning
Russia recently obliged Russian companies to collect data. In Germany, the Minister of the Interior pursues the goal of national security with data retention. In the US, the secret services actively participated in the development and conception of the data octopuses Google, Facebook and Co.
Information and influence drawn from the sea of data appear to be existential assets for nations. Strategically important decision-making tools have always been used; the studies of economists and the censuses were sometimes even real precursors of big data. Whether auditing, economic and social policy, taxation or network analysis, right up to election campaign planning: big data holds decisive potential.
Big Data becomes a democratic supervisory body
Both in business and in politics it is now becoming clear how painful and how necessary it is not to simply leave the power of data and analysis to the powerful. The protection of data, privacy and copyright gains a whole new constitutional urgency.
Open data, the release of data, in particular from taxpayer-funded databases, has become a worldwide movement. A whole range of tinkerers are now unearthing the treasures in this data and making their findings available to the community again.
Data analysis works by linking features
After this rough overview of big data, let us turn to concrete analysis. The organization of data is one of its most important foundations: databases are collections of so-called feature values.
An example: gender is a feature, “female” a feature value. In this way databases, much like tables, link statements about the properties of many observations. As in a telephone book, the name is combined with address and number in a fixed scheme. Of course, this can be done with many more features at once: this is where multivariate databases and statistics begin.
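A minimal sketch of such a feature table in Python, using pandas; the names, cities and numbers are invented for illustration.

```python
# Minimal sketch of a database as linked feature values, here as a
# pandas DataFrame. All entries are invented.
import pandas as pd

records = pd.DataFrame({
    "name":   ["A. Meier", "B. Schulz", "C. Braun"],  # identifying feature
    "gender": ["female", "male", "female"],           # feature with values
    "city":   ["Berlin", "Hamburg", "Berlin"],
    "phone":  ["030-111", "040-222", "030-333"],
})

# Each row is one observation; each column is one feature. Queries link
# feature values, like looking up a number by name in a phone book.
print(records[records["city"] == "Berlin"])

# With many features at once the table becomes multivariate: statistics
# can then relate features to each other, e.g. gender by city.
print(pd.crosstab(records["gender"], records["city"]))
```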
Google as a widely available big data application with huge computing power
With big data, these databases are now huge: many features and value forms, in rows, columns, time series and multi-dimensional “tables”. Investigating such data landscapes requires enormous computing capacity.
But when real-time analysis, the import of new data, fast simultaneous queries, updates, and varied types of information such as numbers, speech, text or images are added, it becomes clear what a feat the mother of widely available big data applications, Google, has achieved. It is enormous.
Does big data let us look into the future? Yes, by linking data sources and content
Linking data sources and content makes it possible to gain surprising insights. Tweets about specific restaurants or check-ins at bars, such as those available on Facebook or FourSquare, combined with their metadata, can provide clues as to where bad or spoiled food is being served.
In one study, restaurants with poor hygiene could be located unerringly. Even in catastrophes, information about the extent of the damage and the best aid strategy can be drawn from the Twitter cloud: where is the fire worst? Who is most affected? What help is needed where?
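A minimal sketch of how such a linkage might look, assuming invented posts, venue names, keyword list and flagging threshold; the actual study’s method is more sophisticated.

```python
# Minimal content-plus-metadata sketch: flag venues whose geotagged
# posts often mention food-poisoning keywords. All data is invented.
from collections import Counter

posts = [
    {"venue": "Burger Hut", "text": "great fries"},
    {"venue": "Burger Hut", "text": "felt sick after eating here"},
    {"venue": "Burger Hut", "text": "stomach ache all night"},
    {"venue": "Cafe Blue",  "text": "lovely espresso"},
]
KEYWORDS = {"sick", "stomach", "nausea", "vomit"}

flags, totals = Counter(), Counter()
for p in posts:
    totals[p["venue"]] += 1
    if KEYWORDS & set(p["text"].lower().split()):
        flags[p["venue"]] += 1

# Flag venues where a high share of posts mention illness.
for venue in totals:
    share = flags[venue] / totals[venue]
    if share > 0.5:
        print(f"{venue}: {share:.0%} of posts mention illness -> inspect")
```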
Just as Facebook can draw conclusions from user behavior about users’ economic and emotional situation, up to a reliable prognosis of an imminent break-up, correlated behavior can be used to predict future trouble spots, epidemics and even crimes. In any case, an entire industry is working to improve these techniques.
This relies not only on the recognition of known patterns, i.e. data mining. Automated data mining, also known as machine learning, is expected to improve this process in the future. The further development of database systems and index structures is an important basis of any analysis.
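A minimal sketch of the machine learning idea, assuming invented training data; instead of hand-written rules, the model learns the pattern from labeled examples. The two features might stand for, say, posting frequency and average sentiment in a region.

```python
# Minimal machine learning sketch: a classifier learns a pattern from
# labeled examples and generalizes to unseen ones. Data is invented.
from sklearn.linear_model import LogisticRegression

X_train = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y_train = [0, 0, 1, 1]  # 0 = calm, 1 = trouble spot (invented labels)

model = LogisticRegression().fit(X_train, y_train)

# The trained model is applied to a new, unseen observation.
print(model.predict([[0.85, 0.15]]))        # -> [1]: predicted trouble spot
print(model.predict_proba([[0.85, 0.15]]))  # class probabilities
```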
The same applies to the semantic search options mentioned above: plagiarism checks via text comparison, and grammatical verification of text and speech. From checking databases for systematic errors to scanning software code for hacker intrusions, big data can detect and exploit irregularities and peculiarities.
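A minimal sketch of a plagiarism check via text comparison, using TF-IDF vectors and cosine similarity from scikit-learn; the sentences are invented examples.

```python
# Minimal plagiarism-check sketch: represent texts as TF-IDF vectors
# and compare them by cosine similarity. Sentences are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

original  = "big data makes oversized data mountains usable"
suspect   = "big data makes huge data mountains usable"
unrelated = "the cat sat quietly on the warm windowsill"

tfidf = TfidfVectorizer().fit_transform([original, suspect, unrelated])
sims = cosine_similarity(tfidf[0:1], tfidf[1:])

print(f"suspect:   {sims[0, 0]:.2f}")  # high similarity -> possible copy
print(f"unrelated: {sims[0, 1]:.2f}")  # near zero
```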
Big Data should help to credibly verify critical information
The dream of big data experts is not only to open up new markets and lower costs, but to recognize the opportune moment. Which moment is decisive? Hypotheses can be formed from historical data patterns and signs of change.
On Twitter, the big crash of BlackBerry shares was visible two minutes before the stock market reacted. Osama Bin Laden’s death was visible 20 minutes before the first newspaper reports, and was credible thanks to network analysis and theories of swarm intelligence.
Credibly verifying critical information is the hope placed in big data. The big data experts are training their tools for ever greater significance.
Big data also carries dangers: deliberate manipulation and a lack of rigor in evaluation
The adversaries of this effort are, in addition to incomplete and disordered databases, manipulated ones: missing parts, altered data links, added extreme values that distort the picture. Twitter bombs can occasionally sway political races; Google bombs shape the image we have of people.
When former US Senator Rick Santorum met with headwinds during his provocatively conservative presidential campaign, his name was linked on social networks and various blogs with keywords that also influenced his Google ranking. He was thus purposefully and lastingly discredited.
The bomb on Twitter, in blogs and on Google left deep marks. It is a controversial question whether current debates and engaged political groups should dominate the online reputation of individuals (or of companies, as in the case of BP and other shitstorms) to this degree, and whether Google’s supposedly impartial ranking algorithm should pass this image on without editorial review.
But there is hope here too, because it is technically quite possible to identify such formative trends. Some types of manipulation are easily recognizable.
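A minimal sketch of such trend detection, assuming an invented series of daily mention counts; a simple z-score flags the day on which mentions spike far above the recent baseline.

```python
# Minimal manipulation-trend sketch: flag days on which mentions of a
# name spike far above the recent average. Daily counts are invented.
import statistics

daily_mentions = [102, 98, 110, 95, 105, 99, 930, 101]  # index 6: spike

baseline = daily_mentions[:6]  # history before the suspicious day
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

for day, count in enumerate(daily_mentions):
    z = (count - mean) / stdev
    if z > 3:  # more than three standard deviations above baseline
        print(f"Day {day}: {count} mentions (z = {z:.1f}) -> possible bomb")
```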
Another problem is a lack of rigor in the evaluation of data: if the rules of good statistical practice are not observed and no clear hypotheses are set up in advance, almost any analysis result becomes conceivable. Reliability and verifiability suffer.
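A minimal sketch of why fixing hypotheses in advance matters: test enough random, meaningless relationships and some will look significant by chance alone. The data below is pure noise by construction.

```python
# Minimal multiple-testing sketch: correlate one outcome with 200
# random features. Roughly 5% will look "significant" at p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
outcome = rng.normal(size=100)

false_hits = 0
n_features = 200
for _ in range(n_features):
    noise_feature = rng.normal(size=100)  # unrelated by construction
    r, p = stats.pearsonr(noise_feature, outcome)
    if p < 0.05:
        false_hits += 1

print(f"{false_hits} of {n_features} pure-noise features look significant")
```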
Big Data brings a great deal of social responsibility
Data protection, correlation, representativeness, quality and informative value: technology does not care how it is used or abused. Yet big data is so important, a so-called “megatrend”, that insiders invest billions in the field and embark on the adventure.
Much experience is gathered along the way, and mistakes and progress alike are made in the attempt. The easiest path is a beginning; the most successful one is unknown. One therefore cannot demand ready-made solutions, but must bring the whole of society along and share the advantages and risks of the technology. The social debate will lead to a consensus on the role of morality, psyche and law in this innovation.
Big data problems seem preprogrammed
Conflicts lurk everywhere: surveillance, feedback, class formation, grouping, individualization and anonymization are only the first playing fields. From dragnet investigations to creditworthiness scores and the most intimate health data, big data gets under your skin.
Human decisions become constantly verifiable in digital space. Individual mistakes become potentially visible, to others and to oneself. A first taste? Take a look at which cluster Google has placed you in.
A consensus on big data is essential! The big data blog discusses it!
Big data promises not only new knowledge but new thinking. The systems of knowledge acquisition, and our understanding of knowledge as a basis of power, are changing dramatically at this very moment. The world formula seems within reach of global communication networks, and of experiments that plunge entire regions into controlled moods by manipulating the Facebook timeline.
The social scientist and systems theorist Niklas Luhmann wrote his books using a card-index database, his famous Zettelkasten: a box full of index cards bearing sentences, connected by references. The notes themselves were kept only in chronological order; the references were the pattern that enabled Luhmann to assemble his theses argumentatively. Today his theories are fundamental to understanding complex systems, whether social, technical or biological.
For over 30 years, IT developers have drawn on his theories. Big data now breaks out of this framework: the paper box is digitized, and the social role of data analysis is being rediscovered. Who owns the data? Who may examine it? Who watches over compliance with the rules?
This is discussed in the Big Data blog!