From Big to Data: Unfolded

AzerKaanDasdemir
Published in The Startup
Jan 7, 2021


Brief history of Big Data

[Image: a curve-shaped library with brown accents and thousands of books. Photo by Olenka Sergienko from Pexels]

The Heighway dragon belongs to an ever-growing family of self-similar fractal curves. Curves like it can be generated in various ways, for instance as Lindenmayer systems (L-systems), or simply by folding a strip of paper over and over. Each iteration follows certain rewriting rules: by sequentially replacing segments and applying pre-designated rules, you end up with a curve that is both mathematically and visually pleasing, one that grows more intricate as the number of iterations rises.

The tricky part is attempting to unfold these fractals: to do so, the observer must first determine how the pattern comes to be. An exemplary rule could be: “Replace each segment with two segments at a right angle, rotated 45° alternately to the right and to the left.”
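For the curious, here is a minimal Python sketch (my own illustration, not part of the original article) of one common L-system formulation of the Heighway dragon: axiom “FX”, rewriting rules X → X+YF+ and Y → -FX-Y, where F draws a segment and + / - are 90° turns.

```python
# Minimal L-system sketch of the Heighway dragon (illustrative only).
# Axiom "FX"; rules X -> "X+YF+", Y -> "-FX-Y".
# "F" means draw one segment; "+" and "-" mean turn 90 degrees.

def dragon_lsystem(iterations: int) -> str:
    rules = {"X": "X+YF+", "Y": "-FX-Y"}
    state = "FX"
    for _ in range(iterations):
        # Rewrite every symbol in parallel; symbols without a rule are kept as-is.
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state

if __name__ == "__main__":
    for n in range(4):
        print(n, dragon_lsystem(n))
```

Feeding the resulting string to a turtle-graphics routine draws successive iterations of the curve.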

Interpreting and visualizing data may enable us to run different scenarios and predict possible outcomes. The problem of having ‘too much’ uncategorized information at hand, and then extracting relevant data out of it, requires a meticulous and unconventional approach. The most prominent of these approaches is aptly named “Big Data”. Utilizing big data can help detect patterns among vast chunks of information derived from both abstract and physical settings, making it possible to retrace them and reveal further connections.

The explosion of information was not a dramatic one-time event; it had probably been quietly underway until a librarian at Wesleyan University lifted the veil in 1944.

Fremont Rider observed that a staggering increase in volumes would eventually clog the whole cataloging system, and he predicted that a storage and processing catastrophe was on the horizon. He had noticed that American university libraries were doubling in size every 16 years. In his words, by 2040 the Yale Library alone would hold “approximately 200,000,000 volumes, which will occupy over 6,000 miles of shelves” (The Scholar and the Future of the Research Library). This marks the literary milestone of the information explosion and of the problems the phenomenon would go on to cause, namely processing and storing the information.
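As a rough back-of-the-envelope illustration of that doubling claim (a sketch of my own; the starting figures below are hypothetical, not Rider's actual data):

```python
# Project a library's size under Rider's assumption that collections
# double every 16 years (illustrative only; the inputs below are hypothetical).

def projected_volumes(start_volumes: float, start_year: int,
                      target_year: int, doubling_period: float = 16) -> float:
    doublings = (target_year - start_year) / doubling_period
    return start_volumes * 2 ** doublings

# A hypothetical collection of ~2.4 million volumes in 1938, doubling
# every 16 years, works out to roughly 200 million volumes by 2040.
print(f"{projected_volumes(2_400_000, 1938, 2040):,.0f}")
```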

Unlike the Heighway dragon, the volume of a book is much more complicated to process, log, and interpret. Fremont Rider, while making these remarks, could not have predicted digitization; volumes were static. Now that we have more processing power, handling literary material is the least of our worries. The real challenge is the constant stream of information pouring in from hundreds of channels.

Thanks to unprecedented advancements across all disciplines of technology, whether intentionally or unintentionally, we have ended up with massive data sets at hand. It is now possible and feasible to detect and deduce the behavioral patterns of individuals. A great example would be the virtual heatmaps on touchscreens: collected data is filtered into meaningful instances while another system asynchronously tracks an individual’s search trends, coupled with their biometric information. All these data sets on an individual can then be pooled and condensed into further meaningful output. These micro-events eventually lead to better decision making, better strategic moves, and better prediction. In layman’s terms, employing Big Data could potentially render bad decisions impossible.

It is now not uncommon to predict future power outages on an electrical grid by combining a variety of data sets, including but not limited to records of previous outages. It is also feasible to factor in social events and behavioral analysis reports; data that seems irrelevant to the untrained eye may guide us toward surprisingly accurate decisions. We are witnessing the transition to the non-binary, which will enable us to achieve more with the data we have and transform noise into meaning. IoT devices, smart systems, and distributed processing are part of our everyday lives now, and the public and private initiatives that incorporate them have already demonstrated their benefits.

From a basic chat application to a fully autonomous self-driving vehicle, AI is on the path to becoming an indispensable part of our lives as well. AI software requires quality, to-the-point data to operate reliably. But there is a different aspect to this as well: a study by the Georgia Institute of Technology indicated that pedestrians with darker skin tones are more likely to be hit by a self-driving car than those with lighter skin. This instantly opens up a more serious debate on AI. Reliability alone cannot sustain the commercial and social changes brought about by automation and AI.

Obtaining, managing, and extracting data efficiently will not suffice. The market of the future requires properly labeled, ethical, diversified, and unbiased data processing. Delivering a product as intended requires quality input; failing to meet this standard may render previous efforts obsolete and can even hurt the business. Today, keeping humans in the loop of AI training is a preferred way of maintaining these values, but that too is subject to change.

Data visualization is bound to become futile at some point; we will not need a set of biological eyes to monitor or understand any of the flow at all. Computer output will be hyper-condensed, so much so that you will not be going through volumes of books to find the one you need; rather, the article you need will be provided intrinsically, in an instant.
