Big Data is a term for data sets so extraordinarily large and complex that traditional data processing applications are insufficient for making sense of them. Big data normally includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, process, and manage within an acceptable time frame. What is important to note is that the "big" in big data is a constantly moving target: it must be on the largest end of what is available at the time (e.g., in 2012 big data referred to a range of a few dozen terabytes, whereas today it refers to many petabytes of data).

One of the problems with the term big data is that it is really an umbrella term covering five different types of data processing that overlap in form but differ immensely, so there is usually a lot of inconsistency when the term is discussed. I will take a minute to explain how these differences manifest themselves and how we can understand each type's functionality and its relatedness to the others.

First is the classic, vanilla definition of Big Data, or big-D: classic predictive analytics, wherein you want to unveil trends and/or push the limits of scientific knowledge by mining unfathomable amounts of data.
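As a minimal sketch of the mining pattern behind this kind of predictive analytics, a data set far too large for memory can be streamed in chunks while keeping only running totals. The function name and the chunked layout are illustrative assumptions, not anything from the original text:

```python
def running_mean(chunks):
    """Compute a mean over arbitrarily many chunks in constant memory.

    `chunks` is any iterable of lists of numbers, e.g. batches read
    from disk one at a time, so the full data set never has to fit
    in memory at once.
    """
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)   # fold each batch into the running total
        count += len(chunk)
    return total / count if count else None
```

The same fold-into-a-running-summary idea is what lets frameworks scale this to petabytes across many machines.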

Next is what is referred to as fast data, or fast-D: the rapid analysis of a consumer's individual preference data, for example to generate a coupon the moment they pause by a store. Fast data sets are still extremely large, but what is paramount when addressing fast-D sets is that the desired result is delivered near-instantaneously. This can be understood in terms of a city's rail station using predictive analytics to estimate, at a moment's notice, delays and departure times an hour from now.
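The near-instant delivery described above can be sketched as a rolling time window over a live event stream: only recent events are kept, so an up-to-the-moment estimate is always ready. The class name, the 60-second window, and the delay events are all illustrative assumptions:

```python
from collections import deque
from datetime import datetime, timedelta

class RollingDelayEstimator:
    """Keeps only the last `window_seconds` of delay reports."""

    def __init__(self, window_seconds=60):
        self.window = timedelta(seconds=window_seconds)
        self.events = deque()  # (timestamp, delay_minutes) pairs

    def record(self, timestamp, delay_minutes):
        self.events.append((timestamp, delay_minutes))
        # Drop events that have fallen out of the window.
        while self.events and timestamp - self.events[0][0] > self.window:
            self.events.popleft()

    def current_estimate(self):
        """Average delay over the live window; None if no recent data."""
        if not self.events:
            return None
        return sum(d for _, d in self.events) / len(self.events)
```

Because each new report evicts stale ones as it arrives, the estimate is available in constant time no matter how large the overall stream grows.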

Next is a phenomenon referred to as dark data, or dark-D. Dark data, as its name suggests, is data that is available to you but is not easily accessible or is otherwise hard to acquire. This is understood more clearly when we consider that 80% of data is unstructured. One way dark data is beginning to be of service is through its medical applications, insofar as it is being used to consider where and how medical epidemics will arise, whom they will affect, what the long-term implications are, and, hopefully, how to address and treat them. This approach is beginning to be used to address the recent outbreak of the Zika virus in Brazil, and if it works well it will offer a hopeful look at the future and our medical prospects.
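Since so much data is unstructured, a first step with dark data is often pulling structured records out of free text. A hedged sketch of that idea, with the note format, the pattern, and the field names all assumed purely for illustration:

```python
import re

# Assumed free-text shape: "Patient aged <N> ... in <City> reported <symptom> ..."
CASE_PATTERN = re.compile(
    r"patient aged (?P<age>\d+).*?in (?P<city>[A-Za-z ]+?) reported (?P<symptom>\w+)",
    re.IGNORECASE,
)

def extract_cases(notes):
    """Turn free-text notes into structured dicts; skip unparseable ones."""
    cases = []
    for note in notes:
        match = CASE_PATTERN.search(note)
        if match:
            cases.append({
                "age": int(match.group("age")),
                "city": match.group("city").strip(),
                "symptom": match.group("symptom").lower(),
            })
    return cases
```

Once records like these are structured, they can feed the kind of epidemic-tracking analysis described above; real systems would use far more robust extraction than a single regex.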

Next is new data: new data pertains to information we could gather and need to gather but are unlikely to be gathering. Approximately 8.6 trillion gallons of water are lost through leaky pipes worldwide every year; that is enough to fill the entire reservoir behind the Hoover Dam. Gathering this kind of new data is one thing we are likely to improve upon, but until we do, we will still not understand very clearly what is actually going on.
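As a hedged illustration of what gathering such new data could enable: if pipe segments reported their inflow and outflow, leaks could be flagged from the difference. The segment ids, the reading format, and the 5% tolerance are all assumptions for the sketch:

```python
def flag_leaks(readings, tolerance=0.05):
    """Return segment ids whose outflow falls short of inflow by more
    than the given fractional tolerance (default 5%).

    `readings` is an iterable of (segment_id, inflow, outflow) tuples,
    as might come from hypothetical flow meters on each pipe segment.
    """
    leaking = []
    for segment, inflow, outflow in readings:
        if inflow > 0 and (inflow - outflow) / inflow > tolerance:
            leaking.append(segment)
    return leaking
```

The point is not the arithmetic but the data: without instrumenting the pipes in the first place, there is nothing for even this trivial check to run on.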