A big data system has four key layers, i.e. the different stages the data itself
has to pass through on its journey from raw statistic or snippet of unstructured
data (for example, a social media post) to actionable insight.
Data sources layer
This is where the data arrives at your organization. It
includes everything from your sales records, customer database, feedback,
social media channels, marketing lists and email archives to any data gleaned
from monitoring or measuring aspects of your operations. One of the first steps in
setting up a data strategy is assessing what you have here, and measuring it
against what you need to answer the critical questions you want help with. You
might have everything you need already, or you might need to establish new
sources.
Database storage layer
This is where your Big Data lives once it is gathered from your sources. As the
volume of data generated and stored by companies has started to explode,
sophisticated but accessible systems and tools, such as the Apache Hadoop
Distributed File System (HDFS) or Google File System, have been developed to
help with this task. A computer with a big hard disk
might be all that is needed for smaller data sets, but when you start to deal
with storing (and analyzing) truly big data, a more sophisticated, distributed
system is called for. As well as a system for storing data that your computer
system will understand (the file system), you will need a system for organizing
and categorizing it in a way that people will understand: the database. The
Hadoop ecosystem has its own, known as HBase, but others, including Amazon’s
DynamoDB, MongoDB and Cassandra (originally developed at Facebook), all of them
NoSQL databases, are popular too. This is where you might find the government
taking an interest in your activities: depending on the sort of data you are
storing, there may well be security and privacy regulations to follow.
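To make the idea of a distributed store a little more concrete, here is a minimal Python sketch of one simple way records could be spread across several machines: hashing each record's key to pick a node. This is only a toy illustration under stated assumptions. The node names, the sample records and the functions `node_for_key` and `place_records` are all hypothetical, and this is not how HDFS, HBase or any of the products named above actually place data (real systems also replicate data and rebalance it when nodes change).

```python
import hashlib


def node_for_key(key: str, nodes: list[str]) -> str:
    """Pick a storage node for a record key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def place_records(records: dict[str, dict], nodes: list[str]) -> dict[str, dict[str, dict]]:
    """Assign every record to exactly one node (no replication in this sketch)."""
    placement: dict[str, dict[str, dict]] = {node: {} for node in nodes}
    for key, value in records.items():
        placement[node_for_key(key, nodes)][key] = value
    return placement


# Hypothetical cluster and made-up sample records, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
records = {
    "customer:1": {"name": "Ada", "channel": "email"},
    "customer:2": {"name": "Grace", "channel": "social"},
    "customer:3": {"name": "Alan", "channel": "sales"},
}

placement = place_records(records, nodes)
for node, stored in placement.items():
    print(node, sorted(stored))
```

Because the hash is stable, the same key always maps to the same node, so a lookup only has to contact one machine rather than search them all.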
Database processing/analysis layer
When you want to use the data you have stored to find out something useful, you
will need to process and analyze it. A common method is to use a MapReduce tool.
Essentially, this selects the elements of the data that you want to analyze and
puts them into a format from which insights can be gleaned. If you are a large
organization that has invested in its own data analytics team, they will form a
part of this layer, too. They will employ tools such as Apache Pig or Hive to
query the data, and might use automated pattern recognition tools to determine
trends, as well as drawing their conclusions from manual analysis.
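The MapReduce idea described above can be sketched in a few lines of plain Python. This is a single-machine illustration of the pattern only, not Hadoop's API: the function names `map_phase`, `shuffle` and `reduce_phase` and the sample posts are assumptions made for the example. A real MapReduce framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(post: str):
    """Map: emit a (word, 1) pair for every word in one post."""
    for word in post.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


# Made-up social media posts, for illustration only.
posts = ["great service great price", "slow service"]

pairs = [pair for post in posts for pair in map_phase(post)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

print(counts)
```

The map step selects the elements of interest (here, individual words), and the reduce step puts them into a summarized form (a count per word) from which trends can be read.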
Database output layer
This is how the insights gleaned through the analysis are passed
on to the people who can take action to benefit from them. Clear and concise
communication (particularly if your decision-makers don’t have a background in
statistics) is essential, and this output can take the form of reports, charts,
figures and key recommendations. Ultimately, your Big Data system’s main task
is to show, at this stage of the process, how a measurable improvement in at
least one KPI can be achieved by taking action based on the analysis you
have carried out.
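One simple way to express a "measurable improvement" in a KPI is as a percentage change against a baseline. The sketch below assumes that definition. The function name `kpi_improvement` and the conversion-rate figures are made up purely for illustration and do not come from any real data set.

```python
def kpi_improvement(before: float, after: float) -> float:
    """Return the relative change in a KPI as a percentage of the baseline."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


# Hypothetical numbers: conversion rate (in percent) before and after acting
# on a recommendation from the analysis.
baseline_conversion = 2.0
new_conversion = 2.5

change = kpi_improvement(baseline_conversion, new_conversion)
print(f"Conversion rate changed by {change:.1f}%")
```

Reporting the change relative to a baseline gives decision-makers a single figure they can compare across different KPIs.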