Towards Big Data
Author: Rana el Sharqawy – IT Department at EFG Hermes.
Have you ever wondered what Big Data is, or what makes data “big”?
Why are organizations becoming more analytical and data-driven? What triggered the current boom in data science, and what should a data scientist be like?
We’re about to find out.
What sets big data apart is that its scale, diversity and distribution require different architectures and tools from those used for conventional data. In return, businesses can unlock new value through the insights and analyses these tools make possible. Big data has multiple characteristics, but three stand out:
- Huge volume of data (for instance, tools that can manage billions of rows and millions of columns).
- Complexity of data types and structures. An estimated 80-90% of the data in existence is unstructured: journal entries, Facebook posts and tweets arrive as free text and streams rather than as structured tables and files.
- Speed, or velocity, of new data creation.
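To make the contrast between structured and unstructured data concrete, here is a minimal Python sketch (an illustration, not a big data tool) that turns free-text posts into structured records by extracting hashtags and mentions with regular expressions. The posts and the `to_record` helper are hypothetical; real pipelines would run this kind of parsing at far larger scale.

```python
import re
from collections import Counter

# Hypothetical unstructured posts: free text with no fixed schema.
posts = [
    "Loving the new #BigData course with @analytics_team!",
    "Markets rallied today #stocks #BigData",
    "No tags here, just plain text.",
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def to_record(text: str) -> dict:
    """Turn one free-text post into a structured record with fixed fields."""
    return {
        "length": len(text),
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": MENTION.findall(text),
    }


records = [to_record(p) for p in posts]

# Once structured, the data can be aggregated like any table column.
tag_counts = Counter(tag for r in records for tag in r["hashtags"])
print(tag_counts.most_common(2))
```

The extra parsing step is exactly what makes unstructured data harder to work with than rows in a table: the structure has to be recovered before any analysis can start.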
Now that we have established a broader definition of big data, let's move on to data science. Data science uses machine learning algorithms to design and develop statistical models that extract knowledge from this mass of data.
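As a minimal example of a statistical model learned from data, the sketch below fits a simple linear regression by ordinary least squares. The data and the `fit_line` helper are hypothetical, and a real project would typically use a library such as scikit-learn or statsmodels rather than hand-written code.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: monthly marketing spend (x) vs. new accounts opened (y).
spend = [10, 20, 30, 40, 50]
accounts = [25, 44, 67, 84, 105]

slope, intercept = fit_line(spend, accounts)
print(f"accounts ~ {slope:.2f} * spend + {intercept:.2f}")
print("prediction for spend=60:", round(slope * 60 + intercept, 1))
```

With this toy data the fitted line is accounts = 2 * spend + 5, so the model predicts 125 new accounts at a spend of 60. The "knowledge" here is the learned relationship, which can then be applied to cases the data never contained.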
This raises the question: do organizations really practice data science, or is it just business intelligence? What is the difference between the two?
Business Intelligence (BI) focuses on using a consistent set of metrics to measure past business performance and inform business planning. This includes defining key performance indicators (KPIs), the most essential metrics for measuring your business.
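A BI-style KPI report can be sketched as a fixed set of metrics computed over historical records. The sales figures, the `kpis` helper and the choice of KPIs (revenue, average order value, quarter-on-quarter growth) are hypothetical examples.

```python
# Hypothetical quarterly sales records: (quarter, revenue, number_of_orders)
sales = [
    ("Q1", 120_000, 400),
    ("Q2", 150_000, 450),
    ("Q3", 135_000, 500),
]


def kpis(records):
    """Compute the same backward-looking KPIs for every quarter."""
    report = []
    previous_revenue = None
    for quarter, revenue, orders in records:
        growth = None
        if previous_revenue:  # no growth figure for the first quarter
            growth = (revenue - previous_revenue) * 100 / previous_revenue
        report.append({
            "quarter": quarter,
            "revenue": revenue,
            "avg_order_value": revenue / orders,
            "revenue_growth_pct": growth,
        })
        previous_revenue = revenue
    return report


for row in kpis(sales):
    print(row)
```

Q2 shows 25% growth over Q1 and Q3 a 10% decline: a consistent, repeatable answer to “what happened last quarter?”, which is the kind of question BI is built for.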
Data science and predictive analytics refer to a combination of analytical and machine learning techniques used for drawing inferences and insight from data. Data scientists have a knack for uncovering hidden insights that may not be obvious to organization stakeholders or even to business analysts, and they are always curious about outcomes, numbers and the drivers behind them. This combination of techniques includes approaches such as regression analysis, association rules (e.g., Market Basket Analysis), optimization techniques, and simulations (e.g., Monte Carlo simulation to model scenario outcomes). These more advanced techniques can answer higher-order questions and derive greater value for an organization.
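Association rules from Market Basket Analysis can be illustrated with a small sketch that scores rules of the form "customers who buy A also buy B" using the standard support, confidence and lift measures. The baskets, the `pair_rules` helper and the 0.4 support threshold are assumptions for illustration; production work would usually rely on an algorithm such as Apriori or FP-Growth over much larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]


def pair_rules(transactions, min_support=0.4):
    """Score rules A -> B for item pairs by support, confidence and lift."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. P(rhs) alone
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


for lhs, rhs, sup, conf, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

In this toy data every basket containing butter also contains bread (confidence 1.0), and a lift above 1 suggests the two items are bought together more often than chance alone would predict.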
To make things clearer, BI answers common questions such as “what happened last quarter?”, “how much did we sell?”, and “where is the problem?” through standard and ad hoc reporting, dashboards, queries and structured data. Data science, on the other hand, looks to the future, answering questions such as “what if?”, “what is the optimal scenario for our business?”, “what will happen next?”, and “what if these trends continue?” through optimization, predictive modeling, forecasting and statistical analysis of both structured and unstructured data.
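The forward-looking “what if these trends continue?” question can be explored with a Monte Carlo simulation, one of the techniques named earlier. This sketch assumes, purely for illustration, that quarterly revenue growth is normally distributed with a 3% mean and a 5% standard deviation; the starting value, the target and the `simulate_revenue` helper are hypothetical.

```python
import random


def simulate_revenue(start, mean_growth, volatility, quarters, runs, seed=42):
    """Monte Carlo sketch: simulate many future revenue paths with random growth."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    finals = []
    for _ in range(runs):
        revenue = start
        for _ in range(quarters):
            # Each quarter's growth rate is drawn from a normal distribution.
            revenue *= 1 + rng.gauss(mean_growth, volatility)
        finals.append(revenue)
    return finals


# Hypothetical assumptions: 3% average quarterly growth, 5% volatility, 1 year.
finals = simulate_revenue(start=100.0, mean_growth=0.03, volatility=0.05,
                          quarters=4, runs=10_000)
target = 110.0
prob = sum(f >= target for f in finals) / len(finals)

finals.sort()
low = finals[int(0.05 * len(finals))]   # approximate 5th percentile
high = finals[int(0.95 * len(finals))]  # approximate 95th percentile
print(f"P(revenue >= {target}) ~ {prob:.2f}")
print(f"90% of simulated outcomes fall between {low:.1f} and {high:.1f}")
```

Rather than a single forecast, the simulation yields a distribution of outcomes, so stakeholders can ask how likely a target is and how wide the plausible range is under the stated assumptions.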
Organizations need both BI and data science to meet emerging business challenges successfully, covering the past, the present and the future.
With all this in mind, here are five key competencies and behavioral characteristics of a data scientist:
- Quantitative skills, such as a grounding in mathematics and statistics.
- Technical aptitude, such as software engineering, machine learning, and programming skills.
- Skepticism. This may seem a counter-intuitive trait; however, data scientists must be able to examine their own work critically.
- Curiosity and creativity. Data scientists must be passionate about data and finding creative ways to solve problems and portray information.
- Communication and collaboration. It is not enough to have strong quantitative or engineering skills. To make a project resonate, you must be able to articulate its business value in a clear way and work collaboratively with project sponsors and key stakeholders.
How many of these characteristics do you think you have?