Introduction to Data Science (Ws)

Posted by Shaifali's Blog on September 8, 2013


Data Science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management and preservation of large collections of information. ― Jeffrey Stanton, Syracuse University School of Information Studies

From Wikipedia: Data science incorporates varying elements and builds on techniques and theories from many fields, including math, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modelling, data warehousing, and high performance computing with the goal of extracting meaning from data, finding relationships in data and creating data products.


In the 19th century or so there was a major change in the nature of solving problems. Apart from becoming more abstract, the primary focus shifted from doing calculations or following procedures to analysing relationships. The change in emphasis wasn’t arbitrary; it came about through the complexity of the world. Data Science puts its emphasis on finding unknown relationships in data.

Meanwhile, the Internet gathered pace, and as computing moved from desktop applications to web applications and technology kept improving, huge amounts of data became available everywhere. What is needed now is to work on this data! All truths are easy to understand once they are discovered; the point is to discover them.

Lots and lots and lots of data, lots and lots and lots of people, lots and lots and lots of places, constant change, ever increasing data. Data Science is the new reality. Treating data as a programmable resource, and versioning, filtering, aggregating, extending, automating, analysing and communicating it, has become a need, not only from a business perspective but also for human welfare.
Data is ever growing, never at rest!


Wherever there is data, Data Science follows. Whether the data is small or large in scale, Data Science is used: from a new start-up with little data to big organizations with huge data, whenever one needs to understand, process and extract value from data, Data Science is applied.

What is reviewed

The following are the three skill areas of data geeks (given by Drew Conway):

  • Hacking Skills refers to programming knowledge and skills.
  • Math and Statistics refers to traditional research, the way one handles data. Math is also described as the science of patterns, and here it is applied to patterns in data.
  • Substantive Expertise refers to basic knowledge about the data itself.

Data Scientist

Three types of Knowns and Unknowns:

  1. There are Known Knowns: things we know that we know.
  2. There are Known Unknowns: things we know that we don’t know.
  3. There are Unknown Unknowns: things we don’t know we don’t know.

A Data Scientist is the one who discovers the unknowns and makes them known. They need to find nuggets of truth in data and then explain them to business leaders.

How it works

Understanding how Data Science is applied to data is a never-ending pursuit, because every data set may require a different technique or algorithm to analyse and interpret it. Still, the tasks listed below form the usual procedure for data analysis and communication.

Types of tasks in Data Science:

  1. Preparing to run a model: gathering, cleaning, integrating, restructuring, transforming, loading, filtering, deleting, combining, merging, verifying, etc.
  2. Running the model.
  3. Communicating the result: visualizing and interpreting the final result.
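
To make the three tasks a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `RAW_ROWS` data (advertising spend against sales) is made up, the function names `prepare`, `run_model` and `communicate` are hypothetical, and a straight-line least-squares fit stands in for "the model". Real projects will need different cleaning steps and usually a different model.

```python
from statistics import mean

# Hypothetical raw records (advertising spend, sales) as text.
# Some rows are unusable and one is a duplicate.
RAW_ROWS = [
    ("1", "3"), ("2", "5"), ("", "4"), ("3", "7"),
    ("4", "9"), ("n/a", "2"), ("2", "5"),
]


def prepare(rows):
    """Task 1: clean, transform and de-duplicate the raw rows."""
    cleaned = []
    for x_text, y_text in rows:
        try:
            pair = (float(x_text), float(y_text))  # transform text to numbers
        except ValueError:
            continue  # delete rows that cannot be parsed
        if pair not in cleaned:  # drop exact duplicates
            cleaned.append(pair)
    return cleaned


def run_model(pairs):
    """Task 2: fit y = slope * x + intercept by ordinary least squares."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def communicate(slope, intercept):
    """Task 3: turn the fitted numbers into a sentence a reader can act on."""
    return (
        f"Each extra unit of spend goes with about {slope:.2f} more units "
        f"of sales (baseline {intercept:.2f})."
    )


data = prepare(RAW_ROWS)
slope, intercept = run_model(data)
print(communicate(slope, intercept))
```

In this sketch most of the code sits in the first task, which matches the long list of preparation activities above. The last step here only prints a sentence; in practice it would usually include a chart as well.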