Data Science techniques
What is data science?
Data science is a multidisciplinary field that uses a range of techniques to extract knowledge from data, draw insights, and solve analytical problems. The end goal is usually to create business value.
The fields involved in data science include mathematics, statistics, information science, and computer science.
Data science has been growing in importance due to the rise of 'big data'. The increasing volume of data, much of it in unstructured form, makes it harder to manage and analyse with traditional methods. Data science has therefore become an important field for dealing with these issues.
What are the techniques of data science?
There is a wealth of techniques used by data scientists; some of these include:
Linear Regression: This is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. Graphically, it fits a straight line through the data points, and that line can then be used to predict the target (dependent) variable.
Clustering: This is where you divide and sort data points into groups so that the points within each group share similar traits. There are two types of clustering: hard clustering, where each data point either belongs to a group or it doesn't, and soft clustering, where each data point is given a probability of belonging to each group.
Association analysis: This is where machine learning models analyse the records in a database for patterns and consequently identify 'if-then' associations, also known as 'association rules'. After this analysis, you can see which associations occur most commonly.
Logistic Regression: This type of model, frequently used in statistics, uses a logistic curve (the logistic function) to model a binary dependent variable, which makes it well suited to classification problems with two possible outcomes.
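The linear regression technique described above can be illustrated with a minimal sketch of simple linear regression (one independent variable) fitted by ordinary least squares, in plain Python with no third-party libraries. The function names and the toy advertising-spend data are hypothetical illustrations, not taken from any particular tool; in practice a library such as scikit-learn or statsmodels would usually be used.

```python
from typing import Sequence, Tuple


def fit_simple_linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope). Hypothetical helper for illustration.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x) (the 1/n factors cancel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("independent variable has zero variance")
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept: float, slope: float, x: float) -> float:
    """Predict the target (dependent) variable for a new value of x."""
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical toy data: advertising spend (x) vs. sales (y)
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [3.1, 4.9, 7.2, 8.8, 11.0]
    b0, b1 = fit_simple_linear_regression(spend, sales)
    print(f"intercept={b0:.3f}, slope={b1:.3f}, prediction at x=6: {predict(b0, b1, 6.0):.3f}")
```

The closed form keeps the idea visible: the slope measures how much y moves per unit of x, and the intercept anchors the line at the means. With several independent variables the same least-squares idea generalises to solving the normal equations.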
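The difference between hard and soft clustering described above can be contrasted in a small sketch. K-means, a common hard-clustering algorithm chosen here as an example, assigns every point to exactly one group; a soft view then gives each point a membership weight in every group, with the weights summing to 1. The soft step reuses the fuzzy c-means membership formula (fuzzifier m = 2) on the k-means centroids. This is a simplification for illustration: full soft-clustering methods such as fuzzy c-means or Gaussian mixture models also re-estimate the centroids from those memberships. All function names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: _dist2(p, centroids[c]))


def k_means(points: Sequence[Point], k: int, iterations: int = 100,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hard clustering: every point gets exactly one cluster label."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # k distinct data points as starting centroids
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep the old centroid of an empty cluster
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, [_nearest(p, centroids) for p in points]


def soft_memberships(point: Point, centroids: Sequence[Point]) -> List[float]:
    """Soft clustering: membership weight in each cluster (weights sum to 1).

    Fuzzy c-means membership with fuzzifier m = 2, which makes each weight
    proportional to 1 / (squared distance to that centroid).
    """
    d2 = [_dist2(point, c) for c in centroids]
    for i, d in enumerate(d2):
        if d == 0.0:  # point sits exactly on a centroid
            return [1.0 if j == i else 0.0 for j in range(len(d2))]
    inverse = [1.0 / d for d in d2]
    total = sum(inverse)
    return [w / total for w in inverse]


if __name__ == "__main__":
    # Hypothetical 2-D data with two obvious groups
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centres, hard_labels = k_means(data, k=2)
    print("hard labels:", hard_labels)
    print("soft memberships of (5, 5):", soft_memberships((5.0, 5.0), centres))
```

A point near one group gets a weight close to 1 for that group, while a point halfway between groups gets split weights; that graded answer is what hard labels cannot express.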
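The association analysis described above can be sketched by scoring 'if-then' rules directly, assuming each transaction is a set of items (a hypothetical shopping-basket example). The sketch checks every ordered pair of single items by brute force and keeps rules that meet a minimum support (how often the two items occur together) and a minimum confidence (how often the 'then' item appears given the 'if' item). This is a simplification for illustration, not the Apriori algorithm, which prunes candidate itemsets so it can scale to larger rules and datasets; the function name and thresholds are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, List, Sequence, Tuple

# (if_item, then_item, support, confidence)
Rule = Tuple[str, str, float, float]


def pairwise_rules(transactions: Sequence[FrozenSet[str]],
                   min_support: float, min_confidence: float) -> List[Rule]:
    """Find 'if A then B' rules between single items by brute force.

    support(A -> B)    = share of transactions containing both A and B
    confidence(A -> B) = transactions with A and B / transactions with A
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for a, b in permutations(items, 2):  # ordered pairs: A -> B differs from B -> A
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        if count_a == 0:
            continue
        support = count_ab / n
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first: highest confidence, then highest support
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


if __name__ == "__main__":
    baskets = [
        frozenset({"bread", "butter"}),
        frozenset({"bread", "butter", "milk"}),
        frozenset({"bread", "jam"}),
        frozenset({"milk"}),
    ]
    for a, b, s, c in pairwise_rules(baskets, min_support=0.5, min_confidence=0.6):
        print(f"if {a} then {b}: support={s:.2f}, confidence={c:.2f}")
```

With the four example baskets and these thresholds, the only rules kept are 'if butter then bread' (confidence 1.00) and 'if bread then butter' (confidence 0.67), both with support 0.50. Note that the two directions of the same pair can have different confidence.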
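The logistic regression model described above can be shown with a minimal sketch of the binary case in plain Python: the logistic function turns a linear score into a probability between 0 and 1, and the weights are fitted by batch gradient descent on the log-loss. The learning rate, number of epochs, 0.5 decision threshold, function names and toy data (a single standardised feature) are assumptions chosen for illustration; libraries such as scikit-learn provide tuned, regularised implementations.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function: squashes any real number into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # this form avoids overflow for large negative z
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that the binary outcome is 1; w = [bias, w1, ..., wd]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def predict_class(w: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> int:
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_proba(w, x) >= threshold else 0


def fit_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                            lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit [bias, w1, ..., wd] by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical toy data: one standardised feature, binary outcome
    X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    w = fit_logistic_regression(X, y)
    print("weights:", w)
    print("P(y=1 | x=0.8):", predict_proba(w, [0.8]), "class:", predict_class(w, [0.8]))
```

Unlike linear regression, the output is a probability, so classification needs a threshold; 0.5 is the usual default, but it can be moved when one kind of error is costlier than the other.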
What are the main phases in data science?
A data science project typically moves through several phases to reach its end goal. These can include:
Discovery: After the business problem has been framed, this stage involves formulating the initial hypotheses. It is also necessary to evaluate the resources the project will need.
Data Preparation: This is where you search for, pre-process, and ready the data needed for the modelling process. This may involve preparing the analytics sandbox.
Model planning: This stage involves planning the methods and techniques needed to draw relevant results.
Model building: Following planning, you combine the chosen methods and techniques to form a model.
Putting the model into practice: Once built, the model is put into practice by running the data through it.
Present results: After collecting all the results, translate them into a clear and concise presentation.
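The phases above can be sketched as a tiny end-to-end pipeline. This is one possible mapping of the phases onto code, assuming a single numeric feature, a least-squares line as the model, a held-out test set, and mean absolute error as the summary metric; every function name and all of the data are hypothetical. Discovery (framing the problem and the hypotheses) happens before any code is written, so it has no function here.

```python
import random
from statistics import mean
from typing import Dict, List, Optional, Tuple

Record = Tuple[Optional[float], Optional[float]]  # (feature x, target y); None marks a missing value
Pair = Tuple[float, float]


def prepare(records: List[Record]) -> List[Pair]:
    """Data preparation: keep only complete records."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def hold_out(data: List[Pair], test_fraction: float = 0.25, seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Model planning (evaluation side): set aside records the model will not see while it is built."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def build(train: List[Pair]) -> Tuple[float, float]:
    """Model building: least-squares line y = a + b * x (assumes x is not constant)."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return my - b * mx, b


def put_into_practice(model: Tuple[float, float], records: List[Pair]) -> List[Pair]:
    """Run data through the model: returns (actual, predicted) pairs."""
    a, b = model
    return [(y, a + b * x) for x, y in records]


def present(results: List[Pair]) -> Dict[str, float]:
    """Present results: a concise summary instead of raw predictions."""
    return {
        "records_scored": float(len(results)),
        "mean_absolute_error": mean(abs(actual - pred) for actual, pred in results),
    }


if __name__ == "__main__":
    # Hypothetical data: y is roughly 2x + 1, plus two incomplete records
    raw: List[Record] = [(float(i), 2.0 * i + 1.0 + random.Random(i).uniform(-0.5, 0.5)) for i in range(20)]
    raw += [(None, 4.0), (7.5, None)]
    train, test = hold_out(prepare(raw))
    model = build(train)
    print("intercept, slope:", model)
    print(present(put_into_practice(model, test)))
```

Keeping each phase in its own function makes it easy to swap one step (for example, a different model in `build`) without touching the others.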
To find techniques and templates for data science, please refer to the tools on Eloquens below.
Most popular techniques
- Excel template to create a Non-ribbon Sankey Diagram, used to depict connections between one dataset and another. ($10.00, by Hicham Bou Habib)
- Decision Tree Algorithm & Analysis: Edureka gives a comprehensive tutorial on decision tree analysis with the help of examples. (Free, by Edureka)
- Data Science in Audit Investment Valuation Testing PSX: Testing the year-end investment valuation of PSX-listed companies.
- How To Correctly Validate Machine Learning Models: Whitepaper discussing the 4 main components for correctly validating machine learning models. (Free, by RapidMiner)
- Machine Learning Algorithms Tutorial: Teaching the basics of machine learning, along with the ways in which you can use machine learning for problem solving. (Free, by Edureka)
- The Top 5 Algorithms used in Data Science: This video discusses the 5 most widely used algorithms in data science and how to use them. (Free, by Edureka)