#### How To Correctly Validate Machine Learning Models


*by RapidMiner*

Data science is a multidisciplinary field that uses a range of techniques to extract data, draw insights, and solve analytical problems, normally with the end goal of creating business value.

The fields involved in data science include mathematics, statistics, information science, and computer science.

Data science has been growing in importance due to the rise of 'big data'. Data sets are now larger and less structured, which makes them harder to manage and analyse. Data science has therefore become an important field for dealing with these issues.

Data scientists use a wealth of techniques. Some of these include:

- **Linear Regression**: A linear approach that models the relationship between a dependent variable and one or more independent variables in order to predict a target variable. With a single independent variable, the fitted model can be drawn as a straight line.
- **Clustering**: This is where you divide and sort data points into groups so that the points within each group share similar traits. There are two types of clustering: hard clustering, where each data point either belongs to a group or does not, and soft clustering, where each data point is given a probability of belonging to each group.
- **Association analysis**: This is where machine learning models analyse the data points in a database for patterns and identify 'if-then' associations, also known as 'association rules'. After this analysis, you are able to see the most commonly occurring associations.
- **Logistic Regression**: This type of model, frequently used in statistics, uses a logistic curve, or logistic function, to model a binary dependent variable, which makes it suitable for classification problems.
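The four techniques listed here can be illustrated with a minimal Python sketch. It is not taken from any of the resources on this page: the function names (`fit_line`, `hard_assign`, `soft_assign`, `rule_confidence`, `logistic`), the one-dimensional data, and the exponential distance weighting used for soft clustering are all assumptions chosen for illustration, and a real project would normally rely on an established library instead.

```python
import math


def fit_line(xs, ys):
    """Simple linear regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hard_assign(point, centres):
    """Hard clustering: the point belongs to exactly one group (nearest centre)."""
    return min(range(len(centres)), key=lambda i: abs(point - centres[i]))


def soft_assign(point, centres):
    """Soft clustering: a membership probability for every group.

    The exp(-distance^2) weighting is an illustrative assumption.
    """
    weights = [math.exp(-((point - c) ** 2)) for c in centres]
    total = sum(weights)
    return [w / total for w in weights]


def rule_confidence(transactions, antecedent, consequent):
    """Association rule 'if antecedent then consequent': share of the
    transactions containing the antecedent that also contain the consequent."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)


def logistic(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data, used only to exercise the functions above.
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # points lie exactly on y = 2x + 1
print(hard_assign(2.0, [0.0, 5.0]))          # index of the nearest centre
print(soft_assign(2.0, [0.0, 5.0]))          # probabilities that sum to 1
baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
print(rule_confidence(baskets, {"bread"}, {"butter"}))  # 2 of 3 bread baskets
print(logistic(0.0))                         # a score of 0 maps to 0.5
```

In a logistic regression, the score passed to `logistic` would be a weighted sum of the input variables, and the output would be read as the probability of the positive class.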

Data science involves many stages on the way to the end goal. These can include:

- **Discover**: Once the business problem has been framed, this stage involves formulating the initial hypothesis. It is also necessary to evaluate the resources needed for the project.
- **Data Preparation**: This is where you search for, pre-process, and ready the data needed for the modelling process. This may involve preparing the analytics sandbox.
- **Model planning**: This stage requires planning the methods and techniques that are needed in order to draw relevant results.
- **Model building**: Following planning, you collate the methods and techniques so that they form a model, and put the model into practice by running the data through it.
- **Present results**: After collecting all the results, it is necessary to translate them into a clear and concise presentation.


To find techniques and templates for data science, please refer to the tools on Eloquens below.

#### How To Correctly Validate Machine Learning Models

Whitepaper discussing the 4 main components for correctly validating machine learning models.

*by RapidMiner*

#### Machine Learning Algorithms Tutorial

Teaching the basics of machine learning, along with the ways in which you can use machine learning for problem solving.

*by Edureka*

#### The Top 5 Algorithms used in Data Science

This video discusses the 5 most widely used algorithms in Data Science and how to use them.

*by Edureka*

#### How to Learn Machine Learning in 6 Months

Senior Data Scientist Zach Millar explains how you can learn machine learning in 6 months through a roadmap process.

*by IDEAS*

#### How to Apply INDEX and MATCH Separately and Combined | Advanced Excel

Learn how to apply both functions, INDEX and MATCH, separately and combined on Excel.

#### How to Classify Data | Types of Data

Read our article to find out the two main ways of classifying data.

#### How to Use Student's T Distribution

Learn everything you need to know about Student's T Distribution.

#### How to Create a Database

Learn more about basic database terminology before you start coding.

#### How to Differentiate Database and Spreadsheet

In this post, we will focus on the differences between database vs spreadsheet.

#### Building Robust Machine Learning Models

This presentation focuses on the fundamentals of building robust machine learning models.

#### Measuring Model Performance

Video tutorial on how to measure your model's performance.

*by Data Camp*

#### Choosing the Right Machine Learning Algorithm

Seth Mottaghinejad discusses the things we should be thinking about when choosing a machine learning algorithm.

#### Linear Regression Algorithm Tutorial

Edureka explains the basics of linear regression with the use of examples and use cases.

*by Edureka*

#### Decision Tree Algorithm & Analysis

Edureka gives a comprehensive tutorial on decision tree analysis with the help of examples.

*by Edureka*

#### How to use and implement the Interpolate-Lookup function

This is a detailed guide on how to use and implement the Interpolate-Lookup function.

#### How to Include Dummy Variables into a Regression

Learn how to include Dummy Variables into a Regression.

#### Data Science in Audit Investment Valuation Testing PSX

Testing year-end Investment Valuation of PSX listed companies

#### Data Science for Audit- Dividend Income Testing

Data Science for Audit, Testing Dividend Income Using Python

#### Python for Audit Testing (Valuation)

Data Science in External Auditing

#### How to Define Relational Database Essentials

Learn about the two main types of databases.

#### How to Add a Second “if” Statement | ELIF

Learn an elegant way of adding a second “if” statement to one of our expressions.

#### How to Define Python Tuples

Python tuples are another type of data sequences, but differently to lists, they are immutable...

#### How to Use Conditionals and Loops in Python

Let’s see how to combine conditionals and loops in Python.

#### How to Measure Asymmetry with Skewness

The most commonly used tool to measure asymmetry is skewness. Learn more about it by checking out this article.

#### How to Handle large data tables with ease | VLOOKUP COLUMN and ROW

Learn how to handle large data tables with ease!

#### How to Use the Simple Linear Regression Model | Geometrical Representation

Find out how to use the simple linear regression model through geometrical representation.

#### How to Use VLOOKUP and MATCH in Excel

We’ve seen several function combinations so far. In this lesson, we’ll present another one that can be useful.

#### How to Perform the Population vs Sample Data Check

The first step of every statistical analysis you will perform is the population vs sample data check.

#### How to Define and Use the Simple Linear Regression Model

Learn everything you need to know about the simple linear regression model.


### Prof. Ed Bodmer

Consultant and Workshop Leader

### 365 Data Science

Data Science Education Platform

### Rizwan Ahmed Surhio

Experienced Auditor with Data Analytics Skills

### RapidMiner

Data Science on an Enterprise Scale

### Edureka

Interactive e-learning platform.

### Data Science Dojo

Data Science for everyone.

### Data Camp

Data Camp's aim is to enable you to become a data science expert through practical learning.

### Conference Videos

Conference Videos allows you to watch the TechEd North America Conference from your computer.

### IDEAS

International Data Engineering and Science Association
