# A simple explanation of Naive Bayes Classification

###### Posted By: Anonymous

I am finding it hard to understand the process of Naive Bayes, and I was wondering if someone could explain it with a simple step-by-step process in English. I understand it makes comparisons using counts of occurrences as probabilities, but I have no idea how the training data is related to the actual dataset.

Please give me an explanation of what role the training set plays. I am giving a very simple example with fruits here, banana for instance:

```
training set---
round-red
round-orange
oblong-yellow
round-red
dataset----
round-red
round-orange
round-red
round-orange
oblong-yellow
round-red
round-orange
oblong-yellow
oblong-yellow
round-red
```

## Solution

Your question, as I understand it, is divided into two parts: part one being that you need a better understanding of the Naive Bayes classifier, and part two being the confusion surrounding the training set.

In general, all machine learning algorithms need to be trained for supervised learning tasks like classification and prediction, or for unsupervised learning tasks like clustering.

During the training step, the algorithms are taught with a particular input dataset (the training set) so that later on we may test them on unknown inputs (which they have never seen before), which they can then classify or predict (in the case of supervised learning) based on their learning. This is what most machine learning techniques like neural networks, SVMs, Bayesian classifiers, etc. are based upon.

So in a general machine learning project, you basically have to divide your input set into a development set (training set + dev-test set) and a test set (or evaluation set). Remember, your basic objective is for your system to learn and classify new inputs it has never seen before, in either the dev set or the test set.

The test set typically has the same format as the training set. However, it is very important that the test set be distinct from the training corpus: if we simply reused the training set as the test set, then a model that simply memorized its input, without learning how to generalize to new examples, would receive misleadingly high scores.

In general, as an example, 70% of our data can be used as the training set. Also remember to partition the original set into the training and test sets *randomly*.
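As a minimal sketch of such a random 70/30 split, using only the standard library (the fruit label names here are my own assumption, purely for illustration, and are not taken from the question):

```python
import random

# Illustrative (features, label) pairs loosely mirroring the fruit example;
# the label names (apple/orange/banana) are assumed, not from the question.
data = [("round-red", "apple")] * 4 + \
       [("round-orange", "orange")] * 3 + \
       [("oblong-yellow", "banana")] * 3

random.seed(42)       # fixed seed so the split is reproducible
random.shuffle(data)  # partition the original set *randomly*

split = int(0.7 * len(data))   # 70% of the data for training
training_set = data[:split]
test_set = data[split:]

print(len(training_set), len(test_set))  # 7 3
```

In a real project you would then fit the classifier on `training_set` only and report accuracy on `test_set`.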

Now I come to your other question about Naive Bayes.

To demonstrate the concept of Naïve Bayes classification, consider the following example: a collection of objects, each classified as either `GREEN` or `RED`. Our task is to classify new cases as they arrive, i.e., to decide which class label they belong to, based on the currently existing objects.

Since there are twice as many `GREEN` objects as `RED`, it is reasonable to believe that a new case (which hasn’t been observed yet) is twice as likely to have membership `GREEN` rather than `RED`. In Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of `GREEN` and `RED` objects, and are often used to predict outcomes before they actually happen.

Thus, we can write:

**Prior Probability of GREEN**: `number of GREEN objects / total number of objects`

**Prior Probability of RED**: `number of RED objects / total number of objects`

Since there is a total of `60` objects, `40` of which are `GREEN` and `20` `RED`, our prior probabilities for class membership are:

**Prior Probability for GREEN**: `40 / 60`

**Prior Probability for RED**: `20 / 60`
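These two priors are easy to sanity-check in a couple of lines of Python:

```python
# Object counts from the example above.
total_objects = 60
n_green, n_red = 40, 20

prior_green = n_green / total_objects  # 40/60, i.e. 2/3
prior_red = n_red / total_objects      # 20/60, i.e. 1/3

print(prior_green, prior_red)
```

Note that the two priors sum to 1, as the probabilities of a complete set of mutually exclusive classes must.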

Having formulated our prior probability, we are now ready to classify a new object `X` (a `WHITE` circle in the original illustration). Since the objects are well clustered, it is reasonable to assume that the more `GREEN` (or `RED`) objects in the vicinity of `X`, the more likely it is that the new case belongs to that particular color. To measure this likelihood, we draw a circle around `X` which encompasses a number (to be chosen a priori) of points irrespective of their class labels. Then we calculate the number of points in the circle belonging to each class label. From this we calculate the likelihoods:

**Likelihood of X given GREEN**: `number of GREEN in the vicinity of X / total number of GREEN cases`

**Likelihood of X given RED**: `number of RED in the vicinity of X / total number of RED cases`

From these counts, it is clear that the likelihood of `X` given `GREEN` is smaller than the likelihood of `X` given `RED`, since the circle encompasses `1` `GREEN` object and `3` `RED` ones. Thus:

**Likelihood of X given GREEN**: `1 / 40`

**Likelihood of X given RED**: `3 / 20`

Although the prior probabilities indicate that `X` may belong to `GREEN` (given that there are twice as many `GREEN` as `RED`), the likelihood indicates otherwise: the class membership of `X` is `RED` (given that there are more `RED` objects in the vicinity of `X` than `GREEN`). In Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a posterior probability using the so-called Bayes’ rule (named after Rev. Thomas Bayes, 1702–1761).

Combining the two sources for each class:

**Posterior probability of X being GREEN**: `4/6 × 1/40 = 1/60`

**Posterior probability of X being RED**: `2/6 × 3/20 = 1/20`

Finally, we classify `X` as `RED`, since its class membership achieves the largest posterior probability.
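The whole walk-through (priors, likelihoods, posteriors) can be reproduced in a short Python sketch; the counts are exactly those from the example:

```python
# Class counts from the example: 40 GREEN and 20 RED objects in total,
# with 1 GREEN and 3 RED inside the circle drawn around X.
totals = {"GREEN": 40, "RED": 20}
in_circle = {"GREEN": 1, "RED": 3}
total_objects = sum(totals.values())  # 60

posterior = {}
for label in totals:
    prior = totals[label] / total_objects          # e.g. 40/60 for GREEN
    likelihood = in_circle[label] / totals[label]  # e.g. 1/40 for GREEN
    posterior[label] = prior * likelihood          # prior × likelihood

# X is assigned to the class with the largest posterior probability.
prediction = max(posterior, key=posterior.get)
print(prediction)  # RED
```

The posteriors here are unnormalised (dividing both by their sum would make them proper probabilities), but normalisation does not change which class wins.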

###### Answered By: Anonymous

Disclaimer: This content is shared under creative common license cc-by-sa 3.0. It is generated from StackExchange Website Network.