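Naive Bayes Classifiers - TECHCODEVIEW.COM - AI-ML-DS with Python

Naive Bayes classifiers are a family of algorithms based on Bayes' Theorem. Despite the "naive" assumption of feature independence, these classifiers are widely used in machine learning for their simplicity and efficiency. This article covers their theory, implementation, and applications, and shows why they remain practically useful despite that oversimplified assumption.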

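What are Naive Bayes Classifiers?

Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. It is not a single algorithm but a family of algorithms that all share a common principle: every pair of features being classified is independent of each other, given the class. To start with, let us consider a dataset.

One of the simplest and most effective classification algorithms, the Naive Bayes classifier helps build machine learning models quickly and makes fast predictions.

The Naive Bayes algorithm is used for classification problems, and it is especially common in text classification. In text classification tasks the data is high-dimensional, since each word represents one feature. It is used in spam filtering, sentiment detection, rating classification, and similar tasks. Its main advantage is speed: training and prediction are fast, even on high-dimensional data.

The model is a probabilistic classifier: it predicts the probability that an instance belongs to a class, given a set of feature values. It assumes that each feature is independent of the presence of any other feature, so each feature contributes to the prediction with no relation to the others. In the real world this condition is rarely satisfied. Bayes' Theorem is used for both training and prediction.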

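Why is it called Naive Bayes?

The "Naive" part of the name refers to the simplifying assumption made by the classifier: the features used to describe an observation are assumed to be conditionally independent, given the class label. The "Bayes" part refers to Reverend Thomas Bayes, the 18th-century statistician and theologian who formulated Bayes' theorem.

Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing golf. Here is a tabular representation of the dataset.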

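	Outlook	Temperature	Humidity	Windy	Play Golf
0	Rainy	Hot	High	False	No
1	Rainy	Hot	High	True	No
2	Overcast	Hot	High	False	Yes
3	Sunny	Mild	High	False	Yes
4	Sunny	Cool	Normal	False	Yes
5	Sunny	Cool	Normal	True	No
6	Overcast	Cool	Normal	True	Yes
7	Rainy	Mild	High	False	No
8	Rainy	Cool	Normal	False	Yes
9	Sunny	Mild	Normal	False	Yes
10	Rainy	Mild	Normal	True	Yes
11	Overcast	Mild	High	True	Yes
12	Overcast	Hot	Normal	False	Yes
13	Sunny	Mild	High	True	No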

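The dataset is divided into two parts, namely the feature matrix and the response vector.

The feature matrix contains all the vectors (rows) of the dataset, where each vector consists of the values of the features. In the above dataset, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
The response vector contains the value of the class variable (the prediction or output) for each row of the feature matrix. In the above dataset, the class variable name is 'Play Golf'.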

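Assumption of Naive Bayes

Naive Bayes makes the following assumptions:

Feature independence: The features of the data are conditionally independent of each other, given the class label.
Continuous features are normally distributed: If a feature is continuous, it is assumed to be normally distributed within each class.
Discrete features have multinomial distributions: If a feature is discrete, it is assumed to have a multinomial distribution within each class.
Features are equally important: All features are assumed to contribute equally to the prediction of the class label.
No missing data: The data should not contain any missing values.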

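With relation to our dataset, this concept can be understood as:

We assume that no pair of features is dependent. For example, the temperature being 'Hot' has nothing to do with the humidity, and the outlook being 'Rainy' has no effect on the wind. Hence, the features are assumed to be independent.
Secondly, each feature is given the same weight (or importance). For example, knowing only the temperature and humidity is not enough to predict the outcome accurately. No attribute is irrelevant, and each is assumed to contribute equally to the outcome.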
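As a small illustration of how strong the independence assumption is, the sketch below compares an observed joint frequency with the product that conditional independence would predict. The counts are read by hand from the 14-row golf dataset; the variable names are chosen here for illustration, and with so few rows this is not a statistical test.

```python
from fractions import Fraction

# Counts among the 9 "Yes" rows of the golf dataset
n_yes = 9
n_hot = 2           # Temperature = Hot: rows 2 and 12
n_high = 3          # Humidity = High: rows 2, 3 and 11
n_hot_and_high = 1  # both at once: row 2 only

# Observed joint frequency P(Hot, High | Yes)
p_joint = Fraction(n_hot_and_high, n_yes)

# What conditional independence would give: P(Hot | Yes) * P(High | Yes)
p_product = Fraction(n_hot, n_yes) * Fraction(n_high, n_yes)

print(f"observed joint:       {float(p_joint):.3f}")
print(f"independence product: {float(p_product):.3f}")
```

The two values differ (1/9 against 2/27), so on this data the assumption holds only approximately.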

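The assumptions made by Naive Bayes are generally not correct in real-world situations. The independence assumption rarely holds exactly, yet the classifier often works well in practice. Before moving on to the Naive Bayes formula, it is important to know Bayes' theorem.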

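Bayes' Theorem

Bayes' Theorem finds the probability of an event occurring given the probability of another event that has already occurred. It is stated mathematically as the following equation:

P(A|B) = \frac{P(B|A) P(A)}{P(B)}

where A and B are events and P(B) ≠ 0.

Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
P(A) is the prior probability of A, i.e. the probability of the event before the evidence is seen. The evidence is an attribute value of an unknown instance (here, it is event B).
P(B) is the marginal probability: the probability of the evidence.
P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
P(B|A) is the likelihood, i.e. the probability of observing the evidence given that the hypothesis is true.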
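To make the four terms concrete, here is a minimal sketch that plugs counts from the golf dataset into Bayes' theorem, taking A as "Play Golf = Yes" and B as "Outlook = Sunny". The helper name bayes_posterior is hypothetical, and the counts are read by hand from the table.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B); requires P(B) != 0."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence


# A = "Play Golf = Yes", B = "Outlook = Sunny"
p_b_given_a = Fraction(3, 9)  # likelihood: 3 of the 9 "Yes" rows are Sunny
p_a = Fraction(9, 14)         # prior: 9 of the 14 rows are "Yes"
p_b = Fraction(5, 14)         # evidence: 5 of the 14 rows are Sunny

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(posterior)  # 3/5
```

The result agrees with a direct count: of the five Sunny rows (3, 4, 5, 9 and 13), three are labelled "Yes".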

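Now, with regard to our dataset, we can apply Bayes' theorem in the following way:

P(y|X) = \frac{P(X|y) P(y)}{P(X)}

where y is the class variable and X is a feature vector (of size n), where:

X = (x_1, x_2, x_3, \ldots, x_n)

Just to be clear, an example of a feature vector and the corresponding class variable is (refer to the first row of the dataset):

X = (Rainy, Hot, High, False)
y = No

So basically, P(y|X) here means the probability of not playing golf given that the weather conditions are a rainy outlook, hot temperature, high humidity, and no wind.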

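Now, it is time to apply the naive assumption to Bayes' theorem, which is independence among the features. So now, we split the evidence into its independent parts.

If any two events A and B are independent, then

P(A,B) = P(A)P(B)

Hence, we reach the result:

P(y|x_1, \ldots, x_n) = \frac{P(x_1|y) P(x_2|y) \ldots P(x_n|y) P(y)}{P(x_1) P(x_2) \ldots P(x_n)}

which can be expressed as:

P(y|x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i|y)}{P(x_1) P(x_2) \ldots P(x_n)}

Now, as the denominator remains constant for a given input, we can remove that term:

P(y|x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i|y)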
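One way to spell out the steps behind this proportionality is sketched below. It assumes only that the features are conditionally independent given the class. The denominator is written with the law of total probability (the summation index y' is introduced here for illustration), which makes it explicit that the denominator does not depend on y.

```latex
\begin{aligned}
P(y \mid x_1, \ldots, x_n)
  &= \frac{P(x_1, \ldots, x_n \mid y)\, P(y)}{P(x_1, \ldots, x_n)}
  && \text{(Bayes' theorem)} \\
P(x_1, \ldots, x_n \mid y)
  &= \prod_{i=1}^{n} P(x_i \mid y)
  && \text{(conditional independence given } y\text{)} \\
P(x_1, \ldots, x_n)
  &= \sum_{y'} P(y') \prod_{i=1}^{n} P(x_i \mid y')
  && \text{(law of total probability)} \\
\Rightarrow\quad P(y \mid x_1, \ldots, x_n)
  &\propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{aligned}
```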

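Now, we need to create a classifier model. For this, we find the probability of the given set of inputs for all possible values of the class variable y and pick the output with the maximum probability. This can be expressed mathematically as:

y = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i|y)

So, finally, we are left with the task of calculating P(y) and P(x_i | y).

Note that P(y) is also called the class probability and P(x_i | y) is called the conditional probability.

The different naive Bayes classifiers differ mainly in the assumptions they make regarding the distribution of P(x_i | y).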

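Let us try to apply the above formula manually on our weather dataset. For this, we need to do some precomputations on the dataset.

We need to find P(x_i | y_j) for each x_i in X and y_j in y. These calculations are shown in tables 1-4.

For example, the probability of the temperature being cool given that golf is played is P(Temperature = Cool | Play Golf = Yes) = 3/9.

Also, we need to find the class probabilities P(y), which have been calculated in table 5. For example, P(Play Golf = Yes) = 9/14.

So now, we are done with our precomputations and the classifier is ready!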
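The precomputation can also be sketched in code. The snippet below counts class frequencies and per-feature conditional frequencies straight from the 14 rows of the golf dataset. The names FEATURES, ROWS, counts and conditional are chosen here for illustration, and Windy is stored as a Python boolean.

```python
from collections import Counter, defaultdict
from fractions import Fraction

FEATURES = ("Outlook", "Temperature", "Humidity", "Windy")

# The 14 rows: (Outlook, Temperature, Humidity, Windy, Play Golf)
ROWS = [
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Sunny", "Mild", "High", True, "No"),
]

# Class counts and class probabilities P(y)
class_counts = Counter(row[-1] for row in ROWS)
priors = {label: Fraction(n, len(ROWS)) for label, n in class_counts.items()}

# counts[feature][(value, label)] = number of rows with that value and label
counts = defaultdict(Counter)
for *values, label in ROWS:
    for name, value in zip(FEATURES, values):
        counts[name][(value, label)] += 1


def conditional(feature, value, label):
    """P(feature = value | Play Golf = label) as an exact fraction."""
    return Fraction(counts[feature][(value, label)], class_counts[label])


n_cool_yes = counts["Temperature"][("Cool", "Yes")]
print(f"P(Temperature = Cool | Play Golf = Yes) = {n_cool_yes}/{class_counts['Yes']}")
print(f"P(Play Golf = Yes) = {class_counts['Yes']}/{len(ROWS)}")
```

The two printed values, 3/9 and 9/14, match the examples quoted in the text.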

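Let us test it on a new set of features (let us call it today):

today = (Sunny, Hot, Normal, False)

So, the probability of playing golf is given by:

P(Yes | today) = \frac{P(Sunny Outlook | Yes) P(Hot Temperature | Yes) P(Normal Humidity | Yes) P(No Wind | Yes) P(Yes)}{P(today)}

and the probability of not playing golf is given by:

P(No | today) = \frac{P(Sunny Outlook | No) P(Hot Temperature | No) P(Normal Humidity | No) P(No Wind | No) P(No)}{P(today)}

Since P(today) is common to both probabilities, we can ignore P(today) and find proportional probabilities as: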

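P(Yes | today) \propto \frac{3}{9} \cdot \frac{2}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14} \approx 0.02116

and

P(No | today) \propto \frac{2}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14} \approx 0.0046

Here P(Sunny Outlook | No) = 2/5, because two of the five 'No' rows (rows 5 and 13) are Sunny.

Now, since

P(Yes | today) + P(No | today) = 1

these numbers can be converted into probabilities by making their sum equal to 1 (normalization):

P(Yes | today) = \frac{0.02116}{0.02116 + 0.0046} \approx 0.82

and

P(No | today) = \frac{0.0046}{0.02116 + 0.0046} \approx 0.18

Since

P(Yes | today) > P(No | today)

the prediction is that golf would be played: 'Yes'.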
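The same arithmetic can be checked with exact fractions. This is a minimal sketch in which every conditional probability is entered by hand from counts over the 14 dataset rows (for instance, Sunny occurs in 2 of the 5 "No" rows). The dictionary names are illustrative only.

```python
from fractions import Fraction as F
from math import prod

# today = (Sunny, Hot, Normal, False)
prior = {"Yes": F(9, 14), "No": F(5, 14)}

# P(Sunny | y), P(Hot | y), P(Normal | y), P(Windy = False | y),
# counted by hand from the 14 rows of the golf dataset
likelihood = {
    "Yes": [F(3, 9), F(2, 9), F(6, 9), F(6, 9)],
    "No": [F(2, 5), F(2, 5), F(1, 5), F(2, 5)],
}

# Unnormalized scores: P(y) * product of P(x_i | y)
scores = {y: prior[y] * prod(likelihood[y]) for y in prior}

# Normalize so the two posteriors sum to 1
total = sum(scores.values())
posterior = {y: scores[y] / total for y in scores}

# Decision rule: pick the class with the largest posterior
prediction = max(posterior, key=posterior.get)

for y in scores:
    print(f"{y}: score = {float(scores[y]):.5f}, posterior = {float(posterior[y]):.3f}")
print("prediction:", prediction)
```

Working with Fraction avoids rounding until the final print, so the normalized values are exact ratios of the counts.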

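The method discussed above is applicable to discrete data. In the case of continuous data, we need to make some assumptions regarding the distribution of the values of each feature.

Types of Naive Bayes Model

There are three types of Naive Bayes model:

Gaussian Naive Bayes classifier

In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. A Gaussian distribution is also called a normal distribution. When plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values.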
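As a rough sketch of what this assumption means in practice, the snippet below estimates one mean and one variance per class for a single continuous feature and evaluates the normal density at a new value. The temperature readings are hypothetical values made up for illustration, and gaussian_pdf is a helper name chosen here, not a library function.

```python
import math
from statistics import fmean, pvariance


def gaussian_pdf(x, mean, var):
    """Normal density with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical temperature readings (degrees C) grouped by class label
samples = {
    "Yes": [21.0, 22.5, 20.0, 23.5, 22.0],
    "No": [29.0, 30.5, 28.0, 31.0],
}

# One (mean, variance) pair per class for this feature
params = {label: (fmean(values), pvariance(values)) for label, values in samples.items()}

# Class-conditional likelihood of a new reading under each class
x_new = 24.0
likelihoods = {label: gaussian_pdf(x_new, mean, var) for label, (mean, var) in params.items()}

print(params)
print(likelihoods)
```

These likelihoods play the role of P(x_i | y) in the product rule. Note that they are density values, so an individual value may exceed 1 when the variance is small.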

The table of counts and conditional probabilities for the Outlook feature is as follows:

	Yes	No	P(Outlook | Yes)	P(Outlook | No)
Sunny	3	2	3/9	2/5
Rainy	2	3	2/9	3/5
Overcast	4	0	4/9	0/5
Total	9	5	100%	100%

The likelihood of the features is assumed to be Gaussian, hence the conditional probability is given by:

P(x_i | y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)

Now, we look at an implementation of the Gaussian Naive Bayes classifier using scikit-learn.

Python

from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# load the iris dataset
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# split X and y into training and testing sets (60% train, 40% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# train the model on the training set
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# make predictions on the testing set
y_pred = gnb.predict(X_test)

# compare actual response values (y_test) with predicted response values (y_pred)
print('Gaussian Naive Bayes model accuracy(in %):', metrics.accuracy_score(y_test, y_pred) * 100)

Output:

Gaussian Naive Bayes model accuracy(in %): 95.0

Multinomial Naive Bayes

Feature vectors represent the frequencies with which certain events have been generated by a multinomial distribution. This is the event model typically used for document classification.

Bernoulli Naive Bayes

In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. the frequency of a word in the document).

Advantages of Naive Bayes Classifier

Easy to implement and computationally efficient.
Effective in cases with a large number of features.
Performs well even with limited training data.
Performs well in the presence of categorical features. For numerical features, data is assumed to come from normal distributions.

Disadvantages of Naive Bayes Classifier

Assumes that features are independent, which may not always hold in real-world data.
Can be influenced by irrelevant attributes.
May assign zero probability to unseen events, leading to poor generalization.

Applications of Naive Bayes Classifier

Spam Email Filtering: Classifies emails as spam or non-spam based on features.
Text Classification: Used in sentiment analysis, document categorization, and topic classification.
Medical Diagnosis: Helps in predicting the likelihood of a disease based on symptoms.
Credit Scoring: Evaluates creditworthiness of individuals for loan approval.
Weather Prediction: Classifies weather conditions based on various factors.

As we reach the end of this article, here are some important points to ponder:

In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering.
They require a small amount of training data to estimate the necessary parameters.
Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class conditional feature distributions means that each distribution can be independently estimated as a one dimensional distribution. This in turn helps to alleviate problems stemming from the curse of dimensionality.

Conclusion

In conclusion, Naive Bayes classifiers, despite their simplified assumptions, prove effective in various applications, showcasing notable performance in document classification and spam filtering. Their efficiency, speed, and ability to work with limited data make them valuable in real-world scenarios, compensating for their naive independence assumption.

Frequently Asked Questions on Naive Bayes Classifiers

What is Naive Bayes real example?
Naive Bayes is a simple probabilistic classifier based on Bayes' theorem. It assumes that the features of a given data point are independent of each other, which is often not the case in reality. However, despite this simplifying assumption, Naive Bayes has been shown to be surprisingly effective in a wide range of applications.

Why is it called Naive Bayes?
Naive Bayes is called naive because it assumes that the features of a data point are independent of each other. This assumption is often not true in reality, but it does make the algorithm much simpler to compute.

What is an example of a Bayes classifier?
A Bayes classifier is a type of classifier that uses Bayes' theorem to compute the probability of a given class for a given data point. Naive Bayes is one of the most common types of Bayes classifiers.

What is better than Naive Bayes?
There are several classifiers that are better than Naive Bayes in some situations. For example, logistic regression is often more accurate than Naive Bayes, especially when the features of a data point are correlated with each other.

Can Naive Bayes probability be greater than 1?
No, the probability of an event cannot be greater than 1. The probability of an event is a number between 0 and 1, where 0 indicates that the event is impossible and 1 indicates that the event is certain.