
ClassificationKNN

k-nearest neighbor classification


Description

ClassificationKNN is a nearest neighbor classification model in which you can alter both the distance metric and the number of nearest neighbors. Because a ClassificationKNN classifier stores training data, you can use the model to compute resubstitution predictions. Alternatively, use the model to classify new observations using the predict method.

Creation

Create a ClassificationKNN model using fitcknn.

Properties


KNN Properties

BreakTies - Tie-breaking algorithm
'smallest' (default) | 'nearest' | 'random'

Tie-breaking algorithm used by predict when multiple classes have the same smallest cost, specified as one of the following:

  • 'smallest' — Use the smallest index among tied groups.

  • 'nearest' — Use the class with the nearest neighbor among tied groups.

  • 'random' — Use a random tiebreaker among tied groups.

Ties occur when multiple classes have the same number of nearest points among the k nearest neighbors. BreakTies applies only when IncludeTies is false.

Change BreakTies using dot notation: mdl.BreakTies = newBreakTies.
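
For example, a minimal sketch on Fisher's iris data (bundled with Statistics and Machine Learning Toolbox) that trains with an even neighborhood size, where cost ties are more likely, and then switches the tie-breaking rule on the trained model; the specific settings are illustrative:

load fisheriris                                   % meas: 150-by-4 measurements, species: class labels
Mdl = fitcknn(meas,species,'NumNeighbors',4,'BreakTies','nearest');
Mdl.BreakTies = 'random';                         % switch to random tie-breaking via dot notation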

DistanceWeight - Distance weighting function
'equal' | 'inverse' | 'squaredinverse' | function handle

Distance weighting function, specified as one of the values in this table.

Value               Description
'equal'             No weighting
'inverse'           Weight is 1/distance
'squaredinverse'    Weight is 1/distance^2
@fcn                fcn is a function that accepts a matrix of nonnegative distances and returns a matrix of the same size containing nonnegative distance weights. For example, 'squaredinverse' is equivalent to @(d)d.^(-2).

Change DistanceWeight using dot notation: mdl.DistanceWeight = newDistanceWeight.

Data Types: char | function_handle
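
As an illustration, a sketch that trains with the built-in 'inverse' weighting and then swaps in a custom handle; the exponential weight exp(-d) is an arbitrary choice for this example, not a built-in option:

load fisheriris
Mdl = fitcknn(meas,species,'NumNeighbors',5,'DistanceWeight','inverse');
% A custom weight must map a matrix of nonnegative distances to
% nonnegative weights of the same size.
Mdl.DistanceWeight = @(d) exp(-d);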

DistParameter - Parameter for distance metric
positive definite covariance matrix | positive scalar | vector of positive scale values

Parameter for the distance metric, specified as one of the values described in this table.

Distance Metric    Parameter
'mahalanobis'      Positive definite covariance matrix C
'minkowski'        Minkowski distance exponent, a positive scalar
'seuclidean'       Vector of positive scale values with length equal to the number of columns of X

For any other distance metric, the value of DistParameter must be [].

You can alter DistParameter using dot notation: mdl.DistParameter = newDistParameter. However, if Distance is 'mahalanobis' or 'seuclidean', then you cannot alter DistParameter.

Data Types: single | double
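
For instance, a sketch using the Minkowski metric, whose exponent is stored in DistParameter and can be changed after training (the exponent values 3 and 4 are illustrative):

load fisheriris
Mdl = fitcknn(meas,species,'Distance','minkowski','Exponent',3);
Mdl.DistParameter        % returns 3, the Minkowski exponent
Mdl.DistParameter = 4;   % allowed for 'minkowski', but not for 'mahalanobis' or 'seuclidean'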

IncludeTies - Tie inclusion flag
false (default) | true

Tie inclusion flag indicating whether predict includes all the neighbors whose distance values are equal to the kth smallest distance, specified as false or true. If IncludeTies is true, predict includes all of these neighbors. Otherwise, predict uses exactly k neighbors (see the BreakTies property).

Change IncludeTies using dot notation: mdl.IncludeTies = newIncludeTies.

Data Types: logical
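
A brief sketch that turns on tie inclusion when training the model and later reverts to exactly k neighbors:

load fisheriris
Mdl = fitcknn(meas,species,'NumNeighbors',5,'IncludeTies',true);
Mdl.IncludeTies = false;   % use exactly 5 neighbors again; BreakTies then resolves any ties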

NSMethod - Nearest neighbor search method
'kdtree' | 'exhaustive'

This property is read-only.

Nearest neighbor search method, specified as either 'kdtree' or 'exhaustive'.

  • 'kdtree' — Creates and uses a Kd-tree to find nearest neighbors.

  • 'exhaustive' — Uses the exhaustive search algorithm. When predicting the class of a new point xnew, the software computes the distance values from all points in X to xnew to find nearest neighbors.

The default value is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type. Otherwise, the default value is 'exhaustive'.
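
Because NSMethod is read-only, set it when calling fitcknn; a sketch that forces the exhaustive search on Fisher's iris data:

load fisheriris
Mdl = fitcknn(meas,species,'NumNeighbors',5,'NSMethod','exhaustive');
Mdl.NSMethod   % 'exhaustive'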

NumNeighbors - Number of nearest neighbors
positive integer value

Number of nearest neighbors in X used to classify each point during prediction, specified as a positive integer value.

Change NumNeighbors using dot notation: mdl.NumNeighbors = newNumNeighbors.

Data Types: single | double
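
For example, you can enlarge the neighborhood of a trained model without refitting (a sketch on Fisher's iris data; the value 7 is arbitrary):

load fisheriris
Mdl = fitcknn(meas,species,'NumNeighbors',5);
Mdl.NumNeighbors = 7;   % subsequent predictions use the 7 nearest neighbors
resubLoss(Mdl)          % resubstitution error with the new setting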

Other Classification Properties

CategoricalPredictors - Categorical predictor indices
[] | vector of positive integers

This property is read-only.

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty ([]).

Data Types: double

ClassNames - Names of classes in training data Y
categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Names of the classes in the training data Y with duplicates removed, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. ClassNames has the same data type as Y. (The software treats string arrays as cell arrays of character vectors.)

Data Types: categorical | char | logical | single | double | cell

Cost - Cost of misclassification
square matrix

Cost of the misclassification of a point, specified as a square matrix. Cost(i,j) is the cost of classifying a point into class j if its true class is i (that is, the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns in Cost corresponds to the order of the classes in ClassNames. The number of rows and columns in Cost is the number of unique classes in the response.

By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.

Change a Cost matrix using dot notation: mdl.Cost = costMatrix.

Data Types: single | double
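
For example, a sketch that penalizes misclassifying 'virginica' more heavily than the other classes; the specific cost values are illustrative only, and the rows and columns follow the order of Mdl.ClassNames:

load fisheriris
Mdl = fitcknn(meas,species,'NumNeighbors',5);
% ClassNames order: {'setosa','versicolor','virginica'}
Mdl.Cost = [0 1 1;    % true class setosa
            1 0 1;    % true class versicolor
            2 2 0];   % true class virginica: misclassifying it costs 2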

ExpandedPredictorNames - Expanded predictor names
cell array of character vectors

This property is read-only.

Expanded predictor names, specified as a cell array of character vectors.

If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.

Data Types: cell

ModelParameters - Parameters used in training ClassificationKNN
object

This property is read-only.

Parameters used in training the ClassificationKNN model, specified as an object.

Mu - Predictor means
numeric vector

This property is read-only.

Predictor means, specified as a numeric vector of length numel(PredictorNames).

If you do not standardize the predictor data when training the model using fitcknn, then Mu is empty ([]).

Data Types: single | double
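
A sketch showing that Mu (and the companion Sigma property described below) is populated only when you standardize during training:

load fisheriris
Mdl1 = fitcknn(meas,species,'NumNeighbors',5);                    % no standardization
isempty(Mdl1.Mu)                                                  % returns true
Mdl2 = fitcknn(meas,species,'NumNeighbors',5,'Standardize',true);
Mdl2.Mu      % 1-by-4 vector of predictor means
Mdl2.Sigma   % 1-by-4 vector of predictor standard deviations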

NumObservations - Number of observations
positive integer scalar

This property is read-only.

Number of observations used in training the ClassificationKNN model, specified as a positive integer scalar. This number can be less than the number of rows in the training data because rows containing NaN values are not part of the fit.

Data Types: double

PredictorNames - Predictor variable names
cell array of character vectors

This property is read-only.

Predictor variable names, specified as a cell array of character vectors. The variable names are in the same order in which they appear in the training data X.

Data Types: cell

Prior - Prior probabilities for each class
numeric vector

Prior probabilities for each class, specified as a numeric vector. The order of the elements in Prior corresponds to the order of the classes in ClassNames.

Add or change a Prior vector using dot notation: mdl.Prior = priorVector.

Data Types: single | double

ResponseName - Response variable name
character vector

This property is read-only.

Response variable name, specified as a character vector.

Data Types: char

RowsUsed - Rows used in fitting
[] | logical vector

This property is read-only.

Rows of the original training data used in fitting the ClassificationKNN model, specified as a logical vector. This property is empty if all rows are used.

Data Types: logical

ScoreTransform - Score transformation
'none' (default) | 'doublelogit' | 'invlogit' | 'ismax' | 'logit' | function handle | ...

Score transformation, specified as either a character vector or a function handle.

This table summarizes the available character vectors.

Value                   Description
"doublelogit"           1/(1 + e^(-2x))
"invlogit"              log(x / (1 - x))
"ismax"                 Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0
"logit"                 1/(1 + e^(-x))
"none" or "identity"    x (no transformation)
"sign"                  -1 for x < 0, 0 for x = 0, 1 for x > 0
"symmetric"             2x - 1
"symmetricismax"        Sets the score for the class with the largest score to 1, and sets the scores for all other classes to -1
"symmetriclogit"        2/(1 + e^(-x)) - 1

For a MATLAB® function or a function you define, use its function handle for score transform. The function handle must accept a matrix (the original scores) and return a matrix of the same size (the transformed scores).

Change ScoreTransform using dot notation: mdl.ScoreTransform = newScoreTransform.

Data Types: char | function_handle
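
For example, a sketch that switches between a built-in transform and an equivalent custom handle (2*x - 1 matches the "symmetric" row in the table above):

load fisheriris
Mdl = fitcknn(meas,species,'NumNeighbors',5);
Mdl.ScoreTransform = 'logit';          % built-in transform selected by name
Mdl.ScoreTransform = @(x) 2*x - 1;     % custom handle; same effect as 'symmetric'
[~,score] = predict(Mdl,meas(1,:));    % scores are returned already transformed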

Sigma - Predictor standard deviations
numeric vector

This property is read-only.

Predictor standard deviations, specified as a numeric vector of length numel(PredictorNames).

If you do not standardize the predictor variables during training, then Sigma is empty ([]).

Data Types: single | double

W - Observation weights
vector of nonnegative values

This property is read-only.

Observation weights, specified as a vector of nonnegative values with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y.

Data Types: single | double

X - Unstandardized predictor data
numeric matrix

This property is read-only.

Unstandardized predictor data, specified as a numeric matrix. Each column of X represents one predictor (variable), and each row represents one observation.

Data Types: single | double

Y - Class labels
categorical array | character array | logical vector | numeric vector | cell array of character vectors

This property is read-only.

Class labels, specified as a categorical or character array, logical or numeric vector, or cell array of character vectors. Each value in Y is the observed class label for the corresponding row in X.

Y has the same data type as the response data used to train the model. (The software treats string arrays as cell arrays of character vectors.)

Data Types: single | double | logical | char | cell | categorical

Hyperparameter Optimization Properties

HyperparameterOptimizationResults - Cross-validation optimization of hyperparameters
BayesianOptimization object | table

This property is read-only.

Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty when the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model using fitcknn. The value depends on the setting of the 'HyperparameterOptimizationOptions' name-value pair argument when you create the model:

  • 'bayesopt' (default) — Object of class BayesianOptimization

  • 'gridsearch' or 'randomsearch' — Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst)
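
A hedged sketch that populates this property via the default Bayesian optimization; the options shown (suppressing plots and command-line output) are illustrative, and the optimization can take some time to run:

load fisheriris
rng(1)   % for reproducibility
Mdl = fitcknn(meas,species,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',struct('ShowPlots',false,'Verbose',0));
results = Mdl.HyperparameterOptimizationResults   % BayesianOptimization object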

Object Functions

compareHoldout          Compare accuracies of two classification models using new data
crossval                Cross-validate machine learning model
edge                    Edge of k-nearest neighbor classifier
gather                  Gather properties of Statistics and Machine Learning Toolbox object from GPU
lime                    Local interpretable model-agnostic explanations (LIME)
loss                    Loss of k-nearest neighbor classifier
margin                  Margin of k-nearest neighbor classifier
partialDependence       Compute partial dependence
plotPartialDependence   Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
predict                 Predict labels using k-nearest neighbor classification model
resubEdge               Resubstitution classification edge
resubLoss               Resubstitution classification loss
resubMargin             Resubstitution classification margin
resubPredict            Classify training data using trained classifier
shapley                 Shapley values
testckfold              Compare accuracies of two classification models by repeated cross-validation

Examples


Train k-Nearest Neighbor Classifier


Train a k-nearest neighbor classifier for Fisher's iris data, where k, the number of nearest neighbors in the predictors, is 5.

Load Fisher's iris data.

load fisheriris
X = meas;
Y = species;

X is a numeric matrix that contains four flower measurements (sepal length, sepal width, petal length, and petal width) for 150 irises. Y is a cell array of character vectors that contains the corresponding iris species.

Train a 5-nearest neighbor classifier. Standardize the noncategorical predictor data.

Mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1)

Mdl = 
  ClassificationKNN
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
                 Distance: 'euclidean'
             NumNeighbors: 5

Mdl is a trained ClassificationKNN classifier, and some of its properties appear in the Command Window.

To access the properties of Mdl, use dot notation.

Mdl.ClassNames

ans = 3x1 cell
    {'setosa'    }
    {'versicolor'}
    {'virginica' }

Mdl.Prior

ans = 1×3
    0.3333    0.3333    0.3333

Mdl.Prior contains the class prior probabilities, which you can specify using the 'Prior' name-value pair argument in fitcknn. The order of the class prior probabilities corresponds to the order of the classes in Mdl.ClassNames. By default, the prior probabilities are the respective relative frequencies of the classes in the data.

You can also reset the prior probabilities after training. For example, set the prior probabilities to 0.5, 0.2, and 0.3, respectively.

Mdl.Prior = [0.5 0.2 0.3];

You can pass Mdl to predict to label new measurements or crossval to cross-validate the classifier.
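
For instance, a short continuation of this example; the query measurements below are made up for illustration:

Xnew = [5.9 3.0 5.1 1.8;    % hypothetical iris measurements
        5.0 3.4 1.5 0.2];
[label,score] = predict(Mdl,Xnew)   % predicted species and class scores
CVMdl = crossval(Mdl);              % 10-fold cross-validation by default
kfoldLoss(CVMdl)                    % estimated generalization error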

Tips

  • The compact function reduces the size of most classification models by removing the training data properties and any other properties that are not required to predict the labels of new observations. Because k-nearest neighbor classification models require all of the training data to predict labels, you cannot reduce the size of a ClassificationKNN model.

Alternative Functionality

knnsearch finds the k-nearest neighbors of points. rangesearch finds all the points within a fixed distance. You can use these functions for classification, as shown in Classify Query Data. If you want to perform classification, then using ClassificationKNN models can be more convenient because you can train a classifier in one step (using fitcknn) and classify in other steps (using predict). Alternatively, you can train a k-nearest neighbor classification model using one of the cross-validation options in the call to fitcknn. In this case, fitcknn returns a ClassificationPartitionedModel cross-validated model object.
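
As a sketch of both routes on Fisher's iris data (the majority vote with mode is this example's own choice, not built-in knnsearch behavior, and the query point is made up):

load fisheriris
Xnew = [5.9 3.0 5.1 1.8];                 % hypothetical query point

% Route 1: knnsearch plus a majority vote
idx = knnsearch(meas,Xnew,'K',5);         % indices of the 5 nearest training rows
label = mode(categorical(species(idx)))   % most common class among those neighbors

% Route 2: cross-validation directly in the call to fitcknn
CVMdl = fitcknn(meas,species,'NumNeighbors',5,'KFold',5);   % ClassificationPartitionedModel
kfoldLoss(CVMdl)                                            % cross-validated classification loss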

Extended Capabilities

  • C/C++ Code Generation: Generate C and C++ code using MATLAB® Coder™.

  • GPU Arrays: Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Version History

Introduced in R2012a

See Also

fitcknn | predict

Topics

  • Construct KNN Classifier
  • Examine Quality of KNN Classifier
  • Predict Classification Using KNN Classifier
  • Modify KNN Classifier
  • Classification Using Nearest Neighbors
