Curse of Dimensionality

Huda
3 min read · Nov 14, 2021
https://www.istockphoto.com/photo/magic-bottle-gm1254435294-366650541
Photo from istockphoto

“Data is the new oil,” a poetic phrase coined by British mathematician Clive Humby, seems more relevant than ever in today’s world. As the world has progressed, generating and storing data has become a lot easier, and we are now completely surrounded by data. Consider a store like Target: to come up with personalized offers for its customers, it has to scan through a million observations, and each observation might have hundreds of attributes. Such a dataset is an example of high-dimensional data.

High Dimensionality

Each input feature is considered a dimension. For example, if you build a model to predict the age of a child from its height and weight, then height and weight are the features of the model, and each of these features corresponds to a dimension. Thus, a dataset with a very large number of features is a high-dimensional dataset. It is often assumed that the more data and features we have, the more accurate the results will be. This holds true, but only up to a certain threshold. A model with 10 input features might produce more accurate results than a model with 2 input features, but that doesn’t necessarily mean the accuracy will continue to increase if we give it, say, 100 or 1000 input features. There will come a point wherein the accuracy would remain…
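To make the threshold idea concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption chosen for illustration and not something from a real dataset: the synthetic two-class data, the choice of 10 informative features followed by 990 pure-noise features, the simple 1-nearest-neighbour classifier, and the helper names make_data, one_nn_accuracy and run_experiment. It trains the same classifier on the first 2, 10, 100 and 1000 columns and reports the test accuracy for each.

```python
import numpy as np


def make_data(n_samples, n_informative, n_noise, rng):
    """Synthetic two-class data (hypothetical, for illustration only).

    Informative features are shifted by -0.5 or +0.5 depending on the class;
    noise features are standard normal and carry no class information.
    """
    y = rng.integers(0, 2, size=n_samples)
    shift = (2 * y - 1)[:, None] * 0.5
    informative = rng.normal(size=(n_samples, n_informative)) + shift
    noise = rng.normal(size=(n_samples, n_noise))
    return np.hstack([informative, noise]), y


def one_nn_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a 1-nearest-neighbour classifier using Euclidean distance."""
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        + (X_train ** 2).sum(axis=1)[None, :]
        - 2.0 * X_test @ X_train.T
    )
    nearest = d2.argmin(axis=1)
    return float((y_train[nearest] == y_test).mean())


def run_experiment(seed=0):
    """Evaluate the same classifier on a growing number of feature columns."""
    rng = np.random.default_rng(seed)
    X, y = make_data(1000, n_informative=10, n_noise=990, rng=rng)
    X_train, y_train = X[:500], y[:500]
    X_test, y_test = X[500:], y[500:]

    results = {}
    for n_features in (2, 10, 100, 1000):
        results[n_features] = one_nn_accuracy(
            X_train[:, :n_features], y_train,
            X_test[:, :n_features], y_test,
        )
    return results


results = run_experiment()
for n_features, accuracy in results.items():
    print(f"{n_features:>4} features: test accuracy = {accuracy:.3f}")
```

In this setup, going from 2 to 10 features helps because the extra columns carry real signal, while the columns after the tenth only add noise to the distance computation. The exact numbers depend on the seed and on the assumptions above, so treat the output as an illustration of the trend and not as a benchmark.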

Huda

Data Scientist with recent experience in data acquisition and data modeling, statistical analysis, machine learning, deep learning and NLP