Master "StandardScaler", Part of scikit-learn, One of the Most Popular Machine Learning Libraries

StandardScaler is part of scikit-learn (also called sklearn), which is one of the most popular machine learning libraries in Python.

It is used to standardize data values: it rescales each feature so that its mean becomes 0 and its standard deviation becomes 1.


Let’s explain StandardScaler in a very simple way:

Assume you and your friends have just received your math test scores. The marks can be out of 50, 100, or even 200, depending on the exam:

3 students wrote the exam of 50 marks,

5 students wrote the exam of 100 marks,

2 students wrote the exam of 200 marks

Some got very high marks, some got very low, and some are in the middle.

Now, suppose we want to compare all 10 students fairly. But the problem is: How do we compare them on the same scale?

The StandardScaler is like a teacher who says:

👉 “Let’s shift all scores so that the average (mean) becomes 0 and spread (standard deviation) becomes 1.”

This means:

Subtract the average (so everyone is compared from the middle point).
Divide by the spread (standard deviation) (so scores are not too stretched or too squeezed).
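The two steps above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not scikit-learn's actual code; the function name standardize and the sample scores are made up for this example. It uses the population standard deviation (dividing by n), which is also what StandardScaler uses.

```python
import numpy as np

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()   # step 1: find the middle point
    std = values.std()     # step 2: find the spread (population std, ddof=0)
    return (values - mean) / std

# Hypothetical scores, used only for illustration
scores = [10, 20, 30, 40]
z = standardize(scores)

print(z.mean())  # very close to 0
print(z.std())   # very close to 1
```

After standardizing, the values always have mean 0 and standard deviation 1, no matter what scale the original scores were on.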

Suppose marks are: [50, 60, 70, 80, 90]

The mean is 70 and the standard deviation is about 14.14.

Now for a student with 80 marks:

z = (80 - 70) / 14.14 ≈ 0.71

This means 80 is 0.71 standard deviations above the average.

Similarly, 60 would become -0.71, meaning below average.
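If you want to check these numbers yourself, here is a small sketch using StandardScaler on the same five marks. The variable names are my own; note that StandardScaler expects a 2D array (rows are samples, columns are features), so the marks are reshaped into one column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One column (feature) with five rows (students)
marks = np.array([50, 60, 70, 80, 90], dtype=float).reshape(-1, 1)

scaler = StandardScaler()
scaled = scaler.fit_transform(marks).ravel()

print(scaler.mean_)   # the learned mean, 70
print(scaler.scale_)  # the learned standard deviation, about 14.14
print(np.round(scaled, 2))  # roughly -1.41, -0.71, 0, 0.71, 1.41
```

The student with 80 marks lands at about 0.71 and the student with 60 marks at about -0.71, matching the calculation by hand.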

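In real-world data, different features often come in very different ranges.

If we don’t scale them, a feature with big numbers, such as salary, can dominate any model that is sensitive to the scale of its inputs.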

StandardScaler makes them equal and fair by putting everything on the same scale.
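Here is a rough sketch of that idea with two columns. The data is made up for illustration: I am assuming a small feature (age in years) next to salary, and the numbers are hypothetical. StandardScaler scales each column separately.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 is age in years, column 1 is salary
X = np.array([
    [25, 30000],
    [35, 60000],
    [45, 90000],
], dtype=float)

X_scaled = StandardScaler().fit_transform(X)

# Before scaling, salary values are thousands of times larger than age values.
# After scaling, each column has mean 0 and standard deviation 1.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

After scaling, both columns live on the same scale, so neither one wins just because its raw numbers are bigger.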

💡 In short

StandardScaler = “Make the data fair by centering around 0 and scaling by how spread out it is.”

You import it like this:

from sklearn.preprocessing import StandardScaler

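Here is what each part of the above syntax means: sklearn is the scikit-learn package, preprocessing is its module of data preparation tools, and StandardScaler is the class that does the standardizing.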
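Once it is imported, a typical pattern is to fit the scaler on some data and then reuse it on new values. The sketch below assumes the five marks from the earlier example and a made-up new score of 75; fit, transform, and inverse_transform are standard StandardScaler methods.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

marks = np.array([[50], [60], [70], [80], [90]], dtype=float)

scaler = StandardScaler()
scaler.fit(marks)  # learn the mean and standard deviation from the marks

# Scale a new (hypothetical) score using the learned mean and spread
new_scaled = scaler.transform(np.array([[75.0]]))
print(new_scaled[0, 0])  # (75 - 70) / 14.14, about 0.35

# Convert the scaled value back to the original marks
restored = scaler.inverse_transform(new_scaled)
print(restored[0, 0])  # back to 75
```

Fitting once and transforming later keeps every new value on the same scale as the data the scaler learned from.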
