
Master StandardScaler, Part of scikit-learn, One of the Most Popular Machine Learning Libraries

Master StandardScaler the Easy way

StandardScaler is part of scikit-learn (also called sklearn), which is one of the most popular machine learning libraries in Python.


It is used to standardize numeric data: each feature (column) is rescaled so that its mean is 0 and its standard deviation is 1.




Let’s explain StandardScaler in a very simple way:


Imagine you and your friends just got your math test scores. Not everyone wrote the same exam: marks can be out of 50, 100, or even 200, depending on the exam.


3 students wrote the exam of 50 marks,

5 students wrote the exam of 100 marks,

2 students wrote the exam of 200 marks


Some got very high marks, some got very low, and some are in the middle.


Now, suppose we want to compare all 10 students fairly. But the problem is: How do we compare them on the same scale?


📏 StandardScaler to the Rescue!


The StandardScaler is like a teacher who says:

👉 “Let’s shift and rescale all scores so that the average (mean) becomes 0 and the spread (standard deviation) becomes 1.”


This means:

  1. Subtract the average (so everyone is compared from the middle point).

  2. Divide by the spread (standard deviation) (so scores are not too stretched or too squeezed).
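The two steps above can be sketched in plain Python. This is only a minimal illustration, not scikit-learn's own code: the function name `standardize` and the sample scores are made up for this example. It uses the population standard deviation (dividing by n), which is also what StandardScaler uses.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    mean = fmean(values)
    spread = pstdev(values)  # population standard deviation (divides by n)
    if spread == 0:
        # All values are identical, so there is nothing to scale: return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]

# Hypothetical scores, made up for illustration.
scores = [12.0, 15.0, 9.0, 20.0, 14.0]
z = standardize(scores)

# After standardizing, the mean is (numerically) 0 and the spread is 1.
print(round(fmean(z), 6), round(pstdev(z), 6))
```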


⚖️ Example

Suppose marks are: [50, 60, 70, 80, 90]

  • Average = 70

  • Spread (std dev) ≈ 14.14


Now for a student with 80 marks:


z = (80 - 70) / 14.14 ≈ 0.71




This means 80 is about 0.71 standard deviations above the average.

Similarly, 60 would become -0.71, meaning below average.
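To check these numbers, here is a small sketch that runs the same five marks through StandardScaler. It assumes numpy and scikit-learn are installed. The marks are reshaped into a single column because StandardScaler expects a 2D array with one column per feature.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature (marks), five students: shape (5, 1).
marks = np.array([[50.0], [60.0], [70.0], [80.0], [90.0]])

scaler = StandardScaler()
z = scaler.fit_transform(marks)

print(scaler.mean_[0])   # the learned average, 70.0
print(scaler.scale_[0])  # the learned spread, about 14.14
print(z.ravel())         # roughly -1.41, -0.71, 0, 0.71, 1.41
```

The value for 80 comes out at about 0.71 and the value for 60 at about -0.71, matching the hand calculation.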


🌍 Why do we use it in Data Science?


In real-world data:

  • One feature could be age (0–100 years)

  • Another feature could be salary (₹10,000 – ₹10,00,000)

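If we don’t scale them, salary (big numbers) can dominate models that are sensitive to feature scale, such as distance-based or gradient-based ones.

StandardScaler makes them equal and fair by putting every feature on the same scale, so no feature wins just because of its units.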
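Here is a rough sketch of that idea with two features. The ages and salaries are made-up numbers for illustration only. StandardScaler works column by column, so each feature gets its own mean and spread.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows for illustration: [age in years, salary in rupees].
X = np.array([
    [25, 30_000],
    [32, 80_000],
    [47, 250_000],
    [51, 600_000],
    [60, 1_000_000],
], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Each column is standardized separately.
col_means = X_scaled.mean(axis=0)  # both close to 0
col_stds = X_scaled.std(axis=0)    # both close to 1
print(col_means, col_stds)
```

After scaling, age and salary both sit in a similar range of small numbers around 0, even though the raw salaries were thousands of times larger than the ages.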


💡 In short

StandardScaler = “Make the data fair by centering around 0 and scaling by how spread out it is.”


You import it like this:


from sklearn.preprocessing import StandardScaler

Here is what each part of the import statement means:

  • sklearn → the library

  • preprocessing → the module inside sklearn (for preparing/cleaning data)

  • StandardScaler → the specific class we use for standardization
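Once imported, a typical pattern is to call `fit` on the data you already have and then `transform` on new data, so the new values are scaled with the same mean and spread. The sketch below assumes numpy is installed alongside scikit-learn, and the new mark of 75 is a made-up value.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Marks we already have (one feature, so one column).
train = np.array([[50.0], [60.0], [70.0], [80.0], [90.0]])
# A new, hypothetical mark to scale with the same settings.
new = np.array([[75.0]])

scaler = StandardScaler()
scaler.fit(train)                    # learns the mean (70) and spread (about 14.14)
new_scaled = scaler.transform(new)   # (75 - 70) / 14.14, about 0.35

# inverse_transform undoes the scaling and gives back the original mark.
restored = scaler.inverse_transform(new_scaled)
print(new_scaled[0, 0], restored[0, 0])
```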


 
 
 
