Member-only story

Random Forest model trained with scikit-learn in Python + save and load model

GeoSense ✅
3 min readJan 27, 2023

--

In this post, I will show you how to train, save and load Random Forest model with scikit-learn and joblib in Python. Moreover, I will show you, how to compress the model and get a smaller file.

Train and save Machine Learning Model
# Load Libraries
import os
import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Train the Random Forest classifier
rf = RandomForestClassifier()
rf.fit(X,y)
rf.predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Let’s save the Random Forest

# save
joblib.dump(rf, "random_forest.joblib")
# load, no need to initialize the loaded_rf
loaded_rf = joblib.load("random_forest.joblib")
loaded_rf.predict(X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Compressing the Scikit-Learn Random Forest

In the joblib docs, there is information that compress=3 is a good compromise between size and speed.

joblib.dump(rf, "RF_uncompressed.joblib", compress=0) 
print(f"Uncompressed Random Forest: {np.round(os.path.getsize('RF_uncompressed.joblib') / 1024 / 1024, 2) } MB")

joblib.dump(rf, "RF_compressed.joblib", compress=3) # compression is ON!
print(f"Compressed Random Forest: {np.round(os.path.getsize('RF_compressed.joblib') / 1024 / 1024, 2) } MB")
Uncompressed Random Forest: 0.17 MB
Compressed Random Forest: 0.02 MB

Link Github for this article here.

Thanks!

--

--

GeoSense ✅
GeoSense ✅

Written by GeoSense ✅

🌏 Remote sensing | 🛰️ Geographic Information Systems (GIS) | ℹ️ https://www.tnmthai.com/medium

No responses yet

Write a response