AWS Machine Learning Project-01

Pratik Khose
5 min read · Jul 9, 2024

Predicting Building Energy Efficiency with Supervised Learning on AWS SageMaker.

Introduction

As sustainability becomes a focal point in architecture, predicting and improving the energy efficiency of buildings is increasingly important. In this project, I use supervised learning to build a model that predicts a building's energy efficiency rating from features such as wall area, roof area, overall height, and glazing area. Using AWS SageMaker, I generate synthetic data, train a RandomForestRegressor model, and visualize the results. The post walks through the workflow of a supervised learning project, from data generation to model evaluation.

Setting Up the Environment

Step 1: Sign in to your AWS account (or sign up if you do not have one yet). You will land on the AWS Console homepage.

AWS Console

Step 2: Access AWS SageMaker
From the AWS console, search for SageMaker. In the left-side menu, click on “Notebooks” and then “Notebook instances”.

AWS SageMaker

Step 3: Create a Notebook Instance
Click “Create notebook instance” and configure the notebook.

Creating notebook

Step 4: Enter the “Notebook instance name”.

Configuring Notebook

Step 5: Create an IAM role that grants the notebook access to AWS SageMaker.

IAM Role in AWS SageMaker Notebook

Step 6: Open Jupyter. Once the instance is ready, click “Open Jupyter”.

Notebook Created

Step 7: Start a New Notebook
In the Jupyter interface, click “New” and select “conda_python3”.

Configuring the notebook
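
The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper name `create_notebook`, the instance name, the role ARN, and the `ml.t3.medium` instance type are illustrative assumptions, not values taken from this project.

```python
def create_notebook(client, name, role_arn, instance_type="ml.t3.medium"):
    """Create a SageMaker notebook instance and wait until it is in service.

    `client` is expected to behave like boto3.client("sagemaker").
    The default instance type is an assumption; pick one that fits your budget.
    """
    client.create_notebook_instance(
        NotebookInstanceName=name,
        InstanceType=instance_type,
        RoleArn=role_arn,
    )
    # Block until the instance reaches the InService state.
    waiter = client.get_waiter("notebook_instance_in_service")
    waiter.wait(NotebookInstanceName=name)
    return name


# Example usage (needs real AWS credentials; every value is a placeholder):
# import boto3
# sm = boto3.client("sagemaker")
# create_notebook(sm, "energy-efficiency-demo",
#                 "arn:aws:iam::123456789012:role/MySageMakerRole")
```

Passing the client in as an argument keeps the helper easy to try out with a stub before pointing it at a real account.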

Part 1: Predicting Building Energy Efficiency (Supervised Learning)

Scenario
I am working for an architecture firm, and my task is to build a model that predicts the energy efficiency rating of buildings based on features like wall area, roof area, overall height, and glazing area.

Below is the Python code to set up and run the supervised learning task:

Import necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

warnings.filterwarnings('ignore')

Generate synthetic dataset for building features and energy efficiency ratings

np.random.seed(0)
data_size = 500
data = {
    'WallArea': np.random.randint(200, 400, data_size),
    'RoofArea': np.random.randint(100, 200, data_size),
    'OverallHeight': np.random.uniform(3, 10, data_size),
    'GlazingArea': np.random.uniform(0, 1, data_size),
    'EnergyEfficiency': np.random.uniform(10, 50, data_size)  # Energy efficiency rating
}
df = pd.DataFrame(data)
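
In the dataset above, the target is drawn independently of the features, so a model has nothing to learn from them. As an optional variant, here is a minimal sketch of a generator whose target does depend on the features. The function name `make_buildings_with_signal`, the linear rule, and all of its coefficients are invented for illustration and do not reflect real building physics.

```python
import numpy as np


def make_buildings_with_signal(n=500, seed=0):
    """Return feature arrays plus a target that depends on them."""
    rng = np.random.default_rng(seed)
    wall = rng.integers(200, 400, n)
    roof = rng.integers(100, 200, n)
    height = rng.uniform(3, 10, n)
    glazing = rng.uniform(0, 1, n)
    # Hypothetical linear rule plus Gaussian noise (coefficients are made up).
    energy = (10 + 0.05 * wall + 0.04 * roof + 1.5 * height
              + 8 * glazing + rng.normal(0, 2, n))
    return {
        'WallArea': wall,
        'RoofArea': roof,
        'OverallHeight': height,
        'GlazingArea': glazing,
        'EnergyEfficiency': energy,
    }


data_with_signal = make_buildings_with_signal()
# Correlation between one feature and the target; clearly positive here.
height_corr = np.corrcoef(data_with_signal['OverallHeight'],
                          data_with_signal['EnergyEfficiency'])[0, 1]
print(f"Correlation of OverallHeight with target: {height_corr:.2f}")
```

The returned dictionary can be passed to `pd.DataFrame(...)` in place of `data` if you want the rest of the walkthrough to show visible trends.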

Data preprocessing

X = df.drop('EnergyEfficiency', axis=1)
y = df['EnergyEfficiency']

Visualize the relationships between features and the target variable (Energy Efficiency)

sns.pairplot(df, x_vars=['WallArea', 'RoofArea', 'OverallHeight', 'GlazingArea'], y_vars='EnergyEfficiency', height=4, aspect=1, kind='scatter')
plt.show()
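
Scatter plots can be complemented with a quick numeric check. The sketch below rebuilds the same synthetic DataFrame and computes the Pearson correlation of each feature with the target using `DataFrame.corr()`. The variable name `corr_with_target` is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic dataset exactly as in the walkthrough.
np.random.seed(0)
data_size = 500
df = pd.DataFrame({
    'WallArea': np.random.randint(200, 400, data_size),
    'RoofArea': np.random.randint(100, 200, data_size),
    'OverallHeight': np.random.uniform(3, 10, data_size),
    'GlazingArea': np.random.uniform(0, 1, data_size),
    'EnergyEfficiency': np.random.uniform(10, 50, data_size),
})

# Pearson correlation of every feature with the target column.
corr_with_target = df.corr()['EnergyEfficiency'].drop('EnergyEfficiency')
print(corr_with_target.round(3))
```

Because the target is generated independently of the features, all four correlations should sit close to zero.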

Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Train a Random Forest model

model = RandomForestRegressor(random_state=42)  # fixed seed so results are reproducible
model.fit(X_train, y_train)

Predict and evaluate

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")
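
An MSE on its own is hard to interpret, so it helps to compare it with a trivial baseline. The following is a sketch, not part of the original notebook: it uses scikit-learn's `DummyRegressor`, which always predicts the training mean, and also reports R² via `r2_score`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Rebuild the synthetic dataset and split exactly as in the walkthrough.
np.random.seed(0)
data_size = 500
df = pd.DataFrame({
    'WallArea': np.random.randint(200, 400, data_size),
    'RoofArea': np.random.randint(100, 200, data_size),
    'OverallHeight': np.random.uniform(3, 10, data_size),
    'GlazingArea': np.random.uniform(0, 1, data_size),
    'EnergyEfficiency': np.random.uniform(10, 50, data_size),
})
X = df.drop('EnergyEfficiency', axis=1)
y = df['EnergyEfficiency']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit the random forest and a mean-predicting baseline on the same split.
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

model_pred = model.predict(X_test)
model_mse = mean_squared_error(y_test, model_pred)
baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
model_r2 = r2_score(y_test, model_pred)

print(f"Random forest MSE: {model_mse:.2f}")
print(f"Baseline MSE:      {baseline_mse:.2f}")
print(f"Random forest R^2: {model_r2:.3f}")
```

If the random forest does not beat the baseline, as is likely with a purely random target, the features carry no usable signal.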

Plot the True values vs Predicted values

plt.figure(figsize=(10, 6))
plt.scatter(y_test, predictions)
plt.xlabel("True Values")
plt.ylabel("Predictions")
plt.title("True Values vs Predicted Values")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--')
plt.show()

Detailed Explanation

  • Import Libraries: We import essential libraries for data manipulation (Pandas, NumPy), visualization (Matplotlib, Seaborn), and machine learning (Scikit-Learn).
  • Generate Synthetic Data: We create a synthetic dataset for building features and energy efficiency ratings. This includes wall area, roof area, overall height, glazing area, and the energy efficiency rating.
  • Data Preprocessing: We separate the features (X) from the target variable (y).
  • Data Visualization: Using Seaborn, we create scatter plots to visualize the relationships between features and the target variable.
  • Train-Test Split: We split the dataset into training and testing sets (80% training, 20% testing).
  • Train the Model: We train a RandomForestRegressor model on the training data.
  • Evaluate the Model: We make predictions on the test data and evaluate the model using Mean Squared Error (MSE).
  • Visualize Predictions: We create a scatter plot to compare the true values and the predicted values. Ideally, points should lie along the diagonal line (y=x), indicating accurate predictions.

Results

Data Visualization

The scatter plots below show the relationship between each feature and the target variable (energy efficiency). On real data, such plots help reveal how changes in a feature relate to energy efficiency. Here, however, the target is drawn at random, independently of the features, so the plots are not expected to show any clear trend.

Model Performance

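After training the model and making predictions, we evaluate the model using Mean Squared Error (MSE), the average of the squared differences between the predicted and true values. The closer this value is to zero, the better the model’s performance. Because the synthetic target here is random and independent of the features, the exact value varies from run to run and is not expected to fall much below the variance of the target, which is about 133 for values drawn uniformly between 10 and 50.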
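
To make the metric concrete, here is a small worked example that computes MSE by hand with NumPy, along with its square root (RMSE), which is in the same units as the target. The three values are made up for illustration.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# Squared error per sample: (3-2)^2, (5-5)^2, (7-9)^2 -> 1, 0, 4
squared_errors = (y_true - y_pred) ** 2
mse = squared_errors.mean()   # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)           # back in the units of the target

print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}")  # MSE: 1.667, RMSE: 1.291
```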

Mean Squared Error: X.XX

Prediction vs. True Value Plot

The scatter plot below compares the true values and model predictions. Ideally, points should lie along the diagonal line (y=x), indicating accurate predictions. Deviations from this line suggest prediction errors.

Conclusion

This exercise demonstrates the application of supervised learning to predict building energy efficiency using a RandomForestRegressor model. By following the steps outlined, you can replicate the process and gain insights into the workflow of a supervised learning project.

Final Steps
After completing the exercise, remember to stop and then delete the notebook instance in AWS SageMaker to avoid unnecessary charges (an instance must be stopped before it can be deleted).
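
The cleanup can also be scripted. This is a minimal sketch, assuming boto3 is installed, credentials are configured, and the instance is currently running. The helper name `delete_notebook` and the instance name in the usage comment are placeholders.

```python
def delete_notebook(client, name):
    """Stop a running SageMaker notebook instance, then delete it.

    `client` is expected to behave like boto3.client("sagemaker").
    Assumes the instance is currently InService; stopping an instance
    that is already stopped raises an error.
    """
    client.stop_notebook_instance(NotebookInstanceName=name)
    # Deletion is only allowed once the instance has fully stopped.
    client.get_waiter("notebook_instance_stopped").wait(
        NotebookInstanceName=name)
    client.delete_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_deleted").wait(
        NotebookInstanceName=name)


# Example usage (needs real AWS credentials; the name is a placeholder):
# import boto3
# delete_notebook(boto3.client("sagemaker"), "energy-efficiency-demo")
```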

Summary
In this blog post, we explored the process of predicting building energy efficiency using supervised learning on AWS SageMaker. We generated synthetic data, trained a RandomForestRegressor model, and visualized the results. This hands-on walkthrough covers the core steps you would follow when applying supervised learning to a real dataset.

For further details, you can check out the complete project documentation, demonstration video, and the source code available on GitHub.

🎥 Watch the demonstration video: https://youtu.be/-w5jSqokqzE?si=eaGjvXoC0_tdxbmZ
📂 Check out the GitHub documentation: https://github.com/Pratik-Khose/AWS-Machine-Mearning-projects

Feel free to connect with me for any discussions or collaborations on machine learning and data science projects. Let’s innovate together! 💡

#MachineLearning #SupervisedLearning #AWS #SageMaker #DataScience #Architecture #EnergyEfficiency #RandomForest
