AWS Machine Learning Project-04

Pratik Khose
5 min read · Jul 10, 2024

Creating a Neural Network to Predict Customer Purchases

Introduction

We’ll build a simple neural network using AWS SageMaker to predict whether a customer will make a purchase based on their website visit duration and the number of pages they visited. This is a binary classification problem, and we’ll use a small neural network for quick execution.

Setting Up the Environment

Step 1: Sign in to your AWS account (sign up if you do not have one). You will land on the AWS Console home page.

AWS Console

Step 2: Access AWS SageMaker
From the AWS console, search for SageMaker. In the left-side menu, click on “Notebooks” and then “Notebook instances”.

AWS SageMaker

Step 3: Create a Notebook Instance
Click “Create notebook instance” and configure the notebook.

Creating notebook

Step 4: Enter the “Notebook instance name”.

Configuring Notebook

Step 5: Create an IAM role that grants the notebook access to Amazon SageMaker.

IAM Role in AWS SageMaker Notebook

Step 6: Open Jupyter. Once the instance is ready, click “Open Jupyter”.

Notebook Created

Step 7: Start a New Notebook
In the Jupyter interface, click “New” and select the “conda_tensorflow2_p310” kernel.

Configuring the notebook

Dataset

Below is the Python code to set up and execute the neural network learning task:

Generate Synthetic Data

We will generate a synthetic dataset for this exercise.

import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Generating synthetic data
np.random.seed(0)
data_size = 200
features = np.random.rand(data_size, 2)  # Two features: visit duration and pages visited
labels = (features[:, 0] + features[:, 1] > 1).astype(int)  # Purchase (1) or not (0)

# Convert to DataFrame for easier manipulation
df = pd.DataFrame(features, columns=['VisitDuration', 'PagesVisited'])
df['Purchase'] = labels
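
It can be worth a quick look at what was generated before moving on. The snippet below is not part of the original walkthrough, just a small sanity check run in the same notebook: it prints the first rows and the class balance.

# Quick sanity check of the synthetic dataset
print(df.head())                       # first five rows: VisitDuration, PagesVisited, Purchase
print(df['Purchase'].value_counts())   # number of purchases (1) vs non-purchases (0)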

Preprocess the Data

We will split the data into training and testing sets.

from sklearn.model_selection import train_test_split

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    df[['VisitDuration', 'PagesVisited']], df['Purchase'],
    test_size=0.2, random_state=42)
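
If the random split happens to leave the two classes unevenly represented, scikit-learn’s stratify argument can preserve the purchase/non-purchase ratio in both sets. This is an optional variation on the call above, not something the original exercise requires:

# Optional: stratified split that keeps the class ratio identical in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df[['VisitDuration', 'PagesVisited']], df['Purchase'],
    test_size=0.2, random_state=42, stratify=df['Purchase'])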

Build and Train the Neural Network

We will define, compile, and train the neural network using TensorFlow.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the model
model = Sequential([
    Dense(10, activation='relu', input_shape=(2,)),  # Hidden layer with 10 units; input has 2 features
    Dense(1, activation='sigmoid')                   # Output layer with sigmoid activation for binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=10)
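
Keras can also report metrics on held-out data while training. The variant below is an optional sketch (it assumes you are willing to reserve 20% of the training set as a validation split); with it, history additionally contains val_loss and val_accuracy, and model.summary() shows the layer shapes and parameter counts.

# Optional: inspect the architecture and track validation metrics during training
model.summary()
history = model.fit(X_train, y_train, epochs=10, batch_size=10, validation_split=0.2)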

Evaluate the Model

We will evaluate the model’s performance on the test set.

# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy}")

Visualize the Training Process

We can plot the training loss and accuracy to understand how the model is learning.

import matplotlib.pyplot as plt

# Plot training accuracy per epoch
plt.plot(history.history['accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()

# Plot training loss per epoch
plt.plot(history.history['loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train'], loc='upper left')
plt.show()
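
If you trained with the optional validation split shown earlier, you can overlay the validation curve to spot overfitting. This sketch assumes history contains val_accuracy; skip it otherwise.

# Optional: compare training and validation accuracy (requires validation_split in model.fit)
plt.plot(history.history['accuracy'], label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(loc='upper left')
plt.show()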

Visualize the Decision Boundary

For a simple neural network with two input features, we can visualize the decision boundary on a 2D plot.

import matplotlib.pyplot as plt
import numpy as np

# Define a function to plot the decision boundary
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = (Z > 0.5).astype(int)
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='g', marker='o', s=100)
    plt.xlabel('Visit Duration')
    plt.ylabel('Pages Visited')
    plt.title('Decision Boundary')
    plt.show()

# Plot the decision boundary on the test set
plot_decision_boundary(model, X_test.values, y_test.values)

Detailed Explanation

Model Accuracy and Loss

In the context of the neural network exercise for predicting customer purchase behavior, “model accuracy” and “model loss” are two important metrics used to evaluate the performance of the model.

Model Accuracy

Model accuracy is the fraction of predictions our model got right. In the context of the exercise, it is the proportion of correctly predicted purchase decisions (both purchases and non-purchases) out of all predictions made.

  • Formula: Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)
  • Interpretation: A higher accuracy indicates a better-performing model. For example, an accuracy of 0.90 means that 90% of the model’s predictions are correct. A small numeric check follows below.
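
As a quick numeric check of the formula, the snippet below computes accuracy by hand and with scikit-learn on a handful of made-up labels and predictions (illustrative values only):

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0])     # actual purchase labels (made up)
y_pred = np.array([1, 0, 0, 1, 0])     # model predictions (made up)
print((y_true == y_pred).mean())       # 4 correct out of 5 -> 0.8
print(accuracy_score(y_true, y_pred))  # same result via scikit-learn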

Model Loss

Model loss measures how far the model’s predictions are from the actual class labels. It is a measure of the model’s error.

  • Binary Cross-Entropy: Commonly used in binary classification tasks. It calculates the loss for each instance by comparing the predicted probability with the actual label (either 0 or 1), and then takes the average over all instances.
  • Interpretation: Lower loss values are better, indicating that the model’s predictions are closer to the actual labels. A high loss value means the model’s predictions are far off from the actual labels. A short worked example follows below.
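
To make the definition concrete, here is a short NumPy sketch (with made-up predicted probabilities) that averages the per-instance loss -[y*log(p) + (1-y)*log(1-p)], which is what binary cross-entropy computes:

import numpy as np

y_true = np.array([1, 0, 1])           # actual labels (made up)
p_pred = np.array([0.9, 0.2, 0.6])     # predicted purchase probabilities (made up)
bce = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(bce)  # roughly 0.28; confident, correct predictions keep the loss low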

Conclusion

We created a simple neural network using AWS SageMaker to predict customer purchase behavior based on website visit duration and the number of pages visited. We generated synthetic data, built and trained the neural network, evaluated its performance, and visualized the training process and decision boundary.

Unlike decision trees, neural networks are “black box” models, making them less interpretable. However, we can still visualize certain aspects to gain insights into the model’s performance and decision-making process.

Final Steps
After completing the exercise, remember to delete the notebook instance from AWS SageMaker to avoid unnecessary charges.

Summary
By following these steps, you have gained hands-on experience in building and evaluating a simple neural network for binary classification using AWS SageMaker. This exercise demonstrates how machine learning can provide valuable insights into customer behavior, helping businesses make data-driven decisions.

For further details, you can check out the complete project documentation, demonstration video, and the source code available on GitHub.

🎥 Watch the demonstration video: https://youtu.be/HIb6N1oQrjs
📂 Check out the GitHub documentation: https://github.com/Pratik-Khose/AWS-Machine-Mearning-projects

Let’s connect and discuss more about neural networks, customer behavior prediction, and their applications in various industries. Always eager to learn and collaborate on innovative projects! 🌟

#MachineLearning #NeuralNetwork #CustomerBehavior #AWS #SageMaker #DataScience #BinaryClassification #PredictiveModeling
