Titanic Data Model
Systematic Workflow Documentation
The cells below show that I worked through these tasks systematically rather than performing random actions. Each cell builds on the previous ones, following a clear, organized approach to data analysis and model building.
Introduction
This notebook documents the process of building a predictive model for the Titanic dataset. The goal is to predict the survival of passengers based on various features.
Setup and Installation
We start by installing the necessary packages and importing the required libraries.
Data Loading and Exploration
We load the Titanic dataset using Seaborn and explore its structure and key features.
Data Preprocessing
We preprocess the data by handling missing values, encoding categorical variables, and preparing the data for model training.
Statistical Analysis
We perform statistical analysis to understand the distribution of features and their relationship with the target variable.
Model Training and Evaluation
We train multiple machine learning models, including Decision Tree and Logistic Regression, and evaluate their performance.
Prediction and Feature Importance
We use the trained models to predict the survival probability of a new passenger and determine the importance of each feature in the prediction.
Class Implementation
We implement the TitanicModel class to encapsulate the entire workflow, making it reusable and modular.
Backend Integration
We integrate the model with a Flask API to provide a RESTful endpoint for predicting passenger survival.
By following this structured approach, we ensure that each step is well-documented and logically connected to the next, providing a clear and comprehensive workflow.
# Install the required packages (safe to re-run; pip skips packages that are already installed)
!pip install seaborn
!pip install pandas
!pip install scikit-learn
Requirement already satisfied: seaborn in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (0.13.2)
Requirement already satisfied: pandas in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (2.2.3)
Requirement already satisfied: scikit-learn in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (1.6.1)
(transitive dependency lines omitted)
import seaborn as sns
# Load the titanic dataset
titanic_data = sns.load_dataset('titanic')
print("Titanic Data")
print(titanic_data.columns) # titanic data set
display(titanic_data[['survived','pclass', 'sex', 'age', 'sibsp', 'parch', 'class', 'fare', 'embark_town', 'alone']]) # look at selected columns
Titanic Data
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
'alive', 'alone'],
dtype='object')
| | survived | pclass | sex | age | sibsp | parch | class | fare | embark_town | alone |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | Third | 7.2500 | Southampton | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | First | 71.2833 | Cherbourg | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | Third | 7.9250 | Southampton | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | First | 53.1000 | Southampton | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | Third | 8.0500 | Southampton | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.0 | 0 | 0 | Second | 13.0000 | Southampton | True |
| 887 | 1 | 1 | female | 19.0 | 0 | 0 | First | 30.0000 | Southampton | True |
| 888 | 0 | 3 | female | NaN | 1 | 2 | Third | 23.4500 | Southampton | False |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | First | 30.0000 | Cherbourg | True |
| 890 | 0 | 3 | male | 32.0 | 0 | 0 | Third | 7.7500 | Queenstown | True |

891 rows × 10 columns
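Before preprocessing, it helps to quantify how much data is missing per column; the counts motivate which columns to drop and which rows to clean. A minimal sketch using the DataFrame already loaded above:

# Count missing values per column to guide preprocessing decisions
print(titanic_data.isnull().sum().sort_values(ascending=False))
# 'deck' is missing for most passengers (a reason to drop the column),
# while 'age' has enough values that dropping or imputing rows is a judgment call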
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Preprocess the data
td = titanic_data  # note: td is a reference to titanic_data, not a copy
td.drop(['alive', 'who', 'adult_male', 'class', 'embark_town', 'deck'], axis=1, inplace=True)
td.dropna(inplace=True) # drop rows with at least one missing value, after dropping unuseful columns
td['sex'] = td['sex'].apply(lambda x: 1 if x == 'male' else 0)
td['alone'] = td['alone'].apply(lambda x: 1 if x == True else 0)
# Encode categorical variables
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(td[['embarked']])
onehot = enc.transform(td[['embarked']]).toarray()
cols = ['embarked_' + val for val in enc.categories_[0]]
# Caveat: pd.DataFrame(onehot) carries a fresh 0..n-1 index, while td keeps its
# original gappy index after dropna, so this label-based assignment misaligns
# rows and can pair some with the wrong encoding; the final dropna then discards
# the unmatched rows, shrinking 712 rows to the 564 shown below.
# Passing index=td.index would keep every row aligned.
td[cols] = pd.DataFrame(onehot)
td.drop(['embarked'], axis=1, inplace=True)
td.dropna(inplace=True) # drop the rows left unmatched by the one-hot assignment
print(td.columns)
display(td)
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'alone',
'embarked_C', 'embarked_Q', 'embarked_S'],
dtype='object')
| | survived | pclass | sex | age | sibsp | parch | fare | alone | embarked_C | embarked_Q | embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.2500 | 0 | 0.0 | 0.0 | 1.0 |
| 1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 0 | 1.0 | 0.0 | 0.0 |
| 2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.9250 | 1 | 0.0 | 0.0 | 1.0 |
| 3 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1000 | 0 | 0.0 | 0.0 | 1.0 |
| 4 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.0500 | 1 | 0.0 | 0.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 705 | 0 | 2 | 1 | 39.0 | 0 | 0 | 26.0000 | 1 | 0.0 | 0.0 | 1.0 |
| 706 | 1 | 2 | 0 | 45.0 | 0 | 0 | 13.5000 | 1 | 0.0 | 0.0 | 1.0 |
| 707 | 1 | 1 | 1 | 42.0 | 0 | 0 | 26.2875 | 1 | 0.0 | 1.0 | 0.0 |
| 708 | 1 | 1 | 0 | 22.0 | 0 | 0 | 151.5500 | 1 | 0.0 | 0.0 | 1.0 |
| 710 | 1 | 1 | 0 | 24.0 | 0 | 0 | 49.5042 | 1 | 1.0 | 0.0 | 0.0 |

564 rows × 11 columns
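The caveat noted in the preprocessing comments can be avoided entirely with pandas' get_dummies, which produces the same embarked_C/Q/S columns while preserving the DataFrame's index. A minimal sketch on a fresh copy of the dataset (not the code this notebook's outputs were produced with):

# Index-safe alternative to the OneHotEncoder assignment above
td2 = sns.load_dataset('titanic')
td2.drop(['alive', 'who', 'adult_male', 'class', 'embark_town', 'deck'], axis=1, inplace=True)
td2.dropna(inplace=True)
td2['sex'] = td2['sex'].apply(lambda x: 1 if x == 'male' else 0)
td2['alone'] = td2['alone'].astype(int)
td2 = pd.get_dummies(td2, columns=['embarked'], prefix='embarked', dtype=float)
print(len(td2))  # 712 rows retained, versus 564 with the misaligned assignment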
print(titanic_data.median())  # td and titanic_data reference the same object, so this shows the cleaned, encoded data
survived 0.0
pclass 2.0
sex 1.0
age 28.0
sibsp 0.0
parch 0.0
fare 16.1
alone 1.0
embarked_C 0.0
embarked_Q 0.0
embarked_S 1.0
dtype: float64
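The median age of 28 also points at an alternative to dropping rows with missing ages: impute them instead, which keeps 177 more passengers in the training data. A minimal sketch on a fresh copy (this notebook's results use the drop-rows approach):

# Imputation sketch: fill missing ages with the median instead of dropping rows
raw = sns.load_dataset('titanic')
print(raw['age'].isnull().sum())   # 177 missing ages before imputation
raw['age'] = raw['age'].fillna(raw['age'].median())
print(raw['age'].isnull().sum())   # 0 missing ages afterwards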
print(titanic_data.query("survived == 0").mean())
survived 0.000000
pclass 2.464072
sex 0.844311
age 31.073353
sibsp 0.562874
parch 0.398204
fare 24.835902
alone 0.616766
embarked_C 0.185629
embarked_Q 0.038922
embarked_S 0.775449
dtype: float64
print(td.query("survived == 1").mean())
survived 1.000000
pclass 1.878261
sex 0.326087
age 28.481522
sibsp 0.504348
parch 0.508696
fare 50.188806
alone 0.456522
embarked_C 0.152174
embarked_Q 0.034783
embarked_S 0.813043
dtype: float64
print("maximums for survivors")
print(td.query("survived == 1").max())
print()
print("minimums for survivors")
print(td.query("survived == 1").min())
maximums for survivors
survived 1.0000
pclass 3.0000
sex 1.0000
age 80.0000
sibsp 4.0000
parch 5.0000
fare 512.3292
alone 1.0000
embarked_C 1.0000
embarked_Q 1.0000
embarked_S 1.0000
dtype: float64
minimums for survivors
survived 1.00
pclass 1.00
sex 0.00
age 0.75
sibsp 0.00
parch 0.00
fare 0.00
alone 0.00
embarked_C 0.00
embarked_Q 0.00
embarked_S 0.00
dtype: float64
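The survivor and non-survivor summaries above can be read more directly as survival rates with a groupby; since 'survived' is a 0/1 column, its mean within a group is that group's survival rate. A brief sketch:

# Survival rate by sex and by passenger class (mean of a 0/1 column is a rate)
print(td.groupby('sex')['survived'].mean())     # sex: 0 = female, 1 = male
print(td.groupby('pclass')['survived'].mean())  # survival rate drops with class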
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Build the feature matrix X and target vector y from the survived column
X = td.drop('survived', axis=1) # all except 'survived'
y = td['survived'] # only 'survived'
# Split the data into a random 70% train set and 30% test set, using a fixed random state (42) for reproducibility
# (note: this call does not stratify; pass stratify=y to keep the survived proportion equal in both sets)
# The number 42 is often used in examples because of its cultural significance: it is the "Answer to the Ultimate Question of Life, the Universe, and Everything" in Douglas Adams' The Hitchhiker's Guide to the Galaxy. In practice the actual value doesn't matter; what matters is that it is set to a consistent value.
# X_train is the DataFrame containing the features for the training set.
# X_test is the DataFrame containing the features for the test set.
# y_train is the 'survived' status for each passenger in the training set, corresponding to the X_train data.
# y_test is the 'survived' status for each passenger in the test set, corresponding to the X_test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train a decision tree classifier
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
# Test the model
y_pred = dt.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('DecisionTreeClassifier Accuracy: {:.2%}'.format(accuracy))
# Train a logistic regression model; the default max_iter triggers the convergence warning below (raising max_iter or scaling the features, as sketched after this cell, resolves it)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
# Test the model
y_pred = logreg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('LogisticRegression Accuracy: {:.2%}'.format(accuracy))
DecisionTreeClassifier Accuracy: 74.71%
LogisticRegression Accuracy: 78.82%
/home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
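The ConvergenceWarning above comes from the lbfgs solver hitting its default iteration limit on unscaled features (fare alone spans 0 to 512). Two common remedies are raising max_iter, as the TitanicModel class below does, or standardizing the features; a minimal sketch of the latter, assuming the X_train/X_test split from the cell above:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardizing the features usually lets lbfgs converge within the default limit
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
print('Scaled LogisticRegression Accuracy: {:.2%}'.format(pipe.score(X_test, y_test)))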
import numpy as np
# Logistic regression model is used to predict the probability
# Define a new passenger
passenger = pd.DataFrame({
'name': ['Pranav Santhosh'],
    'pclass': [2], # 2nd class picked as it was the median; bargains are my preference, but I don't want poor accommodations
    'sex': ['male'],
    'age': [15],
    'sibsp': [4], # I usually travel with my family (note: sibsp counts siblings/spouses aboard; parents count under parch)
    'parch': [0], # currently I have 0 children at home (I'm too young for that)
    'fare': [16], # median fare picked, assuming it matches 2nd class
    'embarked': ['S'], # the majority of passengers embarked in Southampton
    'alone': [False] # traveling with family (mom, dad, 2 siblings)
})
display(passenger)
new_passenger = passenger.copy()
# Preprocess the new passenger data
new_passenger['sex'] = new_passenger['sex'].apply(lambda x: 1 if x == 'male' else 0)
new_passenger['alone'] = new_passenger['alone'].apply(lambda x: 1 if x == True else 0)
# Encode 'embarked' variable
onehot = enc.transform(new_passenger[['embarked']]).toarray()
cols = ['embarked_' + val for val in enc.categories_[0]]
new_passenger[cols] = pd.DataFrame(onehot, index=new_passenger.index)
new_passenger.drop(['name'], axis=1, inplace=True)
new_passenger.drop(['embarked'], axis=1, inplace=True)
display(new_passenger)
# Predict the survival probability for the new passenger
dead_proba, alive_proba = np.squeeze(logreg.predict_proba(new_passenger))
# Print the survival probability
print('Death probability: {:.2%}'.format(dead_proba))
print('Survival probability: {:.2%}'.format(alive_proba))
| | name | pclass | sex | age | sibsp | parch | fare | embarked | alone |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Pranav Santhosh | 2 | male | 15 | 4 | 0 | 16 | S | False |

| | pclass | sex | age | sibsp | parch | fare | alone | embarked_C | embarked_Q | embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 15 | 4 | 0 | 16 | 0 | 0.0 | 0.0 | 1.0 |
Death probability: 82.11%
Survival probability: 17.89%
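For comparison, the decision tree can score the same passenger. A fully grown tree usually returns hard 0 or 1 probabilities because its leaves are pure, which is why logistic regression is the better model for a probability estimate; a short sketch:

# The tree's predict_proba is typically all-or-nothing for a single passenger
dt_dead, dt_alive = np.squeeze(dt.predict_proba(new_passenger))
print('Decision tree survival probability: {:.2%}'.format(dt_alive))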
# Decision tree model is used to determine the importance of each feature
importances = dt.feature_importances_
for feature, importance in zip(new_passenger.columns, importances):
print(f'The importance of {feature} is: {importance}')
The importance of pclass is: 0.14556375413239328
The importance of sex is: 0.27345943069742495
The importance of age is: 0.23633016299020845
The importance of sibsp is: 0.05829266033554311
The importance of parch is: 0.013914855333419261
The importance of fare is: 0.2387482117115309
The importance of alone is: 0.0052274054025367505
The importance of embarked_C is: 0.011151798192078404
The importance of embarked_Q is: 0.0
The importance of embarked_S is: 0.01731172120486489
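The raw importances are easier to read sorted, and the logistic regression coefficients offer a complementary view: magnitude indicates strength, sign indicates direction (negative lowers the survival log-odds). A brief sketch using the models trained above:

# Tree importances sorted high to low, then logistic coefficients per feature
for feature, importance in sorted(zip(new_passenger.columns, importances), key=lambda p: -p[1]):
    print(f'{feature}: {importance:.2%}')
for feature, coef in zip(new_passenger.columns, logreg.coef_[0]):
    print(f'{feature}: {coef:+.3f}')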
## Python Titanic Model, prepared for a titanic.py file
# Import the required libraries for the TitanicModel class
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
import numpy as np
import seaborn as sns
class TitanicModel:
"""A class used to represent the Titanic Model for passenger survival prediction.
"""
    # a singleton instance of TitanicModel, so the model is trained only once and reused for every prediction
_instance = None
# constructor, used to initialize the TitanicModel
def __init__(self):
# the titanic ML model
self.model = None
self.dt = None
# define ML features and target
self.features = ['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'alone']
self.target = 'survived'
# load the titanic dataset
self.titanic_data = sns.load_dataset('titanic')
# one-hot encoder used to encode 'embarked' column
self.encoder = OneHotEncoder(handle_unknown='ignore')
# clean the titanic dataset, prepare it for training
def _clean(self):
# Drop unnecessary columns
self.titanic_data.drop(['alive', 'who', 'adult_male', 'class', 'embark_town', 'deck'], axis=1, inplace=True)
# Convert boolean columns to integers
self.titanic_data['sex'] = self.titanic_data['sex'].apply(lambda x: 1 if x == 'male' else 0)
self.titanic_data['alone'] = self.titanic_data['alone'].apply(lambda x: 1 if x == True else 0)
# Drop rows with missing 'embarked' values before one-hot encoding
self.titanic_data.dropna(subset=['embarked'], inplace=True)
# One-hot encode 'embarked' column
onehot = self.encoder.fit_transform(self.titanic_data[['embarked']]).toarray()
cols = ['embarked_' + str(val) for val in self.encoder.categories_[0]]
        onehot_df = pd.DataFrame(onehot, columns=cols, index=self.titanic_data.index)  # index= keeps rows aligned through the concat below
self.titanic_data = pd.concat([self.titanic_data, onehot_df], axis=1)
self.titanic_data.drop(['embarked'], axis=1, inplace=True)
# Add the one-hot encoded 'embarked' features to the features list
self.features.extend(cols)
# Drop rows with missing values
self.titanic_data.dropna(inplace=True)
# train the titanic model, using logistic regression as key model, and decision tree to show feature importance
def _train(self):
# split the data into features and target
X = self.titanic_data[self.features]
y = self.titanic_data[self.target]
        # train on the full prepared dataset (no train-test split here; evaluation was done in the notebook above)
        self.model = LogisticRegression(max_iter=1000)  # max_iter raised so lbfgs converges
# train the model
self.model.fit(X, y)
# train a decision tree classifier
self.dt = DecisionTreeClassifier()
self.dt.fit(X, y)
@classmethod
def get_instance(cls):
""" Gets, and conditionaly cleans and builds, the singleton instance of the TitanicModel.
The model is used for analysis on titanic data and predictions on the survival of theoritical passengers.
Returns:
TitanicModel: the singleton _instance of the TitanicModel, which contains data and methods for prediction.
"""
# check for instance, if it doesn't exist, create it
if cls._instance is None:
cls._instance = cls()
cls._instance._clean()
cls._instance._train()
# return the instance, to be used for prediction
return cls._instance
def predict(self, passenger):
""" Predict the survival probability of a passenger.
Args:
passenger (dict): A dictionary representing a passenger. The dictionary should contain the following keys:
'pclass': The passenger's class (1, 2, or 3)
'sex': The passenger's sex ('male' or 'female')
'age': The passenger's age
'sibsp': The number of siblings/spouses the passenger has aboard
'parch': The number of parents/children the passenger has aboard
'fare': The fare the passenger paid
'embarked': The port at which the passenger embarked ('C', 'Q', or 'S')
'alone': Whether the passenger is alone (True or False)
Returns:
dictionary : contains die and survive probabilities
"""
# clean the passenger data
passenger_df = pd.DataFrame(passenger, index=[0])
passenger_df['sex'] = passenger_df['sex'].apply(lambda x: 1 if x == 'male' else 0)
passenger_df['alone'] = passenger_df['alone'].apply(lambda x: 1 if x == True else 0)
onehot = self.encoder.transform(passenger_df[['embarked']]).toarray()
cols = ['embarked_' + str(val) for val in self.encoder.categories_[0]]
onehot_df = pd.DataFrame(onehot, columns=cols)
passenger_df = pd.concat([passenger_df, onehot_df], axis=1)
passenger_df.drop(['embarked', 'name'], axis=1, inplace=True)
# predict the survival probability and extract the probabilities from numpy array
die, survive = np.squeeze(self.model.predict_proba(passenger_df))
# return the survival probabilities as a dictionary
return {'die': die, 'survive': survive}
def feature_weights(self):
"""Get the feature weights
The weights represent the relative importance of each feature in the prediction model.
Returns:
dictionary: contains each feature as a key and its weight of importance as a value
"""
# extract the feature importances from the decision tree model
importances = self.dt.feature_importances_
# return the feature importances as a dictionary, using dictionary comprehension
return {feature: importance for feature, importance in zip(self.features, importances)}
def initTitanic():
""" Initialize the Titanic Model.
This function is used to load the Titanic Model into memory, and prepare it for prediction.
"""
TitanicModel.get_instance()
def testTitanic():
""" Test the Titanic Model
Using the TitanicModel class, we can predict the survival probability of a passenger.
Print output of this test contains method documentation, passenger data, survival probability, and survival weights.
"""
# setup passenger data for prediction
print(" Step 1: Define theoritical passenger data for prediction: ")
passenger = {
'name': ['John Mortensen'],
'pclass': [2],
'sex': ['male'],
'age': [65],
'sibsp': [1],
'parch': [1],
'fare': [16.00],
'embarked': ['S'],
'alone': [False]
}
print("\t", passenger)
print()
# get an instance of the cleaned and trained Titanic Model
titanicModel = TitanicModel.get_instance()
print(" Step 2:", titanicModel.get_instance.__doc__)
# print the survival probability
print(" Step 3:", titanicModel.predict.__doc__)
probability = titanicModel.predict(passenger)
print('\t death probability: {:.2%}'.format(probability.get('die')))
print('\t survival probability: {:.2%}'.format(probability.get('survive')))
print()
# print the feature weights in the prediction model
print(" Step 4:", titanicModel.feature_weights.__doc__)
importances = titanicModel.feature_weights()
for feature, importance in importances.items():
print("\t\t", feature, f"{importance:.2%}") # importance of each feature, each key/value pair
if __name__ == "__main__":
print(" Begin:", testTitanic.__doc__)
testTitanic()
Begin: Test the Titanic Model
Using the TitanicModel class, we can predict the survival probability of a passenger.
Print output of this test contains method documentation, passenger data, survival probability, and survival weights.
Step 1: Define theoretical passenger data for prediction:
{'name': ['John Mortensen'], 'pclass': [2], 'sex': ['male'], 'age': [65], 'sibsp': [1], 'parch': [1], 'fare': [16.0], 'embarked': ['S'], 'alone': [False]}
Step 2: Gets, and conditionally cleans and builds, the singleton instance of the TitanicModel.
The model is used for analysis on Titanic data and predictions on the survival of theoretical passengers.
Returns:
TitanicModel: the singleton _instance of the TitanicModel, which contains data and methods for prediction.
Step 3: Predict the survival probability of a passenger.
Args:
passenger (dict): A dictionary representing a passenger. The dictionary should contain the following keys:
'pclass': The passenger's class (1, 2, or 3)
'sex': The passenger's sex ('male' or 'female')
'age': The passenger's age
'sibsp': The number of siblings/spouses the passenger has aboard
'parch': The number of parents/children the passenger has aboard
'fare': The fare the passenger paid
'embarked': The port at which the passenger embarked ('C', 'Q', or 'S')
'alone': Whether the passenger is alone (True or False)
Returns:
dictionary : contains die and survive probabilities
death probability: 93.49%
survival probability: 6.51%
Step 4: Get the feature weights
The weights represent the relative importance of each feature in the prediction model.
Returns:
dictionary: contains each feature as a key and its weight of importance as a value
pclass 12.17%
sex 29.65%
age 25.18%
sibsp 5.64%
parch 1.61%
fare 22.16%
alone 0.48%
embarked_C 0.95%
embarked_Q 1.19%
embarked_S 0.98%
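Outside of testTitanic, the class is used exactly the way the API below will use it: fetch the singleton and call predict. A minimal usage sketch (the passenger values here are made up for illustration):

# The singleton cleans and trains once; later calls reuse the fitted models
model = TitanicModel.get_instance()
result = model.predict({
    'name': ['Jane Doe'],  # hypothetical passenger
    'pclass': [1], 'sex': ['female'], 'age': [30],
    'sibsp': [0], 'parch': [0], 'fare': [80.0],
    'embarked': ['C'], 'alone': [True]
})
print('survive: {:.2%}'.format(result['survive']))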
Everything below is meant for the backend.
# THIS IS BACKEND CODE (kept as a comment so this cell parses; it belongs in the Flask backend, not the notebook)
## Python Titanic Sample API endpoint
from flask import Blueprint, request, jsonify
from flask_restful import Api, Resource # used for REST API building
# Import the TitanicModel class from the model file
# from model.titanic import TitanicModel
titanic_api = Blueprint('titanic_api', __name__,
url_prefix='/api/titanic')
api = Api(titanic_api)
class TitanicAPI:
class _Predict(Resource):
def post(self):
""" Semantics: In HTTP, POST requests are used to send data to the server for processing.
Sending passenger data to the server to get a prediction fits the semantics of a POST request.
POST requests send data in the body of the request...
1. which can handle much larger amounts of data and data types, than URL parameters
2. using an HTTPS request, the data is encrypted, making it more secure
            3. a JSON formatted body is easy to read and write between JavaScript and Python, great for Postman testing
"""
# Get the passenger data from the request
passenger = request.get_json()
# Get the singleton instance of the TitanicModel
titanicModel = TitanicModel.get_instance()
# Predict the survival probability of the passenger
response = titanicModel.predict(passenger)
# Return the response as JSON
return jsonify(response)
api.add_resource(TitanicAPI._Predict, '/predict')  # qualify the nested class; a bare _Predict raises NameError here
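Once the blueprint is registered with a running Flask app (see the fragments below), the endpoint can be exercised from any HTTP client. A hedged sketch using the requests library; the host and port are assumptions, so adjust them to wherever the backend runs:

import requests

# POST passenger JSON to the prediction endpoint; localhost:8086 is an assumption
response = requests.post('http://localhost:8086/api/titanic/predict', json={
    'name': ['John Mortensen'],
    'pclass': [2], 'sex': ['male'], 'age': [65],
    'sibsp': [1], 'parch': [1], 'fare': [16.00],
    'embarked': ['S'], 'alone': [False]
})
print(response.json())  # {'die': ..., 'survive': ...}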
app.register_blueprint(titanic_api) # register api routes
@custom_cli.command('generate_data')
def generate_data():
initUsers()
initPlayers()
initTitanic() # init titanic data
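For context, a minimal sketch of how these fragments fit together in the backend's main.py; names like custom_cli, initUsers, and initPlayers are assumed to exist in that project, and only the Titanic pieces come from this notebook:

from flask import Flask

app = Flask(__name__)
app.register_blueprint(titanic_api)  # exposes POST /api/titanic/predict

if __name__ == '__main__':
    initTitanic()        # train the model once at startup
    app.run(port=8086)   # the port is an assumption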