Systematic Workflow Documentation

The cells below demonstrate that I have been systematically working through a series of tasks in a structured manner, rather than performing random actions. Each cell builds upon the previous ones, showcasing a clear and organized approach to data analysis and model building.

Introduction

This notebook documents the process of building a predictive model for the Titanic dataset. The goal is to predict the survival of passengers based on various features.

Setup and Installation

We start by installing the necessary packages and importing the required libraries.

Data Loading and Exploration

We load the Titanic dataset using Seaborn and explore its structure and key features.

Data Preprocessing

We preprocess the data by handling missing values, encoding categorical variables, and preparing the data for model training.

Statistical Analysis

We perform statistical analysis to understand the distribution of features and their relationship with the target variable.

Model Training and Evaluation

We train multiple machine learning models, including Decision Tree and Logistic Regression, and evaluate their performance.

Prediction and Feature Importance

We use the trained models to predict the survival probability of a new passenger and determine the importance of each feature in the prediction.

Class Implementation

We implement the TitanicModel class to encapsulate the entire workflow, making it reusable and modular.

Backend Integration

We integrate the model with a Flask API to provide a RESTful endpoint for predicting passenger survival.

By following this structured approach, we ensure that each step is well-documented and logically connected to the next, providing a clear and comprehensive workflow.

# Uncomment the following lines to install the required packages
!pip install seaborn
!pip install pandas
!pip install scikit-learn

Requirement already satisfied: seaborn in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (0.13.2)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from seaborn) (2.2.3)
Requirement already satisfied: pandas>=1.2 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from seaborn) (2.2.3)
Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from seaborn) (3.10.1)
Requirement already satisfied: contourpy>=1.0.1 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.56.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.8)
Requirement already satisfied: packaging>=20.0 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (24.2)
Requirement already satisfied: pillow>=8 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (11.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.2.1)
Requirement already satisfied: python-dateutil>=2.7 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from pandas>=1.2->seaborn) (2025.1)
Requirement already satisfied: tzdata>=2022.7 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from pandas>=1.2->seaborn) (2025.1)
Requirement already satisfied: six>=1.5 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.17.0)
Requirement already satisfied: pandas in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (2.2.3)
Requirement already satisfied: numpy>=1.26.0 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from pandas) (2.2.3)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from pandas) (2025.1)
Requirement already satisfied: tzdata>=2022.7 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from pandas) (2025.1)
Requirement already satisfied: six>=1.5 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)
Requirement already satisfied: scikit-learn in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (1.6.1)
Requirement already satisfied: numpy>=1.19.5 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from scikit-learn) (2.2.3)
Requirement already satisfied: scipy>=1.6.0 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from scikit-learn) (1.15.2)
Requirement already satisfied: joblib>=1.2.0 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from scikit-learn) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in /home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages (from scikit-learn) (3.6.0)

    import seaborn as sns
    titanic_data = sns.load_dataset('titanic')

import seaborn as sns

# Load the titanic dataset
titanic_data = sns.load_dataset('titanic')

print("Titanic Data")


print(titanic_data.columns) # titanic data set
display(titanic_data[['survived','pclass', 'sex', 'age', 'sibsp', 'parch', 'class', 'fare', 'embark_town', 'alone']]) # look at selected columns

Titanic Data
Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare',
       'embarked', 'class', 'who', 'adult_male', 'deck', 'embark_town',
       'alive', 'alone'],
      dtype='object')

	survived	pclass	sex	age	sibsp	parch	class	fare	embark_town	alone
0	0	3	male	22.0	1	0	Third	7.2500	Southampton	False
1	1	1	female	38.0	1	0	First	71.2833	Cherbourg	False
2	1	3	female	26.0	0	0	Third	7.9250	Southampton	True
3	1	1	female	35.0	1	0	First	53.1000	Southampton	False
4	0	3	male	35.0	0	0	Third	8.0500	Southampton	True
...	...	...	...	...	...	...	...	...	...	...
886	0	2	male	27.0	0	0	Second	13.0000	Southampton	True
887	1	1	female	19.0	0	0	First	30.0000	Southampton	True
888	0	3	female	NaN	1	2	Third	23.4500	Southampton	False
889	1	1	male	26.0	0	0	First	30.0000	Cherbourg	True
890	0	3	male	32.0	0	0	Third	7.7500	Queenstown	True

891 rows × 10 columns

import pandas as pd
# Preprocess the data
from sklearn.preprocessing import OneHotEncoder

td = titanic_data
td.drop(['alive', 'who', 'adult_male', 'class', 'embark_town', 'deck'], axis=1, inplace=True)
td.dropna(inplace=True) # drop rows with at least one missing value, after dropping unuseful columns
td['sex'] = td['sex'].apply(lambda x: 1 if x == 'male' else 0)
td['alone'] = td['alone'].apply(lambda x: 1 if x == True else 0)

# Encode categorical variables
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(td[['embarked']])
onehot = enc.transform(td[['embarked']]).toarray()
cols = ['embarked_' + val for val in enc.categories_[0]]
td[cols] = pd.DataFrame(onehot)
td.drop(['embarked'], axis=1, inplace=True)
td.dropna(inplace=True) # drop rows with at least one missing value, after preparing the data

print(td.columns)
display(td)

Index(['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'alone',
       'embarked_C', 'embarked_Q', 'embarked_S'],
      dtype='object')

	survived	pclass	sex	age	sibsp	parch	fare	alone	embarked_C	embarked_Q	embarked_S
0	0	3	1	22.0	1	0	7.2500	0	0.0	0.0	1.0
1	1	1	0	38.0	1	0	71.2833	0	1.0	0.0	0.0
2	1	3	0	26.0	0	0	7.9250	1	0.0	0.0	1.0
3	1	1	0	35.0	1	0	53.1000	0	0.0	0.0	1.0
4	0	3	1	35.0	0	0	8.0500	1	0.0	0.0	1.0
...	...	...	...	...	...	...	...	...	...	...	...
705	0	2	1	39.0	0	0	26.0000	1	0.0	0.0	1.0
706	1	2	0	45.0	0	0	13.5000	1	0.0	0.0	1.0
707	1	1	1	42.0	0	0	26.2875	1	0.0	1.0	0.0
708	1	1	0	22.0	0	0	151.5500	1	0.0	0.0	1.0
710	1	1	0	24.0	0	0	49.5042	1	1.0	0.0	0.0

564 rows × 11 columns

print(titanic_data.median())

survived       0.0
pclass         2.0
sex            1.0
age           28.0
sibsp          0.0
parch          0.0
fare          16.1
alone          1.0
embarked_C     0.0
embarked_Q     0.0
embarked_S     1.0
dtype: float64

print(titanic_data.query("survived == 0").mean())

survived       0.000000
pclass         2.464072
sex            0.844311
age           31.073353
sibsp          0.562874
parch          0.398204
fare          24.835902
alone          0.616766
embarked_C     0.185629
embarked_Q     0.038922
embarked_S     0.775449
dtype: float64

print(td.query("survived == 1").mean())

survived       1.000000
pclass         1.878261
sex            0.326087
age           28.481522
sibsp          0.504348
parch          0.508696
fare          50.188806
alone          0.456522
embarked_C     0.152174
embarked_Q     0.034783
embarked_S     0.813043
dtype: float64

print("maximums for survivors")
print(td.query("survived == 1").max())
print()
print("minimums for survivors")
print(td.query("survived == 1").min())

maximums for survivors
survived        1.0000
pclass          3.0000
sex             1.0000
age            80.0000
sibsp           4.0000
parch           5.0000
fare          512.3292
alone           1.0000
embarked_C      1.0000
embarked_Q      1.0000
embarked_S      1.0000
dtype: float64

minimums for survivors
survived      1.00
pclass        1.00
sex           0.00
age           0.75
sibsp         0.00
parch         0.00
fare          0.00
alone         0.00
embarked_C    0.00
embarked_Q    0.00
embarked_S    0.00
dtype: float64

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Build distinct data frames on survived column
X = td.drop('survived', axis=1) # all except 'survived'
y = td['survived'] # only 'survived'

# Split arrays in random train 70%, random test 30%, using stratified sampling (same proportion of survived in both sets) and a fixed random state (42
# The number 42 is often used in examples and tutorials because of its cultural significance in fields like science fiction (it's the "Answer to the Ultimate Question of Life, The Universe, and Everything" in The Hitchhiker's Guide to the Galaxy by Douglas Adams). But in practice, the actual value doesn't matter; what's important is that it's set to a consistent value.
# X_train is the DataFrame containing the features for the training set.
# X_test is the DataFrame containing the features for the test set.
# y-train is the 'survived' status for each passenger in the training set, corresponding to the X_train data.
# y_test is the 'survived' status for each passenger in the test set, corresponding to the X_test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a decision tree classifier
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

# Test the model
y_pred = dt.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('DecisionTreeClassifier Accuracy: {:.2%}'.format(accuracy))  

# Train a logistic regression model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Test the model
y_pred = logreg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('LogisticRegression Accuracy: {:.2%}'.format(accuracy))

DecisionTreeClassifier Accuracy: 74.71%
LogisticRegression Accuracy: 78.82%


/home/pranav/nighthawk/Pranav_2025/Pranav_2025/venv/lib/python3.12/site-packages/sklearn/linear_model/_logistic.py:465: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

import numpy as np

# Logistic regression model is used to predict the probability

# Define a new passenger
passenger = pd.DataFrame({
    'name': ['Pranav Santhosh'],
    'pclass': [2], # 2nd class picked as it was median, bargains are my preference, but I don't want to have poor accomodations
    'sex': ['male'],
    'age': [15],
    'sibsp': [4], # I usually travel with my family (mom, dad, 2 siblings)
    'parch': [0], # currenly I have 0 children at home (im too young for that)
    'fare': [16], # median fare picked assuming it is 2nd class
    'embarked': ['S'], # majority of passengers embarked in Southampton (false but example)
    'alone': [False] # travelling with family (mom, dad, 2 siblings))
})

display(passenger)
new_passenger = passenger.copy()

# Preprocess the new passenger data
new_passenger['sex'] = new_passenger['sex'].apply(lambda x: 1 if x == 'male' else 0)
new_passenger['alone'] = new_passenger['alone'].apply(lambda x: 1 if x == True else 0)

# Encode 'embarked' variable
onehot = enc.transform(new_passenger[['embarked']]).toarray()
cols = ['embarked_' + val for val in enc.categories_[0]]
new_passenger[cols] = pd.DataFrame(onehot, index=new_passenger.index)
new_passenger.drop(['name'], axis=1, inplace=True)
new_passenger.drop(['embarked'], axis=1, inplace=True)

display(new_passenger)

# Predict the survival probability for the new passenger
dead_proba, alive_proba = np.squeeze(logreg.predict_proba(new_passenger))

# Print the survival probability
print('Death probability: {:.2%}'.format(dead_proba))  
print('Survival probability: {:.2%}'.format(alive_proba))

	name	pclass	sex	age	sibsp	parch	fare	embarked	alone
0	Pranav Santhosh	2	male	15	4	0	16	S	False

	pclass	sex	age	sibsp	parch	fare	alone	embarked_C	embarked_Q	embarked_S
0	2	1	15	4	0	16	0	0.0	0.0	1.0

Death probability: 82.11%
Survival probability: 17.89%

# Decision tree model is used to determine the importance of each feature

importances = dt.feature_importances_
for feature, importance in zip(new_passenger.columns, importances):
    print(f'The importance of {feature} is: {importance}')

The importance of pclass is: 0.14556375413239328
The importance of sex is: 0.27345943069742495
The importance of age is: 0.23633016299020845
The importance of sibsp is: 0.05829266033554311
The importance of parch is: 0.013914855333419261
The importance of fare is: 0.2387482117115309
The importance of alone is: 0.0052274054025367505
The importance of embarked_C is: 0.011151798192078404
The importance of embarked_Q is: 0.0
The importance of embarked_S is: 0.01731172120486489

## Python Titanic Model, prepared for a titanic.py file

# Import the required libraries for the TitanicModel class
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
import numpy as np
import seaborn as sns

class TitanicModel:
    """A class used to represent the Titanic Model for passenger survival prediction.
    """
    # a singleton instance of TitanicModel, created to train the model only once, while using it for prediction multiple times
    _instance = None
    
    # constructor, used to initialize the TitanicModel
    def __init__(self):
        # the titanic ML model
        self.model = None
        self.dt = None
        # define ML features and target
        self.features = ['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'alone']
        self.target = 'survived'
        # load the titanic dataset
        self.titanic_data = sns.load_dataset('titanic')
        # one-hot encoder used to encode 'embarked' column
        self.encoder = OneHotEncoder(handle_unknown='ignore')

    # clean the titanic dataset, prepare it for training
    def _clean(self):
        # Drop unnecessary columns
        self.titanic_data.drop(['alive', 'who', 'adult_male', 'class', 'embark_town', 'deck'], axis=1, inplace=True)

        # Convert boolean columns to integers
        self.titanic_data['sex'] = self.titanic_data['sex'].apply(lambda x: 1 if x == 'male' else 0)
        self.titanic_data['alone'] = self.titanic_data['alone'].apply(lambda x: 1 if x == True else 0)

        # Drop rows with missing 'embarked' values before one-hot encoding
        self.titanic_data.dropna(subset=['embarked'], inplace=True)
        
        # One-hot encode 'embarked' column
        onehot = self.encoder.fit_transform(self.titanic_data[['embarked']]).toarray()
        cols = ['embarked_' + str(val) for val in self.encoder.categories_[0]]
        onehot_df = pd.DataFrame(onehot, columns=cols)
        self.titanic_data = pd.concat([self.titanic_data, onehot_df], axis=1)
        self.titanic_data.drop(['embarked'], axis=1, inplace=True)

        # Add the one-hot encoded 'embarked' features to the features list
        self.features.extend(cols)
        
        # Drop rows with missing values
        self.titanic_data.dropna(inplace=True)

    # train the titanic model, using logistic regression as key model, and decision tree to show feature importance
    def _train(self):
        # split the data into features and target
        X = self.titanic_data[self.features]
        y = self.titanic_data[self.target]
        
        # perform train-test split
        self.model = LogisticRegression(max_iter=1000)
        
        # train the model
        self.model.fit(X, y)
        
        # train a decision tree classifier
        self.dt = DecisionTreeClassifier()
        self.dt.fit(X, y)
        
    @classmethod
    def get_instance(cls):
        """ Gets, and conditionaly cleans and builds, the singleton instance of the TitanicModel.
        The model is used for analysis on titanic data and predictions on the survival of theoritical passengers.
        
        Returns:
            TitanicModel: the singleton _instance of the TitanicModel, which contains data and methods for prediction.
        """        
        # check for instance, if it doesn't exist, create it
        if cls._instance is None:
            cls._instance = cls()
            cls._instance._clean()
            cls._instance._train()
        # return the instance, to be used for prediction
        return cls._instance

    def predict(self, passenger):
        """ Predict the survival probability of a passenger.

        Args:
            passenger (dict): A dictionary representing a passenger. The dictionary should contain the following keys:
                'pclass': The passenger's class (1, 2, or 3)
                'sex': The passenger's sex ('male' or 'female')
                'age': The passenger's age
                'sibsp': The number of siblings/spouses the passenger has aboard
                'parch': The number of parents/children the passenger has aboard
                'fare': The fare the passenger paid
                'embarked': The port at which the passenger embarked ('C', 'Q', or 'S')
                'alone': Whether the passenger is alone (True or False)

        Returns:
           dictionary : contains die and survive probabilities 
        """
        # clean the passenger data
        passenger_df = pd.DataFrame(passenger, index=[0])
        passenger_df['sex'] = passenger_df['sex'].apply(lambda x: 1 if x == 'male' else 0)
        passenger_df['alone'] = passenger_df['alone'].apply(lambda x: 1 if x == True else 0)
        onehot = self.encoder.transform(passenger_df[['embarked']]).toarray()
        cols = ['embarked_' + str(val) for val in self.encoder.categories_[0]]
        onehot_df = pd.DataFrame(onehot, columns=cols)
        passenger_df = pd.concat([passenger_df, onehot_df], axis=1)
        passenger_df.drop(['embarked', 'name'], axis=1, inplace=True)
        
        # predict the survival probability and extract the probabilities from numpy array
        die, survive = np.squeeze(self.model.predict_proba(passenger_df))
        # return the survival probabilities as a dictionary
        return {'die': die, 'survive': survive}
    
    def feature_weights(self):
        """Get the feature weights
        The weights represent the relative importance of each feature in the prediction model.

        Returns:
            dictionary: contains each feature as a key and its weight of importance as a value
        """
        # extract the feature importances from the decision tree model
        importances = self.dt.feature_importances_
        # return the feature importances as a dictionary, using dictionary comprehension
        return {feature: importance for feature, importance in zip(self.features, importances)} 
    
def initTitanic():
    """ Initialize the Titanic Model.
    This function is used to load the Titanic Model into memory, and prepare it for prediction.
    """
    TitanicModel.get_instance()
    
def testTitanic():
    """ Test the Titanic Model
    Using the TitanicModel class, we can predict the survival probability of a passenger.
    Print output of this test contains method documentation, passenger data, survival probability, and survival weights.
    """
     
    # setup passenger data for prediction
    print(" Step 1:  Define theoritical passenger data for prediction: ")
    passenger = {
        'name': ['John Mortensen'],
        'pclass': [2],
        'sex': ['male'],
        'age': [65],
        'sibsp': [1],
        'parch': [1],
        'fare': [16.00],
        'embarked': ['S'],
        'alone': [False]
    }
    print("\t", passenger)
    print()

    # get an instance of the cleaned and trained Titanic Model
    titanicModel = TitanicModel.get_instance()
    print(" Step 2:", titanicModel.get_instance.__doc__)
   
    # print the survival probability
    print(" Step 3:", titanicModel.predict.__doc__)
    probability = titanicModel.predict(passenger)
    print('\t death probability: {:.2%}'.format(probability.get('die')))  
    print('\t survival probability: {:.2%}'.format(probability.get('survive')))
    print()
    
    # print the feature weights in the prediction model
    print(" Step 4:", titanicModel.feature_weights.__doc__)
    importances = titanicModel.feature_weights()
    for feature, importance in importances.items():
        print("\t\t", feature, f"{importance:.2%}") # importance of each feature, each key/value pair
        
if __name__ == "__main__":
    print(" Begin:", testTitanic.__doc__)
    testTitanic()

 Begin:  Test the Titanic Model
    Using the TitanicModel class, we can predict the survival probability of a passenger.
    Print output of this test contains method documentation, passenger data, survival probability, and survival weights.
    
 Step 1:  Define theoritical passenger data for prediction: 
	 {'name': ['John Mortensen'], 'pclass': [2], 'sex': ['male'], 'age': [65], 'sibsp': [1], 'parch': [1], 'fare': [16.0], 'embarked': ['S'], 'alone': [False]}

 Step 2:  Gets, and conditionaly cleans and builds, the singleton instance of the TitanicModel.
        The model is used for analysis on titanic data and predictions on the survival of theoritical passengers.
        
        Returns:
            TitanicModel: the singleton _instance of the TitanicModel, which contains data and methods for prediction.
        
 Step 3:  Predict the survival probability of a passenger.

        Args:
            passenger (dict): A dictionary representing a passenger. The dictionary should contain the following keys:
                'pclass': The passenger's class (1, 2, or 3)
                'sex': The passenger's sex ('male' or 'female')
                'age': The passenger's age
                'sibsp': The number of siblings/spouses the passenger has aboard
                'parch': The number of parents/children the passenger has aboard
                'fare': The fare the passenger paid
                'embarked': The port at which the passenger embarked ('C', 'Q', or 'S')
                'alone': Whether the passenger is alone (True or False)

        Returns:
           dictionary : contains die and survive probabilities 
        
	 death probability: 93.49%
	 survival probability: 6.51%

 Step 4: Get the feature weights
        The weights represent the relative importance of each feature in the prediction model.

        Returns:
            dictionary: contains each feature as a key and its weight of importance as a value
        
		 pclass 12.17%
		 sex 29.65%
		 age 25.18%
		 sibsp 5.64%
		 parch 1.61%
		 fare 22.16%
		 alone 0.48%
		 embarked_C 0.95%
		 embarked_Q 1.19%
		 embarked_S 0.98%

Everything below is meant for the backend.

THIS IS BACKEND CODE
## Python Titanic Sample API endpoint
from flask import Blueprint, request, jsonify
from flask_restful import Api, Resource # used for REST API building

# Import the TitanicModel class from the model file
# from model.titanic import TitanicModel

titanic_api = Blueprint('titanic_api', __name__,
                   url_prefix='/api/titanic')

api = Api(titanic_api)
class TitanicAPI:
    class _Predict(Resource):
        
        def post(self):
            """ Semantics: In HTTP, POST requests are used to send data to the server for processing.
            Sending passenger data to the server to get a prediction fits the semantics of a POST request.
            
            POST requests send data in the body of the request...
            1. which can handle much larger amounts of data and data types, than URL parameters
            2. using an HTTPS request, the data is encrypted, making it more secure
            3. a JSON formated body is easy to read and write between JavaScript and Python, great for Postman testing
            """     
            # Get the passenger data from the request
            passenger = request.get_json()

            # Get the singleton instance of the TitanicModel
            titanicModel = TitanicModel.get_instance()
            # Predict the survival probability of the passenger
            response = titanicModel.predict(passenger)

            # Return the response as JSON
            return jsonify(response)

    api.add_resource(_Predict, '/predict')

  Cell In[14], line 1
    THIS IS BACKEND CODE
         ^
SyntaxError: invalid syntax

app.register_blueprint(titanic_api) # register api routes

@custom_cli.command('generate_data')
def generate_data():
    initUsers()
    initPlayers()
    initTitanic() # init titanic data