Improving Banking Security: A Full Technical Report

As cyberattacks on the financial sector escalate, costing the industry $2 billion over the last 20 years, traditional passwords are no longer enough. This project details the development of a secure banking application that implements Two-Factor Authentication (2FA) using Facial Recognition, together with continuous authentication via Keystroke Dynamics (KD) to secure user sessions, even when a user logs back in after an idle timeout.

Below is the complete breakdown of our implementation, code decisions, and experimental results.

1. System Architecture & Libraries

The project was built in Python. We started by importing specific libraries chosen for their robustness in data handling and machine learning.

1.1 The Imports

We used logging for traceability, pandas for data manipulation, and sklearn for our Machine Learning components.

import logging
from utils import log_event
from math import sqrt
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from io import BytesIO
import sqlite3
import requests
import cv2

db_path = "users.db"

Key Technical Decisions:

1.2 Database Design

We designed two primary tables: one for users and one for their biometric keystroke data.

The Users Table: We store the password as a BLOB holding its bcrypt hash (see Section 5), and the face embedding as a second BLOB. The username is unique and mandatory.

cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT UNIQUE NOT NULL,
    password BLOB NOT NULL,
    email TEXT NOT NULL,
    face_embedding BLOB
)
""")

The Keystrokes Table: This table links to the users table via a Foreign Key. It stores the statistical features we extract from a user's typing: the Mean and Standard Deviation of the Hold Time (HT) and of three flight times (PPT, press-to-press; RRT, release-to-release; RPT, release-to-press).

cursor.execute("""
CREATE TABLE IF NOT EXISTS keystrokes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    ht_mean REAL, ht_std_dev REAL,
    ppt_mean REAL, ppt_std_dev REAL,
    rrt_mean REAL, rrt_std_dev REAL,
    rpt_mean REAL, rpt_std_dev REAL,
    FOREIGN KEY(user_id) REFERENCES users(id)
)
""")

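One detail worth checking when relying on the Foreign Key above: SQLite parses FOREIGN KEY clauses but only enforces them on connections where PRAGMA foreign_keys = ON has been issued. The following is a minimal sketch, assuming an in-memory database; the helper names open_db and orphan_row_rejected are hypothetical and not part of the project code.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a connection with foreign-key enforcement and create both tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    conn.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            username TEXT UNIQUE NOT NULL,
            password BLOB NOT NULL,
            email TEXT NOT NULL,
            face_embedding BLOB
        )""")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS keystrokes (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            ht_mean REAL, ht_std_dev REAL,
            ppt_mean REAL, ppt_std_dev REAL,
            rrt_mean REAL, rrt_std_dev REAL,
            rpt_mean REAL, rpt_std_dev REAL,
            FOREIGN KEY(user_id) REFERENCES users(id)
        )""")
    return conn


def orphan_row_rejected(conn, missing_user_id=999):
    """Return True if a keystroke row for a non-existent user is refused."""
    try:
        conn.execute(
            "INSERT INTO keystrokes (user_id, ht_mean) VALUES (?, ?)",
            (missing_user_id, 0.1),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Without the PRAGMA line, the orphan insert would be accepted silently, so keystroke rows could outlive or predate the user they claim to belong to.
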
2. Implementing Facial Recognition

We implemented two core functions for facial biometrics: registration and authentication.

2.1 Registering a Face

The register_face function first asks for explicit permission. If granted, it attempts to capture a clear frame up to 10 times.

import face_recognition
import numpy as np
from tkinter import messagebox

def register_face(username, conn=None):
    # conn is assumed to be an open sqlite3 connection
    response = messagebox.askquestion("Face Authentication", "Do you allow the app to access your camera?")
    if response != 'yes':
        return False

    cursor = conn.cursor()
    cap = cv2.VideoCapture(0)  # default camera
    registered = False
    retries = 0
    max_retries = 10
    while not registered and retries < max_retries:
        ret, frame = cap.read()
        if not ret:
            retries += 1
            continue

        # OpenCV delivers BGR frames; face_recognition expects RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        face_locations = face_recognition.face_locations(rgb_frame, model='cnn')
        face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

        if len(face_encodings) == 1:
            face_embedding = np.array(face_encodings[0], dtype=np.float64).tobytes()
            cursor.execute('UPDATE users SET face_embedding=? WHERE username=?', (sqlite3.Binary(face_embedding), username))
            conn.commit()
            registered = True
        else:
            retries += 1  # no face or several faces: count the attempt

    cap.release()
    return registered

Decision: We convert BGR to RGB because face_recognition requires it. We strictly check len(face_encodings) == 1 to ensure we are only registering a single, clear face.

2.2 Authenticating with Face

During login, authenticate_face compares the live camera feed against the stored embedding. We use Euclidean distance to measure similarity.

if len(face_encodings) == 1:
    user_embedding = cursor.execute('SELECT face_embedding FROM users WHERE username=?', (username,))
    # ... fetch and convert buffer ...
    distance = np.linalg.norm(face_encodings[0] - user_embedding)
    if distance < 0.6:
        return username

Decision: We set the distance threshold at 0.6. If the calculated distance is lower than this, it's a match. If not, authentication fails.
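
The comparison only works if the stored BLOB is decoded with the same element type used at registration (64-bit floats). The sketch below illustrates the round trip and the threshold test using only the standard library, so it runs without numpy; in the project code the equivalent decoding step would be something like np.frombuffer(blob, dtype=np.float64). The function names are hypothetical.

```python
import math
from array import array


def embedding_to_blob(embedding):
    """Pack floats as raw 64-bit doubles, the same layout as a float64 tobytes()."""
    return array("d", embedding).tobytes()


def blob_to_embedding(blob):
    """Unpack a BLOB written by embedding_to_blob back into a list of floats."""
    values = array("d")
    values.frombytes(blob)
    return list(values)


def is_match(live_embedding, stored_blob, threshold=0.6):
    """True when the Euclidean distance to the stored embedding is below threshold."""
    stored = blob_to_embedding(stored_blob)
    return math.dist(live_embedding, stored) < threshold
```

Real embeddings from face_recognition have 128 dimensions; the logic is the same for any length, as long as both vectors have equal size. The 0.6 threshold also matches the default tolerance used by face_recognition's own comparison helper.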

3. Implementing Keystroke Dynamics (KD)

This system captures how a user types. This is critical for our idle timeout security logic.

Figure 3.1: Visualization of Keystroke Metrics (Hold Time vs Flight Time)

3.1 Capturing Data

We use on_key_press and on_key_release events to log timestamps. We explicitly ignore the left-hand modifier keys (Shift_L, Control_L, Alt_L) and the editing keys BackSpace and Delete to avoid noise in the data.

def on_key_press_password(event):
    if event.keysym not in ["Shift_L", "Control_L", "Alt_L", "BackSpace", "Delete"]:
        password_keystrokes["press_times"].append(time.time())

def on_key_release_password(event):
    if event.keysym not in ["Shift_L", "Control_L", "Alt_L", "BackSpace", "Delete"]:
        password_keystrokes["release_times"].append(time.time())
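
The filtering logic can be separated from the GUI so it is testable without a window. This is a rough sketch under assumptions: the class name KeystrokeRecorder is hypothetical, and the right-hand keysyms (Shift_R, Control_R, Alt_R) are an addition that the handlers above do not include.

```python
IGNORED_KEYS = {
    "Shift_L", "Control_L", "Alt_L", "BackSpace", "Delete",  # listed in the handlers
    "Shift_R", "Control_R", "Alt_R",                          # assumed additions
}


class KeystrokeRecorder:
    """Collects press and release timestamps, skipping ignored keys."""

    def __init__(self, ignored=IGNORED_KEYS):
        self.ignored = set(ignored)
        self.press_times = []
        self.release_times = []

    def on_press(self, keysym, timestamp):
        """Record a key-down timestamp unless the key is ignored."""
        if keysym not in self.ignored:
            self.press_times.append(timestamp)

    def on_release(self, keysym, timestamp):
        """Record a key-up timestamp unless the key is ignored."""
        if keysym not in self.ignored:
            self.release_times.append(timestamp)
```

A GUI handler would then reduce to a one-line call such as recorder.on_press(event.keysym, time.time()).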

3.2 Computing Features (The Math)

Raw timestamps aren't enough. We convert them into statistical features using compute_keystroke_features and calculate_mean_and_std.

def calculate_mean_and_std(feature_list):
    mean = sum(feature_list) / len(feature_list)
    squared_diffs = [(x - mean) ** 2 for x in feature_list]
    # Sample variance (n - 1); falls back to dividing by 1 for a single value
    variance = sum(squared_diffs) / (len(feature_list) - 1 if len(feature_list) > 1 else 1)
    std_dev = sqrt(variance)
    return mean, std_dev

The Metrics: We calculate the Mean and Standard Deviation for specific flight times (e.g., Press-to-Press, Release-to-Release). This provides a "fingerprint" of the user's typing rhythm.
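
As an illustration of how the raw timestamps could become the eight stored features, here is a minimal sketch. It assumes the usual definitions (hold time = release minus press of the same key; flight times measured between consecutive keys) and takes two plain lists rather than the password_keystrokes dictionary; the function names are hypothetical and this is not the project's compute_keystroke_features.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation); the deviation is 0.0 for one value."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def keystroke_features(press_times, release_times):
    """Derive HT, PPT, RRT and RPT statistics from paired timestamps."""
    if len(press_times) != len(release_times) or len(press_times) < 2:
        raise ValueError("need at least two complete press/release pairs")
    # Hold time: how long each key stays down
    ht = [r - p for p, r in zip(press_times, release_times)]
    # Flight times between consecutive keys
    ppt = [b - a for a, b in zip(press_times, press_times[1:])]
    rrt = [b - a for a, b in zip(release_times, release_times[1:])]
    rpt = [p - r for r, p in zip(release_times, press_times[1:])]
    features = {}
    for name, values in (("ht", ht), ("ppt", ppt), ("rrt", rrt), ("rpt", rpt)):
        features[f"{name}_mean"], features[f"{name}_std_dev"] = summarize(values)
    return features
```

The returned keys match the column names of the keystrokes table, so the dictionary can be inserted as one row. RPT can be negative when a typist presses the next key before releasing the previous one.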

3.3 Training the ML Model

When a user registers or logs in successfully, we retrain their personal model using train_model.

def train_model(training_data, user_id, conn=None):
    required_features = ['ht_mean', 'ht_std_dev', 'ppt_mean', 'ppt_std_dev', ...]
    X = training_data[required_features]
    y = training_data['user_id']

    rf_model = RandomForestClassifier()
    rf_model.fit(X, y)

    # Serialize the model to store in DB
    model_stream = BytesIO()
    joblib.dump(rf_model, model_stream)
    # getvalue() returns the whole buffer; read() would return b"" here
    # because the stream position sits at the end after dump()
    serialized_model = model_stream.getvalue()
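
A detail that matters when serializing into a BytesIO: after a write, the stream position is at the end, so read() returns an empty byte string unless the stream is rewound, while getvalue() always returns the full buffer. The sketch below demonstrates this with pickle as a standard-library stand-in for joblib; the helper names are hypothetical.

```python
import pickle
from io import BytesIO


def serialize(obj):
    """Serialize obj into bytes suitable for a BLOB column."""
    stream = BytesIO()
    pickle.dump(obj, stream)
    return stream.getvalue()  # whole buffer, independent of stream position


def deserialize(blob):
    """Rebuild the object from bytes read back from the database."""
    return pickle.load(BytesIO(blob))


def read_after_write_is_empty():
    """Show the pitfall: read() starts at the current position, which is the end."""
    stream = BytesIO()
    pickle.dump({"demo": 1}, stream)
    return stream.read() == b""
```

Storing an empty BLOB would only surface later, when loading the model fails at prediction time.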

3.4 Predicting the User

To verify a user, predict_user_model loads the serialized model and calculates the probability that the current typing belongs to the user.

def predict_user_model(new_data, conn=None, threshold=0.7):
    # ... load model ...
    prediction = model.predict(X)
    probabilities = model.predict_proba(X)

    max_prob = max(probabilities[0])
    if max_prob >= threshold:
        return prediction[0]
    else:
        return 0

Decision: We set a strict threshold of 0.7 (70%). If the model is less than 70% confident, we treat it as an anomaly.
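
The decision rule itself can be isolated from the model. This is a small sketch, assuming the class labels are user ids and the probabilities are listed in the same order as the labels (scikit-learn orders predict_proba columns by the model's classes_ attribute); accept_prediction is a hypothetical name.

```python
def accept_prediction(classes, probabilities, threshold=0.7):
    """Return the most probable class label, or 0 if confidence is below threshold."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] >= threshold:
        return classes[best_index]
    return 0  # sentinel: AUTOINCREMENT ids start at 1, so 0 never names a user
```

For example, accept_prediction([3, 7], [0.2, 0.8]) returns 7, while accept_prediction([3, 7], [0.65, 0.35]) returns 0. The threshold is only informative when the training data contains samples from more than one user; a classifier fitted on a single class will typically report full confidence for every input.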

4. Application Logic: Registration & Login

4.1 User Registration (`register_user`)

The registration process is comprehensive. It validates inputs, enforces a minimum password length (at least 8 characters), captures initial keystrokes, hashes the password, encrypts the email, and registers the face.

Figure 4.1: User Registration GUI with Password Strength and Biometric Consent.

def register_user():
    # Input validation
    if len(password) < 8:
        return  # Strength check

    # Capture keystrokes
    features_1 = compute_keystroke_features(password_keystrokes)

    # Encrypt data
    hashed_password = hash_password(password)
    encryption_key = load_aes_key()
    encrypted_email = encrypt_data(email, encryption_key)

    # Insert into DB
    cursor.execute("INSERT INTO users ...", (username, hashed_password, encrypted_email))

    # Register Face
    face_registration_success = register_face(username, conn)
    if not face_registration_success:
        conn.rollback()  # Undo everything if face fails
        return

    # Train initial model
    train_model(training_data, user_id, conn)
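
The rollback works because the user INSERT is still uncommitted when face registration runs. Here is a minimal sketch of that pattern, assuming a users table shaped like the one in Section 1.2; the column list in the INSERT, the placeholder values, and the second_step callback (standing in for face registration) are all hypothetical.

```python
import sqlite3


def register_with_rollback(conn, username, second_step):
    """Insert a user, then undo the insert if second_step() reports failure."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO users (username, password, email) VALUES (?, ?, ?)",
        (username, b"placeholder-hash", "placeholder-email"),
    )
    if not second_step():
        conn.rollback()  # the uncommitted INSERT disappears
        return False
    conn.commit()
    return True
```

After a failed second step the username is free again, so the same person can retry registration without hitting the UNIQUE constraint.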

4.2 User Login (`login_user`)

Login requires passing the 2FA check (Face + Password). It also silently analyzes keystrokes to update the model or flag security issues.

def login_user():
    # 1. Face Authentication
    if not authenticate_face(username):
        return

    # 2. Password Check
    if verify_password(password, hashed_password):

        # 3. Keystroke Analysis
        features_1 = compute_keystroke_features(password_keystrokes)
        matched = predict_user_model(features_1, conn)

        if matched == user_id:
            security_flag = False
        else:
            security_flag = True
            send_security_alert_in_background(user_email)
            # Trigger Physical Matrix request (See logic in flowchart)
        # Trigger Physical Matrix request (See logic in flowchart)

If the keystroke pattern doesn't match, we enable a security_flag. This triggers a background email alert and forces the user to input a code from a Physical Matrix card.
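
The three checks lead to three possible outcomes, which can be written as a pure function. This is a sketch of the flow as described, not the project's code; the LoginOutcome enum and decide_login are hypothetical names.

```python
from enum import Enum


class LoginOutcome(Enum):
    DENIED = "denied"        # face or password check failed
    GRANTED = "granted"      # all three checks passed
    CHALLENGE = "challenge"  # keystrokes did not match: alert + matrix card


def decide_login(face_ok, password_ok, predicted_id, user_id):
    """Map the results of the three checks to a login outcome."""
    if not face_ok or not password_ok:
        return LoginOutcome.DENIED
    if predicted_id == user_id:
        return LoginOutcome.GRANTED
    return LoginOutcome.CHALLENGE
```

A predicted_id of 0 (the below-threshold result from the keystroke model) never equals a real user id, so it always leads to the challenge branch.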

4.3 Login After Idle (`loginAfterIDLE`)

To balance security with usability, if a session times out, we use loginAfterIDLE. This function skips the facial recognition (to be less intrusive) but enforces the keystroke check strictly via a pop-up password request.
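
Detecting the idle condition that triggers this re-login could look like the sketch below. It is an assumption-laden illustration: the IdleTimer class is hypothetical, and the timeout length is left as a parameter because the actual value is not stated here. The clock is injectable so the behaviour can be checked without waiting.

```python
import time


class IdleTimer:
    """Tracks the time since the last user activity."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_activity = clock()

    def touch(self):
        """Call on every key press or mouse event to reset the timer."""
        self.last_activity = self.clock()

    def expired(self):
        """True once the session has been idle for the full timeout."""
        return self.clock() - self.last_activity >= self.timeout_seconds
```

When expired() becomes true, the application would show the pop-up password request and run the keystroke check before restoring the session.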

5. Security Modules

We implemented a dedicated security.py file to handle encryption standards.

# Generate 256-bit AES Key
def generate_aes_key():
    return os.urandom(32)

# Encrypt with AES-CFB
def encrypt_data(plaintext, key):
    iv = os.urandom(16)
    cipher = Cipher(algorithms.AES(key), modes.CFB(iv), backend=default_backend())
    encryptor = cipher.encryptor()
    ciphertext = encryptor.update(plaintext.encode()) + encryptor.finalize()  # plaintext assumed to be str
    return base64.b64encode(iv + ciphertext).decode()

# Hash Password with Bcrypt
def hash_password(password):
    salt = bcrypt.gensalt()
    return bcrypt.hashpw(password.encode(), salt)

Decision: We used AES-256 in CFB mode for data encryption and Bcrypt for password hashing. The AES key is fetched securely via SSH from a Key Management Service (KMS) rather than being hardcoded.
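
Because encrypt_data prepends the random IV to the ciphertext before Base64-encoding, any decryption routine first has to split the two apart again. The sketch below shows only that framing step with the standard library; pack_token and unpack_token are hypothetical helpers, and the actual decryption call is left out.

```python
import base64

IV_SIZE = 16  # AES block size in bytes


def pack_token(iv, ciphertext):
    """Join IV and ciphertext in the same layout encrypt_data returns."""
    return base64.b64encode(iv + ciphertext).decode()


def unpack_token(token):
    """Split a stored token back into (iv, ciphertext) before decryption."""
    raw = base64.b64decode(token)
    if len(raw) < IV_SIZE:
        raise ValueError("token too short to contain an IV")
    return raw[:IV_SIZE], raw[IV_SIZE:]
```

A matching decrypt function would pass the recovered IV to modes.CFB(iv) and run the ciphertext through the cipher's decryptor, mirroring the encryption steps.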

6. Experimental Phases & Results

We conducted a study with 3 users to test efficacy, efficiency, security, and privacy. The users had different password complexities:

Phase 1: Facial Recognition Tests

We tested under Normal, Weak, and Mixed lighting.

Result: The CNN model is robust but sensitive to lighting quality.

Phase 2: Keystroke Pattern Tests

We measured the probability of successful authentication over 10 attempts per user.

Figure 6.1: Learning Curve: Prediction Probability vs Number of Attempts.

Result: The ML model learns complex patterns effectively over time.

Phase 3: Cross-Keystroke (Impostor) Tests

We asked users to type each other's passwords to test for false positives.

Critical Finding: Password complexity is directly linked to the security of Keystroke Dynamics. Simple passwords lead to generic typing rhythms that are easier to spoof.

Phase 4: Cross-Facial Tests

We attempted to unlock User A's account with User B's face.

Result: 100% failure rate for impostors. The system perfectly distinguished between users.

7. Efficiency and Privacy Decisions

8. Future Work

While successful, we identified areas for improvement: