Banking Security: Facial Recognition & Keystroke Dynamics

Improving Banking Security: A Full Technical Report

As cyberattacks on the financial sector escalate, costing the industry $2 billion over the last 20 years, traditional passwords are no longer enough. This project details the development of a secure banking application that combines Two-Factor Authentication (2FA) based on Facial Recognition with continuous authentication via Keystroke Dynamics (KD), so that user sessions stay protected even after an idle timeout.

Below is the complete breakdown of our implementation, code decisions, and experimental results.

1. System Architecture & Libraries

The project was built in Python. We started by importing specific libraries chosen for their robustness in data handling and machine learning.

1.1 The Imports

We used logging for traceability, pandas for data manipulation, and sklearn for our Machine Learning components.

import logging
import sqlite3
from io import BytesIO
from math import sqrt

import cv2
import face_recognition  # used in Section 2; missing from the original import list
import joblib
import numpy as np  # used in Section 2; missing from the original import list
import pandas as pd
import requests
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

from utils import log_event

db_path = "users.db"

Key Technical Decisions:

joblib & BytesIO: We needed to serialize our trained Machine Learning models to store them directly in the database as binary objects (BLOBs).
RandomForestClassifier: We chose this meta-estimator because it combines multiple decision trees. It is highly effective for classification tasks like identifying users based on typing rhythm.
sqlite3: Chosen for its simplicity and local storage capabilities, perfect for this scale of prototype.
face_recognition (with cv2): We used cv2 for video capture and face_recognition (based on dlib) for the heavy lifting. Crucially, we specified the CNN model over HOG for better accuracy, even though it requires more processing power.

1.2 Database Design

We designed two primary tables: one for users and one for their biometric keystroke data.

The Users Table: We store the password as a BLOB containing its bcrypt hash (hashed, not encrypted) and the face embedding as a BLOB. The username is unique and mandatory.

cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT UNIQUE NOT NULL,
    password BLOB NOT NULL,
    email TEXT NOT NULL,
    face_embedding BLOB
)
""")

The Keystrokes Table: This table links to the users table via a Foreign Key. It stores the statistical features we extract from a user's typing: the Mean and Standard Deviation of Hold Time (HT) and of three flight times, Press-to-Press (PPT), Release-to-Release (RRT), and Release-to-Press (RPT).

cursor.execute("""
CREATE TABLE IF NOT EXISTS keystrokes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    ht_mean REAL, ht_std_dev REAL,
    ppt_mean REAL, ppt_std_dev REAL,
    rrt_mean REAL, rrt_std_dev REAL,
    rpt_mean REAL, rpt_std_dev REAL,
    FOREIGN KEY(user_id) REFERENCES users(id)
)
""")

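One practical point about the foreign key: SQLite does not enforce foreign keys by default, so the link between the two tables is only checked if `PRAGMA foreign_keys = ON` is issued on the connection. The sketch below is a minimal illustration, assuming an in-memory database and trimmed-down versions of both tables; the helper `insert_orphan` and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

# Trimmed-down versions of the two tables (assumption: only a few columns kept)
conn.execute("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT UNIQUE NOT NULL
)
""")
conn.execute("""
CREATE TABLE keystrokes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    ht_mean REAL, ht_std_dev REAL,
    FOREIGN KEY(user_id) REFERENCES users(id)
)
""")

# A valid user (id 1) and one keystroke row that references it
conn.execute("INSERT INTO users (username) VALUES (?)", ("1001",))
conn.execute(
    "INSERT INTO keystrokes (user_id, ht_mean, ht_std_dev) VALUES (?, ?, ?)",
    (1, 0.11, 0.02),
)

def insert_orphan(connection):
    """Try to insert a keystroke row for a user id that does not exist."""
    try:
        connection.execute(
            "INSERT INTO keystrokes (user_id, ht_mean, ht_std_dev) VALUES (?, ?, ?)",
            (999, 0.10, 0.03),
        )
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = insert_orphan(conn)
```

With the pragma enabled, the orphan row is rejected with an `IntegrityError`; without it, the same insert would be accepted silently.
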
2. Implementing Facial Recognition

We implemented two core functions for facial biometrics: registration and authentication.

2.1 Registering a Face

The register_face function first asks for explicit permission. If granted, it attempts to capture a clear frame up to 10 times.

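def register_face(username, conn=None):
    # Ask for explicit consent before touching the camera
    response = messagebox.askquestion("Face Authentication", "Do you allow the app to access your camera?")
    if response != 'yes':
        return False

    cursor = conn.cursor()     # assumes the caller passes an open connection
    cap = cv2.VideoCapture(0)  # assumes the default camera
    registered = False
    retries = 0
    max_retries = 10
    while not registered and retries < max_retries:
        retries += 1  # count every attempt so the loop always terminates
        ret, frame = cap.read()
        if not ret:
            continue

        # OpenCV delivers BGR frames; face_recognition expects RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        face_locations = face_recognition.face_locations(rgb_frame, model='cnn')
        face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

        # Register only when exactly one face is visible
        if len(face_encodings) == 1:
            face_embedding = np.array(face_encodings[0], dtype=np.float64).tobytes()
            cursor.execute('UPDATE users SET face_embedding=? WHERE username=?', (sqlite3.Binary(face_embedding), username))
            conn.commit()
            registered = True

    cap.release()
    return registered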

Decision: We convert BGR to RGB because face_recognition requires it. We strictly check len(face_encodings) == 1 to ensure we are only registering a single, clear face.

2.2 Authenticating with Face

During login, authenticate_face compares the live camera feed against the stored embedding. We use Euclidean distance to measure similarity.

if len(face_encodings) == 1:
    user_embedding = cursor.execute('SELECT face_embedding FROM users WHERE username=?', (username,))
    # ... fetch and convert buffer ...
    distance = np.linalg.norm(face_encodings[0] - user_embedding)
    if distance < 0.6:
        return username

Decision: We set the distance threshold at 0.6, which is also the default tolerance of the face_recognition library. If the calculated distance is lower than this, it's a match. If not, authentication fails.
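
The store-then-compare cycle can be sketched without a camera. The example below is a minimal illustration, not the project's code: it swaps NumPy for the standard library (`array` for the float64 byte layout, `math.dist` for the Euclidean distance) so it runs without dependencies, and it uses toy 4-dimensional vectors where real face embeddings have 128 dimensions. The helper names are hypothetical. In NumPy terms the same steps would be `np.frombuffer(blob, dtype=np.float64)` followed by `np.linalg.norm`.

```python
import math
from array import array

THRESHOLD = 0.6  # distance threshold quoted in the text

def embedding_to_blob(embedding):
    """Pack floats as raw native float64 bytes, suitable for a BLOB column."""
    return array("d", embedding).tobytes()

def blob_to_embedding(blob):
    """Unpack raw float64 bytes back into a list of floats."""
    values = array("d")
    values.frombytes(blob)
    return list(values)

def is_match(live_embedding, stored_blob, threshold=THRESHOLD):
    """Return True when the Euclidean distance is below the threshold."""
    stored = blob_to_embedding(stored_blob)
    return math.dist(live_embedding, stored) < threshold

# Toy vectors (assumption: made-up values, 4 dimensions instead of 128)
stored_blob = embedding_to_blob([0.10, 0.20, 0.30, 0.40])
same_person = is_match([0.12, 0.18, 0.31, 0.41], stored_blob)
other_person = is_match([0.90, -0.50, 0.70, -0.20], stored_blob)
```

The first probe lies close to the stored vector and is accepted; the second is far away and is rejected.
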

3. Implementing Keystroke Dynamics (KD)

This system captures how a user types. This is critical for our idle timeout security logic.

Figure 3.1: Visualization of Keystroke Metrics (Hold Time vs Flight Time)

3.1 Capturing Data

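We use on_key_press and on_key_release events to log timestamps. We explicitly ignore modifier keys (Shift, Ctrl, Alt) and editing keys (BackSpace, Delete) to avoid noise in the data. The filter lists only the left-hand modifier variants (Shift_L, Control_L, Alt_L).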

def on_key_press_password(event):
    if event.keysym not in ["Shift_L", "Control_L", "Alt_L", "BackSpace", "Delete"]:
        password_keystrokes["press_times"].append(time.time())

def on_key_release_password(event):
    if event.keysym not in ["Shift_L", "Control_L", "Alt_L", "BackSpace", "Delete"]:
        password_keystrokes["release_times"].append(time.time())

3.2 Computing Features (The Math)

Raw timestamps aren't enough. We convert them into statistical features using compute_keystroke_features and calculate_mean_and_std.

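def calculate_mean_and_std(feature_list):
    # Assumes feature_list is non-empty; an empty list raises ZeroDivisionError
    mean = sum(feature_list) / len(feature_list)
    squared_diffs = [(x - mean) ** 2 for x in feature_list]
    # Sample variance (n - 1 denominator); a single value yields a variance of 0
    variance = sum(squared_diffs) / (len(feature_list) - 1 if len(feature_list) > 1 else 1)
    std_dev = sqrt(variance)
    return mean, std_dev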

The Metrics: We calculate the Mean and Standard Deviation for Hold Time and for each flight time (e.g., Press-to-Press, Release-to-Release). Together these values provide a "fingerprint" of the user's typing rhythm.
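
To make the metrics concrete, here is a minimal sketch of how the eight features could be derived from the logged timestamps. It is not the project's compute_keystroke_features; the function `sketch_keystroke_features` is hypothetical, and the interval definitions are assumptions: hold time is release minus press of the same key, and each flight time is measured between consecutive keys (RPT taken as release of one key to press of the next).

```python
from math import sqrt

def mean_and_std(values):
    """Mean and sample standard deviation (n - 1 denominator); zeros if empty."""
    if not values:
        return 0.0, 0.0
    mean = sum(values) / len(values)
    denom = len(values) - 1 if len(values) > 1 else 1
    variance = sum((v - mean) ** 2 for v in values) / denom
    return mean, sqrt(variance)

def sketch_keystroke_features(press_times, release_times):
    """Derive HT/PPT/RRT/RPT statistics from paired press and release timestamps."""
    n = min(len(press_times), len(release_times))
    press, release = press_times[:n], release_times[:n]
    intervals = {
        "ht": [release[i] - press[i] for i in range(n)],             # hold time
        "ppt": [press[i + 1] - press[i] for i in range(n - 1)],      # press-to-press
        "rrt": [release[i + 1] - release[i] for i in range(n - 1)],  # release-to-release
        "rpt": [press[i + 1] - release[i] for i in range(n - 1)],    # release-to-press
    }
    features = {}
    for name, values in intervals.items():
        features[f"{name}_mean"], features[f"{name}_std_dev"] = mean_and_std(values)
    return features

# Four keys typed at a perfectly even rhythm (made-up timestamps, in seconds)
features = sketch_keystroke_features(
    press_times=[0.00, 0.25, 0.50, 0.75],
    release_times=[0.125, 0.375, 0.625, 0.875],
)
```

The resulting dictionary uses the same column names as the keystrokes table, so one such dictionary corresponds to one row of training data.
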

3.3 Training the ML Model

When a user registers or logs in successfully, we retrain their personal model using train_model.

def train_model(training_data, user_id, conn=None):
    required_features = ['ht_mean', 'ht_std_dev', 'ppt_mean', 'ppt_std_dev', ...]
    X = training_data[required_features]
    y = training_data['user_id']

    rf_model = RandomForestClassifier()
    rf_model.fit(X, y)

    # Serialize the model to store in DB
    model_stream = BytesIO()
    joblib.dump(rf_model, model_stream)
    # getvalue() returns the whole buffer; read() would return b"" here
    # because the stream position sits at the end after the dump
    serialized_model = model_stream.getvalue()
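
One detail matters when serializing to an in-memory stream: after a write, the stream position sits at the end of the buffer, so the bytes have to be taken with getvalue() or after a seek(0). The sketch below demonstrates this with the standard library's pickle standing in for joblib and a plain dictionary standing in for the trained model (both are substitutions made so the example runs without dependencies); the BytesIO behaviour is the same whichever library writes to the stream.

```python
import pickle
from io import BytesIO

# Placeholder for a trained model (assumption: any picklable object will do)
model_stand_in = {"n_estimators": 100, "classes": [1, 2]}

stream = BytesIO()
pickle.dump(model_stand_in, stream)

bytes_from_read = stream.read()        # position is at the end, so nothing is returned
bytes_from_value = stream.getvalue()   # whole buffer, regardless of position

stream.seek(0)
bytes_after_seek = stream.read()       # same bytes once the position is rewound

# Loading from the stored bytes restores the original object
restored = pickle.load(BytesIO(bytes_from_value))
```

Storing `bytes_from_read` in the database would save an empty BLOB, which only surfaces later when the model fails to load.
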

3.4 Predicting the User

To verify a user, predict_user_model loads the serialized model and calculates the probability that the current typing belongs to the user.

def predict_user_model(new_data, conn=None, threshold=0.7):
    # ... load model ...
    prediction = model.predict(X)
    probabilities = model.predict_proba(X)

    max_prob = max(probabilities[0])
    if max_prob >= threshold:
        return prediction[0]
    else:
        return 0

Decision: We set a strict threshold of 0.7 (70%). If the model is less than 70% confident, we treat it as an anomaly and return 0, which matches no user because AUTOINCREMENT ids start at 1.
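
The thresholding step can be isolated from the model itself. The following sketch assumes a list of class labels and a matching list of probabilities, in the way scikit-learn orders the columns of predict_proba by `model.classes_`; the function `decide_user` and the sample numbers are hypothetical.

```python
NO_MATCH = 0  # sentinel used in the text when confidence is too low

def decide_user(classes, probabilities, threshold=0.7):
    """Return the most probable label if its probability clears the threshold."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_index] >= threshold:
        return classes[best_index]
    return NO_MATCH

# Made-up probability vectors for three enrolled users
confident = decide_user(classes=[1, 2, 3], probabilities=[0.82, 0.10, 0.08])
uncertain = decide_user(classes=[1, 2, 3], probabilities=[0.55, 0.30, 0.15])
```

In the second call the top class is still user 1, but 0.55 is below the threshold, so the attempt is reported as no match rather than as a weak match.
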

4. Application Logic: Registration & Login

4.1 User Registration (`register_user`)

The registration process is comprehensive. It validates inputs, enforces a minimum password length (at least 8 characters), captures initial keystrokes, hashes the password, encrypts the email, and registers the face.

Figure 4.1: User Registration GUI with Password Strength and Biometric Consent.

def register_user():
    # Input validation
    if len(password) < 8:
        return  # Strength check

    # Capture keystrokes
    features_1 = compute_keystroke_features(password_keystrokes)

    # Encrypt data
    hashed_password = hash_password(password)
    encryption_key = load_aes_key()
    encrypted_email = encrypt_data(email, encryption_key)

    # Insert into DB
    cursor.execute("INSERT INTO users ...", (username, hashed_password, encrypted_email))

    # Register Face
    face_registration_success = register_face(username, conn)
    if not face_registration_success:
        conn.rollback()  # Undo everything if face fails
        return

    # Train initial model
    train_model(training_data, user_id, conn)
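
The rollback relies on the user row not being committed until the face step has succeeded. Here is a minimal sketch of that pattern, assuming an in-memory database, a reduced users table, and a stand-in callable `face_step` in place of the camera-based registration; `register_with_rollback` is a hypothetical name.

```python
import sqlite3

def register_with_rollback(conn, username, password_hash, face_step):
    """Insert a user, then undo the insert if the follow-up step fails."""
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO users (username, password) VALUES (?, ?)",
        (username, password_hash),
    )
    if not face_step(username):
        conn.rollback()  # discard the pending INSERT
        return False
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "username TEXT UNIQUE NOT NULL, "
    "password BLOB NOT NULL)"
)

# First registration fails at the face step, second one succeeds
failed = register_with_rollback(conn, "1001", b"hash", face_step=lambda name: False)
succeeded = register_with_rollback(conn, "1002", b"hash", face_step=lambda name: True)
usernames = [row[0] for row in conn.execute("SELECT username FROM users")]
```

Only the second username remains in the table, which is the behaviour the rollback is meant to guarantee: no account exists without a registered face.
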

4.2 User Login (`login_user`)

Login requires passing the 2FA check (Face + Password). It also silently analyzes keystrokes to update the model or flag security issues.

def login_user():
    # 1. Face Authentication
    if not authenticate_face(username):
        return

    # 2. Password Check
    if verify_password(password, hashed_password):

        # 3. Keystroke Analysis
        features_1 = compute_keystroke_features(password_keystrokes)
        matched = predict_user_model(features_1, conn)

        if matched == user_id:
            security_flag = False
        else:
            security_flag = True
            send_security_alert_in_background(user_email)
            # Trigger Physical Matrix request (See logic in flowchart)

If the keystroke pattern doesn't match, we enable a security_flag. This triggers a background email alert and forces the user to input a code from a Physical Matrix card.
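
The decision flow can be summarised as a small pure function. This is a sketch only: the function name and the outcome labels are hypothetical, and the three booleans stand in for the results of authenticate_face, verify_password, and the keystroke comparison.

```python
def login_decision(face_ok, password_ok, keystroke_match):
    """Map the three checks to an outcome label (labels are illustrative)."""
    if not face_ok:
        return "denied"                 # face check fails first, nothing else runs
    if not password_ok:
        return "denied"
    if keystroke_match:
        return "granted"
    return "challenge_matrix_card"      # security flag raised, extra code required

outcomes = {
    "all_pass": login_decision(True, True, True),
    "rhythm_mismatch": login_decision(True, True, False),
    "wrong_face": login_decision(False, True, True),
}
```

A keystroke mismatch does not reject the login outright; it escalates to the additional Physical Matrix challenge.
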

4.3 Login After Idle (`loginAfterIDLE`)

To balance security with usability, if a session times out, we use loginAfterIDLE. This function skips the facial recognition (to be less intrusive) but enforces the keystroke check strictly via a pop-up password request.

5. Security Modules

We implemented a dedicated security.py file to handle encryption standards.

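import base64
import os

import bcrypt
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Generate 256-bit AES Key
def generate_aes_key():
    return os.urandom(32)

# Encrypt with AES-CFB
def encrypt_data(plaintext, key):
    iv = os.urandom(16)  # fresh random IV for every message
    cipher = Cipher(algorithms.AES(key), modes.CFB(iv), backend=default_backend())
    encryptor = cipher.encryptor()
    ciphertext = encryptor.update(plaintext.encode()) + encryptor.finalize()
    # Prepend the IV so the decryption side can recover it
    return base64.b64encode(iv + ciphertext).decode()

# Hash Password with Bcrypt
def hash_password(password):
    salt = bcrypt.gensalt()
    return bcrypt.hashpw(password.encode(), salt)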

Decision: We used AES-256 in CFB mode for data encryption and Bcrypt for password hashing. The AES key is fetched securely via SSH from a Key Management Service (KMS) rather than being hardcoded.

6. Experimental Phases & Results

We conducted a study with 3 users to test efficacy, efficiency, security, and privacy. The users had different password complexities:

User 1: Simple (12345678)
User 2: Medium (qwerty@2023)
User 3: Complex (Xx78yP!z9$5kLm)

Phase 1: Facial Recognition Tests

We tested under Normal, Weak, and Mixed lighting.

Normal Light: 100% success rate. Fast authentication (< 1s).
Weak Light: Slower authentication (> 1.5s) but still accurate.
Mixed Light: Most inconsistent performance (> 2s), but still successful.

Result: The CNN model stayed accurate in all three lighting conditions, but authentication time increased as lighting quality degraded.

Phase 2: Keystroke Pattern Tests

We recorded the model's prediction probability for the genuine user over 10 login attempts per user.

User 1 (Simple): Very high success (Prob > 0.87). Simple patterns are easy for the ML to learn.
User 3 (Complex): Initial struggle (Prob < 0.8) but achieved 1.0 (perfect match) by the final attempt.

Figure 6.1: Learning Curve: Prediction Probability vs Number of Attempts.

Result: The ML model learns complex patterns effectively over time.

Phase 3: Cross-Keystroke (Impostor) Tests

We asked users to type each other's passwords to test for false positives.

User 1 → User 2: Failed attempts (Good).
User 2 → User 1: False Positives detected. Because User 1's password was so simple, User 2 was able to mimic the rhythm successfully in 2 out of 3 attempts.
User 1 → User 3: 100% Rejection. The complex password rhythm could not be spoofed.

Critical Finding: In our tests, password complexity was directly linked to the security of Keystroke Dynamics. Simple passwords lead to generic typing rhythms that are easier to spoof.

Phase 4: Cross-Facial Tests

We attempted to unlock User A's account with User B's face.

Result: 100% failure rate for impostors. The system perfectly distinguished between users.

7. Efficiency and Privacy Decisions

Efficiency: We chose not to encrypt the username. It is restricted to numbers only (pseudonymization), so on its own it does not reveal the user's identity. Leaving it in plaintext allows much faster database lookups than decrypting every username on login.
Algorithm Choice: We compared XGBoost vs. RandomForest. RandomForest was chosen because it was faster to train (2 seconds less on average) and less prone to overfitting on our small dataset.
Privacy: We implemented a forced "Terms and Conditions" popup to collect consent for biometric data processing, complying with GDPR requirements.

8. Future Work

While successful, we identified areas for improvement:

Implement facial recognition checks during idle timeouts (currently only keystroke is checked).
Store raw facial images to allow for re-encoding if the face algorithm changes.
Conduct "Replay Attack" studies to see if recorded keystroke timings can spoof the system.