Risk Runs
Overnight.

Until Now.

I built QuantRisk to stop waiting overnight for VaR numbers. GPU-parallelized Monte Carlo runs 1M simulated price paths in under 8 seconds on a laptop GPU. Full portfolio VaR, Greeks, and stress tests on demand, not the next morning.


16× Peak GPU Speedup

1M Paths / Run

~45s Full VaR Run


Why QuantRisk

Your Risk Tools Are Holding You Back

Portfolios containing options, swaps, and structured derivatives require VaR recomputation across millions of stochastic price paths. Most tools make this an overnight batch job. QuantRisk makes it a real-time query.

/01

6 Hours

Risk Runs Overnight

A risk manager submits a VaR batch at market close. The engine churns through the night on CPU clusters. By the time results land, European markets have already opened, and the exposure has changed.

▸It's 11:47 PM. The batch is 34% complete.

/02

0 Intraday Hedges

Blind to Intraday Risk

A volatility spike hits at 1:30 PM. The derivatives desk has no real-time view of their gamma exposure. Without intraday VaR recomputation, hedging decisions are gut calls, not data-driven responses.

▸Vol surface just moved 40 bps. No one has rerun VaR yet.

/03

48-Hour Delay

Capital Frozen

Capital allocation tables depend on stress-test outputs. With CPU engines taking hours per scenario, treasury teams operate on stale numbers, over-reserving capital that could be deployed elsewhere.

▸The stress test queue has 12 scenarios waiting.

“Capital decisions made on yesterday's risk data are nothing more than guesses. Real-time desks need real-time risk intelligence.”

What QuantRisk solves

Performance

Up to 16x Faster Than CPU

Benchmarked on an RTX 3060 Laptop GPU vs. an i9-12900H running NumPy/SciPy. At 500K paths the GPU is 16.3x faster. At 1M paths the workload starts to outrun the memory bus and the speedup settles at 15.7x. Still the difference between waiting and working.

16.3× Peak GPU Speedup (500K paths)

1M Paths Per Run (FP16/FP32 mixed)

7.18s 1M Path Runtime (RTX 3060 Laptop)

Runtime Comparison

Monte Carlo VaR

CPU (CPU Cluster): 112.8s

GPU (RTX 3060): 7.18s

GPU Speedup at 1M paths: 15.7× faster (CPU: 112.8s → GPU: 7.18s)

“At 1M paths, the i9-12900H (20 threads, NumPy vectorized) takes 1 minute 52 seconds. The RTX 3060 Laptop finishes in 7.18 seconds. Same result, 15.7x less waiting.”

All Workloads

Paths   CPU      GPU     Speedup
10K     1.24s    0.22s   5.6×
100K    11.8s    0.91s   13×
500K    57.4s    3.52s   16.3×
1M      112.8s   7.18s   15.7×
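The speedup column is just the CPU runtime divided by the GPU runtime. A minimal sketch of that arithmetic, using the runtimes from the benchmark table (the `benchmarks` dictionary and `speedup` helper are illustrative names, not part of QuantRisk):

```python
# Runtimes in seconds, copied from the benchmark table: (CPU, GPU).
benchmarks = {
    "10K": (1.24, 0.22),
    "100K": (11.8, 0.91),
    "500K": (57.4, 3.52),
    "1M": (112.8, 7.18),
}


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    return cpu_seconds / gpu_seconds


speedups = {
    paths: round(speedup(cpu, gpu), 1) for paths, (cpu, gpu) in benchmarks.items()
}
for paths, factor in speedups.items():
    print(f"{paths:>4} paths: {factor}x")
```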

How It Works

From Market Data to Risk Output

Eight pipeline stages take raw market data and return real-time portfolio risk metrics, all GPU-accelerated.

📊

Market Data

Historical & Synthetic

Equity prices, implied vol surfaces, interest rate curves, and FX rates. Synthetic data generated via GBM or sourced from Parquet stores.

⚡

Path Generator

GBM · Heston · Cholesky

GPU-parallelized stochastic path simulation. Cholesky decomposition enforces asset correlations. Output: [paths × timesteps × assets].
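As a rough illustration of this stage, here is a CPU reference sketch of correlated GBM path generation in NumPy. It is not QuantRisk's GPU kernel: the function name `simulate_gbm_paths` and all parameter values are assumptions made for the example, and the output includes the starting price, so its shape is [paths × (timesteps + 1) × assets].

```python
import numpy as np


def simulate_gbm_paths(s0, mu, sigma, corr, horizon, n_steps, n_paths, seed=0):
    """Simulate correlated GBM paths; returns [n_paths, n_steps + 1, n_assets]."""
    s0, mu, sigma = (np.asarray(x, dtype=np.float64) for x in (s0, mu, sigma))
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    # Lower-triangular factor with corr = chol @ chol.T
    chol = np.linalg.cholesky(np.asarray(corr, dtype=np.float64))
    z = rng.standard_normal((n_paths, n_steps, s0.size))
    z_corr = z @ chol.T  # independent draws -> draws with correlation `corr`
    # Exact log-space GBM update: drift plus diffusion per step
    log_inc = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z_corr
    log_paths = np.cumsum(log_inc, axis=1)
    start = np.zeros((n_paths, 1, s0.size))  # log(S_0 / S_0) = 0 at t = 0
    return s0 * np.exp(np.concatenate([start, log_paths], axis=1))


paths = simulate_gbm_paths(
    s0=[100.0, 50.0], mu=[0.05, 0.05], sigma=[0.2, 0.3],
    corr=[[1.0, 0.6], [0.6, 1.0]], horizon=1.0, n_steps=50, n_paths=20_000,
)
log_returns = np.diff(np.log(paths), axis=1).reshape(-1, 2)
realized_corr = np.corrcoef(log_returns.T)[0, 1]
print(paths.shape, round(realized_corr, 3))
```

The realized correlation of the simulated log returns should sit close to the 0.6 passed in, which is a quick sanity check on the Cholesky step.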

🔢

Monte Carlo Engine

CUDA · PyTorch · FP16/FP32

Batched random number generation, drift+diffusion update, and payoff computation. Async CUDA streams pipeline simulation and pricing.

💹

Derivatives Pricing

5 Instrument Types

GPU kernels for European, Asian, Barrier, Basket, and American options. Black-Scholes analytical baseline for validation.

🗂️

Portfolio Aggregator

PnL · Greeks · Scenarios

Aggregates instrument exposures into portfolio PnL distribution. Computes Delta and Gamma via bump-and-reprice on GPU.
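Bump-and-reprice can be sketched in a few lines: reprice the instrument at a bumped-up and bumped-down spot and take central differences. The example below is a NumPy CPU sketch for a single European call, with hypothetical helper names and parameters. It reuses the same random draws for every reprice, an assumption about how one would keep the estimates stable rather than a statement about QuantRisk's implementation.

```python
import numpy as np


def mc_call_price(s0, k, r, sigma, t, z):
    """Monte Carlo price of a European call from fixed standard normal draws z."""
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    return np.exp(-r * t) * np.maximum(s_t - k, 0.0).mean()


def bump_and_reprice(s0, k, r, sigma, t, z, rel_bump=0.01):
    """Central-difference Delta and Gamma, reusing the same draws for each reprice."""
    h = s0 * rel_bump
    up = mc_call_price(s0 + h, k, r, sigma, t, z)
    mid = mc_call_price(s0, k, r, sigma, t, z)
    down = mc_call_price(s0 - h, k, r, sigma, t, z)
    delta = (up - down) / (2.0 * h)
    gamma = (up - 2.0 * mid + down) / h**2
    return delta, gamma


z = np.random.default_rng(42).standard_normal(400_000)
delta, gamma = bump_and_reprice(100.0, 100.0, 0.05, 0.2, 1.0, z)
print(f"Delta ~ {delta:.3f}, Gamma ~ {gamma:.4f}")
```

For these inputs the Black-Scholes values are roughly 0.637 for Delta and 0.0188 for Gamma, so the estimates can be checked against the analytical baseline.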

📉

VaR Engine

95% · 99% · 99.9%

Historical VaR, Monte Carlo VaR, and Expected Shortfall (CVaR) at three confidence levels. Sub-second recomputation.
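Given a simulated PnL distribution, VaR at a confidence level is the corresponding lower-tail quantile and Expected Shortfall is the mean PnL in that tail. A minimal NumPy sketch, using synthetic normally distributed PnL as a stand-in for real simulation output (`var_es` is a hypothetical helper name):

```python
import numpy as np


def var_es(pnl, confidence):
    """Return (VaR, ES) as PnL levels: the tail quantile and the mean PnL at or below it."""
    pnl = np.asarray(pnl, dtype=np.float64)
    var = np.quantile(pnl, 1.0 - confidence)
    es = pnl[pnl <= var].mean()
    return var, es


# Synthetic PnL for illustration only, not output from a real portfolio.
pnl = np.random.default_rng(7).normal(0.0, 10_000.0, size=1_000_000)

results = {level: var_es(pnl, level) for level in (0.95, 0.99, 0.999)}
for level, (var, es) in results.items():
    print(f"{level:.1%}: VaR = {var:,.0f}  ES = {es:,.0f}")
```

Because ES averages the losses beyond the VaR threshold, it is always at least as severe as VaR at the same confidence level.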

🌊

Stress Testing

Macro Shock Scenarios

Parametric shocks: +300bps rate spike, volatility doubling, equity crash. Outputs drawdown curves and exposure heatmaps.
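A parametric shock amounts to overriding one or more model inputs and repricing. The sketch below applies the three shocks named above, plus their combination, to a single hypothetical long call priced with Black-Scholes. The position, base parameters, and resulting figures are illustrative assumptions and are unrelated to the portfolio drawdowns reported on this page; note that a long call gains from the rate and volatility shocks.

```python
from math import erf, exp, log, sqrt


def bs_call(s, k, r, sigma, t):
    """Black-Scholes price of a European call."""
    d1 = (log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return s * cdf(d1) - k * exp(-r * t) * cdf(d2)


# Hypothetical base market state for one at-the-money call.
base = dict(s=100.0, k=100.0, r=0.03, sigma=0.20, t=1.0)

# Each scenario overrides a subset of the base inputs.
scenarios = {
    "+300bps rate spike": dict(r=base["r"] + 0.03),
    "2x volatility": dict(sigma=base["sigma"] * 2.0),
    "Equity crash -30%": dict(s=base["s"] * 0.70),
}
scenarios["Combined shock"] = {
    key: value for shock in list(scenarios.values()) for key, value in shock.items()
}

base_value = bs_call(**base)
changes = {
    name: bs_call(**{**base, **shock}) / base_value - 1.0
    for name, shock in scenarios.items()
}
for name, change in changes.items():
    print(f"{name:<20} {change:+.1%}")
```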

🖥️

Dashboard & Reports

Streamlit · Plotly · DuckDB

Interactive real-time dashboard for risk managers. Simulation results persisted in DuckDB, visualized with Plotly.

Each stage is independently benchmarkable. NVTX annotations mark every module boundary for Nsight Systems profiling, from raw data ingestion to final VaR output.

Supported Instruments

Five Derivative Instrument Types

Price European, Asian, Barrier, Basket, and American options out of the box. Every payoff kernel runs on GPU across millions of paths simultaneously.

Black-Scholes Analytical · Monte Carlo Simulation · Longstaff-Schwartz LSM

Vanilla

MC + Analytical

European Option

Priced via Black-Scholes closed form and validated against GPU Monte Carlo. Baseline benchmark for all pricing kernels.

Payoff Formula

\max(S_T - K, 0)

/01
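The validation idea for the European option can be sketched as follows: compute the Black-Scholes closed-form price, estimate the same price by Monte Carlo, and check that the two agree within a few standard errors. This is a NumPy CPU sketch with assumed parameters and hypothetical function names, not the QuantRisk kernel.

```python
from math import erf, exp, log, sqrt

import numpy as np


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return s0 * cdf(d1) - k * exp(-r * t) * cdf(d2)


def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo estimate and its standard error for the same call."""
    z = np.random.default_rng(seed).standard_normal(n_paths)
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * sqrt(t) * z)
    discounted = exp(-r * t) * np.maximum(s_t - k, 0.0)  # max(S_T - K, 0), discounted
    return discounted.mean(), discounted.std(ddof=1) / sqrt(n_paths)


analytic = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate, std_err = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=1_000_000)
print(f"analytic = {analytic:.4f}, MC = {estimate:.4f} +/- {std_err:.4f}")
```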

Path-Dependent

MC Only

Asian Option

Payoff depends on arithmetic average price over the path. Requires full path simulation, ideal for GPU parallelism.

Payoff Formula

\max(\bar{A}_T - K, 0)

/02
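Because the Asian payoff depends on the average price along each path, every path has to be simulated in full before the payoff can be evaluated. A NumPy CPU sketch under assumed parameters (single asset, 50 monitoring dates, hypothetical helper names):

```python
import numpy as np


def gbm_paths(s0, r, sigma, t, n_steps, n_paths, seed=0):
    """Single-asset risk-neutral GBM paths, shape [n_paths, n_steps + 1]."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_inc, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)


def asian_call_price(paths, k, r, t):
    """Discounted mean of max(A_T - K, 0), with A_T the arithmetic path average."""
    average = paths[:, 1:].mean(axis=1)  # average over monitoring dates after t = 0
    return np.exp(-r * t) * np.maximum(average - k, 0.0).mean()


paths = gbm_paths(100.0, 0.05, 0.2, 1.0, n_steps=50, n_paths=100_000)
asian = asian_call_price(paths, 100.0, 0.05, 1.0)
european = np.exp(-0.05) * np.maximum(paths[:, -1] - 100.0, 0.0).mean()
print(f"Asian ~ {asian:.2f}, European ~ {european:.2f}")
```

Averaging dampens the volatility of the underlying quantity, which is why the Asian call comes out cheaper than the European call on the same paths.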

Path-Dependent

MC Only

Barrier Option

Up-and-out and down-and-in variants. Barrier monitoring at every timestep across all paths is parallelized on GPU.

Payoff Formula

\max(S_T - K, 0) \cdot \mathbf{1}_{\{\tau > T\}}

/03
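The indicator in the payoff formula corresponds to the knock-out case: the option pays only if the barrier is never hit before expiry. A NumPy CPU sketch of an up-and-out call with discrete monitoring at every timestep (the barrier level, other parameters, and helper names are assumptions for illustration; the down-and-in variant is not shown):

```python
import numpy as np


def gbm_paths(s0, r, sigma, t, n_steps, n_paths, seed=0):
    """Single-asset risk-neutral GBM paths, shape [n_paths, n_steps + 1]."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(log_inc, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)


def up_and_out_call_price(paths, k, barrier, r, t):
    """Call payoff that is cancelled if any monitored price reaches the barrier."""
    alive = paths.max(axis=1) < barrier  # indicator that the barrier was never hit
    payoff = np.maximum(paths[:, -1] - k, 0.0) * alive
    return np.exp(-r * t) * payoff.mean()


paths = gbm_paths(100.0, 0.05, 0.2, 1.0, n_steps=100, n_paths=100_000)
knock_out = up_and_out_call_price(paths, 100.0, 130.0, 0.05, 1.0)
vanilla = up_and_out_call_price(paths, 100.0, np.inf, 0.05, 1.0)  # barrier never hit
print(f"Up-and-out ~ {knock_out:.2f}, vanilla ~ {vanilla:.2f}")
```

The row-wise maximum checks the barrier at every timestep of every path in one vectorized operation, which is the kind of work that maps naturally onto GPU threads.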

Multi-Asset

MC Only

Basket Option

Payoff on a weighted basket of correlated assets. Uses Cholesky-decomposed covariance on GPU for correlated path generation.

Payoff Formula

\max(w \cdot S_T - K, 0)

/04

Early Exercise

MC + LSM

American Option

Longstaff-Schwartz regression on GPU. Backward induction over simulated paths determines optimal exercise boundary.

Payoff Formula

\sup_{\tau \in [0, T]} \mathbb{E}\left[e^{-r\tau} (S_\tau - K)^{+}\right]

/05
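A compact way to see the backward induction is a textbook Longstaff-Schwartz sketch in NumPy. The example prices an American put rather than a call, since early exercise of a call on a non-dividend-paying asset is never optimal and the early-exercise premium would not show up. The quadratic regression basis, the parameters, and the function name are all assumptions for illustration, not QuantRisk's GPU implementation.

```python
import numpy as np


def lsm_american_put(s0, k, r, sigma, t, n_steps, n_paths, seed=0):
    """Longstaff-Schwartz estimate of an American put with a quadratic basis."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    s = s0 * np.exp(np.cumsum(log_inc, axis=1))  # prices at dt, 2*dt, ..., T
    disc = np.exp(-r * dt)
    cash = np.maximum(k - s[:, -1], 0.0)  # exercise value at maturity
    for i in range(n_steps - 2, -1, -1):  # walk backward through exercise dates
        cash *= disc  # discount future cashflows back one step
        exercise = np.maximum(k - s[:, i], 0.0)
        itm = np.flatnonzero(exercise > 0.0)  # regress on in-the-money paths only
        if itm.size < 3:
            continue
        x = s[itm, i] / k  # scaled regressor for better conditioning
        continuation = np.polyval(np.polyfit(x, cash[itm], deg=2), x)
        early = itm[exercise[itm] > continuation]
        cash[early] = exercise[early]  # exercise now replaces later cashflows
    return max(k - s0, disc * cash.mean())


american_put = lsm_american_put(100.0, 100.0, 0.05, 0.2, 1.0, n_steps=50, n_paths=50_000)
print(f"American put ~ {american_put:.2f}")
```

For these inputs the European put is worth about 5.57, so the amount by which the estimate exceeds that figure is the value of early exercise.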

⚡

Why GPU Matters Here

Each option type requires a full pass through all simulated paths. On a GPU, the 1M payoff evaluations run in parallel across thousands of threads instead of one after another on a CPU core.

Throughput delta: ~16×

Technology

Built on Solid Foundations

GPU compute for raw throughput, rigorous quantitative models for accuracy, and a full risk metrics stack. Open-source. Runs locally.

SM Utilization: 88%

Memory Bandwidth: 91%

Warp Occupancy: 74%

FP16 Throughput: 96%

CUDA · PyTorch · CuPy

GPU Compute

RTX 3060+: Compute Capability 8.6

FP16 / FP32: Mixed precision kernels

Async Streams: Pipelined simulation

NVTX Profiling: Nsight Systems traces

Capability coverage: 100%

gpu.py

# GPU path simulation: standard normal shocks, shape [N paths, T timesteps, A assets]
import torch

paths = torch.randn(
    N, T, A, device='cuda',
    dtype=torch.float16
)

GBM · Heston · Cholesky

Quant Models

GBM: Geometric Brownian Motion

Heston: Stochastic volatility

Cholesky: Correlated asset paths

Longstaff-Schwartz: American option pricing

Capability coverage: 100%

quant.py

# Cholesky correlation
import torch

L = torch.linalg.cholesky(Σ)  # Σ: [A, A] covariance matrix, symmetric positive-definite
Z = L @ torch.randn(A, N)     # Z: [A, N] correlated noise

VaR · CVaR · Greeks

Risk Metrics

VaR 99.9%: Monte Carlo & Historical

CVaR / ES: Expected Shortfall

Delta / Gamma: Bump-and-reprice on GPU

Stress Testing: Macro shock scenarios

Capability coverage: 100%

risk.py

# Monte Carlo VaR
import torch

pnl = portfolio_pnl(paths)
var_99 = torch.quantile(
    pnl, 0.01
)  # 1% quantile of the PnL distribution = 99% VaR

🔬

Nsight Systems Profiling

NVTX · SM utilization · kernel traces

Every hot-path function is annotated with NVTX range markers. Nsight Systems captures kernel launch timelines, memory transfer overlaps, and warp stall reasons, enabling data-driven kernel optimization.

📈

Kernel Traces

💾

Memory BW

🔀

Stream Overlap

⚙️

GPU Optimization Targets

Memory · Warp · Precision

Memory Coalescing

Aligned 128-byte global memory access patterns

92%

Async CUDA Streams

Overlap path generation with pricing kernels

88%

FP16/FP32 Mixed

Half precision paths, full precision accumulation

98%

Kernel Fusion

Drift + diffusion + payoff in single pass

85%
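The reason for pairing half-precision paths with full-precision accumulation can be shown with a small NumPy experiment on the CPU: sum the same float16 values once with a float16 accumulator and once with a float32 accumulator, then compare both against a float64 reference. The array contents are arbitrary illustrative values, and the exact float16 error depends on the summation order the library uses.

```python
import numpy as np

# 100,000 half-precision values standing in for per-path payoffs.
payoffs = np.full(100_000, 0.1, dtype=np.float16)

reference = payoffs.astype(np.float64).sum()      # high-precision reference
sum_fp16 = float(payoffs.sum(dtype=np.float16))   # accumulate in half precision
sum_fp32 = float(payoffs.sum(dtype=np.float32))   # accumulate in single precision

err_fp16 = abs(sum_fp16 - reference)
err_fp32 = abs(sum_fp32 - reference)
print(f"reference = {reference:.4f}")
print(f"fp16 accumulation error = {err_fp16:.4f}")
print(f"fp32 accumulation error = {err_fp32:.4f}")
```

Near 10,000 the spacing between adjacent float16 values is 8, so a float16 total cannot resolve the sum more finely than that, while the float32 accumulator keeps the error far smaller.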

Performance

Real Numbers, Real Speed

See exactly what QuantRisk produces: benchmark runtimes, VaR distributions, and stress scenario outputs, ready whenever you need them.

Benchmark Results

RTX 3060 vs i9-12900H

Paths   CPU Runtime   GPU Runtime   Speedup   Verdict
10K     1.24s         0.22s         5.6×      Fast
100K    11.8s         0.91s         13×       Fast
500K    57.4s         3.52s         16.3×     Fast
1M      112.8s        7.18s         15.7×     Fast

Risk Output

Portfolio PnL

PnL Distribution

Monte Carlo · 1M paths

VaR 95% · VaR 99% · VaR 99.9%

VaR 95%: -$29.5K

VaR 99%: -$41.8K

CVaR 99.9%: -$55.2K

Stress Testing Scenarios

Macro shock analysis · Portfolio drawdown

Worst-case simulation

Scenario 01

+300bps Rate Spike

-18.4%

Portfolio Drawdown

Severity: 85%

Scenario 02

2× Volatility

-22.1%

Portfolio Drawdown

Severity: 92%

Scenario 03

Equity Crash −30%

-31.7%

Portfolio Drawdown

Severity: 98%

Scenario 04

Combined Shock

-44.2%

Portfolio Drawdown

Severity: 100%

Ready to Use

Your Risk Dashboard, On Demand

QuantRisk runs entirely on your hardware. No cloud subscription, no data leaving your machine. Trigger a full VaR recomputation at any time and get results in seconds.

Python 3.10 · PyTorch 2.7 · CUDA 13.1 · DuckDB · Streamlit

