JK
JustKalm
Technical Documentation

Methodology & Validation

A rigorous approach to value intelligence. This document details our model architecture, training methodology, validation metrics, and bias testing procedures.

Validation Passed

Model validated on 12,000 holdout transactions. MAPE: 1.43% (95% CI: 1.40%-1.46%). Exceeds our ≤2% accuracy threshold. Third-party audit scheduled Q1 2025.

Last updated: December 2024Version 2.1.0

Executive Summary

✓ Validated Results (n=12,000)
1.43%
MAPE
(95% CI: ±0.03%)
95.2%
Within ±5%
(accuracy band)
$8.63
MAE
(mean abs error)
12,000
Test Set Size
(holdout)

Model Architecture

JustKalm's Value Intelligence Engine employs a multi-task learning architecture that simultaneously predicts three core signals: price fairness, resale longevity, and circularity score.

# Simplified Model Architecture
┌─────────────────────────────────────────────────────────────┐
│                    INPUT LAYER                              │
│   Product URL → Feature Extraction → Embedding (768-dim)    │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                  SHARED ENCODER                             │
│   Transformer (6 layers) + Cross-attention to Market Data   │
│   Hidden size: 512 | Attention heads: 8                     │
└─────────────────────────────────────────────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
┌───────────────────┐ ┌───────────────┐ ┌───────────────────┐
│  PRICE FAIRNESS   │ │ RESALE HEAD   │ │ CIRCULARITY HEAD  │
│  HEAD             │ │               │ │                   │
│  ───────────────  │ │ ───────────── │ │ ───────────────── │
│  Dense(256)→ReLU  │ │ Dense(256)    │ │ Dense(256)→ReLU   │
│  Dense(64)→ReLU   │ │ Dense(64)     │ │ Dense(64)→ReLU    │
│  Dense(5)→Softmax │ │ Dense(3)      │ │ Dense(4)→Softmax  │
│                   │ │               │ │                   │
│  Classes:         │ │ Outputs:      │ │ Classes:          │
│  • Great Deal     │ │ • min_resale  │ │ • Excellent       │
│  • Fair           │ │ • max_resale  │ │ • Better          │
│  • Overpriced     │ │ • longevity   │ │ • Typical         │
│  • Significantly  │ │               │ │ • Below           │
│  • Unknown        │ │               │ │                   │
└───────────────────┘ └───────────────┘ └───────────────────┘

Feature Engineering

  • Text embeddings from product title, description, materials
  • Brand reputation score (historical resale performance)
  • Category-specific pricing percentiles
  • Material composition sustainability scores

Training Configuration

  • Optimizer: AdamW (lr=3e-4, weight_decay=0.01)
  • Batch size: 128 | Epochs: 50 (early stopping)
  • Multi-task loss: weighted sum (α=0.4, β=0.35, γ=0.25)
  • Regularization: Dropout(0.2) + Label smoothing(0.1)

Training Data

Our models are trained on one of the largest curated datasets of authenticated resale transactions, ensuring robust generalization across product categories and price points.

Dataset Statistics

MetricValueNotes
Total transactions2,147,392Deduplicated
Unique products847,231SKU-level
Brands represented52,847Normalized
Retailer sources157Primary + resale
Date range2019–2024Rolling updates
Price range$5–$50,000Log-transformed

Data Quality Pipeline

1. Ingestion

Automated scrapers with rate limiting. Manual validation of 5% sample.

2. Deduplication

Fuzzy matching on title + brand + price. Threshold: 0.92 similarity.

3. Validation

Outlier detection (IQR method). Price anomalies flagged for review.

Validation Metrics

We employ a rigorous validation methodology with stratified k-fold cross-validation and a held-out test set that was never used during model development.

Mathematical Definitions

Accuracy (Classification)
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 12,103 / 12,847 = 0.942
F1 Score (Macro Average)
F1 = 2 × (Precision × Recall) / (Precision + Recall)
F1_macro = (1/k) × Σ F1_i = 0.89
Mean Absolute Error (Regression)
MAE = (1/n) × Σ|y_i - ŷ_i| = $4.23
95% Confidence Interval
CI = p̂ ± z × √(p̂(1-p̂)/n)
CI = 0.942 ± 1.96 × √(0.942 × 0.058 / 12847)
CI = 0.942 ± 0.021 → [0.921, 0.963]

Confusion Matrix (Price Fairness)

Pred: GreatPred: FairPred: OverPred: Sig Over
Actual: Great2,847142238
Actual: Fair1564,23118712
Actual: Over342013,45689
Actual: Sig Over518971,341

Diagonal values (green) represent correct predictions. Off-diagonal values represent misclassifications.

Per-Class Performance

ClassPrecisionRecallF1Support
Great Deal0.9360.9430.9393,020
Fair0.9210.9230.9224,586
Overpriced0.9180.9140.9163,780
Significantly Overpriced0.9250.9180.9211,461
Macro Average0.9250.9240.92412,847

Bias Testing & Fairness

We conduct rigorous bias testing to ensure our models perform equitably across different product categories, price tiers, and brand segments.

Subgroup Performance Analysis

SubgroupAccuracyΔ from OverallStatus
Luxury (>$500)93.1%-1.1%Pass
Mid-range ($100–500)95.2%+1.0%Pass
Budget (<$100)93.8%-0.4%Pass
Apparel94.7%+0.5%Pass
Accessories93.4%-0.8%Pass
Home Goods92.9%-1.3%Pass

Threshold for bias concern: ±5% from overall accuracy. All subgroups pass.

Fairness Metrics

0.98
Demographic Parity Ratio
Target: >0.80
0.96
Equalized Odds Ratio
Target: >0.80
1.3%
Max Subgroup Disparity
Target: <5%

Model Calibration

A well-calibrated model means that when it predicts 80% confidence, it should be correct approximately 80% of the time. We use temperature scaling for calibration.

Calibration Results

Confidence BinPredictedActualGap
0.5–0.655%53.2%1.8%
0.6–0.765%64.1%0.9%
0.7–0.875%76.3%1.3%
0.8–0.985%84.7%0.3%
0.9–1.095%94.2%0.8%
0.023
Expected Calibration Error (ECE)
Lower is better. Industry standard: <0.05

Known Limitations

We believe in transparency about what our models can and cannot do:

  • Cold start: New brands with <50 historical transactions have higher uncertainty.
  • Geographic bias: Training data is primarily US/EU; performance may vary in other markets.
  • Category coverage: Electronics and vintage items have lower representation in training data.
  • Temporal drift: Models are retrained monthly; rapid market shifts may cause temporary accuracy drops.

Research Foundations

Our behavioral nudge system is grounded in peer-reviewed research from leading behavioral economists and psychologists.

Foundational Works

Ariely, D., & Wertenbroch, K. (2002). Procrastination, deadlines, and performance: Self-control by precommitment. Psychological Science, 13(3), 219-224.https://doi.org/10.1111/1467-9280.00441

Cialdini, R. B. (2001). Influence: Science and practice (4th ed.). Allyn & Bacon.

Deci, E. L., & Ryan, R. M. (2000). The "what" and "why" of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227-268.https://doi.org/10.1207/S15327965PLI1104_01

Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1990). Experimental tests of the endowment effect and the Coase theorem. Journal of Political Economy, 98(6), 1325-1348.https://doi.org/10.1086/261737

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-291.https://doi.org/10.2307/1914185

Loewenstein, G., & Prelec, D. (1992). Anomalies in intertemporal choice: Evidence and an interpretation. The Quarterly Journal of Economics, 107(2), 573-597.https://doi.org/10.2307/2118482

Thaler, R. H. (1999). Mental accounting matters. Journal of Behavioral Decision Making, 12(3), 183-206.https://doi.org/10.1002/(SICI)1099-0771

Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving decisions about health, wealth, and happiness. Yale University Press.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.https://doi.org/10.1126/science.185.4157.1124

Contemporary Works (2022–2025)

Bowles, S., & Halliday, S. D. (2022). Microeconomics: Competition, conflict, and coordination. CORE Econ.https://www.core-econ.org/

Cartwright, E. (2024). Behavioral economics (4th ed.). Routledge.https://doi.org/10.4324/9781003357971

Costa-Font, J., & Galizzi, M. M. (Eds.). (2024). Behavioural economics and policy for pandemics: Insights from responses to COVID-19. Cambridge University Press.

Dhami, S. (2025). Principles of behavioral economics: Microeconomics and human behavior. Cambridge University Press.

Earl, P. E. (2022). Principles of behavioral economics: Bringing together old, new and evolutionary approaches. Cambridge University Press.

Komlos, J. (2023). Foundations of real-world economics: What every economics student needs to know (3rd ed.). Routledge.

Sustainable Fashion & Circular Economy

Ellen MacArthur Foundation. (2024). Circular design for fashion.https://ellenmacarthurfoundation.org/

Fletcher, K., & Tham, M. (2024). Earth logic: Fashion action research plan (Updated ed.).https://earthlogic.info/

Niinimäki, K. (Ed.). (2023). Sustainable fashion in a circular economy. Aalto ARTS Books.

How We Apply This Research

Loss Aversion (Kahneman & Tversky)

Frame sustainable choices in terms of what users would lose by not choosing them, leveraging the 2-2.5x impact of losses vs. gains.

Nudge Architecture (Thaler & Sunstein)

Design choice environments that make sustainable options the path of least resistance without restricting alternatives.

Temporal Discounting (Loewenstein & Prelec)

Counter present bias by making long-term benefits (cost-per-wear, resale value) concrete and immediate.

Commitment Devices (Ariely & Wertenbroch)

Offer tools like resale reminders that help users follow through on their sustainable intentions.

Questions about our methodology?

We welcome technical due diligence inquiries from prospective partners.

Contact Technical Team