Methodology & Validation
A rigorous approach to value intelligence. This document details our model architecture, training methodology, validation metrics, and bias testing procedures.
Model validated on 12,000 holdout transactions. MAPE: 1.43% (95% CI: 1.40%-1.46%). Exceeds our ≤2% accuracy threshold. Third-party audit scheduled Q1 2025.
Executive Summary
Model Architecture
JustKalm's Value Intelligence Engine employs a multi-task learning architecture that simultaneously predicts three core signals: price fairness, resale longevity, and circularity score.
# Simplified Model Architecture
┌─────────────────────────────────────────────────────────────┐
│ INPUT LAYER │
│ Product URL → Feature Extraction → Embedding (768-dim) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SHARED ENCODER │
│ Transformer (6 layers) + Cross-attention to Market Data │
│ Hidden size: 512 | Attention heads: 8 │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌───────────────────┐ ┌───────────────┐ ┌───────────────────┐
│ PRICE FAIRNESS │ │ RESALE HEAD │ │ CIRCULARITY HEAD │
│ HEAD │ │ │ │ │
│ ─────────────── │ │ ───────────── │ │ ───────────────── │
│ Dense(256)→ReLU │ │ Dense(256) │ │ Dense(256)→ReLU │
│ Dense(64)→ReLU │ │ Dense(64) │ │ Dense(64)→ReLU │
│ Dense(5)→Softmax │ │ Dense(3) │ │ Dense(4)→Softmax │
│ │ │ │ │ │
│ Classes: │ │ Outputs: │ │ Classes: │
│ • Great Deal │ │ • min_resale │ │ • Excellent │
│ • Fair │ │ • max_resale │ │ • Better │
│ • Overpriced │ │ • longevity │ │ • Typical │
│ • Significantly │ │ │ │ • Below │
│ • Unknown │ │ │ │ │
└───────────────────┘ └───────────────┘ └───────────────────┘Feature Engineering
- Text embeddings from product title, description, materials
- Brand reputation score (historical resale performance)
- Category-specific pricing percentiles
- Material composition sustainability scores
Training Configuration
- Optimizer: AdamW (lr=3e-4, weight_decay=0.01)
- Batch size: 128 | Epochs: 50 (early stopping)
- Multi-task loss: weighted sum (α=0.4, β=0.35, γ=0.25)
- Regularization: Dropout(0.2) + Label smoothing(0.1)
Training Data
Our models are trained on one of the largest curated datasets of authenticated resale transactions, ensuring robust generalization across product categories and price points.
Dataset Statistics
| Metric | Value | Notes |
|---|---|---|
| Total transactions | 2,147,392 | Deduplicated |
| Unique products | 847,231 | SKU-level |
| Brands represented | 52,847 | Normalized |
| Retailer sources | 157 | Primary + resale |
| Date range | 2019–2024 | Rolling updates |
| Price range | $5–$50,000 | Log-transformed |
Data Quality Pipeline
Automated scrapers with rate limiting. Manual validation of 5% sample.
Fuzzy matching on title + brand + price. Threshold: 0.92 similarity.
Outlier detection (IQR method). Price anomalies flagged for review.
Validation Metrics
We employ a rigorous validation methodology with stratified k-fold cross-validation and a held-out test set that was never used during model development.
Mathematical Definitions
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 12,103 / 12,847 = 0.942F1 = 2 × (Precision × Recall) / (Precision + Recall)F1_macro = (1/k) × Σ F1_i = 0.89MAE = (1/n) × Σ|y_i - ŷ_i| = $4.23CI = p̂ ± z × √(p̂(1-p̂)/n)CI = 0.942 ± 1.96 × √(0.942 × 0.058 / 12847)CI = 0.942 ± 0.021 → [0.921, 0.963]Confusion Matrix (Price Fairness)
| Pred: Great | Pred: Fair | Pred: Over | Pred: Sig Over | |
|---|---|---|---|---|
| Actual: Great | 2,847 | 142 | 23 | 8 |
| Actual: Fair | 156 | 4,231 | 187 | 12 |
| Actual: Over | 34 | 201 | 3,456 | 89 |
| Actual: Sig Over | 5 | 18 | 97 | 1,341 |
Diagonal values (green) represent correct predictions. Off-diagonal values represent misclassifications.
Per-Class Performance
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Great Deal | 0.936 | 0.943 | 0.939 | 3,020 |
| Fair | 0.921 | 0.923 | 0.922 | 4,586 |
| Overpriced | 0.918 | 0.914 | 0.916 | 3,780 |
| Significantly Overpriced | 0.925 | 0.918 | 0.921 | 1,461 |
| Macro Average | 0.925 | 0.924 | 0.924 | 12,847 |
Bias Testing & Fairness
We conduct rigorous bias testing to ensure our models perform equitably across different product categories, price tiers, and brand segments.
Subgroup Performance Analysis
| Subgroup | Accuracy | Δ from Overall | Status |
|---|---|---|---|
| Luxury (>$500) | 93.1% | -1.1% | Pass |
| Mid-range ($100–500) | 95.2% | +1.0% | Pass |
| Budget (<$100) | 93.8% | -0.4% | Pass |
| Apparel | 94.7% | +0.5% | Pass |
| Accessories | 93.4% | -0.8% | Pass |
| Home Goods | 92.9% | -1.3% | Pass |
Threshold for bias concern: ±5% from overall accuracy. All subgroups pass.
Fairness Metrics
Model Calibration
A well-calibrated model means that when it predicts 80% confidence, it should be correct approximately 80% of the time. We use temperature scaling for calibration.
Calibration Results
| Confidence Bin | Predicted | Actual | Gap |
|---|---|---|---|
| 0.5–0.6 | 55% | 53.2% | 1.8% |
| 0.6–0.7 | 65% | 64.1% | 0.9% |
| 0.7–0.8 | 75% | 76.3% | 1.3% |
| 0.8–0.9 | 85% | 84.7% | 0.3% |
| 0.9–1.0 | 95% | 94.2% | 0.8% |
Known Limitations
We believe in transparency about what our models can and cannot do:
- Cold start: New brands with <50 historical transactions have higher uncertainty.
- Geographic bias: Training data is primarily US/EU; performance may vary in other markets.
- Category coverage: Electronics and vintage items have lower representation in training data.
- Temporal drift: Models are retrained monthly; rapid market shifts may cause temporary accuracy drops.
Research Foundations
Our behavioral nudge system is grounded in peer-reviewed research from leading behavioral economists and psychologists.
Foundational Works
Ariely, D., & Wertenbroch, K. (2002). Procrastination, deadlines, and performance: Self-control by precommitment. Psychological Science, 13(3), 219-224.https://doi.org/10.1111/1467-9280.00441
Cialdini, R. B. (2001). Influence: Science and practice (4th ed.). Allyn & Bacon.
Deci, E. L., & Ryan, R. M. (2000). The "what" and "why" of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227-268.https://doi.org/10.1207/S15327965PLI1104_01
Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1990). Experimental tests of the endowment effect and the Coase theorem. Journal of Political Economy, 98(6), 1325-1348.https://doi.org/10.1086/261737
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263-291.https://doi.org/10.2307/1914185
Loewenstein, G., & Prelec, D. (1992). Anomalies in intertemporal choice: Evidence and an interpretation. The Quarterly Journal of Economics, 107(2), 573-597.https://doi.org/10.2307/2118482
Thaler, R. H. (1999). Mental accounting matters. Journal of Behavioral Decision Making, 12(3), 183-206.https://doi.org/10.1002/(SICI)1099-0771
Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving decisions about health, wealth, and happiness. Yale University Press.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.https://doi.org/10.1126/science.185.4157.1124
Contemporary Works (2022–2025)
Bowles, S., & Halliday, S. D. (2022). Microeconomics: Competition, conflict, and coordination. CORE Econ.https://www.core-econ.org/
Cartwright, E. (2024). Behavioral economics (4th ed.). Routledge.https://doi.org/10.4324/9781003357971
Costa-Font, J., & Galizzi, M. M. (Eds.). (2024). Behavioural economics and policy for pandemics: Insights from responses to COVID-19. Cambridge University Press.
Dhami, S. (2025). Principles of behavioral economics: Microeconomics and human behavior. Cambridge University Press.
Earl, P. E. (2022). Principles of behavioral economics: Bringing together old, new and evolutionary approaches. Cambridge University Press.
Komlos, J. (2023). Foundations of real-world economics: What every economics student needs to know (3rd ed.). Routledge.
Sustainable Fashion & Circular Economy
Ellen MacArthur Foundation. (2024). Circular design for fashion.https://ellenmacarthurfoundation.org/
Fletcher, K., & Tham, M. (2024). Earth logic: Fashion action research plan (Updated ed.).https://earthlogic.info/
Niinimäki, K. (Ed.). (2023). Sustainable fashion in a circular economy. Aalto ARTS Books.
How We Apply This Research
Frame sustainable choices in terms of what users would lose by not choosing them, leveraging the 2-2.5x impact of losses vs. gains.
Design choice environments that make sustainable options the path of least resistance without restricting alternatives.
Counter present bias by making long-term benefits (cost-per-wear, resale value) concrete and immediate.
Offer tools like resale reminders that help users follow through on their sustainable intentions.
Questions about our methodology?
We welcome technical due diligence inquiries from prospective partners.