Algorithm Testing Results
Through meticulous validation and backtesting, Janus AI achieves precision-driven performance, delivering reliable, data-backed insights to empower secure crypto investments.
In validating our XGBoost model, we conducted comprehensive backtesting to assess the accuracy and resilience of our approach in real-world conditions. Here’s a breakdown of each stage in our methodology, from data extraction to final performance evaluation, highlighting the rigor and depth involved in generating reliable and actionable risk predictions.
Data Sources and Initial Inspection: Our dataset, sourced from verified blockchain transaction records, includes token metrics such as withdrawal risk, blacklisting risk, and liquidity ratios. Each token is categorized by various risk attributes to capture a comprehensive profile of potential vulnerabilities.
Binary and Range Encoding: Binary risk metrics (e.g., withdrawal_risk, blacklisting_risk) are encoded as 0 or 1 for simplicity, while continuous metrics (e.g., locked_liquidity_ratio) are binned into categorical levels to streamline pattern recognition by the model. This encoding standardizes features across diverse tokens, enhancing model interpretability.
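The encoding step above can be sketched with pandas. The column names follow the metrics described, but the sample values and bin edges are illustrative assumptions, not the production thresholds:

```python
import pandas as pd

# Hypothetical token-level features; values are illustrative only.
tokens = pd.DataFrame({
    "withdrawal_risk": [True, False, True],
    "blacklisting_risk": [False, False, True],
    "locked_liquidity_ratio": [0.05, 0.55, 0.92],
})

# Binary risk metrics -> 0/1.
for col in ["withdrawal_risk", "blacklisting_risk"]:
    tokens[col] = tokens[col].astype(int)

# Continuous metric -> categorical levels (bin edges are assumptions).
tokens["locked_liquidity_bin"] = pd.cut(
    tokens["locked_liquidity_ratio"],
    bins=[0.0, 0.25, 0.75, 1.0],
    labels=[0, 1, 2],
).astype(int)
```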
Missing Data Handling: Missing values for critical features are handled with robust imputation techniques. For high-impact variables, the median is used, preserving the dataset's integrity while minimizing bias. Key metrics with missing data are flagged, ensuring transparency in feature availability for each token.
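A minimal sketch of the median imputation and missing-data flagging described above, using illustrative values for a single high-impact variable:

```python
import numpy as np
import pandas as pd

# Illustrative data with a gap in a high-impact variable.
df = pd.DataFrame({"locked_liquidity_ratio": [0.1, np.nan, 0.5, 0.9]})

# Flag rows where the metric was missing, for transparency.
df["locked_liquidity_missing"] = df["locked_liquidity_ratio"].isna().astype(int)

# Median imputation preserves the centre of the distribution
# while minimizing bias from extreme values.
median = df["locked_liquidity_ratio"].median()
df["locked_liquidity_ratio"] = df["locked_liquidity_ratio"].fillna(median)
```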
Balancing the Dataset: Our dataset was stratified into “good” and “bad” classifications to prevent model bias. We implemented a 70/30 split, reserving 70% of tokens for training and backtesting and 30% for final testing. This approach ensures that both subsets represent a balanced distribution of risk classifications.
Separate Backtest and Test Sets: The training data is further divided into backtest and test samples, allowing the model to be validated on unseen data, mirroring live production conditions. By segmenting data into these distinct sets, we ensure that our backtesting results reflect realistic model performance when applied to new tokens.
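The two-stage splitting described in the last two steps can be sketched with scikit-learn. The feature matrix, labels, and the 20% backtest fraction within the training portion are placeholder assumptions; only the 70/30 stratified outer split is stated above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # placeholder feature matrix
y = np.array([0, 1] * 50)       # balanced good/bad labels

# 70/30 stratified split: 70% for training/backtesting, 30% held out
# for final testing, preserving the class mix in both subsets.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# The training portion is further divided into backtest and test
# samples (the 20% fraction here is an assumption for illustration).
X_fit, X_backtest, y_fit, y_backtest = train_test_split(
    X_train, y_train, test_size=0.20, stratify=y_train, random_state=42
)
```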
Feature Selection and Label Definition: The XGBoost model was trained with selected token features as input variables, with the token classification (good or bad) as the target label. XGBoost’s ability to handle mixed feature types allows it to capture nuanced patterns across categorical, binary, and continuous risk metrics.
Hyperparameter Tuning: To refine model performance, we optimized XGBoost's hyperparameters (e.g., n_estimators, max_depth) using cross-validation, balancing model complexity against generalizability. This fine-tuning is key to maximizing predictive accuracy without overfitting to the training data.
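A minimal sketch of the cross-validated grid search over those hyperparameters. Scikit-learn's GradientBoostingClassifier stands in for XGBoost here (the tuning interface is the same), and the synthetic data and grid values are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: one informative feature plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Illustrative grid; the production search space is not specified above.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X, y)
best = search.best_params_
```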
Performance Metrics: Our backtesting produced impressive results, demonstrating the model’s robustness across multiple validation metrics:
Accuracy: 75%, indicating that three-quarters of tokens were correctly classified as good or bad.
F1 Score: 0.74, showing a strong balance between precision and recall, crucial for minimizing false positives and negatives in high-stakes investment decisions.
ROC AUC Score: 0.83, underscoring the model’s ability to distinguish between good and bad tokens effectively, even in cases with overlapping features.
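The three metrics above can be computed directly with scikit-learn. The labels, predictions, and probabilities below are hypothetical, chosen only to show the calls:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical hold-out labels and model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hard class predictions
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]  # probability of "good"

acc = accuracy_score(y_true, y_pred)   # fraction correctly classified
f1 = f1_score(y_true, y_pred)          # balance of precision and recall
auc = roc_auc_score(y_true, y_prob)    # ranking quality of the scores
```

Note that ROC AUC is computed from the probability scores rather than the hard predictions, which is what lets it capture separability even when class assignments overlap.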
Feature Importance Analysis: XGBoost’s built-in feature importance insights reveal the weight of each token metric in determining classification outcomes. For instance, withdrawal_risk and locked_liquidity_ratio emerged as significant predictors, aligning with industry standards for token stability indicators.
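Gradient-boosted models expose these importances through the feature_importances_ attribute. A sketch with synthetic data (scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost; here the first column is built to drive the label, standing in for a dominant metric like withdrawal_risk):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic features: column 0 determines the label, the rest are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# One non-negative weight per feature, summing to 1.
importances = model.feature_importances_
```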
Token Scoring and Classification: Tokens are assigned a probability score, with lower scores indicating greater risk. This enables users to interpret model outputs intuitively, translating complex data into actionable insights.
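Probability scores of this kind come from predict_proba, which returns a per-class probability for each token; the class-1 column can serve as the score, with values nearer 0 indicating greater risk. A sketch on synthetic data (GradientBoostingClassifier again standing in for XGBoost):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Probability of the "good" class; lower scores flag riskier tokens.
scores = model.predict_proba(X[:5])[:, 1]
```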
Continuous Model Improvement: As our model adapts to new data, backtesting and model recalibration will continue, ensuring that Argo maintains optimal performance as token ecosystems evolve. This commitment to ongoing validation reflects our dedication to transparency and reliability in crypto investment security.