Janus AI Whitepaper

Algorithm Testing Results

Through meticulous validation and backtesting, Janus AI achieves precision-driven performance, delivering reliable, data-backed insights to empower secure crypto investments.


Last updated 6 months ago

In validating our XGBoost model, we conducted comprehensive backtesting to assess the accuracy and resilience of our approach in real-world conditions. Here’s a breakdown of each stage in our methodology, from data extraction to final performance evaluation, highlighting the rigor and depth involved in generating reliable and actionable risk predictions.


Data Extraction and Preprocessing

  • Data Sources and Initial Inspection: Our dataset, sourced from verified blockchain transaction records, includes token metrics such as withdrawal risk, blacklisting risk, and liquidity ratios. Each token is categorized by various risk attributes to capture a comprehensive profile of potential vulnerabilities.

  • Binary and Range Encoding: Binary risk metrics (e.g., withdrawal_risk, blacklisting_risk) are encoded as 0 or 1 for simplicity, while continuous metrics (e.g., locked_liquidity_ratio) are binned into categorical levels to streamline pattern recognition by the model. This encoding standardizes features across diverse tokens, enhancing model interpretability.

  • Missing Data Handling: Missing values for critical features are handled with robust imputation techniques. For high-impact variables, the median is used, preserving the dataset's integrity while minimizing bias. Key metrics with missing data are flagged, ensuring transparency in feature availability for each token.
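The encoding and imputation steps above can be sketched as follows. This is a minimal illustration using pandas; the column names follow the metrics named in this section, while the toy values, bin edges, and level labels are assumptions for demonstration only.

```python
import pandas as pd

# Toy token table; bin edges and level labels are illustrative assumptions.
df = pd.DataFrame({
    "withdrawal_risk":        [0, 1, 0, 1],
    "blacklisting_risk":      [1, 0, 0, None],
    "locked_liquidity_ratio": [0.95, 0.10, None, 0.55],
})

# Binary metrics: flag missing values for transparency, then encode as 0/1.
df["blacklisting_risk_missing"] = df["blacklisting_risk"].isna().astype(int)
df["blacklisting_risk"] = df["blacklisting_risk"].fillna(0).astype(int)

# Continuous metrics: flag missing, impute with the median, then bin
# into categorical levels (low / medium / high).
df["locked_liquidity_missing"] = df["locked_liquidity_ratio"].isna().astype(int)
median = df["locked_liquidity_ratio"].median()
df["locked_liquidity_ratio"] = df["locked_liquidity_ratio"].fillna(median)
df["locked_liquidity_level"] = pd.cut(
    df["locked_liquidity_ratio"],
    bins=[0.0, 0.33, 0.66, 1.0],
    labels=["low", "medium", "high"],
    include_lowest=True,
)
print(df[["locked_liquidity_ratio", "locked_liquidity_level",
          "blacklisting_risk_missing"]])
```

Flagging missingness as its own column, rather than silently imputing, is what keeps feature availability transparent per token.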


Data Splitting for Validation: Training, Testing, and Backtesting

  • Balancing the Dataset: Our dataset was stratified into “good” and “bad” classifications to prevent model bias. We implemented a 70/30 split, reserving 70% of tokens for training and backtesting and 30% for final testing. This approach ensures that both subsets represent a balanced distribution of risk classifications.

  • Separate Backtest and Test Sets: The 70% training portion is further divided into training and backtest samples, allowing the model to be validated on unseen data, mirroring live production conditions. By segmenting data into these distinct sets, we ensure that our backtesting results reflect realistic model performance when applied to new tokens.
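A minimal sketch of this two-stage split, using scikit-learn's `train_test_split` with stratification on the good/bad label. The feature matrix and labels are synthetic, and the 20% backtest fraction carved out of the training portion is an assumption; the whitepaper only specifies the 70/30 outer split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))      # illustrative feature matrix
y = rng.integers(0, 2, size=1000)   # 1 = "good", 0 = "bad" (toy labels)

# 70/30 stratified split: 70% for training + backtesting, 30% held out
# for final testing, preserving the good/bad ratio in both subsets.
X_trainbt, X_test, y_trainbt, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Carve a backtest sample out of the training portion (20% here is an
# assumed fraction) so the model is validated on data it has never seen.
X_train, X_backtest, y_train, y_backtest = train_test_split(
    X_trainbt, y_trainbt, test_size=0.20, stratify=y_trainbt, random_state=0)

print(len(X_train), len(X_backtest), len(X_test))
```

Stratifying both splits is what prevents either subset from drifting away from the overall good/bad ratio.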

Model Training: XGBoost Implementation

  • Feature Selection and Label Definition: The XGBoost model was trained with selected token features as input variables, with the token classification (good or bad) as the target label. XGBoost’s ability to handle mixed feature types allows it to capture nuanced patterns across categorical, binary, and continuous risk metrics.

  • Hyperparameter Tuning: To refine model performance, we optimized XGBoost’s hyperparameters (e.g., n_estimators, max_depth) using cross-validation, balancing between model complexity and generalizability. This fine-tuning is key to maximizing predictive accuracy without overfitting to the training data.
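The cross-validated tuning loop above can be sketched as follows. This uses scikit-learn's `GridSearchCV` with `GradientBoostingClassifier` as a stand-in for `xgboost.XGBClassifier` (which accepts the same `n_estimators` and `max_depth` parameters); the data, grid values, and fold count are illustrative assumptions, not Janus AI's actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # toy "good"/"bad" labels

# Cross-validated search over the hyperparameters named above
# (n_estimators, max_depth); the grid values are illustrative.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=3,
    scoring="f1",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Scoring on F1 inside the search, rather than raw accuracy, keeps the complexity/generalizability trade-off aligned with the evaluation metric reported below.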

Model Evaluation and Backtesting Results

  • Performance Metrics: Our backtesting produced impressive results, demonstrating the model’s robustness across multiple validation metrics:

    • Accuracy: 75%, indicating that three-quarters of tokens were correctly classified as good or bad.

    • F1 Score: 0.74, showing a strong balance between precision and recall, crucial for minimizing false positives and negatives in high-stakes investment decisions.

    • ROC AUC Score: 83%, underscoring the model’s ability to distinguish between good and bad tokens effectively, even in cases with overlapping features.

  • Feature Importance Analysis: XGBoost’s built-in feature importance insights reveal the weight of each token metric in determining classification outcomes. For instance, withdrawal_risk and locked_liquidity_ratio emerged as significant predictors, aligning with industry standards for token stability indicators.
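The evaluation loop behind these numbers looks roughly like the sketch below: score the held-out backtest sample on accuracy, F1, and ROC AUC, then rank features by importance. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost on synthetic data, so the printed metrics and importances are illustrative and will not match the figures reported above; the fourth feature name is invented for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)  # toy labels
names = ["withdrawal_risk", "blacklisting_risk",
         "locked_liquidity_level", "other_metric"]  # last name is illustrative

X_tr, X_bt, y_tr, y_bt = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_bt)                 # hard good/bad calls
prob = model.predict_proba(X_bt)[:, 1]     # P(good), needed for ROC AUC
print("accuracy:", round(accuracy_score(y_bt, pred), 2))
print("f1:      ", round(f1_score(y_bt, pred), 2))
print("roc_auc: ", round(roc_auc_score(y_bt, prob), 2))

# Feature importance: the relative weight of each metric in the model.
for name, imp in sorted(zip(names, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:24s} {imp:.3f}")
```

Note that ROC AUC is computed from probabilities rather than hard labels, which is what lets it measure separability even when classes have overlapping features.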

Interpreting and Communicating Model Outcomes

  • Token Scoring and Classification: Tokens are assigned a probability score, with lower scores indicating greater risk. This enables users to interpret model outputs intuitively, translating complex data into actionable insights.

  • Continuous Model Improvement: As our model adapts to new data, backtesting and model recalibration will continue, ensuring that Argo maintains optimal performance as token ecosystems evolve. This commitment to ongoing validation reflects our dedication to transparency and reliability in crypto investment security.
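The scoring step can be sketched as a simple mapping from the model's predicted probability of a token being "good" to a user-facing score, with lower scores indicating greater risk. The 0-100 scale, the 50-point threshold, and the example probabilities are illustrative assumptions, not Janus AI's exact scheme.

```python
import numpy as np

def risk_score(prob_good: np.ndarray) -> np.ndarray:
    """Map P(good) to a 0-100 score; lower = riskier.
    The 0-100 scale is an illustrative assumption."""
    return np.round(prob_good * 100).astype(int)

probs = np.array([0.92, 0.48, 0.07])        # example model outputs
scores = risk_score(probs)
labels = np.where(scores >= 50, "good", "bad")  # assumed 50-point cutoff
print(list(zip(scores.tolist(), labels.tolist())))
```

Exposing the score alongside the hard classification lets users see how close a borderline token (48 here) sits to the decision boundary.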
