License: BSD-3-Clause


Mojmelo

About The Project

The name Mojmelo is derived from the phrase "Mojo Machine Learning". The project implements machine learning algorithms from scratch in pure Mojo. The following algorithms are included:

  • Linear Regression
  • Polynomial Regression
  • Logistic Regression
  • KNN
  • KMeans
  • HDBSCAN
  • DBSCAN
  • SVM
  • Naive Bayes
    1. GaussianNB
    2. MultinomialNB
  • Decision Tree (Regression/Classification)
  • Random Forest (Regression/Classification)
  • GBDT (Regression/Classification)
  • PCA

Preprocessing:

  • normalize
  • MinMaxScaler
  • StandardScaler
  • KFold
  • GridSearchCV
  • LabelEncoder

Documentation: https://yetalit.github.io/Mojmelo/docs/_index.html

Getting Started

If you are not familiar with Mojo projects, you can get started here: https://mojolang.org/docs/manual/get-started/

Prerequisites

  • mojo-compiler 1.0.0b1

Optionally, the Python packages below can be installed for better usability and to run the tests:

  1. Numpy
  2. Pandas
  3. Scikit-learn
  4. Matplotlib

Installation

There are three ways to install mojmelo: using the Pixi CLI, using the PyPI CLI, or from the source code.

Completing the setup process described under each method is also recommended.

Pixi CLI

Make sure the Modular community channel (https://repo.prefix.dev/modular-community) is listed in the channels section of your pixi.toml file, then add mojmelo:

pixi add mojmelo

To start the setup process, run the following command from the main folder of your project:

bash ./.pixi/envs/default/etc/conda/test-files/mojmelo/0/tests/setup.sh

Note: If the OS exposes CPU cache details, the benchmarking parts of the setup are skipped. Otherwise, avoid running other tasks on your machine during setup for more accurate results.

PyPI CLI

The command below installs the PyPI package, which contains the source code, from the GitHub repository:

pip install "git+https://github.com/yetalit/Mojmelo.git#subdirectory=pypi"

Then start the setup process this way:

mojmelo-setup

Note: If the OS exposes CPU cache details, the benchmarking parts of the setup are skipped. Otherwise, avoid running other tasks on your machine during setup for more accurate results.

Source Code

Mojmelo can also be installed from source, which places the source code directly in your project.

First, download the mojmelo folder and the setup.mojo file. To start the setup process, run these commands from the directory that contains both:

mojo build setup.mojo -o setup &&
./setup &&
./setup 1 &&
./setup 2 &&
./setup 3 &&
./setup 4 &&
./setup 5 &&
./setup 6 &&
./setup 7 &&
./setup 8 &&
./setup 9 &&
rm -f ./setup

Note: If the OS exposes CPU cache details, the benchmarking parts of the setup are skipped. Otherwise, avoid running other tasks on your machine during setup for more accurate results.

Usage

Importing models is straightforward:

from mojmelo.LinearRegression import LinearRegression

You may also want to use the project's utility code:

from mojmelo.utils.Matrix import Matrix
from mojmelo.utils.utils import *

Benchmarks (AMD Zen 4)

KMeans

| Model | Fit Time (s) | ARI vs sklearn | ARI vs truth |
|---|---|---|---|
| sklearn KMeans | 0.2716 ± 0.0012 | - | 0.9389 |
| mojmelo KMeans | 0.1870 ± 0.0052 | 0.8821 | 0.9389 |

HDBSCAN (algorithm='boruvka_kdtree')

| Model | Fit Time (s) | ARI vs sklearn | ARI vs truth |
|---|---|---|---|
| skl-contrib HDBS | 1.1495 ± 0.0083 | - | 0.9997 |
| mojmelo HDBS | 0.3198 ± 0.0079 | 0.9930 | 0.9932 |

DBSCAN (algorithm='kd_tree')

| Model | Fit Time (s) | ARI vs sklearn | ARI vs truth |
|---|---|---|---|
| sklearn DBS | 1.1434 ± 0.0055 | - | 0.8566 |
| mojmelo DBS | 0.4028 ± 0.0038 | 0.9996 | 0.8566 |

KNN (algorithm='kd_tree')

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn KNN | 0.0353 ± 0.0005 | 1.7600 ± 0.0063 | 0.8543 |
| mojmelo KNN | 0.0149 ± 0.0006 | 0.2126 ± 0.0040 | 0.8347 |

SVM

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn SVM | 1.0595 ± 0.0010 | 0.3066 ± 0.0002 | 0.9798 |
| mojmelo SVM | 0.8733 ± 0.0129 | 0.0603 ± 0.0032 | 0.9797 |

DecisionTreeClassifier

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn DTC | 0.9051 ± 0.0008 | 0.0004 ± 0.0000 | 0.9300 |
| mojmelo DTC | 0.0749 ± 0.0028 | 0.0002 ± 0.0000 | 0.9328 |

DecisionTreeRegressor

| Model | Fit Time (s) | Predict Time (s) | MSE |
|---|---|---|---|
| sklearn DTR | 0.6466 ± 0.0006 | 0.0005 ± 0.0000 | 8247.9358 |
| mojmelo DTR | 0.0795 ± 0.0049 | 0.0003 ± 0.0000 | 8192.1982 |

RandomForestClassifier

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn RFC | 0.4707 ± 0.0064 | 0.0140 ± 0.0003 | 0.9182 |
| mojmelo RFC | 0.4534 ± 0.0094 | 0.0040 ± 0.0000 | 0.9174 |

RandomForestRegressor

| Model | Fit Time (s) | Predict Time (s) | MSE |
|---|---|---|---|
| sklearn RFR | 2.0257 ± 0.0050 | 0.0134 ± 0.0004 | 8454.5517 |
| mojmelo RFR | 1.2247 ± 0.0094 | 0.0067 ± 0.0002 | 9155.6895 |

PCA (svd_solver='full')

| Model | Fit Time (s) | Transform Time (s) | Explained Var |
|---|---|---|---|
| sklearn PCA | 0.2070 ± 0.0025 | 0.0061 ± 0.0000 | 0.5363 |
| mojmelo PCA | 0.0737 ± 0.0003 | 0.0270 ± 0.0015 | 0.5363 |
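
The fit times in the benchmark tables can be turned into speedup ratios for easier comparison. The Python sketch below does that arithmetic on the mean values reported above. The names FIT_TIMES and speedup are illustrative only and are not part of Mojmelo, and the ± spreads are ignored, so treat the ratios as rough indicators.

```python
# Mean fit times in seconds, copied from the benchmark tables in this README,
# stored as (baseline, mojmelo). The baseline is sklearn, or the
# scikit-learn-contrib package in the HDBSCAN case.
# FIT_TIMES and speedup are illustrative names, not part of Mojmelo.
FIT_TIMES = {
    "KMeans": (0.2716, 0.1870),
    "HDBSCAN": (1.1495, 0.3198),
    "DBSCAN": (1.1434, 0.4028),
    "KNN": (0.0353, 0.0149),
    "SVM": (1.0595, 0.8733),
    "DecisionTreeClassifier": (0.9051, 0.0749),
    "DecisionTreeRegressor": (0.6466, 0.0795),
    "RandomForestClassifier": (0.4707, 0.4534),
    "RandomForestRegressor": (2.0257, 1.2247),
    "PCA": (0.2070, 0.0737),
}


def speedup(baseline_s, mojmelo_s):
    """Return the ratio of baseline time to mojmelo time (above 1 means mojmelo is faster)."""
    return baseline_s / mojmelo_s


if __name__ == "__main__":
    # Print one line per model, e.g. the model name followed by its ratio.
    for name, (baseline_s, mojmelo_s) in FIT_TIMES.items():
        print(f"{name:<24} {speedup(baseline_s, mojmelo_s):5.2f}x")
```

With these numbers, KMeans comes out at roughly 1.45x and DecisionTreeClassifier at roughly 12x. The same helper can be applied to the predict or transform columns, where the ratio can fall below 1 (for example, the PCA transform time is higher for mojmelo than for sklearn).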

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

You can contribute to the project in three ways:

  1. Applying improvements to the code and opening a Pull Request
  2. Reporting a bug
  3. Suggesting new features

Acknowledgments

  • Mojo usage and distribution are licensed under the Modular Community License.

  • LIBSVM, A Library for Support Vector Machines by Chih-Chung Chang and Chih-Jen Lin, licensed under the BSD-3-Clause license.

  • The HDBSCAN implementation is partially based on hdbscan by Leland McInnes, John Healy and Steve Astels, licensed under the BSD-3-Clause license, and on Fast Multicore HDBSCAN by the Tutte Institute for Mathematics and Computing, licensed under the BSD-2-Clause license.

  • The matmul implementation is based on matmul.mojo by Ethan Wu (YichengDWu), licensed under the Apache-2.0 license.

  • The argmin, argmax and argsort implementations are based on code from Modular, licensed under the Apache License v2.0 with LLVM Exceptions.

  • KDTREE2, a kd-tree implementation in Fortran 95 and C++ by Matthew B. Kennel.

  • Initially drew inspiration from Patrick Loeber's MLfromscratch.