forked from apple/turicreate

Turi Create simplifies the development of custom machine learning models.


ylow/turicreate

This branch is 20 commits ahead of apple/turicreate:main.


Quick Links: Installation | Documentation

Turi Create

XFrame

The XFrame is a scalable, column-compressed, disk-backed dataframe optimized for machine learning and data science needs. It supports strictly typed columns (int, float, str, datetime) and weakly typed columns (schema-free lists, dictionaries), and has uniform support for missing data.
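To make the column model above concrete, here is a minimal plain-Python sketch. It is not XFrame's API: the `TypedColumn` class and its methods are hypothetical names used only for illustration. It assumes that a column has a single declared type, that weakly typed columns (lists, dictionaries) place no schema on their contents, and that `None` marks a missing value in every column type.

```python
from datetime import datetime

# Column types named in the description above.
STRICT_TYPES = (int, float, str, datetime)
WEAK_TYPES = (list, dict)


class TypedColumn:
    """Hypothetical illustration of a typed column; not part of XFrame."""

    def __init__(self, dtype, values):
        if dtype not in STRICT_TYPES + WEAK_TYPES:
            raise TypeError(f"unsupported column type: {dtype!r}")
        values = list(values)
        for v in values:
            # None is accepted in every column: missing data is handled uniformly.
            if v is not None and not isinstance(v, dtype):
                raise TypeError(f"expected {dtype.__name__}, got {type(v).__name__}")
        self.dtype = dtype
        self.values = values

    def num_missing(self):
        """Count the missing (None) entries in the column."""
        return sum(v is None for v in self.values)


# A strictly typed column: every present value must be an int.
ages = TypedColumn(int, [31, None, 47])

# A weakly typed column: each entry is a list, but the contents are schema free.
tags = TypedColumn(list, [["a", 1], None, []])
```

In this sketch, mixing types in a strictly typed column (for example `TypedColumn(int, [1, "2"])`) raises a `TypeError`, while the list column accepts entries of any shape.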

This is a fork of the SFrame project in Turi Create (originally GraphLab Create), started by Yucheng Low at the startup GraphLab/Dato/Turi between 2013 and 2016. Apple acquired Turi in 2016, and we open-sourced the project that same year. Thanks to heroic efforts by @TobyRoseman, it was kept compiling for several years, but otherwise minimal investments were made.

However, I strongly believe that this is still one of the most performant and usable data manipulation libraries in Python, and I have been wanting to resurrect this project. As the names SFrame and Turi Create were taken, we renamed it from SFrame to XFrame.

Currently, the fork is in an early stage. What has been done:

  • Removed all ML toolkits
  • Removed the SGraph datastructure
  • Renamed SFrame to XFrame

But there are many places for improvement and modernization.

There is a significant amount of technical debt, which speaks to the history of the project. The original design was a client-server model in which the client and the server could be located on different machines. After a while we realized that this was not particularly useful, so we launched both client and server on the same machine, communicating via IPC (interprocess communication). This is the origin of the whole RPC system called "Unity". Eventually IPC became in-process ("Inproc") and was finally removed, but the basic class hierarchy remained.

After that, there was an effort to build an easy-to-use C++ interface to all the data structures so that people could write extensions/plugins in C++. As these extensions also needed an easy way to export Python bindings, we introduced a whole other class registration mechanism under the "gl_" prefix (e.g. gl_sframe, gl_sarray).

A lot of this can be aggressively simplified and removed.

Goals

  • Streamline the Python <-> C++ bridge. We currently use Cython, and there is a non-trivial amount of it. Are there better ways to do this today? Targeting the stable Python API might be nice, so that we do not need a separate build for every Python version.
  • Lambdas currently work by spawning Python subprocesses and communicating with them over IPC. We could perhaps replace this with new PyInterpreters in the same process, or even go multi-threaded now that Python has a GIL-free interpreter.
  • Native Parquet support in the query engine would be really nice.
  • We are leaving a lot of performance on the table by not vectorizing. We could potentially implement our own vectorized kernels, or perhaps use Arrow or other libraries?
  • Others?
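As a rough illustration of the multi-threaded option in the lambda goal above, the sketch below applies a Python function to a column in chunks on a thread pool instead of in subprocesses. It is not how XFrame evaluates lambdas today: `apply_threaded`, its parameters, and the use of a plain list as the column are all assumptions made for the example. On a standard (GIL) build of CPython the threads take turns running Python code, so any real speedup assumes a free-threaded build.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_threaded(values, fn, num_workers=4, chunk_size=1024):
    """Apply fn to each value of a column (a plain list here) on a thread pool.

    Hypothetical helper, not an XFrame API. Missing values (None) are passed
    through without calling fn.
    """

    def run_chunk(chunk):
        return [None if v is None else fn(v) for v in chunk]

    # Split the column into contiguous chunks so each task does a batch of work.
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in submission order, so row order is preserved.
        results = pool.map(run_chunk, chunks)
        return [out for chunk in results for out in chunk]


doubled = apply_threaded([1, None, 3, 4, 5], lambda x: x * 2, chunk_size=2)
```

Because everything stays in one process, the function and its closure do not need to be serialized and shipped to a worker, which is one of the costs of the subprocess approach.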

Maintainers

Current maintainers are:

Supported Platforms

XFrame currently supports only macOS 15+ (Sequoia) because that is what I have. Linux and Windows should work but are untested.
