Provides basic dataframe functionality while relying only on native Python libraries. The dataframe adheres to the functional programming paradigm.

hwatmos/simple-dataframe

Simple DataFrame

A simplistic replacement for pandas

Vision

My vision is for this library to be the most trusted Python library for data science, empowering organizations to confidently install and utilize a data science library on their local assets. By maintaining transparency through the exclusive use of pure Python and standard libraries, I ensure that my code is easily analyzed for vulnerabilities. Unlike other libraries with complex dependencies, my commitment to simplicity and security enables organizations to perform essential data science tasks without compromising safety.

While my library does not aim to replace more powerful libraries, it provides sufficient functionality for local data manipulation. I envision data scientists using this library to prepare and transform data locally, with the ability to easily encode, anonymize, and export this data to cloud environments for more advanced analysis. In this way, I enable secure and efficient data handling, ensuring that data is properly transformed before entering more complex and resource-intensive analytical workflows.

Goals

  1. Trustworthiness and Security

Ensure the library is safe and trustworthy for organizational use by utilizing only pure Python and standard libraries. This approach facilitates easy review by cybersecurity specialists.

  2. Functionality and Integration

Provide essential data science functionalities for local data preparation and export for easy ingestion into more powerful libraries (like scikit-learn, statsmodels, pandas) in a cloud environment.

  3. Transparency

Develop the library with transparent, understandable code, avoiding non-standard Python libraries. Provide thorough yet concise documentation, both functional and technical.

Core Functionality

The library is capable of:

  • Loading two-dimensional data
  • Applying manipulations to columns
  • Grouping and aggregating
  • Joining and concatenating datasets
  • Creating dummy variables
  • Exporting data

Additionally, it can completely anonymize a dataset while remembering all anonymization rules internally. The library's main class can be considered a processor rather than a data frame, focusing on data transformation rather than on the data itself.
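
As an illustration of "anonymize while remembering the rules", here is a minimal sketch in pure Python. The class name `ColumnAnonymizer` and its methods are hypothetical and are not taken from this library's API; the sketch only shows one way a processor object might keep a reversible mapping alongside the transformed values.

```python
from itertools import count


class ColumnAnonymizer:
    """Hypothetical helper: swaps values for opaque codes and remembers the mapping."""

    def __init__(self, prefix="v"):
        self.prefix = prefix
        self.forward = {}    # original value -> code
        self.backward = {}   # code -> original value
        self._ids = count(1)

    def encode(self, values):
        """Return a new list of codes; unseen values get the next free code."""
        encoded = []
        for value in values:
            if value not in self.forward:
                code = f"{self.prefix}{next(self._ids)}"
                self.forward[value] = code
                self.backward[code] = value
            encoded.append(self.forward[value])
        return encoded

    def decode(self, codes):
        """Reverse the anonymization using the remembered rules."""
        return [self.backward[code] for code in codes]


cities = ["Oslo", "Lima", "Oslo", "Kyiv"]
anonymizer = ColumnAnonymizer(prefix="city_")
encoded = anonymizer.encode(cities)
print(encoded)                               # ['city_1', 'city_2', 'city_1', 'city_3']
print(anonymizer.decode(encoded) == cities)  # True
```

Because the mapping lives in the object, the encoded column could be exported while the rules stay on the local machine.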

Functionality

To-dos

  • Encode columns and remember encoding rules within the object itself.
  • Create dummy variables but remember which dummy columns describe the same feature.
    • This way, not all dummies need to be displayed; the original column can be shown with an indicator that it has been "one-hot'ed", "dummied", etc.
  • Concat
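
The dummy-variable to-do above could look roughly like the following sketch. It assumes rows are stored as a list of dicts, which may not match the library's internal layout, and the function name `get_dummies` and the `column=value` naming scheme are placeholders chosen for illustration.

```python
def get_dummies(rows, column):
    """Return (new_rows, groups) without mutating the input rows.

    `groups` maps the original column name to the dummy columns derived
    from it, so a display layer could collapse them back into one column.
    """
    categories = sorted({row[column] for row in rows})
    dummy_names = [f"{column}={category}" for category in categories]
    new_rows = []
    for row in rows:
        # Copy every field except the column being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category, name in zip(categories, dummy_names):
            new_row[name] = 1 if row[column] == category else 0
        new_rows.append(new_row)
    return new_rows, {column: dummy_names}


rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
]
encoded_rows, groups = get_dummies(rows, "color")
print(groups)           # {'color': ['color=blue', 'color=red']}
print(encoded_rows[0])  # {'id': 1, 'color=blue': 0, 'color=red': 1}
```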

Specific to strings

  • Common operations on strings that pandas makes accessible via .str?

Specific to dates

  • Need to plan this out
  • First of the month
  • str to date
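
Both date items above can be covered with the standard `datetime` module alone. The helper names `str_to_date` and `first_of_month` below are hypothetical, and the default format string is an assumption about how dates arrive.

```python
from datetime import date, datetime


def str_to_date(text, fmt="%Y-%m-%d"):
    """Parse a string into a date; raises ValueError if it does not match fmt."""
    return datetime.strptime(text, fmt).date()


def first_of_month(d):
    """Return a new date set to the first day of d's month."""
    return d.replace(day=1)


parsed = str_to_date("2024-03-17")
print(first_of_month(parsed))  # 2024-03-01
```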

Loosely defined or undefined

  • Simplify the common task of specifying a dict of column labels and value types.
  • Get unique values. I use this a lot, so maybe implement something like df.col_name that prints the unique values if the column is categorical?
  • Need a simplified mechanism for group by. I do this all the time; maybe specify each column as either a Dimension or a Measure? Then I can create a method for collapsing dimensions, but recalculating a measure may still depend on other measures and even on the dimensions...
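
One possible shape for the Dimension/Measure idea is sketched below. Everything here is an assumption made for illustration: the function name `collapse`, the list-of-dicts layout, and the rule that additive measures are summed first while dependent measures (such as a ratio) are recomputed afterwards from the aggregated values.

```python
from collections import defaultdict


def collapse(rows, dimensions, measures, derived=None):
    """Group rows by the kept dimensions and sum the additive measures.

    `derived` maps a new measure name to a function of the aggregated row,
    for measures that depend on other measures and cannot simply be summed.
    """
    totals = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for measure in measures:
            totals[key][measure] += row[measure]

    result = []
    for key, sums in totals.items():
        out = dict(zip(dimensions, key))
        out.update(sums)
        for name, func in (derived or {}).items():
            out[name] = func(out)  # recompute from aggregated values
        result.append(out)
    return result


sales = [
    {"region": "north", "product": "a", "revenue": 100, "units": 4},
    {"region": "north", "product": "b", "revenue": 50, "units": 1},
    {"region": "south", "product": "a", "revenue": 30, "units": 3},
]
by_region = collapse(
    sales,
    dimensions=["region"],
    measures=["revenue", "units"],
    derived={"price": lambda r: r["revenue"] / r["units"]},
)
print(by_region[0])  # {'region': 'north', 'revenue': 150, 'units': 5, 'price': 30.0}
```

Summing a per-row price would give the wrong answer here, which is why the sketch recomputes it from the aggregated revenue and units.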

Research
