Skip to content

Latest commit

 

History

History
283 lines (254 loc) · 17.7 KB

History.md

File metadata and controls

283 lines (254 loc) · 17.7 KB

0.1.4.1 (20 August 2016)

  • Fixes
    • Turns out that removing the dependencies did not load a few libraries from the Ruby standard library when daru is deployed on a fresh system. This release fixes that by adding extra require calls.

0.1.4 (19 August 2016)

  • Major Enhancements

    • Added new dependency 'backports' to support #to_h in Ruby 2.0. (@lokeshh)
    • Greatly improve code test coverage. (@zverok)
    • Greatly refactor code and make some methods faster, smaller and more readable. (@zverok)
    • Add support for categorical data with different coding schemes and several methods for in built categorical data support. Add a new index 'Daru::CategoricalIndex'. (@lokeshh)
    • Removed runtime dependencies on 'spreadsheet' and 'reportbuilder'. They are now loaded if the libraries are already present in the system. (@v0dro)
  • Minor enhancements

    • Update SqlDataSource to improve the performance of DataFrame.from_sql. (@dansbits)
    • Remove default DataFrame name. Now DataFrames will no name by default. (@zverok)
    • Better looking #inspect for Vector and DataFrame. (@zverok)
    • Better looking #to_html for Vector and DataFrame. Also better #to_html for MultiIndex. (@zverok)
    • Remove monkey patching on Array and add those methods to Daru::ArrayHelper. (@zverok)
    • Add a rake task for running RSpec for every Ruby version with a single command. (@lokeshh)
    • Add rake tasks for easily setting up and testing test harness. (@lokeshh)
    • Added Daru::Vector#to_nmatrix.
    • Remove the 'metadata' feature introduced in v0.1.3. (@gnilrets)
    • Added DataFrame#to_df and Vector#to_df. (@gnilrets)
  • Fixes

    • DataFrame#clone preserves order and name. (@wlevine)
    • Vector#where preserves name. (@v0dro)
    • Fix bug in DataFrame#pivot_table that prevented anything other than Array or Symbol to be specified in the :values option. (@v0dro)
    • Daru::Index#each returns an Enumerator if block is not specified. (@v0dro)
    • Fixes bug where joins failed when nils were in join keys. (@gnilrets)
    • DataFrame#merge now preserves the vector name type when merging. (@lokeshh)
  • Deprecations

    • Remove methods DataFrame#vector and DataFrame#column. (@zverok)
    • Remove the missing_values feature of daru. The only values that are now treated as 'missing' are nil and Float::NAN. (@lokeshh)

0.1.3.1 (12 May 2016)

  • Fixes
    • Fixed small error with usage of newly introduced #each_with_object (@v0dro).

0.1.3 (May 2016)

  • Enhancements
    • Proper error handling for case where an index specified by the user is not actually present in the DataFrame/Vector (@lokeshh).
    • DataFrame CSV writer function will now supress headers when passing headers: false (@gnilrets).
    • Refactor Index and MultiIndex so that a Vector or DataFrame can access the actual index number without having to check the exact type of index every time (@lokeshh).
    • Refactor Vector#[]= to not use conditionals (@lokeshh).
    • Custom #dup method for Daru::DateTimeIndex (@Deepakkoli93).
    • Massive performance boost to Vector and DataFrame sorting by using in-built Array#sort and removing previous hand-made sort (@lokeshh).
    • Handle nils in sorting for Vectors and DataFrame (@lokeshh, @gnilrets).
    • Add #describe function for Vectors (@shahsaurabh0605).
    • Adds support for concatenating dataframes that don't share all the same vectors (@gnilrets).
    • Massive performance enhancement for joins using the sorted merge method (@gnilrets).
    • New statistics methods and tests for DataFrame (@shahsaurabh0605).
    • Add explicit conversion to hash for DataFrame (DataFrame#to_h, Vector#to_h) and remove implicit conversion to hash (DataFrame#to_hash, Vector#to_hash) (@gnilrets).
    • Add DataFrame#rename_vectors for simplifying renaming of vectors in DataFrame (@gnilrets).
    • MultiIndex raises error on accessing an invalid index (@shreyanshd).
    • Order columns as given in the CSV file when reading into a DataFrame from CSV using DataFrame.from_csv (@lokeshh).
    • Add Vector#percent_change and DataFrame#percent_change (@shahsaurabh0605).
    • Faster DataFrame#filter_rows (@lokeshh).
    • Added Vector#emv for calculating exponential moving variance of Vector (@shahsaurabh0605).
    • Add support for associating metadata with a Vector or DataFrame using the :metadata option (@gnilrets).
    • Add Vector#emsd for calculating exponential moving standard deviation of Vector (@shahsaurabh0605).
    • Sample and population covariance functions for Vector (@shahsaurabh0605).
    • Improve DataFrame#dup performance (@gnilrets).
    • Add Daru::DataFrame::Core::GroupBy#reduce for reducing groups by passing a block (@gnilrets).
    • Add rubocop as development dependency and make changes suggested by it to conform to the Ruby Style Guide (@zverok).
    • Allow Daru::Index to be initialized by a Range (@lokeshh).
  • Fixes
    • Fix conflict with narray that caused namespace clashes with nmatrix in case both narray and nmatrix were installed on the user's system (@lokeshh).
    • Fix bug with dataframe concatenation that caused modifying the arrays that compose the vectors in the original dataframes (@gnilrets).
    • Fix an error where the Vectors in an empty DataFrame would not be assigned correct names (@lokeshh).
    • Correct spelling mistakes and fix broken links in README (@lokeshh).
    • Fix bug in Vector#mode (@sunshineyyy).
    • Fix Vector#index_of method to handle dtype :array differently (@lokeshh).
    • Fix DateTimeIndex#include? method since it was raising an exception when index not found. It returns false now (@Phitherek).
    • Handle nils in group_by keys (@gnilrets).
    • Handle nils for statistics methods in Vector and DataFrame for :array and :gsl data (@lokeshh).
    • Fix DataFrame#clone when no arguments have been passed to it (@lokeshh).
    • Fix bug when joining empty dataframes (@gnilrets).

0.1.2

  • Enhancements

    • New method DataFrame.from_activerecord for importing data sets from ActiveRecord. (by @mrkn)
    • Better importing of data from SQL databases by extracting that functionality into a separate class called Daru::IO::SqlDataSource (by @mrkn).
    • Faster algorithm for performing inner joins by using the bloomfilter-rb gem. Available only for MRI. (by Peter Tung)
    • Added exception SizeError (by Peter Tung).
    • Removed outdated dependencies and build scripts, updated existing dependencies.
    • Ability to sort a Daru::Vector with nils present (by @gnilrets)
  • Fixes

    • Fix column creation for Dataframe.from_sql (by @dansbits).
    • group_by can now be performed on DataFrames with nils (@gnilrets).
    • Bug fix for DataFrame Vectors not duplicating when calling DataFrame#dup (by @gnilrets).
    • Bug fix when concantenating DataFrames (by @gnilrets)
    • Handling improper arguments to Daru::Vector#[] (by @lokeshh)
    • Resolve narray conflict by using the latest nmatrix require methods (by @lokeshh)

0.1.1

  • Enhancements
    • Added a new class Daru::Offsets for providing a uniform API to jump between dates.
    • Added benchmarking scripts
    • Added a new Arel-like querying syntax for Vector and DataFrame. This will allow faster and more intuitive lookup of data than using loops such as filter.
    • Vector
      • #concat now compulsorily requires a second index argument.
      • Added new method #index= to change the index directly.
      • Added basic functions for rolling statistics - mean, std, count, etc.
      • Added cumulative sum function.
      • Added #keep_if.
      • Added #count_values.
    • Indexing
      • Changed Index so that it now accepts all sorts of data (not restricted to only Symbols as it was previously).
      • Re wrote MultiIndex in levels and labels form so that its faster and more accomodative of different kinds of index levels.
      • Changed .new to return appropriate index object based on data passed.
      • Added .from_tuple and .from_array methods to MultiIndex.
      • Added union and intersection behaviour to Index and MultiIndex.
      • Added a new index, DateTimeIndex for indexing with time-based data.
      • Optimized range search for Index.
    • DataFrame
      • Removed the DataFrameByVector class and the #vector function. Now only way to access a Vector in a DF is by using the #[] operator.
      • Added new method #index= and #vectors= for changing row and column indexes directly.
      • Optimized Vector value setting and retreival.
      • Added inner, outer, left outer and right outer joins with the #join method.
      • Added #set_index.
  • Changes
    • Removed the + operator overload from Index and replaced in with union.
    • Removed the second 'values' argument from Daru::Index because it's redundant.
    • Changed behaviour of Vector#reindex and DataFrame#reindex and #reindex_vectors to preserve indexing of original data when possible.
  • Fixes
    • Fixed DataFrame#delete_row and Vector#delete_if.
    • Fixed Vector#rename.

0.1.0

  • Fixes
    • Update documentation and fix it in other places.
    • Fix Vector#sum_of_squares and #ranked.
    • Fixed some tests that were giving RSpec warnings
    • Fixed a bug where nyaplot not being present would raise a warning.
    • Fixed a bug in DataFrame row assignment.
  • Enhancements
    • Wrote a proper .travis.yml
    • Added optional GSL dependency gsl-nmatrix
    • Added Marshalling and unMarshalling capabilities to Vector, Index and DataFrame.
    • Added new method Daru::IO.load for loading data from files by marshalling.
    • Lots of documentation and new notebooks.
    • Added data loading and writing from and to CSV, Excel, plain text and SQL databases.
    • Daru::DataFrame and Vector have now completely replaced Statsample::Dataset and Vector.
    • Vector
      • #center
      • #standardize
      • #vector_percentile
      • Added a new wrapper class Daru::Accessors::GSLWrapper for wrapping around GSL::Vector, which works similarly to NMatrixWrapper or ArrayWrapper.
      • Added a host of statistical methods to GSLWrapper in Daru::Accessors::GSLStatistics that call the relevant GSL::Vector functions for super-fast C level computations.
      • More stats functions - #vector_standardized_compute, #vector_centered_compute, #sample_with_replacement, #sample_without_replacement
      • #only_valid for creating a Vector with only non-nil data.
      • #only_missing for creating a Vector of only missing data.
      • #only_numeric to create Vector of only numerical data.
      • Ported many Statsample::Vector stat methods to Daru::Vector. These are: #percentile, #factors, etc.
      • Added .new_with_size for creating vectors by specifying a size for the vector and a block for generating values.
      • Added Vector#verify, #recode! and #recode.
      • Added #save, #jackknife and #bootstrap.
      • Added #missing_values= that will allow setting values for treating data as 'missing'.
      • Added #split_by_separator, #split_by_separator_freq and #splitted.
      • Added #reset_index!
      • Added #any? and #all?
      • Added #db_type for guessing the type of SQL type contained in the vector.
      • Added and tested plotting support for histogram and box plot.
    • DataFrame
      • #dup_only_valid
      • #clone, #clone_only_valid, #clone_structure
      • #[]= does not clone the vector if it has the same index as the DataFrame.
      • Added a :clone option to initialize that will not clone Daru::Vectors passed into the constructor.
      • Added #save.
      • Added #only_numerics.
      • Added better iterators and changed some behaviour of previous ones to make them more ruby-like. New iterators are #map, #map!, #each, #recode and #collect.
      • Added #vector_sum and #vector_mean.
      • Added #to_gsl to convert to GSL::Matrix.
      • Added #has_missing_data? and #missing_values_rows.
      • Added #compute and #verify.
      • Added .crosstab_by_assignation to generate data frame from row, column and value vectors.
      • Added #filter_vector.
      • Added #standardize and added argument option to #dup.
      • Added #any? and #all? for vector and row axis.
      • Better creation of empty data frames.
      • Added #merge, #one_to_many, #add_vectors_by_split_recode
      • Added constant SPLIT_TOKEN and methods #add_vectors_by_split, .[], #summary.
      • Added #bootstrap.
      • Added a #filter method to wrap around #filter_vectors and #filter_rows.
      • Greatly improved plotting function.
    • Added a lazy update feature that will allow users to delay updating the missing positions index until the last possible moment.
    • Added interoperaility with rserve client which makes it possible to change daru data to R data and perform computation there.
  • Changes
    • Changes Vector#nil_positions to Vector#missing_positions so that future changes for accomodating different values for missing data can be made easily.
    • Changed History.txt to History.md

0.0.5

  • Easy accessors for some methods
  • Faster CSV loading.
  • Changed vector #is_valid? to #exists?
  • Revamped dtype specifiers for Vector. Now specify :array/:nmatrix for changing underlying data implementation. Specigfy nm_dtype for specifying the data type of the NMatrix object.
  • #sort for Vector. Quick sort algorithm with preservation of original indexes.
  • Removed #re_index and #to_index from Daru::Index.
  • Ability to change the index of Vector and DataFrame with #reindex/#reindex!.
  • Multi-level #sort! and #sort for DataFrames. Preserves indexing.
  • All vector statistics now work with NMatrix as the underlying data type.
  • Vectors keep a record of all positions with nils with #nil_positions.
  • Know whether a position has nils or not with #is_nil?
  • Added #clone_structure to Vector for cloning only the index and structure or a vector.
  • Figure out the type of data using #type. Running thru the data to determine its type is delayed till the last possible moment.
  • Added arithmetic operations between data frame and scalars or other data frames.
  • Added #map_vectors!.
  • Create a DataFrame from Array of Arrays and Array of Vectors.
  • Refactored DataFrame.rows and the DataFrame constructor.
  • Added hierarchial indexing to Vector and DataFrame with MultiIndex.
  • Convert DataFrame to ruby Matrix or NMatrix with #to_matrix and #to_nmatrix.
  • Added #group_by to DataFrame for grouping rows according to elements in a given column. Works similar to SQL GROUP BY, only much simpler.
  • Added new class Daru::Core::GroupBy for supporting various grouping methods like #head, #tail, #get_group, #size, #count, #mean, #std, #min, #max.
  • Tranpose indexed/multi-indexed DataFrame with #transpose.
  • Convert Daru::Vector to horizontal or vertical Ruby Matrix with #to_matrix.
  • Added shortcut to DataFrame to allow access of vectors by using only #[] instead of calling #vector or [vector_names, :vector].
  • Added DSL for Vector and DataFrame plotting with nyaplot. Can now grab the underlying Nyaplot::Plot and Nyaplot::Diagram object for performing different operations. Only need to supply parameters for the initial creation of the diagram.
  • Added #pivot_table to DataFrame for reducing and aggregating data to generate a quick summary.
  • Added #shape to DataFrame for knowing the numbers of rows and columns in a DataFrame.
  • Added statistics methods #mean, #std, #max, #min, #count, #product, #sum to DataFrame.
  • Added #describe to DataFrame for producing multiple statistics data of numerical vectors in one shot.
  • Monkey patched Ruby Matrix to include #elementwise_division.
  • Added #covariance to calculate the covariance between numbers of a DataFrame and #correlation to calculate correlation.
  • Enumerators return Enumerator objects if there is no block.

0.0.4

  • Added wrappers for Array, NMatrix and MDArray such that the external implementation is completely transparent of the data type being used internally.
  • Added statistics methods for vectors for ArrayWrapper. These are compatible with statsample methods.
  • Added plotting functions for DataFrame and Vector using Nyaplot.
  • Create a DataFrame by specifying the rows with the ".rows" class method.
  • Create a Vector from a Hash.
  • Call a Vector element by specfying the index name as a method call (method_missing logic).
  • Retrive multiple rows of a DataFrame by specfying a Range or an Array with multiple index names.
  • #head and #tail for DataFrame.
  • #uniq for Vector.
  • #max for Vector can return a Vector object with the index set to the index of the max value.
  • Tonnes of documentation for most methods.

0.0.3.1

  • Added aritmetic methods for vector aritmetic by taking the index of values into account.

0.0.3

  • This release is a complete rewrite of the entire gem to accomodate index values.

0.0.2.4

  • Initialize dataframe from an array which looks like [{a: 10, b: 20}, {a: 11, b: 12}]. Works for parsed JSON.
  • Over-riding vectors in DataFrame will still preserve order.
  • Any re-assignment of rows in #each_row and #each_row_with_index will reflect in the DataFrame.
  • Added #to_a and #to_json to DataFrame.

0.0.2.3

  • Added #filter_rows and #delete_row to DataFrame and changed #row to return a row containing a Hash of column name and value.
  • Vector objects passed into a DataFrame are now duplicated so that any changes dont affect the original vector.
  • Added an optional opts argument to DataFrame.
  • Sending more fields than vectors in DataFrame will cause addition of nil vectors.
  • Init a DataFrame without having to convert explicitly to vectors.

0.0.2.2

  • Added test cases and multiple column access through the [] operator on DataFrames

0.0.2.1

  • Fixed bugs with previous code and more iterators

0.0.2

  • Added iterators for dataframe and vector alongwith printing functions (to_html) to interface properly with iRuby notebook.

0.0.1

  • Added classes for DataFrame and Vector alongwith some super-basic functions to get off the ground