Releases: activeloopai/deeplake
v2.1.1 🌈
🧭 What's Changed
- [AL-1132] Added the ability to do inplace transforms (#1354) @AbhinavTuli
- Pytorch transform api (#1337) @AbhinavTuli
- Chunk engine and chunk refactor (#1307) @AbhinavTuli
- Fix hub list deletion (#1340) @AbhinavTuli
- Autocast empty samples (#1334) @farizrahman4u
- Fix to ensure agreement isn't reloaded for each slice of dataset (#1347) @AbhinavTuli
- added bench repo branch input, set units as default (#1348) @gautamkrishnar
- [tiny] let users load datasets from newer versions of hub (#1306) @nollied
- Fixes pytorch with text (#1342) @AbhinavTuli
- minor bug fix for the benchmark code (#1332) @gautamkrishnar
- Added statement in Version Control log when uncommitted data is present (#1328) @AbhinavTuli
- Fix "NumPy array is not writeable" (#1338) @aliubimov
- Fix leaked semaphore issue on mac (#1291) @AbhinavTuli
⚙️ Who Contributes
@AbhinavTuli, @aliubimov, @davidbuniat, @farizrahman4u, @gautamkrishnar, @mikayelh, @nollied and @tatevikh
v2.1.0 🌈
🧭 What's Changed
- Default shuffle buffer size 2GB (#1330) @aliubimov
- bug_report.md typo and README Badge fix (#1302) @qzylalala
- Bye Bye CircleCI (#1329) @farizrahman4u
- Refactor pytorch integration (#1257) @aliubimov
- Fixes slowdowns with Hub (#1304) @AbhinavTuli
- Added text signwall within Hub (#1323) @AbhinavTuli
- Readme typo (#1299) @farizrahman4u
- Update README.md (#1322) @istranic
- Fixes bug with groups on hub cloud datasets (#1321) @AbhinavTuli
- simplify Github actions (#1318) @gautamkrishnar
- Progressbar + bad transform hanging fix (#1295) @farizrahman4u
- Video (#1216) @FayazRahman
- Changing default dataset setting from public to private (#1317) @istranic
- Add GitHub actions (#1241) @araratpoghosyan
- enable str objects to be converted to tensorflow (#1309) @hoshimura
- Persistence fix (#1298) @farizrahman4u
- Fix/cloud dataset bugs (#1294) @nollied
- [tiny] fix formatting + update keypointscoco to int32 instead of float (#1272) @nollied
- Removed overwrite and public arguments from hub.load (#1275) @AbhinavTuli
- Jpeg fix issue with verify (#1297) @farizrahman4u
- hub.delete + logging fixes (#1278) @farizrahman4u
- Feature/add docker compose (#1250) @davidbuniat
- Tensor.bytes() (#1280) @farizrahman4u
- Add class_names as an option to segment_mask htype (#1281) @farizrahman4u
- Fix grayscale warnings + hub.read docstring (#1289) @farizrahman4u
- Better retries for s3 (#1283) @AbhinavTuli
- Byte compression + empty sample fix (#1293) @farizrahman4u
- Readme updates (#1265) @farizrahman4u
- refactor core/Dataset (#1284) @nollied
- Add token to hub list (#1276) @AbhinavTuli
- Add pytest fixture decorator to test_gcs_tokens (#1285) @dhiganthrao
- Progress bar for hub.eval (#1200) @farizrahman4u
- hub/compression.py docstring update (#1279) @farizrahman4u
- Better error message for invalid dataset (#1282) @farizrahman4u
⚙️ Who Contributes
@AbhinavTuli, @FayazRahman, @aliubimov, @araratpoghosyan, @davidbuniat, @dhiganthrao, @farizrahman4u, @gautamkrishnar, @hoshimura, @istranic, @mikayelh, @nollied, @qzylalala and @tatevikh
v2.0.13 🌈
🧭 What's Changed
- added improvements and regressions as artifacts (#1255) @gautamkrishnar
- Fixes empty samples in hub compute (#1269) @AbhinavTuli
- [2 liner] TorchDataset.len (#1268) @mccrearyd
- Fix creating nested tensor from non-head node (#1261) @FayazRahman
- fix text not working for transforms (#1258) @mccrearyd
- [Tiny] Updated htype docstring for segment mask (#1264) @istranic
- Added coco_keypoint htype and changed default dtype for segment masks (#1262) @istranic
- Groups perf fix (#1212) @farizrahman4u
- Jpeg: External test images repo + small fixes (#1249) @farizrahman4u
- Json, List, Text (#1214) @farizrahman4u
- fix pytest dependency error (#1254) @mccrearyd
- PNG compression for arbitrary number of channels (#1252) @farizrahman4u
- [tiny] tqdm progress bar for pytorch (#1243) @mccrearyd
🔗 Dependency Updates
- Bump pillow from 8.2.0 to 8.3.2 in /hub/requirements (#1151) @dependabot
⚙️ Who Contributes
@AbhinavTuli, @FayazRahman, @davidbuniat, @dependabot, @farizrahman4u, @gautamkrishnar, @istranic and @mccrearyd
v2.0.12 🌈
🧭 What's Changed
- Fix segmentation fault (#1248) @aliubimov
- added pull_request_target check (#1246) @gautamkrishnar
- removed repository name from pr checks (#1245) @gautamkrishnar
- Added reporting to ingest, ingest_kaggle, and list (#1242) @istranic
- PyTorch & Tensorflow aliases for datasets (#1238) @muhdhisham
- added manual benchmarking for hub baseline (#1224) @gautamkrishnar
- Added benchmarking workflow to hub pull requests (#1218) @gautamkrishnar
- Fix/sagemaker creds (#1204) @AbhinavTuli
- Fix Getting Started link (#1237) @aliubimov
- Fix VC test (#1233) @farizrahman4u
- Add flac and wav support (#1220) @FayazRahman
- Fix pickle5 issue (#1230) @AbhinavTuli
- Fix grayscale tests (#1222) @farizrahman4u
- Fixes issues with kaggle tests after audio merge (#1231) @AbhinavTuli
- Beautify meta .json files (#1229) @jakbin
- Update README.md (#1194) @istranic
- Audio (#1207) @farizrahman4u
- Bring back "Linux Simple Test" (#1203) @Diveafall
- AL-1368 Handle grayscale image after color image (#1206) @jraman
- Added reporting for version control (#1209) @istranic
- [tiny] Update error messages (#1211) @mccrearyd
- Distributed computation with ray (#1140) @AbhinavTuli
- Revert "Adding shared memory support for Pytorch 3.6/3.7, speeding up pytorch integration" (#1210) @AbhinavTuli
- AL-1433 List compression types for hub.compression (#1208) @jraman
- Adding shared memory support for Pytorch 3.6/3.7, speeding up pytorch integration (#1196) @AbhinavTuli
- lz4 backward compatibility (#1201) @FayazRahman
- Update docs for hub.ingest (#1199) @jraman
- Removing "Linux Simple Test" from all workflows (#1198) @Diveafall
- Change lz4 implementation to numcodecs (#1197) @FayazRahman
- Revert "PyTorch & Tensorflow aliases for datasets" (#1195) @mccrearyd
- PyTorch & Tensorflow aliases for datasets (#1183) @muhdhisham
⚙️ Who Contributes
@AbhinavTuli, @Diveafall, @FayazRahman, @aliubimov, @davidbuniat, @farizrahman4u, @gautamkrishnar, @istranic, @jakbin, @jraman, @mccrearyd, @mikayelh, @muhdhisham and @tatevikh
Google Cloud Support and Hierarchical Datasets 🌈
🩰 What's New
- Added hierarchical datasets (tensor groups)
- Added support for Google Cloud Storage (path = gcs://...)
- Added version control alpha
🧭 What's Changed
- Fix version control speed for first commit (#1192) @AbhinavTuli
- Missing compressions fix (#1191) @farizrahman4u
- Tensor group fixes (#1190) @farizrahman4u
- gcs path fix (#1188) @davidbuniat
- Version Control (#1152) @AbhinavTuli
- One more jpg fix (#1186) @farizrahman4u
- fix running tests fails when no gcs creds exist (#1187) @mccrearyd
- Istranic update readme (#1185) @istranic
- GCS support (#1125) @kristinagrig06
- [Small] Fixes logging (#1153) @AbhinavTuli
- Modify dataset privacy (#1141) @kristinagrig06
- Imagenet weird jpegs fix (#1155) @farizrahman4u
- Hierarchical Tensors (#1139) @farizrahman4u
- [small] client/utils.py: Dont sys.exit on Bad request (#1154) @farizrahman4u
- Add Tensor to API reference (#1146) @kristinagrig06
- [small-Please approve] fix typo (#1144) @thisiseshan
⚙️ Who Contributes
@AbhinavTuli, @davidbuniat, @farizrahman4u, @istranic, @kristinagrig06, @mccrearyd, @tatevikh and @thisiseshan
v2.0.9 🌈
🧭 What's Changed
- Multi - SOF jpeg fix (#1134) @farizrahman4u
- Pytorch shuffling (#1122) @AbhinavTuli
- DS locking: atexit release (#1138) @farizrahman4u
- Treat size 1 arrays as scalars when casting (#1135) @farizrahman4u
- Tensor inplace ops (#1136) @farizrahman4u
- Small fix (#1133) @farizrahman4u
- Faster deserialization (#1131) @farizrahman4u
- hub.read speedup (4x) (#1126) @farizrahman4u
- Delete .from_path() method (#1128) @kristinagrig06
- [small] better error messages + fix ingest_kaggle (#1132) @mccrearyd
- DS lock fix (#1129) @farizrahman4u
- auto refinement (#1124) @thisiseshan
- Chunk-wise compression (#1093) @farizrahman4u
- Ingestion summary (#1117) @thisiseshan
- hub.read speed up (>6X) (#1120) @farizrahman4u
- Dataset locking (#1119) @farizrahman4u
- Backward compat fix (#1121) @farizrahman4u
- backwards compatibility (and CI tests!) (#1110) @mccrearyd
- fix pytorch tests chunk sizes + max_chunk_size in create_tensor (#1114) @mccrearyd
- Auto compression (#1109) @thisiseshan
- add --kaggle functionality (#1108) @thisiseshan
- Fixes an issue with pytorch old read only (#1113) @AbhinavTuli
- Add Dataset functions to docs (#1098) @kristinagrig06
- [small] hub.htypes is a list of all htypes (#1104) @mccrearyd
⚙️ Who Contributes
@AbhinavTuli, @davidbuniat, @farizrahman4u, @kristinagrig06, @mccrearyd, @tatevikh and @thisiseshan
Fixes to parallel computing 🌈
🧭 What's Changed
- Fix/num workers 0 bug (#1112) @davidbuniat
- Move hub.compute reporting to pipeline eval (#1116) @istranic
- Updates transform docstrings/variables to remove usage of the word "transform" (#1115) @AbhinavTuli
- Fixed bugout reporting paths and overreporting during Dataset initialization. (#1111) @davidbuniat
- Release/2.0.6 (#1107) @davidbuniat
- Faster chunk serialization (#1106) @AbhinavTuli
- Remove default logging code in client (#1097) @benchislett
⚙️ Who Contributes
@AbhinavTuli, @benchislett, @davidbuniat, @istranic and @mccrearyd
Refactors and Minor Updates
🩰 What's New
- Added hub.ingest for automatic creation of datasets
- Added hub.list to help users find publicly available datasets
- Lots of refactors to help developers
🧭 What's Changed
- kaggle argument fix (#1101) @thisiseshan
- Polish top directory (#1054) @kristinagrig06
- Ban dataset attributes as tensor names (#1103) @kristinagrig06
- update tensors (#1089) @mccrearyd
- More sample compressions (#1087) @farizrahman4u
- [small] all scalars have shape (1,) instead of () (#1102) @mccrearyd
- refactor input pipeline for samples (#1099) @mccrearyd
- Integrate hub auto + kaggle (#1075) @thisiseshan
- List datasets (#1048) @kristinagrig06
⚙️ Who Contributes
@AbhinavTuli, @farizrahman4u, @kristinagrig06, @mccrearyd and @thisiseshan
Adding metadata and parallel computations
🎁 What's New
- You can add metadata to datasets and tensors
- You can run computations in parallel using hub.compute
- The dataset API is updated to be more intuitive
🧭 What's Changed
- Add static dataset delete (#1060) @benchislett
- 2.0.4 release Version update (#1094) @davidbuniat
- Adding back transforms for parallel dataset uploads (#1086) @AbhinavTuli
- Release 2.0.3 (#1084) @davidbuniat
- Update code snippets (#1088) @istranic
- [refactor] encoders base class (#1082) @mccrearyd
- Alias "jpg" to "jpeg" (#1073) @benchislett
- Updated readme (#1085) @istranic
- [small] implement hub.like in new api (#1083) @mccrearyd
- Bugout reporting update (#1081) @istranic
- Info fixes (#1080) @farizrahman4u
- BUGGER_OFF=true when running tests in Circle CI. (#1079) @zomglings
- [small] Remove dataset link during creation of Hub datasets (#1078) @dhiganthrao
- Enable search in pdoc (#1074) @benchislett
- Added "dataset" class for interacting with underlying "Dataset" class (#1063) @AbhinavTuli
- [small] turn off activeloop reporting during circleci tests (#1076) @mccrearyd
- dataset/tensor info alongside meta (#1066) @mccrearyd
- [small] fix hub cloud throttling for tests (#1077) @mccrearyd
- Add back the history of master into main (#1061) @benchislett
- [Small PR] Removes tfds tests (#1070) @AbhinavTuli
- Old pytorch multiprocessing bug fix (#1068) @AbhinavTuli
- [Small PR] Renames hub.load to hub.read (#1064) @AbhinavTuli
- Made Dataset and LRUCache objects pickleable (#1049) @AbhinavTuli
- Alternate fix for tensor creation bug (#1065) @AbhinavTuli
⚙️ Who Contributes
@AbhinavTuli, @benchislett, @davidbuniat, @dhiganthrao, @farizrahman4u, @istranic, @mccrearyd, @tatevikh and @zomglings
Bug fix for .pytorch DataLoaders 🌈
🎁 What's New
- This release focused mostly on refactoring and minor bug fixes.
- .pytorch() now works with public datasets hosted by team Activeloop (e.g. hub://activeloop/mnist-train).
- The underlying data format has been improved. Because the new format is incompatible with the prior release, update to the new release using pip3 install --upgrade hub.
🧭 What's Changed
- version update (#1062) @davidbuniat
- Fixes an issue in which reporting configuration file was not being created if its parent directory didn't exist. (#1058) @zomglings
- Update PR template to new format (#1059) @benchislett
- Add back PR template from master (#1034) @benchislett
- Update htype docs (#1030) @benchislett
- Validate indexing when given, not at compute-time (#1033) @benchislett
- Update readme (#1057) @istranic
- fix meta non-persistence bug (adds test) (#1053) @mccrearyd
- Updating old pytorch warning message (#1055) @AbhinavTuli
- Changed from master to main (#1052) @Anselmoo
- [refactor] Tests/update fixtures (#1046) @mccrearyd
- NPZ replacement format (only) (#1047) @farizrahman4u
- Auto cast (#1041) @farizrahman4u
- Bring back tuple mode, this time serializable (#1028) @farizrahman4u
- Array interface for Tensor (#1042) @farizrahman4u
- Windows always uses old pytorch integration now (#1044) @AbhinavTuli
- [small] remove chunk sizes from htypes (#1037) @mccrearyd
- Small fix for Pytorch shared memory leak (#1040) @AbhinavTuli
- Fixes dataset creation bug with s3/hub cloud datasets having similar names (#1045) @AbhinavTuli
- [small] Update/2.0/hub cloud test (#1023) @mccrearyd
- Fix tensor creation bug (#1043) @farizrahman4u
- Refactor/fstrings (#1035) @dhiganthrao
- update sample compression API (#1038) @mccrearyd
- [small] Silence tensorflow logs in tests (#1029) @benchislett
- [small] update scalar test (#1022) @mccrearyd
🐛 Bug Fixes
- [small] pytorch readonly error bug fix (#1026) @mccrearyd
- [small] Fix/2.0/readonly (#1024) @mccrearyd
🔗 Dependency Updates
- Bump pillow from 7.2.0 to 8.2.0 in /requirements (#1018) @dependabot
⚙️ Who Contributes
@AbhinavTuli, @Anselmoo, @benchislett, @davidbuniat, @dependabot, @dhiganthrao, @farizrahman4u, @istranic, @mccrearyd, @tatevikh and @zomglings