Releases: bigscience-workshop/petals
v1.0.0: The first stable release
General
This release contains the core functionality of the Petals platform described in our paper.
What's Changed
- Rudimentary decentralization by @justheuristic in #9
- Update model by @dbaranchuk in #17
- Chained rpc_forward & rpc_backward by @dbaranchuk in #18
- Implement block selection on servers by @borzunov in #20
- LM head module by @dbaranchuk in #19
- Measure and cache network & compute throughput by @borzunov in #21
- Shallow prompt tuning with run example on SST-2 by @dbaranchuk in #22
- minimalistic automated tests by @justheuristic in #23
- Clean up readme by @justheuristic in #24
- [Test CI] add instructions to test the full model by @justheuristic in #25
- Fix default branch in CI by @justheuristic in #26
- Fix CI runs in master by @justheuristic in #27
- CI: use GIT_REF_NAME instead of GIT_HEAD_REF by @justheuristic in #28
- Add GenerationMixin class by @artek0chumak in #29
- Decouple make_sequence and move to RemoteSequenceManager by @justheuristic in #30
- fix is_subsequence by @dbaranchuk in #32
- Miscellaneous fixes to automatic tests by @justheuristic in #35
- Efficient forward & backward by @dbaranchuk in #36
- Pack of Inference Changes by @artek0chumak in #37
- Support various backend dtypes & async serialization by @dbaranchuk in #38
- Use "PETALS" as the readme title by @borzunov in #40
- integrate mixed-8bit model by @dbaranchuk in #39
- Rename 350m -> 560m by @dbaranchuk in #43
- make pytest outputs more verbose by @justheuristic in #44
- Distributed prompt tuning by @dbaranchuk in #42
- Reduce vocabulary size in test model, fix bug in routing when overlapped by @justheuristic in #45
- Convert actual model weights by @dbaranchuk in #46
- [quickfix 1/n] remove expensive assertions in inference code by @justheuristic in #48
- [Fix] make distributed seq cls to not create the full bloom model by @dbaranchuk in #49
- Fix recovering for sequential_backward by @dbaranchuk in #50
- Inference: require max sequence length instead of assuming 2048 by @justheuristic in #52
- Add shallow prefix-tuned inference by @artek0chumak in #55
- remove transformer block, implement as sequence size 1 by @GreenFatGuy in #54
- Update readme for the 1st public release by @borzunov in #57
- Use latest version of Petals scheme, shrink Petals logo by @borzunov in #59
- Update bullet points with feedback from Tim and other people by @borzunov in #61
- Update readme with arxiv link and more discussions by @borzunov in #62
- Warn that current instructions involve 6B model but we will replace them soon by @borzunov in #63
- Add deep prompt inference by @artek0chumak in #66
- Fix calling rpc_info multiple times by @justheuristic in #60
- Make attention cache wait until memory is freed by @justheuristic in #53
- Build cpuonly from bitsandbytes main by @justheuristic in #70
- Priority tasks by @GreenFatGuy in #47
- Update dependency versions by @justheuristic in #71
- fix protobuf version by @justheuristic in #74
- Add prompt tuning example on Personachat dataset by @artek0chumak in #69
- Quality of life changes: update readme, simplify run_server interface by @justheuristic in #75
- Use bitsandbytes==0.34.0, update readme by @justheuristic in #76
- Make small readability & style changes to the instructions by @borzunov in #77
- Rebalance swarm when necessary by @borzunov in #34
- Update hivemind to 1.1.2, mark model argument as required by @borzunov in #81
- Fix "Too many open files" during rebalancing by @borzunov in #83
- Add colab-related changes by @artek0chumak in #80
- Enable rebalancing by default by @borzunov in #84
- Implement exponential backoff for forward & backward by @borzunov in #85
- Add sst-2 ipynb example by @artek0chumak in #86
- Fix floating point issues in block_selection.py by @borzunov in #89
- Implement timeouts in forward/backward by @borzunov in #90
- Force reinstall of hivemind in example notebooks by @artek0chumak in #88
- Make inference, forward, and backward fully fault-tolerant by @borzunov in #91
- Use public swarm by default by @borzunov in #92
- Make ServerState announcements work better by @borzunov in #93
- Require hivemind with fixed compression and protobuf working on Colab by @borzunov in #94
- Try to fix protobuf versions once again by @borzunov in #95
- Add Beam Search decoding algorithm by @artek0chumak in #87
- Improve server's logging by @borzunov in #96
- Add various server timeouts, lower --max_batch_size and --inference_max_length defaults by @borzunov in #97
- Fix dtype- and device-related client issues by @borzunov in #98
- Make Petals a pip-installable package (attempt 2) by @borzunov in #102
- Fix dtypes in backend schemas by @borzunov in #99
- Fix ptune with low_cpu_mem_usage=True (as in Colab) by @borzunov in #103
- Add Dockerfile by @mryab in #82
- Remove unused imports, add missing arguments to docstrings by @mryab in #108
- Expose request_timeout to DistributedBloomConfig by @artek0chumak in #105
- Optimize RemoteSequenceManager by @justheuristic in #106
- Hotfix span selection by @justheuristic in #110
- Patch Linear8bit to enable CxB backward by @justheuristic in #111
- Fix Linear8bitlt state config, update tests by @justheuristic in #112
- Measure throughput for different configs, devices, and dtypes separately by @borzunov in #114
- Support --load_in_8bit on pre-Turing GPUs by @justheuristic in #113
- Fix tile size on ampere by @justheuristic in #116
- Make server use smart defaults by @borzunov in #115
- Suppress quantization warning and fix dtype defaults in compute benchmark by @borzunov in #117
- Choose --num_blocks for bigscience/bloom-petals automatically by @borzunov in #119
- Require hivem...