streamparse 3.0.0
This is the final release of streamparse 3.0.0. The developer preview versions of this release have been used extensively by many people for months, so we are quite confident in this release, but please let us know if you encounter any issues.
You can install this release via pip with `pip install streamparse==3.0.0`.
Highlights
- Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PRs #199 and #226)
- When `log.path` is not set in your `config.json`, pystorm will no longer issue a warning about how you should set it; instead, it will automatically set up a `StormHandler` and log everything directly to your Storm logs. This is really handy, as Storm 1.0 supports searching logs through the UI.
- The `--ackers` and `--workers` settings now default to the number of worker nodes in your Storm environment instead of 2.
- Added a `sparse slot_usage` command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
- Now fully Python 3 compatible (and tested on up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
- Now rely on the pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old `storm` subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, Commits aaeb3e9 and 1347ded)
⚠️ API Breaking Changes ⚠️
- Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PRs #199 and #226)
- The deprecated `Spout.emit_many` method has been removed. (pystorm/pystorm@004dc27)
- As a consequence of using the new Python Topology DSL, all Bolts and Spouts that emit anything are expected to have the `outputs` attribute declared. It must be either a list of `str` or `Stream` objects, as described in the docs.
- We temporarily removed the `sparse run` command, as we've removed all of our Clojure code, and this was the only thing that still had to be done in Clojure. (Watch issue #213 for future developments.)
- `ssh_tunnel` has moved from `streamparse.contextmanagers` to `streamparse.util`. The `streamparse.contextmanagers` module has been removed.
- The `ssh_tunnel` context manager now returns the hostname and port that should be used for connecting to Nimbus (e.g., `('localhost', 1234)` when `use_ssh_for_nimbus` is `True` or unspecified, and `('nimbus.foo.com', 6627)` when `use_ssh_for_nimbus` is `False`).
- `need_task_ids` defaults to `False` instead of `True` in all `emit()` method calls. If you were previously storing the task IDs that your tuples were emitted to (which is pretty rare), then you must pass `need_task_ids=True` in your `emit()` calls. This should provide a little speed boost to most users, because we do not need to wait on a return message from Storm for every emitted tuple.
- Instead of having the `log.level` setting in your `config.json` influence the root logger's level, only the levels of your component's logger (and its `StormHandler`, if you haven't set `log.path`) will be set.
- When `log.path` is not set in your `config.json`, pystorm will no longer issue a warning about how you should set it; instead, it will automatically set up a `StormHandler` and log everything directly to your Storm logs. This is really handy, as Storm 1.0 supports searching logs through the UI.
- The `--par` option to `sparse submit` has been removed. Please use `--ackers` and `--workers` instead.
- The `--ackers` and `--workers` settings now default to the number of worker nodes in your Storm environment instead of 2.
Features
- Added a `sparse slot_usage` command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
- Can now specify `ssh_password` in `config.json` if you don't have SSH keys set up. Storing your password in plaintext is not recommended, but it is nice to have for local VMs. (PR #224, thanks @motazreda)
- Now fully Python 3 compatible (and tested on up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
- Now remove the `_resources` directory after the JAR has been created.
- Added a `serializer` setting to `config.json` that can be used to switch between the JSON and msgpack serializers. Note that you cannot use the msgpack serializer unless you also include a Java implementation in your topology's JAR, such as the one provided by Pyleus, or the one being added to Storm in apache/storm#1136. (PR #238)
- Added support for custom log filenames. (PR #234, thanks @kalmanolah)
- Can now set environment-specific `options`, `acker_count`, and `worker_count` settings, to avoid constantly passing all those pesky options to `sparse submit`. (PR #265)
- Added an `install_virtualenv` option to disable the installation of virtualenvs while still allowing their use. (PR #264)
- The Python Topology DSL now allows topology-level config options to be set via the `config` attribute of the `Topology` class. (Issue #276, PRs #284 and #289)
- Can now pass any valid YAML as a value for `sparse submit --option`. (Issue #280, PR #285)
- Added an `--override_name` option to the `kill`, `submit`, and `update_virtualenv` commands so that you can deploy the same topology file multiple times with different overridden names. (Issue #207, PR #286)
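Several of these features live in `config.json`. The fragment below is an illustrative sketch of how they might fit together in an environment entry; the hostnames, values, and exact key placement are assumptions, so check the streamparse configuration docs for your version.

```json
{
    "topology_specs": "topologies/",
    "envs": {
        "prod": {
            "user": "storm",
            "nimbus": "nimbus.foo.com",
            "workers": ["storm1.foo.com", "storm2.foo.com"],
            "serializer": "json",
            "install_virtualenv": false,
            "worker_count": 4,
            "acker_count": 4,
            "options": {"topology.message.timeout.secs": 60},
            "log": {"path": "/var/log/storm/streamparse"}
        }
    }
}
```

With `worker_count`, `acker_count`, and `options` set per environment, a plain `sparse submit -e prod` picks them up without extra flags.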
Fixes
- `sparse slot_usage`, `sparse stats`, and `sparse worker_uptime` are much faster, as we've fixed an issue where they were creating many SSH subprocesses.
- All commands that must connect to the Nimbus server now properly use SSH tunnels again.
- The output from running `pip install` is now displayed when submitting your topology, so you can see if things get stuck.
- `sparse submit` should no longer sporadically raise exceptions about failing to create SSH tunnels. (PR #242)
- `sparse submit` will no longer crash when you provide a value for `--ackers`. (PR #241)
- Pinned the pystorm version to `>=2.0.1`. (PR #230)
- `sparse tail` now looks for `pystorm`-named filenames. (@9339908)
- Fixed a typo that caused a crash in `sparse worker_uptime`. (@7085804)
- Added back `sparse run`. (PR #244)
- `sparse run` should no longer crash when searching for the version number on some versions of Storm. (Issue #254, PR #255)
- `sparse run` will no longer crash due to PyYAML dumping `!!python/unicode` garbage into the YAML files. (Issue #256, PR #257)
- A `sparse run` `TypeError` with Python 3 has been fixed. (@e232224)
- `sparse update_virtualenv` will no longer ignore the `virtualenv_flags` setting in `config.json`. (Issue #281, PR #282)
- `sparse run` now supports named streams on Storm 1.0.1+. (PR #260)
- No longer remove non-topology-specific logs with `sparse remove_logs`. (@45bd005)
- `sparse tail` will now find logs in subdirectories for Storm 1.0+ compatibility. (Issue #268, PR #271)
Other Changes
- Now rely on the pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old `storm` subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, Commits aaeb3e9 and 1347ded)
- All Bolt-, Spout-, and Topology-related classes are available directly at the `streamparse` package level (i.e., you can just do `from streamparse import Bolt` now). (Commit b9bf4ae)
- `sparse kill` will now kill inactive topologies. (Issue #156)
- All examples now use the Python DSL.
- The Kafka-JVM example has been cleaned up a bit, so now you can click on Storm UI log links and they'll work.
- Docs have been updated to reflect the latest Leiningen installation instructions. (PR #261)
- A broken link in our docs was fixed. (PR #273)
- JARs are now uploaded before killing the running topology, to reduce downtime during deployments. (PR #277)
- Switched from PyYAML to ruamel.yaml. (@18fd2e9)
- Added docs for handling multiple streams and groupings. (Issue #252, @344ce8c)
- Added docs for VPC deployment. (Issue #134, @d2bd1ac)