IBM Streams Python support¶
Python APIs for use with IBM® Streaming Analytics service on IBM Cloud and on-premises IBM Streams.
Python Application API for Streams¶
Module that allows the definition and execution of streaming applications implemented in Python. Applications use Python code to process tuples, and tuples are Python objects.
SPL operators may also be invoked from Python applications to allow use of existing IBM Streams toolkits.
See the streamsx.topology modules:
- streamsx - Python application support for IBM Streams.
- streamsx.topology.topology - Streaming application definition.
- streamsx.topology.context - Context for submission and build of topologies.
- streamsx.topology.schema - Schemas for streams.
- streamsx.topology.state - Application state.
- streamsx.topology.composite - Composite transformations.
- streamsx.topology.tester - Testing support for streaming applications.
- streamsx.topology.tester_runtime - Runtime tester functionality.
- streamsx.ec - Access to the IBM Streams execution context.
- streamsx.spl.op - Integration of SPL operators.
- streamsx.spl.types - SPL type definitions.
- streamsx.spl.toolkit - SPL toolkit integration.
SPL primitive Python operators¶
SPL primitive Python operators provide the ability to perform tuple processing using Python in an SPL application.
A Python function or class is simply turned into an SPL primitive operator through provided decorators.
SPL (Streams Processing Language) is a domain specific language for streaming analytics supported by Streams.
- streamsx.spl.spl - SPL Python primitive operators.
Streams Python REST API¶
Module that allows interaction with a running Streams instance or service through HTTPS REST APIs.
- streamsx.build - REST API bindings for IBM® Streams Cloud Pak for Data build service.
- streamsx.rest - REST API bindings for IBM® Streams & Streaming Analytics service.
- streamsx.rest_primitives - Primitive objects for REST bindings.
Scripts¶
The streamsx package provides a number of command line scripts.
spl-python-extract¶
Overview¶
Extracts SPL Python primitive operators from decorated Python classes and functions.
Executing this script against an SPL toolkit creates the SPL primitive operator metadata required by the SPL compiler (sc).
Usage¶
spl-python-extract [-h] -i DIRECTORY [--make-toolkit] [-v]
Extract SPL operators from decorated Python classes and functions.
optional arguments:
-h, --help show this help message and exit
-i DIRECTORY, --directory DIRECTORY
Toolkit directory
--make-toolkit Index toolkit using spl-make-toolkit
-v, --verbose Print more diagnostics
SPL Python primitive operators¶
SPL operators that call a Python function or callable class are created by decorators provided by the streamsx package.
To create SPL operators from Python functions or classes, one or more Python modules are placed in the opt/python/streams directory of an SPL toolkit. spl-python-extract is a Python script that creates SPL operators from the Python functions and classes contained in modules under opt/python/streams. The resulting operators embed the Python runtime to allow stream processing using Python.
For details on how to implement SPL Python primitive operators, see streamsx.spl.spl.
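The tuple-processing side of such an operator is ordinary Python. Below is a minimal sketch of the logic a filter-style primitive might wrap; the class name AboveThreshold is hypothetical, and the streamsx.spl.spl decorator that would mark it as an operator is mentioned only in a comment, so the sketch stays self-contained.

```python
# In a real toolkit this class would live in a module under
# opt/python/streams and carry a decorator from streamsx.spl.spl
# (for example a filter-style decorator) so that spl-python-extract
# can generate the SPL operator metadata for it.

class AboveThreshold:
    """Callable class passing tuples whose reading exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, reading):
        # Return True to keep the tuple on the stream, False to drop it.
        return reading > self.threshold
```

State held in `__init__` (here, the threshold) becomes an operator parameter; the `__call__` body runs once per tuple.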
streamsx-info¶
Overview¶
Information about streamsx package and environment.
Prints to standard out information about the streamsx package and environment variables used to support Python in IBM Streams and Streaming Analytics service.
A Python warning is issued if a mismatch is detected between the installed streamsx package and its modules. This is typically due to having a different version of the modules accessible through the environment variable PYTHONPATH.
Warning
When using the streamsx package, ensure that the environment variable PYTHONPATH does not include a path ending with com.ibm.streamsx.topology/opt/python/packages.
The IBM Streams environment configuration script streamsprofile.sh modifies or sets PYTHONPATH to include the Python support from the SPL topology toolkit shipped with the product. This was to support Python before the streamsx package was available. The recommendation is to unset PYTHONPATH, or to modify it so that it does not include the path to the topology toolkit.
The order and content of the output are subject to change; the script is intended as an ad-hoc tool to help diagnose issues with streamsx.
The script may also be run as a Python module:
python -m streamsx.scripts.info
Usage¶
usage: streamsx-info [-h]
Prints support information about streamsx package and environment.
optional arguments:
-h, --help show this help message and exit
streamsx-runner¶
Overview¶
Submits or builds a Streams application to the Streaming Analytics service.
The application to be submitted can be:
- A Python application defined through Topology, using the --topology flag.
- An SPL application (main composite), using the --main-composite flag.
- A Streams application bundle (sab file), using the --bundle flag.
Streaming Analytics service¶
The Streaming Analytics service is defined by:
- Service name: --service-name, defaulting to the environment variable STREAMING_ANALYTICS_SERVICE_NAME. The service name must exist in the VCAP services.
- VCAP services: the environment variable VCAP_SERVICES containing a JSON representation of the service definitions, or a file name containing the service definitions.
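The service lookup against the VCAP services definition can be sketched with the standard library alone. The helper name sa_credentials is hypothetical; the streaming-analytics key and the name/credentials fields follow the VCAP format described under Environments.

```python
import json

def sa_credentials(vcap_services, service_name):
    """Return the credentials object for a named Streaming Analytics
    service from a VCAP services definition (JSON string or dict).

    Hypothetical helper sketching what the streamsx scripts do
    internally when resolving --service-name."""
    if isinstance(vcap_services, str):
        vcap_services = json.loads(vcap_services)
    # The Streaming Analytics service key holds a list of services,
    # each with 'name' and 'credentials' keys.
    for service in vcap_services.get('streaming-analytics', []):
        if service.get('name') == service_name:
            return service['credentials']
    raise ValueError('Service not found in VCAP services: ' + service_name)
```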
Job submission¶
Job submission occurs unless --create-bundle is set.
Bundle creation¶
When --create-bundle is specified with --main-composite or --topology, a Streams application bundle (sab file) is created.
If the environment variable STREAMS_INSTALL is set, the build is local; otherwise the build occurs in IBM Cloud using the Streaming Analytics service. When STREAMS_INSTALL is not set, streamsx-runner can be executed with no local Streams install.
When compiling an SPL application (--main-composite), the path to the application toolkit containing the main composite must be listed with --toolkits.
Any other required local toolkits must also be listed with --toolkits.
Usage¶
streamsx-runner [-h] [--service-name SERVICE_NAME] | [--create-bundle]
(--topology TOPOLOGY | --main-composite MAIN_COMPOSITE | --bundle BUNDLE)
[--toolkits TOOLKITS [TOOLKITS ...]] [--job-name JOB_NAME]
[--preload] [--trace {error,warn,info,debug,trace}]
[--submission-parameters SUBMISSION_PARAMETERS [SUBMISSION_PARAMETERS ...]]
[--job-config-overlays file]
Execute a Streams application using a Streaming Analytics service.
optional arguments:
-h, --help show this help message and exit
--service-name SERVICE_NAME
Submit to Streaming Analytics service
--create-bundle Create a bundle (sab file). No job submission occurs.
--topology TOPOLOGY Topology to call
--main-composite MAIN_COMPOSITE
SPL main composite (namespace::composite_name)
--bundle BUNDLE Streams application bundle (sab file) to submit to
service
Build options:
Application build options
--toolkits TOOLKITS [TOOLKITS ...]
SPL toolkit path containing the main composite and any
other required SPL toolkit paths.
Job options:
Job configuration options
--job-name JOB_NAME Job name
--preload Preload job onto all resources in the instance
--trace {error,warn,info,debug,trace}
Application trace level
--submission-parameters SUBMISSION_PARAMETERS [SUBMISSION_PARAMETERS ...], -p SUBMISSION_PARAMETERS [SUBMISSION_PARAMETERS ...]
Submission parameters as name=value pairs
--job-config-overlays file
Path to file containing job configuration overlays
JSON. Overrides any job configuration set by the
application.
Submitting to Streaming Analytics service¶
An application is submitted to a Streaming Analytics service using --service-name SERVICE_NAME. The named service must exist in the VCAP services definition pointed to by the VCAP_SERVICES environment variable.
The application is submitted as source (except with --bundle) and compiled into a Streams application bundle (sab file) using the build service before being submitted as a running job to the service instance.
Python applications¶
To submit a Python application, a Python function must be defined that returns the application (and optionally its configuration) to be submitted. The fully qualified name of this function is specified using the --topology flag.
For example, an application can be submitted as:
streamsx-runner --service-name Streaming-Analytics-xd \
--topology com.example.apps.sensor_ingester
The function returns either a Topology instance or a tuple of (Topology, JobConfig).
For example, the above function might be defined as:
from streamsx.topology.topology import Topology
from streamsx.topology.context import JobConfig

def _create_sensor_ingester_app():
    topo = Topology('SensorIngesterApp')
    # Application declaration omitted
    ...
    return topo

def sensor_ingester():
    return (_create_sensor_ingester_app(), JobConfig(job_name='SensorIngester'))
Thus, when this application is submitted using the sensor_ingester function, it is always submitted with the same job name, SensorIngester.
The function must be accessible from the current Python path (typically through the environment variable PYTHONPATH).
SPL applications¶
The main composite that defines the application is specified using the --main-composite flag, giving the fully namespace-qualified name.
Any required local SPL toolkits, including the one containing the main composite, must be individually specified by location with the --toolkits flag. Any SPL toolkit that is already present on the IBM Cloud service need not be included.
For example, an application that uses the Slack toolkit might be submitted as:
streamsx-runner --service-name Streaming-Analytics-xd \
--main-composite com.example.alert::SlackAlerter \
--toolkits $HOME/app/alerters $HOME/toolkits/com.ibm.streamsx.slack
where $HOME/app/alerters is the location of the SPL application toolkit containing the com.example.alert::SlackAlerter main composite.
Warning
The main composite name must be namespace qualified. Use of the default namespace for a main composite is not recommended as it increases the chance of a name clash with another SPL toolkit.
Streams application bundles¶
A Streams application bundle is submitted to a service instance using --bundle. The argument to --bundle is a locally accessible file that will be uploaded to the service.
The bundle must have been created using an IBM Streams install whose architecture and OS version match the service instance. Currently this is x86_64 and Red Hat/CentOS 6 or 7, depending on the service instance.
The --toolkits flag must not be specified when submitting a bundle.
Job options¶
Job options, such as --job-name, configure the running job.
For --topology, job options set as arguments to streamsx-runner override any configuration returned from the function defining the application.
Creating Streams application bundles¶
--create-bundle uses a local IBM Streams install to attempt to mimic the build that would occur with --topology or --main-composite. Differences between the local environment and the IBM Cloud Streaming Analytics build environment may cause build failures in one and not the other.
This can be used as a mechanism to perform a local test build before using the service, or as a valid mechanism to create bundles for later upload with --bundle.
For example, simply changing --service-name name to --create-bundle performs a local build of the same application:
# Submit to a Streaming Analytics service
streamsx-runner --service-name Streaming-Analytics-xd \
--main-composite com.example.alert::SlackAlerter \
--toolkits $HOME/app/alerters $HOME/toolkits/com.ibm.streamsx.slack
# Build the same application locally
streamsx-runner --create-bundle \
--main-composite com.example.alert::SlackAlerter \
--toolkits $HOME/app/alerters $HOME/toolkits/com.ibm.streamsx.slack
streamsx-sc¶
Overview¶
SPL compiler for IBM Streams running on IBM Cloud Pak for Data.
streamsx-sc replicates a subset of the Streams 4.3 sc options, and is supported for Streams 5.x (Cloud Pak for Data). A local install of Streams is not required, only the installation of the streamsx package. All functionality is implemented through the Cloud Pak for Data and Streams build service REST APIs.
Cloud Pak for Data configuration¶
Integrated configuration¶
The Streams instance (and its build service) and authentication are defined through environment variables:
- CP4D_URL - Cloud Pak for Data deployment URL, e.g. https://cp4d_server:31843.
- STREAMS_INSTANCE_ID - Streams service instance name.
- STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name.
- STREAMS_PASSWORD - Password for authentication.
Standalone configuration¶
The Streams build service and authentication are defined through environment variables:
- STREAMS_BUILD_URL - Streams build service URL, e.g. when the service is exposed as a node port: https://<NODE-IP>:<NODE-PORT>
- STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name.
- STREAMS_PASSWORD - Password for authentication.
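Reading this standalone configuration can be sketched as follows. The helper name and the returned dict shape are illustrative assumptions; the streamsx scripts perform the equivalent internally.

```python
import getpass
import os

def build_service_config(env=os.environ):
    """Collect the standalone build-service settings described above.

    Hypothetical helper: STREAMS_BUILD_URL and STREAMS_PASSWORD are
    required, while STREAMS_USERNAME defaults to the current
    operating system user name."""
    return {
        'url': env['STREAMS_BUILD_URL'],
        'username': env.get('STREAMS_USERNAME', getpass.getuser()),
        'password': env['STREAMS_PASSWORD'],
    }
```

A missing required variable surfaces as a KeyError, which mirrors the scripts refusing to run without a build URL or password.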
Usage¶
streamsx-sc [-h] --main-composite name [--spl-path SPL_PATH]
[--optimized-code-generation] [--no-optimized-code-generation]
[--prefer-facade-tuples] [--ld-flags LD_FLAGS]
[--cxx-flags CXX_FLAGS] [--c++std C++STD]
[--data-directory DATA_DIRECTORY]
[--output-directory OUTPUT_DIRECTORY] [--disable-ssl-verify]
[--static-link] [--standalone-application]
[--set-relax-fusion-relocatability-restartability]
[--checkpoint-directory path] [--profiling-sampling rate]
[compile-time-args [compile-time-args ...]]
Options and arguments
- compile-time-args:
Pass named arguments, each in the format name=value, to the compiler. The name cannot contain the character = but otherwise is a free-form string. It matches the name parameter that is specified in calls that are made to the compile-time argument access functions from within SPL code. The value can be any string. See Compile-time arguments.
- -M,--main-composite:
SPL main composite
- -t,--spl-path:
Set the toolkit lookup paths. Separate multiple paths with :. Each path is a toolkit directory or a directory of toolkit directories. This path overrides the STREAMS_SPLPATH environment variable.
- -a,--optimized-code-generation:
Generate optimized code with less runtime error checking
- --no-optimized-code-generation:
Generate non-optimized code with more runtime error checking. Do not use with the --optimized-code-generation option.
- -k,--prefer-facade-tuples:
Generate facade tuples when possible.
- -w,--ld-flags:
Pass the specified flags to ld while linking occurs.
- -x,--cxx-flags:
Pass the specified flags to the C++ compiler during the build.
- --c++std:
Specify the language level for the underlying C++ compiles.
- --data-directory:
Specifies the location of the data directory to use.
- --output-directory:
Specifies a directory where the application artifacts are placed.
- --disable-ssl-verify:
Disable SSL verification against the build service
- Deprecated arguments
Arguments supported by sc but deprecated. They have no effect on compilation.
-s,--static-link
-T,--standalone-application
-O,--set-relax-fusion-relocatability-restartability
-K,--checkpoint-directory
-S,--profiling-sampling
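The name=value handling for compile-time arguments can be sketched as below (hypothetical helper). Splitting on the first = matches the rule that the name cannot contain = while the value may be any string.

```python
def parse_compile_time_args(args):
    """Split name=value compile-time arguments on the first '='.

    Hypothetical helper sketching the documented rule: the name may
    not contain '=', but the value is a free-form string that may
    itself contain further '=' characters."""
    named = {}
    for arg in args:
        name, sep, value = arg.partition('=')
        if not sep or not name:
            raise ValueError('expected name=value, got: ' + arg)
        named[name] = value
    return named
```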
Toolkits¶
The application toolkit is defined as the working directory of streamsx-sc. Local toolkits are found through the toolkit path set by --spl-path or the environment variable STREAMS_SPLPATH. Local toolkits are included in the build code archive sent to the build service if:
- the toolkit is defined as a dependent of the application toolkit, including recursive dependencies of required local toolkits,
- and a toolkit of a higher version within the required dependency range does not exist locally or remotely on the build service.
The toolkit path for the compilation on the build service includes:
- the application toolkit
- local toolkits included in the build code archive
- all toolkits uploaded to the Streams build service
- all product toolkits on the Streams build service
The application toolkit and local toolkits included in the build archive are processed prior to the actual compilation by:
- extracting any Python SPL primitive operators using spl-python-extract
- indexing using spl-make-toolkit
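The toolkit lookup path resolution described in this section can be sketched as below (hypothetical helper): --spl-path takes precedence over STREAMS_SPLPATH, and multiple paths are separated by :.

```python
import os

def toolkit_lookup_paths(spl_path=None, env=os.environ):
    """Return the toolkit lookup directories.

    Hypothetical helper: uses --spl-path if given, otherwise the
    STREAMS_SPLPATH environment variable; multiple paths are
    separated by ':' as described in the --spl-path option."""
    path = spl_path if spl_path is not None else env.get('STREAMS_SPLPATH', '')
    # Drop empty segments from leading/trailing/doubled separators.
    return [p for p in path.split(':') if p]
```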
New in version 1.13.
streamsx-service¶
Overview¶
Control commands for a Streaming Analytics service.
Usage¶
streamsx-service [-h] [--service-name SERVICE_NAME] [--full-response]
{start,status,stop} ...
Control commands for a Streaming Analytics service.
positional arguments:
{start,status,stop} Supported commands
start Start the service instance
status Get the service status.
stop Stop the instance for the service.
optional arguments:
-h, --help show this help message and exit
--service-name SERVICE_NAME
Streaming Analytics service name
--full-response Print the full JSON response.
streamsx-service stop [-h] [--force]
optional arguments:
-h, --help show this help message and exit
--force Stop the service even if jobs are running.
Controlling a Streaming Analytics service¶
The Streaming Analytics service to control is defined using --service-name SERVICE_NAME. If not provided, the service name is defined by the environment variable STREAMING_ANALYTICS_SERVICE_NAME.
The named service must exist in the VCAP services definition pointed to by the VCAP_SERVICES environment variable.
The response from making the control request is printed to standard out in JSON format. By default a minimal response is printed, including the status of the service and the job count. The complete response from the service REST API is printed if the option --full-response is given.
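Reducing a full REST response to a minimal default summary can be sketched as below. The field names state and job_count are illustrative assumptions, not the documented response schema.

```python
def minimal_status(full_response):
    """Reduce a full service REST response to a minimal summary of the
    kind the script prints by default: service status and job count.

    Hypothetical helper; the keys 'state' and 'job_count' are assumed
    names, not the documented Streaming Analytics REST schema."""
    return {
        'state': full_response.get('state'),
        'job_count': full_response.get('job_count'),
    }
```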
streamsx-streamtool¶
Overview¶
Command line interface for IBM Streams running on IBM Cloud Pak for Data.
streamsx-streamtool replicates a subset of the Streams streamtool commands, focusing on supporting DevOps for streaming applications. streamsx-streamtool is supported for Streams Cloud Pak for Data (5.x) instances. A local install of Streams is not required, only the installation of the streamsx package. All functionality is implemented through the Cloud Pak for Data and Streams REST APIs.
Cloud Pak for Data configuration¶
The Streams instance and authentication are defined through environment variables; the details depend on whether the Streams instance is running in an integrated or standalone configuration.
Integrated configuration¶
- CP4D_URL - Cloud Pak for Data deployment URL, e.g. https://cp4d_server:31843.
- STREAMS_INSTANCE_ID - Streams service instance name.
- STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name. Overridden by the --User option.
- STREAMS_PASSWORD - Password for authentication.
Standalone configuration¶
- STREAMS_REST_URL - Streams SWS service (REST API) URL, e.g. when the service is exposed as a node port: https://<NODE-IP>:<NODE-PORT>
- STREAMS_BUILD_URL - Streams build service (REST API) URL, e.g. when the service is exposed as a node port: https://<NODE-IP>:<NODE-PORT>. Required for lstoolkit and rmtoolkit.
- STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name.
- STREAMS_PASSWORD - Password for authentication.
Usage¶
streamsx-streamtool submitjob [-h] [--jobConfig file-name]
[--jobname job-name] [--jobgroup jobgroup-name]
[--outfile file-name] [--P parameter-name]
[--User user]
sab-pathname
streamsx-streamtool canceljob [-h] [--force] [--collectlogs]
[--jobs job-id | --jobnames job-names | --file file-name]
[--User user]
[jobid [jobid ...]]
streamsx-streamtool lsjobs [-h] [--jobs job-id] [--users user]
[--jobnames job-names] [--fmt format-spec]
[--xheaders] [--long] [--showtimestamp]
[--User user]
streamsx-streamtool lsappconfig [-h] [--fmt format-spec] [--User user]
streamsx-streamtool mkappconfig [-h] [--property name=value]
[--propfile property-file]
[--description description] [--User user]
config-name
streamsx-streamtool rmappconfig [-h] [--noprompt] [--User user] config-name
streamsx-streamtool chappconfig [-h] [--property name=value]
[--description description] [--User user]
config-name
streamsx-streamtool getappconfig [-h] [--User user] config-name
streamsx-streamtool lstoolkit [-h]
(--all | --id toolkit-id | --name toolkit-name | --regex toolkit-regex)
[--User user]
streamsx-streamtool rmtoolkit [-h]
(--toolkitid toolkit-id | --toolkitname toolkit-name | --toolkitregex toolkit-regex)
[--User user]
streamsx-streamtool uploadtoolkit [-h] --path toolkit-path [--User user]
streamsx-streamtool updateoperators [-h] [--jobname job-name]
[--jobConfig file-name]
[--parallelRegionWidth parallelRegionName=width]
[--force] [--User user]
[jobid]
submitjob¶
The streamtool submitjob command previews or submits one job.
Description:
A submitted job runs an application that is defined by an application bundle. Application bundles are created by the Stream Processing Language (SPL) compiler. A job consists of one or more processing elements (PEs). The PEs are placed on one or more of the application resources for the instance. The submission fails if the PE placement constraints can’t be met.
Jobs remain in the system until they are canceled or the instance is stopped.
streamsx-streamtool submitjob [-h] [--jobConfig file-name]
[--jobname job-name] [--jobgroup jobgroup-name]
[--outfile file-name] [--P parameter-name]
[--User user]
sab-pathname
Options and arguments
- sab-pathname:
Specifies the path name for the application bundle file. If you do not specify an absolute path, the command seeks the file in the directory where you ran the command. Alternatively, you can specify the path name for the application description language (ADL) file if the application bundle file exists in the same directory.
- -g,--jobConfig:
Specifies the name of an external file that defines a job configuration overlay. You can use a job configuration overlay to set the job configuration when the job is submitted or to change the configuration of a running job.
- -P,--P:
Specifies a submission-time parameter and value for the job. You can specify this option multiple times in the command.
- -J,--jobgroup:
Specifies the job group. If you do not specify this option, the command uses the following job group: default.
- --jobname:
Specifies the name of the job.
- --outfile:
Specifies the path and file name of the output file in which the command writes the list of submitted job IDs. The path can be an absolute or relative path. If you do not specify a path, the file is created in the directory where you run the command.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
canceljob¶
The streamtool canceljob command cancels one or more jobs.
This command stops the processing elements (PEs) for the job and removes knowledge of the jobs and their PEs from the instance. The log files for the processing elements are scheduled for removal.
If you specify to collect the PE logs before they are removed, the operation can time out waiting for the termination of PEs. If such a timeout occurs, the operation fails and the jobs or PEs are still in the system. The canceljob command can be run again later to cancel them.
You can use the --force option to ignore a PE termination timeout and force the job to cancel.
streamsx-streamtool canceljob [-h] [--force] [--collectlogs]
[--jobs job-id | --jobnames job-names | --file file-name]
[--User user]
[jobid [jobid ...]]
Options and arguments
- jobid:
Specifies a list of job IDs.
- -f,--file:
Specifies the file that contains a list of job IDs, one per line.
- -j,--jobs:
Specifies a list of job IDs, which are delimited by commas.
- --jobnames:
Specifies a list of job names, which are delimited by commas.
- --collectlogs:
Specifies to collect the log and trace files for each processing element that is associated with the job.
- --force:
Specifies to quickly cancel a job and remove the job from the Streams data table.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
lsjobs¶
The streamtool lsjobs command lists the jobs in the instance.
The streamtool lsjobs command provides a health summary for each job. The health summary is an aggregation of the PE health summaries for the job. If all of the PEs for a job are reported as healthy, the job is reported as healthy. Otherwise, the job is reported as not healthy. Use the streamtool lspes command to determine the health of PEs.
The command also reports the status of each job. For more information about job states, see the IBM Streams product documentation.
The date and time that the job was submitted are presented in local time in ISO 8601 format: yyyy-mm-ddThh:mm:ss+/-hhmm, where the final hhmm values are the local offset from UTC. For example: 2010-03-16T13:41:53-0500.
When job selection options are specified, selected jobs must meet all of the selection criteria. After a cancel request for a job is processed, this command no longer reports the job or its processing elements (PEs).
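The timestamp format above can be produced with the standard library; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def job_submit_time(dt):
    """Format an aware datetime in the ISO 8601 form used by lsjobs:
    yyyy-mm-ddThh:mm:ss+/-hhmm, where the trailing hhmm is the local
    offset from UTC. Hypothetical helper."""
    # %z renders the UTC offset as +hhmm or -hhmm.
    return dt.strftime('%Y-%m-%dT%H:%M:%S%z')
```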
streamsx-streamtool lsjobs [-h] [--jobs job-id] [--users user]
[--jobnames job-names] [--fmt format-spec]
[--xheaders] [--long] [--showtimestamp]
[--User user]
Options and arguments
- -j,--jobs:
Specifies a list of job IDs, which are delimited by commas.
- --jobnames:
Specifies a list of job names, which are delimited by commas.
- -u,--users:
Specifies to select from this list of user IDs, which are delimited by commas.
- --xheaders:
Specifies to exclude headings from the report.
- -l,--long:
Reports launch count, full host names, and all of the operator instance names for the PEs.
- --fmt:
Specifies the presentation format. The command supports the following values:
%Mf: Multiline record format. One line per field.
%Nf: Name prefixed field table format. One line per job.
%Tf: Standard table format, which is the default. One line per job.
- --showtimestamp:
Specifies to show a time stamp in the output to indicate when the command was run.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
lsappconfig¶
The streamtool lsappconfig command lists the available configurations that enable connections to an external application.
Retrieve a list of configurations for making a connection to an external application.
streamsx-streamtool lsappconfig [-h] [--fmt format-spec] [--User user]
Options and arguments
- --fmt:
Specifies the presentation format. The command supports the following values:
%Mf: Multiline record format. One line per field.
%Nf: Name prefixed field table format. One line per cfgname.
%Tf: Standard table format, which is the default. One line per cfgname.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
mkappconfig¶
The streamtool mkappconfig command creates a configuration that enables connection to an external application.
Operators can retrieve the configuration information to make a connection to an external application, such as an Internet of Things application. The properties include items that the application needs at runtime, such as connection information and credentials.
Use this command to register properties or a properties file. Create the property file using name=value syntax.
streamsx-streamtool mkappconfig [-h] [--property name=value]
[--propfile property-file]
[--description description] [--User user]
config-name
Options and arguments
- config-name:
Name of the app config
- --description:
Specifies a description for the application configuration. The description can be 1024 characters in length. If the description contains blank characters, it must be enclosed in single or double quotation marks. Quotation marks within the description must be preceded by a backslash (\).
- --property:
Specifies a property name and value pair to add to or change in the configuration. This option can be specified multiple times and has an additive effect.
- --propfile:
Specifies the path to a file that contains a list of application configuration properties for connecting to an external application. The properties are listed as name=value pairs, each on a separate line. Use this option as a way to include multiple configuration properties when you create an application configuration. Options that you specify at the command line override values that are specified in this property file.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
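Parsing a property file and applying command-line overrides can be sketched as follows. The helper is hypothetical, and the skipping of blank and #-comment lines is an assumption; the documentation only specifies one name=value pair per line and that command-line options win over file values.

```python
def read_propfile(text, overrides=None):
    """Parse name=value application configuration properties, one per
    line, then apply command-line overrides on top.

    Hypothetical helper; blank lines and '#' comment lines are
    skipped as an assumption beyond the documented format."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, _, value = line.partition('=')
        props[name.strip()] = value.strip()
    if overrides:
        # --property values specified at the command line override
        # values from the property file.
        props.update(overrides)
    return props
```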
rmappconfig¶
The streamtool rmappconfig command removes a configuration that enables connection to an external application.
This command removes a configuration that is used for making a connection to an external application.
streamsx-streamtool rmappconfig [-h] [--noprompt] [--User user] config-name
Options and arguments
- config-name:
Name of the app config
- --noprompt:
Specifies to suppress confirmation prompts.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
chappconfig¶
The streamtool chappconfig command updates a configuration that enables connection to an external application.
Use this command to change the configuration properties that are used to make a connection to an external application, such as an Internet of Things application. You can change the values of properties or add new properties.
streamsx-streamtool chappconfig [-h] [--property name=value]
[--description description] [--User user]
config-name
Options and arguments
- config-name:
Name of the app config
- --description:
Specifies a description for the application configuration. The description can be 1024 characters in length. If the description contains blank characters, it must be enclosed in single or double quotation marks. Quotation marks within the description must be preceded by a backslash (\).
- --property:
Specifies a property name and value pair to add to or change in the configuration. This option can be specified multiple times and has an additive effect.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
getappconfig¶
The streamtool getappconfig command displays the properties of a configuration that enables connection to an external application.
This command retrieves the properties and values of a specific configuration for connecting to an external application.
streamsx-streamtool getappconfig [-h] [--User user] config-name
Options and arguments
- config-name:
Name of the app config
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
lstoolkit¶
List toolkits from a build service.
streamsx-streamtool lstoolkit [-h]
(--all | --id toolkit-id | --name toolkit-name | --regex toolkit-regex)
[--User user]
Options and arguments
- -a,--all:
List all toolkits
- -i,--id:
List a specific toolkit given its toolkit id
- -n,--name:
List all toolkits with this name
- -r,--regex:
List all toolkits where the name matches the given regex pattern
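The regex selection can be sketched as below (hypothetical helper). Whether the pattern must match the full toolkit name or only a substring is an assumption; full-name matching is used here.

```python
import re

def match_toolkits(toolkit_names, pattern):
    """Select toolkit names matching a regex pattern, as --regex does.

    Hypothetical helper; re.fullmatch (whole-name matching) is an
    assumption about how the pattern is anchored."""
    return [name for name in toolkit_names if re.fullmatch(pattern, name)]
```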
rmtoolkit¶
Remove toolkits from a build service.
streamsx-streamtool rmtoolkit [-h]
(--id toolkit-id | --name toolkit-name | --regex toolkit-regex)
[--User user]
Options and arguments
- -i,--id:
Specifies the id of the toolkit to delete
- -n,--name:
Remove all toolkits with this name
- -r,--regex:
Remove all toolkits where the name matches the given regex pattern
uploadtoolkit¶
Upload a toolkit to a build service.
streamsx-streamtool uploadtoolkit [-h] --path toolkit-path [--User user]
Options and arguments
- -p,--path:
Specifies the path of the indexed toolkit to upload
New in version 1.13.
updateoperators¶
Adjust a job configuration while the job is running in order to improve the job performance.
streamsx-streamtool updateoperators [-h] [--jobname job-name]
[--jobConfig file-name]
[--parallelRegionWidth parallelRegionName=width]
[--force] [--User user]
[jobid]
Options and arguments
- jobid:
Specifies a job ID
- --jobname:
Specifies the name of the job
- -g,--jobConfig:
Specifies the name of an external file that defines a job configuration overlay. You can use a job configuration overlay to set the job configuration when the job is submitted or to change the configuration of a running job.
- --parallelRegionWidth:
Specifies a parallel region name and its width.
- --force:
Specifies whether to automatically stop the PEs that need to be stopped.
- -U,--User:
Specifies an IBM Streams user ID that has authority to run the command.
Environments¶
IBM Streaming Analytics service¶
Overview¶
IBM® Streaming Analytics for IBM Cloud is powered by IBM® Streams, an advanced analytic platform that you can use to ingest, analyze, and correlate information as it arrives from different types of data sources in real time. When you create an instance of the Streaming Analytics service, you get your own instance of IBM® Streams running in IBM® Cloud, ready to run your IBM® Streams applications.
Package support¶
This streamsx package supports:
- Developing streaming applications in Python that can be submitted to a Streaming Analytics service. See streamsx.topology.topology, STREAMING_ANALYTICS_SERVICE.
- Submitting streaming applications written in Python or SPL to a Streaming Analytics service. See Python applications, SPL applications.
- Submitting a pre-compiled Streams application bundle (sab file), Python or SPL, to a Streaming Analytics service. See Streams application bundles.
- Python bindings to the IBM Streams REST API and the Streaming Analytics REST API. See streamsx.rest.
Accessing a service¶
In order to use a Streaming Analytics service you must have access to credentials for the service. There are two mechanisms used by this package, VCAP services and direct use of Streaming Analytics credentials.
VCAP services¶
This is the format used by Cloud Foundry for bindable services.
The service key for Streaming Analytics service is streaming-analytics
,
the value of that key in the VCAP services is a list of accessible services,
each service represented by a separate object.
Each streaming analytics object must have these keys:
- name identifying the name of the service.
- credentials identifying the connection credentials for the service.
Example VCAP services containing two Streaming Analytics services sa-test and sa-prod (with the specific connection details elided):
{
"streaming-analytics": [
{
"name": "sa-test",
"credentials":
{
"apikey": "...",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - ...",
"iam_apikey_name": "auto-generated-apikey-...",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity ...",
"v2_rest_url": "https://streams-app-service.ng.bluemix.net/v2/streaming_analytics/..."
}
},
{
"name": "sa-prod",
"credentials":
{
"apikey": "...",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - ...",
"iam_apikey_name": "auto-generated-apikey-...",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity ...",
"v2_rest_url": "https://streams-app-service.ng.bluemix.net/v2/streaming_analytics/..."
}
}
]
}
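As a sketch of how a client might locate a service in this structure (find_sa_credentials is an illustrative helper, not part of the streamsx API):

```python
import json

# Hedged sketch: given VCAP services information as a JSON string,
# select the credentials for a Streaming Analytics service by the
# value of its "name" key, mirroring the lookup described above.
def find_sa_credentials(vcap_services_json, service_name):
    vcap = json.loads(vcap_services_json)
    for service in vcap.get("streaming-analytics", []):
        if service.get("name") == service_name:
            return service["credentials"]
    raise KeyError("no Streaming Analytics service named " + service_name)
```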
Note
The specific keys in the credentials may differ depending on the service plan.
Cloud Foundry applications¶
When a Streaming Analytics service is bound to a Cloud Foundry Python
application the environment variable VCAP_SERVICES
is
automatically defined and contains a string representation of the
JSON VCAP services information.
Client applications¶
Client applications are ones that run outside of the IBM Cloud, for example on a local laptop, or applications that are not bound to a service.
A client application must define valid VCAP services in JSON format as either:
- the environment variable VCAP_SERVICES containing a string representation of the JSON VCAP services information, or
- a file containing a string representation of the JSON VCAP services information, with the file's absolute path in either:
  - the environment variable VCAP_SERVICES, or
  - the configuration property VCAP_SERVICES when submitting an application using submit() with context type STREAMING_ANALYTICS_SERVICE. This overrides the environment variable VCAP_SERVICES.
The contents of the file must be created manually. The credentials for the credentials
key are obtained from the Streaming Analytics management console: select the Service Credentials page and copy the required credentials. You may need to create credentials first. You can copy the credentials by taking the View credentials action and then clicking the copy-to-clipboard icon on the right-hand side.
Warning
The credential information in VCAP services is in plain text. Ensure that any file containing the information, or any mechanism setting the environment variable, has suitable permissions set, for example readable only by the intended user.
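One way to follow this advice on a Unix-like system (a hedged sketch; vcap.json is a hypothetical file name, and the JSON shown is a placeholder, not real credentials):

```shell
# Store VCAP services JSON in a file readable only by the owning user,
# then point VCAP_SERVICES at the file's absolute path.
printf '%s\n' '{"streaming-analytics": []}' > vcap.json
chmod 600 vcap.json
export VCAP_SERVICES="$PWD/vcap.json"
```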
Selecting the service¶
The Streaming Analytics service to use is specified by its name; the required service must exist in the VCAP services information under the name
key.
The name of the service to use is set by:
- the environment variable STREAMING_ANALYTICS_SERVICE_NAME,
- the configuration property SERVICE_NAME when submitting an application using submit() with context type STREAMING_ANALYTICS_SERVICE. This overrides the environment variable STREAMING_ANALYTICS_SERVICE_NAME, or
- the --service-name option to streamsx-runner.
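This precedence can be sketched as follows (a minimal illustration; the property key "topology.service.name" is assumed here for readability, and real code should use the package's ConfigParams constants):

```python
import os

# Hedged sketch of the resolution order described above: a submit-time
# configuration property overrides the STREAMING_ANALYTICS_SERVICE_NAME
# environment variable.
def resolve_service_name(config):
    name = config.get("topology.service.name")
    if name is None:
        name = os.environ.get("STREAMING_ANALYTICS_SERVICE_NAME")
    return name
```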
Service definition¶
The Streaming Analytics service to use may be specified solely by its credentials. The credentials are specified:
- with the configuration property SERVICE_DEFINITION when submitting an application using submit() with context type STREAMING_ANALYTICS_SERVICE, or
- when using streamsx.rest.StreamingAnalyticsConnection.of_definition() to create a REST connection.
Credentials are obtained from the Streaming Analytics management console: select the Service Credentials page and copy the required credentials. You may need to create credentials first. You can copy the credentials by taking the View credentials action and then clicking the copy-to-clipboard icon on the right-hand side.
IBM Streams Python setup¶
Developer setup¶
Developers install the streamsx package from the Python Package Index (PyPI) to use this functionality:
pip install streamsx
If already installed, upgrading to the latest version is recommended:
pip install --upgrade streamsx
A local install of IBM Streams is not required when:
- Using the Streams and Streaming Analytics REST bindings streamsx.rest.
- Developing and submitting streaming applications using streamsx.topology.topology to Cloud Pak for Data or the Streaming Analytics service on IBM Cloud.
The environment variable
JAVA_HOME
must reference a Java JRE or JDK/SDK version 8 or higher.
A local install of IBM Streams is required when:
- Developing and submitting streaming applications using streamsx.topology.topology to IBM Streams 4.2, 4.3 distributed or standalone contexts. If set, the environment variable JAVA_HOME must reference a Java JRE or JDK/SDK version 8 or higher; otherwise the Java install from $STREAMS_INSTALL/java is used.
- Creating SPL toolkits with Python primitive operators using streamsx.spl.spl decorators for use with 4.2, 4.3 distributed or standalone applications.
Warning
When using the streamsx package ensure that the environment variable
PYTHONPATH
does not include a path ending with
com.ibm.streamsx.topology/opt/python/packages
.
The IBM Streams environment configuration script streamsprofile.sh
modifies or sets PYTHONPATH
to include the Python support
from the SPL topology toolkit shipped with the product. This was to
support Python before the streamsx package was available. The
recommendation is to unset PYTHONPATH
or modify it not to
include the path to the topology toolkit.
Note
The streamsx package is self-contained and does not depend on any
SPL topology toolkit (com.ibm.streamsx.topology
) installed
under $STREAMS_INSTALL/toolkits
or on the SPL compiler’s (sc
)
toolkit path. This is true at SPL compilation time and runtime.
Streaming Analytics service¶
The service instance has Anaconda installed with Python 3.6 as the
runtime environment and has the PYTHONHOME
Streams application environment variable
pre-configured.
Any streaming applications using Python must use Python 3.6 when
submitted to the service instance. The streamsx package must be installed locally and applications are submitted to the STREAMING_ANALYTICS_SERVICE
context.
IBM Cloud Pak for Data¶
An IBM Streams service instance within Cloud Pak for Data has Anaconda installed with Python 3.6 as the
runtime environment and has the PYTHONHOME
Streams application environment variable pre-configured.
Any streaming applications using Python must use Python 3.6 when submitted to the service instance.
Streaming applications can be submitted through Jupyter notebooks running in
Cloud Pak for Data projects. The streamsx package is preinstalled and applications are submitted to the DISTRIBUTED
context.
Streaming applications can be submitted externally to the OpenShift cluster containing Cloud Pak for Data.
The streamsx package must be installed locally and applications are submitted to the DISTRIBUTED
context. The specific environment variables depend
on whether the Streams instance is in an integrated or standalone configuration. See DISTRIBUTED
for details.
IBM Streams 4.2, 4.3¶
For a distributed cluster running Streams, Python 3.7, 3.6 or 3.5 may be used.
Anaconda or Miniconda distributions may be used as the Python runtime; these have the advantage of being pre-built and including a number of standard packages. Anaconda installs may be downloaded at: https://www.continuum.io/downloads .
If building Python from source then it must be built to support embedding
of the runtime with shared libraries (--enable-shared
option to configure).
Distributed¶
For distributed contexts, the Streams application environment variable
PYTHONHOME
must be set to the Python install path.
This is set using streamtool as:
streamtool setproperty --application-ev PYTHONHOME=path_to_python_install
The application environment variable may also be set using the Streams console. The Instance Management view has an Application Environment Variables section. Expanding the details for that section allows modification of the set of environment variables available to Streams applications.
The Python install path must be accessible on every application resource that will execute Python code within a Streams application.
Note
The Python version used to declare and submit the application must be compatible with the setting of PYTHONHOME
in the instance. For example, if the PYTHONHOME
Streams application environment variable points to a Python 3.6 install, then Python 3.5 or 3.6 can be used to declare and submit the application.
Standalone¶
The environment variable PYTHONHOME
must be set to the Python install path.
Bundle Python version compatibility¶
As of 1.13 Streams application bundles (sab files) invoking Python are binary compatible with a range of Python releases when using Python 3.
The minimum version supported is the version of Python used during bundle creation.
The maximum version supported is the highest version of Python with a proposed release schedule.
For example, if a sab is built with Python 3.6 then it can be submitted to a Streams instance using Python 3.6 or higher, up to and including 3.9, which is the highest Python release with a proposed release schedule as of streamsx 1.13.
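The supported range can be sketched as a simple check on Python 3 minor versions (a hedged illustration of the rule stated above, not an API in the streamsx package; the maximum of 3.9 reflects the state as of streamsx 1.13):

```python
# A bundle built with Python 3.<build_minor> can run on Python
# 3.<runtime_minor> when the runtime version is at least the build
# version and no higher than the highest release with a proposed
# release schedule (3.9 as of streamsx 1.13).
def bundle_runs_on(build_minor, runtime_minor, max_scheduled_minor=9):
    return build_minor <= runtime_minor <= max_scheduled_minor
```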
Note
Compatibility across Python releases is dependent on Python's Stable Application Binary Interface.
Restrictions and known bugs¶
No support for nested parallel regions at sources, i.e. nested
streamsx.topology.topology.Stream.set_parallel()
, for example:
topo = Topology()
s = topo.source(S())
s.set_parallel(3).set_parallel(2)
In this example, set_parallel(3) is ignored.
No support for nested types when defining stream schemas, for example:
class NamedTupleNestedTupleSchema(typing.NamedTuple):
    key: str
    spotted: SpottedSchema
No support of collections of NamedTuple as stream schema, for example:
class NamedTupleListOfTupleSchema(typing.NamedTuple):
    spotted: typing.List[SpottedSchema]
Python Composites (derived from
streamsx.topology.composite.Composite
) can have only one input port.
No support to process window markers or the final marker (end of stream) in Python callables, as is possible in SPL operators.
No hook for drain processing in a consistent region for Python callables.
Submission time parameters, which are defined in SPL composites of other toolkits, or created by using streamsx.spl.op.Expression in the topology, cannot be accessed at runtime with streamsx.ec.get_submission_time_value(name).