IBM Streams Python support

Python APIs for use with IBM® Streaming Analytics service on IBM Cloud and on-premises IBM Streams.

Python Application API for Streams

Module that allows the definition and execution of streaming applications implemented in Python. Applications use Python code to process tuples, and tuples are Python objects.

SPL operators may also be invoked from Python applications to allow use of existing IBM Streams toolkits.
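The tuple-processing model described above can be sketched as follows. This is a minimal illustration, not package code: the callable and function names are hypothetical, and the streamsx package is assumed to be installed for the topology wiring.

```python
# Illustrative sketch of a streaming application declared with the
# Python Application API. SensorReadings, is_high and build_app are
# hypothetical example names, not part of the streamsx package.
from typing import Iterator

class SensorReadings:
    """Callable source: each yielded Python object becomes a stream tuple."""
    def __call__(self) -> Iterator[dict]:
        for i in range(3):
            yield {'id': 'sensor1', 'value': float(i)}

def is_high(reading: dict) -> bool:
    """Plain Python callable used to filter tuples."""
    return reading['value'] > 0.5

def build_app():
    # Declaring the topology requires the streamsx package,
    # so the import is kept local to this function.
    from streamsx.topology.topology import Topology
    topo = Topology('SensorApp')
    readings = topo.source(SensorReadings())
    readings.filter(is_high).print()
    return topo
```

The ordinary Python callables carry the application logic; the Topology object only wires them into streams.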

See topology

streamsx.topology

Python application support for IBM Streams.

streamsx.topology.topology

Streaming application definition.

streamsx.topology.context

Context for submission and build of topologies.

streamsx.topology.schema

Schemas for streams.

streamsx.topology.state

Application state.

streamsx.topology.composite

Composite transformations.

streamsx.topology.tester

Testing support for streaming applications.

streamsx.topology.tester_runtime

Runtime tester functionality.

streamsx.ec

Access to the IBM Streams execution context.

streamsx.spl.op

Integration of SPL operators.

streamsx.spl.types

SPL type definitions.

streamsx.spl.toolkit

SPL toolkit integration.

SPL primitive Python operators

SPL primitive Python operators provide the ability to perform tuple processing using Python in an SPL application.

A Python function or class is turned into an SPL primitive operator through the provided decorators.

SPL (Streams Processing Language) is a domain specific language for streaming analytics supported by Streams.
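A module of this kind might look as follows. This is a hedged sketch: the decorator and namespace-function forms are shown only in comments (they require the streamsx package), and the function name and namespace are hypothetical.

```python
# Sketch of a module that would live under opt/python/streams in an
# SPL toolkit. With the streamsx package installed it would begin:
#
#     from streamsx.spl import spl
#
#     def spl_namespace():
#         return 'com.example.python'   # hypothetical SPL namespace
#
# and the function below would carry a decorator such as @spl.map()
# so that spl-python-extract can turn it into an SPL primitive operator.

def to_fahrenheit(celsius):
    """Tuple-processing logic: convert a Celsius value to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0
```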

streamsx.spl.spl

SPL Python primitive operators.

Streams Python REST API

Module that allows interaction with a running Streams instance or service through HTTPS REST APIs.


streamsx.build

REST API bindings for IBM® Streams Cloud Pak for Data build service.

streamsx.rest

REST API bindings for IBM® Streams & Streaming Analytics service.

streamsx.rest_primitives

Primitive objects for REST bindings.

Scripts

The streamsx package provides a number of command line scripts.

spl-python-extract

Overview

Extracts SPL Python primitive operators from decorated Python classes and functions.

Executing this script against an SPL toolkit creates the SPL primitive operator meta-data required by the SPL compiler (sc).

Usage

spl-python-extract [-h] -i DIRECTORY [--make-toolkit] [-v]

Extract SPL operators from decorated Python classes and functions.

optional arguments:
  -h, --help            show this help message and exit
  -i DIRECTORY, --directory DIRECTORY
                        Toolkit directory
  --make-toolkit        Index toolkit using spl-make-toolkit
  -v, --verbose         Print more diagnostics

SPL Python primitive operators

SPL operators that call a Python function or callable class are created by decorators provided by the streamsx package.

To create SPL operators from Python functions or classes one or more Python modules are created in the opt/python/streams directory of an SPL toolkit.

spl-python-extract is a Python script that creates SPL operators from Python functions and classes contained in modules under opt/python/streams.

The resulting operators embed the Python runtime to allow stream processing using Python.

For details on how to implement SPL Python primitive operators, see streamsx.spl.spl.

streamsx-info

Overview

Information about streamsx package and environment.

Prints to standard out information about the streamsx package and environment variables used to support Python in IBM Streams and Streaming Analytics service.

A Python warning is issued if a mismatch is detected between the installed streamsx package and its modules. This is typically due to having a different version of the modules accessible through the environment variable PYTHONPATH.

Warning

When using the streamsx package ensure that the environment variable PYTHONPATH does not include a path ending with com.ibm.streamsx.topology/opt/python/packages. The IBM Streams environment configuration script streamsprofile.sh modifies or sets PYTHONPATH to include the Python support from the SPL topology toolkit shipped with the product. This was to support Python before the streamsx package was available. The recommendation is to unset PYTHONPATH or modify it not to include the path to the topology toolkit.
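The condition the warning describes can be checked with a few lines of ordinary Python. This helper is purely illustrative and not part of the streamsx package:

```python
import os

# Illustrative check for the condition described above: a PYTHONPATH
# entry ending with the topology toolkit's bundled Python support,
# which can conflict with the installed streamsx package.
def conflicting_entries(pythonpath: str) -> list:
    suffix = os.path.join('com.ibm.streamsx.topology', 'opt', 'python', 'packages')
    return [p for p in pythonpath.split(os.pathsep)
            if p.rstrip(os.sep).endswith(suffix)]
```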

The output is subject to change in the order and information displayed. It is intended as an ad-hoc tool to help diagnose issues with streamsx.

The script may also be run as a Python module:

python -m streamsx.scripts.info

Usage

usage: streamsx-info [-h]

    Prints support information about streamsx package and environment.

optional arguments:
    -h, --help  show this help message and exit

streamsx-runner

Overview

Submits or builds a Streams application to the Streaming Analytics service.

The application to be submitted can be:

  • A Python application defined through Topology using the --topology flag.

  • An SPL application (main composite) using the --main-composite flag.

  • A Streams application bundle (sab file) using the --bundle flag.

Streaming Analytics service

The Streaming Analytics service is defined by:

  • Service name - --service-name, defaulting to environment variable STREAMING_ANALYTICS_SERVICE_NAME. The service name must exist in the VCAP services.

  • VCAP services - Environment variable VCAP_SERVICES containing a JSON representation of the service definitions or a file name containing the service definitions.
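The dual interpretation of VCAP_SERVICES (inline JSON or a file name) can be sketched as follows. This is not streamsx code, just an illustration of how such a value can be resolved:

```python
import json

# Sketch: a VCAP_SERVICES value is either the JSON itself or the
# name of a file holding the JSON service definitions.
def load_vcap_services(value: str) -> dict:
    try:
        # First assume the value is the JSON document itself.
        return json.loads(value)
    except ValueError:
        # Otherwise treat it as a path to a JSON file.
        with open(value) as f:
            return json.load(f)
```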

Job submission

Job submission occurs unless --create-bundle is set.

Bundle creation

When --create-bundle is specified with --main-composite or --topology then a Streams application bundle (sab file) is created.

If the environment variable STREAMS_INSTALL is set then the build is local; otherwise the build occurs in IBM Cloud using the Streaming Analytics service.

When STREAMS_INSTALL is not set then streamsx-runner can be executed with no local Streams install.

When compiling an SPL application (--main-composite) then the path to the application toolkit containing the main composite must be listed with --toolkits.

Any other required local toolkits must be listed with --toolkits.

Usage

streamsx-runner [-h] [--service-name SERVICE_NAME | --create-bundle]
             (--topology TOPOLOGY | --main-composite MAIN_COMPOSITE | --bundle BUNDLE)
             [--toolkits TOOLKITS [TOOLKITS ...]] [--job-name JOB_NAME]
             [--preload] [--trace {error,warn,info,debug,trace}]
             [--submission-parameters SUBMISSION_PARAMETERS [SUBMISSION_PARAMETERS ...]]
             [--job-config-overlays file]

Execute a Streams application using a Streaming Analytics service.

optional arguments:
  -h, --help            show this help message and exit
  --service-name SERVICE_NAME
                        Submit to Streaming Analytics service
  --create-bundle       Create a bundle (sab file). No job submission occurs.
  --topology TOPOLOGY   Topology to call
  --main-composite MAIN_COMPOSITE
                        SPL main composite (namespace::composite_name)
  --bundle BUNDLE       Streams application bundle (sab file) to submit to
                        service

Build options:
  Application build options

  --toolkits TOOLKITS [TOOLKITS ...]
                        SPL toolkit path containing the main composite and any
                        other required SPL toolkit paths.

Job options:
  Job configuration options

  --job-name JOB_NAME   Job name
  --preload             Preload job onto all resources in the instance
  --trace {error,warn,info,debug,trace}
                        Application trace level
  --submission-parameters SUBMISSION_PARAMETERS [SUBMISSION_PARAMETERS ...], -p SUBMISSION_PARAMETERS [SUBMISSION_PARAMETERS ...]
                        Submission parameters as name=value pairs
  --job-config-overlays file
                        Path to file containing job configuration overlays
                        JSON. Overrides any job configuration set by the
                        application.

Submitting to Streaming Analytics service

An application is submitted to a Streaming Analytics service using --service-name SERVICE_NAME. The named service must exist in the VCAP services definition pointed to by the VCAP_SERVICES environment variable.

The application is submitted as source (except --bundle) and compiled into a Streams application bundle (sab file) using the build service before being submitted as a running job to the service instance.

Python applications

To submit a Python application a Python function must be defined that returns the application (and optionally its configuration) to be submitted. The fully qualified name of this function is specified using the --topology flag.

For example, an application can be submitted as:

streamsx-runner --service-name Streaming-Analytics-xd \
    --topology com.example.apps.sensor_ingester

The function returns one of:

  • a Topology instance defining the application

  • a tuple containing two values, in order:
    • a Topology instance defining the application

    • job configuration, one of:
      • JobConfig instance

      • dict corresponding to the configuration object passed into submit()

For example the above function might be defined as:

from streamsx.topology.topology import Topology
from streamsx.topology.context import JobConfig

def _create_sensor_ingester_app():
    topo = Topology('SensorIngesterApp')

    # Application declaration omitted
    ...

    return topo

def sensor_ingester():
    return (_create_sensor_ingester_app(), JobConfig(job_name='SensorIngester'))

Thus when this application is submitted using the sensor_ingester function it is always submitted with the same job name SensorIngester.

The function must be accessible from the current Python path (typically through environment variable PYTHONPATH).

SPL applications

The main composite that defines the application is specified using the --main-composite flag, specifying the fully namespace-qualified name.

Any required local SPL toolkits, including the one containing the main composite, must be individually specified by location using the --toolkits flag. Any SPL toolkit that is present on the IBM Cloud service need not be included.

For example, an application that uses the Slack toolkit might be submitted as:

streamsx-runner --service-name Streaming-Analytics-xd \
    --main-composite com.example.alert::SlackAlerter \
    --toolkits $HOME/app/alerters $HOME/toolkits/com.ibm.streamsx.slack

where $HOME/app/alerters is the location of the SPL application toolkit containing the com.example.alert::SlackAlerter main composite.

Warning

The main composite name must be namespace qualified. Use of the default namespace for a main composite is not recommended as it increases the chance of a name clash with another SPL toolkit.

Streams application bundles

A Streams application bundle is submitted to a service instance using --bundle. The argument to --bundle is a locally accessible file that will be uploaded to the service.

The bundle must have been created using an IBM Streams install whose architecture and OS version match the service instance. Currently this is x86_64 and RedHat/CentOS 6 or 7, depending on the service instance.

The --toolkits flag must not be specified when submitting a bundle.

Job options

Job options, such as --job-name, configure the running job.

For --topology job options set as arguments to streamsx-runner override any configuration returned from the function defining the application.

Creating Streams application bundles

--create-bundle uses a local IBM Streams install to attempt to mimic the build that would occur with --topology or --main-composite. Differences between the local environment and the IBM Cloud Streaming Analytics build environment may cause build failures in one and not the other.

This can be used as a mechanism to perform a local test build before using the service, or as a valid mechanism to create bundles for later upload with --bundle.

For example, simply changing --service-name to --create-bundle performs a local build of the same application:

# Submit to a Streaming Analytics service
streamsx-runner --service-name Streaming-Analytics-xd \
    --main-composite com.example.alert::SlackAlerter \
    --toolkits $HOME/app/alerters $HOME/toolkits/com.ibm.streamsx.slack

# Build the same application locally
streamsx-runner --create-bundle \
    --main-composite com.example.alert::SlackAlerter \
    --toolkits $HOME/app/alerters $HOME/toolkits/com.ibm.streamsx.slack

streamsx-sc

Overview

SPL compiler for IBM Streams running on IBM Cloud Pak for Data.

streamsx-sc replicates a subset of Streams 4.3 sc options.

streamsx-sc is supported for Streams 5.x (Cloud Pak for Data). A local install of Streams is not required; only the streamsx package needs to be installed. All functionality is implemented through the Cloud Pak for Data and Streams build service REST APIs.

Cloud Pak for Data configuration
Integrated configuration

The Streams instance (and its build service) and authentication are defined through environment variables:

  • CP4D_URL - Cloud Pak for Data deployment URL, e.g. https://cp4d_server:31843.

  • STREAMS_INSTANCE_ID - Streams service instance name.

  • STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name.

  • STREAMS_PASSWORD - Password for authentication.

Standalone configuration

The Streams build service and authentication are defined through environment variables:

  • STREAMS_BUILD_URL - Streams build service URL, e.g. when the service is exposed as node port: https://<NODE-IP>:<NODE-PORT>

  • STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name.

  • STREAMS_PASSWORD - Password for authentication.

Usage

streamsx-sc [-h] --main-composite name [--spl-path SPL_PATH]
        [--optimized-code-generation] [--no-optimized-code-generation]
        [--prefer-facade-tuples] [--ld-flags LD_FLAGS]
        [--cxx-flags CXX_FLAGS] [--c++std C++STD]
        [--data-directory DATA_DIRECTORY]
        [--output-directory OUTPUT_DIRECTORY] [--disable-ssl-verify]
        [--static-link] [--standalone-application]
        [--set-relax-fusion-relocatability-restartability]
        [--checkpoint-directory path] [--profiling-sampling rate]
        [compile-time-args [compile-time-args ...]]

Options and arguments

compile-time-args:

Pass named arguments each in the format name=value to the compiler. The name cannot contain the character = but otherwise is a free-form string. It matches the name parameter that is specified in calls that are made to the compile-time argument access functions from within SPL code. The value can be any string. See Compile-time arguments.

-M,--main-composite:

SPL main composite

-t,--spl-path:

Set the toolkit lookup paths. Separate multiple paths with :. Each path is a toolkit directory or a directory of toolkit directories. This path overrides the STREAMS_SPLPATH environment variable.

-a,--optimized-code-generation:

Generate optimized code with less runtime error checking

--no-optimized-code-generation:

Generate non-optimized code with more runtime error checking. Do not use with the --optimized-code-generation option.

-k,--prefer-facade-tuples:

Generate the facade tuples when it is possible.

-w,--ld-flags:

Pass the specified flags to ld while linking occurs.

-x,--cxx-flags:

Pass the specified flags to the C++ compiler during the build.

--c++std:

Specify the language level for the underlying C++ compiles.

--data-directory:

Specifies the location of the data directory to use.

--output-directory:

Specifies a directory where the application artifacts are placed.

--disable-ssl-verify:

Disable SSL verification against the build service

Deprecated arguments

Arguments supported by sc but deprecated. They have no effect on compilation.

-s,--static-link

-T,--standalone-application

-O,--set-relax-fusion-relocatability-restartability

-K,--checkpoint-directory

-S,--profiling-sampling

Toolkits

The application toolkit is defined as the working directory of streamsx-sc.

Local toolkits are found through the toolkit path set by --spl-path or the environment variable STREAMS_SPLPATH. Local toolkits are included in the build code archive sent to the build service if:

  • the toolkit is defined as a dependent of the application toolkit including recursive dependencies of required local toolkits.

  • and a toolkit of a higher version within the required dependency range does not exist locally or remotely on the build service.

The toolkit path for the compilation on the build service includes:

  • the application toolkit

  • local toolkits included in the build code archive

  • all toolkits uploaded on the Streams build service

  • all product toolkits on the Streams build service

The application toolkit and local toolkits included in the build archive are processed prior to the actual compilation by:

  • having any Python SPL primitive operators extracted using spl-python-extract

  • being indexed using spl-make-toolkit

New in version 1.13.

streamsx-service

Overview

Control commands for a Streaming Analytics service.

Usage

streamsx-service [-h] [--service-name SERVICE_NAME] [--full-response]
              {start,status,stop} ...

Control commands for a Streaming Analytics service.

positional arguments:
  {start,status,stop}   Supported commands
    start               Start the service instance
    status              Get the service status.
    stop                Stop the instance for the service.

optional arguments:
  -h, --help            show this help message and exit
  --service-name SERVICE_NAME
                        Streaming Analytics service name
  --full-response       Print the full JSON response.

streamsx-service stop [-h] [--force]

optional arguments:
  -h, --help  show this help message and exit
  --force     Stop the service even if jobs are running.

Controlling a Streaming Analytics service

The Streaming Analytics service to control is defined using --service-name SERVICE_NAME. If not provided then the service name is defined by the environment variable STREAMING_ANALYTICS_SERVICE_NAME.

The named service must exist in the VCAP services definition pointed to by the VCAP_SERVICES environment variable.

The response from making the control request is printed to standard out in JSON format. By default a minimal response is printed including the status of the service and the job count. The complete response from the service REST API is printed if the option --full-response is given.

streamsx-streamtool

Overview

Command line interface for IBM Streams running on IBM Cloud Pak for Data.

streamsx-streamtool replicates a subset of Streams streamtool commands, focusing on supporting DevOps for streaming applications.

streamsx-streamtool is supported for Streams Cloud Pak for Data (5.x) instances. A local install of Streams is not required; only the streamsx package needs to be installed. All functionality is implemented through the Cloud Pak for Data and Streams REST APIs.

Cloud Pak for Data configuration

The Streams instance and authentication are defined through environment variables; the details depend on whether the Streams instance is running in an integrated or standalone configuration.

Integrated configuration
  • CP4D_URL - Cloud Pak for Data deployment URL, e.g. https://cp4d_server:31843.

  • STREAMS_INSTANCE_ID - Streams service instance name.

  • STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name. Overridden by the --User option.

  • STREAMS_PASSWORD - Password for authentication.

Standalone configuration
  • STREAMS_REST_URL - Streams SWS service (REST API) URL, e.g. when the service is exposed as node port: https://<NODE-IP>:<NODE-PORT>

  • STREAMS_BUILD_URL - Streams build service (REST API) URL, e.g. when the service is exposed as node port: https://<NODE-IP>:<NODE-PORT>. Required for lstoolkit and rmtoolkit.

  • STREAMS_USERNAME - (optional) User name to submit the job as, defaulting to the current operating system user name.

  • STREAMS_PASSWORD - Password for authentication.

Usage

streamsx-streamtool submitjob [-h] [--jobConfig file-name]
        [--jobname job-name] [--jobgroup jobgroup-name]
        [--outfile file-name] [--P parameter-name]
        [--User user]
        sab-pathname

streamsx-streamtool canceljob [-h] [--force] [--collectlogs]
        [--jobs job-id | --jobnames job-names | --file file-name]
        [--User user]
        [jobid [jobid ...]]

streamsx-streamtool lsjobs [-h] [--jobs job-id] [--users user]
        [--jobnames job-names] [--fmt format-spec]
        [--xheaders] [--long] [--showtimestamp]
        [--User user]

streamsx-streamtool lsappconfig [-h] [--fmt format-spec] [--User user]

streamsx-streamtool mkappconfig [-h] [--property name=value]
        [--propfile property-file]
        [--description description] [--User user]
        config-name

streamsx-streamtool rmappconfig [-h] [--noprompt] [--User user] config-name

streamsx-streamtool chappconfig [-h] [--property name=value]
        [--description description] [--User user]
        config-name

streamsx-streamtool getappconfig [-h] [--User user] config-name

streamsx-streamtool lstoolkit [-h]
        (--all | --id toolkit-id | --name toolkit-name | --regex toolkit-regex)
        [--User user]

streamsx-streamtool rmtoolkit [-h]
        (--toolkitid toolkit-id | --toolkitname toolkit-name | --toolkitregex toolkit-regex)
        [--User user]

streamsx-streamtool uploadtoolkit [-h] --path toolkit-path [--User user]

streamsx-streamtool updateoperators [-h] [--jobname job-name]
        [--jobConfig file-name]
        [--parallelRegionWidth parallelRegionName=width]
        [--force] [--User user]
        [jobid]

submitjob

The streamtool submitjob command previews or submits one job.

Description:

A submitted job runs an application that is defined by an application bundle. Application bundles are created by the Stream Processing Language (SPL) compiler. A job consists of one or more processing elements (PEs). The PEs are placed on one or more of the application resources for the instance. The submission fails if the PE placement constraints can’t be met.

Jobs remain in the system until they are canceled or the instance is stopped.

streamsx-streamtool submitjob [-h] [--jobConfig file-name]
        [--jobname job-name] [--jobgroup jobgroup-name]
        [--outfile file-name] [--P parameter-name]
        [--User user]
        sab-pathname

Options and arguments

sab-pathname

Specifies the path name for the application bundle file. If you do not specify an absolute path, the command seeks the file in the directory where you ran the command. Alternatively, you can specify the path name for the application description language (ADL) file if the application bundle file exists in the same directory.

-g,--jobConfig:

Specifies the name of an external file that defines a job configuration overlay. You can use a job configuration overlay to set the job configuration when the job is submitted or to change the configuration of a running job.

-P,--P:

Specifies a submission-time parameter and value for the job. You can specify this option multiple times in the command.

-J,--jobgroup:

Specifies the job group. If you do not specify this option, the command uses the following job group: default.

--jobname:

Specifies the name of the job.

--outfile:

Specifies the path and file name of the output file in which the command writes the list of submitted job IDs. The path can be an absolute or relative path. If you do not specify a path, the file is created in the directory where you run the command.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

canceljob

The streamtool canceljob command cancels one or more jobs.

This command stops the processing elements (PEs) for the job and removes knowledge of the jobs and their PEs from the instance. The log files for the processing elements are scheduled for removal.

If you specify to collect the PE logs before they are removed, the operation can time out waiting for the termination of PEs. If such a timeout occurs, the operation fails and the jobs or PEs are still in the system. The canceljob command can be run again later to cancel them.

You can use the --force option to ignore a PE termination timeout and force the job to cancel.

streamsx-streamtool canceljob [-h] [--force] [--collectlogs]
        [--jobs job-id | --jobnames job-names | --file file-name]
        [--User user]
        [jobid [jobid ...]]

Options and arguments

jobid

Specifies a list of job IDs.

-f,--file:

Specifies the file that contains a list of job IDs, one per line.

-j,--jobs:

Specifies a list of job IDs, which are delimited by commas.

--jobnames:

Specifies a list of job names, which are delimited by commas.

--collectlogs:

Specifies to collect the log and trace files for each processing element that is associated with the job.

--force:

Specifies to quickly cancel a job and remove the job from the Streams data table.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

lsjobs

The streamtool lsjobs command lists the jobs in the instance.

The streamtool lsjobs command provides a health summary for each job. The health summary is an aggregation of the PE health summaries for the job. If all of the PEs for a job are reported as healthy, the job is reported as healthy. Otherwise, the job is reported as not healthy. Use the streamtool lspes command to determine the health of PEs.

The command also reports the status of each job. For more information about job states, see the IBM Streams product documentation.

The date and time that the job was submitted are presented in local time with the iso8601 format: yyyy-mm-ddThh:mm:ss+/-hhmm, where the final hhmm values are the local offset from UTC. For example: 2010-03-16T13:41:53-0500.
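The timestamp format can be reproduced with the standard library, using the example value from the text. This snippet is illustrative only and not part of the streamsx package:

```python
from datetime import datetime, timezone, timedelta

# Reproducing the timestamp format lsjobs reports: ISO 8601 with a
# numeric UTC offset and no colon, e.g. 2010-03-16T13:41:53-0500.
tz = timezone(timedelta(hours=-5))
submit_time = datetime(2010, 3, 16, 13, 41, 53, tzinfo=tz)
formatted = submit_time.strftime('%Y-%m-%dT%H:%M:%S%z')
print(formatted)  # → 2010-03-16T13:41:53-0500
```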

When job selection options are specified, selected jobs must meet all of the selection criteria. After a cancel request for a job is processed, this command no longer reports the job or its processing elements (PEs).

streamsx-streamtool lsjobs [-h] [--jobs job-id] [--users user]
        [--jobnames job-names] [--fmt format-spec]
        [--xheaders] [--long] [--showtimestamp]
        [--User user]

Options and arguments

-j,--jobs:

Specifies a list of job IDs, which are delimited by commas.

--jobnames:

Specifies a list of job names, which are delimited by commas.

-u,--users:

Specifies to select from this list of user IDs, which are delimited by commas.

--xheaders:

Specifies to exclude headings from the report.

-l,--long:

Reports launch count, full host names, and all of the operator instance names for the PEs.

--fmt:

Specifies the presentation format. The command supports the following values:

  • %Mf: Multiline record format. One line per field.

  • %Nf: Name prefixed field table format. One line per job.

  • %Tf: Standard table format, which is the default. One line per job.

--showtimestamp:

Specifies to show a time stamp in the output to indicate when the command was run.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

lsappconfig

The streamtool lsappconfig command lists the available configurations that enable connections to an external application.

Retrieve a list of configurations for making a connection to an external application.

streamsx-streamtool lsappconfig [-h] [--fmt format-spec] [--User user]

Options and arguments

--fmt:

Specifies the presentation format. The command supports the following values:

  • %Mf: Multiline record format. One line per field.

  • %Nf: Name prefixed field table format. One line per cfgname.

  • %Tf: Standard table format, which is the default. One line per cfgname.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

mkappconfig

The streamtool mkappconfig command creates a configuration that enables connection to an external application.

Operators can retrieve the configuration information to make a connection to an external application, such as an Internet of Things application. The properties include items that the application needs at runtime, like connection information and credentials.

Use this command to register properties or a properties file. Create the property file using a name=value syntax.
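The name=value property-file syntax can be illustrated with a small parser. This helper is hypothetical (not part of streamsx), and the comment-skipping behavior is an assumption of this sketch:

```python
# Illustrative parser for the name=value property file format accepted
# by --propfile: one pair per line. Skipping blank and '#' lines is an
# assumption of this sketch, not documented streamsx behavior.
def parse_propfile(text: str) -> dict:
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, _, value = line.partition('=')
        props[name.strip()] = value.strip()
    return props
```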

streamsx-streamtool mkappconfig [-h] [--property name=value]
        [--propfile property-file]
        [--description description] [--User user]
        config-name

Options and arguments

config-name:

Name of the app config

--description:

Specifies a description for the application configuration. The description can be 1024 characters in length. If the description contains blank characters, it must be enclosed in single or double quotation marks. Quotation marks within the description must be preceded by a backslash (\).

--property:

Specifies a property name and value pair to add to or change in the configuration. This option can be specified multiple times and has an additive effect.

--propfile:

Specifies the path to a file that contains a list of application configuration properties for connecting to an external application. The properties are listed as name=value pairs, each on a separate line. Use this option as a way to include multiple configuration properties when you create an application configuration. Options that you specify at the command line override values that are specified in this property file.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

rmappconfig

The streamtool rmappconfig command removes a configuration that enables connection to an external application.

This command removes a configuration that is used for making a connection to an external application.

streamsx-streamtool rmappconfig [-h] [--noprompt] [--User user] config-name

Options and arguments

config-name:

Name of the app config

--noprompt:

Specifies to suppress confirmation prompts.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

chappconfig

The streamtool chappconfig command updates a configuration that enables connection to an external application.

Use this command to change the configuration properties that are used to make a connection to an external application, such as an Internet of Things application. You can change the values of properties or add new properties.

streamsx-streamtool chappconfig [-h] [--property name=value]
        [--description description] [--User user]
        config-name

Options and arguments

config-name:

Name of the app config

--description:

Specifies a description for the application configuration. The description can be 1024 characters in length. If the description contains blank characters, it must be enclosed in single or double quotation marks. Quotation marks within the description must be preceded by a backslash (\).

--property:

Specifies a property name and value pair to add to or change in the configuration. This option can be specified multiple times and has an additive effect.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

getappconfig

The streamtool getappconfig command displays the properties of a configuration that enables connection to an external application.

This command retrieves the properties and values of a specific configuration for connecting to an external application.

streamsx-streamtool getappconfig [-h] [--User user] config-name

Options and arguments

config-name:

Name of the app config

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

lstoolkit

List toolkits from a build service.

streamsx-streamtool lstoolkit [-h]
        (--all | --id toolkit-id | --name toolkit-name | --regex toolkit-regex)
        [--User user]

Options and arguments

-a,--all:

List all toolkits

-i,--id:

List a specific toolkit given its toolkit id

-n,--name:

List all toolkits with this name

-r,--regex:

List all toolkits where the name matches the given regex pattern

rmtoolkit

Remove toolkits from a build service.

streamsx-streamtool rmtoolkit [-h]
        (--id toolkit-id | --name toolkit-name | --regex toolkit-regex)
        [--User user]

Options and arguments

-i,--id:

Specifies the id of the toolkit to delete

-n,--name:

Remove all toolkits with this name

-r,--regex:

Remove all toolkits where the name matches the given regex pattern

uploadtoolkit

Upload a toolkit to a build service.

streamsx-streamtool uploadtoolkit [-h] --path toolkit-path [--User user]

Options and arguments

-p,--path:

Specifies the path of the indexed toolkit to upload

New in version 1.13.

updateoperators

Adjust a job configuration while the job is running in order to improve the job performance.

streamsx-streamtool updateoperators [-h] [--jobname job-name]
        [--jobConfig file-name]
        [--parallelRegionWidth parallelRegionName=width]
        [--force] [--User user]
        [jobid]

Options and arguments

jobid:

Specifies a job ID

--jobname:

Specifies the name of the job

-g,--jobConfig:

Specifies the name of an external file that defines a job configuration overlay. You can use a job configuration overlay to set the job configuration when the job is submitted or to change the configuration of a running job.

--parallelRegionWidth:

Specifies a parallel region name and its width.

--force:

Specifies whether to automatically stop the PEs that need to be stopped.

-U,--User:

Specifies an IBM Streams user ID that has authority to run the command.

Environments

IBM Streaming Analytics service

Overview

IBM® Streaming Analytics for IBM Cloud is powered by IBM® Streams, an advanced analytic platform that you can use to ingest, analyze, and correlate information as it arrives from different types of data sources in real time. When you create an instance of the Streaming Analytics service, you get your own instance of IBM® Streams running in IBM® Cloud, ready to run your IBM® Streams applications.

Package support

This streamsx package supports:

Accessing a service

In order to use a Streaming Analytics service you must have access to credentials for the service. There are two mechanisms used by this package, VCAP services and direct use of Streaming Analytics credentials.

VCAP services

This is the format used by Cloud Foundry for bindable services. The service key for the Streaming Analytics service is streaming-analytics; the value of that key in the VCAP services is a list of accessible services, with each service represented by a separate object.

Each streaming analytics object must have these keys:

  • name identifying the name of the service.

  • credentials identifying the connection credentials for the service.

Example VCAP services containing two Streaming Analytics services sa-test and sa-prod (with the specific connection details elided):

{
  "streaming-analytics": [
    {
      "name": "sa-test",
      "credentials": {
        "apikey": "...",
        "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - ...",
        "iam_apikey_name": "auto-generated-apikey-...",
        "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
        "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity ...",
        "v2_rest_url": "https://streams-app-service.ng.bluemix.net/v2/streaming_analytics/..."
      }
    },
    {
      "name": "sa-prod",
      "credentials": {
        "apikey": "...",
        "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - ...",
        "iam_apikey_name": "auto-generated-apikey-...",
        "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
        "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity ...",
        "v2_rest_url": "https://streams-app-service.ng.bluemix.net/v2/streaming_analytics/..."
      }
    }
  ]
}

Note

The specific keys in the credentials may differ depending on the service plan.
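As an illustration of the structure above, the credentials object for a named service can be extracted from the VCAP services JSON using only the Python standard library. The helper below is a sketch and is not part of the streamsx API:

```python
import json


def credentials_for(vcap_json: str, service_name: str) -> dict:
    """Return the credentials object for the named Streaming Analytics service.

    vcap_json is the VCAP services information as a JSON string, typically
    the value of the VCAP_SERVICES environment variable.
    """
    vcap = json.loads(vcap_json)
    # Each entry under the streaming-analytics key is one accessible service.
    for service in vcap.get("streaming-analytics", []):
        if service.get("name") == service_name:
            return service["credentials"]
    raise ValueError("no Streaming Analytics service named " + service_name)
```

For example, credentials_for(os.environ["VCAP_SERVICES"], "sa-test") would return the first credentials object shown above.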

Cloud Foundry applications

When a Streaming Analytics service is bound to a Cloud Foundry Python application the environment variable VCAP_SERVICES is automatically defined and contains a string representation of the JSON VCAP services information.

Client applications

Client applications are those that run outside of IBM Cloud, for example on a local laptop, or applications that are not bound to a service.

A client application must define valid VCAP services information in its JSON format as either:

  • In the environment variable VCAP_SERVICES containing a string representation of the JSON VCAP services information.

  • In a file containing a string representation of the JSON VCAP services information and have the file’s absolute path in either:

The contents of the file must be created manually. The credentials for the credentials key are obtained from the Streaming Analytics management console: select the Service Credentials page and then copy the required credentials. You may need to first create credentials. You can copy the credentials by taking the View credentials action and then clicking the copy-to-clipboard icon on the right-hand side.

Warning

The credential information in VCAP services is in plain text. Ensure that any file containing the information, or any script setting the environment variable, has suitable permissions set; for example, readable only by the intended user.
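A minimal sketch of creating such a file with owner-only permissions, using only the standard library (the helper name is illustrative):

```python
import os


def write_private(path: str, contents: str) -> None:
    """Create the file with mode 0o600 (readable and writable only by the owner)."""
    # os.open applies the mode at creation time, avoiding a window where
    # the file briefly exists with default permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(contents)
```

The VCAP services JSON would then be written with, for example, write_private("/home/user/.vcap.json", vcap_json).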

Selecting the service

The Streaming Analytics service to use is specified by its name; the required service must exist in the VCAP services information under the name key.

The name of the service to use is set by:

  • the environment variable STREAMING_ANALYTICS_SERVICE_NAME.

  • the configuration property SERVICE_NAME when submitting an application using submit() with context type STREAMING_ANALYTICS_SERVICE. This overrides the environment variable STREAMING_ANALYTICS_SERVICE_NAME.

  • the --service-name option to streamsx-runner.
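The override precedence described above can be sketched in plain Python; the helper and the "service_name" configuration key below are illustrative only and not part of the streamsx API:

```python
import os


def resolve_service_name(config: dict) -> str:
    """Resolve the Streaming Analytics service name to use.

    A service name in the submission configuration (here under the
    hypothetical key "service_name") overrides the
    STREAMING_ANALYTICS_SERVICE_NAME environment variable.
    """
    name = config.get("service_name") or os.environ.get("STREAMING_ANALYTICS_SERVICE_NAME")
    if not name:
        raise ValueError("no Streaming Analytics service name configured")
    return name
```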

Service definition

The Streaming Analytics service to use may be specified solely using its credentials. The credentials are specified:

Credentials are obtained from the Streaming Analytics management console: select the Service Credentials page and then copy the required credentials. You may need to first create credentials. You can copy the credentials by taking the View credentials action and then clicking the copy-to-clipboard icon on the right-hand side.

IBM Streams Python setup

Developer setup

Developers install the streamsx package from the Python Package Index (PyPI) to use this functionality:

pip install streamsx

If it is already installed, upgrading to the latest version is recommended:

pip install --upgrade streamsx

A local install of IBM Streams is not required when:

  • Using the Streams and Streaming Analytics REST bindings streamsx.rest.

  • Developing and submitting streaming applications using streamsx.topology.topology to Cloud Pak for Data or the Streaming Analytics service on IBM Cloud.

    • The environment variable JAVA_HOME must reference a Java JRE or JDK/SDK version 8 or higher.

A local install of IBM Streams is required when:

  • Developing and submitting streaming applications using streamsx.topology.topology to IBM Streams 4.2, 4.3 distributed or standalone contexts.

    • If set, the environment variable JAVA_HOME must reference a Java JRE or JDK/SDK version 8 or higher; otherwise the Java install from $STREAMS_INSTALL/java is used.

  • Creating SPL toolkits with Python primitive operators using streamsx.spl.spl decorators for use with 4.2, 4.3 distributed or standalone applications.

Warning

When using the streamsx package ensure that the environment variable PYTHONPATH does not include a path ending with com.ibm.streamsx.topology/opt/python/packages. The IBM Streams environment configuration script streamsprofile.sh modifies or sets PYTHONPATH to include the Python support from the SPL topology toolkit shipped with the product; this was needed to support Python before the streamsx package was available. The recommendation is to unset PYTHONPATH or modify it so that it does not include the path to the topology toolkit.
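The recommended cleanup can be sketched as a small helper (illustrative only, not part of streamsx) that strips any toolkit entries from a PYTHONPATH value:

```python
import os

# Suffix of the toolkit path that streamsprofile.sh may have added.
TOOLKIT_SUFFIX = "com.ibm.streamsx.topology/opt/python/packages"


def without_toolkit_path(pythonpath: str) -> str:
    """Return a PYTHONPATH value with any SPL topology toolkit entries removed."""
    entries = [p for p in pythonpath.split(os.pathsep)
               if p and not p.rstrip("/").endswith(TOOLKIT_SUFFIX)]
    return os.pathsep.join(entries)
```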

Note

The streamsx package is self-contained and does not depend on any SPL topology toolkit (com.ibm.streamsx.topology) installed under $STREAMS_INSTALL/toolkits or on the SPL compiler’s (sc) toolkit path. This is true at SPL compilation time and runtime.

Streaming Analytics service

The service instance has Anaconda installed with Python 3.6 as the runtime environment and has the PYTHONHOME Streams application environment variable pre-configured.

Any streaming applications using Python must use Python 3.6 when submitted to the service instance. The streamsx package must be installed locally and applications are submitted to the STREAMING_ANALYTICS_SERVICE context.

IBM Cloud Pak for Data

An IBM Streams service instance within Cloud Pak for Data has Anaconda installed with Python 3.6 as the runtime environment and has the PYTHONHOME Streams application environment variable pre-configured.

Any streaming applications using Python must use Python 3.6 when submitted to the service instance.

Streaming applications can be submitted through Jupyter notebooks running in Cloud Pak for Data projects. The streamsx package is preinstalled and applications are submitted to the DISTRIBUTED context.

Streaming applications can be submitted externally to the OpenShift cluster containing Cloud Pak for Data. The streamsx package must be installed locally and applications are submitted to the DISTRIBUTED context. The specific environment variables depend on whether the Streams instance is in an integrated or standalone configuration. See DISTRIBUTED for details.

IBM Streams 4.2, 4.3

For a distributed cluster running Streams, Python 3.7, 3.6, or 3.5 may be used.

Anaconda or Miniconda distributions may be used as the Python runtime; these have the advantage of being pre-built and including a number of standard packages. Anaconda installs may be downloaded at: https://www.continuum.io/downloads .

If building Python from source, it must be built to support embedding of the runtime with shared libraries (the --enable-shared option to configure).
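For example, a typical source build with shared-library support (the install prefix here is illustrative):

```shell
./configure --enable-shared --prefix=/opt/python36
make
make install
```

The shared-library directory of such an install (for example /opt/python36/lib) may also need to be on LD_LIBRARY_PATH when running outside the configured environment.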

Distributed

For distributed contexts, the Streams application environment variable PYTHONHOME must be set to the Python install path.

This is set using streamtool as:

streamtool setproperty --application-ev PYTHONHOME=path_to_python_install

The application environment variable may also be set using the Streams console. The Instance Management view has an Application Environment Variables section. Expanding the details for that section allows modification of the set of environment variables available to Streams applications.

The Python install path must be accessible on every application resource that will execute Python code within a Streams application.

Note

The Python version used to declare and submit the application must be compatible with the setting of PYTHONHOME in the instance. For example, if the PYTHONHOME Streams application environment variable points to a Python 3.6 install, then Python 3.5 or 3.6 can be used to declare and submit the application.

Standalone

The environment variable PYTHONHOME must be set to the Python install path.

Bundle Python version compatibility

As of 1.13, Streams application bundles (sab files) invoking Python are binary compatible with a range of Python releases when using Python 3.

The minimum version supported is the version of Python used during bundle creation.

The maximum version supported is the highest version of Python with a proposed release schedule.

For example, if a sab is built with Python 3.6, then it can be submitted to a Streams instance using Python 3.6 or higher, up to and including 3.9, which is the highest Python release with a proposed release schedule as of 1.13.
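This rule can be sketched as a simple version-range check; the helper below is illustrative only and not part of the streamsx API:

```python
def bundle_compatible(build, runtime, max_scheduled=(3, 9)):
    """True if a bundle built with Python version `build` can run with `runtime`.

    All arguments are (major, minor) tuples. The runtime must be at least
    the version used to build the bundle and no higher than the highest
    Python release with a proposed release schedule (3.9 as of 1.13).
    """
    return build <= runtime <= max_scheduled
```

For instance, a bundle built with Python 3.6 is accepted on a 3.8 runtime but rejected on 3.5.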

Note

Compatibility across Python releases is dependent on Python's Stable Application Binary Interface.

Restrictions and known bugs

  • No support for nested parallel regions at sources, i.e. nested streamsx.topology.topology.Stream.set_parallel(), for example:

    topo = Topology()
    s = topo.source(S())
    s.set_parallel(3).set_parallel(2)
    

    In this example, set_parallel(3) is ignored.

  • No support for nested types when defining stream schemas, for example:

    class NamedTupleNestedTupleSchema(typing.NamedTuple):
        key: str
        spotted: SpottedSchema
    
  • No support for collections of NamedTuple types as a stream schema, for example:

    class NamedTupleListOfTupleSchema(typing.NamedTuple):
        spotted: typing.List[SpottedSchema]
    
  • Python Composites (derived from streamsx.topology.composite.Composite) can have only one input port.

  • No support for processing window markers or the final marker (end of stream) in Python callables, as is possible in SPL operators.

  • No hook for drain processing in a consistent region for Python callables.

  • Submission time parameters, which are defined in SPL composites of other toolkits, or created by using streamsx.spl.op.Expression in the topology, cannot be accessed at runtime with streamsx.ec.get_submission_time_value(name).
