Introduction to PubNub

PubNub is a low-latency, real-time message-passing framework that enables communication at a global scale. Messages are sent and received over communication “channels” through the PubNub data stream network using the PubNub API. PubNub BLOCKS are micro-services that can alter or passively monitor these messages mid-flight.

Lexalytics now offers a text analytics BLOCK which makes the power of Semantria available to the PubNub community. Properly formatted documents published to the “lexalytics-channel” are forwarded to Semantria for processing. The results are then published on the “semoutput” channel.

Block Operation

Each publication to “lexalytics-channel” initiates a process that copies the input documents to a local queue and then attempts to perform the following steps:

  1. Submit the locally queued documents to Semantria.

  2. Retrieve processed documents from Semantria and publish them to the “semoutput” channel.


Step 1 (submission to Semantria) will only initiate when the number of documents in the local queue exceeds the Batch Limit of the Semantria subscription OR when the elapsed time since the last submission to Semantria exceeds the Batch Delay of 30 seconds. The Batch Limit is the maximum number of documents that can be sent to Semantria in a single batch.


Step 2 (retrieval from Semantria) will only initiate if the elapsed time since the last retrieval from Semantria exceeds the Poll Delay of 30 seconds.

These limitations are in place to lessen the communication burden between PubNub and Semantria.
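The gating rules for the two steps can be sketched as follows. This is only an illustration of the conditions described above, not the block's actual code: the function names, parameter names, and the use of seconds as the time unit are assumptions made for this example. The strict "greater than" comparisons follow the word "exceeds" in the text.

```python
# Illustrative sketch of the gating rules; all names here are hypothetical.
BATCH_DELAY_SECONDS = 30  # Batch Delay stated in the text
POLL_DELAY_SECONDS = 30   # Poll Delay stated in the text


def should_submit(queue_length, batch_limit, last_submit_time, now):
    """Step 1 runs when the queue exceeds the Batch Limit OR the Batch Delay has elapsed."""
    queue_is_full = queue_length > batch_limit
    delay_elapsed = (now - last_submit_time) > BATCH_DELAY_SECONDS
    return queue_is_full or delay_elapsed


def should_poll(last_poll_time, now):
    """Step 2 runs only when the Poll Delay has elapsed since the last retrieval."""
    return (now - last_poll_time) > POLL_DELAY_SECONDS
```

For instance, with a Batch Limit of 100, a queue of 5 documents would not be submitted 10 seconds after the last submission, but would be submitted once more than 30 seconds have passed.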

Submitting Documents

To use the block, documents need to be published as a properly formatted JSON object to the “lexalytics-channel”. Here is an example with two documents:

    {
        "docs": [
            {"text": "This is document #1."},
            {"text": "This is document #2.", "id": "123"}
        ]
    }

Note that the documents are sent as dictionaries in an array called “docs”. The text of each document is defined by the “text” field. An optional document ID can be specified using the “id” field. If an ID is not specified, then a random, numeric ID will be generated. All fields must be strings.
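As a minimal sketch, a message in this format could be assembled in Python before it is published. The helper name `build_payload` is hypothetical and not part of any PubNub or Semantria API. It only builds the JSON object; publishing it to “lexalytics-channel” is left to whichever PubNub SDK is in use.

```python
import json


def build_payload(documents):
    """Build the message body: a "docs" array of dictionaries with string fields."""
    docs = []
    for doc in documents:
        # "text" is required; coerce to a string because all fields must be strings.
        entry = {"text": str(doc["text"])}
        # "id" is optional; when omitted, the block generates a random numeric ID.
        if doc.get("id") is not None:
            entry["id"] = str(doc["id"])
        docs.append(entry)
    return {"docs": docs}


payload = build_payload([
    {"text": "This is document #1."},
    {"text": "This is document #2.", "id": 123},  # integer ID is converted to "123"
])
message = json.dumps(payload)  # JSON text ready to be published
```

Converting every field with `str` is a design choice for this sketch, so that a numeric ID supplied by the caller still satisfies the strings-only requirement.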

Retrieving Documents

The block attempts to poll Semantria for results every time it receives a message on “lexalytics-channel”, regardless of whether the message included any documents. If Semantria returns results, they are published to “semoutput”.

To attempt to poll Semantria (or to attempt a submission) without sending new documents, publish a blank message to “lexalytics-channel”: