Processing Basics

Data processing essentials

Semantria is an asynchronous API. This means:

  • You submit content to us and retrieve content separately.
  • You can scale your content submission rate, because you are not waiting on us to hand data back before you can submit more.
  • You may receive data back in a different order than you submitted it. Batches of content are not necessarily preserved.
      • If you use the callback retrieval mechanism, the batches remain the same, but the order might be different.
      • If you use auto response or polling, the batch membership may also change.
  • If you have multiple machines sending and receiving content, one machine may receive a processed document that was submitted by another.
  • Every piece of content is processed by a Semantria configuration. If you don't specify one, the default English configuration will be used.

Handling Failures

Several types of failure can occur during the submission and processing of content.

  1. The submission itself is invalid in some way, such as malformed JSON. In this case no documents are queued and no API credits are deducted. You need to correct the errors and resubmit. You will know the submission is invalid if you receive anything other than a 200-series HTTP status response.
  2. The submission is valid but the content itself fails. In this case you will receive the document back with a FAILED status and an error message stating why it failed. Credits are deducted for this. You should not resubmit the failed content, as it will simply fail again. The most common causes of document failure are submitting content to a configuration for the wrong language (sending Arabic content to an English config, for instance) and content that does not have enough text to analyze (such as ASCII art).
  3. The submission is valid but a limit has been exceeded (such as your license's document submission rate). In this case you may be able to resubmit the content after a short delay.
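
The three cases above can be turned into a small decision routine on your side. The following is a minimal sketch, not official client code: the `status` field name, the use of HTTP 429 to signal an exceeded limit, and the exponential delay schedule are all assumptions made for illustration, so check them against the responses your account actually returns.

```python
from enum import Enum


class Action(Enum):
    ACCEPTED = "accepted"              # 200-series: documents were queued
    FIX_AND_RESUBMIT = "fix"           # case 1: invalid submission, nothing queued
    RETRY_LATER = "retry"              # case 3: a limit was exceeded


# Assumption: the status code(s) the service uses for an exceeded limit.
LIMIT_STATUS_CODES = frozenset({429})


def classify_submission(http_status: int) -> Action:
    """Decide what to do with a submission based on its HTTP status."""
    if 200 <= http_status < 300:
        return Action.ACCEPTED
    if http_status in LIMIT_STATUS_CODES:
        return Action.RETRY_LATER
    return Action.FIX_AND_RESUBMIT


def split_results(docs):
    """Case 2: separate FAILED documents so they are logged, not resubmitted.

    Assumption: each returned document is a dict with a "status" field.
    """
    processed, failed = [], []
    for doc in docs:
        (failed if doc.get("status") == "FAILED" else processed).append(doc)
    return processed, failed


def retry_delays(max_attempts: int = 5, base_seconds: float = 1.0):
    """Illustrative delay schedule (in seconds) for case 3 resubmissions."""
    return [base_seconds * 2 ** attempt for attempt in range(max_attempts)]
```

Keeping the FAILED documents in their own list makes it easy to record the error message for each one without sending them back through the queue.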

Keeping Track Of Your Content

Because order and batch membership are not preserved on the Semantria side, it is up to you to keep track of what you submitted and received. There are several ways for you to identify your content.

  1. Each document must have a unique id associated with it. This is returned to you by Semantria when you receive the processed data. You can use this id to update the status on your side.
  2. Each document can also have a tag field. This is a string field you can fill in with additional information you might use to keep track of documents, such as a project ID. You can check on your side to see that you submitted 1,000 documents for tag "my_project" and received 1,000 documents back with that tag.
  3. You can submit and retrieve by job_id. This is a string value you can set when you submit and retrieve documents. It is intended to allow you to separate out processing streams of content for routing or failover purposes on your side, not as a unique ID per batch of content.
      • If you submit by a job_id, you must retrieve via that same job_id.
      • The total number of unique job_ids you use during a 24-hour period must not exceed 100.
  4. Each document can also have a metadata field. This is a JSON object that can contain arbitrary additional data related to the document, such as source or demographic info. Either the metadata or tag fields can be used for routing analysis results on your side.
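
One way to do this bookkeeping is to record every document at submission time and tick it off when it comes back, in whatever order that happens. The sketch below is illustrative only: `SubmissionTracker` is a hypothetical helper, and the `id` and `tag` field names are assumptions about the document JSON.

```python
from collections import Counter


class SubmissionTracker:
    """Tracks which submitted documents have not come back yet."""

    def __init__(self):
        # Maps (job_id, document id) to the tag the document was sent with.
        self._pending = {}

    def record_submitted(self, doc, job_id=None):
        self._pending[(job_id, doc["id"])] = doc.get("tag")

    def record_received(self, doc, job_id=None):
        """Return True if we were waiting for this document.

        False means it was submitted elsewhere, for example by
        another machine sharing the same account.
        """
        key = (job_id, doc["id"])
        if key in self._pending:
            del self._pending[key]
            return True
        return False

    def outstanding_by_tag(self):
        """Count documents still awaiting results, grouped by tag."""
        return Counter(self._pending.values())
```

For example, after submitting 1,000 documents tagged "my_project", `outstanding_by_tag()["my_project"]` should fall to 0 once all of them have been received.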

🚧

Duplicate Document ID

It is possible to send two documents with the same document id, as long as you send them to different job ids. If you send two documents with the same document id to the same job id, the latter document sent may overwrite the former. Data loss may occur.
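
A simple guard against this is to check for repeated ids on your side before you submit. The sketch below is one possible approach, not part of the API: `find_duplicates` is a hypothetical helper, `in_flight` is a set of (job_id, document id) pairs you maintain yourself, and the `id` field name is an assumption.

```python
def find_duplicates(batch, job_id=None, in_flight=None):
    """Return document ids that would collide within a single job_id.

    batch:     list of document dicts about to be submitted
    in_flight: (job_id, document id) pairs already submitted and
               not yet retrieved
    """
    seen = set(in_flight or ())
    duplicates = []
    for doc in batch:
        key = (job_id, doc["id"])
        if key in seen:
            duplicates.append(doc["id"])
        seen.add(key)
    return duplicates
```

The same document id under a different job_id is not reported, which matches the rule described above.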

Data Processing

Queue: submit a batch of documents for Detailed analysis

  • Queue with a POST request; the server responds with an HTTP status.

Retrieve: return all processed documents.

  • Retrieve with a GET request and the server will return the results of all documents that have been processed. It will return nothing if no documents have been processed yet.
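
Because retrieval can come back empty, a polling client typically repeats the GET until everything it is waiting for has arrived. The following is a minimal sketch under stated assumptions: `fetch` is a hypothetical callable that wraps your GET request and returns a list of document dicts (or nothing), the `id` field name is assumed, and the interval and poll limit are arbitrary illustrative values.

```python
import time


def poll_until_done(fetch, expected_ids, interval_seconds=1.0,
                    max_polls=30, sleep=time.sleep):
    """Call fetch() repeatedly until all expected ids have been seen.

    Returns (results, remaining): results maps document id to the
    returned document, remaining is the set of ids never received.
    """
    remaining = set(expected_ids)
    results = {}
    for _ in range(max_polls):
        for doc in fetch() or []:   # an empty or missing response is fine
            results[doc["id"]] = doc
            remaining.discard(doc["id"])
        if not remaining:
            break
        sleep(interval_seconds)
    return results, remaining
```

The poll limit matters when several machines share an account: a document you submitted may be retrieved by another machine, so waiting forever for it is not safe.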

Processing Methods

Queuing Documents

📘

Queuing: submitting a batch of documents for analysis.

Users must queue documents into the API for processing. A document can be processed with:

  • a specific configuration (by setting the using flag to a particular config_id)
  • a specific language (by setting the using flag to a language code)
  • a specific industry template (by setting the using flag to the industry template id)
  • the default configuration (by not setting the using flag)

Single documents under 2KB in size should come back in a few seconds.

The URL is https://api.semantria.com/document.json?using=
The request body should contain a JSON list of objects with two fields: the document ID and the text to be analyzed. There are several optional fields that can be included for each document, such as metadata and tag.
After you submit documents to be queued, each document is analyzed independently of the others. The Semantria API returns an analysis for each document.
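
The request described above can be assembled as follows. This is a sketch under assumptions, not official client code: the field names `id` and `text` are assumed, `build_queue_request` is a hypothetical helper, and authentication is omitted because this page does not cover it. The function only builds the request; it does not send it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.semantria.com/document.json"


def build_queue_request(documents, using=None):
    """Build (but do not send) a POST request that queues documents.

    documents: list of dicts, each with at least "id" and "text"
               (assumed field names); optional fields such as "tag"
               and "metadata" are passed through unchanged.
    using:     config_id, language code, or industry template id;
               omit it to use the default configuration.
    """
    for doc in documents:
        if "id" not in doc or "text" not in doc:
            raise ValueError("each document needs an id and text")
    url = BASE_URL
    if using:
        url += "?" + urlencode({"using": using})
    body = json.dumps(documents).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})
```

Validating the two required fields locally avoids sending a submission that would be rejected as invalid.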

📘

The server will process each document independently of any other processes or documents. Documents will not influence each other in processing.