3.5.1

10-10-2013

Features

  • Introduced support for Chinese. Semantria now supports all Lexalytics Salience features for the Chinese language except opinions and relationships.
  • HTML processing is now supported out of the box. Switch on HTML processing in the Semantria configuration and Semantria will strip HTML tags from content automatically.
  • Introduced the Auto-Categories feature for Basic Mode. Built on Lexalytics’ Salience Concept Topics, Auto-Categories are generated automatically from the Wikipedia taxonomy (720+ nodes).
  • Added Mentions output for Themes and Entities in collection processing mode (Semantria Discovery Mode). Limits for both Theme and Entity mentions can be set in the Discovery Mode section of the configuration.
  • Added a “Normalized” field to custom entity configuration through the API. The Normalized field allows different entity names to be mapped to a common one (e.g. Big Blue, Intelligent Business Machines, and IBM all become IBM).
  • Introduced query grammar for entity configuration. Entity extraction can now be controlled with Boolean syntax (for example, IBM = IBM OR “I.B.M.” OR “Big Blue” OR “Intelligent Business Machines”).
  • Added a Tag field to incoming and outgoing documents. It can be used as a free-text marker that Semantria returns to the user.
  • Added an “Is inverted” marker to words returned by the POS tagging feature. The marker indicates whether a word is part of a sentiment-inverting construction, such as a negator.
  • Added a “Label” field to the Topics (concept/query) output for both Basic and Discovery modes. A label can contain any descriptive information about the topic, as with Semantria Named Entities.
  • Enhanced the “Location” output for Discovery mode. It now returns not only the index number of the document where the entity was mentioned, but also the byte offset and length of the mention.
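
The alias-to-normalized-name mapping described for the “Normalized” field can be sketched in a few lines of Python. This is only an illustration of the idea, not Semantria's configuration format or API: the ALIASES table, the build_lookup and normalize helpers, and the case-insensitive matching are all assumptions made for the example.

```python
# Hypothetical alias table: normalized name -> names that should map to it.
# This is an illustration only, not Semantria's configuration schema.
ALIASES = {
    "IBM": ["IBM", "I.B.M.", "Big Blue", "Intelligent Business Machines"],
}


def build_lookup(aliases):
    """Flatten the alias table into a name -> normalized-name dictionary."""
    lookup = {}
    for normalized, names in aliases.items():
        for name in names:
            # Case-insensitive matching is an assumption of this sketch.
            lookup[name.casefold()] = normalized
    return lookup


def normalize(name, lookup):
    """Return the normalized form of name, or name itself if it is unknown."""
    return lookup.get(name.casefold(), name)


LOOKUP = build_lookup(ALIASES)
print(normalize("Big Blue", LOOKUP))
print(normalize("Acme", LOOKUP))
```

Names with no entry in the table pass through unchanged, so the mapping only affects entities that were explicitly configured.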

Fixes

  • Fixed a bug that produced wrong Location offsets for Mentions in content containing multi-byte symbols.
  • Fixed a bug that prevented output limits in a configuration from exceeding 20. The new API allows any limit to be set on demand and doesn’t reset it on configuration update.
  • Fixed a bug with incorrect evidence values for Themes and Entities. Values are now spread between 1 and 7 instead of always being 4 or 7.
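
Because Location reports a byte offset and length, content with multi-byte symbols has to be sliced as bytes rather than as characters. The Python sketch below shows the difference. It assumes the offsets refer to the UTF-8 encoding of the document, which these notes do not state, and the mention_text and byte_span helpers are hypothetical names introduced for the example.

```python
def mention_text(document, byte_offset, byte_length, encoding="utf-8"):
    """Recover a mention from a byte offset and length.

    Assumes offsets are counted in bytes of the given encoding
    (UTF-8 here is an assumption, not something the notes specify).
    """
    raw = document.encode(encoding)
    return raw[byte_offset:byte_offset + byte_length].decode(encoding)


def byte_span(document, needle, encoding="utf-8"):
    """Return the (byte offset, byte length) of the first occurrence of needle."""
    char_index = document.index(needle)
    offset = len(document[:char_index].encode(encoding))
    length = len(needle.encode(encoding))
    return offset, length


doc = "Café owners praised IBM."
offset, length = byte_span(doc, "IBM")
print(offset, length)                      # 21 3 (the character index is 20)
print(mention_text(doc, offset, length))   # IBM
print(doc[offset:offset + length])         # BM. (character slicing is off by one)
```

Each multi-byte symbol before a mention shifts its byte offset further from its character index, so treating one as the other yields the kind of wrong span shown on the last line.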