Data Basics

Data is the content a user wants to analyze.

Data requirements

All data sets must have a minimum of two fields:

  • Unique ID
  • Text

Any additional fields are considered metadata.
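The two requirements above can be checked before uploading. The Python sketch below assumes a CSV file whose required columns are named exactly ID and Text (real files may use other header names and map them during field set up). The function check_data_set is hypothetical and is not part of Semantria Storage & Visualization.

```python
import csv
import io


def check_data_set(csv_text: str) -> list[str]:
    """Validate the minimum fields and return the metadata column names.

    Hypothetical helper: assumes the required columns are named "ID" and "Text".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    for required in ("ID", "Text"):
        if required not in columns:
            raise ValueError(f"missing required field: {required}")

    # Every document needs a unique ID.
    seen_ids = set()
    for row in reader:
        doc_id = row["ID"]
        if doc_id in seen_ids:
            raise ValueError(f"duplicate ID: {doc_id}")
        seen_ids.add(doc_id)

    # Every column other than ID and Text is metadata.
    return [c for c in columns if c not in ("ID", "Text")]


sample = "ID,Text,Purchase Location\n1,Great service!,Online\n2,Arrived broken.,In-Store\n"
print(check_data_set(sample))  # ['Purchase Location']
```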

Metadata is data about data. A metadata field can be either a standard field or a custom field. Standard fields are based on Dublin Core fields.

Custom fields are non-Dublin Core and require manual configuration by the user.

The list of Standard fields is as follows:

Field | Description | Example
Creator | The author of a document | Jane Doe
Date | When the document was created | August 15, 2019
Engagement | Metrics on engagement with the document | Number of comments, views, or retweets
ID | A unique document identifier, used by the API to organize documents | 1, 2, 3, 4...
Language | The written language of a document | English, Spanish, Portuguese
Parent | The parent element of the document | If the document is a retweet, the Parent field could contain the ID of the original tweet. If the document is a comment, the Parent could be a link to, or the ID of, the original document.
Publisher | An entity responsible for making the resource available | A person, an organization, or a service. Typically, the name of the Publisher should be used to indicate the entity.
Reach | The audience reach of a document (usually in the context of social media posts) | 150,000 followers
Source | A related source from which the document is derived | Twitter, Amazon, New York Times
Tags | Identifying labels meant for human readers | high-priority, low-priority
Text | A document comprising natural language feedback | A tweet, call log, or survey response
Title | The title of a natural language document | Titles are frequently part of reviews by travelers, employees, and diners, e.g. "Fun Workplace Culture"
Type | The nature or genre of the resource | Used to describe the file format, physical medium, or dimensions of the resource
URL | A web link to the original natural language document | tripadvisor.com/example123

Example data set

The table below is an example of a very small data set.

The data set has the minimum ID and text columns, as well as a single metadata column labeled Purchase Location.

ID | Text | Purchase Location
1 | I liked the product but found the build quality lacking. Would not recommend for serious users, casual use should be fine. | In-Store
2 | Product arrived broken. When I contacted support they immediately sent out a replacement that arrived in working condition. Great service! | Online
3 | I didn't like the product and tried to return it. Customer support was rude to me and only gave me store credit not a full refund. | Online
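For reference, the same three rows can be written out as the .CSV or JSON files that the upload step accepts. This Python sketch uses only the standard library. The JSON layout (a list of objects, one per document, keyed by column name) is an assumption; this page does not specify the JSON structure Semantria Storage & Visualization expects.

```python
import csv
import io
import json

# The example data set, one dict per document.
rows = [
    {
        "ID": 1,
        "Text": "I liked the product but found the build quality lacking. Would not recommend for serious users, casual use should be fine.",
        "Purchase Location": "In-Store",
    },
    {
        "ID": 2,
        "Text": "Product arrived broken. When I contacted support they immediately sent out a replacement that arrived in working condition. Great service!",
        "Purchase Location": "Online",
    },
    {
        "ID": 3,
        "Text": "I didn't like the product and tried to return it. Customer support was rude to me and only gave me store credit not a full refund.",
        "Purchase Location": "Online",
    },
]


def to_csv(records: list[dict]) -> str:
    """Serialize records as CSV; fields that contain commas are quoted automatically."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ID", "Text", "Purchase Location"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


csv_text = to_csv(rows)
# Assumed JSON layout: a list of objects, one per document.
json_text = json.dumps(rows, indent=2)
print(csv_text)
```

The first review contains a comma, so the csv module wraps that Text value in double quotes; writing the file by hand with naive string joins would split it into an extra column.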

Uploading data

Projects are powered by data. Each time a user uploads data, it is placed in a collection. The collection organizes the data set and all of its analyses.

To start the collection upload process, press the button that says Upload Data.


Uploading data is the first step to using Semantria Storage & Visualization

To begin the upload process, the user must name the collection.


Name the collection using UTF-8 characters

After the user names the collection they will be prompted to select a file for upload. Semantria Storage & Visualization works with .CSV and JSON files. After selecting the file, Semantria Storage & Visualization will scan the data set to find any Dublin Core columns.


Select a .CSV or JSON file for upload

Field set up

It's now time for the user to set up the fields in the data set.


Semantria Storage & Visualization will scan the file and label any known fields

In the above example Semantria Storage & Visualization automatically identified three Dublin Core columns.

To map the remaining columns, the user clicks "Do not include" in the "Column categories" section and selects the standard field they want to use. Alternatively, the user can type in a custom field name.
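Automatic detection followed by manual mapping can be pictured as a lookup from column headers to standard field names. This is a hypothetical Python sketch: it assumes a header is detected when it matches a standard field name case-insensitively, which may not be the matching rule Semantria Storage & Visualization actually uses. The name map_columns is illustrative, and None stands for "Do not include".

```python
# Standard field names from the Standard fields table.
STANDARD_FIELDS = [
    "Creator", "Date", "Engagement", "ID", "Language", "Parent", "Publisher",
    "Reach", "Source", "Tags", "Text", "Title", "Type", "URL",
]


def map_columns(headers, overrides=None):
    """Map each column header to a standard field, a custom name, or None.

    Assumption: a header is auto-detected when it matches a standard field
    name case-insensitively. None means the column is not included.
    """
    lookup = {name.lower(): name for name in STANDARD_FIELDS}
    overrides = overrides or {}
    mapping = {}
    for header in headers:
        if header in overrides:
            # A manual choice (standard or custom field) wins over detection.
            mapping[header] = overrides[header]
        else:
            mapping[header] = lookup.get(header.strip().lower())
    return mapping


headers = ["id", "Review Text", "Date", "Purchase Location"]
print(map_columns(headers))
# {'id': 'ID', 'Review Text': None, 'Date': 'Date', 'Purchase Location': None}
print(map_columns(headers, {"Review Text": "Text", "Purchase Location": "Purchase Location"}))
```

In the second call, "Review Text" is mapped by hand to the standard Text field and "Purchase Location" is kept as a custom field.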


CSV and TSV files

If a user uploads a .csv file or a .tsv file and the columns do not look correct, they have the option to switch between comma and tab delimiters.
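Choosing between comma and tab delimiters can be automated by inspecting a sample of the file. The Python sketch below uses the standard library's csv.Sniffer restricted to those two characters and falls back to comma when the sample is inconclusive. It only illustrates the idea; how Semantria Storage & Visualization picks the delimiter is not described here.

```python
import csv
import io


def read_delimited(text: str) -> list[list[str]]:
    """Parse text as comma- or tab-delimited, letting csv.Sniffer choose."""
    try:
        # Only consider the two delimiters the user can switch between.
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=",\t").delimiter
    except csv.Error:
        delimiter = ","  # fall back to comma when detection fails
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


tsv = "ID\tText\n1\tGreat service, fast shipping\n2\tArrived broken\n"
print(read_delimited(tsv))
# [['ID', 'Text'], ['1', 'Great service, fast shipping'], ['2', 'Arrived broken']]
```

Here the tab appears exactly once on every line while the comma does not, so the tab wins and the comma inside the review text stays part of the value.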


Pick column categories from a drop-down list or create custom categories

If the user decides to use a custom field they must specify the field type as one of the following:

  • String: this is used to configure text data, where the column may contain review titles, manager names, or locations
  • Integer: this is used to configure numerical data, where the column may contain customer IDs, star ratings, or product numbers
  • Float: this is used to configure numerical data, where the column may contain decimals like prices or percentages
  • Date: this is used to configure temporal data, where the column may contain dates

Manually categorizing custom fields is simple
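As a rough illustration of what each custom field type implies for the values in a column, here is a hypothetical Python sketch that converts a raw cell value according to the chosen type. The conversion rules are assumptions; in particular, the Date branch accepts only yyyy-MM-dd values such as 2018-12-25.

```python
from datetime import datetime


def convert(value: str, field_type: str):
    """Convert a raw cell value to one of the custom field types.

    Hypothetical: the Date branch assumes yyyy-MM-dd values such as 2018-12-25.
    """
    value = value.strip()
    if field_type == "String":
        return value
    if field_type == "Integer":
        return int(value)
    if field_type == "Float":
        return float(value)
    if field_type == "Date":
        return datetime.strptime(value, "%Y-%m-%d").date()
    raise ValueError(f"unknown field type: {field_type}")


print(convert("4", "Integer"), convert("19.99", "Float"), convert("2018-12-25", "Date"))
# 4 19.99 2018-12-25
```

Under this sketch a value such as "12.5%" raises ValueError as a Float, so a percentage column would need its % sign removed first or be kept as a String.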

Once the user has finished mapping their data they may save the data to the collection.


Mapping columns

The user can choose to map all of their columns or just the ones pertinent to their use case.

The only required fields are ID and Text.


A completely categorized data set ready for upload to the collection

Reformatting Fields

When uploading data, you have the option to reformat fields.


Hovering over the field shows the reformat option

Clicking the icon opens the advanced column configurator, where you can apply transformations to your data.

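The transformations work like Django template filters: the field name comes first, filters are chained with a pipe (|), and a filter's argument follows a colon (:).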

Example: Create a slug from the "field-name" field:
field-name | slugify

Example: Lowercase "some-other" field and use "other" if empty:
some-other | lower | default:'other'

Example: Specify that a date field uses the 'MM-dd-yyyy' format:
any-date:'{"format":"MM-dd-yyyy"}'
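The expressions in these examples follow a simple grammar that can be split apart in a few lines. This hypothetical Python sketch splits an expression on | and each filter on its first colon, then strips one pair of surrounding single quotes from the argument. It assumes arguments never contain a | character; how SSV really handles quoting is not documented here. The field name created in the last usage line is made up for illustration.

```python
def parse_expression(expr: str):
    """Split 'field | filter:arg | filter' into a field name and (filter, arg) pairs."""
    field, *parts = [p.strip() for p in expr.split("|")]
    filters = []
    for part in parts:
        # Split on the first colon only, so JSON arguments keep their own colons.
        name, sep, arg = part.partition(":")
        if sep and len(arg) >= 2 and arg[0] == arg[-1] == "'":
            arg = arg[1:-1]  # strip one pair of surrounding single quotes
        filters.append((name.strip(), arg if sep else None))
    return field, filters


print(parse_expression("some-other | lower | default:'other'"))
# ('some-other', [('lower', None), ('default', 'other')])
print(parse_expression("""created | any-date:'{"format":"MM-dd-yyyy"}'"""))
```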

The following transformations are available.

Transformation | Result
lower | Lowercase the value
upper | Uppercase the value
trim | Trim whitespace from the beginning and end of the value
concat | Append the argument to the end of the value
default | Use the argument if the value is missing or empty
slugify | Create a "slug" from the value
md5ify | Generate an MD5 hash of the value
any-date | Convert the value to the SSV standard date format. The argument can be a JSON object defining how the string should be interpreted. Applicable to Date fields only.
cut | Remove all occurrences of the argument from the value
cut-re | Remove all matches of the regular expression argument from the value
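To make the table concrete, here is a hypothetical Python approximation of most of these transformations, applied left to right as a chain. It is not SSV's implementation. slugify follows the steps of Django's slugify, cut-re uses Python's re syntax (the regular-expression dialect SSV accepts is not documented here), a missing value is treated as an empty string, and any-date is left out because its argument format is only partly described.

```python
import hashlib
import re
import unicodedata


def slugify(value: str) -> str:
    """Modeled on Django's slugify: ASCII only, lowercase, hyphen-separated."""
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


# Each transformation takes the current value and the (optional) argument.
TRANSFORMS = {
    "lower": lambda v, arg: v.lower(),
    "upper": lambda v, arg: v.upper(),
    "trim": lambda v, arg: v.strip(),
    "concat": lambda v, arg: v + arg,
    "default": lambda v, arg: v if v else arg,
    "slugify": lambda v, arg: slugify(v),
    "md5ify": lambda v, arg: hashlib.md5(v.encode("utf-8")).hexdigest(),
    "cut": lambda v, arg: v.replace(arg, ""),
    "cut-re": lambda v, arg: re.sub(arg, "", v),
}


def apply_chain(value, filters):
    """Apply (name, argument) pairs left to right, like field | f1 | f2:arg."""
    value = "" if value is None else value  # treat a missing value as empty
    for name, arg in filters:
        value = TRANSFORMS[name](value, arg)
    return value


print(apply_chain("  Fun Workplace Culture! ", [("trim", None), ("slugify", None)]))
# fun-workplace-culture
print(apply_chain(None, [("lower", None), ("default", "other")]))
# other
```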

Below is an example of a date transformation.


Example Date transformation

Accepted Date Formats

As a rule of thumb, SSV accepts any date format that Excel recognizes. If you are unsure whether the dates in your CSV file are acceptable, open the file in Excel and apply an Excel date format to the date column.

Format | Example
MM/dd/yy | 12/25/18
MM/dd/yy HH:mm | 12/25/18 21:04
MM/dd/yyyy | 12/25/2018
MM/dd/yyyy HH:mm | 12/15/2018 21:04
yyyy-MM-dd'T'HH:mm:ss | 2018-12-25T21:04:43
yyyy-MM-dd'T'HH:mm:ssX | 2018-12-25T21:04:43Z, 2018-12-25T21:04:43-0000, 2018-12-25T21:04:43-00, 2018-12-25T21:04:43-00:00
yyyyMMdd | 20181225
dd-MM-yyyy | 25-12-2018
yyyy-MM-dd | 2018-12-25
yyyy/MM/dd | 2018/12/25
dd MMM yyyy | 25 Dec 2018
dd MMMM yyyy | 25 December 2018
yyyyMMddHHmm | 201812252104
yyyyMMdd HHmm | 20181225 2104
dd-MM-yyyy HH:mm | 25-12-2018 21:04
yyyy-MM-dd HH:mm | 2018-12-25 21:04
yyyy/MM/dd HH:mm | 2018/12/15 21:04
dd MMM yyyy HH:mm | 25 Dec 2018 21:04
dd MMMM yyyy HH:mm | 25 December 2018 21:04
yyyyMMddHHmmss | 20181225210443
yyyyMMdd HHmmss | 20181225 210443
dd-MM-yyyy HH:mm:ss | 25-12-2018 21:04:43
yyyy-MM-dd HH:mm:ss | 2018-12-25 21:04:43
MM/dd/yyyy HH:mm:ss | 12/15/2018 21:04:43
yyyy/MM/dd HH:mm:ss | 2018/12/15 21:04:43
dd MMM yyyy HH:mm:ss | 25 Dec 2018 21:04:43
dd MMMM yyyy HH:mm:ss | 25 December 2018 21:04:43
EEE, dd MMM yyyy HH:mm:ss z | Tue, 25 Dec 2018 21:04:43 UTC
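One way to accept several of these formats is to try them in order until one parses. The Python sketch below does that for a hand-translated subset of the table, mapping the Java-style patterns to Python strptime codes. The subset, the translations, and the first-match-wins order are assumptions; SSV's own parser is not documented here. It needs Python 3.7 or later for %z to accept Z and -00:00, and it does not handle the two-digit offset variant (-00).

```python
from datetime import datetime

# Hypothetical, hand-translated subset of the accepted formats:
# Java-style pattern -> Python strptime format. Order matters.
FORMATS = {
    "MM/dd/yy": "%m/%d/%y",
    "MM/dd/yyyy": "%m/%d/%Y",
    "MM/dd/yyyy HH:mm": "%m/%d/%Y %H:%M",
    "yyyy-MM-dd'T'HH:mm:ssX": "%Y-%m-%dT%H:%M:%S%z",
    "yyyyMMdd": "%Y%m%d",
    "dd-MM-yyyy": "%d-%m-%Y",
    "yyyy-MM-dd": "%Y-%m-%d",
    "dd MMMM yyyy HH:mm:ss": "%d %B %Y %H:%M:%S",
}


def parse_date(text: str) -> datetime:
    """Return the first successful parse, trying FORMATS in order."""
    for py_format in FORMATS.values():
        try:
            return datetime.strptime(text.strip(), py_format)
        except ValueError:
            continue  # strptime rejects partial matches, so try the next format
    raise ValueError(f"unrecognized date: {text!r}")


print(parse_date("25-12-2018"))  # 2018-12-25 00:00:00
```

Values with an offset come back as timezone-aware datetimes and the rest as naive ones, so a real import would need to decide how to treat dates without a zone.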

Analysis

Once data has been uploaded and indexed the user can create an analysis.


Note on running analyses

Your account must have at least one configuration before you can run an analysis. To access, create, and edit configurations, click the Configurations tab at the top right of the Semantria Storage & Visualization window.


Uploaded data will appear on the collection tile under "data"

For a collection that has no analyses yet, click Create First Analysis to create one.


Name the analysis with UTF-8 characters

When creating an analysis, the user fills in three fields:

  • Analysis name: The name that will be assigned to the analysis
  • Configuration: The Semantria configuration used to analyze the data (e.g. the Hospitality Industry Pack)
  • Notes: Any information the user wants to record about the analysis

The Start Analysis button becomes active once an analysis name is entered and a configuration is chosen.


Clicking the "Collection Information" icon reveals more information

Once the analysis has finished, it is listed on the collection tile. To see more information about the analyses, the user can click the tile.


The collection details are listed under the "collection information" icon

Clicking on the information icon will bring up a detailed information card.