Data Basics
Data is the content a user wants to analyze.
Data requirements
All data sets must have a minimum of two fields:
- Unique ID
- Text
Any additional fields are considered metadata.
Metadata is data about data. Metadata can be a standard field or a custom field. Standard fields are Dublin Core fields.
Custom fields are non-Dublin Core fields and require manual configuration by the user.
The list of Standard fields is as follows:
Field | Description | Example |
---|---|---|
Creator | The author of a document | Jane Doe |
Date | When the text document was created | August 15, 2019 |
Engagement | Metrics on engagement with the document | Number of comments, views, or retweets |
ID | A unique document identifier, used by the API to organize documents | 1, 2, 3, 4... |
Language | The written language of a document | English, Spanish, Portuguese |
Parent | The parent element | If the document is a retweet, the Parent field could contain the ID of the original tweet. If the document is a comment, the Parent could be a link to or the ID of the original document. |
Publisher | An entity responsible for making the resource available, typically indicated by name | A person, an organization, or a service |
Reach | The audience reach of a document (usually in the context of social media posts) | 150,000 followers |
Source | A related source from which the document is derived | Twitter, Amazon, New York Times |
Tags | Identifying labels assigned by a human | high-priority, low-priority |
Text | A document comprising natural language feedback | A tweet, call log, or survey response |
Title | The title of a natural language document | Titles are frequently part of reviews by travelers, employees, and diners, e.g. "Fun Workplace Culture" |
Type | The nature or genre of the resource, such as its file format, physical medium, or dimensions | Text, Image, Dataset |
URL | A web link to the original natural language document | tripadvisor.com/example123 |
Example data set
The table below is an example of a very small data set.
The data set has the required ID and Text columns, as well as a single metadata column labeled Purchase Location.
ID | Text | Purchase Location |
---|---|---|
1 | I liked the product but found the build quality lacking. Would not recommend for serious users, casual use should be fine. | In-Store |
2 | Product arrived broken. When I contacted support they immediately sent out a replacement that arrived in working condition. Great service! | Online |
3 | I didn't like the product and tried to return it. Customer support was rude to me and only gave me store credit not a full refund. | Online |
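For reference, the same data set saved as a CSV file ready for upload would look like this (text fields are quoted so that any commas inside them are not read as delimiters):

```
ID,Text,Purchase Location
1,"I liked the product but found the build quality lacking. Would not recommend for serious users, casual use should be fine.",In-Store
2,"Product arrived broken. When I contacted support they immediately sent out a replacement that arrived in working condition. Great service!",Online
3,"I didn't like the product and tried to return it. Customer support was rude to me and only gave me store credit not a full refund.",Online
```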
Uploading data
Projects are powered by data. Each time a user uploads data, it is sorted into a collection. The collection organizes the data set and all of its analyses.
To start the collection upload process, click the Upload Data button.


Uploading data is the first step to using Semantria Storage & Visualization
To begin the data upload process, the user must first name the collection.


Name the collection using UTF-8 characters
After the user names the collection, they will be prompted to select a file for upload. Semantria Storage & Visualization works with CSV and JSON files. Once a file is selected, Semantria Storage & Visualization scans the data set for any Dublin Core columns.


Select a CSV or JSON file for upload
Field setup
It's now time for the user to set up the fields in the data set.
Semantria Storage & Visualization scans the file and labels any known fields.
In the above example, Semantria Storage & Visualization automatically identified three Dublin Core columns.
To map the remaining columns, the user clicks "Do not include" in the "Column categories" section and selects the standard field they wish to use. Alternatively, the user may key in a custom field name.
CSV and TSV files
If a user uploads a .csv or .tsv file and the columns do not look correct, they have the option to switch between comma and tab delimiters.
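If you are unsure which delimiter a file uses, one quick way to check locally is Python's `csv.Sniffer` (a minimal sketch, run outside of SSV; `reviews.csv` is a hypothetical file name):

```python
import csv

# Guess whether the file is comma- or tab-delimited from a sample of its contents.
with open("reviews.csv", newline="", encoding="utf-8") as f:
    dialect = csv.Sniffer().sniff(f.read(2048), delimiters=",\t")

print("tab-delimited" if dialect.delimiter == "\t" else "comma-delimited")
```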
Pick column categories from a drop-down list or create custom categories
If the user decides to use a custom field, they must specify the field type as one of the following (a short illustration follows below):
- String: used for text data, such as review titles, manager names, or locations
- Integer: used for whole-number data, such as customer IDs, star ratings, or product numbers
- Float: used for numerical data with decimals, such as prices or percentages
- Date: used for temporal data, such as calendar dates or timestamps


Manually categorizing custom fields is simple
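As a rough illustration of how these four types partition a column's values, here is a minimal Python sketch (not part of SSV; the file name is hypothetical) that guesses a field type for each column before mapping:

```python
import csv
from datetime import datetime

def guess_field_type(values):
    """Guess a custom field type (Integer, Float, Date, or String) for a column."""
    values = [v for v in values if v.strip()]  # ignore empty cells
    if not values:
        return "String"
    if all(v.isdigit() for v in values):
        return "Integer"
    try:
        [float(v) for v in values]
        return "Float"
    except ValueError:
        pass
    try:
        [datetime.strptime(v, "%Y-%m-%d") for v in values]  # e.g. 2018-12-25
        return "Date"
    except ValueError:
        pass
    return "String"

with open("reviews.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for column in rows[0]:
    print(column, "->", guess_field_type([row[column] for row in rows]))
```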
Once the user has finished mapping their data, they may save it to the collection.
Mapping columns
The user can choose to map all of their columns or just the ones pertinent to their use case.
The only required fields are ID and Text.
A completely categorized data set ready for upload to the collection
Reformatting Fields
When uploading data, you have the option to reformat individual fields.


Hovering over the field shows the reformat option
Clicking on the icon brings up the advanced column configurator. Here you can apply transformations to your data.
The transformation filters work similarly to Django template filters.
Example: Create a slug from the "field-name" field:
field-name | slugify
Example: Lowercase the "some-other" field and use "other" if it is empty:
some-other | lower | default:'other'
Example: Specify that the date field format is 'MM-dd-yyyy':
any-date:'{"format":"MM-dd-yyyy"}'
The following transformations are allowed on content.
Transformation | Result |
---|---|
lower | Lowercase the contents |
upper | Uppercase the contents |
trim | Trim whitespace from the beginning and the end of contents |
concat | Concatenate the argument onto the end of the value |
default | Use the argument if the value is missing or empty |
slugify | Create a "slug" of the value given |
md5ify | Generate an md5 hash for the value given |
any-date | Convert the value to the SSV standard date format. The argument can be a JSON object defining how the string should be interpreted. Applicable to Date fields only |
cut | Remove all occurrences of the argument from the given string |
cut-re | Remove all matches of the regular expression argument from the given string |
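Filters chain left to right: each transformation receives the output of the one before it. The sketch below (plain Python, mirroring the filter names in the table above; it is illustrative, not SSV's implementation) shows that behavior:

```python
import hashlib
import re

# Each filter takes the current value plus an optional argument and returns a new value.
FILTERS = {
    "lower":   lambda v, arg=None: v.lower(),
    "upper":   lambda v, arg=None: v.upper(),
    "trim":    lambda v, arg=None: v.strip(),
    "concat":  lambda v, arg="": v + arg,
    "default": lambda v, arg="": v if v else arg,
    "slugify": lambda v, arg=None: re.sub(r"[^a-z0-9]+", "-", v.lower()).strip("-"),
    "md5ify":  lambda v, arg=None: hashlib.md5(v.encode("utf-8")).hexdigest(),
    "cut":     lambda v, arg="": v.replace(arg, ""),
    "cut-re":  lambda v, arg="": re.sub(arg, "", v),
}

def apply_chain(value, steps):
    """Apply a list of 'name' or "name:'arg'" filters left to right."""
    for step in steps:
        name, _, arg = step.partition(":")
        value = FILTERS[name.strip()](value, arg.strip().strip("'") if arg else None)
    return value

# "  Fun Workplace Culture " | trim | slugify  ->  fun-workplace-culture
print(apply_chain("  Fun Workplace Culture ", ["trim", "slugify"]))
# "" | lower | default:'other'  ->  other
print(apply_chain("", ["lower", "default:'other'"]))
```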
Below is an example of a date transformation.


Example Date transformation
Accepted Date Formats
The rule of thumb for date formats is that SSV accepts all Excel date formats. If you are unsure whether your CSV file has an acceptable date format, open the file in Excel and apply one of Excel's date formats to the column.
Format | Example |
---|---|
MM/dd/yy | 12/25/18 |
MM/dd/yy HH:mm | 12/25/18 21:04 |
MM/dd/yyyy | 12/25/2018 |
MM/dd/yyyy HH:mm | 12/25/2018 21:04 |
yyyy-MM-dd'T'HH:mm:ss | 2018-12-25T21:04:43 |
yyyy-MM-dd'T'HH:mm:ssX | 2018-12-25T21:04:43Z |
yyyy-MM-dd'T'HH:mm:ssX | 2018-12-25T21:04:43-0000 |
yyyy-MM-dd'T'HH:mm:ssX | 2018-12-25T21:04:43-00 |
yyyy-MM-dd'T'HH:mm:ssX | 2018-12-25T21:04:43-00:00 |
yyyyMMdd | 20181225 |
dd-MM-yyyy | 25-12-2018 |
yyyy-MM-dd | 2018-12-25 |
yyyy/MM/dd | 2018/12/25 |
dd MMM yyyy | 25 Dec 2018 |
dd MMMM yyyy | 25 December 2018 |
yyyyMMddHHmm | 201812252104 |
yyyyMMdd HHmm | 20181225 2104 |
dd-MM-yyyy HH:mm | 25-12-2018 21:04 |
yyyy-MM-dd HH:mm | 2018-12-25 21:04 |
yyyy/MM/dd HH:mm | 2018/12/25 21:04 |
dd MMM yyyy HH:mm | 25 Dec 2018 21:04 |
dd MMMM yyyy HH:mm | 25 December 2018 21:04 |
yyyyMMddHHmmss | 20181225210443 |
yyyyMMdd HHmmss | 20181225 210443 |
dd-MM-yyyy HH:mm:ss | 25-12-2018 21:04:43 |
yyyy-MM-dd HH:mm:ss | 2018-12-25 21:04:43 |
MM/dd/yyyy HH:mm:ss | 12/25/2018 21:04:43 |
yyyy/MM/dd HH:mm:ss | 2018/12/25 21:04:43 |
dd MMM yyyy HH:mm:ss | 25 Dec 2018 21:04:43 |
dd MMMM yyyy HH:mm:ss | 25 December 2018 21:04:43 |
EEE, dd MMM yyyy HH:mm:ss z | Thu, 25 Dec 2018 21:04:43 UTC |
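The patterns above use Java-style date tokens. If you want to sanity-check a date column locally before uploading, a rough Python equivalent for a few of the formats (a sketch, not an exhaustive mapping of the full table) might look like:

```python
from datetime import datetime

# Python strptime equivalents for a few of the accepted Java-style patterns above.
PATTERNS = [
    "%m/%d/%y",           # MM/dd/yy              -> 12/25/18
    "%m/%d/%Y",           # MM/dd/yyyy            -> 12/25/2018
    "%Y-%m-%d",           # yyyy-MM-dd            -> 2018-12-25
    "%Y-%m-%dT%H:%M:%S",  # yyyy-MM-dd'T'HH:mm:ss -> 2018-12-25T21:04:43
    "%d %b %Y",           # dd MMM yyyy           -> 25 Dec 2018
]

def parses_as_date(value):
    """Return True if the value matches any of the patterns above."""
    for pattern in PATTERNS:
        try:
            datetime.strptime(value, pattern)
            return True
        except ValueError:
            continue
    return False

print(parses_as_date("2018-12-25"))  # True
print(parses_as_date("25/12/2018"))  # False (dd/MM/yyyy is not in the list above)
```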
Analysis
Once data has been uploaded and indexed, the user can create an analysis.
Note on running analyses
Your account must have at least one existing configuration before you can run an analysis. Click the Configurations tab at the top right of the Semantria Storage & Visualization window to access, create, and edit configurations.


Uploaded data will appear on the collection tile under "data"
Click Create First Analysis to analyze a data set that has no existing analyses.
Name the analysis with UTF-8 characters
There are three fields that must be addressed when creating an analysis:
- Analysis name: The name that will be assigned to the analysis
- Configuration: The Semantria configuration used to analyze the data (e.g. the Hospitality Industry Pack)
- Notes: Any information the user wants to record about the analysis
The Start Analysis button becomes enabled once an analysis name is assigned and a configuration is chosen.


Clicking on the "Collection Information" icon reveals more information
Once the analysis has finished, it is listed in the collection tile. To see more information about an analysis, the user can click on the tile.


The collection details are listed under the "collection information" icon
Clicking on the information icon will bring up a detailed information card.