Data is the content a user wants to analyze.
All data sets must have a minimum of two fields:
- Unique ID
- Text
Any additional fields are considered metadata.
Metadata is data about data. Metadata can be a standard field or a custom field. Standard fields are Dublin Core fields.
Custom fields are non-Dublin Core and require manual configuration by the user.
The list of Standard fields is as follows:

| Field | Description | Example |
|---|---|---|
| Creator | The author of a document | |
| Date | When the text document was created | August 15, 2019 |
| Engagement | Metrics on engagement with the document | Number of comments, views, or retweets |
| ID | A unique document identifier, used by the API to organize documents | 1, 2, 3, 4... |
| Language | The written language of a document | English, Spanish, Portuguese |
| Parent | The parent element | If the document is a retweet, the Parent field could contain the ID of the original tweet; if the document is a comment, the Parent could be a link to or the ID of the original document |
| Publisher | An entity responsible for making the resource available | A person, an organization, or a service; typically, the name of the Publisher should be used to indicate the entity |
| Reach | The audience reach of a document (usually in the context of social media posts) | |
| Source | A related source from which the document is derived | Twitter, Amazon, New York Times |
| Tags | Identifying labels for a human being | |
| Text | A document comprising natural language feedback | A tweet, call log, or survey response |
| Title | The title of a natural language document | Titles are frequently part of reviews by travelers, employees, and diners, e.g. "Fun Workplace Culture" |
| Type | The nature or genre of the resource | |
| Format | The file format, physical medium, or dimensions of the resource | |
| URL | A web link to the original natural language document | |
The table below is an example of a very small data set. The data set has the minimum ID and Text columns, as well as a single metadata column labeled Purchase Location.

| ID | Text | Purchase Location |
|---|---|---|
| 1 | I liked the product but found the build quality lacking. Would not recommend for serious users; casual use should be fine. | … |
| 2 | Product arrived broken. When I contacted support they immediately sent out a replacement that arrived in working condition. Great service! | … |
| 3 | I didn't like the product and tried to return it. Customer support was rude to me and only gave me store credit, not a full refund. | … |
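On disk, a data set like this is just a header row followed by one row per document. A minimal sketch using Python's `csv` module (the ID and Purchase Location values here are illustrative placeholders, not values from the original table):

```python
import csv
import io

# Rows for the required ID and Text columns plus one metadata column.
# The IDs and purchase locations are illustrative placeholders.
rows = [
    {"ID": "1",
     "Text": "I liked the product but found the build quality lacking.",
     "Purchase Location": "Online"},
    {"ID": "2",
     "Text": "Product arrived broken. Support sent a replacement. Great service!",
     "Purchase Location": "In store"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["ID", "Text", "Purchase Location"])
writer.writeheader()   # writes "ID,Text,Purchase Location"
writer.writerows(rows)
print(buf.getvalue())
```

`csv.DictWriter` quotes any cell that contains the delimiter, so commas inside review text are safe.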
Projects are powered by data. Each time a user uploads data it will be sorted into a collection. The collection organizes the data set and all of its analyses.
To start the collection upload process, click the Upload Data button.
To begin the data upload process, the user must name the collection.
After the user names the collection, they will be prompted to select a file for upload. Semantria Storage & Visualization works with .csv and .json files. After the file is selected, Semantria Storage & Visualization scans the data set to find any Dublin Core columns.
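The exact matching rules SSV applies during this scan are not documented here; as a rough illustration, such a scan could be a case-insensitive lookup of header names against the standard field list (the field set and matching logic below are assumptions, not SSV's implementation):

```python
# Illustrative only: match CSV header names against a standard-field list.
# The exact field names and matching rules SSV uses are assumptions here.
STANDARD_FIELDS = {"id", "text", "title", "creator", "date", "language",
                   "publisher", "source", "type", "format", "url"}

def classify_columns(header):
    """Split a header row into recognized standard fields and custom columns."""
    standard, custom = [], []
    for name in header:
        (standard if name.strip().lower() in STANDARD_FIELDS else custom).append(name)
    return standard, custom

std, cust = classify_columns(["ID", "Text", "Purchase Location"])
print(std, cust)   # ['ID', 'Text'] ['Purchase Location']
```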
It's now time for the user to set up the fields in the data set.
In the above example Semantria Storage & Visualization automatically identified three Dublin Core columns.
To map the remaining columns, the user clicks "Do not include" in the "Column categories" section and selects the standard field they wish to use. Alternatively, the user may simply type in a custom field name.
CSV and TSV files
If a user uploads a .csv file or a .tsv file and the columns do not look correct, they have the option to switch between comma and tab delimiters.
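To check a file's delimiter locally before uploading, Python's standard-library `csv.Sniffer` can guess between comma and tab (a quick local check, not part of SSV):

```python
import csv

sample_comma = "ID,Text,Purchase Location\n1,Great product,Online\n"
sample_tab = "ID\tText\tPurchase Location\n1\tGreat product\tOnline\n"

# Sniffer inspects a sample of the file and guesses the delimiter,
# restricted here to the two candidates SSV lets you switch between.
comma_dialect = csv.Sniffer().sniff(sample_comma, delimiters=",\t")
tab_dialect = csv.Sniffer().sniff(sample_tab, delimiters=",\t")

print(repr(comma_dialect.delimiter))  # ','
print(repr(tab_dialect.delimiter))    # '\t'
```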
If the user decides to use a custom field they must specify the field type as one of the following:
- String: used for text data, where the column may contain review titles, manager names, or locations
- Integer: used for whole-number data, where the column may contain customer IDs, star ratings, or product numbers
- Float: used for numerical data containing decimals, such as prices or percentages
- Date: used for temporal data, where the column may contain dates
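The four custom field types map naturally onto Python types. A sketch of coercing a raw CSV cell according to its declared type (the function name, type labels, and default date format are illustrative, not SSV's API):

```python
from datetime import datetime

def coerce(value, field_type, date_format="%m-%d-%Y"):
    """Coerce a raw CSV cell to its declared custom-field type (illustrative)."""
    if field_type == "String":
        return value
    if field_type == "Integer":
        return int(value)
    if field_type == "Float":
        return float(value)
    if field_type == "Date":
        return datetime.strptime(value, date_format)
    raise ValueError(f"unknown field type: {field_type}")

print(coerce("5", "Integer"))        # 5
print(coerce("19.99", "Float"))      # 19.99
print(coerce("12-25-2018", "Date"))  # 2018-12-25 00:00:00
```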
Once the user has finished mapping their data they may save the data to the collection.
The user can choose to map all of their columns or just the ones pertinent to their use case.
The only required fields are ID and Text.
When uploading fields, you have the option to reformat them.
Clicking on the icon brings up the advanced column configurator. Here you can apply transformations to your data.
The transformation filters work similarly to Django filters.
Example: Create a slug from the "field-name" field:
`field-name | slugify`
Example: Lowercase the "some-other" field and use "other" if it is empty:
`some-other | lowercase | default:'other'`
Example: Specify that a date field is in 'MM-dd-yyyy' format:
The following transformations are allowed on content:
- Lowercase the contents
- Uppercase the contents
- Trim whitespace from the beginning and end of the contents
- Concatenate the argument onto the end of the value
- Use the argument if the value is missing or empty
- Create a "slug" of the given value
- Generate an MD5 hash of the given value
- Convert the value to the SSV standard date format. The argument can be a JSON object defining how the string should be interpreted. Applicable to Date fields only
- Remove all occurrences of the argument from the given string
- Remove all matches of the regular-expression argument from the given string
Below is an example of a date transformation.
The rule of thumb for date formats is that SSV accepts all Excel date formats. If you are unsure if your CSV file has an acceptable date, open it in Excel and format the date as an Excel formatted date.
| Date format | Example |
|---|---|
| dd MMM yyyy | 25 Dec 2018 |
| dd MMMM yyyy | 25 December 2018 |
| dd MMM yyyy HH:mm | 25 Dec 2018 21:04 |
| dd MMMM yyyy HH:mm | 25 December 2018 21:04 |
| dd MMM yyyy HH:mm:ss | 25 Dec 2018 21:04:43 |
| dd MMMM yyyy HH:mm:ss | 25 December 2018 21:04:43 |
| EEE, dd MMM yyyy HH:mm:ss z | Thu, 25 Dec 2018 21:04:43 UTC |
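These patterns use Java/Excel-style letters (dd, MMM, HH). To verify a date column locally, they translate into Python `strptime` directives (dd → %d, MMM → %b, MMMM → %B, yyyy → %Y, HH:mm:ss → %H:%M:%S; the EEE…z pattern is omitted here because time-zone-name parsing is locale- and platform-dependent):

```python
from datetime import datetime

# The table's date patterns expressed as Python strptime formats.
examples = {
    "%d %b %Y": "25 Dec 2018",
    "%d %B %Y": "25 December 2018",
    "%d %b %Y %H:%M": "25 Dec 2018 21:04",
    "%d %B %Y %H:%M:%S": "25 December 2018 21:04:43",
}

for fmt, text in examples.items():
    parsed = datetime.strptime(text, fmt)
    print(text, "->", parsed.isoformat())
```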
Once data has been uploaded and indexed the user can create an analysis.
Note on running analyses
Your account must have at least one existing configuration before you can run an analysis. Click on the Configurations tab on the top right of the Semantria Storage and Visualization window to access, create and edit configurations.
Click on Create First Analysis to create an analysis of data with no previously existing analyses.
There are three fields that must be addressed when creating an analysis:
- Analysis name: The name that will be assigned to the analysis
- Configuration: The Semantria configuration used to analyze the data (e.g. the Hospitality Industry Pack)
- Notes: Any information the user wants to convey about the analysis
The Start Analysis button is enabled once an analysis name is assigned and a configuration is chosen.
Once the analysis has finished, it is listed in the collection tile. If the user wants to see more information about an analysis, they can click on the tile.
Clicking on the information icon will bring up a detailed information card.