The Concept Lab: Viewer App

Version: ‘October’

2019-08-28

Introduction

This is a guide to using the Shiny web ‘Viewer’ application created as part of The Concept Lab project. The purpose of the app is to visualize and explore the architecture of concepts inferred from large text corpora by means of statistical measures of the co-association of words in the text.

This document refers to the ‘October’ version of the app, completed in October 2018. A paper describing in full the natural language processing methods and some of the implementation details is available in the proceedings of IWCS 2017 [1]. For a more theoretical perspective on applying these methods to the study of the history of concepts, see the Concept Lab’s article published in Contributions to the History of Concepts [2].

[1] Nulty, Paul. (2017). “Network Visualizations for Exploring Political Concepts”. Proceedings of the 12th International Conference on Computational Semantics (IWCS).

[2] de Bolla, P., Jones, E., Recchia, G., Regan, J., & Nulty, P. (2019). “Distributional Concept Analysis: A Computational Model for Parsing Conceptual Forms”. Contributions to the History of Concepts.

The app is now hosted on Cambridge University Library servers. Older versions were hosted on Amazon Web Services and served through port 3838. On some public wireless networks, this port may be restricted. If the app fails to load, try accessing it from an internet connection that does not restrict this port.

The app pane is composed of a sidebar (on the left) and a main panel, with a tab menu along the top of the screen to switch between panels showing different aspects of the app.

When the app is first opened in a browser, the sidebar and Configuration Panel are displayed.

Configuration Panel

The Configuration Panel is the main screen from which the dataset and several universal preferences are selected. When the app is opened, the ECCO dataset is loaded by default. This takes a few seconds; when loading is complete, the sidebar text display shows ‘Co-occurrence counts loaded’ along with several properties of the loaded data (see the Configuration Panel figure). The bottom of the sidebar shows the name of the data file from which the current co-occurrence counts are loaded, in this case ‘ECCO_100_dist_100_cut_10_2’. Also available are two datasets constructed from libertarian and socialist text from Reddit.

Configuration Panel

From the top down, the options available in the configuration panel are as follows:

DPF (Distributional Probability Factor) is a measure similar to pointwise mutual information, with an extra parameter to downweight the score of very infrequent words. By default the log-dpf option is selected, because the association scores calculated by DPF tend to follow a power-law distribution.
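
For readers unfamiliar with pointwise mutual information (PMI), the measure DPF is compared to above, the following is a minimal Python sketch of plain PMI computed from co-occurrence counts. It is not the DPF formula: the extra downweighting parameter is not specified in this guide and is not modelled here. The function name, argument names, and counts are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Plain pointwise mutual information (base 2) from raw counts.

    Illustration only: this is standard PMI, not the app's DPF measure.
    count_xy : number of times the two words co-occur
    count_x  : number of occurrences of the first word
    count_y  : number of occurrences of the second word
    total    : total number of observations
    """
    if count_xy == 0:
        # Words that never co-occur have no finite PMI.
        return float("-inf")
    # p(x, y) / (p(x) * p(y)), rearranged to work directly on counts.
    ratio = (count_xy * total) / (count_x * count_y)
    return math.log2(ratio)


# Hypothetical counts: the pair co-occurs twice as often as chance predicts.
print(pmi(count_xy=20, count_x=100, count_y=100, total=1000))
```

A score above zero means the pair co-occurs more often than would be expected if the two words were independent. Plain PMI is known to give inflated scores to rare words, which is the kind of effect a downweighting parameter is meant to counter.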

Network Panels

The following options are only displayed in the side panel when one of the Network Visualisation panels is selected.

Exporting images

In the 2D network visualisation panels, a button labeled “Export as png” is available in the bottom right. The default filename of the downloaded image encodes the parameters used as follows:

keywords-dataset-distance-measure-threshold-rank-steps-pruned-concrete

For example, in the filename democracy-prorogued-ecco-100-log-dpf-2.6-20-1-2-none.png, the final part (2-none) indicates that nodes with fewer than two links are pruned and that concrete words are not filtered out. If you check the “filter concrete words” box, this “none” is replaced by the abstract/concrete filter threshold (default 4.5, i.e. exclude words that score above 4.5 on a five-point scale from abstract (1) to concrete (5)).
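
The naming scheme can be sketched in Python as a small helper that joins the parameters in the order given above. This is an illustration based on reading the example filename, not the app's own code; the function and argument names are hypothetical.

```python
def build_export_filename(keywords, dataset, distance, measure,
                          threshold, rank, steps, pruned,
                          concrete_threshold=None):
    """Join the export parameters in the documented order.

    Hypothetical helper. The order is:
    keywords-dataset-distance-measure-threshold-rank-steps-pruned-concrete
    """
    # "none" marks that concrete words are not filtered out.
    concrete = "none" if concrete_threshold is None else str(concrete_threshold)
    parts = list(keywords) + [dataset, distance, measure, threshold,
                              rank, steps, pruned, concrete]
    return "-".join(str(part) for part in parts) + ".png"


# Rebuild the example filename from its parameters.
print(build_export_filename(
    keywords=["democracy", "prorogued"],
    dataset="ecco",
    distance=100,
    measure="log-dpf",
    threshold=2.6,
    rank=20,
    steps=1,
    pruned=2,
))
```

Because both the keywords and the measure name (log-dpf) can contain hyphens, splitting a filename back into its parameters is ambiguous in general. It is easier to read the fields from the right-hand end, where the last two are always the pruning value and the concreteness filter.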

Diff view

This pane compares two search terms using the following method: the items common to the association lists of word1 and word2 are retrieved, and score_diff shows, for each common term, its score in the word2 list subtracted from its score in the word1 list. As a result, terms more strongly associated with word1 get a higher score, and terms more strongly associated with word2 get a lower (possibly negative) score.
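
The comparison can be sketched in Python as follows. This is a minimal illustration of the method described above, not the app's implementation; the function name, the input format (a dictionary mapping each associated term to its score), and the example scores are all assumptions.

```python
def score_diff(scores_word1, scores_word2):
    """Return (term, difference) pairs for terms present in both lists.

    Each argument maps an associated term to its association score.
    The difference is the word1 score minus the word2 score, so terms
    more strongly associated with word1 come first.
    """
    common_terms = scores_word1.keys() & scores_word2.keys()
    diffs = {term: scores_word1[term] - scores_word2[term]
             for term in common_terms}
    # Sort from most word1-leaning to most word2-leaning.
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical association scores for two search terms.
liberty = {"freedom": 3.0, "rights": 2.5, "press": 1.0}
prop = {"rights": 1.5, "freedom": 1.0, "estate": 4.0}

for term, diff in score_diff(liberty, prop):
    print(term, diff)
```

Terms that appear in only one of the two lists ("press" and "estate" in this made-up data) are dropped, because the method compares common items only.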

Shortest Path

This pane shows a network plot of the shortest route between two nodes. Exactly two search terms must be specified.
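
To make the idea of a shortest route concrete, here is a minimal Python sketch that finds one on a small unweighted network using breadth-first search. The guide does not say which algorithm the app uses or whether edge weights are taken into account, so this is an illustration of the concept only; the function name and the example network are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest path in an unweighted graph.

    graph maps each node to a list of its neighbours.
    Returns the path as a list of nodes, or None if no route exists.
    """
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical co-occurrence network.
network = {
    "liberty": ["freedom", "rights"],
    "freedom": ["liberty", "press"],
    "rights": ["liberty", "property"],
    "press": ["freedom"],
    "property": ["rights"],
}
print(shortest_path(network, "liberty", "property"))
```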

Centrality

This tab shows a table of nodes ranked by their centrality score in the co-occurrence network defined by the dataset, thresholds, options, and search terms set in the sidebar and configuration pane.
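
As an illustration of what such a ranking looks like, the Python sketch below ranks nodes by degree, that is, by the number of links each node has. This guide does not state which centrality measure the app computes, so degree is used here purely as the simplest example; the function name and the edge list are hypothetical.

```python
def rank_by_degree(edges):
    """Rank nodes by degree (number of links), highest first.

    edges is a list of (node_a, node_b) pairs in an undirected network.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    degree = {}
    for node_a, node_b in edges:
        degree[node_a] = degree.get(node_a, 0) + 1
        degree[node_b] = degree.get(node_b, 0) + 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical co-occurrence network.
edges = [
    ("liberty", "freedom"),
    ("liberty", "rights"),
    ("liberty", "press"),
    ("rights", "property"),
]
for node, score in rank_by_degree(edges):
    print(node, score)
```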