A Quantitative Approach to Understanding Online Antisemitism Part 2 : Temporal Analysis

Disclaimer. Note that content posted on both Web communities can be characterized as highly offensive and racist. In this post, we discuss our analysis without censoring any offensive language, hence we inform the reader that this post contains language that is likely to be upsetting.

Our study comprises a temporal analysis that shows the use of racial slurs over time on Gab and /pol/, a text-based analysis that leverages word2vec embeddings to understand the use of text with respect to ethnic slurs, and a memetic analysis that focuses on the propagation of the antisemitic Happy Merchant meme.

It also includes influence estimation findings that shed light on the influence that Web communities have on each other when considering the dissemination of antisemitic memes.

Anecdotal evidence reports escalating racial and ethnic hate propaganda on fringe Web communities. To examine this, we studied the prevalence of some terms related to ethnic slurs on /pol/ and Gab, and how they evolve over time. We focus on five specific terms: “jew,” “kike,” “white,” “black,” and “nigger.” We limit our scope to these five because, while ethnic hate targets many groups, these specific words ranked among the most frequently used ethnic terms on both communities.

Table 1 reports the overall number of posts that contain these terms in both Web communities, their rank in terms of raw number of appearances in our dataset, as well as the increase in the use of these terms between the beginning and end of our datasets.

Fig. 2 plots the use of these terms over time, binned by day, and averaged over a rolling window to smooth out small-scale fluctuations. We observe that terms like “white” and “jew” are extremely popular in both Web communities; 3rd and 13th respectively in /pol/, while in Gab they rank as the 9th and 19th most popular words, respectively. We see a similar level of popularity for ethnic racial slurs like “nigger” and “kike,” especially on /pol/; they are the 16th and 147th most popular words in terms of raw counts. 
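The daily binning and rolling-window smoothing behind a plot like Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the whitespace tokenization, the trailing window, and the placeholder term "alpha" are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def daily_share(posts, term):
    """Fraction of each day's posts that contain `term`.

    `posts` is an iterable of (day, text) pairs. Matching is a simple
    case-insensitive whitespace-token match (an assumption of this sketch).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in posts:
        totals[day] += 1
        if term in text.lower().split():
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def rolling_mean(values, window):
    """Trailing rolling average; early entries average the values seen so far."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


# Toy data with a placeholder term; real input would be one row per post.
posts = [
    (date(2017, 1, 1), "alpha beta"),
    (date(2017, 1, 1), "gamma"),
    (date(2017, 1, 2), "alpha"),
    (date(2017, 1, 3), "beta"),
]
shares = daily_share(posts, "alpha")
smoothed = rolling_mean(list(shares.values()), window=2)
```

Dividing by the number of posts per day is what makes the series comparable across periods with different overall posting volume.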

We also find an increasing trend in the use of most ethnic terms; the number of posts containing each of the terms except “black” increases, even when normalized for the increasing number of posts on the network overall. Interestingly, among the terms we examine, we observe that the term “kike” shows the greatest increase in use for both /pol/ and Gab, followed by “jew” on /pol/ and “nigger” on Gab. Also, it is worth noting that ethnic terms on Gab have a greater increase in the rate of use when compared to /pol/ (cf. ratio of increase for /pol/ and Gab in Table 1). Furthermore, by looking at Fig. 2 we find that by the end of our datasets, the term “jew” appears in 4.0% of /pol/ daily posts and 3.1% of the Gab posts, while the term “nigger” appears in 3.4% and 0.6% of the daily posts on /pol/ and Gab, respectively. The latter is particularly worrisome for anti-black hate, as by the end of our datasets the term “nigger” on /pol/ overtakes the term “black” (3.4% vs 1.9% of all the daily posts). Taken together, these findings highlight that most of these terms are increasingly popular within these fringe Web communities, hence emphasizing the need to study the use of ethnic identity terms over time.

We note major fluctuations in the use of ethnic terms over time, and one reasonable assumption is that these fluctuations happen due to real-world events. To analyze the validity of this assumption, we use changepoint analysis, which provides us with ranked changes in the mean and variance of time series behavior. In /pol/, our analysis reveals several changepoints with temporal proximity to real-world political events for the use of both “jew” (see Fig. 3(a) and Table 2) and “white” (see Fig. 3(b) and Table 3).
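To make the idea of ranked changepoints in mean and variance concrete, here is a minimal sketch that uses binary segmentation with a Gaussian cost. The post does not say which changepoint algorithm or penalty was used, so the technique, the function names, and the `penalty` value here are assumptions chosen for illustration.

```python
import math


def segment_cost(xs):
    """Gaussian cost n * log(variance), with mean and variance fit to the segment."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    # Floor the variance so a constant segment does not produce log(0).
    return n * math.log(max(var, 1e-12))


def best_changepoint(xs, min_size=2):
    """Return (index, gain) for the single split that most reduces the cost."""
    full = segment_cost(xs)
    best_idx, best_gain = None, 0.0
    for k in range(min_size, len(xs) - min_size + 1):
        gain = full - (segment_cost(xs[:k]) + segment_cost(xs[k:]))
        if gain > best_gain:
            best_idx, best_gain = k, gain
    return best_idx, best_gain


def ranked_changepoints(xs, penalty, min_size=2):
    """Binary segmentation: keep splitting while the gain exceeds `penalty`.

    Returns (index, gain) pairs sorted from the strongest change to the weakest.
    """
    found = []
    stack = [(0, len(xs))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2 * min_size:
            continue
        idx, gain = best_changepoint(xs[lo:hi], min_size)
        if idx is None or gain <= penalty:
            continue
        found.append((lo + idx, gain))
        stack.append((lo, lo + idx))
        stack.append((lo + idx, hi))
    return sorted(found, key=lambda item: -item[1])


# Toy series: the level shifts from about 1 to about 5 at index 6.
series = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 5.0, 5.1, 4.9]
changes = ranked_changepoints(series, penalty=10.0)
```

Each detected index can then be mapped back to a calendar date and compared against a timeline of news events, which is the comparison summarized in Tables 2 and 3.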

For usage of the term “jew,” major world events in Israel and the Middle East correspond to several changepoints, including the 2016 UN abstention from condemning continued Israeli settlement, the U.S. missile attack against Syrian airbases in 2017, and terror attacks in Jerusalem. Events involving Donald Trump, including Jared Kushner’s interview by Robert Mueller, the resignation of Steve Bannon from the National Security Council, the 2017 “travel ban” (i.e., Executive Order 13769), and the presidential inauguration occur within proximity to several notable changepoints for usage of “jew” as well. For usage of “white,” we find that changepoints correspond closely to events related to Donald Trump, including the election, inauguration, presidential debates, as well as major revelations in the ongoing investigation into Russian interference in the presidential election. Additionally, several changepoints in the use of “white” correspond to major terror attacks by ISIS in Europe, including vehicle attacks in Berlin and Nice, as well as news related to the 2017 “travel ban” (i.e., Executive Order 13769). In the case of “white,” the relationship between online usage and real-world behavior is perhaps best illustrated by the Charlottesville “Unite the Right” rally, which marks the global maximum in our dataset for the use of the term on both /pol/ and Gab (see Fig. 2). For Gab, we find that changepoints in these time series reflect similar kinds of news events to those in /pol/, both for “jew” (see Fig. 13(a)) and “white” (see Fig. 13(b)). Several changepoints overlap on world events such as the election, the inauguration, and the Charlottesville rally (see Table 7 and Table 8). These findings provide evidence that discussion of ethnic identity on fringe Web communities increases with political events and real-world extremist actions. The implications of this relationship are worrying, as others have shown that ethnic hate expressed on social media influences real-life hate crimes.

We hypothesize that ethnic terms (e.g., “jew” and “white”) are strongly linked to antisemitic and white supremacist sentiments. To test this, we use word2vec, a two-layer neural network that generates word representations as embedded vectors. Specifically, a word2vec model takes as input a large corpus of text and generates a multi-dimensional vector space where each word is mapped to a vector in the space (also called an embedding). The vectors are generated in such a way that words that share similar contexts tend to have nearly parallel vectors in the multi-dimensional vector space. Given a context (a list of words appearing in a single block of text), a trained word2vec model also gives the probability that each other word will appear in that context. By analyzing both these probabilities and the word vectors themselves, we are able to map the usage of various terms in our corpus. We use the generated word embeddings to gain a deeper understanding of the context in which certain terms are used. We measure the “closeness” of two terms (i and j) by generating their vectors from the word2vec models (h_i and h_j) and calculating their cosine similarity (cos θ(h_i, h_j)). Furthermore, we use the trained word2vec models to predict a set of candidate words that are likely to appear in the context of a given term. We first look at the term “jew.”
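The closeness measure described above can be illustrated with a small, self-contained sketch. The two-dimensional vectors and the single-letter vocabulary are made-up placeholders; in practice the vectors would come from a trained word2vec model, and the helper names here are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:topn]


# Toy embeddings (placeholders, not real model output).
embeddings = {
    "a": [1.0, 0.0],
    "b": [2.0, 0.0],  # parallel to "a": similarity 1
    "c": [0.0, 1.0],  # orthogonal to "a": similarity 0
    "d": [1.0, 1.0],  # 45 degrees from "a"
}
ranking = most_similar("a", embeddings, topn=3)
```

Because cosine similarity ignores vector length, "b" is as close to "a" as possible even though it is twice as long; only the direction of the embedding matters.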

Table 4 reports the top ten most similar words to the term “jew” along with their cosine similarity, as well as the top ten candidate words and their respective probability. Looking at the most similar words, we observe that on /pol/ “(((jew)))” is the most similar term (cos θ = 0.80), while on Gab it is the 7th most similar term (cos θ = 0.69). The triple parentheses are a widely used antisemitic construction that calls attention to supposed secret Jewish involvement and conspiracy [88]. Slurs like “kike,” which is historically associated with general ethnic disgust, rank similarly (cos θ = 0.77 on both /pol/ and Gab). This suggests that on both Web communities, the term “jew” itself is closely related to classical antisemitic contexts. When digging deeper, we note that “goyim” is the 5th and 4th most similar term to “jew” in /pol/ and Gab, respectively. “Goyim” is the plural of “goy,” and while its original meaning is just “non-Jews,” modern usage tends to have a derogatory nature. On fringe Web communities it is used to emphasize the “struggle” against Jewish conspiracy by preemptively assigning Jewish hostility to non-Jews. It is also commonly used in a dismissive manner toward community members; a typical attacker will accuse a user he disagrees with of being a “good goy,” a meme implying obedience to a supposed Jewish elite conspiracy.

When looking at the set of candidate words, given the term “jew,” we find the candidate word “ashkenazi” (most likely on /pol/ and 5th most likely on Gab), which refers to a specific subset of the Jewish community. Interestingly, we note that the term “jew” itself appears in the set of most likely words (among the top two for both communities), indicating that /pol/ and Gab users abuse the term by posting messages that include it multiple times in the same sentence. We also note that this has a higher probability of happening on Gab than on /pol/ (cf. probabilities for candidate word “jew” in Table 4).
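One way to picture how a model assigns probabilities to candidate words is a softmax over output vectors, given the averaged input vectors of the context. The sketch below assumes a CBOW-style model with a full softmax; the post does not state the training configuration, and the vectors, vocabulary, and function name are placeholders invented for the example.

```python
import math


def candidate_probabilities(context, input_vecs, output_vecs):
    """Softmax over output vectors, given the mean input vector of `context`.

    Returns (word, probability) pairs sorted from most to least likely.
    """
    dim = len(next(iter(input_vecs.values())))
    # Hidden layer: average of the context words' input vectors.
    hidden = [
        sum(input_vecs[w][d] for w in context) / len(context)
        for d in range(dim)
    ]
    scores = {
        w: sum(h * o for h, o in zip(hidden, vec))
        for w, vec in output_vecs.items()
    }
    # Subtract the maximum score for numerical stability before exponentiating.
    peak = max(scores.values())
    exps = {w: math.exp(s - peak) for w, s in scores.items()}
    total = sum(exps.values())
    return sorted(
        ((w, e / total) for w, e in exps.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Toy model (placeholder vectors, not trained weights).
input_vecs = {"x": [1.0, 0.0]}
output_vecs = {"x": [2.0, 0.0], "y": [1.0, 0.0], "z": [0.0, 1.0]}
candidates = candidate_probabilities(["x"], input_vecs, output_vecs)
```

In this toy setup the context word "x" is also its own most likely candidate, which mirrors the situation described above: a word that is frequently repeated within the same block of text ends up among its own most probable neighbors.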

To better show the connections between words similar to “jew,” Fig. 5 presents the words associated with “jew” on /pol/ as a graph, where nodes are words obtained from the word2vec model, and the edges are weighted by the cosine distances between the words (obtained from the trained word2vec models). Note that the cosine distance is the complement of the cosine similarity between two words (conventionally, one minus the similarity), and we use it to represent the distance between nodes in our graph. The graph visualizes the two-hop ego network [1] from the word “jew,” which includes all the nodes that are either directly connected or connected through an intermediate node to the “jew” node. We consider two nodes to be connected if their corresponding word vectors have a cosine distance that is less than or equal to a pre-defined threshold. To select this threshold, we plot the CDF of the cosine distances between all pairs of words that exist in the trained word2vec models (see Fig. 4). Because the CDF covers every possible pair of words, the number of distances is very large; to keep only the strongest associations, we set the threshold so that only a very small percentage of pairs qualify.
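The graph construction described above (threshold taken from the distribution of pairwise distances, then a two-hop ego network) can be sketched as follows. The toy vectors, the helper names, and the `fraction` value are assumptions for illustration; the post only says that a very small percentage of pairs is kept, and computing all pairs is quadratic in vocabulary size, so a real vocabulary would need a more careful implementation.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine_distance(u, v):
    """One minus the cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(embeddings):
    """Cosine distance for every unordered pair of words."""
    return {
        frozenset((a, b)): cosine_distance(embeddings[a], embeddings[b])
        for a, b in combinations(embeddings, 2)
    }


def percentile_threshold(distances, fraction):
    """Distance below which roughly `fraction` of all pairs fall."""
    ordered = sorted(distances.values())
    keep = max(1, int(len(ordered) * fraction))
    return ordered[keep - 1]


def two_hop_ego_network(center, distances, threshold):
    """Nodes reachable from `center` in at most two edges under `threshold`."""
    neighbors = defaultdict(set)
    for pair, dist in distances.items():
        if dist <= threshold:
            a, b = tuple(pair)
            neighbors[a].add(b)
            neighbors[b].add(a)
    first_hop = set(neighbors[center])
    second_hop = set()
    for node in first_hop:
        second_hop |= neighbors[node]
    return first_hop | second_hop | {center}


def unit(angle_degrees):
    """Unit vector at the given angle; used only to build the toy example."""
    rad = math.radians(angle_degrees)
    return [math.cos(rad), math.sin(rad)]


# Toy embeddings: "n1" is close to "c", "n2" is close to "n1" only, "far" is isolated.
embeddings = {"c": unit(0), "n1": unit(20), "n2": unit(40), "far": unit(120)}
distances = pairwise_distances(embeddings)
threshold = percentile_threshold(distances, fraction=0.34)
ego = two_hop_ego_network("c", distances, threshold)
```

Here "n2" enters the network only through "n1", which is the role the second hop plays: it surfaces words that are not directly close to the central term but are close to its nearest neighbors.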

This visualization reveals the existence of historically salient antisemitic terms, as well as newly invented slurs, as the most prominent associations to the word “jew.” We also note communities forming distinct themes. Keeping in mind that proximity in the visualization implies contextual similarity, we note two close but distinct communities of words which portray Jews as a morally corrupt ethnicity on the one hand (green nodes), and as powerful geopolitical conspirators on the other (blue). Notably, the blue community connects canards of Jewish political power to anti-Israel and anti-Zionist slurs. The three more distant communities document /pol/’s interest in three topics: the obscure details of ethnic Jewish identity (grey), Kabbalistic and cryptic Jewish lore (orange), and religious or theological topics (pink).

We next examine the use of the term “white.” We hypothesize that this term is closely tied to ethnic nationalism. To provide insight into how “white” is used on /pol/ and Gab, we use the same analysis as described above for the term “jew.” Table 5 shows the top ten similar words to “white” and the top ten most likely words to appear in the context of “white.” When looking at the most similar terms, we note the existence of “huwhite” (cos θ = 0.78 on /pol/ and cos θ = 0.70 on Gab), a pronunciation of “white” popularized by the YouTube videos of white supremacist Jared Taylor [103]. “Huwhite” is a particularly interesting example of how the alt-right adopts certain language, even language that is seemingly derogatory towards themselves, in an effort to further their ideological goals. We also note the existence of other terms referring to ethnicity, such as the terms “black” (cos θ = 0.77 on /pol/ and cos θ = 0.71 on Gab), “whiteeuropean” (cos θ = 0.64 on /pol/), and “caucasian” (cos θ = 0.64 on Gab). Interestingly, we again note the presence of the triple parenthesis “(((white)))” term on /pol/ (cos θ = 0.75), which refers to Jews who conspire to disguise themselves as white. When looking at the most likely candidate words, we find that on /pol/ the term “white” is linked with “supremacist,” “supremacy,” and other ethnic nationalism terms. The same applies on Gab with greater intensity, as the word “supremacist” has a substantially larger probability of occurring compared to the probability obtained by the /pol/ model.

To provide more insight into the contexts and use of “white” on /pol/, we show its most similar terms and their nearest associations in Fig. 6 (using the same approach as for “jew” in Fig. 5). We find seven different communities that evidence identity politics alongside themes of racial purity, miscegenation, and political correctness. These communities correspond to distinct ethnic and gender themes, like Hispanics (green), Blacks (orange), Asians (teal), and women (pink). The central community (grey) displays terms relating to whiteness with notable themes of ethnic nationalism. The final two communities relate to concerns about race-mixing (turquoise) and, intriguingly, to terms associated with left-wing political correctness, such as microaggression and privilege (violet).