A Quantitative Approach to Understanding Online Antisemitism Part 3: Meme Analysis

Disclaimer. Content posted on both Web communities can be characterized as highly offensive and racist. In this post, we discuss our analysis without censoring any offensive language; we therefore warn the reader that this post contains language that is likely to be upsetting.

In addition to hateful terms, memes also play a well-documented role in the spread of propaganda and ethnic hate in Web communities. To detail how memes spread and how different Web communities influence one another with memes, our previous research established a pipeline that automatically collects, annotates, and analyzes over 160M memes from over 2.6B posts from four Web communities: Reddit, /pol/, Gab, and Twitter. Within Reddit, we pay particular attention to The Donald, a Trump-supporting subreddit that notoriously propagates hateful memes and propaganda. In a nutshell, we use perceptual hashing and clustering techniques to track and analyze the propagation of memes across multiple Web communities. To achieve this, we rely on images obtained from the Know Your Meme (KYM) site, a comprehensive encyclopedia of memes. In this work, we use this pipeline to study how antisemitic memes spread within and between these Web communities, and to examine which communities are the most influential in their spread. In particular, we compare the influence (with respect to memes) of two mainstream Web communities, Twitter and Reddit, with that of /pol/ and Gab. We focus on the Happy Merchant meme illustrated in Fig. 1, which is especially important to study in this regard for two reasons: first, it is an unambiguous instance of antisemitic hate; second, it is extremely popular and diverse in fringe Web communities like /pol/ and Gab.
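
The hashing-and-clustering idea can be illustrated with a minimal sketch. It assumes images have already been reduced to small grayscale grids, and it uses a simple average hash (one kind of perceptual hash) plus threshold-based single-linkage grouping as stand-ins for whatever hashing and clustering choices the real pipeline makes. The function names (`average_hash`, `cluster_hashes`, `annotate_cluster`) and the reference labels are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical sketch: aHash + single-linkage grouping stand in for the
# pipeline's actual perceptual hashing and clustering.


def average_hash(pixels: List[List[int]]) -> int:
    """Average hash of a grayscale grid assumed to be already downscaled (e.g., 8x8)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def cluster_hashes(hashes: Dict[str, int], threshold: int) -> List[List[str]]:
    """Group images whose hashes are within `threshold` bits (single linkage via union-find)."""
    parent = {name: name for name in hashes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = list(hashes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


def annotate_cluster(members: List[str], hashes: Dict[str, int],
                     reference_hashes: Dict[str, int], threshold: int) -> Optional[str]:
    """Label a cluster with the closest reference entry (e.g., a KYM image), if close enough."""
    best_label, best_dist = None, threshold + 1
    for label, ref in reference_hashes.items():
        dist = min(hamming(hashes[m], ref) for m in members)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical toy "images" (2x2 grids).
images = {"a": [[10, 200], [200, 10]], "b": [[12, 190], [210, 5]], "c": [[200, 10], [10, 200]]}
hashes = {name: average_hash(px) for name, px in images.items()}
print(cluster_hashes(hashes, threshold=1))  # [['a', 'b'], ['c']]
```

Images that land in the same group inherit one label, so a cluster is annotated once rather than image by image.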

First, we assess the popularity of the Happy Merchant meme on /pol/ and Gab and how its use grows over time. Fig. 7 shows the number of posts that contain images with the Happy Merchant meme for every day of our /pol/ and Gab datasets. These numbers are a lower bound on the number of Happy Merchant postings: our image processing pipeline is conservative and only labels clusters that are unambiguously Happy Merchant, while variations of other memes that incorporate the Happy Merchant are harder to assess. We observe that /pol/ consistently shares antisemitic memes over time, whereas Gab shows a substantial and sudden increase in posts containing Happy Merchant memes immediately after the Charlottesville rally. Our findings on Gab dramatically illustrate how real-world eruptions of antisemitic behavior can catalyze the acceptability and popularity of antisemitic memes on other Web communities. Taken together, these findings highlight that users exploit both communities to disseminate racist content targeting the Jewish community.
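
A sudden increase like the one on Gab can be quantified by binning flagged posts by day and comparing mean daily counts before and after an event date. The sketch below assumes a hypothetical input of (date, flag) pairs, where the flag would come from membership in a Happy Merchant cluster; the window length and the function names are illustrative choices, not the authors' method.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Mapping, Tuple


def daily_counts(posts: Iterable[Tuple[date, bool]]) -> Counter:
    """Number of flagged posts (e.g., images in a Happy Merchant cluster) per day."""
    return Counter(day for day, flagged in posts if flagged)


def mean_before_after(counts: Mapping[date, int], event: date,
                      window_days: int) -> Tuple[float, float]:
    """Mean daily count over `window_days` before the event and from the event day onward.

    Days absent from `counts` are treated as zero.
    """
    before = [counts.get(event - timedelta(days=d), 0) for d in range(1, window_days + 1)]
    after = [counts.get(event + timedelta(days=d), 0) for d in range(window_days)]
    return sum(before) / window_days, sum(after) / window_days


# Hypothetical toy data: (day, is_flagged) pairs.
posts = [(date(2020, 1, 10), True), (date(2020, 1, 12), True),
         (date(2020, 1, 12), True), (date(2020, 1, 13), False),
         (date(2020, 1, 13), True)]
print(mean_before_after(daily_counts(posts), date(2020, 1, 12), window_days=2))  # (0.5, 1.5)
```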

Another important step in examining the Happy Merchant meme is to explore how clusters of similar Happy Merchant memes relate to other meme clusters in our dataset. One possibility is that Happy Merchants make up a unique family of memes, which would suggest that they remain distinct in form and shape from other memes. Given that many memes evolve from one another, a second possibility is that Happy Merchants “infect” other common memes. This could serve, for instance, to make antisemitism more accessible and common. To explore this, Fig. 8 visualizes a subset of the meme clusters, which we annotate using our KYM dataset, alongside a Happy Merchant version of each meme. The figure shows numerous instances of the Happy Merchant infecting well-known and popular memes.

Influence Estimation.

While the growth and diversity of the Happy Merchant within fringe Web communities is a cause of significant concern, a critical question remains: how do Web communities influence one another in spreading the Happy Merchant? Until this point, we have examined the extent of antisemitism on individual fringe Web communities. Memes, however, develop with the purpose of replicating and spreading between different Web communities. To measure the influence of meme spread between Web communities, we employ Hawkes processes, which can be used to estimate the reciprocal influence that Web communities have on each other. We fit Hawkes models for all of our annotated clusters and, as in our previous work, report the influence in two ways. First, Fig. 9 reports the percentage of events on a destination community that are expected to be attributable to a source community. In other words, it shows the percentage of memes posted on one community that, in the context of our model, are expected to occur in direct response to posts on the source community; we can thus interpret this percentage as the relative influence of meme postings on one network on another. Second, we report influence in terms of efficacy, normalizing the influence of each source community by the total number of memes it posts (Fig. 10). We compare the influence that Web communities exert on one another for the Happy Merchant memes (HM) and for all other memes (OM). To assess the statistical significance of the results, we perform two-sample Kolmogorov-Smirnov tests comparing the distributions of influence for the Happy Merchant and for other memes; an asterisk in a cell denotes that the distributions of influence between the source and destination platforms differ significantly (p < 0.01). Our results show that /pol/ is the single most influential community for the spread of memes to all other Web communities.
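
The significance asterisks rest on two-sample Kolmogorov-Smirnov tests comparing the HM and OM influence distributions. A minimal sketch of the test follows. The statistic is exact, while the p-value uses the asymptotic Kolmogorov distribution (with the common small-sample correction to the argument), so for real analyses a tested routine such as `scipy.stats.ks_2samp` is preferable. The function name `ks_two_sample` is hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov test: (D statistic, asymptotic p-value).

    D is the largest vertical gap between the two empirical CDFs.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        # Empirical CDFs evaluated at every observed value.
        gap = abs(bisect_right(a, x) / n - bisect_right(b, x) / m)
        d = max(d, gap)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 1e-3:
        return d, 1.0  # no detectable difference
    # Q_KS(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


print(ks_two_sample([1, 2, 3], [1, 2, 3]))  # (0.0, 1.0)
```

A cell would be marked with an asterisk when the returned p-value falls below 0.01.
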
Interestingly, /pol/’s influence on the spread of the Happy Merchant surpasses its influence on the spread of other memes. However, although /pol/’s overall influence is higher on these networks, its per-meme efficacy for the spread of antisemitic memes tends to be lower than for non-antisemitic memes, with one intriguing exception: The Donald. Another interesting feature of this trend is that memes on /pol/ itself show little influence from other Web communities, both in terms of memes generally and non-antisemitic memes in particular. This suggests a unidirectional flow of memes and influence from /pol/, and furthermore suggests that /pol/ acts as a primary reservoir that incubates and transmits antisemitism to downstream Web communities.
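
Both influence measures can be read off a fitted multivariate Hawkes model. The sketch below assumes `weights[s][d]` is the fitted expected number of events on community d directly triggered by one event on community s; the fitting step itself is not shown. It computes a direct-only attribution percentage and a total (direct plus indirect) efficacy via the series W + W^2 + W^3 + ..., which equals (I - W)^-1 - I. This is a simplified illustration with hypothetical function names, and the exact attribution procedure behind Figs. 9 and 10 may differ.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def direct_attribution_pct(weights: Matrix, counts: List[int]) -> Matrix:
    """pct[s][d]: share (%) of events on d expected as direct responses to events on s.

    counts[i] is the number of events observed on community i (assumed > 0).
    """
    k = len(counts)
    return [[100.0 * counts[s] * weights[s][d] / counts[d] for d in range(k)]
            for s in range(k)]


def total_efficacy(weights: Matrix, tol: float = 1e-12, max_iter: int = 10_000) -> Matrix:
    """eff[s][d]: expected events on d caused directly or indirectly by one event on s.

    Sums W + W^2 + W^3 + ...; this converges only when the spectral radius of W
    is below 1 (a stationary process).
    """
    total = [row[:] for row in weights]
    power = [row[:] for row in weights]
    for _ in range(max_iter):
        power = matmul(power, weights)
        if max(abs(x) for row in power for x in row) < tol:
            break  # remaining terms are negligible
        total = [[t + p for t, p in zip(t_row, p_row)] for t_row, p_row in zip(total, power)]
    return total


# Hypothetical two-community example: A triggers B, B triggers nothing.
W = [[0.0, 0.5], [0.0, 0.0]]
print(direct_attribution_pct(W, counts=[100, 200])[0][1])  # 25.0
print(total_efficacy(W)[0][1])  # 0.5
```

Normalizing by the source's own event count is what separates efficacy (influence per meme) from raw influence, which grows with posting volume.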

A Quantitative Approach to Understanding Online Antisemitism Part 2: Temporal Analysis

This series presents our temporal analysis, which shows the use of racial slurs over time on Gab and /pol/; our text-based analysis, which leverages word2vec embeddings to understand how text is used with respect to ethnic slurs; and our memetic analysis, which focuses on the propagation of the antisemitic Happy Merchant meme.

It also presents our influence estimation findings, which shed light on the influence that Web communities have on each other in the dissemination of antisemitic memes.

Anecdotal evidence points to escalating racial and ethnic hate propaganda on fringe Web communities. To examine this, we studied the prevalence of several terms related to ethnic slurs on /pol/ and Gab, and how their use evolves over time. We focus on five specific terms: “jew,” “kike,” “white,” “black,” and “nigger.” We limit our scope to these terms because, besides being notorious vehicles of ethnic hate toward several groups, they rank among the most frequently used ethnic terms on both communities.

Table 1 reports the overall number of posts that contain these terms in both Web communities, their rank in terms of raw number of appearances in our dataset, as well as the increase in the use of these terms between the beginning and end of our datasets.
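
The per-term post counts and raw-frequency ranks in Table 1 could be computed along the lines of the sketch below, which counts each post at most once per term and ranks terms by raw appearances among all words. The regex tokenizer and the function names are assumptions for illustration; notably, this tokenizer would split constructions such as triple parentheses, which a real pipeline would need to preserve.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

TOKEN = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (a simple assumed tokenizer, not the authors')."""
    return TOKEN.findall(text.lower())


def term_stats(posts: Iterable[str], terms: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """For each term: (posts containing it, rank by raw appearances; 0 if absent)."""
    word_counts: Counter = Counter()
    post_counts: Counter = Counter()
    for post in posts:
        tokens = tokenize(post)
        word_counts.update(tokens)       # every appearance counts
        post_counts.update(set(tokens))  # at most once per post
    rank = {w: i + 1 for i, (w, _) in enumerate(word_counts.most_common())}
    return {t: (post_counts[t], rank.get(t, 0)) for t in terms}


# Hypothetical toy corpus with neutral words.
print(term_stats(["red blue red", "blue green", "red"], ["red", "green"]))
# {'red': (2, 1), 'green': (1, 3)}
```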

Fig. 2 plots the use of these terms over time, binned by day and averaged over a rolling window to smooth out small-scale fluctuations. We observe that terms like “white” and “jew” are extremely popular in both Web communities: they rank 3rd and 13th, respectively, on /pol/, and 9th and 19th, respectively, on Gab. Ethnic slurs like “nigger” and “kike” are also popular, especially on /pol/, where they are the 16th and 147th most popular words in terms of raw counts.
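
The daily binning and smoothing behind Fig. 2 can be sketched as follows: compute, for each day, the share of posts containing a term, then apply a trailing moving average. The whitespace tokenization, the trailing window, and the function names are assumptions. With pandas, `Series.rolling(window, min_periods=1).mean()` computes the same trailing average. Days with no posts at all simply do not appear in the output.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def daily_share(posts: Iterable[Tuple[date, str]], term: str) -> Dict[date, float]:
    """Fraction of each day's posts whose lowercased whitespace tokens include `term`."""
    total: Dict[date, int] = defaultdict(int)
    hits: Dict[date, int] = defaultdict(int)
    for day, text in posts:
        total[day] += 1
        if term in text.lower().split():
            hits[day] += 1
    return {day: hits[day] / total[day] for day in sorted(total)}


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Trailing moving average; the first window-1 points average what is available."""
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out


print(rolling_mean([0.5, 1.0, 0.0], window=2))  # [0.5, 0.75, 0.5]
```

Using shares rather than raw counts keeps the curves comparable as overall posting volume changes.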

We also find an increasing trend in the use of most ethnic terms: the number of posts containing each term except “black” increases, even when normalized for the growing number of posts on the network overall. Interestingly, among the terms we examine, “kike” shows the greatest increase in use on both /pol/ and Gab, followed by “jew” on /pol/ and “nigger” on Gab. It is also worth noting that the use of ethnic terms increases at a greater rate on Gab than on /pol/ (cf. the ratios of increase for /pol/ and Gab in Table 1). Furthermore, Fig. 2 shows that by the end of our datasets, the term “jew” appears in 4.0% of /pol/ daily posts and 3.1% of Gab daily posts, while the term “nigger” appears in 3.4% and 0.6% of the daily posts on /pol/ and Gab, respectively. The latter is particularly worrisome with respect to anti-black hate, as by the end of our datasets the term “nigger” on /pol/ overtakes the term “black” (3.4% vs. 1.9% of all daily posts). Taken together, these findings highlight that most of these terms are increasingly popular within these fringe Web communities, emphasizing the need to study the use of ethnic identity terms over time.
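
One way to express a normalized increase, like the ratios in Table 1, is to compare the mean daily share of posts containing a term at the start and at the end of the dataset. The exact definition used for Table 1 is not given here, so the `edge_days` window and the function name `increase_ratio` below are assumptions.

```python
from typing import Sequence


def increase_ratio(daily_shares: Sequence[float], edge_days: int) -> float:
    """Ratio of the mean daily share over the last `edge_days` to that over the first.

    Working with shares (term posts / all posts) rather than raw counts
    normalizes for overall growth in posting volume.
    """
    if len(daily_shares) < 2 * edge_days:
        raise ValueError("series too short for the requested edge windows")
    start = sum(daily_shares[:edge_days]) / edge_days
    end = sum(daily_shares[-edge_days:]) / edge_days
    if start == 0:
        raise ValueError("term absent at the start; ratio undefined")
    return end / start


# Hypothetical daily shares for one term.
print(increase_ratio([0.01, 0.01, 0.02, 0.03, 0.04, 0.04], edge_days=2))
```

A ratio above 1 indicates growth beyond what rising traffic alone would explain.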

We note major fluctuations in the use of ethnic terms over time, and one reasonable hypothesis is that these fluctuations are driven by real-world events. To test this hypothesis, we use changepoint analysis, which identifies and ranks changes in the mean and variance of a time series. On /pol/, our analysis reveals several changepoints in temporal proximity to real-world political events for the use of both “jew” (see Fig. 3(a) and Table 2) and “white” (see Fig. 3(b) and Table 3).
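
As an illustration of changepoint detection, the sketch below implements binary segmentation with a squared-error cost. It repeatedly splits a series where separating it into two constant-mean pieces reduces the cost the most, and it ranks the resulting changepoints by that reduction. This version detects only shifts in the mean, not in the variance, and the `penalty` and `min_size` parameters are assumptions; it is not necessarily the method used for Fig. 3. Libraries such as `ruptures` provide tested implementations of this and related algorithms.

```python
from typing import List, Optional, Sequence, Tuple


def _sse(xs: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean (cost of a constant-mean fit)."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def _best_split(xs: Sequence[float], min_size: int) -> Optional[Tuple[int, float]]:
    """Split index (relative to xs) that most reduces the cost, with that reduction."""
    total = _sse(xs)
    best: Optional[Tuple[int, float]] = None
    for k in range(min_size, len(xs) - min_size + 1):
        gain = total - _sse(xs[:k]) - _sse(xs[k:])
        if best is None or gain > best[1]:
            best = (k, gain)
    return best


def binary_segmentation(series: Sequence[float], penalty: float,
                        min_size: int = 2) -> List[Tuple[int, float]]:
    """Changepoint indices ranked by cost reduction; a split is kept only if it beats `penalty`."""
    found: List[Tuple[int, float]] = []
    stack = [(0, len(series))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2 * min_size:
            continue  # segment too short to split
        split = _best_split(series[lo:hi], min_size)
        if split is None:
            continue
        k, gain = split
        if gain > penalty:
            found.append((lo + k, gain))
            stack.extend([(lo, lo + k), (lo + k, hi)])
    return sorted(found, key=lambda t: t[1], reverse=True)


print(binary_segmentation([0.0] * 10 + [5.0] * 10, penalty=1.0))  # [(10, 125.0)]
```

Each detected index would then be compared against the dates of real-world events.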

For the term “jew,” several changepoints correspond to major world events in Israel and the Middle East, including the 2016 U.S. abstention from a UN Security Council resolution condemning continued Israeli settlement, the U.S. missile attack against Syrian airbases in 2017, and terror attacks in Jerusalem. Events involving Donald Trump, including Jared Kushner’s interview by Robert Mueller, the resignation of Steve Bannon from the National Security Council, the 2017 “travel ban” (i.e., Executive Order 13769), and the presidential inauguration, also occur close to several notable changepoints for the use of “jew.” For the term “white,” we find that changepoints correspond closely to events related to Donald Trump, including the election, the inauguration, and the presidential debates, as well as major revelations in the ongoing investigation into Russian interference in the presidential election. Additionally, several changepoints in the use of “white” correspond to major terror attacks by ISIS in Europe, including the vehicle attacks in Berlin and Nice, as well as to news related to the 2017 “travel ban.” In the case of “white,” the relationship between online usage and real-world behavior is perhaps best illustrated by the Charlottesville “Unite the Right” rally, which marks the global maximum in our dataset for the use of the term on both /pol/ and Gab (see Fig. 2). For Gab, we find that changepoints in these time series reflect news events similar to those on /pol/, both for “jew” (see Fig. 13(a)) and “white” (see Fig. 13(b)). Several changepoints coincide with world events such as the election, the inauguration, and the Charlottesville rally (see Table 7 and Table 8). These findings provide evidence that discussion of ethnic identity on fringe Web communities increases with political events and real-world extremist actions.
The implications of this relationship are worrying, as others have shown that ethnic hate expressed on social media influences real-life hate crimes.

We hypothesize that ethnic terms (e.g., “jew” and “white”) are strongly linked to antisemitic and white supremacist sentiments. To test this, we use word2vec, a two-layer neural network that generates word representations as embedded vectors. Specifically, a word2vec model takes a large corpus of text as input and generates a multi-dimensional vector space where each word is mapped to a vector (also called an embedding). The vectors are generated in such a way that words that share similar contexts tend to have vectors pointing in similar directions. Given a context (a list of words appearing in a single block of text), a trained word2vec model also gives the probability that each other word will appear in that context. By analyzing both these probabilities and the word vectors themselves, we are able to map the usage of various terms in our corpus. We use the generated word embeddings to gain a deeper understanding of the context in which certain terms are used. We measure the “closeness” of two terms $i$ and $j$ by obtaining their vectors $h_i$ and $h_j$ from the word2vec models and calculating their cosine similarity, $\cos\theta(h_i, h_j) = \frac{h_i \cdot h_j}{\lVert h_i \rVert \, \lVert h_j \rVert}$. Furthermore, we use the trained word2vec models to predict a set of candidate words that are likely to appear in the context of a given term. We first look at the term “jew.”
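
The cosine similarity and the resulting "most similar words" ranking can be sketched in plain Python. The toy vectors below are hypothetical stand-ins for trained embeddings. With gensim, a trained model's `wv.similarity` and `wv.most_similar` methods provide the same functionality.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(theta) = (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word: str, vectors: Dict[str, Sequence[float]],
                 topn: int = 10) -> List[Tuple[str, float]]:
    """Rank every other vocabulary word by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(w, cosine_similarity(target, v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:topn]


# Hypothetical 2-D embeddings with neutral words.
vecs = {"king": [1.0, 0.0], "queen": [0.9, 0.1], "apple": [0.0, 1.0]}
print([w for w, _ in most_similar("king", vecs)])  # ['queen', 'apple']
```

Because cosine similarity ignores vector length, frequent and rare words with similar contexts can still score as close.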

Table 4 reports the top ten most similar words to the term “jew” along with their cosine similarity, as well as the top ten candidate words and their respective probabilities. Looking at the most similar words, we observe that on /pol/ “(((jew)))” is the most similar term (cos θ = 0.80), while on Gab it is the 7th most similar term (cos θ = 0.69). The triple parentheses are a widely used antisemitic construction that calls attention to supposed secret Jewish involvement and conspiracy [88]. Slurs like “kike,” which is historically associated with general ethnic disgust, rank similarly (cos θ = 0.77 on both /pol/ and Gab). This suggests that on both Web communities, the term “jew” itself is closely related to classical antisemitic contexts. Digging deeper, we note that “goyim” is the 5th and 4th most similar term to “jew” on /pol/ and Gab, respectively. “Goyim” is the plural of “goy,” and while its original meaning is simply “non-Jews,” modern usage tends to be derogatory. On fringe Web communities it is used to emphasize the “struggle” against Jewish conspiracy by preemptively assigning Jewish hostility to non-Jews. It is also commonly used dismissively toward community members; a typical attacker will accuse a user he disagrees with of being a “good goy,” a meme implying obedience to a supposed Jewish elite conspiracy.

Looking at the set of candidate words given the term “jew,” we find the candidate word “ashkenazi” (the most likely on /pol/ and the 5th most likely on Gab), which refers to a specific subset of the Jewish community. Interestingly, the term “jew” itself appears among the most likely candidate words (among the top two for both communities), indicating that /pol/ and Gab users abuse the term by posting messages that include “jew” multiple times in the same sentence. This is more likely to happen on Gab than on /pol/ (cf. probabilities for the candidate word “jew” in Table 4).
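
Candidate words like those in Table 4 come from the model's output layer. Under the standard CBOW formulation with a full softmax (an assumption; the trained models may instead use negative sampling or hierarchical softmax), the hidden layer is the average of the context words' input vectors. Each vocabulary word's probability is then the softmax of its output vector's dot product with that average. Because the context word itself is part of the vocabulary, it can appear among its own candidates. The vectors and the function name below are hypothetical; gensim exposes a comparable method, `predict_output_word`, on trained Word2Vec models.

```python
import math
from typing import Dict, List, Sequence, Tuple


def predict_context_words(context: Sequence[str],
                          input_vecs: Dict[str, Sequence[float]],
                          output_vecs: Dict[str, Sequence[float]],
                          topn: int = 10) -> List[Tuple[str, float]]:
    """CBOW-style full-softmax probability of each vocabulary word given a context."""
    known = [input_vecs[w] for w in context if w in input_vecs]
    if not known:
        return []  # no context word is in the vocabulary
    dim = len(known[0])
    # Hidden layer: average of the context words' input vectors.
    h = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    scores = {w: sum(a * b for a, b in zip(u, h)) for w, u in output_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(exp.values())
    probs = sorted(((w, e / z) for w, e in exp.items()), key=lambda t: t[1], reverse=True)
    return probs[:topn]


# Hypothetical toy model.
inp = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
out = {"a": [2.0, 0.0], "b": [0.0, 2.0], "c": [0.0, 0.0]}
print(predict_context_words(["a"], inp, out, topn=1)[0][0])  # a
```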

To better show the connections between words similar to “jew,” Fig. 5 depicts the words associated with “jew” on /pol/ as a graph, where nodes are words obtained from the word2vec model and edges are weighted by the cosine distances between the words (obtained from the trained word2vec models). The cosine distance is one minus the cosine similarity between two words, and we use it to represent the distance between nodes in our graph. The graph visualizes the two-hop ego network [1] of the word “jew,” which includes all nodes that are either directly connected to the “jew” node or connected to it through an intermediate node. We consider two nodes to be connected if their corresponding word vectors have a cosine distance less than or equal to a pre-defined threshold. To select this threshold, we plot the CDF of the cosine distances between all pairs of words in the trained word2vec models (see Fig. 4). Since we plot the cosine distances for all possible pairs of words, the number of distances is very large; to keep only the most important connections, we choose the threshold so that only a very small percentage of pairs fall below it.
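
The threshold selection and the two-hop ego network can be sketched as follows. The threshold is taken as a low quantile of the empirical CDF of all pairwise cosine distances, and a breadth-first search keeps only words within two hops of the center word. The `fraction` value, the function names, and the toy vectors are assumptions for illustration. For a real vocabulary, computing all pairwise distances is quadratic, so vectorized distance computations and a graph library such as networkx (e.g., its `ego_graph` function) would be more practical.

```python
import math
from collections import deque
from itertools import combinations
from typing import Dict, Sequence, Set

Vectors = Dict[str, Sequence[float]]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """One minus the cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def pick_threshold(vectors: Vectors, fraction: float) -> float:
    """Smallest distance d such that at least `fraction` of all word pairs are <= d."""
    dists = sorted(cosine_distance(vectors[a], vectors[b]) for a, b in combinations(vectors, 2))
    idx = max(0, math.ceil(fraction * len(dists)) - 1)
    return dists[idx]


def ego_network(center: str, vectors: Vectors, threshold: float,
                hops: int = 2) -> Dict[str, Set[str]]:
    """Adjacency of the subgraph induced by nodes within `hops` edges of `center`."""
    adj: Dict[str, Set[str]] = {w: set() for w in vectors}
    for a, b in combinations(vectors, 2):
        if cosine_distance(vectors[a], vectors[b]) <= threshold:
            adj[a].add(b)
            adj[b].add(a)
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for nb in adj[node]:
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    keep = set(depth)
    return {w: adj[w] & keep for w in keep}


# Hypothetical 2-D word vectors placed at increasing angles.
vecs = {name: [math.cos(math.radians(deg)), math.sin(math.radians(deg))]
        for name, deg in [("c", 0), ("n1", 10), ("n2", 20), ("n3", 30), ("far", 90)]}
print(sorted(ego_network("c", vecs, threshold=0.02)))  # ['c', 'n1', 'n2']
```

Here "n3" is linked to "n2" but sits three hops from "c", so the two-hop cut excludes it.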

This visualization reveals historically salient antisemitic terms, as well as newly invented slurs, as the most prominent associations of the word “jew.” We also note communities forming distinct themes. Keeping in mind that proximity in the visualization implies contextual similarity, we note two close but distinct communities of words, which portray Jews as a morally corrupt ethnicity on the one hand (green nodes) and as powerful geopolitical conspirators on the other (blue). Notably, the blue community connects canards of Jewish political power to anti-Israel and anti-Zionist slurs. The three more distant communities document /pol/’s interest in three topics: the obscure details of ethnic Jewish identity (grey), Kabbalistic and cryptic Jewish lore (orange), and religious or theological topics (pink).

We next examine the use of the term “white.” We hypothesize that this term is closely tied to ethnic nationalism. To provide insight into how “white” is used on /pol/ and Gab, we use the same analysis as described above for the term “jew.” Table 5 shows the top ten words most similar to “white” and the top ten most likely words to appear in the context of “white.” Among the most similar terms, we note the existence of “huwhite” (cos θ = 0.78 on /pol/ and cos θ = 0.70 on Gab), a pronunciation of “white” popularized by the YouTube videos of white supremacist Jared Taylor [103]. “Huwhite” is a particularly interesting example of how the alt-right adopts certain language, even language that is seemingly derogatory towards themselves, in an effort to further their ideological goals. We also note the existence of other terms referring to ethnicity, such as “black” (cos θ = 0.77 on /pol/ and cos θ = 0.71 on Gab), “whiteeuropean” (cos θ = 0.64 on /pol/), and “caucasian” (cos θ = 0.64 on Gab). Interestingly, we again note the presence of the triple-parentheses term “(((white)))” on /pol/ (cos θ = 0.75), which refers to Jews who supposedly conspire to disguise themselves as white. Looking at the most likely candidate words, we find that on /pol/ the term “white” is linked with “supremacist,” “supremacy,” and other ethnic nationalism terms. The same applies on Gab with greater intensity, as the word “supremacist” has a substantially larger probability of occurring there than under the /pol/ model.

To provide more insight into the contexts and use of “white” on /pol/, we show its most similar terms and their nearest associations in Fig. 6 (using the same approach as for “jew” in Fig. 5). We find seven different communities that evidence identity politics alongside themes of racial purity, miscegenation, and political correctness. These communities correspond to distinct ethnic and gender themes, like Hispanics (green), Blacks (orange), Asians (teal), and women (pink). The central community (grey) displays terms relating to whiteness with notable themes of ethnic nationalism. The final two communities relate to concerns about race-mixing (turquoise) and, in a prominent cluster, intriguingly, to terms related to left-wing political correctness, such as microaggression and privilege (violet).

A Quantitative Approach to Understanding Online Antisemitism Part 1: Intro

A new wave of growing antisemitism, driven by fringe Web communities, is an increasingly worrying presence in the socio-political realm. The ubiquitous and global nature of the Web has provided tools used by these groups to spread their ideology to the rest of the Internet. Although the study of antisemitism and hate is not new, the scale and rate of change of online data has impacted the efficacy of traditional approaches to measure and understand this worrying trend.

In our latest paper, we present a large-scale, quantitative study of online antisemitism. We collect hundreds of millions of comments and images from alt-right Web communities like 4chan’s Politically Incorrect board (/pol/) and the Twitter clone Gab. Using scientifically grounded methods, we quantify the escalation and spread of antisemitic memes and rhetoric across the Web. We find that the frequency of antisemitic content greatly increases (in some cases more than doubling) after major political events such as the 2016 US Presidential Election and the “Unite the Right” rally in Charlottesville. Furthermore, this antisemitism appears in tandem with sharp increases in white ethnic nationalist content on the same communities. We extract semantic embeddings from our corpus of posts and demonstrate how automated techniques can discover and categorize the use of antisemitic terminology. We additionally examine the prevalence and spread of the antisemitic “Happy Merchant” meme, and in particular how these fringe communities influence its propagation to more mainstream services like Twitter and Reddit.

Taken together, our results provide a data-driven, quantitative framework for understanding online antisemitism. Our open and scientifically grounded methods can augment current qualitative efforts by anti-hate groups, providing new insights into the growth and spread of antisemitism online.

We present an open, scientifically rigorous framework for quantitative analysis of online antisemitism. Our methodology is transparent, and our data will be made available upon request. Using this approach, we characterize the rise of online antisemitism across several axes.
More specifically, we answer the following research questions:

Has there been a rise in online antisemitism, and if so, what is the trend?

How is online antisemitism expressed, and how can we automatically discover and categorize newly emerging antisemitic language?

How are memes being weaponized to produce easily digestible and shareable antisemitic ideology?

To what degree are fringe communities influencing the rest of the Web in terms of spreading antisemitic propaganda?

We answer these questions by analyzing a dataset of over 100 million posts from two fringe Web communities: 4chan’s Politically Incorrect board (/pol/) and Gab. Using the posts on these Web communities, we train word2vec models based on the continuous bag-of-words (CBOW) architecture to understand how terms are used and to discover new antisemitic terms. Our analysis reveals thematic communities of derogatory slang words, nationalistic slurs, and religious hatred toward Jews. We also analyze almost seven million images using an image processing pipeline we previously developed to quantify the prevalence and diversity of the notoriously antisemitic Happy Merchant meme.

We find that the Happy Merchant enjoys substantial popularity in both communities, and its usage overlaps with other general-purpose (i.e., not intrinsically antisemitic) memes. Finally, we model the relative influence of several fringe and mainstream communities with respect to the dissemination of the Happy Merchant meme.

The next several posts will highlight these findings.

Digital Anti-Semitism: From Irony to Ideology – Jewish Review of Books

Professor Gavriel Rosenfeld wrote an excellent article in the Jewish Review of Books on digital antisemitism, citing our recent publication.

As the American Jewish community mourns for the martyrs of the Tree of Life synagogue and assesses the historical significance of their murders, it should be under no illusions about the dangers it faces. The history of domestic lone wolf attacks, the ease of online anti-Semitic self-radicalization, and the ubiquity of firearms in the United States are a toxic combination that must be monitored with unprecedented vigilance.
