Topics in AO3 Detroit: Become Human fics
What are some topics that appear in D:BH fics?
(Yes, there is smut. There is always smut.)
This was a little exercise I did to try preprocessing for topic modelling (specifically, Latent Dirichlet Allocation, or LDA) on a form of text (long-form fiction) that I typically don’t work with in my research. The results aren’t too exciting but they’re largely interpretable (which is good), so I’m sharing them here. Full blog post detailing preprocessing here.
Interactive visualisation is here
LDA doesn’t generate topic labels for the user, so here are some topic labels I subjectively came up with based on the output (click to skip to table below). LDA doesn’t promise interpretable topics: it can’t take syntax into account, for one, and junk or mixed topics may arise. We can see this happening with some of the smaller topics, like topic 40. Yeah, “Is this Gavin Reed?” is genuinely a question from me.
Details about the visualisation
I built the LDA model with Gensim and the visualisation was created using pyLDAvis.
Some things to note, presented in a rather hand-wavey way:
- Circle size: prevalence of the topic in the corpus. You’ll notice that topics 1 and 2 are larger than the rest and also more general. This is an artefact of my choice to follow this research paper. As the authors observed, it has the nice effect of putting common corpus-specific (in this case, DBH-specific) words into just a couple of frequent topics (i.e., topics 1 and 2).
- Distances between circles: topics close in semantic meaning should appear closer together. This is achieved via dimensionality reduction. I wouldn’t read too much into the actual distances, but the relative distances may be interesting (e.g. topics 24 and 31 - on the bedroom and bathroom - are relatively close).
- List of terms as you click through each topic: useful words for trying to make sense of the topic.
- The blue bars next to each term: overall frequency of the term in the corpus (e.g. ‘humans’ has a blue bar that almost hits 50k).
- The sliding bar on the top (relevance metric, with adjustable lambda λ): The original research paper suggests setting it at 0.6 for optimal human interpretation, but you can definitely play around with it. Setting it at 1 ranks terms by their probability within the topic (the default output from LDA), and setting it at 0 ranks them by how specific they are to the topic (notice the change in proportion between the red:blue bars).
- Hovering over a term: shows which other topics the term appears in (circle size proportional to term-specific frequencies in corpus). A term might appear in different topics because words may have more than one meaning (e.g. duck - a bird, or to lower one’s body).
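To make the lambda slider a bit less hand-wavey, here is a minimal sketch of the relevance score as I understand it from the pyLDAvis paper: λ times the log probability of the term within the topic, plus (1 − λ) times the log of that probability divided by the term’s overall probability in the corpus. The function names (`relevance`, `rank_terms`) and all the probabilities below are made up for illustration; they are not values from my model, and this is not pyLDAvis’s actual code.

```python
import math


def relevance(p_term_given_topic, p_term, lam=0.6):
    """Relevance of a term to a topic for a given lambda.

    lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w))
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must be between 0 and 1")
    lift = p_term_given_topic / p_term
    return lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)


def rank_terms(topic_probs, corpus_probs, lam=0.6):
    """Return the terms of one topic sorted from most to least relevant."""
    return sorted(
        topic_probs,
        key=lambda term: relevance(topic_probs[term], corpus_probs[term], lam),
        reverse=True,
    )


# Hypothetical probabilities for a single topic (illustrative numbers only).
topic_probs = {"humans": 0.05, "thirium": 0.03, "regulator": 0.01}
# Hypothetical overall probabilities of the same terms in the whole corpus.
corpus_probs = {"humans": 0.04, "thirium": 0.002, "regulator": 0.0004}

for lam in (1.0, 0.6, 0.0):
    print(lam, rank_terms(topic_probs, corpus_probs, lam))
```

With these toy numbers, a common word like ‘humans’ sits at the top when λ is 1 and drops to the bottom as λ goes to 0, while a rare but topic-specific word like ‘regulator’ moves the other way. That is roughly what you see happening in the term list when you drag the slider.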
Suggested topic labels
Topic | Label | Top 5 words, lambda=.6 |
---|---|---|
1 | General language | fact, idea, reason, thinking, question |
2 | Facial expressions | gaze, expression, spoke, glanced, continued |
3 | Body parts | neck, throat, teeth, weight, palm |
4 | Interactive actions | says, looks, nods, asks, turns |
5 | Movements | walked, opened, stopped, grabbed, walking |
6 | Android processing | model, information, data, humans, models |
7 | Sadness/negative emotions | tears, hurts, crying, pain, please |
8 | DPD | desk, coffee, office, precinct, chair |
9 | Places | tree, walls, light, trees, windows |
10 | Smut (past tense) | cock, hips, moaned, tongue, moan |
11 | Revolution | humans, leader, revolution, news, everyone |
12 | Crime scene | scene, crime, victim, crime_scene, police |
13 | Actions inwards/towards | feels, tries, takes, hears, knows |
14 | Security/law enforcement | elevator, security, guards, building, ship |
15 | Darkness/death (supernatural/spiritual) | world, death, blood, soul, fear |
16 | Aggressive actions | hissed, growled, glared, snapped, slammed |
17 | Reflective actions | says, knows, wants, thinks, feels |
18 | Guns | gun, bullet, shot, ground, shoot |
19 | Smut (present tense) | cock, hips, moans, thighs, dick |
20 | The Mission (the Connor topic?) | missions, deviants, machine, lieutenant, deviancy |
21 | Swearing/casual language | fucking, guy, hell, asshole, ass |
22 | Physical affection actions | love, kiss, kissed, kissing, cheek |
23 | Non-physical affection actions | smiled, laughed, sighed, chuckled, grinned |
24 | Sleeping | bed, couch, sleep, bedroom, morning |
25 | Food | food, kitchen, eat, table, breakfast |
26 | Family | boy, child, girl, mother, kid |
27 | Medical care | hospital, doctor, wound, pain, ambulance |
28 | Android errors | memory, error, system, systems, instability |
29 | Celebrations | party, dress, crowd, suit, stage |
30 | Thirium/biocomponents | thirium, pump, thirium_pump, regulator, damaged |
31 | Bathroom | bathroom, clothes, shower, water, shirt |
32 | Vehicles | car, seat, road, drive, parking |
33 | Alcohol | drink, bar, bottle, beer, alcohol |
34 | Phone communication | phone, text, cigarette, call, message |
35 | Stores | box, store, shop, flowers, book |
36 | Cold weather | snow, rain, park, bench, weather |
37 | Pets | dog, cat, fur, dogs, tail |
38 | Art | brother, painting, paint, art, canvas |
39 | Android stress | stress, levels, stress_levels, stress_level, level |
40 | Is this Gavin Reed? | detective, ace, angel, demon, robo |
41 | Music | music, song, dance, piano, singing |
42 | Sea | water, fish, beach, pool, sand |
43 | ABO | sex, alpha, pole, scent, smell |
44 | AU? | knife, blade, fire, rope, flames |