This workshop discusses Laura Nelson’s (Northeastern) work in development, which applies computational methods to historical records to get a sense of what gets remembered. Today’s discussion focuses on what is remembered from the history of the women’s rights movement.
all right and
okay all right
i think we’re uh live now uh all right charlie do you want to uh
introduce us get us going yeah hi everyone i’m charlie gomez uh my co-hosts here are joe
cohen and hongwei zhu and we have laura nelson from northeastern university and she will be presenting her uh work
in progress uh entitled and the rest is history and i was actually just commenting what a great title
that is for someone who can’t come up with titles to save themselves on any paper so i that is uh i i really appreciate that
it’s a rare skill um i’m gonna so uh laura is an assistant professor of sociology at northeastern
university uh where she’s part of the core faculty at the nu lab for text maps and networks
she’s also affiliated at the network science institute at northeastern and is on the executive committee for
women’s gender and sexuality studies uh prior to northeastern she was a postdoctoral fellow at the berkeley
institute for data science and digital humanities um and at the management
organizations department at northwestern university where she was also affiliated with nico the northwestern institute on
complex systems so she uses computational tools primarily text analysis to study social
movements culture gender institutions and organizations her work can be found in mobilization gender and
society sociological methods and research sociological methodology as well as the oxford university press
among other outlets and she gives many talks and does many workshops on computational social science here
and abroad um and we are really excited to hear uh your new work and um this work is
sort of um it’s a work in progress so we’re also eager to kind of workshop some of the
ideas that that you have around there so everyone you know uh we are a very friendly bunch we we like to kind of uh get
get get our hands dirty so um we were definitely excited to give her to give laura some great feedback on on
her work in progress great thank you thank you for that introduction thank you for having me i’m
i’m really excited about this let me share oh i don’t have permissions oh i do have permissions to share my
screen i’ll be there in about 15 minutes sweetie um all right so i want to talk
today about history unsurprisingly uh so most of you probably recognize
Rosa Parks
this picture and this person this is of course rosa parks who is well known for kind of kick-starting the modern
civil rights movement when she refused to give up her seat on the bus to a white man um and the
montgomery bus boycott followed which was the initial event that
they commonly say started the civil rights movement so this picture of course is not a real picture of the event it’s a
staged picture it was staged and taken after the event in order to commemorate the event which was a common occurrence
during this era that there’s an interesting story of that white guy sitting behind her which is fun to look up if you’re interested but this is a
picture that i think a lot of people would recognize and perhaps a lot of people would think that this was the actual event but
it’s it’s not it’s a staged event and this motivates kind of my interest in history and how we remember history
and how we think about history um i think it’s a good example of kind of how history is made
and um remembered by people so most of you probably recognize rosa
parks some of you may recognize this woman but maybe not all of you this is recy taylor
so rosa parks actually got her start in civil rights organizing in the anti-sexual harassment
movement so she did a lot of work on the harassment and rape cases of black women
in the south and recy taylor was one of her big early cases she was sexually assaulted
pretty gruesomely by a group of white men and rosa parks took up her case
and that got her really well known in the area for her work there there’s books written about
recy taylor now but she’s not as well known as rosa parks for her role in the civil rights movement um and most
of you probably don’t recognize that third picture claudette colvin so she was a 15 year old girl and did the same
thing rosa parks did a few years earlier she refused to give up her seat on the bus and of course we don’t commemorate her
the way we do rosa parks and there were hundreds if not thousands of other people who did the exact same
act that rosa parks did got arrested for it and haven’t been kind of enshrined in our historical memory the way that rosa
parks has so this is generally how history works some events are simply more important to
history some events get enshrined in history some events get in events people and ideas get enshrined in
our history books and some don’t some are kind of lost to the historical
archives or the dustbin of history if we want to go marxist here so my question today and this
What gets written into history books
kind of motivating question for this new project i’m working on is what gets written into the history
books and as charles was saying this is very preliminary i’m i’ve been wanting to do
this project for years so i’m excited that i have the the time and opportunity to do it now so i’m basically going to present
a bunch of descriptive stuff descriptive statistics that i um have and some descriptive text
stuff and i’m hoping to really just brainstorm with you guys about the next steps
what’s interesting to you and what’s not what what i could possibly do going forward to kind of shape this into
a more systematic not systematic but into a project that makes sense for sociology
the sociology of knowledge and our understanding of women’s movements over time and how they they get remembered so that that’s where
i am and i’m very happy to be interrupted if something sticks out to you that’s really what i want to get here is if you
see something and you’re like wow that’s fascinating i want to know that because that’s where that’s kind of what will shape
how i go forward me and um i’m working with two graduate students on this so they’re very eager to hear what you
think of this here all righty so the way we get history
How we get history
today is probably not through history books if we’re thinking outside of the school format
it’s probably google so if you’re interested in learning the history or you’re reminding yourself about a
historical thing that you want to put in a paper you probably put it into google or another search engine
and more recently what we’ve been seeing is wikipedia tends to be the first hit so
wikipedia is the first hit here and it gives us some key facts
rosa parks was in the civil rights movement she’s known for the montgomery bus boycott her occupation as a civil
rights activist etc and even more so and this is i don’t know past maybe a year or two google has
been actually putting like a pull-out box on the right side of your search screen
that pulls information from wikipedia and kind of puts it there as the authoritative
source on the subject that you’re searching for this doesn’t work for everything you search for but a lot of time that’s what you get
and not only does it just kind of pull out the images and the screenshot of it you know you start to see some
things oh she’s the first lady of civil rights the mother of the freedom movement so if you’re searching for history this
is what you see right away on google you see wikipedia and then you see this pull out of wikipedia with all of these
details on it so not surprisingly i am very interested in wikipedia what
What is Wikipedia doing
is wikipedia doing how is it conveying these historical movements if people go to wikipedia
to find information about history what are they seeing and does it match up with the way
history was actually created does it match up with what the people who were involved in creating those
movements and moments were thinking is important what they were fighting for how does that translation process happen
and you know the bigger questions are how does that impact the way we understand history think about
history etc so there’s a lot of really interesting questions i think that we can dig into with wikipedia
and wikipedia data they’re great they’re open with their data they release it all so it’s a really fantastic resource for
a lot of computational social science work um coming out recently and for
for about 10 years now and the other thing is i have a clarifying question so um i would imagine rosa parks obviously probably
has a very extensive um and uh very very well cited uh wikipedia
entry i’m also wondering about the other two individuals right so these other figures who were involved
um and sort of like you know in like in montgomery um you know because like obviously you know rosa parks was already sort of a prom
some semi-prominent figure but these other figures who came before her um i’m assuming that they don’t necessarily have a wikipedia page
yeah claudette colvin does not recy taylor might now at this point because that book was written on her recently
um but that’s exactly what i’m interested in that is precisely the question i’m interested in is yes rosa parks is an obvious one right she’s
going to have her own wikipedia page claudette colvin doesn’t recy taylor might and that’s precisely what i want
to measure is what ideas and organizations and people get their own wikipedia page who doesn’t
and what that means so that’s precisely why i’m going to wikipedia
and you know the problem with a lot of social movement research and specifically historical research is the
survivor bias right and the selecting on the dependent variable we know what survived because
it’s in the history books and we read about it we don’t know as much about what didn’t survive
so when when you’re looking at why was rosa parks popular you don’t know about all of the other
rosa parks that weren’t right so it becomes very tricky and doug mcadam has some good work on selecting on the
dependent variable so so one of the things that we can do with these new methods and new sources of data although it’s
not perfect and not super easy but we can try to get around the survivor bias and say yes it’s great to look at the
ideas that survived we also want to find those ideas people etc that did not survive they did
not persist so that’s the idea that’s the motivation here is that wikipedia is super extensive there’s six million
articles in english on wikipedia i mean it’s probably the most extensive historical
not historical just resource for anything that you would want to look up and it’s really really highly trafficked like
it’s the second most trafficked site in the united states second only to youtube which actually surprised me that
youtube was more trafficked than wikipedia but over a billion people click on wikipedia every month
so it’s really important and it’s also extensive so this is like the best case scenario if something is
going to be recorded it’s probably going to be recorded in wikipedia so if you know recy taylor has a page there
because it’s so extensive and so many people are working on it she probably doesn’t get a page in a historical textbook
on the movement for example so yeah best case scenario wikipedia if it’s out there if it’s
being recorded it will be recorded in wikipedia but as i will show some stuff doesn’t make it into wikipedia
and i think that’s that’s important and that’s what i’m interested in yeah um i’m only looking at english language
stuff here but there’s a lot of opportunity to do cross-language analyses as well for people who are
ambitious and looking at cross-language stuff all right so we have wikipedia
The motivating question
the motivating question then for me is what ideas people and organizations from the early
women’s movement are present in contemporary wikipedia so fairly simple question what is there what is not there and also
what metadata is correlated with the presence or absence of the idea person or organization so really trying
to dig into what what explains it what are the patterns between what is there and what
is not there to really think about why some things persist and get kind of
enshrined in our historical memory and what doesn’t so that’s the two broad motivating
questions here for me so the data for the women’s movement side which is
The data
tricky to get in any systematic way as anyone who’s done historical research knows comes from the alexander street
collection called women and social movements in the united states 1600 to 2000 so this is a collection
that consists it’s an online collection so it’s digitized which is great for me because i do computational stuff
it consists of about 120 000 pages of primary source collections
and there are of course important features of this collection that impact the interpretation of the
analysis so the collections and the sources were selected by the founding editors of this
collection for their relevance to this specific collection and in particular they wanted to focus
on the diversity of the movement so they’re purposefully collecting documents so already we’re throwing out a bunch of
ideas that existed a bunch of stuff that existed in the women’s movement in that first selection process so they went
and they were trying to be extensive it’s an online database so it’s extensible meaning they can add as much as they want they’re not
constrained by the length of the textbook for example so there’s a lot of resources there but
it’s highly curated and highly selected so we already are starting off with a selection bias going into it but we
gotta start somewhere so this is it sounds like it sounds like the stuff is also ocr too so like this also makes it that’s
great yeah yeah it’s it’s not even just ocr it’s it’s um corrected ocr
oh nice so it’s a hundred percent i mean more or less a hundred percent accurate which is a huge benefit i have done projects on
ocr text and it’s a nightmare so that’s another benefit of this collection is it’s
corrected because it’s on it’s online you can look at the actual text online it’s not just the images it’s the actual
text online you can copy and paste from it so it’s corrected yeah so it’s a really nice
resource um and thankfully my university subscribes to it so i was able to
purchase the actual underlying data behind it which is fantastic so this is um
Documents
a description of the documents that were actually available to me so i purchased an xml file
that contains all the documents so it’s kind of a little bit of a hodgepodge of
curated documents from a variety of women’s movement stuff here and it’s you know the number
of documents is quite different so i’m focusing on the primary source
documents so the ones that were written by people who were involved in the movement so writings of black women suffragists
etc that’s the kind of selection so we go from the selection of
which documents are included in this collection to the which documents i can actually get in the full text
readable format which i think is everything that they they have on the website and then selected further to the
primary source documents the writings by the actual women so that gets 120 000 documents gets
boiled down to just over a thousand documents um i also did a little bit of curating here to
narrow the time frame just for this initial analysis so just looking at 1898 to 1954
so sorry joe i’m not going to go through four centuries of women’s movement activists but trying
to get some comparable analysis going so writings of black women suffragists i narrowed to 1898 to
1954 i do have earlier writings from that collection that i could use the equal rights journal is the journal
of the national woman’s party which is a largely white largely middle to upper class
in some ways quite racist women’s organization and so they you know
champion the equal rights amendment etc and then the national consumers league is a working-class women’s organization
so they focus on the rights of working-class women trying to think through how that we can we can ease burdens on working-class
women and and provide more rights for them so they were often in direct um competition with the equal rights the
national women’s party so directly competing like fighting over bills and such so these
it’s three very different collections different perspectives on the same um
movement here so that’s that’s my collection the first step then when we have the
Ideas
women’s movement data is identifying ideas and this computationally is not trivial
it’s actually quite difficult so just think for a moment with me about the steps it
would take to identify ideas in a text like think about reading through a text how would you say
this is an idea how would you bound the idea in the text and say this is a discrete idea i’m not talking about
themes so we know we can use things like topic models to pull out themes we can use word embedding models to pull out the
ways in which words are used and the relational distance between words we have all of that but i want discrete
ideas i want this is an idea that we can look to see that was present in the women’s movement
and may or may not be present in wikipedia in the history books as i’m measuring it so
it’s a tricky problem and i think we’re doing better as a community at
identifying this but this initially when i started doing computational methods this is what i wanted to do and then
very quickly i was like wow this is nearly impossible i’m not going to do this but i’m back to it thinking through how
we can identify these discrete ideas in the text and the best way that
Key phrase extraction
i know of is through key phrase extraction which is a very common and by now actually quite
good way of extracting phrases from text so i used the rake algorithm
which is one of the older key phrase extraction algorithms so it stands for rapid automatic keyword
extraction it uses a list of stop words it uses phrase delimiters i did not make this
slide i borrowed this slide um and i’m going to kind of hand wave over the the math here which i don’t think is particularly
important here but the idea is it detects the most relevant words or phrases in a piece of
text so it’s not the most frequent it is the most quote unquote relevant and some phrase extraction algorithms
use wikipedia to identify phrases so they keep they take
phrases from wikipedia as a way to kind of train the algorithm to identify these common phrases i of
course don’t want to do that right i don’t want to have wikipedia as both on both sides of the
equation so i that’s why i didn’t use some of the more kind of sophisticated new algorithms to do this because i do
not want to condition the phrases that i extract from the original text based on whether they appeared in wikipedia it
would just you know throw off exactly what i’m trying to do so going to the go-to trusty early
rake algorithm to identify these phrases and laura can i ask a quick clarifying question on
rake so is it the case that um is it does it sort of have like a preset bag of words that were like
that sort of pre-identified phrases or is it sort of unique to each corpus so for instance like
like i know it’s not necessarily like a tf-idf or anything like that um like it’s coming into it i guess it
sounds like based on wikipedia um like these are sort of like very common idea or
not ideas but phrases that have already been identified in sort of this wider body of knowledge um vis-a-vis
wikipedia is that right not rake so other phrase extraction algorithms do that okay not rake though okay not rake rake
looks at stop words and looks at other ways of bounding phrases
and you as the user you input the number of words you want to look for so i did everything from unigrams up to
i think five word phrases so you can do you can distinguish how many or you can specify
what types of n-grams you want so yeah i did one to five and then it uses a system of
rules based on stop words based on some other kind of components of phrases it does not unlike these other phrase extraction
algorithms it does not use a list of um phrases from wikipedia so
like it doesn’t use wikipedia titles for example too oh that’s super neat yeah so in another project i use a
different phrase extraction algorithm that is conditioned on wikipedia but i just can’t for obvious reasons can’t do that here
yeah yeah yeah i don’t want to do that here right yeah so this is really just stop words
other phrase delimiters i think there’s some tf-idf stuff going on as well actually no there’s not because it’s
individual text it only looks at individual texts so there is no tf-idf stuff going on okay it’s just
stop words got it um but what it does is yeah it goes text by text so it’s single
text by single text and these are two examples of text here with the paragraph
separation there it would take that as input and then it extracts these phrases so united states
rights of women teachers is what i would think of as an idea in the women’s movement focusing on the rights of women
teachers or women teachers as a phrase opposed legislation equal pay measure i’m in you know city
of boston so it identifies places new york uh boards of education
teachers equal pay bill um it identifies names like fairfax brown miss mary
cromwell uh locations ann arbor michigan etc so it’s a wide variety of things
and some of them in my eyes represent discrete ideas concepts people
that the women’s movement was proposing and some of them are pretty generic like city of boston and new york
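The stop-word idea described above can be sketched in a few lines of Python. This is a minimal illustration of the basic RAKE scoring (word degree divided by word frequency, summed over each candidate phrase), not the code used in this project: the tiny `STOP_WORDS` set, the function names and the `max_words` parameter are all assumptions made for the example. Because this basic version breaks candidates at every stop word, a phrase like "rights of women teachers" would come out as "rights" and "women teachers" unless a further step joins adjacent candidates.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop list; a real run would use a full one.
STOP_WORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "is",
              "of", "on", "or", "the", "to", "was", "were", "with"}


def candidate_phrases(text, stop_words=STOP_WORDS, max_words=5):
    """Split text at punctuation and stop words into candidate phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\[\]\"\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9']+", fragment):
            if word in stop_words:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    # Keep unigrams up to max_words-word phrases.
    return [p for p in phrases if len(p) <= max_words]


def rake(text, stop_words=STOP_WORDS, max_words=5):
    """Return (phrase, score) pairs for one text, highest score first."""
    phrases = candidate_phrases(text, stop_words, max_words)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rake(document_text)` on one document at a time mirrors the text-by-text behaviour described in the talk: nothing is shared across documents, so no corpus-level weighting such as tf-idf is involved.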
so it does have to be filtered and we filtered it by hand so this method extracted 23 000
ish phrases and me and my trusty grad students went through and just filtered them
as relevant or not relevant or relevant but may need to be corrected
because there were some corrections needed which sounds like a lot but that is a human sized problem
we could do it in about five hours of work each which to me is not much that does pose
the problem of scalability of course so if i decided oh i want to add more text here or i want a
different corpus to look at you have to go through that process again so i think there is still a lot of
work i mean i can’t think of a computational way to solve that um you know separating out the city of
boston from teachers equal pay bill it’s that’s really quite tricky to do perhaps you
could use wikipedia for that but it gets a little bit tricky so it’s done by hand and in general
when i use computational methods i find that filtering lists is very quick and very
effective so my preferred approach is to use computational methods to pull out relevant lists of words so in one
project i look at verbs and verb phrases to capture tactics and then just filter the verbs and verb
phrases for tactic or not i find it’s a really great way to identify kind of lists of words
and gary king and some psychologists have that same philosophy we’re really terrible at coming up with words in a
category if i were to say name all the animals you would name like five and then you would panic and you’d be like i don’t know any animals
but if i gave you a list and said pull out all the animals you would do it very effectively and very quickly
so for people who are venturing into these types of computational projects filtering lists is great making lists is
terrible so computational methods to make lists humans can filter lists
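One way to organise that division of labour is a plain review sheet: the machine writes out the candidate list, people fill in a label column, and the labelled sheet is read back in. The sketch below is only an illustration of that workflow, not the process the team actually used; the column names and the three label values ("relevant", "not relevant", "correct") are assumptions chosen to match the three outcomes mentioned above.

```python
import csv
import io


def write_review_sheet(phrases, handle):
    """Write unique candidate phrases to a CSV for hand labelling."""
    writer = csv.writer(handle)
    writer.writerow(["phrase", "label", "corrected"])
    for phrase in sorted(set(phrases)):
        writer.writerow([phrase, "", ""])  # label and correction filled in by hand


def read_review_sheet(handle):
    """Return the phrases kept after hand labelling.

    Rows labelled "relevant" are kept as they are, rows labelled "correct"
    are replaced by the text in the "corrected" column, and everything
    else is dropped.
    """
    kept = []
    for row in csv.DictReader(handle):
        label = row["label"].strip().lower()
        if label == "relevant":
            kept.append(row["phrase"])
        elif label == "correct":
            kept.append(row["corrected"].strip())
    return kept
```

In practice `handle` would be an open file shared with the coders; `io.StringIO` works as a stand-in when trying the functions out.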
so after filtering the phrase list the types of phrases we’re focusing on
are actors people like ida b wells organizations like the national women’s party and kind of
groups of actors american feminists movement structures and events so things
that make movements go and what they do on kind of a day-to-day basis or who is doing that
stuff so city council women’s committees seneca falls conventions these are not strict codes we didn’t
like attach a code to each one of these phrases but just generally the kind of framework we were using that
we found emerged from these phrases to help us classify whether they were relevant or not
constituencies working women married teachers french women so like groups of people that the movement was focusing on
grievances and targets race prejudice maternal death canneries were a frequent target of the
early women’s movement um thinking about workers rights in canneries for example and
then ideas so specific ideas like accident compensation
specific acts and bills like the sheppard-towner act or more kind of generic ideas like
social uplift which was a very important concept in the black women’s movement in this period or equal rights which was
an important concept for the white women’s movement in this period so things like that and then specific public sphere
institutions which is a bit of a tricky category but the ame church was very important to the
women’s movement and then universities were very important so you know who knows if they were relevant
to the movement or not but they were at least mentioned in the women’s movement documents here so that’s the range of
types of ideas that come through through this algorithm and we ended up with about 5700 unique
phrases and these phrases occurred about 15 000 times in the corpus so they were used on
average about three times each in the corpus so now we’ve gone from corpus to
23 000 phrases to 5700 phrases that we thought were relevant to the the women’s
movement here all right now comes the descriptive statistics
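The counts quoted here (roughly 5,700 unique phrases used roughly 15,000 times, so a little under three uses each) and their breakdown by source are simple tallies. A minimal sketch of that bookkeeping follows; the input format, a list of `(source, phrase)` pairs with one pair per use, and the function name are assumptions for illustration.

```python
from collections import Counter


def phrase_stats(occurrences):
    """Summarise phrase use per source.

    occurrences: iterable of (source, phrase) pairs, one pair for every
    time a filtered phrase appears in a document from that source.
    """
    per_source = {}
    for source, phrase in occurrences:
        per_source.setdefault(source, Counter())[phrase] += 1

    summary = {}
    for source, counts in per_source.items():
        total = sum(counts.values())
        summary[source] = {
            "unique": len(counts),             # distinct phrases contributed
            "total": total,                    # total uses in that source
            "mean_uses": total / len(counts),  # average uses per phrase
        }
    return summary
```

Keeping the `Counter` objects around as well would give the most frequent phrases per source directly through `counts.most_common()`.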
Descriptive statistics
so this is who provided the ideas so the equal rights journal provided in
this sense three thousand of the phrases come from the equal rights journal um etc over the
three groups and the equal rights journal used the phrases quite a bit more often so
these 3 000 phrases they used close to 10 000 times in their corpus so there’s some
uh distribution in who is providing the phrases which is um interesting and potentially
consequential and then here are the frequent um
Phrases
phrases from each of these groups and i’ve just put in orange ones that i think are kind of interesting so national consumers league not
surprisingly is looking at labor laws union education etc the equal rights
journal equal rights amendment not surprisingly status which probably has to do with the commissions on the
status of women which they talked about a lot freedom education family
in the black writings we unsurprisingly get like colored women and negro women war was a big topic uh civil war
freedom slavery community uh body was interesting and i was looking into how body showed up which which was i
think an interesting phrase so there’s a lot you could do just simply looking at the difference in the types of topics and phrases these
women’s groups were talking about i mean and black women i think are undersold in their role that they played in the
first wave women’s movement it was really huge and it was much different than the role that these white middle class women were playing
so just in terms of like digging into what are the different things that these groups found important
is one thing that we can do before we even get to wikipedia with these phrases that we’re looking at
here’s the the same thing but i’m looking only at two grams and up so just throwing out the unigrams here
similar thing you know child labor amendment eight hour days not surprisingly focused on labor here
jury service the the jury rights movement was big for the equal rights for the national women’s party
we get feminist movement here and the feminist movement was largely only mentioned by the national women’s
party the word feminism was not used by black women it was not used by the national consumers league
which is interesting and we know that black second wave feminists
a lot of them refused to use the word feminist and used the word womanist and this goes back into history this
goes back to the 1900s when the word feminist was first imported to the united states from france
so there’s an interesting historical origin story there and then over here we get industrial education training schools so this is
following the booker t washington kind of philosophy of industrial education
that type of kind of economic focus the tuskegee institute is important here so we’re seeing that
focus come there as well all right here so those were the phrases so now
we’ll shift to wikipedia which of these showed up in wikipedia so in all of the phrases all of the
pages in wikipedia and this is the full text not just the title just around 90 percent of these phrases showed
up in wikipedia which is great i think which means 10 percent of them did not but 90 percent
did and there’s not much difference across these three groups here in which phrases in the amount or
proportion of phrases that showed up and sorry this is across all the entire
body of knowledge of wikipedia so like any english language page so like whether it was rosa parks or talking about like
hydro chemical what have you like these are phrases that popped up okay yeah that’s right and some of the
phrases are pretty generic like abdominal muscles was a phrase that came out in the women’s movement
writing which does in fact show up in wikipedia but uh not necessarily in relation to the
women’s movement right which is why i took this second step yeah abdominal muscles there was a lot about
uh women’s bodies and that actually has to do with health care so a lot about the health care of women um
reproductive issues and such so so that’s why and that exactly
tees me up to look at the second graph which is i don’t want to just look at all pages but were these issues
mentioned in any page that was on a history page it doesn’t have to be history of the women’s movement just a
history page so it contains history in the title or it contains movement in the title so it’s an attempt to narrow down wikipedia
into more of a history book format so if you read an article you don’t know
what you want to search for you just want to know what is the history of the suffrage movement you go to the suffrage movement page and
just a note here that i do get the page titles that redirect so one page on wikipedia tends to have
three or four pages that redirect to it and so i am searching through all of those titles not just the
title that it ends up on but all of the pages that then redirect so the history of women’s suffrage is a
title but it redirects to woman suffrage so just just a note there but i do have all of those titles
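The history-page filter described here can be sketched as follows. It is a rough illustration under stated assumptions, not the project's code: each page is represented as a hypothetical dict with `title`, `redirects` and `text` keys, matching is a plain case-insensitive substring test, and the function names are invented for the example.

```python
def is_history_page(titles):
    """True if any title or redirect title contains 'history' or 'movement'."""
    return any(
        keyword in title.lower()
        for title in titles
        for keyword in ("history", "movement")
    )


def history_coverage(phrases, pages):
    """Share of phrases found in the full text of at least one history page.

    pages: list of dicts with 'title' (str), 'redirects' (list of str)
    and 'text' (str).
    """
    history_texts = [
        page["text"].lower()
        for page in pages
        if is_history_page([page["title"], *page["redirects"]])
    ]
    found = {
        phrase for phrase in phrases
        if any(phrase.lower() in text for text in history_texts)
    }
    return len(found) / len(phrases) if phrases else 0.0
```

Checking the redirect titles is what lets a page titled "Woman suffrage" count as a history page when "History of women's suffrage" redirects to it. Dropping the `is_history_page` test gives the all-pages version of the same measure.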
so this is an attempt to say not only do the phrases show up in wikipedia but if you were reading a page on
some sort of historical thing would you see these phrases in there and here we see it it’s now
reduced down to about 60 percent of the phrases show up in a history or movement page on
wikipedia which in my eyes is still quite high but it still leaves a full 40 percent of these phrases that are not
showing up in the history pages here uh just to quickly
uh clarify so is it i just want to make sure so the the
i i thought it was a tag on the wikipedia pages like you know you you label your article as
okay this is related to history or this is related to social movements is that right no this is not from tags this is from
the actual title so does the word history or does the word movement appear in the title of the page there
are tags so wikipedia is tagged um with things like feminist or movements or women’s issues that
could be an alternative route i was looking at that and it it was not i it didn’t make sense to me
why some things were tagged as feminist and some things not it was pretty weird but i do want to look more
into using those tags those metadata tags so i think that is something that’s potentially a
next step and if anybody here has done research using wikipedia data i would love to hear your opinion on the
usability of those tags and how systematic they are because really just doing some spot checking i
was like wow that’s that’s kind of a weird list of feminist pages so that’s why i didn’t use those so i’m just going for
interpretability and simplicity was just saying does history or movement show up in the title
uh are these titles also contributed by internet users or yep okay yeah
but they do try to do some you know wikipedia has a bunch of style guides they’re pretty strict about what gets
included and what doesn’t and they will the like actual
people who work for wikipedia will change things if it doesn’t fit their style guide they’ll either do it by hand or do it by a bot
so it’s pretty controlled that’s especially the titles but everything about wikipedia is pretty controlled
yeah everything is contributed by the editors so people can add pages and decide on the title and
yeah it’s users and then the third way of slicing the
Titles
data was actually looking at the titles so this is all again all page titles not the history or movement and
the question is do the phrases show up in the title of the page why is this important well
when you do a google search if the phrase is in the title it’s most likely to be that first hit that you see
and it’s most likely to have that little call out box on the right side so if it’s not in the page title the the
page is going to be further down in your search results and you might even have to add the term wiki in there to get the
page so it is important if the phrase is in the title versus not it impacts the way
people read um and find out about information and so now we’re hovering between 54 to 58 percent of the phrases are in
at least one page title and again across all of these there’s not much difference and none of this is
statistically significant the differences here across these three collections so this
is kind of the meta view of what’s happening with these phrases in wikipedia
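The phrase search behind these counts is described in the talk as an Elasticsearch query that lowercases text and allows an edit distance of one per word. The sketch below shows that matching rule in plain Python, plus one way such a query body could be written in the Elasticsearch query DSL (`span_near` over `span_multi`-wrapped `fuzzy` clauses). The field name `text` and the choice of span queries are assumptions; the talk does not say which query type was used.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_in_text(phrase, text, per_word_edits=1):
    """True if consecutive tokens match the phrase, one edit allowed per word."""
    words, tokens = tokenize(phrase), tokenize(text)
    if not words:
        return False
    n = len(words)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(edit_distance(w, t) <= per_word_edits
               for w, t in zip(words, window)):
            return True
    return False


def fuzzy_phrase_query(phrase, field="text", per_word_edits=1):
    """Build a query body: ordered, adjacent terms, each matched fuzzily."""
    clauses = [
        {"span_multi": {"match": {"fuzzy": {
            field: {"value": w, "fuzziness": per_word_edits}}}}}
        for w in tokenize(phrase)
    ]
    return {"query": {"span_near": {
        "clauses": clauses, "slop": 0, "in_order": True}}}


print(phrase_in_text("pure food law", "The Pure Food Laws were debated"))
print(phrase_in_text("jury service bill", "the jury duty bill"))
```

A three-word phrase therefore tolerates at most three edits in total, one per word, which matches the description of leaving some room for spelling variants but not much.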
Not in Wikipedia
here are some of the phrases that are not in wikipedia so minimum wage boards industrial
hygiene bureaus child wage earners isn’t there mandatory minimum wage law
uh jury service bill surprisingly to me was not in wikipedia and anti-lynching hero ada young
auxiliary is a pretty niche group that was important to the black women’s movement lucy stone civic league
et cetera so just these are some examples of phrases that are not in wikipedia at all at least the way we searched for it oh
and by the way we searched for the phrase using elasticsearch so we did an edit
distance so it didn’t have to be the exact phrase we allowed an edit distance of one for each word so if there were
three words we allowed an edit distance of three and if there were two words we allowed an edit distance of two so there was
some room for different spellings etc but not much so it was
constrained but that’s how we did the search of wikipedia with these phrases and of course lowercased and all of
that stuff um here are some phrases that were in wikipedia but not
in the history or the movement pages so some interesting things here jim crow car which was
really really important to the early black women’s movement does not show up in any history page in in wikipedia
although it does show up in wikipedia so if you were searching through enough pages you would find it but it’s
not really connected to the women’s movement uh social uplift and nobler ideas were also really really key
concepts to the black women’s movement that don’t show up in these history pages here same same over here pure food law was
very important to the national consumers league night work etc that don’t show up in these history
pages on wikipedia so what
then is lost to history and this is where i need your folks help and where
i’m getting a little stuck so how do i characterize or figure out
what’s going on with these phrases and it’s between 10 to 40 percent depending on how you slice the data
that don’t show up in wikipedia my my initial kind of look through it and i
Qualitative Look
did some word counts i did some frequency counts i just did some qualitative look for black women it’s
specific i mean the general theme is specificity which is not surprising um that the specificity is lost but this
the type of specificity is different across these groups so for black women it was specific clubs organization and
departments that were the most likely to not show up in wikipedia for the equal rights journal so this is
the national woman’s party it was specific commissions bills and treaties
and then for the national consumers league it was specific commissions laws and types of employment so a lot of the
specific industries that they were focusing on does not get mentioned in their pages on wikipedia so just to
show uh one case study of this one of the things i can do is identify
phrases that have a specific word for so departments what departments are being mentioned um
this is easy to do computationally so i just was curious if we break it down to specific types of ideas
do we see differences across groups so i’m looking at any phrase that mentions the word department or departments
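A rough sketch of this department breakdown, assuming each phrase is stored with its source collection and a flag for whether it was found in Wikipedia. The record layout is hypothetical, the found flags are invented for illustration, and "example department" is a placeholder rather than a phrase from the data.

```python
import re
from collections import defaultdict

# Matches the whole word "department" or "departments", any casing.
DEPARTMENT = re.compile(r"\bdepartments?\b", re.IGNORECASE)


def department_share(records):
    """Share of department phrases found, per collection.

    records: iterable of (collection, phrase, found_in_wikipedia).
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for collection, phrase, is_found in records:
        if DEPARTMENT.search(phrase):
            total[collection] += 1
            found[collection] += int(is_found)
    return {c: found[c] / total[c] for c in total}


records = [  # found flags below are made up
    ("black women's writings", "negro department", False),
    ("black women's writings", "citizenship department", True),
    ("equal rights journal", "example department", True),
    ("equal rights journal", "minimum wage boards", False),
]
print(department_share(records))
```

Swapping the pattern for another word, such as council or commission, gives the same per-collection breakdown for other types of ideas.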
Systematic Look
and now we start to see some more systematic differences so in all pages
a hundred percent of the departments mentioned by the equal rights journal was somewhere in wikipedia i mean
there’s some drop off in these other groups but here we start to see a big difference so while around 50 percent
in the equal rights journal national consumers league were mentioned in one of these history pages only 35 percent um from the black
women’s groups were mentioned in these history pages and a lot of that has to do with
things like the negro department which is not mentioned in these these kind of race issues that we don’t
associate with the women’s movement race is separate from the women’s movement and we’re starting to see that
a lot here where we don’t see a lot of these these departments that maybe are not gender specific but still were crucial to the
women’s the black women’s movement were not seen in wikipedia so here are some of the departments that
are missing so again color department negro department but there’s some other ones citizen citizenship department as well so
immigration was a huge part of the early women’s movement and that a little bit gets lost to the history as well this focus on
immigration and nationality etc here and then this is wikipedia
titles so not surprisingly these phrases don’t show up in the history page titles because they’re generic
titles like the history of the women’s movement the history of the colored women so they’re not going to show up in the history pages title but we see a pretty
big difference here where 70 percent of the departments mentioned by the equal rights journal get their own
wikipedia page where only 40 percent of the departments mentioned by the
black women writers get their own page so now we’re starting to see if you’re kind of searching through
wikipedia it would be much easier to find information on these departments versus
the ones that were important to the the black women’s movement here so that here we’re starting to see
a little evidence that there is actually systematic differences that don’t get picked up when you’re looking at the
phrases as a whole here so very preliminary takeaways and all of
this may change in the coming months as i dig deeper into this i’m convinced and still convinced that
wikipedia is a really fantastic resource i think it’s just an incredible source of knowledge for our world and
they’re so open to making it better and they do campaigns to add more pages so i think it’s just something that we should
celebrate and look at more still 10 percent of the phrases which is about 500
phrases are not present in wikipedia and this i’m going to do some more cleaning so this number will likely go down so
it’s it’s pretty dang impressive how comprehensive wikipedia is in covering this movement and certainly
if we went back to the 1600s we might find something different but at least for the the modern stuff it’s
pretty good not surprisingly specificity is missing and what’s interesting to me is the type
of specificity is different across subgroups so specific councils which i didn’t show
and departments mentioned by black women are less likely to be in wikipedia this has to do with what these women
were talking about so this is also a difference in what the women were proposing across these different communities not
just in what wikipedia is covering so it’s both what wikipedia is covering but also what the women were finding important that is
different across these and i think that suggests that there are some interesting patterns in this data which may there may be
other systematic differences in what’s missing across subgroups of ideas and again this is where i’m a little bit
this is where i am now is all right i have these lists of phrases that don’t show up in wikipedia across these three clusters
i can’t i don’t have recourse to my normal tools topic models you can’t do on phrases word embeddings doesn’t
really make sense so i’m a little bit you know i’m i don’t feel like i can go back to the stuff
that i know really well and i’m not quite sure what to do with the phrases now so my big questions are
Next Steps
about next steps um and what i was just saying so in some
slices of the data a full 70 percent of the phrases don’t appear in wikipedia so if you looked at the departments that
showed up in titles for example like how is this just i have to go in by hand
is there some way i can think about patterns across these ideas that would help me get kind of a handle
on what’s going on with representation in wikipedia from this this is kind of an aside but i’m also
bringing in the the role of newspapers so i have the chronicling america data set thinking about that as an intermediate
step did the phrase show up in the contemporary chronicling america newspapers and is that kind of an
intermediate step to shepherding it into wikipedia so i’m currently running that analysis now
to see if i can identify that um chronicling america the ocr is awful awful awful so that’s a little
bit questionable if how much we can do with that
Gaps in Archives
and just conceptually what to do about gaps in the archives difference in writing styles across group all of the historical issues that
you know are important for any sort of historical analysis comes into it here so just really kind of thinking about how
we do this type of project when the archives are spotty when there’s just issues of style differences
and and such across these these organizations for example i have the journal for the national woman’s party
but i have just the writings for black women so of course the content is going to be different and is that kind of going to impact how how we think
about what’s covered the fact that they’re different is also indicative of the way we think about
history by the way and then just some final thoughts and then i want to i want to get your your thoughts on this so i’m
Comparison
what i’m thinking of is a targeted comparison comparing the national consumers league collection to the actual national
consumers league page equal rights journal to the national woman’s party page i don’t know how to do that with the black women’s writings
because it’s there’s not a page that’s like black women’s writings right so i don’t know what page i would compare there
or just look at all three collections as a corpus to the history pages as a corpus and not
looking at the phrases or maybe looking at the phrases but just directly doing some comparisons there
Network Analysis
more than just mentions i mean one potential idea i have is looking at the idea structures measured
via network analysis so if you were to read the black women’s writing collection and you were to read the wikipedia history
pages what is the structure of ideas you would see and how how is that different so that’s one concrete idea i have and it
really does borrow from peter bearman’s work on word networks also monica lee and john levi martin’s work on
semantic networks and and that sort of thing so that’s one specific idea that i have and that’s it
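One possible way to operationalize this idea-structure comparison is to build a word co-occurrence network for each corpus and measure how much the edge sets overlap. This is a simplified sketch under stated assumptions (sentence-level co-occurrence, no stopword removal, Jaccard overlap) and is not the method used in the cited work; the example sentences are invented.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, min_count=1):
    """Edges between words that share a sentence at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(re.findall(r"[a-z]+", sentence.lower())))
        counts.update(combinations(words, 2))
    return {edge for edge, n in counts.items() if n >= min_count}


def jaccard(a, b):
    """Overlap of two edge sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Invented one-sentence "corpora" standing in for a collection and a page set.
writings = cooccurrence_edges(["suffrage lynching uplift"])
history_pages = cooccurrence_edges(["suffrage wage uplift"])
print(jaccard(writings, history_pages))
```

A low overlap would suggest that the two sources connect the same words differently, even when most individual phrases appear in both.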
what else what else can i do what how do i get a handle can i ask a question laura yeah so when you think of the the
process not you know not in quantitative terms but just sort of in natural language terms what do you
what’s your working model in your head of what gets remembered and what doesn’t um
Working Model
i so i think it would certainly i at least i hypothesized coming in that
certain groups what certain groups were doing would be more remembered and recorded because they are
preserved in the archives more they were publicized at the time they were probably
in the newspapers at the time so that was my big initial hypothesis was there’s going
to be some demographic effect here where white women especially middle class white
women are going to have more control over what gets remembered in history versus other
groups so that’s one big model um yeah actually that is the biggest
Rosa Parks Case
even i had even like even like rosa parks right like uh my understanding is
that her case was publicized as sort of a through a media event that was like
deliberately orchestrated like she had organizational support and i wonder
if maybe part of part of the puzzle is figuring out
who had you know organizational resource support to have their messages
magnified you know in the uh in the media sorry i’m just admitting somebody here
something like that you know on a related story i remember when i was a graduate student having
a couple wikipedia editing wars over uh henry ford’s history of anti-semitism
and there they had a very very fast they had like a rapid response team that would contest
any mention of henry ford’s anti-semitism almost immediately to the point that i figured it had to be some type of organized
effort to you know maintain it so you know organizations might not just
platform ideas and help immortalize them but they might actually actively squash them and i’m wondering if there’s
some way to do that by tracing the genealogy of an idea that died or didn’t something like that
Wikipedia
or the edit pages right we have the entire history of the edit pages of wikipedia so you certainly could pull in that data
as well to try and figure out is there organized efforts on either side like the number of edits so i certainly have
right away the the last edit that was done on the page so some were edited in 2015 some are currently being edited today so
there’s a lot of other data from wikipedia um which i think could be interesting um
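As a sketch of how the edit history could be pulled in, the MediaWiki Action API exposes page revisions through `action=query` with `prop=revisions`. The code below only builds the request parameters and summarizes a response that has already been parsed; it makes no network call. The sample response is made up, and a full history would need to follow the API's continuation tokens, which this sketch leaves out.

```python
from collections import Counter


def revision_query_params(title):
    """Parameters for a revisions request against the MediaWiki Action API."""
    return {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user",
        "rvlimit": "max",
        "format": "json",
        "formatversion": "2",
    }


def edits_per_year(response):
    """Count revisions per calendar year in a parsed formatversion=2 response."""
    counts = Counter()
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            counts[rev["timestamp"][:4]] += 1  # ISO 8601 timestamps start with the year
    return dict(counts)


# Made-up response in the formatversion=2 shape, for illustration only.
sample = {"query": {"pages": [{
    "title": "Woman suffrage",
    "revisions": [
        {"user": "EditorA", "timestamp": "2015-03-01T12:00:00Z"},
        {"user": "EditorB", "timestamp": "2015-07-09T08:30:00Z"},
        {"user": "EditorA", "timestamp": "2021-01-15T17:45:00Z"},
    ],
}]}}
print(edits_per_year(sample))
```

Bursts of edits in a short window, or repeated reverts by the same accounts, are the kind of pattern that could point to organized effort on a page.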
so yeah can i think just one more i i would wonder if the concepts or people that you see
The News Archives
featured today if you look through the news archives you would find that
they commanded a news cycle like there was a moment where they captured the national attention and were
able to entrench themselves in the collective imagination and now you could pick up quantitatively right like just the hypothesis would be
that you’d have to dominate a news cycle to get the wide attention that would be
required for cultural creators to be cognizant of your existence know your story and then have the hazard
of creating it in more content that’s consumed i wonder exactly which is why that chronicling
Chronicling America
america data is key there yeah precisely so is if we’re looking at the 10 percent
that didn’t show up in wikipedia or the whatever that was 20 30 percent that didn’t show up in these history
pages is that conditioned on their not only the presence in the newspapers but
the duration of the presence or the geographical reach of the presence where we have the city of the newspapers
so that is precisely i think why i would want to bring in the chronicling america data there okay may i ask a question
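The three newspaper measures mentioned here (presence, duration of presence, geographic reach) could be summarized per phrase roughly as follows. The input layout, a list of publication date and city pairs for each phrase, is an assumption about how matched newspaper pages might be stored, and the sample hits are invented.

```python
from datetime import date


def presence_summary(hits):
    """Summarize newspaper coverage of one phrase.

    hits: list of (publication_date, city) pairs where the phrase appeared.
    """
    if not hits:
        return {"mentions": 0, "span_days": 0, "cities": 0}
    dates = [d for d, _ in hits]
    return {
        "mentions": len(hits),                        # presence
        "span_days": (max(dates) - min(dates)).days,  # duration of presence
        "cities": len({city for _, city in hits}),    # geographic reach
    }


# Invented hits for a single phrase.
hits = [
    (date(1900, 1, 1), "Chicago"),
    (date(1900, 1, 15), "Chicago"),
    (date(1900, 1, 31), "Boston"),
]
print(presence_summary(hits))
```

These per-phrase summaries could then be compared between phrases that did and did not make it into Wikipedia.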
Trending
don’t you guys hear me now did you see any trending between uh the older
the newer stuff the newer references for your list of terms
being more relevant like maybe old phrases kind of started to evolve into something new
Confounding
yeah the so um there is some confounding there specifically with the way we refer to
the black population we don’t use the term colored anymore we don’t use the term negro anymore
and that’s not the same way with white people we have always used the phrase white and we will always for the foreseeable future
use the phrase white i mean women weren’t saying caucasian they weren’t saying the words caucasian there we were using negro
and colored so that is certainly an issue there specifically with like colored women workers doesn’t show up in
wikipedia but black women workers might and so that that changes so that’s an important
caveat um and it’s an interesting problem for wikipedia in general is when you talk
about these historical events do you use the phrase that they use themselves like colored women workers or do you
update that to the modern language which is black women workers or african-american women workers which
probably nobody would want to use now so that’s that’s a big question um with the other stuff with the other kind
of phrases and departments you would think that at least with the departments in particular you would use the actual
department name if there was a negro department you would want to talk about the negro department in a history
book so there’s you know different different ways of thinking about the way phrases change over time
tricky to capture computationally i would say these phrases changes over
time and i’m not sure what i would do with that maybe look yeah newspapers i don’t know um my time
frame is actually not all that large it’s 1898 to 1954 um so there may be some
kind of bias toward the 50s but my guess is actually there’s more bias to the first wave movement because
that’s where most of history is done so 1890 to 1920 would be my guess but i actually could look more into year
of creation can i jump in yeah so um this is a
Perspective
really very exciting and amazing project i really really like it it’s really impressive um but like just to contribute one um
perspective maybe so the way i see black women writing is like black women is in the intersection of
racial discrimination and gender discrimination right so like right now here the reference group seems to be like equal rights
journal or some national consumers league i the the the comparative perspective i
think you can further deconstruct further break it down to say what’s the difference between black
women’s writing versus black guys writing would they draw even more attention like say for like for
for for males right black males their uh their movements or their writing
would actually draw even more attention than black women and then also black women versus white women like i see like there are multiple
dimensions going on there and um and also another because i just now
i i think at one point you mentioned a lot of phrases they do appear like they both appear in
black women’s writing and also in wikipedia but it’s not in wikipedia’s
um history or movement page or whatever so like where do they
appear then what are they associated with so they do appear in wikipedia but not associated
with black women’s writing and then they appear somewhere else like seems like they were isolated from uh from their original
contexts then where did they go and why did they go there so i think all those directions are worth dipping um
like diving deeper into and um what else uh
oh and another like one last thing but i think you have already mentioned this because you are now um
also shifting your attention to compare this with uh contemporary writing or contemporary
memories i would say because like right now we thought something is lost to history right
but probably it’s also because we don’t even they just go uh they
they are going actually unnoticed even at present like right now are we also remembering
black women’s writing at this time like at present i like like my my so i think the point
is are they really lost to history which suggests that they were once remembered they were
noticed and then they got lost or they actually didn’t get any attention at all and they just
you know disappear so um that’s like several dimensions i was thinking through yeah and that that
Stop Sharing
relates back to do you guys prefer that i stopped sharing my screen so we can see each other or should i keep myself that’s great yeah
The Initial Process
uh stop sharing is great yeah sure yeah yeah i always find it so weird talking to these little tiny
boxes um yeah so that relates back to joseph joe or joseph joe joe’s comment
that i’m looking at the initial process so rosa parks it wasn’t arbitrary that her
event became the event it was actually you know they came to her they specifically the the movement
came to her and said rosa parks we need you to get arrested because we’re going to turn your case into a thing and she had a really great
history she was organized you know she didn’t have any problematic features
she was a upstanding citizen etc so it’s she didn’t just get tired and give up her seat the
movement said you need to do this rosa parks and we are going to be behind you so there’s there’s a process beyond just simply
what do we remember in the process and so chronicling america i think would again get at that
was it remembered at the time so of the 10 percent of the phrases that aren’t in wikipedia were they not even mentioned by
newspapers like was it truly just a non-starter and nobody was talking about it even at the time
so i think from this conversation it’s i that seems to be the essential next step is bringing in this chronicling america
data so i think first first response i’m getting is that should be my focus
is looking at that data because that will be essential um yeah there’s so many different
comparisons here you know the one of the issues with this is this came out of the work on
intersectionality is that the when we think about the women’s movement we think only about gender and so the
race work that black women were doing which was they were being discriminated against because they were black not just because
they were women it was so important and in many cases was more important because we’re talking
post reconstruction here and just the absolutely horrific stuff that was happening in the south with the rise of
the ku klux klan lynching was a huge part of this black woman’s writings and who are we to say that that’s not part of the women’s
movement because it was i mean it was led by black women ida b wells was one of the foremost people who were doing this
stuff so that that process is really interesting to me that you know we we construct the women’s
movement as being so narrow compared to what the participants were thinking about and that’s one of
the things that i really am interested in here and why that difference in the history pages was so important to me
the fact that there was a huge drop in which of the departments mentioned by black women
were in those history pages was a very startling finding to me and it was not
surprising unfortunately but it was startling and i think that really starts to get a handle on how we say
this is the women’s movement this is the history and in fact that was all history right it could have included men’s history and
and the history of race movements and still we see this big drop off so that’s now where i think there’s
something there there’s something happening and we can really kind of get a handle on how we’re defining history and such so where does
one of the questions was like where does that go so the phrases that were in wikipedia but not in those history pages
some of them you know some of these groups have their own pages and history is not in the title so like
the lucy stone league might have its own page but when you read the history of the women’s movement page
you never see the lucy stone league so how would you go from the history to that page so now it becomes a matter of like how
how much knowledge do you have of the movement and would you know to search for that page probably not if you’re trying to learn
about it so then then it becomes kind of like a network of knowledge thing i have a question
if you’re looking for stuff that is mentioned by history then it strikes me that that is
something that is remembered within the history genre or within the institution of
academic history it can still be part of history like our cultural history though not
recognized as part of history through you know instant formal history
or institutionalized history and i think that might be part of it i got to think anything that exists today
has a traceable echo going through and somebody made it survive somehow a journalist it
was in a big book there’s a museum dedicated to it you know yeah it looks like it’s a history
tracing type of thing and i i do think that that 10 percent once i do some more targeted cleaning i mean there
is a question and i’m interested in particularly from computational folk how much cleaning you do
like how much do you stick to the original phrases versus so for example black women in particular
always used the miss and mrs miss ida b wells mrs ida b wells barnett that doesn’t show up in wikipedia so i
do want to clean out that miss and mrs but in doing that i’m making a choice to say i’m not accepting the way you
phrased it and that’s like a clear example but there’s other ones as well so that is a question to me but anyways
that was a sidetrack once i clean that i actually think that that number will go down to five maybe even three percent and i think you’re
right like virtually everything is recorded which is impressive and um exciting
but that doesn’t mean it’s easily accessible you may have to dig deep deep deep into wikipedia and be really
knowledgeable in in order to find these kind of what we would see as esoteric pages
and why not link to them all in the history page right you can do hyperlinks within wikipedia
and so what’s what’s being chosen in the institutionalized history
kind of version versus the we can dump all of the knowledge we’ve had ever on the internet which is fantastic but
it’s not the same as putting it in the history page which is why i wanted to zero in on that you know i i have this some idea that i
would want to look at actual history textbooks that are taught to kids right and so
looking at the history pages is as close to an approximation of that that i can get and lots of people do look at
history textbooks um so there’s there’s that as well and i and i i do think theoretically and
substantively and empirically it is important whether it’s its own page deep in wikipedia versus
in kind of these two general places where it would be the first place you go if you were trying to look at that
history so that’s definitely the sociology of knowledge institutionalization organizations
everything that’s kind of my jam looking at institutions um
i love this by the way i really do um one of the things i was thinking of was
um maybe even who feels the the ability or the cultural capital to add to wikipedia
themselves yeah you know that yes it’s not a formalized as a textbook right but
that it could have an impact and especially if you’re seeing a difference between
black women’s rights movement and the centered white women’s rights movement
um it could be who feels their voice to speak even now
never mind back then yeah 100 percent and we know that women are
vastly underrepresented as editors on wikipedia and i’m imagining black women are even
less represented and so that perpetuates what i was just talking about which is we decide
specifically white people decide what the women’s movement is which gets then into the history pages on wikipedia
and either because of resources that people just black women don’t have the time to
edit wikipedia or they don’t feel empowered or they don’t feel invited they don’t do it and so that voice does
not get inserted into our official knowledge and you know i can do that by gender you can look at the editors by
gender getting the race of editors is incredibly tricky but 100 percent i think that that is the story and
wikipedia is really great they want to include being more inclusive in their editors and so i think like
this research would be very welcomed by them they would say okay here are some gaps now we know kind
of where we can maybe target and improve our wikipedia pages i don’t think they would be
upset to hear things like this they would be very open to it and i know that one of the
things that professors sometimes do is they encourage their students to edit a wikipedia page
to get them as editors to give them an account to get them in the process so one thing that you know i could do or
somebody could do is say let’s go edit the history pages and make sure that this history is being accurately
linked to right that could be one potential outcome of this is identifying where we can improve some of these pages and
literally shape the way that the world reads history i mean it can be
incredibly impactful one trillion people are clicking on wikipedia every month if one trillion not one trillion if a
fraction of those look at the history of the women’s movement page and they’re not seeing those black departments and we put those in that
actually could be pretty impactful i kind of want to piggyback on christine’s comment um i had a good
friend in grad school who did a dissertation on wikipedia uh sort of flame wars right so and particularly in contentious articles
like you could think of like uh us abortion rights for instance where in the early days
of of wikipedia without sort of this the infrastructure that exists now at least to my limited understanding were highly
contentious like minute by minute edits um uh by people sort of on you know either side of the debate right
and i think like when i’m hearing um like one of the things i think might be helpful to push and advance the paper and i know i know
this may be kind of annoying too because like this actually might push in a slightly different direction and everyone hates those kind of comments
um but like i’m thinking like how wikipedia is sort of like the last triangular part in in in this piece and
i think that like even though it is a really cool data source um but i think it’s also part of the story itself and i think
like um because you’re talking about like me like you know leveraging the metadata and i’m kind of wondering you know like
how the editors themselves are a big part of of of this story right as sort of like this gatekeeping mechanism right
um and i think like yeah yeah you may not be able to get race but there are actually a lot of um
more improved algorithms that can actually sort of infer ethnicity now that i know like um there’s like the
officer piece and pnas where they use that to kind of infer diversity um and i think like you may be able to kind of
like glean at least part of it um because i think that actually might be sort of a big part that at least for
reviewers i would imagine would sort of harp on saying well you know like the you know like who gets initially put into
wikipedia um you know if you would do some kind of temporal analysis of like when these different articles start to show up is
is probably a product of you know who the editors are um and kind of conversely to
that i’m also kind of wondering you know joe and ian shan kind of hinted at this um i’m kind of wondering if you could
actually look at um you know one cheap easy readily available metadata repository would be
academic articles so you could imagine specific journals of history related to black history maybe articles
journals that are created by hbcus um like they would probably be the best
um sources obviously i would imagine they would probably be more available past the 1950s but like i think that would actually be
a really great source of sort of you know sort of peripheral uh marginalized knowledge that’s well maintained well
curated and data you could probably readily um curate to at least kind of corroborate some of these trends or
you know potentially even as an additional source although that’s again bad me pushing a different direction but
i think like that i that that some like data where there is a community of scholars
who are sort of curating knowledge that is obviously not in in sort of into the mainstream that you would that would be
reflected therefore in wikipedia sort of like in this like fleck kind of you know esoteric
exoteric kind of circles right of of of of knowledge yeah and jstor now has a research bench
where you have access to the entire jstor full text of the entire jstor which is really oh cool
oh it’s a great resource um so so this it would be chronicling america
to kind of pick up the contemporary attention economy it would be jstor to
pick up the academic attention economy and then
looking at how that triangulates or compares to wikipedia which is the
whatever we want to call that popular attention economy yeah i think of like you know like
exoteric and esoteric circles where it’s sort of you know academics who are kind of at the core of like producing and maintaining
and perpetuating knowledge and then more sort of others which are more you know common folk more activists
more you know sort of using the knowledge in different ways i think that might be a kind of a like maybe a helpful frame for it
can i piggyback on on on that idea uh well i i have two comments one is uh if it
gets narrowed down to names like and the question is who gets remembered i wonder if that would be way
way easier to do like a way smaller job of cleaning yeah yeah and then the uh the other
thing is is i wonder if you go into the wikipedia edit histories of a lot of these pages i
remember from my own experience as a wikipedian a lot of esoteric topics
uh are developed mainly by a limited number of super contributors and i would wonder if
there are super contributors who have read specific books and basically integrated
into the wikipedia content and you can find out like the people who these super contributors
are contributing have they been mentioned in a book a history book or something like that uh
i wonder if that’s sort of the mechanism a mechanism that might be at work here well that’s i mean so if we think about
imdb can you guys imagine the most well
researched movie on imdb no no star wars oh of course
and it’s it’s these really nerdy groups of white boys who love star wars and will
detail in great imdb is similar to wikipedia it’s user uh yes okay so not just white but
it’s usually men more women these days actually as it’s becoming disneyfied more women are getting into it but so um
imdb is similar where it’s user content users contribute um and put in these like details like
really obscure details about movies and star wars is really well detailed i
mean the most esoteric fact about star wars you could find on imdb but you probably couldn’t for
i don’t know a lot of these not that type of movie things um and i’m
guessing a similar thing happens on wikipedia where you just get these nerds who are super interested and they’re like i’m gonna make sure wikipedia has
every freaking detail that i can imagine and it’s well cited and all of that and as uh somebody
sorry i can’t see the that sophia was saying um or somebody was saying it’s not black women who are doing this
they’re not often the ones taking the time to be really super nerdy about it and making sure it all gets on wikipedia
um so that i think is getting a mechanism and i can at least say at least from my friend’s dissertation that she found
overwhelmingly the super editors that sort of joe referenced are overwhelmingly um white men uh in
the us at least in the in the english language like so it’s it’s it’s not um a very diverse uh heterogeneous uh group one other kind
of thought that came to mind too is like when you were talking about star wars i know like star wars and like other kind of like fandoms like star trek which i’m
a big fan of um like the walking dead other things have like these wikipedia ecosystems where they are yeah
yeah yeah they have their own which are sort of an independent system a rather eco knowledge ecosystem than wikipedia
and i don’t know like maybe is there something similar to like to women’s movements where if
there’s like a unique history wikipedia of like i don’t know like of like traditional black history women’s
history that may be curated because there seems to be a wikipedia for everything these days i’m kind of wondering how that would corroborate
with like standard wiki like sort of the mainstream wikipedia yeah i have no idea actually i’m i’m not
in that wiki the wikis not the wikipedia but the wikis
i’m really curious if there’s a women’s movement wiki and that so that is something i’m going to look up right away
so i know there’s i’m just a few minutes left but i’m still and this gives me all sorts of ideas
going forward i’m still a little stuck on yes i can look at who is mentioned which i think is much more tractable and
easier to classify and we can very easily say you know this is a black woman white woman so absolutely but i i really do want to
think about ideas as well like equal rights like social uplift so are there any thoughts on how i deal
with not people and organizations and departments which i have a clear idea in my head how i would do that
but clustering ideas grouping ideas looking for themes across
ideas keeping in mind that i want to bring in chronicling america and maybe now even jstor
i mean can i i still think actually networks are my best approach for this in some ways
because i don’t want to lose that i really do want to think about not just people but ideas that are are
proposed i would do i would defer to my big data colleagues which i’m not one of but
maybe one way to do this is as an exploratory technique where you just identify things that lived and died and
have to trace them out qualitatively on some level i know that you’re looking for a you know computational fix sure i don’t
mind qualitative analysis that’s a big part of my research pipeline so it would really just be looking at these case studies and saying here’s a cluster of
ideas from these different collections that didn’t make it and just kind of really doing a
deep dive i don’t mind that actually doing just a few case studies i think that could be really potentially very interesting yeah i mean
to me that sort of like seems like the lowest hanging fruit would be to like do at least a case study because both it’s going to be both an analytic and
methodological effort to once you identify what that is and then learning how to scale it to identify to then identify these other like these
uh these other um ideas in your corpus so i think like you know maybe starting out with something
that’s like well concrete and well-defined and kind of like working reverse engineering it might be the best approach i like that
and then i can bring in all of these ideas about like who’s editing it who is not editing it
and and all that other metadata because throwing all of that stuff in with 5700
phrases it’s going to be a lot of noise ouch
go ahead no don’t go ahead you go um
i have a silly idea which may not be helpful but it’s similar to charlie’s but it’s in the opposite direction so i
happen to read a couple of children’s books for women’s suffrage movement because this you happen to be
yeah right so i noticed that you know like in children’s some
children books have a you know glossary on the back and it kind of you know summarize something like the key
events or key ideas like uh right because it’s targeted like a
children not like a high school history book or like actual academic work right so it has to kind of boil
down to uh the most important significant ideas in this in the you know women’s suffrage movement
movement so i think that maybe to me that’s very helpful you know as someone who is
learning american history by reading children’s book and another thing is that what i realized that i think it was after i
read the after i read like four or five books on the same topic
and that i came across to a book that i mentioned that has a chapter talking about this uh
african-american woman figure in the women’s suffrage movement so i didn’t think about it back then
but i wonder if the the the first like four or five books that didn’t mention this
african-american woman was published in the early period maybe you know 90s or in the early 2000s
and as we are constantly rediscovering history right so let’s go back to joe’s idea that the
materials are always there right people made a mark on history materials idea but uh
as contemporary people scientists we are just rediscovering materials so my impression is that since the movie
hidden figures came out i’ve seen many more books about oh there’s you know
female computer scientists right who contribute a lot in the early days of the computer era and so we are just
constantly discovering the the history so this idea like how we are the process of rediscovering
history itself it’s kind of like uh i don’t know it’s kind of interesting like uh why
four books didn’t mention that very african-american woman at all and then maybe i think the the book i read
talking about this african-american woman maybe part maybe published you know like uh very recently
so i mean it’s and it’s socially determined right it’s black lives matter movement goes up and
the history mentions of the i mean yeah it’s absolutely not arbitrary what we rediscover and remember i mean the recy
taylor stuff came out on the the tails of the metoo movement where it was like actually black women in the civil rights movement got their start in
sexual harassment movements and that was not arbitrary that that came out in that that period when everybody was talking
about it i love the idea of glossaries actually in indexes so now i’m going to think about that as a potential because that’s a
cool ready-made like cluster of ideas um book indexes and stuff that’s great
it’s a great idea to jump over what charlie was saying before about a case
study maybe if you look at one that didn’t make it at all but like in history we
remember the winners right like the rosa parks there was a successfulness out of it
right and we don’t remember really the one that wasn’t successful so almost deconstructing in a way some
of the ideas that you have or the key words that you have those movements or what they’re discussing that didn’t make it
you know that weren’t successful we always remember the winners and to see the difference with that i
think is super interesting especially with the idea that’s really out there right now about decolonizing education
about this what we remember is being told by the successful movements or the
successful people rather than the unsuccessful people so understanding
how much falls off based on that i’m just now trying to think of a way to
measure success because there’s also the the science of science literature and
sociology of innovation literature that talks about like the death of ideas obviously in this case it’s science
right um i’m blanking on some references i could i’ll send them to you but like um like this is actually of increased
interest to see like you know like which which which concepts which pop things actually fizzled
um so granted like in the case of science but i mean there is like you know social movements um there is like an intersection within
the sort of the sociology of science so i think that actually might be at least like some theoretical use to
you to kind of kind of uh kind of piggyback with what christine was saying to kind of think about a little bit too yeah definitely death of ideas
i think we’re at our uh bewitching hour i don’t want to keep anyone over um but we’ll definitely send you
any emails if we have a late night inspiration of of something to look at but um laura thank you so much for for joining
us this is super fascinating please come back again once uh you know as you develop this we’d love to see
host you again hear this idea or hearing new ideas um it’s been this has been really fantastic well it’s been way more helpful for me
than it probably was for you so i really don’t know i really appreciate the opportunity to just bounce these ideas off an expert
crowd and figure out what to do next because i was a little bit stuck but now i’m not stuck i have lots of ideas
that’s what we like to hear and i love the rake thing the rake thing was like i i actually was not familiar with it so i was like super
i love rake i i mean it’s a lot of false positives so just be cautious but the true positives are
freaking gold sometimes so yeah that’s great i i was really like i was like there’s
no way this is going to work no way this is going to work and then i was like wow this actually worked this is great
which is rare in like a lot of like nlp stuff because like most of the stuff out there like it’s usually kind of crap i’m sure you and john you could
attest to it too as well like it’s a lot of it’s not that great but you want it when it hits well it hits them all you’re like wonderful all right thank you so much
thanks so much take care everyone