BernieSanders

Using Machine Learning to automate card cutting/tagging?


i think this is a pretty good idea - there are a few posts on here about people trying to automate card-cutting, some even going as far as to suggest a program could cut, say, daily politics updates on its own by pulling from google news. however, i don't know if any of these initiatives really took off.

 

here's my thoughts on the individual ideas you have -

#1. this would be good, except i think you'd need a similarity standard before it compiles, just so you don't end up with "econ decline leads to war" and "no war - interdependence" in the same file. the standard and tech may already exist, i am by no means an expert in that field.

 

#2. go for it - you mentioned verbatim doing this but a. it can't highlight, so if that gets implemented that definitely gives you a leg up, and b. it underlines really poorly lol

#3. again, no expert with the technology, but i think the auto-tagging would have to be pretty rhetorically powerful, accurate, and concise - for instance, can automation understand the warrants in something as complex as baudrillard, and be able to reiterate them back via a tag? i have no idea, but it's something to consider

 

In regards to formatting, perhaps base it off of the Verbatim macro settings?


I'm not an ML guy, but I like to lurk over their shoulders.

#1 sounds very viable and is a great idea.

#2 sounds maybe viable, but most text extractors I've seen used don't do a very good job. I know there are papers in which they do extremely well, but I haven't seen anything deployed in a commercial context that does well and so my inclination is to assume that the papers are taking advantage of especially convenient datasets or using lots of resources or something.

#3 doesn't sound viable at all to me, but if you think you can do this then go for it.

It might be better for you to attempt to create ML-based tools that debaters can deploy to assist them in tasks 2 and 3. Go for hybrid centaurs rather than full automation. A novice who's armed with an automatically generated description of what a card should say would probably do better at cutting the card than a novice without such a description. Or, maybe you could create tools that would be good for a first pass through large files, that debaters could then follow up on and fix the mistakes of. But I'd love to see whatever you come up with regardless.

  • Upvote 3


I like this project. I know very little about programming. Here are some things you could consider directly or indirectly incorporating.

-the citation maker from Verbatim

-Verbatim

-OneTab is a Chrome extension that compiles all open tabs into a list of exportable links--possibly good for automated researching.

-Just Read is a Chrome extension that removes things that aren't text from a website--possibly good for card cutting without ads/"other articles you might like."

-AdBlock--you probably know what that is--same purpose as Just Read.

 

Some resources for "updates"

-Google News

-RealClearEducation

-think tanks like Cato, Brookings, Heritage etc.

 

As far as automated card-cutting, I think semi-automation might yield far better cards. Like, you type in your tagline, and it gives you the top 10 sources it thinks have a card for you.
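The "type in your tagline, get the top 10 sources" idea could be prototyped with plain TF-IDF and cosine similarity. This is only a minimal sketch under assumptions: `rank_sources` and the `sources` dict (title mapped to already-downloaded article text) are hypothetical names, and nothing here fetches anything from the web.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank_sources(tagline, sources, top_n=10):
    """Return (title, score) pairs for the top_n sources closest to the tagline.

    `sources` is a hypothetical dict: title -> article text.
    """
    docs = {title: Counter(tokenize(body)) for title, body in sources.items()}
    n_docs = len(docs)
    # Document frequency: how many sources contain each word.
    df = Counter(word for counts in docs.values() for word in counts)

    def idf(word):
        # Smoothed so that words missing from every source still get a weight.
        return math.log((1 + n_docs) / (1 + df[word])) + 1.0

    def vector(counts):
        return {word: count * idf(word) for word, count in counts.items()}

    def cosine(a, b):
        dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vector(Counter(tokenize(tagline)))
    scored = [(title, cosine(query, vector(counts))) for title, counts in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Word overlap says nothing about which way a source concludes, so an article arguing the opposite of the tagline can still rank near the top. A debater would still have to read the results.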

 

Things to be careful of:

-not taking the whole paragraph

-sources that conclude the opposite direction

-incorrectly citing things

-grabbing text in the wrong order?

 

It seems like you could pretty easily duplicate Verbatim's underline macro with a highlighting version. Then again, maybe too much automation is bad and we should actually highlight our cards instead of saying "Computer, make me a cap K."

If this gets finished, people might also use it to cut Aff-specific links during a debate, which is questionably ethical.

 

Good luck!

  • Upvote 1


Quick update: I can read a Word document easily using Python and, given a proper format, output an underlined document (but it's not ready for final release).

 

Input text:

 

 

 

Politics of harmonization eradicates that which presents itself as an alternative option ordering the world into hierarchies of difference. The formulation of politics in this manner pits the insiders versus the outsiders promoting perpetual antagonism within the populace.

Nordin 16 (Astrid, “Futures beyond ‘the West’? Autoimmunity in China’s harmonious world”, Review of International Studies, 42, pp 156-177, January 2016) DP

 

 

The party-state version of harmonious world has then been deployed to ‘do’ various concrete things in Chinese international politics. At the level of imagining difference, it appears to share our concern here with multiplicity and openness. However, groups and cultures are described in ways that correspond with David Kerr’s ‘blending diversity under universalism’, which tends towards an imagination of difference as hierarchically ordered, and sometimes as something that should be eliminated. The future harmonious world is envisaged as an ‘inevitable choice’, and China is imagined as having a privileged position in the construction of this future because of its purported harmonious nature based on history. It is inevitable, yet needs to be constructed and fostered. Against this background, ‘harmonious world’ is said by some to indicate ‘an increasingly confident China relinquishing its aloofness to participate and undertake greater responsibilities in international affairs’. Nonetheless, the term remains to a significant extent a ‘catch all’ phrase of friendly connotations. ‘Harmonious world’ may be useful precisely because of its vague and elusive implications, that nonetheless speak to both Chinese and non-Chinese sensibilities. Indeed, ‘who could argue against global peace and prosperity?’ Nonetheless, what emerges from accounts of harmony as articulated in China in the last decade is a tension in the harmony concept between its need for multiplicity on the one hand, and its presupposition of universalisability on the other. Bart Rockman has suggested that harmony may be a ‘necessary glue without which neither a society nor a polity are sustainable’, but that ‘complete social harmony is ultimately suffocating and illiberal’. Jacob Torfing has also taken issue with predominant understandings of harmony in Southeast Asia that he argues present a ‘post-political vision of politics and governance that tends to eliminate power and antagonism’. 
Drawing on Laclau and Mouffe, he understands such a post-political vision as both theoretically unsustainable and politically dangerous. It is unsustainable because power and antagonism are inevitable features of the political dimensions of politics. Therefore politics: cannot be reduced to a question of translating diverging interests into effective [win-win] policy solutions, since that can be done in an entirely de-politicized fashion, for example, by applying a particular decision-making rule, relying on a certain rationality or appealing to a set of undisputed virtues and values. Of course, politics always invokes particular rules, rationalities and values, but the political dimension of politics is precisely what escapes all this. Politics, then, unavoidably involves a choice that means eliminating alternative options. Moreover, although we base our decisions on reasons and may have strong motivations for choosing what we choose, we will never be able to provide an ultimate ground for any given choice – in Derridean terms, such grounds will always be indefinitely deferred. Therefore, ‘the ultimate decision will have to rely on a skillful combination of rhetorical strategies and the use of force’. The acts of exclusion that politics necessarily entails will produce antagonism between those who identify with the included options and those who do not. For this reason, the attempt by the promoters of harmony to dissociate harmonious politics from the exercise of power, force and the production of antagonism, claiming a harmony where everyone wins and no-one looses, is bound to fail. Moreover, the post-political vision of politics and harmony is dangerous because its denial of antagonism will tend to alienate those excluded from consideration. This, Torfing writes, will tend to displace antagonistic struggles from the realm of the political to the realm of morals, ‘where conflicts are based on non-negotiable values and the manifestation of “authentic” identities’. 
Such non-negotiable values would be the opposite of the cooperative harmony sought. To both Rockman and Torfing, then, complete or perfect harmony will defeat harmony and create disharmony. In this way, the excessive production of harmony is what produces the disharmonious elements that come to threaten it. We can see this happening in contemporary China, where the ‘harmonising’ policies enforced under the ‘harmonious society’ slogan have produced a range of oppositional movements, from Chinese youth mocking harmony online to the increasing number of selfimolations we currently witness in and around Tibet. Numerous scholars argue that in order to imagine harmony, we need to imagine heterogeneity and multiplicity. We can now add that the problematic organisation of difference that remains in imaginations of harmonious world eliminates the multiplicity in the here-now that is a prerequisite for harmony. What these renditions of harmony show, I believe, is that the tensions in and logics of harmony are very similar to the ones that are described by Derrida and others in terms of the autoimmune. What we see in these accounts is an irresolvable contradiction, which mirrors the autoimmune logic outlined at the beginning of this article. Harmony must by definition be universal, but its universalisation by definition makes harmony impossible. In this respect harmony works on a self-defeating and self-perpetuating logic that is very similar to what we saw described in the ‘modern West’ and in ‘democracy’.

 

Output (maximum words to underline set to 200, but this is adjustable. I can also do a ratio of the document, like "at most 20%")

 

 

The party-state version of harmonious world has then been deployed to ‘do’ various concrete things in Chinese international politics. At the level of imagining difference, it appears to share our concern here with multiplicity and openness. However, groups and cultures are described in ways that correspond with David Kerr’s ‘blending diversity under universalism’, which tends towards an imagination of difference as hierarchically ordered, and sometimes as something that should be eliminated. The future harmonious world is envisaged as an ‘inevitable choice’, and China is imagined as having a privileged position in the construction of this future because of its purported harmonious nature based on history. It is inevitable, yet needs to be constructed and fostered. Against this background, ‘harmonious world’ is said by some to indicate ‘an increasingly confident China relinquishing its aloofness to participate and undertake greater responsibilities in international affairs’. Nonetheless, the term remains to a significant extent a ‘catch all’ phrase of friendly connotations. ‘Harmonious world’ may be useful precisely because of its vague and elusive implications, that nonetheless speak to both Chinese and non-Chinese sensibilities. Indeed, ‘who could argue against global peace and prosperity?’ Nonetheless, what emerges from accounts of harmony as articulated in China in the last decade is a tension in the harmony concept between its need for multiplicity on the one hand, and its presupposition of universalisability on the other. Bart Rockman has suggested that harmony may be a ‘necessary glue without which neither a society nor a polity are sustainable’, but that ‘complete social harmony is ultimately suffocating and illiberal’. Jacob Torfing has also taken issue with predominant understandings of harmony in Southeast Asia that he argues present a ‘post-political vision of politics and governance that tends to eliminate power and antagonism’. 
Drawing on Laclau and Mouffe, he understands such a post-political vision as both theoretically unsustainable and politically dangerous. It is unsustainable because power and antagonism are inevitable features of the political dimensions of politics. Therefore politics: cannot be reduced to a question of translating diverging interests into effective [win-win] policy solutions, since that can be done in an entirely de-politicized fashion, for example, by applying a particular decision-making rule, relying on a certain rationality or appealing to a set of undisputed virtues and values. Of course, politics always invokes particular rules, rationalities and values, but the political dimension of politics is precisely what escapes all this. Politics, then, unavoidably involves a choice that means eliminating alternative options. Moreover, although we base our decisions on reasons and may have strong motivations for choosing what we choose, we will never be able to provide an ultimate ground for any given choice – in Derridean terms, such grounds will always be indefinitely deferred. Therefore, ‘the ultimate decision will have to rely on a skillful combination of rhetorical strategies and the use of force’. The acts of exclusion that politics necessarily entails will produce antagonism between those who identify with the included options and those who do not. For this reason, the attempt by the promoters of harmony to dissociate harmonious politics from the exercise of power, force and the production of antagonism, claiming a harmony where everyone wins and no-one looses, is bound to fail. Moreover, the post-political vision of politics and harmony is dangerous because its denial of antagonism will tend to alienate those excluded from consideration. This, Torfing writes, will tend to displace antagonistic struggles from the realm of the political to the realm of morals, ‘where conflicts are based on non-negotiable values and the manifestation of “authentic” identities’. 
Such non-negotiable values would be the opposite of the cooperative harmony sought. To both Rockman and Torfing, then, complete or perfect harmony will defeat harmony and create disharmony. In this way, the excessive production of harmony is what produces the disharmonious elements that come to threaten it. We can see this happening in contemporary China, where the ‘harmonising’ policies enforced under the ‘harmonious society’ slogan have produced a range of oppositional movements, from Chinese youth mocking harmony online to the increasing number of selfimolations we currently witness in and around Tibet. Numerous scholars argue that in order to imagine harmony, we need to imagine heterogeneity and multiplicity. We can now add that the problematic organisation of difference that remains in imaginations of harmonious world eliminates the multiplicity in the here-now that is a prerequisite for harmony. What these renditions of harmony show, I believe, is that the tensions in and logics of harmony are very similar to the ones that are described by Derrida and others in terms of the autoimmune. What we see in these accounts is an irresolvable contradiction, which mirrors the autoimmune logic outlined at the beginning of this article. Harmony must by definition be universal, but its universalisation by definition makes harmony impossible. In this respect harmony works on a self-defeating and self-perpetuating logic that is very similar to what we saw described in the ‘modern West’ and in ‘democracy’.

 

 

 

 

Also, there are "keywords" generated for the article, which I can use to further increase the emphasis of some underlined lines over others. The closer to the top a word is, the more "important" the algorithm considers it. I'm thinking that I can create an adjustable "highlight percentage" which highlights any sentence containing words within the top X% of keywords, or something like that.
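One possible reading of the "highlight percentage" idea, as a minimal sketch: take the top fraction of a ranked keyword list (most important first, like the list in this post) and flag every sentence that contains at least one of them. The function name `highlight_sentences` and the naive sentence splitter are assumptions for illustration, not the actual implementation.

```python
import math
import re


def highlight_sentences(text, ranked_keywords, top_fraction=0.2):
    """Return (sentence, should_highlight) pairs.

    ranked_keywords must be ordered from most to least important.
    """
    # Always keep at least one keyword, even for tiny fractions.
    cutoff = max(1, math.ceil(len(ranked_keywords) * top_fraction))
    top = [kw.lower() for kw in ranked_keywords[:cutoff]]
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    results = []
    for sentence in sentences:
        lowered = sentence.lower()
        hit = any(
            re.search(r"\b" + re.escape(kw) + r"\b", lowered) for kw in top
        )
        results.append((sentence, hit))
    return results
```

Raising `top_fraction` flags more sentences, so it acts as the adjustable knob described above. Multi-word keywords such as "eliminate power" are matched as whole phrases.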

 

Keywords:

 

 

 

harmonious
harmony
politics
politically
political
antagonism
eliminated
eliminates
china
rule
rules
makes
policy
policies
logics
logic
nonetheless
chinese international
ultimately
ultimate
imagining
imagination
imagined
imagine
imaginations
eliminate power
eliminating alternative
greater
term
terms
disharmony
disharmonious
diverging
antagonistic
particular
decision
decisions
inevitable
choice
torfing
jacob
argue
argues
confident
numerous scholars
options
necessarily

 

 

Edited by BernieSanders
  • Upvote 5


Stay tuned, I'm going to cook up some more magic pretty soon. I'm not going to work directly inside the Word ecosystem. My idea right now is to build a Python script that will read in a carefully formatted Word doc.

 

The program will take an input from the user that will be something like this:

 

[1]

*card text, with any type of formatting*

[/1]

 

[2]

*card text*

[/2]

 

and the output will be a new doc in the same format but with the text underlined. (I'll have to modify this in light of making parameters like "max words" customizable, but for now this will do.)
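Once the text is out of the doc, the numbered block format above is easy to pull apart with a regular expression. A minimal sketch, assuming the blocks are well formed and not nested (`parse_cards` is a hypothetical helper name):

```python
import re

# [n] ... [/n], where the closing number must match the opening one.
CARD_PATTERN = re.compile(r"\[(\d+)\]\s*(.*?)\s*\[/\1\]", re.DOTALL)


def parse_cards(raw):
    """Return {card_number: card_text} for every [n] ... [/n] block found."""
    return {int(number): body for number, body in CARD_PATTERN.findall(raw)}
```

Anything outside a matched pair of markers is ignored, so stray text between cards does not end up in a card body.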

 

Yeah, it's not directly in your Word doc, but this design implicitly avoids many pitfalls that I'd have to deal with as the programmer if I were to instead try my hand at Visual Basic (ew). I also avoid having to code around every possible combination of bullshit a user could input the way that Verbatim does with that whole "select the text with your cursor" approach.

Edited by BernieSanders
  • Upvote 1


Are you forcing it to highlight full sentences, or did that just happen naturally?

 

It's because of the algorithm I'm using (a modified text-rank). I'm going to explore extractive text algorithms that work on the word level instead of the sentence level.
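For anyone wondering why TextRank lands on whole sentences: it scores sentences as nodes in a similarity graph, so the sentence is the smallest unit it can pick. Below is a minimal sketch of a plain sentence-level TextRank with a word budget. It is not the modified version mentioned above; the Jaccard word-overlap edge weight and the name `textrank_underline` are assumptions made for illustration.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_set(sentence):
    return set(re.findall(r"[a-z']+", sentence.lower()))


def textrank_underline(text, max_words=200, damping=0.85, iterations=50):
    """Return the sentences to underline, in original order, within max_words."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    bags = [word_set(s) for s in sentences]
    # Edge weight between two sentences: Jaccard overlap of their word sets.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = bags[i] | bags[j]
        if union:
            w = len(bags[i] & bags[j]) / len(union)
            weights[i][j] = weights[j][i] = w
    out_sum = [sum(row) for row in weights]
    # PageRank-style power iteration over the weighted sentence graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    # Greedily keep the best-scoring sentences that still fit the budget.
    chosen, used = set(), 0
    for i in sorted(range(n), key=lambda k: scores[k], reverse=True):
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.add(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]
```

Because selection happens per sentence, a sentence is either fully underlined or not at all, which matches the behaviour asked about above.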

  • Upvote 3



 

this is awesome, i've always wanted to learn to code. is python the only language you know? if not, which is the easiest to pick up the basics with? unless it's just personal preference


I love the smell of technological revolution. You, my friend, are innovating. This is the future.  :Bow

Paper --> Index Cards --> Tubs --> Laptops --> Verbatim --> This thread

 

(some of that timeline might be incorrect ok, just a general view)


So, it looks like essentially all of the extractive summarization techniques I've found operate on the sentence level, which means they will all underline full sentences. That's not quite ideal, but still very useful, especially in light of its ability to generate summaries of "no more than 100 words", or, for someone who reads at "x" words per minute, a summary that "takes no longer than 30 seconds to read".
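The budget arithmetic is simple enough to show directly. A minimal sketch with hypothetical helper names: a reading speed and a time limit turn into a word cap, and so does an "at most 20% of the document" ratio.

```python
def word_budget(words_per_minute, seconds):
    """Whole words a speaker can read aloud in the given number of seconds."""
    return int(words_per_minute * seconds / 60)


def ratio_budget(total_words, ratio):
    """Word cap when underlining at most `ratio` of the document (0.0 to 1.0)."""
    return int(total_words * ratio)
```

For example, a debater reading 300 words per minute with 30 seconds to spend gets a cap of 150 words, which can then be passed to the summarizer as its maximum.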

 

I'm thinking about methods to mitigate this. So far, the only solutions I can think of involve word-level part-of-speech taggers, which makes things dramatically more complicated and still only gives us heuristic rules for deciding which words in a selected sentence should be left un-underlined. I don't even know what those heuristics would be unless we were willing to completely sacrifice grammatical correctness   :flower:

Also, while I want to pour my heart out into this project, I'm still a student about to graduate and am mostly focused on the job search instead of this project. Anyone know any people looking for software engineers, data scientists, or something in between?

Also also, eventually I'll create a GitHub repo for this project (when I have code worth writing a first commit for) and open source it so that everyone can see how it progresses.

Edited by BernieSanders


Maybe sentence parts, separated by commas or semicolons, would work as a slightly finer but still practical unit.

  • Upvote 2



Yeah, you could have it take any punctuation--phrases in parentheses or quotes, semicolons, periods, question marks, commas, dashes, whatever--and underline it on that basis. And then maybe you could have it repeat that process for highlighting, going through each of those segments separated by some kind of punctuation and highlighting important ones.
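Splitting on punctuation like this is a one-liner with a regular expression. A minimal sketch, assuming only commas, semicolons, colons, sentence-ending marks and parentheses count as boundaries (dashes and quotes are left out to keep the pattern short, and `split_segments` is a hypothetical name):

```python
import re

# Break after , ; : . ! ? and around parenthesised phrases.
SEGMENT_BREAK = re.compile(r"(?<=[,;:.!?])\s+|\s+(?=\()|(?<=\))\s+")


def split_segments(text):
    """Split text into clause-sized segments, keeping punctuation attached."""
    return [seg for seg in SEGMENT_BREAK.split(text.strip()) if seg]
```

Each segment could then be scored and underlined on its own, and the same pass could be repeated for highlighting. Abbreviations such as "e.g." would be split in the wrong place, so a real version would need an exception list.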

  • Upvote 1


If you really wanted to get as far into it as you can on the word level, you could create a hierarchy of words: words that you would underline, and words that trigger underlining of the sentence after the word. For instance, 'therefore' is a word that is probably important. It marks the conclusion sentence where most people get their final hooyah, which is what matters most for policy debate, but the word 'therefore' itself is unimportant to highlight.
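One way to read the 'therefore' suggestion, sketched under assumptions: keep a small list of cue words, underline the rest of any sentence that contains one, and leave the cue word itself alone. The cue list and the name `tag_with_cues` are hypothetical, and the Und/non labels are borrowed from a later post in this thread.

```python
import re

# Hypothetical cue list; a real one would come from looking at cut cards.
CUE_WORDS = {"therefore", "thus", "hence", "consequently"}


def tag_with_cues(text):
    """Return (token, tag) pairs where tag is 'Und' or 'non'.

    Sentences containing a cue word are underlined, except the cue itself.
    """
    tagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = sentence.split()
        # Strip punctuation so "Therefore," still matches the cue list.
        bare = [re.sub(r"\W", "", token).lower() for token in tokens]
        has_cue = any(word in CUE_WORDS for word in bare)
        for token, word in zip(tokens, bare):
            underline = has_cue and word not in CUE_WORDS
            tagged.append((token, "Und" if underline else "non"))
    return tagged
```

This is a blunt rule on purpose: it shows how a word hierarchy could drive the output, not how well it would cut a real card.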

 

Past that, I don't think being completely grammatically correct is key to reading, but there is a fine line where a card becomes unreadable. You could also err on the side of over-underlining, over-emphasizing and over-highlighting a card, then have the person who is going to read it select the best warrants and actually highlight those. If you really want to do the best, perhaps running a word analysis on some big files from camp would get you the most common words, and you could utilize those. Trying several different ways for the program to highlight would allow for experimentation, so chasing only one would be bad, even though it would take a lot of work.

This is actually amazing and you have done so much work. I think a lot of people would help you out; with how nerdy the debate community is, you could probably get a lot of programmers in on this.

  • Upvote 3


OH NO I DID A THING

 

So, I took a sample card (I think I mashed like 3 cards together), but nothing about the content is important except for the fact that parts of it are underlined and other parts not underlined. (NOTE: The card goes on for like 3 more pages)

 

 

 

6C8Xxqm.png

 

 

 

And, well, I realized that all I had to do was create a simple "POS" tagger and instruct it to say that a token is either "underlined" (Und) or not (non). This can trivially accept an arbitrary number of tags (such as emphasis, highlight, etc.), but I decided to only demonstrate underlining due to the difficulties of parsing text from docx files with Python.
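For the docx parsing step, here is a minimal standard-library sketch of how the Und/non labels could be read out of a file. It is not the code behind the gifs. A .docx is a zip archive whose body lives in `word/document.xml`, and each run (`w:r`) carries its direct formatting in `w:rPr`. The function name `underline_tags` is hypothetical, and the sketch only sees underlining applied directly to a run; underlining that comes from a named character style would be missed.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def underline_tags(docx_file):
    """Return (token, 'Und' | 'non') pairs for every word in the document.

    docx_file may be a path or a file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tagged = []
    for run in root.iter(f"{{{W}}}r"):
        u = run.find("w:rPr/w:u", NS)
        # Underlined when <w:u> exists and its w:val is not "none".
        underlined = u is not None and u.get(f"{{{W}}}val", "single") != "none"
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        tagged.extend((tok, "Und" if underlined else "non") for tok in text.split())
    return tagged
```

Word often splits a single word across several runs, so tokens may come out in pieces. Merging adjacent runs with the same formatting before tokenizing would be a sensible next step.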

 

 

 

9hSh4nf.gif

 

 

 

After I parse the text out into a format that my trusty Keras Sequence to Sequence model can learn from, I get 36 "sentences" of 20 words each. Each sentence is a new sequence for the neural network to train on. Ideally, given a *very large* dataset of sequences like this (from thousands of cards), it will eventually learn how to *fully* cut evidence. Let's get to training it now, shall we?
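The reshaping step described here (a long run of tagged tokens cut into "sentences" of 20 words) might look roughly like the sketch below. It covers only the data preparation, not the Keras model, and the names `to_sequences` and `encode` as well as the choice to pad the last chunk with a "non" tag are assumptions.

```python
def to_sequences(tagged_tokens, seq_len=20, pad=("<pad>", "non")):
    """Cut (token, tag) pairs into fixed-length chunks, padding the last one."""
    sequences = []
    for start in range(0, len(tagged_tokens), seq_len):
        chunk = list(tagged_tokens[start:start + seq_len])
        chunk += [pad] * (seq_len - len(chunk))
        sequences.append(chunk)
    return sequences


def encode(sequences):
    """Turn tokens into integer ids (0 is padding) and tags into 0/1 labels."""
    vocab = {"<pad>": 0}
    inputs, labels = [], []
    for seq in sequences:
        # setdefault assigns the next free id the first time a word is seen.
        inputs.append([vocab.setdefault(tok.lower(), len(vocab)) for tok, _ in seq])
        labels.append([1 if tag == "Und" else 0 for _, tag in seq])
    return inputs, labels, vocab
```

With 720 tagged tokens this yields the 36 sequences of 20 mentioned above. In a real training run the padded positions should be masked so they do not count toward the loss or the accuracy.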

 

 

 

mPGMYuT.gif

 

 

 

It trains fast because the dataset is tiny and has very few relationships that it can actually figure out. The accuracy ends up being pitifully low (50% is baseline), but that's understandable given that this is a proof-of-concept prototype.

 

We see the resulting predictions that my model makes at the end of the gif. The final gif shows the actual results (but jumbles the word order, which is not a problem in this instance). My tagger missed a lot, but its accuracy was about 70% on the small test set when I don't overtrain it (which I am doing in these gifs, resulting in a lower accuracy as it overfits).
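To put numbers like 70% in context, per-token accuracy can be compared against the trivial strategy of always guessing the most common tag. A minimal sketch with hypothetical helper names; the 50% baseline quoted above corresponds to a card where about half the tokens are underlined.

```python
from collections import Counter


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def majority_baseline(gold):
    """Accuracy of always predicting the single most common gold tag."""
    if not gold:
        return 0.0
    return Counter(gold).most_common(1)[0][1] / len(gold)
```

If most of a card is left un-underlined, the baseline rises well above 50%, so a tagger needs to beat `majority_baseline` on the same test set before its accuracy means much.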

 

 

 

RahtIOS.gif

 

 

 

 

We have lots of data in the form of open evidence, but the formatting differences of each card are a pain in the ass to deal with. I'm going to soon task the members of this community to help me prepare a very large dataset in a consistent format to train on. I am *exceedingly* excited to see the results. I'll also implement it tagging things for emphasis and highlighting in the meantime.

 

I am but an undergrad using the neural network equivalent of off the shelf tools to implement this. It is my hope that this work will utilize and encourage others to use existing innovations within machine learning to usher in a new era of policy debate. One where small schools have tangible possibilities of competing with big schools regarding evidence cutting.

Edited by BernieSanders
  • Upvote 5


Many kudos to the progress. This is looking amazing, although it's largely going over my head. Once you decide the exact formatting you want to use for training the neural network, I'll try to help out with providing cards in that format.

 

You should give this project a name. And then, put that in all the citations. Like:

 

Tagline.

Last Name Year (First Name, Credentials, "Title," MM-DD-YYYY, Publication, URL, Date of Access: MM-DD-YYYY) //TheProjectName

  • Upvote 1



I agree.

This project needs a dope name.

 

Also, I think we as policy debaters should strive to have more consistent taglines - some of the community does author initials, the other half doesn't, some people put DoA, others don't, URL, some people like to put credentials in between the last name and the year created, etc. Good luck creating this, man. I'm excited to see where this ends up, and if it becomes as big in the community as Verbatim some day.

 

EDIT: Dumb shit that's already been said

Edited by OutKTheK


I'm going to soon task the members of this community to help me prepare a very large dataset in a consistent format to train on.

Should we work on this?

  • Upvote 3


I've coded NLP algos that parse news articles for summaries and highlights. Unfortunately, while this is a better way to google research... it's nowhere close to the cohesive thinking and labeling you need for real debate cards. The first step to creating a global brain tree of arguments and sub-responses, which might look like this https://www.kialo.com/electric-vehicles-are-better-than-fossil-fuel-vehicles-4748/4748.0=4748.771+4748.303 , would be to synergize all research into an open db that NLP algos could isolate keywords from, creating feeds of possible new args to add for ML discovery that's supervised by debaters.

Edited by Synergy
  • Upvote 3


If you really want this to be a thing, feed it well-cut cards. The coding for this is relatively straightforward: creating a dataset of good cards for it to learn from is the hard part.

  • Upvote 4

