



BernieSanders last won the day on March 16 2019

BernieSanders had the most liked content!

Community Reputation

1775 Excellent

About BernieSanders

  • Rank
    Still not dead, hopefully running in 2020
  • Birthday 09/18/1996

Profile Information

  • Name
    Bernie Sanders
  • School
    Brooklyn College

Recent Profile Visitors

18845 profile views
  1. https://github.com/Hellisotherpeople/CX_DB8 If you know how to get this working with a PoS tagger or a Seq2Seq model like a pointer network (but one that doesn't reorder the pointed-to words in the generated summaries), then you can take this data and make an automatic debate underliner. I'll be working on this in the coming weeks as I push various NLP authors to help me implement it. NLP is innovating super fast right now, and I suspect that a very high-quality underlining summarizer, one that extracts tokens from the text and can rival humans at underlining or highlighting, will be available soon... and it will be trained on data that *I* farmed.
  2. OH NO I DID A THING
So, I took a sample card (I think I mashed like 3 cards together), but nothing about the content is important except that parts of it are underlined and other parts aren't. (NOTE: The card goes on for like 3 more pages.) I realized that all I had to do was create a simple "POS" tagger and instruct it to label each token as either "underlined" (Und) or not (non). This can trivially accept an arbitrary number of tags (such as emphasis, highlight, etc.), but I decided to demonstrate only underlining because of how difficult it is to parse text from docx files with Python.
After I parse the text into a format that my trusty Keras Sequence to Sequence model can learn from, I get 36 "sentences" of 20 words each. Each sentence is a new sequence for the neural network to train on. Ideally, given a *very large* dataset of sequences like this (from thousands of cards), it will eventually learn how to *fully* cut evidence.
Let's get to training it now, shall we? It trains fast because the dataset is tiny and has very few relationships that it can actually figure out. The accuracy ends up being pitifully low (50% is baseline), but that's understandable given that this is a proof-of-concept prototype.
At the end of the gif we see the predictions my model makes. The final gif shows the actual results (but jumbles the word order, which is not a problem in this instance). My tagger missed a lot, but its accuracy was about 70% on the small test set when I don't overtrain it (which I am doing in these gifs, so the accuracy shown there is lower because it overfits).
We have lots of data in the form of open evidence, but the formatting differences between cards are a pain in the ass to deal with. I'm going to soon task the members of this community with helping me prepare a very large dataset in a consistent format to train on. I am *exceedingly* excited to see the results.
I'll also implement tagging for emphasis and highlighting in the meantime. I am but an undergrad using the neural network equivalent of off-the-shelf tools to implement this. It is my hope that this work will encourage others to use existing innovations in machine learning to usher in a new era of policy debate, one where small schools have a tangible chance of competing with big schools at evidence cutting.
  3. https://theanarchistlibrary.org/library/max-stirner-the-ego-and-his-own
  4. Is generic enough to answer any argument on any side ✓
A unique, anti-political, difficult-to-answer critique of all collectivist epistemology and ideology, and a defense of radical nominalism ✓
Pissed off Marx so badly that Marx wrote a book partially about why he was wrong (The German Ideology), and Stirner wrote a reply to his critics ✓
Multiple titles for his seminal work ("The Ego and His Own", "The Unique and Its Property", etc.) ✓
The only racist or sexist statements in his works were parodies of Hegel's crappy thought (and he wrote in the 1840s) ✓
Spookbusting ✓✓✓
Seriously, why is no one reading Max Stirner?
  5. So, it looks like the extractive summarization techniques I've found all operate at the sentence level. This means they will underline full sentences. That's not quite ideal, but still very useful, especially given their ability to generate summaries of "no more than 100 words", or, for someone who reads at "x" words per minute, a summary that "takes no longer than 30 seconds to read".
I'm thinking about methods to mitigate this. So far, the only solutions I can think of involve word-level part-of-speech taggers, which make things dramatically more complicated and still only let us use heuristic rules to decide which words in a selected sentence to leave un-underlined. I don't even know what those heuristics would be unless we were willing to completely sacrifice grammatical correctness.
Also, while I want to pour my heart into this project, I'm still a student about to graduate and am mostly focused on the job search. Anyone know any people looking for software engineers, data scientists, or something in between? Also also, eventually I'll create a GitHub repo for this project (when I have code worth writing a first commit for) and open source it so that everyone can see how it progresses.
  6. It's because of the algorithm I'm using (a modified TextRank), which ranks whole sentences. I'm going to explore extractive summarization algorithms that work at the word level instead of the sentence level.
  7. Stay tuned, I'm going to cook up some more magic pretty soon. I'm not going to work directly inside the Word ecosystem. My idea right now is to build a Python script that will read in a carefully formatted Word doc. The program will take input from the user that looks something like this:
[1] *card text, with any type of formatting* [/1] [2] *card text* [/2]
and the output will be a new doc in the same format but with the text underlined. (I'll have to modify this to make parameters like "max words" customizable, but for now this will do.)
Yeah, it's not directly in your Word doc, but this design implicitly avoids many pitfalls that I'd have to deal with as the programmer if I instead tried my hand at Visual Basic (ew). I also avoid having to code around every possible combination of bullshit a user could input, the way Verbatim does with that whole "select the text with your cursor" approach.
  8. Quick update: I can read a Word document easily using Python and, given a proper format, output an underlined document (but it's not ready for final release).
Input text:
Output (maximum words to underline set to 200, but this is adjustable; I can also use a ratio of the document, like "at most 20%"):
Also, "keywords" are generated for the article, which I can use to further increase the emphasis of some underlined lines over others. The closer to the top a keyword is, the more "important" the algorithm considers it. I'm thinking that I can create an adjustable "highlight percentage" which highlights any sentence containing words in the top X% of keywords, or something like that.
Keywords:
  9. I didn't expect to post on this god-forsaken forum ever again, but I had an idea.
I quit this activity a few years ago, and I haven't really looked back since (at least in the sense of trying to compete), but my interest in it has been reignited by my study of Natural Language Processing and Machine Learning. I've been fiddling with Keras and TensorFlow, and I think that it's possible to automate the entire process of card cutting. I don't think that it will replace card cutting done by hand, but it has serious potential to save people in situations where they have to quickly use evidence that isn't already cut, or quickly cut something during a round.
Things that could potentially be automated:
#1. Compiling cards from other people's files: Using "Doc-2-Vec" models, you can compute "similarity" measures between an input card (say, an important 1NC K card) and the other 200 cards found in some shitty camp file. This can significantly speed up searching through large files.
#2. Card cutting can be done with whatever the state of the art in extractive text summarization is, and then simply underlining/highlighting the summarized text on the card. I'm strongly considering writing some software in Python to do this. The hard work has already been done by NLP libraries, so the task becomes one of optimizing the workflow and making it usable by normal debaters.
#3. Tag generation can be done using "Seq-2-Seq" models trained on a large dataset of previous card taglines and their respective cards. I've experimented with this in Keras and found that the encoding-decoding approach becomes very slow when the input card is long, but in principle I think this will become viable sometime in the next 5 years. I'm especially intrigued by the possibility of generating "personalized" models which replicate my mannerisms and writing style and are tailored to the specific case I'm running.
Of these, #1 and #2 are viable right now. I think I can code up an example of #2 pretty easily, but I'm unsure how useful it will be in the field. I think I remember this feature being extremely poorly implemented in Verbatim, but I wager that I can do it better. Also, there are some challenges caused by the "uniqueness" of this activity, namely the fact that you are all shitty at formatting evidence. There's no standardization of document formatting, making it difficult to properly automate the separation of "tag", "citation", and "card". It's likely that I'll enforce a specific formatting requirement (using Calibri, having specific text sizes, bolding specific things).
Any interest in this project? Has anyone tried to do things like this before?
  10. So I've been out of this community for a while. Why not read Trump's election as a DA to literally anything good? I would hope that there's additional evidence about how all of the old power structures / concepts we thought would protect us are wrong (NUTS instead of MAD https://en.wikipedia.org/wiki/Nuclear_utilization_target_selection) and how modern liberal democracy is *again* falling to fascism. The lit is only going to get better and better.
  11. Sever case, new 1-off discourse K in the 1AR, reap those double 28.7s.
  12. 100+ WPM typing speed > all that bullshit. Any judge who thinks computer flowing is bad deserves to be gulag'd.
  13. You could also read the speed K separately since it's pretty fucked to spread without tags.
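The token-tagging setup from post 2 can be sketched in plain Python. This is a minimal illustration, not the code behind the gifs: it assumes the card has already been parsed out of the docx into (word, is_underlined) pairs, chops the stream into 20-word sequences tagged `Und`/`non`, and pads the last one with a hypothetical `<pad>` token. The Keras model is left out. The majority-class check at the end shows why the baseline depends on the card: if most words are not underlined, always guessing `non` already beats 50%.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token, used for both words and tags


def make_sequences(tokens: List[Tuple[str, bool]], seq_len: int = 20):
    """Split (word, is_underlined) pairs into fixed-length word and tag sequences."""
    word_seqs, tag_seqs = [], []
    for start in range(0, len(tokens), seq_len):
        chunk = tokens[start:start + seq_len]
        words = [word for word, _ in chunk]
        tags = ["Und" if underlined else "non" for _, underlined in chunk]
        # Pad the final chunk so every sequence has exactly seq_len entries.
        missing = seq_len - len(chunk)
        word_seqs.append(words + [PAD] * missing)
        tag_seqs.append(tags + [PAD] * missing)
    return word_seqs, tag_seqs


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of non-padding tokens whose predicted tag matches the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, pred)
             for g, p in zip(gs, ps) if g != PAD]
    return sum(g == p for g, p in pairs) / len(pairs)


# Made-up toy card (hypothetical data): 45 words, every third one underlined.
card = [(f"w{i}", i % 3 == 0) for i in range(45)]
words, tags = make_sequences(card)
print(len(words), [len(s) for s in words])  # → 3 [20, 20, 20]

# Majority-class baseline: predict "non" for every real token.
always_non = [["non"] * len(s) for s in tags]
print(round(token_accuracy(tags, always_non), 3))  # → 0.667
```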
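Post 5's "no more than 100 words" and "no longer than 30 seconds to read" constraints (and post 8's "at most 20%") can all be expressed as one word budget applied on top of any sentence-level summarizer. A minimal sketch, assuming the summarizer has already produced one score per sentence; `word_budget` and `select_sentences` are hypothetical names. Greedy selection does not always fill the budget optimally (that is a knapsack problem), but it is simple and predictable.

```python
import math
from typing import List, Optional


def word_budget(total_words: int, max_words: Optional[int] = None,
                ratio: Optional[float] = None, wpm: Optional[float] = None,
                seconds: Optional[float] = None) -> int:
    """Return the tightest word limit implied by whichever constraints are given."""
    limits = [total_words]
    if max_words is not None:
        limits.append(max_words)
    if ratio is not None:  # e.g. "at most 20%" of the card
        limits.append(math.floor(total_words * ratio))
    if wpm is not None and seconds is not None:  # e.g. "no longer than 30 seconds"
        limits.append(math.floor(wpm * seconds / 60))
    return min(limits)


def select_sentences(sentences: List[str], scores: List[float], budget: int) -> List[str]:
    """Greedily keep the highest-scoring sentences that fit, returned in document order."""
    chosen, used = set(), 0
    for idx in sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True):
        length = len(sentences[idx].split())
        if used + length <= budget:
            chosen.add(idx)
            used += length
    return [sentences[i] for i in sorted(chosen)]


print(word_budget(1000, wpm=300, seconds=30))  # → 150
print(word_budget(1000, max_words=100, ratio=0.2))  # → 100
sents = ["a b c d", "e f", "g h i", "j"]  # hypothetical sentences with made-up scores
print(select_sentences(sents, [0.9, 0.1, 0.5, 0.3], budget=8))  # → ['a b c d', 'g h i', 'j']
```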
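Post 6 attributes the full-sentence underlining to TextRank. In plain sentence-level TextRank the graph's nodes *are* sentences, so the output can only rank whole sentences. The sketch below is the textbook version, not the modified variant mentioned in the post: edges are weighted by shared words normalised by log sentence lengths, and scores come from weighted PageRank power iteration with damping 0.85. Whitespace tokenisation and the toy sentences are assumptions for brevity. Building the same kind of graph over words instead of sentences is how TextRank extracts keywords.

```python
import math
from typing import List


def similarity(a: List[str], b: List[str]) -> float:
    """TextRank edge weight: shared words divided by log(len a) + log(len b)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def textrank(sentences: List[str], damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Score sentences with weighted PageRank over the sentence-similarity graph."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion to edge weight.
        scores = [(1 - damping) + damping * sum(weights[j][i] / out_weight[j] * scores[j]
                                                for j in range(n) if out_weight[j] > 0)
                  for i in range(n)]
    return scores


card = ["the cat sat on the mat", "the cat ate the fish",
        "a dog barked loudly", "the cat and the dog played"]  # hypothetical toy card
scores = textrank(card)
best = max(range(len(card)), key=scores.__getitem__)
print(card[best])  # → the cat and the dog played
```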
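For the `[1] ... [/1]` input format in post 7, here is a minimal sketch of splitting a document's text into numbered cards. It assumes the plain text has already been pulled out of the Word doc (the python-docx library is one option, and would also be the place to read run-level formatting such as underlining); formatting is ignored here, and `split_cards` is a hypothetical name. The backreference `\1` makes `[1]` close only at `[/1]`.

```python
import re
from typing import Dict

# "[n] ... [/n]": the backreference \1 forces the closing number to match the opening one.
CARD_PATTERN = re.compile(r"\[(\d+)\](.*?)\[/\1\]", re.DOTALL)


def split_cards(doc_text: str) -> Dict[int, str]:
    """Hypothetical helper: map each card number to its body text, whitespace-stripped."""
    return {int(m.group(1)): m.group(2).strip() for m in CARD_PATTERN.finditer(doc_text)}


text = "[1] First card.\nStill the first card. [/1]\n[2] Second card [/2]"
print(split_cards(text))  # → {1: 'First card.\nStill the first card.', 2: 'Second card'}
```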
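The adjustable "highlight percentage" floated in post 8 could look like this. A minimal sketch, assuming the keyword extractor returns single words ranked from most to least important; the function name, parameters, and example keywords are hypothetical, and the punctuation stripping is deliberately crude.

```python
import math
from typing import List


def sentences_to_highlight(sentences: List[str], ranked_keywords: List[str],
                           top_percent: float) -> List[int]:
    """Indices of sentences containing a keyword from the top `top_percent`% of the ranking."""
    cutoff = max(1, math.ceil(len(ranked_keywords) * top_percent / 100))
    top = {k.lower() for k in ranked_keywords[:cutoff]}
    hits = []
    for i, sentence in enumerate(sentences):
        # Crude normalisation: lowercase and strip surrounding punctuation.
        words = {w.strip(".,;:!?\"'()").lower() for w in sentence.split()}
        if words & top:
            hits.append(i)
    return hits


keywords = ["warming", "emissions", "policy", "economy"]  # hypothetical, most important first
sents = ["Warming is accelerating.", "The economy is fine.", "Cutting emissions helps."]
print(sentences_to_highlight(sents, keywords, top_percent=50))  # → [0, 2]
```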
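For #1 in post 9, a minimal sketch of ranking camp-file cards by similarity to an input card. It swaps Doc2Vec for plain TF-IDF vectors compared by cosine similarity, which needs no training and no third-party packages; a trained Doc2Vec model (gensim provides one) would replace `tfidf_vectors` with learned document vectors while leaving the ranking step as is. All names and example cards are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(word for toks in tokenized for word in set(toks))
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    return [{w: count * idf[w] for w, count in Counter(toks).items()} for toks in tokenized]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_cards(query: str, cards: List[str], top_k: int = 5) -> List[Tuple[int, float]]:
    """Return (card index, similarity) pairs for the top_k cards most similar to query."""
    vectors = tfidf_vectors([query] + cards)
    query_vec, card_vecs = vectors[0], vectors[1:]
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(card_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


camp_file = ["capitalism drives ecological destruction",  # hypothetical cards
             "nuclear war risks escalation",
             "the economy grows under capitalism"]
for idx, score in rank_cards("capitalism causes ecological collapse", camp_file):
    print(idx, round(score, 2))
```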