Dan Simonson is a PhD Candidate at Georgetown University in the Department of Linguistics within the computational concentration. He's most interested in using linguistics and natural language processing to take objective and interesting slices of the real world to yield insight and understanding. In particular, this has led him to pursue problems related to narrative schemas, information extraction and retrieval, semantic modality, critical discourse analysis, and applying these topics to one another.
Dan defended his dissertation, Properties of Narrative Schemas, on November 17th, 2017. His dissertation committee consisted of Tony Davis, Amir Zeldes, and Nate Chambers.
Dan is most easily found in rooms with free coffee and on a bicycle. He tried both at the same time once in his youth and ended up burning his hand.
He also feels very weird writing about himself in the third person and will cease immediately.
Use my name at gmail.com. (I will respond to you from a different address that the first one forwards to. Let me know if you have some kind of filter that requires me to respond from the first one.)
Or, on Twitter, @thedansimonson.
Tools and Goodies
Bash Your Way into Bash
Bash Your Way into Bash is a tutorial for using bash. It's for people who have no experience using command line interfaces.
library for Python
is a library for dealing with lists of dictionaries in Python. It makes counting and finding things eas(y|ier). You can get it from pip (). I've made an official page here with some code examples, and the in-code documentation is pretty solid.
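The library's name and functions are missing from this page, so the snippet below does not show its API. It is a minimal plain-Python sketch of the kind of counting and finding over a list of dictionaries described above. It uses only the standard library's `collections.Counter`, and `count_by` and `find` are hypothetical helper names chosen for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# A "record" is just a dictionary, e.g. one tweet's metadata (illustrative assumption).
Record = Dict[str, Any]


def count_by(records: Iterable[Record], key: str) -> Counter:
    """Count how often each value of `key` occurs; records lacking the key are skipped."""
    return Counter(r[key] for r in records if key in r)


def find(records: Iterable[Record], **criteria: Any) -> List[Record]:
    """Return the records whose fields equal every keyword criterion."""
    return [
        r for r in records
        if all(k in r and r[k] == v for k, v in criteria.items())
    ]


tweets = [
    {"user": "a", "lang": "en"},
    {"user": "b", "lang": "es"},
    {"user": "a", "lang": "en"},
]
print(count_by(tweets, "user"))            # per-user counts
print(find(tweets, user="a", lang="en"))   # both records from user "a"
```

A dedicated library can wrap patterns like these so that counting and filtering take one call instead of a hand-written comprehension each time.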
Simple Python Twitter Scraper (SPyTS)
SPyTS is a tool for scraping tweets. For those of us plebs who don't have firehose access to Twitter, it spreads queries out as evenly as possible over a period of time so that they don't exceed Twitter's API rate limits. SPyTS is available on GitHub.
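SPyTS's own scheduling code isn't reproduced here. The following is a minimal sketch of the idea, assuming a fixed cap of `limit` requests per `window_s`-second rate-limit window. It spreads queries evenly over the requested period and widens the gap if the even spacing would violate the cap. The names `even_offsets` and `run_evenly` and all of their parameters are hypothetical, not SPyTS's interface, and `send` stands in for whatever function actually performs a query.

```python
import time
from typing import Callable, List, Sequence


def even_offsets(n_queries: int, period_s: float, limit: int, window_s: float) -> List[float]:
    """Start times (seconds from now) that spread n_queries evenly over period_s.

    The gap is widened to at least window_s / limit, so at most `limit` start
    times fall in any half-open interval of length window_s.
    """
    if n_queries <= 0:
        return []
    step = period_s / n_queries          # even spacing across the requested period
    step = max(step, window_s / limit)   # never closer than the rate limit allows
    return [i * step for i in range(n_queries)]


def run_evenly(
    queries: Sequence[str],
    send: Callable[[str], object],
    period_s: float,
    limit: int,
    window_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Send each query at its scheduled offset, sleeping in between."""
    offsets = even_offsets(len(queries), period_s, limit, window_s)
    start = clock()
    results = []
    for query, offset in zip(queries, offsets):
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)
        results.append(send(query))
    return results
```

For example, four queries over 60 seconds under a hypothetical cap of 180 requests per 900 seconds are spaced 15 seconds apart. The same four queries under a cap of 2 per 60 seconds are stretched to 30 seconds apart. Passing `clock` and `sleep` in as parameters is a design choice that makes the scheduler testable without real waiting.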
Research and Publications
A lot of what we know is grounded in stories. Some stories tend to repeat themselves, and it's been hypothesized that once they do so often enough, we genericize them into something called a schema. I work to extract this type of world knowledge from language data and apply it to problems that would otherwise be inaccessible to quantitative analysis.
My dissertation focuses on extracting these sorts of schemas, understanding their distribution and properties, and applying this knowledge to practical, real-world tasks.
You can find more information on this vein of research here.
Simonson, D. (2017, November). Properties of Narrative Schemas. Doctoral Dissertation. Advisor: Davis, A. R. Committee: Zeldes, A. and Chambers, N. Georgetown University, Washington, D.C. [dissertation (forthcoming post-defense revisions)] [Dissertation Defense Slides]
Simonson, D. and Davis, A. (2016, November). NASTEA: Investigating Narrative Schemas through Annotated Entities. In the Second CnewS Workshop, EMNLP 2016, Austin, TX. [paper] [Workshop Slides] [DCNLP Slides]
Simonson, D. and Davis, A. (2015, July). Interactions between Narrative Schemas and Document Categories. In the First CnewS Workshop, ACL 2015, Beijing, China. [paper]
Throughout my time at Georgetown, I have been involved in a project building a theory and corpus of gradable modal expressions [NSF-funded, BCS-1053038]. Modal expressions are those that express possibilities. They span all parts of speech.
I played a number of roles on this project. During the experimental aspects of the project, where the annotation guidelines were developed and tested, I was responsible for reporting interannotator agreement scores. During the corpus construction component, I built and maintained a cross-platform tool for adjudicating annotator output. Throughout both stages of the project, I managed data as it flowed between phases of the project.
Rubinstein, A., Harner, H., Krawczyk, E., Simonson, D., Katz, G., and Portner, P. (2013). Toward Fine-grained Annotation of Modality in Text. In Proceedings of the Tenth International Conference on Computational Semantics (IWCS 2013). [Paper]
Simonson, D., Rubinstein, A., Chung, J., Harner, H., Katz, E.G., and Portner, P. (2012, February). Categorizing Modals with Amazon Mechanical Turk. In the Proceedings of the Mid-Atlantic Colloquium of Studies in Meaning (MACSIM 2012). [Poster]
Other Assorted Linguistic Work
Zeldes, A. and Simonson, D. (2016, August) Different Flavors of GUM: Evaluating Genre and Sentence Type Effects on Multilayer Corpus Annotation Quality. In the Proceedings of LAW X: 10th Linguistic Annotation Workshop, Berlin. [paper]
Sierra, S. and Simonson, D. (2014, October). Gender and cool solidarity in Mexican Spanish slang phrases. In the Proceedings of New Ways of Analyzing Variation 43. Chicago, IL. [Slides from NWAV Presentation]
Undergraduate Research in Astronomy
During my undergrad, I was a physics major and participated in astronomy research under the guidance of Harold Butner. I presented posters on this research at two annual meetings of the American Astronomical Society. These posters came from two separate projects involving the target stars of DEBRIS, a Herschel Space Observatory survey of debris disks around nearby stars. The first searched the literature for age estimates of our project's target stars; the second reported preliminary results of an infrared observing run that identified candidate binaries.
Simonson, D. E., Butner, H. M., Trelawny, D. T., Evans, C. M., Duchene, G., Rodriguez, D. R., ... and the DEBRIS Team (2010, January). Searching for Previously Unresolved Binaries in DEBRIS Survey Target Stars. In Bulletin of the American Astronomical Society (Vol. 42, p. 400). [Poster]
Butner, H. M., McCauley, P., Simonson, D., Matthews, B., Greaves, J. S., Duchene, G., ... and Zuckerman, B. (2009, January). Stellar Ages Of The Debris Sample Stars. In Bulletin of the American Astronomical Society (Vol. 41, p. 209). [Poster]
Pluto never should have been a planet.
My first website was D's C&c Page. I lost the whole thing when, for some reason, Geocities decided I violated their TOS. No explanation was given.
wow such linguistics
djorno: pizza for python
Amir Zeldes - Homepage
I am a computational linguist specializing in corpus linguistics, the extraction and analysis of linguistic structures in digital text collections. My main areas of interest are at the syntax-semantics interface: I am interested in how we say what we want to say, and especially in the kinds of discourse models we retain across sentences. This includes representing entity models of who or what has been mentioned, how they are introduced and referred back to, but also relationships between utterances as a complex discourse is constructed, such as expressing causality, signalling support for arguments and opinions with evidence, contrasts and more.
I am also very interested in how we learn to be productive in our first, second and subsequent languages, producing some (but not only, and not just any) utterances and combinations we have never heard before. I believe that very many factors constantly and concurrently influence the choice between competing constructions, which means that we need multifactorial methods and multilayer corpus data in order to understand what it is that we do when we produce and understand language.
- Corpus Linguistics
- Building and using multilayer corpora
- Predictive modelling of syntactic alternations
- Productivity in argument selection
- Information structure
- Digital Humanities for Coptic studies
- Coreference and entity resolution
- Discourse annotation (especially in RST)
- Developing corpus search and annotation interfaces
- Constructions in second language acquisition (esp. of German)
Stuff I work on
News and events
Send me an e-mail if you'd like to join corpinfo, the GU mailing list for information on corpus linguistics events, jobs, and corpus releases at GU and in the DC area.