Do you want to create your own lexicon for this application? Do you want a lexicon for a different application, such as food-related or sports-related tweets?
The lexicon builder generates a set of terms from labeled collections of tweets, in which about half of the tweets are representative of a specific domain (e.g., food, sports, politics) and the remaining tweets are not. The resulting lexicon contains the terms found to be discriminative for the target domain:
$ python build.py \
    --terms_scoring pmi \
    --output your_new_lexicon.txt \
    --input your_labeled_collections
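To make the idea of "discriminative" terms concrete, here is a minimal sketch of PMI (pointwise mutual information) scoring over two sets of tweets. It is an illustration only and is not taken from build.py: the function name `pmi_scores`, the `min_count` parameter, and the whitespace tokenization are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(on_topic, off_topic, min_count=2):
    """Score terms by PMI with the on-topic class (hypothetical helper).

    on_topic / off_topic: lists of tweet strings.
    Returns {term: log2(P(term, on) / (P(term) * P(on)))}.
    """
    n_on = len(on_topic)
    n = n_on + len(off_topic)

    # Document frequencies: each term counts at most once per tweet.
    # Tokenization is a naive lowercase whitespace split (an assumption).
    df_on = Counter(t for tweet in on_topic for t in set(tweet.lower().split()))
    df_off = Counter(t for tweet in off_topic for t in set(tweet.lower().split()))

    scores = {}
    for term, c_on in df_on.items():
        c_all = c_on + df_off.get(term, 0)
        if c_all < min_count:
            continue  # skip terms too rare to score reliably
        # (c_on / n) / ((c_all / n) * (n_on / n)) simplifies to the ratio below.
        scores[term] = math.log2((c_on * n) / (c_all * n_on))
    return scores


on_topic = ["flood warning issued", "flood waters rising", "earthquake damage reported"]
off_topic = ["great pizza tonight", "warning spoilers ahead", "pizza and movies"]
scores = pmi_scores(on_topic, off_topic)

# Print terms from most to least associated with the on-topic tweets.
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(term, round(score, 3))
```

A positive score means the term appears in on-topic tweets more often than chance would predict; a score near zero means the term is about equally likely in both sets.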
To also ensure that the selected terms are frequent, and to filter out terms that co-occur with one another, add the --hit_ratio and --top_div options:
$ python build.py \
    --terms_scoring pmi \
    --output your_new_lexicon.txt \
    --input your_labeled_collections \
    --hit_ratio \
    --top_div
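The exact computation behind these two options is not described here, so the following is only one plausible reading of "frequent" and "co-occurring", not the tool's code. It assumes a hypothetical helper `select_terms` that keeps a term when it appears in at least a given fraction of the on-topic tweets, and drops it when the set of tweets it matches overlaps heavily (by Jaccard similarity) with that of a term already kept. The thresholds `min_hit_ratio` and `max_overlap` are invented for this sketch.

```python
def select_terms(ranked_terms, on_topic, min_hit_ratio=0.1, max_overlap=0.5):
    """Greedily keep frequent, non-redundant terms (hypothetical helper).

    ranked_terms: candidate terms, best first (e.g., sorted by score).
    on_topic: list of tweet strings for the target domain.
    """
    tweets = [set(t.lower().split()) for t in on_topic]
    if not tweets:
        return []

    # For each candidate, the indices of the tweets that contain it.
    hits = {
        term: {i for i, tokens in enumerate(tweets) if term in tokens}
        for term in ranked_terms
    }

    kept = []
    for term in ranked_terms:
        h = hits[term]
        # Frequency filter: fraction of on-topic tweets matched by the term.
        if not h or len(h) / len(tweets) < min_hit_ratio:
            continue
        # Co-occurrence filter: Jaccard overlap with every term kept so far.
        redundant = any(
            len(h & hits[k]) / len(h | hits[k]) > max_overlap for k in kept
        )
        if not redundant:
            kept.append(term)
    return kept


on_topic = [
    "flood warning issued",
    "flood warning upgraded",
    "flood waters rising",
    "shelter open tonight",
]
selected = select_terms(
    ["flood", "warning", "shelter", "tsunami"],
    on_topic,
    min_hit_ratio=0.25,
    max_overlap=0.5,
)
print(selected)
```

In this toy input, "warning" is dropped because it only appears alongside "flood", and "tsunami" is dropped because it matches no tweets.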
Get the code: browse on GitHub, or download CrisisLexCreate-v1.0.zip (7.8 KB).
A. Olteanu, C. Castillo, F. Diaz, S. Vieweg. CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. In Proceedings of the AAAI Conference on Weblogs and Social Media (ICWSM'14). AAAI Press, Ann Arbor, MI, USA.
We would like to host or link to other tools that support the creation of lexical resources for crises or related domains. Please contact us to have your tool included in this list.