Create Your Own Lexicon

Do you want to create your own lexicon for this application? Do you want a lexicon for a different application, such as food-related or sports-related tweets?

Build a Lexicon

The lexicon builder generates a set of terms from a series of tweet collections in which about half of the tweets are representative of a specific domain (e.g., food, sports, politics) and the remaining tweets are not. The resulting lexicon contains the terms found to be discriminative for the target domain:

$ python \
        --terms_scoring pmi \
        --output your_new_lexicon.txt \
        --input your_labeled_collections 
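
To give a feel for what PMI-based term scoring does, here is a minimal sketch in Python. It is not the lexicon builder's actual code: the function names (tokenize, pmi_scores), the whitespace tokenizer, the min_count threshold, and the input format of (text, is_on_topic) pairs are all assumptions made for illustration. It scores each term by the pointwise mutual information between "the tweet contains this term" and "the tweet is on-topic", so terms that appear mostly in on-topic tweets get higher scores.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, naive tokenizer: lowercase and split on whitespace.
    # Returns a set, so each term counts at most once per tweet.
    return set(text.lower().split())


def pmi_scores(labeled_tweets, min_count=2):
    """Score terms by PMI with the on-topic label.

    labeled_tweets: iterable of (text, is_on_topic) pairs (assumed format).
    min_count: ignore terms seen in fewer tweets than this (assumed knob).
    """
    n_total = 0
    n_on = 0
    term_total = Counter()  # number of tweets containing the term
    term_on = Counter()     # number of on-topic tweets containing the term

    for text, is_on in labeled_tweets:
        n_total += 1
        terms = tokenize(text)
        term_total.update(terms)
        if is_on:
            n_on += 1
            term_on.update(terms)

    scores = {}
    for term, count in term_total.items():
        joint = term_on[term]
        if count < min_count or joint == 0:
            continue  # too rare, or never seen in an on-topic tweet
        # PMI = log2( P(term, on) / (P(term) * P(on)) ),
        # which simplifies to log2( joint * n_total / (count * n_on) ).
        scores[term] = math.log2(joint * n_total / (count * n_on))
    return scores
```

Sorting the returned dictionary by score in descending order and keeping the top entries would give a candidate lexicon under these assumptions.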

To ensure that the selected terms are frequent while also filtering out terms that co-occur:

$ python \
        --terms_scoring pmi \
        --output your_new_lexicon.txt \
        --input your_labeled_collections \
        --hit_ratio \
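
The exact behavior behind this option is not spelled out on this page. As one plausible reading, the sketch below keeps a term only if it appears in a minimum fraction of the on-topic tweets, and drops a term when nearly all of the tweets it matches are already matched by a higher-ranked term. The function name filter_terms and the parameters min_hit_ratio and max_overlap are hypothetical and do not correspond to the tool's own options.

```python
def filter_terms(ranked_terms, on_topic_tweets, min_hit_ratio=0.1, max_overlap=0.8):
    """Filter a ranked term list by frequency and co-occurrence.

    ranked_terms: terms ordered best-first (e.g., by PMI score).
    on_topic_tweets: list of token sets, one per on-topic tweet.
    min_hit_ratio: minimum fraction of tweets a term must match (assumed).
    max_overlap: drop a term if more than this fraction of its matches
        are shared with an already kept term (assumed).
    """
    n = len(on_topic_tweets)
    kept = []
    kept_hits = []  # for each kept term, the indices of the tweets it matches

    for term in ranked_terms:
        hits = {i for i, tokens in enumerate(on_topic_tweets) if term in tokens}
        if not hits or len(hits) / n < min_hit_ratio:
            continue  # term matches too few tweets
        if any(len(hits & prev) / len(hits) > max_overlap for prev in kept_hits):
            continue  # term mostly co-occurs with a higher-ranked kept term
        kept.append(term)
        kept_hits.append(hits)
    return kept
```

Processing terms in rank order means that, of two terms that almost always appear together, the higher-scoring one is the one that survives.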

Browse on GitHub (7.8 KB)



Other Methods to Generate Lexical Resources

We would like to host and/or link to other tools that support the creation of lexical resources for crises or related domains. Please contact us to have a tool added to this list.