Download Crisis Tweets Collections

This page contains links to download collections of crisis-related tweets.

CrisisLexT26: Tweets from 26 crises, labeled by informativeness, information type and source (Nov 2014)

Our most recent collection includes tweets collected during 26 large crisis events in 2012 and 2013, with about 1,000 tweets per crisis labeled for informativeness (i.e. "informative" or "not informative"), information type, and source.

Crisis | Country | Start / Duration | #Tweets | Category | Sub-category | Type | Development | Spread
2012 Italy earthquakes | Italy | May / 32 days | 7,351 | Natural | Geophysical | Earthquake | Diffused | Instantaneous
2012 Colorado wildfires | US | Jun / 31 days | 4,172 | Natural | Climatological | Wildfire | Diffused | Progressive
2012 Philippines floods | Philippines | Aug / 13 days | 2,950 | Natural | Hydrological | Floods | Diffused | Progressive
2012 Venezuela refinery explosion | Venezuela | Aug / 12 days | 2,736 | Human-induced | Accidental | Explosion | Focalized | Instantaneous
2012 Costa Rica earthquake | Costa Rica | Sep / 13 days | 2,193 | Natural | Geophysical | Earthquake | Diffused | Instantaneous
2012 Guatemala earthquake | Guatemala | Nov / 20 days | 3,261 | Natural | Geophysical | Earthquake | Diffused | Instantaneous
2012 Typhoon Pablo | Philippines | Nov / 21 days | 1,944 | Natural | Meteorological | Typhoon | Diffused | Progressive
2013 Brazil nightclub fire | Brazil | Jan / 16 days | 4,786 | Human-induced | Accidental | Fire | Focalized | Instantaneous
2013 Queensland floods | Australia | Jan / 19 days | 1,223 | Natural | Hydrological | Floods | Diffused | Progressive
2013 Russian meteor | Russia | Feb / 19 days | 8,365 | Natural | Others | Meteorite | Focalized | Instantaneous
2013 Boston bombings | US | Apr / 60 days | 157,454 | Human-induced | Intentional | Bombings | Focalized | Instantaneous
2013 Savar building collapse | Bangladesh | Apr / 36 days | 4,070 | Human-induced | Accidental | Collapse | Focalized | Instantaneous
2013 West Texas explosion | US | Apr / 29 days | 14,505 | Human-induced | Accidental | Explosion | Focalized | Instantaneous
2013 Alberta floods | Canada | Jun / 25 days | 5,887 | Natural | Hydrological | Floods | Diffused | Progressive
2013 Singapore haze | Singapore | Jun / 19 days | 3,639 | Mixed | Others | Haze | Diffused | Progressive
2013 Lac-Megantic train crash | Canada | Jul / 14 days | 2,342 | Human-induced | Accidental | Derailment | Focalized | Instantaneous
2013 Spain train crash | Spain | Jul / 15 days | 3,681 | Human-induced | Accidental | Derailment | Focalized | Instantaneous
2013 Manila floods | Philippines | Aug / 11 days | 2,032 | Natural | Hydrological | Floods | Diffused | Progressive
2013 Colorado floods | US | Sep / 21 days | 1,778 | Natural | Hydrological | Floods | Diffused | Progressive
2013 Australia wildfires | Australia | Oct / 21 days | 1,982 | Natural | Climatological | Wildfire | Diffused | Progressive
2013 Bohol earthquake | Philippines | Oct / 12 days | 2,214 | Natural | Geophysical | Earthquake | Diffused | Instantaneous
2013 Glasgow helicopter crash | UK | Nov / 30 days | 2,558 | Human-induced | Accidental | Crash | Focalized | Instantaneous
2013 LA Airport shootings | US | Nov / 12 days | 2,730 | Human-induced | Intentional | Shootings | Focalized | Instantaneous
2013 NYC train crash | US | Nov / 8 days | 1,066 | Human-induced | Accidental | Derailment | Focalized | Instantaneous
2013 Sardinia floods | Italy | Nov / 13 days | 1,143 | Natural | Hydrological | Floods | Diffused | Progressive
2013 Typhoon Yolanda | Philippines | Nov / 58 days | 38,951 | Natural | Meteorological | Typhoon | Diffused | Progressive
  • Contents: ~250K tweets posted during 26 crisis events in 2012 and 2013, with most events having 2K-4K tweets.
  • Sampling method: by keyword filtering from tweets included in the 1% sample at the Internet Archive.
  • Labels: ~28,000 tweets (about 1,000 in each collection) were labeled by crowdsourcing workers according to informativeness (informative or not informative), information type (e.g. caution and advice, infrastructure damage, etc.), and information source (governments, NGOs, etc.).
  • Data format: comma-separated values (.csv) files containing tweet-ids for the unlabeled tweets, plus the text of the tweets and labels for the labeled ones. Also includes a JSON file with metadata about the collection, including the keywords used to select tweets.
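As a rough illustration of working with the labeled files described above, the following sketch counts label values in one CSV column using Python's standard `csv` module. The column names (`tweet_id`, `text`, `source`, `information_type`, `informativeness`), the exact label strings, and the sample rows are assumptions made for this example and are not taken from the collection; check the header row of the file you download and adjust accordingly.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for one labeled file. The real column
# names and label strings may differ from the ones assumed here.
SAMPLE_CSV = """tweet_id,text,source,information_type,informativeness
1,"Bridge closed after quake",Government,Infrastructure damage,informative
2,"Thinking of everyone tonight",NGOs,Caution and advice,not informative
3,"Shelter open at city hall",NGOs,Caution and advice,informative
"""

def count_labels(csv_file, column):
    """Count how often each value appears in `column` of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column].strip() for row in reader)

# In practice, pass open(path, newline="", encoding="utf-8") instead of StringIO.
counts = count_labels(io.StringIO(SAMPLE_CSV), "informativeness")
print(counts)
```

Reading through `csv.DictReader` rather than splitting on commas matters here because tweet text often contains commas and quotes.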

If you use the CrisisLexT26 collection, please cite:

Browse on GitHub CrisisLexT26-v1.0.zip (4.6 MB)

CrisisLexT6: Tweets from 6 crises, labeled by relatedness (June 2014)

Our previous collection includes tweets from 6 large events in 2012 and 2013, with about 10,000 tweets per event labeled by relatedness to that event (as "on-topic" or "off-topic").

Crisis | Start / Duration | Keyword-based sampling (keywords) | #Tweets | Geo-based sampling (regions or coordinates) | #Tweets
2012 Sandy Hurricane | 2012-10-28 / 3 days | 4: hurricane, hurricane sandy, frankenstorm, #sandy | 2,775,812 | NY City; Bergen, Ocean, Union, Atlantic, Essex, Cape May, Hudson, Middlesex; Monmouth County, NJ, US | 279,454
2013 Boston Bombings | 2013-04-15 / 5 days | 17: boston explosion, BostonMarathon, boston blast, boston terrorist, boston bomb, boston tragedy, PrayForBoston, boston attack, boston tragic | 3,375,076 | Suffolk and Norfolk Counties, Massachusetts, US | 88,931
2013 Oklahoma Tornado | 2013-05-20 / 11 days | 36: oklahoma tornado, oklahoma storm, oklahoma relief, oklahoma volunteer, oklahoma disaster, #moore, moore relief, moore storm, #ok, #okc | 2,742,588 | long. in [-98.25, -96.75] and lat. in [34.5, 35.75] | 62,237
2013 West Texas Explosion | 2013-04-17 / 11 days | 9: #westexplosion, #westtx, west explosion, waco explosion, texas explosion, tx explosion, texas fertilizer, #prayfortexas, #prayforwest | 508,333 | long. in [-97.5, -96.5] and lat. in [31.5, 32] | 16,033
2013 Alberta Floods | 2013-06-21 / 11 days | 13: alberta flood, #abflood, canada flood, alberta flooding, alberta floods, canada flooding, canada floods, #yycflood, #yycfloods, #yycflooding | 370,762 | Alberta, Canada | 166,012
2013 Queensland Floods | 2013-01-27 / 6 days | 4: #qldflood, #bigwet, queensland flood, australia flood | 5,393 | Queensland, Australia | 27,000
  • Contents: ~60K tweets posted during 6 crisis events in 2012 and 2013.
  • Sampling method: ~10 million tweets in total were collected by keywords and by geographical regions or coordinates. Tweets were provided by Twitter's partner Topsy (4 geo-based datasets), or as lists of tweet ids by Twitris v3 (5 keyword-based datasets, thanks to Hemant Purohit) and Twitter's partner GNIP (1 keyword-based and 2 geo-based datasets, thanks to Aron Culotta).
  • Labels: ~60,000 tweets (10,000 in each collection) were labeled by crowdsourcing workers according to relatedness (as "on-topic", or "off-topic").
  • Data format: comma-separated values (.csv) files containing the text of the tweets and labels for the labeled ones.
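To make the two sampling strategies in the table above concrete, here is a minimal sketch of a keyword filter and a bounding-box filter, using a few of the keywords and the coordinate ranges listed for the 2013 Oklahoma Tornado. The actual matching rules used by the data providers are not described on this page, so the case-insensitive substring match and the inclusive bounds are assumptions, and the function names are hypothetical.

```python
# A subset of the keywords listed for the 2013 Oklahoma Tornado.
KEYWORDS = ["oklahoma tornado", "oklahoma storm", "#moore", "#okc"]

# Coordinate ranges listed for the same event (longitude, latitude).
BBOX = {"lon": (-98.25, -96.75), "lat": (34.5, 35.75)}

def matches_keywords(text, keywords=KEYWORDS):
    """Assumed rule: case-insensitive substring match against any keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)

def in_bbox(lon, lat, bbox=BBOX):
    """Assumed rule: a point is inside if both coordinates fall within inclusive bounds."""
    lon_min, lon_max = bbox["lon"]
    lat_min, lat_max = bbox["lat"]
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max

print(matches_keywords("Praying for #OKC tonight"))
print(in_bbox(-97.5, 35.3))
```

A substring match is deliberately loose, and keyword sampling in general tends to pull in unrelated tweets, which is consistent with the collection providing "on-topic" and "off-topic" labels.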

If you use the CrisisLexT6 collection, please cite:

Browse on GitHub CrisisLexT6-v1.0.zip (3.1 MB)

Other Collections

We would like to host and/or provide links to other crisis-related collections. Please contact us to include other collections in this list.