
Peter Bromberg's .NET Blog: All Things Programming

Lexical Analysis (Word Count) of Trump's State of the Union Speech

31 January 2018 08:30 by admin

Each year I analyze the word counts in the current president's State of the Union address. Once you have downloaded the speech as a text file, the code to do this is short, and it has gotten much simpler over the years, especially since the introduction of LINQ.

Here is an example of the key part of the code in C# as an extension method:

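using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
public static class TextExtensions {  // hypothetical name; extension methods must live in a static class
    // Illustrative subset only; the original project defines its own stopwords list
    static readonly HashSet<string> stopwords = new HashSet<string> { "the", "and", "of", "to", "a", "in" };
    public static Dictionary<string, int> GetWordFrequency(this string input){
        return input.Split(new char[] { ' ' })
            .Where(i => i.Trim() != String.Empty && Regex.IsMatch(i, @"\w"))  // drop blanks and pure punctuation
            .Select(i => Regex.Replace(i, @"[^A-Za-z0-9]+$", "").ToLower())    // strip trailing punctuation, lowercase
            .Where(x => !stopwords.Contains(x))                                 // skip common words
            .GroupBy(w => w)
            .OrderByDescending(group => group.Count())
            .ToDictionary(group => group.Key, group => group.Count());
    }
}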
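The GetWordFrequency method splits only on spaces and strips only trailing punctuation, so two words separated only by a line break end up in a single token, and a word preceded by a quotation mark (for example "tonight with its opening quote) is counted separately from tonight. Below is a minimal sketch of a more robust variant, not the code used to produce the list in this post: it extracts words with a regular expression that keeps internal apostrophes (so america's survives), normalizes typographic apostrophes, and returns a list sorted by count with ties broken alphabetically. The class name WordFrequencySketch, the method name GetWordFrequencySorted, and the stop-word subset are hypothetical, and its counts may differ slightly from the ones listed in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch of a more robust tokenizer; class and method names are hypothetical.
public static class WordFrequencySketch
{
    // Runs of letters/digits, optionally joined by internal apostrophes ("america's", "don't").
    static readonly Regex WordPattern = new Regex(@"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*");

    // Illustrative stop-word subset (assumption); substitute your own list.
    static readonly HashSet<string> StopWords =
        new HashSet<string> { "the", "and", "of", "to", "a", "in" };

    public static List<KeyValuePair<string, int>> GetWordFrequencySorted(this string input)
    {
        // Normalize typographic apostrophes (U+2019) so "America’s" and "America's" match.
        string text = input.Replace('\u2019', '\'');

        return WordPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Where(w => !StopWords.Contains(w))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            // Highest count first; ties broken alphabetically so the order is deterministic.
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

Returning a sorted list rather than a Dictionary is a deliberate choice here: Dictionary<TKey, TValue> does not formally guarantee that enumeration follows insertion order, so a list makes the ranking explicit.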

Here is the sorted list of words and their frequencies, down to a count of 5:

american,29
people,23
americans,20
tonight,20
america,14
congress,13
tax,13
country,13
home,11
am,10
administration,9
america's,9
family,9
world,8
immigration,8
united,7
building,7
safe,7
finally,7
workers,7
nation,7
veterans,7
citizens,7
heroes,6
love,6
strong,6
proud,6
jobs,6
protect,6
communities,6
nuclear,6
isis,6
north,6
passed,5
help,5
police,5
including,5
stands,5
bill,5
reform,5
drugs,5
drug,5
dangerous,5
terrorists,5
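The list above is printed as word,count pairs, highest count first, stopping at a count of 5. A minimal sketch of that reporting step is below, assuming the counts arrive in a Dictionary<string, int> like the one GetWordFrequency returns; the FrequencyReport class and FormatTopWords method are hypothetical names, and the original project's output code may differ. Because ties are broken alphabetically here, words with equal counts may appear in a different order than in the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the original project's reporting code may differ.
public static class FrequencyReport
{
    // Formats entries as "word,count", highest count first, keeping only
    // words whose count is at least minCount (the list in this post stops at 5).
    public static List<string> FormatTopWords(IDictionary<string, int> counts, int minCount)
    {
        return counts
            .Where(p => p.Value >= minCount)
            // Re-sort explicitly: Dictionary enumeration order is not guaranteed.
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Select(p => p.Key + "," + p.Value)
            .ToList();
    }
}
```

Assuming the program runs from the /bin/debug folder where speech.text lives, a caller could read the file with File.ReadAllText("speech.text"), pass the result of GetWordFrequency to FormatTopWords with a minimum of 5, and write each returned line with Console.WriteLine.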


You can download the complete source code below. The speech.text file is in the /bin/debug folder.

SOTU.zip (215.24 KB)
