Blog Post #4

From the feedback I've received on my previous posts, I can see how confusing they may have been and how disconnected they seemed from my project. So let me take a few moments to address these issues.

Regarding the topics discussed in the blog post titled Updated 9/23: I talked about some of the common problems many automatic text summarization tools face. There has been a lot of research into, and many solutions for, the problems I raised, and I decided to use Google/Stanford's model, which incorporates some of the solutions discussed in that post. These problems are generally considered solved by many of today's models, including Google's.

Blog Post #3 outlines the general features of the model I chose to work with. I also talked about some of the problems I faced in getting the model running on my own system, to make sure it works and that I can keep using it for my comps project. I also briefly mentioned a problem the model still faces: reference resolution. Here's an example:

Generated Summary (using the model that I trained on my system for 4 days)
malaysia airlines flight 370 disappeared on march 7 , 2014 , when she kissed her husband paul goodbye at perth international airport . the mother of two is heading to mongolia , where he would begin work as a mechanical engineer .

Reference Summary
danica weeks is raising her two young sons without her husband , who was aboard mh370 . paul weeks was traveling to mongolia to work in the mining industry . the mh370 passenger gave his wife his wedding ring before he left , a keepsake for his sons .

As you can see, the generated summary never identifies who "she"/"the mother of two" is (and even switches to "he" mid-sentence), while the reference summary clearly states that the subject is Danica Weeks.

Moving onto what I've been working on...

I've spent some time learning the details of Google's model (the Pointer-Generator Model). I worked through what happens at each step when the input data is fed in, how a summary is generated, and what it means to have a "pointer" that copies a word straight from the input when the output would be an out-of-vocabulary (OOV) word. Not all of the mathematical equations and design choices are fully clear to me (some are sadly beyond my current knowledge, though I can hand-wave past most of the confusing parts), but I now have a good understanding of what goes on in the model. I studied the model deeply because I now need to figure out which part of it I can modify to implement some sort of feature that addresses the pronoun problem shown above.

There are three solutions that I've come up with so far. Before I go into them, however, I need to quickly explain one aspect of the model. After the model works its magic in determining which words to output for the summary, it computes a final probability distribution over words, P(w). This distribution has a soft switch that works as follows (a sketch follows the list):

  1. If the word to be generated is not in the vocabulary we have, the model outputs the most likely word from the input instead. This is the pointer that points back into the input.
  2. If the word to be generated is not in the source document, the word is pulled from the vocabulary we have. This is the abstractive part of the model.
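Concretely, here is a minimal NumPy sketch of that final distribution as described in the pointer-generator paper (See et al., 2017). The variable names are my own, and this is a simplification of the real implementation:

    import numpy as np

    def final_distribution(p_gen, p_vocab, attention, source_ids,
                           extended_vocab_size):
        # P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention over
        # every source position where word w appears.
        # p_gen       -- scalar in [0, 1], the soft switch
        # p_vocab     -- softmax array over the fixed vocabulary
        # attention   -- attention weights over the source words
        # source_ids  -- ids of the source words in an extended vocabulary
        #                (source OOV words get temporary ids past the end)
        p_final = np.zeros(extended_vocab_size)
        p_final[:len(p_vocab)] = p_gen * p_vocab      # generate from vocabulary
        for i, word_id in enumerate(source_ids):      # copy from the source
            p_final[word_id] += (1.0 - p_gen) * attention[i]
        return p_final

When p_gen is close to 0, all of the probability mass sits on the copy term, which is how an OOV word can be produced at all.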
To the solutions we go. Here are the three that I came up with:
  1. Add a third switch to P(w) that turns on when the word to be output is a pronoun. When this switch is on, the model goes through an additional process that determines what the pronoun refers to, a process that still has to be figured out.
  2. Add a larger switch outside the model that checks whether the word is a pronoun. If it is, it goes through its own process to determine the referenced word; if not, it falls through to the model we have right now and produces the word normally. Although this may sound the same as the first solution, it is a bit different. The first solution concatenates another switch onto the equation itself (meaning there are three switches), so the model would compare the probabilities of the output being in-vocabulary, out-of-vocabulary, or a pronoun and decide which switch to turn on. This solution instead adds a switch in front of the model we have now: first one check (pronoun or not), and only if the word is not a pronoun does it reach the current set of switches (in-vocabulary or out-of-vocabulary). If I were to implement either the first or second solution, I would definitely go for the second, because the existing switch is built on a complement operation (p_gen versus 1 - p_gen), which pretty much means it's one way or the other. It would be difficult and awkward to add a third option to a mechanism that naturally allows only two, so having two sets of two choices makes the problem a lot smoother, which is what the second solution does (see the first sketch after this list).
  3. The last solution is to not add a switch at all, but to modify the branch that generates OOV words. This idea came to me from a research paper I read called End-to-end Neural Coreference Resolution. Its Related Work section references the papers I linked in the last post and argues that, while those approaches are good, this model does a better job and requires less manual feature engineering than the previous work. Although that model is built for a completely different task than the Pointer-Generator Model, the two share numerous features, especially in how they generate output. The main difference is that the Pointer-Generator Model uses the entire input to determine the context of the output, while the Neural Coreference Resolution model splits the input document into spans of words, gives each span a score, and picks the best span based on that scoring system (see the second sketch after this list). I believe creating a sort of hybrid between the two would be the ideal solution.
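To make the difference between the first two solutions concrete, here is a rough sketch of how the second solution's outer switch might wrap a single decode step. This is a hypothetical design, not code from the model: resolve_pronoun is a placeholder for the resolution process that still has to be designed, and taking the argmax word stands in for the real beam search. It reuses final_distribution from the sketch above.

    import numpy as np

    PRONOUNS = {"he", "she", "him", "her", "his", "hers",
                "it", "its", "they", "them", "their"}

    def resolve_pronoun(pronoun, source_words, attention):
        # Placeholder: this is the open problem. As a naive stand-in,
        # return the most-attended source word instead of the pronoun.
        return source_words[int(np.argmax(attention))]

    def decode_step(p_gen, p_vocab, attention, source_ids,
                    extended_vocab_size, source_words, id_to_word):
        # Inner switches: the unchanged pointer-generator distribution.
        p_final = final_distribution(p_gen, p_vocab, attention,
                                     source_ids, extended_vocab_size)
        candidate = id_to_word[int(np.argmax(p_final))]
        # Outer switch (the new part): pronouns take a separate path.
        if candidate.lower() in PRONOUNS:
            return resolve_pronoun(candidate, source_words, attention)
        return candidate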
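And for the third solution, here is the flavor of the span enumeration and scoring in End-to-end Neural Coreference Resolution (Lee et al., 2017). In the paper the two scoring functions are learned feed-forward networks; here they are just stand-in callables to show the shape of the idea:

    def candidate_spans(tokens, max_width=5):
        # Enumerate every span of up to max_width consecutive tokens.
        return [(i, j) for i in range(len(tokens))
                       for j in range(i, min(i + max_width, len(tokens)))]

    def coref_score(span_i, span_j, mention_score, antecedent_score):
        # s(i, j) = s_m(i) + s_m(j) + s_a(i, j): a pair of spans scores
        # highly only if both look like mentions and they look related.
        return (mention_score(span_i) + mention_score(span_j)
                + antecedent_score(span_i, span_j))

A hybrid might mean scoring source spans in this way when the pointer fires, instead of pointing at a single word.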
Although I have these three solutions, it's definitely easier said than done. There's a lot of math involved, and because I don't fully understand the consequences of the choices I would be making in these equations, it's hard to predict which solution is the one to implement. Further investigation into each solution, and fully fleshing out the thought process and logic behind it, will definitely help me determine which one to go with.

Also, I need to look into off-the-shelf models and tools I can use to build a baseline: take the input, replace all the pronouns with the entities they refer to, run the result through the Pointer-Generator Model, and then use another off-the-shelf model to put the pronouns back in.
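In pipeline form, that baseline would look something like this. All three components are placeholders for off-the-shelf tools I still need to pick out; none of this exists yet:

    def baseline_summarize(document, resolve_coref,
                           pointer_generator, restore_pronouns):
        # 1. Rewrite pronouns as their antecedents, e.g. turning
        #    "she kissed her husband" into "danica weeks kissed
        #    danica weeks's husband".
        resolved = resolve_coref(document)
        # 2. Summarize the pronoun-free text with the Pointer-Generator
        #    Model so it never has to output an ambiguous pronoun.
        summary = pointer_generator(resolved)
        # 3. Swap pronouns back in where it improves readability.
        return restore_pronouns(summary)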
