Update 9/23

After talking with Professor Li, I was able to narrow my project down into something more feasible. I decided to throw out the implementation part of the project completely and focus more on the model aspect. Also, instead of creating my own model, I decided, with Professor Li's help, to use one of the working models out there and explore some of the problems the model faces and look for possible solutions. This reduces the scope to something I can actually finish in a semester, and I'm glad I received some guidance on how to do so.

On to the topic of what I've done with the project...

Over the summer, I conducted some research and learned as much as possible about automatic text summarization. Automatic text summarization is considered a sequence-to-sequence prediction problem (seq2seq), which means it's a prediction problem that takes a sequence as input and requires another sequence as output. A model structure that most people have adopted for solving seq2seq problems is the Encoder-Decoder model, where there are two models involved: one for reading the input sentences and encoding them into a fixed-length vector, and another for decoding that vector into a summary. For both of these models, people use Recurrent Neural Networks (RNNs). RNNs are a great fit for seq2seq problems because they have connections between units that form a directed graph along the input sequence. This lets the model base each output on the context provided by everything it has read so far, which is essentially what a summary is doing.
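To make the structure concrete, here is a minimal sketch of an encoder-decoder pair built from plain RNNs, assuming PyTorch; the vocabulary size, hidden size, and tensor shapes are just illustrative placeholders, not the actual model I'll be working with.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len) -> hidden: (1, batch, hidden_dim)
        _, hidden = self.rnn(self.embed(src_tokens))
        return hidden  # the fixed-length vector that summarizes the input

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_tokens, hidden):
        # Each output step is conditioned on the encoder's fixed-length vector.
        output, hidden = self.rnn(self.embed(prev_tokens), hidden)
        return self.out(output), hidden

# Toy usage: encode a 20-token "document", decode logits for a 5-token summary.
encoder, decoder = Encoder(vocab_size=10000), Decoder(vocab_size=10000)
src = torch.randint(0, 10000, (1, 20))
context = encoder(src)
summary_so_far = torch.randint(0, 10000, (1, 5))
logits, _ = decoder(summary_so_far, context)
print(logits.shape)  # (1, 5, 10000): a distribution over words at each step
```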

RNNs have a couple of problems though, the most famous being the vanishing/exploding gradient problem. Because an RNN applies the same weights at every time step, the error gradient either grows exponentially (exploding) or shrinks exponentially (vanishing) as it is propagated back through the sequence. This means that small patterns of words in the input can blow up and have an outsized impact on the resulting model, or important patterns can be diminished to something insignificant.
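A tiny numerical example shows why the repeated weight causes this. Treating the recurrent weight as a single scalar w (a simplification of the real weight matrix), backpropagating through T steps multiplies the gradient by w a total of T times, so it scales like w ** T:

```python
# Repeatedly multiplying by the same recurrent weight makes the gradient
# scale like w ** T over a T-step sequence.
for w in (0.9, 1.1):
    grad = 1.0
    for _ in range(100):   # a 100-step input sequence
        grad *= w
    print(f"w = {w}: gradient after 100 steps = {grad:.3e}")

# w = 0.9: gradient after 100 steps = 2.656e-05   (vanishing)
# w = 1.1: gradient after 100 steps = 1.378e+04   (exploding)
```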

There are multiple solutions to the vanishing/exploding gradient problem, but the two most common ones are Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU). An LSTM uses what's called the cell state, which runs straight down the entire chain with only minor linear interactions along the way, so information can be passed down largely unchanged. This allows the model to add or remove information as necessary at each step. A GRU, on the other hand, uses an update gate and a reset gate to decide which information to keep passing down the chain and which information to forget.
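In practice, swapping a plain RNN for one of these units is a one-line change in most frameworks. The sketch below, again assuming PyTorch with made-up sizes, just shows the interface difference: the GRU keeps a single hidden state, while the LSTM carries an extra cell state alongside it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 50, 128)  # (batch, sequence length, feature size)

# GRU: same interface as a plain RNN, one hidden state, but with update and
# reset gates deciding what to keep and what to forget.
gru = nn.GRU(input_size=128, hidden_size=256, batch_first=True)
out, h_n = gru(x)

# LSTM: returns an additional cell state c_n, the "straight line" that
# information flows along with only minor gated interactions at each step.
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
out, (h_n, c_n) = lstm(x)

print(h_n.shape, c_n.shape)  # both (1, 1, 256)
```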

I'm still taking online classes on Coursera about RNNs and reading papers that discuss these models, the problems they run into, and how people have overcome those problems.
