LexPredict Challenge at LexHacks 2015 (Winner!)


This weekend I entered LexHacks 2015 (June 6–7), a hackathon focused on the legal industry, and my team won (yay!) the LexPredict challenge, which was to build a parser that could scan a mix of unstructured contracts and identify and extract the names of the parties, the effective date, and the termination date or clause. The hackathon involved a series of specific challenges posted by the event sponsors; attendees formed teams and, after the two days, submitted their solutions.


I decided to focus on a $500 challenge posted by LexPredict, which involved scanning a corpus of over 30,000 unstructured contract documents and trying to extract the parties and certain dates. It seemed obvious from the beginning that the “proper” solution was to use natural language processing methods to extract the data. However, to have something usable within the two-day window of the hackathon, my team mixed some basic NLP techniques with standard text parsing in Python.
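As a rough illustration of the plain text-parsing half of that mix, here is a minimal, hypothetical sketch of pulling an effective date out of contract boilerplate with a regular expression, before any NLP machinery gets involved. The pattern and sample text are invented for the example, not taken from our actual code or the challenge corpus:

```python
import re

# Hypothetical sketch: many contracts state the effective date with
# "effective as of <Month Day, Year>" phrasing, which a plain regular
# expression can capture without any trained model.
DATE_PATTERN = re.compile(
    r"effective\s+(?:as\s+of\s+)?"
    r"((?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4})",
    re.IGNORECASE,
)

def extract_effective_date(text):
    """Return the first effective-date string found, or None."""
    match = DATE_PATTERN.search(text)
    return match.group(1) if match else None

sample = ("This Agreement is effective as of June 6, 2015, "
          "by and between Acme Corp. and Widget LLC.")
print(extract_effective_date(sample))  # June 6, 2015
```

A rule like this is brittle across document styles (numeric dates, "day of" phrasing, etc.), which is exactly the pain point described below.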

The typical NLP workflow would have involved several steps, including sentence segmentation, word tokenization, part-of-speech tagging, named entity extraction, and relationship filtering. It would also have required establishing a training set (i.e., a subset of documents tagged with correct answers) and “training” the algorithms used in several of the steps (sentence segmentation, chunking/tokenizing, etc.). I quickly settled on trying to use the Natural Language Toolkit (NLTK 3) as part of my solution. I had previously (years ago) taken the Stanford NLP course on Coursera, but I certainly am no NLP pro. While I expected to avoid using the NLP toolkit in favor of custom parsing rules in some places, I also expected to be able to rely on it for items that were not especially unique to contracts (e.g., identifying the names of people).

Luckily, I was able to form a team that included at least three other lawyers and two other developers. In the end, what we understood to be a difficult problem only seemed more so in practice. Still, we managed to use the NLP toolkit to tokenize each contract and crafted several token parsing patterns that were able to find and extract some parties and dates. We found that without further parser training, the NLP toolkit did not extract people's names as reliably as we wanted (too many false positives). The wide variety of documents in the corpus was a frequent source of pain when writing customized rules. Everyone on the team seemed to walk away with a deeper understanding of this type of NLP problem and their own ideas for how they might proceed, given more time.
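A hand-written parsing pattern of the kind described might look something like this hypothetical sketch, which relies on the common "by and between" boilerplate to recover both party names. The pattern and sample are invented for illustration, and it breaks if a party name itself contains "and":

```python
import re

# Hypothetical sketch: recover both parties from the standard
# "by and between <Party A> and <Party B>" contract boilerplate.
PARTIES_PATTERN = re.compile(
    r"by\s+and\s+between\s+"
    r"(?P<party_a>[A-Z][\w.,&\- ]*?)\s+and\s+"
    r"(?P<party_b>[A-Z][\w.,&\- ]*?)(?:\s*[(.]|$)",
)

def extract_parties(text):
    """Return (party_a, party_b) if the boilerplate is found, else None."""
    m = PARTIES_PATTERN.search(text)
    return (m.group("party_a"), m.group("party_b")) if m else None

sample = ("This Agreement is entered into by and between "
          "Acme Corporation and Widget Holdings LLC.")
print(extract_parties(sample))
```

Each new document style in a 30,000-document corpus tends to need another variant of a rule like this, which is why purely hand-crafted patterns became painful.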

By the end of the event, everyone seemed happy with what we were able to accomplish in only two days. Overall, it was an interesting event at which I met a lot of new people. Winning our challenge was just an added bonus.

The team members were: Edward Bryant, Chase Hertel, Tomek Rabczak, Tetyana Rabczak, Bharat Lavania, and Jon Riley.
