How I Won Jeopardy With Data Science

The Plan

The natural place to start was the J! Archive, an online record of every Jeopardy game dating back to 1984. The format of the site makes it pretty easy to replay old games as a study method, but I knew that just playing through as many games as I could wouldn’t be enough to extract the Pavlov clues from scratch. I also knew some of my knowledge areas were weaker than others, so I needed a way to prioritize what to study. Using some readily available open-source Python packages, I scraped all of the roughly 400,000 questions from the J! Archive onto my computer, loaded them into a local database, and began partitioning them by category. I knew it would be impractical to try to memorize every answer in every category, so I settled for narrowing it down to only the most commonly occurring answers in each category. As an example, here are the most common answers for questions on religion/the Bible, an area I knew I was weak in, ranked by how often each answer appears:

Typical categories
Cards in the “Shakespeare” category
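
For anyone curious what the "most common answers per category" step might look like in code, here is a minimal Python sketch. It is not the exact script I ran: the function name top_answers_by_category, the (category, answer) row layout, and the sample rows are all made up for illustration, and it assumes the clues have already been pulled out of the database into plain tuples.

```python
from collections import Counter, defaultdict


def top_answers_by_category(clues, n=3):
    """Return {category: [(answer, count), ...]} with the n most frequent answers.

    `clues` is any iterable of (category, answer) pairs. This layout is an
    assumption for the sketch, not the actual schema of the scraped database.
    """
    counts = defaultdict(Counter)
    for category, answer in clues:
        # Normalize case and whitespace so "Moses" and " moses " count together.
        counts[category.strip().upper()][answer.strip().lower()] += 1
    return {cat: counter.most_common(n) for cat, counter in counts.items()}


# Hypothetical, hand-written rows for illustration only (not real archive data).
sample_clues = [
    ("The Bible", "Moses"),
    ("The Bible", "moses"),
    ("The Bible", "Noah"),
    ("Shakespeare", "Hamlet"),
    ("Shakespeare", "Hamlet"),
    ("Shakespeare", "Macbeth"),
]

ranked = top_answers_by_category(sample_clues, n=2)
for category, answers in ranked.items():
    print(category, answers)
```

The point of ranking by raw count is prioritization: the answers at the top of each list are the ones worth turning into flashcards first, and the long tail can be skipped.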

The Results

Time to put the theory to the test.



Colin Davy



Colin is a consulting data scientist in San Francisco, a two-time winner of the Sloan Sports Analytics Conference Hackathon, and a Jeopardy champion.