What are Kaggle Competitions | Data Science Dojo

First of all, what's Kaggle?

Until a few months ago, I didn't know the answer to that question. If you don't either, that's okay; we're going to answer it together. But first, you need a little background on this data science network.

Kaggle was founded in 2010 with the idea that data scientists need a place to come together and collaborate on projects. It has since grown into a network of more than 1,000,000 registered users and has become a safe place for data science learning, sharing, and competition.

Tapping into the human competitive spirit, Kaggle created a platform where organizations can host competitions. These competitions have fueled new methods and techniques in data science, and they have given the host organizations new insights from the data they provided.

Being the competitive person I am, I was first drawn to the competition aspect, and it made me want to learn the intricacies of a Kaggle Competition.

How it works

While combing through the Kaggle website and other informative articles, I found there are three basic steps in Kaggle Competitions.

  1. Preparation: Each competition has a host, and each host has to prepare and provide data. When providing data, the host has the opportunity to give additional information such as a description, evaluation method, timeline, and prize for winning.
  2. Experimentation: By now, you've had your morning coffee, you've read all the information in the overview 500 times, and you're ready to win 1st place. Now is the time to experiment, submit, and learn. There are three ways to upload your work:

    • Kaggle Kernels
    • Manual Uploads
    • Kaggle API

    If you don't want anyone to really know what you're doing, you should upload your experiments manually or by using the Kaggle API. Kaggle Kernels are a way for competitors to share what they've accomplished and get feedback from their peers. Kernels will give you ideas as to how to conquer the data, and I suggest you go through some of the popular ones.

  3. Results: In every competition there are public and private leaderboards. Be warned, the leaderboards are VERY different. The public leaderboard is based on a small percentage of the test data decided by the host. Although it gives you a good idea, it does not always reflect who will win and lose. The private leaderboard is what really matters. Not calculated until the end of the competition, this leaderboard is based on a larger proportion of data and, ultimately, decides the winners and losers.
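
To see why the two leaderboards can disagree, here is a minimal Python sketch of the idea, not Kaggle's actual scoring code. It assumes a plain accuracy metric and a random split of the test rows; in a real competition the host chooses both the metric and the split. The function names (`split_public_private`, `accuracy`, `leaderboard`) and the toy labels are made up for illustration.

```python
import random


def split_public_private(n_rows, public_fraction, seed=0):
    """Partition test-row indices into a small public part and a larger private part.

    Assumption: a random split. The real split is decided by the host.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_public = int(n_rows * public_fraction)
    return sorted(indices[:n_public]), sorted(indices[n_public:])


def accuracy(y_true, y_pred, indices):
    """Fraction of correct predictions, counted only on the given rows."""
    correct = sum(1 for i in indices if y_true[i] == y_pred[i])
    return correct / len(indices)


def leaderboard(y_true, submissions, indices):
    """Rank submissions (name -> predictions) by score on the given rows, best first."""
    scores = {
        name: accuracy(y_true, preds, indices)
        for name, preds in submissions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy ground truth and two hypothetical submissions.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    submissions = {
        "team_a": [1, 0, 0, 0, 0, 0, 1, 0],
        "team_b": [0, 1, 1, 1, 0, 0, 1, 0],
    }
    public_idx, private_idx = split_public_private(len(y_true), 0.25, seed=1)
    print("public :", leaderboard(y_true, submissions, public_idx))
    print("private:", leaderboard(y_true, submissions, private_idx))
```

Because the public score is computed on only a few rows, a submission that happens to do well on those rows can top the public leaderboard and still drop once the private rows are scored.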

If you would like to dive deeper into the different competition types, formats, and datasets offered by Kaggle, take a look at Kaggle's Help and Documentation.

Active Kaggle Competitions

Each competition accepts entries for a limited time only. This list does not show how much time is left to enter or how difficult the posted datasets are. One way to gauge the level of difficulty is to look at the prize. Typically, the larger the prize, the more difficult or advanced the problem. You can also look at the type of competition. The four categories, with Kaggle's descriptions of them, are below.

  1. Featured: "These are full-scale machine learning challenges which pose difficult, generally commercially-purposed prediction problems."
  2. Research: "Research competitions feature problems which are more experimental than featured competition problems."
  3. Getting Started: "These are semi-permanent competitions that are meant to be used by new users just getting their foot in the door in the field of machine learning."
  4. Playground: "These are competitions which often provide relatively simple machine learning tasks, and are similarly targeted at newcomers or Kagglers interested in practicing a new type of problem in a lower-stakes setting."

I will try my best to keep this list as up-to-date as possible. Unfortunately, I'm not spending all my time on Kaggle's website. So if you see something has ended, or a new competition has been added, please leave a comment below. Thanks and have fun!


This is a companion discussion topic for the original entry at https://blog.datasciencedojo.com/upcoming-kaggle-competitions-working-title/

The following are some interesting datasets for which the competitions are still going on: