What is this challenge about?
The Grab AI for S.E.A. challenge is an online challenge, dedicated to discovering and hiring some of the most exceptional AI talents across Southeast Asia. Up to 50 participants will be offered a full-time position at Grab, and will be invited to our Demo Day in Singapore. At Demo Day, the top ten winners will pitch and stand a chance to win cash prizes from a pool of SGD 20,000!
What positions is Grab hiring for, through this challenge?
Grab will be hiring for the following positions:
> Data Scientists
> Data Analysts
> Product Analysts
> Data Engineers
> Machine Learning Engineers
Who is eligible to participate?
If you are over 18 years of age and can complete any one of our set challenges based on your skills and creativity, you are eligible to participate.
My current location is not in Southeast Asia. May I still participate?
Yes, you can still participate, irrespective of your nationality and / or current location. However, please note that the positions being offered are located within Southeast Asia. You may have to relocate to fill the position you’re offered, subject to visa approvals.
How do I participate?
To take part, select one of the challenges described on the website and register using your email and LinkedIn profile. You will receive all the information about your chosen assignment, including the time-frame within which you must complete it.
Can I participate as part of a team?
This is an individual challenge where full-time job opportunities are in store for the winning participants. As such, we will not be accepting submissions from teams.
Will there be an interview?
Please note that shortlisted participants will be invited to an interview process after the challenge.
I have registered for the wrong challenge. What should I do?
Please build your solution for your preferred challenge. Upon submission, clearly indicate which challenge you are submitting for.
What are the submission requirements?
Each assignment will outline the computer language(s) you may or may not use to design your solution. When you are ready to submit your solution, we will require the link to your code repository as the key deliverable of your submission. You may also submit any accompanying documentation you feel is essential to the understanding of your solution.
What is the deadline for submissions?
Submissions will officially close on Monday, 17 June 2019, 6:00pm (SGT).
May I submit more than one assignment?
In the interest of fairness, we will only accept one submission from each participant.
How do I know if my submission has been received?
Once you have successfully submitted your assignment, you will receive a confirmation email to that effect.
Are there any restrictions on which code language to use?
We highly recommend that you use Python for the challenge. However, we are open to other language options as long as the evaluation can be done according to the criteria indicated.
Is external data allowed in building the data model for submission?
External data is allowed to complement the offered data. However, please note that the datasets provided on the challenge website should also be used in your submission. Effective use of datasets is also considered in your evaluation.
Can I work with a private Github repository and make it public only after submission?
You may work with a private repository as you build your solution. However, at the time of submission, it must be publicly accessible for ease of evaluation.
What computer languages can I use for my solution?
Python is preferred. However, you may use other languages as long as your submission allows the evaluator to run your code. This should be accompanied by a detailed readme file to facilitate the evaluation.
Can I use open source code?
Open source code can be used to enhance your submission. However, you will need to develop your own unique solution to this challenge.
Are there sample submission files?
There are no sample submission files for any challenge.
Is there a minimum AUC score for our solution?
There is no minimum AUC score.
There are "missing data points" in the Traffic Management dataset. What do I do?
Participants can assume that the "missing data points" in the Traffic Management dataset essentially indicate zero aggregated demand over those geohash - time bucket pairs.
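In practice, this means densifying the dataset before modelling: every (geohash, time bucket) pair absent from the file is filled with zero demand. A minimal sketch, assuming demand is held in a dictionary keyed by (geohash, bucket) pairs (the names and container are illustrative, not part of the challenge data format):

```python
from itertools import product

def fill_missing(demand, geohashes, buckets):
    """Return a dense grid of demand values.

    Any (geohash, bucket) pair not present in `demand` is treated as
    zero aggregated demand, per the FAQ answer above.
    """
    return {(g, b): demand.get((g, b), 0.0)
            for g, b in product(geohashes, buckets)}
```

For example, `fill_missing({("qp03wc", 0): 3.0}, ["qp03wc", "qp03wd"], [0, 1])` yields a four-entry grid where the three unseen pairs carry zero demand.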
What does the field "day" in the Traffic Management dataset refer to?
The value in this field indicates the sequential order of days and not a particular day of the month.
What time period does the Traffic Management test dataset run from?
The test dataset can start from any time period after the 2 months of training data. Your model can use features of up to 14 consecutive days ending at timestamp T and predict T+1 to T+5.
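The windowing described above can be sketched as follows. This assumes the demand history for one geohash is a flat time-ordered series; the bucket size (and hence `history` length) is an assumption you should adapt to the actual dataset, and all names are illustrative:

```python
def make_windows(series, history, horizon=5):
    """Build (features, targets) pairs from a time-ordered series.

    Each sample uses `history` consecutive steps ending at timestamp T
    as features, and the next `horizon` steps (T+1 .. T+horizon) as
    targets, matching the T+1 to T+5 prediction task.
    """
    samples = []
    for t in range(history, len(series) - horizon + 1):
        x = series[t - history:t]   # features up to and including T
        y = series[t:t + horizon]   # targets T+1 .. T+horizon
        samples.append((x, y))
    return samples
```

With 14 days of history you would set `history = 14 * buckets_per_day` for whatever bucket granularity the dataset uses.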
Clarifications on geohash data in the Traffic Management dataset.
The geohash dataset has been anonymised, so the geohashes may not map onto an existing city. However, you may assume that adjacency between the geohashes is maintained.
Clarifications on geohash library.
Geohash is a public domain geocoding system which encodes a geographic location into a short string of letters and digits with arbitrary precision. You are free to use any geohash library to encode/decode the geohashes into latitude and longitude or vice versa. Example libraries are available on GitHub for both Python and Java.
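If you prefer not to take on a dependency, the public-domain decoding algorithm is short enough to implement directly. A minimal sketch (standard library only; returns the centre of the geohash cell):

```python
# Base32 alphabet used by the geohash system (note: no 'a', 'i', 'l', 'o').
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_decode(gh):
    """Decode a geohash string into the (lat, lon) centre of its cell.

    Each character contributes 5 bits; bits alternate between
    longitude and latitude, starting with longitude, and each bit
    halves the current interval.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True
    for ch in gh:
        bits = _BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (bits >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return ((lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2)
```

For instance, the well-known example geohash "ezs42" decodes to roughly (42.605, -5.603).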
Can we use both the training and labeled testing data to train our model?
Please use the published test set for reporting your accuracy. We will inspect your training code and model to ensure only the training data is used for training. We are using an internal/unpublished hold-out test set to assess submitted models as well.
Can we use external datasets?
External data is allowed to complement the offered data. However, do note that the datasets provided on the challenge website should also be used in your submission. Effective use of datasets is also considered in your evaluation. Finally, please note that participants will be responsible for the proper procurement and use of external datasets.
There are some bookings with multiple labels in the dataset. What should I do?
Please conduct the necessary data cleansing.
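One reasonable cleansing policy (illustrative only; you are free to choose your own) is to drop any booking that appears with conflicting labels, keeping only bookings whose labels agree:

```python
def clean_labels(labels):
    """Resolve duplicate labels per booking.

    `labels` is an iterable of (bookingID, label) pairs. Bookings that
    appear with two different labels are discarded as ambiguous; this
    is one possible policy, not the challenge's prescribed one.
    """
    seen = {}
    conflicted = set()
    for bid, lab in labels:
        if bid in seen and seen[bid] != lab:
            conflicted.add(bid)
        seen[bid] = lab
    return {bid: lab for bid, lab in seen.items() if bid not in conflicted}
```

Other defensible policies include keeping the majority label per booking; whichever you pick, document it in your readme.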
Why are there more entries in the telematics dataset than labels?
Dangerous driving is labelled per trip, while each trip could contain thousands of telematics data points. Participants are required to create features based on the telematics data.
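Feature creation here amounts to aggregating the many per-second telematics rows of each trip down to one feature row per booking. A minimal sketch using the standard library; the field names (`bookingID`, `Speed`) and the chosen aggregates are illustrative assumptions, not a prescribed feature set:

```python
from collections import defaultdict
from statistics import mean

def trip_features(records):
    """Aggregate per-reading telematics records into per-trip features.

    `records` is an iterable of dicts, each with a "bookingID" and a
    "Speed" reading (illustrative field names). Returns one feature
    dict per trip, matching the one-label-per-trip granularity.
    """
    by_trip = defaultdict(list)
    for r in records:
        by_trip[r["bookingID"]].append(r["Speed"])
    return {
        trip: {
            "n_points": len(speeds),
            "max_speed": max(speeds),
            "mean_speed": mean(speeds),
        }
        for trip, speeds in by_trip.items()
    }
```

Richer features (e.g. acceleration percentiles, gyroscope statistics) follow the same group-by-trip pattern.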
What does the field "second" in the safety dataset mean?
It refers to the time that the record was created. It is measured by the number of seconds from the start of the trip (beginning from 0).
How are the telematics data generated?
The telematics data is generated from the smartphone of a four-wheel vehicle driver.
What is the format of the test dataset?
The test data has the same fields and distribution as the training data provided.