The aim of this project is to clean the data and analyze the included used car listings.
The dataset contains used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website. It was originally scraped and uploaded to Kaggle by user orgesleka. The original dataset isn't available on Kaggle anymore, but you can find it here.
We've made a few modifications to the original dataset. We sampled 50,000 data points from the full dataset to ensure your code runs quickly in our hosted environment.
The data dictionary provided with the data is as follows:
- dateCrawled - When this ad was first crawled. All field values are taken from this date.
- name - Name of the car.
- seller - Whether the seller is private or a dealer.
- offerType - The type of listing.
- price - The price on the ad to sell the car.
- abtest - Whether the listing is included in an A/B test.
- vehicleType - The vehicle type.
- yearOfRegistration - The year in which the car was first registered.
- gearbox - The transmission type.
- powerPS - The power of the car in PS.
- model - The car model name.
- kilometer - How many kilometers the car has driven.
- monthOfRegistration - The month in which the car was first registered.
- fuelType - What type of fuel the car uses.
- brand - The brand of the car.
- notRepairedDamage - Whether the car has damage that has not yet been repaired.
- dateCreated - The date on which the eBay listing was created.
- nrOfPictures - The number of pictures in the ad.
- postalCode - The postal code for the location of the vehicle.
- lastSeenOnline - When the crawler last saw this ad online.
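
Before cleaning, it can help to confirm that the file's header matches the data dictionary above. The following is a minimal sketch using only the Python standard library. The `check_columns` helper and the in-memory sample header are hypothetical stand-ins for illustration; in practice you would pass the opened dataset file instead.

```python
import csv
import io

# Column names taken from the data dictionary above.
EXPECTED_COLUMNS = [
    "dateCrawled", "name", "seller", "offerType", "price", "abtest",
    "vehicleType", "yearOfRegistration", "gearbox", "powerPS", "model",
    "kilometer", "monthOfRegistration", "fuelType", "brand",
    "notRepairedDamage", "dateCreated", "nrOfPictures", "postalCode",
    "lastSeenOnline",
]


def check_columns(csv_file):
    """Return (missing, unexpected) column names for an open CSV file."""
    header = next(csv.reader(csv_file))
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# Hypothetical stand-in for the real file: a header row with the last
# column dropped and one unknown column ("extra") added.
sample = io.StringIO(",".join(EXPECTED_COLUMNS[:-1] + ["extra"]) + "\n")
missing, unexpected = check_columns(sample)
print("missing:", missing)
print("unexpected:", unexpected)
```

An empty `missing` list and an empty `unexpected` list would indicate that the file has exactly the documented columns.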
Car mileage varies much less by brand than price does: for the top brands, mileage all falls within 10%. There is a slight tendency for the more expensive brands to have higher mileage and the less expensive brands to have lower mileage.
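
One way to make this kind of comparison concrete is to average price and mileage per brand and then measure how far apart the brand averages are. The sketch below assumes that "within 10%" refers to the gap between the highest and lowest brand average relative to the highest, which is only one possible reading. The rows are made up for illustration and are not values from the dataset, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; these are not values from the dataset.
listings = [
    {"brand": "audi", "price": 9000, "kilometer": 150000},
    {"brand": "audi", "price": 11000, "kilometer": 125000},
    {"brand": "opel", "price": 3000, "kilometer": 125000},
    {"brand": "opel", "price": 2000, "kilometer": 125000},
]


def mean_by_brand(rows, field):
    """Average a numeric field over the listings of each brand."""
    values = defaultdict(list)
    for row in rows:
        values[row["brand"]].append(row[field])
    return {brand: mean(vals) for brand, vals in values.items()}


def relative_spread(means):
    """Gap between the largest and smallest mean, as a fraction of the largest."""
    highest = max(means.values())
    lowest = min(means.values())
    return (highest - lowest) / highest


mean_price = mean_by_brand(listings, "price")
mean_km = mean_by_brand(listings, "kilometer")

print("mean price by brand:", mean_price)
print("mean mileage by brand:", mean_km)
print("price spread:", round(relative_spread(mean_price), 3))
print("mileage spread:", round(relative_spread(mean_km), 3))
```

With the real data, a mileage spread well below the price spread would support the observation above.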