In Tutorial 1, we covered Cluster Analysis (KMeans). In this tutorial, we will predict prices using Regression Analysis, specifically the price of a taxi fare. Later, we will use the same method to predict stock prices.
Given parameters such as trip distance, passenger count and payment type, we will predict the fare.
Let’s get started!
Prerequisites
You need Visual Studio 2017, which you can download for free from here.
We will cover the following:
- Obtain Your Dataset
- Create a Console Application
- Create the Classes
- Define Data and Model Paths
- Write the Train() Function
- Write the Evaluate() Function
- Write the TestSinglePrediction() Function
- Complete the Main Method
1. Obtain Your Dataset
You will need to download the training and test datasets.
Download taxi-fare-train.csv from here.
Also download taxi-fare-test.csv.
Save both files in a local folder. You can open the files in Notepad or Excel to inspect their content. The data contains the following fields:
- vendor_id: The unique ID of the taxi vendor.
- rate_code: The rate type of the taxi trip.
- passenger_count: The number of passengers on the trip.
- trip_time_in_secs: The time the trip took, in seconds. The fare has to be predicted before the trip is completed, when its duration is not yet known. Therefore, trip time is not a feature, and you will exclude this column from the model.
- trip_distance: The distance of the trip.
- payment_type: The payment method, either credit card or cash.
- fare_amount: The total taxi fare paid. This is the label to be predicted.
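To see how one row maps onto these fields, here is a minimal sketch in plain C# (no ML.NET) that splits a comma-separated row into its columns, keeps the candidate features, skips trip_time_in_secs, and separates out the label. The sample row is hypothetical: it assumes the column order listed above and reuses the values from the TestSinglePrediction() example later in this tutorial (including the actual fare of 15.5); it is not copied from the dataset files. It uses C# 9 top-level statements, so it assumes .NET 5 or later.

```csharp
using System;
using System.Globalization;

// Hypothetical sample row, assumed column order:
// vendor_id, rate_code, passenger_count, trip_time_in_secs, trip_distance, payment_type, fare_amount
string row = "CMT,1,2,1250,3.69,CSH,15.5";
string[] cols = row.Split(',');

string vendorId = cols[0];
string rateCode = cols[1];
float passengerCount = float.Parse(cols[2], CultureInfo.InvariantCulture);
// cols[3] (trip_time_in_secs) is skipped on purpose: it is unknown before the trip ends.
float tripDistance = float.Parse(cols[4], CultureInfo.InvariantCulture);
string paymentType = cols[5];
// fare_amount is the label the model learns to predict, not an input feature.
float fareAmount = float.Parse(cols[6], CultureInfo.InvariantCulture);

Console.WriteLine($"Features: {vendorId}, {rateCode}, {passengerCount}, {tripDistance}, {paymentType}");
Console.WriteLine($"Label: {fareAmount}");
```

Parsing with CultureInfo.InvariantCulture keeps "3.69" readable on machines whose locale uses a comma as the decimal separator.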
2. Create a Console Application
Open Visual Studio.
Create a new Console App (.NET Core) project.
Install the Microsoft.ML and Microsoft.ML.FastTree NuGet packages.
Create a folder named Data in the project.
Add the two CSV files you downloaded to the Data folder.
Set the ‘Copy to Output Directory’ property of both files to ‘Copy if newer’.
3. Create the Classes
We will create two classes: one to hold the input data (the features and the label), and one to hold the prediction.
Step 1: Add a new class file named TaxiTrip.cs with the following content:
using Microsoft.ML.Data;

// One row of the taxi-fare CSV files; LoadColumn gives each field's column index.
public class TaxiTrip
{
    [LoadColumn(0)] public string VendorId;
    [LoadColumn(1)] public string RateCode;
    [LoadColumn(2)] public float PassengerCount;
    [LoadColumn(3)] public float TripTime;
    [LoadColumn(4)] public float TripDistance;
    [LoadColumn(5)] public string PaymentType;
    [LoadColumn(6)] public float FareAmount;
}
Step 2: Add a second class named TaxiTripFarePrediction. Its [ColumnName("Score")] attribute maps the Score column, where the regression trainer writes its prediction, to the FareAmount field:
using Microsoft.ML.Data;

public class TaxiTripFarePrediction
{
    [ColumnName("Score")]
    public float FareAmount;
}
4. Define the Data and Model Paths
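We need to define three paths in the Program class:
- the path to the training dataset
- the path to the test dataset
- the path for saving the trained model
To do that, add the fields below to the Program class in Program.cs, above the Main() method. Program.cs also needs the directives using System;, using System.IO; and using Microsoft.ML; at the top of the file.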
static readonly string _trainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
static readonly string _testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");
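If the CSV files were not copied to the output directory, loading them fails only when the program runs. The sketch below (plain C#, top-level statements, so it assumes .NET 5 or later) builds the same three paths and checks up front whether the two datasets exist. The existence check is an illustrative addition, not part of the tutorial's program.

```csharp
using System;
using System.IO;

// Same locations as the fields above: <current directory>/Data/<file name>
string trainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
string testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
string modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

// Report any dataset that is missing, e.g. because 'Copy to Output Directory' was not set.
foreach (string path in new[] { trainDataPath, testDataPath })
{
    string status = File.Exists(path) ? "found" : "MISSING";
    Console.WriteLine($"{status}: {path}");
}
Console.WriteLine($"Model path: {modelPath}");
```

Environment.CurrentDirectory depends on where the program is launched from. If you start it from another folder, AppContext.BaseDirectory, which points to the folder containing the compiled application, is a common alternative base path.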
5. Write the Train() Function
The Train() function builds and trains the model by performing the following steps:
- load the training data from disk into memory
- transform the data into the form the trainer expects
- train the model
- return the trained model
The Train() function returns the trained model as an ITransformer object. An ITransformer transforms data: calling its Transform() method on a dataset adds a Score column containing the predicted fares.
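To make "train the model, then return it" concrete without ML.NET, here is a sketch of a deliberately simpler technique than the tutorial's FastTree trainer (gradient-boosted decision trees): ordinary least squares on a single feature, trip distance. It is plain C# with top-level statements (assumes .NET 5 or later). The four training points are synthetic, generated from fare = 2.5 + 2 × distance, and are not taken from the taxi dataset. Like Train(), TrainLinear (a hypothetical helper) returns the trained model, here as a function from distance to predicted fare.

```csharp
using System;
using System.Linq;

// Fits fare ≈ intercept + slope * distance by ordinary least squares
// and returns the trained model as a prediction function.
static Func<double, double> TrainLinear(double[] distances, double[] fares)
{
    double meanX = distances.Average();
    double meanY = fares.Average();
    double covXY = 0, varX = 0;
    for (int i = 0; i < distances.Length; i++)
    {
        covXY += (distances[i] - meanX) * (fares[i] - meanY);
        varX += (distances[i] - meanX) * (distances[i] - meanX);
    }
    double slope = covXY / varX;
    double intercept = meanY - slope * meanX;
    return distance => intercept + slope * distance;
}

// Synthetic training data (hypothetical): fare = 2.5 + 2 * distance exactly.
double[] x = { 1.0, 2.0, 3.0, 4.0 };
double[] y = { 4.5, 6.5, 8.5, 10.5 };

Func<double, double> model = TrainLinear(x, y);
Console.WriteLine($"Predicted fare for distance 3.69: {model(3.69):0.00}");
```

Because the synthetic data lie exactly on a line, the fit recovers slope 2 and intercept 2.5. FastTree instead learns an ensemble of decision trees over all the features, which lets it capture non-linear effects a single straight line cannot.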
The complete Train() function is given below:
// The Train() function
public static ITransformer Train(MLContext mlContext, string dataPath)
{
    // The IDataView object holds the training dataset
    IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(dataPath, hasHeader: true, separatorChar: ',');

    var pipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: "FareAmount")
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "VendorIdEncoded", inputColumnName: "VendorId"))
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "RateCodeEncoded", inputColumnName: "RateCode"))
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "PaymentTypeEncoded", inputColumnName: "PaymentType"))
        // TripTime is left out: it is not known before the trip ends
        .Append(mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded", "PassengerCount", "TripDistance", "PaymentTypeEncoded"))
        .Append(mlContext.Regression.Trainers.FastTree());

    // Create the model
    var model = pipeline.Fit(dataView);

    // Return the trained model
    return model;
}
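In the code above, CopyColumns() copies FareAmount into a column named Label, and OneHotEncoding() converts each categorical column (VendorId, RateCode, PaymentType) into numeric values, since the trainer works only with numbers.
Then Concatenate() combines the feature columns into a single column named Features. Label and Features are the column names FastTree looks for by default.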
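One-hot encoding and concatenation are easy to reproduce by hand, which can help when reading the pipeline. The sketch below is plain C# (top-level statements, assumes .NET 5 or later) and is not how ML.NET implements OneHotEncoding(). The helpers BuildVocabulary and OneHot are hypothetical, and the vendor and payment values are an assumed example list. Each distinct category gets a slot; a value becomes a vector with a 1 in its slot; the encoded and numeric values are then joined into one feature vector, in the spirit of the Features column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assigns each distinct category a slot index, in order of first appearance.
static Dictionary<string, int> BuildVocabulary(IEnumerable<string> values)
{
    var vocab = new Dictionary<string, int>();
    foreach (string v in values)
        if (!vocab.ContainsKey(v))
            vocab[v] = vocab.Count;
    return vocab;
}

// One-hot encodes a value: 1 in the value's slot, 0 elsewhere.
// In this sketch, an unseen value becomes an all-zero vector.
static float[] OneHot(string value, Dictionary<string, int> vocab)
{
    var vector = new float[vocab.Count];
    if (vocab.TryGetValue(value, out int slot))
        vector[slot] = 1f;
    return vector;
}

// Hypothetical category values seen in training data.
var vendorVocab = BuildVocabulary(new[] { "CMT", "VTS", "CMT" });
var paymentVocab = BuildVocabulary(new[] { "CRD", "CSH" });

// Concatenate encoded categories and numeric columns into one feature vector
// (rate code omitted for brevity).
float passengerCount = 2f, tripDistance = 3.69f;
float[] features = OneHot("CMT", vendorVocab)
    .Concat(new[] { passengerCount, tripDistance })
    .Concat(OneHot("CSH", paymentVocab))
    .ToArray();

Console.WriteLine(string.Join(", ", features));
```

The numeric columns pass through unchanged, while each categorical column expands into as many slots as it has distinct values.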
6. Write the Evaluate() Function
The Evaluate() function assesses how well the model performs. For a regression model, quality is measured with metrics such as R-squared and root-mean-squared error (RMSE) rather than with accuracy, which applies to classification.
The function does not return anything; it simply writes the performance metrics to the standard output.
The metrics are computed from the predictions the model makes on the test dataset.
The function performs these steps:
- loads the test dataset into an IDataView object
- makes predictions by calling the model’s Transform() method
- evaluates the predictions with mlContext.Regression.Evaluate()
- displays the performance metrics
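The two metrics printed by Evaluate() can also be computed by hand. For n test rows with actual fares y_i, predicted fares p_i, and mean actual fare m, the standard definitions are RMSE = sqrt(sum((y_i - p_i)^2) / n) and R-squared = 1 - sum((y_i - p_i)^2) / sum((y_i - m)^2). The sketch below implements these formulas in plain C# (top-level statements, assumes .NET 5 or later); the fares and predictions are made-up numbers, not outputs of the tutorial's model.

```csharp
using System;
using System.Linq;

// Root-mean-squared error: the typical size of a prediction error, in fare units.
static double Rmse(double[] actual, double[] predicted) =>
    Math.Sqrt(actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average());

// Coefficient of determination: 1 means perfect predictions,
// 0 means no better than always predicting the mean fare.
static double RSquared(double[] actual, double[] predicted)
{
    double mean = actual.Average();
    double residual = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
    double total = actual.Sum(a => (a - mean) * (a - mean));
    return 1 - residual / total;
}

// Hypothetical actual fares and model predictions.
double[] actualFares = { 10.0, 12.0, 14.0, 16.0 };
double[] predictedFares = { 11.0, 11.0, 15.0, 15.0 };

Console.WriteLine($"R-Squared Score: {RSquared(actualFares, predictedFares):0.###}");
Console.WriteLine($"Root-Mean-Squared Error: {Rmse(actualFares, predictedFares):#.###}");
```

Here every prediction is off by exactly 1, so the RMSE is 1, and the residual sum (4) is a fifth of the total variation (20), giving an R-squared of 0.8.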
The complete Evaluate() function is given below:
private static void Evaluate(MLContext mlContext, ITransformer model)
{
    // Load the test dataset
    IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(_testDataPath, hasHeader: true, separatorChar: ',');

    // Score the test data: adds a "Score" column with the predicted fares
    var predictions = model.Transform(dataView);

    // Compare the predicted "Score" values against the actual "Label" values
    var metrics = mlContext.Regression.Evaluate(predictions, "Label", "Score");

    Console.WriteLine();
    Console.WriteLine($"*************************************************");
    Console.WriteLine($"* Model quality metrics output ");
    Console.WriteLine($"*------------------------------------------------");
    Console.WriteLine($"* R-Squared Score: {metrics.RSquared:0.###}");
    Console.WriteLine($"* Root-Mean-Squared Error: {metrics.RootMeanSquaredError:#.###}");
    Console.WriteLine("Press Enter to continue...");
    Console.ReadLine();
}
7. Write the TestSinglePrediction() Function
This method makes a prediction for a single input record. It performs the following tasks:
- creates a prediction engine from the trained model with CreatePredictionEngine()
- creates a single TaxiTrip object to use as the input
- predicts the fare for this input by calling the prediction engine’s Predict() method
- displays the predicted fare
This is done using the code below:
private static void TestSinglePrediction(MLContext mlContext, ITransformer model)
{
    var predictionFunction = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(model);

    // Create a single TaxiTrip object to be used for prediction
    var taxiTripSample = new TaxiTrip()
    {
        VendorId = "CMT",
        RateCode = "1",
        PassengerCount = 2,
        TripTime = 1250,
        TripDistance = 3.69f,
        PaymentType = "CSH",
        FareAmount = 0 // the value to predict; not used for the prediction
    };

    // Make a prediction
    var prediction = predictionFunction.Predict(taxiTripSample);

    Console.WriteLine($"**********************************************************************");
    Console.WriteLine($"Predicted fare is: {prediction.FareAmount:0.####}, while actual fare: 15.5");
    Console.WriteLine($"**********************************************************************");
    Console.ReadLine();
}
8. Complete the Main() Method
Now that all the functions are written, let’s put them together in the Main() method.
The complete Main() method is shown below:
static void Main(string[] args)
{
    // A fixed seed makes the results repeatable across runs
    MLContext mlContext = new MLContext(seed: 0);

    var model = Train(mlContext, _trainDataPath);
    Evaluate(mlContext, model);
    TestSinglePrediction(mlContext, model);
}
At this point, run the application to see the output. Then change the input values in TestSinglePrediction() and rerun the application to see how they affect the predicted fare.
The output is as shown below:
If you got to this point, congratulations!