ML.NET Tutorial 2 – Predicting Prices Using Regression Analysis

In Tutorial 1, we covered Cluster Analysis (KMeans). In this tutorial, we will predict prices using Regression Analysis, with taxi fares as the example. Later, we will use the same method to predict stock prices.

Given parameters such as distance, rate code and payment type, we will predict the fare.

Let’s get started!

 

Prerequisites

You need to have Visual Studio 2017. You can download it free from here.

We will cover the following:

  1. Obtain Your Dataset
  2. Create a Console Application
  3. Create the Classes
  4. Define the Data and Model Paths
  5. Write the Train() Function
  6. Write the Evaluate() Function
  7. Write the TestSinglePrediction() Function
  8. Complete the Main() Method

 

1. Obtain Your Dataset

You will need to download the training and test datasets.

Download the taxi-fare-train.csv from here

Also get the taxi-fare-test.csv

Save both files in a local folder. You can open the files with Notepad or Excel to see the content. The data contains the following fields:

  • vendor_id: The ID of the taxi vendor.
  • rate_code: The rate type of the taxi trip.
  • passenger_count: The number of passengers on the trip.
  • trip_time_in_secs: The amount of time the trip took. You need to predict the fare before the trip is completed, and at that moment you don’t know how long the trip will take. Therefore, the trip time is not a feature and you’ll exclude this column from the model.
  • trip_distance: The distance of the trip.
  • payment_type: The payment method, either credit card or cash.
  • fare_amount: The total taxi fare paid. This is the label to be predicted.

 

2. Create a Console Application

Open Visual Studio.

Create a console application using the Console App (.NET Core) template.

Install the Microsoft.ML and Microsoft.ML.FastTree NuGet packages.

Create a folder named Data.

Add the two csv files you downloaded to the Data folder.

Set the ‘Copy to Output Directory’ property of both files to ‘Copy if newer’.

 

3. Create the Classes

We will create two classes: one class to hold the features, and the other to hold the prediction.

Step 1: Add a new class file and call it TaxiTrip.cs. The content of the class is as shown below:

using Microsoft.ML.Data;

public class TaxiTrip
{
    [LoadColumn(0)]
    public string VendorId;

    [LoadColumn(1)]
    public string RateCode;

    [LoadColumn(2)]
    public float PassengerCount;

    [LoadColumn(3)]
    public float TripTime;

    [LoadColumn(4)]
    public float TripDistance;

    [LoadColumn(5)]
    public string PaymentType;

    [LoadColumn(6)]
    public float FareAmount;
}

 

Step 2: Add a second class and name it TaxiTripFarePrediction.

The content is as given below:

using Microsoft.ML.Data;

public class TaxiTripFarePrediction
{
    [ColumnName("Score")]
    public float FareAmount;
}

 

4. Define the Data and Model Paths

We need to define three paths in our application:

  • path to the training dataset
  • path to the test dataset
  • path where the trained model is saved

To do that, put the code below in the Program.cs file, inside the Program class and above the Main method. The file also needs using directives for System, System.IO and Microsoft.ML.

 

static readonly string _trainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
static readonly string _testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

 

5. Write the Train() Function

The Train() function builds and trains the model by executing the following steps:

  • load the data from disk into memory
  • extract the data and perform some transformation
  • train the model
  • finally return the trained model

The Train() function returns the trained model as an ITransformer object. This object applies the data transformations and the trained regressor to any data passed to it.

The complete Train() function is given below:

// The Train() Function
public static ITransformer Train(MLContext mlContext, string dataPath)
{
    // The IDataView object holds the training dataset 
    IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(dataPath, hasHeader: true, separatorChar: ',');

    var pipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: "FareAmount")
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "VendorIdEncoded", inputColumnName: "VendorId"))
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "RateCodeEncoded", inputColumnName: "RateCode"))
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "PaymentTypeEncoded", inputColumnName: "PaymentType"))
        // TripTime is left out of the features: the trip duration is not known before the trip ends
        .Append(mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded", "PassengerCount", "TripDistance", "PaymentTypeEncoded"))
        .Append(mlContext.Regression.Trainers.FastTree());
            
    //Create the model
    var model = pipeline.Fit(dataView);

    //Return the trained model
    return model;
}

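In the code above, the CopyColumns() function copies FareAmount into the Label column, which is the column the trainer learns to predict. The OneHotEncoding() function transforms categorical values into numeric values.

Then the Concatenate() function combines all the features into one column named Features.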
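
To get a feel for what one-hot encoding does, here is a small sketch in plain C#. It is only an illustration of the idea and not how ML.NET implements OneHotEncoding() internally. The class name OneHotSketch and its two methods are made up for this example, and "CRD" is an assumed code for credit card payments.

```csharp
using System.Collections.Generic;

// Illustration only: a hand-written one-hot encoder (hypothetical helper, not part of ML.NET)
public static class OneHotSketch
{
    // Maps each distinct category to a slot index, in order of first appearance
    public static Dictionary<string, int> BuildVocabulary(IEnumerable<string> values)
    {
        var vocabulary = new Dictionary<string, int>();
        foreach (var value in values)
        {
            if (!vocabulary.ContainsKey(value))
            {
                // The current count is the next free slot index
                vocabulary.Add(value, vocabulary.Count);
            }
        }
        return vocabulary;
    }

    // Returns a vector with 1 in the slot of the given category and 0 everywhere else.
    // A category that is not in the vocabulary gives a vector of zeros.
    public static float[] Encode(Dictionary<string, int> vocabulary, string value)
    {
        var vector = new float[vocabulary.Count];
        if (vocabulary.TryGetValue(value, out int index))
        {
            vector[index] = 1f;
        }
        return vector;
    }
}
```

For example, a vocabulary built from the payment types "CSH", "CRD", "CSH" has two slots, so "CSH" is encoded as [1, 0] and "CRD" as [0, 1]. This is why a string column can be concatenated with numeric columns after encoding.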

 

6. Write the Evaluate() Function

The Evaluate() function is used to assess the model’s performance, that is, how closely the predicted fares match the actual fares.

The function does not return anything. It simply writes the performance metrics to the standard output.

The model is evaluated against predictions made on the test dataset.

So these are the steps it follows:

  • load the test dataset into an IDataView object
  • make predictions by calling the model’s Transform method
  • compute the regression metrics by calling the Regression.Evaluate() method
  • generate and display the performance metrics.

The complete Evaluate() function is given below

private static void Evaluate(MLContext mlContext, ITransformer model)
{
    IDataView dataView = mlContext.Data.LoadFromTextFile<TaxiTrip>(_testDataPath, hasHeader: true, separatorChar: ',');
    var predictions = model.Transform(dataView);
    var metrics = mlContext.Regression.Evaluate(predictions, "Label", "Score");

    Console.WriteLine();
    Console.WriteLine($"*************************************************");
    Console.WriteLine($"*       Model quality metrics output         ");
    Console.WriteLine($"*------------------------------------------------");

    Console.WriteLine($"*       R-Squared Score:      {metrics.RSquared:0.###}");

    Console.WriteLine($"*       Root-Mean-Squared Error:      {metrics.RootMeanSquaredError:0.###}");
    Console.WriteLine("Press Enter to continue...");
    Console.ReadLine();
}
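
To make the two metrics printed by Evaluate() more concrete, here is a sketch that computes them by hand in plain C#. It uses the standard textbook definitions, which we assume correspond to what ML.NET reports. The class name RegressionMetricsSketch and its methods are made up for this example.

```csharp
using System;
using System.Linq;

// Illustration only: textbook definitions of the two metrics (hypothetical helper, not part of ML.NET)
public static class RegressionMetricsSketch
{
    // Root-mean-squared error: the square root of the average squared difference
    // between actual and predicted values. Lower is better, and it is in the
    // same unit as the fare.
    public static double RootMeanSquaredError(double[] actual, double[] predicted)
    {
        CheckInputs(actual, predicted);
        double sumSquaredErrors = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            double error = actual[i] - predicted[i];
            sumSquaredErrors += error * error;
        }
        return Math.Sqrt(sumSquaredErrors / actual.Length);
    }

    // R-squared: 1 minus (squared errors of the model / squared deviations from the mean).
    // A value close to 1 means the model explains most of the variation in the fares.
    // If all actual values are equal, the denominator is 0 and the result is not meaningful.
    public static double RSquared(double[] actual, double[] predicted)
    {
        CheckInputs(actual, predicted);
        double mean = actual.Average();
        double residualSum = 0;
        double totalSum = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            residualSum += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            totalSum += (actual[i] - mean) * (actual[i] - mean);
        }
        return 1.0 - residualSum / totalSum;
    }

    private static void CheckInputs(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
        {
            throw new ArgumentException("Both arrays must be non-empty and have the same length.");
        }
    }
}
```

For example, with actual fares of 10, 20 and 30 and predicted fares of 12, 18 and 30, RSquared() returns 0.96 and RootMeanSquaredError() returns about 1.633.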

 

 

7. Write the TestSinglePrediction() Function

This method makes a prediction based on a single input record. It does this by performing the following tasks:

  • creates a single test data object
  • makes a prediction of the fare based on this single input by calling the Predict() method
  • displays the prediction result to the output

This is done using the code below:

private static void TestSinglePrediction(MLContext mlContext, ITransformer model)
{
    var predictionFunction = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(model);

    //Create a single TaxiTrip object to be used for prediction
    var taxiTripSample = new TaxiTrip()
    {
        VendorId = "CMT",
        RateCode = "1",
        PassengerCount = 2,
        TripTime = 1250,
        TripDistance = 3.69f,
        PaymentType = "CSH",
        FareAmount = 0 // The value to predict; the actual fare for this trip is 15.5
    };
    //Make a prediction
    var prediction = predictionFunction.Predict(taxiTripSample);

    Console.WriteLine($"**********************************************************************");
    Console.WriteLine($"Predicted fare is: {prediction.FareAmount:0.####}, while actual fare: 15.5");
    Console.WriteLine($"**********************************************************************");
    Console.ReadLine();
}

 

8. Complete the Main() Method

Now that we have all our functions, let’s put it all together in the Main method.

The complete Main() method is shown below:

static void Main(string[] args)
{
    MLContext mlContext = new MLContext(seed: 0);

    var model = Train(mlContext, _trainDataPath);

    Evaluate(mlContext, model);

    TestSinglePrediction(mlContext, model);
}

At this point, run the application to see the output. Then change the values in the TestSinglePrediction() function and rerun to see how it affects the output.

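The console shows the model quality metrics (R-Squared Score and Root-Mean-Squared Error), followed by the predicted fare for the sample trip.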

If you got to this point, congratulations!
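
One loose end: the _modelPath we defined in step 4 is not used by the code in this tutorial. Below is a minimal sketch of how the trained model could be saved to that path and loaded again later. It assumes Microsoft.ML 1.0 or later, where Model.Save() takes the schema of the training data. The class name ModelPersistenceSketch and its methods are made up for this example.

```csharp
using System.IO;
using Microsoft.ML;

// Sketch only: hypothetical helper for saving and loading a trained model
public static class ModelPersistenceSketch
{
    // Saves the trained model together with the schema of the data it was trained on
    public static void SaveModel(MLContext mlContext, ITransformer model, DataViewSchema inputSchema, string modelPath)
    {
        // Make sure the target folder exists before writing the zip file
        string folder = Path.GetDirectoryName(Path.GetFullPath(modelPath));
        Directory.CreateDirectory(folder);

        mlContext.Model.Save(model, inputSchema, modelPath);
    }

    // Loads a previously saved model; the input schema stored with it is not used here
    public static ITransformer LoadModel(MLContext mlContext, string modelPath)
    {
        return mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);
    }
}
```

One way to use it would be to call ModelPersistenceSketch.SaveModel(mlContext, model, dataView.Schema, _modelPath) inside Train(), right after the call to Fit(). A later run could then call LoadModel() and pass the result to TestSinglePrediction() without training again.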