In this lesson we will use a pre-trained TensorFlow model. You don't need to know TensorFlow to follow along, because we simply import the TensorFlow model and use it in ML.NET.
For reference, TensorFlow is an open-source library developed by Google for machine learning and data science modelling.
Prerequisite
Visual Studio 2017
We will cover the following topics:
- Obtain the Model and Dataset
- Set up the Project in Visual Studio
- Add Global Variables
- How the Model Works
- Create the Classes
- Create the MLContext, Lookup Dictionary and Resize Action
- Load the TensorFlow Model
- Create a Learning Pipeline
- Make a Prediction and Display the Output
1. Obtain the Model and Dataset
The first step is to download the model and the dataset. They are available here.
Unzip the file into a local directory. The extracted sentiment_model folder contains two files and one subfolder:
imdb_word_index.csv: a file that maps each word to an integer ID.
saved_model.pb: the TensorFlow model itself (pb stands for protobuf). This file contains the graph definition of the model. The model takes a fixed-length array of 600 integers as input, which represents the word-encoded review text. It outputs two probabilities: the probability that the review is negative and the probability that it is positive, in that order (the prediction code in step 9 reads index 1 as the positive probability).
variables: a folder containing two files that hold the model's trained weights (variable values).
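To see what the mapping file holds, here is a minimal plain C# sketch that reads word,id lines into a Dictionary. It assumes each line has the form word,id with no header row (the same layout the TextLoader columns in step 6 assume) and that words contain no commas. The sample lines and the WordIndexReader name are hypothetical; the real project loads the file with mlContext.Data.LoadFromTextFile instead.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WordIndexReader
{
    // Parses lines of the form "word,id" into a word -> id dictionary.
    // Assumption: no header row and no commas inside words.
    public static Dictionary<string, int> Parse(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            var parts = line.Split(',');
            map[parts[0]] = int.Parse(parts[1], CultureInfo.InvariantCulture);
        }
        return map;
    }

    public static void Main()
    {
        // Hypothetical sample lines, not taken from the real imdb_word_index.csv.
        // With the real file you could pass File.ReadLines(path) instead.
        var sample = new[] { "the,1", "movie,17", "great,84" };
        var map = Parse(sample);
        Console.WriteLine(map["movie"]); // prints 17
    }
}
```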
2. Set up the Project in Visual Studio
Open Visual Studio
Create a console application project and name it SentimentAnalysisTensorFlow (you can use another name, but it is easier to follow along with this one)
Add the following packages using the NuGet Package Manager:
- Microsoft.ML
- Microsoft.ML.TensorFlow
Create a folder in the project and name it Data
Copy the contents of the sentiment_model folder into the Data folder
Then set the 'Copy to Output Directory' property of every copied file (including the two files inside variables) to 'Copy if newer'
3. Add the Global Variables
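Since the model expects an array of length 600, define an integer constant FeatureLength and set it to 600.
Also define the path to the folder that holds the model files.
The code to do this is shown below. Place it inside the Program class, before the Main method. The code in this tutorial also needs the following using directives at the top of Program.cs: System, System.Collections.Generic, System.IO, Microsoft.ML, Microsoft.ML.Data and Microsoft.ML.Transforms.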
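// Length of the fixed-size integer array the model expects
public const int FeatureLength = 600;

// Folder holding saved_model.pb, the variables folder and imdb_word_index.csv.
// This is the Data folder from step 2, which is copied to the output directory.
static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data");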
4. How the Model Works
A movie review is a piece of free text written by a user.
So the first step is to split the review into an array of words.
Next, we use the mapping file to map the array of words to a variable-length array of integers.
Then, we resize this array to a fixed-size array of 600 integers, padding short reviews with zeros and truncating long ones.
Finally, we feed this array into the model and obtain the prediction output.
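The first three steps can be sketched in plain C#, outside ML.NET, to make the data flow concrete. This is a simplified illustration, not what the pipeline in step 8 runs internally: the ReviewEncoder name and the tiny vocabulary are hypothetical, and mapping unknown words to 0 is an assumption made only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReviewEncoder
{
    public const int FeatureLength = 600;

    public static int[] Encode(string review, IReadOnlyDictionary<string, int> wordIndex)
    {
        // Step 1: split the review into words on spaces
        string[] words = review.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Step 2: map each word to its integer ID
        // (assumption for this sketch: words missing from the index become 0)
        int[] features = words
            .Select(w => wordIndex.TryGetValue(w, out int id) ? id : 0)
            .ToArray();

        // Step 3: resize to FeatureLength (zero-pads short arrays, truncates long ones)
        Array.Resize(ref features, FeatureLength);
        return features;
    }

    public static void Main()
    {
        // Hypothetical vocabulary, not the real imdb_word_index.csv values
        var wordIndex = new Dictionary<string, int> { ["this"] = 11, ["is"] = 6, ["great"] = 84 };
        int[] features = Encode("this is great", wordIndex);
        Console.WriteLine(string.Join(",", features.Take(5))); // prints 11,6,84,0,0
        Console.WriteLine(features.Length);                    // prints 600
    }
}
```

Step 4, scoring the 600-integer array, is the part the TensorFlow model performs.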
5. Create the Classes
Based on the above, we need to create four classes (the array of words itself does not need a class, because the pipeline keeps it in a column named TokenizedWords):
- a class to hold the input sentence (review): MovieReview
- a class to hold the variable-length array of integers: VariableLength
- a class to hold the fixed-length array of integers: FixedLength
- a class to hold the prediction (output): SentimentPrediction
These classes are given below:
// This class holds the original sentiment data
class MovieReview
{
    public string ReviewText { get; set; }
}
Next class
// This class holds the variable-length feature,
// i.e. the review text mapped to an array of integers
class VariableLength
{
    [VectorType]
    public int[] VariableLengthFeatures { get; set; }
}
Next class
// This class holds the fixed-length feature that is passed to the model
class FixedLength
{
    // The const Program.FeatureLength is implicitly static, so it can be used in an attribute
    [VectorType(Program.FeatureLength)]
    public int[] Features { get; set; }
}
Next class:
// Prediction output by the model
class SentimentPrediction
{
    [VectorType(2)]
    public float[] Prediction { get; set; }
}
6. Create the MLContext, Lookup Dictionary and Resize Action
Now create the MLContext in the Main method. The MLContext is the starting point for all ML.NET operations: it provides the catalogs for loading data, building transforms and loading models. Create it using the line below:
MLContext mlContext = new MLContext();
Next, load the lookup map that maps each word to its integer ID, using the dictionary provided in the imdb_word_index.csv file:
// We now create a dictionary and load the mapping data
var lookupMap = mlContext.Data.LoadFromTextFile(Path.Combine(_modelPath, "imdb_word_index.csv"),
    columns: new[]
    {
        new TextLoader.Column("Words", DataKind.String, 0),
        new TextLoader.Column("Ids", DataKind.Int32, 1),
    },
    separatorChar: ','
);
Next, write the action that resizes the variable-length array to a fixed-length array:
// Resize the variable-length integer array to the fixed size of 600
Action<VariableLength, FixedLength> ResizeFeaturesAction = (s, f) =>
{
    var features = s.VariableLengthFeatures;
    Array.Resize(ref features, FeatureLength);
    f.Features = features;
};
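Array.Resize behaves differently for short and long inputs, which is worth seeing once. The sketch below mirrors the body of ResizeFeaturesAction with the target length as a parameter (ResizeDemo and ToFixedLength are hypothetical names) and uses a length of 5 instead of 600 so the output is easy to read.

```csharp
using System;

public static class ResizeDemo
{
    // Same logic as ResizeFeaturesAction, with the target length as a parameter
    public static int[] ToFixedLength(int[] variableLength, int featureLength)
    {
        int[] features = variableLength;           // copies the reference, not the data
        Array.Resize(ref features, featureLength); // allocates a new, zero-padded or truncated array
        return features;
    }

    public static void Main()
    {
        int[] shortReview = { 5, 8, 13 };
        int[] longReview = { 1, 2, 3, 4, 5, 6, 7 };

        Console.WriteLine(string.Join(",", ToFixedLength(shortReview, 5))); // prints 5,8,13,0,0
        Console.WriteLine(string.Join(",", ToFixedLength(longReview, 5)));  // prints 1,2,3,4,5
        Console.WriteLine(shortReview.Length);                              // prints 3
    }
}
```

The last line shows that the input array is left unchanged: Array.Resize only redirects the local variable to a new array. Note also that words beyond the 600th are silently dropped.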
7. Load the TensorFlow Model
We now load the TensorFlow model using the LoadTensorFlowModel() method of mlContext.Model:
// Next we load the pre-trained TensorFlow model.
// Before you can do this, you need to add the Microsoft.ML.TensorFlow package using the NuGet Package Manager
TensorFlowModel tensorFlowModel = mlContext.Model.LoadTensorFlowModel(_modelPath);
Let's display the input and output schema of the model we loaded:
// Next we extract the input and output schema from the TensorFlow model and display them
/* The input schema is the fixed-length array of integer-encoded words.
 * The output schema is a float array of probabilities
 * indicating whether a review's sentiment is negative or positive. */
DataViewSchema schema = tensorFlowModel.GetModelSchema();
Console.WriteLine(" =============== TensorFlow Model Schema =============== ");

var featuresType = (VectorDataViewType)schema["Features"].Type;
Console.WriteLine($"Name: Features, Type: {featuresType.ItemType.RawType}, Size: ({featuresType.Dimensions[0]})");

var predictionType = (VectorDataViewType)schema["Prediction/Softmax"].Type;
Console.WriteLine($"Name: Prediction/Softmax, Type: {predictionType.ItemType.RawType}, Size: ({predictionType.Dimensions[0]})");

Console.WriteLine("Press Enter to continue...");
Console.ReadLine();
8. Create a Learning Pipeline
Now create the pipeline. The comments in the code explain each part:
// We now create a learning pipeline (you already know this)
IEstimator<ITransformer> pipeline =
    // Split the text into words using spaces
    mlContext.Transforms.Text.TokenizeIntoWords("TokenizedWords", "ReviewText")
    // Map the words to their corresponding integer values
    .Append(mlContext.Transforms.Conversion.MapValue("VariableLengthFeatures", lookupMap,
        lookupMap.Schema["Words"], lookupMap.Schema["Ids"], "TokenizedWords"))
    // Resize the variable-length encoding to a fixed length
    .Append(mlContext.Transforms.CustomMapping(ResizeFeaturesAction, "Resize"))
    // Classify the input with the TensorFlow model
    .Append(tensorFlowModel.ScoreTensorFlowModel("Prediction/Softmax", "Features"))
    // Copy the model output into a new "Prediction" column
    .Append(mlContext.Transforms.CopyColumns("Prediction", "Prediction/Softmax"));
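Next, fit the pipeline to create an ML.NET model. Because the TensorFlow model is already trained, we do not fit it to any data; an empty data view is passed only to satisfy the requirements of the Fit method: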
// We now create a new ML.NET model from the pipeline
// (remember that the original model came from TensorFlow)
IDataView dataView = mlContext.Data.LoadFromEnumerable(new List<MovieReview>());
ITransformer model = pipeline.Fit(dataView);

// Call the PredictSentiment method to make a prediction
PredictSentiment(mlContext, model);
9. Make a Prediction and Display the Output
Finally, we use the model to make a prediction and display the result. Add the following method to the Program class:
// We are almost there.
// We use the model to make a prediction in a separate method
public static void PredictSentiment(MLContext mlContext, ITransformer model)
{
    // Create a prediction engine
    var engine = mlContext.Model.CreatePredictionEngine<MovieReview, SentimentPrediction>(model);

    var review = new MovieReview()
    {
        ReviewText = "This is an interesting movie!"
    };

    var sentimentPrediction = engine.Predict(review);

    Console.WriteLine("Number of classes: {0}", sentimentPrediction.Prediction.Length);
    // Prediction[1] is the probability that the review is positive
    Console.WriteLine("Is sentiment/review positive? {0}", sentimentPrediction.Prediction[1] > 0.5 ? "Yes." : "No.");
    Console.WriteLine("Press Enter to exit...");
    Console.ReadLine();
}
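The check Prediction[1] > 0.5 works because the two softmax outputs sum to 1, so it picks the same class as choosing the larger probability (except for an exact 0.5 tie). The sketch below shows both rules on plain float arrays. It assumes index 0 is negative and index 1 is positive, as in PredictSentiment; the SentimentReader name and the sample probabilities are hypothetical.

```csharp
using System;

public static class SentimentReader
{
    // Assumption matching PredictSentiment: index 0 = negative, index 1 = positive
    public const int PositiveIndex = 1;

    // Threshold rule used in the tutorial
    public static bool IsPositive(float[] prediction, float threshold = 0.5f)
    {
        if (prediction == null || prediction.Length != 2)
            throw new ArgumentException("Expected two class probabilities.", nameof(prediction));
        return prediction[PositiveIndex] > threshold;
    }

    // Equivalent for a two-class softmax: pick the index with the larger probability
    public static int ArgMax(float[] prediction)
    {
        int best = 0;
        for (int i = 1; i < prediction.Length; i++)
            if (prediction[i] > prediction[best]) best = i;
        return best;
    }

    public static void Main()
    {
        // Hypothetical model outputs, for illustration only
        float[] likelyPositive = { 0.2f, 0.8f };
        float[] likelyNegative = { 0.9f, 0.1f };
        Console.WriteLine(IsPositive(likelyPositive));              // prints True
        Console.WriteLine(IsPositive(likelyNegative));              // prints False
        Console.WriteLine(ArgMax(likelyPositive) == PositiveIndex); // prints True
    }
}
```

Raising the threshold above 0.5 is one way to label a review positive only when the model is more confident.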
We are done!
Just run the program and you will see the output shown below. If it ran successfully, congratulations!