In Part 2, I will show you how to create your first experiment. An experiment is simply something you are working on in Azure Machine Learning Studio. We will build an experiment that predicts the prices of vehicles. The dataset we will use is a sample dataset provided as part of the Studio.
If you have not opened ML Studio yet, learn how to get there in Part 1.
Follow the steps below:
- Click on Experiments in the left panel
- Click on New in the lower left corner of the screen
- Select Blank Experiment (the first item under Microsoft Samples). The screen is shown below
- A new experiment is created with a default name as you can see
- Change the name to Vehicle Price Prediction
- In the left-hand panel you will see a number of items under Experiment Items
- Expand Saved Datasets, then click on Samples. You will see many sample datasets
- Drag the Automobile price data (Raw) dataset to the first block in the canvas
- To view the data, right-click on the item > Dataset > Visualize. The dataset is displayed as a grid of rows and columns
- Scroll all the way to the far-right column, price. This is the value we will predict, using the other columns as features
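If you think better in code, here is a rough sketch of that last idea in plain Python: one column is the value to predict and the rest are features. This runs outside ML Studio and is only an illustration. The three rows are made-up stand-ins for the sample dataset, and the helper names `split_features_and_target` and `count_missing` are my own, not part of any Studio API.

```python
# Hypothetical rows standing in for the sample dataset; the values are made up.
rows = [
    {"make": "audi", "horsepower": 102, "price": 13950},
    {"make": "bmw", "horsepower": 101, "price": 16430},
    {"make": "toyota", "horsepower": None, "price": 5348},
]


def split_features_and_target(rows, target="price"):
    """Separate the column to predict from the columns used to predict it."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


def count_missing(rows):
    """Count missing (None) values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


features, targets = split_features_and_target(rows)
missing = count_missing(rows)
print(targets)
print(missing)
```

Counting the missing values per column is a useful habit at this stage, because it tells you which columns will need attention before training.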
Preparing the Dataset
Dataset preparation is sometimes called preprocessing. It may include removing or filling in missing data, deleting erroneous records, converting strings to numeric values, and so on.
In this case, we will first remove the column called normalized-losses. Then we will remove the rows that contain missing data.
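To make these two steps concrete, here is a minimal sketch of the same logic in plain Python. It is not what ML Studio runs internally, just an illustration under my own assumptions: the rows are made up, missing values are represented as `None`, and the helper names `exclude_column` and `remove_rows_with_missing` are hypothetical.

```python
def exclude_column(rows, column):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def remove_rows_with_missing(rows):
    """Keep only the rows in which every value is present."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical rows; None marks a missing value.
raw = [
    {"normalized-losses": None, "make": "alfa-romero", "horsepower": 111, "price": 13495},
    {"normalized-losses": 164, "make": "audi", "horsepower": 102, "price": 13950},
    {"normalized-losses": None, "make": "bmw", "horsepower": None, "price": 16500},
]

# Step 1: drop the column. Step 2: drop rows that still have gaps.
without_losses = exclude_column(raw, "normalized-losses")
clean = remove_rows_with_missing(without_losses)
print(len(raw), "rows before,", len(clean), "rows after")
```

The order matters. If we removed incomplete rows first, every row with a missing normalized-losses value would be thrown away, even when the rest of the row is fine. Excluding the column first keeps those rows.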
- On the left side, look for Data Transformation. Expand it. Click on Manipulation. Drag the Select Columns in Dataset item to the canvas, below the Automobile price data. This item is used to select which columns to include in or exclude from the model
- Connect the output of the Automobile price data to the input of Select Columns in Dataset
- Click on Select Columns in Dataset, then click on Launch column selector in the Properties panel (right side)
- Click on With rules on the left
- Then under Begin With, click on All columns
- From the dropdowns, select Exclude and column names
- Then in the text box, enter or select normalized-losses
- Click on Ok
Next, we will exclude records with missing data
- Drag the Clean Missing Data item to the experiment
- Connect the output of Select Columns in Dataset to its input
- In the properties panel, click on Remove entire row (this is under Cleaning mode)
- You can also put descriptions on the modules by double-clicking them
- Now run the experiment (click on Run at the bottom of the page). After a few seconds, if all goes well, you will have your experiment as below
If you got to this point, then congrats!
Now that we have completed the preprocessing steps, our data is clean. The next step is to define the features and the classes.
In Part 3, we will continue with defining our features.