DeepDetect uses a dedicated connector to train and predict from CSV data. At prediction time, raw CSV lines can also be passed directly, without the header and without reading them from a file.

For a comprehensive list of parameters to the CSV input connector, see the API Connectors section.

For a complete tutorial on how to train from CSV data, see here.
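
The calls below assume that a supervised service backed by the CSV input connector has already been created. The following is a minimal creation sketch, not a definitive recipe: the model repository path and the mllib settings (an "mlp" template with 7 classes and two hidden layers, borrowed from the Forest Cover Type example) are illustrative and should be adapted to your own data.

curl -X PUT "http://localhost:8080/services/covert" -d '{
       "mllib":"caffe",
       "description":"forest cover type classification service",
       "type":"supervised",
       "parameters":{
         "input":{
           "connector":"csv"
         },
         "mllib":{
           "template":"mlp",
           "nclasses":7,
           "layers":[150,150],
           "activation":"prelu"
         }
       },
       "model":{
         "repository":"models/covert"
       }
     }'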

Training

Training reads from CSV files and allows DeepDetect to apply some pre-processing without modifying the original data. The main features of the CSV input connector at training time are as follows:

  • the data field should hold the CSV training file, and when available, another CSV file containing the test data
  • the CSV files must include the CSV header
  • specifying the label column name is mandatory in training mode
  • at this stage only numerical data are supported; textual data are ignored
  • typical preprocessing includes scaling all data columns into [0,1], test splitting and shuffling of the data
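
For reference, the training CSV is expected to start with its header line, with the label held in a named column. A shortened, illustrative sample (columns borrowed from the Forest Cover Type dataset):

Id,Elevation,Aspect,Slope,Cover_Type
1,2596,51,3,5
2,2590,56,2,5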

Below is a typical call for training from CSV data:

curl -X POST "http://localhost:8080/train" -d '{
       "service":"covert",
       "async":true,
       "parameters":{
         "mllib":{
           "gpu":true,
           "solver":{
             "iterations":1000,
             "test_interval":100
           },
           "net":{
             "batch_size":512
           }
         },
         "input":{
           "label_offset":-1,
           "label":"Cover_Type",
           "id":"Id",
           "separator":",",
           "shuffle":true,
           "test_split":0.1,
           "scale":true
         },
         "output":{
           "measure":["acc","mcll","f1"]
         }
       },
       "data":["models/covert/train.csv"]
     }'

Note the relevant options:

  • data holds a single file that contains the training set
  • test_split, in combination with shuffle, turns a random 10% of the CSV training set into a testing set
  • label, id and separator are options to parse the CSV and flag the label and id columns
  • label_offset is useful when your labels do not originally range from 0 and beyond
  • scale tells the input connector to scale all data within [0,1] in order to get similar sensitivity across all dimensions. This usually helps the optimization procedure that underlies learning a neural net.
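
Since the call above is asynchronous ("async":true), the server returns a job identifier immediately, and the training status and measures can then be polled with a GET call on the /train resource. A sketch, assuming the returned job id is 1:

curl -X GET "http://localhost:8080/train?service=covert&job=1"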

See the Tutorial on training from CSV for a detailed application example.

Prediction

Prediction supports:

  • CSV file data
curl -X POST "http://localhost:8080/predict" -d '{
       "service":"covert",
       "parameters":{
         "input":{
           "id":"Id",
           "separator":",",
           "scale":true
         }
       },
       "data":["models/covert/test10.csv"]
     }'
  • passing the data directly as raw CSV lines, without the header
curl -X POST "http://localhost:8080/predict" -d '{
       "service":"covert",
       "parameters":{
         "input":{
           "connector":"csv",
           "scale":true,
           "min_vals":[1863,0,0,0,-146,0,0,99,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
           "max_vals":[3849,360,52,1343,554,6890,254,254,248,6993,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
         }
       },
       "data":["2499,0,9,150,55,1206,207,223,154,859,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"]
     }'
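
In both cases the server answers with a JSON document whose body.predictions array holds one entry per input row, each identified by a uri field (the value of the id column when one is provided), together with the predicted category and its probability. The response has roughly the following shape; all values here are purely illustrative:

{
  "status":{"code":200,"msg":"OK"},
  "head":{"method":"/predict","service":"covert","time":16.0},
  "body":{
    "predictions":[
      {"uri":"15121","loss":0.0,"classes":{"prob":0.93,"cat":"6"}}
    ]
  }
}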
