NFastText

The NFastText library can be installed from NuGet:

PM> Install-Package NFastText

Description

NFastText is a .NET port of fastText, Facebook's text classification and word vectorization tool.

Data

The samples use data files that can be prepared with this script.

Classification

This example demonstrates how to train, test and use a text classifier.

Train

For training, the library expects a text file in which every line is a single document. Every line must contain one or more labels. Labels are words that start with a specified prefix.

__label__spam __label__nsfw enlarge your .... xxx
__label__not_a_spam __label__conference Fsharp conference next year
...
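
A file in this format can be produced with ordinary .NET I/O. The following is a minimal sketch, not part of NFastText: the helper name toTrainingLine and the output path are made up for illustration, and the label prefix is assumed to be "__label__" as in the sample above.

```fsharp
open System.IO

// Hypothetical helper (not part of NFastText): prefixes each label and
// puts the labels in front of the document text, on a single line.
let toTrainingLine (prefix: string) (labels: string list) (text: string) =
    let labelPart = labels |> List.map (fun l -> prefix + l) |> String.concat " "
    // A document must stay on one line, so line breaks inside it become spaces.
    let singleLine = text.Replace("\r", " ").Replace("\n", " ")
    labelPart + " " + singleLine

let docs =
    [ (["spam"; "nsfw"], "enlarge your .... xxx")
      (["not_a_spam"; "conference"], "Fsharp conference next year") ]

let lines =
    docs |> List.map (fun (labels, text) -> toTrainingLine "__label__" labels text)

// Illustrative path only; point it at wherever the training data should live.
let outPath = Path.Combine(Path.GetTempPath(), "sample.train")
File.WriteAllLines(outPath, Array.ofList lines)
```

The resulting file can then be passed to training in the same way as the data files used in the samples.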

Now we are ready to train. The FileReader module provides helpers for using files and streams as input.

#r "NFastText.dll"
open NFastText
open NFastText.FileReader
let trainData = Input.FilePath("./data/dbpedia.train")
// Train the classifier on 4 threads, using word n-grams of length 2 and the default model args, with verbose output, without pretrained vectors, and with the label prefix "__label__".
let state = Classifier.train(trainData, 4, Classifier.args, 2uy, "__label__", true, None)

Test

Test expects a sequence of lines, where each line is represented as an array of words. You can use the streamToLines helper to produce it.

let testData = Input.FilePath("./data/dbpedia.test")
let r = Classifier.test(state, 1, FileReader.streamToLines testData)
printfn "%A" r
assert(r.precision >= 0.98f)
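
To make the expected input shape concrete, here is a sketch that builds a sequence of word arrays from in-memory strings. It only illustrates the shape (a seq of string arrays). The function name linesToWords is hypothetical, and it is an assumption that splitting on whitespace resembles what streamToLines does.

```fsharp
open System

// Hypothetical stand-in, for illustration only: splits each line on spaces
// and tabs. The actual tokenization in FileReader.streamToLines may differ.
let linesToWords (lines: seq<string>) : seq<string[]> =
    lines
    |> Seq.map (fun line ->
        line.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries))

let sample =
    [ "__label__spam enlarge your xxx"
      "__label__not_a_spam Fsharp  conference next year" ]
    |> linesToWords
    |> Seq.toList
```

Each element of sample is one document as an array of words, with the label words still present at the front.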

Predict

Predict expects a sequence of lines, where each line is represented as an array of words. It returns a sequence of lists, where every list contains the k best predictions (labels) with their weights.

let testData = Input.FilePath("./data/dbpedia.test")
let k = 1
let r = Classifier.predict(state, k, FileReader.streamToLines testData)
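let label =
    r
    |> Seq.head   // predictions for the first line
    |> List.head  // first of the k predictions
    |> fst        // keep the label, drop the weight
assert(label = "__label__9")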
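
The result shape described above can be post-processed with plain sequence functions. The sketch below uses mocked data rather than a real model. The weight type (float32), the helper name bestLabels and the sample values are assumptions made for illustration.

```fsharp
// Mocked result with the shape described above: one list per input line,
// each holding (label, weight) pairs. The float32 weight type is an assumption.
let mockPredictions : seq<(string * float32) list> =
    seq {
        yield [ ("__label__9", 0.91f); ("__label__3", 0.05f) ]
        yield [ ("__label__2", 0.77f); ("__label__9", 0.10f) ]
    }

// Hypothetical helper: takes the highest-weighted label of each line and
// strips the label prefix. Throws if a line has no predictions at all.
let bestLabels (prefix: string) (predictions: seq<(string * float32) list>) =
    predictions
    |> Seq.map (fun perLine ->
        let label, _ = perLine |> List.maxBy snd
        if label.StartsWith(prefix) then label.Substring(prefix.Length) else label)
    |> Seq.toList

let labels = bestLabels "__label__" mockPredictions
```

Using List.maxBy instead of List.head avoids relying on the predictions being ordered by weight, which the text above does not state.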

Vectorization

Train

Training works almost the same as for classification, but the training file does not need to contain line endings.

let trainData = Input.FilePath("./data/text9")
let skipgram = Vectorizer.train(trainData, 4, Vectorizer.args, Args.VecModel.sg, 3uy, 6uy, true)

Vectorization

It expects a sequence of words as input and returns a sequence of tuples, each pairing a word with its associated vector.

let words = Input.FilePath("./data/queries.txt") |> FileReader.streamToWords
let wordsWithVectors = Vectorizer.getWordVectors(skipgram, words)
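
One common use of such (word, vector) pairs is comparing words by cosine similarity. The following sketch works on mocked pairs. It assumes the vectors are float32 arrays, which this page does not confirm, and the names cosine and mocked are illustrative.

```fsharp
// Cosine similarity between two vectors of equal length.
let cosine (a: float32[]) (b: float32[]) =
    let dot = Array.fold2 (fun acc x y -> acc + x * y) 0.0f a b
    let norm (v: float32[]) = sqrt (Array.sumBy (fun x -> x * x) v)
    dot / (norm a * norm b)

// Mocked (word, vector) pairs standing in for real output; the values are
// made up and the float32[] element type is an assumption.
let mocked =
    [ ("king", [| 1.0f; 0.0f |])
      ("queen", [| 1.0f; 0.0f |])
      ("apple", [| 0.0f; 1.0f |]) ]

let lookup = dict mocked
let simSame = cosine lookup.["king"] lookup.["queen"]
let simOrth = cosine lookup.["king"] lookup.["apple"]
```

Identical directions give a similarity of 1 and orthogonal vectors give 0. A zero vector would make the division undefined, so real code should guard against it.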

Common tasks

You can save and load trained models:

FastTextM.saveState "path" state
let state = FastTextM.loadState "path"

More info

The original FastText project links to papers that explain how it works.
The API Reference contains automatically generated documentation for all types, modules and functions in the library. This includes additional brief samples on using most of the functions.

The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you're adding a new public API, please also consider adding samples that can be turned into documentation. You might also want to read the library design notes to understand how it works.

The library is available under a Public Domain license, which allows modification and redistribution for both commercial and non-commercial purposes. For more information, see the License file in the GitHub repository.
