NFastText


The NFastText library can be installed from NuGet:
PM> Install-Package NFastText

Description

NFastText is a .NET port of fastText, Facebook's text classification and word vectorization tool.

Data

The samples use data files that can be prepared with this script.

Classification

This example demonstrates how to train, test and use a text classifier.

Train

For training, the library expects a text file in which every line is a single document. Every line must contain one or more labels. Labels are words that start with a specified prefix.

__label__spam __label__nsfw enlarge your .... xxx
__label__not_a_spam __label__conference Fsharp conference next year
...
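
To make the line format concrete, here is a minimal sketch that separates the labels from the words of one training line. The helper splitLabels is hypothetical and not part of NFastText; it only illustrates the prefix convention described above.

```fsharp
// Hypothetical helper, not part of NFastText: split one training line
// into its labels and its remaining words, using the label prefix.
let splitLabels (prefix: string) (line: string) =
    line.Split([| ' ' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.partition (fun token ->
        token.StartsWith(prefix, System.StringComparison.Ordinal))

let labels, words =
    splitLabels "__label__" "__label__not_a_spam __label__conference Fsharp conference next year"
// labels = [| "__label__not_a_spam"; "__label__conference" |]
// words  = [| "Fsharp"; "conference"; "next"; "year" |]
```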

Now we are ready to train. You can use the helpers in the FileReader module to read input from files and streams.

#r "NFastText.dll"
open NFastText
open NFastText.FileReader
let trainData = Input.FilePath("./data/dbpedia.train")
// Train the classifier on 4 threads with the default model args, word n-grams of size 2,
// the label prefix "__label__", verbose output, and no pretrained vectors.
let state = Classifier.train(trainData, 4, Classifier.args, 2uy, "__label__", true, None)

Test

Test expects a sequence of lines, where each line is represented as an array of words. You can use the streamToLines helper to produce it.

let testData = Input.FilePath("./data/dbpedia.test")
let r = Classifier.test(state, 1, FileReader.streamToLines testData)
printfn "%A" r
// precision is a fraction between 0 and 1
assert(r.precision >= 0.98f)

Predict

Predict expects a sequence of lines, where each line is represented as an array of words. It returns a sequence of lists, one per line, where every list contains the k best predictions (labels) with their weights.

let testData = Input.FilePath("./data/dbpedia.test")
let k = 1
let predictions = Classifier.predict(state, k, FileReader.streamToLines testData)
// Take the label of the best prediction for the first line
let bestLabel =
    predictions
    |> Seq.head
    |> List.head
    |> fst
assert(bestLabel = "__label__9")
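
The result shape described above can be illustrated without a trained model. The following sketch uses hand-written data in place of the output of Classifier.predict: the labels and weights are made up, and both the float32 weight type and the best-first ordering of each list are assumptions.

```fsharp
// Hypothetical stand-in for the result of Classifier.predict with k = 2:
// one list per input line, each holding (label, weight) pairs.
// The values, the float32 weight type and the best-first order are assumptions.
let samplePredictions : seq<(string * float32) list> =
    seq {
        yield [ ("__label__9", 0.91f); ("__label__3", 0.05f) ]
        yield [ ("__label__1", 0.77f); ("__label__9", 0.12f) ]
    }

// Keep only the first (assumed best) label for every line.
let topLabels =
    samplePredictions
    |> Seq.map (fun candidates -> candidates |> List.head |> fst)
    |> Seq.toList
// topLabels = [ "__label__9"; "__label__1" ]
```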

Vectorization

Train

Training works almost the same as for classification, but the training file does not need to contain line endings.

let trainData = Input.FilePath("./data/text9")
let skipgram = Vectorizer.train(trainData, 4, Vectorizer.args, Args.VecModel.sg, 3uy, 6uy, true)

Vectorization

Vectorization expects a sequence of words as input and returns a sequence of tuples, each pairing a word with its associated vector.

let words = Input.FilePath("./data/queries.txt") |> FileReader.streamToWords
let wordsWithVectors = Vectorizer.getWordVectors(skipgram, words)

Common tasks

You can save and load trained models:

FastTextM.saveState "path" state
let state = FastTextM.loadState "path"

More info

  • The original fastText project contains papers that describe how it works.

  • API Reference contains automatically generated documentation for all types, modules and functions in the library. This includes additional brief samples on using most of the functions.

Contributing and copyright

The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you're adding a new public API, please also consider adding samples that can be turned into documentation. You might also want to read the library design notes to understand how it works.

The library is available under the Public Domain license, which allows modification and redistribution for both commercial and non-commercial purposes. For more information, see the License file in the GitHub repository.
