NFastText
PM> Install-Package NFastText
Description
NFastText is a .NET port of FastText, Facebook's text classification and word vectorization tool.
Data
The samples use data files that can be prepared by this script.
Classification
This example demonstrates how to train, test and use a text classifier.
Train
As training input, the library expects a text file in which every line is a single document. Every line must contain one or more labels. Labels are words that start with a specified prefix.
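The input format described above can be illustrated with a short, library-independent sketch in Python. It does not call NFastText. The prefix `__label__` is the default used by the original FastText tool and is only an assumption here, since the prefix is configurable, and `split_labels` is a hypothetical helper written for this illustration.

```python
# Assumption: "__label__" is the default prefix of the original FastText tool.
# The prefix actually used for training is whatever you specify.
LABEL_PREFIX = "__label__"


def split_labels(line, prefix=LABEL_PREFIX):
    """Split one training line into (labels, words).

    Hypothetical helper for illustration only, not part of NFastText.
    """
    tokens = line.split()
    # Tokens that start with the prefix are labels; strip the prefix from them.
    labels = [t[len(prefix):] for t in tokens if t.startswith(prefix)]
    # Everything else is ordinary document text.
    words = [t for t in tokens if not t.startswith(prefix)]
    return labels, words


# Every line is one document and carries one or more labels.
sample_lines = [
    "__label__sports the team won the final match",
    "__label__politics __label__economy parliament approved the new budget",
]

parsed = [split_labels(line) for line in sample_lines]
```

The second sample line shows a document with two labels, which the format allows.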
Now we are ready to train. You can use the helpers in the FileReader module to pass files and streams as input.
Test
Test expects a sequence of lines, where each line is represented as an array of words. You can use the helper streamToLines to produce it.
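The shape "a sequence of lines, each an array of words" can be sketched in Python as follows. This is not the library's streamToLines helper. `lines_of_words` is a hypothetical stand-in, and skipping blank lines is a choice made for this sketch, not documented library behaviour.

```python
import io


def lines_of_words(stream):
    """Lazily yield each non-empty line of a text stream as a list of words.

    Hypothetical illustration of the expected input shape, not NFastText code.
    """
    for raw in stream:
        words = raw.split()  # split on any run of whitespace
        if words:            # assumption: blank lines are skipped
            yield words


# An in-memory stream stands in for a test data file.
stream = io.StringIO(
    "__label__sports the team won\n"
    "\n"
    "__label__politics parliament voted\n"
)

lines = list(lines_of_words(stream))
```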
Predict
Predict expects a sequence of lines, where each line is represented as an array of words. It returns a sequence of lists, where every list contains the k best predictions (labels) with their weights.
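To make the result shape concrete, here is a minimal Python sketch of "k best labels with weights" per input line. The scores are made-up numbers rather than output of a trained model, and `k_best` is a hypothetical helper, not the library's predict function.

```python
import heapq


def k_best(scores, k):
    """Return the k highest-weighted (label, weight) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])


# Made-up per-line label weights, standing in for a classifier's output.
scores_per_line = [
    {"sports": 0.90, "politics": 0.07, "economy": 0.03},
    {"sports": 0.10, "politics": 0.55, "economy": 0.35},
]

k = 2
# One list of (label, weight) pairs per input line.
predictions = [k_best(scores, k) for scores in scores_per_line]

# Best label for the first line: first line, first prediction, label part.
best_label = predictions[0][0][0]
```

With this shape, the top label for a line is the first element of the first pair in that line's list.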
Vectorization
Train
Training works almost the same as for classification, but the training files do not need to contain line endings.
Vectorization
Expects a sequence of words as input and returns a sequence of tuples, each pairing a word with its associated vector.
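The input and output shapes can be sketched in Python with a toy lookup table. The vectors below are invented three-dimensional values, not trained embeddings, and `words_with_vectors` is a hypothetical helper that only mimics the result shape.

```python
# Invented toy vectors; a trained model would supply real ones.
toy_vectors = {
    "king": [0.5, 0.1, 0.3],
    "queen": [0.4, 0.2, 0.3],
}


def words_with_vectors(words, table):
    """Yield a (word, vector) tuple for every input word found in the table."""
    for word in words:
        if word in table:
            yield word, table[word]


result = list(words_with_vectors(["king", "queen"], toy_vectors))
```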
Common tasks
You can save and load trained models.
More info
The original FastText project contains papers about how it works.
The API Reference contains automatically generated documentation for all types, modules and functions in the library. It also includes brief samples for most of the functions.
Contributing and copyright
The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you're adding a new public API, please also consider adding samples that can be turned into documentation. You might also want to read the library design notes to understand how it works.
The library is available under the Public Domain license, which allows modification and redistribution for both commercial and non-commercial purposes. For more information, see the License file in the GitHub repository.