Shakerato

spaCy, Natural Language Processing library

Shakeratto 2018. 7. 1. 20:48

spaCy (https://spacy.io/): Industrial-Strength Natural Language Processing Python Library

It's commercial open-source software, released under the MIT license.


spaCy github: https://github.com/explosion/spaCy


- Features

Non-destructive tokenization

Named entity recognition

Support for 28+ languages

13 statistical models for 8 languages

Pre-trained word vectors

Easy deep learning integration

Part-of-speech tagging

Labelled dependency parsing

Syntax-driven sentence segmentation

Built-in visualizers for syntax and NER

Convenient string-to-hash mapping

Export to numpy data arrays

Efficient binary serialization

Easy model packaging and deployment

State-of-the-art speed

Robust, rigorously evaluated accuracy 
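A few of the features above can be tried without downloading any model. The snippet below is only a rough sketch: it assumes a spaCy version that provides spacy.blank (2.0 or later), and the sample text and variable names are made up for illustration. It touches non-destructive tokenization, the string-to-hash mapping, export to numpy arrays, and binary serialization.

```python
import spacy
from spacy.attrs import ORTH, IS_ALPHA
from spacy.tokens import Doc

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
text = "Hello,  world! spaCy keeps   whitespace."
doc = nlp(text)

# Non-destructive tokenization: joining every token with its trailing
# whitespace gives back the original string, extra spaces included.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)

# String-to-hash mapping: add() returns the hash, and the hash looks up the string.
coffee_hash = nlp.vocab.strings.add("coffee")
coffee_text = nlp.vocab.strings[coffee_hash]
print(coffee_hash, coffee_text)

# Export to numpy: one row per token, one column per requested attribute.
arr = doc.to_array([ORTH, IS_ALPHA])
print(arr.shape)

# Binary serialization: round-trip the Doc through bytes.
data = doc.to_bytes()
restored = Doc(nlp.vocab).from_bytes(data)
print(restored.text == doc.text)
```

Tagging, parsing and NER are not shown here because they need a trained model.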



- How to use it?


1. Install spaCy from 'cmd' (Command Prompt), run as administrator

pip install spacy


2. Download a language model ('en': 37.4 MB)

python -m spacy download en


* You can choose other models here: https://spacy.io/models/en

* If you want to delete the model,

  go to '\site-packages\spacy\data'

  (if you use Anaconda: 
  C:\ProgramData\Anaconda3\envs\<envname>\Lib\site-packages\spacy\data)

  and delete the folder named 'en'.
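To check that steps 1 and 2 worked, something like the following may help. It is a sketch under a few assumptions: the helper names are hypothetical, and the fallback name en_core_web_sm (the full package name of the small English model) is my addition, since newer spaCy releases may not accept the 'en' shortcut. The data folder is simply computed relative to the installed spacy package, matching the path mentioned above.

```python
import os
import spacy


def load_first_available(names=("en", "en_core_web_sm")):
    """Try each model name in order; return (name, nlp) or (None, None)."""
    for name in names:
        try:
            return name, spacy.load(name)
        except OSError:
            # spaCy raises OSError/IOError when a model cannot be found.
            continue
    return None, None


def spacy_data_dir():
    """Path of the 'data' folder inside the installed spacy package."""
    return os.path.join(os.path.dirname(spacy.__file__), "data")


print("spaCy version:", spacy.__version__)
name, nlp = load_first_available()
print("loaded model:", name)
print("data folder:", spacy_data_dir(), "exists:", os.path.isdir(spacy_data_dir()))
```

If no model name loads, the download in step 2 probably did not finish, or the model went into a different Python environment.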


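3. Try it (using Python)

import spacy

# Load the English model downloaded in step 2
nlp = spacy.load('en')

# Run the pipeline on a text to get a Doc object
doc = nlp(u'This is a sentence.') 

print(doc.text)

# Print each token with its part-of-speech tag and dependency label
for token in doc:
    print(token.text, token.pos_, token.dep_)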
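The same Doc object also carries named entities (the NER feature from the list above). Below is a small sketch along those lines. The helper functions and the sample sentence are only illustrative, and the fallback to en_core_web_sm or to a blank pipeline is an assumption so the script still runs when the 'en' shortcut is not installed. With the blank pipeline the tags, labels and entities simply come back empty.

```python
import spacy


def token_rows(doc):
    """(text, part-of-speech, dependency label) for every token."""
    return [(token.text, token.pos_, token.dep_) for token in doc]


def entity_rows(doc):
    """(text, entity label) for every named entity found in the doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


# Try the model names in order; fall back to a tokenizer-only pipeline.
nlp = None
for name in ("en", "en_core_web_sm"):
    try:
        nlp = spacy.load(name)
        break
    except OSError:
        continue
if nlp is None:
    nlp = spacy.blank("en")  # no tagger, parser or NER

doc = nlp(u"Apple is looking at buying U.K. startup for $1 billion")

for row in token_rows(doc):
    print(row)

for row in entity_rows(doc):
    print(row)
```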


4. Try TorchTextTutorial (using spaCy)

https://github.com/mjc92/TorchTextTutorial
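In torchtext-style examples, spaCy is usually plugged in as a plain tokenizer function that turns a string into a list of token strings. Whether the linked tutorial does exactly this is an assumption on my part. The sketch below shows only the spaCy side, with a blank English pipeline so no model download is needed, and the function name tokenize is hypothetical.

```python
import spacy

# Tokenizer-only English pipeline; a loaded model would work the same way.
nlp = spacy.blank("en")


def tokenize(text):
    """Split text into a list of token strings using spaCy's tokenizer."""
    return [token.text for token in nlp.tokenizer(text)]


print(tokenize(u"Hello world!"))
```

A function like this can then be handed to whatever expects a callable tokenizer.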

