spaCy, Natural Language Processing library
spaCy (https://spacy.io/):
Industrial-Strength Natural Language Processing Python Library
It's commercial open-source software, released under the MIT license.
spaCy on GitHub: https://github.com/explosion/spaCy
- Features
* Non-destructive tokenization
* Named entity recognition
* Support for 28+ languages
* 13 statistical models for 8 languages
* Pre-trained word vectors
* Easy deep learning integration
* Part-of-speech tagging
* Labelled dependency parsing
* Syntax-driven sentence segmentation
* Built-in visualizers for syntax and NER
* Convenient string-to-hash mapping
* Export to numpy data arrays
* Efficient binary serialization
* Easy model packaging and deployment
* State-of-the-art speed
* Robust, rigorously evaluated accuracy
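Three of the features above (non-destructive tokenization, string-to-hash mapping and numpy export) can be tried without downloading any model, using a blank English pipeline created with `spacy.blank("en")` (available in spaCy v2 and later). This is a minimal sketch; the sample sentence is arbitrary.

```python
import spacy
from spacy.attrs import LOWER, ORTH

# Blank English pipeline: tokenizer and vocab only, no trained model needed
nlp = spacy.blank("en")
text = "Hello,  world! It's spaCy."
doc = nlp(text)

# Non-destructive tokenization: each token's text plus its trailing
# whitespace rebuilds the original string exactly (double space included)
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)

# String-to-hash mapping: the StringStore maps a string to a hash and back
word_hash = nlp.vocab.strings["spaCy"]
print(word_hash, nlp.vocab.strings[word_hash])

# Export to numpy: one row per token, one column per requested attribute
array = doc.to_array([ORTH, LOWER])
print(array.shape)
```

Two attributes are requested in `to_array` so the result is always a 2D array; spaCy returns a 1D array when only a single attribute is asked for.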
- How to use it?
1. Install spaCy from the command prompt ('cmd'), run as administrator:
pip install spacy
2. Download a language model ('en', about 37.4 MB):
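python -m spacy download en
(In spaCy v3 and later the 'en' shortcut no longer exists; download the model by its full name instead: python -m spacy download en_core_web_sm)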
* You can choose other models here: https://spacy.io/models/en
* To delete the model, go to '\site-packages\spacy\data'
(if you use Anaconda: C:\ProgramData\Anaconda3\envs\<envname>\Lib\site-packages\spacy\data)
and delete the folder named 'en'.
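To find the data folder mentioned above without browsing through site-packages, and to check whether the English model package is installed, a small Python sketch can help. `spacy_data_dir` and `model_installed` are hypothetical helper names; `spacy.util.is_package` is a real spaCy utility that reports whether a package of that name is installed.

```python
import os

import spacy


def spacy_data_dir() -> str:
    # spaCy v2 kept shortcut links such as 'en' under site-packages/spacy/data;
    # this only builds the path, the folder may not exist in newer versions
    return os.path.join(os.path.dirname(spacy.__file__), "data")


def model_installed(name: str = "en_core_web_sm") -> bool:
    # True if the named model is installed as a Python package
    return spacy.util.is_package(name)


print("spaCy version:", spacy.__version__)
print("data folder:", spacy_data_dir())
print("en_core_web_sm installed:", model_installed())
```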
3. Try it in Python:
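    import spacy

    # Load the small English model ('en' was a shortcut link removed in
    # spaCy v3; the full package name works in both v2 and v3)
    nlp = spacy.load('en_core_web_sm')
    doc = nlp('This is a sentence.')
    print(doc.text)

    # Print each token with its part-of-speech tag and dependency label
    for token in doc:
        print(token.text, token.pos_, token.dep_)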
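A slightly extended sketch of the same test, assuming the model is named `en_core_web_sm` (spaCy v3 naming): it also lists named entities, and falls back to `spacy.blank("en")` if the model has not been downloaded. In that fallback only tokenization works, so the POS, dependency and entity fields stay empty. `load_pipeline`, `token_table` and `entity_list` are hypothetical helper names.

```python
import spacy


def load_pipeline(name: str = "en_core_web_sm"):
    # Try the trained English model; spacy.load raises OSError if it is missing,
    # in which case fall back to a blank, tokenizer-only English pipeline
    try:
        return spacy.load(name)
    except OSError:
        return spacy.blank("en")


def token_table(doc):
    # One (text, coarse POS tag, dependency label) tuple per token
    return [(token.text, token.pos_, token.dep_) for token in doc]


def entity_list(doc):
    # (entity text, entity label) pairs found by the NER component, if any
    return [(ent.text, ent.label_) for ent in doc.ents]


nlp = load_pipeline()
doc = nlp("This is a sentence.")
rows = token_table(doc)
for text, pos, dep in rows:
    print(f"{text:<10}{pos:<8}{dep}")
print(entity_list(doc))
```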
4. Try the TorchTextTutorial (which uses spaCy):
https://github.com/mjc92/TorchTextTutorial
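Text-processing pipelines such as TorchText are commonly given spaCy as a plain tokenizer function that turns a sentence into a list of token strings. A minimal sketch of such a function, assuming only a blank English pipeline; `tokenize_en` is a hypothetical name, and this is not code taken from the linked repository.

```python
from typing import List

import spacy

# Tokenizer-only English pipeline; no trained model download required
spacy_en = spacy.blank("en")


def tokenize_en(text: str) -> List[str]:
    # Run only the tokenizer and return the token strings
    return [token.text for token in spacy_en.tokenizer(text)]


print(tokenize_en("I love TorchText!"))
```

Calling only `spacy_en.tokenizer` keeps this fast, since no tagging or parsing is needed just to split text into tokens.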