Aug 05
2007

Document Tagger

DocTagger lets you automatically classify text documents. Use this as a starting point to write apps that can sort through volumes of unorganized data.

Try it!

Enter some text (300 words max) about any topic and hit Analyze to watch the tagger in action.


How it works

In short, the tagger works in five steps:

  1. POS-tag the document.
  2. Remove stopwords.
  3. Construct a synset map.
  4. Analyze hypernymy relations.
  5. Output the synsets with the highest score(s).
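
The steps above can be sketched roughly as follows. This is not DocTagger's actual code, which is not shown on this page. It is a minimal illustration under stated assumptions: the tiny `LEXICON` and `HYPERNYMS` tables are hypothetical stand-ins for a real POS tagger and WordNet, and the decay-based scoring in `score_hypernyms` is one plausible scheme, not necessarily the one DocTagger uses.

```python
import re
from collections import Counter

# Hypothetical stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "at", "is", "are", "to", "my"}

# Hypothetical mini-lexicon: word -> (coarse POS tag, synset name).
# Stands in for a real POS tagger plus a WordNet lookup.
LEXICON = {
    "dog": ("NOUN", "dog.n.01"),
    "cat": ("NOUN", "cat.n.01"),
    "car": ("NOUN", "car.n.01"),
    "truck": ("NOUN", "truck.n.01"),
}

# Hypothetical mini-taxonomy: synset -> its direct hypernym (parent).
HYPERNYMS = {
    "dog.n.01": "canine.n.01",
    "canine.n.01": "carnivore.n.01",
    "cat.n.01": "feline.n.01",
    "feline.n.01": "carnivore.n.01",
    "carnivore.n.01": "animal.n.01",
    "car.n.01": "vehicle.n.01",
    "truck.n.01": "vehicle.n.01",
}


def pos_tag(text):
    """Step 1: tokenize and attach a coarse POS tag ('X' if the word is unknown)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, LEXICON.get(tok, ("X", None))[0]) for tok in tokens]


def remove_stopwords(tagged):
    """Step 2: drop very common words that carry no topical signal."""
    return [(word, tag) for word, tag in tagged if word not in STOPWORDS]


def build_synset_map(tagged):
    """Step 3: count how often each noun synset occurs in the document."""
    counts = Counter()
    for word, tag in tagged:
        if tag == "NOUN":
            counts[LEXICON[word][1]] += 1
    return counts


def score_hypernyms(synset_counts, decay=0.8):
    """Step 4: push each synset's count up its hypernym chain.

    Each step up the chain multiplies the weight by `decay`, so an ancestor
    shared by several words can outscore any single specific word.
    """
    scores = Counter()
    for synset, count in synset_counts.items():
        weight = float(count)
        node = synset
        while node is not None:
            scores[node] += weight
            weight *= decay
            node = HYPERNYMS.get(node)
    return scores


def tag_document(text, top_n=1):
    """Step 5: return the `top_n` highest-scoring synsets."""
    tagged = remove_stopwords(pos_tag(text))
    scores = score_hypernyms(build_synset_map(tagged))
    return [synset for synset, _ in scores.most_common(top_n)]


print(tag_document("The dog and the cat"))  # ['carnivore.n.01']
print(tag_document("A car and a truck"))    # ['vehicle.n.01']
```

In this sketch a document mentioning both "dog" and "cat" is tagged with their shared ancestor `carnivore.n.01` instead of either word, which is the kind of generalization that hypernymy analysis makes possible.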

To learn more, you can read my presentation titled Text Classification using Wordnet.

Troubleshooting

If you encounter any issues or would like to give me feedback, email me at
pravinp -at- gmail -dot- com

One Response

#1

diego

February 1st, 2010 at 3:23 pm

Hi, where can I download the code for the document tagger, or PHP classes for Natural Language Processing?
thanks



This page and its contents are copyright © 2010, Pravin Paratey.