AI & ML interests
Natural language processing, pharmaceuticals and social sciences research
We help organisations extract value from unstructured data through professional data science consulting. We respond rapidly to new challenges, and you will always speak directly to a professional data scientist rather than a non-technical sales representative.
Read more on our website or on our GitHub Pages site.
Why would I need NLP consulting?
Has your boss ever asked you to summarise in Excel how much each of your customers spends? If the data is already in a spreadsheet, the task is easy.
But what if your company processes fixes to new-build houses? Every homeowner sends in a form describing the problems (electrical fault, damp, damaged plasterboard) in plain text, and the forms arrive as PDF files. Your boss asks you to find the most common construction faults. How could you do this in Excel? Where would you even start?
With NLP consulting we can analyse text data and produce predictions, for example how likely an issue is to escalate.
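As a rough illustration of where one might start with the homeowner forms described above, the sketch below counts how many reports mention each fault category. It is a deliberately simple keyword approach, not a description of how any of our projects work: the `FAULT_KEYWORDS` mapping, the `count_faults` function and the example reports are all hypothetical, and the text is assumed to have already been extracted from the PDFs.

```python
import re
from collections import Counter

# Hypothetical fault categories and trigger words (illustrative only).
FAULT_KEYWORDS = {
    "electrical fault": ["electrical", "socket", "wiring", "fuse"],
    "damp": ["damp", "mould", "condensation"],
    "damaged plasterboard": ["plasterboard", "plaster"],
}


def count_faults(reports):
    """Count how many reports mention each fault category at least once."""
    counts = Counter()
    for report in reports:
        # Lowercase and split into alphabetic tokens for whole-word matching.
        tokens = set(re.findall(r"[a-z]+", report.lower()))
        for fault, keywords in FAULT_KEYWORDS.items():
            if any(keyword in tokens for keyword in keywords):
                counts[fault] += 1
    return counts


# Made-up report texts, standing in for text extracted from PDF forms.
reports = [
    "There is damp in the bedroom and the plasterboard is cracked.",
    "The kitchen socket sparks when used.",
    "Mould and condensation around the bathroom window.",
]

fault_counts = count_faults(reports)
for fault, n in fault_counts.most_common():
    print(f"{fault}: {n}")
```

A keyword list like this misses misspellings, synonyms and negations ("no damp found"), which is where more capable NLP methods become useful.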
Open source projects (MIT licence)
The Harmony project (GitHub repo) - Harmony is a tool and research project using natural language processing to harmonise mental health data. Read more at https://harmonydata.org and try the demo at https://app.harmonydata.org/.
Clinical Trial Risk Tool (GitHub repo) - a tool using natural language processing to categorise clinical trial protocols (PDFs) into high, medium or low risk. Read more at https://clinicaltrialrisk.org/ and try the demo at https://app.clinicaltrialrisk.org/.
Other open source libraries
Localspelling (GitHub repo) - a Python library for localising spelling between US and UK variants. Install from the command line with
pip install localspelling
country_named_entity_recognition (GitHub repo) - a lightweight Python library for recognising country names in unstructured text and returning Pycountry objects. Install with
pip install country_named_entity_recognition
drug_named_entity_recognition (GitHub repo) - a lightweight Python library for recognising drug names in unstructured text. Install with
pip install drug-named-entity-recognition
Fast Stylometry (GitHub repo) - a Python library for forensic stylometry. Read the tutorial, or install with
pip install faststylometry
Blog
We regularly post on Fast Data Science's blog.
Contact
You can contact us on our website.