Dina Demner-Fushman¹,³ and Jimmy Lin²,³
¹Department of Computer Science
²College of Information Studies
³Institute for Advanced Computer Studies
University of Maryland, College Park, MD 20742, USA
demner@cs.umd.edu, jimmylin@umd.edu
Abstract
The combination of recent developments in question answering research and the unparalleled resources developed specifically for automatic semantic processing of text in the medical domain provides a unique opportunity to explore complex question answering in the clinical domain. In this paper, we attempt to operationalize major aspects of evidence-based medicine in the form of knowledge extractors that serve as the fundamental building blocks of a clinical question answering system. Our evaluations demonstrate that domain-specific knowledge can be effectively leveraged to extract PICO frame elements from MEDLINE abstracts. Clinical information systems in support of physicians’ decision-making process have the potential to improve the quality of patient care in real-world settings.
Introduction
The focus of question answering research is shifting away from simple fact-based questions that can be answered with relatively little linguistic knowledge to “harder” questions that require reasoning and gathering information from multiple sources. General-purpose reasoning on anything other than superficial lexical relations is exceedingly difficult because there is a vast amount of world and commonsense knowledge that must be encoded, either manually or automatically, to overcome the brittleness often associated with long chains of evidence. However, the availability of rich existing knowledge sources and ontologies in certain domains presents an interesting opportunity for question answering systems. How might one go about leveraging these resources effectively?
Copyright © 2005, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
We explore this research problem in the clinical domain, which is well suited for experiments with knowledge-based question answering techniques for several reasons. First, understanding of the domain has already been codified in the Unified Medical Language System® (UMLS) (Lindberg, Humphreys, & McCray 1993). Second, software for utilizing this ontology already exists: MetaMap (Aronson 2001) identifies concepts in free text, while SemRep (Rindflesch & Fiszman 2003) extracts relations between the recognized concepts. Both systems utilize and propagate semantic information from UMLS knowledge sources: the Metathesaurus®, the Semantic Network, and the SPECIALIST lexicon. The 2004 version of the UMLS Metathesaurus contains information about over 1 million biomedical concepts and 5 million concept names from more than 100 controlled vocabularies. The Semantic Network provides a consistent categorization of all concepts represented in the UMLS Metathesaurus. Third, the framework of evidence-based medicine (Sackett et al. 2000) provides a task-based model of the clinical information-seeking process. The PICO frame for capturing well-formulated clinical queries (described later) can serve as the knowledge representation that bridges the needs of clinicians and the analytical capabilities of a system. The confluence of these factors makes clinical question answering an exciting area of research.
Furthermore, the need to answer questions related to patient care at the point of service has been well studied and documented (Covell, Uman, & Manning 1985; Gorman, Ash, & Wykoff 1994). According to Ely et al. (2005), the desirable features of a system capable of providing answers to clinical questions are:
• “comprehensive resources that answer questions likely to occur in practice with emphasis on treatment and bottom-line advice”, and
• the ability to “locate information quickly by using lists, tables, bolded subheadings, and algorithms, and by avoiding lengthy, uninterrupted prose”.
The MEDLINE® database is ideally suited for addressing the first requirement and is indeed often used by clinicians in that capacity (De Groote & Dorsch 2003). However, studies have also shown that existing systems for searching MEDLINE are often inadequate and unable to supply clinically relevant answers in a timely manner (Gorman, Ash, & Wykoff 1994; Chambliss & Conley 1996). Reflecting on Ely et al.’s second requirement, it is clear that traditional document retrieval technology applied to MEDLINE abstracts is insufficient for satisfactory information access. Research and experience point to the need for systems that automatically analyze text and return only the relevant information, appropriately summarizing and fusing segments from multiple texts. Such a system should also rank results based on their relevance to the clinical task, taking into account such factors as the quality of research results and the recency of the article. In short, clinicians would greatly benefit from advanced question answering capabilities that provide decision support in the patient care process.
This paper reports on ongoing efforts to develop and deploy clinical question answering systems, which build on previous related projects we have worked on (Demner-Fushman et al. 2004). We focus our attention here on operationalizing the process of evidence-based medicine in terms of knowledge extractors, which serve as the building blocks of an end-to-end clinical question answering system. More specifically, this paper describes techniques for extracting population, problem, intervention, comparison, and outcome from MEDLINE abstracts. These elements, combined with metadata already associated with MEDLINE citations, determine the relevance of a particular abstract with respect to clinicians’ questions.
With respect to the broader research question concerning the role of knowledge in question answering, we demonstrate that simple, appropriate uses of domain knowledge can simplify the task of extracting relevant semantic information from text. Evaluations show that our knowledge extraction techniques achieve respectable performance and serve as a solid foundation for future work.
Evidence-Based Medicine
Evidence-based medicine (EBM) is a widely accepted paradigm for medical practice that stresses the importance of evidence from patient-centered clinical research in the health care process. Clinical evidence provides the information necessary to develop physicians’ individual expertise, which in turn results in higher-quality patient care. We seek to develop decision-support systems that complement this paradigm of medical practice.
Evidence-based medicine offers three orthogonal views of the domain that, when taken together, provide a framework for codifying the knowledge involved in answering questions related to patient care. These three complementary views are outlined below.
The first view describes the four main clinical tasks that physicians engage in: therapy, diagnosis, etiology, and prognosis. The terms and types of studies relevant to each of the four tasks have been extensively studied by the Hedges Project at McMaster University (Wilczynski, McKibbon, & Haynes 2001). The results of this research are implemented in the PubMed Clinical Queries tools, which can be used to retrieve task-specific citations.
The second view is independent of the clinical task and pertains to the structure of a well-built clinical question. The following four components have been identified as the key elements of a question related to patient care (Richardson et al. 1995):
1. What is the primary problem or disease? What are the characteristics of the patient (e.g., age, gender, or co-existing conditions)?
2. What is the main intervention (e.g., a diagnostic test, medication, or therapeutic procedure)?
3. What is the main intervention compared to (e.g., no intervention, another drug, another therapeutic procedure, or a placebo)?
4. What is the effect of the intervention? Were the patient’s symptoms relieved or eliminated? Were side effects or costs reduced?
These four elements are often referred to by the mnemonic PICO, which stands for Patient, Intervention, Comparison, and Outcome.
Finally, the third view serves as a tool for appraising the strength of evidence presented in a clinical study, i.e., how much confidence should a physician have in the results? Several taxonomies for appraising the strength of evidence based on the type and quality of the study have been developed. We chose the Strength of Recommendations Taxonomy (SORT) as the basis for determining the potential upper bound on the quality of evidence. We did so because of its emphasis on patient-oriented outcomes and its attempt to unify other existing taxonomies (Ebell et al. 2004). There are three levels of recommendations according to SORT:
1. A-level evidence is based on consistent, good-quality, patient outcome-oriented evidence presented in systematic reviews, randomized controlled clinical trials, cohort studies, and meta-analyses.
2. B-level evidence is based on inconsistent or limited-quality patient-oriented evidence from the same types of studies.
3. C-level evidence is based on disease-oriented evidence or on studies less rigorous than randomized controlled clinical trials, cohort studies, systematic reviews, and meta-analyses.
Any question answering system designed to support the practice of evidence-based medicine must be sensitive to the multifaceted considerations that go into evaluating an abstract’s relevance to a clinical query. As a component of a clinical question answering system, we have developed knowledge extractors that identify PICO frame elements in MEDLINE abstracts and assign each abstract a potential strength-of-evidence grade, corresponding to the second and third views of evidence-based medicine. We first describe these techniques and then present our evaluation results.
Extracting PICO Frame Elements
Evidence-based medicine provides a pre-existing domain model for encoding the semantic knowledge necessary to answer a clinical question. In particular, the PICO frame describes the structure of a well-built clinical query and can serve as the core organizing knowledge structure of a question answering system. The information-seeking process can then be viewed as semantic unification between partially instantiated PICO query frames and corresponding frames automatically extracted from MEDLINE abstracts. Although clinicians are most often interested in the outcome (of a treatment, for example), the other frame elements are critical in assessing the relevance of a particular study. Thus, the automatic extraction of population, problem, intervention, comparison, and outcome represents a key capability integral to clinical question answering. This section details extraction modules that identify each of these elements (see the example of a completely annotated abstract in Figure 1).
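The unification view can be illustrated with a short Python sketch. It assumes that each frame slot holds a set of concept identifiers; plain strings such as "fever" stand in for UMLS concepts. A query frame is treated as unifying with an extracted frame when every slot the query instantiates shares at least one concept with the corresponding extracted slot, and empty query slots act as wildcards. The class `PICOFrame`, the function `unifies`, and this overlap rule are illustrative assumptions rather than a description of our implementation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet

SLOTS = ("population", "problem", "intervention", "comparison", "outcome")


@dataclass(frozen=True)
class PICOFrame:
    """A PICO frame whose slots hold concept identifiers; an empty slot is uninstantiated."""
    population: FrozenSet[str] = field(default_factory=frozenset)
    problem: FrozenSet[str] = field(default_factory=frozenset)
    intervention: FrozenSet[str] = field(default_factory=frozenset)
    comparison: FrozenSet[str] = field(default_factory=frozenset)
    outcome: FrozenSet[str] = field(default_factory=frozenset)


def unifies(query: PICOFrame, extracted: PICOFrame) -> bool:
    """True if every slot filled in the query overlaps the matching extracted slot.

    Empty query slots act as wildcards, so a partially instantiated clinical
    question can still match a more fully instantiated abstract frame.
    """
    for slot in SLOTS:
        wanted = getattr(query, slot)
        if wanted and not (wanted & getattr(extracted, slot)):
            return False
    return True


# A partially instantiated query frame versus a frame extracted from an abstract.
query = PICOFrame(problem=frozenset({"fever"}), intervention=frozenset({"ibuprofen"}))
abstract = PICOFrame(
    problem=frozenset({"fever"}),
    intervention=frozenset({"acetaminophen", "ibuprofen", "placebo"}),
)
print(unifies(query, abstract))  # True
```

Set overlap is only one possible notion of unification. A fuller treatment might instead match concepts through the UMLS hierarchy, for example so that a query about a parent disorder also matches its children.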
As previously mentioned, software already exists for identifying concepts (MetaMap) and relations (SemRep) in medical texts, and we use both extensively in our knowledge extractors. Furthermore, we take advantage of coarser-grained semantic types, the Semantic Groups (McCray, Burgun, & Bodenreider 2001), to capture higher-level generalizations. An additional feature we take advantage of, when present, is the explicit discourse markers found in some abstracts. These so-called structured abstracts were recommended by the Ad Hoc Working Group for Critical Appraisal of the Medical Literature (1987) to help humans assess the reliability and content of a publication and to facilitate the indexing and retrieval processes. These abstracts loosely adhere to the introduction, methods, results, and conclusions format, and summarize a study using sections with these headings. Although many core clinical journals require structured abstracts, there is a great deal of variation in the actual headings. Even when present, the headings are usually not organized in a manner focused on patient care. In addition, the abstracts of many high-quality studies remain unstructured. For these reasons, explicit discourse markers are not entirely reliable indicators of the semantic elements we seek to extract and must be considered along with other sources of evidence.
The extraction of each PICO frame element relies to a different extent on an annotated corpus of MEDLINE abstracts. The first author of this paper led an effort to create such a collection at the National Library of Medicine. As described below, the population, problem, and intervention/comparison extraction modules are based on manually constructed rules, whereas the outcome extraction module employs supervised machine learning techniques. These two very different approaches stem from differences in the nature of the frame elements. Problem and intervention can be directly mapped to concepts, and population easily maps to patterns that include concepts, but outcome statements are always passages ranging in size from a single clause to eight sentences. The initial goal of our annotation effort was to identify outcome statements in abstract text (Demner-Fushman et al., in preparation). A physician, two nurse practitioners, and an engineering researcher manually identified sentences that describe outcomes within 633 MEDLINE abstracts. The abstracts were retrieved using PubMed, with searches that attempted to model different user behaviors ranging from naïve to expert. With the exception of 50 articles retrieved to answer a childhood immunization question, the articles were retrieved using a disease (for example, diabetes) as the search topic. When emulating an expert user, advanced search features were employed. Of the 633 citations, one hundred abstracts were also fully annotated with population, problem, intervention, and comparison. These one hundred abstracts were set aside as a held-out test set. Of the remaining citations, 275 were used for training and rule derivation, as described in the following sections.
The Two Ps
The PICO framework makes no distinction between the population and the problem. This conflation is rooted in the concept of the population in clinical studies, as exemplified by the following sentence from a structured abstract: “POPULATION: Fifty-five postmenopausal women with a urodynamic diagnosis of genuine urinary stress incontinence.” Although this clause simultaneously describes the population (of which any particular patient can be viewed as a sample) and the problem, we chose to separate the extraction of the two elements because they are not always described together. Furthermore, many clinical questions ask about a particular problem without specifying a population.
Population Extractor
Population statements are identified using a series of manually crafted rules, based on several assumptions:
• The concept involved in the description of the population belongs to the semantic type GROUP or any of its children. In addition, certain nouns are often used to describe study participants in medical texts; for example, a frequently observed pattern is “subjects” or “cases” followed by a concept from the semantic group DISORDER.
• The number of subjects that participated in the study precedes or follows the concept identified as a GROUP.
• The confidence that a clause with an identified number and GROUP contains information about the population is inversely proportional to the distance between the two entities.
• The confidence that a clause contains the population is influenced by the position of the clause: with respect to headings in the case of structured abstracts, and with respect to the beginning of the abstract in the case of unstructured abstracts.
Given these assumptions, the population extractor searches for the following patterns:
• GROUP (n=number) (for example, “in 5-6-year-old French children (n=234)”, “Subjects (n=54)”)
• number* GROUP (for example, “forty-nine infants”)
• number* DISORDER* GROUP? (for example, “44 HIV-infected children”)
The confidence of a particular pattern match is a function of both its position in the abstract and its position in the clause from which it was extracted. If a number is followed by a measure (for example, year or percent), the number is discarded and pattern matching continues. After the entire document is processed in this manner, the matched pattern with the highest confidence value is retained as the population description.
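The population heuristics can be approximated in a few lines of Python. This sketch assumes abstracts arrive already split into sentences and tokens. Each token carries a hypothetical tag ("NUM", "MEASURE", "GROUP", "DISORDER", or empty) of the kind a concept recognizer and a number detector might supply.

The three patterns are folded into one proximity rule:

- A number pairs with a GROUP token within `MAX_GAP` tokens, so DISORDER tokens may sit between them.
- A number followed by a measure is skipped.
- Confidence is an assumed position weight, 1/(1 + sentence index), divided by the number-to-GROUP distance.

The window size and both weighting functions are assumptions, since the text does not give the actual confidence function. Position within the clause is modeled only through distance.

```python
from typing import Iterator, List, Optional, Tuple

# A token is (surface text, tag); the tags are hypothetical stand-ins for what a
# concept recognizer such as MetaMap plus a number/unit detector might supply.
Token = Tuple[str, str]
Match = Tuple[float, int, str]  # (confidence, sentence index, matched span)

MAX_GAP = 4  # hypothetical window (in tokens) between the number and the GROUP


def population_candidates(sentences: List[List[Token]]) -> Iterator[Match]:
    for s_idx, toks in enumerate(sentences):
        # Assumed position prior: earlier sentences of the abstract score higher.
        position_weight = 1.0 / (1 + s_idx)
        for i, (_, tag) in enumerate(toks):
            if tag != "NUM":
                continue
            # A number followed by a measure ("years", "percent") is discarded.
            if i + 1 < len(toks) and toks[i + 1][1] == "MEASURE":
                continue
            for j, (_, other) in enumerate(toks):
                dist = abs(j - i)
                if other == "GROUP" and 0 < dist <= MAX_GAP:
                    lo, hi = min(i, j), max(i, j)
                    span = " ".join(text for text, _ in toks[lo:hi + 1])
                    # Confidence is inversely proportional to number-GROUP distance.
                    yield position_weight / dist, s_idx, span


def extract_population(sentences: List[List[Token]]) -> Optional[Match]:
    """Keep the single highest-confidence match over the whole abstract."""
    return max(population_candidates(sentences), key=lambda m: m[0], default=None)


abstract = [
    [("Children", "GROUP"), ("aged", ""), ("12", "NUM"), ("years", "MEASURE")],
    [("forty-nine", "NUM"), ("infants", "GROUP"), ("were", ""), ("enrolled", "")],
    [("44", "NUM"), ("HIV-infected", "DISORDER"), ("children", "GROUP")],
]
print(extract_population(abstract))  # (0.5, 1, 'forty-nine infants')
```

In the usage example, "12" is discarded because it is followed by a measure. "forty-nine infants" then outscores "44 HIV-infected children" because it appears earlier and its number sits closer to the GROUP concept.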
Problem Extractor
The problem extractor relies on recognition of concepts belonging to the semantic group DISORDER (McCray, Burgun, & Bodenreider 2001). In short, it simply returns a partially ranked list of all concepts recognized as DISORDER. We evaluate the performance of this simple heuristic on segments of the abstract that vary in length: the abstract title only, the abstract title and first two sentences, and the entire abstract text. Concepts identified in the title are given preference in the ranked ordering of problems, in an effort to distinguish the primary problem from co-occurring conditions.
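A minimal sketch of the problem extractor follows. It assumes each concept is given as a (name, semantic group) pair, for instance as mapped from MetaMap output; the function name `rank_problems` and the `max_sentences` parameter are hypothetical. DISORDER concepts from the title are ranked first and the remaining ones follow in order of appearance. The `max_sentences` argument reproduces the three evaluated segments: title only, title plus the first two sentences, and the entire abstract.

```python
from typing import List, Optional, Tuple

Concept = Tuple[str, str]  # (concept name, semantic group), e.g. as mapped from MetaMap output


def rank_problems(title: List[Concept], sentences: List[List[Concept]],
                  max_sentences: Optional[int] = None) -> List[str]:
    """Return DISORDER concepts: title concepts first, then body concepts in text order.

    max_sentences=0 uses the title only, 2 the title plus the first two
    sentences, and None the entire abstract.
    """
    body = sentences if max_sentences is None else sentences[:max_sentences]
    ranked: List[str] = []
    for name, group in title + [c for sent in body for c in sent]:
        if group == "DISORDER" and name not in ranked:
            ranked.append(name)
    return ranked


title = [("Febrile illness", "DISORDER"), ("Ibuprofen", "CHEMICALS & DRUGS")]
sents = [[("Ibuprofen", "CHEMICALS & DRUGS")], [("Fever", "DISORDER")],
         [("Rash", "DISORDER"), ("Febrile illness", "DISORDER")]]
print(rank_problems(title, sents, 2))  # ['Febrile illness', 'Fever']
```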
Intervention/Comparison Extractor
The intervention and comparison frame elements do not require separate processing because they usually belong to the same semantic type. Our intervention extractor simply produces an unordered list of interventions under study, of which one is the main intervention and the rest are comparisons. For convenience, we refer to this module simply as the intervention extractor.
For each of the clinical tasks, semantic relations defined in the UMLS Semantic Network provide strong cues for the intervention:
• “treats” and “carries out” for therapy;
• “diagnoses” for diagnosis;
• “causes” and “result of” for etiology;
• “prevents” for prognosis.
Restrictions on the semantic types of the concepts participating in these relations serve as the basis for the intervention extractor rules. For example, the relation THERAPEUTIC OR PREVENTIVE PROCEDURE treats DISEASE OR SYNDROME identifies the THERAPEUTIC OR PREVENTIVE PROCEDURE as the intervention for the clinical task therapy. At present, our intervention extractor recognizes concepts belonging to nine semantic types, for example, DIAGNOSTIC PROCEDURE, CLINICAL DRUG, and HEALTH CARE ACTIVITY.
In addition to the semantic type information, the intervention extraction rules take into account positional information. With structured abstracts, the title, aims, and methods sections are most likely to provide information about the intervention. With unstructured abstracts, relations from the first third of the abstract are heavily favored. Finally, the intervention extractor takes into account the presence of certain cue phrases that describe the aim and/or methods of the study, such as “This * study examines” or “This paper describes”. As in the previous modules, information from these different sources is combined using an ad hoc weighting scheme.
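The intervention rules can be sketched as follows for an unstructured abstract. The sketch assumes SemRep-style relations have already been extracted into the hypothetical `Relation` tuple shown.

- It keeps only relations whose predicate is a cue for the current task and whose subject belongs to one of the intervention semantic types named in the text. The full list of nine types is not given.
- It adds bonuses for relations from the first third of the abstract and for sentences matching the cue phrases quoted above.

Relation names are written with underscores, as in the Semantic Network. The weights `W_RELATION`, `W_POSITION`, and `W_CUE` are placeholders, since the actual ad hoc weighting scheme is not reported. The keys of the returned dictionary form the unordered list of interventions and comparisons.

```python
import re
from typing import Dict, List, NamedTuple

# Cue relations per clinical task, as listed in the text.
TASK_RELATIONS = {
    "therapy": {"treats", "carries_out"},
    "diagnosis": {"diagnoses"},
    "etiology": {"causes", "result_of"},
    "prognosis": {"prevents"},
}

# Only the semantic types named in the text; the full list of nine is not given.
INTERVENTION_TYPES = {
    "Therapeutic or Preventive Procedure",
    "Diagnostic Procedure",
    "Clinical Drug",
    "Health Care Activity",
}

CUE_PATTERNS = [
    re.compile(r"\bthis\b.*\bstudy examines\b", re.IGNORECASE),
    re.compile(r"\bthis paper describes\b", re.IGNORECASE),
]

# Hypothetical ad hoc weights; the actual values are not reported.
W_RELATION, W_POSITION, W_CUE = 1.0, 0.5, 0.5


class Relation(NamedTuple):
    subject: str       # candidate intervention
    subject_type: str  # its UMLS semantic type
    predicate: str     # Semantic Network relation, e.g. "treats"
    obj: str
    sentence: int      # index of the sentence the relation was found in


def score_interventions(task: str, relations: List[Relation],
                        sentences: List[str]) -> Dict[str, float]:
    """Score candidate interventions in an unstructured abstract."""
    scores: Dict[str, float] = {}
    for rel in relations:
        if (rel.predicate not in TASK_RELATIONS[task]
                or rel.subject_type not in INTERVENTION_TYPES):
            continue
        score = W_RELATION
        if rel.sentence < len(sentences) / 3:  # first third of the abstract
            score += W_POSITION
        if any(p.search(sentences[rel.sentence]) for p in CUE_PATTERNS):
            score += W_CUE
        scores[rel.subject] = max(scores.get(rel.subject, 0.0), score)
    return scores
```

For structured abstracts, the position test would instead check whether the relation comes from the title, aims, or methods section. That variant is omitted here.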
Outcome Extractor
In contrast with the other modules, we approach outcome extraction as a classification task at the sentence level: for each sentence in an abstract, the outcome extractor predicts whether or not it states an outcome. Our preliminary explorations have led to a strategy based on an ensemble of classifiers:
• a rule-based classifier;
• a unigram “bag of words” classifier;
• an n-gram classifier;
• a position classifier;
• a document length classifier;
• a semantic classifier.
With the exception of the rule-based classifier, all classifiers were trained on the 275 citations from the annotated collection described above.
Prior to the annotation effort, knowledge for the rule-based classifier was hand-coded by a registered nurse with 20 years of clinical experience. This classifier outputs a binary decision based on cue phrases such as “significantly greater”, “well tolerated”, and “adverse events”.
The unigram “bag of words” classifier is a Naïve Bayes classifier implemented¹ with the API provided by the MALLET toolkit. This classifier outputs the probability of a class assignment.
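MALLET is a Java toolkit, and its API is not reproduced here. To illustrate what a unigram Naïve Bayes outcome classifier computes, the following Python sketch implements binary multinomial Naïve Bayes with add-one smoothing. It returns the posterior probability that a sentence is an outcome. Whitespace tokenization, the class name `UnigramNaiveBayes`, and the toy training sentences are assumptions made for brevity.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


class UnigramNaiveBayes:
    """Binary multinomial Naive Bayes over lowercased whitespace tokens, add-one smoothed.

    Both classes must occur in the training data.
    """

    def fit(self, data: Iterable[Tuple[str, bool]]) -> "UnigramNaiveBayes":
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts: Counter = Counter()
        for sentence, is_outcome in data:
            self.doc_counts[is_outcome] += 1
            self.word_counts[is_outcome].update(sentence.lower().split())
        self.vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        return self

    def _log_joint(self, tokens: List[str], label: bool) -> float:
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.vocab_size
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            logp += math.log((counts[tok] + 1) / denom)  # add-one smoothing
        return logp

    def prob_outcome(self, sentence: str) -> float:
        """P(outcome | sentence), computed stably from the two log joint scores."""
        tokens = sentence.lower().split()
        d = self._log_joint(tokens, True) - self._log_joint(tokens, False)
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        e = math.exp(d)
        return e / (1.0 + e)


train = [
    ("ibuprofen significantly reduced fever", True),
    ("no adverse events were observed", True),
    ("children were randomized to two groups", False),
    ("we enrolled children at a hospital", False),
]
nb = UnigramNaiveBayes().fit(train)
```

The posterior is computed from the difference of the two log joint probabilities rather than by exponentiating each one. This avoids underflow on long sentences.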
The n-gram-based classifier is also a Naïve Bayes classifier, but it operates on a different set of features. We first identified the most informative unigrams and bigrams using the information gain measure (Yang & Pedersen 1997), and then selected only the positive outcome predictors using the odds ratio (Mladenic & Grobelnik 1999). Topic-specific terms, such as rheumatoid arthritis, were then removed. Finally, the list of features was revised by the registered nurse who participated in the annotation effort. This classifier also outputs the probability of a class assignment.
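The automatic part of this feature selection can be sketched under a few stated assumptions:

- Each training sentence is reduced to its set of unigrams and bigrams.
- Information gain is computed over binary presence features.
- The odds ratio uses add-0.5 smoothing to avoid division by zero.
- Topic-specific terms are supplied as a set.

The smoothing constant and the function names are illustrative. The final manual revision of the feature list is not modeled.

```python
import math
from typing import List, Set, Tuple


def ngram_features(sentence: str) -> Set[str]:
    """Unigrams and bigrams of a lowercased, whitespace-tokenized sentence."""
    toks = sentence.lower().split()
    return set(toks) | {f"{a} {b}" for a, b in zip(toks, toks[1:])}


def _entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_features(docs: List[Tuple[Set[str], bool]], k: int,
                    topic_terms: Set[str]) -> List[str]:
    """Top-k terms by information gain, kept only if their log odds ratio favors outcomes."""
    n = len(docs)
    n_pos = sum(1 for _, y in docs if y)
    n_neg = n - n_pos
    h_class = _entropy(n_pos / n)
    scored = []
    for term in set().union(*(feats for feats, _ in docs)):
        labels_with = [y for feats, y in docs if term in feats]
        labels_without = [y for feats, y in docs if term not in feats]
        p_term = len(labels_with) / n
        h_cond = p_term * _entropy(sum(labels_with) / len(labels_with))
        if labels_without:
            h_cond += (1 - p_term) * _entropy(sum(labels_without) / len(labels_without))
        info_gain = h_class - h_cond
        # Odds ratio of the term in positive vs. negative sentences (add-0.5 smoothing).
        pos_hits = sum(labels_with)
        p_pos = (pos_hits + 0.5) / (n_pos + 1)
        p_neg = (len(labels_with) - pos_hits + 0.5) / (n_neg + 1)
        log_odds = math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))
        scored.append((info_gain, log_odds, term))
    # Highest information gain first; ties broken alphabetically for determinism.
    top = sorted(scored, key=lambda s: (-s[0], s[2]))[:k]
    return [term for _, log_odds, term in top if log_odds > 0 and term not in topic_terms]
```

Information gain alone would also rank terms that predict non-outcome sentences, such as "patients" in enrollment sentences. The odds-ratio filter is what restricts the list to positive outcome predictors.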
The position classifier returns the maximum likelihood estimate that a sentence is an outcome based on its position in the abstract. For structured abstracts, position is measured with respect to the results or conclusions sections; for unstructured abstracts, it is measured with respect to the end of the abstract.
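For unstructured abstracts, the position classifier reduces to relative-frequency counting. The sketch below assumes position is measured as the offset from the last sentence, with 0 denoting the final sentence. For offsets never seen in training it returns 0.0; this is an assumption, since the maximum likelihood estimate is undefined there. The structured-abstract variant, which would measure position relative to the results or conclusions section, is omitted, and the class name is hypothetical.

```python
from collections import Counter
from typing import List


class PositionClassifier:
    """MLE of P(outcome | offset from the end of the abstract)."""

    def fit(self, abstracts: List[List[bool]]) -> "PositionClassifier":
        # Each abstract is a list of per-sentence outcome labels, in text order.
        self.seen: Counter = Counter()
        self.outcome: Counter = Counter()
        for labels in abstracts:
            n = len(labels)
            for i, is_outcome in enumerate(labels):
                offset = n - 1 - i  # 0 = last sentence
                self.seen[offset] += 1
                self.outcome[offset] += is_outcome
        return self

    def prob(self, index: int, n_sentences: int) -> float:
        offset = n_sentences - 1 - index
        if self.seen[offset] == 0:
            return 0.0  # assumption: unseen offsets get zero probability
        return self.outcome[offset] / self.seen[offset]


pc = PositionClassifier().fit([
    [False, False, True, True],
    [False, True, True],
    [False, False, False, True],
])
```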
The document length classifier returns a smoothed (add-one smoothing) probability that a document of a given length, measured in sentences, contains an outcome statement. For example, the probability that a four-sentence document contains an outcome statement is 0.25, and the probability of finding an outcome in a ten-sentence abstract is 0.92. Interestingly, the average lengths of documents with and without outcome statements differ: documents with outcome statements average 11.7 sentences, whereas those without average 7.95 sentences.
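One way to read “add-one smoothing” for this binary event is the Laplace estimate (c + 1) / (N + 2). Here N is the number of training abstracts with a given sentence count and c is how many of them contain an outcome statement. The sketch below assumes that form. The class name is hypothetical, and the usage counts are toy values, not our training data.

```python
from collections import Counter
from typing import Iterable, Tuple


class LengthClassifier:
    """P(abstract contains an outcome | number of sentences), add-one smoothed."""

    def fit(self, docs: Iterable[Tuple[int, bool]]) -> "LengthClassifier":
        self.total: Counter = Counter()
        self.with_outcome: Counter = Counter()
        for n_sentences, has_outcome in docs:
            self.total[n_sentences] += 1
            self.with_outcome[n_sentences] += has_outcome
        return self

    def prob(self, n_sentences: int) -> float:
        # Add one to each of the two events {has outcome, no outcome}.
        return (self.with_outcome[n_sentences] + 1) / (self.total[n_sentences] + 2)


lc = LengthClassifier().fit([(4, False), (4, False), (4, True),
                             (10, True), (10, True), (10, False)])
```

With these toy counts, a length never seen in training falls back to 0.5. This is the neutral value the smoothing is meant to provide.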
The semantic classifier assigns to a sentence an ad hoc score based on the presence of UMLS concepts belonging to semantic groups highly associated with outcomes, such as THERAPEUTIC PROCEDURE or PHARMACOLOGICAL SUBSTANCE. The score is given an ad hoc boost if the concept has already been identified by MetaMap elsewhere in the abstract, for example, if the problem or intervention is observed in the sentence under consideration.
The outputs of our basic classifiers are combined using linear interpolation, with ad hoc weights assigned based on intuition. We recognize that our outcome extractor employs a “kitchen sink” approach. However, this module is mostly the outgrowth of an exploration process in the uncharted solution space for clinical question answering. A more principled approach to outcome extraction is reserved for future work.
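Linear interpolation of the component scores, followed by a top-k cutoff, can be sketched as below. Only two of the six components are included:

- the rule-based classifier, using the cue phrases quoted earlier (the full hand-coded list is not reproduced);
- a crude relative-position score standing in for the trained position classifier.

The weights in the usage example are arbitrary placeholders for the intuition-based weights mentioned in the text. Dividing by their sum keeps the combined score on the same scale as the inputs. All function names are hypothetical.

```python
from typing import Callable, Dict, List

# A scorer maps (sentence index, abstract sentences) to a score in [0, 1].
Scorer = Callable[[int, List[str]], float]

CUE_PHRASES = ("significantly greater", "well tolerated", "adverse events")


def rule_based(i: int, sents: List[str]) -> float:
    """Binary cue-phrase classifier (phrases taken from the text)."""
    return 1.0 if any(cue in sents[i].lower() for cue in CUE_PHRASES) else 0.0


def relative_position(i: int, sents: List[str]) -> float:
    """Crude stand-in for the position classifier: later sentences score higher."""
    return (i + 1) / len(sents)


def rank_outcomes(sents: List[str], scorers: Dict[str, Scorer],
                  weights: Dict[str, float], k: int) -> List[int]:
    """Linearly interpolate the scorers and return the indices of the top-k sentences."""
    total = sum(weights.values())

    def combined(i: int) -> float:
        return sum(weights[name] * f(i, sents) for name, f in scorers.items()) / total

    return sorted(range(len(sents)), key=combined, reverse=True)[:k]


sents = ["Children were randomized.", "Adverse events were rare.",
         "Fever fell significantly.", "Ibuprofen was well tolerated overall."]
top = rank_outcomes(sents, {"rule": rule_based, "position": relative_position},
                    {"rule": 0.6, "position": 0.4}, k=3)
```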
Determining the Strength of Evidence
The highest potential strength of evidence for a given citation can be identified from its publication type and/or the MeSH headings that describe the type of clinical study. Both are assigned to the article during the indexing process. Table 1 shows our mapping from publication type and MeSH heading to the evidence grade, based on principles defined in the Strength of Recommendations Taxonomy.
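The mapping in Table 1 amounts to a lookup that keeps the strongest level among a citation’s labels. In this sketch, labels are matched case-insensitively against the strings in Table 1. Actual MEDLINE publication types and MeSH headings may be spelled differently (for example, in plural forms), so a real mapping would need normalization. The function name `potential_level` is hypothetical.

```python
from typing import Iterable, Optional

# Table 1, keyed by lowercased label; the strings copy Table 1 and may need
# normalization against actual MEDLINE publication types and MeSH headings.
EVIDENCE_LEVEL = {
    "meta-analysis": "A",
    "randomized controlled trials": "A",
    "cohort study": "A",
    "follow-up study": "A",
    "case-control study": "B",
    "case series": "B",
    "journal article": "C",
    "case report": "C",
}


def potential_level(labels: Iterable[str]) -> Optional[str]:
    """Strongest potential SORT level among a citation's labels, or None if none map."""
    levels = [EVIDENCE_LEVEL[label.lower()] for label in labels
              if label.lower() in EVIDENCE_LEVEL]
    # "A" < "B" < "C" alphabetically, so min() picks the strongest level.
    return min(levels, default=None)
```

Because nearly every MEDLINE citation carries the publication type Journal Article, taking the strongest level matters. It keeps a randomized trial from being graded C merely because it is also a journal article.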
Additional information necessary for the final determination of relevance includes the number of study participants, the statistical methods involved, randomization, blinding, and the quality of follow-up (Ebell et al. 2004). Of these, the number of subjects is extracted along with the population information. Identification of the statistical, blinding, and randomization methods will be addressed in future work.
¹ http://mallet.cs.umass.edu
Strength of Evidence | Publication Type / MeSH
Level A (1) | Meta-Analysis, Randomized Controlled Trials, Cohort Study, Follow-up Study
Level B (2) | Case-Control Study, Case Series
Level C (3) | Journal Article, Case Report
Table 1: Strength of evidence categories based on Publication Type and MeSH headings.
          | correct | unknown | wrong
baseline  | 53.3%   | -       | 46.7%
extractor | 80%     | 10%     | 10%
Table 2: Evaluation of the population extractor
Results
This section describes evaluations conducted on the extraction modules. Results are reported in terms of the percentage of correctly identified instances, the percentage of instances for which the extractor had no answer, and the percentage of incorrectly identified instances. The baselines and gold standards for each extraction module vary and are described individually.
Population Extraction
Ninety of the one hundred fully annotated articles in our collection were agreed upon by the annotators as being clinical in nature, and these were used as test data for our population extractor. Since these abstracts were not examined in the rule-creation process, they can be viewed as a blind held-out test set. The output of our population extractor was judged to be correct if it occurred in the same sentence that was annotated as containing the population in the gold standard.
For comparison, our baseline simply returned the first three sentences of the abstract. We considered the baseline correct if any one of these sentences was annotated as containing the population in the gold standard. This baseline was motivated by the observation that the aim and methods sections of structured abstracts are likely to contain the population information. Generally, these sections can be found in the first three sentences of both structured and unstructured abstracts.
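The difference in strictness between the two population criteria can be made explicit with a short sketch. It assumes that:

- sentences are identified by 0-based indices;
- the extractor returns one sentence index, or None when it has no answer;
- the gold standard is the set of sentence indices annotated as containing the population.

The function names and the toy judgments in the test are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set


def judge_extractor(predicted: Optional[int], gold: Set[int]) -> str:
    """Strict criterion: the single returned sentence must be a gold population sentence."""
    if predicted is None:
        return "unknown"
    return "correct" if predicted in gold else "wrong"


def judge_baseline(gold: Set[int], first_k: int = 3) -> str:
    """Lenient baseline: correct if any of the first k sentences is a gold sentence."""
    return "correct" if gold & set(range(first_k)) else "wrong"


def summarize(judgments: Iterable[str]) -> Dict[str, float]:
    """Percentages of correct, unknown, and wrong judgments."""
    counts = Counter(judgments)
    n = sum(counts.values())
    return {label: 100.0 * counts[label] / n for label in ("correct", "unknown", "wrong")}
```

The baseline can never answer “unknown”, which is why that column is empty for the baselines in the results tables.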
The performance of our population extractor is shown in Table 2. Note that the evaluation of the baseline is much more lenient than the evaluation of our population extractor.
There were several sources of incorrect and missed populations:
• Not all population descriptions contain a number explicitly, e.g., “The medical charts of all patients who were treated with etanercept for back or neck pain at a single private medical clinic in 2003”.
• Not all study populations are population groups, as for example in “All Primary Care Trusts in England.”
• Part-of-speech tagging and chunking errors propagate to the semantic type assignment level and affect the quality of MetaMap output.
                          | correct | unknown | wrong
abstract title            | 85%     | 10%     | 5%
title + 1st two sentences | 90%     | 5%      | 5%
entire abstract           | 86%     | 2%      | 12%
Table 3: Evaluation of the problem extractor
Problem Extraction
The goal of the problem extractor is to identify the main problem that calls for the interventions outlined in the abstract. At present, we assume that the main problem is always a DISORDER. Based on this assumption, the gold standard for the problem extractor can be defined using the MeSH headings assigned to an article during the human indexing process, since one of the indexers’ tasks is to identify the main topic of the article. We randomly selected fifty abstracts with disorders indexed as the main topic from the abstracts retrieved using PubMed on the five clinical questions described in (Sneiderman et al. 2005).
We applied our problem extractor to different segments of the abstract: the title only, the title and first two sentences, and the entire abstract. These results are shown in Table 3. The performance of our best variant (abstract title and first two sentences) approaches the upper bound for MetaMap performance. That bound is limited by human agreement on the identification of semantic concepts in medical texts, as established in (Pratt & Yetisgen-Yildiz 2003).
Although problem extraction largely depends on disease coverage in UMLS and on MetaMap performance, the error rate could be further reduced by more sophisticated recognition of implicitly stated problems. For example, for a question about immunization in children, an abstract about the measles-mumps-rubella vaccination never mentioned the disease without the word “vaccination”. Hence, no concept of the type DISEASE OR SYNDROME was extracted.
Intervention Extraction
The intervention extractor was evaluated in the same manner as the population extractor and compared to the same baseline. Results are shown in Table 4.
Some of the errors were caused by ambiguity of terms in the intervention. For example, in the clause “serum levels of anti-HBsAg and presence of autoantibodies (ANA, ENA) were evaluated”, “serum” is recognized as a TISSUE, “levels” as an INTELLECTUAL PRODUCT, and “autoantibodies” and “ANA” as IMMUNOLOGIC FACTORS. In this case, however, autoantibodies should be considered a LABORATORY OR TEST RESULT.²
In other cases, the extraction errors were caused by summary sentences that were very similar to intervention statements, e.g., “This study compared the effects of 52 weeks’ treatment with pioglitazone, a thiazolidinedione that reduces insulin resistance, and glibenclamide, on insulin sensitivity, glycaemic control, and lipids in patients with Type 2 diabetes”. For this particular abstract, the correct intervention is contained in the following sentence: “Patients with Type 2 diabetes were randomized to receive either pioglitazone (initially 30 mg QD, n=91) or micronized glibenclamide (initially 1.75 mg QD, n=109) as monotherapy”.
          | correct | unknown | wrong
baseline  | 60%     | -       | 40%
extractor | 80%     | -       | 20%
Table 4: Evaluation of the intervention extractor
Outcome Extraction
Since outcome statements were annotated in each of the 633 citations in our collection, it was possible to evaluate our outcome extractor on a broader set of abstracts. One hundred and fifty-three citations pertaining to therapy were selected from those not used in training the outcome classifiers. Of these, 143 contained outcome statements and were used as the blind held-out test set.
The output of our outcome extractor is a ranked list of sentences. Annotators typically marked two to three sentences in each abstract as outcomes, so we evaluated our extractor at cutoffs of two and three sentences. These results are shown in Table 5, where extractor2 and extractor3 represent the two- and three-sentence cutoffs, respectively. In the evaluation, our outcome extractor was considered correct if the sentences it returned intersected with the sentences judged as outcomes by our annotators. Although this is a somewhat lenient evaluation criterion, we justify it by noting the importance of pointing the physician in the right direction, even if the results are only partially relevant. Outcome statements are generally expected in the conclusion of a structured abstract and near the end of an unstructured one. Motivated by this expectation, we compared our outcome extractor to the baselines of returning the final two or final three sentences of the abstract (base2 and base3, respectively, in Table 5).
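The outcome evaluation protocol can be sketched similarly. The sketch assumes the extractor produces one score per sentence and that the gold standard is the set of annotated outcome sentence indices. The extractor’s output is the top-k sentences by score, each baseline returns the final k sentences, and a run counts as correct if it overlaps the gold set at all. The function names and toy scores are hypothetical.

```python
from typing import List, Sequence, Set


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring sentences (the extractor's ranked output)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def last_k(n_sentences: int, k: int) -> List[int]:
    """Baseline: the final k sentences of the abstract."""
    return list(range(max(0, n_sentences - k), n_sentences))


def is_correct(returned: List[int], gold: Set[int]) -> bool:
    """Lenient criterion: any overlap with the annotated outcome sentences counts."""
    return bool(set(returned) & gold)


def accuracy(runs: List[List[int]], golds: List[Set[int]]) -> float:
    """Percentage of abstracts whose returned sentences overlap the gold outcomes."""
    return 100.0 * sum(is_correct(r, g) for r, g in zip(runs, golds)) / len(golds)


scores = [0.1, 0.2, 0.9, 0.3, 0.2, 0.4]  # toy per-sentence outcome scores
gold = {2, 3}
```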
As can be seen in Table 5, returning the two highest-ranked outcome sentences performs essentially on par with the baselines (75% correct, versus 74% and 75%). However, we are encouraged by the performance of the outcome extractor at the three-sentence cutoff, where it achieved higher accuracy than the baselines (95% correct). The majority of errors in outcome extraction were related to inaccurate sentence boundary identification, chunking errors, and word sense ambiguity in the Metathesaurus.
² MetaMap does provide alternative mappings, but the current extraction module only considers the best candidate.
        | base2 | extractor2 | base3 | extractor3
correct | 74%   | 75%        | 75%   | 95%
unknown | -     | -          | -     | -
wrong   | 26%   | 25%        | 25%   | 5%
Table 5: Evaluation of the outcome extractor
Sample Output
A complete example of our knowledge extractors working in unison is shown in Figure 1. It presents the extracted PICO elements of the abstract retrieved to answer the following question: “In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?” (Kauffman, Sawyer, & Scheinbaum 1992). “Febrile illness” is the only concept mapped to DISORDER, and hence is identified as the problem. “37 otherwise healthy children aged 2 to 12 years” is correctly identified as the population. “Acetaminophen”, “ibuprofen”, and “placebo” are correctly extracted as the interventions under study. The three outcome sentences are correctly classified; the short sentence concerning adverse effects was ranked below them and therefore fell outside the three-sentence cutoff. The study design, taken from metadata associated with the citation, allows a system to automatically classify this article as a potential level-A answer.
Related Work and Discussion
Clinical question answering is an emerging area of research that has only recently begun to receive serious attention. As a result, the research space is sparsely populated and there are relatively few points of comparison to our own work. In this section, we nevertheless attempt to draw connections to other clinical information systems (although not necessarily for question answering) and to related domain-specific question answering systems.
The feasibility of automatically identifying outcome statements in secondary sources has been demonstrated by Niu and Hirst (2004). Their study also illustrates the importance of semantic classes and relations, and suggests an extension of the clinical scenario view as a promising direction in clinical question answering. However, extracting outcome statements from secondary sources (meta-analyses, in this case) is an easier problem than extracting outcomes from general MEDLINE citations. Secondary sources represent knowledge that has already been distilled by humans, which also limits their scope. Because secondary sources are often more consistently organized, certain surface cues can be relied on for extraction, which is not possible for MEDLINE abstracts in general. Our study tackles outcome identification in primary medical sources and demonstrates that respectable performance is possible with a feature-combination approach.
Antipyretic efficacy of ibuprofen vs acetaminophen
OBJECTIVE – To compare the antipyretic efficacy of ibuprofen, placebo, and acetaminophen.
DESIGN – Double-dummy, double-blind, randomized, placebo-controlled trial.
SETTING – Emergency department and inpatient units of a large, metropolitan, university-based, children’s hospital in Michigan.
PARTICIPANTS – 37 otherwise healthy children aged 2 to 12 years [Population] with acute, intercurrent, febrile illness [Problem].
INTERVENTIONS – Each child was randomly assigned to receive a single dose of acetaminophen [Intervention] (10 mg/kg), ibuprofen [Intervention] (7.5 or 10 mg/kg), or placebo [Intervention].
MEASUREMENTS/MAIN RESULTS – Oral temperature was measured before dosing, 30 minutes after dosing, and hourly thereafter for 8 hours after the dose. Patients were monitored for adverse effects during the study and 24 hours after administration of the assigned drug. All three active treatments produced significant antipyresis compared with placebo. [Outcome] Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. [Outcome] No adverse effects were observed in any treatment group.
CONCLUSION – Ibuprofen is a potent antipyretic agent and is a safe alternative for the selected febrile child who may benefit from antipyretic medication but who either cannot take or does not achieve satisfactory antipyresis with acetaminophen. [Outcome]
Figure 1: Sample output from our PICO extractors.
The literature also contains studies on sentence-level classification of MEDLINE abstracts for other purposes. For example, McKnight and Srinivasan (2003) describe a machine learning approach to automatically label sentences as belonging to the introduction, methods, results, or conclusion, using structured abstracts as training examples. Note, however, that such labels are orthogonal to PICO frame elements, and hence are not directly relevant to knowledge extraction for clinical question answering. In a similar vein, Light et al. (2004) report on the identification of speculative statements in MEDLINE abstracts.
Other researchers have developed systems that attempt to codify the evidence-based medicine domain model. For example, Cimino and Mendonça studied the MeSH terms associated with the four basic clinical tasks (etiology, prognosis, diagnosis, and therapy) based on an analysis of 4,000 MEDLINE citations (Mendonça & Cimino 2001). The goal is to automatically classify citations for task-specific retrieval, similar in spirit to the Hedges Project (Wilczynski, McKibbon, & Haynes 2001). The study reported good performance for etiology, diagnosis, and in particular therapy, but not for prognosis.
Summarization offers another general approach to building clinical information systems. The PERSIVAL system leverages patient records to generate personalized summaries in response to physicians’ queries (McKeown, Elhadad, & Hatzivassiloglou 2003). If patient information is available, deep semantic processing becomes less important, as PERSIVAL is able to achieve respectable performance with relatively superficial techniques. Although patient information is no doubt important for answering clinical questions, information systems with access to patient records are not widely available. In addition, such tight integration in a real-world clinical setting faces policy concerns and other obstacles.
Our preliminary results have demonstrated the usefulness of knowledge sources in support of question answering. By leveraging existing domain models (in UMLS), software (MetaMap and SemRep), and a task model (the PICO frame), semantic knowledge extraction becomes relatively straightforward. This is evidenced by the respectable performance of our population, problem, and intervention extractors, which use only simple rules. Identifying entities at the conceptual level (i.e., with respect to a semantic class) simplifies the extraction of many elements because there is relatively little ambiguity at the semantic level. Successful identification of outcome statements requires a combination of superficial and semantic features, but our results demonstrate the feasibility of this general task. More research is certainly necessary, both to improve performance and to develop a more principled approach to the problem, but we are encouraged by these preliminary results.
The application of domain models and deep semantic knowledge to question answering has been explored by a variety of researchers (e.g., Jacquemart & Zweigenbaum 2003; Rinaldi et al. 2004), and was also a focus of a recent workshop on question answering in restricted domains at ACL 2004. Our work contributes to this ongoing discourse by offering a specific case study in the clinical domain.

Finally, the evaluation of domain-specific question answering systems remains an open research problem. On this issue, Diekema et al. (2004) offer interesting observations. It is clear that measures designed for open-domain tasks are not appropriate for evaluating systems that operate only on specific domains, but the community has not agreed on a methodology that would allow meaningful comparisons of results from related systems. We believe it might be useful to take cues from advances in the evaluation of multi-document summarization (Nenkova & Passonneau 2004) and definition question answering (Lin & Demner-Fushman 2005).
Conclusion
This paper describes knowledge extraction modules that serve as building blocks for a clinical question answering system. Our work is framed within the broader issue of knowledge resources in domain-specific question answering, and how one might leverage domain models. The preliminary results presented here offer a case study: the recognition of semantic concepts and relations, facilitated by UMLS, MetaMap, and SemRep, simplifies the task of knowledge extraction. We are encouraged by these preliminary results, which demonstrate the feasibility of operationalizing major aspects of evidence-based medicine. Information systems that support the clinical decision-making process could potentially have an immense impact on the quality of patient care.
Acknowledgements
We would like to thank Barbara Few, Susan Hauser, and Malinda Peeples for their participation in the development of the test collection. The first author is supported by an appointment to the National Library of Medicine Research Participation Program administered by the Oak Ridge Institute for Science and Education through an inter-agency agreement between the U.S. Department of Energy and the National Library of Medicine. The second author would like to thank Kiri for her kind support.
References
Ad Hoc Working Group for Critical Appraisal of the Medical Literature. 1987. A proposal for more informative abstracts of clinical articles. Annals of Internal Medicine 106:595–604.
Aronson, A. R. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: The MetaMap program. In Proceedings of the American Medical Informatics Association Annual Symposium, 17–21.
Chambliss, M. L., and Conley, J. 1996. Answering clinical questions. The Journal of Family Practice 43:140–144.
Covell, D. G.; Uman, G. C.; and Manning, P. R. 1985. Information needs in office practice: Are they being met? Annals of Internal Medicine 103(4):596–599.
De Groote, S. L., and Dorsch, J. L. 2003. Measuring use patterns of online journals and databases. Journal of the Medical Library Association 91(2):231–240.
Demner-Fushman, D.; Hauser, S. E.; Ford, G.; and Thoma, G. R. 2004. Organizing literature information for clinical decision support. In Proceedings of the 11th World Congress on Medical Informatics (MEDINFO 2004), 602–606.
Diekema, A. R.; Yilmazel, O.; and Liddy, E. D. 2004. Evaluation of restricted domain question-answering systems. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains.
Ebell, M. H.; Siwek, J.; Weiss, B. D.; Woolf, S. H.; Susman, J.; Ewigman, B.; and Bowman, M. 2004. Strength of Recommendation Taxonomy (SORT): A patient-centered approach to grading evidence in the medical literature. The Journal of the American Board of Family Practice 17(1):59–67.
Ely, J. W.; Osheroff, J. A.; Chambliss, M. L.; Ebell, M. H.; and Rosenbaum, M. E. 2005. Answering physicians' clinical questions: Obstacles and potential solutions. Journal of the American Medical Informatics Association 12(2):217–224.
Gorman, P. N.; Ash, J. S.; and Wykoff, L. W. 1994. Can primary care physicians' questions be answered using the medical journal literature? Bulletin of the Medical Library Association 82(2):140–146.
Jacquemart, P., and Zweigenbaum, P. 2003. Towards a medical question-answering system: A feasibility study. In Baud, R.; Fieschi, M.; Beux, P. L.; and Ruch, P., eds., The New Navigators: From Professionals to Patients, volume 95 of Actes Medical Informatics Europe, Studies in Health Technology and Informatics. Amsterdam: IOS Press. 463–468.
Kauffman, R. E.; Sawyer, L. A.; and Scheinbaum, M. L. 1992. Antipyretic efficacy of ibuprofen vs acetaminophen. American Journal of Diseases of Children 146(5):622–625.
Light, M.; Qiu, X. Y.; and Srinivasan, P. 2004. The language of bioscience: Facts, speculations, and statements in between. In BioLINK 2004: Linking Biological Literature, Ontologies, and Databases, 17–24.
Lin, J., and Demner-Fushman, D. 2005. Automatically evaluating answers to definition questions. Technical Report LAMP-TR-118/CS-TR-4693/UMIACS-TR-2005-03, University of Maryland, College Park.
Lindberg, D. A.; Humphreys, B. L.; and McCray, A. T. 1993. The Unified Medical Language System. Methods of Information in Medicine 32(4):281–291.
McCray, A. T.; Burgun, A.; and Bodenreider, O. 2001. Aggregating UMLS semantic types for reducing conceptual complexity. In Proceedings of the 10th World Congress on Medical Informatics (MEDINFO 2001), 216–220.
McKeown, K.; Elhadad, N.; and Hatzivassiloglou, V. 2003. Leveraging a common representation for personalized search and summarization in a medical digital library. In 3rd ACM/IEEE 2003 Joint Conference on Digital Libraries.
McKnight, L., and Srinivasan, P. 2003. Categorization of sentence types in medical abstracts. In Proceedings of the American Medical Informatics Association Annual Symposium, 440–444.
Mendonça, E. A., and Cimino, J. J. 2001. Building a knowledge base to support a digital library. In Proceedings of the 10th World Congress on Medical Informatics (MEDINFO 2001), 222–225.
Mladenic, D., and Grobelnik, M. 1999. Feature selection for unbalanced class distribution and Naïve Bayes. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999), 258–267.
Nenkova, A., and Passonneau, R. 2004. Evaluating content selection in summarization: The pyramid method. In Proceedings of the 2004 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT/NAACL 2004).
Niu, Y., and Hirst, G. 2004. Analysis of semantic classes in medical text for question answering. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains.
Pratt, W., and Yetisgen-Yildiz, M. 2003. A study of biomedical concept identification: MetaMap vs. people. In Proceedings of the American Medical Informatics Association Annual Symposium, 529–533.
Richardson, W. S.; Wilson, M. C.; Nishikawa, J.; and Hayward, R. S. 1995. The well-built clinical question: A key to evidence-based decisions. American College of Physicians Journal Club 123(3):A12–A13.
Rinaldi, F.; Dowdall, J.; Schneider, G.; and Persidis, A. 2004. Answering questions in the genomics domain. In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains.
Rindflesch, T. C., and Fiszman, M. 2003. The interaction of domain knowledge and linguistic structure in natural language processing: Interpreting hypernymic propositions in biomedical text. Journal of Biomedical Informatics 36(6):462–477.
Sackett, D. L.; Straus, S. E.; Richardson, W. S.; Rosenberg, W.; and Haynes, R. B. 2000. Evidence-Based Medicine: How to Practice and Teach EBM. Churchill Livingstone, second edition.
Sneiderman, C.; Demner-Fushman, D.; Fiszman, M.; and Rindflesch, T. C. 2005. Semantic characteristics of MEDLINE citations useful for therapeutic decision-making. In Proceedings of the American Medical Informatics Association Annual Symposium. Under review.
Wilczynski, N.; McKibbon, K. A.; and Haynes, R. B. 2001. Enhancing retrieval of best evidence for health care from bibliographic databases: Calibration of the hand search of the literature. In Proceedings of the 10th World Congress on Medical Informatics (MEDINFO 2001), 390–393.
Yang, Y., and Pedersen, J. O. 1997. A comparative study on feature selection in text categorization. In Proceedings of the 14th International Conference on Machine Learning, 412–420.