2002InstituteofMathematicsandInformatics,Vilnius
47
ExtractionofObject-orientedSchemasfromExistingRelationalDatabases:aForm-drivenApproach
MimounMALKI,AndréFLORY,MustaphaKamelRAHMOUNI
ComputerScienceDepartment,UniversityofSidiBel-Abbes,AlgeriaLISIInsa-Lyon,29avenueAlbertEinsteinVilleurbanne,FranceComputerScienceDepartment,UniversityofOran,Algeria
e-mail:malki_m@yahoo.com,flory@insa.insa-lyon.fr,rahmouni@mail.univ-oran.dzReceived:May2001
Abstract.Inthispaper,wepresentourForm-drivenapproachforreverseengineeringofrelationadatabases.Thismethodologyusestheinformationextractedfrombothformstructureandinstancesasadatabasereverseengineeringinputusinganinteractionwithauser.Throughacombinationofformsstructuresanddatainstancesanalysis,formsrelationalsub-schemasandtheirconstraintsarederived.Theserelationalsub-schemasaremappedtoobjectsub-schemas,whichwillbemerg-ingintoglobalobject-orientedschemathatpresentsthewholeunderlyingdatabases.Theresultingglobalobject-orientedschemamustbevalidatedasarichandcorrectrepresentationoftheapplica-tiondomain.
Keywords:relationaldatabases,reverseengineering,object-orienteddatabases,forms,integration,validation.
1.Introduction
Inmanyorganizations,therearealargenumberofdatabaseapplicationsthathaveevolvedovermanyyears.Thesesystemsarereferredtoaslegacysystems(Farhrner,1995a;Chi-ang,1994;Hainaut,1995;Soutou,1998)andarecharacterizedbylackofdocumentationandnon-uniformityresultingfromnumerousextensions.Thesepropertiesleadtoinflex-iblesystemsandhighmaintenancecosts.
Reverseengineeringcanhelpbyunderstandinglegacydatabasesandextractingtheirdesignspecification(domainsemantics).Itisthepartofdatabasemaintenanceworkthatprocuresasufficientunderstandingofalegacydatabase,anditsapplicationdomaintoallowappropriatechangestobemade.Morespecifically,databasesreverseengineeringcanbetreatedasaprocessthatobtainsdomainsemanticsaboutalegacydatabase,makesthereverseschematransformation(fromlogicaltoconceptual)and,finally,representstheresultusuallyasaconceptualschema(orobject-orientedschema)(Farhrner,1995b;Behm,2000).
Usersinorganizationsarequiteexperiencedinthehandlingandmanipulationofdatabasesthroughforms.TheCODASYLEndUserFacilitiesCommittee(EUFC)has
48M.Malki,A.Flory,M.K.Rahmoun
recognizedtheimportanceofformsasthemostnaturalinterfacebetweenusersanddataandrecommendstheform-orientedapproachasthemainvehicleforuserinterface(Lefkovitz,1979).Followingtheserecommendations,therehavebeenanumberofauto-matictoolsfordefining,specifying,andimplementingformapplications(Batini,1984;Shu,1985;Yao,1984;Choobinen,1992);querylanguagesanddatabasesystemshavealsobeenextendedtodealwithformsandothertypesofdocumentsusedintheofficeenvironmenttofacilitatethemanipulationofdataviacomputerizedform(e.g.,OracleSQLForms,Informix-SQL,MicrosoftAccessFormsandreportstools).
Theobjectiveofthisresearchistoproposeaform-basedapproachforreverseengi-neeringofrelationaldatabases.Byform,wemeananystructuredcollectionofvariables(formfields)whichareappropriatelyformattedtocommunicatewiththedatabasesespe-ciallyfordataretrievalanddatadisplay.
Thisapproachusestheinformationextractedfrombothformstructureandinstancesasadatabasesreverseengineeringinputusinganinteractionwithauser.Thispropositioncanbesupportedbythefollowingarguments:
–aformmodelisadatamodel.Studyingandanalyzingaformanditsrelationshiptootherformscanrevealmanydatadependenciesandmapping;
–usuallythemostwidelyuseddataaregatheredorreportedinaformand,therefore,importantinformationcanbeobtainedbyanalyzingadatabase’sforms;
–formsarefamiliarandcommonplacesobjects,theyareeasytoreadandunderstand;–normallytheformsareaccompaniedbyinstructions.Theinstructionsprovideaddi-tionalinformationaboutorganization’sdataandtheirprocedural(orbehavior)parts.Ourmethodologyarticulatearoundfivephases(Fig.1):–acquisitionoftheformsstructureandinstances,
–extractionofthedomainsemanticsfromformsanalysisinordertoconstructtheirrelationalsub-schemas,
Fig.1.Phasesofform-drivenapproachofrelationaldatabasesreverseenginnering.
ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases49
–transformationofrelationalsub-schemasinobject-orientedsub-schemas,
–integrationoftheseobject-orientedsub-schemasintoaglobalobject-orientedschema,
–validationoftheresultingglobalobject-orientedschema.
Thepaperisorganizedasfollows.InSection2,relatedapproachesindatabasere-verseengineering,arediscussed.Section3containsthedifferentstagesofformacquisi-tionwhileintroducingtheconceptsoftheirmodelandtheirgraphicdescriptionlanguage.Section4presentstheextractionofdomainsemanticsfrombothformstructureandin-stances,andconstructionoftheirrelationalsub-schemas.Section5describestherulesrelationalsub-schemastransformationinobject-orientedsub-schemas,theintegrationofthesesub-schemasintoaglobalobjectschemaandthevalidationoftheresultingglobalobjectschema.Section6concludesthepaperandsummarizestheresults.
2.RelatedWorks
Severalresearcheshavebeendoneonrelationaldatabasereverseengineering,suggest-ingmethodsandrulesforreverseschematransformationofthelogicaldataschemaofalegacydatabaseintoaconceptual(orobject-oriented)schema.Weclassifythesecontri-butionsintwocategories.
Inthefirstcategory,theextractionphaseofdomainsemanticsisnotpresent.Theseapproachesdealwiththeschematransformation,andhaveverystrongrequirementsforlegacydatabasestobereversingengineered.Forexample,theirprocessrequiresthattherelationsareinatleastthirdnormalform(3NF),aconsistentnamingconventionisappliedtoattributes,andinformationontheavailabilityofinclusiondependenciesandkeysisavailable.FarhnerandVossen(Farhrner,1995a)classifythesemethodsintwoclasses:
–Approaches,inthefirstclass,elicitthesemanticsofthegivenrelationalschemathroughanevaluationofitsinclusiondependencies(notedinds).Eachindisinterpretedwithrespecttotherole(key,partofkey,foreignkey,non-key)ofitsattributes.Mostoftheseapproachesrequireacompletesetofindsandkeysoftherelations;sometakeonlykey-basedindsintoaccount(Casanova,1983;Ji,1991;Markowitz,1990),whileothersconsidergeneralinds(Johanneson,1994;Kalman,1991).
–Approaches,inthesecondclass,deriveaconceptualschemathroughanevaluationofkeys,theirconstructionthroughotherkeys,andadiscoveryofforeignkeys.Severaloftheseapproaches(Navathe,1987;Batini,1992;Chiang,1994;Tari,1997);alsoclassifytherelationofthesourceschemawithrespecttotheconstructionoftheirkeys,e.g.,whethertheprimarykeyofarelationistheconcatenationoftheprimarykeysofotherrelations.Thetransformationisthendoneonthebasisofthisclassification.
Inthesecondcategory,theextractionphaseofdomainsemanticsisconsidered,suchaskeys,functionalandinclusiondependencies,andintegrityconstraintsbyusingmachine-analyzablesources,suchasdatabasesinstancesandqueries(orapplicationpro-grams).Forexample,alegacydatabaseforreverseengineeringnormallycontainsalarge
50M.Malki,A.Flory,M.K.Rahmoun
volumeofdatafromapplicationdomain,sothesedatainstancescanbesourcesforex-tractingdomainsemantics.
–Approachesusingapplicationprogramsarebasedontheuseofquerylanguagestatementstoextract(someof)thesemanticinformationstoredinarelationaldatabase(e.g.,Anderson,1994;Petit,1995).Anderson(1994)analysesequi-joinsstatementswhereas.Petit(1995)analysesauto-joins,setoperationsandwhere-byclausestatementsaswellastheequi-joinstatements.Otherwise,theHainaut’sapproach(Hainaut,1995)analysesthecodeofprograms,mainlythelogicaldescriptionoffilesinCobol,forcom-pletingknowledgeonthedatastructureandontheconstraints.
–Approachesusingextensionoflegacydatabasearebasedontheanalysisofdatainstancestounderstand(someof)thesemanticsofdatabaseapplication.Chiang’sap-proach(Chiang,1994)usestherelationalschemainformationasthebasicinputandaimtoextractthesemanticinformationbyexhaustiveanalysisofeachrelationintheschemaandtheirkeyandnon-keyattributes.AlsoAnderson(1994)andPermerlanietal.(1994)analyzedatainstancestoextractindsfromindicationprovidedbyjoins.Otherwise,in(Tari,1997),theclassificationoftherelationalschemaisbasedontheanalysisofdatain-stances.Independently,Mannilaetal.(1994)proposesomealgorithms(based-heuristics)toextractfunctionalandinclusiondependencies.
Untilrecently,researchesindatabasereverseengineeringdonotuseforms,whichmanipulatedatabases,exceptMfourga’sapproach,asmachine-analyzablesource.In(Mfourga,1997),Mfourgaproposesframeworkforextractinganentity-relationshipmodelfromasetofformmodelschemasofalegacyrelationaldatabase.Formmodelschemasgatherinformation,i.e.,structuralinformationandconstraintsamongdata,ex-tractedfromboththeirstructuresandinstances.Thisapproachusestheseschemastosupplementlow-levelschemasschemestohelpreconstructaentity-relationshipschemas.Ourapproachemphasizestheextractionoftherelationalsub-schemaofform,fromboththeformstructureandthephysicalschemaoftheoperationaldatabase,thatisinfactapivotschemareflectingthesemanticsoftheoperationaldatabaseaswellasthesemanticsofforms.Sothisschemafacilitatestheextractionofdependenciesinordertoconstructtheconceptualobjectschemaoftheunderlyingoperationaldatabase.
3.AcquisitionofForms
Thisphasesupportsthedescriptionofformsandtheirinstancesononehand,and,theautomaticextractionoftheformshierarchicalstructuresinordertofacilitatetheiranal-ysisandinterpretationontheotherhand.Inthisway,weuseourformmodelpresentedhereafter.3.1.FormModel
In(Malki,1999),wehavepresentedageneralmodelofformssuitablefordatabasesre-verseengineeringtask.Themodelallowsabstractinganydatabaseform,thatis,tomakeexplicititscomponents,fieldsaswellasobjects,andtheirinterrelationships.Itplacesa
ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases51
specialemphasisonfindingrelevantconstraintsamongdatabaseformcomponentbyex-ploitingtheforminstances.Thismodelissimilarbutnotidenticaltothemodelspresentedin(Choobineh,1992;Mfourga,1997).
3.1.1.FormType
Aformisastructuredcollectionofvariables(i.e.,fields)appropriatelyformattedtocom-municatewithadatabase.Aformtypedefinesthestructure,constraintsandpresentationoftheformfields.Itrepresentstheformintensionasperceivedbyusers.Aparticularrepresentationofformtypeiscalledaformtemplate.Aformtemplatenotonlyspecifiesarepresentation,butisalsomediumdependent.Thispaperonlyaddressesscreenandpaperforms,whichareusedascommunicationinterfaceswiththedatabase.
Threebasiccomponentsofanytemplatearetitle,captions,andentries.Everyformmusthaveatitle,whichnamestheformandprovidesageneraldescriptionofwhatistobefilledinbytherespondent.ThetitleoftheforminFig.2is“CARDOFWAGE”.Captionsarepreprintedontheform.Theyserveasaclueaswhatistobefilledinbytherespondentaswellasaguidetoenterortoreaditontheform.Anentryistheactualdatathatisenteredbyanoperatorordisplayedbytheformprocessingsystemonthespaceprovided.Theentrydatamaybetext,numeric,bitmaporanyotheruserdefineddatatype.AsanexampleinFig.2,Nameisthecaptionassociatedwiththeentrywiththevalueof“Malki”.
Aforminstanceisaparticularcollectionofvaluesfortheformfields.Whenaformtemplateisfilledwithvalues,itbecomesaninstanceofthatformtype.Fig.2isatemplateof“CARDOFWAGE”formwiththevaluesofaparticularforminstances.
3.1.2.StructuralUnits
Threetypesofformfieldscanbeidentified:atomic,box,andmultiplechoice(table).Anatomicfielddoesnothaveanyconstituent(sub)fields.Abox(ortable)fieldgroupsa
Fig.2.Exampleofform:“Cardofwage”.
52M.Malki,A.Flory,M.K.Rahmoun
numberofotherfields.Aformfieldroughlycorrespondstotheconceptofanattributeindatabaseterminology.
Astructuralunittype(SU),isabox(ortable)ofhomogeneouspiecesofinformation,thatis,anobjectthatgroupscloselyrelatedformfields.EachSUisalogicalsub-partofaformtype.TheFig.2showssomestructuralunitsofthe“CARDOFWAGE”form:EmployeeIdentification,Function/activityandPremium/indemnities.
3.1.3.RelationshipsbetweenStructuralUnits
Weusetheparent-childrelationshipwhichisone-to-many(multi-valued)orone-to-one(single)relationshipbetweentwostructuralunits.OneoftheSUs(alwaystheone-side)iscalledtheparentSU,theother(many-sideorsometimesone-side)iscalledthechildSU.AnoccurrenceofarelationshipconsistsofoneSUoccurrenceoftheparentandoneorseveraloccurrencesofthechildSU.IntheFig.2,therelationshipbetweentheSU“CARDOFWAGE”(parent)andtheSU“Identification”(Child)isone-to-one,i.e.,thata“CARDOFWAGE”isestablishedforonlyoneEmployee.OntheotherhandtherelationshipbetweentheSU“CARDOFWAGE”(parent)andtheSU“Premiums/indemnities”isone-to-many,i.e.,thatanemployeemayreceiveanumberofpremiumsandindemnities.Inthisway,ourmodelpermitsahierarchicalstructuringofformfields.Thishierar-chicalstructurecorrespondstoformtypelogicalstructure.Therootofthetreeistheformtitle.Theothernodesrepresentstructuralunits.
Childrenthathavethesameparentbelongtothesamelevel.Therootlevelisnum-ber0andeachsubsequentlevelisonegreaterthanitsparent’s.TheFig.4showsthehierarchicalstructureoftheform“CARDOFWAGE”.
Onemaynotethattheformmodelincludesalogicalandaphysicalaspect.
Thelogicalaspectcorrespondstotheformdefinition,asitisperceivedbyusers(ordesigners),i.e.,itsintensionalpart.Thisintensionalpartisalogicalstructurethatcon-tainslogicalsub-partscorrespondingtotheobjects(structuralunits)thatcomposeaformviewedasaparent-childhierarchicalstructure.Thelogicalaspectallowsunderstandingboththeformstructureandmeaning.
Thephysicalaspectofaformisrepresentedbyagivenformtemplate.Inthistemplate,thetitleoftheformprovidesageneraldescriptionofwhatistobefilledin.Theplacementofthefieldsisoftenwellstudiedandhomogeneousinformationisingeneralcontiguous.Fieldcaptionsareingeneralmoreexplicitthanunderlyingdatabaseattributes.Thetem-platealsoprovidesinstructionsthatrepresentcontextualknowledgethatisknowledgeaboutobjetsrepresentedontheforms.Therefore,theformphysicalaspectcanbeusefulforidentifyingtheobjectscontainedinaform.
Now,wecandefinethenotionoftheformbaserepresentingalegacy(oroperational)databaseapplication.Thisbasepossessesforeveryform:itsdescription,itshierarchi-calstructure,anditsinstances.Formally,aformbaseisatripletB(D,S,I),whereDrepresentsaformdescription(i.e.,bothphysicalandlogicalaspects),Srepresentsahi-erarchicalstructureofformandIrepresentstheirinstances.
ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases53
3.1.4.GraphicLanguageofFormDescription
In(Malki,1999),wehavepresentedagraphiclanguageofformdescription(notedGLFD).Thislanguage,thatisagraphicinterface,assiststheuser(ordesigner)inspecify-ingformdescriptionfromconceptsoftheformmodelpresentedabove.Theideaconsistsinidentifyingfieldsandstructuralunitsoftheform(logicalaspect)fromitsgraphicrep-resentationorformtemplate(physicalaspect).
WedistinguishtwodefinitionshapesoftheGLFD(graphicanddeclarative)thatarejointlyusable.
3.1.4.1.GraphicDefinition.ItwouldbehelpfultousethegraphicdefinitionoftheGLFDinordertofacilitatetheformdescriptionthroughitsgraphicinterfacewhileus-ingthebaseconceptsoftheformmodel:group(orbox),list(ortable)andfields.Thisgraphicinterfacedescribesphysicalaspectofformwhileconsideringthestructuralunitsasbox(ortable)composedwithtext,graphicandinformation(field).Thisidentificationissometimesenforcedbyseparators(e.g.,blanks,lines).
3.1.4.2.DeclarativeDefinition.Thebaseconceptsoftheformmodelusedbythegraphicinterfaceareexpressedalsounderdeclarativespecifications.Thesedeclarativespecificationsdescribethefeaturescharacterizingeveryelement(field,andstructuralunit)offormreproducedgraphically.
Thesyntaxofthisdeclarativespecificationisthefollowing:
Box: Identify: Composition:{ Table: Identify: Field-ColumnorField-line: Field: Identify: Type:integer/real/Boolean/characterLength: Origin:System/User/computing Letustakeanexampleofthelistspecification“premium&indemnities”of“CARDOFWAGE”(seeFig.3). 54M.Malki,A.Flory,M.K.Rahmoun Fig.3.Exampleofspecifationoftableacquisition. 3.2.InsertionofForms Inthisstep,weuseagraphiclanguagebasedontheformmodelconceptspresentedabove,fordescribingtheformaccordingbothtotheirphysicalandlogicalaspects.Inthesameway,aformdictionaryisusedininteractionwiththedesigner.Thisdictionary,whichwillbeintegratedintheformbase,containsthemeaningoffields(designation,synonym,andrulesofcomputedfield).Thisdictionaryisusedtoinsureapresentationascompleteaspossiblefortheformanalysis. Foreachform,wetakeseveralinstances.Theseinstanceswillbeusedwiththeformaldescriptionofformasmachine-analyzablesourceinordertofacilitatetheextractionofthedomainsemantics.3.3.StructuringofForms Thisprocess,thatexploresbothphysicalandlogicalaspectsofformsdescribedbythegraphiclanguage,identifiesthestructuralcomponentsofformandtheirinterrelationshipsinordertoconstructthehierarchicalstructureoftheform.Itwouldbehelpfultohaveaprecisepictureofthehierarchicalrelationshipsofaforminordertoclearlyunderstanditsmeaningandfacilitatetheextractionofthedomainsemantics. Thisprocedure,whichisanautomaticprocessandtransparenttothedesigner,con-structsthehierarchicalstructureinfoursteps: –definingtherootnodewhosenameistheform’stitle.Allfieldsarereattachedtotherootnode; –transforminginthefirstallstructuralunitsinnodewithlevel1,i.e.,theirparentistherootnode; –identifyingtheparent-childrelationshipsamongstructuralunits.Becausethereareseveralleveloffields(e.g.,0,1,2,3,etc.),thisprocessidentifyrecursivelythesub-structure(orsub-node)accordingtolevelsoftheirfieldsspecifiedintheinsertionprocessofforms; ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases55 Fig.4.Hierarchicalstructureoftheform“CardofWage”. –andsimplifyingthehierarchicalstructurewhileeliminatingsub-structuresthatcon-tainonlythetext(orgraphic)andcalculatedfieldsthatdonotcontaininterestinginfor-mation. Forexample,theform“CARDOFWAGE”isrestructuredinhierarchicalstructure(seeFig.4). 4.ExtractionoftheDomainSemantics Formsareoftenthemostconvenientinterfacesforentering,changing,andviewingdataindatabaseapplications.Therefore,themostwidelyuseddataaregatheredorreportedinforms.Theyaredesignedmostlywithadeliberateattemptatuser-friendlinessandtheydonotnecessarilydirectlymirrorthestructureoftheunderlyingdatabase. Adeliberateeffortofabstraction(i.e.,descriptionandstructuringofforms:seesectionabove)isnecessarilyformakingallthecomponentsoftheformindividuallyaccessible,andexplicatingtheirinterrelationshipsaswellastheconstraints(ordependencies)onthedatatheydisplay.Inthisway,theformsofagivendatabasetakenasawholecanbeusedtorecoverdatabasesemantics,especiallywithlegacysystems.Indeed,withsucholdersystems,databaseapplicationsoftendonotcomewithaconceptualschema,butonly,forexample,witharelationalschema. Intherelationalmodel,semanticsarespecifiednotinthestructure,butthroughcon-straints.Datadependenciesareonekindoftheseconstraints;wherefunctionalandinclu-siondependenciesareofparticularinterest.Nowadays,relationaldatabasesarespreadeverywhere,andmanyapplicationslikereverseengineering,integratedaccessforin-teroperabledatabases,knowledgebasedinterfaces,migrationtonewerdatabaseman-agementsystems(DBMS),andimprovementontheeffectiveutilizationofthedataoftheorganizationrequireadeepknowledgeonthesemanticsoftheseexistingdatabases.Unfortunately,manyrelationalDBMSdonotprovidemuchsupportforspecifyingdatadependencies,andconsequently,thissemanticscannotbefoundintheschema.However,sincedatadependenciesareimplicitintheextensionofthedatabases,asolutiontothisproblemistoextractthedependenciesbyanalyzingtheextension. Thegoalofthisphaseofextractionistoderivetherelationalsub-schemaofformfromtheirhierarchicalstructureandtheirinstancesaccordingthephysicalschemaoftheunderlyingdatabase.Thisprocess,inthefirst,identifyrelationsandtheirprimary 56M.Malki,A.Flory,M.K.Rahmoun keysrespectivelywithregardtobothstructuralunits(ornodes)offormandunderly-ingdatabase,thenextractthefunctionalandinclusiondependenciesthroughboththeirhierarchicalstructureandinstances. Sincetheformsgrouprealobjects(orstructureunits)whicharemanipulatedbyend-user,thisprocessofextractiondoesnotnecessarilyrestructuringtherelationalsub-schemesrepresentingtheseformsontheonehand,andfacilitatesthederivationofre-lationsaswellastheidentificationofobjects(orstructureunits)ontheotherhand.Otherwise,itisnotoftentrue,whenonetakesonlythephysicalschemeofdatabaseasmachine-analyzablesource(Chiang,1994;Petit,1995).Inaddition,theusingofforminstancesreducesthetimeofextractionofdomainsemantics,withregardtoapproachesthatusedatabaseinstancesasmachine-analyzablesource(Chiang,1994;Mannila,1994;Petit,1995). Westartbrieflybyintroducingtheconceptsoftherelationalmodel(Ullman,1984).4.1.RelationalModel Inthissub-section,thebasicrelationalconceptsandterminologyusedinthispaperarereviewed.Detailscanbefoundinanytextbookontherelationalmodel,e.g.,(Ullman,1984;Elmasri,1994). Arelationschemeisapair ArelationforarelationschemeRi(Xi)isasetoftuplesforRi(Xi).IftisatupleforRi(Xi)andYisasubsetofXi,thent[Y]denotesthesubtupleoftcorrespondingtoY.LetRi(Xi)bearelationscheme,riarelationforRi(Xi),andYasubsetofXi.TheprojectionofrionY,denotedbyπY(ri),is{t[Y]/t∈ri}. Afunctionaldependency(notedfd)overRiisastatementoftheformRi:Y→Z,whereYandZaresubsetsofXi.AfunctionaldependencyRi:Y→Zissatisfiedbyriifforanytwotuples,tandt,inri,t[Y]=t[Y]impliest[Z]=t[Z]. AkeyforRiisasubsetKiofXi,suchthatRi:Ki→Xiissatisfiedbyanyriasso-ciatedwithRiandtheredoesnotexistanypropersubsetofKihavingthispropriety.Arelationschemecanhaveseveralcandidateskeysfromwhichoneprimarykeyischosen.Therefore,inarelation,attributescanbeclassifiedintoprimarykey,foreignkeys,whicharekeysinanotherrelationandnon-keyattributes.Aprimarykey(orasub-partofakey)canbeatthesametimeaforeignkey. Inanyalgorithmthatdiscoversfdsfromextensionoftherelations,itisimportanttominimizeredundantwork,thatis,theworkofanalyzingthetuplestoobtainredundantfds,becausetheycanbederivedfromotheronesbyapplyingArmstrongrules(Arm-strong,1974).Thefocusisonthefollowingderivedfds:1.Trivialfdsrule:X→YifY⊆X. ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases57 2.Augmentationrule:X→YthenXW→YZ,ifZ⊆W.3.Transitivityrule:ifX→YandY→Z,then,X→Z. Thepresenceofafdcangiverisetoredundantinformation,whichcanbeeliminatedbyperformingadecomposition.Thisobservationhasledtothedefinitionofseveralnor-malformsofrelationaldatabase(1NF,2NF,3NF,3NFBCK).Inourapproach,wesup-posethatrelationsarein3NF. LetRi(Xi)andRj(Xj)betworelationsschemas.LetriandrjberelationsofRi(Xi)andRj(Xj),respectively.Aninclusiondependency(notedind)isastatementoftheformRi.Y< Letustakeanexampleofrelationalschemaof“Personnel”databasefromwhichweapplyourreverseengineeringapproachwhileanalyzingtheirforms: Employee(emp.id.,name,birth-date,address,sex,m.s.,number-chil.,ssn,accountid,nationality,recruitment-date),Child(emp.id.,name,birth-date,sex)Holiday(hol.id,designation) On-holiday(hol.id,emp.id,begin-date,end-date) Element-Paye(payeid,designation,rate-number,type)Payed(payeid,emp.id,date) Sanction(sanc.id,désigantion,degree)Punished(sanc.id.emp.id,sanction-date)Diploma(dipl.id,designation) Graduate(dipl.id,emp.id.,obtain-date,University)Rank(rankid,designation,category,section,level)Having-rank(rankid,emp.id,begin-date,end-date)Function(func.id,designation) Having-function(func.id,emp.id,dept.id,begin-date,end-date)Departure(dep.id,designation) On-Departure(dep.id,emp.id,date,length) Department(dept.id,designation,emp.id),Bank(bankid,name,address.) 4.2.IdentificationofFormRelations Theformseitherpermittheupdatingofrelationsinunderlyingdatabaseorrepresentaviewthatisajointofrelations.Therefore,eachfieldentryisgenerallylinkedtoanat-tributeofonerelationintheunderlyingdatabase.However,theidentificationofformre-lationsandtheirprimarykeysrespectively,consistsindeterminingtheequivalenceand/orthesimilaritybetweenstructuralunits(nodes)ofhierarchicalstructureandrelationsintheunderlyingdatabase.Thisisabasispointfromareverseengineeringpointofview(Malki,1999). 58M.Malki,A.Flory,M.K.Rahmoun Anodeofaformhierarchicalstructuremaybeeither: –equivalenttoarelationintheunderlyingdatabase,i.e.,thesetwoobjects(nodeandrelation)haveasamesetofattributes; –similartoarelation,i.e.,itssetofattributesisasubsetoftheoneoftherelation;–asetofrelations,i.e.,itssetofattributesregroupsseveralrelationsinunderlyingdatabase. Also,fordependentnodes(orformrelation),primarykeysareformedbyconcatenat-ingtheprimarykeyofitsparentwhititslocalprimarykey. Thisprocessofidentificationissemi-automatedbecauseitrequirestheinteractionwiththeanalysttoidentifyobjetsthatdonotverifyproprietiesofequivalenceandsimi-larity. Whileapplyingthisprocessonthehierarchicalstructureof“Cardofwage”andthephysicalrelationalschemaofunderlyingdatabase,weextractthefollowingrelationalsub-schema: Employee(emp.id.,name,birth-date,address,sex,m.s.,number-chil.,ssn,accountid,nationality,recruitment-date),Payed(payeid,emp.id,date) Having-rank(rankid,emp.id,begin-date,end-date) Having-function(func.id,emp.id,dept.id,begin-date,end-date) 4.3.ExtractionoftheFunctionalDependencies Theextractionoffunctionaldependenciesfromtheextensionofdatabasehasreceivedagreatdealofattention(see,e.g.,Anderson,1994;Castellanos,1993;Mannila,1994;Petit,1995).Theaimoftheextractionoffunctionaldependenciesistoreducethecomplexityofextractionalgorithms,thatisexponentialtimewithregardtothenumberofattributes(lefthandsideoffdnotedlhs),andpolynomialwithregardtothenumberoftuplesofrelationindatabase(righthandsideoffdnotedrhs). Inourapproach(Malki,1999),weintroducetwowaystoreducethetimeforex-tractingfunctionaldependencies.Thefirstonereplacesdatabaseinstanceswithamorecompactrepresentationthatis,theforminstances.Thesecondoneistouseonlythenon-keyattributesaslefthandsideoffunctionaldependencies,because,wesupposethattherelationalschemaisinthe3NF(wheretheprimarykeydeterminesfunctionallytherestofattributes,i.e.,non-keyattributes).Inthiscondition,itispossibletoinferanewfdthroughnon-keyattributesandkeyattributes(i.e.,non-keyattributes→keyattributes).However,weconsidertheutilizationofformsassamplingdatabaseforextractingdomainsemantics. LetLabetheleft-handside(simpleorcomposite)ofafd,thatis,thedeterminant,andRatherighthandside(alwayssimple).Foreverypair(La,Ra)allthetuplesoftherelationmustbeanalyzedtotestthefdcondition. Now,wepresentthefundamentalprinciplesthatpermittooptimizetheprocessforextractingthefds: –ThepossibleLa’sareorderedbynumberofattributes(i.e.,non-keyattributes).ForarelationRwithnon-keyattributesA,B,C,theorderwouldbe:A,B,C,AB,AC,BC, ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases59 ABC.ThisorderguaranteesthatnoLaisanalyzeduntilanysubsetofitscomponentattributeshasbeenanalyzed. –ForeverypossibleLaXthatisanalyzed,eachkeyattributeofRiscandidateasaRaofapossibleFd.However,eachsuchattributeisproposedasapossibleRaonlyifthereisnopreviouslyfoundfdwhoselefthandsideiscontainedbythisLaandwhoserighthandsideisequaltothisRa,otherwise,thepair(La,Ra)representsaredundantfd(byaugmentationrule)andRaisnotapossiblebutapositiverighthandsideforLa.Inthisway,onlyfdswithminimalLa’sareobtained.Furthermore,onceafdhasbeendiscovered,alltransitivefdsthatcanbederivedatthatmomentarecomputed.Withthismechanism,theanalysisoftheextensionoftherelation(sub-relationofform)isavoidedformanyredundantfds. –Foreachpair(La,Ra)representingapossiblefd,theextensionoftherelationisanalyzedtotestifitsatisfiestheFdcondition.Thisconditionischeckedsimultaneouslyforallpossiblefds(i.e.,allpossibleRa)onaLawhilethepairsoftuplesti,tjarebeingretrieved.TheproblemisthattotestthefdconditionforeachdifferentvaluebforLa,everytupleintherelationmustberetrievedinsearchofthesamevaluebfortheconditionofattributescomposingLa.Ifthesearchsucceeds,thenforeachpossibleRathecorrespondingattributesofbothtuplesarecheckedtoseeiftheirvaluesmatchtoo.–ThesortingoftherelationsaccordingLareducesthesearchofRa. Thealgorithmforextractingfdsthroughsub-relationformispresentedin(Malki,2000),onlythemainpartsaregivenbelow. Algorithm1.Extractionofthefunctionaldependencies Input: RS:Relationalsub-schema >FB:Formsbase FD:Setoffds:La→Ra /La:attributesnokeysandRas:keyattributes/FD:=∅ ForeachRelationRi La:=Xi−Ki/thesetofnon-keysattributes/ForeachLak Comp−Ra(X−Lak) /itiskeyattributeunderatomicshapeforwhichLa→Raisnoredundant/IfRathen (Lak)Sortri ,FD)/Testthisfd.accordingtoinstancesofform/Detect-fd(Lak,Ra,ri Endif Transit-fd(Fd)/deductallfds.whileapplyingtheruleoftransitivity/EndforEndforFAlgo 60 Detect-fd(La,Ra,r,FD)i:=1 n:=card(r) WhileRa=∅ori FD:=FD+(La,Ra)EndifEndetect M.Malki,A.Flory,M.K.Rahmoun Whileusingnon-keyattributes,thenumberofcandidatestoextractacanonicalfunc-tionaldependencebecomesthereforelinearwithregardtothenumberofattributes.Thetimecomplexityisthenreducedto(n|X|plnp),wheren=|Xi|,Xisathenon-keyattributeandPisnumberoftuplesinri. Whileapplyingthisalgorithmonthesub-schemaof“CardofWage”andtheirin-stances,onefindstheFd:departmentid→Employeeid.Thesefunctionaldependencies,thatareextracted,canbeusedfornormalizingcorrespondents’relationalsub-schemasin3BKCNF. 4.4.ExtractionofInclusionDependencies Oncetheextractionofthefunctionaldependenciesisachieved,oneproceedstotheex-tractionoftheinclusiondependencies,whichcontainthemajorityoftheinformationfortheidentificationofrelationshipsbetweenrelations.TheproblemwiththeextractionofIndsiswellknown(Chiang,1994;Castellanos,1993;Mannila,1994;Petit,1995).Thenaivemethodofchecking,foreachpartofrelationwithnattributesinthedatabase,allpossibleInds,onebyone,isparticularlyimpossible,becausetherearen!possiblesnoequivalentInds.InordertoreducethesetofpossibleInds,Chiangetal.in(Chiang,1994)andPetitetal.in(Petit,1995)useheuristicsthatonlyformulateIndsbetweenrelations’keyattributes.ThesepossibleindswillbeverifiedbyanalyzingdatainstancesinregardwithInddefinition. Inourapproach(Malki,1999),weformulatepossibleIndsbetweenrelations’keyofrelationalsub-schemaofform.Thetimeofthisprocessismoreoptimizedwithregardtotheotherapproaches((Chiang,1994)and(Petit,1995)),becausethepossibleIndsareverifiedbyanalyzingtheformextensionswhicharemorecompactrepresentationwithregardtothedatabaseextension. ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases61 Theheuristicsemployedbytheextractionprocessforproposingpossibleinclusiondependenciesaredescribedinbelow. –TworelationsRiandRj,havingthesameprimarykeyX,maybelinkedbyaInd:Ri.X< –ForeachformulatedInd,theextractionprocessmustcomparethesetsofvaluesappearinginRi.XandRj.XaccordingtotheInddefinition, –RedundantIndscanbedetectedbytheinferencerulesofinclusiondependencies(projectionandtransitivity)(Elmasri,1994),thatis:IfRi.X< Algorithm2.ExtractionofInclusionDependencies Input: RS:Relationalsub-schema: >FB:Formsbase: .Xi< ofSRSForeachRi ForeachRj=RiofSRS /TestofapossibleInd.joiningtworelationsevenhavingthekeyprimary/ =KjIfKi Test-Ind(Rj.Ki,Ri.Ki,IND)/AccordingtoInstanceofform/ =Foreignkey(Rj)ElseIfKi.Ki,Ri.Ki,IND)Test-Ind(Rj EndIfEndForjEndForiEndalgo Test-Ind(Ri.X,Rj.Y,IND) /itisaInd.noredundant/If(Ri.X,Rj.Y)∈IND /TheprojectionofrionX/ri.X:=πX(ri) /TheprojectionofrjonY/rj.Y:=πY(rj) Ifrj.Y=∅andrj.Yri.X IND:=IND+(Rj.Y< Ifri.X=∅andri.Xrj.Y IND:=IND+(ri.X< EndTest-Ind Inthisalgorithm,attributesofdependenciesaretheprimarykeysandforeignkeys.Thus,thetimecomplexityisreducedtothetestoftheinclusiondependencyontheforminstances. ThesetoftheIndsextractedfromtherelationalsub-schema“Cardofwage”is: Having-rank.rankid«Rank.rankidPayed.payeid«Element-Paye.payeidHaving-function.func.id«function.func.id 5.TransformationofRelationalSchemaintoConceptualObjectSchema Object-orientedtechnologyisbeingwidelyusedinsoftwaredevelopmentthesedays.Theobject-orienteddatamodelisgrowinginpopularityintheareaofdatabasedevelopment.Consequently,thereisalsoagrowinginterestinre-engineeringrelationaldatabasestoobject-orienteddatabases.Reverseengineeringofrelationaldatabasesisconcernedwiththetaskofextractingstructurescorrespondingtoaparticularconceptualmodelsuchastheentity-relationship(ER)model,theextendedentity-relationship(EER),theobject-oriented(OO)model,andsoon(Behm,2000;Farhrner,1995b;Ramannathan,1997;Tari,1997). Thefactthatourapproachusesdatabaseformsasmachine-analyzablesource,im-pliesthatwefirsttransformtherelationalsub-schemasofformsintoobject-orientedsub-schemas.Inthefollowing,wemergethesetofobject-orientedsub-schemasinaglobalobject-orientedschemarepresentingtheconceptualschemaoftheunderlyingdatabase.Theresultingglobalobject-orientedschemamustbevalidatedasarichandcorrectrepre-sentationoftheapplicationdomain.Asatargetmodel,weusetheobject-orientedmodelcalledHCB(Ayache,1995)developedinourlaboratory.Itispresentedhereafter.5.1.TheHCBModel TheHCBmodelisanobject-orientedmodelwithathree-dimensionalrepresentationofobjects.HCBisfirstamodelofrepresentation,becauseitallowsthespecificationoftheobjectscheme,followingthreeplanes(Fig.5a):structuring,communicationandinheri-tance,whereeachcomponentisdescribedseparatelyandrepresentstheprojectionoftheobjectonthecorrespondingplane.Asimplerepresentationofobjectsgives,ingeneral,adiagramcomposedofheterogeneousinformationanddifferentlinktypes.Thecompre-hensionandtheexploitationofsuchschemaarenoteasybecauseofthesuperpositionofthedifferentperspectives.TheHCBmodelaimsfirsttoofferanobjectschema,whichiscoherentandeasilyexploitable.ThemodeliscalledHCBbecauseeachperspective ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases63 Fig.5.a)ThethreeplanesofthemodelHCB;b)Representationofaclass. representsaspecificverb.ThestructureismodeledbytheverbHavebecauseitspeci-fieswhattheobjecthas.ThebehaviorismodeledbytheverbCommunicate.Finally,theinheritanceperspectiveismodeledbytheverbBebecauseitexpresswhattheobjectis.5.1.1.ConstructorsoftheHCBModel Thestructuringofinformationisdoneusingthefollowingconstructorstobuildcomposedobjecttypes(Fig.5b.): –tuple:ifaclasscontainscomponentofdifferenttypes,thoselatterwillbestructuredbythetupleconstructors.Thetupleisastructureofheterogeneousobjects,whereorderisnotsignificant.Thestructurecanberecursivesinceatuplemaybeusedtoconstructanothertuple. –set:thesetconstructorisamechanismallowingthecreationofa“set-object”fromhomogeneouscomponentobjectsofsametype,wheretheorderofobjectmemberisnotsignificant. –list:thelistconstructorallowsthegroupingtogetherofhomogeneousobjects,wheretheorderissignificantandanelementmayberepeated. 5.1.2.Composition,AggregationandInheritance Thecompositionlinksintroduceasemanticofcardinalitybetweenanobjectanditscom-ponents(attributes).Tokeepthisinformation,weproposethefollowingrepresentation:–asingle-valuedlinkisrepresentedbyasimplearrow(noted→), –amulti-valuedlinkisrepresentedbyadoublearrow(noted→→),andweuseatriplearrowwhenthevaluesformalist(noted→→→), –alinkofinheritance(i.e.,linkofspecialization/generalization)isrepresentedbyadoublelinksorientedfromsubclasstosuperclass, –inaddition,theconceptofroleallowsthemodelingofcrossedreferencesbetweenentitiesofthesystem.TheHCBmodelgeneralizesthenotionofroleandusesitanytimethattheattributevalueisareferencetoanobjectbelongingtoanotherclass(e.g.,Empidofdepartmentclass:Fig.6a). Thedynamicpartofanobjectoverlaystwoaspects,calculation(i.e.,methodswithcommunicationmessages)andbehavior(i.e.,lifecycle).Wedistinguishtwocategoriesofmethods: –interfacesmethods,whichallowaccesstovaluesofattributes.Wecanwrite,read,modify,andperformotheraccessfunctionsonattributevalues; 64M.Malki,A.Flory,M.K.Rahmoun Fig.6.a)Binaryassociation;b)N-aryassociation. –calculationmethodsallowingcomputationofattributevalues.Calculationmethodssendmessagestoaccessinstances. 5.2.TransformingFormRelationalSub-SchemaintoObject-OrientedSub-Schema M1 definedonadatamodelM1toaFormally,aschemamappingofasourceschemaS1 M2 definedonamodelM2maybedefinedasatransformationTsuchtargetschemaS2 M1M2 thatT(S1)=S2.Thetransformationsareusuallyacollectionofrulesthatmapcon-ceptsfromthesourcedatamodeltothetargetdatamodel(Hainaut,1995;Ramannathan,1997). Theproposedmethodologyoftransformation,whichisamixtureofkey-basedandInd-basedapproaches(Farhrner,1995a;Chiang,1994;Petit,1995),consistsinidentify-ingthecomponentsoftheobjectsub-schemaintheHCBmodelthroughtherelationalsub-schema.However,thisprocessthatconcernsallthreeplanesofHCBmodel(i.e.,structuring,andinheritance)permitsthat: –theidentificationofobjectclassesandtheircompositionrelationshipswillbepre-sentedinthestructuringplane; –theidentificationofinheritancerelationshipswillbepresentedintheinheritanceplane; –andtheextractionofthedynamicaspectofobjectclassesfromtheformswillbepresentedinthecommunicationplane.Thisprocesswillbeinvestigatedsubsequentlyinotherwork. 5.2.1.IdentificationofObjectClasses Inordertosimplifythetranslationprocess,weassumethatallrelations(orrelationscheme)areinthe3NF.Thereasonforthisistoensurethateachrelationcorrespondstoexactlyanobjectclass.Theseobjectclasseshavethesameattributesasthosecontainedintherelations.Thus,wedefineaclass-relationtobearelationthatdirectlymapstoanobjectclass. 5.2.2.IdentificationofAssociations IntheHCBmodel,weidentifytwotypesofassociations:binaryandn-aryassociations.Theseassociationsarerepresentedbyreferencelinks(i.e.,roleattribute)andassociationclassesrespectively. ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases65 5.2.2.1.BinaryAssociations.Theforeignkeysofclass-relationandthecorrespondingIndsidentifyabinaryassociationbetweenclass-relations.Therefore,thisreferentiallinkistranslatedinroleattributeintheHCBmodel.Thetargetwillbe,ingeneral,aroleattributetypedbytheotherclass.Astheassociationisnotdirected,thentheobtainedrepresentedcomprisescrossedreferences,asshowninthefollowingschema(Fig.6a).Whileapplyingthistransformationruleonthetwoclass-relationsandtheirIndsbe-low,wegeneratethefollowingobjectschema(seeFig.6a). Employee(emp.id.,emp.name,birth-date,address,sex,m.s.,number-children,ssn,accountid,nationality,recruitment-date), Department(deptid,design,empid), IND:Department.empid«Employee.empid. 5.2.2.2.n-aryAssociations.Everyclass-relationwhoseprimarykeyisentirelycom-posedofforeignkeysisaclassthatrepresentsanassociation(notedassociation-class)betweenalltheclassescorrespondingtotheclass-relationthatforeignkeysreferto.Thisn-aryassociationmaybetranslatedbyanattributetuple.Inthistransformationstep,anoptionmustbetakenandthepossibilityislettotheanalysttochooseoneofthetransformationrules.Althoughthecreationofassociation-classtransformingthen-aryassociationcanbecomeindispensable. TherelationHaving-function(funcid,empid,deptid,beginning-date,end-date)istranslatedintoAssociation-classasshowinFig.6b. 5.2.3.IdentificationofInheritanceRelationships Twoformsofrelationalstructuresmayindicateaninheritancerelationship: –Inthefirstcase,everypairofclass-relations(CR1,CR2)thathavethesamepri-marykey(notedX)andthecorrespondingInds(i.e.,CR1.X< Fig.7.a)ontworelationsb)ononlyonerelation. 66M.Malki,A.Flory,M.K.Rahmoun nullvalueforrankattributeforalladministrativeemployeesandnullvaluesforFunctionattributeforallteacheremployees.Thisisanexampleofinstancesoftwosubclassesbeingstoredinthesinglerelation. Theidentificationofthisformofinheritancerelationraisestheknowledgedomainfromdatabase(Piatetsky,1991).Thereareseveralknowledgediscoveryalgorithmsthatcanidentifyrulesfromrelationaldatabase(Piatetsky,1991).Byusingoneofthosealgo-rithmsontheEmployeerelation,forinstance,wecanidentifythefollowingrules:EmpType=“T”=⇒Function=Null(1)EmplType=“A”=⇒Rank=Null(2) Suchtypesofrulesaretypicallycalled“strongrules”sincetheyholdforallthein-stancesoftherelation.Ineffect,thesealgorithmscanbeusedtoidentifythediscrimi-nateattributeofthegeneralizationbyfindingruleswhoserighthandsideisoftheformA=NullwhereAissomeattributename.Followingisthegeneralizedprocedureforidentifyingsub-classesoncethecharacteristicruleshavebeenidentified. Asanexample,therules((1)and(2))permittoidentifythehierarchyofgeneralizationwithtwosub-classes(seeFig.7b). Regardingthetransformationrulespresentedabove,weproposeanalgorithmthattransformsaformrelationalsub-schemaintoanobject-orientedsub-schema.Algorithm3.Transformationofarelationalsub-schemaintoOOsubschema Input: RS:Relationalsub-schema: >FB:Formsbase: OS:Object-orientedsub-schema: relationForeachRi Create_classCi C:=C+Ci/eachrelationbecomesaClass/ toCi/Attachment(Xi,Ci)/AttachmentofallattributesofRi Endfor relationForeachRi .YjIf∃Ind:Ri.Xi< H:=H+(Ci−isa→Cj)/Aninheritanceoftwoclassrelations(Ci,Cj)/ ∈K/Xi⊂kiElseIf∃ki Ifkigroupstogetherapartitionofnforeignkeyreferencingforexamplethefollowingrela-tions:Ri,1...Ri,n create_association_classCa C:=C+Ca/Creationofanassociationclass(i.e.,Associationn-ary)/ :Cl/Classparticipanttotheassociation/ForeachRi,l create_attributeTlofCa/typedRolebyCl/ ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases67 Fig.8.Object-orientedsub-schemaof“Cardofwage”. L:=L+(Ca–role→Cl)Endfor ElseIfXiki create_attributeTi/typedRoleCj/ L:=L+(Ci-role→Cj)/generationofabinaryassociation(role)/create_attributeTj/typedRoleCi/L:=L+(Cj-role→Ci) ElseIdentifythestrongrules:(Xa=value)=⇒Xi=Nullcreate_super-classChwhoseasetofattributes=X−∪XiForeachXi create_subclassCh,i H:=H+(Ch,i-isa→Ch)/AninheritanceofonlyoneRelation/EndforEndifEndforEndalgo Fig.8.presentsthetransformationoftherelationalsub-schemaof“cardofwage”intoanobject-orientedsub-schema. 5.3.IntegrationofObject-OrientedSub-Schemas Inthisphaseofreverseengineeringusingformsasmachine-analyzablesource,object-orientedsub-schemasarederivedfromrelationalsub-schemas.Theseobjectsub-schemaswillbemergingintoaglobalobject-orientedschemathatrepresentsthewholeunderly-ingdatabase.However,weapplythetechniquesofintegrationschema.Weassume,inagreementwith(Batini,1986)thattheintegrationschemaprocessconsistsintwophases:comparisonandconformingofschemas,andmergingandrestructuringofschemas. 68M.Malki,A.Flory,M.K.Rahmoun Thecomparisonphaseperformsaparitiescomparisonofobjects(ofthesub-schemas)andfindspossibleobjectspairs,whichmaybesemanticallysimilarwithrespecttosomeproprieties,suchassynonyms(nameofattributeandclass)ofequalprimarykeyattributeandequivalentofclasses.Theconformingisavarietyofanalystsassistedtechniquesthatareusedtoresolveconflictsandmismatchedobjects. Themergingandrestructuringphasegeneratesanintegratedschemafromtwocom-ponentschemasthathavebeencompared.Theintermediateresultsareanalyzedandre-structuredinordertoeliminatethesymmetricalandtransitiverelationshipbetweenob-jects. Inaddition,weconsideronlyonepairofschemasatatime;further,theresultofintegrationschemaisaccumulatedintoasingleschema,whichevolvesgraduallytowardstheglobalobjectschema.Formoredetailssee(Malki,2000).5.4.Validation Thekeyquestiontobeansweredbyvalidationis:howfaithfullywillalegacydatabaseberepresentedbytheglobalobjectschemaproducedfromreverseengineeringprocess?Aschemaiscorrectifallconceptsoftheunderlyingmodelareusedcorrectlywithrespecttosyntaxandsemantics(Chiang,1994;Farhrner,1995b;Hainaut,1995).Ifalldatabaseinstancesthatcanberepresentedinthesourceschemacanalsoberepresentedinthetargetdatabaseschemaandviceversa,theprocessisinformationpreserving. Ourvalidationprocessiscompletelyinteractive.Humaninterventionwillberequiredintheverificationandconformationoftheresults.Inthisprocess,thatisnotdeterminist,someconceptualconstructionwillbeexaminedeithertobevalidatedortobeputbackinreason.However,itconsistsinidentifyingthehiddenobjects,reiteratingtherestructuringprocessinordertoeliminatethesymmetryandthetransitivityofcompositionandinheri-tancerelationshipsandinsuringtheconsistencyoftheglobalobject-orientedschema.Formoredetails,see(Malki,2000).6.Conclusion Amethodologyforperformingreverseengineeringofarelationaldatabasetowardsanobject-orientedsystemisproposed.Ourapproachthatrequirestheinteractionwithusers(i.e.,analystordesigner)usestheinformationextractedfromformstructuresandin-stancesasmachine-analyzablesourcefordatabasereverseengineering. Thismethodologydoesnecessarilyamodelthatallowsabstractingdatabaseformsandaframeworkthatallowstoanalyzingformsinordertoextractstructureinforma-tionaswellasdomainsemantics.Formsareusedincombinationwithothersemanticsources,especiallyrelationaldatabaseschemaanduses“head-knowledge”.Theissueswithregardtothevalidationandverificationoftheextractionandtransformingprocessarealsodiscussed. Anexperimentationofthismethodhasbeenachievedinthesettingofthe“PersonnelManagement”systemofouruniversity.Thisapproachcansupplementexistingdatabasereverseengineeringtechniqueswhereformsconstituteimportantusesofthedatabase. ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases69 Thereareanumberofpossibleextensionstothisresearch: –wehaveusedtechniquesofdataminingforidentifyinginheritancerelationships;thesetechniquescanbegeneralizedforextractingdatadependencies; –forthemigrationtowardObject-orientedsystems,wehavemadeonlytheextractionofthestaticaspect(i.e.,datasemanticandconceptualstructure),itremainstoextractthedynamicaspectinordertotranslatedataandprogramsintoObject-Orientedsystem.Acknowledgments Wewouldliketothanktheanonymousrefereesfortheirinsightfulcommentsandsugges-tionsforimprovingthispaper.ThisresearchwassupportedinpartbyNationalScientificResearchCouncil,Algeria,undercontractNo:B2205/-/11/99.References Anderson,M.(1994).ExtractingaE.R.schemafromarelationaldatabasethroughreverseengineering.InProc.ofthe13thInter.Conf.ontheERA’94,pp.403–419. Armstrong,W.(1974).Dependencystructuresofdatabaserelationships.IFIPCongress,580–583. Ayache,M.,A.Flory(1995).Agenerationprocessofobject-orienteddatabasefromE/Rschema.InProc.ofInt.Conf.ofSoft.Eng.&Know.Eng.,RockvilleMaryland. Batini,C.etal.(1984).Amethodologyforconceptualschemadesignofofficedatabase.InformationSystem,9(3). Batini,C.,M.Lenzerini,S.B.Navathe(1986).Acomparativeanalysisofmethodologiesfordatabaseschemaintegration.ACMComputingSurveys,18. Batini,C.,S.Ceri,S.B.Navathe(1992).ConceptualDatabaseDesign:anEntityRelationshipApproach.Ben-jaminCummings Behm,A.etal.(2000).Algebraicdatabasemigrationtoobjecttechnology.InProc.ofInt.Conf.ER’00,pp.440–453. Casanova,M.A.,J.E.A.DeSà(1983).Designingentity-relationshipschemasforconventionalinformationsys-tem.InProc.of3thConf.OntheEntity-RelationshipApproachCalifornia,pp.265–277. Castellanos,M.,F.Saltor(1993).Extractionofdatadependencies.InProc.of3thEuropean-JapaneseSeminaronInformationModellingandKnowledgeBases. Chiang,R.H.L.,T.M.Barron,V.C.Story(1994).Reverseengineeringofrelationaldatabases:extractionofanEERmodelfromarelationaldatabase.DataandKnowledgeEngineering,10(12). Choobineh,J.(1992).Aform-basedapproachfordatabaseanalysisanddesign.CommunicationoftheACM,35(2). Elmasri,R.,S.Navathe(1994).FundamentalsofDatabaseSystems.BenjaminCumming,2ndedition. Farhrner,C.,G.Vossen(1995).Asurveyofdatabasedesigntransformationbasedontheentity-relationshipmodel.DataandKnowledgeEngineering,15,213–250. Farhrner,C.,G.Vossen(1995).Transformingrelationaldatabaseschemasintoobject-orientedshemasaccord-ingtoODMG-93.InProc.ofthe4thInter.Conf.onDeductiveandObject-OrientedDatabase.Singapour.pp.429–446. Hainaut,J.L.,V.Englebert,J.Henrard,J.M.Hict,D.Roland(1995).Requirementsforinformationsystemreverseengineeringsupport.InProc.oftheIEEEWorkingConf.onReverseEngineering.Toronto. Ji,W.(1991).Analgorithmconvertingrelationalschemastonestedentity-relationshipschemes.InProcof10thint.Conf.onERA’91.SanMateo.pp.231–246. Johanneson,P.(1994).Amethodfortranslatingrelationalshemasintoconceptualschemas.InProc.ofthe10thIEEEint.Conf.onDataEngineering.Houston. Lefkovitz,H.C.(1979).AstatusreportontheactivityiesoftheCODASYLenduserfacilitiescommittee(EUFC).SIGMODRec.,10(2). 70M.Malki,A.Flory,M.K.Rahmoun Kalman,K.(1991).Implementationandcritiquesofanalgorithmwhichmapsarelationaldatabasetoacon-ceptualmodel.InProc.ofthe3thInter.Conf.onComputerAidedSotwareEngineering. Mfourga,N.(1997).Extractingentity-relationshipschemasfromrelationaldatabases:aform-drivenapproach.InProc.ofWorkingConf.onReverseEngineeringWCRE’97. Malki,M.,M.Ayache,M.K.Rahmouni(1999).Rétro-ingénieriedesBasesdeDonnéesRelationnelles:Ap-procheBaséesurl’AnalysedeFormulaires.InProc.ofColloqueofINFORSID’99.Toulon,France. Malki,M.,A.Flory,M.K.Rahmouni(2000).Rétro-ingénieriedesapplicationsdegestionàpartirdel’analysedesformulaires.ResearchReportLRISI,04/2000Univ.SidiBelAbbes. Mannila,H.etal.(1994).TheDesignofRelationalDatabases.Addison-Wesleypublishing. Markowitz,V.M.(1990).Identifyingextentedentity-relationshipobjectstructuresinrelationalschemas.IEEETrans.onSoft.Engineering,16(8). Navathe,S.B.,A.M.Awong(1987).Abstractingrelationalandhierarchicaldatawithasemanticdatamodel.InProc.of6thInt.ConfonERA’87,pp.303–336. Petit,J.M.,F.Toumani,J.Kouloumdjian(1995).Relationaldatabasereverseengineering:amethodbasedonQueryanalysis.Inter.JournalofCooperativeInformationSystem,4(2,3),287–316. Piatetsky-Shapiro,G.,W.Frawley(1991).KnowledgeDiscoveryinDatabases.AAAIPress/TheMITPress,California. Premerlani,W.J.,M.Blaha(1994).Anapproachforreverseengineeringofrelationaldatabases.CommunicationofACM,37(5),42–49. Ramannathan,S.,J.Hodges(1997).Extractionofobject-orientedstructuresfromexistingrelationaldatabases.SIGMODRecord,26(1). Shu,N.C.(1985).FORMAL:Aforms-oriented,visual-directedapplicationdevelopmentsystem.Computer,18(8). Soutou,C.(1998).Relationaldatabasereverseengineering:algorithmtoextractcardinalityconstraints.Data&KnowledgeEngineering,28,161–207. Tari,Z.etal.(1997).Thereengineeringofrelationaldatabasesbasedonkeyanddatacorrelation.InProc.ofInt.Conf.ofIFIP’97.PublishedbyChapman&hall. Ullman,J.D.(1984).PrinciplesofDatabasesSystems.Computersciencespress. Yao,S.B.etal.(1984).Formanager:anofficeformsmanagementsystem.ACMtrans.On.InformationSystem,2(3). ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases71 M.MalkireceivedtheengineerdegreeincomputersciencefromNationalInstituteofComputerScienceAlgiers,andtheMasterofScienceDegreeinComputerSciencefromUniversityofSidiBel-AbbesAlgeria,in1983and1992respectively.HeiscurrentlyaPh.D.candidateintheComputerScienceDepartmentattheSidiBel-AbbesUniversity.HeisaseniorlecturerintheComputerScienceDepartmentattheSidiBel-AbbesUniver-sity.Hisresearchinterestsincludesemanticdatamodeling,databasereverseengineering,informationsystemreengineering,andbuildingweb-basedapplications. A.FloryreceivedthePh.D.degreeincomputersciencefromClaudeBernardUniversity,LyonFrance,in1977.HeisaprofessorintheComputerScienceDepartmentattheNa-tionalInstituteoftheAppliedSciencesofLyon.Hisresearchinterestsareengineeringofinformationsystem,databasereverseengineering,documentaryinformationengineering,medicalinformationsystems. M.K.RahmounireceivedthePh.D.degreeinoperationalresearchfromSouthamptonUniversityUK,in1987.HeisaprofessorintheComputerScienceDepartmentattheUniversityofOranAlgeria.Hisresearchinterestsincludeformalspecifications,method-ologiesofspecificationintheinformationsystems,databasereverseengineering. 72M.Malki,A.Flory,M.K.Rahmoun Objektiniuschemuformavimasišesamurealiaciniubaziuvadovaujantisformomis MimounMALKI,AndréFLORY,MustaphaKamelRAHMOUNI Straipsnyjenagrin˙ejamasuždavinys,kaip,vadovaujantisvartotojuipateikiamomisrezultatu duomenubaziuschemas.Sprendžiantšiuždavinisi¯ulomapasinaudotiformomis,atkurtireliaciniu formustrukt¯ura,irjuturiniu(konkreˇciaisduomenimis).Šitaipgalimaatkurtirealiaciniuirrezultatu poschemiusirjuribojimus.Gautusrealiaciniusposchemiussi¯ulomatransformuotiiobjek-baziu estiniusposchemiusir,agreguojantobjektiniusposchemius,suformuotiatitinkamosduomenubaz˙ visaobjektineschema. 因篇幅问题不能全部显示,请点此查看更多更全内容