您的当前位置:首页正文

Extraction of object-oriented schemas from existing relational databases a form-driven appr

2024-06-02 来源:易榕旅网
INFORMATICA,2002,Vol.13,No.1,47–72

2002InstituteofMathematicsandInformatics,Vilnius

47

ExtractionofObject-orientedSchemasfromExistingRelationalDatabases:aForm-drivenApproach

MimounMALKI,AndréFLORY,MustaphaKamelRAHMOUNI

ComputerScienceDepartment,UniversityofSidiBel-Abbes,AlgeriaLISIInsa-Lyon,29avenueAlbertEinsteinVilleurbanne,FranceComputerScienceDepartment,UniversityofOran,Algeria

e-mail:malki_m@yahoo.com,flory@insa.insa-lyon.fr,rahmouni@mail.univ-oran.dzReceived:May2001

Abstract.Inthispaper,wepresentourForm-drivenapproachforreverseengineeringofrelationadatabases.Thismethodologyusestheinformationextractedfrombothformstructureandinstancesasadatabasereverseengineeringinputusinganinteractionwithauser.Throughacombinationofformsstructuresanddatainstancesanalysis,formsrelationalsub-schemasandtheirconstraintsarederived.Theserelationalsub-schemasaremappedtoobjectsub-schemas,whichwillbemerg-ingintoglobalobject-orientedschemathatpresentsthewholeunderlyingdatabases.Theresultingglobalobject-orientedschemamustbevalidatedasarichandcorrectrepresentationoftheapplica-tiondomain.

Keywords:relationaldatabases,reverseengineering,object-orienteddatabases,forms,integration,validation.

1.Introduction

Inmanyorganizations,therearealargenumberofdatabaseapplicationsthathaveevolvedovermanyyears.Thesesystemsarereferredtoaslegacysystems(Farhrner,1995a;Chi-ang,1994;Hainaut,1995;Soutou,1998)andarecharacterizedbylackofdocumentationandnon-uniformityresultingfromnumerousextensions.Thesepropertiesleadtoinflex-iblesystemsandhighmaintenancecosts.

Reverseengineeringcanhelpbyunderstandinglegacydatabasesandextractingtheirdesignspecification(domainsemantics).Itisthepartofdatabasemaintenanceworkthatprocuresasufficientunderstandingofalegacydatabase,anditsapplicationdomaintoallowappropriatechangestobemade.Morespecifically,databasesreverseengineeringcanbetreatedasaprocessthatobtainsdomainsemanticsaboutalegacydatabase,makesthereverseschematransformation(fromlogicaltoconceptual)and,finally,representstheresultusuallyasaconceptualschema(orobject-orientedschema)(Farhrner,1995b;Behm,2000).

Usersinorganizationsarequiteexperiencedinthehandlingandmanipulationofdatabasesthroughforms.TheCODASYLEndUserFacilitiesCommittee(EUFC)has

48M.Malki,A.Flory,M.K.Rahmoun

recognizedtheimportanceofformsasthemostnaturalinterfacebetweenusersanddataandrecommendstheform-orientedapproachasthemainvehicleforuserinterface(Lefkovitz,1979).Followingtheserecommendations,therehavebeenanumberofauto-matictoolsfordefining,specifying,andimplementingformapplications(Batini,1984;Shu,1985;Yao,1984;Choobinen,1992);querylanguagesanddatabasesystemshavealsobeenextendedtodealwithformsandothertypesofdocumentsusedintheofficeenvironmenttofacilitatethemanipulationofdataviacomputerizedform(e.g.,OracleSQLForms,Informix-SQL,MicrosoftAccessFormsandreportstools).

Theobjectiveofthisresearchistoproposeaform-basedapproachforreverseengi-neeringofrelationaldatabases.Byform,wemeananystructuredcollectionofvariables(formfields)whichareappropriatelyformattedtocommunicatewiththedatabasesespe-ciallyfordataretrievalanddatadisplay.

Thisapproachusestheinformationextractedfrombothformstructureandinstancesasadatabasesreverseengineeringinputusinganinteractionwithauser.Thispropositioncanbesupportedbythefollowingarguments:

–aformmodelisadatamodel.Studyingandanalyzingaformanditsrelationshiptootherformscanrevealmanydatadependenciesandmapping;

–usuallythemostwidelyuseddataaregatheredorreportedinaformand,therefore,importantinformationcanbeobtainedbyanalyzingadatabase’sforms;

–formsarefamiliarandcommonplacesobjects,theyareeasytoreadandunderstand;–normallytheformsareaccompaniedbyinstructions.Theinstructionsprovideaddi-tionalinformationaboutorganization’sdataandtheirprocedural(orbehavior)parts.Ourmethodologyarticulatearoundfivephases(Fig.1):–acquisitionoftheformsstructureandinstances,

–extractionofthedomainsemanticsfromformsanalysisinordertoconstructtheirrelationalsub-schemas,

Fig.1.Phasesofform-drivenapproachofrelationaldatabasesreverseenginnering.

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases49

–transformationofrelationalsub-schemasinobject-orientedsub-schemas,

–integrationoftheseobject-orientedsub-schemasintoaglobalobject-orientedschema,

–validationoftheresultingglobalobject-orientedschema.

Thepaperisorganizedasfollows.InSection2,relatedapproachesindatabasere-verseengineering,arediscussed.Section3containsthedifferentstagesofformacquisi-tionwhileintroducingtheconceptsoftheirmodelandtheirgraphicdescriptionlanguage.Section4presentstheextractionofdomainsemanticsfrombothformstructureandin-stances,andconstructionoftheirrelationalsub-schemas.Section5describestherulesrelationalsub-schemastransformationinobject-orientedsub-schemas,theintegrationofthesesub-schemasintoaglobalobjectschemaandthevalidationoftheresultingglobalobjectschema.Section6concludesthepaperandsummarizestheresults.

2.RelatedWorks

Severalresearcheshavebeendoneonrelationaldatabasereverseengineering,suggest-ingmethodsandrulesforreverseschematransformationofthelogicaldataschemaofalegacydatabaseintoaconceptual(orobject-oriented)schema.Weclassifythesecontri-butionsintwocategories.

Inthefirstcategory,theextractionphaseofdomainsemanticsisnotpresent.Theseapproachesdealwiththeschematransformation,andhaveverystrongrequirementsforlegacydatabasestobereversingengineered.Forexample,theirprocessrequiresthattherelationsareinatleastthirdnormalform(3NF),aconsistentnamingconventionisappliedtoattributes,andinformationontheavailabilityofinclusiondependenciesandkeysisavailable.FarhnerandVossen(Farhrner,1995a)classifythesemethodsintwoclasses:

–Approaches,inthefirstclass,elicitthesemanticsofthegivenrelationalschemathroughanevaluationofitsinclusiondependencies(notedinds).Eachindisinterpretedwithrespecttotherole(key,partofkey,foreignkey,non-key)ofitsattributes.Mostoftheseapproachesrequireacompletesetofindsandkeysoftherelations;sometakeonlykey-basedindsintoaccount(Casanova,1983;Ji,1991;Markowitz,1990),whileothersconsidergeneralinds(Johanneson,1994;Kalman,1991).

–Approaches,inthesecondclass,deriveaconceptualschemathroughanevaluationofkeys,theirconstructionthroughotherkeys,andadiscoveryofforeignkeys.Severaloftheseapproaches(Navathe,1987;Batini,1992;Chiang,1994;Tari,1997);alsoclassifytherelationofthesourceschemawithrespecttotheconstructionoftheirkeys,e.g.,whethertheprimarykeyofarelationistheconcatenationoftheprimarykeysofotherrelations.Thetransformationisthendoneonthebasisofthisclassification.

Inthesecondcategory,theextractionphaseofdomainsemanticsisconsidered,suchaskeys,functionalandinclusiondependencies,andintegrityconstraintsbyusingmachine-analyzablesources,suchasdatabasesinstancesandqueries(orapplicationpro-grams).Forexample,alegacydatabaseforreverseengineeringnormallycontainsalarge

50M.Malki,A.Flory,M.K.Rahmoun

volumeofdatafromapplicationdomain,sothesedatainstancescanbesourcesforex-tractingdomainsemantics.

–Approachesusingapplicationprogramsarebasedontheuseofquerylanguagestatementstoextract(someof)thesemanticinformationstoredinarelationaldatabase(e.g.,Anderson,1994;Petit,1995).Anderson(1994)analysesequi-joinsstatementswhereas.Petit(1995)analysesauto-joins,setoperationsandwhere-byclausestatementsaswellastheequi-joinstatements.Otherwise,theHainaut’sapproach(Hainaut,1995)analysesthecodeofprograms,mainlythelogicaldescriptionoffilesinCobol,forcom-pletingknowledgeonthedatastructureandontheconstraints.

–Approachesusingextensionoflegacydatabasearebasedontheanalysisofdatainstancestounderstand(someof)thesemanticsofdatabaseapplication.Chiang’sap-proach(Chiang,1994)usestherelationalschemainformationasthebasicinputandaimtoextractthesemanticinformationbyexhaustiveanalysisofeachrelationintheschemaandtheirkeyandnon-keyattributes.AlsoAnderson(1994)andPermerlanietal.(1994)analyzedatainstancestoextractindsfromindicationprovidedbyjoins.Otherwise,in(Tari,1997),theclassificationoftherelationalschemaisbasedontheanalysisofdatain-stances.Independently,Mannilaetal.(1994)proposesomealgorithms(based-heuristics)toextractfunctionalandinclusiondependencies.

Untilrecently,researchesindatabasereverseengineeringdonotuseforms,whichmanipulatedatabases,exceptMfourga’sapproach,asmachine-analyzablesource.In(Mfourga,1997),Mfourgaproposesframeworkforextractinganentity-relationshipmodelfromasetofformmodelschemasofalegacyrelationaldatabase.Formmodelschemasgatherinformation,i.e.,structuralinformationandconstraintsamongdata,ex-tractedfromboththeirstructuresandinstances.Thisapproachusestheseschemastosupplementlow-levelschemasschemestohelpreconstructaentity-relationshipschemas.Ourapproachemphasizestheextractionoftherelationalsub-schemaofform,fromboththeformstructureandthephysicalschemaoftheoperationaldatabase,thatisinfactapivotschemareflectingthesemanticsoftheoperationaldatabaseaswellasthesemanticsofforms.Sothisschemafacilitatestheextractionofdependenciesinordertoconstructtheconceptualobjectschemaoftheunderlyingoperationaldatabase.

3.AcquisitionofForms

Thisphasesupportsthedescriptionofformsandtheirinstancesononehand,and,theautomaticextractionoftheformshierarchicalstructuresinordertofacilitatetheiranal-ysisandinterpretationontheotherhand.Inthisway,weuseourformmodelpresentedhereafter.3.1.FormModel

In(Malki,1999),wehavepresentedageneralmodelofformssuitablefordatabasesre-verseengineeringtask.Themodelallowsabstractinganydatabaseform,thatis,tomakeexplicititscomponents,fieldsaswellasobjects,andtheirinterrelationships.Itplacesa

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases51

specialemphasisonfindingrelevantconstraintsamongdatabaseformcomponentbyex-ploitingtheforminstances.Thismodelissimilarbutnotidenticaltothemodelspresentedin(Choobineh,1992;Mfourga,1997).

3.1.1.FormType

Aformisastructuredcollectionofvariables(i.e.,fields)appropriatelyformattedtocom-municatewithadatabase.Aformtypedefinesthestructure,constraintsandpresentationoftheformfields.Itrepresentstheformintensionasperceivedbyusers.Aparticularrepresentationofformtypeiscalledaformtemplate.Aformtemplatenotonlyspecifiesarepresentation,butisalsomediumdependent.Thispaperonlyaddressesscreenandpaperforms,whichareusedascommunicationinterfaceswiththedatabase.

Threebasiccomponentsofanytemplatearetitle,captions,andentries.Everyformmusthaveatitle,whichnamestheformandprovidesageneraldescriptionofwhatistobefilledinbytherespondent.ThetitleoftheforminFig.2is“CARDOFWAGE”.Captionsarepreprintedontheform.Theyserveasaclueaswhatistobefilledinbytherespondentaswellasaguidetoenterortoreaditontheform.Anentryistheactualdatathatisenteredbyanoperatorordisplayedbytheformprocessingsystemonthespaceprovided.Theentrydatamaybetext,numeric,bitmaporanyotheruserdefineddatatype.AsanexampleinFig.2,Nameisthecaptionassociatedwiththeentrywiththevalueof“Malki”.

Aforminstanceisaparticularcollectionofvaluesfortheformfields.Whenaformtemplateisfilledwithvalues,itbecomesaninstanceofthatformtype.Fig.2isatemplateof“CARDOFWAGE”formwiththevaluesofaparticularforminstances.

3.1.2.StructuralUnits

Threetypesofformfieldscanbeidentified:atomic,box,andmultiplechoice(table).Anatomicfielddoesnothaveanyconstituent(sub)fields.Abox(ortable)fieldgroupsa

Fig.2.Exampleofform:“Cardofwage”.

52M.Malki,A.Flory,M.K.Rahmoun

numberofotherfields.Aformfieldroughlycorrespondstotheconceptofanattributeindatabaseterminology.

Astructuralunittype(SU),isabox(ortable)ofhomogeneouspiecesofinformation,thatis,anobjectthatgroupscloselyrelatedformfields.EachSUisalogicalsub-partofaformtype.TheFig.2showssomestructuralunitsofthe“CARDOFWAGE”form:EmployeeIdentification,Function/activityandPremium/indemnities.

3.1.3.RelationshipsbetweenStructuralUnits

Weusetheparent-childrelationshipwhichisone-to-many(multi-valued)orone-to-one(single)relationshipbetweentwostructuralunits.OneoftheSUs(alwaystheone-side)iscalledtheparentSU,theother(many-sideorsometimesone-side)iscalledthechildSU.AnoccurrenceofarelationshipconsistsofoneSUoccurrenceoftheparentandoneorseveraloccurrencesofthechildSU.IntheFig.2,therelationshipbetweentheSU“CARDOFWAGE”(parent)andtheSU“Identification”(Child)isone-to-one,i.e.,thata“CARDOFWAGE”isestablishedforonlyoneEmployee.OntheotherhandtherelationshipbetweentheSU“CARDOFWAGE”(parent)andtheSU“Premiums/indemnities”isone-to-many,i.e.,thatanemployeemayreceiveanumberofpremiumsandindemnities.Inthisway,ourmodelpermitsahierarchicalstructuringofformfields.Thishierar-chicalstructurecorrespondstoformtypelogicalstructure.Therootofthetreeistheformtitle.Theothernodesrepresentstructuralunits.

Childrenthathavethesameparentbelongtothesamelevel.Therootlevelisnum-ber0andeachsubsequentlevelisonegreaterthanitsparent’s.TheFig.4showsthehierarchicalstructureoftheform“CARDOFWAGE”.

Onemaynotethattheformmodelincludesalogicalandaphysicalaspect.

Thelogicalaspectcorrespondstotheformdefinition,asitisperceivedbyusers(ordesigners),i.e.,itsintensionalpart.Thisintensionalpartisalogicalstructurethatcon-tainslogicalsub-partscorrespondingtotheobjects(structuralunits)thatcomposeaformviewedasaparent-childhierarchicalstructure.Thelogicalaspectallowsunderstandingboththeformstructureandmeaning.

Thephysicalaspectofaformisrepresentedbyagivenformtemplate.Inthistemplate,thetitleoftheformprovidesageneraldescriptionofwhatistobefilledin.Theplacementofthefieldsisoftenwellstudiedandhomogeneousinformationisingeneralcontiguous.Fieldcaptionsareingeneralmoreexplicitthanunderlyingdatabaseattributes.Thetem-platealsoprovidesinstructionsthatrepresentcontextualknowledgethatisknowledgeaboutobjetsrepresentedontheforms.Therefore,theformphysicalaspectcanbeusefulforidentifyingtheobjectscontainedinaform.

Now,wecandefinethenotionoftheformbaserepresentingalegacy(oroperational)databaseapplication.Thisbasepossessesforeveryform:itsdescription,itshierarchi-calstructure,anditsinstances.Formally,aformbaseisatripletB(D,S,I),whereDrepresentsaformdescription(i.e.,bothphysicalandlogicalaspects),Srepresentsahi-erarchicalstructureofformandIrepresentstheirinstances.

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases53

3.1.4.GraphicLanguageofFormDescription

In(Malki,1999),wehavepresentedagraphiclanguageofformdescription(notedGLFD).Thislanguage,thatisagraphicinterface,assiststheuser(ordesigner)inspecify-ingformdescriptionfromconceptsoftheformmodelpresentedabove.Theideaconsistsinidentifyingfieldsandstructuralunitsoftheform(logicalaspect)fromitsgraphicrep-resentationorformtemplate(physicalaspect).

WedistinguishtwodefinitionshapesoftheGLFD(graphicanddeclarative)thatarejointlyusable.

3.1.4.1.GraphicDefinition.ItwouldbehelpfultousethegraphicdefinitionoftheGLFDinordertofacilitatetheformdescriptionthroughitsgraphicinterfacewhileus-ingthebaseconceptsoftheformmodel:group(orbox),list(ortable)andfields.Thisgraphicinterfacedescribesphysicalaspectofformwhileconsideringthestructuralunitsasbox(ortable)composedwithtext,graphicandinformation(field).Thisidentificationissometimesenforcedbyseparators(e.g.,blanks,lines).

3.1.4.2.DeclarativeDefinition.Thebaseconceptsoftheformmodelusedbythegraphicinterfaceareexpressedalsounderdeclarativespecifications.Thesedeclarativespecificationsdescribethefeaturescharacterizingeveryelement(field,andstructuralunit)offormreproducedgraphically.

Thesyntaxofthisdeclarativespecificationisthefollowing:

Box:

Identify:

Composition:{,,,}End-Box

Table:

Identify:column:1/NLine:1/M

Field-ColumnorField-line:{}End-table

Field:

Identify:

Type:integer/real/Boolean/characterLength:

Origin:System/User/computinglevel:End-field

Letustakeanexampleofthelistspecification“premium&indemnities”of“CARDOFWAGE”(seeFig.3).

54M.Malki,A.Flory,M.K.Rahmoun

Fig.3.Exampleofspecifationoftableacquisition.

3.2.InsertionofForms

Inthisstep,weuseagraphiclanguagebasedontheformmodelconceptspresentedabove,fordescribingtheformaccordingbothtotheirphysicalandlogicalaspects.Inthesameway,aformdictionaryisusedininteractionwiththedesigner.Thisdictionary,whichwillbeintegratedintheformbase,containsthemeaningoffields(designation,synonym,andrulesofcomputedfield).Thisdictionaryisusedtoinsureapresentationascompleteaspossiblefortheformanalysis.

Foreachform,wetakeseveralinstances.Theseinstanceswillbeusedwiththeformaldescriptionofformasmachine-analyzablesourceinordertofacilitatetheextractionofthedomainsemantics.3.3.StructuringofForms

Thisprocess,thatexploresbothphysicalandlogicalaspectsofformsdescribedbythegraphiclanguage,identifiesthestructuralcomponentsofformandtheirinterrelationshipsinordertoconstructthehierarchicalstructureoftheform.Itwouldbehelpfultohaveaprecisepictureofthehierarchicalrelationshipsofaforminordertoclearlyunderstanditsmeaningandfacilitatetheextractionofthedomainsemantics.

Thisprocedure,whichisanautomaticprocessandtransparenttothedesigner,con-structsthehierarchicalstructureinfoursteps:

–definingtherootnodewhosenameistheform’stitle.Allfieldsarereattachedtotherootnode;

–transforminginthefirstallstructuralunitsinnodewithlevel1,i.e.,theirparentistherootnode;

–identifyingtheparent-childrelationshipsamongstructuralunits.Becausethereareseveralleveloffields(e.g.,0,1,2,3,etc.),thisprocessidentifyrecursivelythesub-structure(orsub-node)accordingtolevelsoftheirfieldsspecifiedintheinsertionprocessofforms;

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases55

Fig.4.Hierarchicalstructureoftheform“CardofWage”.

–andsimplifyingthehierarchicalstructurewhileeliminatingsub-structuresthatcon-tainonlythetext(orgraphic)andcalculatedfieldsthatdonotcontaininterestinginfor-mation.

Forexample,theform“CARDOFWAGE”isrestructuredinhierarchicalstructure(seeFig.4).

4.ExtractionoftheDomainSemantics

Formsareoftenthemostconvenientinterfacesforentering,changing,andviewingdataindatabaseapplications.Therefore,themostwidelyuseddataaregatheredorreportedinforms.Theyaredesignedmostlywithadeliberateattemptatuser-friendlinessandtheydonotnecessarilydirectlymirrorthestructureoftheunderlyingdatabase.

Adeliberateeffortofabstraction(i.e.,descriptionandstructuringofforms:seesectionabove)isnecessarilyformakingallthecomponentsoftheformindividuallyaccessible,andexplicatingtheirinterrelationshipsaswellastheconstraints(ordependencies)onthedatatheydisplay.Inthisway,theformsofagivendatabasetakenasawholecanbeusedtorecoverdatabasesemantics,especiallywithlegacysystems.Indeed,withsucholdersystems,databaseapplicationsoftendonotcomewithaconceptualschema,butonly,forexample,witharelationalschema.

Intherelationalmodel,semanticsarespecifiednotinthestructure,butthroughcon-straints.Datadependenciesareonekindoftheseconstraints;wherefunctionalandinclu-siondependenciesareofparticularinterest.Nowadays,relationaldatabasesarespreadeverywhere,andmanyapplicationslikereverseengineering,integratedaccessforin-teroperabledatabases,knowledgebasedinterfaces,migrationtonewerdatabaseman-agementsystems(DBMS),andimprovementontheeffectiveutilizationofthedataoftheorganizationrequireadeepknowledgeonthesemanticsoftheseexistingdatabases.Unfortunately,manyrelationalDBMSdonotprovidemuchsupportforspecifyingdatadependencies,andconsequently,thissemanticscannotbefoundintheschema.However,sincedatadependenciesareimplicitintheextensionofthedatabases,asolutiontothisproblemistoextractthedependenciesbyanalyzingtheextension.

Thegoalofthisphaseofextractionistoderivetherelationalsub-schemaofformfromtheirhierarchicalstructureandtheirinstancesaccordingthephysicalschemaoftheunderlyingdatabase.Thisprocess,inthefirst,identifyrelationsandtheirprimary

56M.Malki,A.Flory,M.K.Rahmoun

keysrespectivelywithregardtobothstructuralunits(ornodes)offormandunderly-ingdatabase,thenextractthefunctionalandinclusiondependenciesthroughboththeirhierarchicalstructureandinstances.

Sincetheformsgrouprealobjects(orstructureunits)whicharemanipulatedbyend-user,thisprocessofextractiondoesnotnecessarilyrestructuringtherelationalsub-schemesrepresentingtheseformsontheonehand,andfacilitatesthederivationofre-lationsaswellastheidentificationofobjects(orstructureunits)ontheotherhand.Otherwise,itisnotoftentrue,whenonetakesonlythephysicalschemeofdatabaseasmachine-analyzablesource(Chiang,1994;Petit,1995).Inaddition,theusingofforminstancesreducesthetimeofextractionofdomainsemantics,withregardtoapproachesthatusedatabaseinstancesasmachine-analyzablesource(Chiang,1994;Mannila,1994;Petit,1995).

Westartbrieflybyintroducingtheconceptsoftherelationalmodel(Ullman,1984).4.1.RelationalModel

Inthissub-section,thebasicrelationalconceptsandterminologyusedinthispaperarereviewed.Detailscanbefoundinanytextbookontherelationalmodel,e.g.,(Ullman,1984;Elmasri,1994).

Arelationschemeisapair,usuallydenotedRi(Xi),whereRiisare-lationschemename,andXidenotesasequenceofattributes.Eachattributeofare-lationschemeisassociatedwithaset,whichiscalleditsdomain.Weusethefollow-ingnotationalconvention:ifRi(Xi)isarelationschemewithmattributes,wewriteXi=Ai,1,Ai,2,...,Ai,m.ThedomainforAi,jisdenotedbyDi,j.AtuplefortherelationschemeRi(Xi)isanelementofDi,1×...×Di,m.

ArelationforarelationschemeRi(Xi)isasetoftuplesforRi(Xi).IftisatupleforRi(Xi)andYisasubsetofXi,thent[Y]denotesthesubtupleoftcorrespondingtoY.LetRi(Xi)bearelationscheme,riarelationforRi(Xi),andYasubsetofXi.TheprojectionofrionY,denotedbyπY(ri),is{t[Y]/t∈ri}.

Afunctionaldependency(notedfd)overRiisastatementoftheformRi:Y→Z,whereYandZaresubsetsofXi.AfunctionaldependencyRi:Y→Zissatisfiedbyriifforanytwotuples,tandt󰀁,inri,t[Y]=t󰀁[Y]impliest[Z]=t󰀁[Z].

AkeyforRiisasubsetKiofXi,suchthatRi:Ki→Xiissatisfiedbyanyriasso-ciatedwithRiandtheredoesnotexistanypropersubsetofKihavingthispropriety.Arelationschemecanhaveseveralcandidateskeysfromwhichoneprimarykeyischosen.Therefore,inarelation,attributescanbeclassifiedintoprimarykey,foreignkeys,whicharekeysinanotherrelationandnon-keyattributes.Aprimarykey(orasub-partofakey)canbeatthesametimeaforeignkey.

Inanyalgorithmthatdiscoversfdsfromextensionoftherelations,itisimportanttominimizeredundantwork,thatis,theworkofanalyzingthetuplestoobtainredundantfds,becausetheycanbederivedfromotheronesbyapplyingArmstrongrules(Arm-strong,1974).Thefocusisonthefollowingderivedfds:1.Trivialfdsrule:X→YifY⊆X.

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases57

2.Augmentationrule:X→YthenXW→YZ,ifZ⊆W.3.Transitivityrule:ifX→YandY→Z,then,X→Z.

Thepresenceofafdcangiverisetoredundantinformation,whichcanbeeliminatedbyperformingadecomposition.Thisobservationhasledtothedefinitionofseveralnor-malformsofrelationaldatabase(1NF,2NF,3NF,3NFBCK).Inourapproach,wesup-posethatrelationsarein3NF.

LetRi(Xi)andRj(Xj)betworelationsschemas.LetriandrjberelationsofRi(Xi)andRj(Xj),respectively.Aninclusiondependency(notedind)isastatementoftheformRi.Y<Arelationalschemaisapair,whereRisasetofrelationschemesandDisasetoffunctionalandinclusiondependencies.

Letustakeanexampleofrelationalschemaof“Personnel”databasefromwhichweapplyourreverseengineeringapproachwhileanalyzingtheirforms:

Employee(emp.id.,name,birth-date,address,sex,m.s.,number-chil.,ssn,accountid,nationality,recruitment-date),Child(emp.id.,name,birth-date,sex)Holiday(hol.id,designation)

On-holiday(hol.id,emp.id,begin-date,end-date)

Element-Paye(payeid,designation,rate-number,type)Payed(payeid,emp.id,date)

Sanction(sanc.id,désigantion,degree)Punished(sanc.id.emp.id,sanction-date)Diploma(dipl.id,designation)

Graduate(dipl.id,emp.id.,obtain-date,University)Rank(rankid,designation,category,section,level)Having-rank(rankid,emp.id,begin-date,end-date)Function(func.id,designation)

Having-function(func.id,emp.id,dept.id,begin-date,end-date)Departure(dep.id,designation)

On-Departure(dep.id,emp.id,date,length)

Department(dept.id,designation,emp.id),Bank(bankid,name,address.)

4.2.IdentificationofFormRelations

Theformseitherpermittheupdatingofrelationsinunderlyingdatabaseorrepresentaviewthatisajointofrelations.Therefore,eachfieldentryisgenerallylinkedtoanat-tributeofonerelationintheunderlyingdatabase.However,theidentificationofformre-lationsandtheirprimarykeysrespectively,consistsindeterminingtheequivalenceand/orthesimilaritybetweenstructuralunits(nodes)ofhierarchicalstructureandrelationsintheunderlyingdatabase.Thisisabasispointfromareverseengineeringpointofview(Malki,1999).

58M.Malki,A.Flory,M.K.Rahmoun

Anodeofaformhierarchicalstructuremaybeeither:

–equivalenttoarelationintheunderlyingdatabase,i.e.,thesetwoobjects(nodeandrelation)haveasamesetofattributes;

–similartoarelation,i.e.,itssetofattributesisasubsetoftheoneoftherelation;–asetofrelations,i.e.,itssetofattributesregroupsseveralrelationsinunderlyingdatabase.

Also,fordependentnodes(orformrelation),primarykeysareformedbyconcatenat-ingtheprimarykeyofitsparentwhititslocalprimarykey.

Thisprocessofidentificationissemi-automatedbecauseitrequirestheinteractionwiththeanalysttoidentifyobjetsthatdonotverifyproprietiesofequivalenceandsimi-larity.

Whileapplyingthisprocessonthehierarchicalstructureof“Cardofwage”andthephysicalrelationalschemaofunderlyingdatabase,weextractthefollowingrelationalsub-schema:

Employee(emp.id.,name,birth-date,address,sex,m.s.,number-chil.,ssn,accountid,nationality,recruitment-date),Payed(payeid,emp.id,date)

Having-rank(rankid,emp.id,begin-date,end-date)

Having-function(func.id,emp.id,dept.id,begin-date,end-date)

4.3.ExtractionoftheFunctionalDependencies

Theextractionoffunctionaldependenciesfromtheextensionofdatabasehasreceivedagreatdealofattention(see,e.g.,Anderson,1994;Castellanos,1993;Mannila,1994;Petit,1995).Theaimoftheextractionoffunctionaldependenciesistoreducethecomplexityofextractionalgorithms,thatisexponentialtimewithregardtothenumberofattributes(lefthandsideoffdnotedlhs),andpolynomialwithregardtothenumberoftuplesofrelationindatabase(righthandsideoffdnotedrhs).

Inourapproach(Malki,1999),weintroducetwowaystoreducethetimeforex-tractingfunctionaldependencies.Thefirstonereplacesdatabaseinstanceswithamorecompactrepresentationthatis,theforminstances.Thesecondoneistouseonlythenon-keyattributesaslefthandsideoffunctionaldependencies,because,wesupposethattherelationalschemaisinthe3NF(wheretheprimarykeydeterminesfunctionallytherestofattributes,i.e.,non-keyattributes).Inthiscondition,itispossibletoinferanewfdthroughnon-keyattributesandkeyattributes(i.e.,non-keyattributes→keyattributes).However,weconsidertheutilizationofformsassamplingdatabaseforextractingdomainsemantics.

LetLabetheleft-handside(simpleorcomposite)ofafd,thatis,thedeterminant,andRatherighthandside(alwayssimple).Foreverypair(La,Ra)allthetuplesoftherelationmustbeanalyzedtotestthefdcondition.

Now,wepresentthefundamentalprinciplesthatpermittooptimizetheprocessforextractingthefds:

–ThepossibleLa’sareorderedbynumberofattributes(i.e.,non-keyattributes).ForarelationRwithnon-keyattributesA,B,C,theorderwouldbe:A,B,C,AB,AC,BC,

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases59

ABC.ThisorderguaranteesthatnoLaisanalyzeduntilanysubsetofitscomponentattributeshasbeenanalyzed.

–ForeverypossibleLaXthatisanalyzed,eachkeyattributeofRiscandidateasaRaofapossibleFd.However,eachsuchattributeisproposedasapossibleRaonlyifthereisnopreviouslyfoundfdwhoselefthandsideiscontainedbythisLaandwhoserighthandsideisequaltothisRa,otherwise,thepair(La,Ra)representsaredundantfd(byaugmentationrule)andRaisnotapossiblebutapositiverighthandsideforLa.Inthisway,onlyfdswithminimalLa’sareobtained.Furthermore,onceafdhasbeendiscovered,alltransitivefdsthatcanbederivedatthatmomentarecomputed.Withthismechanism,theanalysisoftheextensionoftherelation(sub-relationofform)isavoidedformanyredundantfds.

–Foreachpair(La,Ra)representingapossiblefd,theextensionoftherelationisanalyzedtotestifitsatisfiestheFdcondition.Thisconditionischeckedsimultaneouslyforallpossiblefds(i.e.,allpossibleRa)onaLawhilethepairsoftuplesti,tjarebeingretrieved.TheproblemisthattotestthefdconditionforeachdifferentvaluebforLa,everytupleintherelationmustberetrievedinsearchofthesamevaluebfortheconditionofattributescomposingLa.Ifthesearchsucceeds,thenforeachpossibleRathecorrespondingattributesofbothtuplesarecheckedtoseeiftheirvaluesmatchtoo.–ThesortingoftherelationsaccordingLareducesthesearchofRa.

Thealgorithmforextractingfdsthroughsub-relationformispresentedin(Malki,2000),onlythemainpartsaregivenbelow.

Algorithm1.Extractionofthefunctionaldependencies

Input:

RS:Relationalsub-schema

󰀂>FB:FormsbaseOutput:

FD:Setoffds:La→Ra

/La:attributesnokeysandRas:keyattributes/FD:=∅

󰀂

ForeachRelationRi

󰀂

La:=Xi−Ki/thesetofnon-keysattributes/ForeachLak

Comp−Ra(X−Lak)

/itiskeyattributeunderatomicshapeforwhichLa→Raisnoredundant/IfRathen

󰀂

(Lak)Sortri

󰀂

,FD)/Testthisfd.accordingtoinstancesofform/Detect-fd(Lak,Ra,ri

Endif

Transit-fd(Fd)/deductallfds.whileapplyingtheruleoftransitivity/EndforEndforFAlgo

60

Detect-fd(La,Ra,r,FD)i:=1

n:=card(r)

WhileRa=∅oriIfti[Raj]=ti+1[Raj]Ra:=Ra−RajEndifEndforEndifi:=i+1EndwhileIfRa=∅

FD:=FD+(La,Ra)EndifEndetect

M.Malki,A.Flory,M.K.Rahmoun

Whileusingnon-keyattributes,thenumberofcandidatestoextractacanonicalfunc-tionaldependencebecomesthereforelinearwithregardtothenumberofattributes.Thetimecomplexityisthenreducedto󰀂(n|X|plnp),wheren=|Xi|,Xisathenon-keyattributeandPisnumberoftuplesinri.

Whileapplyingthisalgorithmonthesub-schemaof“CardofWage”andtheirin-stances,onefindstheFd:departmentid→Employeeid.Thesefunctionaldependencies,thatareextracted,canbeusedfornormalizingcorrespondents’relationalsub-schemasin3BKCNF.

4.4.ExtractionofInclusionDependencies

Oncetheextractionofthefunctionaldependenciesisachieved,oneproceedstotheex-tractionoftheinclusiondependencies,whichcontainthemajorityoftheinformationfortheidentificationofrelationshipsbetweenrelations.TheproblemwiththeextractionofIndsiswellknown(Chiang,1994;Castellanos,1993;Mannila,1994;Petit,1995).Thenaivemethodofchecking,foreachpartofrelationwithnattributesinthedatabase,allpossibleInds,onebyone,isparticularlyimpossible,becausetherearen!possiblesnoequivalentInds.InordertoreducethesetofpossibleInds,Chiangetal.in(Chiang,1994)andPetitetal.in(Petit,1995)useheuristicsthatonlyformulateIndsbetweenrelations’keyattributes.ThesepossibleindswillbeverifiedbyanalyzingdatainstancesinregardwithInddefinition.

Inourapproach(Malki,1999),weformulatepossibleIndsbetweenrelations’keyofrelationalsub-schemaofform.Thetimeofthisprocessismoreoptimizedwithregardtotheotherapproaches((Chiang,1994)and(Petit,1995)),becausethepossibleIndsareverifiedbyanalyzingtheformextensionswhicharemorecompactrepresentationwithregardtothedatabaseextension.

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases61

Theheuristicsemployedbytheextractionprocessforproposingpossibleinclusiondependenciesaredescribedinbelow.

–TworelationsRiandRj,havingthesameprimarykeyX,maybelinkedbyaInd:Ri.X<–WhentheprimarykeyXofarelationRiisaforeignkeyxinanotherrelationRj,thenitispossibletohaveaInd:Rj.x<Inaddition,weapplysomecriteria,whichverifytheIndsproposedbyanalyzingdatainstancesofforms:

–ForeachformulatedInd,theextractionprocessmustcomparethesetsofvaluesappearinginRi.XandRj.XaccordingtotheInddefinition,

–RedundantIndscanbedetectedbytheinferencerulesofinclusiondependencies(projectionandtransitivity)(Elmasri,1994),thatis:IfRi.X<ThealgorithmofextractionofIndisdetailedin(Malki,2000);wepresenthereafteronlythemainparts.

Algorithm2.ExtractionofInclusionDependencies

Input:

RS:Relationalsub-schema:SRS:SetofRSofForms

󰀂>FB:Formsbase:Output:

󰀂󰀂.Xi<BeginIND:=∅

󰀂

ofSRSForeachRi󰀂

ForeachRj=R󰀂iofSRS

/TestofapossibleInd.joiningtworelationsevenhavingthekeyprimary/

󰀂󰀂=KjIfKi

󰀂󰀂󰀂󰀂

Test-Ind(Rj.Ki,Ri.Ki,IND)/AccordingtoInstanceofform/

󰀂󰀂=Foreignkey(Rj)ElseIfKi󰀂󰀂󰀂󰀂.Ki,Ri.Ki,IND)Test-Ind(Rj

EndIfEndForjEndForiEndalgo

Test-Ind(Ri.X,Rj.Y,IND)

/itisaInd.noredundant/If(Ri.X,Rj.Y)∈IND

/TheprojectionofrionX/ri.X:=πX(ri)

/TheprojectionofrjonY/rj.Y:=πY(rj)

Ifrj.Y=∅andrj.Y󰀌ri.X

IND:=IND+(Rj.Y<62M.Malki,A.Flory,M.K.Rahmoun

Ifri.X=∅andri.X󰀌rj.Y

IND:=IND+(ri.X<Transit-Ind(IND)/deductallInds.Fromtheruleoftransitivity/EndIf

EndTest-Ind

Inthisalgorithm,attributesofdependenciesaretheprimarykeysandforeignkeys.Thus,thetimecomplexityisreducedtothetestoftheinclusiondependencyontheforminstances.

ThesetoftheIndsextractedfromtherelationalsub-schema“Cardofwage”is:

Having-rank.rankid«Rank.rankidPayed.payeid«Element-Paye.payeidHaving-function.func.id«function.func.id

5.TransformationofRelationalSchemaintoConceptualObjectSchema

Object-orientedtechnologyisbeingwidelyusedinsoftwaredevelopmentthesedays.Theobject-orienteddatamodelisgrowinginpopularityintheareaofdatabasedevelopment.Consequently,thereisalsoagrowinginterestinre-engineeringrelationaldatabasestoobject-orienteddatabases.Reverseengineeringofrelationaldatabasesisconcernedwiththetaskofextractingstructurescorrespondingtoaparticularconceptualmodelsuchastheentity-relationship(ER)model,theextendedentity-relationship(EER),theobject-oriented(OO)model,andsoon(Behm,2000;Farhrner,1995b;Ramannathan,1997;Tari,1997).

Thefactthatourapproachusesdatabaseformsasmachine-analyzablesource,im-pliesthatwefirsttransformtherelationalsub-schemasofformsintoobject-orientedsub-schemas.Inthefollowing,wemergethesetofobject-orientedsub-schemasinaglobalobject-orientedschemarepresentingtheconceptualschemaoftheunderlyingdatabase.Theresultingglobalobject-orientedschemamustbevalidatedasarichandcorrectrepre-sentationoftheapplicationdomain.Asatargetmodel,weusetheobject-orientedmodelcalledHCB(Ayache,1995)developedinourlaboratory.Itispresentedhereafter.5.1.TheHCBModel

TheHCBmodelisanobject-orientedmodelwithathree-dimensionalrepresentationofobjects.HCBisfirstamodelofrepresentation,becauseitallowsthespecificationoftheobjectscheme,followingthreeplanes(Fig.5a):structuring,communicationandinheri-tance,whereeachcomponentisdescribedseparatelyandrepresentstheprojectionoftheobjectonthecorrespondingplane.Asimplerepresentationofobjectsgives,ingeneral,adiagramcomposedofheterogeneousinformationanddifferentlinktypes.Thecompre-hensionandtheexploitationofsuchschemaarenoteasybecauseofthesuperpositionofthedifferentperspectives.TheHCBmodelaimsfirsttoofferanobjectschema,whichiscoherentandeasilyexploitable.ThemodeliscalledHCBbecauseeachperspective

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases63

Fig.5.a)ThethreeplanesofthemodelHCB;b)Representationofaclass.

representsaspecificverb.ThestructureismodeledbytheverbHavebecauseitspeci-fieswhattheobjecthas.ThebehaviorismodeledbytheverbCommunicate.Finally,theinheritanceperspectiveismodeledbytheverbBebecauseitexpresswhattheobjectis.5.1.1.ConstructorsoftheHCBModel

Thestructuringofinformationisdoneusingthefollowingconstructorstobuildcomposedobjecttypes(Fig.5b.):

–tuple:ifaclasscontainscomponentofdifferenttypes,thoselatterwillbestructuredbythetupleconstructors.Thetupleisastructureofheterogeneousobjects,whereorderisnotsignificant.Thestructurecanberecursivesinceatuplemaybeusedtoconstructanothertuple.

–set:thesetconstructorisamechanismallowingthecreationofa“set-object”fromhomogeneouscomponentobjectsofsametype,wheretheorderofobjectmemberisnotsignificant.

–list:thelistconstructorallowsthegroupingtogetherofhomogeneousobjects,wheretheorderissignificantandanelementmayberepeated.

5.1.2.Composition,AggregationandInheritance

Thecompositionlinksintroduceasemanticofcardinalitybetweenanobjectanditscom-ponents(attributes).Tokeepthisinformation,weproposethefollowingrepresentation:–asingle-valuedlinkisrepresentedbyasimplearrow(noted→),

–amulti-valuedlinkisrepresentedbyadoublearrow(noted→→),andweuseatriplearrowwhenthevaluesformalist(noted→→→),

–alinkofinheritance(i.e.,linkofspecialization/generalization)isrepresentedbyadoublelinksorientedfromsubclasstosuperclass,

–inaddition,theconceptofroleallowsthemodelingofcrossedreferencesbetweenentitiesofthesystem.TheHCBmodelgeneralizesthenotionofroleandusesitanytimethattheattributevalueisareferencetoanobjectbelongingtoanotherclass(e.g.,Empidofdepartmentclass:Fig.6a).

Thedynamicpartofanobjectoverlaystwoaspects,calculation(i.e.,methodswithcommunicationmessages)andbehavior(i.e.,lifecycle).Wedistinguishtwocategoriesofmethods:

–interfacesmethods,whichallowaccesstovaluesofattributes.Wecanwrite,read,modify,andperformotheraccessfunctionsonattributevalues;

64M.Malki,A.Flory,M.K.Rahmoun

Fig.6.a)Binaryassociation;b)N-aryassociation.

–calculationmethodsallowingcomputationofattributevalues.Calculationmethodssendmessagestoaccessinstances.

5.2.TransformingFormRelationalSub-SchemaintoObject-OrientedSub-Schema

M1

definedonadatamodelM1toaFormally,aschemamappingofasourceschemaS1

M2

definedonamodelM2maybedefinedasatransformationTsuchtargetschemaS2

M1M2

thatT(S1)=S2.Thetransformationsareusuallyacollectionofrulesthatmapcon-ceptsfromthesourcedatamodeltothetargetdatamodel(Hainaut,1995;Ramannathan,1997).

Theproposedmethodologyoftransformation,whichisamixtureofkey-basedandInd-basedapproaches(Farhrner,1995a;Chiang,1994;Petit,1995),consistsinidentify-ingthecomponentsoftheobjectsub-schemaintheHCBmodelthroughtherelationalsub-schema.However,thisprocessthatconcernsallthreeplanesofHCBmodel(i.e.,structuring,andinheritance)permitsthat:

–theidentificationofobjectclassesandtheircompositionrelationshipswillbepre-sentedinthestructuringplane;

–theidentificationofinheritancerelationshipswillbepresentedintheinheritanceplane;

–andtheextractionofthedynamicaspectofobjectclassesfromtheformswillbepresentedinthecommunicationplane.Thisprocesswillbeinvestigatedsubsequentlyinotherwork.

5.2.1.IdentificationofObjectClasses

Inordertosimplifythetranslationprocess,weassumethatallrelations(orrelationscheme)areinthe3NF.Thereasonforthisistoensurethateachrelationcorrespondstoexactlyanobjectclass.Theseobjectclasseshavethesameattributesasthosecontainedintherelations.Thus,wedefineaclass-relationtobearelationthatdirectlymapstoanobjectclass.

5.2.2.IdentificationofAssociations

IntheHCBmodel,weidentifytwotypesofassociations:binaryandn-aryassociations.Theseassociationsarerepresentedbyreferencelinks(i.e.,roleattribute)andassociationclassesrespectively.

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases65

5.2.2.1.BinaryAssociations.Theforeignkeysofclass-relationandthecorrespondingIndsidentifyabinaryassociationbetweenclass-relations.Therefore,thisreferentiallinkistranslatedinroleattributeintheHCBmodel.Thetargetwillbe,ingeneral,aroleattributetypedbytheotherclass.Astheassociationisnotdirected,thentheobtainedrepresentedcomprisescrossedreferences,asshowninthefollowingschema(Fig.6a).Whileapplyingthistransformationruleonthetwoclass-relationsandtheirIndsbe-low,wegeneratethefollowingobjectschema(seeFig.6a).

Employee(emp.id.,emp.name,birth-date,address,sex,m.s.,number-children,ssn,accountid,nationality,recruitment-date),

Department(deptid,design,empid),

IND:Department.empid«Employee.empid.

5.2.2.2.n-aryAssociations.Everyclass-relationwhoseprimarykeyisentirelycom-posedofforeignkeysisaclassthatrepresentsanassociation(notedassociation-class)betweenalltheclassescorrespondingtotheclass-relationthatforeignkeysreferto.Thisn-aryassociationmaybetranslatedbyanattributetuple.Inthistransformationstep,anoptionmustbetakenandthepossibilityislettotheanalysttochooseoneofthetransformationrules.Althoughthecreationofassociation-classtransformingthen-aryassociationcanbecomeindispensable.

TherelationHaving-function(funcid,empid,deptid,beginning-date,end-date)istranslatedintoAssociation-classasshowinFig.6b.

5.2.3.IdentificationofInheritanceRelationships

Twoformsofrelationalstructuresmayindicateaninheritancerelationship:

–Inthefirstcase,everypairofclass-relations(CR1,CR2)thathavethesamepri-marykey(notedX)andthecorrespondingInds(i.e.,CR1.X<–Inthesecondcase,onlyoneclass-relationisconsideredtowhichaninheritancerelationshipmaybegenerated.Forexample,intherelationEmployee,theattributerankisrelevantonlyfortheteacheremployee(i.e.,EmpType=“T”)andtheattributeFunctionisrelevantonlyforadministrativeemployee(i.e.,EmplType=“A”).Therelationcontains

Fig.7.a)ontworelationsb)ononlyonerelation.

66M.Malki,A.Flory,M.K.Rahmoun

nullvalueforrankattributeforalladministrativeemployeesandnullvaluesforFunctionattributeforallteacheremployees.Thisisanexampleofinstancesoftwosubclassesbeingstoredinthesinglerelation.

Theidentificationofthisformofinheritancerelationraisestheknowledgedomainfromdatabase(Piatetsky,1991).Thereareseveralknowledgediscoveryalgorithmsthatcanidentifyrulesfromrelationaldatabase(Piatetsky,1991).Byusingoneofthosealgo-rithmsontheEmployeerelation,forinstance,wecanidentifythefollowingrules:EmpType=“T”=⇒Function=Null(1)EmplType=“A”=⇒Rank=Null(2)

Suchtypesofrulesaretypicallycalled“strongrules”sincetheyholdforallthein-stancesoftherelation.Ineffect,thesealgorithmscanbeusedtoidentifythediscrimi-nateattributeofthegeneralizationbyfindingruleswhoserighthandsideisoftheformA=NullwhereAissomeattributename.Followingisthegeneralizedprocedureforidentifyingsub-classesoncethecharacteristicruleshavebeenidentified.

Asanexample,therules((1)and(2))permittoidentifythehierarchyofgeneralizationwithtwosub-classes(seeFig.7b).

Regardingthetransformationrulespresentedabove,weproposeanalgorithmthattransformsaformrelationalsub-schemaintoanobject-orientedsub-schema.Algorithm3.Transformationofarelationalsub-schemaintoOOsubschema

Input:

RS:Relationalsub-schema:

󰀂>FB:Formsbase:󰀂󰀂.Xi<Output:

OS:Object-orientedsub-schema:C:=∅;L:=∅;H:=∅;

󰀂

relationForeachRi

Create_classCi

C:=C+Ci/eachrelationbecomesaClass/

󰀂

toCi/Attachment(Xi,Ci)/AttachmentofallattributesofRi

Endfor

󰀂

relationForeachRi󰀂󰀂

.YjIf∃Ind:Ri.Xi<IfXiisPrimaryKeyofRi

H:=H+(Ci−isa→Cj)/Aninheritanceoftwoclassrelations(Ci,Cj)/

󰀂󰀂∈K󰀂/Xi⊂kiElseIf∃ki

󰀂Ifkigroupstogetherapartitionofnforeignkeyreferencingforexamplethefollowingrela-󰀂󰀂tions:Ri,1...Ri,n

create_association_classCa

C:=C+Ca/Creationofanassociationclass(i.e.,Associationn-ary)/

󰀂

:Cl/Classparticipanttotheassociation/ForeachRi,l

create_attributeTlofCa/typedRolebyCl/

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases67

Fig.8.Object-orientedsub-schemaof“Cardofwage”.

L:=L+(Ca–role→Cl)Endfor

󰀂

ElseIfXi󰀁ki

create_attributeTi/typedRoleCj/

L:=L+(Ci-role→Cj)/generationofabinaryassociation(role)/create_attributeTj/typedRoleCi/L:=L+(Cj-role→Ci)

ElseIdentifythestrongrules:(Xa=value)=⇒Xi=Nullcreate_super-classChwhoseasetofattributes=X−∪XiForeachXi

create_subclassCh,i

H:=H+(Ch,i-isa→Ch)/AninheritanceofonlyoneRelation/EndforEndifEndforEndalgo

Fig.8.presentsthetransformationoftherelationalsub-schemaof“cardofwage”intoanobject-orientedsub-schema.

5.3.IntegrationofObject-OrientedSub-Schemas

Inthisphaseofreverseengineeringusingformsasmachine-analyzablesource,object-orientedsub-schemasarederivedfromrelationalsub-schemas.Theseobjectsub-schemaswillbemergingintoaglobalobject-orientedschemathatrepresentsthewholeunderly-ingdatabase.However,weapplythetechniquesofintegrationschema.Weassume,inagreementwith(Batini,1986)thattheintegrationschemaprocessconsistsintwophases:comparisonandconformingofschemas,andmergingandrestructuringofschemas.

68M.Malki,A.Flory,M.K.Rahmoun

Thecomparisonphaseperformsaparitiescomparisonofobjects(ofthesub-schemas)andfindspossibleobjectspairs,whichmaybesemanticallysimilarwithrespecttosomeproprieties,suchassynonyms(nameofattributeandclass)ofequalprimarykeyattributeandequivalentofclasses.Theconformingisavarietyofanalystsassistedtechniquesthatareusedtoresolveconflictsandmismatchedobjects.

Themergingandrestructuringphasegeneratesanintegratedschemafromtwocom-ponentschemasthathavebeencompared.Theintermediateresultsareanalyzedandre-structuredinordertoeliminatethesymmetricalandtransitiverelationshipbetweenob-jects.

Inaddition,weconsideronlyonepairofschemasatatime;further,theresultofintegrationschemaisaccumulatedintoasingleschema,whichevolvesgraduallytowardstheglobalobjectschema.Formoredetailssee(Malki,2000).5.4.Validation

Thekeyquestiontobeansweredbyvalidationis:howfaithfullywillalegacydatabaseberepresentedbytheglobalobjectschemaproducedfromreverseengineeringprocess?Aschemaiscorrectifallconceptsoftheunderlyingmodelareusedcorrectlywithrespecttosyntaxandsemantics(Chiang,1994;Farhrner,1995b;Hainaut,1995).Ifalldatabaseinstancesthatcanberepresentedinthesourceschemacanalsoberepresentedinthetargetdatabaseschemaandviceversa,theprocessisinformationpreserving.

Ourvalidationprocessiscompletelyinteractive.Humaninterventionwillberequiredintheverificationandconformationoftheresults.Inthisprocess,thatisnotdeterminist,someconceptualconstructionwillbeexaminedeithertobevalidatedortobeputbackinreason.However,itconsistsinidentifyingthehiddenobjects,reiteratingtherestructuringprocessinordertoeliminatethesymmetryandthetransitivityofcompositionandinheri-tancerelationshipsandinsuringtheconsistencyoftheglobalobject-orientedschema.Formoredetails,see(Malki,2000).6.Conclusion

Amethodologyforperformingreverseengineeringofarelationaldatabasetowardsanobject-orientedsystemisproposed.Ourapproachthatrequirestheinteractionwithusers(i.e.,analystordesigner)usestheinformationextractedfromformstructuresandin-stancesasmachine-analyzablesourcefordatabasereverseengineering.

Thismethodologydoesnecessarilyamodelthatallowsabstractingdatabaseformsandaframeworkthatallowstoanalyzingformsinordertoextractstructureinforma-tionaswellasdomainsemantics.Formsareusedincombinationwithothersemanticsources,especiallyrelationaldatabaseschemaanduses“head-knowledge”.Theissueswithregardtothevalidationandverificationoftheextractionandtransformingprocessarealsodiscussed.

Anexperimentationofthismethodhasbeenachievedinthesettingofthe“PersonnelManagement”systemofouruniversity.Thisapproachcansupplementexistingdatabasereverseengineeringtechniqueswhereformsconstituteimportantusesofthedatabase.

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases69

Thereareanumberofpossibleextensionstothisresearch:

–wehaveusedtechniquesofdataminingforidentifyinginheritancerelationships;thesetechniquescanbegeneralizedforextractingdatadependencies;

–forthemigrationtowardObject-orientedsystems,wehavemadeonlytheextractionofthestaticaspect(i.e.,datasemanticandconceptualstructure),itremainstoextractthedynamicaspectinordertotranslatedataandprogramsintoObject-Orientedsystem.Acknowledgments

Wewouldliketothanktheanonymousrefereesfortheirinsightfulcommentsandsugges-tionsforimprovingthispaper.ThisresearchwassupportedinpartbyNationalScientificResearchCouncil,Algeria,undercontractNo:B2205/-/11/99.References

Anderson,M.(1994).ExtractingaE.R.schemafromarelationaldatabasethroughreverseengineering.InProc.ofthe13thInter.Conf.ontheERA’94,pp.403–419.

Armstrong,W.(1974).Dependencystructuresofdatabaserelationships.IFIPCongress,580–583.

Ayache,M.,A.Flory(1995).Agenerationprocessofobject-orienteddatabasefromE/Rschema.InProc.ofInt.Conf.ofSoft.Eng.&Know.Eng.,RockvilleMaryland.

Batini,C.etal.(1984).Amethodologyforconceptualschemadesignofofficedatabase.InformationSystem,9(3).

Batini,C.,M.Lenzerini,S.B.Navathe(1986).Acomparativeanalysisofmethodologiesfordatabaseschemaintegration.ACMComputingSurveys,18.

Batini,C.,S.Ceri,S.B.Navathe(1992).ConceptualDatabaseDesign:anEntityRelationshipApproach.Ben-jaminCummings

Behm,A.etal.(2000).Algebraicdatabasemigrationtoobjecttechnology.InProc.ofInt.Conf.ER’00,pp.440–453.

Casanova,M.A.,J.E.A.DeSà(1983).Designingentity-relationshipschemasforconventionalinformationsys-tem.InProc.of3thConf.OntheEntity-RelationshipApproachCalifornia,pp.265–277.

Castellanos,M.,F.Saltor(1993).Extractionofdatadependencies.InProc.of3thEuropean-JapaneseSeminaronInformationModellingandKnowledgeBases.

Chiang,R.H.L.,T.M.Barron,V.C.Story(1994).Reverseengineeringofrelationaldatabases:extractionofanEERmodelfromarelationaldatabase.DataandKnowledgeEngineering,10(12).

Choobineh,J.(1992).Aform-basedapproachfordatabaseanalysisanddesign.CommunicationoftheACM,35(2).

Elmasri,R.,S.Navathe(1994).FundamentalsofDatabaseSystems.BenjaminCumming,2ndedition.

Farhrner,C.,G.Vossen(1995).Asurveyofdatabasedesigntransformationbasedontheentity-relationshipmodel.DataandKnowledgeEngineering,15,213–250.

Farhrner,C.,G.Vossen(1995).Transformingrelationaldatabaseschemasintoobject-orientedshemasaccord-ingtoODMG-93.InProc.ofthe4thInter.Conf.onDeductiveandObject-OrientedDatabase.Singapour.pp.429–446.

Hainaut,J.L.,V.Englebert,J.Henrard,J.M.Hict,D.Roland(1995).Requirementsforinformationsystemreverseengineeringsupport.InProc.oftheIEEEWorkingConf.onReverseEngineering.Toronto.

Ji,W.(1991).Analgorithmconvertingrelationalschemastonestedentity-relationshipschemes.InProcof10thint.Conf.onERA’91.SanMateo.pp.231–246.

Johanneson,P.(1994).Amethodfortranslatingrelationalshemasintoconceptualschemas.InProc.ofthe10thIEEEint.Conf.onDataEngineering.Houston.

Lefkovitz,H.C.(1979).AstatusreportontheactivityiesoftheCODASYLenduserfacilitiescommittee(EUFC).SIGMODRec.,10(2).

70M.Malki,A.Flory,M.K.Rahmoun

Kalman,K.(1991).Implementationandcritiquesofanalgorithmwhichmapsarelationaldatabasetoacon-ceptualmodel.InProc.ofthe3thInter.Conf.onComputerAidedSotwareEngineering.

Mfourga,N.(1997).Extractingentity-relationshipschemasfromrelationaldatabases:aform-drivenapproach.InProc.ofWorkingConf.onReverseEngineeringWCRE’97.

Malki,M.,M.Ayache,M.K.Rahmouni(1999).Rétro-ingénieriedesBasesdeDonnéesRelationnelles:Ap-procheBaséesurl’AnalysedeFormulaires.InProc.ofColloqueofINFORSID’99.Toulon,France.

Malki,M.,A.Flory,M.K.Rahmouni(2000).Rétro-ingénieriedesapplicationsdegestionàpartirdel’analysedesformulaires.ResearchReportLRISI,04/2000Univ.SidiBelAbbes.

Mannila,H.etal.(1994).TheDesignofRelationalDatabases.Addison-Wesleypublishing.

Markowitz,V.M.(1990).Identifyingextentedentity-relationshipobjectstructuresinrelationalschemas.IEEETrans.onSoft.Engineering,16(8).

Navathe,S.B.,A.M.Awong(1987).Abstractingrelationalandhierarchicaldatawithasemanticdatamodel.InProc.of6thInt.ConfonERA’87,pp.303–336.

Petit,J.M.,F.Toumani,J.Kouloumdjian(1995).Relationaldatabasereverseengineering:amethodbasedonQueryanalysis.Inter.JournalofCooperativeInformationSystem,4(2,3),287–316.

Piatetsky-Shapiro,G.,W.Frawley(1991).KnowledgeDiscoveryinDatabases.AAAIPress/TheMITPress,California.

Premerlani,W.J.,M.Blaha(1994).Anapproachforreverseengineeringofrelationaldatabases.CommunicationofACM,37(5),42–49.

Ramannathan,S.,J.Hodges(1997).Extractionofobject-orientedstructuresfromexistingrelationaldatabases.SIGMODRecord,26(1).

Shu,N.C.(1985).FORMAL:Aforms-oriented,visual-directedapplicationdevelopmentsystem.Computer,18(8).

Soutou,C.(1998).Relationaldatabasereverseengineering:algorithmtoextractcardinalityconstraints.Data&KnowledgeEngineering,28,161–207.

Tari,Z.etal.(1997).Thereengineeringofrelationaldatabasesbasedonkeyanddatacorrelation.InProc.ofInt.Conf.ofIFIP’97.PublishedbyChapman&hall.

Ullman,J.D.(1984).PrinciplesofDatabasesSystems.Computersciencespress.

Yao,S.B.etal.(1984).Formanager:anofficeformsmanagementsystem.ACMtrans.On.InformationSystem,2(3).

ExtractionofObject-OrientedSchemasfromExistingRelationalDatabases71

M.MalkireceivedtheengineerdegreeincomputersciencefromNationalInstituteofComputerScienceAlgiers,andtheMasterofScienceDegreeinComputerSciencefromUniversityofSidiBel-AbbesAlgeria,in1983and1992respectively.HeiscurrentlyaPh.D.candidateintheComputerScienceDepartmentattheSidiBel-AbbesUniversity.HeisaseniorlecturerintheComputerScienceDepartmentattheSidiBel-AbbesUniver-sity.Hisresearchinterestsincludesemanticdatamodeling,databasereverseengineering,informationsystemreengineering,andbuildingweb-basedapplications.

A.FloryreceivedthePh.D.degreeincomputersciencefromClaudeBernardUniversity,LyonFrance,in1977.HeisaprofessorintheComputerScienceDepartmentattheNa-tionalInstituteoftheAppliedSciencesofLyon.Hisresearchinterestsareengineeringofinformationsystem,databasereverseengineering,documentaryinformationengineering,medicalinformationsystems.

M.K.RahmounireceivedthePh.D.degreeinoperationalresearchfromSouthamptonUniversityUK,in1987.HeisaprofessorintheComputerScienceDepartmentattheUniversityofOranAlgeria.Hisresearchinterestsincludeformalspecifications,method-ologiesofspecificationintheinformationsystems,databasereverseengineering.

72M.Malki,A.Flory,M.K.Rahmoun

Objektiniu󰀁schemu󰀁formavimasišesamu󰀁realiaciniu󰀁baziu󰀁vadovaujantisformomis

MimounMALKI,AndréFLORY,MustaphaKamelRAHMOUNI

Straipsnyjenagrin˙ejamasuždavinys,kaip,vadovaujantisvartotojuipateikiamomisrezultatu󰀁

duomenubaziuschemas.Sprendžiantšiuždavinisi¯ulomapasinaudotiformomis,atkurtireliaciniu󰀁󰀁󰀁󰀁󰀁

formustrukt¯ura,irjuturiniu(konkreˇciaisduomenimis).Šitaipgalimaatkurtirealiaciniuirrezultatu󰀁󰀁󰀁󰀁

poschemiusirjuribojimus.Gautusrealiaciniusposchemiussi¯ulomatransformuotiiobjek-baziu󰀁󰀁󰀁

estiniusposchemiusir,agreguojantobjektiniusposchemius,suformuotiatitinkamosduomenu󰀁baz˙

visa󰀁objektine󰀁schema.󰀁

因篇幅问题不能全部显示,请点此查看更多更全内容