ࡱ> G -6bjbjَ h0]  8\xDg<(  14efffffff$QiEkf!f   ff#$Uf ,`ܣJA~>fAN ALGORITHM TO GENERATE DATA FOR TESTING Kreaimir Fertalj, Vedran Mornar, Damir Kalpi Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, 10000 Zagreb, Croatia kresimir.fertalj@fer.hr, vedran.mornar@fer.hr, damir.kalpic@fer.hr Abstract: This paper describes a generally applicable algorithm to generate the test data. The generated data obey the integrity rules. The procedures described in the article may be implemented as part of a tool specialized for data generation. In addition, the procedures can be directly implemented in front-end language of the target database management system. Depending on the way of implementation of the algorithm, generated data can be written into flat files, into a special database of test data, or directly into the target database tables. Keywords: relational database, integrity constraints, test data Introduction One of the problems encountered by a database application developer is lack of the test data. While it is relatively easy to add a few records to every entity set in order to fill the forms and give an idea about the general behavior and appearance of an application, it is certainly not easy to enter sufficient data to simulate the application behavior in the production. At a first glance, it may seem sufficient to design the database properly and to create some indices to improve the response time as the amount of data grows. It may seem rather simple to apply a random number generator to produce attribute values, but these data must also conform to the integrity rules. What's more, the initial application design is often changed, resulting in a redesign of the user interface. Also, an introduction of controlled redundancy is sometimes desirable and it results in a deviation from the standard normal form. Therefore, for multiple steps in an application development the test data of the corresponding database structure is repeatedly required. An application generator has resulted from the authors' research and development. It generates the software upon computable specifications and upon the entity-relationship data model as proposed in (Chen, 1976). The model is stored in the meta-base shown in  REF _Ref414693674 \h Fig. 1. The meta-base stores the information about entities, domains, attributes, relationships and keys. It is used both to generate the application source code based on the program templates written in a specification language described in (Fertalj, 1993) and (Fertalj, 1997), and to generate the test data which conform to the integrity rules (Maier, 1983).  Fig.  SEQ Figure \* ARABIC 1. The meta-base The Basic Algorithm The basic algorithm is described by the procedure GenTestData. As first, for every domain in the model, the procedure GenDomData is invoked to generate a set of unique values for the domain processed. Second, for each entity in the model, the procedure GenEntData is used to generate entity touples. Within one touple, for every foreign key an appropriate value from the set of touples of parent entity is taken. For each attribute of the touple, which is not part of a foreign key, a value from the domain data set is taken. Let the following symbols denote: D the set of all domains from the data model E the set of all entities from the data model, ordered by the respective count of contained foreign keys procedure GenTestData()  // generate domain values for each D ( D  GenDomData(D) // generate touples  for each E ( E GenEntData(E) end GenTestData. The Generation of Domain Values As already mentioned in the previous chapter, the idea is to generate a set of unique values for the domains. In this way, the problem of unique values for serial numbers (incremental and random), and the problem of the fields with unique index is solved. Following notation is assumed: VD the set of values for domain D VD0 the set of default values for the domain D (user defined values or values from other projects) d the value for the domain dmin minimal value of numerical type or date type dmax maximal value of numerical type or date type ( maximal number of touples of entities defined over domain D Rand(a,b) function that returns a pseudo-random integer in the range [a,b] RString(D,n) function that returns a pseudo-random character string of length n from the set of characters valid for the domain D D.DomLen maximal length in memory units required to represent a value from the domain D procedure GenDomData(D)  if (VD0( ( 0  VD := VD0 else  while (VD( < (  if type of domain D is numeric or date  d := Rand(dmin, dmax) else // type of domain D is character type of length defined by D  d := RString(D, Rand(1, D.DomLen)) VD := VD ( {d} end GenDomData. Data Generation The Generation of Entity Data The procedure GenEntData generates touples for a given entity E. If necessary, the data for each entity, which is referenced by foreign keys of E, is generated first. In this way, the existence of values for foreign keys is ensured. Touples are generated by subsequent calls to the function FillRec. The function returns a valid touple of generated values. If the primary key from the entity E is of type serial, the function FillRec is called for each serial number. Otherwise, the function FillRec is called until the required number of touples is generated. It is assumed that each entity has only one field of type serial. In addition, it is assumed that this field is not part of another primary or alternate key. To describe the remainder of the algorithm, following notation is used: KE the set of fields of entity E that denote primary attributes OE the set of fields of entity E that are not part of neither primary nor foreign keys RE the set of relationships where entity E from set E is a child; it is assumed that the order of RE is decreased by the count of pairs of attributes defining the relationship (in other words, it is sorted in descending order by the complexity of foreign key) TE set of touples generated for entity E from the set E t entity touple t(F) the value of field F within touple t n vector of numbers of touples to be generated, ordered in the same way as the set E Parent(R) function that returns the parent entity in relationship R Serial(F) function that, for the field F, returns logical value true when the field holds serial values, otherwise returns logical value false Required(F) function that, for the field F, returns logical value true when the field requires a value, otherwise returns logical value false Unique(F) function that, for the field F, returns logical value true when unique values for the field are expected, otherwise returns logical value false FldDom(F) function that returns the domain for the field F b flag that represents a decision on determination of values for foreign attributes when the field does not require a value CR the set of pairs of fields defining the relationship R ChiFld(C) function that, for the pair C ( CR , determines the field corresponding to the foreign attribute from the child entity ParFld(C) function that, for the pair C ( CR , determines the field corresponding to the primary attribute from the parent entity procedure GenEntData(E)  TE := ( create KE, OE, RE // Generate touples for parent entities for each R ( RE  P := Parent(R) if (TP( = 0 ( ((P ( E)  GenEntData(P) // Generate touples for entity E if ( K ( KE | Serial(K) = true  for i := 1 to nE  tE := FillRec(E) tE(K) := i TE := TE ( {tE} else  while (TE( < nE  tE := FillRec(E) TE := TE ( {tE} end GenEntData. The Generation of Touples While considering possible solutions for touples generation, it was recognized that, generally: an attribute can be a primary attribute in more than one key, a primary attribute can be a foreign attribute in more than one foreign key, a foreign attribute can be a foreign attribute in more than one foreign key. Due to these reasons, the process of filling a touple in function FillRec is done in the following way: For each relationship where the entity E acts as child, the values for foreign keys are determined based on existing touples of parent entities. If, for the field to be filled, a value already exists (it has been filled by filling some other foreign key), the value previously filled remains unchanged. Besides that, for the foreign keys that may acquire null values, the decision whether to take null values is made randomly. When an autoreflexive relationship is processed, it is assumed that for the first generated touple, the foreign key in that relationship must contain the null value. After the foreign key values have been determined, a referential integrity check is performed. If the generated foreign key values are correct (if they satisfy the referential integrity constraint/rule), the algorithm continues to determine the values for primary keys of the entity E. For each field that represents a primary attribute, and which it is not a field of the type serial, or the value for the field has already been set, the function GetFldVal is used to set a unique value for the field from set of values for the correspondent domain. By preserving the previously set value, the problem of overlapping values for foreign and primary attributes is avoided. After the values for all the keys have been set, a check of existential integrity is performed. If the generated values are not correct, the procedure of filling the primary and foreign attribute values is repeated. Finally, the values for all other attributes are determined. The decision is made randomly whether to set a null value for a field that can acquire null value. Upon the decision, the function GetFldVal is called to return a member of the set of values from the correspondent domain. function GetFldVal(E, F)  D := FldDom(F) if Unique(F) ( F ( KE  j := (TE(+ 1 else  j := Rand(1, (VD() return(VDj) end GetFldVal. function FillRec(E)  do  initialize touple tE to null for each R ( RE // determine foreign key values  P := Parent(R)  if P ( E ( (TE( = 0 b := 0 else  b := Rand(0, 1) j := Rand(1, (TP() // index of parent touple create CR for each C ( CR  if tE(ChiFld(C)) = null ( (Required(ChiFld(C)) ( b=1) tE(ChiFld(C)) := tPj(ParFld(C)) if all foreign keys are correct  for each K ( KE // determine primary keys of E  if (Serial(K) ( tE(K) = null  tE(K) := GetFldVal(K) until all key values are correct for each O ( OE // values for other attributes  if Required(O) ( Rand(0, 1) = 1  tE(O) := GetFldVal(O) return(tE) end FillRec. Conclusion The procedures described in this article efficiently generate the sets of test data of the required cardinality. While it is impossible for a development team to enter manually the adequate amount of data to test the application performance under real life-like conditions, it is only a matter of minutes to populate the database with necessary number of records by using the described procedures. They might not be so helpful when the data can be imported from an existing information system. Even then, we consider these procedures to be a valuable auxiliary tool to prevent some unfortunate scenarios from happening upon the deployment of a completely new database application (Bach, 1997). Testing the application against the data generated in the described manner can prevent a mistake in the design than can not only leave a customer unsatisfied, but can also result in an application which is beyond usability. References Bach, J. (1997), "The Hard Road From Methods to Practice", Computer, IEEE Computer Society, Vol. 30, No. 2, pp. 129-130. Chen, P.P. (1976), "The EntityRelationship Model  Toward a Unified View of Data", ACM Transaction on Database Systems, Vol. 1, No. 1, pp. 936. Fertalj, K. (1993), An application generator for a general 4th generation language, Masters Thesis (in Croatian), ZPM ETF, Zagreb. Fertalj, K. (1997), An improvement of methods to accelerate and standardize the software application production, Ph.D. Thesis (in Croatian), ZPM FER, Zagreb. Maier, D. (1983), The Theory of Relational Databases, Computer Science Press Inc., Potomac, Maryland.  The parent entity is the one referenced by a foreign key, in contrast to the child (or referencing) entity which contains foreign keys. PAGE   +,6km{ '(./EFGHku7@ALOPnvyz{|}~ƵƭƤƵƵƭƤOJQJmHnH jmHnHjUhmHnH 5mHnHmHnH >*mHnHOJQJ j0J&U 5OJQJ j}UmHjU jU@65@TV>|+,mn{| &')XYmn}~$$$ $ TV>|+,mn{| &')XYmn}~67Ol}?þ~{x:;  ~ "#;  EGH ; .67Ol}?"`.p;0=>?@ABlm"#^_`di,7 jmHnH H*mHnH H*mHnHOJQJmHnH jmHnHjUhmHnH >*OJQJ jhOJQJ 5OJQJ H*OJQJOJQJ H*OJQJOJQJ 5mHnH >*mHnHmHnH:"`. 2{/0X&|~{xurolif+/;  ;   5a*:RSTUV0|' 2{/0X  !$'8<{| ;<!OJQJ 5OJQJ jmHnH 5mHnH jhmHnH jmHnH >*mHnH H*mHnH H*mHnHOJQJmHnHjUhmHnHmHnHD!(78XYZwx$%&'y|JOPX[yz"rwx~1 >* >*OJQJ5>*OJQJ H*OJQJOJQJOJQJ 5OJQJV&|Px1 k n!o!!!!!!!")"L"m"""p0|Px1 k n!o!!!!!!!")"L"m""""""""#"###=#>###*$þqnkdZF    ;   /@Xq8LWops*fb#1 2 3 i j k q t !!!!!!!o!x!y!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"""""¼˴¼¼¼ˮ˦¼ˮˮ˞¼ jmHnH jmHnH >*mHnH jmHnH H*mHnHOJQJmHnHmHnHjUhmHnH5mHmH>*mH j >*OJQJOJQJ H*OJQJOJQJ>" " " " """"""%"N"P"Q"R"U"V"W"X"Y"\"b"h"l"m"n"r"u"}"""""""""""""""""""""""""""""""""""ҭҭҭҭ jmHnH jmHnH H*mHnHOJQJmHnH jmHnH j$mHnH >*mHnH 5mHnHjUhmHnH jmHnH jmHnH jmHnHmHnH>""""""#"###=#>###*$w$x$$$2'3'Q(R(**++++ ,"""""""""""""#### # # # ####### #"##w$$$% %N(O(((m+v+++++++++++++,,,, , , , , ,,jUmHnH jmHnH jmHnHOJQJ 5OJQJmH  >*mHnH jmHnHOJQJmHnHjUhmHnH 5mHnHmHnH jmHnH H*mHnH>*$w$x$$$2'3'Q(R(**++++ ,,%,=,K,Z,[,o,u,,,,, --3-4-h-y----}xsnid_S 'BX/AZ[vwBC  $,,,,, ,%,&,/,3,7,8,9,:,;,?,E,F,G,I,K,N,O,X,Z,[,c,d,k,o,p,r,t,u,v,,,,,,,,,,,,,,,,,,,,,,,,,,,,-- jmHnH jmHnH jmHnHj5UhmHnH jmHnH 5mHnHjUhmHnH >*mHnHmHnH H*mHnHOJQJmHnH jmHnH? ,,%,=,K,Z,[,o,u,,,,, --3-4-h-y-----.U.{.....------ -$-(-?-C-G-H-I-J-K-u-v-w----------------------------------------.. .&...1.2.3.4.͹ jmHnH jmHnHj5UhmHnH jmHnH H*mHnHOJQJmHnH jmHnH 5mHnHjUhmHnH >*mHnHmHnHB--.U.{...../5/B/O/P/Q/\/]/K1L12233z3 44-5566)6*6+6,6-6ywwwtwwy" k  fk  k  {k   k ;  ];   1bc()"4.5.T.U.V.^.`.a.b.h.l.m.o.p.v.{.|...................//// ///// /'/0/7/=/>/?/@/B/E/F/M/O/<3[33333 4H46 jmHnHOJQJmHnH jmHnH 5mHnHjUhmHnH jmHnH jmHnH >*mHnHj5UhmHnHmHnH H*mHnH@./5/B/O/P/Q/\/]/K1L12233z3 44-5566'6(6)6*6+6"h"&`#$% & FkH4J4^444?5a555556666%6&6'6-60J# j0J#U j0J&U66H*+6,6-6 & Fk. A!n"n#$%}DyK _Ref414693674'DdH2upb  c >AGENDATA4.WMF2qvЧ$i T8Mp!EڍiQC/wvЧ$i T8ٜ$ :<6#x]lUGw"Hڥ U[(DQ۰Ƶ5VZ* iR? XOĚFh "QF`k#Ɛ* zfIN&dw͝;+|gmX㧯b=6XpMTJ=)öGiz!l]H B7>oerQ$dE%$PGGy^GEf5ZOQJK՛L&z}(G%alъMalюBiZ->Z9E)͝{ZO"Bc)ZLU\Ҧ(2c1\r4+g-'VͺZW$piCk(DLFL8cǦҔ9L^Lm;UKx ]tN\ъ3w̸𜮨APz X.h\mǙVRT<ӹ}ڹ!:^4ciʣ"S{=AJԝ[|Fge3 ZUvYQ(GqgoҜ|rŠHY/9Y3f}{iY_9=iY1H96*Sp:#&h3Z).y5%ޠ3>VOqG'I* @Vtg0ۼ5/os]P/늯ђHt{Qg .;]E>;)wYc~F1G1g2k``vx< /#(Nfۙșٝ"xR]8Sgtzzt\%\2ty yt _Ref414698425 _Ref414693674 _Ref414693649 _Ref414694347 _Ref414694060 _Ref414694073) W . //.1H W y./,0.1+3=CDJLQRX#$' ( . H 7 7 O O P P l l } } ~ ~     22{{||LLmmnn&&&&&&&&&& ' ' ' '''%'%'&'&'Z'['o'o'p'p'u'u'v'v''''''''''''' ( (((((y(y((((((((()) ) )U)U)V)V){){)|)|)))))))****01&1)1)1*1*1.1+#$& % ' ( . H 7 7 O O P P l l } } ~ ~     22{{||oLLmmnn&&&&&&&&&& ' ' ' '''%'%'&'&'Z'['o'o'p'p'u'u'v'v''''''''''''' ( (((((y(y((((((((()) ) )U)U)V)V){){)|)|)))))))****5*01&1)1)1*1*1.1Kresimir Fertalj'C:\TMP\AutoRecovery save of GenData.asdKresimir Fertalj'C:\TMP\AutoRecovery save of GenData.asdKresimir Fertalj'C:\TMP\AutoRecovery save of GenData.asdKresimir Fertalj'C:\TMP\AutoRecovery save of GenData.asdKresimir FertaljC:\znanost\ITI98\GenData.docKresimir FertaljC:\znanost\ITI98\GenData.docKresimir FertaljC:\znanost\ITI98\GenData.docKresimir FertaljC:\znanost\ITI98\GenData.docKresimir FertaljC:\znanost\ITI98\GenData.docKresimir FertaljC:\znanost\ITI98\GenData.docEHX366 h$hm $9K;5;T=$L j@Ni0c   HY 1U`fz1n + kUo `F(4~k y.! }""{#J [D$"\9{' Qf(<[XO,-LO0U. 64 {5z@6 0@8 >:sr\`= *= C+A6 @>A n-B,u @C Q_C~!OE nE C|G  H r^oI +^K )"L NL FO  wQ MQbYYXH[n^fz1^'N'%^Ln`8Zd .f Ogi jR=kbN CokpV>|q Ys ct _AzD5zkpV8.| Y `~  OJQJo( OJQJo( OJQJo( OJQJo(hh. hhOJQJo(@<.@<..@L<...@ <....@  < .....@ < ......@ \<.......@  <........@ <.........* hhOJQJo(hh)hh)@.hh.P.....x....   .....  X@  ......  ....... 8x........ `H.........@ PRILOG :@ hhOJQJo( hhOJQJo(hh.P.....x....   .....  X@  ......  ....... 8x........ `H.........hh)hh.hh. hhOJQJo( hhOJQJo(hh)hh))88)()()pp()  .@ @ .  .@hh) hhOJQJo( hhOJQJo(@ hhOJQJo(hh.hh)hh. hhOJQJo(hh)hh.hh. hhOJQJo(P.@@.0..``...  ....  .....  ...... `....... 00........ hhOJQJo(hh.hh. hhOJQJo(hh.P.....x....   .....  X@  ......  ....... 8x........ `H......... hhOJQJo(hh.hh.P.....x....   .....  X@  ......  ....... 8x........ `H.........hh.hh.P.....x....   .....  X@  ......  ....... 8x........ `H......... hhOJQJo( hhOJQJo(@hh.@<@<.@L<..@ <...@  < ....@ < .....@ \< ......@  <.......@ <........@56>*CJOJQJo()  hhOJQJo(hh.hh.P.....x....   .....  X@  ......  ....... 8x........ `H.........P. @@OJQJo(..0...``...  ....  .....  ...... `....... 00........@.hh))88)()()pp()  .@ @ .  . hhOJQJo(hh))88)()()pp()  .@ @ .  .@.@.hh))88)()()pp()  .@ @ .  .hh.t[D$bYYlȧ$x` Hk)"L@>AFO\`=!OEnԨ2*=@664MQ+O0U.;"{#O,-ct8.|{'ji0c >|qr^oIYs43Co5z%^0@8Ogi33NL wQkUo$LH445y.!i0c \5 jR=k5+^K6C+A`F[D$`6}">:[D$l6[D$x6[D$6[D$6[D$6[D$6666.f6477_Az_Az7<8H8_Az88^ @Cd9\9 H[n^91U`{5:p::(;;;<<<Y `~C|GnE^<^<<=Qf(Q_Cn-BxT@ OJQJo(ԧ @# OJQJo(0 @CJOJQJo(u @ OJQJo(2`m@CJOJQJo(u2 3@ OJQJo(@3 @ OJQJo(3 @ OJQJo(3 @ OJQJo(T4 @ OJQJo(4 8@ 0OJQJo( 5 @ OJQJo( 5 @ OJQJo(6 @ OJQJo(  6 C@ OJQJo(@7`\@ OJQJo(7`\@ OJQJo(7 @.T8`\@ OJQJo(8 @. 9 3@h OJQJo(h9 3@h OJQJo(9 @h OJQJo( : @h OJQJo(|: @h OJQJo(: @ OJQJo(4; @h hOJQJo(; @h OJQJo(; @h OJQJo(H< @h OJQJo(@< @h OJQJo($= @h OJQJo(E@....-100 @ GTimes New Roman5Symbol3& Arial?5 Courier NewOBScriptDomCasual BT5& TahomaEMonotype Sorts71CourierA"GenevaArial"1h#&1 &&) &&Bm (U3?!0d/1"07C:\Program Files\Microsoft Office\Templates\NormalK.dot1Kresimir FertaljKresimir FertaljOh+'0  0 < H T`hpx1ssKresimir Fertaljdres NormalK.dotKresimir Fertaljd66sMicrosoft Word 8.0@j2@F2I@<{P@vLJ (՜.+,D՜.+,, hp|  FERU/1  1 Title(RZ _PID_GUID _PID_HLINKSAN{77032843-88B8-11D0-A372-008048EDDF3E}ATJP' GENDATA4.WMF  !"#$%&'()*+,-./012346789:;<>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrsuvwxyz{}~Root Entry F`ۼ}JData 51Table=lWordDocumenthSummaryInformation(tDocumentSummaryInformation8|CompObjj@xH  FMicrosoft Word Document MSWordDocWord.Document.89q