Bibliographic record number: 536292

Reinforcement learning in non-Markov conservative environment using an inductive qualitative model

Jović, Franjo; Slavek, Ninoslav; Blažević, Damir
Reinforcement learning in non-Markov conservative environment using an inductive qualitative model // International journal on artificial intelligence tools, 20 (2011), 5; 887-909 doi:10.1142/S0218213011000425 (international peer review, article, scientific)

International journal on artificial intelligence tools (0218-2130) 20 (2011), 5; 887-909

Type, subtype and category of work
Journal papers, article, scientific

Keywords
Retail process; data normalization; periodicity elimination

The majority of real-world processes, such as power plants, banking and retail businesses, are non-Markov processes, being conservative systems with stochastic supply and demand. As an example, a retail process possesses long-term memory of the customer's experience and market price drift that deviates from the Markov property. Modeling the reward in this process is directed towards actions that have to be executed daily in order to support it. These actions are further severely disturbed by the hidden periodicity of customer behavior on a monthly and weekly basis. Alternative solutions in the retail business are achieved using a retail potential market model and a pricing policy based on demography. The policy of non-Markov behavior has not been intensively studied, although the literature indicates the non-Markov nature of many real process models, such as bank rating migrations. A solution is proposed, based on day-to-day data collection from point-of-sale (POS) locations, synthesizing the reward function from separate sale component rewards using qualitative models, and indicating the most outstanding sale groups that form the reward model. The normalization of POS data has been used for the elimination of periodicities and of non-Markov features of the process data. Reinforcement learning has been additionally supported by artificial corrections of the normalized reward function, and the models thus obtained were used for recognition of the most promising and most defective hidden retail product groups. Model data were analyzed for the statistical significance of the obtained results, comparing normalized and non-normalized sales data distributions. The method is simple and effective, being applicable to each POS separately, to a complex retail business network, as well as to other conservative environments. The obtained qualitative correlations of model and reward function lie between 0.72 and 0.95, even for the simple cases presented.

Original language

Scientific fields


Project / topic
165-1652017-2016 - Holografski logički analizator (Ninoslav Slavek)

Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek

Journal indexed in:

  • Current Contents Connect (CCC)
  • Web of Science Core Collection (WoSCC)
    • Science Citation Index Expanded (SCI-EXP)
    • SCI-EXP, SSCI and/or A&HCI
  • Scopus

Inclusion in other bibliographic databases:

  • ISI Alerting Services
  • CompuMath Citation Index
  • Current Contents/Engineering, Computing, and Technology