Bibliographic record number: 536292
Reinforcement learning in non-Markov conservative environment using an inductive qualitative model
Reinforcement learning in non-Markov conservative environment using an inductive qualitative model // International journal on artificial intelligence tools, 20 (2011), 5; 887-909 doi:10.1142/S0218213011000425 (international peer review, article, scientific)
CROSBI ID: 536292
Title
Reinforcement learning in non-Markov conservative environment using an inductive qualitative model
Authors
Jović, Franjo ; Slavek, Ninoslav ; Blažević, Damir
Source
International journal on artificial intelligence tools (0218-2130) 20 (2011), 5; 887-909
Type, subtype and category of work
Journal papers, article, scientific
Keywords
retail process; data normalization; periodicity elimination
Abstract
The majority of real-world processes, such as power plants, banking and retail businesses, are non-Markov processes: they are conservative systems with stochastic supply and demand. For example, a retail process carries long-term memory of the customer's experience and of market price drift, and so deviates from the Markov property. Modeling the reward in this process is directed towards the actions that have to be executed daily in order to support it. These actions are further severely disturbed by the hidden periodicity of customer behavior on a monthly and weekly basis. Alternative solutions in the retail business use a retail potential market model and a pricing policy based on demography. Policies for non-Markov behavior have not been studied intensively, although the literature indicates the non-Markov nature of many real process models, such as bank rating migrations. A solution is proposed that is based on day-to-day data collection from point-of-sale (POS) locations, synthesizes the reward function from separate sale component rewards using qualitative models, and indicates the most outstanding sale groups that form the reward model. Normalization of the POS data was used to eliminate periodicities and non-Markov features of the process data. Reinforcement learning was additionally supported by artificial corrections of the normalized reward function, and the models thus obtained were used to recognize the most promising and the most defective hidden retail product groups. Model data were analyzed for the statistical significance of the obtained results by comparing normalized and non-normalized sales data distributions. The method is simple and effective, and is applicable to each POS separately, to a complex retail business network, and to other conservative environments. The obtained qualitative correlations between the model and the reward function lie between 0.72 and 0.95, even for the simple cases presented.
Original language
English
Scientific fields
Computer science
WORK AFFILIATIONS
Projects:
165-1652017-2016 - Holografski logički analizator (Holographic Logic Analyzer) (Slavek, Ninoslav, MZO)
Institutions:
Fakultet elektrotehnike, računarstva i informacijskih tehnologija Osijek
Journal indexed in:
- Current Contents Connect (CCC)
- Web of Science Core Collection (WoSCC)
- Science Citation Index Expanded (SCI-EXP)
- SCI-EXP, SSCI and/or A&HCI
- Scopus
Included in other bibliographic databases:
- ISI Alerting Services
- CompuMath Citation Index
- Current Contents/Engineering, Computing, and Technology