Summary We consider the Markov decision process with finite state and action spaces under the criterion of average reward per unit time. We study the method of value-oriented successive approximations, treated extensively by Van Nunen for the total reward case. Under a strong aperiodicity assumption and various conditions which guarantee that the gain of the process is independent of the starting state, we show that the method converges and produces nearly optimal policies.
Zusammenfassung We consider Markov decision processes with finite state and action spaces under the average reward criterion. We study the method of value-oriented successive approximation, which Van Nunen investigated in detail for the total reward criterion. Under a strong aperiodicity condition and various assumptions which guarantee that the optimal average reward is independent of the initial state, we prove the convergence of the method.
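The idea of a value-oriented step can be illustrated with a minimal Python sketch. It is not taken from the paper: the data layout (`P[s][a]` as a row of transition probabilities, `r[s][a]` as a one-step reward), the function names, the stopping rule and the renormalisation are all assumptions made for illustration. After each policy improvement step, the operator of the chosen policy is applied `lam` times instead of once. The spread of the one-step differences brackets the gain when the gain does not depend on the starting state.

```python
def backup(P, r, v, s, a):
    """One-step lookahead value of action a in state s under value vector v."""
    return r[s][a] + sum(p * v[t] for t, p in enumerate(P[s][a]))


def value_oriented_sa(P, r, lam=3, tol=1e-9, max_iter=10_000):
    """Sketch of value-oriented successive approximation (average reward).

    P[s][a] is a list of transition probabilities, r[s][a] a one-step reward.
    Both names and the layout are illustrative assumptions.
    """
    n = len(P)
    v = [0.0] * n
    policy, lo, hi = [0] * n, 0.0, 0.0
    for _ in range(max_iter):
        # Policy improvement: pick a maximising action in every state.
        policy = [
            max(range(len(P[s])), key=lambda a: backup(P, r, v, s, a))
            for s in range(n)
        ]
        # The one-step differences bracket the gain of the greedy policy
        # and the optimal gain.
        diff = [backup(P, r, v, s, policy[s]) - v[s] for s in range(n)]
        lo, hi = min(diff), max(diff)
        if hi - lo < tol:
            break
        # Value-oriented step: apply the operator of the chosen policy lam times.
        for _ in range(lam):
            v = [backup(P, r, v, s, policy[s]) for s in range(n)]
        # Subtract a reference value so the iterates stay bounded.
        base = v[0]
        v = [x - base for x in v]
    return policy, 0.5 * (lo + hi)


# Hypothetical two-state example; every action keeps a positive probability
# of staying in the current state (a strong aperiodicity property).
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.5, 0.5], [0.1, 0.9]]]
r = [[1.0, 0.0],
     [2.0, 3.0]]
policy, gain = value_oriented_sa(P, r)
```

With `lam=1` the sketch reduces to ordinary successive approximation; larger values spend more effort evaluating the current policy between improvement steps.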

G. Hübner 《OR Spectrum》1988,10(3):161-166
Summary The classical procedure for the adaptive control of average reward Markov decision processes with an unknown parameter chooses at each stage a decision which is optimal for the average reward problem with the presently estimated parameter. In many cases, however, it is inefficient or impossible to compute the long-run optimal policy each time. Successive approximation methods were therefore proposed and investigated. We present a unifying and generalizing approach that includes both types of methods mentioned above and also generates many new procedures.
Zusammenfassung The classical procedure for the adaptive control of Markov decision processes with an unknown parameter and the average reward criterion chooses at each stage a decision that is average-optimal for the problem with the currently estimated parameter. In many cases, however, it is inefficient or impossible to compute the optimal policy for the infinite planning horizon each time. Successive approximation methods were therefore proposed and investigated. Here a general approach is presented that contains both of the methods mentioned and also provides the framework for a number of further procedures.

Summary The first part of this survey paper derives, under rather weak conditions which do not guarantee contraction, a number of important existence and convergence results in Markov decision theory. The second part of the paper analyses conditions which guarantee that the contraction mapping approach can be used. These conditions are rather weak and allow for unbounded rewards. The generation of successive approximation methods for solving Markov decision processes by using action-dependent stopping times is described at the end of the paper.
Zusammenfassung In the first part of this survey paper, a number of important existence and convergence results for Markov decision processes are obtained under weak assumptions that do not guarantee contraction. In the second part, conditions are investigated that permit an approach by means of contraction mappings. These conditions are rather weak; in particular, unbounded rewards are allowed. The end of the paper describes how action-dependent stopping times can be used to obtain various successive approximation methods for solving Markov decision processes.

V Rajaraman  N R Garud 《Sadhana》1996,21(3):381-393
In this paper we define nondeterministic decision tables to describe process control rules specified imprecisely. An example of such a control rule is ‘if temperature is high and pressure is low then open valve slightly’. The definition of nondeterministic decision tables is based on fuzzy sets and associated logic. We show how nondeterministic decision tables are interpreted and specified actions executed based on measured values of independent control variables. When nondeterministic decision tables are formulated based on rules given by experts it is necessary to determine whether they have any redundant rules, missing rules or contradictory rules. We define these terms for nondeterministic decision tables and show how such logical errors can be detected in certain cases. Grant of a fellowship to N R Garud by the Jawaharlal Nehru Centre for Advanced Scientific Research, Bangalore for doing this research is gratefully acknowledged.
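A fuzzy rule of the kind quoted in this abstract can be evaluated as in the following Python sketch. The membership functions, their breakpoints, the second rule, and the use of the minimum as the fuzzy AND are illustrative assumptions, not the definitions used by the authors.

```python
def ramp_up(x, a, b):
    """Membership rising linearly from 0 at a to 1 at b (assumed shape)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def ramp_down(x, a, b):
    """Complement of ramp_up: 1 at a, falling to 0 at b."""
    return 1.0 - ramp_up(x, a, b)


# Hypothetical fuzzy sets for the two measured control variables.
MEMBERSHIP = {
    ("temperature", "high"): lambda t: ramp_up(t, 60.0, 100.0),
    ("temperature", "low"): lambda t: ramp_down(t, 60.0, 100.0),
    ("pressure", "high"): lambda p: ramp_up(p, 1.0, 3.0),
    ("pressure", "low"): lambda p: ramp_down(p, 1.0, 3.0),
}

# Each table column: a set of conditions and the action it selects.
# The first rule is the one quoted in the abstract; the second is made up.
RULES = [
    ({"temperature": "high", "pressure": "low"}, "open valve slightly"),
    ({"temperature": "low", "pressure": "high"}, "close valve slightly"),
]


def firing_strengths(measured):
    """Degree to which each rule applies, using min as the fuzzy AND."""
    return {
        action: min(MEMBERSHIP[(var, term)](measured[var])
                    for var, term in conditions.items())
        for conditions, action in RULES
    }


strengths = firing_strengths({"temperature": 90.0, "pressure": 1.5})
best_action = max(strengths, key=strengths.get)
```

Under these assumptions several rules can fire at once to different degrees, which is one way to read the term "nondeterministic"; executing the strongest rule is only one possible interpretation.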

Summary In this paper we study three finite state, value and policy iteration algorithms for denumerable space Markov decision processes with respect to the average cost criterion. The convergence of these algorithms is guaranteed under a scrambling-type recurrency condition and various tail conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of important states and controlling them as well as we can. The policy space algorithm consists of a value determination scheme associated with a policy and a policy improvement step where a better policy is determined. Thus a sequence of improved policies is constructed which is shown to converge to the optimal average cost policy.
Zusammenfassung For Markov decision processes with a denumerable state space, we study three finite-state value iteration and policy iteration algorithms under the average cost criterion. The convergence of the algorithms is ensured by scrambling-type recurrence conditions and various tail conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of important states and controlling them as well as possible. The policy iteration algorithm consists of a value determination step for a policy and a policy improvement step. In this way a sequence of improved policies is constructed, and its convergence to the optimal policy is shown.

The paper describes the background and evaluates the effectiveness of a flashing green phase just before the amber leading to red phase in the traffic signal cycle. This type of signal sequence, practiced in Israel and a few other countries, was thought to provide drivers with useful information which should lead to better stop or go decisions. However, accident analysis studies have shown an increase in collisions following the introduction of a flashing green. A laboratory simulation study designed to explain the accident findings was conducted. It was found that the flashing green elicited from the drivers an earlier decision response and a higher number of inappropriate stopping decisions, particularly close to the intersection. The pattern of stopping and crossing decisions in the flashing green program suggests more friction between stopping and nonstopping vehicles and, therefore, a higher likelihood of front-rear collisions compared to a regular program.

Two alternatives to the multivariate exponentially weighted moving average (EWMA) chart are considered. One of these alternatives is an arithmetic moving average control chart, whose statistic is the arithmetic average of the sample means for the last k periods. The other alternative is a truncated version of the EWMA, which truncates the EWMA after a fairly short period of time so that more emphasis is placed on the most current observation. Simulated average run length (ARL) results indicate that in some situations these alternative charts outperform the multivariate EWMA chart. Some suggestions are made for designing charts to detect a specific shift and for comparing the alternative charts. Some authors have noted that past in-control data may diminish a chart's ability to detect a shift in the process mean. To examine this, a scenario is discussed in which the process is initially in control but goes out of control at some random time period. This resembles a realistic manufacturing setting, where the process is initially in control but after some time the process mean shifts to a new value. The paper shows which control charts detect a shift faster under this scenario.
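The two kinds of smoothed vectors compared in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, each sample mean is a plain list of components, and the charting statistic and control limits (which would need the covariance matrix) are left out. The truncated EWMA variant is also omitted because the abstract does not give its definition.

```python
def moving_average(means, k):
    """Arithmetic average of the sample mean vectors for the last k periods.

    Early periods use the shorter window that is available (an assumption).
    """
    out = []
    for t in range(len(means)):
        window = means[max(0, t - k + 1): t + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def ewma(means, lam, start):
    """Standard EWMA recursion Z_t = lam * X_t + (1 - lam) * Z_{t-1}."""
    z = list(start)
    out = []
    for x in means:
        z = [lam * xi + (1.0 - lam) * zi for xi, zi in zip(x, z)]
        out.append(z)
    return out


# Hypothetical bivariate sample means: in control for two periods,
# then the mean shifts to (2, 4).
means = [[0.0, 0.0], [0.0, 0.0], [2.0, 4.0], [2.0, 4.0]]
ma = moving_average(means, k=2)
ew = ewma(means, lam=0.5, start=[0.0, 0.0])
```

In this small example the moving average reaches the shifted mean after k periods, while the EWMA still carries weight from the in-control history, which is the effect of past in-control data mentioned in the abstract.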

In this paper, we propose two new exponentially weighted moving average (EWMA) control charts based on the moving average (MA) statistic and ln S² to monitor the process mean and variability of a Weibull process with subgroups. The inverse error function is used to transform the Weibull‐distributed data to a standard normal distribution. The Markov chain approach is used to derive the average run length (ARL). Subsequently, the performance of the proposed charts is compared with that of other existing control charts. The comparison shows that the EWMA‐MA outperforms the and EWMA‐ control charts for monitoring the process mean in terms of ARL values. The comparison also shows that the EWMA‐ln S² outperforms the S² and S²‐MA control charts for monitoring the process variability in terms of ARL values. Two examples are used to illustrate the application of the proposed control charts.

An efficient alternative to the S control chart for detecting shifts of small magnitude in the process variability is proposed, using a moving average based on the sample standard deviation statistic s. Control limit factors are derived for the chart for different values of the sample size and the span w. The performance of the moving average S chart is compared to that of the S chart in terms of average run length. The results show that the moving average S chart, for varying values of w, outperforms the S chart for small and moderate shifts in process variability.

This paper presents a unifying algorithmic analysis for a general class of single server queueing systems with a state dependent Markovian input process and a phase-type service time distribution including single server queues with random and quasirandom input. Using regenerative analysis we develop numerically stable and efficient recursion schemes to compute the state probabilities. The computation of the waiting times is based on the state probabilities.

The semi-Markov decision model is a powerful tool for analyzing sequential decision processes with random decision epochs. In this paper, we build a semi-Markov decision process (SMDP) for the maintenance policy optimization of condition-based preventive maintenance problems, and present an approach for the joint optimization of inspection rate and maintenance policy. Through numerical examples, the improvement offered by this method is compared with a scheme that optimizes only over the inspection rate. We also find that in the special case where the deterioration rate at each failure stage is the same, the optimal policy obtained by the SMDP algorithm is a dynamic threshold-type scheme with a threshold value depending on the inspection rate.

Several modifications and enhancements to control charts have been introduced in the quality control literature to improve their performance in detecting small and moderate process shifts. In this paper, a new hybrid control chart for monitoring process location is proposed by combining two homogeneously weighted moving average (HWMA) control charts. The hybrid homogeneously weighted moving average (HHWMA) statistic is derived using two smoothing constants λ1 and λ2. The average run length (ARL) and the standard deviation of the run length (SDRL) values of the HHWMA control chart are obtained and compared with those of some existing control charts for monitoring small and moderate shifts in the process location. The results of the study show that the HHWMA control chart outperforms the existing control charts in many situations. The application of the HHWMA chart is demonstrated using simulated data.

In the service and manufacturing industry, memory-type control charts are extensively applied for monitoring the production process. These types of charts have the ability to efficiently detect disturbances, especially of smaller amount, in the process mean and/or dispersion. Recently, a new homogeneously weighted moving average (HWMA) chart has been proposed for efficient monitoring of smaller shifts. In this study, we have proposed a new double HWMA (DHWMA) chart to monitor the changes in the process mean. The run length profile of the proposed DHWMA chart is evaluated and compared with some existing control charts. The outcomes reveal that the DHWMA chart shows better performance over its competitor charts. The effect of non-normality (in terms of robustness) and the estimation of the unknown parameters on the performance of the DHWMA chart are also investigated as a part of this study. Finally, a real-life industrial application is offered to demonstrate the proposal for practical considerations.

The objective of the study is to select the best-performing supplier from a group by prioritising performance criteria using MISM (modified interpretive structural modelling), MICMAC (impact matrix cross-reference multiplication applied to a classification), and AHP (analytical hierarchy process). To understand the interactions between the factors and to prioritise them, the MISM technique is applied: weights are calculated for the performance factors, contextual relationships between the factors are established, and the factors are then ranked based on the results obtained. In the MICMAC analysis, performance criteria are classified into four clusters depending on their driving power and dependence power. This helps to identify which criteria influence the supplier selection process. AHP is used to rank the suppliers and find the best one in the group. After ranking the suppliers, sensitivity analysis is applied to determine the most critical criteria, i.e. how sensitive the ranking of the alternatives is to changes in the weights of the criteria or the alternatives. The study was carried out in an automotive component manufacturing industry in the southern part of India. Finally, the model is validated by the sensitivity analysis.
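The AHP ranking step can be sketched in Python as below. This is a generic illustration of how priority weights are commonly obtained from a pairwise comparison matrix (principal eigenvector via power iteration, plus Saaty's consistency index). The function name, the example matrix and the iteration count are assumptions, and the MISM and MICMAC steps are not covered.

```python
def ahp_weights(A, iters=100):
    """Priority weights and consistency index for a pairwise comparison matrix.

    A[i][j] states how strongly item i is preferred to item j, with
    A[j][i] = 1 / A[i][j]. Requires at least two items.
    """
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        # Power iteration: multiply by A and renormalise to sum to one.
        w_new = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(w_new)
        w = [x / total for x in w_new]
    # Estimate the principal eigenvalue from the ratios (A w)_i / w_i.
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Aw[i] / w[i] for i in range(n)) / n
    consistency_index = (lam_max - n) / (n - 1)
    return w, consistency_index


# Hypothetical comparison of three suppliers on a single criterion.
A = [[1.0, 2.0, 6.0],
     [1.0 / 2.0, 1.0, 3.0],
     [1.0 / 6.0, 1.0 / 3.0, 1.0]]
weights, ci = ahp_weights(A)
ranking = sorted(range(len(weights)), key=lambda i: -weights[i])
```

The example matrix is perfectly consistent, so the consistency index is essentially zero. A sensitivity analysis of the kind described in the abstract could perturb the criterion weights and recompute the ranking.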

Vibratory finishing (VF) employs vibrationally-fluidized granular media to finish the surfaces of workpieces that are entrained in the flowing media. Its application has been based mostly on experience and trial-and-error due to the complexity of the granular material behavior. The present study used discrete element modeling (DEM) to investigate how the movement of a commercial two-dimensional tub finisher influenced the average particle speed of the media in a bed of smooth, steel, spherical particles, and thus the work that would be done on an entrained workpiece. The parameters governing the tub wall motion (frequency, in-plane amplitudes, and phases of vibration) and the coefficient of friction between the media and the wall were systematically varied in 71 three-dimensional DEM simulations. The average particle speed was affected mostly by the vertical amplitude of tub motion rather than by the frequency, and was mostly independent of other parameters of motion and of the wall friction. A strong relationship was found between the average particle speed and the work done by the wall per cycle of vibration. The normal force on the wall was also found to correlate strongly with the normal component of the wall velocity. Together, these relationships offer the potential to enable the analytical prediction of the average particle speed based on the motion parameters of the tub alone. The paper provides a set of practical guidelines for the control of the average particle speed in VF that are explained by the forces between the media and walls of the tub finisher.

An auxiliary information-based (AIB) maximum exponentially weighted moving average (MaxEWMA) chart, called the AIB-MaxEWMA chart, has been proposed to simultaneously monitor both increases and decreases in the process mean and/or variability, and it is superior to the existing MaxEWMA chart. In this paper, we propose the AIB maximum generally weighted moving average chart, called the AIB-MaxGWMA chart, to further enhance the sensitivity of the AIB-MaxEWMA chart. Numerical simulation studies indicate that the AIB-MaxGWMA chart is sensitive to small shifts in the process mean and/or variability. In terms of average run lengths (ARLs), the AIB-MaxGWMA chart also outperforms its counterparts, including the AIB-MaxEWMA, MaxGWMA and MaxEWMA charts. An example is used to illustrate the efficiency of the proposed AIB-MaxGWMA chart in detecting small process shifts.

The variable sampling interval exponentially weighted moving average median chart with estimated process parameters is proposed. The charting statistic, optimal design, performance evaluation, and implementation of the proposed chart are discussed. The average of the average time to signal (AATS) criterion is adopted to evaluate the performance of the proposed chart. The estimated process parameter‐based VSI EWMA median (VSI EWMA median‐e) chart is compared with the estimated process parameter‐based Shewhart median (SH median‐e), EWMA median (EWMA median‐e), and variable sampling interval run sum median (VSI RS median‐e) charts, in terms of the AATS criterion, where the VSI EWMA median‐e chart is shown to be superior. When process parameters are estimated, the standard deviation of the average time to signal (SDATS) criterion is used to evaluate the AATS performance of the VSI EWMA median‐e chart. Based on the SDATS criterion, a recommendation is given for the minimum number of phase‐I samples the VSI EWMA median‐e chart requires for its performance to be close to that of the VSI EWMA median chart with known process parameters.

