Similar literature
1.
Multimedia services offer network and service providers important new revenue opportunities, but they also bring significant management challenges. Providers need to control the video quality that is offered to and perceived by the user, typically known as the quality of experience (QoE). Admission control and scalable video coding can both control the QoE, by blocking connections or adapting the video rate respectively, but each influences the other's performance. In this article, we propose an in-network video rate adaptation mechanism that lets a provider define a policy on how video rate adaptation should be performed to maximize the provider's objective (e.g., revenue or QoE). We discuss the need for close interaction between the video rate adaptation algorithm and a measurement-based admission control system, so that both algorithms can be orchestrated effectively and the system can switch in a timely manner from video rate adaptation to blocking connections. We propose two rate adaptation decision algorithms that calculate which videos need to be adapted: one that is optimal in terms of the provider's policy, and a heuristic based on the utility of each connection. Through an extensive performance evaluation, we show the impact of both algorithms on the rate adaptation, network utilisation and the stability of the video rate adaptation. Both algorithms outperform other configurations by at least 10%. Moreover, the proposed heuristic is about 500 times faster than the optimal algorithm, with a performance drop of only approximately 2% in the investigated video delivery scenario.

2.
Social videos are now pervasive on the web. They are characterized by rich accompanying contextual information that describes the video content and thus greatly facilitates video search and browsing. Generally, contextual data such as tags are provided at the level of the whole video, without any temporal indication of when they actually appear in the video, let alone spatial annotation of object-related tags in the video frames. However, many tags describe only parts of the video content. Therefore, tag localization, the process of assigning tags to the relevant video segments, frames, or even regions within frames, is attracting growing research interest, and a benchmark dataset for the fair evaluation of tag localization algorithms is highly desirable. In this paper, we describe and release a dataset called DUT-WEBV, which contains about 4,000 videos collected from YouTube by issuing 50 concepts as queries. These concepts cover a wide range of semantic aspects, including scenes like “mountain”, events like “flood”, objects like “cows”, sites like “gas station”, and activities like “handshaking”, which makes the tag (i.e., concept) localization task very challenging. For each video of a tag, we carefully annotate the time spans during which the tag appears in the video and, for object-related tags, label the spatial location of the object in the frames with a mask. Besides the video itself, contextual information such as thumbnail images, titles, and YouTube categories is also provided. Together with this benchmark dataset, we present a baseline for tag localization that uses a multiple-instance learning approach. Finally, we discuss some open research issues for tag localization in web videos.

3.
We propose ViComp, an automatic audio-visual camera selection framework for composing uninterrupted recordings from multiple user-generated videos (UGVs) of the same event. We design an automatic audio-based cut-point selection method to segment the UGVs. ViComp combines segments of UGVs using a rank-based camera selection strategy that considers audio-visual quality and camera selection history. We analyze the audio to maintain audio continuity. To filter out video segments that contain visual degradations, we perform spatial and spatio-temporal quality assessment. We validate the proposed framework with subjective tests and compare it with state-of-the-art methods.

4.
Locating content in existing video archives is both time- and bandwidth-consuming, since users might have to download and manually watch large portions of superfluous video. In this paper, we present two novel prototypes that use an Internet-based video composition and streaming system with a keyword-based search interface that collects, converts, analyses, indexes, and ranks video content. On user request, the system can automatically extract portions of single videos or aggregate content from multiple videos to produce a single, personalized video stream on the fly.

5.
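An air-ground heterogeneous robot system consists of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). When the two cooperate on a persistent patrol task, using the UGV as a mobile ground supply station for the UAV addresses the UAV's limited endurance. The UGV, whose motion is confined to the road network, must replenish the UAV's energy at suitable locations, which couples the two paths tightly and makes air-ground cooperative path planning challenging. To address this problem, this paper analyzes multiple constraints, including UAV energy, the road network, the air-ground rendezvous time, and full coverage of the patrol tasks, and builds an air-ground cooperative patrol path planning model whose cost is the total distance the UAV travels to complete all patrol tasks. The model can be extended to the case of multiple UAVs cooperating with multiple UGVs. A method that combines a genetic algorithm with an ant colony algorithm is then used to optimize the UAV patrol path and the UGV energy supply path. Simulation experiments show that the proposed method not only produces good path planning results but also has better convergence and execution speed than other algorithms.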

6.
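When taking photos and recording videos, smartphones and similar devices record the shooting location and optical parameters in the image file. This information can be extracted and used to reconstruct, in two-dimensional planar space, the sector-shaped field of view (FOV) that corresponds to the image. Image files and their corresponding FOVs are stored on a computer to support users' spatial queries over the image files. In a typical spatial query, the user specifies a query region on a map and the computer finds those that captured...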
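As an illustration of the sector-shaped FOV described above, the following is a minimal sketch of a point-in-sector test. The parameterization (camera position, heading in degrees counter-clockwise from the +x axis, total viewing angle, visible radius) and the function name `point_in_fov` are assumptions made for this example; the abstract does not specify how the FOV is represented or how region queries are evaluated.

```python
import math

def point_in_fov(camera, heading_deg, view_angle_deg, radius, point):
    """Return True if `point` lies inside a sector-shaped FOV.

    camera, point   : (x, y) tuples in the same planar coordinate system
    heading_deg     : viewing direction, counter-clockwise from the +x axis (assumed)
    view_angle_deg  : total opening angle of the sector
    radius          : maximum visible distance
    """
    dx = point[0] - camera[0]
    dy = point[1] - camera[1]
    dist = math.hypot(dx, dy)
    if dist > radius:
        return False          # too far away to be captured
    if dist == 0:
        return True           # the camera position itself
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest signed angular difference, folded into [-180, 180)
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= view_angle_deg / 2.0
```

A region query could, under the same assumptions, be approximated by testing sample points of the query region against each stored FOV; a real system would more likely use a spatial index to avoid scanning every FOV.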

7.
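Video quality assessment helps to optimize multimedia network systems and to improve video coding and decoding algorithms, and in recent years it has become a popular research direction in the field of image quality assessment. Building on the image quality assessment model FSIM and taking into account the temporal correlation among local groups of video frames, this paper proposes a video quality assessment model based on temporal gradient similarity (TGSM FSIM), which uses a new three-dimensional gradient operator to compute the gradient similarity matrix between the original video sequence and the distorted video sequence. Test results on the LIVE video data show that the proposed model agrees well with subjective video quality ratings, and that its SROCC and PLCC scores are better than those of VSSIM and VQM, two widely used video quality assessment algorithms.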
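For orientation, the gradient similarity used in FSIM can be written as below, where $G_r$ and $G_d$ are the gradient magnitudes of the reference and distorted signals and $T$ is a small positive stabilizing constant. The second line is only an assumed way of extending the gradient magnitude with a temporal component $G_t$; the abstract does not give the actual three-dimensional operator, so this is a sketch rather than the authors' definition.

```latex
S_G(\mathbf{x}) = \frac{2\, G_r(\mathbf{x})\, G_d(\mathbf{x}) + T}
                       {G_r^2(\mathbf{x}) + G_d^2(\mathbf{x}) + T},
\qquad
G(\mathbf{x}) = \sqrt{G_x^2(\mathbf{x}) + G_y^2(\mathbf{x}) + G_t^2(\mathbf{x})}
\quad \text{(assumed spatio-temporal magnitude)}
```

With this form, $S_G(\mathbf{x}) = 1$ when the two gradient magnitudes are equal and it decreases toward 0 as they diverge.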

8.

Forgers tamper with videos to modify or remove content for malicious purposes, and many video authentication algorithms have been developed to detect such tampering. At present, very few standard, diverse tampered-video datasets are publicly available for the reliable verification of forensic algorithms. In this paper, we propose the Temporal Domain Tampered Video Dataset (TDTVD), a total of 210 videos created using frame deletion, frame duplication and frame insertion. Of the 210 videos, 120 are based on Event/Object/Person (EOP) removal or modification, and the remaining 90 are based on Smart Tampering (ST), i.e., multiple tampering. Sixteen original videos from SULFA and 24 original videos from YouTube (VTD Dataset) are used to create the tampered videos. The EOP-based videos include 40 videos for each tampering type: frame deletion, frame insertion and frame duplication. ST-based tampered videos contain multiple tampering operations in a single video. Multiple tampering comes in three categories: (1) 10 frames tampered (by frame deletion, frame duplication or frame insertion) at 3 different locations, (2) 20 frames tampered at 3 different locations, and (3) 30 frames tampered at 3 different locations in the video. The proposed TDTVD dataset covers all temporal domain tampering types and also includes videos with multiple tampering. The resulting tampered videos are 6 s to 18 s long, with a resolution of 320×240 or 640×360 pixels. The database comprises static and dynamic videos with various content, such as traffic, sports, news, a rolling ball, an airport, a garden, highways, and zooming in and out. The entire dataset is publicly accessible to researchers, who can use it to test their algorithms. Detailed ground truth information, such as tampering type, frames tampered and location of tampering, is also given for each tampered video to support the verification of tampering detection algorithms. The dataset is compared with the state of the art and validated with two video tampering detection methods.
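The three temporal tampering operations named in the abstract can be sketched on a plain list of frames. This is only an illustration of what frame deletion, duplication and insertion mean; the function names and index conventions are hypothetical, and the dataset's actual tooling is not described in the abstract.

```python
def delete_frames(frames, start, count):
    """Frame deletion: drop `count` frames beginning at index `start`."""
    return frames[:start] + frames[start + count:]

def duplicate_frames(frames, start, count, dest):
    """Frame duplication: copy frames[start:start+count] from the same
    video and paste the copy in front of index `dest`."""
    clip = frames[start:start + count]
    return frames[:dest] + clip + frames[dest:]

def insert_frames(frames, foreign, dest):
    """Frame insertion: paste frames taken from another video
    in front of index `dest`."""
    return frames[:dest] + list(foreign) + frames[dest:]

# Toy example: integers stand in for decoded frames.
original = list(range(10))
deleted = delete_frames(original, 2, 3)
duplicated = duplicate_frames(original, 0, 2, 5)
inserted = insert_frames(original, ["a", "b"], 1)
```

Under this sketch, a "multiple tampering" video would simply apply such operations at several locations of the same frame list.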


9.
Over-The-Top (OTT) video services are becoming more and more important in today’s broadband access networks. While the original OTT services offered only short, medium-quality videos, premium content such as high-definition full-length movies and live video is now offered as well. For operators, who see potential for increased revenue in providing Quality of Experience (QoE) assurance, this introduces important new network management challenges. Traditional network management paradigms are often not suited to ensuring QoE guarantees, because the provider has no control over the content’s origin. In this article, we focus on the management of an OTT-based video service. We present a loosely coupled architecture that can be seamlessly integrated into an existing OTT-based video delivery architecture. The framework aims to resolve the network bottleneck that can result from high peaks in requests for OTT video services. The proposed approach groups the existing Hypertext Transfer Protocol (HTTP) based video connections so that they can be multicast over an access network’s bottleneck, and then splits them again to reconstruct the original HTTP connections. A prototype of this architecture is presented, which includes the caching of videos and incorporates retransmission schemes to ensure robust transmission. Furthermore, an autonomic algorithm is presented that intelligently selects which OTT videos need to be multicast by remotely assessing the cache state to predict the future availability of content. The approach was evaluated through both simulation and large-scale emulation, and shows a significant gain in the scalability of the prototype compared to a traditional video delivery architecture.

10.
Wang Bing, Peng Qiang, Wang Eric, Xiang Wei, Wu Xiao. Multimedia Tools and Applications, 2022, 81(2): 1893-1918

The sheer size and complex structure of light field (LF) videos bring new challenges to their compression and transmission. Numerous LF video compression algorithms have been reported in the literature to date, all of which compress and transmit every view of an LF video. However, in interactive or selective applications where users can choose the area of interest to be displayed, these algorithms generate a significant computational load and enormous data redundancy. In this paper, we propose an interactive LF video streaming system, based on a user-dependent view selection scheme and an LF video coding method, that streams only the required data. Specifically, by predicting trajectories and using projection models, the viewing area of users over a limited number of consecutive time slots is first calculated, and a user-dependent view selection method is then proposed to determine which views to stream to each user. Finally, for the new LF video sequence formed from only the selected sets of views, an adaptive coding method is presented that handles different LF video sequences based on users’ gestures. Experimental results show that the proposed interactive LF video streaming system outperforms the methods it is compared with.


11.
Large stores of digital video pose severe computational challenges to existing video analysis algorithms. In applying these algorithms, users must often trade off processing speed for accuracy, as many sophisticated and effective algorithms require large computational resources that make it impractical to apply them throughout long videos. One can save considerable effort by applying these expensive algorithms sparingly, directing their application using the results of more limited processing. We show how to do this for retrospective video analysis by modeling a video using a chain graphical model and performing inference both to analyze the video and to direct processing. We apply our method to problems in background subtraction and face detection, and show in experiments that this leads to significant improvements over baseline algorithms.

12.
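Hu Zhijun, Xu Yong. Computer Science (《计算机科学》), 2020, 47(1): 117-123
Video is the medium that carries the most information. With the rise of apps such as the Douyin short-video app, the number of videos on the web and in databases has grown sharply, and manual annotation can no longer cope with the task of video retrieval. By extracting the spatial features of video frames or the temporal features between frames, video retrieval lets users find and categorize videos more objectively and efficiently. This paper gives an overview of content-based video retrieval algorithms, summarizes some classic video retrieval algorithms, reviews the research on and application of deep learning in content-based video retrieval, and finally analyzes the prospects for deep learning in video retrieval.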

13.
Location-based services (LBSs), considered a killer application in the wireless data market, provide information based on the locations specified in queries. In this paper, we examine the indexing issue for querying location-dependent data in wireless LBSs; in particular, we focus on an important class of queries, planar point queries. To address the issues of responsiveness, energy consumption, and bandwidth contention in wireless communications, an index has to minimize the search time and maintain a small storage overhead. It is shown that traditional point-location algorithms and spatial index structures fail to achieve one or both of these objectives. This paper proposes a new index structure, called the D-tree, which indexes spatial regions based on the divisions that form the boundaries of the regions. We describe how to construct a binary D-tree index, how to process queries based on the D-tree, and how to page the binary D-tree. Moreover, two parameterized methods for partitioning the original space, called fixed grid assignment (FGA) and adaptive grid assignment (AGA), are proposed to enhance the D-tree. The performance of the D-tree is evaluated using both synthetic and real data sets. Experimental results show that the proposed D-tree outperforms well-known indexes such as the R*-tree, and that both the FGA and AGA approaches can achieve different trade-offs between index search time and storage overhead by fine-tuning their algorithmic parameters.

14.
Automatic parsing and indexing of news video (cited 9 times in total: 0 self-citations, 9 citations by others)
Automatic construction of content-based indices for video source material requires general semantic interpretation of both images and their accompanying sounds; but such a broadly based semantic analysis is beyond the capabilities of current machine vision and audio signal analysis technologies. However, if one can assume a limited and well-demarcated body of domain knowledge for describing the content of a body of video, then it becomes easier to interpret a video source in terms of that domain knowledge. This paper presents our work on using domain knowledge to parse news video programs and to index them on the basis of their visual content. Models based on both the spatial structure of image frames and the temporal structure of the entire program have been developed for news videos, along with algorithms that apply these models by locating and identifying instances of their elements. Experimental results are also discussed in detail to evaluate both the models and the algorithms that use them. Finally, proposals for future work are summarized.

15.
Given the explosive growth of near-duplicate videos, video archaeology is much needed to investigate the history of the manipulations applied to these videos. Once derived videos have been identified according to the manipulations, a video migration map can be constructed from the pair-wise relationships in a set of near-duplicate videos. In this paper, we propose an improved video archaeology (I-VA) system that extends our previous work (Shen et al. 2010). The extensions include more comprehensive video manipulation detectors and improved techniques for these detectors. Specifically, the detectors cover two categories of manipulations: semantic-based manipulations and non-semantic-based manipulations. Moreover, the improved detection algorithms are more stable. The key to I-VA is the construction of a video migration map, which represents the history of how near-duplicate videos have been manipulated. The proposed I-VA system enables various applications, such as better understanding of the meaning and context conveyed by manipulated videos, improving current video search engines through better presentation based on the migration map, and better indexing schemes based on annotation propagation. The system is tested on a collection of 12,790 videos and 3,481 duplicates. The experimental results show that I-VA can effectively discover the manipulation relations among near-duplicate videos.

16.
This paper addresses the problem of object tracking in video sequences for surveillance applications by using a recently proposed structural similarity-based image distance measure. Multimodality surveillance videos pose specific challenges to tracking algorithms, due to, for example, low or variable light conditions and the presence of spurious or camouflaged objects. These factors often cause undesired luminance and contrast variations in videos produced by infrared sensors (due to varying thermal conditions) and visible sensors (e.g., the object entering shadowy areas). Commonly used colour and edge histogram-based trackers often fail in such conditions. In contrast, the structural similarity measure reflects the distance between two video frames by jointly comparing their luminance, contrast and spatial characteristics and is sensitive to relative rather than absolute changes in the video frame. In this work, we show that the performance of a particle filter tracker is improved significantly when the structural similarity-based distance is applied instead of the conventional Bhattacharyya histogram-based distance. Extensive evaluation of the proposed algorithm is presented together with comparisons with colour, edge and mean-shift trackers using real-world surveillance video sequences from multimodal (infrared and visible) cameras.
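To make the baseline in this abstract concrete, here is a minimal sketch of a Bhattacharyya histogram distance, assuming the commonly used form d = sqrt(1 - Σ sqrt(p_i q_i)) over normalized histograms. The exact variant used in the paper is not stated in the abstract, and the helper names `histogram` and `bhattacharyya_distance`, as well as the 4-bin toy setup, are hypothetical.

```python
import math

def histogram(pixels, bins=4, max_value=256):
    """Count 8-bit intensity values into `bins` equal-width bins."""
    counts = [0] * bins
    for v in pixels:
        counts[v * bins // max_value] += 1
    return counts

def bhattacharyya_distance(p, q):
    """Distance between two histograms with the same number of bins.

    Both histograms are normalized to sum to 1 before comparison.
    """
    total_p, total_q = sum(p), sum(q)
    bc = sum(math.sqrt((a / total_p) * (b / total_q)) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))   # clamp guards against rounding

# Two toy patches with the same intensities in a different spatial layout.
patch_a = [0, 0, 255, 255]
patch_b = [255, 0, 255, 0]
same_histogram_distance = bhattacharyya_distance(histogram(patch_a),
                                                 histogram(patch_b))
```

The two toy patches have identical histograms, so the histogram distance cannot tell them apart even though their spatial layout differs; this is the kind of information a measure that also compares spatial characteristics can use.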

17.
18.
The emergence of smart edge-network content hotspots, which are equipped with huge storage space (e.g., several GBs), opens up the opportunity to study the possibility of delivering videos at the edge network. Unlike both the conventional content delivery network (CDN) and the peer-to-peer (P2P) scheme, this new delivery paradigm, namely the edge video CDN, requires up to millions of edge hotspots located in users’ homes and offices to be managed in a coordinated way to serve mobile video content. Specifically, building an edge video CDN involves two challenges: how edge content hotspots should be organized to serve users, and how content items should be replicated to hotspots at different locations. To address these challenges, we propose the following data-driven design. First, we formulate an edge region partition problem to jointly maximize the quality experienced by users and minimize the replication cost, which is NP-hard in nature, and we design a Voronoi-like partition algorithm to generate optimal service cells. Second, to replicate content items to edge-network content hotspots, we propose a replication strategy based on edge request prediction, which carries out the replication in a server peak offloading manner. We implement our design and use trace-driven experiments to verify its effectiveness. Compared with a conventional centralized CDN and popularity-based replication, our design can significantly improve users’ quality of experience, in terms of perceived bandwidth and latency, by up to 40%.

19.
Traditional nearest-neighbor (NN) search is based on two basic indexing approaches: object-based indexing and solution-based indexing. The former is constructed from the locations of the data objects, using distance heuristics on those locations. The latter is built on a precomputed solution space, so that NN queries can be reduced to and processed as simple point queries in this solution space. Both approaches have disadvantages, especially when employed for wireless data broadcast in mobile computing environments. In this paper, we introduce a new index method, called the grid-partition index, to support NN search in both the on-demand access and periodic broadcast modes of mobile computing. The grid-partition index is constructed based on the Voronoi diagram, i.e., the solution space of NN queries. However, it has two distinctive characteristics. First, it divides the solution space into grid cells such that a query point can be efficiently mapped into a grid cell around which the nearest object is located. This significantly reduces the search space. Second, the grid-partition index stores the objects that are potential NNs of any query falling within the cell. Storing objects instead of Voronoi cells makes the grid-partition index a hybrid of the solution-based and object-based approaches. As a result, it achieves a much more compact representation than the pure solution-based approach and avoids the backtracked traversals required in the typical object-based approach, thus realizing the advantages of both approaches. We develop an incremental construction algorithm to address the issue of object updates. In addition, we present a cost model to approximate the search cost of different grid partitioning schemes. The performance of the grid-partition index and of existing indexes is evaluated using both synthetic and real data. The results show that, overall, the grid-partition index significantly outperforms object-based indexes and solution-based indexes. Furthermore, we extend the grid-partition index to support continuous-nearest-neighbor search. Both algorithms and experimental results are presented. Edited by R. Guting
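The idea of storing, per grid cell, the objects that are potential NNs of any query in that cell can be sketched as follows. This is not the construction from the paper: instead of intersecting Voronoi cells with grid cells, the sketch uses a simpler conservative bound (an object is kept if its minimum distance to the cell does not exceed the smallest maximum distance of any object to the cell), which yields a superset of the true candidates. The function names and the square-domain assumption are hypothetical.

```python
import math

def build_grid_index(objects, n, size=1.0):
    """Map each of the n x n cells of [0, size]^2 to a candidate NN list."""
    step = size / n
    index = {}
    for i in range(n):
        for j in range(n):
            x0, y0 = i * step, j * step
            x1, y1 = x0 + step, y0 + step

            def min_dist(p):
                # Distance from p to the closest point of the cell
                dx = max(x0 - p[0], 0.0, p[0] - x1)
                dy = max(y0 - p[1], 0.0, p[1] - y1)
                return math.hypot(dx, dy)

            def max_dist(p):
                # Distance from p to the farthest corner of the cell
                dx = max(abs(p[0] - x0), abs(p[0] - x1))
                dy = max(abs(p[1] - y0), abs(p[1] - y1))
                return math.hypot(dx, dy)

            # No query in this cell can have its NN farther away than this.
            bound = min(max_dist(p) for p in objects)
            index[(i, j)] = [p for p in objects if min_dist(p) <= bound]
    return index

def nearest(index, n, q, size=1.0):
    """Answer an NN query by scanning only the candidates of q's cell."""
    step = size / n
    i = min(int(q[0] / step), n - 1)
    j = min(int(q[1] / step), n - 1)
    return min(index[(i, j)],
               key=lambda p: math.hypot(p[0] - q[0], p[1] - q[1]))
```

A query is first mapped to its cell in constant time and then compared only against that cell's candidate list, which mirrors the search-space reduction the abstract describes; the exact candidate sets of the paper would generally be smaller than the ones this bound produces.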

20.
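In recent years, video face-swapping technology has developed rapidly. The technology can be used to forge videos in order to influence political actions and obtain improper benefits, causing serious harm to society, and it has attracted wide attention from governments and the public around the world. By analyzing the existing mainstream techniques for generating and detecting face-swapped videos, this paper points out that current mainstream generation methods leave forgery traces and generation losses in both the temporal and spatial domains. Most current neural-network-based algorithms for detecting synthesized face videos consider only the spatial-domain features of single images, and they overfit noticeably in practical detection. To address the shortcomings of current detection methods, this paper proposes an efficient detection algorithm that combines the spatial and temporal domains. The method captures the forgery traces of face-swapped video in both the spatial and temporal domains. For the spatial features of single frames, a fully convolutional network module is designed; it uses a 3D convolution structure and can accurately extract the forgery traces of each frame in a video frame array. For the temporal features of the frame array, a convolutional long short-term memory network module is designed, which can detect temporal forgery traces between forged video frames. Finally, a feature pyramid network structure is designed for feature classification; it fuses spatio-temporal features of different sizes and uses multi-scale fusion to improve classification and reduce overfitting. Compared with existing methods, the method has clear advantages in training convergence and classification performance. In addition, it uses fewer parameters while maintaining detection accuracy, so it trains more efficiently than existing architectures.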
