Research Article - (2021) Volume 10, Issue 6
DOI: 10.35248/2469-4134.21.10.291
Satellite imagery is an emerging technology that is extensively used in applications such as detection and extraction of man-made structures, monitoring of sensitive areas, and map production. This work addresses the automated detection of buildings from very high resolution (VHR) optical satellite images. First, the shadow and building regions are investigated, with the focus on building extraction. Once all the landscapes are collected, a trimming process eliminates the landscapes that may arise from non-building objects. Finally, a label method is used to extract the building regions; the label method may be altered for more efficient building extraction. The images used for the analysis come from sensors with a resolution of less than 1 meter (VHR). The method is efficient: the additional overhead of intermediate processing is eliminated without compromising the quality of the output, which reduces both the number of processing steps and the time they consume.
Today, more than 50% of the population resides in urban and suburban environments. Manual monitoring of land coverage is difficult and often infeasible, and it tends to yield inaccurate values and therefore unreliable data. Satellite imagery makes it possible to build an acceptable database instead. Because built-up areas are the main human concern in an environment, reliable and accurate extraction of buildings from satellite images is an important task, and its many applications have made it an active research field. This work therefore concentrates on automatic building extraction from VHR images.
Automatic detection of buildings in very high spatial resolution remotely sensed imagery is an important and critical problem because the detection and extraction results can be used in various applications, such as structure change detection, urbanization monitoring, and digital map production. This task also offers an excellent domain for studying the general problems of scene segmentation, 3D inference, and shape description under highly challenging conditions. Very high resolution satellite images provide valuable information to researchers. Among these, urban-area boundaries and building locations play crucial roles. For a human expert, manually extracting this information is tedious and time consuming. One possible solution is to extract it with automated techniques. The most important input data source for object extraction is the very high resolution (VHR) satellite image [1]. Since more than 50% of the world population lives in urban and suburban environments [2], reliable and accurate detection of building objects from satellite images is an essential task and a very active research field. Sensors that provide VHR satellite images include QuickBird, GeoEye I, GeoEye II, Worldview I and Worldview II, whose resolution is 1 meter or less. Human settlement analysis for slum and unorganized settlement monitoring can be assisted by automatically extracted building information, because slum areas can generally be characterized by a high density of short and small buildings in irregular spatial arrangements [3].
The recent work by Akçay and Aksoy investigates shadow evidence to focus on building regions [3]. In accordance with this concept, the directional spatial relationship between buildings and their shadows is modelled using prior knowledge of the illumination direction. For this purpose, a new fuzzy landscape generation approach is proposed, designed specifically for modelling the directional relationship between buildings and their shadows. Once all landscapes are collected, a trimming process is applied to eliminate the landscapes that may arise from non-building objects such as roads, sewages, garden walls and bridges.
Literature survey
Many studies have been carried out on building detection, extraction and reconstruction. Other man-made structures have also been considered for maintaining and updating geographic information system (GIS) databases, and a number of surveys and methodologies have addressed this in the past. The state of the art in automatic object extraction from aerial imagery [4] was surveyed in 1999. This survey included approaches for object extraction from satellite images, which influenced the extraction from aerial imagery. It covered only models and strategies, using well defined criteria; algorithms and underlying technologies were not reviewed. The survey was organized around Assessment, Complexity Criteria for the Assessment of Images/Models/Strategies, Characterization of Models, Characterization of Strategies, and Classification of Models and Strategies. Being a survey, it served as an information source for further analysis. A building and road detection technique based on existing geo-data [5] was also developed; it focused on the aspects of knowledge that could be used for extraction, such as types of knowledge, problems in using existing knowledge, knowledge representation and management, current and possible use of knowledge, and upgrading and augmenting of knowledge [6]. Approaches were also developed for building extraction and updating from high resolution satellite imagery (Figure 1) [7].
Figure 1: Architectures and components of image analysis systems for object extraction.
The developed approaches include two main stages:
• Detecting the building patches and
• Delineating the building boundaries.
The building patches were detected from high resolution satellite imagery using the Support Vector Machines (SVM) classification, which is utilized for both the building extraction and updating approaches. In the building extraction part, the previously detected building patches were delineated using the Hough transform and boundary tracing based techniques.
Extraction and description of cultural man-made features and objects, such as buildings and transportation networks, was also a research topic in the past. Textural features such as densities, the shape of the structures and image quality were analysed. The methodology consisted of the following procedure: detecting lines and corners, labelling corners based on shadows, tracing object boundaries and verifying hypotheses.
Since the shadow of an object also plays a vital role in building extraction, shadows have been studied as well. A computational technique was studied that uses the relationship between shadows and man-made structures to aid the automatic extraction of man-made structures from aerial imagery. Four methods were described that performed the prediction of structure shape, grouping of related structures, verification of individual structures, and structure height estimation. In each method the relationship between structures and cast shadows was exploited in a unique fashion [8].
After the turn of the century, fuzzy logic emerged as a widely accepted area of interest. An object-based approach for urban land cover classification from high resolution multispectral image data was presented that builds upon a pixel-based fuzzy classification approach [9]. This combined pixel/object approach was demonstrated using pan-sharpened multispectral IKONOS imagery from dense urban areas.
The fuzzy pixel-based classifier utilized both spectral and spatial information to discriminate between spectrally similar Road and Building urban land cover classes.
Further, the images were segmented and the non-building, non-road surfaces were eliminated accordingly. Using these techniques, the object-based classifier was able to identify Buildings, Impervious Surface, and Roads in dense urban areas with 76%, 81%, and 99% classification accuracies, respectively [5].
As the resolution of satellite sensors improved, a need arose for better-performing tools for computer-aided interpretation. Hence a system was designed for the detection and recognition of man-made objects in high resolution optical remote sensing images. Detection consisted of finding a small rectangular area in the image containing an object; recognition was the attribution of a class label [10]. A supervised learning approach based on support vector machines was used. The system learned a generic model for each class of objects from a geometric characterization of the examples in the database (SPOT 5 THR images, 2.5 m resolution). A large number of geometric image features were used, which allowed several classes of objects with different geometric properties to be characterized [3]. The results showed that several classes of objects could be discriminated with classification rates higher than 80%.
The system consists of multiple stages. The stages are first designed individually and then combined to produce the final output.
Image preparation and Pre-processing: The selected input image is pan-sharpened and then pre-processed. The pre-processing includes grayscale conversion and enhancement of the image so that it meets the requirements of the later stages. The pre-processing also includes thresholding at various levels.
Vegetation extraction and shadow detection: For vegetation extraction, the NDVI (Normalized Difference Vegetation Index) is a widely accepted metric. By applying an appropriate threshold to it, we compute a binary vegetation mask.
$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$
NIR and R represent the normalized near-infrared and red image bands. For automatic shadow detection, the multispectral false color shadow detection technique [11] is convenient for two reasons: (i) it takes advantage of the near-infrared (NIR) band, and (ii) it is fully independent of user- and data-dependent thresholds.
$$\mathrm{RS} = \frac{S - I}{S + I} \qquad (2)$$
where RS is the ratio map, S is the normalized saturation, and I is the normalized intensity.
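Equations (1) and (2) share the same normalized-difference form, so both can be sketched with one helper. The following is a minimal illustration, assuming the bands are already normalized NumPy arrays; the function names and the NDVI threshold of 0.3 are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-12):
    """Return (a - b) / (a + b) element-wise, with 0 where a + b vanishes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    is_zero = np.abs(denom) < eps
    safe_denom = np.where(is_zero, 1.0, denom)  # avoid division by zero
    return np.where(is_zero, 0.0, (a - b) / safe_denom)

def vegetation_mask(nir, red, threshold=0.3):
    """Binary vegetation mask from NDVI, Equation (1).

    The threshold value is an assumption made for illustration.
    """
    return normalized_difference(nir, red) > threshold

def shadow_ratio_map(saturation, intensity):
    """Ratio map of Equation (2) from normalized saturation S and intensity I."""
    return normalized_difference(saturation, intensity)
```

By construction, the ratio map is largest for pixels that are strongly saturated and dark, and both quantities stay within the range -1 to 1 for non-negative inputs.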
Shadow detection and removal: A model for the spatial arrangement between shadow and building is designed using a morphological fuzzy relation. With reference to the object and a specified direction, the landscape around the reference object along the given direction can be defined as a fuzzy set of membership values in image space. The landscape membership values are defined in the range of 0 and 1.
A fuzzy relation approach is used to determine the spatial arrangement between buildings and their shadows.
Morphological information is used to find the exact relationship.
Given a reference (shadow) object and a direction specified by an angle, the landscape around the reference object along that direction is defined as a fuzzy set of membership values in image space.
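The idea of a directional fuzzy landscape can be illustrated with a simplified, angle-based membership function. This sketch is not the morphological formulation used in the paper: it assumes that the membership of a pixel decreases linearly with the angle between the given direction and the vector from a reference pixel to that pixel, and that the maximum over all reference pixels is kept. The function name, the parameter `angle_rad` and the brute-force loop are choices made for this example.

```python
import numpy as np

def fuzzy_landscape(reference_mask, angle_rad):
    """Membership in [0, 1] of lying in direction `angle_rad` from the reference.

    The angle is measured from the +column axis towards the +row axis
    (clockwise on screen, since image rows grow downwards).
    """
    reference_mask = np.asarray(reference_mask, dtype=bool)
    rows, cols = reference_mask.shape
    dir_x, dir_y = np.cos(angle_rad), np.sin(angle_rad)
    yy, xx = np.mgrid[0:rows, 0:cols]
    landscape = np.zeros((rows, cols))
    for r, c in zip(*np.nonzero(reference_mask)):
        dx, dy = xx - c, yy - r
        norm = np.hypot(dx, dy)
        with np.errstate(invalid="ignore", divide="ignore"):
            cos_theta = (dx * dir_x + dy * dir_y) / norm
        # The reference pixel itself (norm == 0) gets full membership.
        cos_theta = np.clip(np.nan_to_num(cos_theta, nan=1.0), -1.0, 1.0)
        theta = np.arccos(cos_theta)
        membership = np.maximum(0.0, 1.0 - 2.0 * theta / np.pi)
        landscape = np.maximum(landscape, membership)
    return landscape
```

The loop costs one full-image pass per reference pixel, which is acceptable for a small shadow object but would need a faster formulation for whole scenes.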
In an urban area, it is essential for a building detection task to eliminate the landscapes that may occur due to shadows cast by non-building objects. To separate the landscapes of building and other non-building objects, the height difference of the objects compared to the terrain height is assessed. A minimum shadow length is computed, which is then compared with the perimeter pixels of a shadow object. If the length is found to be satisfying the length Lmin, an assumption is made that the shadow is cast from a non-building object, and thus, the generated fuzzy landscape is rejected (Figure 2) [12].
Figure 2: Building detection approach.
Building Detection: Finally, the building and non-building regions need to be separated. Classical image segmentation tools use either texture (colour) information, e.g. Magic Wand, or edge (contrast) information, e.g. Intelligent Scissors.
The filtered image is then passed through the GrabCut [13] label methodology, with BW and RGB labelling, to extract the building structure. The resulting blobs are numbered and then outlined to reconstruct the building structures (Figure 3).
Figure 3: Color to HSI normalization.
Step 1: Pre-processing
The pre-processing stage works as follows. The input image, which contains the building structure (Figure 4), is converted to grayscale (Figure 5), and the grayscale image is the input to the enhancement stage. Enhancement is done using morphological dilation and erosion; Figure 6 shows the enhanced image. The enhanced image then undergoes different levels of thresholding (Figures 7-9) to provide an appropriate input for the Vegetation Extraction and Shadow Detection stage.
Figure 4: Input Image
Figure 5: Input Grey Image.
Figure 6: Enhanced Image
Figure 7: Threshold, n=2
Figure 8: Threshold, n=3.
Figure 9: Threshold, n=4.
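The pre-processing chain described above (grayscale conversion, morphological enhancement, multi-level thresholding) can be sketched in NumPy as follows. Several details are assumptions, since the text does not specify them: the luminance weights, the 3 x 3 square structuring element, the order "dilation then erosion", and the use of equal-width intensity bins for the n-level threshold. All function names are hypothetical.

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB array to grayscale (assumed luminance weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(float) @ weights

def _neighbourhood_stack(img, size=3):
    """Stack the shifted copies of img covered by a size x size square window."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.stack([padded[r:r + h, c:c + w]
                     for r in range(size) for c in range(size)])

def dilate(img, size=3):
    """Grayscale dilation: maximum over the square neighbourhood."""
    return _neighbourhood_stack(img, size).max(axis=0)

def erode(img, size=3):
    """Grayscale erosion: minimum over the square neighbourhood."""
    return _neighbourhood_stack(img, size).min(axis=0)

def enhance(img, size=3):
    """Dilation followed by erosion (a morphological closing)."""
    return erode(dilate(img, size), size)

def quantize(gray, n_levels):
    """Label each pixel with one of n_levels equal-width intensity bins."""
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        return np.zeros(gray.shape, dtype=int)
    edges = np.linspace(lo, hi, n_levels + 1)[1:-1]  # interior bin edges
    return np.digitize(gray, edges)
```

Calling `quantize(enhance(to_gray(image)), n)` for n = 2, 3 and 4 would give label images comparable in spirit to Figures 7-9, under the assumptions stated above.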
For vegetation extraction the NDVI is used. The filtered image (Figure 10) is obtained by applying the NDVI processing to the output of the pre-processing stage together with the original input image. It is then smoothed (Figure 11) to suppress induced noise and spurious connected components. The histogram (Figure 12) shows the background and foreground of the original image, which is thresholded at a gray level of 70.
Figure 10: Filtered Image
Figure 11: Smoothened Image
Shadow detection and extraction play a vital role in obtaining efficient and accurate output. The shadow here is comparable to unwanted noise in speech or image processing, which needs to be taken care of. In this stage the image is filtered and thinned (Figure 13) to eliminate the noise, followed by edge detection (Figure 14).
The edge image then goes through a dual adaptive threshold process (Figure 15) to detect and eliminate abnormal regions. Once this is done, a "hole fill" is applied to remove any background pixels inside the blobs (Figure 16). The resulting image is then dilated (Figure 17), and regions that are minute and barely useful are discarded (Figure 18), so that the image retains only those regions that contain building structure.
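The hole-fill and small-region removal steps can be illustrated with a connected-component labelling routine. The sketch below assumes a binary mask and 4-connectivity, and the minimum area is a free parameter; none of these choices is stated in the text, and the function names are hypothetical.

```python
from collections import deque
import numpy as np

def label_regions(mask):
    """Label 4-connected foreground regions; returns (labels, count), 0 = background."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or labels[r0, c0]:
                continue
            count += 1
            labels[r0, c0] = count
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr, nc] and not labels[nr, nc]):
                        labels[nr, nc] = count
                        queue.append((nr, nc))
    return labels, count

def fill_holes(mask):
    """Turn background pixels that are not connected to the border into foreground."""
    mask = np.asarray(mask, dtype=bool)
    # Pad the background with a one-pixel frame so the outside is one region.
    padded = np.pad(~mask, 1, mode="constant", constant_values=True)
    labels, _ = label_regions(padded)
    outside = labels == labels[0, 0]
    return ~outside[1:-1, 1:-1]

def remove_small_regions(mask, min_area):
    """Keep only the regions whose pixel count is at least min_area."""
    labels, count = label_regions(mask)
    areas = np.bincount(labels.ravel(), minlength=count + 1)
    keep = areas >= min_area
    keep[0] = False  # label 0 is the background
    return keep[labels]
```

The labels returned by `label_regions` also correspond to the numbered blobs of a black-and-white labelled image.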
The final stage involves partitioning and post-processing. The GrabCut partitioning uses the foreground and background (Figure 12) along with the BW labelled image (Figure 19) and the pseudo-coloured labelled image (Figure 20) to segment the landscape. The resulting image is then outlined (Figure 21) to obtain the structure boundaries, which are finally mapped over the original grayscale image to display the building region (Figure 22).
Figure 12: Histogram: Background and Foreground
Figure 13: Thinned Morphology Image
Figure 14: Edged Image
Figure 15: Dual Adaptive Threshold Image
Figure 16: Filled Holes
Figure 17: Dilated Image
Figure 18: Removing Small Regions
Figure 19: BW Labelled Image
Figure 20: Pseudo Coloured Labelled Image
Figure 21: Outlined Image
Figure 22: Building Extracted Image
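The outlining and overlay steps that produce the last two figures can be sketched as below. The GrabCut segmentation itself is not reproduced here; the sketch assumes a binary building mask is already available, defines a boundary pixel as a foreground pixel with at least one 4-neighbour in the background, and uses hypothetical function names.

```python
import numpy as np

def outline(mask):
    """Boundary of a binary mask: foreground pixels touching the background."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior when it and all four of its neighbours are foreground.
    interior = (mask
                & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

def overlay_outline(gray, boundary, value=255):
    """Return a copy of the grayscale image with boundary pixels set to `value`."""
    out = np.array(gray, copy=True)
    out[boundary] = value
    return out
```

Pixels on the image border count as boundary pixels here, because the area outside the image is treated as background.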
Automatic extraction of buildings is possible with:
• Higher accuracy.
• Lower processing time.
The performance of the above approach is affected mainly by shadow generation, which is a drawback. Non-building regions also degrade the quality of the output. The problem of detecting buildings retains many complexities and requires substantial future research. One direction is to develop and integrate road and/or bridge detection with the current methodology to eliminate the superfluous land area that does not fall under the building category; in this way, most of the road segments that are erroneously labeled can be identified and eliminated. Another is to improve the boundary detection through a generalization process, which would raise the accuracy of the extracted building outlines. Finally, the detected building regions could be reconstructed to generate a 3-D representation of the detected buildings.