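Recently I have been reading programcreek's "Simple Java" material. In the article "How to Check if an Array Contains a Value in Java Efficiently", the author lists four solutions: using a List, a Set, a loop, and the binarySearch method, as shown below: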
```java
package atlas;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * @author atlas
 */
// Four different ways to check if an array contains a value
public class CheckArrayContainsAValue {

    // use a List
    public boolean useList(String[] arr, String targetValue) {
        return Arrays.asList(arr).contains(targetValue);
    }

    // use a Set
    public boolean useSet(String[] arr, String targetValue) {
        Set<String> set = new HashSet<String>(Arrays.asList(arr));
        return set.contains(targetValue);
    }

    // use a loop
    public boolean useLoop(String[] arr, String targetValue) {
        for (String s : arr) {
            if (s.equals(targetValue))
                return true;
        }
        return false;
    }

    // use binary search (arr must already be sorted)
    public boolean useArraysBinarySearch(String[] arr, String targetValue) {
        int a = Arrays.binarySearch(arr, targetValue);
        // binarySearch returns the index (>= 0) when found and a negative value otherwise
        return a >= 0;
    }
}
```
The author ran test cases with arrays of three sizes: 5, 1k, and 10k elements.
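To reproduce the comparison yourself, a rough timing sketch could look like the one below. It is not the author's harness: the array contents (zero-padded numbers, so the array is already sorted and binary search is valid), the middle-element target and the repetition count are all assumptions, and `ContainsTiming`, `sortedArray` and `timeNanos` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.function.BiPredicate;

// Rough timing sketch, not a rigorous benchmark: no JIT warm-up control or statistics.
// Data layout, target choice and repetition count are assumptions.
public final class ContainsTiming {

    public static boolean useList(String[] arr, String t) {
        return Arrays.asList(arr).contains(t);
    }

    public static boolean useSet(String[] arr, String t) {
        return new HashSet<>(Arrays.asList(arr)).contains(t);
    }

    public static boolean useLoop(String[] arr, String t) {
        for (String s : arr) {
            if (s.equals(t)) return true;
        }
        return false;
    }

    // Valid only because sortedArray() produces sorted data.
    public static boolean useBinarySearch(String[] arr, String t) {
        return Arrays.binarySearch(arr, t) >= 0;
    }

    // n distinct zero-padded numbers: string order equals numeric order, so the array is sorted.
    public static String[] sortedArray(int n) {
        String[] arr = new String[n];
        for (int i = 0; i < n; i++) {
            arr[i] = String.format("%08d", i);
        }
        return arr;
    }

    // Runs one approach reps times and returns the total elapsed nanoseconds.
    static long timeNanos(BiPredicate<String[], String> check, String[] arr, String target, int reps) {
        int hits = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            if (check.test(arr, target)) hits++;
        }
        long elapsed = System.nanoTime() - start;
        // Consuming the result helps stop the JIT from discarding the calls.
        if (hits != reps) throw new IllegalStateException("target not found");
        return elapsed;
    }

    public static void main(String[] args) {
        int reps = 1_000;
        for (int n : new int[] {5, 1_000, 10_000}) {
            String[] arr = sortedArray(n);
            String target = arr[n / 2]; // assumption: search for the middle element
            System.out.printf("n=%d list=%dns set=%dns loop=%dns binarySearch=%dns%n", n,
                    timeNanos(ContainsTiming::useList, arr, target, reps),
                    timeNanos(ContainsTiming::useSet, arr, target, reps),
                    timeNanos(ContainsTiming::useLoop, arr, target, reps),
                    timeNanos(ContainsTiming::useBinarySearch, arr, target, reps));
        }
    }
}
```

Absolute numbers will vary by machine and JVM; for trustworthy measurements a dedicated harness such as JMH, which handles warm-up and dead-code elimination, is the safer choice.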
The running times on my machine were:
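The result is clear: binary search is the fastest, which is easy to understand given its O(log n) complexity. But don't forget its precondition: binary search only works on a sorted array! On an unsorted array, the result of Arrays.binarySearch is undefined. Think the story ends here? No, it is not that simple. The other three approaches differ quite a lot in running time. Why? That is what we will dig into today.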
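Two details of `Arrays.binarySearch` are easy to get wrong, so here is a small sketch (the class and method names are hypothetical). A hit at the first position returns index 0, so "found" must be tested as `>= 0`; a `> 0` test silently misses that case. And an unsorted input has to be sorted first, which costs O(n log n) and therefore only pays off when the sorted array is reused for many lookups.

```java
import java.util.Arrays;

public final class BinarySearchContains {

    // binarySearch returns the index if found, otherwise (-(insertion point) - 1),
    // which is always negative. So "found" means >= 0.
    public static boolean containsSorted(String[] sorted, String target) {
        return Arrays.binarySearch(sorted, target) >= 0;
    }

    // For unsorted input, sort a copy first (the caller's array is left untouched).
    // Sorting is O(n log n), slower than one linear scan for a single lookup.
    public static boolean containsUnsorted(String[] arr, String target) {
        String[] copy = arr.clone();
        Arrays.sort(copy);
        return containsSorted(copy, target);
    }

    public static void main(String[] args) {
        String[] sorted = {"AB", "BC", "CD"};
        System.out.println(Arrays.binarySearch(sorted, "AB")); // 0
        System.out.println(containsSorted(sorted, "AB"));      // true
        System.out.println(containsUnsorted(new String[] {"CD", "AB", "BC"}, "AB")); // true
    }
}
```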
First, let's analyze the two approaches whose times are close: List and Loop.
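The loop approach is easy to follow: it is Java's enhanced for loop (for-each). Note that when the loop runs over an array, the compiler turns for-each into an ordinary index-based loop; an Iterator is only involved when looping over an Iterable such as a List. Of these two approaches, the loop is the faster one.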
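To make the "no Iterator" point concrete, the sketch below puts the article's loop next to a hand-written approximation of what the compiler produces for a for-each over an array (following the translation described in the JLS for enhanced `for` over arrays). `ForEachDesugared`, `forEach` and `indexed` are illustrative names only.

```java
public final class ForEachDesugared {

    // Enhanced-for version, as in the article's useLoop.
    public static boolean forEach(String[] arr, String target) {
        for (String s : arr) {
            if (s.equals(target)) return true;
        }
        return false;
    }

    // Roughly what the compiler generates for the loop above when arr is an array:
    // a copy of the array reference and a plain index loop, no Iterator object.
    public static boolean indexed(String[] arr, String target) {
        String[] a = arr;
        for (int i = 0; i < a.length; i++) {
            String s = a[i];
            if (s.equals(target)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] arr = {"CD", "BC", "EF", "DE", "AB"};
        System.out.println(forEach(arr, "A") + " " + indexed(arr, "A"));   // false false
        System.out.println(forEach(arr, "AB") + " " + indexed(arr, "AB")); // true true
    }
}
```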
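Next, the List approach. Why does it take a little longer than the loop? To explain this we need to look at two things: (1) converting the array to a list has a cost; (2) how the list's contains method works. Let's take them one at a time. The array is converted to a list by calling Arrays.asList(); here is its implementation in the Arrays source: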
```java
/**
 * Returns a fixed-size list backed by the specified array.  (Changes to
 * the returned list "write through" to the array.)  This method acts
 * as bridge between array-based and collection-based APIs, in
 * combination with {@link Collection#toArray}.  The returned list is
 * serializable and implements {@link RandomAccess}.
 *
 * <p>This method also provides a convenient way to create a fixed-size
 * list initialized to contain several elements:
 * <pre>
 *     List<String> stooges = Arrays.asList("Larry", "Moe", "Curly");
 * </pre>
 *
 * @param a the array by which the list will be backed
 * @return a list view of the specified array
 */
public static <T> List<T> asList(T... a) {
    return new ArrayList<T>(a);
}
```
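It simply calls an ArrayList constructor, passing in the array. Note that this ArrayList is not java.util.ArrayList but a private static nested class of Arrays, and, as the Javadoc above states, the list it returns is fixed-size rather than resizable: it is a view backed by the original array.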
```java
private static class ArrayList<E> extends AbstractList<E>
    implements RandomAccess, java.io.Serializable
{
    private static final long serialVersionUID = -2764017481108945198L;
    private final E[] a;

    ArrayList(E[] array) {
        if (array==null)
            throw new NullPointerException();
        a = array;
    }
    ...
}
```
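The "fixed-size, backed by the array" behavior can be observed directly. In this small sketch (the class name `AsListView` is just for illustration), `set` writes through to the array, changes to the array show up in the list, and `add` is rejected because the view cannot change size.

```java
import java.util.Arrays;
import java.util.List;

public final class AsListView {
    public static void main(String[] args) {
        String[] arr = {"a", "b", "c"};
        List<String> view = Arrays.asList(arr); // wraps arr, no elements are copied

        view.set(0, "x");                // writes through to the array
        System.out.println(arr[0]);      // x
        arr[2] = "z";                    // and array changes are visible in the list
        System.out.println(view.get(2)); // z

        try {
            view.add("d");               // fixed size: structural changes are unsupported
        } catch (UnsupportedOperationException e) {
            System.out.println("add not supported");
        }
        System.out.println(view.getClass().getName()); // java.util.Arrays$ArrayList
    }
}
```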
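This "conversion" does nothing more than a null check and assigning the array reference to the field a. No elements are copied, so its cost is one small object allocation, independent of the array length. Next, let's look at how contains is implemented: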
```java
/**
 * Returns <tt>true</tt> if this list contains the specified element.
 * More formally, returns <tt>true</tt> if and only if this list contains
 * at least one element <tt>e</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;e==null&nbsp;:&nbsp;o.equals(e))</tt>.
 *
 * @param o element whose presence in this list is to be tested
 * @return <tt>true</tt> if this list contains the specified element
 */
public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

/**
 * Returns the index of the first occurrence of the specified element
 * in this list, or -1 if this list does not contain the element.
 * More formally, returns the lowest index <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
 * or -1 if there is no such index.
 */
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}
```
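As you can see, contains also works by scanning with a for loop and comparing elements one by one, exactly like the loop approach. (The excerpt above comes from java.util.ArrayList, whose fields are elementData and size; the Arrays$ArrayList returned by Arrays.asList performs the same null-aware linear scan over its backing array a.)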
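So the List approach does the same O(n) comparison work as the loop. Its extra time most likely comes from fixed overhead: allocating the wrapper object and the additional method calls (contains, then indexOf). That overhead does not grow with the array length, which is why the two approaches end up close.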
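One behavioral difference between the two linear scans is worth knowing: `indexOf` handles `null` explicitly, while the article's `useLoop` calls `s.equals(targetValue)` and throws a `NullPointerException` as soon as it reaches a `null` element. A null-safe loop with the same semantics as `List.contains` can be sketched with `java.util.Objects.equals`; `NullHandling` and `loopContains` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Objects;

public final class NullHandling {

    // Null-safe linear scan. Objects.equals(a, b) is true when both are null,
    // otherwise it returns a != null && a.equals(b).
    public static boolean loopContains(String[] arr, String target) {
        for (String s : arr) {
            if (Objects.equals(s, target)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] arr = {"a", null, "c"};
        System.out.println(Arrays.asList(arr).contains(null)); // true
        System.out.println(loopContains(arr, null));           // true
        System.out.println(loopContains(arr, "c"));            // true
        // A loop using s.equals(target) would throw NullPointerException
        // on the null element before ever reaching "c".
    }
}
```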
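Finally, the most expensive approach: Set. Why is it the slowest? Your first guess is probably the conversion cost, and indeed the array goes through two conversions, first into a list and then into a HashSet: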
```java
Set<String> set = new HashSet<String>(Arrays.asList(arr));
```
```java
private transient HashMap<E,Object> map;

/**
 * Constructs a new set containing the elements in the specified
 * collection.  The <tt>HashMap</tt> is created with default load factor
 * (0.75) and an initial capacity sufficient to contain the elements in
 * the specified collection.
 *
 * @param c the collection whose elements are to be placed into this set
 * @throws NullPointerException if the specified collection is null
 */
public HashSet(Collection<? extends E> c) {
    map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}
```
```java
/**
 * {@inheritDoc}
 *
 * <p>This implementation iterates over the specified collection, and adds
 * each object returned by the iterator to this collection, in turn.
 *
 * <p>Note that this implementation will throw an
 * <tt>UnsupportedOperationException</tt> unless <tt>add</tt> is
 * overridden (assuming the specified collection is non-empty).
 *
 * @throws UnsupportedOperationException {@inheritDoc}
 * @throws ClassCastException            {@inheritDoc}
 * @throws NullPointerException          {@inheritDoc}
 * @throws IllegalArgumentException      {@inheritDoc}
 * @throws IllegalStateException         {@inheritDoc}
 *
 * @see #add(Object)
 */
public boolean addAll(Collection<? extends E> c) {
    boolean modified = false;
    Iterator<? extends E> e = c.iterator();
    while (e.hasNext()) {
        if (add(e.next()))
            modified = true;
    }
    return modified;
}
```
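The sizing expression in the `HashSet(Collection)` constructor is easy to evaluate for the article's three array sizes. This sketch (hypothetical class and method names) just reproduces that arithmetic; as far as the JDK's HashMap goes, the requested capacity is then rounded up internally to a power of two, so the actual tables are larger still (16, 2048 and 16384 buckets).

```java
public final class HashSetCapacity {

    // Same expression as in the HashSet(Collection) constructor:
    // enough room for n elements at load factor 0.75, but never below 16.
    public static int initialCapacity(int n) {
        return Math.max((int) (n / .75f) + 1, 16);
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 1_000, 10_000}) {
            System.out.println(n + " -> " + initialCapacity(n));
        }
        // 5 -> 16, 1000 -> 1334, 10000 -> 13334
    }
}
```

Because the map is presized, no rehashing happens during addAll; the construction cost comes from hashing every element and allocating one map entry per element.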
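The constructor first allocates a HashMap sized for the collection, then addAll() puts the list's elements into it one by one through an iterator. Each add() calls map.put(), which computes the element's hashCode(), selects a bucket and creates an entry, so building the set is O(n) with a much larger constant factor than a plain comparison loop. Only after that is contains called: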
```java
// HashSet
public boolean contains(Object o) {
    return map.containsKey(o);
}

// HashMap (JDK 6/7)
public boolean containsKey(Object key) {
    return getEntry(key) != null;
}
```
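Unlike the list, this lookup is not a linear scan: getEntry hashes the key, jumps to a single bucket and compares only the few entries stored there, so the lookup itself takes O(1) time on average. The Set approach is slow because of the construction step, which hashes and inserts every element, while the advantage of fast lookups is wasted on a single query.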
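The flip side is that the expensive construction only has to happen once. If the same array is queried many times, a sketch along these lines (the `PrebuiltLookup` class is hypothetical, not from the article) pays the O(n) build cost up front and then answers each query with a hash lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: build the HashSet once, then reuse it for every query.
public final class PrebuiltLookup {
    private final Set<String> set;

    public PrebuiltLookup(String[] arr) {
        // O(n) once: hashes every element and inserts it into the backing HashMap.
        this.set = new HashSet<>(Arrays.asList(arr));
    }

    // Expected O(1) per query: hash the target and check a single bucket.
    public boolean contains(String target) {
        return set.contains(target);
    }

    public static void main(String[] args) {
        PrebuiltLookup lookup = new PrebuiltLookup(new String[] {"CD", "BC", "EF", "DE", "AB"});
        for (String q : new String[] {"AB", "XY", "EF"}) {
            System.out.println(q + " " + lookup.contains(q)); // AB true, XY false, EF true
        }
    }
}
```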
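That covers the running times of these approaches and the reasons behind them. In project development, when the data set is small and you only need to check once, a simple loop is the recommended choice. When the same data is searched many times, building a HashSet once and reusing it pays off, and binary search is an option only when the array is already sorted. Now you know!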